Crossing the Bridge from Unstructured Data to Structured Data and the Data Warehouse Environment
Bill Inmon
Principal
Inmon Data Systems
Monday, April 24, 2014
8:30 am - 4:45 pm
Level: Introductory

For some time, a central question has been how to move data meaningfully from the unstructured environment into the structured data warehouse environment. Merely pulling data from one environment into the other with a search engine ignores the context of the unstructured data, and that context is extremely important. Based on research by Bill Inmon and Inmon Data Systems over the past three years, this full-day tutorial and workshop addresses the thematic approach to reading and interpreting unstructured data so that it is useful in the structured data warehouse environment. The details of how to go from unstructured data to structured data without using linguistic approaches are outlined clearly. In addition, several applications of unstructured data in the structured data warehouse environment will be discussed.

Part I - Lecture
  • Unstructured data – what is it
  • Why unstructured data is important to address
  • The unstructured marketplace
  • Why crossing the bridge between the two environments is important
  • Search engines – where do they fit
  • Link associations
  • Two approaches – the linguistic approach, the thematic approach
  • The linguistic approach
      • Complexity
      • Single language problem
  • The thematic approach
      • A simple and efficient approach
      • Ability to handle multiple languages

Part II - Exercises

  • Stop words
  • Stemmed words
  • Synonyms
  • Alternate spellings
  • Internal categories
  • External categories
  • Hot word categories
  • Document attribute ID
  • Ranking words based on occurrence
  • Operating on multiple languages
  • Linkage by common keys
  • Linkage by secondary keys
  • Cross referencing with metadata
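
Several of the exercise topics above (stop words, stemmed words, synonyms, alternate spellings, and ranking words based on occurrence) can be illustrated with a short sketch. This is not the tutorial's own material or Inmon Data Systems' implementation: the word lists, the function names (`naive_stem`, `normalize`, `rank_words`), and the crude suffix-stripping rule are all assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "on"}
ALTERNATE_SPELLINGS = {"colour": "color", "cancelled": "canceled"}
SYNONYMS = {"invoice": "bill", "client": "customer"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Tokenize, drop stop words, then unify spellings, stems, and synonyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        token = ALTERNATE_SPELLINGS.get(token, token)  # alternate spellings
        token = naive_stem(token)                      # stemmed words
        token = SYNONYMS.get(token, token)             # synonyms
        result.append(token)
    return result


def rank_words(text, top_n=3):
    """Rank the normalized words by how often they occur."""
    return Counter(normalize(text)).most_common(top_n)


sample = "The client cancelled the invoice. Customers paid invoices on the billing date."
print(rank_words(sample, top_n=2))  # [('bill', 3), ('customer', 2)]
```

In this sample, "invoice", "invoices", and "billing" all collapse to "bill", and "client" and "customers" collapse to "customer", so the ranking reflects themes rather than surface spellings. A real system would need per-language word lists and a proper stemmer in place of the suffix rule assumed here.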

Part III - Lecture

  • Visualization
  • Finding natural associations
  • Executive dashboard
  • Alerts based on content found
  • CRM
  • Ranking customers' attitudes
  • Compliance
  • Sarbanes-Oxley
  • Email management
  • Finding blather
  • Space reduction
  • Why put unstructured data in the data warehouse after it has been processed?

Bill Inmon, a world-renowned expert, speaker, and author on data warehousing, is widely recognized as the "father of data warehousing." He is the creator of the Corporate Information Factory and, more recently, the Government Information Factory. As an author, Bill has written more than 650 articles on a variety of topics about building, using, and maintaining the data warehouse and the CIF. His works have been published in Data Management Review and The Business Intelligence Network, where he continues to be a featured columnist. He has written 46 books, many of which have been translated into nine languages; one has sold over one-half million copies. As an entrepreneur, Bill founded and took public Prism Solutions in 1991. In 1995, Bill went on to found Pine Cone Systems, later named Ambeo. In 2003, Bill co-founded Inmon Data Systems, Inc. and created the Government Information Factory, an architectural blueprint for building government information systems.