Entity Resolution & Entity Extraction in Data Integration Initiatives
James McGinn
Associate
Booz Allen Hamilton
March 8, 2007
8:30 AM - 9:30 AM
Level: Intermediate
Do you need to integrate data from multiple
structured and unstructured sources? Are you concerned about
duplicate data and about which records are accurate? These are just some of
the challenges that IT professionals and project managers face daily
on systems integration initiatives. When integrating
data from multiple systems, some holding structured data and some unstructured,
two critical components of Data Cleansing emerge: Entity Resolution
and Entity Extraction.
Entity Resolution is a form of Data Cleansing, better known as the “de-duplication” of data or, more accurately, the process of identifying and linking records that could represent the same entity. Entity Resolution is generally performed on structured data held in fixed fields. Entity Extraction is a form of Data Cleansing used during Data Integration that focuses specifically on unstructured data. Sometimes referred to as “Text Mining” or “Information Extraction”, Entity Extraction is the process of searching the body of text in unstructured files such as Word documents, email, and PDF files and giving meaning to what is found.
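To make the two ideas concrete, the following is a minimal Python sketch, not the approach presented in the session. It assumes records are dictionaries with hypothetical `name` and `city` fields, uses the standard library's `difflib.SequenceMatcher` as a stand-in for a real matching engine, and treats email addresses as the only entity type to extract. The 0.85 threshold and all function names are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", value.lower()).split())


def similarity(a: dict, b: dict, fields=("name", "city")) -> float:
    """Average string similarity (0.0 to 1.0) across the compared fields."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def resolve(records: list, threshold: float = 0.85) -> list:
    """Entity Resolution: return index pairs of records that could be
    the same entity. The threshold is an assumed, tunable cut-off."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical pattern: a simplified email matcher, not a full RFC 5322 parser.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list:
    """Entity Extraction: pull email addresses out of free text."""
    return EMAIL.findall(text)


if __name__ == "__main__":
    records = [
        {"name": "John Smith", "city": "McLean"},
        {"name": "Jon Smith", "city": "Mc Lean"},
        {"name": "Mary Jones", "city": "Boston"},
    ]
    print(resolve(records))
    print(extract_emails("Contact jane.doe@example.com for details."))
```

The pairwise comparison is quadratic in the number of records, so a sketch like this suits only small data sets; larger integrations typically group candidate records first and compare only within each group.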
Mr. McGinn, an Associate in the Data Migration practice
of Booz Allen Hamilton, Inc., is responsible for the delivery of
enterprise-wide information strategy and architecture, business intelligence, and
data migration solutions. He has designed and built data architectures,
data migration strategies, and data warehouses for clients in the commercial
and government sectors.
James specializes in helping organizations design, develop, and implement information systems infrastructures that are optimized to enable those organizations to meet industry challenges. He has worked as a project manager, developer, and analyst for numerous companies and government agencies, devising and building technical architectures for legacy database migration, data profiling, data quality, data management, and data warehouse solutions that help those organizations leverage their information and gain or solidify a competitive advantage.