The Role of Entity Resolution & Entity Extraction in Information Quality Initiatives
Do you need to integrate data from multiple structured and unstructured sources? Are you concerned about duplicate data and about which records are accurate? These are just some of the challenges that IT professionals and project managers face daily on systems integration initiatives. When integrating data from multiple systems, some holding structured data and others unstructured data, two critical components of Information Quality emerge: Entity Resolution and Entity Extraction. Entity Resolution is a form of Data Cleansing better known as the “de-duplication” of data or, more accurately, the process of identifying and linking records that could refer to the same entity. It is generally performed on structured data formatted in fixed fields. Entity Extraction is a form of Data Cleansing used during Data Integration that focuses specifically on unstructured data. Sometimes referred to as “Text Mining” or “Information Extraction”, it is the process by which unstructured data in files such as Word documents, email, and PDF files is searched and meaning is derived from the body of text.
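To make the two terms concrete, the following is a minimal Python sketch and not the approach presented in the session. It assumes Entity Resolution can be approximated by fuzzy-matching a single normalized name field, and Entity Extraction by pulling email addresses out of free text with a regular expression. The function names, the 0.85 similarity threshold, and the sample records are all hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", value.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve_entities(records, threshold=0.85):
    """Entity Resolution sketch: link ids of records whose names look alike.

    The threshold is an illustrative assumption, not a recommended value.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i]["name"], records[j]["name"]) >= threshold:
                pairs.append((records[i]["id"], records[j]["id"]))
    return pairs


# Deliberately simple pattern; real email syntax is broader than this.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Entity Extraction sketch: find email-like strings in unstructured text."""
    return EMAIL_PATTERN.findall(text)


# Hypothetical structured records, as if drawn from two source systems.
records = [
    {"id": 1, "name": "Acme Corp."},
    {"id": 2, "name": "ACME Corp"},
    {"id": 3, "name": "Globex Inc"},
]

# Hypothetical unstructured text, such as the body of an email.
note = "Please contact jane.doe@example.com or the help desk at support@example.org."

if __name__ == "__main__":
    print(resolve_entities(records))
    print(extract_emails(note))
```

The sketch compares every pair of records, which is only practical for small data sets. Real projects typically match on several fields and narrow the candidate pairs before comparing them.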
Speaker Bio
Mr. McGinn, an Associate in the Data Management practice of Booz Allen Hamilton, Inc., is responsible for the delivery of enterprise-wide information strategy and architecture, business intelligence, and data migration solutions. He has designed and built data architectures, data migration strategies, and data warehouses for clients in the commercial and government sectors. James specializes in helping organizations design, develop, and implement information systems infrastructures that are optimized to meet industry challenges. He has worked as a project manager, developer, and analyst for numerous companies and government agencies, devising and building technical architectures for legacy database migration, data profiling, data quality, data management, and data warehouse solutions that help those organizations leverage their information and gain or solidify a competitive advantage. James can be reached at mcginn_james@bah.com.