DAMA + Wilshire Meta-Data Conference - Data Strategy Track

Problem: A VP demands a consolidated customer sales report across product lines from different divisions. As you dig into the request, it becomes clear that "sale" is defined differently from division to division: some report it as gross, some as net before taxes, and some as net after taxes. Worse yet, there are codes everywhere, and some share the same code name but have different value sets and different meanings.

This tutorial introduces the key topics and scenarios involved in creating interoperable data environments. It also identifies the problems that commonly occur, such as complexity and latency.

The tutorial defines the levels of data interoperability to be achieved. It also presents the overall content and construction of a metadata repository environment, a critical component of a successful strategy.

The tutorial details the various scenarios that must occur to achieve a data interoperability environment including enterprise architectures, information systems plans, data model engineering, and then both reverse and forward engineering.
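The reporting problem that opens this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the division names, the three "basis" labels, the reading of "net after taxes" as a tax-inclusive amount, and the status codes. It shows two steps: converting each division's reported amount to one agreed basis, and translating division-specific codes through an explicit mapping instead of trusting a shared code name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSale:
    """A sale as one division reports it (all fields hypothetical)."""
    division: str
    amount_cents: int        # amount in the division's own definition
    basis: str               # "gross", "net_pre_tax", or "net_post_tax"
    discount_cents: int = 0  # discounts granted on this sale
    tax_cents: int = 0       # tax charged on this sale


def to_net_pre_tax(sale: ReportedSale) -> int:
    """Convert a reported amount to the agreed basis: net of discounts, before tax.

    Assumes "gross" excludes tax and includes discounts not yet deducted,
    and "net_post_tax" is the net amount with tax added on top.
    """
    if sale.basis == "gross":
        return sale.amount_cents - sale.discount_cents
    if sale.basis == "net_pre_tax":
        return sale.amount_cents
    if sale.basis == "net_post_tax":
        return sale.amount_cents - sale.tax_cents
    raise ValueError(f"unknown basis: {sale.basis!r}")


# The same code value means different things in different divisions,
# so the lookup key is (division, code), never the code alone.
STATUS_CODES = {
    ("east", "A"): "ACTIVE",
    ("east", "C"): "CLOSED",
    ("west", "A"): "ARCHIVED",
    ("west", "O"): "ACTIVE",
}


def canonical_status(division: str, code: str) -> str:
    """Translate a division-specific status code to the shared vocabulary."""
    try:
        return STATUS_CODES[(division, code)]
    except KeyError:
        raise ValueError(f"no mapping for {division!r} code {code!r}") from None
```

Keeping the conversion rules and code mappings as explicit data is one way to make them reviewable by the business owners of each division.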


In today's distributed, web-based development environment, the biggest frustration for developers is having to "hard-couple" their applications to specific database structures, particularly normalized base tables. The key to creating flexible, reusable data structures that can support web services and application objects is to abstract, or decouple, the functionality of the database from its underlying structure. The end goal is to create a rule-based (or "policy-based") data abstraction layer that can be easily changed, and used by multiple application objects and web services. Some of the techniques that will be presented include:
  • Views
  • Data abstraction layers
  • Data access objects
  • Data integration services
  • Triggers
  • Fundamental stored procedures
  • Complex data types
  • User-defined functions
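As a rough illustration of two of these techniques, the sketch below combines a view with a data access object, using Python's built-in sqlite3 module. The table, view, and class names are hypothetical and not taken from the session. The point is that the application reads only from the view, so the normalized base tables can be restructured without changing the calling code.

```python
import sqlite3


def build_database() -> sqlite3.Connection:
    """Create hypothetical normalized base tables plus a view that hides them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE sales_order (
            id           INTEGER PRIMARY KEY,
            customer_id  INTEGER NOT NULL REFERENCES customer(id),
            amount_cents INTEGER NOT NULL
        );
        -- The view is the contract that applications depend on.
        CREATE VIEW customer_sales AS
            SELECT c.id AS customer_id,
                   c.name AS customer_name,
                   COALESCE(SUM(o.amount_cents), 0) AS total_cents
            FROM customer c
            LEFT JOIN sales_order o ON o.customer_id = c.id
            GROUP BY c.id, c.name;
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO sales_order VALUES (1, 1, 1500), (2, 1, 2500);
        """
    )
    return conn


class CustomerSalesDAO:
    """Data access object: callers never see table names or joins."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def total_for(self, customer_id: int) -> int:
        row = self._conn.execute(
            "SELECT total_cents FROM customer_sales WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            raise KeyError(customer_id)
        return row[0]
```

If the base tables are later split or renamed, only the view definition needs to change, provided it keeps exposing the same columns.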

In the development of both Business Intelligence and Service-Oriented Architecture (SOA) solutions, there is a requirement for a data integration architecture.

This session will take the attendees through the development of a best practices data integration architecture that supports both the delivery of Business Intelligence and the incorporation of a SOA foundation.

Learn about the many facets of data integration architectures including:
  • History of data integration architectures
  • Key role of metadata
  • Iterative 11-step development process
  • Including Data Governance & Stewardship
  • Distributed vs. Centralized Models
  • Architectural considerations
  • Team composition and resourcing

Real-world examples and an open question-and-answer period will allow attendees to learn about the development of data integration architectures while receiving practical guidance and advice.

Managing health care information looms as one of the most important issues of the coming decades. Scores of organizations have been gathering data on the state of Americans’ health, and the effort will accelerate as the baby boomers age and require increasingly accurate tracking of their health and treatment status.

The United States Health Information Knowledgebase (USHIK) is one response to the challenge of making sense of the plethora of diverse health care datasets. As a metadata registry for health care information, USHIK contains and links to the data elements and information models of Standards Development Organizations (SDOs) and other health care organizations, making it easier for public and private organizations to harmonize the information formats of health care standards. USHIK implements a metadata registry methodology based on ISO/IEC 11179, Information technology – Metadata Registries. It is sponsored by the Agency for Healthcare Research and Quality (AHRQ) and has been guided by the American National Standards Institute’s Health Informatics Standards Board.

With over twelve thousand data elements and related items, USHIK supports data sharing with cross-system and cross-organization descriptions of common units of health data. Since 2004, USHIK has been used to register selected Consolidated Health Informatics (CHI) standards under sponsorship of the Federal Health Architecture’s CHI Council. Most recently, the Biosurveillance Technical Committee of the Healthcare Information Technology Standards Panel (HITSP) utilized USHIK to perform comparisons among selected standards to document and support their decision-making process.

Among the capabilities of USHIK are:
  •  Describing data using common characteristics. Promoting the development of good data names and descriptions helps users of shared data reach a common understanding of a unit of data's meaning, representation, and identification. This helps ensure the quality of shared information.
  •  Providing multiple ways to locate data descriptions. Standard and custom 'drill-down' methods let users approach the data descriptions from different points of view and narrow in on the definitions they want to retrieve, so that the number of data definitions does not overwhelm the user.
  •  Allowing Web access to provide easy access and promote use of standards. Good data descriptions become standards. When these standards are re-used, interoperability between systems becomes easier and more efficient, and data quality improves.
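The first two capabilities can be pictured with a small in-memory sketch. It is not USHIK's data model or the full ISO/IEC 11179 metamodel; the field names, identifiers, and sample organizations are hypothetical. It only shows the idea of describing each data element by its identification, meaning, and representation, and then narrowing a search along more than one dimension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElement:
    """A registered description of one unit of data (fields are illustrative)."""
    identifier: str       # identification: unique within the registry
    name: str
    definition: str       # meaning
    representation: str   # representation, e.g. a format or value list
    source: str           # registering organization (hypothetical)


class Registry:
    """Toy metadata registry supporting drill-down by source and by text."""

    def __init__(self) -> None:
        self._elements: dict[str, DataElement] = {}

    def register(self, element: DataElement) -> None:
        if element.identifier in self._elements:
            raise ValueError(f"duplicate identifier: {element.identifier}")
        self._elements[element.identifier] = element

    def find(self, *, text: Optional[str] = None,
             source: Optional[str] = None) -> list[DataElement]:
        """Return elements matching every filter that was supplied."""
        results = list(self._elements.values())
        if source is not None:
            results = [e for e in results if e.source == source]
        if text is not None:
            needle = text.lower()
            results = [e for e in results
                       if needle in e.name.lower() or needle in e.definition.lower()]
        return results


registry = Registry()
registry.register(DataElement("DE-001", "Patient birth date",
                              "Date on which the patient was born.",
                              "date, YYYY-MM-DD", "Org A"))
registry.register(DataElement("DE-002", "Patient sex code",
                              "Code for the patient's recorded sex.",
                              "code list", "Org A"))
registry.register(DataElement("DE-003", "Date of birth",
                              "Birth date of the insured person.",
                              "date, MMDDYYYY", "Org B"))
```

A search such as `registry.find(text="birth")` surfaces elements from different organizations that describe the same concept with different representations, which is the kind of comparison a harmonization effort starts from.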

Between the two main paradigms of integration, data integration and application integration, there exists a “data divide.” Integration Competency Centers (ICCs) were established to bridge this gap, increasing integration consistency and productivity by coordinating integration across the enterprise and loosely coupling the two paradigms through data dictionaries, metadata management, and best practices. ICCs have had a positive effect on integration consistency and productivity, but they lack appropriate end-to-end, role-related tool support, which limits their influence and effectiveness.

What ICCs really need are tools and processes that naturally bridge the “divide” without requiring a large upfront investment or disrupting existing work processes. A reusable, pervasive, executable transformation specification mechanism is the only way to bridge the gap between an ICC and the implementations in the field.

In this session, Itemfield CTO Peter Cousins will cover specification-driven data transformation, an excellent solution for bridging the data divide. In his experience working with some of the largest companies in financial services and telecommunications, Peter has been most successful leveraging the tool most comfortable and familiar to both business analysts and data modelers – Excel spreadsheets.

From this presentation, audience members will learn:
  • Why ICCs cannot bridge the data divide
  • Importance of specification-driven transformation for structured, semi-structured and unstructured data
  • Definition of well-defined spreadsheet templates and tools
  • Benefits to using Excel as a mapping tool
  • Anecdotal customer evidence that supports the use of this tool
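The general idea of a specification-driven transformation can be sketched as follows. This is not the presenter's product or template format: the column names, rule names, and sample record are assumptions, and the mapping is shown as CSV text of the kind a spreadsheet could export. The mapping rows are the specification, and a small generic engine executes them, so changing a mapping means editing the sheet and not the code.

```python
import csv
import io

# Hypothetical mapping specification, as it might be exported from a spreadsheet.
SPEC_CSV = """source_field,target_field,rule
cust_nm,customer_name,strip
ctry,country_code,upper
acct,account_id,copy
"""

# Named rules that a mapping row may refer to (illustrative set).
RULES = {
    "copy": lambda value: value,
    "strip": lambda value: value.strip(),
    "upper": lambda value: value.strip().upper(),
}


def load_spec(text: str) -> list[dict[str, str]]:
    """Parse the mapping rows and reject rules the engine does not know."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        if row["rule"] not in RULES:
            raise ValueError(f"unknown rule: {row['rule']!r}")
    return rows


def transform(record: dict[str, str], spec: list[dict[str, str]]) -> dict[str, str]:
    """Apply each mapping row to the source record to build the target record."""
    target = {}
    for row in spec:
        rule = RULES[row["rule"]]
        target[row["target_field"]] = rule(record[row["source_field"]])
    return target


spec = load_spec(SPEC_CSV)
result = transform({"cust_nm": "  Acme Corp ", "ctry": "us", "acct": "42"}, spec)
```

Here `result` holds the target record built purely from the mapping rows; an analyst could add or reorder fields by editing the specification alone.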

Data Integration (DI) is a hot topic, with hundreds of vendors in the space. A great deal has been written, most of which addresses the tools and technology that facilitate the physical integration. Tools are great, but they are still just tools. To ensure a successful DI initiative, a framework composed of a strategy, standards, designs, and governance needs to be included. The presentation covers:
  • Understanding of DI and the nature of data (relational, states, types).
  • DI Framework
    • Structure for integration (standards, guidelines, processes, policies, DI "rules", integration patterns).
    • Integration decisions made by the business through data stewardship and DI issue resolution (security, compliance, quality standards).
    • DI Design/Plan (business design, source data research/analysis, target integrated design, transformation rules/mapping).
    • Strategy and governance for the on-going maintenance (change management, data quality program).

This presentation is meant to raise awareness of the importance of a DI framework to the success of a data integration initiative.

Do you need to integrate data from multiple structured and unstructured sources? Are you concerned about duplicate data and which records are accurate? These are just some of the challenges that IT professionals and project managers who work on systems integration initiatives face on a daily basis. When integrating data from multiple systems, sometimes containing both structured and unstructured data, two critical components of Data Cleansing emerge: Entity Resolution and Entity Extraction.

Entity Resolution is a form of Data Cleansing better known as the “de-duplication” of data or, more accurately, the process of identifying and linking records that could represent the same entity. Entity Resolution is generally performed on structured data formatted in fixed fields.

Entity Extraction is a form of Data Cleansing used during Data Integration that focuses specifically on unstructured data. Sometimes referred to as “Text Mining” or “Information Extraction”, Entity Extraction is the process of searching unstructured data in files such as Word documents, email, and PDF files and deriving meaning from the body of text.
  • Entity Resolution & Entity Extraction defined
  • Data Integration Pillars
  • Entity Resolution
    • Standardization
    • Matching
    • Survivorship
  • Entity Extraction
  • Business Need
  • Benefits of Entity Resolution & Entity Extraction
  • Entity Resolution Case Study (FDIC CAS)
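The three Entity Resolution steps named in the topic list (standardization, matching, survivorship) can be sketched in a few lines. This is a deliberately simplified illustration with invented records and rules, not the method used in the case study. Matching here is exact equality on a standardized name plus ZIP code, where a production system would usually apply fuzzy or probabilistic matching, and survivorship keeps the most recently updated non-empty value for each field.

```python
import re
from collections import defaultdict

# Invented sample records; two of them describe the same institution.
RECORDS = [
    {"name": "First Bank, N.A.", "zip": "20001", "phone": "", "updated": "2006-03-01"},
    {"name": "FIRST BANK NA", "zip": "20001", "phone": "202-555-0100", "updated": "2005-11-15"},
    {"name": "Second Trust", "zip": "10001", "phone": "212-555-0199", "updated": "2006-01-10"},
]


def standardize(name: str) -> str:
    """Standardization: lowercase, drop punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def match(records: list[dict[str, str]]) -> list[list[dict[str, str]]]:
    """Matching: group records whose standardized name and ZIP are equal."""
    groups = defaultdict(list)
    for record in records:
        groups[(standardize(record["name"]), record["zip"])].append(record)
    return list(groups.values())


def survive(group: list[dict[str, str]]) -> dict[str, str]:
    """Survivorship: build one record, preferring the newest non-empty values."""
    newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
    golden: dict[str, str] = {}
    for record in newest_first:
        for field, value in record.items():
            if value and not golden.get(field):
                golden[field] = value
    return golden


groups = match(RECORDS)
golden_records = [survive(group) for group in groups]
```

In this sketch the two "First Bank" records collapse into one surviving record that takes its name from the newer record and its phone number from the older one, because the newer record's phone field is empty.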

A key to any data integration effort is understanding the personal, cultural, and political environment and consciously employing proven principles to enable success. The most successful data integration efforts usually share one thing in common: they developed and implemented effective strategies that provided fertile cultural and political ground for success. This seminar will share techniques to help participants understand key principles and empower them to meet objectives and move toward effective integration. It will provide case histories of successful and unsuccessful efforts, illustrating why some integration programs succeed and others fail. The instructor will share principles and actions that can either help or hinder integration efforts, along with insights into where data integration efforts can go, and have gone, off course. There will be interactive exercises in which participants can practice handling difficult issues that commonly arise by applying principles that lead to effective integration. Participants of this session will gain:
  • An understanding of the political and cultural factors that successful data integration teams need to be aware of and prepared for
  • Tools and principles to enable data integration, such as keys to developing trust, gaining funding, delivering value, facilitating a common vision, managing conflict, developing effective integration procedures, building on others’ work, and gaining buy-in
  • Real-life stories of how culture and politics either killed or fostered effective integration programs
  • Case examples and exercises allowing participants to practice overcoming challenges that integration professionals often face
  • Education and experience in preparing for cultural and political challenges, as well as in applying powerful techniques for developing more effective environments, in a non-threatening classroom setting


Wilshire Conferences DAMA International