2003
Enterprise Data Forum
This report was compiled and edited by Tony Shaw, Program Chair, Wilshire Conferences, Inc.
TUTORIALS

Implementing Universal Data Models to Integrate Data
Len Silverston

The Universal Data Model for Parties, Roles, and Relationships provides a solid foundation for data integration. It allows data about people and organizations to be stored in one consistent place, together with a complete profile of all the roles and relationships for each party. This model may be implemented in many different ways, for example by using a single PARTY table, separate PEOPLE and ORGANIZATION tables, a single PARTY ROLE table, separate tables for roles such as CUSTOMER, EMPLOYEE, and PARTNER, a single PARTY RELATIONSHIP table, and/or separate relationship tables such as EMPLOYMENT and CUSTOMER CONTACT RELATIONSHIP. In order to populate the tables, a "system of record" strategy and a "pattern matching" strategy are needed. The integrated data store identifiers (such as party_id) need to be cross-referenced to the application keys, and three main database structures can accommodate this: placing a foreign key in the application table, building a cross-reference table between the enterprise key and the application, or a combination of both. The architecture for implementing these integrated structures may be virtual (for example, common XML schemas) or physical (for example, an operational data store).

XML Schemas for the Data Architect
James Bean

XML Schemas (W3C) are becoming the prevalent form of metadata for constraining an XML transaction or message. XML Schemas provide a robust set of rules and constraints that in some cases go well beyond the typical capabilities of a logical or physical data model. Yet XML Schemas present tremendous similarities and synergies with the traditional roles of the data architect. Of significant importance are:
- Data Containers
Also of importance are the robust capabilities of XML Schemas to support reuse. This allows the data architect to promote standard metadata, definitions, and rules throughout the enterprise.
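As a rough illustration of the cross-reference option from the Universal Data Models tutorial above, the sketch below builds a single PARTY table, a single PARTY ROLE table, and a cross-reference table that maps application keys to the enterprise party_id. Apart from party_id, every table name, column name, application name, and key value here is hypothetical and chosen only for the example; it is not the speaker's schema.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL CHECK (party_type IN ('PERSON', 'ORGANIZATION')),
    name       TEXT NOT NULL
);
CREATE TABLE party_role (
    party_id  INTEGER NOT NULL REFERENCES party(party_id),
    role_type TEXT NOT NULL,
    PRIMARY KEY (party_id, role_type)
);
-- Cross-reference between the enterprise key and each application's own key.
CREATE TABLE party_xref (
    party_id        INTEGER NOT NULL REFERENCES party(party_id),
    application     TEXT NOT NULL,
    application_key TEXT NOT NULL,
    PRIMARY KEY (application, application_key)
);
""")

# One person who is known to two applications under different keys.
conn.execute("INSERT INTO party VALUES (1, 'PERSON', 'Pat Example')")
conn.executemany("INSERT INTO party_role VALUES (?, ?)",
                 [(1, "CUSTOMER"), (1, "EMPLOYEE")])
conn.executemany("INSERT INTO party_xref VALUES (?, ?, ?)",
                 [(1, "billing", "CUST-0042"), (1, "hr", "E-977")])


def find_party(application, application_key):
    """Resolve an application key to the enterprise party_id, or None."""
    row = conn.execute(
        "SELECT party_id FROM party_xref "
        "WHERE application = ? AND application_key = ?",
        (application, application_key),
    ).fetchone()
    return row[0] if row else None


def roles_for(party_id):
    """Return every role recorded for a party, in alphabetical order."""
    rows = conn.execute(
        "SELECT role_type FROM party_role WHERE party_id = ? ORDER BY role_type",
        (party_id,),
    )
    return [r[0] for r in rows]


print(find_party("billing", "CUST-0042"), roles_for(1))
```

Keeping the mapping in its own table leaves the application tables untouched, which is one reason to prefer it over a foreign key in each application table when those applications cannot easily be changed.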
Establishing and Running an Effective Data Stewardship and Information Quality Improvement Program
Kurt Allebach

Poor quality data has a tactical impact on organizational performance, causing process failures and low customer satisfaction. Many companies are implementing Data Stewardship and Continuous Quality Improvement programs. Key to success in these efforts are:
1. A thorough and agreed understanding of the various data stewardship roles within an organization and the critical relationships between them.

Mastering Reference Data
Malcolm Chisholm

Reference data is also known as code tables, lookup tables, and domain values. It is found in all databases and is highly shared within, and even between, enterprises. This presentation examined the special management challenges that are unique to reference data and options for meeting these challenges. Not addressing these challenges can result in data quality problems and misfiring of business rules, which are often driven by reference data values. Specific attention was given to:
1. Managing the extensive metadata that is linked to reference data, and reusing it to provide assistance to both developers and business users.

A Fundamental Framework for Evaluating Data Management Technology and Practice
Fabian Pascal

The speaker maintains that the majority of data management practitioners operate in “cookbook”, product-specific mode, without really knowing and understanding the fundamental concepts and methods underlying their practice: for example, what data means, what a data model is, and what data independence is. This tutorial provided a fundamentally correct way to evaluate data management technologies, products and practices. It helped practitioners understand data fundamentals that are either ignored or distorted in the industry, how to apply these fundamentals in day-to-day business, and how to use them in the evaluation of the technologies being promoted.

Process Modeling Concepts
Marcie Barkin Goodwin

Process modelers have a good idea of what it takes to create a process model, but they don’t (or shouldn’t) work in a vacuum. Business users and managers play a critical role in contributing to and evaluating process models, and it’s important that they understand what a process model is communicating. This introductory presentation focused on empowering both new modelers and the non-modelers on the team, giving them a better understanding of the modeling process and the meaning of model components, and allowing them to participate with confidence. Marcie explained:
- What is Process Modeling and why do it anyway?

Converting a Logical Data Model to a Physical Database
Thomas Haughey

This presentation described a step-by-step process for transforming a detailed (data element level of detail), properly normalized Logical Data Model (LDM) into a Physical Data Model (PDM). Two major data modeling steps/phases must have preceded this transform: Conceptual Data Modeling, in which the entities and relationships of concern are carefully modeled according to well-defined, unvarying rules; and Logical Data Modeling, in which detailed data elements are formed and defined to meet the detailed information requirements of the enterprise, and then properly normalized (using the unvarying "normal forms" as the rules) into the entity/relationship structure. The result is a strictly and properly normalized Logical Data Model, ready for transformation into physical form for implementation. In transforming an LDM into a PDM, the "rules" which must be followed are in fact defined by the business: they are the physical constraints (i.e., speed of performance, space, cost, physical placement/distribution, security and integrity protection) which the business wants met. Unfortunately, many of these constraints are typically mutually exclusive, i.e., if the PDM satisfies one of them, then it cannot satisfy another. The methodical step-by-step process described in the tutorial showed how each of these considerations is analyzed and incorporated as the LDM is transformed into the PDM, arriving at a "best balanced" PDM that meets the best balance of the physical constraints imposed by the business.

XML Prototyping - Models and Structures
James Bean

The broad-scale acceptance and adoption of XML as a method of describing transactions and messages (e.g. EAI, Web, OLTP) presents a tremendous opportunity for the Data Architect. XML is a robust, self-describing metadata language.
Given the importance of metadata and standards, the Data Architect can add tremendous value to the process. However, while there are numerous affinities and similarities between data modeling and XML Prototypes, there are also a number of important differences. This presentation included a number of techniques for modeling transaction- and message-oriented XML in a manner that resembles traditional data modeling. In addition, a number of key XML prototyping concepts were covered:
- XML Structure Models
- XML Architectural Container Forms

Enterprise Metadata Implementation: Learning from “Best Practices”
R. Todd Stephens

Enterprise metadata and EAI may seem like a strange relationship, but the reality is that metadata is one of the most critical elements of a solid application integration effort. Todd discussed, and shared with the attendees of this workshop, a three-year learning curve. The session reviewed seven perspectives of an enterprise metadata effort.
- Enterprise metadata environment
Metadata and the principles that define this technology must be expanded into the other areas of the enterprise environment. Interfaces, components, schemas, DTDs, web services, systems, documents, web pages, and metrics, as well as the components of the traditional database metadata effort, need to be looked at from a different view. An organization that plans on implementing an enterprise architecture needs to take a long look at the data architecture and ensure that it includes a heavy dose of metadata.

Implementing a Message-Based Data Integration Strategy
Dave McComb

Service Oriented Architectures mark a major shift from the way systems have been built for the last decade. One of the most profound changes is in how distributed databases will be integrated. Most modern systems will employ a Service Oriented Architecture which will consist of:
- Message Queues
The partitioning of systems into applications will be drastically changed to accommodate shared enterprise services. One of the key disciplines to prevent this from becoming chaos will be "Enterprise Message Modeling", which is similar to Enterprise Data Modeling but differs in that it concerns itself primarily with modeling the data that will be shared via messages to keep the applications and services in sync.

End to End Security for SQL Server 2000 Data
Morris Lewis

Because of new government regulations concerning terrorism, hacking, and privacy, SQL Server database administrators are responsible for ensuring the safety and confidentiality of their data no matter where it is used. Common business practices relating to authenticating users’ identities and authorizing access to data are no longer sufficient to meet government requirements for protecting data, nor are they able to withstand even the simplest hacking tools available to attackers. New environments like Microsoft’s .NET Framework do offer new ways to protect data, but they also add new complexity to the production environment as a whole. Finally, application design is moving away from monolithic programs that implement all the services users will need, toward breaking services and functionality apart into individual components that often reside on multiple computers. This creates an environment in which data may pass through several transformations as it flows between the client and SQL Server. This tutorial covered techniques for building more secure environments for your data.

Information Modeling in a Changing Environment
Graham Witt

Change is an unavoidable feature of every system acquisition project, with requirements not only being refined during the course of the project but frequently evolving into something quite different.
Agile methods have evolved in response to this situation but, since the tools and techniques generally available to information modelers do not provide much in the way of support for rapid change, it is tempting to dispense with information modeling as if it were an unnecessary and time-wasting distraction. This presentation described a variety of techniques that enable an information modeler to respond to a changing environment and add value to an agile or conventional project in such an environment. Topics included:

Building a Business Case for Enterprise Metadata
R. Todd Stephens

Business cases are essential tools for strategic analysis and decision making, and for tangibly defining the expected costs and returns associated with a metadata project. A business case helps a business look ahead, allocate resources, focus on key points, and prepare for problems and opportunities. Tie your business case to the priorities
of the business strategy within your organization, such as improving the product, increasing revenue and profit, better customer focus, cost reduction or improving internal efficiencies. Examples of metadata business case drivers include:

Objective Data Quality Assessment
David Loshin

Because data quality issues are relevant only within the business context in which inspected data is used, data quality levels can only be measured with respect to the expectations of business data consumers. Relying on measurements determined by software vendors provides only a subjective assessment from the point of view of an external party with little stake in the ultimate project success. Objective data quality measurement relies on metrics relating directly to how information is being used and how missed expectations impact the business. Once expectations are isolated and understood, we can define assertions that capture them; these assertions act as the data quality rules against which compliance is measured. These rules, which seed our objective data quality metrics, are knowledge-based metadata related to the data sets, suitable for incorporation into the metadata repository. This tutorial discussed the process of exploring information, identifying data quality rules, and isolating noncompliance as a sequence of stages:

Marrying SQL, XML, Web Services and Grid Services
Ken North

The race is on between IBM, Microsoft,
and Oracle to provide persistent data management for XML applications, for Web
services, embedded applications, Web stores, grid services, and other software.
SQL DBMS products will increasingly be judged on how well they support traditional
tasks (such as transaction processing) while evolving to provide new capabilities
(such as integrated business analytics). The latest releases of data management
software from the big three vendors unite SQL with multidimensional and document-centric
(XML) data and grid computing. Problems will continue to arise with interoperability,
data aggregation, and data and application integration. For this reason XML,
messaging, and Web services are becoming increasingly important.

Valuing Information and Knowledge
John Ladley

An organization’s information
portfolio contains great potential value. Many information management departments
want to demonstrate this potential value of information and knowledge to upper
management. There are multiple reasons for this:

However, intrinsic value, or potential value, has no meaning to CEOs. Nor, however, would a single CEO tell you that information is NOT an asset to their organization. The investment in moving and storing information is astronomical. It would seem, then, that treating information as an ASSET means it has to appear somewhere on a balance sheet on an ongoing basis, not only as a result of a merger or a calculation of goodwill (FASB 141, 142). The quality of information within an organization affects the value of that organization. It affects the stock price. Most work to date on valuing information has been related to these intangible classifications. However, John discussed approaches to valuing information that focus on the concept that information has no value unless it is actually used. As long as the data sits in a database, the capital invested in placing it there has no return. Like the winter coat, it is not valuable until needed. You can create value by applying the data to business decisions, improving processes, or even re-selling the information. In other words, if professionals make decisions and take actions, they are then accountable for some percentage of profit. An organization with valuable information-based
projects would be able to take these projects requiring information and knowledge, and assign an increase in profit to the professionals INVOLVED in the project(s). For example, a project to clean up customer data results in an increase in client retention of 2%, and a bottom line delta of US$10,000,000. The organization has 12,000 professionals. 2,000 of these are connected in some way with the business processes and projects used to improve retention. Measuring the delta per professional results in:

The ACCOUNTING efficacy of this exercise is certainly not industry standard. But it can be used to make an effective business case. And certainly, if the intangible aspect is unsuitable, a Net Present Value could be developed for the $5,000,000, and this number used as an indicator of the new ‘asset.’
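The arithmetic behind the example above can be sketched as follows. The per-professional figure is simply the stated US$10,000,000 delta divided by the 2,000 professionals connected to the work; the report's own calculation is missing from the text, and how its $5,000,000 figure was reached is not shown, so this sketch does not attempt to reproduce it. The Net Present Value function is a standard textbook formula, and the three-year horizon and 8% discount rate are assumptions made purely for illustration.

```python
def delta_per_professional(bottom_line_delta, involved_professionals):
    """Spread a bottom-line delta evenly across the professionals involved."""
    if involved_professionals <= 0:
        raise ValueError("involved_professionals must be positive")
    return bottom_line_delta / involved_professionals


def net_present_value(cash_flows, discount_rate):
    """Discount a series of year-end cash flows (year 1, 2, ...) to today."""
    return sum(
        cash_flow / (1 + discount_rate) ** year
        for year, cash_flow in enumerate(cash_flows, start=1)
    )


# Figures taken from the example in the text.
per_head = delta_per_professional(10_000_000, 2_000)

# Assumption for illustration only: the same delta recurs for three years
# and is discounted at 8% per year.
npv = net_present_value([10_000_000] * 3, 0.08)

print(f"Delta per involved professional: ${per_head:,.0f}")
print(f"NPV of three years of the delta at 8%: ${npv:,.0f}")
```

Dividing by the 2,000 connected professionals rather than all 12,000 follows the text's point that value is assigned only to those involved in the project.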
When Best Practices Aren't Good Enough
Katherine Hammer

One of the largest contributors to the failure of e-commerce, and to the general skepticism about technology, is a failure to understand the complexity and importance of data integration management to every IT initiative. While there is a general understanding that data integration is important - to business intelligence, enterprise application integration, customer relationship management, and so on - there is little attempt to treat data integration as a key factor in an effective enterprise architecture. This presentation reviewed a number of major problems with the way software evaluation is conducted and suggested how companies can overcome them. Specific advice included:
· From this day forward, do not purchase additional enterprise software UNLESS:

Enterprise Modeling and Metadata: Steps (and Mis-steps) from a Real-World Project
Ray McGlew

IMS Health began a process to create a logical data model of the data it collects, manages, and distributes to clients. This model was designed to assist the company in consolidating many stove-piped applications in several countries. We have made significant progress in creating this enterprise data model, and have added goals to bring some organization to the process. We have also started a comprehensive Global Metadata Repository implementation to complement the model. This presentation outlined the activities the team undertook to keep management excited enough to keep funding this infrastructure project.

Using AI for Data Audit and Quality: Barclays Bank Case Study
Adrian McKeon

Most IT systems cannot measure the
accuracy of outputs: small data quality flaws at the start of a project magnify into inexplicable defects in end user outputs. Objectives cannot be translated into measurable performance indicators and nobody knows why. Barclays used artificial intelligence to build an audit trail of the history of each data record at sub-field, field and record level, from the source system to the warehouse. Information is stored in a virtual data layer separate from the IT infrastructure. Audit trails made the workings of the IT infrastructure transparent, and end users were able to validate output, identify errors and track back to fix them.

Using SAP’s Business Intelligence Solution for Enterprise Data Warehousing
Kevin McDonald

Kevin discussed the overall architecture and features of the SAP BW product. He made a number of key points about how the world’s leading organizations are using BW to implement enterprise data warehouses:

KEYNOTE: Database and Software Trends
Ken North
Panelists: Bob Bickel, Bickel Advisory Service

Ken and his panel discussed a number of major technical and non-technical trends that are affecting database and software environments today. These trends include:
- Pervasive computing: billions of devices producing petabytes of data

A Little Appreciation - Is it too much to ask for?
Graeme Simsion
Panelists: Karen Lopez, InfoAdvisors

1. Agreement that there is a real issue with data modeling (and modelers) being properly valued, and that this can translate into applications being developed on poor foundations.
2. A number of suggestions from panelists as to how to promote data modeling; direct access to business stakeholders was considered the critical factor.
3. Experiences from audience and panel suggest that good data modeling is critical in agile methods; models once established are difficult to change, and 'general practitioners' do not produce sufficiently resilient models.
4. Data modelers should be integral players in project teams (rather than "arms-length" consultants in a data management group).
5. Evidence presented (Mandracchia) that formal data modeling in development translated into reduced development costs.
6. "Make yourself useful" (Maguire, Lopez) was offered as a key tactic for staying employed: more formally, data modelers were encouraged to broaden their skills, and (particularly) to develop a knowledge of their organization's database platform and approach.
7. Data modelers have a key role to play in a packaged software environment - not only evaluating packages but advising on tailoring and integration.
8. Given the move towards architectures based on messaging rather than a common data model, data modelers need to work closely with other architects and be prepared to offer advice at the detailed (attribute definition and formatting) level.
9. Data modeling skills in analysis and user communication translate readily into other analysis fields.

Introduction to Unstructured Data Management
David Raab, Barry Graubart, Pandu Nayak, Jose Colon

Unstructured data presents challenges to traditional data management. It comprises the majority of corporate information, but is often difficult or impossible to access.
This session explored the types of unstructured data, the challenges of managing it, and the technologies available to help meet those challenges.
Problems:
Solutions include:

Best Practices: Service-Oriented Integration and Process
Ron Schmelzer

Integration is not about simply plugging two systems or organizations into each other. The vision of "plug and play" application and system integration is a pipe dream that may be appropriate for the distant future, but right now enterprises face the more immediate challenge of connecting arbitrary systems in a manner that is cost effective, manageable, efficient and secure. Ron Schmelzer discussed Web services and the Service-Oriented Architecture (SOA) as an approach for integrating systems using an abstracted methodology called Service-Oriented Integration (SOI). Major trends and objectives for organizations are:
- “Thrift” is the New Normal

UML for Database Design
Terry Quatrani

Application and database modelers can now speak one language - UML. No longer are database analysts, modelers, and designers relegated to the tail end of the development lifecycle. Now database designers can participate from the inception of the project, helping shape those early decisions that often have a critical impact on the system’s data. Also, being able to link together the object and data models, and thereby improve the understanding of both, helps yield higher quality systems. The UML is the standard language for visualizing, specifying, constructing and documenting the artifacts of a software-intensive system. It will address ALL aspects of the software development lifecycle. It continues to be extended and refined by soliciting industry feedback and including the best constructs from other methods and languages. In building a visual model of a system,
many different diagrams are needed to represent different views of the system.
The UML provides a rich notation for visualizing our models. This includes the
following key diagrams:

To Federate or Consolidate: 10 Things to Consider Regarding Data Integration
Ho-Chun Ho

There are many mechanisms to integrate
data: data layer integration, data access layer integration, application-specific
solutions, application-integration frameworks, workflow or business process
integration frameworks, digital libraries with portal-style integration, search-engine-oriented
integration, data warehousing, and database federation. When architecting data
in support of any of these integration options, data architects often face a tough decision: whether to consolidate data or federate it. This presentation shared insights on that choice, including technological issues, organizational impact, and cost factors. Among other considerations, this presentation addressed:

XML, Your Data, and You
Evan Levy

The presentation "XML, Your Data, and You" presented the benefits of using XML as a core technology of an IT environment. Industry analysts have found that 60% of development budgets are focused on data integration. The benefits of XML include code simplicity, increased integration functionality, and significant cost savings.
- XML isn't just a single technology,
but a family of technologies that will greatly simplify the aspects of data
access -- data manipulation, data display formatting, and data integration.

Data Profiling Technology: How to Recover Metadata and More
Jack Olson

Data Profiling is the use of analytical techniques on data for the purpose of developing a thorough knowledge of its content, structure, and quality. The basic steps are:
1. Gather metadata

Active Metadata
Adrienne Tannenbaum
Panelists:

Adrienne Tannenbaum started this workshop by illustrating today's naiveté when it comes to maintaining an "active outlook". She emphasized the need to keep the following parts of the metadata solution world current and up-to-date:
· The ROI - Most organizations
start with an immediate metadata solution ROI (like reducing development costs
via the provision of efficient impact analysis) and then forget about it once
that initial objective is met. Successful metadata solutions evolve, and hence
so does the ROI - the return should become greater over time.
Vendors then presented overviews of how their specific products support "active metadata". An interactive Q&A followed, with attendees running out of time to continue their questions!

Name/Address Matching and Consolidation
David Raab
Panelists: Ramesh Menon, Michael Dunkerley

Record matching is more than just name/address matching. It is important because identifying the relationships between customers, contacts, organizations, consumers, households, etc., within one system or database, or across multiple systems or databases, is a basic requirement of today's systems: data consolidation, customer centricity, data cleanup, risk minimization, information reliability, and sales and marketing efficiency. Equally important is identifying the relationship between the identity data in a transaction (or file) and the data stored in databases and files: customer identification, duplicate prevention, and search reliability. Missing the data you have about a customer or prospect can mean bad business. Failing to successfully match against alert lists, fraud reports, and other important “screening” data can be disastrous to your business as well as have severe social impacts. Searching and matching is a two-step process: first, find the candidates that could be relevant to the search; second, match the candidates to the search data to provide a confidence level or match result. Finding candidates in today’s data volumes requires keys: some type of partitioning of the data that provides variable selectivity of the candidates in a search to (a) allow for the required real-time performance and processing turn-around and (b) limit false matches. If the keys do not support finding all of the relevant candidates, there is nothing that the matching can do to uncover a match.
If the keys, and the ways the keys and other data required for matching are stored in the database, do not support returning the results within the timeframe required, then the effectiveness of the results is diminished. The design of the keys, and of the database that holds the keys and other matching data, is critical to the success of a search and matching system. Maximizing search and matching reliability and quality requires:
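The two-step process described in this session (find candidates by key, then score each candidate against the search data) can be sketched roughly as below. This is an illustration under stated assumptions, not the panelists' method: the key design (first letter of the last word of the name), the similarity measure (Python's standard `difflib.SequenceMatcher`), the 0.8 threshold, and the sample records are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and keep only letters, digits, and single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def candidate_key(name):
    """Hypothetical key: first letter of the last word of the name."""
    words = normalize(name).split()
    return words[-1][0] if words else ""


def build_index(records):
    """Partition records by key so a search only scans one partition."""
    index = defaultdict(list)
    for record in records:
        index[candidate_key(record["name"])].append(record)
    return index


def search(index, name, address, threshold=0.8):
    """Step 1: fetch candidates by key. Step 2: score each candidate."""
    query = normalize(f"{name} {address}")
    results = []
    for record in index.get(candidate_key(name), []):
        target = normalize(f"{record['name']} {record['address']}")
        score = SequenceMatcher(None, query, target).ratio()
        if score >= threshold:
            results.append((round(score, 2), record["id"]))
    return sorted(results, reverse=True)


# Made-up sample data for the illustration.
records = [
    {"id": 1, "name": "Jon Smith", "address": "12 High St"},
    {"id": 2, "name": "John Smyth", "address": "12 High Street"},
    {"id": 3, "name": "Mary Jones", "address": "4 Elm Rd"},
]
index = build_index(records)
matches = search(index, "John Smith", "12 High St")
print(matches)
```

The sketch also shows the session's warning in miniature: a record whose key differs from the query's key is never scored at all, so an overly selective key silently loses true matches no matter how good the scoring step is.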
A Comparison of UML Class Models and ERDs for Data Modeling
Paul Dorsey

Many people question whether any part of the Unified Modeling Language (UML) can be used for data modeling. Some have suggested creating a new tool to explicitly support data modeling. However, with some extensions, the UML can be used very effectively to design databases. This session provided an overview of UML class diagram syntax as it pertains to data modeling and a discussion of how each drawing element can be implemented in a relational database. One of the most challenging problems in mapping an object-oriented design into a relational database is how to implement generalizations. The traditional mapping of each class to a table generates logically correct but unusable systems. Redundant storage of inherited attributes along the inheritance path is a strategy that allows modelers to use generalization without hesitation. The speaker also covered how logical Primary Key specification is still useful in class diagram data models and how the rules of normalization can be adapted to support object-oriented database design.

Exploring the Converging Worlds of Data Integration Tools
Faisal Shah

ETL or EAI toolset choices should be driven first by corporate meta-information culture, then by existing toolset investments and architectural fit. Organizations need to address ETL and EAI holistically and at the same time understand that there are still significant differences between the tools and ways to approach integration projects: each treats latency, unit-of-work granularity, metadata integration, third-party product integration, and other product dimensions differently. While EAI and ETL tools continue to grow closer together, there are still significant advantages to using each for its original purpose; for instance, ETL is data-centric and metadata-driven and is excellent at bulk data handling, while EAI is process-centric and event-driven.
- As a technologist, be careful not to fall in love with integration technologies
Differentiating strengths:

XML: A Component of Metadata and Enterprise Architecture
Ben Jenkins

This session focused on how the Extensible Markup Language (XML) applies to metadata, and offered an overview of one real-life implementation of an enterprise XML repository. As the universal language for data on the web, XML gives you the power to deliver unambiguous data meaning between business applications, search capabilities, and static/dynamic presentation. The Metadata XML is a landmark in
Data architecture evolution.
RDF can be used in a variety of application areas:
RDF: the basic model consists of three (3) object types:
Dublin Core: a standard set of “properties” to consider when identifying an RDF resource

How to Build a Great Relationship with your Data using Profiling and Quality Assessment Techniques
Brad Darrach

Unfortunately, most implementation projects have finite timetables with job implications.
Tools for Success
Next steps

Integrating Data with Processes: Data Model-Driven Applications using CASE
Jose Borja

This presentation used a live demo of a CASE-driven application to demonstrate:
1. How CASE applications:

Enterprise Common Data Architecture - Roadmap to Integration
Daniel Paolini

The creation of an Enterprise Common Data Architecture (ECDA) is a major yet essential commitment to any long-term strategic initiative to support data reusability. This architecture forms the foundation for collecting, storing, managing and controlling privacy of and access to data on an enterprise basis. GIGA Group defines Data Architecture as comprising “the vision, principles and standards that guide the creation, use and management of data and the deployment of data-related technology within an enterprise. The scope of data architecture includes the governance of all activities and processes involved in the definition, creation, formatting, storage, access and maintenance of data. While data warehouse and data mart architectures are important parts of a data architecture, these represent specialized subsets of the domain.”
NJ CDA Standards & Practices
NJ CDA Physical Components
General Advice:

Enterprise Information Integration (EII): A Logical View of the Distributed Enterprise
Nitin Mangtani

Business imperatives for real-time information integration are increasing. Organizations looking to empower their employees with decision-driving information are faced with one common challenge: providing a unified view of critical business data. In today’s extended enterprise, the time between when a piece of information is generated and when it is consumed is constantly shrinking. Delivering up-to-the-minute information to customers not only increases the service level, and therefore customer loyalty, it also decreases customer service costs.
ETL/Data Warehousing
XQuery
Why a native query language?
Enterprise Data Maturity Burt Parker Peter Aiken Enterprise-wide management of data is understanding the current and future data needs of an enterprise and making that data effective and efficient in supporting business activities. Data management must answer to two
sets of inputs: enterprise-wide data needs and functional user data needs. Answering to both sets of inputs ensures that data management directives and goals, data models, data designs, and data assets fulfill both “top-down” as well as “bottom-up” requirements. The feedback loops ensure the continued viability of the data program over time. Enterprise Data Management Semantics in Business Systems Dave McComb Semantics is the study of meaning.
Most of what we do as data modelers and system developers is directly involved
with the meaning of the information we are managing, and yet we spend little
time with the study of the underlying discipline. This talk covered the eight
levels of semantic precision that could be practiced in a given application:
Quantifying the Operational and Financial Risks in Enterprise Data Architectures Chito Jovellanos This conference session discussed the components of operational risk, with particular attention to the contributions from transactional systems and data. The evolution of the ‘Basel II Accord’ provided the framework for identifying specific external and internal risks, and methods for offsetting and hedging those risks. A summary of significant event loss data and their relative risk metrics was presented, spanning both high-frequency low-cost events (e.g., transactional effectiveness) and low-frequency high-impact events (e.g., natural disasters). Risk offsets include expensing, capital set-asides, outsourcing, insurance, and potentially, structured products (derivatives). Data Modeling: A Developer's View Roland Berg Working with developers: Developer think: Developers' main concerns? Teach Developers to Read Models Explain the importance and meaning
of the model Important!!! Explain that each model type (conceptual, logical, physical) is one interpretation of the layer above and that while it must be a valid implementation of the model above it does not need to mirror its structure. Summary XML Vs. Relational - The Top Ten Differences Jim Stewart This presentation explained the fundamentals that differentiate XML and Relational technology. Although the differences between XML and relational seem obvious, the problem of selecting the right one is not that simple when you take a closer look. Arguments for and against the use of the two approaches show confusion about the best uses for each, how they are different or similar, and apparently conflicting claims in several areas. The advent of XML DBMSs makes the issue even more confusing. This presentation identified the top differences (and similarities) between the two technologies in a way that will help data professionals understand them, select the correct solution for an application and make sound architectural decisions on their use. The Top Ten “differences”
reviewed were: The Structural Integrity of Source Data Joseph Novella Many data profiling and assessment efforts tend to focus on domain studies tasks and techniques. However, understanding the content and scope of individual columns and fields is only the first step in developing a complete data assessment. Structural Integrity is the measure
of an object’s enduring ability to serve its designer’s purpose,
during changing and/or challenging conditions or stress, without failing or
collapsing. Data’s Structural Integrity means that: Summary Positioning Metadata as a Business Asset (and get funding for your project) Daniel Riehle Metadata is seen as a technical necessity, not as a business asset. This is a strategic error on the part of IT organizations. The reality is that funding for Metadata will never be significant if you cannot morph metadata into an enterprise-wide business asset. This presentation showed how to use an extensible repository, COM objects, XML and ASP to generate self-guidance of business people through the mountains of technical metadata inherent in your IT organization. As an example of the technique, the presentation relied on the need to reflect “Lineage”, the journey that business-valued information takes from first source (Legacy systems) through final targets (BI Tools). To properly position metadata as a business asset (and hence secure funding for the IT effort), you must make the requisite detailed metadata transparent to the business user yet keep these users’ needs foremost in the overall architecture of the enterprise metadata. It is a sneaky detail that by focusing on the business value of metadata, you accumulate the technical metadata required to streamline the activities of IT. Key take-away points: Role Transformation: Data Analyst and Architect Jane Carbone The role of the data architect is to provide the plan for enterprise data. This position works with the business (e.g., data stewards) to formulate data policies and plans that support enterprise goals, reduce costs and leverage the use of existing assets. This position works across the IT organization to ensure effective implementation, architecture compliance and conflict resolution. This position is a member of the Architecture Governance Council and reports to the Chief Architect. The scope of this position includes all enterprise data (DW, databases, flat files, externally acquired data, etc.) with emphasis on mission-critical common data. 
Responsibilities include development
of: Why Data Marts Proliferate: Business Semantics and Data Integration in Conflict Robert Klopp In most Business Intelligence implementations, data marts proliferate, creating a data management headache. Often this proliferation occurs despite the availability of a comprehensive, integrated, data warehouse. The conflict between business semantics
and data integration The Role and ROI of Enterprise Schema Management Peter Hallett Gartner says that “The average enterprise has a median of 14 databases and spends 60 to 70% of its application development creating ways to access disparate data.” Enterprise Schema Management is a new concept driven by the need to improve information sharing, cut costs and increase responsiveness to changing requirements. Involving repository technology and a highly collaborative process, success requires leadership from the IT and business architects who establish best practices. With increasing usage of XML Schema, now is the time to treat schemas as an enterprise asset to encourage re-use and ensure interoperability. One of the greatest justifications
for ESM is re-use, which increases programmer productivity and speeds up the completion
of projects. Additional justifications include: Data Quality Assessment & Measurement: Developing Data Quality Process Measures Shaun Williams Joan Brooks Ranked by Forbes Magazine as the 9th largest private employer, H-E-B Grocery Company has over 300 retail outlets in Texas and Mexico, 55,000 employees, and close to 10 billion dollars in annual revenues. H-E-B’s systems process utilizes hundreds of information systems, many of which are legacy systems existing on antiquated platforms with little or no “on-line” data quality checks. In 2001, the senior leadership team
at H-E-B formed the Data Integrity Group with these primary objectives: The Data Integrity department has spent the last 1½ years assessing and cleansing data that will be integrated with key Merchandising and Supply Chain applications. In addition, the group has developed a methodology, process, and technology for tracking data quality exceptions within a given process, and assigning cost metrics to these exceptions. This presentation discussed the processes, methodologies, tools and measurements implemented by H-E-B’s Data Integrity department in order to improve and sustain data quality, by discussion of the following topics: Design/Integrate Data Quality into
HEB Culture Metadata-Driven On-demand Data Integration Patricia Klauer Dina Bitton This session described a metadata-driven data integration infrastructure that helps to preserve legacy code, while allowing the organization to move forward with new uses for the data. This approach not only solves present problems of data integration such as data migration and data mart proliferation, but also provides a platform for future growth and changing business requirements. The authors addressed key issues
such as: The Metadata repository allows creation of an enterprise information platform metadata store of enterprise data, regardless of OS, DBMS, location or format. It provides a global platform for standards enforcement and a horizontal data layer with unlimited reusability. Extended business benefits include:
Scientific Data - Challenges and Solutions Olga Brazhnik Olga’s talk covered: Enterprise Semantic Models: Buy, Borrow, or Build? Eli Israel This session provided an introduction to semantic models and the roles they can play in an enterprise. It presented the qualities that identify a good model and the organizational factors affecting a model's adoption. Additional questions answered include: The relative strengths and weaknesses of using relational models, object models, and XML Schemas for this purpose were discussed.
Kevin Cavanaugh Effective marketing requires judicious
use of customer data and coordinated treatment logic across many different operational
systems and touchpoints. Unique data & business rule challenges are introduced
when distributed operational systems are coupled with the analytical & marketing
processing requirements needed to support real-time customer dialogs across
these systems. This session explored system architectures and new data model
requirements for coordinating outbound and inbound customer treatment strategies
based on batch, real-time & event-triggered marketing scenarios across different
touch points, including: Standards-Based XML Management - an Insurance Case Study Senthil Kumar Insurance carriers spend millions of dollars each year processing policy applications and claims submitted by brokers and agents. This document-centric business is ideally placed to exploit XML, for which the ACORD standard has been defined for the exchange of policy information across the insurance supply-chain. ACORD addresses the challenge of standardizing extensions to documents to accommodate rapid changes in business. This session focused on how one insurance carrier manages its XML documents with automated validation while allowing for rapid development of extensions to its XML schemas, taking advantage of web services, business rules and predictive modeling. ACORD has been adopted by: Savings: Summary: How to Uncover the Truth behind your Data John Longley Paul Nettle This presentation described how any enterprise can reap similar rewards by adopting the approach taken by The Cleansing Project (TCP) of the UK Ministry of Defence. These benefits were: - $30M savings generated Key Lessons were: Experiences with Meta-Data Management Across Tool Types Christine Mandracchia Effective meta-data management continues to be an undertaking requiring balance between the involved organizational roles and processes, and the capabilities of the specific tools available. A combination of issues needed to
be resolved in order to effect the "round trip" of meta-data among
the data modeling tool, through a meta-data hub, to the ETL tool, to the BI
reporting tool, and back: Many of the work flow issues did not arise until after the technical tool interface issues had been resolved. Meta-data policies that were adopted are: - the business representatives now
"own" the business data names, their definitions, and the approved
corporate abbreviations and acronyms list The SDLC methodology and project management protocols are being refined to allow the business representatives the opportunity to provide approved business names and definitions during the requirements gathering phase of a project, prior to or in conjunction with the data modeling efforts. Open Source Data Warehousing and Databases John Poole The Open Source revolution is rapidly transforming the software industry in terms of both development practices and business models. Once regarded exclusively as the realm of software hobbyists, Open Source has become the software model of choice for many organizations. Data Warehousing and business intelligence (DW/BI) can benefit greatly from Open Source technologies, which enable both the construction of robust IT infrastructures, as well as an emerging class of software solutions for DW/BI, advanced analytics, and business performance management. Motivations for using Open-source
in DW/BI are: Schema Matching and Data Mapping Tools Chito Jovellanos This SIG session provided an overview of commercial products, in-house implementations, and R&D initiatives related to schema and data mapping tools. Mapping issues taken from the implementation of a data repository in the securities industry were used as a case study. Several SIG participants indicated their interest in producing a survey paper on automated mapping tools and issues for potential presentation at EDF 2004. Modeling Business Rules David Hay Other than terms, facts, and certain constraints, data models are limited in their ability to portray business rule constraints. Indeed, the more generalized a model becomes, the less it is able to show the business rules that constrain the domain being modeled. Where the topic of the model is itself rules, however (as in regulatory agencies), then the rules themselves can be modeled. In conclusion…an entity/relationship
model fundamentally cannot show constraints on:
Service Based Architectures - Defining the Issues for Data Professionals Robert Abate This discussion reviewed the components
that have been found to provide a robust foundation for Enterprise Applications
Integration [EAI], Enterprise Information Integration [EII] and Web Services.
Specific implementations of the SBA/SOA were reviewed including issues and honest
commentary on: Implementing an Integrated Enterprise-Wide Database for Applications Ulka Rodgers Project management lessons from a
major integration effort were discussed. Management lessons learned were: Project management lessons learned
were: Technical lessons learned were: Best Practices in Business Intelligence for ERP & CRM Suites Aaron Zornes Competitive differentiation depends
on better analysis of ERP & CRM operations to drive continuous process improvement
— & contribute to top- & bottom-line growth. This presentation
gave META Group’s bottom line prescription as: Federal Enterprise Architecture: The Business Process Models Rob Cardwell The Federal Enterprise Architecture
(FEA) is a business-based framework constructed as a collection of interrelated
"reference models" that will facilitate cross-agency analysis, and
the identification of duplicative investments, gaps, and opportunities for collaboration
among Federal agencies. The Reference Models act as a Target Architecture to
which each agency will align. OMB and agencies will use the FEA for describing
and analyzing information technology (IT) and other capital investments, and
to improve Federal government service to the citizen. It includes a strong focus
on delivering services to the citizen along with government-to-government
process and information exchanges along with consolidating and integrating the
services along lines of business. The DRM is: Entities, Objects, and XML Schemas in One Model--Are You Crazy? Richard Hecht Is it possible that using different approaches in one modeling method can result in documenting and communicating data better than any one of these approaches by itself? This presentation explored a modeling methodology that combines the best of multiple approaches and furthermore maps the logical data to real physical implementations. This is not textbook and theory. This is actual practice and reality that arose from a need to help business users and developers better understand data and how it is implemented. Specifically, this presentation addresses: The examples showed how the synergism created from this novel approach helps produce schematics that improve the documentation and communication of enterprise data and its physical implementations. Applying these concepts and techniques can help produce meaningful and useful information to answer the questions developers and business users have about data and proves that combining multiple approaches may not be as crazy as it first appears. Meeting the Requirements for Sarbanes Oxley David Steinberg Peter Vink David Kotler In response to recent corporate and accounting scandals, in July 2002, Congress adopted the Sarbanes-Oxley Act of 2002. 
In signing Sarbanes-Oxley, President Bush stated that its provisions are the “most far-reaching reforms of American business practices since the time of Franklin Delano Roosevelt.” In basic terms, Sarbanes-Oxley created the Public Company Accounting Oversight Board and charged it with “oversee[ing] the audit of public companies that are subject to the securities laws, and related matters, in order to protect the interests of investors and further the public interest in the preparation of informative, accurate, and independent audit reports for companies the securities of which are sold to, and held by and for, public investors.” At present, Sarbanes-Oxley’s requirements are directed only to “public companies that are subject to the securities laws.” Individuals within public companies – not just CEOs and CFOs – have new obligations and more severe liability (both criminal and civil) for improper acts. What Does Sarbanes-Oxley Require? Getting Started What is a Control Process? Deriving a Data Model for Service Architectures Peter Aiken A metadata-based understanding is gained by a development process that applies eight transformations - organized into two phases - to each enterprise architecture/legacy system component. The eight transformations are applied in order to effectively and efficiently develop an architectural component that is capable of delivering architectural and business engineering value. The transformations were illustrated in the presentation. Transformations and other forms of data analysis occur using model refinement and validation (MR/V) sessions in conjunction with key subject matter expertise (SME). Supporting Corporate Data Integration Projects Melvin Jones This presentation explored services, processes and procedures necessary to support data integration projects operating in a shared environment using repository-based ETL tools. 
Conclusions and Recommendations SAP Business Information Warehouse "Top 10 Pitfalls" -- What Every DW Professional Must Know Aaron Zornes SAP’s BW requires fine-tuning
as indicated by other enterprises’ “best practice” experiences.
This will enable business to leverage its SAP R/3 investment to achieve maximum
value. This presentation gave META Group’s bottom line prescription as: Department of Defense Net-Centric Data Strategy Alan Perkins In May of 2003, the DoD CIO announced
a new Net-Centric Data Strategy that radically changes their data management
and delivery paradigm. The transformation will be from standards-based, build-time
data administration to consumer-based, run-time, network-centric data/information
delivery. The vision is a virtual "marketplace" where data Producers
and Consumers find each other and "trade" information Commodities.
The approach has these tenets: The Goals are: The Critical Success Factors are: In addition, there are a number of
issues that, if not addressed, will result in sub-optimal implementation of
the strategy: In other words, the entire DoD data environment must be “net-centric.” 3-D Knowledge Models Henry Feinman The structure of Enterprise knowledge is complex – as complicated as any concrete product of our modern organizations. Other areas of endeavour have encountered the problem of complexity: Automotive design, structural architecture, computer chip design. In all these modelling domains the solution has been to use layers, perspective and 3-D CAD to build models that guide creation of these complex objects. Firstly: Data – Structured
and Unstructured, Business rules, Process, Entymologies – which of these
belong to our modelling endeavours? What part of Enterprise knowledge is captured
by our models, what part remains without? This presentation stepped through the model using VRML, parts explosions, and other tools available to modern CAD modellers to see what insights become available using an integrated 3-D knowledge model. It answered the questions: · What is Enterprise Knowledge? Tools for Modeling Privacy Requirements Karen Lopez Privacy is primarily a management problem, rather than a technical one. Recommended Actions for coming to
grips with privacy issues: Harnessing Technology to Fulfill Compliance Requirements Jeff Canter Companies evaluating OFAC software should consider the significant costs of implementing the interfaces between the OFAC software product and the many administrative systems they have. In doing so, they soon realize that the OFAC software cost ceases to be a major factor when compared to the cost of implementing the interfaces. Consequently, it is critical to select the best system from the most stable vendor available to minimize the possibility of having to redo the interfaces later. Selection Recommendations: - Identify an established provider
with experience in compliance Managing the Downside of Data and Message Standardization Chito Jovellanos As part of the ‘Risk and Pitfalls of Messaging and SOA’ panel, this presentation highlighted commoditization, semantic monocultures, and complexity as issues to manage in order to offset the downsides of data and message standardization. Data metrics for scoping the impact of these factors were presented. Theory vs. Reality When It Comes to Faster, Better, Cheaper Approaches for Information Integration Dave Schrader Despite the hype, we still lack rigorous frameworks for evaluating the design and run-time impacts of new technologies and methodologies like Service Oriented Architectures, XML-based approaches to metadata and information exchange, and distributed vs. centralized approaches to enterprise information placement. This talk walked us through various architectures, explained how data warehousing solves many of the problems apparently motivating the latest hype cycle for EII, and outlined some experiments which are needed to understand not only the changes and potential advantages in design and reuse but also potential downsides in the run-time performance for various classes of applications. Metadata Based Enterprise Information Integration Arvind Shah 1. True Process Integration cannot
be accomplished without enterprise-wide Data Sharing. Customer Data Integration (CDI) |