Data Discussions is a series of interviews with leading data management experts and practitioners,
presented by Wilshire Conferences.

March 24, 2004

Metadata Best Practices:
Getting Metadata into the Development Cycle

An Interview with Doug Stacey of Allstate Insurance

Doug Stacey is the Team Lead for Metadata Infrastructure Support at Allstate Insurance Company. Allstate has one of the most sophisticated metadata programs that I know of, especially in terms of its integration with the actual run-time delivery of applications. A year ago, he and his team were rewarded for the excellence of their implementation with the Wilshire Award for Metadata Best Practices (Note: the 2004 Wilshire Award is now open for nominations). Doug is going to share much of his metadata experience and strategy at the upcoming DAMA International Symposium and Wilshire Meta-Data Conference on May 2, in Los Angeles. Recently I asked him to share his story for Data Discussions.

Tony Shaw, Wilshire Conferences (Wilshire): Doug, your metadata efforts at Allstate are among the most sophisticated I’m aware of. Can you please explain the genesis of your metadata strategy at Allstate? What were the business problems you were attempting to address and why did you choose a strategy based around metadata management?

Doug Stacey (Stacey): Thanks, Tony. The main challenge we faced in the mid-’90s was building an enterprise data warehouse. It soon became abundantly clear that, to pull that off, we would have to understand the data coming together from different systems in order to know what was really what. We had both problems: data that was the same in two different systems but called different things, and data with the same name in different systems that really represented two different things. I think it’s a common occurrence after years of application development with no centralized data strategy.

Fortunately we had some astute data management people here at the time who realized that if we were going to go to all the trouble of researching and defining the data headed towards the warehouse we’d better darn well make sure we’d never have to go through that pain again. It had to be captured for future reference. That was the genesis of the meta data repository.

Since that time we’ve branched out from being just a warehouse support function to more generalized data management, covering operational systems as well as the warehouse.

The business problem was simple. Business users must understand and have confidence in the data they are working with. Without both understanding and confidence they are unable to be effective in their jobs.

Wilshire: Could you explain the architecture of your solution?

Stacey: Sure. We have a centralized metadata repository where we collect and store the meta data that holds interest for us. We then extract information from that management environment and push it to the warehouse for use in our Business Intelligence tool, or put it in place to support applications running the production work load.

We also have a web-based metadata viewer that lets application developers and business users reference the data in the repository.

Wilshire: One of the aspects of your implementation which the judges found particularly compelling during last year’s award judging process is your incorporation of metadata into delivering production applications. You’re not just collecting and storing the metadata but you’re actually using it at run-time. Could I ask you to describe this in some detail please…how you do it exactly?

Stacey: Well, as I said we gather metadata from several sources and store it in the repository. Primarily we gather logical data models, physical schemas, mapping information between the two, and codes and business values for enumerated domains (those for which we can list out the values, like STATE).

The information that is most useful to applications at run-time is the meta data around coded fields. In a perfect world all applications would use the same code values to represent the same piece of information. Needless to say, we don’t live in a perfect world. For instance, we’ve researched and documented 18 different ways our systems represent something as simple as Gender Type. It’s not practical for us to go back to each application and force them to change to a common representation of gender type across the organization. We can, however, document how each system represents gender type. We then tie that information together by matching on the business value, in this case let’s say ‘male’. We then deploy this information in our Universal Codes Translation table (or UCT). This is a production, run-time table (actually a series of tables) owned by the data management group which contains meta data from the repository to facilitate translations between applications.

An application typically knows who they are and where they’d like to send data. By referencing the UCT with this information, plus indicating what domain they are interested in and the value they need translated, they can determine how the receiving application refers to that value in their world.
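The translation lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Allstate's actual schema: the row layout, the application names (CLAIMS, BILLING), the domain name and the code values are all hypothetical. The one idea taken from the interview is that codes from different systems are tied together by a shared business value.

```python
from typing import NamedTuple, Optional


class UctRow(NamedTuple):
    application: str     # system that uses this code (hypothetical names)
    domain: str          # enumerated domain, e.g. "GENDER_TYPE"
    code: str            # the value as that application stores it
    business_value: str  # shared business meaning used to match codes


# Illustrative rows only; real contents would be deployed from the repository.
UCT = [
    UctRow("CLAIMS", "GENDER_TYPE", "M", "male"),
    UctRow("CLAIMS", "GENDER_TYPE", "F", "female"),
    UctRow("BILLING", "GENDER_TYPE", "1", "male"),
    UctRow("BILLING", "GENDER_TYPE", "2", "female"),
]


def translate(source_app: str, target_app: str, domain: str, code: str,
              table=UCT) -> Optional[str]:
    """Return the target application's code for the same business value, or None."""
    # Step 1: find what the sender's code means in business terms.
    business_value = next(
        (r.business_value for r in table
         if r.application == source_app and r.domain == domain and r.code == code),
        None,
    )
    if business_value is None:
        return None
    # Step 2: find how the receiver represents that same business value.
    return next(
        (r.code for r in table
         if r.application == target_app and r.domain == domain
         and r.business_value == business_value),
        None,
    )


print(translate("CLAIMS", "BILLING", "GENDER_TYPE", "M"))  # prints 1
```

The sender supplies exactly the four pieces of information mentioned above: who it is, where the data is going, the domain, and the value to translate. In production this would more likely be a database join than an in-memory scan.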

We also find that applications rely on the UCT to do lookups and validations. If they are accepting information from a GUI screen and a user enters a value of ‘03’ for, oh…say, Agent Role Type, the application can bounce that 03 against the UCT for their application and that domain and determine if it is valid or not. Likewise the UCT can be used to populate the business values into a dropdown on a GUI.
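The lookup and validation uses could look something like the sketch below. The application name (AGENCY), the domain key and the role descriptions are invented for illustration; only the idea of checking a code such as ‘03’ against one application's slice of the UCT, and of listing the business values for a dropdown, comes from the interview.

```python
# Hypothetical slice of the UCT: (application, domain) -> {code: business value}.
UCT_CODES = {
    ("AGENCY", "AGENT_ROLE_TYPE"): {
        "01": "Primary agent",
        "02": "Secondary agent",
        "03": "Servicing agent",
    },
}


def is_valid_code(application: str, domain: str, code: str) -> bool:
    """True if the code is defined for this application and domain."""
    return code in UCT_CODES.get((application, domain), {})


def dropdown_options(application: str, domain: str) -> list:
    """Return (code, business value) pairs, sorted by code, for a pick list."""
    return sorted(UCT_CODES.get((application, domain), {}).items())


print(is_valid_code("AGENCY", "AGENT_ROLE_TYPE", "03"))  # prints True
print(is_valid_code("AGENCY", "AGENT_ROLE_TYPE", "99"))  # prints False
for code, label in dropdown_options("AGENCY", "AGENT_ROLE_TYPE"):
    print(code, label)
```

Because validation and the dropdown read from the same table, a new code deployed to the UCT would show up in both places without a change to the application.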

Wilshire: Now, you’ll be talking about all this during your workshop in Los Angeles on May 2 of course, but for the time being can you give me some idea of what you had to do to change the development and management “culture” in order to reach this objective, and then, to continue it? And what sort of issues did you face along the way?

Stacey: That’s a great question, Tony, and a very important one. From the development perspective we have to provide value in the typical development lifecycle. We need to reduce some point of pain. In development, in the insurance industry, that point of pain was clearly the management of the codes. We have a group of Codes Analysts that are engaged, or hired, to work on a project. What we offer the development team is doing the research and, more importantly, the ongoing maintenance of the codes used in their application. We set up a simple process where, if the business introduces a new code, they notify us and we take care of documenting it with a proper definition and getting it into the repository. We then schedule a deployment of that new code into the UCT. This greatly eases the burden on the application developer of having to manage their own lookup table, figure out what to call it, and match it to other systems they may be interfacing with. We take care of all of that for a small annual maintenance chargeback.

We can really impress them when they have a code that is new to their system but it’s one we have already captured as part of another project. We can give them the preferred coding scheme and all the associated business values in minutes. That can save a lot of work effort on their part.

The work of the Data Administrators and their logical modeling can require a little more “selling”. Some see the value more than others. If it’s a set of data being made available to the end user community through a warehouse and business intelligence tool, it’s a no-brainer. The end users have become accustomed to seeing real English names and definitions and at this point would not be satisfied with anything less. Those names only come from the logical model. If it’s an operational system we usually concentrate on the value of the codes work and then deliver an estimate for both the codes and the modeling work. We present them as a package deal, if you will.

As for the management culture, well, we were very fortunate in the mid-’90s, as the warehouse was in its formative stages, to have management that recognized the value of data management activities. Frankly, I think they had seen the problems associated with not doing data management themselves as they came up through the ranks. However it happened, we had strong support for an Enterprise Data Architecture that described much of the meta data environment.

As for issues, you’ll always find people who are ‘non-believers’ and insist that they can get the work done without our involvement for much less money. What they usually don’t take into consideration is the ongoing maintenance of what they are creating. I can’t say enough about the importance of change management processes. We’ve developed those processes and we have the discipline to follow them.

Ultimately you have to provide a recognizable value for your customer. If you do that, you will not only win people over but you’ll also find, as we have, that you’ll get a base of people who wouldn’t consider approaching a project without data management as part of the plan.

Wilshire: Can you share any usage figures with us? Number of users for example, and how often they access the metadata? And perhaps how this is changing over time?

Stacey: Let’s see…it’s been a while since I’ve looked at the numbers…we have about 1000 unique users come to the web site in any given month. They generate about 50,000 “page views” a month. We’ve added some features recently so those numbers could very well be up by now.

That’s just the MetaData Viewer tool though. There are somewhere around 3000 queries executed against the warehouse each month, so each of those users benefits from data management. And then you get into the numbers that are really hard to estimate. For instance, the Claims area is a big user of the UCT, so most anyone who uses a Claims system is indirectly using our metadata as a dropdown pick list or when a code is validated on data entry. How many people is that? I don’t know, tens of thousands I would guess. We definitely see the usage increasing as more and more applications rely on our services.

Wilshire: I’m curious about executive support for this effort, both initially and now. Was it always there? Has it changed?

Stacey: Yes, I think it’s always been there. Data is a big part of the business in an insurance company. Think about what our product is…it’s protection, it’s peace of mind. We don’t have huge manufacturing plants or other big-ticket item assets. One of our main assets is the data. You’d be crazy not to manage something that is so much a part of the value of your business. Executive support was there at the beginning and it continues to be there today. I think the equation is very easy for them and they know there is benefit.

Maybe the insurance business is unique in that respect but I doubt it. I think you can make a case for the importance of the data asset in any company and once you’ve done that the next step is to present the case to manage it.

Wilshire: And what sort of measurements have you applied to your metadata project? Do you have any ROI results you can provide? And how did you calculate your ROI numbers please?

Stacey: I’ll be going over the details of a couple of ways to calculate ROI in my workshop in May, so I won’t go through a bunch of numbers here. Let me say, though, that the key is reuse. You must put together something that estimates how much time is saved by the fact that you capture this information once and are then in a position to reuse it on subsequent projects. Let’s see if I can cite some numbers off the top of my head. Every domain that we have researched, precisely defined, and stored in the repository, and that is mapped to a physical reference (a column in a table, for instance), is actually mapped to an average of 7 or 8 physical references. That adds up to a huge savings in application development when looked at across projects. When you apply the numbers for the cost of personnel, how many projects you are likely to do, and so on, you can usually come up with a pretty attractive picture.
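As a rough illustration of the reuse argument, the sketch below estimates savings by treating every physical reference beyond the first as research that did not have to be repeated. This is one possible way to frame the arithmetic, not the method presented in the workshop. Only the average of 7 to 8 references per domain (taken here as 7.5) comes from the interview; the domain count, hours per definition and hourly cost are placeholder assumptions.

```python
def reuse_savings(domains: int, avg_references: float,
                  hours_per_definition: float, hourly_cost: float) -> float:
    """Estimate the cost avoided by defining each domain once and reusing it.

    Assumes the first physical reference pays for the research and every
    additional reference would otherwise have repeated it.
    """
    reuses = domains * (avg_references - 1)
    hours_saved = reuses * hours_per_definition
    return hours_saved * hourly_cost


# Placeholder inputs: 100 domains, 4 hours of research each, $75 per hour.
estimate = reuse_savings(domains=100, avg_references=7.5,
                         hours_per_definition=4, hourly_cost=75)
print(f"Estimated savings: ${estimate:,.0f}")  # prints Estimated savings: $195,000
```

A fuller estimate would subtract the cost of running the repository and the data management team, and would still leave out the harder-to-measure productivity benefits.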

It’s always somewhat frustrating to me when I talk about ROI, because what is so hard to capture, but what you know is providing a lot of value, is having the business user deal with terms and definitions that they can understand. Also, having a common language spoken across business groups is a huge benefit. It all falls into that “improved productivity” hole and it’s just difficult to measure. I know it’s a benefit though! Make sure you include that in any ROI discussion you are having.

Wilshire: An obvious question to wrap up this part of the conversation then…where do your metadata efforts go from here?

Stacey: We are actually in the middle of a very exciting time. There has been another big push here to take data management to the next level. Much of this is being done to more closely align our output with the needs of the business. I don’t want to get into the details but we’ll be expanding the types of meta data we collect and providing more means to feed that back to the business user. I think I’ll have lots to talk about at future Wilshire Conferences!

Wilshire: Thanks, Doug. Though you’re not eligible to win the Wilshire Award this year, you’ll be one of the judges. I appreciate your time today, and we’ll look forward to your talk in Los Angeles on May 2, at the DAMA International Symposium and Wilshire Meta-Data Conference.


This edition of Data Discussions is sponsored by SearchDatabase.com & SearchOracle.com:
Access free white papers, time-saving tips & more Oracle-specific information resources at SearchOracle.com.
SearchDatabase.com features tech tips, white papers & more for DBAs and developers.



Join us for the
Wilshire Meta-Data Conference
and DAMA International Symposium

May 2-6, 2004 • Century Plaza Hotel • Los Angeles, California USA

The World's Largest Vendor-Neutral Data Management Conference

The 16th annual DAMA International Symposium and 8th annual Wilshire Meta-Data Conference will be held May 2-6, 2004 at the Century Plaza Hotel in Los Angeles, a beautiful venue adjacent to Beverly Hills. Hear 40 case studies outlining strategies of companies that have implemented successful data management projects. There will be more than 120 speakers in all, covering meta data, enterprise architecture, data and process modeling, unstructured data, business rules, data integration, XML, business intelligence, data warehousing, information stewardship, and more. The keynote speaker is Chris Date.


©2004 Wilshire Conferences, Inc. May be quoted with full attribution.