Universal Data Models: Simsion Interviews Silverston
Len Silverston, President, Universal Data Models, LLC
Graeme Simsion, Senior Fellow, University of Melbourne
This edition
of Data Discussions brings together two of data modeling’s leading
writers and educators. Len Silverston is one of the foremost proponents
of, and authorities on, universal data models. His two books (The
Data Model Resource Book, Volumes 1 & 2) are among the most
widely read and recommended on the subject. Len’s questioner is
Graeme Simsion – his Data
Modeling Essentials (co-written with Graham Witt) is considered
a seminal book in the field. And as readers of Data Discussions will
know, Graeme tends also to be a provocateur, recognized for his
willingness to question conventional wisdom.
Both Graeme
and Len will be speaking at the upcoming Enterprise
Data Forum in Philadelphia, November 3-6. Len is teaching a
tutorial on Implementing
Universal Data Models to Integrate Data, and Graeme will be
chairing “the Modeling Forum” – an entire conference track devoted
to current issues in Data Modeling.
Putting these two gentlemen
together seemed like a good way to delve into the topic of Universal
Data Models, so let’s see if we were right…
Graeme Simsion (Simsion): Len, basics first… What exactly is a Universal Data Model?
Len Silverston (Silverston): Webster’s dictionary defines “universal”
to mean “applying to a great variety of uses: comprehending, affecting
or extending to the whole”. Therefore, a Universal Data Model is
a template or re-usable data model that is generally applicable
and that can be used by a great number of organizations to save
time and effort while offering holistic perspectives.
The idea of re-using
common data models is more obvious than the other aspect of Universal
Data Models, which is that they provide a holistic perspective. Data modelers
(including myself) are sometimes focused on particular data requirements
and may not always completely see the entire picture. This is where
a “universal” view can help.
For instance, when building
a product pricing data model, the modeler may model a PRODUCT PRICE
COMPONENT entity not realizing that these price components apply
not only to the base price of a product but also to other things,
for example, discounts or surcharges that are based on geography,
or by agreement, or based on the type of customer. Therefore setting
up a more generic PRICE COMPONENT entity offers a more re-usable
and holistic approach instead of having multiple entities to maintain
pricing structures. Likewise, when developing a CRM application,
instead of adding fields to the CUSTOMER entity, Universal Data
Models can offer alternatives illustrating that maybe the name or
contact information should be associated with a PERSON, ORGANIZATION
or PARTY. This way the party’s information is consistent when this
same party is involved in another role, for example as a PROSPECT
or WEB SITE VISITOR.
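To make the generic PRICE COMPONENT idea concrete, here is a minimal Python sketch. It is an illustration only, not a structure taken from the published models: the class name, its fields, and the `quote_price` helper are all assumptions. The point it shows is that base prices, discounts, and surcharges can share one structure, with optional geography and customer-type conditions.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class PriceComponent:
    """Hypothetical generic pricing rule: a base price, discount, or surcharge."""
    kind: str                            # "base", "discount", or "surcharge"
    product: str
    amount: Decimal = Decimal("0")       # flat amount, used for base prices
    percent: Decimal = Decimal("0")      # percentage of the base price
    geography: Optional[str] = None      # None means it applies everywhere
    customer_type: Optional[str] = None  # None means it applies to every customer type

    def applies_to(self, product, geography, customer_type):
        # A component matches when the product matches and each optional
        # condition is either unset or equal to the requested value.
        return (
            self.product == product
            and self.geography in (None, geography)
            and self.customer_type in (None, customer_type)
        )


def quote_price(components, product, geography, customer_type):
    """Sum the matching base prices, then apply matching discounts and surcharges."""
    matching = [c for c in components if c.applies_to(product, geography, customer_type)]
    base = sum((c.amount for c in matching if c.kind == "base"), Decimal("0"))
    adjustment = Decimal("0")
    for c in matching:
        if c.kind == "discount":
            adjustment -= base * c.percent / 100
        elif c.kind == "surcharge":
            adjustment += base * c.percent / 100
    return base + adjustment


# Illustrative data: one base price plus two conditional adjustments.
components = [
    PriceComponent("base", "widget", amount=Decimal("100")),
    PriceComponent("discount", "widget", percent=Decimal("10"), customer_type="wholesale"),
    PriceComponent("surcharge", "widget", percent=Decimal("5"), geography="remote"),
]
```

Adding a new kind of adjustment, for example one tied to an agreement, would then mean adding another optional condition to the one structure rather than creating and maintaining a separate pricing entity.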
Universal Data Models
include common data constructs applying to most organizations as
well as industry specific data constructs. For example, common data
constructs that apply to most organizations would include data models
for information about people, organizations, roles, relationships
between people and organizations, contact information, products,
services, inventory, pricing, requirements, quotes, orders, agreements,
shipments, projects, invoicing, payments, budgeting and accounting.
There are Universal Data
Models for many industries that build upon these common constructs
and offer additional extensions that may only be applicable to a
certain industry. For example, a manufacturing Universal Data Model
includes many of the previously mentioned common data constructs but
also includes additional data constructs such as design engineering
models. Likewise, the insurance Universal Data Model includes additional
common constructs for claims processing, which are actually an extension
of the invoicing models since they both represent a request for
reimbursement.
Additionally,
there are also data warehouse Universal Data Models offering common
ways of modeling data warehouse and star schema constructs, for example
for sales analysis, human resource analysis or financial analysis.
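As a rough illustration of what a star schema for sales analysis can look like, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database. The table names, column names, and sample rows are assumptions made for this example, not the published Universal Data Model designs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who" and "what" of each sale.
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, region TEXT);

    -- The fact table holds the measures, keyed by the dimensions.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        customer_id INTEGER REFERENCES customer_dim(customer_id),
        sale_date TEXT,
        quantity INTEGER,
        amount REAL
    );

    INSERT INTO product_dim VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO customer_dim VALUES (1, 'east'), (2, 'west');
    INSERT INTO sales_fact VALUES
        (1, 1, '2003-01-15', 2, 200.0),
        (1, 2, '2003-01-20', 1, 100.0),
        (2, 2, '2003-02-02', 3, 450.0);
""")

# A typical analysis query: total sales amount per customer region.
sales_by_region = conn.execute("""
    SELECT c.region, SUM(f.amount)
    FROM sales_fact f
    JOIN customer_dim c ON c.customer_id = f.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
```

The same fact table can be summarized by product or by date simply by joining to, or grouping on, a different dimension.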
Simsion: How do clients respond to you having “ready
made” answers before you’ve ascertained their requirements?
Silverston:
I make it extremely clear to my clients that I do not know their
particular requirements before I arrive and that it is extremely
important to keep a very open mind when collecting information requirements.
However, with a toolkit of many “best practice” data constructs
for common data modeling problems, along with alternatives and the pros
and cons behind these alternatives, data modelers (such as
myself) can be much better prepared and data modeling efforts can
be tremendously streamlined.
If a data modeler
has available to him/her a set of Universal Data Models to re-use,
then when a common requirement is stated, the template models can
be applied. For example, if the data modeler discovers that they
need to maintain customer demographics as well as various types
of phone numbers, email addresses, and postal addresses for a customer,
then they can apply Universal Data Models that have been through
many iterations over the years and provide ideas about ways to best
model these structures.
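One way to picture such a template for contact information is sketched below in Python. It is a simplified assumption for illustration, not the published model: the `Party` and `ContactMechanism` classes and their fields are hypothetical names. Phone numbers, email addresses, and postal addresses are held as one list of contact mechanisms attached to a party, rather than as a fixed set of columns on a customer record.

```python
from dataclasses import dataclass, field


@dataclass
class ContactMechanism:
    """Hypothetical generic contact point for a party."""
    kind: str     # "phone", "email", or "postal"
    value: str
    purpose: str  # e.g. "main office", "billing"


@dataclass
class Party:
    """A person or organization, independent of the roles it plays."""
    name: str
    roles: set = field(default_factory=set)
    contacts: list = field(default_factory=list)

    def contacts_of(self, kind):
        # Return every contact value of the requested kind, in insertion order.
        return [c.value for c in self.contacts if c.kind == kind]


# Illustrative data: a prospect that later also becomes a customer.
acme = Party("Acme Ltd", roles={"PROSPECT"})
acme.contacts.append(ContactMechanism("phone", "555-0100", "main office"))
acme.contacts.append(ContactMechanism("phone", "555-0101", "billing"))
acme.contacts.append(ContactMechanism("email", "info@acme.example", "general"))
acme.roles.add("CUSTOMER")
```

Because the contact details hang off the party rather than the role, they stay the same when the prospect becomes a customer, and a party can have any number of phone numbers without a schema change.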
I have made
many mistakes in implementing database and data warehouse designs
over the years. As a result, I have continued to find better and
better ways of modeling data and implementing databases. The Universal
Data Models provide not only my experience but lessons from many
very experienced professionals about effective methods for modeling
common constructs. Mistakes in data models and in databases are
one of the most costly aspects of systems development since they
represent a foundational component of the system. It is critical
to be aware of pitfalls in modeling common structures so we don’t
keep on repeating the same mistakes!
Another benefit
of having “ready made” modeling solutions is that questions can
be asked proactively regarding possible information requirements.
For example, if the subject data area is about customer relationship
management, the template data structures can highlight possible
areas to discuss, such as how non-solicitation requests are handled,
whether preferred calling times are needed, or if there is a need to
maintain numerous last names, first names, or middle names for a
person.
Simsion: Do you see your model for a given subject
area as constituting the one right answer or only one possible answer?
Silverston:
Yes, I have the only one right answer and that makes everyone else’s
model wrong. I’m, of course, being facetious but we’ve all seen
this type of attitude in data modeling efforts and it poses a disservice
to our community. I believe the data management community is largely
about integration, which involves working together – not proving
each other “right” or “wrong”.
Pardon the tongue
in cheek response, but my real answer is that the models I am providing
are NOT the only one right answer for a subject data area, or even
for a very specific data construct! In my opinion, there is no one
right answer, especially when offering “universal” constructs that
can be generally applied to different situations. In my books, I
sometimes show alternatives to modeling various structures and point
out the pros and cons of each. When consulting, I will often provide
models that are not what I have in my Universal Data Model repository
but they are variations of the Universal Data Models based upon
the specific needs of the client.
I believe the
debate over being “right” when it comes to modeling a specific data
modeling construct can be very costly. I have been involved in many
efforts where data modelers debate over the “right” way to model
a particular construct. I was involved in one effort where the company
spent 150 million dollars on a data model that was ultimately shelved,
and a great deal of this was spent on very experienced modelers arguing
over whose construct was “right”.
I also believe
that knowing and understanding various perspectives and possibilities
is very powerful. When data modelers have differences of opinions,
I will usually ask them to model the data requirements as they see
it and very often several excellent models emerge. From these various
alternatives, along with their pros and cons, an informed decision
can be made.
I have attended
Karen Lopez’s (moderator of the Data Modeling List) outstanding
conference session entitled “Data Modeling Contention Issues”. She
brings up various issues in data modeling (such as abstract versus
specific modeling, use of surrogate keys, use of a conceptual data
model, and even the idea of using template models) to a group of
experienced modelers and has participants publicly rate their responses
on a scale from 1 (strongly agree) to 5 (strongly disagree).
What I loved about attending this session is that even though there
are near-religious debates about what participants believe is the
“right” way, she constantly brings awareness that “the most successful
discussions are ones where both sides learn something new about
the others viewpoint”.
The question
is not whether a Universal Data Model is the one “right” answer:
the question is can I re-use a Universal Data Model construct that
has successfully worked in other situations at other organizations,
in this current situation? Does the Universal Data Model construct
offer a useful perspective that I did not consider before I looked
at it? How can this Universal Data Model save me time and effort
in meeting a data requirement so I don’t have to re-invent effective
models for modeling common constructs?
Simsion:
How do your universal models compare with David Hay’s data model
patterns?
Silverston:
Again, the idea of having multiple perspectives is powerful. David
Hay has provided valuable contributions to our field and has diligently
and courageously offered his perspectives regarding ways to model
common constructs in his books, articles, speaking and consulting.
I believe that the basic idea behind “universal data models” or
“data model patterns” is the same – offering possibilities about
ways of modeling common constructs.
While we have
both offered our data modeling perspectives on some of the same
subject data areas such as parties, orders, accounting and manufacturing,
we have each also provided more extensive focus on certain subject
data areas. For instance (and not to be exhaustive), David has contributed
valuable, extensive template data models regarding modeling metadata,
business rules, and laboratory data models, while I have offered
comprehensive models for data warehousing and many industries such
as telecommunications, health care, insurance, professional services,
financial services, travel, and e-commerce.
The first sentence
of my book (“The Data Model Resource Book, Volume 1”) is “If you
can see more of the whole, you are moving closer to the truth”.
Therefore, the more perspectives that you can understand, the more
possibilities exist for you and the better position you are in to
make an effective, informed decision.
Simsion: What about industry models?
Silverston:
Good point. There are many industry models that exist that are also
good sources for ideas, perspectives, and effective ways of modeling
industry constructs.
For example,
I reviewed the health care HL7 model (which is publicly available
on the web) and gained some insight and understanding about the
type of data needed and the ways in which data could be modeled
in the health care models. There are many industry models available
such as the ARTS data model for retail or the CDISC data model for
the pharmaceutical industry.
I would highly
encourage using numerous sources for re-usable, proven data modeling
constructs that are applicable to your task at hand. Why not take
advantage of other people’s knowledge and offerings as opposed to
re-inventing data constructs that have already been researched and
modeled?
Simsion:
Why do you publish only data models – shouldn’t we also have corresponding
process models?
Silverston:
Yes, there is a huge need for Universal Process Models! The functions
of marketing, sales, order processing, logistics, accounting, budgeting,
and many others are often common, and Universal Process Models could
greatly help save time and, again, offer additional perspectives.
I actually had
a draft chapter dedicated to Universal Process Models in my last
book and it was cut because of time constraints – I wanted to do
my best on developing and updating the industry data models, which
was a big enough challenge! However, publishing Universal Process
Models is definitely on my “to do” list.
I also believe
that we shouldn’t just stop there, but as a mature industry, we
should have universal models for all cells in the Zachman framework.
Why not have various template models for all aspects of systems
development?
Simsion:
Do universal models mean that we have less need for professional
data modelers?
Silverston:
No. In my opinion, for the foreseeable future, there is a huge need
to have professional data modelers who can understand and model
information requirements. Universal Data Models are not a substitute
for this. They are designed to provide an effective toolkit for
the professional data modeler.
Perhaps, if
we as data modeling professionals have better tools and methods,
then the demand for data modelers could increase as a result of
increased effectiveness, proficiency and maturity in the data modeling
field.
Simsion:
How relevant are the universal models to an object oriented development
project?
Silverston:
Very relevant. Most of my clients implementing Universal Data Models
are in an object oriented environment. Very often, the Universal
Data Models serve as the foundation for a relational database and
then an object oriented class structure is superimposed on the Universal
Data Models to allow object oriented programmatic access. Sometimes
a relational database is not even involved and the Universal Data
Model (customized to the organization) is used as a basis for the
object class structures in object oriented programs. Other clients
use the Universal Data Models as a “universal” method for passing
data, for example via XML.
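The layering described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any client's implementation: the `party` table, the `Party` class, and the XML element names are all invented for the example. A relational table holds the data, a class gives object oriented access to it, and the same object can be serialized to XML for passing data between systems.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Party:
    """Hypothetical object view over a row of the relational party table."""
    party_id: int
    name: str

    @classmethod
    def load(cls, conn, party_id):
        # Fetch one row and wrap it in an object; return None if it is missing.
        row = conn.execute(
            "SELECT party_id, name FROM party WHERE party_id = ?", (party_id,)
        ).fetchone()
        return cls(*row) if row else None

    def to_xml(self):
        # Serialize the object as a small XML fragment for data exchange.
        element = ET.Element("party", id=str(self.party_id))
        ET.SubElement(element, "name").text = self.name
        return ET.tostring(element, encoding="unicode")


# Relational foundation: an in-memory table with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE party (party_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO party VALUES (1, 'Acme Ltd')")

party = Party.load(conn, 1)
xml_text = party.to_xml()
```

In the variant where no relational database is involved, the class alone would play the role of the customized model, and only the `load` method would change.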
Simsion:
What are the main objections you’ve faced to the use of universal
models?
Silverston:
Some of the main objections are:
“Our organization
is very unique - There is no such thing as a standard or universal
model”
I usually respond
that, yes, your organization is very unique, however the types of
data that you capture are usually very common. There are many standard
data requirements such as personal demographics, contact information,
order, invoicing, web activity, accounting, etc. that are needed
for most organizations.
“Generic modeling
is a great theoretical idea that has no basis in reality”
The reality
is that a great many small and large companies have successfully
implemented these models. Additionally, many of these universal
data constructs are now found in the latest versions of very popular
software packages.
“They are too
high level to add real value”
I will point
out that while some universal data models have abstractions and
generalization for integration purposes, there are many models that
offer a significant level of valuable detail and research based
upon real world experience.
“I can figure
it out myself just as fast”
However, are
there possible ideas or pitfalls that you may not have considered?
And what about the time that it takes to document the models and
train less experienced modelers on the rationale and explanation
behind these standard constructs?
“I already have
a model, so what good do these do me?”
Good that you
have a model. Then using Universal Data Models can provide a checkpoint
against your data model to ensure completeness and possibilities
for alternative ways of modeling the information.
“We aren’t doing
custom development - we are just implementing packages”
Most of my clients
implement packages. The Universal Data Model is often used to show
the integrated information requirements across these packages in
order to manage consistent, complete, and accurate information across
the enterprise. The packages offer a physical database design while
the universal data models are a jump-start towards providing the
enterprise’s information requirements.
“We are just
primarily maintaining systems”
Universal Data
Models are often used to evaluate new database changes as new needs
emerge, thus providing possible paths toward more flexible, integrated
systems.
“We are focusing
on data warehousing”
I believe that
data modeling is a fundamental step in developing data warehouses.
Additionally, there are Universal Data Model constructs for common
star schema designs.
Simsion: As more and more organizations choose to buy “off the shelf”
software, are we living in the past in trying to improve data modeling
practices?
Silverston:
I don’t think that buying “off the shelf” packages eliminates the
need to properly define an organization’s information requirements.
As data models represent a statement of information requirements,
I have used data models to help organizations evaluate and select
suitable application packages. Furthermore, application packages
usually offer a great amount of flexibility and in order for organizations
to know how to best implement these packages, they really need to
fully understand their information requirements. Finally and maybe
most importantly, there exists a huge need to integrate and synchronize
data from various packages since most organizations cannot fulfill
their needs with a single application package. An enterprise data
model is a very effective means for showing how information relates
across these application packages as well as for building integrated
enterprise-wide data stores.
Regarding improving
data modeling practices, this is something that is critical. Our
track record for data modeling has not been great. Many data modeling
efforts have struggled because they have cost more and taken longer
than their perceived business value could justify. I believe the answer
is not to stop doing data modeling (or to stop gathering information
requirements) but that we need to do it better, in less time, with
better tools and methods, such as using re-usable components as
we do in other aspects of systems development.
Simsion:
You’ve made the models available in electronic format: what do you
think is the current position and future of CASE technology?
Silverston:
Yes, I offer a Universal Data Model repository in the CASE, or data
modeling tool, known as ERwin. CASE tools are vital in providing
a practical facility to properly capture our data models.
There is a huge
gap between the current CASE tools and the sophisticated needs regarding
capturing information requirements. We need CASE tools with greater
abilities to synchronize and map between various models so that
we can properly maintain conceptual data models, business data models,
logical data models, physical data models, object models, process
models and any other models that make sense for capturing these
requirements.
I believe the
future of CASE technology is that it supports a need that won’t
go away: the need to capture information requirements. While the
capture of information may take many forms, such as
entity relationship modeling, object oriented modeling, UML or ORM,
one of the most foundational elements of systems development is
capturing information requirements. Thus, in my opinion, data modeling
(in some form) and tools for data modeling (e.g., CASE) seem to
be an essential aspect of quality information system delivery as
we move into the future.
Simsion:
Anything else I should have asked?
Silverston:
You have covered the highlights with great, insightful
questions, Graeme. I appreciate your time and your amazing contributions
to our field. Many thanks also to Tony Shaw from Wilshire Conferences
for allowing me to share my perspectives here in Data Discussions.
©2003 Wilshire Conferences, Inc. May be quoted with full attribution.