A Few Thoughts on Data Modeling and Kids' Soccer

An Interview with William G. Smith

I first met Bill Smith over a dozen years ago when he and Bob Holland were teaching data modeling classes for my former employer, Technology Transfer Institute (TTI). He was then, and still is today, one of the most gifted educators I have ever seen.  Bill brings a unique combination of experience to each of his courses – he’s a former marine, a popular speaker and an accomplished artist.  And, strange as it seems to most of us in the IT field, Bill still teaches his classes using individually hand-drawn overheads.  Does that make him a traditionalist, or just a cartoon lover?  But what really surprised me during our conversation was how similar Bill's point of view is to that of Fabian Pascal.  The two men are complete opposites in personality, yet they agree on so many things.  Let’s find out why…

Tony Shaw, Wilshire Conferences (Wilshire): Bill, it’s great to reconnect with you after so many years. Now I hate to remind you of this, but wasn’t the worldwide data mess supposed to be all cleaned up, normalized, and all data redundancy eliminated by now?  Tell me, what went wrong?

Bill Smith (Smith): Yes, I can distinctly recall meeting with my associates back in 1985, when I formed William G. Smith & Associates, to do some big-picture, long-term thinking about what courseware and consulting services we were going to offer to the marketplace. I remember we projected that most companies and government agencies would have enterprise models done, and their (then-existing) legacy data messes largely cleaned up within 10 years (which would have been in roughly 1995). Clearly, things haven't worked out that way, and I'm profoundly disappointed that I/we haven't been able to turn the tide more than we have.

The information "industry" has always been driven by the vendors of hardware, programming languages/facilities, and increasingly, application software packages. Our industry has been plagued with the superficial marketing fads of the next great solution, the savior product, "silver bullet" solutions, etc., which have generally and profoundly damaged the credibility of IS professionals inside of companies and government agencies. Managing information resources is a very difficult and costly proposition, and what we've really failed to do is to get the senior management level of business and government to face up to this, and to then go about it intelligently, methodically, using the same principles and approaches we use to manage any major, costly resource of an enterprise. We still have a clientele who wants to believe in the next silver bullet. People in business will still do just about anything to avoid thinking.

Kids' soccer coaches despair over the tendency of the kids to run randomly around the field, ganging and chasing the ball wherever it goes. They struggle to teach them the purpose of playing a field position, of having some organized strategy and tactics to move the ball downfield and to score. I think this is a very good analogy for Information Resource Management in American enterprises - they're still like a bunch of kids chasing the latest fad all over the field, until the next fad (fueled by the relentless marketing of those who have $ to gain) is thrown onto the field. The vendors of hardware and application software packages love a big data mess in a potential customer site; it makes it so much easier to sell the idea that the enterprise just doesn't have enough hardware and software to "handle" all that data.

Wilshire: You’re still teaching data modeling, Bill, and are regarded as one of the best instructors in the field. Obviously the subject area hasn’t gone away, but it’s evolved differently from what you imagined or hoped for. How has your teaching evolved over the years to deal with this “disappointment”? What do you tell students in your classes today to prepare them for the discrepancy between the vision and the reality?

Smith: My message really hasn't changed much over these years. We spent a lot of time and effort carefully distilling and defining very solid and stable rules to govern conceptual and logical data modeling techniques, and we still teach those rule-based modeling techniques. The rules about how to organize data for storage haven't changed, and I seriously don't think they ever will -- we are at bedrock. Vendors will continue to come up with the next great thing, the next programming language, the next network doodad, but organizing data according to subject is just plain common sense; always has been, always will be.

We also preach that data models must be done with the direct involvement of the best and brightest business people. Modeling requires hard thinking, and resolution of conflicting, ambiguous, and inconsistent terminology, business rules, and sometimes, business objectives to result in a model which will allow data to be productively shared so that information float in the business is collapsed to zero, work is eliminated, and the different parts of the business can be integrated in time and space simply by knowing the same things at the same time. With all my heart, I wish this wasn't true, and that data modelers could just go into a closet, build a data model and then the resulting shared databases without troubling the business with all those questions; but building good, stable, sensible, shared databases hinges on intense and competent involvement of the business people - those are the facts, and there isn't any way around them.

Wilshire: So, at a high level, what major trends do you see in data modeling today?

Smith: As I consider the big picture, a couple or three trends come to mind:

(1) The soccer kids are currently running all around the Internet/b-to-b/e-whatever ball. Data modeling as a necessary corporate skill and discipline has slipped to some notion of irrelevancy, in favor of building zillions of very poorly designed websites (oh, of course most of them have background databases which are/should be intelligently accessible through the web interface). Somehow in the zeal for website/page design and programming, we've lost the idea that it's still necessary to carefully define the meaning of that data, document it, and organize it intelligently so that it can be easily found/accessed. (The webheads have also lost, or never knew most of the basic principles of proper overall design, interface design, and programming.) I can't tell you the number of times I've visited some company's or agency's website, and been totally confounded by the inability to find data that should be so straightforwardly accessible, but seems to be organized/stored in such a way that the user simply can't access it in the most natural ways. I can sometimes find the data through a completely convoluted, strange access path, but not through the way that would seem to be the no-brainer, natural path. I can't imagine how some of these data stores are built/stored to produce such bizarre results. Of course, as a customer, I usually leave such a site in disgust, and either try someone else's site, or more normally, start searching for an 800 number to call them on the phone, talk to a real person, and find what the heck I'm after.

(2) I also see a completely unjustified assumption of data modeling expertise on the part of many "data professionals". We still offer public data modeling seminars, and we get many people who come to the physical data modeling/design class, asserting that they are fully competent to do the conceptual and logical data modeling that must be done well before you ever get to physical data modeling. The vast majority of those people are completely ignorant of the rules, the techniques, and the process one must follow, and are unable to actually do the work. I think that CASE tool vendors also promote the misconception that if you know how to operate the CASE tool, you know how to model. However, these are two completely distinct and different skill sets.

(Another discipline that exhibits this same phenomenon is project management. Somehow, everyone who has ever worked on a project feels that they are now a fully qualified project manager; most can't even read/interpret a PERT chart, let alone properly plan a project, construct a complete task/deliverable network, specify quality criteria for each deliverable, accurately estimate resource requirements, convert these into rational time estimates, acquire competent team members, supervise and control the work, establish, maintain, and leverage communication channels, measure all the metrics which must be measured/monitored, devise and execute corrective action when needed, and lead and motivate people through example.)

(3) Other disciplines/players/vendors in the data world (the object-oriented crowd, and the data warehousing advocates) are painfully ignorant of the already-complete and fine data modeling techniques, and bring confusion, ambiguity, and retrograde movement to the table by trying to solve problems and reinvent wheels that are done deals.

Wilshire: Let’s drill down into a couple of those. Why are these trends emerging, what do they mean for companies, for modelers, are they good or bad, etc.

Smith: In general, I think all three of the trends I described above have to be considered unhealthy for data professionals in general, and for the data health of businesses. They all contribute to greater data mess-making. (For reference, these comments are numbered to correspond to the previous question. Eds.)

(1) Implementation of data stores (which can be accessed via the Internet) is really no different than implementation of any data store. You should do a competent, well-considered conceptual, then logical, then physical data model. The less experienced/competent/predictable the user, the more sensible the data store (or the more interpretive/friendly/intelligent the user interface) must be. I believe data modeling skills are just as important in web-land, as they were for building "internal" (to the business) data stores. The rapid expansion and advance of the Internet has dramatically and profoundly enhanced the ability to "share the same data at the same time", thereby collapsing information float between a business and its customers, a business and its suppliers, a business and its regulators, etc. This should be the golden age of data resource management, and data professionals must help their businesses see the enormous opportunities to eliminate time, cost, and problems by sharing the same data at the same time via the Internet. Proper organization, documentation, and quality control of the data resource is more important than ever.

CIOs must be educated to recognize that the Internet is a wonderful new technological means, but we still have to store/retrieve data, and we still have to write programs for people to use to store/retrieve that data, whether it is through a website or not. It is NOT a totally new world, and the old principles still apply. Don't throw the baby out with the bath water.

(2) Companies must invest in skills. The world of computerized information is a very complex (exacerbated by our react-mode mess-making) technical world, and it requires technical competencies; it goes along with using information to work faster, cheaper, better, and more flexibly. Data professionals must decide where their interests/proclivities lie, and then press their companies to secure for them the training required.

CIOs must stop the "we can do anything with nothing" nonsense; they must be honest and tough with their peers (and responsible to their information professionals) about the true costs of the skills required to build and maintain a viable information environment. Further, they ALL should be trained in the IRM approach: any company could have a much healthier, useful, efficient, and effective information environment than what it currently has, at 10% of the current cost! All we have to do is change the way we go about it, from the "soccer kids running after the ball" approach, to an organized, "know exactly what the goal is" approach. The CIO must educate his/her peers, bring some intelligent strategy to the table, get models and an overall implementation/migration plan done, and get on with the job of replacing/simplifying the legacy mess, rather than simply reacting.

(3) We data professionals must penetrate and integrate with other disciplines/players/vendors in the data world (the object-oriented crowd, and the data warehousing advocates), educate them, build on what has already been accomplished, and ensure that data modeling practices are uniform, consistent, and coherent. I have yet to encounter the situation/environment/requirement that can't be successfully and properly modeled using plain old conceptual, logical, physical data modeling disciplines.


Wilshire: Coming up at the DAMA Symposium and Wilshire Meta-Data Conference in Orlando on April 28, you’re teaching a one-day tutorial on Transforming Logical Data Models into Physical Models. The logical-to-physical transition seems to be a perennial stumbling block for many organizations (and individuals). Can you tell us a little about your approach to the task?

Smith: We teach and practice a very organized, step-by-step, methodical process for transforming a well-normalized logical data model into physical artifacts for storage (files/tables, record/row types, fields/columns, indices, etc.). We also teach (in concert with the Conceptual Data Modeling, and Logical Data Modeling disciplines) that there are very different and very specific considerations that one must deal with at each step of the process, but not before; each unto its time. In general, these physical design issues are: deciding physical distribution/placement of the data across multiple storage locations; deciding when to store derived data at specific locations; dealing with very large volumes of stored data and/or simultaneous transactions using the same data; ensuring security and integrity of the physical data resource; meeting performance requirements for critical business transactions against the data; and maintaining sensibility of the organization of the physical data for dynamic, spontaneous business retrieval. So, our Physical Data Modeling process properly incorporates a step to analyze, make decisions and determinations, and then incorporate each of these physical issues into the PDM.

Another feature of the PDM process we teach is that it is practical, yet very effective given the current CASE tools, many of which can generate very straightforward, unsophisticated DDL for a physical database from the logical data model if held in the CASE tool. Virtually none of them allow the Physical Data Modeler to qualify/quantify these physical design issues/constraints, feed them into the CASE tool as requirements, and have the CASE tool do a zillion-iteration design, truly coming up with the "optimal" physical design, given the constraints the business has imposed. So, under the worst of circumstances (business wants to distribute the data all over the map, there are huge volumes of data to store, there is a lot of candidate stored derived data, there are very stringent transaction performance requirements to be met, and data integrity must be perfect 24/7) this will be somewhat arduous human labor (Horrors! We have to think?????)

Wilshire: What do you see as the appropriate deliverables in the transformation process?

Smith: There are three necessary "inputs" to the Physical Data Modeler: (1) the logical data model; (2) the business' physical design requirements/constraints; (3) the technological bed in which the data is to be stored/processed.

The final deliverable(s), or "outputs" of the transformation process are complete DDL specifications of each and every physical artifact (each file/table, each record/row type, each field/column, each index, and any other physical artifacts used by the particular DBMS in which the data will be stored) which must be defined to/for the DBMS so that it can properly allocate/format appropriate spaces in which to store the data, and to build all of its necessary internal metadata and control information to allow it to properly manage (store, retrieve, protect, coordinate, maintain integrity, monitor performance/usage, etc.) the stored data. On the way to that final DDL specification, there are a lot of interim deliverables, carried along from step to step as the work progresses to completion.
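(Editors' note: the "very straightforward, unsophisticated DDL" that a tool can generate from a logical model can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the process Bill teaches or any CASE tool's actual behavior. The Entity and Attribute classes, the entity_to_ddl function, and the customer/customer_order example are all hypothetical names invented here, and the output is generic SQL rather than the dialect of any particular DBMS.)

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Attribute:
    """One attribute of a logical entity (hypothetical structure)."""
    name: str
    sql_type: str
    nullable: bool = False


@dataclass
class Entity:
    """One entity of a logical data model (hypothetical structure)."""
    name: str
    attributes: List[Attribute]
    primary_key: List[str]
    # Each foreign key is (column, referenced table, referenced column).
    foreign_keys: List[Tuple[str, str, str]] = field(default_factory=list)


def entity_to_ddl(entity: Entity) -> str:
    """Translate one entity into a one-to-one CREATE TABLE statement.

    This is the naive mapping only: no distribution, volume, derived-data,
    security, or performance decisions are taken into account.
    """
    lines = []
    for attr in entity.attributes:
        null_clause = "" if attr.nullable else " NOT NULL"
        lines.append(f"    {attr.name} {attr.sql_type}{null_clause}")
    lines.append(f"    PRIMARY KEY ({', '.join(entity.primary_key)})")
    for column, ref_table, ref_column in entity.foreign_keys:
        lines.append(
            f"    FOREIGN KEY ({column}) REFERENCES {ref_table} ({ref_column})"
        )
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(lines) + "\n);"


# Hypothetical logical entity used only to exercise the function.
customer_order = Entity(
    name="customer_order",
    attributes=[
        Attribute("order_id", "INTEGER"),
        Attribute("customer_id", "INTEGER"),
        Attribute("order_date", "DATE"),
        Attribute("notes", "VARCHAR(200)", nullable=True),
    ],
    primary_key=["order_id"],
    foreign_keys=[("customer_id", "customer", "customer_id")],
)

ddl = entity_to_ddl(customer_order)
print(ddl)
```

(Everything the sketch leaves out, such as placement, volumes, stored derived data, security, and transaction performance, is exactly the set of physical design issues Bill says the modeler still has to think through by hand.)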

Wilshire: Who would you advocate should ideally perform this transformation? The DA, the DBA? Have you seen an ideal combination of roles?

Smith: Generally, there have evolved two skill specialties: the conceptual/logical data modelers (often belonging to an organization, and wearing titles associated with "DA"), and the physical design and database-daily-care-and-feeding folks (often belonging to an organization, and wearing titles associated with "DBA"). The skills/knowledge of these two specialties are significantly different: generally, DAs don't really need to know about the gory details of the DBMS to properly interpret business information requirements, define and organize/normalize data properly into a logical data model; but DBAs absolutely need in-depth knowledge of the guts of the DBMS (how it stores/finds data, how it protects the data, how it sets priorities, how it ensures integrity, what's efficient and what's not, what conflicts with what, etc.).

I think having these people specialized and organized into two different groups makes sense, but both should work for the same Data Resource Manager - one competent boss to resolve issues. During the transform from logical data model to physical data model, the physical data modelers should always begin with the nice, tidy, 3NF logical data model, and should twist/modify it as little as possible in translating it into physical artifacts to meet the physical design requirements/issues they are asked to meet. We say that the best organizational setup is for the DBAs (people trained and competent to do physical design for the particular DBMS(s) being used) to do the actual logical-to-physical transformation, but that the DAs (people trained in the proper subject organization/normalization of the data, given all possible uses) must retain review/approval authority over the physical data model. The 3NF logical structure of the data truly represents the best possible organization of the data, given all possible uses. On the other hand, the physical data modeler has other issues/constraints to satisfy, and this usually involves some twisting of the logical structure. A balance must be struck. If irresolvable issues arise, the Data Resource Manager is the judge; the two groups each make their case, and the DRM must competently decide.
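(Editors' note: one common example of "twisting" a 3NF structure is storing derived data. The Python sketch below is an illustration only, not Bill's method. It uses SQLite from the Python standard library as a stand-in DBMS, and the table, column, and trigger names are hypothetical. The order total is derivable from the order lines, so a strict 3NF model would not store it; here the physical design stores it anyway and keeps it consistent with a trigger.)

```python
import sqlite3

# In-memory SQLite database used as a stand-in DBMS (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        -- Stored derived data: a deliberate departure from the 3NF model.
        order_total NUMERIC NOT NULL DEFAULT 0
    );

    CREATE TABLE order_line (
        order_id INTEGER NOT NULL REFERENCES customer_order (order_id),
        line_no  INTEGER NOT NULL,
        amount   NUMERIC NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );

    -- Keeps the stored total in step with newly inserted lines.
    -- A complete design would also need UPDATE and DELETE triggers.
    CREATE TRIGGER order_line_after_insert
    AFTER INSERT ON order_line
    BEGIN
        UPDATE customer_order
        SET order_total = order_total + NEW.amount
        WHERE order_id = NEW.order_id;
    END;
    """
)

conn.execute("INSERT INTO customer_order (order_id) VALUES (1)")
conn.executemany(
    "INSERT INTO order_line (order_id, line_no, amount) VALUES (?, ?, ?)",
    [(1, 1, 20), (1, 2, 5)],
)

# The stored value, read directly from the denormalized column.
stored_total = conn.execute(
    "SELECT order_total FROM customer_order WHERE order_id = 1"
).fetchone()[0]

# The same value recomputed from the normalized order lines.
derived_total = conn.execute(
    "SELECT SUM(amount) FROM order_line WHERE order_id = 1"
).fetchone()[0]

print(stored_total, derived_total)
```

(The comparison of the stored and recomputed totals is the kind of check a DA reviewer might ask for before approving this departure from the logical model: the twist buys a faster read at the price of extra integrity machinery.)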

Wilshire: OK, enough on modeling. What about the rest of the data management function? The big question I hear people asking lately is whether “classic” centralized DM has a future? What’s your perspective? Do you think it’s a legitimate concern? Will the traditional data management function still be around in 10 years?

Smith: It's interesting that as data messes keep getting worse, companies keep recycling: recognize that we have a data mess which is making it difficult to succeed as a business; start up a DA function; find out it's tough going (not quick and easy) to clean up the mess; fire everybody in the DA function; make more data mess; start up a new DA function; find out it's now even tougher to clean up the mess; fire everyone in the DA function; etc., etc., etc. I've watched this happen in many, many of our client companies, despite our efforts to get DRM to take root, and deliver on the promise, power, and benefits of a well-engineered, shared data environment to make the business function faster, cheaper, better, and to be able to change faster and easier. EVERY business WANTS this, but few have actually carried through with what it takes to achieve it. Too hard (whine, whine).

The bigger the data mess, the more difficult it is to succeed as a business. In considering whether or not to become a customer of a business, I absolutely judge that business on its data competency; if they've got a mess, I can accurately project poor/slow/expensive service, constant screw-ups, and prices which are higher than they should be; a no-brainer. And, there is NO SILVER BULLET that any vendor can sell a business, implement for a business, or do for a business to clean up the data mess. Clear thinking, and resolute action are required. This will either be done, or the business will eventually not survive as its competitors DO achieve it. If you think the problem is going to go away, or that there is a silver bullet lurking around the next corner, you're going to be sorely disappointed.

In our consultancy, we have never advocated a universal, one-size-fits-all "centralized" or "decentralized" organizational approach for IRM, DRM, etc. We DO advocate that a business must define very carefully what it wants the end result information environment to be like, and then the path to get there, and the correct organizational approach and control to achieve it becomes very clear. One of Stephen Covey's principles of effectiveness is "Start with the end." I couldn't believe in this more strongly, and we advocate it in everything we teach and do for our clients. BE ABSOLUTELY CLEAR ABOUT WHAT YOU'RE TRYING TO ACCOMPLISH, AND YOU'LL VERY LIKELY ACCOMPLISH IT. FUZZY THINKING LEADS TO FLOUNDERING, WASTE, AND FAILURE. Unfortunately, the history of DA in most companies/government agencies can only be characterized as the latter.

Wilshire: Based on what you see in your client organizations, what aspects of data management work well (most) of the time? What projects have a high failure rate?

Smith: Tough question, Tony, as I tend to be a very tough judge of data resource success/failure. I sometimes have a client who is really happy that they achieved some small, innocuous accomplishment, but I would judge it as simply a small, innocuous accomplishment. I believe in going for the real solutions to problems, describing the desired outcome absolutely clearly, and any discrepancy between that stated goal and the actual outcome is failure.

Most DBA groups do a fine job of managing the "daily-care-and-feeding" of the physical databases of the business: monitoring usage, tuning performance parameters, reorganizing data stores when needed, backing up/recovering, storing data offsite for disaster recovery, implementing and monitoring security controls, ensuring availability, etc. We seem to have our act together pretty well in this department.

I see some of our client companies using data modeling very well and effectively on single-system-oriented projects to build parochial "system-oriented" (as opposed to sharable across the entire business) databases. I've also seen a couple of data warehouses that were very competently modeled, and are standing up to the inevitably changing business questions and retrieval patterns very well.

I've also had some clients who have defined and implemented effective data stewardship functions, with business people effectively performing the defined tasks and making the decisions required of data stewards. These programs help the business realize that the data resource is their resource, not something the IRM/IS/IT organization owns, and I see some businesses seriously and competently accepting and fulfilling their data steward responsibilities and authorities. (But, implementing data stewardship over a data mess is much harder than doing so for a well-organized, non-redundant data resource!)

Conversely, I rarely see companies do a competent "enterprise" conceptual data model, and then plan and execute implementation projects (with the required logical and physical data modeling) from that enterprise model to result in a well-engineered, actively shared, minimally redundant, subject-organized data resource for the entire business to share. For some reason, this just seems to be pathetically beyond management's comprehension and ability to execute.

With a few noteworthy exceptions, I also see the whole "data warehousing" movement as a colossal, blithering mess - another "silver bullet" solution with very little clear thinking behind it, until the ignominious failures started piling up. Most companies are not competently using data modeling techniques to model and build data warehouse data stores. Data warehouses (databases in general) still tend to be built for one user/group, and optimized for the use of the moment, when the whole idea should be to build it for many/all uses, and to accommodate the inevitable and continuing change of questions that the business will ask. And, most companies have bought the "data warehouse" approach (build more data mess on top of the data mess you already have...hmmm, is this really a solution???) without critical thought; run off the cliff with the rest of the lemmings. If we could redirect all the money that has been poured down the data warehouse drain into building competent enterprise data models which incorporate time/states so the data can be shared (yes, even for management decision making!!!!) throughout the business, we, our economy, our society, would be so much further down the road to data cost/effectiveness.

Wilshire: Are there any generalized lessons to be learned from those observations?

Smith: Yup:

(1) Traditional "system-land" approach to managing data ensures a poorly organized, vastly redundant, poorly-documented, inconsistent, untimely, and constantly multiplying data mess. That data mess is hurting your business, making it more expensive, slower, poorly-performing, and much harder to change than it can/should be.

(2) To fix the data mess, you've got to manage the three major components of the "information resources" of the business (the data resource, the application/transaction code resource, and the technological resource - the columns in a framework) in a coherent, coordinated manner. There must be well-built conceptual models for each of these, and all development must conform to these models to ensure that the desired outcome is achieved.

Wilshire: If you were to play “guidance counselor” for a group of data administrators, what skills would you suggest they develop in order to either stay relevant in their current roles, or to move on with the next stage of their career?

Smith: As for staying in the role of data professional: every data professional must be competent in one or more of the following skills: conceptual data modeling, logical data modeling (both include the ability to properly facilitate modeling sessions), physical data modeling, daily care and feeding of the physical data resource, and/or managing other data resource professionals (which implies a certain level of competence in all of the aforementioned). Too many data professionals are "out of the loop" in their companies because they really can't DO any of these things. Every action required to properly build, manage, and utilize a data resource for a business involves these skills. Call Bill immediately for training! (Sorry, Tony, couldn't resist a shameless plug there!)

As for the next step in career: I think an experienced data resource professional is the best candidate for an effective CIO, and data resource professionals should set their sights on this position. I see CIOs that are technology/hardware nuts, and their data environments (and usually their application code environments) are inevitably terrible messes. I see CIOs that are application nuts, and their technology and data environments are inevitably terrible messes. BUT, if the CIO is a data nut, not only will the data environment be well organized, minimally redundant, well-documented, and under control, but so will the technology and the application/transaction code! If you manage data well, it is almost impossible to have a code or a technology mess!

Wilshire: Bill, you have an interesting background and mix of talents. You graduated from Annapolis and were in the Marine Corps, and I’ve certainly seen that no nonsense, straight-to-the-point side of you as a professional. But you’re also a masterful educator with immense empathy for your students and a talented artist and painter. Forgive me for saying so, but those characteristics seem a little incongruous? Do you have any explanation for your multiple personalities?

Smith: I think it is simply genetics, and credit is certainly due to my parents. I've always been torn between left-brain and right-brain interests, and have had abilities on both sides of that supposed divide. My parents also were very encouraging of any interests we had, and supported and encouraged my drawing and painting as well as emphasizing good grades in math and science! We weren't wealthy folks, and I can still remember my mom gulping and signing that $30 check to pay for a bunch of oil paints and brushes when I was 10 years old. I knew the family was sacrificing, and it gave me a lot of incentive to do it well.

My military background is responsible for my willingness to keep beating my head against the same wall. We always learned to take the "hard right, not the easy wrong" path. In my career, I could have easily jumped on each passing band-wagon, made a lot of money pushing the fad of the moment, "givin' them what they want". But, I've always felt that there was a "hard right" involved here, and I've always seen my career contribution as being important in the context of our country, our economy, our society. I've never been willing to just "go with the flow" because that's what everyone else seemed to be doing. I've always envisioned the remarkable way things could be in this country, and cannot stop striving for that vision.

Wilshire: You were going to pursue painting as a second career – I assume that’s the explanation behind all your hand-drawn training materials? What’s with those, Bill… is it a creative outlet for you, or can you just not be bothered to learn PowerPoint?

Smith: The art career is in full swing - I've had a very busy three-year backlog of commissions to do (portraits, wildlife, and landscapes), and one of the websites I'll be building this year will display my paintings. I suppose I'll have to enter the gallery world this year as well, in order to gain wider name recognition in the art world. I do love to paint, and have done so since my mom bought me those oil paints when I was 10. Those who paint will relate to this statement: it is not something that I choose to do, or not to do; it is something that I have to do.

And yes, my background in illustration and cartooning is accountable for our hand-done course materials. I've always believed a picture is worth a thousand words, and I try to pictorialize every important point in our training. I can still do that much faster with a Sharpie pen and a template than I can with a computer/application - hands down. The text gets tiresome to letter, but it's interesting; on student evaluations, the feedback is almost 9:1 in favor of the hand-done material. I attribute that to the same phenomenon described by John Naisbitt in the "High Tech - High Touch" chapter of his great book, Megatrends. He asserted that as the world becomes more and more technical, people will require a balance of more personal, "hand-done", touch-friendly experiences. Personally, I hate to watch or read through the same old, same old PowerPoint-looking slides, and it seems that is true for 9 out of 10 others. Also, by the time portable laptop-based PowerPoint (or any software with similar capability) came along, I had already developed (mostly in airports, on planes, in hotel rooms) over 2000 pages of hand-done courseware. Scanning it into a computer, and/or converting it to PowerPoint format is a huge task, and there seems to be little reason to do so, as the techniques/subject matter we teach are very mature and stable, and don't require a lot of change.

Wilshire: Thanks Bill, it’s great to talk to you.





"Data Discussions" is a series of interviews with leading data management experts and practitioners, presented by Wilshire Conferences. For sponsorship information, contact Rick Froton at 603-305-0660.


©2003 Wilshire Conferences, Inc. May be forwarded to colleagues and quoted with full attribution.