
Truth, Fads and Principles:
What's Wrong with the Database Industry?
An Interview with Fabian Pascal
This issue of Data Discussions is with Fabian Pascal, an independent
industry analyst, consultant, author and lecturer specializing in database
management. What makes Mr. Pascal especially “unique” is his reputation
for taking a highly critical and contradictory position in almost all
his public opinions. For example, he’s a prolific writer who calls
his regular columns “Setting Matters Straight” (quarterly, on
www.TDAN.com) and “Against the
Grain” (on www.DBAzine.com).
His own web site is called DATABASE DEBUNKINGS (www.dbdebunk.com).
And after a recent presentation, I told him the audience obviously liked
his talk, to which he replied “I must have done something wrong.”
Clearly he doesn’t suffer fools, so lest I demonstrate my own ignorance
and incur his wrath, I enlisted some help for this interview from my
colleagues Bob Seiner of TDAN.com and Craig Mullins of BMC Software
and DBAzine.com. Both gentlemen have an awareness of Fabian’s
hot buttons.
Tony Shaw, Wilshire Conferences (Wilshire): Fabian,
you’ve successfully positioned yourself as the curmudgeon of the database
world. Your opinions are highly likely to disagree with, and in
many cases anger, vendors and the trade press. This is rather
an unusual marketing strategy for a consultant, yet you seem to relish
the reputation. Have you always had this contrarian personality
or was there a pivotal moment of disillusionment that put you on this
path?
Fabian Pascal (FP): I’ve been called worse, which does not bother me. If
telling the truth and relying on science, rather than on uninformed personal
opinions, poor reasoning, or regurgitated vendor press releases, invites
anger from vested interests (which is to be expected), so be it. This is what
marketing should be based on; it should be the rule, not the
exception. The issue is not disagreement per se, but the basis
for it: it’s OK to disagree, but you better reason and ground your position
in knowledge. It is that which is lacking in the industry: people disagree
for no good reason, they just don’t know what they’re talking about
and, what is worse, they don’t want to know.
The question
is how you define success. There is little doubt that I would have been
much better off financially had I not bucked the industry. But
I must be able to look myself in the mirror, and I wouldn’t be able
to do that had I been saying, like everybody else, that which is popular,
but incorrect. So my measure of success is the degree to which I do
not compromise on principles.
Wilshire: You are particularly critical of the corruption
of the relational model by vendors. And I’ve read where you’ve said
there are no relational database management systems, but rather they
are all SQL DBMSs. Can you explain please?
FP: The
fact is that the vast majority of professionals do not know or understand
the relational model, and what is worse, they do not bother to learn
it; but it does not stop them from criticizing it. If they really knew
it, they would be able to compare the products to it and see what the
discrepancies are, as well as their practical implications. Without
such background it is very difficult to understand and appreciate what’s
wrong.
I keep hearing “If
the relational model is so good, why hasn’t it been implemented right
yet?” But this is a copout. The answer is implicit in the question:
If you don’t bother to educate yourself on the subject, why should you
rely on somebody else to do the right thing and bring you the right
solution, and how could you tell that’s indeed what it is? In that state
I could sell you anything, and vendors do. Why should vendors bother,
if their customers buy all the marketing nonsense--like “it’s theory,
and therefore not practical”-- and whatever fad they come up with? If
vendors or the press tell you that “post-relational” DBMSs are better
than RDBMSs, when in reality they are old, nonrelational technologies,
how can you figure it out if you don’t know what an RDBMS is,
and what it’s supposed to do for you? It’s so much easier for vendors
to sell to uninformed customers.
The relational model
is simply the application of logic to database management. Whether practitioners
are aware of it or not, whether they like it or not, databases are collections
of predicates and DBMSs are essentially logic inference engines. And
they had better like it, because logic is what guarantees correctness
of the information recorded in databases, and the answers obtained from
them (with correctness defined as consistency, both internal and with
the business rules in effect). Can anybody say with a straight face
that corrupting the foundation of database management is acceptable?
Yet, this is precisely what they say when they ignore or violate relational
principles. Do any of the proponents of other approaches or technologies
really believe they can replace logic as a basis for database management,
and what exactly do they propose to replace it with that is better?
I have yet to get an answer to this question.
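To illustrate the predicate view described above, here is a minimal editorial sketch (not part of the interview). It assumes two hypothetical predicates, "Employee E works in department D" and "Department D is located in city C"; each tuple stands for a proposition asserted to be true, and a query derives new true propositions from the stored ones. The names and data are invented for illustration only.

```python
# Predicate: "Employee E works in department D."
works_in = {("alice", "sales"), ("bob", "research")}

# Predicate: "Department D is located in city C."
located_in = {("sales", "london"), ("research", "paris")}


def join(r, s):
    """Derive 'E works in D AND D is located in C' (logical conjunction)."""
    return {(e, d, c) for (e, d) in r for (d2, c) in s if d == d2}


def project_emp_city(rows):
    """Derive 'there exists a D such that E works in D, located in C'."""
    return {(e, c) for (e, _d, c) in rows}


# Every tuple in the result is a proposition that follows logically
# from the propositions stored in the two relations above.
works_in_city = project_emp_city(join(works_in, located_in))
```

In this reading, the join corresponds to logical AND and the projection to existential quantification, which is the sense in which a query result is an inference from the stored facts.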
SQL
itself, as well as its commercial implementations, failed to adhere
to the model and violated it in a plethora of ways, all of which cause
numerous practical problems. What is more, it is a poorly designed language,
difficult to implement and, therefore, products suffer from serious
implementation flaws on top of relational and language weaknesses. Ted
Codd, Chris Date, David McGoveran, myself and others have amply documented
this so anybody who is interested can find the information in our writings.
It is generally thought that the problems are due to SQL products being
relational, while in reality they are due to their not being relational
enough. If logic is ignored, what do you think the consequences
will be? It’s like building bridges ignoring the laws of physics.
Wilshire:
Well I’m guessing you’re not on Larry Ellison’s Christmas
card list. Yet, for the time being we’re stuck with the products that
the vendors have built for us today. So what advice do you have for
practitioners who have the bulk of their data in SQL database systems?
What can they do to mitigate the potential problems and pitfalls you’ve
identified?
FP:
There is no magic solution; if products are bad, they’re bad. Learn
data fundamentals and the relational model and assess technologies,
products and practices accordingly (you can take my, or Chris Date’s
seminars for that purpose :)). Know and understand what the deficiencies
are and how to minimize their impact. Don’t rely exclusively on vendors,
and do not believe anything you read in the trade press. Have your own,
solid base of knowledge to evaluate things, not industry claims.
Without
such knowledge there is no reason to assume that what practitioners
and the industry are doing makes sense. I mean, consider: SQL DBMSs,
object DBMSs, “universal DBMSs”, multivalue DBMSs, XML DBMSs--do we
really need a new technology every few years? Why, if each is claimed
to be the right solution? Do our informational needs fundamentally change
so often? That in itself indicates something is wrong. It’s profitable
for the industry, but a costly proposition for users.
Wilshire:
Craig points out that you’ve written about a “True” Relational
DBMS (TRDBMS). What would it take to build one? And what would happen
if some vendor ever did create one? How would that vendor make
the TRDBMS successful, and what would happen to all the legacy SQL out
there?
FP:
It is difficult to develop and sell a TRDBMS, given the way in which
the industry operates. The big vendors, like Microsoft, Oracle and IBM,
are vested in their existing technologies, with large installed user
bases; they are not likely to make fundamental changes. And a small
company does not have the resources to compete with them.
Be
that as it may, there are two startups that came up with some goods.
One is Alphora, which implemented
Dataphor, a product based on the proposals of Chris Date and Hugh Darwen
in THE THIRD MANIFESTO.
The other is Required Technologies,
but unfortunately I cannot say much about this one for all sorts of
reasons. Stay tuned to DATABASE DEBUNKINGS!
There is little doubt that these are superior to SQL products (although
they are forced, unfortunately, to support SQL) and much closer to the
model, and those who tried Dataphor recognized the benefits. But they
require informed, educated users, who take risks, and I’m afraid that’s
a scarce commodity.
It’s precisely
because languages never die and make migrations very costly and difficult,
that things should be done right in the first place. But the industry
operates the wrong way: they keep forcing users to migrate and map constantly,
instead of doing productive work. It’s really mindless.
Wilshire:
Both Craig and Bob suggested I would get a strong reaction
if I ask you about XML – both as a language, and the use of XML for
data management.
FP:
XML was invented by text publishers, who had no knowledge of data
management, purportedly for data exchange. But exchange requires a physical
format, not a data model. First, there are tons of formats in the industry
and any one could have been used; why invent yet another? And second,
XML is actually a bad physical format for exchange; it is highly and
unnecessarily inefficient, to the point where it is increasingly violated
to get performance out of it.
Now
they are adding a data model to it, to be able to do any data management
(see Tags Do Not a Language Make)
and, as Chris Date points out, the first thing they had to do to define
their “model” was to discard the notion of an XML document as the
fundamental data object! What can you conclude from this
fact? The model they did come up with is the same hierarchic model which
we discarded 30 years ago and replaced with SQL, because it was too
complex, inflexible and lacked rigor. I call the whole insanity “The
Exchange Tail and the Management Dog”, the title of my new seminar.
Would such regressions be accepted if practitioners understood data
fundamentals? No way.
Wilshire:
There are obviously lots of people and companies that are
pushing XML as a standard for data integration. If XML is not the answer,
what is?
FP:
Whenever so many push something so hard, it is suspect: it smacks of
yet another fad.
Integration
is to IT what motherhood and apple pie is to American culture: everybody
wants it and promises it. But very few really understand what it means.
For example, they are talking about the “semantic web” and all that
jazz. But XML came out with very little semantics (no integrity and
no manipulation); what kind of integration can you have based on that?
And how comfortable should you be with semantics added post-hoc to a
physical data format, and without a theoretical foundation?
For data management
you need truly relational DBMSs (TRDBMSs). For data exchange you can
have any efficient physical format, as long as it is agreed upon.
But the industry has never been able to agree even on standard physical
formats; how likely is it to agree on semantics, particularly something
so complicated as the semantics of hierarchies? They can’t even come
up with specifications because of this. When XML pushers talk about
it, it is almost always structure, not the purpose of the structure,
integrity and manipulation. And it’s there that complexities crop up.
That’s what the relational model was devised to eliminate.
Wilshire:
At the conference coming up in April (the DAMA International
Symposium and Wilshire Meta-Data Conference in Orlando, April 27-May
1) you’re conducting a workshop called “The Dangerous
Illusion: Normalization, Performance, Integrity and the Logical-Physical
Confusion.” What will you be talking about in the workshop?
FP: This
is an excellent example of how ignorance of fundamentals can lead people
astray. Most practitioners believe that relational design—normalization—is
bad for performance. But this is, of course, logically impossible: how
can logic, which governs the truth or falsehood of propositions about
the real world, have anything to do with implementation details
and performance? Practitioners denormalize for performance, without
realizing that if they get any performance gains (and that is
by no means certain), it does not come from denormalization, but actually
from ignoring the integrity implications of that redesign. In
fact, they trade unlikely performance gains for almost certain corruption,
without being aware of it.
When
I bring this up with practitioners, most do not know what I mean. Some,
however, say that they are aware of it and that they took the necessary
steps to prevent corruption. But I know this is not true. First, when
I ask them what are the integrity constraints that were added for this
purpose, they have no idea how to formulate them (for that you need
to know and understand the relational model). And second, had they added
those constraints, they would have realized the futility of denormalization
as a performance enhancer, because they would not have gotten any
gains! Nobody who understands the fundamentals would want to denormalize
for performance. And they sure would not blame the model, rather than
the products.
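The integrity point above can be made concrete with a small editorial sketch (not part of the interview). It assumes a hypothetical denormalized table emp_dept(emp_id, dept_id, dept_name), in which the department name is repeated for every employee of a department, and it uses SQLite through Python's sqlite3 module purely for illustration. Without an extra constraint, the table accepts two different names for the same department; the trigger is one way to declare the missing rule "dept_id determines dept_name". The sketch covers INSERT only (a complete constraint would also have to cover UPDATE), and it says nothing about performance by itself.

```python
import sqlite3


def build_denormalized(with_constraint: bool) -> sqlite3.Connection:
    """Create the hypothetical denormalized table, optionally guarded."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp_dept ("
        " emp_id INTEGER PRIMARY KEY,"
        " dept_id INTEGER NOT NULL,"
        " dept_name TEXT NOT NULL)"
    )
    if with_constraint:
        # Reject a new row whose dept_name disagrees with an existing
        # row for the same dept_id (enforces dept_id -> dept_name).
        conn.execute(
            """
            CREATE TRIGGER fd_dept_name_ins
            BEFORE INSERT ON emp_dept
            WHEN EXISTS (SELECT 1 FROM emp_dept
                         WHERE dept_id = NEW.dept_id
                           AND dept_name <> NEW.dept_name)
            BEGIN
                SELECT RAISE(ABORT, 'dept_id -> dept_name violated');
            END
            """
        )
    return conn


def try_inconsistent_insert(conn: sqlite3.Connection) -> bool:
    """Return True if the table accepts contradictory department names."""
    conn.execute("INSERT INTO emp_dept VALUES (1, 10, 'Sales')")
    try:
        conn.execute("INSERT INTO emp_dept VALUES (2, 10, 'Marketing')")
    except sqlite3.DatabaseError:
        return False
    return True
```

Calling try_inconsistent_insert(build_denormalized(False)) shows the unguarded table silently accepting the contradiction, while the guarded version rejects it, at the price of an extra lookup on every insert.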
Wilshire:
And then later on you’re doing a Night School session on
“To Laugh
or Cry: More Fundamental Fallacies in Database Management.” What
new observations do you have in store for us?
FP:
The first part of the title, “To Laugh or to Cry?” says it
all. I will discuss blatant examples from the trade media and industry
practice, of the amount of prevalent ignorance, and the high cost that
you can end up paying if you don’t know your fundamentals to see through
it all. These are the “best” specimens from the tons I collect during
the year, and it’s not easy to select them, believe me. I post others
as weekly quotes at DATABASE DEBUNKINGS, but there are too many even
for that.
Wilshire:
Let me get your short takes on some of the “hot topics” in
the data management community today. What do you see is appropriate
to “De-Bunk” about these?
FP:
I normally frown upon “sound bites”, which tend to contribute to the
problems I’ve been referring to. These topics require time to discuss
intelligently and meaningfully, particularly with an audience that is
not strong on the fundamentals. So what I can do is provide links to
writings on the subjects (either by Chris Date, or by me, or exchanges
I had with readers).
a) UML
b) Dimensional modeling
c) OO (or OODBMS – your choice)
d) Business rules
e) ORM (Object Role Modeling)
f) Agile Methods
g) MySQL (the open source database)
Wilshire:
I like this one--Bob suggested I ask you this question.
You’ve written widely about your admiration for the relational model.
But is there anything in the 30 years since then that you would consider
to have been a positive “revelation”, or a contributing factor in the
advancement of the database management industry? Are there in fact
ANY positive revelations or are they just, as you call them, fads?
FP:
In science revelations are rare. Science advances slowly, gradually
and carefully, via the efforts of many involved in thinking, writing,
reviewing, discussing, testing, duplicating for validation, correcting
errors, etc. And in this sense there has been considerable advancement.
We know much more and understand better the relational model today
than we did in the ’70s; and we find new insights and benefits all the
time, an indication that it was indeed a revelation. Progress is
clear from works such as Date and Darwen’s THE THIRD MANIFESTO and,
with Lorentzos, TEMPORAL
DATA AND THE RELATIONAL MODEL. But it’s hard to come up with
revelations such as Codd’s, they don’t come that often anyway, let alone
in a field that operates the way the database field does.
Only
in industry and business can somebody wake up one morning and think
he invented something new, without ever bothering to check and find
out that it was tried and discarded in the past (like the hierarchic
model). This is how “revelations” such as XML or “universal DBMS” come
about, and they turn out to be nothing of the sort. In fact, in my editorials
and lectures I demonstrate how even academia has renounced its educational
and scientific functions and is rapidly becoming a certifying and training
vehicle for vendors, so not much can be expected from that quarter.
And
yet once in a while, against all odds, some revelation does occur, and
the implementation technology invented by Required Technologies promises
to be one. How it’s going to play in the industry, however, is a tossup.
The past is littered with superior technologies, products and practices
that failed because of the inefficient ways of the market.
Wilshire:
Finally Fabian, any predictions on what’s next for database
technology, or for the industry?
FP:
You wouldn’t know it from the media, vendors and pundits, but to be
honest, rather than do the usual optimistic closing, I must say that
things are continually deteriorating. Practitioners today know far
less of the fundamentals than previous generations, and if this trend continues,
future generations will know still less. As I already stated, with all
these fads coming and going, most of the time and effort goes into migrating
from one fad to another, in mapping from one model to another, and in
trying to make all the disparate technologies and acronyms work together.
It’s a colossal waste of resources, but in the absence of knowledge
and appreciation thereof—which not only is not rewarded, but punished—what’s
there to stop it?
Let
me close by quoting somebody much smarter than myself, who has recently
passed away (thanks to Paul Vernon for bringing this quote to my attention).
Note when he said it! Sound familiar?
"I
hope very much that computing science at large will become more
mature, as I am annoyed by two phenomena that both strike me as
symptoms of immaturity.
The one is the widespread sensitivity to fads and fashions, and
the wholesale adoption of buzzwords and even buzznotions. Write a
paper promising salvation, make it a "structured" something
or a "virtual" something, or "abstract", "distributed"
or "higher-order" or "applicative" and you can
almost be certain of having started a new cult.
The
other one is the sensitivity to the market place, the unchallenged
assumption that industrial products, just because they are there,
become by their mere existence a topic worthy of scientific attention,
no matter how grave the mistakes they embody. In the sixties the
battle that was needed to prevent computing science from degenerating
to "how to live with the 360" has been won, and "courses"
-- usually "in depth"!-- about MVS or what have you are
now confined to the not so respectable subculture of the commercial
training circuit. But now we hear that the advent of the microprocessors
is going to revolutionize computing science! I don't believe that,
unless the chasing of dayflies is confused with doing research.
A similar battle may be needed"
--E.W.
Dijkstra, My hopes of computing science, 1979
Wilshire:
Fabian, thank you.

©2003 Wilshire Conferences,
Inc. May be forwarded to colleagues and quoted with full attribution.