
Apps are Temporary, but Data is Permanent
– An Interview with Craig Mullins
Craig
Mullins is Director, Technology Planning, for BMC Software.
And though he currently has ties to a specific software vendor, he’s
managed to maintain his status in the industry for authoritative and
unbiased advice on “Modern Database Administration.” He wrote
a book entitled Database
Administration: The Complete Guide to Practices and Procedures,
(2002, Addison Wesley — a 5 star Amazon rating by the way, congratulations!...Eds)
and is presenting a tutorial
at the DAMA
International Symposium and Wilshire Meta-Data Conference in Orlando
on April 28. He also writes prolifically in the field, and is
a contributor to various web publications (DBAzine.com, TDAN.com,
Database Trends and Applications, and others).
Craig and I had a
broad ranging conversation, throughout which he was highly opinionated
and plain-spoken. He offered a number of forthright views on
changing DBA roles, problems in database application development,
solutions for the data explosion and the future of DBMS technology.
Tony Shaw, Wilshire Conferences (Wilshire): Craig,
you’ve written that you see database administration is a misunderstood
function, and that it needs to be approached more rigorously as a
management discipline. Would you explain your perspective please?
Craig Mullins:
You’re right, database administration needs to be practiced in a more
rigorous manner. Too often the DBA is viewed as a fireman. This is
not meant to disparage firemen, of course, but the fireman is called
after the fire has started. Database administration practiced as a
management discipline would be focused on prevention first, cure second.
But I’m getting ahead of myself a bit here. Let me back up.
There are really several
layers of misunderstanding and poor DBA practices that need to be
addressed. The most pervasive these days, I think, is the situation
facing DBAs working for organizations where the Internet rules. As
companies move from traditional development to e-business development
there is the inevitable change in mindset that creates mad rushes.
This is the "get it done NOW!" mentality -- industry pundits
have even coined the phrase "Internet time" to describe
this phenomenon. Basically, when a business starts operating on "Internet
time" things move faster. One "Web month" is said to
be equivalent to about three standard months. There is a nugget of
truth in this load of crap, but it is mostly just a load of crap.
The nugget is this — Web projects move very fast because business
executives want to conduct more and more business over the Web to
save costs and to connect better with their clients. But just as frequently
projects are forced to move fast because someone read an article in
an airline magazine saying that Web projects should move fast and,
well, everyone else is moving fast so you'd better move fast, too,
or risk losing business.
This is bad for the DBA
because this approach is bad for database design. Why? Applications
are temporary, but the data is permanent. Organizations are forever
coding and re-coding their applications - sometimes the next incarnation
of an application is being developed before the last one even has
been moved to production. But when did you ever throw away data? Oh,
sure, you may redesign a database or move from one DBMS to another.
But what did you do? Chances are, you saved the data and migrated
it from the old database to the new one. Some changes had to be made,
maybe some external data was purchased to combine with the existing
data, and most likely some parts of the database were not completely
populated. But data lives forever.
So, the Internet causes
rapid delivery schedules which causes database administration to be
an afterthought. This means many organizations are trying to implement
database applications with no DBAs – instead, the application developer
acts as a pseudo-DBA and performs just the basics to get the application
delivered. Meaning, database design, performance, availability, and
maintenance will suffer.
Wilshire:
That all sounds bad, but you mentioned “several layers of misunderstanding”…
what else is missing?
Mullins:
So far I’ve just ranted on about those organizations doing database
administration on a “wing and a prayer,” but things are worse than
that. Many so-called mature organizations approach database administration
only as a “reactive” chore. This gets back to my fireman metaphor.
Oh, yes, everyone says they are “proactive” but that is usually a
great big lie. Many DBAs are up to their ears in paperwork, design
tasks, and performance tuning – with a line of folks out the door
of their cubicle looking for help. Now, how many of these DBAs do
you think are being proactive and looking for more “potential” problems
so they can fix them before they occur? None! They are all trying
to put out the fires that are on their desk.
Wilshire:
So what can be done?
Mullins:
A big step in the right direction would be to implement service level
management (SLM) and service level agreements (SLAs). Service level
management is the disciplined, proactive methodology and procedures
used to ensure that adequate levels of service are delivered to all
IT users in accordance with business priorities and at acceptable
cost. So, in order to effectively manage service levels, the business
needs to prioritize applications and identify the amount of time,
effort, and capital that can be expended delivering service for those
applications. When DBA is practiced as a management discipline every
database application will have an SLA attached to it.
A service level is a measure
of operational behavior. SLM ensures applications behave accordingly
by applying resources to those applications based on their importance
to the organization. Depending on the needs of the organization SLM
can focus on availability, performance, or both. In terms of availability,
the service level may be defined as "99.95% up time, during the
hours of 9:00 AM to 10:00 PM on weekdays." Of course, a service
level can be more specific stating "average response time for
transactions will be two seconds or less for workloads of 500 or fewer
users."
For a service level agreement
(SLA) to be successful all of the parties involved must agree upon
stated objectives for availability and performance. The end users
must be satisfied with the performance of their applications and the
DBAs and technicians must be content with their ability to manage
the system to the objectives. Compromise is essential to reach a useful
SLA.
A proper SLM discipline
makes performance management predictable. SLM manages the expectations
of all involved. Without a pre-defined and agreed upon SLA how will
the DBA and the end users know whether or not an application is performing
adequately. Not every application can, or needs to, deliver sub-second
response time. Without SLAs, business users and DBAs may have different
expectations, resulting in unsatisfied business executives and frustrated
DBAs. Not a good situation.
With SLM in place, DBAs
can adjust resources by applying them to the most mission critical
applications as defined in the SLA. Costs will be controlled and capital
will be expended on the portions of the business that are most important
to the business.
Wilshire: What do you see as the role of the
DBA in the application development process?
Mullins: That’s
a loaded question. Application folks generally don’t want to turn
over too much control of their applications to the DBA. And many DBAs
don’t want to get too ingrained in writing code. But the two functions
need to be joined at the hip in order to produce efficient database
applications. And this can be done without stepping on the toes of
either the DBA or the application developer.
The long and the short
of it is that DBAs must understand and participate in application
design, not necessarily the actual coding though. Application design
includes database concerns such as interfacing SQL with traditional
programming languages and the type of SQL interface to use. But every
aspect of program coding will affect the usability and effectiveness
of the application. Furthermore, each application program must be
designed to ensure the integrity of the data it modifies.
Designing a proper database
application system is a complex and time-consuming task and the choices
made during that process will impact the usefulness of the final delivered
application. An improperly designed and coded application may need
to be redesigned and re-coded from scratch if it is inefficient, ineffective,
or not easy to use. To properly design an application that relies
on databases for persistent data storage, the system designer must
match the application development languages and tools to the physical
database design and the functionality of the DBMS being used. The
first thing to be mastered, though, must be a sound understanding
of SQL. And, unfortunately, many application folks just do not have
enough SQL knowledge to formulate the most efficient query in every
situation. SQL is changing and growing, and the DBA will usually know
what is possible and what SQL formulations perform best when. The
application developer and the DBA need to work as a team to make sure
that this knowledge gets built into each and every database application.
Of course, there are other
issues, such as the interface to the programming language, drivers,
procedures for compilation, and so on. But let’s just summarize by
saying the DBA needs to be involved in application design to ensure
efficient database usage – and therefore efficient applications.
One final note here, the
DBAs need to participate in the design reviews for the code, too,
at least when the SQL is being reviewed.
Wilshire: It seems to me that data administration
and database administration are gradually coming closer together.
This is partly the motivation for us launching a DBA track at the
upcoming DAMA Symposium and Wilshire Meta-Data Conference in Orlando.
Would you agree with this observation about job function convergence,
and if yes, do you have any thoughts on why this is happening more?
And is it a desirable trend?
Mullins:
I agree that it does seem to be the trend, but I think this is an
unfortunate trend because it is happening for all the wrong reasons.
First let me state that I think the DA and DBA functions need to cross-train
so that the DBA understands more about the business and the DA understands
more about the technology. This enables both to do a better job because
the DBA will know why he is creating and maintaining the database
– and the DA will know what is possible given current technology and
why some things may need to change in order to be physically implemented.
So why is this a bad trend?
It seems to me that this trend is happening because when companies
cut back due to the economy (or for whatever reason) one place that
seems to get cut more often than not is the DA. Therefore, the DBA
is tasked with picking up the DA’s role now that the DA has been let
go. So the DBA does the bare minimum. Heck, he already has a more
than full-time job and can’t possibly pick up another full-time job.
So the DBA models as best he can, but other more important DA functions
drop by the wayside - repository maintenance, proper data usage, metadata
definition, stewardship programs, etc. all just kind of grind to a
halt – and over time, the practices die. Sure, the CEO or CIO still
says “We treat data as a corporate asset” but they don’t really.
Wilshire:
So in a large IT shop, what do you see as the ideal demarcation
between the roles of the DBA, DA and other data-related job functions?
Mullins: Data
administration (DA) separates the business aspects of data resource
management from the technology used to manage data. When the DA function
exists in an organization it is more closely aligned with the actual
business users of data. The DA group is responsible for understanding
the business lexicon and translating it into a logical data model.
The DA is involved more in the requirements gathering, analysis, and
design phases; DBA in the design, development, testing, and operational
phases.
Another differentiating
factor between DA and DBA is the focus of their effort. The DA is
responsible for issues such as identifying and cataloging the data
required by business users, production of conceptual and logical data
models to accurately depict the relationship among data elements for
business processes, production of an enterprise data model that incorporates
all of the data used by all of the organization’s business processes,
setting data policies for the organization, identifying data owners
and stewards, and setting standards for control and usage of data.
In short, the DA can be thought of as the Chief Data Officer of the
corporation. The DA position has nothing specifically to do with technology,
though.
The larger the organization,
the more likely that a DA function exists. However, when the DA role
is undefined in the organization, the DBA must assumes the mantle
of data planner and modeler for the organization. The DBA usually
will not be able to assume all of the functions and responsibility
of DA as summarized in this section because the DBA has many other
technical duties to perform that will consume most of his time, the
manager of the DBA group typically does not have an executive position
enabling him to dictate policy, the DBA generally does not have the
skills to communicate effectively with business users and build consensus,
and frankly, most DBAs are happier dealing with technical issues and
technicians, than with business issues and non-technicians
When DA and DBA functions
exist within the organization, the two groups must work very closely
with one another. It is not necessary that both have the same manager,
though it could make it easier to facilitate cooperation between DA
and DBA. At any rate, it is imperative that there be some degree of
cross-pollination of skills between the two groups. The DA will never
understand the physical database like a DBA, and the DBA will never
understand the business issues of data like a DA, but each job function
would be more effective with some knowledge about the other.
For anyone interested in more details, I tackle this question in much
more depth in my book, “Database Administration: The Complete Guide
to Practices and Procedures”
Wilshire:
To change tack a bit, let me ask you about increasing data stores.
Organizations are accumulating so much data nowadays, from so many
different sources. And I don’t just mean from a knowledge management
(“we don’t know what we know”) standpoint, but they really have little
idea what data they have or where it it’s stored. I talk to CTOs who
say this is their number one priority in the next two years –coming
up with a strategy to cope with their rapidly expanding data and figuring
out how to make use of it. Do you have any advice for them?
Mullins:
Interestingly enough, this problem could have been avoided too a great
deal by implementing and properly staffing a data administration function.
Oh, they might still have a lot of data, but at least they’d know
what it was, where it came from, and why it is there. Part of the
charter of a DA group is metadata management. If no data came into
the organization without first being categorized, classified, and
having metadata assigned to it, then at least we’d know what the data
was. Let’s face it, you can’t have it both ways. You can’t say out
of one corner of your mouth “We don’t have time or budget to define
our data” and then out of the other corner say, “But we want to know
what data we have, where it resides, why we have it, and when we can
get rid of it.”
Wait a minute, I take that
back. You can say it, but that’d make you a horse’s ass. (Can I say
that?)
Given that we are where
we are, here is some advice: implement tools that track data usage
and keep historical statistics. You might be able to convert application
performance monitors that track SQL to help out. For example, if the
monitor records the SQL statements then you’d be able to track what
columns in which tables were accessed when. With some additional work
you might even be able to determine which rows.
Of course, this works for
relational data only. What about all that non-relational data out
there in spreadsheets, word processing documents, data files, and
so on? One radical approach would be to change the security on those
files and then see who complains. As the complaints roll in you know
who is using the data. Of course, this can have a severe impact on
availability and I’m not recommending it as a general approach.
Another helpful tactic
would be for organizations to stop copying data willy-nilly and do
the hard work that is necessary for their applications and users to
utilize shared databases. It is a sad fact of life today that most
data is copied three, four, five, or more times – and there it sits.
The same piece of data used in the online systems, the batch systems,
for the data warehouse, on PCs in Excel, and, oh yes, it is in the
legacy IMS database, then we moved it to DB2, and then sent it off
to the Oracle database on Unix, too. Then we all sit back and complain
about how much data storage we need.
One final thought on this
topic: if your organization is serious about tackling the large amount
of data being stored, then you’ll need to get serious about data archiving.
For each database (or file) you need to assign a rule for when the
data can be archived to a cheaper storage type, or when the data can
be purged.
Wilshire:
Let me ask you a general open-ended question – what do you see
as the future of database and data management technology? What are
the major directions for the next 5-10 years?
Mullins:
At a high level, things will be getting more automated, more connected,
more wireless, and more complex.
The next wave of database
administration is intelligent automation. I refer to this as implementing
software scrubbing bubbles that “work hard, so you don’t have to.”
(Remember that commercial!) DBMS software and third party database
solutions are providing built-in intelligence to begin performing
tasks that, in the past, required human intervention; that is, required
a DBA. As these features become more robust and functional, the DBA
will be able to turn over proactive management of certain maintenance
and administrative tasks to the tools – leaving the DBA more time
to be proactive himself.
Regarding the wireless
aspect of things, pervasive devices (PDA, handhelds, cell phones,
etc.) will increasingly have a database system installed. The management
of that system is not so much an issue as will be the process of managing
the synchronization of the remote device with a centralized database
server. DBAs will need to get involved there to ensure successful
data synchronization. Another aspect of the wireless trend will be
wireless DBA – where the DBA uses a PDA to check on the health of
the databases. This will be another tool in the DBA’s bag of management
tricks – but the DBA will not be able to perform all of his tasks
using the PDA, just those that require round-the-clock vigilance such
as checking to see if the database instance is available or checking
the status of system parameters.
Let me borrow a phrase
from Ross Perot to describe another big database trend – “the giant
sucking sound.” This is the sound of the DBMS as it sucks up technologies
and functions that previously required you to purchase separate software.
Remember when the DBMS had no ETL or OLAP functionality? Those days
are gone. This will continue as the DBMS adds capabilities to tackle
more and more IT tasks.
Another trend impacting
DBAs will be a change in some of their roles as some of the recent
DBMS features actually start being used in more production systems.
One such trend I see is an off-shoot of the DBA role I call the Procedural
DBA.
Wilshire:
What exactly is a Procedural DBA? Can you elaborate on that a
bit please?
Mullins:
I coined the term Procedural DBA back in 1995 to describe a new type
of database administration required by modern database management.
The concept is a simple one: a new type of DBA is required to manage
the procedural logic that is increasingly being stored in relational
database management systems.
Initially, the sole purpose
of a DBMS was to store, manage, and access data. Over time, database
management systems have evolved by integrating procedural logic in
the form of complex triggers, stored procedures, and user-defined
functions. This code is tightly coupled to the DBMS. As these features
are exploited, management tasks such as administration, design, and
tuning typically are assigned to the current DBA staff by default.
But this is not always the best approach. What is required is an expansion
of the role of database administration.
The Procedural DBA takes
on the responsibility of ensuring that triggers, user-defined functions,
and stored procedures are developed appropriately and will run efficiently
and effectively once implemented. This requires both application programming
and database administration skills. You see, these procedural objects
have properties of both a program and a database object. So the Procedural
DBA must be able to span both disciplines to function. Oftentimes,
DBAs are not adequately trained in both areas. It is a distinctly
different skill to program than it is to create well-designed relational
databases.
Whether the procedural
objects are actually programmed by the Procedural DBA will differ
from shop-to-shop depending on the size of the shop, the number of
DBAs available, and the scope of implementation. Minimally, the Procedural
DBA should lead procedural object code reviews, assure reusability,
and manage the database environment to assure successful procedural
object implementation and use. Additionally, the Procedural DBA must
be on-call to help out when a stored procedure, trigger, or user-defined
function fails.
Wilshire:
And the notion of a DBA scrubbing bubble is interesting, too.
Can you shed some more light on that topic?
Mullins:
Sure. The trend I see is a continuing build-up of intelligent automation.
This differs from traditional automation. With traditional automation,
human interaction is necessary – an individual uses a tool to create
a script or set up an alert so that when X happens, the tool automatically
does Y.
With intelligent automation
the DBMS or tool is smart enough to figure it out without human intervention.
So, instead of the DBA needing to constantly monitor database statistics
to determine when to reorganize a table space, the tool will keep
track of the statistics and have built-in knowledge of what those
statistics mean. And we will see more knowledge being applied than
is currently happening. For example, in DB2 one of the indicators
for reorganization is cluster ratio. If the ratio falls below say
90%, then think about reorganizing. But software with intelligent
automation can take advantage of performance management software that
will examine access patterns. Say for that same table space 98% of
the access is random (as opposed to sequential). This means that cluster
ratio is not as important, so the “scrubbing bubble” can factor that
into the mix and schedule the reorganization based not just on statistics,
but actual usage patterns. This will be revolutionary.
Furthermore, the software
will understand the mitigating circumstances, such as “Can’t schedule
a REORG in the middle of the day when database activity is high”,
“Have to work the REORG into the batch window,” and “These particular
jobs will highly benefit from a REORG so is there room to squeeze
the REORG in before that.” This will be done by tighter integration
between database tools and systems management tools and by intelligent
analysis of job execution trends.
Wilshire:
Going back to the issue of growing data again for a moment, I’m
particularly curious to ask you about unstructured data. Not just
documents, but the whole range of unstructured “stuff”, like web pages,
maps, pictures, audio files, video images, etc. There are all sorts
of great reasons to try and bring more structure to this data in order
to make it more manageable, but are we raising our expectations too
high? I mean, making this variety of file types and sources available
and searchable is really a huge challenge. Can we meet user expectations
and integrate all this data effectively?
Mullins:
Not today we can’t. Seriously Tony, most organizations are barely
managing to handle their traditional data properly, so how can we
expect them to take on this huge task? And the DBMS products are not
up to the task yet either. Oh, they’ll allow you to store the unstructured
data as some type of LOB (large object), and even to access it if
you learn some different techniques. But management is another topic
altogether. Usually traditional database utilities (LOAD, UNLOAD,
REORG, etc.) will not operate on the LOBs. So once they are there,
you are sorta stuck with them.
Federation is an alternate
approach. One method is to just allow the DBMS to access the unstructured
data where it lies. You don’t pull it into the DBMS. This has certain
advantages because it extends the usefulness of the unstructured data
without putting additional management burden on the databases.
Wilshire:
I have to ask, since it’s the topic de jour (and seems to have been
for the past couple of years). XML – where are you seeing it used
to manage data? Is it working as expected? Are the available tools
satisfying the customer needs? What are the high-payoff data management
implementations? Any general observations on its efficacy for mainstream
corporate data management?
Mullins:
The hype around XML is just astounding. It started off as a tag-based
mechanism to make data transfer more understandable. Now, if you believe
everything you read, XML is going to slay dragons, leap from tall
buildings, and enable your Grandma to write an order entry application.
What nonsense! Sure, XML can be used as a format for exchanging data.
The tags help to make the data somewhat self-documenting. But I don’t
really see XML as anything all that groundbreaking.
And XML databases? Somebody,
please, stop it before it happens! XML is hierarchical – do you really
want to go back to the days of IMS? I know, I know, some of us never
left those days – and that is fine. But we don’t need another hierarchical
monster on our hands. Sure, we’ll probably be storing some XML data
in our relational databases – either as CLOBs or by breaking the document
apart and storing its elements as columns of tables. But let’s not
go backward and champion yet another type of DBMS. I wish the XML
DBMS movement all the success that Object DBMS movement has achieved
(sarcasm noted…ed).
For those who are not in
one camp or another yet, just take a look at XQuery – compare that
sucker to SQL for ease of use. SQL isn’t perfect, but after trying
to use XQuery you’ll likely run screaming back to SQL.
Wilshire: Craig, I’m envious that you manage
to write so prodigiously. When do you find the time to do so much
writing? What’s your secret? Perhaps I should ask the question the
other way around -- when do you do your “real” job?
Mullins:
I’ve always loved to write and I make the time to write a little bit
every day – no matter what. I suppose I am blessed because I don’t
find it to be a job, it is just something I enjoy doing. The only
time it can become burdensome is when I am working on a book and a
deadline is looming. Books are time-consuming; it is kind of like
working a second job. But it is so rewarding to finally hold the finished
product in your hand – and even more rewarding to hear from people
who read the book and find it useful!
As for the second question,
writing is part of my “real” job, so I am lucky there, too. But my
“real” job also feeds my writing. I present at user groups and conferences
and many times the attendees bring up great questions giving me ideas
to write about. And our products at BMC address all of the most troubling
DBA problems, so my “real” job helps to seed my writing with ideas.
Wilshire:
Thanks for taking the time to talk Craig.
Join
us for the
Wilshire Meta-Data Conference
and DAMA International Symposium
May 2-6, 2004 Century Plaza
Hotel Los Angeles, California USA
The
World's Largest Vendor-Neutral Data Management Conference
The
16th annual DAMA International Symposium and 8th annual Wilshire Meta-Data
Conference will be held May 2-6, 2004 at the Century Plaza Hotel
in Los Angeles, a beautiful venue adjacent to Beverly Hills. Hear
40 case studies outlining strategies of companies that have implemented
successful data management projects. There will be more than 120 speakers
in all, covering meta data, enterprise architecture, data and process
modeling, unstructured data, business rules, data integration, XML,
business intelligence, data warehousing, information stewardship,
and more. Keynote Speaker Chris Date. Click
here for details.
©2003 Wilshire Conferences,
Inc.
Data Discussions, the Wilshire Meta-Data Conference and the respective
logos
are service marks of Wilshire Conferences, Inc.
DAMA
is vendor-neutral and does not endorse any particular vendor. |