Object-Orientedness in Databases: Motivation and the State-of-the-Art
by Kazimierz Subieta
(June 2006)
The field of object-oriented databases (object bases) has emerged from the
convergence of several research and development threads, including new
tendencies in object-oriented programming languages, software engineering,
multimedia, distributed systems and the Web, as well as traditional relational
database technologies. Currently, however, there is a high degree of confusion
concerning what “object-oriented” means in general and what “object-oriented
database” means in particular. For many people the term “object-oriented” is a
commercial buzzword overloaded with many meanings. Many professionals are
trying to assign strong technical criteria to this buzzword in order to
distinguish “object-oriented” systems from others.
What makes up object-orientation in databases? The most popular view is that
such databases consist of objects rather than relations, tables or other data
structures. The concept of “object” is a kind of idiom or metaphor that
addresses human psychology and the way humans perceive the real world, think
and reason. James Martin wrote [Mart93]: “my cat is object-oriented”, having
observed that his cat distinguishes objects and predicts their behavior. This
is, of course, a nice metaphor for the fact that millions of years of evolution
have created in our minds mechanisms that enable us to isolate objects in our
environment, to name them, and to assign properties and behavior to them. From
the psychological point of view, object-orientedness in computer technologies
is founded on inborn mechanisms of our minds, just as the idea of a computer
keyboard is based on the anatomical fact that humans have hands and fingers.
Why is object-orientedness important for computer technologies? For many years
professionals have observed the growing complexity of software and of software
manufacturing methods. At the same time, software is becoming more and more
responsible for vital areas of social life. Fighting complexity has become the
major task of software engineering. The complexity of software has become the
main factor hindering the development of the increasingly sophisticated
features demanded by the progress of human civilization.
Coping with software complexity requires a systematic approach. It is commonly
believed that complexity can be reduced by the following basic software
engineering principles:
Object-oriented tools, languages, methodologies and systems contribute to the
above principles in many ways. Object and class encapsulation supports the
refinement of abstraction levels, thus allowing the designer and programmer to
reason at a high level of abstraction and then refine the details.
Object-orientedness supports decomposing a project and the corresponding
software into small conceptual units, in particular objects and classes.
Inheritance and polymorphism are the main factors enabling the reuse of
previously developed artifacts. The object-oriented paradigm follows natural
human psychology, and thus reduces the distance between the human perception of
the problem (business) domain, an abstract conceptual model of the problem
domain (expressed, e.g., as a class diagram), and the programmer’s view of data
structures and operations. Minimizing the distance between these three views
of designers’ and programmers’ thinking (referred to as “conceptual modeling”)
is considered the major factor in reducing the complexity of the analysis,
design, construction and maintenance of software. This is illustrated in
Fig.44.

Fig.44. Object-oriented databases minimize the distance between the
human perception of real objects and objects understood as data structures
Object-oriented models offer notions that enable the analyst and designer to
map the business problem onto the abstract conceptual schema more faithfully.
These notions include complex objects, classes, inheritance, typing, methods
associated with classes, encapsulation and polymorphism. There are several
semi-formal notations and methodologies (notably the Unified Modeling Language,
UML) that make it possible to map a business problem efficiently onto an
object-oriented conceptual model. On the other hand, object database systems
offer similar notions on the side of data structures, hence the mapping between
the conceptual model and data structures is much simpler than in traditional
relational systems.
Object database systems combine the classical capabilities of relational
database management systems (RDBMS) with new functionalities introduced by
object-orientedness. The traditional capabilities include:
New capabilities of object databases
include:
The wave of pure object-oriented DBMS, which abandon the assumptions of the
relational database model, started in 1983, when D.Maier and G.Copeland
presented a database management system with the data model of Smalltalk. Since
then many research prototypes and commercial systems have been developed, among
them GemStone, ObjectStore, Versant, O2, Objectivity/DB, Poet, Loqis, ODRA, and
others.
This shift of database paradigms has caused a hot debate between advocates of
relational systems, which already had a strong position on the market, and
proponents of pure object-oriented database management systems (OODBMS). To
some extent, this debate was a continuation of the old debate (early 1970s)
between the camp of DBTG CODASYL systems based on the network model and
proponents of the relational database model. At that time the relational model
eventually won, but some of its promises have never been fulfilled. Codd’s 12
rules of relational systems have notoriously been violated by vendors of
commercial systems, who have attached the buzzword “relational” to their
database products, sometimes with little technical justification. Nevertheless,
despite many trade-offs and much commercial confusion, the relational model has
been successful as the conceptual and technical basis of many commercial
relational systems. This especially concerns SQL-based systems.
The current paradigm of researchers and vendors from the relational world is
conservative as far as the root idea of relational systems is concerned, but
innovative concerning particular capabilities that have been built into new
versions of relational systems. These include support for multimedia, the Web,
temporal and spatial data, data warehouses and others. The extensions also
concern some features of object-orientedness, although in this respect the
development is modest.
Unfortunately, the relational model and the object model are fundamentally
different, and integrating the two is not straightforward. Current
object-relational databases are commonly perceived as eclectic and decadent,
with many ad-hoc, random solutions and no unified conceptual basis. As
suggested by David Maier in his famous interview, novel object-oriented
features of object-relational systems are rarely or never used. Standards for
object-relational databases (notably SQL-99), although rich in technical
details, are considered loose recommendations rather than strong technical
specifications. Relational databases can be considered a particular case of
object-oriented databases (with no classes and with objects reduced to tuples),
hence the term "object-relational database" remains poorly defined or denotes
almost everything that has happened in the domain. The adjective "relational"
within the term "object-relational database" loses its original meaning and has
nothing in common with its mathematical roots. Object-relational databases are
not supported by any reasonable mathematical theory, and there are no serious
attempts to develop one.
Currently it is difficult to say which idea will win, or whether there can ever
be a winner. Few people realize that relational databases still store only
about 12% of the total data stored in all databases. A lot of information is
stored in files or in proprietary databases that do not follow any theoretical
or ideological lines. Hence, there is enough room for the coexistence of many
ideas, including relational, object-relational, object-oriented and
XML-oriented models.
Object-oriented database models adopt the concepts of object-oriented
programming languages. However, there is no agreement concerning their precise
definition. The definitions presented below are typical.
Technical details assumed by the designers of particular models, languages and
products make concepts with the same name (class, type, ADT, etc.) technically
and practically very different. The lack of commonly accepted definitions for
the object model is considered a weakness of object-orientation in databases.
The standardization effort made by the Object Data Management Group (ODMG)
explained and unified many topics, but left many others unexplained or made
them even more obscure (e.g. the metamodel).
The history of database manifestos started in the mid-1980s, when E.F.Codd, the
father of the relational model, published 12 rules of a true relational system.
According to them, no commercial RDBMS has so far been "truly relational".
Current post-relational concepts move even farther away from the "ideals".
Fortunately for relational systems and newer concepts, the "ideals" are
idealistic, unrealistic and utopian.
The essential role in the development of object DBMS was played by "The
Object-Oriented Database System Manifesto" by Atkinson et al., 1989. One strong
argument used by the relational camp was that there was no reasonable
definition of the object-database concept ("you guys don’t even know what
you’re talking about"; object-orientation presents "silly exercises in surface
syntax"). The object database manifesto determined the basic rules of object
database systems, which abandon the relational model. The characteristics of an
object DBMS were separated into three groups:
The object database manifesto was unacceptable to the conservative wing of the
relational camp. The competing "Third-Generation Database System Manifesto" by
Stonebraker et al., 1990, postulates retaining all practically proven features
of relational database systems (notably SQL, as an “intergalactic dataspeak”)
and augmenting them modestly with new features, among them some object-oriented
concepts. The manifesto is a random extract of primary and quite secondary
database features, expressed in somewhat demagogic rhetoric. In particular, the
tenet that SQL is to be "intergalactic dataspeak" concerns a future SQL that
will address a much extended data model. Hence the manifesto postulates
retaining something that does not exist yet, which is simply nonsense.
"The Third Manifesto" by Darwen and Date, 1995, postulates rejecting both
object-orientedness and SQL (which, according to the authors, betrayed the
ideals of the relational model) and returning to the bosom of the pure
relational model and Codd's 12 rules. The document is very controversial, at
least from the perspective of current trends in software engineering and
query/programming languages. The presented arguments are more ideological ("we
know better, you should follow us") than technical and as such are very hard
for the wide community of database professionals to accept. The newest version
of the manifesto (2006) unfortunately still retains purely ideological
assertions.
The ODMG Standard
The Object Data Management Group (ODMG) was founded by a group of startup
companies who thought that the traditional standard-making processes of
official standardization bodies were too slow and cumbersome. They got their
first publication (ODMG-93) out very quickly, with the big message that
everything necessary for defining object-oriented database management systems
had already been done: stop thinking, start implementing. They also set
expectations far too high by announcing that all members were committed to
delivering conforming implementations by the end of 1994. Few professionals
believed them, but of course it was part of the game. To date, there is no
evidence that the standard has been fully implemented by the ODMG members or by
any other organization. The last version of the standard appeared in 2000; the
ODMG then announced the big success of the standard and disbanded.
Unfortunately, the standard has not been a success, and some professionals
consider it a dead end that has impeded progress in object-oriented databases
rather than supported it; cf. the Suad Alagic paper "The ODMG Object Model:
Does it Make Sense?", 1997. The standard is far from complete (especially
concerning the semantics and functionality of the defined languages) and
contains many bugs and inconsistencies, some of them fundamental, which
undermine consistent implementation. Hence the common opinion that the standard
is premature, immature, underspecified (especially concerning the semantics of
data model features and OQL) and incomplete. Some professionals also consider
it obsolete in view of new XML- and RDF-oriented technologies. The standard
also presents some inconsistencies or conceptual redundancies concerning the
use of OQL and the object querying and manipulation interfaces defined directly
as (rough) C++ specifications or Java interfaces (presented as syntactic
constructs only). The standard lacks essential features of current database
systems, such as (updatable) views, stored (server-side) procedures, functions
and classes, triggers, integrity constraints, recursive queries, distributed
databases, etc. From our point of view, the biggest disadvantages of the ODMG
standard are its underspecified and unnecessarily complex object model and the
absence of any idea of how to cope with the semantics of a query language. The
lack of precise formal semantics makes it impossible to develop strong type
checkers and query optimizers. These drawbacks were a major motivation for the
research into SBA and SBQL.
On the other hand, we must realize that the task undertaken by ODMG was
obviously difficult. Although the standard will probably not fulfill all
expectations, it already plays an important role in integrating research and
development efforts devoted to object bases. The standard has become a pivot
and a reference point for many discussions and comparisons concerning
object-oriented databases. Currently, many projects in both industry and
academia follow the lines determined by the standard. Even if these projects
take a critical position on the standard, it becomes a departure point for
various improvements and extensions. For this reason the standard should be
considered a very important milestone for the development of future object
bases, provided we are aware of its disadvantages.
The ODMG standard (version ODMG 3.0) consists
of the following parts:
The OMG has set up the new Object Database Technology Working Group (ODBT WG)
and acquired the rights to develop new OMG specifications for object-oriented
databases. Although it has been announced that the new standard proposal will
be based on the ODMG work, we hope that the ODBT WG will inherit all the
advantages of the ODMG standard and will abandon or improve all its bad,
controversial, redundant or non-implementable features. In our opinion, this
will be impossible without learning from our experience with the Stack-Based
Approach (SBA) and the Stack-Based Query Language (SBQL).
Pure Object-Oriented DBMS
Pure OODBMS products provide traditional database functionality (e.g.
persistence, distribution, integrity, concurrency and recovery), but are based
on the object model. They typically provide permanent, immutable object
identifiers to guarantee integrity. Pure OODBMS also generally provide
transparent distributed database capabilities (e.g. transparent object
migration, distributed transactions) and other advanced DBMS functionality
(e.g. support for the Web, support for workgroups, administrative tools). In
comparison to their relational peers, OODBMS are well suited for handling
complex, highly interrelated data, particularly in cross-platform and
distributed environments. There are also some benchmarks showing that the
performance of pure OODBMS is much better than that of RDBMS. This is due to
new techniques such as new caching methods, pointer swizzling, navigation via
pointer links instead of joins of relations, shifting processing to the client
side, and others.
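The difference between navigating pointer links and performing joins, together
with the idea of pointer swizzling, can be illustrated by a toy Python sketch.
It assumes a hypothetical store in which every object is keyed by an OID and
links between objects are stored as OIDs; the names stored, PObject and
load_and_swizzle are illustrative only and do not correspond to any particular
OODBMS. Real systems perform swizzling inside their storage and cache managers;
the sketch shows only the principle.

```python
from typing import Any, Dict, List, Tuple

# Relational style: links are foreign-key values, resolved by a join.
emp_rows: List[Tuple[str, int]] = [("Jones", 101), ("Brown", 101)]
dept_rows: List[Tuple[int, str]] = [(101, "R&D")]


def dept_name_by_join(emp_name: str) -> List[str]:
    """Value-based join: match the employee's foreign key against dept keys."""
    return [d_name
            for e_name, fk in emp_rows if e_name == emp_name
            for d_id, d_name in dept_rows if d_id == fk]


# Object style (hypothetical store): each object has an immutable OID, and
# links between objects are stored as OIDs.
stored: Dict[int, Dict[str, Any]] = {
    101: {"kind": "Dept", "name": "R&D"},
    201: {"kind": "Emp", "name": "Jones", "works_in": 101},
    202: {"kind": "Emp", "name": "Brown", "works_in": 101},
}


class PObject:
    """In-memory image of a stored object (illustrative only)."""

    def __init__(self, oid: int, attrs: Dict[str, Any]) -> None:
        self.oid = oid
        self.__dict__.update(attrs)


def load_and_swizzle(store: Dict[int, Dict[str, Any]]) -> Dict[int, PObject]:
    # Pass 1: materialize exactly one in-memory object per OID.
    cache = {oid: PObject(oid, dict(rec)) for oid, rec in store.items()}
    # Pass 2: swizzle, i.e. replace each OID-valued link by a direct reference.
    for obj in cache.values():
        if hasattr(obj, "works_in"):
            obj.works_in = cache[obj.works_in]
    return cache


objects = load_and_swizzle(stored)
jones = objects[201]
# Navigation: follow the reference; no join and no key lookup is needed.
print(jones.works_in.name)         # R&D
print(dept_name_by_join("Jones"))  # ['R&D']
```

After swizzling, both employees point at the same in-memory department object,
so following the link costs a single dereference, whereas the relational
version has to match key values every time the query runs.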
There are, however, several perceived doubts concerning OODBMS, listed below:
Despite these disadvantages, the market forecast for OODBMS (in total) is
optimistic. The great success of object-orientedness in programming languages
(C++, Java), analysis and design tools (UML) and interoperability middleware
(CORBA), together with the general tendency in software illustrated in Fig.44,
creates hope that sooner or later object-oriented databases will conquer the
market. New hopes are also raised by the OMG initiative to develop a new,
4th-generation standard for object-oriented databases.
Object-Relational DBMS
Object-relational DBMS (ORDBMS) are based on the idea that the relational model
and SQL-92, as implemented in the majority of RDBMS, should be extended. There
are many ways to extend the relational model; however, the one chosen by
object-relational database vendors is very relation-centric. A database still
consists of a set of tables. A number of object features are provided as
extensions to this core relational model (multi-row tables, references between
rows, inheritance between tables, etc.).
At this time, there is no de facto standard for the object-relational model,
and each vendor has its own object extensions. The differences between
Oracle-10, Informix Dynamic Server, UniSQL, IBM DB2 and other object-relational
products are significant. They preclude portability and even a common
conceptual and didactic basis. The various products offer a selection of quite
random, redundant, limited and sometimes inconsistent features. There is a new
ANSI/ISO standard, SQL-99, which is supposed to unify the systems. There are
doubts whether this standard will really play an essential part in the
development, due to its size and complexity, the many redundant features coming
from various schools of the IT business, and the many historical (obsolete)
stereotypes that influenced its construction. Essentially, the standard does
not specify semantics, or specifies it very imprecisely. Moreover, the object
features are just a minimal part of the standard.
Object-relational companies have demonstrated that this new technology can be
implemented and that they are ahead in terms of technology. The relational
vendors have financial power, ownership of an already existing software
development base, and a strong position and authority on the market. They rely
on the trust of their existing customers. Their motivation is commercial profit
rather than conceptual clarity or software engineering principles. The products
are criticized as huge, complex and eclectic, thus decadent. David Maier claims
that essentially nobody uses the new object-oriented extensions of relational
DBMS.
The key products of this technology include Informix Dynamic Server, Postgres,
Illustra, IBM’s DB2 Universal Database, Oracle-8/9/10, OSMOS, Ingres II, Sybase
Adaptive Server, and others. Almost every relational vendor has announced or
promised object-relational products. With the exception of some new systems
(Postgres, Montage, Informix DS), these systems are extensions of their
relational predecessors. They are equipped with facilities for efficient
application development, such as support for multimedia and the Web, spatial
data, ADTs, programmer-defined methods, collection-valued attributes, and
others.
Many professionals consider ORDBMS a temporary result of the evolution from
relational to pure object-oriented technology. The current market, however, is
reluctant to accept new database ideas and languages, so this evolution may
take decades. Currently, it is very hard to predict where this evolution will
lead.
XML- and RDF-Oriented DBMS
Formerly, XML was promoted as a "better HTML" that would be able to carry
structured semantic information about Web resources, so that this information
could be recognized and utilized by automatic software tools. Then XML files
were recognized as a way to exchange information among different software
agents. In particular, XML files can be components of data exchange protocols,
or units that parameterize and tune software systems according to the wishes of
programmers or users. Recently, however, XML has also been perceived as a
database meta-format that makes it possible to store huge amounts of
information and make it available.
There is some confusion between XML considered as textual files and XML stored
within specialized XML-oriented systems, such as Tamino or Berkeley DB XML. XML
files are of little use for massive data storage because of performance
problems and the lack of many features such as transactions and security
support. Parsed XML stored in specially prepared databases originates in the
source XML files, but essentially it is stored as hierarchical data structures
that are 1:1 isomorphic with the structures of the corresponding XML files.
Such data structures can be maintained and processed as objects in an
object-oriented database. Essentially, parsed XML data structures form an
object-oriented database allowing data structures more complex than relational
ones, but much less complex than those of fully-fledged object-oriented
databases. Such an XML-oriented database system does not support many features
of object-orientedness, in particular pointer links among objects, classes and
methods, inheritance, encapsulation, polymorphism, etc.
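To make the view of parsed XML as a hierarchy of objects concrete, the
following sketch uses Python's standard xml.etree.ElementTree module as a
stand-in for the stored, parsed form; a real XML-oriented DBMS uses its own
storage structures. The document content and element names are hypothetical.
In this sketch the employee-to-department link is an ordinary attribute value,
so following it means searching the tree rather than dereferencing a pointer.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document; the element and attribute names are illustrative.
DOC = """
<company>
  <dept id="d1"><name>R&amp;D</name></dept>
  <emp><name>Jones</name><worksIn ref="d1"/></emp>
  <emp><name>Brown</name><worksIn ref="d1"/></emp>
</company>
"""

# Parsing yields a tree that mirrors the XML text 1:1: every element becomes
# an object with a tag, attributes and an ordered list of child elements.
root = ET.fromstring(DOC.strip())

# Navigating the hierarchy downwards works like navigating nested objects.
emp_names = [emp.findtext("name") for emp in root.findall("emp")]


def dept_name_of(emp: ET.Element) -> str:
    """Resolve an employee's department.

    The link is just a string attribute, not a pointer, so it has to be
    resolved by searching the tree for an element with a matching id value.
    """
    ref = emp.find("worksIn").get("ref")
    dept = root.find(f"dept[@id='{ref}']")
    return dept.findtext("name")


print(emp_names)                             # ['Jones', 'Brown']
print(dept_name_of(root.findall("emp")[0]))  # R&D
```

Downward navigation along the hierarchy is direct, while any cross-link that
does not follow the tree has to be encoded as a value and looked up, which is
exactly where the hierarchical model falls short of a full object model.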
Despite the big commercial buzz around them, XML-oriented databases present
essentially no new quality. It is difficult to see why users should restrict
the data structures that they observe in their business domain to pure
hierarchies of objects, without the support of advanced object-oriented
database notions. In terms of Fig.44, for some database applications the
limitation of data structures implied by XML will not be justified from the
point of view of the conceptual model of the problem domain. Moreover,
XML-oriented query/programming interfaces, such as XQuery, are not yet
sufficiently mature and are even below the state-of-the-art in comparison to
SQL.
Similar remarks concern RDF (Resource Description Framework), which was
initially promoted as a metaformat for describing business ontologies.
Currently some professionals are trying to promote it as a universal database
format. The RDF model is fact-oriented rather than object-oriented ("facts" are
coded as triples <subject, feature, value>). Within this approach even the
boundaries of objects are unknown and must be deduced by programmers from some
informal description. Many object-oriented features, such as classes with
behavior, inheritance among classes, etc., are outside its scope. RDF Schema
facilities are conceptually very different from traditional database schemas.
Query languages for RDF, such as RDQL, are below the state-of-the-art.
Abstractions built into query languages, such as functions and views, are in
their infancy. We are also not optimistic concerning the support of RDF by
mathematical theories, in particular by formal logic. In our opinion, this will
not work in practice (as has already been shown by the entire Datalog school).
However, it is currently rather difficult to predict how this idea will evolve
and what gains it will bring.
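The point that object boundaries are not given by the triples themselves can
be shown with a short Python sketch. The triples and the prefixes emp: and
dept: are hypothetical; the _: prefix follows the usual notation for blank
nodes. The two grouping functions are merely two plausible heuristics a
programmer might choose, not part of any RDF specification: the same five facts
yield three "objects" under one heuristic and two under the other.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]
Objects = Dict[str, Dict[str, List[Any]]]

# Hypothetical facts as <subject, feature, value> triples
# (RDF itself calls the three parts subject, predicate and object).
TRIPLES: List[Triple] = [
    ("emp:Jones", "name", "Jones"),
    ("emp:Jones", "worksIn", "dept:RD"),
    ("emp:Jones", "address", "_:a1"),
    ("_:a1", "city", "Warsaw"),
    ("dept:RD", "name", "R&D"),
]


def group_by_subject(facts: List[Triple]) -> Objects:
    """Heuristic 1: every subject is an object; its triples are its attributes."""
    objects: Objects = defaultdict(dict)
    for subject, feature, value in facts:
        objects[subject].setdefault(feature, []).append(value)
    return dict(objects)


def inline_blank_nodes(objects: Objects) -> Objects:
    """Heuristic 2: treat blank nodes ('_:' prefix) as parts of the objects
    that refer to them (one level deep) instead of as separate objects."""
    result: Objects = {}
    for subject, attrs in objects.items():
        if subject.startswith("_:"):
            continue
        result[subject] = {
            feature: [objects.get(v, v) if v.startswith("_:") else v
                      for v in values]
            for feature, values in attrs.items()
        }
    return result


by_subject = group_by_subject(TRIPLES)
inlined = inline_blank_nodes(by_subject)
print(sorted(by_subject))               # ['_:a1', 'dept:RD', 'emp:Jones']
print(sorted(inlined))                  # ['dept:RD', 'emp:Jones']
print(inlined["emp:Jones"]["address"])  # [{'city': ['Warsaw']}]
```

Neither grouping is dictated by the data; the decision whether the address is a
separate object or a part of the employee rests entirely with the programmer.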
Java- and C#-Oriented DBMS
In comparison to XML and RDF, the Java and C# object models are much closer to
an ideal object-oriented database model, but they lack some of the necessary
features that we have mentioned before. This concerns:
· Collections, which in Java and C# are secondary rather than primary features. In particular, nested collections have no counterparts;
· Associations implemented as pointer links are unavailable;
· More advanced forms of inheritance (multiple inheritance, multiple-aspect inheritance, dynamic inheritance) are unavailable;
· Persistence is not an inborn feature of these languages. Essentially, Java and C# persistence is based on relational databases and SQL rather than on truly object-oriented databases;
· The database schema and metamodel are not treated in these languages as features independent of the particular applications acting on a database;
· Query languages that are embedded in or integrated with these languages (Hibernate, LINQ) are rather limited and refer (in their deep semantics) to relational rather than object-oriented databases. LINQ assumes a straightforward mapping between C# objects and relational tables, and thus SQL as a back end. This assumption entails many limitations concerning data structures and query language features. Fully-fledged object-oriented query languages such as SBQL are difficult to integrate with Java or C# programming environments, at least according to our implementation experience (ICONS, YAOD, ODRA and J-ODRA);
· Database abstractions such as views, triggers, etc. are unavailable.
These limitations can be considered secondary for small, non-critical
applications. For very large, mission-critical databases, however, they are
obviously fundamental, and thus disqualify the Java and C# persistence layers as
true OODBMS.
Last modified: June 29, 2006