Object-Orientedness in Databases - Motivation and the State-of-the-Art
by Kazimierz Subieta (June 2006)
The field of object-oriented databases (object bases) has emerged as the convergence of several research and development threads, including new tendencies in object-oriented programming languages, software engineering, multimedia, distributed systems, Web, as well as the traditional relational database technologies. Actually, however, there is a high degree of confusion concerning what “object-oriented” means in general and “object-oriented database” in particular. For many people the term “object-oriented” is a commercial buzzword overloaded by a lot of meanings. Many professionals are trying to assign strong, technical criteria to this buzzword, trying to distinguish “object-oriented” systems from others.
What makes up object-orientation in databases? The most popular view is that databases consist of objects rather than relations, tables or other data structures. The concept of “object” is a kind of idiom or metaphor addressing human psychology and the way that humans perceive the real world, think and reason. James Martin wrote [Mart93]: “my cat is object-oriented”, as he observed that his cat distinguishes objects and predicts their behavior. This is of course a nice metaphor illustrating that millions of years of evolution have created in our minds mechanisms enabling us to isolate objects in our environment, to name them, and to assign to them some properties and behavior. Object-orientedness in computer technologies, from the psychological point of view, is founded on inborn mechanisms of our minds, just as the idea of a computer keyboard is based on the anatomic fact that humans have hands and fingers.
Why is object-orientedness important for computer technologies? For many years professionals have observed the growing complexity of software and software manufacturing methods. At the same time software is becoming more and more responsible for vital branches of the life of society. Fighting this complexity has become the major task of software engineering. The complexity of software has become the main factor hindering the development of the more and more sophisticated features that human civilization demands.
Coping with the software complexity requires a systematic approach. It is commonly believed that complexity can be reduced by the following basic software engineering principles:
Object-oriented tools, languages, methodologies and systems contribute in many ways to the above principles. Object and class encapsulation supports the refinement of abstraction levels, thus allowing the designer and programmer to reason at a high abstraction level and then subsequently refine details. Object-orientedness supports decomposing a project and the corresponding software into small conceptual units, in particular, objects and classes. Inheritance and polymorphism are the main factors in the reuse of previously developed artifacts. The object-oriented paradigm follows natural human psychology and thus reduces the distance between the human perception of the problem (business) domain, an abstract conceptual model of the problem domain (expressed, e.g., as a class diagram), and the programmer’s view of data structures and operations. Minimizing the distance between these three views in designers’ and programmers’ thinking (referred to as “conceptual modeling”) is considered the major factor reducing the complexity of the analysis, design, construction and maintenance of software. This is illustrated in Fig.44.
Fig.44. Object-oriented databases minimize the distance between the human perception of real objects and objects understood as data structures
Object-oriented models offer notions that enable the analyst and designer to map the business problem to the abstract conceptual schema better. These notions include: complex objects, classes, inheritance, typing, methods associated with classes, encapsulation and polymorphism. There are several semi-formal notations and methodologies (notably the Unified Modeling Language, UML) that make it possible to map a business problem efficiently onto an object-oriented conceptual model. On the other hand, object database systems offer similar notions on the side of data structures, hence the mapping between the conceptual model and data structures is much simpler than in the case of traditional relational systems.
Object database systems combine classical capabilities of relational database management systems (RDBMS) with new functionalities assumed by the object-orientedness. The traditional capabilities include:
New capabilities of object databases include:
The wave of pure object-oriented DBMS, which abandons assumptions of the relational database model, started in 1983, when D.Maier and G.Copeland presented a database management system with the data model of Smalltalk. From that time many research prototypes and commercial systems have been developed, among them Gemstone, ObjectStore, Versant, O2, Objectivity/DB, Poet, Loqis, ODRA, and others.
This shift of database paradigms has caused a hot debate between advocates of relational systems, which already have a strong position on the market, and proponents of pure object-oriented database management systems (OODBMS). To some extent, this debate was a continuation of the old debate (early 1970s) between the camp of DBTG CODASYL systems based on the network model and proponents of the relational database model. At that time the relational model eventually won, but some of its promises have never been fulfilled. Codd’s 12 rules of relational systems have notoriously been violated by vendors of commercial systems, who have attached the buzzword “relational” to the database products they offer, sometimes with little technical justification. Nevertheless, despite a lot of trade-offs and commercial confusion, the relational model has been successful as the conceptual and technical basis of many commercial relational systems. This especially concerns SQL-based systems.
The current paradigm of researchers and vendors from the relational world is conservative as far as the root idea of relational systems is concerned, but innovative concerning particular capabilities that have been built into new versions of relational systems. These include support for multimedia, Web, temporal and spatial data, data warehouses and others. The extensions also concern some features of object-orientedness, although in this respect the development is modest.
Unfortunately, the relational model and the object model are fundamentally different, and integrating the two is not straightforward. Current object-relational databases are commonly perceived as eclectic and decadent, with a lot of ad-hoc, random solutions and no unified conceptual basis. As suggested by David Maier in his famous interview, the novel object-oriented features of object-relational systems are rarely or never used. Standards within object-relational databases (notably SQL-99), although rich in technical details, are considered loose recommendations rather than strong technical specifications. Relational databases can be considered a particular case of object-oriented databases (with no classes and with objects reduced to tuples), hence the term "object-relational database" remains poorly defined or denotes almost everything that has happened in the domain. The adjective "relational" within the term "object-relational database" loses its original meaning and has nothing in common with its mathematical roots. Object-relational databases are not supported by any reasonable mathematical theory and there are no serious attempts to develop one.
Actually it is difficult to say which idea will be the winner and whether there can ever be a winner. Few people realize that relational databases still store only ca. 12% of the total data stored in all databases. There is a lot of information stored in files or in proprietary databases that do not follow any theoretical or ideological lines. Hence, there is enough room for the coexistence of many ideas, including relational, object-relational, object-oriented and XML-oriented models.
Object-oriented database models adopt the concepts of object-oriented programming languages. Actually, there is no agreement concerning their precise definition. The definitions presented below are typical.
Technical details assumed by designers in particular models, languages and products make concepts with the same name (class, type, ADT, etc.) technically and practically very different. The lack of commonly accepted definitions concerning the object model is considered a weakness of object-orientation in databases. The standardization effort made by the Object Data Management Group (ODMG) explained and unified many topics, while leaving a lot of them unexplained or making them even more obscure (e.g. the metamodel).
The history of database manifestos started in the mid-1980s, when E.F.Codd, the father of the relational model, published 12 rules of a true relational system. According to them, up to now, no commercial RDBMS has been "truly relational". Current post-relational concepts are moving farther and farther from the "ideals". Fortunately for relational systems and newer concepts, the "ideals" are idealistic, unrealistic, Utopian.
The essential role in the development of object DBMS was fulfilled by "The Object-Oriented Database System Manifesto" by Atkinson et al, 1989. One strong argument used by the relational camp was that there was no reasonable definition of the object-database concept ("you guys don’t even know what you’re talking about", object-orientation presents "silly exercises in surface syntax"). The object database manifesto has determined basic rules of object database systems, which abandon the relational model. The characteristics of an object DBMS were separated into three groups:
The object database manifesto was unacceptable to the conservative wing of the relational camp. The competing "The Third Generation Database Systems Manifesto" by Stonebraker et al, 1990, postulates retaining all practically proven features of relational database systems (notably SQL, as an “intergalactic dataspeak”) and augmenting them modestly with new features, among them some object-oriented concepts. The manifesto is a random extract of primary and quite secondary database features, expressed in somewhat demagogic rhetoric. In particular, the tenet that SQL is to be "intergalactic dataspeak" concerns a future SQL that will address a much extended data model. Hence the manifesto postulates retaining something that does not exist yet, which is simply nonsense.
"The Third Manifesto" by Darwen and Date, 1995, postulates rejecting both object-orientedness and SQL (which - according to the authors - wasted the ideals of the relational model), and returning to the bosom of the pure relational model and Codd's 12 rules. The document is very controversial, at least from the position of current trends in software engineering and query/programming languages. The presented arguments are more ideological ("we know better, you should follow us") than technical and as such are very hard to accept for the wide community of database professionals. There is a newer version of the manifesto, from 2006, which unfortunately still retains purely ideological assertions.
The ODMG Standard
The Object Data Management Group (ODMG) was founded by a group of startup companies who thought that the traditional standard-making processes of official standardization bodies were too slow and cumbersome. They got their first publication (ODMG-93) out very quickly, with the big message that everything necessary for defining object-oriented database management systems was already done: stop thinking, start implementing. They also set expectations far too high by announcing that all the members were committed to delivering conforming implementations by the end of 1994. Few professionals believed them, but of course, it was part of the game. So far, there is no evidence that the standard has been fully implemented by the ODMG members or by any other organization. The last version of the standard appeared in 2000; the ODMG then announced the big success of the standard and disbanded.
Unfortunately, the standard is no success and some professionals consider it a dead end that has impeded progress in object-oriented databases rather than supported it; cf. the Suad Alagic paper "The ODMG Object Model: Does it Make Sense?", 1997. The standard is far from complete (especially concerning the semantics and functionality of the defined languages) and contains a lot of bugs and inconsistencies, sometimes fundamental ones, which undermine consistent implementation. Hence the common opinion that the standard came too early and is immature, underspecified (especially concerning the semantics of data model features and OQL) and incomplete. Some professionals also consider it obsolete, in view of new XML and RDF-oriented technologies. The standard also presents some inconsistencies or conceptual redundancies concerning the use of OQL and the object querying and manipulation interfaces defined directly as (rough) C++ specifications or Java interfaces (presented as syntactic constructs only). The standard misses essential features of current database systems, such as (updateable) views, stored (server side) procedures, functions and classes, triggers, integrity constraints, recursive queries, distributed databases, etc. From our point of view, the biggest disadvantages of the ODMG standard are the underspecified and unnecessarily complex object model and the absence of essentially any idea of how to cope with the semantics of a query language. The lack of precise formal semantics makes it impossible to develop strong type checkers and query optimizers. These drawbacks were the main motivation for the research into SBA and SBQL.
On the other hand, we must realize that the task undertaken by ODMG was obviously difficult. Although the standard will probably not fulfill all expectations, it already plays an important role in integrating research and development efforts devoted to object bases. The standard has become a pivot and a reference point for many discussions and comparisons concerning object-oriented databases. Currently, many projects both in industry and academia follow the lines that were determined by the standard. Even if these projects take a critical position on the standard, it becomes a departure point for various improvements and extensions. For this reason the standard should be considered a very important milestone in the development of future object bases, provided we realize what its disadvantages are.
The ODMG standard (version ODMG 3.0) consists of the following parts:
The OMG has set up a new Object Database Technology Working Group (ODBT WG) and acquired the rights to develop new OMG specifications for object-oriented databases. Although it has been announced that the new standard proposal will be based on the ODMG works, it is our hope that the ODBT WG will inherit all the advantages of the ODMG standard and will abandon or improve all its bad, controversial, redundant or non-implementable features. In our opinion, this will be impossible without learning from our experience with the Stack-Based Approach (SBA) and the Stack-Based Query Language (SBQL).
Pure Object-Oriented DBMS
Pure OODBMS products provide traditional database functionality (e.g. persistence, distribution, integrity, concurrency and recovery), but are based on the object model. They typically provide permanent, immutable object identifiers to guarantee integrity. Pure OODBMS also generally provide transparent distributed database capabilities (e.g. transparent object migration, distributed transactions), and other advanced DBMS functionality (e.g. support for Web, support for workgroups, administrative tools). In comparison to their relational peers, OODBMS are well suited for handling complex, highly interrelated data, particularly in cross-platform and distributed environments. There are also some benchmarks showing that the performance of pure OODBMS is much better than that of RDBMS. This is due to new techniques, such as new caching methods, pointer swizzling, navigation via pointer links instead of performing joins of relations, shifting processing to the client side, and others.
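The difference between navigation via pointer links and joins of relations can be sketched in a few lines of Python. This is only an in-memory illustration under our own assumptions, not the mechanism of any particular OODBMS: the classes `Department` and `Employee`, the sample rows and both lookup functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Department:
    name: str
    employees: list = field(default_factory=list)


@dataclass(eq=False)
class Employee:
    name: str
    dept: Department  # direct reference, playing the role of a pointer link


# Relational style: flat rows connected only by key values (hypothetical data).
dept_rows = [(1, "Sales"), (2, "Research")]
emp_rows = [("Ann", 1), ("Bob", 2), ("Eve", 2)]


def dept_of_by_join(emp_name):
    """Find a department name by matching a foreign key against primary keys."""
    for name, dept_id in emp_rows:
        if name == emp_name:
            for d_id, d_name in dept_rows:
                if d_id == dept_id:
                    return d_name
    return None


# Object style: the same data, with references stored inside the objects.
sales = Department("Sales")
research = Department("Research")
ann = Employee("Ann", sales)
bob = Employee("Bob", research)
eve = Employee("Eve", research)
for e in (ann, bob, eve):
    e.dept.employees.append(e)


def dept_of_by_navigation(emp):
    """Follow the stored reference; no second collection is searched."""
    return emp.dept.name


print(dept_of_by_join("Bob"))
print(dept_of_by_navigation(bob))
```

In the first variant the connection between an employee and a department has to be recomputed from key values on every access; in the second it is followed directly. A real system additionally has to translate persistent identifiers into such in-memory references, which is what pointer swizzling refers to.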
There are however several perceived doubts concerning OODBMS that are listed below:
Despite these disadvantages, the market forecast for OODBMS (in total) is optimistic. The great success of object-orientedness in programming languages (C++, Java), analysis and design tools (UML) and interoperability middleware (CORBA), together with the general tendency in software illustrated in Fig.44, creates a hope that sooner or later object-oriented databases will conquer the market. New hopes are also raised by the OMG initiative to develop a new 4th-generation standard for object-oriented databases.
Object-relational DBMS (ORDBMS) are based on the idea that the relational model and SQL-92 implemented in the majority of RDBMS should be extended. There are actually many ways to extend the relational model, however, the one chosen by object-relational database vendors is very relation-centric. A database still consists of a set of tables. A number of object features are provided as an extension to this core relational model (multi-row tables, references between rows, inheritance between tables, etc.).
At this time, there is no standard for the object-relational model and each vendor has its own object extensions. The differences between Oracle-10, Informix Dynamic Server, UniSQL, IBM DB2 and other object-relational products are significant. They exclude portability and even a common conceptual and didactic basis. The various products present a choice of quite random, redundant, limited and sometimes inconsistent features. There is a new ANSI/ISO standard, SQL-99, which is supposed to unify the systems. There are doubts whether the standard will really play an essential part in the development, due to its size and complexity, a lot of redundant features coming from various schools of the IT business, and a lot of historical (obsolete) stereotypes that influenced the construction of the standard. Essentially, the standard does not specify semantics or specifies it very imprecisely. Moreover, the object features are just a minimal part of the standard.
Object-relational companies have demonstrated that this new technology can be implemented and that they are ahead in terms of technology. The relational vendors have financial power, ownership of the already existing software development base, and an established position as well as great authority on the market. They rely on the trust of their former customers. The motivation is purely commercial profit rather than conceptual clarity or software engineering principles. The products are criticized as huge, complex, eclectic, and thus decadent. David Maier claims that essentially nobody is using the new object-oriented extensions of relational DBMS.
The key products of this technology include: Informix Dynamic Server, Postgres, Illustra, IBM’s DB2 Universal Database, Oracle-8/9/10, OSMOS, Ingres II, Sybase Adaptive Server, and others. Almost every relational vendor has announced or promised object-relational products. With the exception of some new systems (Postgres, Montage, Informix DS), the systems are extensions of their relational predecessors. They are equipped with facilities for efficient application development. These are: support for multimedia and Web, spatial data, ADTs, methods defined by the programmer, collection-valued attributes, and others.
Many professionals consider ORDBMS a temporary result of the evolution from the relational to the pure object-oriented technology. The current market, however, hardly accepts new database ideas and languages, thus it may happen that this evolution will take tens of years. Actually, it is very hard to predict where this evolution will lead.
XML and RDF-oriented DBMS
Formerly, XML was promoted as a "better HTML" which would be able to carry, in a structured way, semantic information on Web resources. The information could thus be recognized and utilized by automatic software tools. Then XML files were recognized as a way to exchange information among different software agents. In particular, XML files can be components of data exchange protocols or units that parameterize and tune some software systems according to the wishes of programmers or users. Recently, however, XML is also perceived as a database meta-format that makes it possible to store and make available huge amounts of information.
There is some confusion between XML considered as textual files and XML stored within specialized XML-oriented systems, such as Tamino or Berkeley DB. XML files have little value as a means of massive data storage because of performance problems and the lack of many features such as transactions and security support. Parsed XML that is stored in specially prepared databases has its origin in the source XML files, but essentially it is stored as hierarchical data structures that are 1:1 isomorphic with the structures of the corresponding XML files. Such data structures can be maintained and processed as objects in an object-oriented database. Essentially, parsed XML data structures form an object-oriented database allowing data structures more complex than relational ones, but much less complex than those of fully-fledged object-oriented databases. Such an XML-oriented database system does not support many features of object-orientedness, in particular, pointer links among objects, classes and methods, inheritance, encapsulation, polymorphism, etc.
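The 1:1 correspondence between an XML file and a hierarchy of objects can be sketched with the Python standard library. The document content, the function `to_object` and the dictionary layout below are our own assumptions chosen for illustration; they do not reproduce the storage format of Tamino or of any other product.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = """
<company>
  <dept name="Research">
    <emp><name>Bob</name><sal>2500</sal></emp>
    <emp><name>Eve</name><sal>3100</sal></emp>
  </dept>
</company>
"""


def to_object(elem):
    """Map one XML element to a nested dict that mirrors the element 1:1."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [to_object(child) for child in elem],
    }


root = to_object(ET.fromstring(SOURCE.strip()))
dept = root["children"][0]

# Navigation is possible only downwards along the hierarchy.
names = [emp["children"][0]["text"] for emp in dept["children"]]
print(names)
```

The resulting structure is a pure tree: an `emp` object has no way to point at an object in another branch, and there is no place for classes, methods or inheritance, which is exactly the limitation described above.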
Despite the big commercial buzz around them, XML-oriented databases present essentially no new quality. It is difficult to imagine why users should have to restrict the data structures that they observe in their business domain to pure hierarchies of objects, not supported by advanced object-oriented database notions. Looking at Fig.44, for some database applications the limitation of data structures implied by XML will not be justified from the point of view of the conceptual model of the problem domain. Moreover, XML-oriented query/programming interfaces, such as XQuery, are not sufficiently mature yet and are even below the state-of-the-art in comparison to SQL.
Similar remarks concern RDF (Resource Description Framework), which was initially promoted as a metaformat for the description of business ontologies. Currently some professionals are trying to promote it as a universal database format. The model of RDF is fact-oriented rather than object-oriented (where "facts" are coded as triples <subject, feature, value>). Within this approach even the boundaries of objects are unknown and must be deduced by programmers from some informal description. Many object-oriented features, such as classes with behavior, inheritance among classes, etc. are outside its considerations. RDF Schema facilities are conceptually very different from traditional database schemas. Query languages for RDF, such as RDQL, are below the state-of-the-art. Abstractions built into query languages, such as functions and views, are in their infancy. We are also not optimistic concerning the support of RDF by mathematical theories, in particular, by formal logic. In our opinion, this will not work in practice (as has already been shown by the entire Datalog school). However, currently it is rather difficult to predict how this idea will evolve and what the gains from it will be.
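The remark on unknown object boundaries can be made concrete with a small Python sketch. The triples, the feature names and the grouping function `group_by_subject` are hypothetical and serve only as an illustration; grouping by subject is one convention a programmer may adopt, not something that the triples themselves prescribe.

```python
from collections import defaultdict

# Hypothetical facts coded as <subject, feature, value> triples.
triples = [
    ("emp1", "name", "Bob"),
    ("emp1", "worksIn", "dept1"),
    ("dept1", "name", "Research"),
    ("emp1", "salary", 2500),
]


def group_by_subject(facts):
    """Reconstruct 'objects' by collecting all features of each subject."""
    objects = defaultdict(dict)
    for subject, feature, value in facts:
        objects[subject][feature] = value
    return dict(objects)


objects = group_by_subject(triples)
print(objects["emp1"])
```

Nothing in the list of triples states whether `dept1` is a separate object, a sub-object of `emp1`, or just a string value; that decision is left to the programmer who writes the grouping code.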
Java and C#-oriented DBMS
In comparison to XML and RDF, the Java and C# object models are much closer to an ideal object-oriented database model, but they do not share all the necessary features that we have mentioned before. This concerns:
· Collections, which in Java and C# are secondary rather than primary features. In particular, nested collections have no counterparts.
· Associations implemented as pointer links are unavailable;
· More advanced forms of inheritance (multiple inheritance, multiple-aspect inheritance, dynamic inheritance) are unavailable;
· Persistence is not an inborn feature of these languages. Essentially, Java and C# persistence is based on relational databases and SQL rather than on truly object-oriented databases;
· Database schema and metamodel are not considered in these languages as features independent from particular applications acting on a database;
· Query languages that are embedded in or integrated with these languages (Hibernate, LINQ) are rather limited and refer (in their deep semantics) to relational rather than to object-oriented databases. LINQ assumes a straightforward mapping between C# objects and relational tables, and thus SQL as a back end. This assumption involves a lot of limitations concerning data structures and the query language features. Fully-fledged object-oriented query languages such as SBQL are difficult to integrate with Java or C# programming environments, at least according to our implementation experience (ICONS, YAOD, ODRA and J-ODRA);
· Database abstractions such as views, triggers, etc. are unavailable.
These limitations can be considered secondary for small, non-critical applications. They are obviously fundamental for very large, mission-critical databases, and thus disqualify the Java and C# persistence layers as true OODBMS.
Last modified: June 29, 2006