©Copyright by Kazimierz Subieta.

Abstract Object Store Models

by Kazimierz Subieta


As we argued in the section devoted to the syntax, semantics and pragmatics of query languages, we have to define the data structures stored in an object database in an abstract way, but with algorithmic precision. Our definition should be sufficiently universal to cover all features that can occur in object-oriented databases and XML repositories.

However, current object models tend to be very complex, with many non-intuitive notions. Moreover, each object-oriented standard, programming language or database management system introduces its own object model, with very specific concepts that sound similar but frequently have totally incompatible meanings. This concerns such popular concepts as class, inheritance, interface, type, and others. Nowadays XML technologies also contribute to this complexity. Although the XML model is basically not object-oriented, it introduces complex hierarchical objects with many notions of its own, especially in RDF, XML Schema, RDF Schema and the family of technologies called Web Services. The ODMG standard for object-oriented databases involves many notions, such as objects, literals, types, sub-types, interfaces, classes, inheritance, methods, overriding, polymorphism, collections (of various kinds), structures, relationships, operations, exceptions, and others. The SQL-99 standard is even more complex, because it involves similar concepts and additionally mixes them with nested relations that have many peculiarities, abstract data types (ADTs) and other features.

Unfortunately for current technologies (and fortunately for SBA), the majority of this complexity is caused by secondary features and by the lack of attempts to generalize and simplify the ideas and to avoid redundant notions. For instance, one can introduce both classes and ADTs, but conceptually the notions are the same. Similarly, the ODMG standard introduces both sets and bags as collections, but sets could be removed as a particular case of bags. If we assume the object relativism principle, then there is no need to distinguish objects and attributes. There are more such redundancies, which stem from various streams of research and development and from the historical dust that has collected around object-oriented notions over the years.
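The remark that sets are a particular case of bags can be illustrated with a small sketch. It is not taken from ODMG or from SBA: the helper names `bag`, `is_set`, `distinct` and `bag_union` are hypothetical, and Python's `collections.Counter` is assumed here as a stand-in for a bag. A set is then simply a bag in which no element occurs more than once, and set union is bag union followed by duplicate elimination.

```python
from collections import Counter


def bag(items):
    """Build a bag (multiset): each distinct element mapped to its multiplicity."""
    return Counter(items)


def is_set(b):
    """A bag qualifies as a set when no element occurs more than once."""
    return all(count <= 1 for count in b.values())


def distinct(b):
    """Duplicate elimination: reduce every multiplicity to 1."""
    return Counter(dict.fromkeys(b, 1))


def bag_union(a, b):
    """Additive bag union: multiplicities of equal elements are summed."""
    return a + b


# Hypothetical data: department names collected from employee objects.
depts = bag(["Sales", "Sales", "R&D"])
print(is_set(depts))            # a bag with a repeated element is not a set
print(is_set(distinct(depts)))  # after duplicate elimination it is

# Set union expressed with bag operations only.
a, b = bag(["x", "y"]), bag(["y", "z"])
print(distinct(bag_union(a, b)) == bag({"x", "y", "z"}))
```

Under these assumptions a language needs only the bag constructor plus a duplicate-elimination operator; a separate set collection adds no expressive power.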

Now is the time to clean off the dust. For the description of the semantics of query languages, the complexity of the underlying data model is a very negative factor. A complex object model results in a query language that is complex in its syntax, semantics and pragmatics. Complexity of semantics implies much more difficult implementation and optimization. In particular, due to the complexity of SQL-99, many professionals doubt whether it is entirely implementable. Optimization of queries and programs in SQL-99 will be very challenging, and in many cases impossible, due to chaotic language design decisions and unknown interactions between the various data structures and language constructs. Complexity of pragmatics leads to long documentation, extensive user manuals, long training time, long application development time and more chances for errors.

Complex semantics also makes consistency harder to maintain. According to the conceptual closure principle, each feature of an object model must be reflected in the syntax, semantics and pragmatics of a language addressing the model. Precise semantics of the language requires defining all states that are possible according to the model (the set State). The complexity of the object model causes complexity of the set State and, in consequence, complexity of the definition of semantics. This leads to more difficulties in formal analysis of the semantics, less potential for query optimization, much more challenging strong type checking, and much more difficult control over the completeness and mutual interaction of the language's constructs. A complex object model also causes the “metamodel management nightmare” (after Won Kim), which can be observed e.g. in the ODMG standard[1].

Currently, the commercial world neglects or ignores the problem of the complexity of the object model and its influence on the complexity of semantics and pragmatics. The claims that one can easily build a formal model for SQL-99 or ODMG OQL are not justified at all; they belong to the marketing offices' liars' game rather than to honest and technically sound assertions. Languages are designed with no care for the minimality and consistency of the introduced constructs. All the commercial proposals of standards are underspecified. Holes in the specifications mean that implementations of the proposals cannot be compatible. These circumstances, together with ad hoc extensions introduced by software manufacturers, to a large extent undermine the sense of the standards.

For these reasons it is necessary to simplify object models by developing abstractions over them that cover all the required features introduced in practical languages with a minimal set of notions. Recall that it was precisely the simplicity of the relational model that was the source of its success, because it made it possible to reason about various properties and language constructs in both intuitive and formal ways.

In contrast to the relational model, object models must be more complex to provide better conceptual modeling capabilities (which is the very essence of object-orientedness). It is difficult to create a single model that would be simple and at the same time cover all the features of object models. There are also didactic reasons: many features can be explained on a very simple (but still quite universal) model, and further features can then be added step by step by generalizing this simple model. For these reasons SBA is based not on a single object store model, but on a family of models named AS0, AS1, AS2 and AS3[2]; a model with a higher number introduces more sophisticated features. The list of models is of course open: there are many possibilities to create variants of them or to introduce new features. However, the list is complete in the sense that, to the best of our knowledge, there is no feature or notion in the object languages and systems currently used or proposed in practice that would not be covered by some of these store models. All the store models AS0, AS1, AS2 and AS3 are based on the same few formal primitives.

The basic features of the introduced store models are the following:

This family of models is rich enough to give hope that no further conceptual feature of object-oriented artifacts will create a new quality for the semantics of the defined query/programming languages. The most important thing is to understand the assumptions of SBA and the SBQL semantics for the simplest model, AS0. After that, it is quite easy and natural to extend the semantics to the higher store models (AS1, AS2 and AS3).

We have to warn readers that our store models are not the same as data models. Store models are formal constructs and have almost nothing in common with the concepts, ideological constraints and rhetoric, mathematical decorations, beliefs and stereotypes that are usually associated with data models. A store model is simply an abstract view of the data structures stored in the database (and in other media) and is orthogonal to any ideology such as the relational model or XML. Without a formally precise store model, the definition of semantics will always be vague and obscure, and unclear to the developers and programmers of a query language engine. In our opinion, this is the case with the ODMG and SQL-99 standards, which do not present a formal store model for objects explicitly, but explain objects informally through other (sometimes also obscure) notions, such as types, classes, ADTs, etc.

In all the introduced store models we use the same three elementary notions:

Last modified: December 31, 2007

[1] However, we disagree with Won Kim that this can be an argument in favor of object-relational models. In our opinion, the SQL-99 object model leads to a much more painful “metamodel management nightmare”.

[2] Originally, AS0, AS1, AS2 and AS3 were denoted M0, M1, M2 and M3, respectively. The change was caused by a clash with the notation used by OMG for the different UML (meta)model levels.