AS1 Store Model: Classes and Inheritance
by Kazimierz Subieta (March 2006)
The concept of class is an abstraction in thinking and programming whose intention is to capture both static properties of objects (i.e. their structure) and dynamic properties of objects (i.e. operations that can be performed on objects or by objects). The definition of a class addresses both human minds (i.e. it supports conceptual modeling) and software engines (i.e. it makes it possible to maintain correctly the software-side data structures that are called objects). The conceptual modeling role of a class (emphasized, in particular, in UML) is very important, but it is of secondary importance for the semantics of query languages; hence we do not discuss it. We consider a class as a programming entity that is associated in some way with the definition and maintenance of the data structures called objects.
In this role the concept of class has two different meanings. The first one (popular among theoreticians) has its origin in mathematics and says that, from the semantic point of view, a class is a set of objects (cf. equivalence classes induced by an equivalence relation). This is a wrong definition for classes understood as software units, because it does not reflect important properties of classes, such as methods. All known object-oriented programming languages, standards and systems assume (usually implicitly) another definition of the class concept, which says that:
Usually, object-oriented literature distinguishes two kinds of invariants that are stored within classes: typing information (i.e. names of an object’s attributes together with their types) and methods (operations) that can be invoked on objects (together with typing information concerning the methods’ parameters and output). Neither kind is obligatory. For instance, Smalltalk objects have no types, hence classes in Smalltalk contain no typing information. On the other hand, some object-based models do not involve methods. Other kinds of invariants can also be stored within classes. In particular, CORBA IDL interfaces may also specify exceptions and reactions to exceptions, while ODMG interfaces and classes may additionally contain relationships among objects, extents and keys. Some database models also include the objects’ name as an invariant stored within a class.
A class that is most specific for an object inherits more general invariants from its super-classes. For instance, a class FirstYearStudentClass inherits more general invariants from the class StudentClass, and yet more general invariants from the class PersonClass.
As follows from the discussion, we consider classes and types as two different semantic beings, with different roles. Typing information is necessary for strong static type checking of queries or programs acting on objects and/or for dynamic checking of objects’ structure. Classes may contain types as a particular kind of invariant, but in principle this is not obligatory (although desirable). Classes contain all the invariants that can be factored out as a common part of objects’ semantics, in particular, methods, the objects’ name, etc.
Typing information is also the main component of an object interface, but again types and interfaces are different programming beings, with different semantic roles.
Again, interfaces can also be defined without typing information, although we consider it desirable to combine typing information with interfaces.
Classes and interfaces are also different programming beings, which should not be confused (as happens e.g. in the ODMG standard). Classes contain implementation, while interfaces are specifications only. Perhaps the most fundamental difference between classes and interfaces is that classes can be the subject of trade (they can be sold and bought), while interfaces cannot. This subdivision is independent of the keywords that are used in a particular artifact. In the ODMG standard no classes can be specified; both keywords class and interface denote interfaces, and the semantic difference between the two concepts that the ODMG suggests is artificial.
Interfaces are important as a pragmatic part of a query language, but essentially they have little significance for the semantics of query languages. More precisely, typing information that is a component of an interface has some influence on the semantics, but not the major one; we return to this issue later. The AS1 store model that we intend to define will not be associated with types and interfaces. Interfaces, however, are a usual way to deal with encapsulation; we return to this issue when we consider the AS3 store model. Types and schemata (as features more difficult than usually expected) will be introduced much later.
Concerning classes, we can distinguish two forms of them:
· Classes that are parts of a source text file prepared by the programmer in some text editor. In some languages and systems (e.g. C++) this is the only form in which classes exist. No concept of a class exists in the run-time environment; after compilation a class loses its identity, becomes a part of the executable code and cannot be identified by any programming means. In such cases we say that classes have second-class citizenship: they exist in the source code, but they cannot be identified or manipulated during run time.
· Classes that are run-time entities, which can be identified and manipulated (e.g. tested, bound, created, removed or altered) during run time. Such a class must possess its identity on the same principle as the identity of objects.
If only the first form exists, the binding of all names that refer to properties of a class and occur in a query must be done at compile time, i.e. a query must be compiled and linked together with the class it refers to. However, this is contrary to a basic property of queries, which in many cases must be created and interpreted during run time. For instance, one can create and execute an ad-hoc query during operation of the database, when the whole program containing classes is already compiled and is currently being executed. Because queries occur within client applications rather than within database servers, second-class citizenship means that classes are not properties of a database, which is contrary to the data independence principle.
Hence, concerning the semantics of a query language, the second form is essential. Of course, the first form must nevertheless exist - the programmer determines classes within a source text file, including the source code of the implementations of methods. After compilation such a class is converted into the second form, which is then used by the query engine.
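The difference between the two forms can be illustrated in a language in which classes are run-time entities. In the Python sketch below, the class EmpClass, its method net_salary and the dictionary class_store are hypothetical names chosen only for illustration; they are not part of the AS1 definition. A method name that arrives as a string at run time, as in an ad-hoc query, can be bound only because the class itself is available as an identifiable run-time object.

```python
class EmpClass:
    """Hypothetical class; in Python it remains an identifiable object at run time."""

    def __init__(self, salary):
        self.salary = salary

    def net_salary(self):
        # Arbitrary integer arithmetic, used only to have something to call.
        return self.salary * 4 // 5


# Hypothetical stand-in for the part of the store that holds class objects.
class_store = {"EmpClass": EmpClass}


def bind_method(class_name, method_name):
    """Bind a method name supplied at run time to the code stored in a class."""
    cls = class_store[class_name]      # the class is found by its run-time identity
    return getattr(cls, method_name)   # the name is bound inside the class object


method = bind_method("EmpClass", "net_salary")
result = method(EmpClass(1000))
```

If EmpClass existed only in the source text, as in the first form, there would be nothing for bind_method to look up once the program had been compiled.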
In the AS1 store model we deal with classes in the second form only. A class is an object recorded in an object store. We associate a special meaning and special operations with this object; these will become clear after we define the semantics of query operators.
ODMG essentially assumes that no class is stored as an object on the side of the database server. The standard presents only the first form of the class/interface representation; the same concerns the meta-model of the database (which is introduced informally and extremely obscurely). The absence of an object representing a class at run time means that the binding of class properties (e.g. names of methods) within OQL queries (which are run-time rather than compile-time entities) has no known addressee. Hence, in our opinion, the ODMG standard violates the assumptions of the early binding mechanisms of typical programming languages. In practice, this means that binding methods in OQL is non-implementable in the majority of cases.
In AS1 an object store is defined as a five-tuple <S, C, R, CC, SC>, where:
· S is a set of (perhaps nested and linked) objects, as in AS0.
· C is a set of classes. Classes are objects too.
· R is a set of identifiers of root (start) objects, as in AS0. Usually we assume that identifiers of classes are not among root identifiers.
· Relation CC ⊂ IC × IC determines inheritance among classes. IC ⊂ I denotes identifiers of classes. If <i1, i2> ∈ CC, then the class identified by i1 inherits from the class identified by i2. The relation CC should not contain cycles.
· Relation SC ⊂ IS × IC determines membership of objects in classes. IS ⊂ I denotes identifiers of objects which are not classes. If <i1, i2> ∈ SC, then the object identified by i1 is a member of the class identified by i2.
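The five-tuple can be sketched as a small data structure. The Python sketch below is an illustration under simplifying assumptions, not a definitive implementation: the names AS1Store, has_inheritance_cycle and is_well_formed and the identifiers "i1" and "i5" are hypothetical, and objects and classes are reduced to their identifiers (their nested content is omitted). It checks only the constraints stated above: CC relates class identifiers, SC relates non-class object identifiers to class identifiers, and CC contains no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class AS1Store:
    """Hypothetical, simplified encoding of <S, C, R, CC, SC> by identifiers only."""
    S: set = field(default_factory=set)   # identifiers of objects that are not classes (IS)
    C: set = field(default_factory=set)   # identifiers of classes (IC)
    R: set = field(default_factory=set)   # identifiers of root (start) objects
    CC: set = field(default_factory=set)  # pairs (i1, i2): class i1 inherits from class i2
    SC: set = field(default_factory=set)  # pairs (i1, i2): object i1 is a member of class i2

    def has_inheritance_cycle(self) -> bool:
        """Depth-first search over CC; True if some class inherits from itself."""
        parents = {}
        for sub, sup in self.CC:
            parents.setdefault(sub, set()).add(sup)
        done, on_path = set(), set()

        def visit(c):
            if c in on_path:        # c was reached again along the current path
                return True
            if c in done:           # already explored, no cycle through c
                return False
            on_path.add(c)
            found = any(visit(p) for p in parents.get(c, ()))
            on_path.discard(c)
            done.add(c)
            return found

        return any(visit(c) for c in self.C)

    def is_well_formed(self) -> bool:
        cc_ok = all(a in self.C and b in self.C for a, b in self.CC)
        sc_ok = all(o in self.S and c in self.C for o, c in self.SC)
        return cc_ok and sc_ok and not self.has_inheritance_cycle()


store = AS1Store(
    S={"i1", "i5"},
    C={"PersonClass", "EmpClass"},
    R={"i1", "i5"},                      # class identifiers are not among the roots
    CC={("EmpClass", "PersonClass")},
    SC={("i1", "PersonClass"), ("i5", "EmpClass")},
)
```

With this sample data store.is_well_formed() holds; adding the pair ("PersonClass", "EmpClass") to CC would introduce a cycle and make the check fail.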
Each invariant stored within a class should be decorated by a flag determining its kind (a method, an object name, an export list, a trigger, etc.) but in our examples we skip these flags treating them as self-evident.
Note that the AS1 model is a superset of the AS0 model. We do not require that each object must belong to a class. It makes sense to establish classes only in cases when they store some non-trivial invariant of a population of objects. If no such invariant can be established, determining a class makes no sense, because it does not change anything in the semantics.
In Fig.8 and Fig.9 we present an example of an AS1 object store.
Fig.8. Example of an AS1 object store
As before, in the graphical representation the identifiers of root objects are within circles. Identifiers of classes are not among root identifiers, hence we assume that queries cannot directly refer to classes. For some purposes, e.g. administration of the store, we can imagine that class objects such as PersonClass and EmpClass can be manipulated, e.g. removed or altered. Under this assumption, their identifiers should belong to the root identifiers, but perhaps only in a special administrative mode. An arrow with a big white triangle end denotes inheritance (CC) and thick gray arrows denote membership of objects within classes (SC). Classes contain methods, together with their compiled implementation. Methods are understood as procedures with some specific scoping rules; this will be explained later. To simplify the picture, in this representation we do not present formal parameters of the methods; this feature will also be introduced later. Note that this is an abstract view; the relations CC and SC can be implemented physically in many ways. For instance, the SC relation can be implemented by containers storing objects belonging to particular classes. So far we have also said nothing about how the relations CC and SC will be used by the query execution engine; this will be considered later. Our intention is to define the abstract store in which such relations can be recorded.
The AS1 store model also covers multiple inheritance and the possibility that one object is a direct member of more than one class. We allow for pairs <i, i1>, <i, i2> ∈ CC such that i1 ≠ i2; similarly for SC. Such situations can be handled by the defined query engine with no difficulties. There are cases when multiple inheritance and multiple membership are reasonable, thus we do not forbid them.
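Because CC and SC are plain relations, multiple inheritance and multiple membership need no special representation. The Python sketch below is an illustration only: the function classes_of, the class WorkingStudentClass and the identifiers "i1", "i2" and "i9" are hypothetical, and the sketch says nothing about how the query engine actually uses the relations. It collects all classes that an object belongs to, directly through SC or indirectly through CC.

```python
def classes_of(obj_id, SC, CC):
    """Return every class the object belongs to: directly (SC) or by inheritance (CC)."""
    result = set()
    pending = [c for (o, c) in SC if o == obj_id]   # direct memberships
    while pending:
        c = pending.pop()
        if c in result:
            continue                                # reached before along another path
        result.add(c)
        pending.extend(sup for (sub, sup) in CC if sub == c)
    return result


# Hypothetical sample relations.
CC = {
    ("EmpClass", "PersonClass"),
    ("StudentClass", "PersonClass"),
    ("WorkingStudentClass", "EmpClass"),      # multiple inheritance:
    ("WorkingStudentClass", "StudentClass"),  # two pairs with the same first element
}
SC = {
    ("i1", "WorkingStudentClass"),
    ("i2", "EmpClass"),                       # multiple membership:
    ("i2", "StudentClass"),                   # one object, two direct classes
}

classes_i1 = classes_of("i1", SC, CC)
classes_i2 = classes_of("i2", SC, CC)
```

In this sample PersonClass is reached along two inheritance paths but appears only once in the result, because the result is a set.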
The AS1 model is an abstraction over the most popular models of object-oriented programming languages, modeling tools and database systems. It makes it possible to satisfy the substitutability principle, which is quite easy to implement through a proper name binding algorithm within the query execution engine. Although the model seems to be simple and natural, it leads to problems concerning, in particular, multiple inheritance and repeated inheritance. For instance, if classes A and B are developed independently, they may contain methods having the same name and type; if we then define a class C that inherits from both A and B, one of the two fundamental principles of object-orientedness - the substitutability principle or the open-close principle - must be violated. The AS1 store model also has severe disadvantages as a database model, because (as we have argued before) substitutability is in contradiction with the concept of collections of objects and with the open-close principle. For these reasons we introduce the AS2 store model, which is the cure for all the conceptual shortcomings of AS1.
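The multiple inheritance problem with classes A, B and C can be made concrete with a small check. The Python sketch below is a hypothetical illustration, not part of the AS1 definition and not a name binding algorithm: the function conflicting_names and the method names "m" and "a_only" are assumptions. It reports every method name that is defined in more than one class among a given class and its superclasses. Note that such a simple check would also report ordinary overriding along a single inheritance chain; telling the two situations apart would require additional rules.

```python
def conflicting_names(cls, CC, methods):
    """Map each method name defined in more than one class reachable from cls to those classes."""
    # Collect cls together with all of its direct and indirect superclasses.
    seen, pending = set(), [cls]
    while pending:
        c = pending.pop()
        if c not in seen:
            seen.add(c)
            pending.extend(sup for (sub, sup) in CC if sub == c)

    # Record, for every method name, the classes that define it.
    definers = {}
    for c in seen:
        for name in methods.get(c, ()):
            definers.setdefault(name, set()).add(c)

    return {name: cs for name, cs in definers.items() if len(cs) > 1}


# Hypothetical data: A and B were developed independently and both define m.
CC = {("C", "A"), ("C", "B")}
methods = {"A": {"m", "a_only"}, "B": {"m"}, "C": set()}

conflicts = conflicting_names("C", CC, methods)
```

Here the name m is reported with the classes A and B, while a_only is not reported; the sketch only detects the situation and does not decide which of the two principles to give up.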
Last modified: December 31, 2007