©Copyright by Kazimierz Subieta..

SBQL for the AS1 Store Model

by Kazimierz Subieta

Back to Description of SBA and SBQL.

Back to the AS1 store model.

 

In the AS1 store model we have introduced a set of objects C representing classes, a relation CC representing inheritance among classes and a relation SC representing the membership of objects in classes. In this section we specify how these new features can be involved into the mechanism of query evaluation based on SBA. In fact, the corresponding extensions of semantics and implementation mechanisms are simple and natural. We have to consider possible corrections to the features of SBA that we have introduced to the model AS0, namely:

·        The domain Result of results returned by queries;

·        The construction of the ENVS stack;

·        Rules for binding names on the ENVS stack;

·        The construction of QRES stack;

·        Function nested;

·        Procedure eval for evaluation of queries.

Among six presented features only the last one is affected by the change of the store model from AS0 into AS1. The idea is that for a non-algebraic SBQL operator that acts on an object that is a member of some classes the ENVS stack is augmented not only by the internal features of this object, but also by the internal features of all the classes that the object belongs to. The internal features of an object and of its classes are calculated by the same function nested and pushed on ENVS in the proper order, which reflects a psychological or ergonomic view of the programmer: the object being processed is most close to his/her imagination, then he/she considers its direct class, then its superclass, etc. In this way invariants of the object that are stored within its classes would become available for binding during processing of the object by a non-algebraic operator. The situation on ENVS during processing an object by a non-algebraic operator is presented in Fig.32.

Fig.32. A fragment of an object and class diagram and the state of ENVS during processing of an object by a non-algebraic operator

For instance, consider the situation presented in Fig.9 and assume that we have a query:

(Emp where name = “Poe”) . (name, netSal, age)

In Fig.33 we present the situation on ENVS when the dot operator processes the Poe’s object. In the subquery after the dot name is bound on the top section of the stack (binders to internal properties of the Poe’s object), netSal is bound in the section below the top (binders to internal properties of the EmpClass), and age is bound in the third section from the top (binders to internal properties of the PersonClass).

Fig.33. State of ENVS during processing the Poe’s object by the dot operator

Let us to assume that a function allClasses(e) acts on any query result. For e being a single object identifier it returns sequence{ c1, c2, ..., ck } of identifiers of the classes such that:

<e, ck> SC, <ck, ck-1> CC, ... , <c3, c2> CC, <c2, c1> CC

In other words, the function allClasses(e) returns the identifiers of all the classes that the object identified by e is a member, in the order from the most general class c1 till the most specific class ck. For e being struct{ i1, i2, ..., it} we assume that:

allClasses(e) = allClasses(i1) allClasses(i2) ... allClasses(it)

where is a concatenation of sequences. For all other forms of e the function allClasses(e) returns the empty result. So far we do not consider multiple inheritance and the situation when classes identified by e = struct{ i1, i2, ..., it} are in some conflict. We discuss the cases a bit later.

The corresponding part of the procedure eval that processes a non-algebraic operator can be described as follows.

procedure eval( q : query ) // change for the AS1 store model

begin

.......

case q is recognized as q1 θ q2: // the query q consists of a non- algebraic operator θ joining subqueries q1 and q2

begin

partialResults: bag of Result;

partialResult, finalResult, e: Result;

 

partialResults := Ø; //empty bag

eval( q1 ); //Evaluation of the first query; the result is at the top of QRES

for each e in top( QRES ) do

begin

for each c in allClasses (e) do

push( ENVS, nested(c)); //pushing internals of classes at ENVS

push( ENVS, nested(e) ); //new section at ENVS

eval( q2 ); //Evaluation of the second query; the result is at the top of QRES

partialResult := partialResultOfθ (e, top(QRES) );

partialResults := partialResults { partialResult }; // new partialResult included to the partialResults bag

pop( QRES ); //removing the result(q2) from QRES

pop( ENVS ); //removing the top section with nested(e) from ENVS

for each c in allClasses (e) do

pop( ENVS ); //removing class sections from ENVS

end;

finalResult := mergePartialResultsθ( partialResults ); // forming the final result

pop( QRES ); //removing the result(q1) from QRES

push( QRES, finalResult); //the final result of the entire query is pushed at the top of QRES

end

.......

end

Old parts of the procedure are in blue, the new parts are in black. These simple changes, plus the mechanism of invoking methods (see below), are the only amendments to the SBQL query engine. However, in the following we will show that the AS1 model implies difficulties with multiple inheritance and with collections, hence some next decisions and improvements can be necessary.

Our policy of pushing internals of classes on ENVS automatically supports the property known as overriding. If for instance a method income is defined for both classes, PersonClass and EmpClass, then in the query:

Emp where income > 2000

the name income is bound to the income method that is a property of the EmpClass; the income method within the PersonClass is invisible (is overridden). The policy in general supports the substitutability principle. In particular, if in a correct query having name Person one substitutes this name by Emp, the query is still correct; for instance:

Person where age > 20

and

Emp where age > 20

are both correct queries. Substitutability, however, leads to some semantic problems with collections and with updating operations. We will return to this issue later.


Invoking Methods in AS1

Till now we have assumed that binding a name is simply a search within ENVS for a proper binder (binders) and then, return its value (values). In case of binding names of methods this action is extended by invoking the method. So far we are unable to present all the semantics related to invocation of methods; this will be done in the section devoted to imperative (procedural) abstractions of SBQL. Below we roughly explain what should happen when the name m of a method is to be bound:

·        Before the binding of the method its parameters are evaluated and stored at QRES.

·        The name m is bound on ENVS as previously.

·        After the binding the method m is invoked. The invocation requires a new section that is pushed at the top of ENVS (so-called activation record).

·        The new section contains binders to the local environment of the method m, which consists of binders to the actual method’s parameters (taken from QRES) and binders to local method’s objects. Local objects and (sometimes) values of parameters are stored in a special (volatile) part of the object store. The section contains also so-called return track, i.e. information on the place of the program where the method was invoked.

·        The program control goes to the method body.

·        When the control leaves the method body, the mechanism performs the following actions:

         If the method is terminated by a “return q” command, where q is a query, result(q) is pushed on QRES as usual.

         The volatile part of the object store dedicated to this method is cleaned.

         The return track is recorded, and then the new section is removed from ENVS.

         The control goes to the method calling program according to the return track.

The action is a bit more complex if the class that the method m belongs to is subdivided into public and private parts; we return to this issue when we consider the AS3 store model. Note also that during executing of the method’s body a part of the ENVS stack should be invisible due to the lexical scope rules; this issue is also discussed later.

For instance, let us to consider a class MyClass that contains a method MyMethod having the following code:

method MyMethod( p1: T1, p2: T2): T {

x1: T3; x2: T4;

method body;

return q;

};

MyMethod has two parameters: p1 of type T1 and p2 of type T2, and returns the result (determined by query q) of type T. MyMethod declares two local objects x1 of type T3 and x2 of type T4. Let MyMethod be invoked in a query:

q0 where MyMethod(q1, q2) = 1000

where q0 is a query returning identifiers of objects being members of the class MyClass, q1 and q2 are actual parameters of MyMethod, which is called in the context of the non-algebraic operator where (the same reasoning is for other non-algebraic operators and for any other query involving MyMethod after such an operator). Let iMC denotes the identifier of the MyClass object and let result(q0) = bag{r1, r2, ...}. Fig.34 shows some states of ENVS (above the time line) and QRES (below the time line).

Fig.34. Evaluation of a method

The above picture will be discussed in the section devoted to imperative abstractions, where we will consider methods of parameter passing and general scope rules.


Multiple Inheritance

The definition of the AS1 store model allows for multiple inheritance among classes and multiple membership of objects in classes. The multiple membership can be reduced to multiple inheritance, hence we do not consider this case separately. If we allow that a class can inherits from two or more super-classes, in general it is possible that there will be a conflict. Let class C inherits from A and B. If A contains a method named m and B contain a method named m, then the class C can inherit only one of the methods; the other one will be invisible. In the AS1 model there is no cure for such a situation if we assume the open-close principle; i.e. when classes A and B are developed independently, e.g. by different external companies and their source codes are unavailable. In this case the class C, which inherits both from A and B, but one of the methods cannot be inherited, violates the substitutability principle. For instance, let C inherits the method m from A, but not inherits the method m from B. In such a case objects of the class C cannot be considered as objects of the class B. Some proposals allow in such a case for changing (virtually) during inheritance the name of the method m for the class C; for instance, the method m that is not inherited from B during inheritance receives a virtual name mB. This option solves the problem only partially, when a given object is considered as a member of C. When an object of the class C is considered as an object of the class B (due to substitutability) it still requires the method m, which will be taken from A rather than from B.

Despite this anomaly, multiple inheritance is a very useful option for conceptual modeling, hence it is not a good strategy to forbid it. In particular, multiple inheritance makes it possible to involve abstract classes, i.e. classes having no members, including so-called mixin classes, i.e. classes storing invariants of many different classes. The model with multiple inheritance is introduced in many practical proposals, including programming languages such as C++, the CORBA standard, the ODMG standard, etc. Smalltalk and Java, however, do not introduce multiple inheritance, what can be considered disadvantageous for conceptual modeling. In programming languages such as C++ multiple inheritance leads also to some problems with physical representation of objects.

In our case the multiple inheritance can be involved into the definition of the function allClasses, which returns the sequence of class identifiers for a given object (objects), in the reverse order. In case of multiple inheritance there are many such possible orders and the algorithm for calculating the function must choose one of them. It is not important which of the orders is to be taken, because in the case of conflicts between names of methods no good order exists.

A problem with multiple inheritance is the result of mixing in one environment properties coming from different independently developed classes. Such a mix must sometimes lead to naming conflicts. The problem has a radical solution in the AS2 store model, which allows one to substitute multiple inheritance by dynamic object roles and dynamic inheritance.

A problem similar to multiple inheritance appears when the function allClasses(e) acts on e = struct{ i1, i2, ..., it}, where i1, i2, ..., it are identifiers of objects being perhaps members of different classes. In this case the conflict between names of methods occurring in these classes is also possible. This time, however, the conflict can be explicitly removed by the programmer by using proper SBQL options, for instance, by auxiliary naming.


Collections in the AS1 Store Model

The biggest problem with the AS1 model concerns collections. The problem is recognized, in particular, in the ODMG standard. There is a conflict between substitutability and collections. According to substitutability, each Emp object is also a Person object. However, in the AS1 model each object has one name hence it belongs to one collection. For instance, if some non-algebraic operator processes Person objects, one can expect that it automatically will take also all Emp objects residing in the same environment (e.g. a database). However, the standard binding mechanism disallows for that. For example, (c.f. Fig.9) the query:

Person where age < 20

will take into account the Doe’s object only. Poe and Kim will not be processed, because the name of corresponding objects is Emp rather than Person.

One can consider in such a case that the query can be formulated as follow:

bag(Person, Emp) where age < 20

Such a solution is not only awkward, it still does not solve the problem due to the open-close principle. Assume that one writes a PersonCollectionClass for the PersonClass, implementing all the required methods. Then, much later someone defines a specialized class EmpClass class that inherits from PersonClass. After that, some amendments must be introduced to the PersonCollectionClass and the class must be recompiled. This can be impossible if the source code of the PersonCollectionClass is unavailable. Anyway, the open-close principle is violated.

There are several solutions of this problem, but all of them require some additional assumptions concerning the AS1 model and some changes to the binding mechanism. In particular, we can assume that each object has as many external names as classes it belongs to. For instance, (c.f. Fig.9), we can assume that the Poe’s and Kim’s objects have additionally the external name Person. The binding mechanism is modified in such a way that whenever a binder Emp( oid ) is put on ENVS, the binder Person( oid ) is put on ENVS too. For instance, the bottom section of ENVS presented at Fig.33 will have the binders:

Person(i1), Person(i5), Person (i9), ... , Emp(i5), Emp(i9), ...

The strong typing mechanism should assure in such cases that specific properties of Emp cannot be used when the object is regarded as Person.

Alternatively, we can assume that each concrete class contains a name of its members as an invariant. This method does not require assigning many names for an object. For instance, the PersonClass requires that all the members have the name Person, and the EmpClass requires that all the members have the name Emp. The binding mechanism is modified in such a way that, for instance, when the name Person is bound first the class containing such name as an invariant is identified, PersonClass in this case. Then, according to the CC relation, all classes that inherit from the PersonClass are identified, in particular, the EmpClass. Then, the bind function is extended to bindAll function that takes all invariant names of objects from all the subclasses of the given class. For instance, it holds:

bindAll( Person ) = bind( Person ) È bind( Emp )

bindAll( Emp ) = bind( Emp )

In further examples concerning SBQL for the AS1 model we assume this method. Classes with invariant names of their member objects are implicitly assumed in UML class diagrams and in the ODMG standard.

Such a binding strategy solves the problem, but the requirement that each concrete class must possess an invariant name for all its members may introduce limitations for some applications. As some cure for this inconvenience, we can assume the possibility that a specialized class may “override” this invariant, in particular, it may drop it. For instance, an EmpClass having Emp as the invariant object name can be specialized by AnyEmpClass, which inherits everything from EmpClass but the invariant object name. Alternatively, we can assume that the name of an object is determined by its class, but the programmer can change it for a concrete object (after such a change the object is not seen as a member of an extent of the class, although inherits from it all the properties). Another solution is that an object name is determined by its class, but the programmer can assign to the object an arbitrary alias. Yet another solution is that an invariant name determined by a class concerns only persistent (database) objects and do not concern volatile objects being local to a user session or belonging to local environments of procedures or methods. There are perhaps other solutions. Without experiments made for real applications is hard to say which is the best for the further programmers.

In AS1 there is also a problem with typing of pointer objects. In the tradition of programming languages, a pointer object is specified by the type of an object that it points too. The tradition of databases is that a pointer link is specified by the name of an object that the pointer link leads to. Such an assumption is necessary if one wants to specify precisely a database schema, which (as we have discussed previously) is an inherent pragmatic part of any query language. The assumption is explicitly taken by the ODMG standard and implicitly by a lot of other proposals aiming specification of query languages. Trying to make a coincidence of these two ideas we can say that a pointer link should be specified both by the type of an object it leads to and by the name of the object. Assuming that an object name is an invariant for a class we can reconcile both ideas with no such pain: pointer objects will be specified by the name of the class of an object that the pointer leads too, and the class will contain both the type of the object and its invariant name. More detailed discussion on typing issues for SBQL we present later.

The AS1 has also other disadvantages, for instance, lack of multi-aspect inheritance, lack of repeating inheritance, etc. As in the case of the problem with multiple inheritance, the radical solution to all the problems with the AS1 store model is the AS2 store model.


Examples of SBQL Queries for the AS1 Store Model

Fig.35 presents an object-oriented database schema as an UML-like class diagram. Note that cardinalities are assigned not only to relationships, but to any object declaration, to any attribute, etc. Lack of cardinalities means the default cardinality [1..1]. Relationships such as worksIn/employs should be understood as declarations of pointer objects. Address is a complex attribute (which cannot be directly expressed in UML). Person, Emp and Dept are invariant names of the member objects for the corresponding classes. Binding of name from a super-class means implies binding all names from its sub-classes; for instance, binding of Person implies additional binding of Emp. The method budget returns the annual budget of a department.

Fig.35. A database schema according to the AS1 store model

 

E.AS1.1

Get names of departments and the average age of their employees (inheritance of the method age).

SBQL:

Dept . struct(dname, avg(employs.Emp.age() ) )

SBQL:

Dept . (dname, avg(employs.Emp.age))

SBQL:

Dept . (dname, avg(employs.Emp.age) as avgAge)

In general, methods having no parameters can be written without parentheses, for instance, age is an equivalent to age(). The last query returns a bag of structures struct{iDept, avgAge(some_real) } consisting of the identifier of a department iDept and a binder with name avgAge and some real value.

 

E.AS1.2

Get name, the net salary and the boss name for programmers working in departments located in “Paris” (inheritance of name).

SBQL:

(Dept where $ loc as x (x = “Paris”))

join (employs.Emp where “programmer” in job).

(name as name, netSal as netSal, (boss.Emp.name) as boss )

The query returns a bag of structures having three binders: struct{ name(iname1), netSal(some_real), boss(iname2) }.

 

E.AS1.3

Get employees that for sure live in the cities where their departments are located (inheritance of Address).

SBQL:

Emp where $ Address as a ($ (worksIn.Dept.loc) as l (a.city = l))

 

E.AS1.4

For each employee get the name and the percent of the annual budget of his/her department that is consumed by his/her salary.

SBQL:

Emp . (name as n, (((if exists(sal) then sal else 0) as s).

((s * 12 * 100)/(worksIn.Dept.budget)) as percentOf Budget)

 

E.AS1.5

Imperative construct: for each person having no salary give the minimal salary in his/her department.

SBQL:

for each (Emp where not exists(sal)) as e do

e.changeSal( min(e.works_in.Dept.employs.Emp.sal) );

In this example we have assumed that the method changeSal inserts a subobject sal with a proper value into the given Emp object. We also assume that for each department there is at least one employee having the salary. Note that the program is not optimal - it counts the minimal salary for each employee having no salary. More optimal program should calculate the minimal salaries for proper departments, then get it to proper employees, for instance:

SBQL:

for each (Dept where $ employs.Emp (not exists(sal)) as d join

min(d.employs.Emp.sal) as minSal do

for each (d.Employs.Emp where not exists(sal)) as e do

e.changeSal( minSal );

This manual optimization can be substituted by an automatic optimization of the first SBQL program, but developing a proper algorithm could be a challenge.


Last modified: December 31, 2007