©Copyright by Kazimierz Subieta

Imperative Constructs in SBQL

by Kazimierz Subieta

Back to Description of SBA and SBQL.

In SBA we reconstruct query languages’ concepts from the point of view of programming languages (PLs). The approach is motivated by our belief that there is no definite border line between querying and programming; thus there should be a universal theory that uniformly covers both aspects. The design of modern and universal database PLs having querying capabilities requires methods and principles that are already acknowledged by the common practice of developing compilers and interpreters. Practically useful PLs deal with object-oriented notions (classes, methods, inheritance, etc.), procedures and functional procedures (including recursive ones), parameter passing methods, various control statements, binding strategies, scoping rules, modularization, strong typing, etc. They should follow software engineering principles such as orthogonality, modularity, minimality, universality, genericity, typing safety, and clean, precise semantics. SBA is an alternative to theoretical concepts emerging on the object-oriented wave, such as nested relational algebras, object algebras, object calculi, F-logic, comprehensions, structural recursion, monoid calculus, functional approaches, etc. Careful analysis of them and our implementation experience convinced us that all of them are limited and inadequate for this kind of query/programming languages that we intend to develop.

The SBA solution relies on adopting a run-time mechanism of PLs and introducing necessary improvements to it. The main syntactic decision is the unification of PL expressions and queries; queries remain the only kind of PL expressions. For instance, in SBA there is no conceptual difference between expressions such as  2+2  and  (x+y)*z  and queries such as   Employee where Salary = 1000  or  (Employee where Salary = (x+y)*z).Name  . All such expressions/queries can be used as arguments of imperative statements, as parameters of procedures, functions or methods and as a return from a functional procedure. This approach is unique to SBQL – other PLs that involve querying capabilities always separate queries and expressions (especially if SQL is used as a query language). Even PLs that are claimed to be “integrated” with queries, such as PL/SQL or .Net C# + LINQ, assume some subdivision between queries and expressions. We consider this as an obvious sign of the impedance mismatch - a severe plague of current database applications. We believe that currently only SBQL consequently and in 100% has removed this plaque.

SBQL queries can be embedded within statements that can change the database or program state. Concerning this issue we follow the state-of-the-art standards known from majority of programming languages. Typical imperative constructs are creating a new object, deleting an object, assigning new value to an object (updating) and inserting an object into another object. We also introduce typical control and loop statements such as if…then…else…, switch, while loops, for each iterators and others. Some peculiarities are implied by queries that may return collections; thus there are possibilities to generalize imperative constructs according to this new feature.

Treating queries as programming language’s expressions requires an essential assumption concerning the semantics of queries. There was a lot of discussion in the literature concerning what queries have to return. For instance, an SQL query returns a tables that is a value rather than a collection of references to tuples (tuple identifiers). The ODMG standard assumes that queries return objects what is an obvious nonsense (see the discussion on the closure property) or some elliptic terminology. However, for some updating operators, in particular, for assignment, deleting and inserting, queries-expressions should return references to objects (i.e. object identifiers) rather than values or objects. Therefore during the development of SBA and SBQL we have returned to the classical notions of programming languages. In SBQL from the very beginning we have assumed that queries never return objects but references to objects (in particular). This assumption is inevitable if one has to adopt queries as expressions of a programming language.

The presence of imperative constructs gives for the programmer some freedom if he/she should use queries or a sequence of programming statements. This freedom is first of all constrained by the conceptual modeling – the programmer should program his/her task as simply as it can be from his/her point of view. Usually declarative queries are much more comprehensive than sequences of commands. There is also a performance aspect: declarative queries are much more prospective for automatic query optimization than sequences of imperative statements.

In this chapter we describe in detail all the imperative constructs that we suggest to introduce in an integrated query/programming language. However, this is only a suggestion – we do not want to declare which construct can belong to SBQL and which cannot. Our goal is to develop the principle, not details, and to provide the basis for further examples presented in this book.

 

Declarations of Objects

In general, we make no difference between objects, as understood in object-oriented programming languages, and variables, as understood in classical programming languages. Both concepts are considered triples <i, n, u>, as defined in the store models AS0, AS1, AS2 and AS3, where i is an internal unique object identifier, n is an object name (not necessarily unique), and u is an object value, which can be atomic (e.g. integer value), can be a pointer (an identifier of another object) or can be a set of objects. This definition is recursive and can define objects with any number of nested objects, with many hierarchy levels. In this description we avoid to use the term variable, naming variables objects too. Some people distinguish objects and variables according to their attitude to classes: objects are members of classes, unlike variables. However this distinction is secondary looking at the semantics of SBQL: we can simply assume that variables are objects for which classes are empty (i.e. there is no methods that are associated to objects, hence creating such an empty class is omitted). Usually we assume that both objects and variables are associated with types that allow for strong compile-time type checking of the contexts in which objects/variables are used. Objects that have the nature of variables are associated with types only, while proper objects are associated with types that are invariants of classes.

Objects can be created in different environments. The most obvious environment is a local environment of the procedure or method where the given object creating instruction is put and executed. Such objects are volatile by definition, because their life is terminated when their procedures/methods are completed. Such objects are not available for any external bindings, e.g. from external procedures or methods. Objects can also be created within the module in which the given procedure/method is put and executed. By definition, such objects are shared among all the procedures or methods that are inside the given module. In other words, bodies of procedures or methods that are members of the given module can refer to all objects that are stored inside the module. Objects of a module can be also available for binding from other modules, under the condition that this is explicitly allowed by some module interface or export list (this is a form of encapsulation). In such a case usually references to objects are preceded by the module name (but this can be simplified in some cases).

Objects can also be persistent, that is, they are elements of a database. Persistent objects are also members of modules, but all their properties (values, OIDs, etc) outlive switching off the application (and computer) that executes the given module. After restoring the application, its persistent objects become available in exactly the same form and value as before the switch off.

Objects can be shared among many parallel processes. To avoid inconsistencies, access to shared objects must be disciplined by ACID transactions.  Usually it is assumed that persistent objects are always shared and volatile objects are never shared. In our view, sharing and persistence are orthogonal features. In particular, it is possible that persistent objects are not shared (they are available to only one application, hence the transactional semantics makes little sense). A typical case concerns a client application that need to store persistently some results of calculations. And v/v, volatile objects that are stored within modules can be shared, hence should follow the transactional semantics. Hence some syntactic option should distinguish persistent objects and shared objects. Depending on a persistency/sharing model this option can belong to different definitions. We avoid the concept that some classes (hence all its member objects) or types are qualified as persistent/shared. This concept makes impossible to define classes or types that are relevant to local, volatile and persistent objects, hence violates the principle of orthogonal persistence. Instead of that we can assume that proper flags (keywords) are assigned to modules rather than to classes.

Following strong typing, all objects that are to be created must be declared within a proper environment. Declaration has the form:

objectName: type [cardinality];

For instance:

x: integer;

An object named x with an integer value is declared; cardinality [1..1] is skipped.

Employee: record {

            name: string;

            salary: integer;

            job: string;

            worksIn: ref Company[0..1];

 } [0..*];

A collection of objects named Employee is declared. The collection has any number of elements. The type is record (we avoid the term structure known from C/C++) having a subobject name (of string type), a subobject salary (of integer type), a subobject job (of string type) and an optional (cardinality [0..1]) subobject worksIn being a pointer to an object named Company (keyword ref). In our convention, typing of a pointer requires the name of the object that the pointer points to rather than the type of the object (unlike majority of programming languages). This decision is caused by database-orientation: in the database schemata object names are first class citizens while types are rather second-class and at high abstraction level are omitted at all.

The programmer is responsible to put such declarations in proper places, for instance, within the body of a procedure/method, within a proper module, etc. When declaration is valid, corresponding objects can be created by special instructions. When the lower cardinality of the type of an object is 1 ( this also concerns the cardinality [1..1], usually skipped) declaration of an object is equivalent to its creation with some default value. To be ready to use such an object must be initialized by an assignment.

 

Creating Objects

Creating an object requires determining two features: the state of a new object and its location (where the object is to be created). The second feature is usually related to object persistence - a newly created object is to be persistent when it has to be stored in a persistent environment (i.e. a database or persistent module).

Creation of an object can be determined by a very simple syntax:

instruction ::= create query;

query ::= create query;

It is assumed that the query being the argument of the create instruction should return all the information that is necessary to create an object or objects, in particular, names of objects and subobjects and their values. The concept of binder that was introduced in previous chapters is a very good mean for determining that. Hence it is assumed that  the query returns a (perhaps) nested binder or several such binders. The instruction creates one or more objects according to the binders returned by the query, assuming that names of objects are determined by names of binders, values of objects are the same as the values of binders and nesting of objects exactly follows the nesting of binders. If the instruction is used as a query, it returns a references (a bag of references) of a newly created object (objects).

E.IM.1

Create an Employee object.

SBQL:

create (“Doe” as name, 1000 as salary, “analyst” as job,

            (Company where compName = “ECME”) as worksIn) as Employee; 

 

The query being the argument of create returns the nested binder Employee( struct{ name(“Doe”), salary(1000), job(“analyst”), worksIn(iECME ) } ), where iECME is the identifier of the ECME company object. This binder is used to create an object by a simple rule which assigns a new unique identifier to each (nested) binder.

Some SBQL users consider this syntax as a bit awkward in the context of creating instructions, thus one can invent a more friendly syntax (perhaps, only in this context). In the ODRA system the syntax  q as n  can be substituted by n(q) , hence the above example can also be written as:

E.IM.2

Create an Employee object.

SBQL:

create Employee( name( “Doe”),

                             salary(1000),

                             job(“analyst”),

                             worksIn( Company where compName = “ECME” )

                           ); 

 

In general, in this book we avoid the discussion on syntactic issues hence we will use the form E.IM.1 rather than E.IM.2. Both forms we consider semantically equivalent.

The create instruction must obey the strong typing system. In SBQL it is impossible to create a new object that does not conform to declared specifications. If the object is created as an element of a collection, the cardinality of the collection after creation is dynamically checked. Creating objects requires also altering the environment stack ENVS, which must be augmented by a corresponding binder (binders) in a proper stack section. If the created object is to be shared, it should follow the transactional semantics, i.e. it is locked till the given transaction will be committed. After committing, the object is to be available for other transaction thus its binder must be propagated to all current environment stacks of particular applications.

Concerning the functionality and semantics of this instruction we need to give several comments.

a)      In the query being the argument of this instruction returns several binders, the instruction creates as many objects as binders. In particular, if the query returns the empty bag, no object is created and this is not considered a failure. This rule is valid for any level of object nesting. For instance, if Company  objects are declared as:

Company: record {

            compName: string;

            location: string [1..*];

            employs: ref Employee[0..*];

 } [0..*];

then the instruction:

E.IM.3

Create a Company object.

SBQL:

create ( “ECME” as compName,

            bag(“Paris”, “London”, “Rome”) as location,

            (Employee where job = “analyst”) as employs) as Company; 

creates an object with three location subobjects and with some number of pointer objects employs leading to Employee objects.

b)      Because the order of subobjects within an object is inessential, the order of binder constructs in the query is inessential too. For instance, E.IM.4 is equivalent to E.IM.1. This rule, however, can be changed if the defined query language has features that make the order of subobjects essential (this, for instance, is a desired feature of XML query languages).

E.IM.4

Create an Employee object.

SBQL:

create ( “analyst” as job,

            (Company where compName = “ECME”) as worksIn

            “Doe” as name, 1000 as salary,) as Employee; 

 

c)      In general, a query being the argument of a create instruction can return binders with references as values. In such a case the instruction performs dereferencing (according to the specified type). The dereferencing is not performed in cases when the type specification provides a pointer object. For instance, the Employee type provides worksIn pointers and the Company type provides employs pointers (through ref keywords), hence the results of corresponding queries    Company where compName = “ECME”     and    Employee where job = “analyst”    are not dereferenced. In some cases ref keywords can be added before such queries, to explicitly show in the source code that dereferencing should not be performed and a pointer object is to be created. Using these keywords should be checked by the strong typing mechanism, to avoid programmers’ mistakes. E.IM.5 is an equivalent to E.IM.1, but there is more opportunity for strong type checking.

E.IM.5

Create an Employee object.

SBQL:

create (“Doe” as name, 1000 as salary, “analyst” as job,

        ref(Company where compName = “ECME”) as worksIn) as Employee; 

 

d)      It may also happen that a query being the argument of the create instruction will return an identifier of a complex objects, but the type does not provide that a pointer is to be created. In such a case the identifier is dereferenced according to a simple rule that maps the corresponding object into binders, by removing all object identifiers and retaining the nested structure of objects.

E.IM.6

Create a copy of the Doe’s object.

SBQL:

create (Employee where name = “Doe”) as Employee; 

Assume that the Doe’s object is represented as the following nested triples:

<i1, Employee, {<i2, name, “Doe”>, <i3, salary, 1000>, <i4, job, “analyst”>, <i5, worksIn, iACME>}>

In this case the query Employee where name = “Doe” will return i1. Then, the create instruction requires dereferencing of  i1:

deref( i1 ) = struct( name( “Doe”), salary(1000), job(“analyst”), worksIn(iACME) )

Hence such an instruction is reduced to the case presented in E.IM.1 and E.IM.2. Dereferencing of references to a complex object is a natural extension of the dereferences of references to atomic objects (which return their values).

 

Locations of Created Objects

The location of an object is precisely determined by the strong typing system, thus in majority of cases there is no need to determine its locations by special syntax. During the static analysis phase (compilation) the type of a newly created object can be determined. This type is compared with type signatures that are stored at the static environment stack (S_ENVS – it will be explained in a chapter devoted to strong typing). The comparison starts from the top of the stack to its bottom and should determine a stack section having the corresponding signature. This environment is the proper one for creating the new object. If no corresponding signature is found on the stack, the statement is incorrect and the strong typing mechanism should display a typing error. 

For instance, if the database contains specification of objects Employee and the local environment of some procedure contains a similar specification, then the instruction of creating an Employee object within this procedure will locate the object in the local environment rather than in the database.  The local environment section is higher on the environment stack than the database section, hence it will be visited by the binding mechanism earlier that the database section.

If the environment is untyped (what we consider a very disadvantageous case) then the location of created objects must be determined by special syntax. In particular, in the LOQIS system [SMA90, Subi91] the keyword create can be augmented by keywords permanent or temporary. The first one denotes the database and the second one denotes a global user environment. If no such a keyword is specified, the new object is created within a local environment of the procedure that contains the instruction. A similar solution is also used in the ODRA system, but with a bit different meaning. SBQL implemented in ODRA is strongly typed. Keywords permanent, temporal and local are used for better understanding of programs by the programmer and present additional opportunity for strong type checking. There is, of course, a lot of other syntactic decisions concerning this aspect.

Because creating and deleting objects requires actions on an environment stack (perhaps on many environment stacks, if objects are shared among many applications or processes) the organization of environment stacks should support these operations. Actually, this optimization causes a bit different concept of environment stack construction. This new concept has been implemented both in Loqis and in ODRA. In this concept binders that are to be stored at the environment stack are stored at the object store rather than a the environment stack. The stack stores references to object store environments. This is illustrated in Fig. 62 (compare Fig.3 and Fig.17). The upper part of this figure presents the conceptual view on ENVS that is used to define the semantics of SBQL. The lower part of this figure presents the optimized version, where binders in some stack sections are substituted by an identifier of the proper object store environment. In this way operators create and delete that act on a store environment need not to update all the ENVS stacks that are currently present in all user sessions or processes. In Loqis and ODRA this organization is further optimized by inserting a section with proper binders (the result of the nested function) at the beginning of each complex object.

Fig.62. Conceptual and optimized environment stack

The instruction can be augmented by an explicit clause determining where a new object is to be created. The syntax can be as follows:

instruction ::= create query within query;

query ::= create query within query;

In the construct create q1 within q2 the query q1 determines the state of the newly created object and the query q2 determines the object inside which the new object is created[1]. Note that the above instruction is some generalization of the SQL insert instruction. As previously, the second syntactic form assumes that the construct returns references to newly created objects.

E.IM.7

Create an Address object within the Doe’s object.

SBQL:

create (“Romeas city, “Boogie” as street, 88 as number) as Address

            within Employee where name = “Doe”; 

The instruction must confirm to specification of types. The above instruction is correct only when the type of Employee includes an Address subobject in the form as presented above.

In principle, the above instruction can be considered as redundant, because it can be accomplished by two instructions: create and insert:

E.IM.8

Create an Address object within the Doe’s opbject.

SBQL:

create (“Romeas city, “Boogie” as street, 88 as number) as Address;

insert Address into Employee where name = “Doe”; 

However, for two reasons this method would be inconvenient for programmer. First, it requires explicit specification of the type of object to be inserted and declaring a proper variable, which means two or more additional lines of code. Second, there could be problems with naming. For instance, declaring a local object Address could be awkward if the database would also contain objects Address – in this case binding to these database object becomes impossible from the body of the current procedure code, hence some additional tricks could be necessary.

 

Deleting Objects

In SBQL we have assumed explicit deleting, as in SQL. Some authors claim that object programming languages should not allow for explicit deleting, as such an operation is dangerous. It may cause that some pointers or references to deleted objects will become inconsistent (will became so-called dangling pointers). Hence an object is to be removed implicitly and automatically by the garbage collection mechanism in situation when the object is no more available (e.g. all pointers leading to it are nullified).

While these arguments have some value in case of lower-level languages, they are totally misleading in case of database programming languages. First of all, deleting an object has clear conceptual and business meaning. Conceptual modeling of programs is incomparably more important than any considerations concerning machine actions and features. Secondly, in the shared data environment the programmer may not enough rights and may have severe difficulties to recognize and remove/nullify all the pointers that lead to the object. There are also other arguments, related to database schema and operation on the schema. Summing up, arguments that are valid for low-level programming are totally inadequate for business-level programming that we want to support.

The SBQL syntax for deleting objects is the following:

instruction ::= delete query;

We assume that the query returns references to objects that have to be deleted. The instruction physically deletes all of them, so they are no more available for further querying. If there are pointers that lead to the deleted objects they are deleted too or nullified. In ODRA it is assumed that each pointer from object A to object B is physically coupled with (invisible) backward pointer from B to A, hence this operation can be easily performed and always leaves the database in a consistent state. The instruction makes no difference concerning which kind of object is considered and on which object hierarchy level it is located – in the same way we may delete objects, attributes, views, stored procedures, etc. The instruction must conform to the type specification. Type failures can be detected statically (during compilation), e.g. deleting a subobject name within Employee, or dynamically (during runtime), in particular, violating declared cardinalities of a collection.

 

E.IM.9

Delete all objects of analysts.

SBQL:

delete Employee where job = “analyst”; 

 

E.IM.10

Delete the Address subobject from the Doe’s object.

SBQL:

delete ( Employee where name = “Doe”).Address; 

 

There are several peculiarities with this instruction:

a)      Duplicates returned by the query should not cause failures. It may happen for various reasons that the query being the argument of the delete instruction will return duplicate references. If the object is deleted, an attempt do delete it the second time may result in failure. Note that this requirement could be difficult to reduce to the case  delete unique(query); , where the unique function removes duplicates. As will be shown a bit later, duplicates may appear due to other rules and can be tangled within complex structures. Hence each instruction should collect identifiers of objects already deleted and check each next deletion if it is necessary or not.

b)      Duplicates within pointers leading to deleted objects should not cause failures. If the list of references to objects to be deleted contains duplicates the list of pointers that lead to the deleted object contain duplicates too. If these pointers are to be removed, the program should not fail.

c)      The query may return binders with references rather than references alone. In such cases the deleting operation should not fail. It should remove objects according to their references being the values of the binders.

E.IM.11

Delete the Doe’s object.

SBQL:

delete Employee as e where e.name = “Doe”; 

d)      A query being the argument of the delete instruction may return arbitrarily complex structure with references. This feature makes it possible to organize complex (cascade) deleting within one query.

E.IM.12

Delete the ECME company together with all the analysts that work for it.

SBQL:

delete (Company where compName = “ECME”) as c

            join (c.employs.Employee where job = “analyst”) as a; 

The query will return the bag of structures: bag( struct( c(iComp1), a(iEmp1), struct( c(iComp2), a(iEmp2),… ) ). Note that in this case the query will return duplicates of c(iECME) binders, which should be handled without failures.

 

Assignment

The assignment operator can be introduced in the classical version:

instruction ::= query :=  query ;

Semantically, the assignment leftQuery := rightQuery requires that the leftQuery returns a reference to an object and the rightQuery returns a value, which will be the new value of the object. If the rightQuery returns a reference too, then the dereferencing operator is implicitly applied. The operator does not change the identifier of the object. In SQL the above operator is accomplished in the form of the update clause which also allows to make more than one update within one statement. The same effect we can achieve by combining the assignment operator with the for each operator (the example will be shown later). The SQL syntax we consider disadvantageous and not sufficiently orthogonal with other query language constructs, hence we do not use it. We also consciously avoid the use of the equality as an assignment operator (as in C/C++. Java, C#, etc.) using = as a comparison (following some hundreds years of the mathematical tradition and common elementary school teaching[2]).

In case when an object referenced by the leftQuery is specified with the cardinality [0..1], [0..*], etc. it may happen that the leftQuery will return an empty result. This causes that the programmer for all objects specified by a cardinality with lower bound 0 must check the existence of the object before the assignment and create it if it does not exist. This can be considered as an awkward feature. There is a possibility to change the semantics of the assignment operator to avoid this inconveniency, see the section Assignment to Absent Object.

The assignment operator in the above syntax cannot be macroscopic. That is, the leftQuery should return exactly one reference (if the assignment to absent object is not implemented) and the rightQuery should return exactly one value. If these queries return bags it would be impossible to determine which value is to be assigned to a particular reference. If they return sequences, the situation is not better, because it could be very difficult to assure that the sizes of sequences are the same and the orders of references and values are correct. Moreover, in many cases parts of the queries must be repeated in left and right queries. It is much easier to use the operator for each to achieve the effect of the macroscopic assignment.

In the Loqis system we also implemented the syntax

instruction ::= update query ;

with the assumption that the query returns a two-column table (a bag of two-element structures), where the first column contains references and the second column contains values. Then, for each row of this table the corresponding value is assigned to the corresponding reference. This form is a bit more powerful than the previous assignment form embedded within a for each loop. However, its syntax appears to be awkward for large queries and the advantage over the for each construct is difficult to justify. In our examples we show the use of this construct and the reader will be able to assess if it has some advantages or not.

The same assignment operator can be used to update pointer objects and to update complex objects. We illustrate these cases by examples, see the schema from Fig.63.

 

Fig.63. A database schema

 

E.IM.13

For all programmers get rise of 5% (disregard optional jobs).

SBQL:

for each Emp where job = “programmer” do { sal := sal * 1.05; }

SBQL:

update Emp where job = “programmer”. (sal , sal * 1.05);

 

E.IM.14

Increase the value of a local variable x by 1.

SBQL:

x := x + 1;

SBQL:

update x , x + 1;

 

E.IM.15

For all employees older than 40 get rise of 100 and change their job to engineer.

SBQL:

for each Emp where age > 40 do {

   sal := sal + 100;

   job := “engineer”;

};

SBQL:

update (Emp where age > 40 ). bag((sal , sal + 100), ( job , “engineer”));

The second form is typologically incorrect (regarding the current SBQL implementation), as it joins within one bag a structure of two reals and a structure of two strings. It would be possible in languages without strong typing or with heterogeneous (variant) types. However, such cases are not recommended.

Note also that both statements are incorrect if job is absent. The full and correct version of this statement could be the following:

E.IM.16

For all employees older than 40 get rise of 100 and change their job to engineer (a fully correct version).

SBQL:

for each Emp where age > 40 do {

   sal := sal + 100;

   if not exists job then create “nothing” as job;

   job := “engineer”;

};

Such an if..then statement must be inserted in every case when lower cardinality of the object type is 0. This is considered awkward, thus can be improved by altering the semantics of the assignment; see the section Assignment to Absent Object.

E.IM.17

For departments located in Paris change this location to Berlin.

SBQL:

for each (Dept.loc as dl) where dl = “Parisdo { dl := “Berlin”; }

Note that loc within Dept-s are collections and we have to update some elements of them. The assignment concerns the “auxiliary variable” dl, but the update will concern some loc subobject. In contrast to other query languages that we are aware of, SBQL auxiliary variables may bear references to objects as well.

E.IM.18

Move all programmers to the department managed by Doe.

SBQL:

for each (Emp where exists job) where job = “programmer” do {

    worksIn :=  Dept where  (boss.Emp.name)  = “Doe”; }

This statement updates pointers worksIn and employs. We have assumed (following ODRA implementation) that pointers worksIn and employs are bidirectional: updating of one of them causes automatic updating of its twin. We have also assumed that worksIn is typed as a pointer to  a Dept object, hence ref before Dept is optional (we skip it). If the query language is not strongly typed and has no bidirectional pointers, the statement is more complicated:

E.IM.19

Move all programmers to the department managed by Doe.

SBQL:

for each ((Emp where exists job) where job = “programmer”) as p do {

    create (ref Dept where  (boss.Emp.name)  = “Doe”) as dd;

    delete Dept.employs where Emp = p; //removing old employs pointers

    create ref p  as employs within dd.Dept; //new employs pointers

    p.worksIn :=  dd; //new worksIn pointers

};


A local object dd is used to store a pointer to the Doe’s department.

The statement can also be used to update complex objects. The semantics is very similar to the semantics of the create instruction. The left side of the updating instruction must return the reference to an object being update. The reference is not changed after updating. At the beginning, all subobjects of the object are deleted. Then, new subobjects are inserted according to the right side of the instruction. For a complex object it should contain a structure with binders; each binder is then changed into a subobject by generating a new unique object identifier. Names/values of binders are the names/values of created subobjects.

E.IM.20

Get new data for the employee with eno = 223344.

SBQL:

(Emp where eno = 223344) := (

    “Lee” as name, 1980 as birthYear,

    (“Londonas city, “Wall” as street, 15 as house ) as address,

    223344 as eno, “programmer” as job, 2000 as sal,

    (Dept where dno = 789) as worksIn

);

SBQL:

(Emp where eno = 223344) := (

    name( “Lee” ), birthYear( 1980 ),

    address( city( “London” ), street( “Wall” ), house( 15 ) ),

    eno(223344), job( “programmer” ), sal( 2000 ),

    worksIn( Dept where dno = 789 )

);

 

The second syntactic form uses n(q) rather than  q as n. It seems to be better for reading.

While the identifier of an object being updated is not changed after updating, this does not concern its subobjects. In ODRA we assumed that all old subobjects are cancelled and new subobjects receive new identifiers. Although some identifiers of the subobjects can be retained, in general it hay happen (due to optional and repeating subobjects) that some identifiers must be lost and some new identifiers must be generated. Semantically, it is difficult to imagine the situation in which the persistence of the identifiers of subobjects may have some meaning. However, if such situations exist, the identifiers of the subobjects can be retained as far as possible. This requires a bit more sophisticated implementation.

Our definition of the assignment operator allows also for cases when the right side query returns a reference to an objects. Such a reference is then dereferenced according to a simple rule, as follows:

·        For atomic objects <i, n, v>: deref( i ) = v;

·        For pointer objects <i1, n, i2>: deref( i1 ) =  i2;

·        For complex object <i, n, { <i1, n1, T1>,  <i2, n2, T2>,…,<ik, nk, T1> }>, where Ti is an atomic value, an identifier or a set of objects:

      deref( i ) = struct{ n1(deref(i1)), n2(deref(i2)), … nk(deref(ik))}.

Note that the definition is recursive and general. Below we present examples.

E.IM.21

Dereferencing a reference to a complex object

Object:

<i, Emp, {

    <i1, name, “Lee”>,

    <i2, birthYear, 1980>,

    <i3, address, {<i4, city, “London”>, < i5, street, “Wall” >, < i6, house, 15>}>,

    <i7, eno, 223344>,

    <i8, job, “programmer” >,

    <i9, sal, 2000>,

    <i10, worksIn,  i789>

}>

deref(i):

struct{

    name( “Lee” ),

    birthYear( 1980 ),

    address( struct{city( “London” ), street( “Wall” ), house( 15 ) }),

    eno(223344),

    job( “programmer” ),

    sal( 2000 ),

    worksIn(i789)

}

 

E.IM.22

For Nec assign all the data stored within the Doe’s object.

SBQL:

(Person where name = “Nec”) := (Person where name = “Doe”);

 

E.IM.23

Change the Poe’s address to the address of Doe (disregard optional addresses).

SBQL:

(Person where name = “Poe”).address := (Person where name = “Doe”).address;

 

In some languages additional variants of the assignment are introduced: x += y denoting x := x+y, x -= y denoting x := x –y, etc. Such shortcuts are a bit controversial, although many programmers consider them useful.

The assignment operators meets also some problems with the substitutability principle. In the assignment X := Y both X and Y can be typed by type T, but according to the substitutability X can return a reference to an object typed by some subtype T1 of T. This causes some doubts what to do with attributes that are present in T1 but absent in T. For instance, we can take the example E.IM.22, but the person “Nec” is also an employee, hence his object may have also subobjects eno, job and  worksIn (and perhaps optional subobjects sal and manages). According to our standard semantics of the assignment, in the first stage the internal of the object referenced by X is deleted, then the new internal is created, according to Y. However, because “Doe” need not to have subobjects eno, job and  worksIn, after the assignment the object referred by X cannot be typed as an Emp object. Hence the assignment has changed the type of the object, what may cause various inconsistencies in other parts of the program. For instance, the object is no more connected by a link manages, hence some Dept object has no boss subobject, what is inconsistent with its type.

Perhaps there are simple solutions of this problem. For instance, we can cancel only those subobjects that are delivered by the type of X, and leave without changes other subobjects. This means some complication of the semantics. Usually more complicated semantics is more difficult to implement and is more error prone. Perhaps some programming experience for such cases is required to propose a reliable and simple solution.

 

Inserting Objects

Insertion allows to insert an object into another object[3]. The syntax is as follows:

instruction ::= insert query into query;

instruction ::= insert query into module;

Semantically, an instruction insert q1 into q2 acts as follows. The result of q1 is a reference or a bag of references to objects (perhaps empty) and the result of q2 is a single reference to a complex object. The instruction moves the object(s) referenced by q1 inside the object referenced by q2. Objects referenced by q1 are physically moved without changing their structure and identifiers (i.e. they disappear in their old places). If q1 returns an empty bag, no action is performed and no error or exception is reported. If q2 returns an empty bag or the bag with more than one element the instruction is incorrect and causes an error or exception. Attempts to inserting an object into itself should also result in errors or exceptions. We also assume that both q1and q2 can return binders with references as values; in such cases names of the binders are ignored.

In the second syntactic form insert q into module we assume that modules can be specified by different options that are unavailable as queries. For instance, modules can possess their specific names or can be specified by keywords such as local, global, persistent, etc. The semantics of this instruction is identical as the semantics described above.

In ODRA the syntax of the instruction reminds an assignment:

instruction ::= lquery :< rquery;

where lquery ::= query, rquery ::= query; an object (objects) referenced by rquery is inserted into an object referenced by lquery.

Performing the inserting instruction should not violate type constraints, i.e. moving objects referenced by q1 should not cause violating types in their old place or violating the type of the object referenced by q2. Note that in our case types include cardinalities, which cannot be violated after performing the instruction.

The instruction should be perceived by the programmer as a physical moving of an object (objects) into another place without any change of their properties. However, this could be relaxed for cases when a volatile objects is inserted into a persistent object or v/v. Depending on the method of assigning identifiers to objects (that may depend on its persistence status) the identifier of the inserted objects and the identifiers of all its subobjects can be changed.

For the AS2 store model this instruction should allow for inserting a new role into an object.

Similarly to creating instructions, the insert instruction should leave all the environment stacks that are currently operating by running processes or threads in a consistent state. Efficient implementation of this requirement requires changes in physical stack’s organization, as presented in Fig.62.

E.IM.24

For each employee younger than 30 and having no job attribute, insert the subobject job with the value “assistant” (cf. Fig.63).

SBQL:

for each (Emp where age < 30 and not exists job) as e do {

    insert create “assistant” as job into e;  }

We follow the semantics of the create instruction which returns the reference of a newly created object. Note some typing peculiarity. Formally, because the object job is created in the current environment, we should assume that the type of the variable job is declared within it. However, because the object is created only for a moment, we can relax this requirement and skip declaring its type. The types will be finally checked after performing the entire instruction.

E.IM.25

Move the Paris location of the Sales department to the Ads department.

SBQL:

insert (Dept where dname = “Sales”). loc as p where p = “Paris

    into Dept where dname = “Ads”;

Note that the first query returns a binder named p with a reference to a loc subobject.

E.IM.26

Move all the employees older than 65 from the current module to the Retired module.

SBQL:

insert Emp where age > 65 into Retired;

Note that after the insertion all links worksIn/employs and manages/boss are not changed.

Sometimes the programmer wants to insert a copy of an object into a given object. The syntax could be as follows:

instruction ::= insert copy query into query;

instruction ::= insert copy query into module;

The copies of objects receive new unique identifiers.

E.IM.26

For all the persons with no addresses copy the address of  Doe.

SBQL:

insert copy (Emp where name = “Doe”).address

     into Emp where not exists address;

 

 

Changing Object Name

In many business applications objects can change their business role without changing their identities. A business role is frequently expressed by an object name. In terms of programming languages, change the name of a business objects requires change the name of an object or a programming variable. In majority of programming languages this is impossible because names of objects are second class citizens: they can be manipulated in a source code but cannot be manipulated during run time. The situation is different in databases, where all the names are available during run time, but usually there is no programming capabilities to change them. Additionally, changing object name can be constrained by its type.

In the current implementation of SBQL in the ODRA system all the names of objects (subobjects) are first-class citizen due to the assumed late binding of program entities. Hence there is no obstacles to change these names, according to the current business needs. This feature is so far not implemented in ODRA, but it is planned. Changing object name should be compatible with the declared types and the cardinalities of collections The syntax can be as follows:

instruction ::= rename query to query;

Semantically, rename q1 to q2 implies the following action. Query q1 returns references to objects to be renamed and query q2 returns a string that will be the new name of the objects. The new name is assigned to the objects determined by q1.

E.IM.27

Assume the database has collections Emp and SeniorEmp of the same type and class. Move all employees older than 50 to the collection SeniorEmp.

SBQL:

rename Emp where age > 50 to “SeniorEmp”;

 

Control Statements

There are several good patterns of control statements that are developed in existing programming languages and we do not want to invent something radically new. Mostly we follow Java. We present them only for completeness of the definition and for using them in examples. In different versions of SBQL we have implemented the following constructs:

instruction ::= if query then programBlock

instruction ::= if query then programBlock else programBlock

programBlock ::= { Instructions } |  { variables Instructions }

Instructions := instruction | instruction Instructions

In the statement if q then p query q should return a Boolean value. If q returns true the program block p is executed, otherwise no action is performed. In the statement  if q then p1 else p2 if query q returns true then p1 is executed, otherwise p2 is executed. A programBlock can contain declarations of local variables; their scope is limited to the block.

instruction ::= case query do labelledInstructions endcase

instruction ::= case query do labelledInstructions else programBlock endcase

labelledInstructions ::= labelBlock | labelBlock labelledInstructions

labelBlock ::= label  :  programBlock

label ::= literal

This switching command is well known from other languages. In the syntax we assume that a label must be unique literal of the type determined by the query. After the query returns some value, it is compared with all the labels in the given case statement. If a label matches the query result the corresponding programBlock is executed. If no label matches the query, then no action is performed (the first syntactic variant) or the programBlock after else is performed (the second syntactic variant). 

Other assumptions concerning labels are possible. In Loqis we implemented a variant where a label can be determined by any query (which should return an atomic value, which is to be compared with the result of the query after case). This variant gives a bit more possibilities for the programmer. However, because it is less explicit concerning the source code, it causes more opportunities for bugs.

Other group of control statements concerns organizing loops. In the current ODRA implementation these instructions follow Java, with the well-known semantics:

instruction ::= while query do programBlock

instruction ::= do programBlock while (query)

instruction ::= for(initstmnt; query; incrstmnt) do programBlock

initstmnt ::= instruction

incrstmnt ::= instruction

E.IM.28

For intervals [0, 99], [100, 199], [200, 299], … , [2000, 2099] print an interval and the number of employees earning a salary from the interval if the number is not zero.

SBQL:

for( x := 0; x <= 20; x := x+1 ) do {

    y: integer; z: integer;

    y := 100 * x;

    z := count(Emp where sal >= y and sal <= y+99);

    if z <> 0 then { print( y, y+99, z); }

}

 

For each statement

The for each statement follows the semantics of non-algebraic operators of SBQL. It operates on the environment stack ENVS in a similar way as non-algebraic operators do. The operator processes collections of unknown sizes.

instruction ::= for each query do programBlock

The semantics of the instruction for each q do p is as follows. First, query q is evaluated; it can return a single element e, a bag of elements bag{e1, e2, …} or a sequence of elements sequence{ e1, e2, …}. A singe element e is considered bag{e}. For each element e returned by q the following actions is performed:

·        nested(e) is calculated and pushed at the top of ENVS. If e is a reference to an object that is a member of class C1, which is a subclass of C2, which is a subclass of C3, etc. then the top of  ENVS contains the following sections (starting from the top): nested(e), nested(iC1), nested(iC2), nested(iC3), … . In all details the rule is the same as explained for non-algebraic operators in the models AS1, AS2 and AS3. For the AS2 model (dynamic object roles) super-roles also influence ENVS, as explained previously. For the AS3 model (encapsulation) instead of the nested function the nested_public function is applied.

·        Program p is executed within this new state of the environment stack.

·        All sections that are pushed at ENVS in the first step are popped.

If q returns a bag then the order of elements e is not determined, but all of them must be taken into account. If q returns a sequence, then the order of elements e is determined by this sequence. If an empty bag (sequence) is returned, the instruction does nothing and this case do not cause errors or exceptions.

In general, the instruction can be unsafe, e.g. when program p removes an object having a reference returned by q. In ODRA we do not prevent this inconsistency just in this place; in general, exception is risen if any operation is to be performed on non-existing object.

One can consider a syntactic variant of the for each instruction for the case when q returns exactly one element.

instruction ::= with query do programBlock

In this case the machine checks if query returns exactly one element ad causes typological error or exception otherwise. The semantics is exactly the same.

Unlike other proposals concerning for each statements, we do not introduce a special iteration variable (or variables). In SBQL such special variables are unnecessary, as they can be introduced through as or group as operators as auxiliary names with the standard semantics (see here).

The for each operator has been already exemplified in examples E.IM.13, E.IM.15, E.IM.16, E.IM.17, E.IM.18, E.IM.19, E.IM.24. Some next examples are presented below.

E.IM.29

A variant of E.IM.13 with an “iteration variable”. For all programmers get rise of 5% (disregard optional jobs).

SBQL:

for each (Emp where job = “programmer”) as e do { e.sal := e.sal * 1.05; }

 

Two “iteration variables”:

E.IM.30

Print names of employees and names of their bosses in the alphabetic order according to the names of employees.

SBQL:

for each Emp as e join (e.worksIn.Dept.boss.Emp) as b order by e.name do {

    print(“Employee ” + e.name + “ works for ” + b.name);

    newline();

}

 

E.IM.31

Increase the salary of  Doe by 100 and change its job to engineer (assume there is exactly one Doe).

SBQL:

with Emp where name = “Doe” do {

    sal := sal +100;

    if not exists job then {create “” as job;}

    job := “engineer”;

}

 

Low-level Iterators

The presented above constructs for organizing program loops, including the for each statement, are sufficient to program almost all programming cases which require iterations. However, there are tasks when these capabilities are insufficient. The literature mentions an example of such tasks, namely, merging two sorted collections into one sorted collection. Assume that the collections are named C1 and C2 and the resulting collection is C. The algorithm can be represented in a pseudocode as follows:

C := empty_sequence;

e1 := get_first_element(C1);

e2 := get_first_element(C2);

while e1 <> null or e2 <> null do {

    if e2 = null or e1 < e2 then {

        C := C + e1; //concatenate C with e1

        e1 := get_next_element(C1);

    }

    else {

        C := C + e2; //concatenate C with e2

        e2 := get_next_element(C2);

    }

}

 

The algorithm cannot be coded by two nested for each statements, because a next element to be processed can be taken from the first or from the second collection, according to the result of comparison. For coding such tasks we need functions such as get_first_element, get_next_element and similar. In programming languages such functions are collected into a construct called iterator. An iterator is a data structure that stores an iterator state (e.g. a reference to a processed collection and a reference to a currently processed object) and some (usually small) collection of functions such as getFirst, getNext, getPrior, etc. If an iterator is to be applied to a result of a query, its state must contain this result. Except SQL, no programming language implements so general iterators. Usually iterators traverse stored collections, but generalization of them to collections derived by queries or views seems not to be a challenging task.

Low-level iterators imply a similar problem as for each statements in case when a collection being traversed by a iterator is augmented or reduced in its loop. Such modifications must be implemented with an extreme care, to avoid the situation when some element is to be processed two times or not processed at all.

In SQL such iterators are known as cursors. A big disadvantage of SQL cursors is that their names are global for the application, what makes difficulties e.g. with nesting iterators within functions that can be called from many parts of an application program (in particular, recursive functions). This can also be considered as a sign of impedance mismatch, because SQL interpreter (by definition) ignores the environment stack determining the scope for names[4]. In integrated languages such as SBQL names and states of iterators should follow the stack-based discipline, similarly to names of all other local data structures in the stack-based approach. There are many good patterns of iterators in existing programming languages, e.g. Java, and we do not want to contribute to this state-of-the-art. In this chapter we give no syntax and semantics of iterators, considering them as further development of SBQL. 

Other Issues

There are several other issues related to imperative statements in SBQL. They are well recognized in other object-oriented programming languages and are orthogonal to the stack-based approach that we want to explain on these pages. In ODRA we implemented exceptions taking the syntax and semantics from Java. Recently we have also implemented a prototype version of templates, taking the idea from C++. Implementation of other features are considered.


Last modified: February 23, 2010



[1] In the ODRA system this instruction has the syntax remaining the assignment operator:

lQuery :<< name(rQuery);

The left side determines where a new object must be created and the right side determines the state of the new object.

[2] In SQL the character = is used as an equality and as an assignment in a single update statement. Such an overloading of the operator is obviously error-prone.

[3] Note that the insert clause in SQL inserts new tuples into a table, hence from the conceptual point of view it corresponds to the create instruction explained above. SQL has no an inserting clause as introduced in this section.

[4] For this reason in the DBPL project [MRSS92] the author was forced to organize an own explicit stack of cursors that exactly duplicates the actions of  the environment stack of Modula-2 (the programming language for DBPL in which SQL clauses were implicitly nested).