©Copyright by Kazimierz Subieta.

SBQL - Algebraic Operators

by Kazimierz Subieta (February 2006)

Back to Description of SBA and SBQL.

In the description of the query evaluation procedure eval we have presented the operational semantics of typical algebraic operators that we would like to introduce in SBQL. We do not fix which operators belong to SBQL and which do not belong. We assume that SBQL contains all operators which can be necessary to illustrate some topics by examples and all operators that developers of some SBQL interpreter wish to introduce. There are myriads of algebraic operators, thus fixing some particular subset of them is not reasonable. All of them can be defined and implemented on the same semantic and implementation basis. Moreover, after implementing a sufficiently rich set of operators, next ones can be implemented in quarters or hours. Determining the set of operators that belong to SBQL is of curse important if some organization wish to have SBQL as a standard. So far such a goal looks as very distant (if real at all).

There is a subtle difference between operators that are build in the language and operators that are added on. Build in operators are the property of a particular interpreter. Added on operators are possible if an interpreter accepts external libraries (of procedures, functions or classes) in some general way, e.g. as dynamically linked functions a la MS Windows dll-s. Because eventually we extend SBQL to fully-fledged programming language, it is also possible to have some added on functionalities written in SBQL (in particular, procedures, functions, classes/methods and views). The operators can perform complex computations, for instance, statistical or data mining functions. In general, we are not interested in how complex is the implementation of a particular build in or added on operator.

To simplify the explanation, besides the description of algebraic operators in the form of the eval procedure we will use a more abstract (actually, denotational) view on semantics by introducing the function result. We consciously take the same function name as the name of the domain Result because of the similarity of the concepts. The function result abstracts from the QRES stack. It returns the final result of the query being its argument. In particular, we can take the following semantic rules:

·         If is a symbol denoting a unary algebraic operator, then result( q1 ) = ( result( q1 )). As before, is the operator denoted by the symbol .

·         If is a symbol denoting a binary algebraic operator, then result( q1 q2 ) = ( result( q1 ), result( q2 ) ).

·         And so on.

Note that the function result is implicitly parameterized by the state, i.e. by the object store and ENVS. At this level of abstraction we need not to talk about the stack QRES at all. However, this is only some didactic facility; formal semantics is to be given by the procedure eval.

In general, we assume that all algebraic operators have no side effects, i.e. they are pure functions. Side effects may concern, for instance, updating operations in a database, some visual effects on the user screen, some writes to files, etc. Side effects of operators occurring in query languages usually undermine query optimization and frequently are difficult to understand by the programmers. Nevertheless, operators with side effects could be necessary for many applications.  However, their use must be disciplined. This will be possible in imperative programming extensions of SBQL.

Operators and Comparisons for Primitive Types

·         We assume generic and overloaded comparison operator = (and its complement ≠) for all the primitive types that we introduce. The operators have obvious meaning. Examples: 2 = 2, xy,  sal ≠ 2000, name = “Doe”, etc. We assume that operators = and ≠ can also act on complex values (e.g. structures, sets, collections), however with the caution that their meaning must be univocally defined. Operators may also act on object references, but the comparison is shallow, e.g. i1 = i2 returns true if i1 and i2 refer to the same object. We do not assume any form of a deep comparison, i.e. comparison of objects that the argument references refer to. (Sometimes, however, deep comparisons may be enforced by the strong typing system and automatic dereferences, but this is another story that we explain elsewhere.) The operators are undefined for arguments of different types, for instance (real) 3.14 = (int) 3 is illegal. Such comparisons, however, will be possible due to automatic coercions, which will be introduced a bit later.

·         For primitive types having natural linear ordering we introduce generic and overloaded comparison operators <, ≤, >, ≥. These operators we can apply to integer numbers, real numbers (float), strings (assuming some predefined lexicographical order, perhaps corresponding to the current character coding table), dates and times (assuming the linear time axis), etc. Operators are undefined (not allowed) for arguments of different atomic types and are undefined for complex values and identifiers. Examples: 2 < 3, xy,  sal ≥ 2000, name > “Doe”, etc.

·         For primitive numerical types we introduce typical arithmetic operators +, -, *, /. Operator - is unary and binary. Their meaning is typical.

·         We assume several string-oriented operators and comparisons, such as the concatenation of strings (the overloaded + operator), is_a_substring and is_a_superstring. We also provide parameterized string operators, such as taking a substring from i-th to j-th character (perhaps in the C/C++ convention), some functions, such as changing strings to capital letters, counting the size of a string, etc.

·         We assume several build in functions acting on numerical values, such as the integer part of a real number, rounding a real to integer, square root, logarithm, power, etc. The syntax and meaning e.g. as in Pascal.

·         We also provide some functions for dates, for instance, change the date into the number of days since 1900.01.01 and v/v, changing data formats, changing the time to the number of mili-seconds, etc.

The above operators and comparisons cause some temptation of the developers of query language interpreters to extend their usual meaning to arguments being collections. For instance, one wants to define result( bag(3, 7) + bag(1, 4, 5)) = bag{ 4, 7, 8, 8, 11, 12 }. This philosophy resulted in SQL in special comparison operators, such as =any and =all. Taking such extensions of popular operators with commonly understood semantics is however rather dangerous. In many cases the programmer may have problems with understanding what they actually mean. They also weaken the strong typing system.

Thus we discourage such extensions of their semantics. Collections are to be processed by other operators, in particular, quantifiers. The commonly understood operators, such as =, ≠, <, ≤, >, ≥, +, -, *, /, should not extend their meaning. Using of them should be disallowed for arguments being collections of primitive types. SBQL provides options that make it possible to achieve the desired results without such extensions of the semantics of these operators.

Aggregate Functions, Removing Duplicates

Aggregate functions act on arguments being collections (bags or sequences) of integer or real numbers. For empty collections the functions are undefined (not allowed). We assume the typical set of numeric aggregate functions known from SQL:

·         sum - the sum of argument numbers; for instance, result( sum( bag(3, 6, 7) ) ) = 16,  sum( Emp.sal ) calculates the total sum of employees’ salaries.

·         avg - the arithmetic average of argument numbers, for instance, avg( bag(3, 6, 7) ) returns 5.333...,  avg( Emp.sal ) calculates the average of employees’ salaries.

·         max - the maximal number from the argument collection.

·         min - the minimal number from the argument collection.

Several next similar functions can be useful for some applications; for instance, median, geometric average, variance, etc.

This set is usually augmented by:

·         count - the size on any collection; the function is defined for any kind of a collection, any type of its elements and for empty collections.

·         exists - test for existence; it returns true if the argument collection is not empty and false if it is empty.

·         distinct - the function removing duplicates from the argument collection, for instance, result( distinct( bag(3, 6, 3, 3, 6, 7) ) ) = bag{3, 6, 7}. For sequences distinct returns the sequence where each next repetition of an element is removed, for instance, result( distinct( sequence(7, 7, 3, 6, 3, 3, 6) ) ) = sequence{7, 3, 6}.

In SQL functions sum, avg, max, min and count have two semantic faces. If they are not under the scope of the group by clause, they have the usual semantics, as presented above. If they are under the scope, they calculate their values for groups determined by the group by clause and in this semantic sense can be used within the select and having clauses. Moreover, SQL assumes quite unusual (apparently more friendly) syntax, where names of these functions are under the select clause; for instance, to calculate the average salary of employees one has to write    select avg( sal ) from Emp    rather than more logical formulation    avg( select sal from Emp )   . (This syntax is also an option in ODMG OQL.)

In SBA and SBQL we reject such doubtful syntactic and semantic inventions. No function from the above list will have two different semantic faces. The regular syntax of their use will not be violated. Moreover, we do not introduce the group by operator at all, for its severe disadvantages, such as non-orthogonality, semantic reefs that it implies and implementation-oriented flavor that makes difficulty for query optimization (we discuss this later). All the pragmatic goals that can be achieved through the group by clauses can be achieved in SBQL through other operators, in easier, more logical and more consistent way. We will show that on examples. Lack of the group by clause causes that specific problems with a special conditional clause (having), non-orthogonality, semantic reefs and query optimization no more exist.

In SQL there is also semantic schizophrenia concerning null values. Despite big words on the meaning of null values and special constructs for serving them in queries, aggregate functions sum, avg, max, min ignore them totally. It is not quite clear how they are treated by count, exists and distinct. The group by clause collects all null values in a single group, what makes quite a pragmatically ill situation for aggregate functions. As before, we avoid this schizophrenia in SBQL simply by assumption that our store models and query languages do not introduce null values at all. Instead of null values we propose to deal with options, i.e. collections having zero or one element (in other words, cardinalities [0..1]). We discuss this issue in the section devoted to processing irregular data.

Operators and Comparisons on Collections

These operators come from the set theory, which introduce such operators as union of sets, intersection of  sets, difference of sets, cartesian product, set containment (, ), membership of an element in a set (operator ), and so on. Actually, we have no sets as data structures, but bags and sequences, hence we have to adopt these set-theoretic operators to a bit different situation. We have already introduced some of these operators, in particular:

·         struct( A, B) - the constructor of structures, which in the SQL jargon is known as cartesian product. If query A returns bag{a1, a2, ...} and query B returns bag{b1, b2, ...}, then the query struct( A, B) returns bag{ struct{ a1, b1}, struct{ a1, b2},..., struct{ a2, b1}, struct{ a2, b2}, ...}. If A returns an individual element a, and B returns an individual element b, then struct( A, B) returns an individual element struct{a, b}. In all other cases individual elements are converted to one-element bags. Our syntax allows us to drop struct and to write simply (A, B). We assume that struct{ a, struct{b, c} } is equivalent to struct{ struct{ a, b}, c } } and is equivalent to struct{ a, b, c }. We also assume that a structure with one element is equivalent to this element: struct{ a } = a. Similar properties are then induced on the level of the query language semantics: in particular, query struct( struct( A, B), C ) is equivalent to query  struct(A, struct(B, C )) and is equivalent to query struct( A, B, C ). If A returns an individual element, then struct(A) is equivalent to A.  As  follows from the above, struct(A, B, C, D, E,...) is a shorthand for composition of several binary struct operators: struct(A, struct(B, struct( C, struct( D, struct( E,...))))). If A or B returns an empty bag, then struct( A, B) returns an empty bag. It is quite unclear what struct( A, B) has to return if A and/or B return sequences; hence we propose to leave such cases undefined (i.e., to forbid them). Unlike SQL, we do not limit the cases when our “cartesian product” can occur - it may occur in any place of a query if it makes sense for the programmer and the strong typing is not violated.

·         bag( A, B) - the constructor of bags, which in the SQL jargon is known as union of tables (relations). If query A returns bag{a1, a2, ...} and query B returns bag{b1, b2, ...}, then the query bag( A, B) returns bag{ a1, a2, ..., b1, b2, ...}. If A or B returns an individual element, it is treated as a one-element bag. We assume that bag{ a, bag{b, c} } is equivalent to bag{ bag{ a, b}, c } } and is equivalent to bag{ a, b, c }. We also assume that a bag with one element is equivalent to this element: bag{ a } = a; our strong typing system will be prepared to handle such cases correctly. Similar properties are then induced on the level of the query language semantics: in particular, query bag(bag( A, B), C ) is equivalent to queries  bag(A, bag(B, C )) and bag( A, B, C ). If A returns an individual element, then bag(A) is equivalent to A.  As before, bag(A, B, C, D, E,...) is a shorthand for composition of several binary bag operators: bag(A, bag(B, bag( C, bag( D, bag( E,...))))). If A or B returns an empty bag, then it does not influence the result bag. It is quite unclear what bag( A, B) has to return if A and/or B return sequences; hence (as for struct) we propose to leave such cases undefined (to forbid them). As for struct, the operator may occur in any place of a query if it makes sense for the programmer and the strong typing is not violated.

·         sequence( A, B) - the constructor of sequences, known as concatenation of sequences. If query A returns sequence{a1, a2, ...} and query B returns sequence{b1, b2, ...}, then result( sequence( A, B) ) = sequence{ a1, a2, ..., b1, b2, ...}. If A or B returns an individual element, it is treated as a one-element sequence. We assume that sequence{ a, sequence{b, c} } is equivalent to sequence{ sequence{ a, b}, c } } and to sequence{ a, b, c }. We also assume that a sequence with one element is equivalent to this element: sequence{ a } = a; as for bags, our strong typing system will be prepared to such cases. Similar properties are then induced on the level of the query language semantics: in particular, query sequence( sequence( A, B), C ) is equivalent to queries  sequence( A, sequence( B, C ) ) and sequence( A, B, C ). If A returns an individual element, then sequence(A) is equivalent to A.  As  before, sequence(A, B, C, D, E,...) is a shorthand for composition of several binary sequence operators: sequence( A, sequence( B, sequence( C, sequence( D, sequence( E,...))))). If A or B returns an empty sequence, then it does not influence the result sequence. It is quite unclear what sequence( A, B) has to return if A and/or B return bags; hence (as for struct and bag) we propose to leave such cases undefined (to forbid them). As for struct and bag, the operator may occur in any place of a query if it makes sense for the programmer and the strong typing is not violated.

Examples of application of struct, bag and sequence are the following:

(Emp, Dept) where name = dname          (the query is equivalent to    struct(Emp, Dept) where name = dname   )

(Emp where name = “Poe”).( sal, worksIn)      (the query is equivalent to    (Emp where name = “Poe”).struct(sal, worksIn)   )

struct( name(“Kim”), sal(3000), worksIn( Dept where dname = “Trade”) )

bag( (Emp where sal < 1000) , ((Dept where dname = “Trade”).employs.Emp) )

sequence(“one”, “two”, “three”, “four”)

Besides these operators, we can consider to introduce next ones:

·         in - bag or sequence inclusion operator. A query A in B returns true if each element of the bag that is returned by A can be found in the bag returned by B. For sequences, the query A in B returns true if the sequence returned by A is a sub-sequence of the sequence returned by B. It is quite unclear what A in B has to return if A returns a bag and B returns a sequence (or v/v); hence we propose to leave such cases undefined (i.e., to forbid them).  If A returns an empty bag or an empty sequence, the query A in B returns true. If A returns an individual element, it is treated as a one element bag or one element sequence (depending on what returns B).

·         contains -  a reverse bag or sequence inclusion operator; A contains B is equivalent to B in A.

·         intersect - makes a sense for bags only. A query A intersect B returns a bag consisting of elements that occur both in the bag retuned by A and in the bag returned by B. Because a bag can contain the same element x many times, the result of A intersect B contains the minimal number of x from the numbers of occurrences of x in A and in B. If A or B returns an individual element, it is treated as one-element bag.

·         subtract - difference of bags. A query A subtract B returns a bag of elements that occur in the bag returned A and do not occur in the bag returned by B. If element x occurs in the result of  A k1 times, and in the result of B k2 times, then the result A subtract B contains x if k1 - k2 > 0 and the number of occurrences of x in the result is k1 - k2.

·         i-th element of a sequence. We denote this operator query1[query2], where query1 returns a sequence, and query2 returns an integer number. Unlike C, C++, Java and ODMG, we propose to return to the oldest tradition where the first element of a sequence is indexed by the number 1 (and not by 0). For instance, result( sequence(7, 6, 3)[2] ) = 6. Note that the operator makes no sense for bags.

·         Elements of a sequence from i-th to j-th. Similarly as before, we denote this operator query1[query2.. query3], where query1 returns a sequence, query2 returns an lower index and query3 returns an upper index of the sequence elements that are taken as the result. For instance, result( sequence(7, 6, 3,11, 5, 6, 7, 34, 3)[4..7] ) = sequence{11, 5, 6, 7}

There are some less important operators, such as proper sub-bags and sub-sequences, continuous sub-sequences, cutting heads and tails of sequences, i-th element from the end of a sequence, taking a random single element from a bag, etc. We can introduce them if there will be some significant applications.

Coercions and Dereferences

Coercions are functions that change the types and representation of values. In a lot of cases coercions are implicit, to avoid annoying style of programming. For instance, if x is an integer number and y is a real number, then in the query x + y the value returned by x is automatically coerced to a real number. Similarly, in the query Emp.(name + “ earns “ + sal) the operator + is recognized as concatenation of strings, hence integer values returned by sal are implicitly coerced to strings. The presence of implicit coercions is sometimes called ad hoc polymorphism.

Another kind of implicit coercions concerns changing bags or sequences into single elements, and v/v. For instance, in SQL a select clause can occur within a where clause, but in this case the result of the select clause is automatically coerced to an individual value. Such coercions are typical for SBQL, which unifies a structure, bag or sequence having one element x and this element x. For instance, one can use a query (get employees earning more than Kim):

Emp where sal > ((Emp where name = “Kim”).sal)

The query assumes implicit coercion of the bag returned by    (Emp where name = “Kim”).sal    into a single value (of the Kim’s salary). If this is impossible, because the company does not employ a person named Kim, or the company employs more than one person named Kim, the dynamic type check will return a typing error (exception).

A consistent and universal handling of implicit coercions requires a static strong typing system, which during the type checks of a query recognizes the necessity of a coercion, inserts a proper explicit coercion operator into the query syntactic tree and instantly augments the tree with some parts performing dynamic type checks. In this way the static (compile-time) strong typing system delegates corresponding type checks to run-time. For instance (see Fig.2), considering the query    Dept where location = “Paris   (where location returns a bag), the syntactic tree of this query is augmented by a part performing the check if indeed location returns a single value; this part returns a type error if the bag contains two or more elements. We explain this issue in detail when we will come to the specification of the SBQL typing system.

Another commonly assumed kind of implicit coercions are implicit dereferences. Assuming an object <i, n, v>, the dereference of i returns v. For instance (see Fig.2), for the query    Emp where name = “Poe”    the subquery name returns a reference; it is automatically dereferenced to a value of the object pointed to by this reference. This is the job of the static strong type checker, which must properly change the syntactic tree of a query, see Fig.27. Usually, the implicit dereference concerns values of primitive types. However, dereferences of identifiers of complex objects can be also consistently defined; we explain this issue later.

For the programmers convenience coercions and dereferences can be explicitly introduced as algebraic operators. The typical syntactic convention for that is known from languages such as C, C++, Java, etc. as cast. We use this convention for run-time explicit conversions of types. Casts will be written in the syntax

query ::= (typeName) query

Note however that in contrast to C, where arbitrary casts are allowed (and this is rather a compile-time facility), we assume that our cast are run-time conversions, thus can be allowed or disallowed. Sometimes they may cause compile time-typing errors. For instance (int)Emp is disallowed and will be discovered as a type error. In some cases coercions may cause a run-time error. For instance, the query (int) X, where X is of type string, causes conversion of the string stored at X into integer, but this is of course not always possible.

We also use casts with collection constructors. For instance,    (sequence) Emp    changes the bag returned by Emp into a sequence with an arbitrary order of elements. Similarly, we can use    (bag) query   to convert a sequence into a bag; the operator has no effect if the query returns bag.

In the following (in the AS1, AS2 and AS3 models) such casts will also be used to convert an object into an object of a more specific or a more general class. For instance,   (Student) (Emp where sal > 2000)    converts references to objects of employees earning more than 2000 into references to Student objects, but this conversion rejects such Emp objects that are not at the same time Student objects.

In some cases there is also a need for explicit dereference operators. This will be possible by the operator deref that takes a single references or a collection of references and then returns a corresponding value or a collection of values. For instance (see Fig.2):

result( Emp.name ) = bag{i2, i6, i10}

result( deref(Emp.name) ) = bag{ “Doe”, “Poe”, “Lee” }

Conditional Queries

The semantics of conditional queries is rather obvious:

result( if q1 then q2 ) = if result(q1) = true then result(q2) else bag{}

result( if q1 then q2 else q3) = if result(q1) = true then result(q2) else result(q3)


if count(Emp) < 10 then Emp.name else (bag)((sequence)Emp.name)[1..10])

As in Oracle, we can also consider a more general case of conditional queries that take the form of a switch operator.

Defining Auxiliary Names

Auxiliary names appear as a property of query languages in connection with the relational calculus, considered a mathematical basis for relational query languages. The relational calculus introduced such names as “variables” iterating over relations (tuple relational calculus) or over columns (domain relational calculus). The variables can be then used in selection conditions, in a clause forming the output from a query, etc. Similar variables occur in predicate calculus and in lambda calculus. QUEL, one of the first relational query languages, makes the use of such “tuple variables” obligatory. Query-By-Example is considered a language based on domain relational calculus.

SQL introduces auxiliary names, but their use is not obligatory. They may appear in a from clause after names of relations. Sometimes they are called “synonyms” of relation names, but this is improper association, because in where clauses and in select clauses they behave just as variables ranging over relations. In other terminology they are called “correlation variables” or “correlation names”. For instance, assume the relational schema (worksIn is a foreign key for d#, boss is a foreign key for e#):

Emp(e#, name, job, sal, worksIn)

Dept(d#, dname, location, boss)

We can formulate the request “get designers earning more than their bosses” as the following SQL query:

select x.name from Emp x, Dept y, Emp z

where x.job = ‘designer’ and x.worksIn = y.d#

and y.boss = z.e# and x.sal > z.sal

This query defines three correlation names x, y and z. The use of x and z cannot be avoided because both refer to the same table Emp. However, the use of such variables in SQL is not obligatory and one can formulate the same query without the variable y:

select x.name from Emp x, Dept, Emp z

where x.job = ‘designer’ and x.worksIn = Dept.d#

and Dept.boss = z.e# and x.sal > z.sal

Sometimes this is explained by the rule that each relation named R has by definition the correlation name named R.

This idea of auxiliary naming in SQL seems to be quite semantically clear. However, in other query languages the situation is not obvious. For instance, such variables are necessary in quantified predicates as “variables bound by quantifiers”, in for each iterators as a variable iterating over some calculated domain, in group by clauses, in order by clauses, etc. Many contexts when such “variables” may appear and complex data structures that are to be served by such variables cause the necessity to explain their semantics on the more general ground that is done in the relational calculus.

An example of difficulties with correct definition of the semantics of such auxiliary names is supplied by the ODMG standard. Consider the following OQL query, p.104 of [ODMG00]:

select * from Students as x, x.takes as y, y.taught_by as z

where z.rank = "full professor"

The query defines three names x, y, z, considered “iteration variables”. According to the scope rules (p. 112), their scope is limited to this query (they have no meaning outside it). But it appears (p. 105) that the type returned by this query is:

bag<struct (x:Students, y:Section, z:Professor)>

The semantic qualification of these names is changed: instead of being “iteration variables” they become structure field labels. Because the output of the query can be used in another (outer) query, the names can be used outside the query where they have been defined (which is inconsistent with the assumed scope rules).

This example rises very fundamental questions:

(1)   Which formal model makes it possible to change iteration variables into structure field labels? Clearly, the relational calculus does not provide such situations.

(2)   What is the real scope for x, y, z?

(3)   How the semantics can be correctly formalized? How it can be generally and consistently implemented?

(4)   How this situation can be correctly statically typed?

The ODMG OQL grammar introduces at least 7 different contexts where such auxiliary naming can be used. Till now, the formal semantics of all these contexts is non-existent (explained by toy examples rather than by formal specification). An attempt to define the semantics of one of these contexts (on the level of static typing only) obviously failed, as shown above.

Indeed, SBQL is the first query language where the semantics of such auxiliary naming is generally and correctly defined. The semantic definition is very simple, but has no precedents in any mathematical theory, in programming languages and in former query languages. This idea was implemented in the Loqis system in 1988, at least five year earlier than the ODMG proposed the first version of the object-oriented database standard. Our solution of the problem requires the notion of binder (introduced previously) that occurs only in SBA and SBQL. The semantics covers static strong typing (as will be shown later) and is fully consistent with the introduced two stack apparatus. Moreover, it makes no special situation for query optimization methods.

In our attempts to formalize the semantics of auxiliary naming we have rejected the relational calculus as not sufficiently precise for all the contexts where such naming can occur. Our idea was to integrate smoothly auxiliary naming with the stack-based idea. The lighthouse that we would like to follow was the following:

Each name occurring in a query or in a program must be subordinated to the same scoping and binding rules that are accomplished by the environment stack mechanism. This also concerns all auxiliary names defined inside queries.

After taking this tenet, the questions that we would like to solve were the following:

·         How definitions of auxiliary names have to be internally represented?

·         Where the definitions have to be stored?

·         How the definitions have to be used when a particular auxiliary name occurring in a query has to be bound?

·         When the definition of an auxiliary name has to be removed, assuming it is no more valid?

There is a long history of several ideas that we tried to follow. In the Linda system we have assumed that the definition of an auxiliary name in a query is equivalent to the declaration of a local programming variable. Unfortunately, this was inconsistent; in particular, makes no answer to the last question.

The answer for all the above questions was invented in 1988, during the implementation of the Loqis system. The solution appears to be simple, universal (covering all the cases of auxiliary naming), very easy to implement, and free from disadvantages. The solution can be very surprising for all the fans of the relational calculus, because it totally rejects the concept of “variable”.

Defining an auxiliary name within a query is a unary algebraic operator parameterized by the name.

The operator sticks the name with each element of its argument (i.e. it makes binders).

Following OQL we assume the syntax:

query ::= query as name

As usual in SBQL, such a query has the semantics independent from other operators. It is as follows:

·         Let query q returns bag{a1, a2, a3,...}.          Then, the query   q as n     returns    bag{n(a1),  n(a2), n(a3),...}.

·         Let query q returns sequence{a1, a2, a3,...}. Then, the query   q as n     returns    sequence{n(a1),  n(a2), n(a3),...}.

·         Let query q returns an individual element a. Then, the query   q as n     returns    n(a).

As previously, the operator may occur in any place of a query if it makes sense for the programmer and the strong typing is not violated. In particular, the operator can be nested. For instance, if q returns an individual element a, then   (q as n1) as n2    returns a nested binder    n2(n1(a))   . As will be shown, nesting of this operator is important for creating complex values (nested structures). There is no constraint concerning the argument query q. For instance,   2 as two    is a legal query that returns the binder two(2).

We will show in the following that this simple operator in connection with other SBQL operators presents a very powerful and universal facility to deal with almost all the contexts when auxiliary names for queries or sub-queries are needed. This concerns the following cases:

  • “Iteration variables” (tuple, domain, collection “variables”) that occur e.g. in a from clause of SQL and OQL; for instance    Dept as d   . The programmer can use such a variable in any place where he/she wants to; unlike OQL and SQL, this is not limited to any construct of SBQL. In particular, the construct can be used to define dependent joins as in OQL, for instance, below is an SBQL query that is the equivalent of the above OQL query:

·         (Student as x join (x.takes.Lecture) as y join ( y.taught_by.Professor) as z) where z.rank = "full professor"

  • In the query we have skipped some parentheses assuming that operator as is stronger than join, dot is stronger than where and =, and operators join are executed from left to right.
  • “Variables bound by quantifiers”: as we will see, SBQL does not force to use such variables. If necessary, such a “variable” can be established by the operator as; for instance   Emp as e (e.sal > 10000)   .
  • Labels of elements of a structure(s) created by a query, for instance,    Emp. ( name as n, sal as s )   . Such application of the operator can be used to construct a nested complex value,. For instance, the query  
  • ( ”Lee” as name, 900 as sal, (“Romeas city, “Boogie” as street, 13 as house) as address  ) as Emp
  • returns the nested binder    Emp( struct{ name(“Lee”), sal(900), address( struct{city(“Rome”), street(“Boogie”), house(13) } ) } )     .
  • “Cursor” in for each statements. As for quantifiers, such a cursor is not obligatory in SBQL (as will be shown later). If necessary, such a “cursor” can be established by the operator as; for instance,   for each Emp as e do e.sal := e.sal +100;    .
  • Defining new attributes for virtual views; the examples will be shown later, when we will come to the theme.
  • Some other SBQL contexts that so far have no counterparts in other query languages; for instance, overloading views.

One of the SBQL contexts of auxiliary naming is not covered by the operator as. Thus we have introduced in SBQL another naming operator that we denote group as. As before, group as is a unary algebraic operator parameterized by a name. The definition of the group as operator is even simpler that the definition of the as operator.

We assume the syntax:

query ::= query group as name

As usual in SBQL, such a query has the semantics independent from other operators. Semantics is the following:

result(  q group as n  ) = n(result( q))

The query   q group as n    returns a single binder named n, which wraps the entire result of the query q. For instance, if result( Emp ) = bag{i1, i5, i9} then result( Emp group as e ) = n( bag{i1, i5, i9} ). The operator is necessary to define complex values that include named collections treated as single elements. It has also some meaning in conceptual modeling of queries, as it allows one to subdivide a complex query into several simpler queries (in a way that is different from the define clause of OQL). There is come conceptual intersection with the group by operator of SQL and OQL. In general, however, pragmatic contexts (practical applications) of using the group by operator and our group as operator are totally different. As we noted before, in SBQL we do not introduce the group by operator because it is not necessary and causes far more problems than it is worth. An example of the use:

(Emp where sal > 4000) group as FatGuys .

( (FatGuys where job = “programmer”) group as FatProgs , (FatGuys where job = “manager”) group as FatMgrs ).

bag(FatProgs.( “Programmer ” + name + “ is well paid”), FatMgrs.( “Manager ” + name + “ has good salary”) )

Last modified: December 31, 2007