SBQL - Non-Algebraic Operators

Back to Description of SBA and SBQL.

The essence of the Stack-Based Approach and SBQL is the concept of non-algebraic operators. We distinguish several non-algebraic operators: selection, projection/navigation, dependent/navigational join, quantifiers, ordering and (several kinds of) transitive closures. All non-algebraic operators are binary, i.e. they connect two queries. Despite syntactic similarity to algebraic operators, their semantics cannot be reduced to some algebra similar to the relational algebra or object algebras.

This assertion may look strange and unbelievably for the readers who have been convinced for decades that selection, projection and join are ordinary algebraic operators in the relational algebra. Such a treatment of these operators, however, was possible under severe constraints on their power and by assuming some not quite fair formal trick that affects their semantics. Our conclusion is even stronger: this unfair formal trick has caused that the research on query languages has gone into wrong direction and has severely impeded the progress in the domain.

The trick consists in some mix of a formal theory and its meta-language. Consider the expression of the relational algebra:

σsal>1000( Emp )

It consists of the selection operator σ qualified by the predicate sal > 1000. Strictly speaking, the selection operator is not a single operator, but the infinite family of operators having as many elements as selection predicates. The expression sal > 1000 does not belong to the language of the algebra. This is a meta-language expression determining which selection operator from the infinite family has to be chosen. This expression is informal, is out of mathematics, essentially this is informal comment that is used to explain the topic.

Hence, at least half of the expression above is fake mathematics. The operator σ and Emp are first-class citizens, they are a part of the mathematical theory. The attribute name sal, operator < and constant 1000 are second-class citizens, they are a part of the informal meta-language, essentially a comment. Semantics of the second part cannot be expressed in terms of the relational theory and there is no theory which deals with it. Of course, we know the argument that the meta-language could be formalized too. But in this way we would obtain a new formal system that would be quite different from the relational algebra. (This can be done in many ways and SBA is actually one of them.) In the pure relational algebra even the natural join operator cannot be expressed, because names of attributes are not properties of the theory. Of course, we also know the argument that indices are “so simple” that there is no room for ambiguities. We reject this argument because in mathematics nothing is obvious, everything must come from axioms and every even smallest formal semantic problem is a big problem.

This especially concerns so-called “object-algebras”, where simple indices may take the form as in the above expression:

σNetSal(sal +100)>1000( Emp )

The index takes a complex form, with own semantics and pragmatics, but everything is fake mathematics: indices are comments rather than formulas.

A similar conclusion holds for all the other operators of the relational algebra. The above arguments can be repeated for the relational calculus and approaches based on formal logic. All these theoretical concepts subdivide the semantics of query languages into two worlds: (1) the semantics with strong mathematical treatment, and (2) the semantics which can be explained by informal comments only. If formal semantics of a query language such as SQL (claimed to be based on the relational algebra or relational calculus) concerns only a part of this language, the semantics of the entire query language is informal. Taking the current state of SQL, we can estimate that at most 10% of its functionality can be explained formally by the relational algebra or calculus. The next 90% is outside of any theory.

This subdivision of the concepts into formal and informal we have considered frustrating and unacceptable, especially for more sophisticated data models such as object-oriented and XML-oriented models. Moreover, the subdivision is completely unnecessary. It is caused by some religious attitude to the relational algebra and other lame database theories. It is not a big problem to develop another mathematical theory that consistently covers all the issues. The denotational semantics of programming languages is an excellent example of such a theory.

But actually we have lost our belief that the mathematical method is good for specification of any practical language. We have already presented the fundamental disadvantages of mathematics as a method of specification of semantics of practical artifacts. We follow our formal specification method based on operational semantics or abstract implementation. Our basic assumptions essentially distinguish our approach from the relational algebra and its object-oriented counterparts:

In SBQL no operator is indexed by an informal meta-language expression.

In SBQL each data name occurring in a query has the same semantic treatment.

There is no subdivision on first-class and second class names (operators), as in the relational algebra.

In this way we avoid the subdivision of the SBQL semantics into formal and informal parts. All operators, including all algebraic operators such as + and <, and operators such as selection, projection and join, are on the same level of the semantic description. We also avoid subdivision of names occurring in queries into first class and second class citizens (as in the above relational algebra example). The object relativism principle requires the uniform treatment of object names occurring on different levels of object hierarchy. Moreover, from the very beginning we have assumed that each name occurring in a query is bound to run-time entities according to the same simple stack-based binding mechanism that we have explained previously.

Although the description presented below is simple and intuitive, it requires some attention and imagination from the reader, especially if he/she has little experience with recursive functions and compiler construction. Be patient and persistent, understanding the mechanism that we propose will open for your mind a quite new and very attractive world.

Procedure eval for Non-Algebraic Operators

All non-algebraic operators are defined according to the same semantic pattern. It can be surprising that apparently very different operators, such as selection, projection/navigation, join and quantifiers, have very similar semantics. It is, however, a very positive property allowing for many inferences independent on the kind of operator, including definition of semantics and some methods of query optimization.

Let θ be a symbol denoting a non-algebraic operator, and let   q1 θ q2    be a query with the operator θ. The idea of the semantics is that the query q2 is evaluated for every element e of the bag (or sequence) returned by the query q1. Each evaluation of q2 is performed in a bit different run-time environment. Namely, if q2 is evaluated for e, then the environment stack ENVS is augmented by the result of function nested applied to e. After evaluation of q2 for this e ENVS is reduced to the previous state. Assume that result( q1 ) = bag{ e1, e2, e3 }. Fig.28 presents successive states of ENVS. Query q2 is evaluated three times. The final result of evaluation of q1 θ q2 is a function of the result returned by q1 and all the results returned by q2; the function depends on θ. Fig.28. Schema of evaluation of a query involving a non-algebraic operator

Fig.28 presents sequential processing of the elements of bag{ e1, e2, e3}, but theoretically this can be done in parallel. However, parallel execution leads to hard issues that will not be considered in this description. Informally we can describe this procedure in the following terms:

1.      Evaluate query q1.

2.      For each e Î  result(q1) do the following steps:

·         Calculate nested(e). It returns a set of binders.

·         Push nested(e) as the top section on ENVS.

·         Evaluate query q2 in this new environment.

·         Calculate a partial query result through some function partialResultOfθ (e, result(q2) ); the function depends on θ.

·         Pop (remove) the top section from ENVS.

1. Merge all partial result into the final result. It is done by some function mergePartialResultsθ( partialResult1, partialResult2, ..., partialResultk), which also depends on θ.

If query q1 returns a sequence, the procedure is similar, but the order of partial results is significant and is taken into account by the mergePartialResultsθ function. The non-algebraic operators can also act on individual values. It is treated as a one-element bag, but for some operators (e.g. dot) if the result of the entire query will also be a one-element bag, then it is coerced to its element.

A part of the procedure eval that specifies the formal semantics of non-algebraic operators is presented below. Then, the specification will be explained in detail for particular non-algebraic operators.

procedure eval( q : query )

begin

.......

case q is recognized as q1 θ q2: // the query q consists of a non- algebraic operator θ joining subqueries q1 and q2

begin

partialResults: bag of Result;

partialResult, finalResult, e: Result;

partialResults := Æ; //empty bag

eval( q1 ); //Evaluation of the first query; the result bag is at the top of QRES

for each e belonging to the bag at top( QRES ) do

begin

push( ENVS, nested(e) ); //new section at ENVS

eval( q2 ); //Evaluation of the second query; the result is at the top of QRES

partialResult := partialResultOfθ (e, top(QRES) );

partialResults := partialResults È { partialResult }; // new partialResult included to the partialResults bag

pop( QRES ); //removing the result(q2) from QRES

pop( ENVS ); //removing the top section with nested(e) from ENVS

end;

finalResult := mergePartialResultsθ( partialResults ); // forming the final result

pop( QRES ); //removing the result(q1) from QRES

push( QRES, finalResult); //the final result of the entire query is pushed at the top of QRES

end

.......

end

As can be deduced from this procedure, the state of ENVS is changing during evaluation, but the final state is the same as the initial state. QRES is also used, but finally it is augmented by one section keeping the entire result of q1 θ q2.

Below we present descriptions of particular non-algebraic operators, including description of corresponding functions partialResultOfθ and mergePartialResultsθ. We also present some minimal changes to the procedure that may be required if the first query will return a sequence or an individual element. In this chapter we do not present transitive closure operators - they will be presented later.

Selection

Syntax:  query ::= q1 where q2

Typing constraint: q2 must return true or false.

Function partialResultOfwhere : for given e it returns bag{e} if q2 returns true, and bag{} (empty bag) if q2 returns false.

Function mergePartialResultswhere : it returns the bag-sum of partial results.

For sequences: partial results keep order and the final result keeps the order too.

For individual elements: an element x is coerced to the bag{x}. The final result is thus bag{x} if q2 returns true, and bag{} if q2 returns false. Other operators can coerce final bag{x} back to x.

Examples (c.f. Fig.2 and Fig.3):

 Get references to employees earning more than 1000 Emp where (sal > 1000) Get a reference to the employee named “Poe”; the reference should be named x (Emp as x) where ((x.name) = “Poe”) Get a reference to the Trade department located in Paris (Dept where (dname = “Trade”))           where (location contains “Paris”) Get reference to the Trade department located in Paris (another formulation) Dept where ((dname = “Trade”)            and (location contains “Paris”))

More examples will be presented after introducing next operators. In further examples we will skip a lot of parentheses assuming some obvious (and a bit random) precedence rules of operators. In the concrete syntax these precedence rules must be of course formalized.

Fig.29 shows steps of the evaluation of a simple query (cf. Fig.3). Magenta boxes show the state of the environment stack. Grey boxes present result of particular (sub)queries. Fig.29. Steps of evaluation of the query Emp where sal > 1000

Syntax:  query ::= q1 . q2

Function partialResultOf. : for given e it returns top(QRES), i.e. the result(q2).

Function mergePartialResult. : it returns the bag-sum of partial results. Partial results cannot be sequences.

For sequences: partial results must be sequences. They are concatenated in the proper order into the final result.

For individual elements: if q2 returns an individual element too, then the final result is result(q2). Otherwise result(q1) is coerced to a bag and the situation is as previous.

Examples (c.f. Fig.2 and Fig.3):

 Get references to salaries of all employees Emp . sal Get references to names of employees earning more than 1000 (Emp where (sal > 1000)) . name Get references to names of employees working in the Trade department (((Dept where (dname = “Trade”)). employs).Emp).name Get references to names of employees working in the Trade department (path expression) (Dept where (dname = “Trade”)). employs. Emp.name Get references to worksIn pointers for employees earning more than 1000 (Emp where (sal > 1000)) . worksIn Get references to Dept objects connected by worksIn pointers for employees earning more than 1000 ((Emp where (sal > 1000)) . worksIn).Dept

Third and fourth queries show that path expressions are not a special construct in SBQL. A path expression is a combination of several binary dot operators. In the fourth example we have simply assumed the syntactic convention that dot operators are evaluated from left to right. Because dot operators can be freely combined with other operators, our path expressions are the most universal and orthogonal to other operators as one can imagine.

Note the difference between the two last queries. The first one identifies pointers worksIn, and the second one identifies objects that the pointers are pointing to. In ODMG OQL there is no possibility to express such a distinction. However, it is important for two reasons:

(1) Conceptual modeling: a path expression clearly shows for the programmer where the navigation ends up;

(2) Updating operations: sometimes we need to update pointers, and sometimes we need to update the objects that the pointers point to. No such distinction in OQL causes that one of these updating kinds cannot be expressed through the query language.

In ODMG OQL path expressions are limited by the assertion that the query before a dot (i.e. q1 in our case) must return an individual element rather than a collection. If it returns a collection, then instead of   q1.q2   one shall write   select q2 from q1   . In SBQL there is no such an assumption: the dot operator can act on individual elements and on collections as well. This feature of OQL looks unreasonable and causes our suspicion that the ODMG guys tend to confuse syntax and semantics.

Syntax:  query ::= q1 join q2

Function partialResultOfjoin : for given e and result(q2) = bag{r1,r2,..., rk} the function returns bag{ struct{e,r1}, struct{e,r2}, ..., struct{e,rk} }. The bag{r1,r2,...,rk} is stored at the top of QRES. Results r1,r2,...,rk can be simple elements or structures. Note that we have assumed no directly nested structures, for instance, struct{a, struct{b, c}, d} is equivalent to struct{a, b, c, d}. Structures, however, can be nested assuming other query language constructors, for instance,  struct{a, n(struct{b, c}), d} is not equivalent to struct{a, b, c, d}.

Function mergePartialResultjoin : it returns the bag-sum of partial results. Partial results cannot be sequences.

For sequences: partial results must be sequences of structures. They are concatenated in the proper order into the final result.

For individual elements: if q1 and q2 returns individual elements, then the final result is struct{e, result(q2)}. Otherwise result(q1) is coerced to bag and the situation is as previous.

Examples (c.f. Fig.2 and Fig.3):

 Get references to all employees with references to their salaries Emp join sal Get references to all employees with references to their worksIn pointer objects Emp join worksIn Get references to all employees with references to their departments Emp join (worksIn.Dept) Get references to all employees (named e) with references to their departments (named d) (Emp as e) join ((e.worksIn.Dept) as d) Assume that the Dept objects contain boss pointer objects leading to employees being managers of departments. Get references to names of employees working in Roma departments, together with references to names of their managers ((Emp as e) join ((e.worksIn.Dept) as d)). (e.name,  d.boss.Emp.name)

Quantifiers

Syntax: for quantifiers we use the traditional prefix syntax:

query ::= \$ q1 q2 | " q1 q2

Sometimes we will also use our standard infix syntax:

query ::= q1 \$ q2 | q1 " q2

In implementation symbols \$ and " should be substituted by keywords, so we assume also the syntax:

Typing constraint: q2 must return true or false.

Function partialResultOf\$  and partialResultOf" :  for given e it returns top(QRES), i.e. the result(q2).

Function mergePartialResult\$ : it returns true if the bag of partial results contains at least one result true. Otherwise it returns false.

Function mergePartialResult" : it returns true if the bag of partial results is empty or if all partial results are true. Otherwise it returns false.

For sequences: as for bags.

For individual elements: result(q1) is coerced to a bag and the situation is as previous.

Examples (c.f. Fig.2 and Fig.3):

 Is it true that some employee earns less than 1000? \$Emp (sal < 1000) Emp \$ (sal < 1000) Get departments where all the employees earn more than 1000 Dept where ("(employs.Emp)(sal > 1000)) Dept where ((employs.Emp) " (sal > 1000)) Is it true that there are employees with no address? \$ (Emp as e ) count(e.address) = 0 Get departments where all employees have an address Dept where (" (employs.Emp) exists(address)) Assume that the Dept objects contain boss pointer objects leading to employees being managers of departments. Get departments with employees earning more than their boss. (Dept as d) where \$((d.employs.Emp) as e) (e.sal > d.boss.Emp.sal ) Is it true that there is a department where all the employees for sure live in Paris? \$ Dept ( " (employs.Emp) ( \$ address (city = “Paris”))) Is it true that there is a department where all the employees live in Paris or their address is unknown? \$ Dept ( " (employs.Emp) (" address (city = “Paris”)))

More detailed examples of SBQL queries and comparisons with SQL and OQL can be found in the next section.