SBQL - Non-Algebraic Operators
Back to Description of SBA and SBQL.
The essence of the Stack-Based Approach and
SBQL is the concept of non-algebraic operators. We distinguish several
non-algebraic operators: selection, projection/navigation,
dependent/navigational join, quantifiers, ordering and (several kinds of)
transitive closures. All non-algebraic operators are binary, i.e. they connect
two queries. Despite syntactic similarity to algebraic operators, their
semantics cannot be reduced to some algebra similar to the relational algebra
or object algebras.
This assertion may look strange and
unbelievably for the readers who have been convinced for decades that
selection, projection and join are ordinary algebraic operators in the
relational algebra. Such a treatment of these operators, however, was possible
under severe constraints on their power and by assuming some not quite fair
formal trick that affects their semantics. Our conclusion is even stronger:
this unfair formal trick has caused that the research on query languages has
gone into wrong direction and has severely impeded the progress in the domain.
The trick consists in some mix of a formal
theory and its meta-language. Consider the expression of the relational
algebra:
σ_{sal}_{>1000}( Emp )
It consists of the selection operator σ qualified by the predicate sal > 1000. Strictly speaking, the
selection operator is not a single operator, but the infinite family of
operators having as many elements as selection predicates. The expression sal > 1000 does not belong to the
language of the algebra. This is a meta-language expression determining which
selection operator from the infinite family has to be chosen. This expression
is informal, is out of mathematics, essentially this is informal comment
that is used to explain the topic.
Hence, at least half of the expression above is
fake mathematics. The operator σ and Emp are first-class citizens, they are a part of the mathematical
theory. The attribute name sal,
operator < and constant 1000 are second-class citizens, they are a part of
the informal meta-language, essentially a comment. Semantics of the second part
cannot be expressed in terms of the relational theory and there is no theory
which deals with it. Of course, we know the argument that the meta-language
could be formalized too. But in this way we would obtain a new formal system
that would be quite different from the relational algebra. (This can be done in
many ways and SBA is actually one of them.) In the pure relational algebra even
the natural join operator cannot be expressed, because names of attributes are
not properties of the theory. Of course, we also know the argument that indices
are “so simple” that there is no room for ambiguities. We reject
this argument because in mathematics nothing is obvious, everything must come
from axioms and every even smallest formal semantic problem is a big problem.
This especially concerns so-called
“object-algebras”, where simple indices may take the form as in the
above expression:
σ_{NetSal}_{(sal +100)>1000}( Emp )
The index takes a complex form, with own
semantics and pragmatics, but everything is fake mathematics: indices are
comments rather than formulas.
A similar conclusion holds for all the other
operators of the relational algebra. The above arguments can be repeated for
the relational calculus and approaches based on formal logic. All these
theoretical concepts subdivide the semantics of query languages into two
worlds: (1) the semantics with strong mathematical treatment, and (2) the
semantics which can be explained by informal comments only. If formal semantics
of a query language such as SQL (claimed to be based on the relational algebra
or relational calculus) concerns only a part of this language, the semantics of
the entire query language is informal. Taking the current state of SQL, we can
estimate that at most 10% of its functionality can be explained formally by the
relational algebra or calculus. The next 90% is outside of any theory.
This subdivision of the concepts into formal
and informal we have considered frustrating and unacceptable, especially for
more sophisticated data models such as object-oriented and XML-oriented models.
Moreover, the subdivision is completely unnecessary. It is caused by some
religious attitude to the relational algebra and other lame database theories.
It is not a big problem to develop another mathematical theory that
consistently covers all the issues. The denotational semantics of programming
languages is an excellent example of such a theory.
But actually we have lost our belief that the
mathematical method is good for specification of any practical language. We have already presented the fundamental
disadvantages of mathematics as a method of specification of semantics of
practical artifacts. We follow our formal specification method based on
operational semantics or abstract implementation. Our basic assumptions
essentially distinguish our approach from the relational algebra and its
object-oriented counterparts:
In SBQL no operator is
indexed by an informal meta-language expression.
In SBQL each data name occurring in
a query has the same semantic treatment.
There is no
subdivision on first-class and second class names (operators), as in the
relational algebra.
In this way we avoid the subdivision of the
SBQL semantics into formal and informal parts. All operators, including all
algebraic operators such as + and <, and operators such as selection,
projection and join, are on the same level of the semantic description. We also
avoid subdivision of names occurring in queries into first class and second
class citizens (as in the above relational algebra example). The object
relativism principle requires the uniform treatment of object names
occurring on different levels of object hierarchy. Moreover, from the very
beginning we have assumed that each name occurring in a query is bound to
run-time entities according to the same simple stack-based binding
mechanism that we have explained previously.
Although the description presented below is
simple and intuitive, it requires some attention and imagination from the
reader, especially if he/she has little experience with recursive functions and
compiler construction. Be patient and persistent, understanding the mechanism
that we propose will open for your mind a quite new and very attractive world.
Procedure eval
for Non-Algebraic Operators
All non-algebraic operators are defined
according to the same semantic pattern. It can be surprising that apparently
very different operators, such as selection, projection/navigation, join and
quantifiers, have very similar semantics. It is, however, a very positive
property allowing for many inferences independent on the kind of operator,
including definition of semantics and some methods of query optimization.
Let θ be a
symbol denoting a non-algebraic operator, and let q_{1
}θ
q_{2} be a
query with the operator θ. The
idea of the semantics is that the query q_{2}
is evaluated for every element e of
the bag (or sequence) returned by the query q_{1}.
Each evaluation of q_{2} is
performed in a bit different run-time environment. Namely, if q_{2} is evaluated for e, then the environment stack ENVS is
augmented by the result of function
nested applied to e. After evaluation
of q_{2} for this e ENVS is reduced to the previous state.
Assume that result( q_{1 }) = bag{ e_{1},
e_{2}, e_{3} }. Fig.28 presents successive states of ENVS. Query q_{2} is evaluated three times.
The final result of evaluation of q_{1
}θ
q_{2} is a function of the result returned by q_{1} and all the results
returned by q_{2}; the
function depends on θ.
Fig.28. Schema of evaluation of a query
involving a non-algebraic operator
Fig.28 presents sequential processing of the
elements of bag{ e_{1}, e_{2}, e_{3}}, but theoretically this can be done in
parallel. However, parallel execution leads to hard issues that will not be
considered in this description. Informally we can describe this procedure in
the following terms:
1. Evaluate query q_{1}.
2. For each e Î result(q_{1}) do the following steps:
·
Calculate
nested(e). It returns a set of binders.
·
Push
nested(e) as the top section on ENVS.
·
Evaluate
query q_{2} in this new
environment.
·
Calculate
a partial query result through some function partialResultOf_{θ}
(e, result(q_{2}) );
the function depends on θ.
·
Pop
(remove) the top section from ENVS.
If query q_{1}
returns a sequence, the procedure is similar, but the order of partial results
is significant and is taken into account by the mergePartialResults_{θ}
function. The non-algebraic operators can also act on individual values. It is
treated as a one-element bag, but for some operators (e.g. dot) if the result
of the entire query will also be a one-element bag, then it is coerced to its
element.
A part of the procedure eval that specifies the formal semantics of non-algebraic operators
is presented below. Then, the specification will be explained in detail for
particular non-algebraic operators.
procedure eval(
q : query )
begin
.......
case q is recognized as q_{1 }θ q_{2}: // the query q consists of a non- algebraic operator θ joining subqueries q_{1 }and q_{2}
begin
partialResults:
bag of Result;
partialResult, finalResult, e: Result;
partialResults := Æ; //empty
bag
eval( q_{1 }); //Evaluation of the
first query; the result bag is at the top of QRES
for each e belonging
to the bag at top( QRES ) do
begin
push( ENVS, nested(e) ); //new section at ENVS
eval( q_{2} ); //Evaluation of the
second query; the result is at the top of QRES
partialResult := partialResultOf_{θ} (e, top(QRES) );
partialResults := partialResults È { partialResult }; // new partialResult
included to the partialResults bag
pop( QRES );
//removing the result(q_{2}) from QRES
pop( ENVS ); //removing the top section with nested(e) from ENVS
end;
finalResult := mergePartialResults_{θ}( partialResults ); // forming the final result
pop( QRES );
//removing the result(q_{1}) from QRES
push( QRES, finalResult);
//the final result of the entire query is pushed at the top of QRES
end
.......
end
As can be deduced from this procedure, the state
of ENVS is changing during evaluation, but the final state is the same as the
initial state. QRES is also used, but finally it is augmented by one section
keeping the entire result of q_{1 }θ q_{2}.
Below we present descriptions of particular
non-algebraic operators, including description of corresponding functions partialResultOf_{θ} and mergePartialResults_{θ}.
We also present some minimal changes to the procedure that may be required if
the first query will return a sequence or an individual element. In this chapter
we do not present transitive closure operators - they will be presented later.
Syntax:
query ::= q_{1} where q_{2}
Typing constraint: q_{2} must return true
or false.
Function partialResultOf_{where} : for given e it returns bag{e} if q_{2} returns true,
and bag{} (empty bag) if q_{2} returns false.
Function mergePartialResults_{where} : it returns the bag-sum of partial
results.
For sequences: partial results keep order and the
final result keeps the order too.
For individual
elements: an
element x is coerced to the bag{x}. The final result is thus bag{x} if q_{2} returns true,
and bag{} if q_{2} returns false. Other operators can coerce final bag{x} back to x.
Examples (c.f. Fig.2 and Fig.3):
Get references to employees earning more than
1000 |
Emp where (sal > 1000) |
Get a reference to the employee named
“Poe”; the reference should be named x |
(Emp
as x) where ((x.name)
= “Poe”) |
Get a reference to the Trade department
located in |
(Dept where (dname = “Trade”))
where (location
contains “ |
Get reference to the Trade department located
in |
Dept where
((dname = “Trade”) and (location
contains “ |
More examples will be presented after
introducing next operators. In further examples we will skip a lot of
parentheses assuming some obvious (and a bit random) precedence rules of
operators. In the concrete syntax these precedence rules must be of course
formalized.
Fig.29 shows steps of the evaluation of a
simple query (cf. Fig.3). Magenta boxes show the state of the environment
stack. Grey boxes present result of particular (sub)queries.
Fig.29. Steps of evaluation of the query Emp where
sal > 1000
Projection and Navigation, Path Expressions
Syntax:
query ::= q_{1} . q_{2}
Function partialResultOf_{.} : for given e it returns top(QRES),
i.e. the result(q_{2}).
Function mergePartialResult_{.} : it returns the bag-sum of partial results.
Partial results cannot be sequences.
For sequences: partial results must be sequences.
They are concatenated in the proper order into the final result.
For individual
elements: if q_{2} returns an individual
element too, then the final result is result(q_{2}). Otherwise result(q_{1}) is coerced to a bag and the situation is as
previous.
Examples (c.f. Fig.2 and Fig.3):
Get references to salaries of all employees |
Emp . sal |
Get references to names of employees earning
more than 1000 |
(Emp
where (sal > 1000)) . name |
Get references to names of employees working
in the Trade department |
(((Dept where (dname = “Trade”)).
employs).Emp).name |
Get references to names of employees working
in the Trade department (path expression) |
(Dept where (dname = “Trade”)).
employs. Emp.name |
Get references to worksIn pointers for employees earning more than 1000 |
(Emp
where (sal > 1000)) . worksIn |
Get references to Dept objects connected by worksIn
pointers for employees earning more than 1000 |
((Emp
where (sal > 1000)) . worksIn).Dept |
Third and fourth queries show that path
expressions are not a special construct in SBQL. A path expression is a
combination of several binary dot operators. In the fourth example we have
simply assumed the syntactic convention that dot operators are evaluated from
left to right. Because dot operators can be freely combined with other
operators, our path expressions are the most universal and orthogonal to other
operators as one can imagine.
Note the difference between the two last
queries. The first one identifies pointers worksIn,
and the second one identifies objects that the pointers are pointing to. In
ODMG OQL there is no possibility to express such a distinction. However, it is
important for two reasons:
(1) Conceptual
modeling: a path
expression clearly shows for the programmer where the navigation ends up;
(2) Updating
operations:
sometimes we need to update pointers, and sometimes we need to update the
objects that the pointers point to. No such distinction in OQL causes that one
of these updating kinds cannot be expressed through the query language.
In ODMG OQL path expressions are limited by the
assertion that the query before a dot (i.e. q_{1}
in our case) must return an individual element rather than a collection. If it
returns a collection, then instead of q_{1}.q_{2} one shall write select
q_{2} from q_{1} . In SBQL there is no such an assumption:
the dot operator can act on individual elements and on collections as well.
This feature of OQL looks unreasonable and causes our suspicion that the ODMG
guys tend to confuse syntax and semantics.
Syntax: query ::= q_{1} join q_{2}
Function partialResultOf_{join} : for given e and result(q_{2}) = bag{r_{1},r_{2},..., r_{k}} the function returns bag{ struct{e,r_{1}},
struct{e,r_{2}}, ..., struct{e,r_{k}} }. The bag{r_{1},r_{2},...,r_{k}} is stored at the top of QRES.
Results r_{1},r_{2},...,r_{k} can be simple elements or structures. Note that we
have assumed no directly nested structures, for instance, struct{a, struct{b,
c}, d} is equivalent to struct{a,
b, c, d}. Structures, however, can be nested assuming other query language
constructors, for instance, struct{a, n(struct{b, c}), d} is not equivalent to struct{a, b, c, d}.
Function mergePartialResult_{join} : it returns the bag-sum of partial
results. Partial results cannot be sequences.
For sequences: partial results must be sequences
of structures. They are concatenated in the proper order into the final result.
For individual
elements: if q_{1} and q_{2} returns individual elements, then the final result is
struct{e, result(q_{2})}.
Otherwise result(q_{1}) is coerced to bag and the situation is as previous.
Examples (c.f. Fig.2 and Fig.3):
Get references to all employees with
references to their salaries |
Emp join sal |
Get references to all employees with
references to their worksIn pointer
objects |
Emp join worksIn |
Get references to all employees with
references to their departments |
Emp join (worksIn.Dept) |
Get references to all employees (named e) with references to their
departments (named d) |
(Emp as e) join ((e.worksIn.Dept) as
d) |
Assume that the Dept objects contain boss
pointer objects leading to employees being managers of departments. Get
references to names of employees working in Roma departments, together with
references to names of their managers |
((Emp as e) join ((e.worksIn.Dept) as
d)). (e.name, d.boss.Emp.name) |
Syntax: for quantifiers we use the traditional prefix syntax:
query ::=
$ q_{1} q_{2}
| " q_{1} q_{2}
Sometimes
we will also use our standard infix syntax:
query ::= q_{1}
$ q_{2}
| q_{1} " q_{2}
In
implementation symbols $ and " should be substituted by keywords, so we assume also the syntax:
query ::= exists q_{1} such that q_{2} | for any q_{1} holds q_{2}
Typing constraint: q_{2} must return true
or false.
Function partialResultOf_{$} and partialResultOf_{"} :
for given e it returns top(QRES), i.e. the result(q_{2}).
Function mergePartialResult_{$} : it returns true if the bag of
partial results contains at least one result true. Otherwise it returns false.
Function mergePartialResult_{"} : it returns true if the bag of partial results is empty or if all partial
results are true. Otherwise it
returns false.
For sequences: as for bags.
For individual
elements: result(q_{1}) is coerced to a bag and the situation is as
previous.
Examples (c.f. Fig.2 and Fig.3):
Is it true that some employee earns less than
1000? |
$Emp (sal
< 1000) Emp $ (sal < 1000) |
Get departments where all the employees earn
more than 1000 |
Dept where ("(employs.Emp)(sal > 1000)) Dept where ((employs.Emp) " (sal > 1000)) |
Is it true that there are employees with no
address? |
$ (Emp as e ) count(e.address) = 0 |
Get departments where all employees have an
address |
Dept where
(" (employs.Emp) exists(address)) |
Assume that the Dept objects contain boss
pointer objects leading to employees being managers of departments. Get
departments with employees earning more than their boss. |
(Dept as d) where $((d.employs.Emp)
as e) (e.sal > d.boss.Emp.sal ) |
Is it true that there is a department where
all the employees for sure live in |
$ Dept ( "
(employs.Emp) ( $ address (city = “ |
Is it true that there is a department where
all the employees live in |
$ Dept ( "
(employs.Emp) (" address (city = “ |
More detailed examples of SBQL queries and
comparisons with SQL and OQL can be found in the next section.
Last modified: December 31, 2007