SBQL
- Algebraic Operators
by Kazimierz Subieta (February
2006)
Back to Description
of SBA and SBQL.
In the description of the query evaluation procedure eval we have presented the operational
semantics of typical algebraic operators that we would like to introduce in
SBQL. We do not fix which operators belong to SBQL and which do not belong. We
assume that SBQL contains all operators which can be necessary to illustrate
some topics by examples and all operators that developers of some SBQL
interpreter wish to introduce. There are myriads of algebraic operators, thus
fixing some particular subset of them is not reasonable. All of them can be
defined and implemented on the same semantic and implementation basis.
Moreover, after implementing a sufficiently rich set of operators, next ones
can be implemented in quarters or hours. Determining the set of operators that
belong to SBQL is of curse important if some organization wish to have SBQL as
a standard. So far such a goal looks as very distant (if real at all).
There is a subtle difference between operators
that are build in the language and
operators that are added on. Build in
operators are the property of a particular interpreter. Added on operators are
possible if an interpreter accepts external libraries (of procedures, functions
or classes) in some general way, e.g. as dynamically linked functions a
To simplify the explanation, besides the description of algebraic operators in the form of the eval procedure we will use a more abstract (actually, denotational) view on semantics by introducing the function result. We consciously take the same function name as the name of the domain Result because of the similarity of the concepts. The function result abstracts from the QRES stack. It returns the final result of the query being its argument. In particular, we can take the following semantic rules:
·
If
∆ is a symbol denoting a unary algebraic
operator, then result( ∆ q1 ) = ∆( result( q1
)). As before, ∆
is the operator denoted by the symbol ∆.
·
If
∆ is a symbol denoting a binary algebraic
operator, then result( q1 ∆ q2 ) = ∆( result( q1
), result( q2 ) ).
·
And
so on.
Note that the function result is implicitly parameterized by the state, i.e. by the object
store and ENVS. At this level of abstraction we need not to talk about the
stack QRES at all. However, this is only some didactic facility; formal
semantics is to be given by the procedure eval.
In general, we assume that all algebraic
operators have no side effects, i.e. they are pure functions. Side effects may
concern, for instance, updating operations in a database, some visual effects
on the user screen, some writes to files, etc. Side effects of operators
occurring in query languages usually undermine query optimization and
frequently are difficult to understand by the programmers. Nevertheless,
operators with side effects could be necessary for many applications. However, their use must be disciplined. This
will be possible in imperative programming extensions of SBQL.
Operators and Comparisons for Primitive Types
·
We
assume generic and overloaded comparison operator = (and its complement ≠)
for all the primitive types that we introduce. The operators have obvious
meaning. Examples: 2 = 2, x ≠ y,
sal ≠ 2000, name = “Doe”, etc. We assume
that operators = and ≠ can also act on complex values (e.g. structures,
sets, collections), however with the caution that their meaning must be
univocally defined. Operators may also act on object references, but the
comparison is shallow, e.g. i1
= i2 returns true if i1 and i2 refer to the same object. We do not assume any form
of a deep comparison, i.e. comparison of objects that the argument references
refer to. (Sometimes, however, deep comparisons may be enforced by the strong
typing system and automatic dereferences, but this is another story that we
explain elsewhere.) The operators are undefined for arguments of different
types, for instance (real) 3.14 = (int) 3 is illegal. Such comparisons,
however, will be possible due to automatic coercions, which will be introduced
a bit later.
·
For
primitive types having natural linear ordering we introduce generic and
overloaded comparison operators <, ≤, >, ≥. These operators
we can apply to integer numbers, real numbers (float), strings (assuming some
predefined lexicographical order, perhaps corresponding to the current
character coding table), dates and times (assuming the linear time axis), etc.
Operators are undefined (not allowed) for arguments of different atomic types
and are undefined for complex values and identifiers. Examples: 2 < 3, x ≤ y, sal ≥ 2000, name
> “Doe”, etc.
·
For
primitive numerical types we introduce typical arithmetic operators +, -, *, /.
Operator - is unary and binary. Their meaning is typical.
·
We
assume several string-oriented operators and comparisons, such as the
concatenation of strings (the overloaded + operator), is_a_substring and
is_a_superstring. We also provide parameterized string operators, such as
taking a substring from i-th to j-th character (perhaps in the C/C++
convention), some functions, such as changing strings to capital letters,
counting the size of a string, etc.
·
We
assume several build in functions acting on numerical values, such as the
integer part of a real number, rounding a real to integer, square root,
logarithm, power, etc. The syntax and meaning e.g. as in Pascal.
·
We
also provide some functions for dates, for instance, change the date into the
number of days since 1900.01.01 and v/v, changing data formats, changing the
time to the number of mili-seconds, etc.
The above operators and comparisons cause some
temptation of the developers of query language interpreters to extend their
usual meaning to arguments being collections. For instance, one wants to define
result( bag(3, 7) + bag(1, 4,
5)) = bag{ 4, 7, 8, 8, 11, 12
}. This philosophy resulted in SQL in special comparison operators, such as =any and =all. Taking such extensions of popular operators with commonly
understood semantics is however rather dangerous. In many cases the programmer
may have problems with understanding what they actually mean. They also weaken
the strong typing system.
Thus we discourage such extensions of their
semantics. Collections are to be processed by other operators, in particular,
quantifiers. The commonly understood operators, such as =, ≠, <, ≤,
>, ≥, +, -, *, /, should not extend their meaning. Using of them
should be disallowed for arguments being collections of primitive types. SBQL
provides options that make it possible to achieve the desired results without
such extensions of the semantics of these operators.
Aggregate Functions, Removing Duplicates
Aggregate functions act on arguments being
collections (bags or sequences) of integer or real numbers. For empty
collections the functions are undefined (not allowed). We assume the typical
set of numeric aggregate functions known from SQL:
·
sum - the sum of argument numbers; for instance, result( sum( bag(3, 6, 7) ) ) =
16, sum(
Emp.sal ) calculates the total sum of
employees’ salaries.
·
avg - the arithmetic average of argument numbers,
for instance, avg( bag(3, 6, 7) ) returns 5.333..., avg(
Emp.sal ) calculates the average of
employees’ salaries.
·
max - the maximal number from the argument
collection.
·
min - the minimal number from the argument
collection.
Several next similar functions can be useful
for some applications; for instance, median, geometric average, variance, etc.
This set is usually augmented by:
·
count - the size on any collection; the function is
defined for any kind of a collection, any type of its elements and for empty
collections.
·
exists - test for existence; it returns true if the argument collection is not
empty and false if it is empty.
·
distinct - the function removing duplicates from the
argument collection, for instance, result(
distinct( bag(3, 6, 3, 3, 6, 7) ) ) = bag{3,
6, 7}. For sequences distinct returns
the sequence where each next repetition of an element is removed, for instance,
result( distinct( sequence(7, 7,
3, 6, 3, 3, 6) ) ) = sequence{7,
3, 6}.
In SQL functions sum, avg, max, min and count
have two semantic faces. If they are not under the scope of the group by clause, they have the usual
semantics, as presented above. If they are under the scope, they calculate
their values for groups determined by the group
by clause and in this semantic sense can be used within the select and having clauses. Moreover, SQL assumes quite unusual (apparently
more friendly) syntax, where names of these functions are under the select clause; for instance, to calculate
the average salary of employees one has to write select
avg( sal ) from Emp
rather than more logical formulation
avg( select sal from Emp ) . (This syntax is
also an option in ODMG OQL.)
In SBA and SBQL we reject such doubtful
syntactic and semantic inventions. No function from the above list will have
two different semantic faces. The regular syntax of their use will not be
violated. Moreover, we do not introduce the group
by operator at all, for its severe disadvantages, such as non-orthogonality,
semantic reefs that it implies and implementation-oriented flavor that makes
difficulty for query optimization (we discuss this later). All the pragmatic
goals that can be achieved through the group
by clauses can be achieved in SBQL through other operators, in easier, more
logical and more consistent way. We will show that on examples. Lack of the group by clause causes that specific
problems with a special conditional clause (having),
non-orthogonality, semantic reefs and query optimization no more exist.
In SQL there is also semantic schizophrenia
concerning null values. Despite big words on the meaning of null values and
special constructs for serving them in queries, aggregate functions sum, avg, max, min ignore them totally.
It is not quite clear how they are treated by count, exists and distinct. The group by clause collects all null values in a single group, what
makes quite a pragmatically ill situation for aggregate functions. As before,
we avoid this schizophrenia in SBQL simply by assumption that our store models
and query languages do not introduce null values at all. Instead of null values
we propose to deal with options, i.e.
collections having zero or one element (in other words, cardinalities [0..1]).
We discuss this issue in the section devoted to processing irregular data.
Operators and Comparisons on Collections
These operators come from the set theory, which
introduce such operators as union of sets, intersection of sets, difference of sets, cartesian product,
set containment (⊂, ⊆), membership of an element in a set
(operator ∈), and so on. Actually, we have no sets as data structures, but bags and
sequences, hence we have to adopt these set-theoretic operators to a bit different
situation. We have already introduced some of these operators, in particular:
·
struct( A, B) - the constructor of structures,
which in the SQL jargon is known as cartesian
product. If query A returns bag{a1, a2,
...} and query B returns bag{b1, b2,
...}, then the query struct( A, B)
returns bag{ struct{ a1, b1},
struct{ a1, b2},...,
struct{ a2, b1},
struct{ a2, b2},
...}. If A returns an individual
element a, and B returns an individual element b,
then struct( A, B) returns an
individual element struct{a,
b}. In all other cases individual elements are converted to one-element bags.
Our syntax allows us to drop struct
and to write simply (A, B). We assume that struct{ a, struct{b, c} } is equivalent to struct{ struct{ a, b}, c } } and is equivalent
to struct{ a,
b, c }. We also assume that a
structure with one element is equivalent to this element: struct{ a } = a. Similar properties are then induced
on the level of the query language semantics: in particular, query struct( struct( A, B), C
) is equivalent to query struct(A, struct(B, C
)) and is equivalent to query struct(
A, B, C ). If A returns an individual element, then struct(A) is equivalent to A. As
follows from the above, struct(A, B,
C, D, E,...) is a shorthand for
composition of several binary struct
operators: struct(A, struct(B, struct(
C, struct( D, struct( E,...))))). If A or B returns an empty bag, then struct( A, B) returns an empty
bag. It is quite unclear what struct(
A, B) has to return if A
and/or B return sequences; hence we
propose to leave such cases undefined (i.e., to forbid them). Unlike SQL, we do
not limit the cases when our “cartesian product” can occur - it may
occur in any place of a query if it makes sense for the programmer and the
strong typing is not violated.
·
bag( A, B) - the constructor of bags, which in
the SQL jargon is known as union of
tables (relations). If query A returns bag{a1,
a2, ...} and query B returns bag{b1,
b2, ...}, then the query bag( A, B) returns bag{ a1, a2,
..., b1, b2, ...}. If A or B
returns an individual element, it is treated as a one-element bag. We assume
that bag{ a, bag{b, c} } is equivalent to bag{ bag{ a, b}, c } } and is equivalent to bag{ a, b, c }. We also assume that a bag with one element is equivalent to
this element: bag{ a } = a; our strong typing system will be prepared to handle such cases
correctly. Similar properties are then induced on the level of the query
language semantics: in particular, query bag(bag( A, B), C ) is equivalent to queries bag(A, bag(B, C
)) and bag( A, B, C ). If A returns an individual element, then bag(A) is equivalent to A.
As before, bag(A, B,
C, D, E,...) is a shorthand for
composition of several binary bag
operators: bag(A, bag(B, bag(
C, bag( D, bag(
E,...))))). If A or B returns an empty bag, then it does not
influence the result bag. It is quite unclear what bag( A, B) has to return if A and/or B return
sequences; hence (as for struct) we
propose to leave such cases undefined (to forbid them). As for struct, the operator may occur in any
place of a query if it makes sense for the programmer and the strong typing is
not violated.
·
sequence( A, B) - the constructor of sequences, known
as concatenation of sequences. If
query A returns sequence{a1,
a2, ...} and query B returns sequence{b1,
b2, ...}, then result( sequence( A, B) ) = sequence{ a1,
a2, ..., b1, b2, ...}. If A
or B returns an individual element,
it is treated as a one-element sequence. We assume that sequence{ a, sequence{b, c} } is equivalent to sequence{ sequence{ a, b}, c } } and to sequence{ a, b, c }. We also assume that a sequence with one element is
equivalent to this element: sequence{ a } = a; as for bags, our strong typing system will be prepared to such
cases. Similar properties are then induced on the level of the query language
semantics: in particular, query sequence(
sequence( A, B), C ) is equivalent to queries sequence(
A, sequence( B, C ) ) and sequence( A, B, C
). If A returns an individual
element, then sequence(A) is equivalent to A. As before, sequence(A, B,
C, D, E,...) is a shorthand for
composition of several binary sequence
operators: sequence( A, sequence(
B, sequence( C, sequence( D, sequence( E,...))))). If A or B returns an empty
sequence, then it does not influence the result sequence. It is quite unclear
what sequence( A, B) has to return if A and/or B return bags; hence (as for struct
and bag) we propose to leave such
cases undefined (to forbid them). As for struct
and bag, the operator may occur in
any place of a query if it makes sense for the programmer and the strong typing
is not violated.
Examples of application of struct, bag and sequence are the following:
(Emp, Dept)
where name = dname (the query is equivalent to struct(Emp, Dept)
where name = dname )
(Emp where
name = “Poe”).( sal, worksIn) (the query is equivalent to (Emp
where name = “Poe”).struct(sal, worksIn) )
struct( name(“Kim”),
sal(3000), worksIn( Dept where dname = “Trade”) )
bag( (Emp
where sal < 1000) , ((Dept where dname = “Trade”).employs.Emp)
)
sequence(“one”, “two”,
“three”, “four”)
Besides these operators, we can consider to
introduce next ones:
·
in - bag or sequence inclusion operator. A query A in
B returns true if each element of the bag that is returned by A can be found in the bag returned by B. For sequences, the query A in
B returns true if the sequence returned by A is a sub-sequence of the sequence returned by B. It is quite unclear what A in
B has to return if A returns a bag and B returns a sequence (or v/v); hence we propose to leave such cases
undefined (i.e., to forbid them). If A returns an empty bag or an empty
sequence, the query A in B
returns true. If A returns an individual element, it is
treated as a one element bag or one element sequence (depending on what returns
B).
·
contains - a
reverse bag or sequence inclusion operator; A
contains B is equivalent to B in A.
·
intersect - makes a
sense for bags only. A query A intersect B returns a bag consisting of elements that occur both in the bag
retuned by A and in the bag returned
by B. Because a bag can contain the
same element x many times, the result
of A intersect B contains the
minimal number of x from the numbers
of occurrences of x in A and in B. If A or B returns an individual element, it is
treated as one-element bag.
·
subtract - difference of bags. A query A subtract
B returns a bag of elements that
occur in the bag returned A and do not
occur in the bag returned by B. If
element x occurs in the result
of A
k1 times, and in the
result of B k2 times, then
the result A subtract B contains x if k1 - k2 > 0 and the number of occurrences of x in the result is k1 - k2.
·
i-th
element of a sequence. We denote this operator query1[query2], where query1 returns a sequence, and query2 returns an integer number. Unlike
C, C++, Java and ODMG, we propose to return to the oldest tradition where the
first element of a sequence is indexed by the number 1 (and not by 0). For instance, result( sequence(7, 6, 3)[2] ) = 6. Note that the operator makes no sense
for bags.
·
Elements
of a sequence from i-th to j-th. Similarly as before, we denote this operator query1[query2.. query3], where
query1 returns a sequence, query2
returns an lower index and query3
returns an upper index of the sequence elements that are taken as the result.
For instance, result( sequence(7, 6, 3,11, 5, 6, 7, 34,
3)[4..7] ) = sequence{11, 5,
6, 7}
There are some less important operators, such
as proper sub-bags and sub-sequences, continuous sub-sequences, cutting heads
and tails of sequences, i-th element from the end of a sequence, taking a
random single element from a bag, etc. We can introduce them if there will be
some significant applications.
Coercions are functions that change the types
and representation of values. In a lot of cases coercions are implicit, to
avoid annoying style of programming. For instance, if x is an integer number and y
is a real number, then in the query x
+ y the value returned by x is automatically coerced to a real
number. Similarly, in the query Emp.(name + “ earns “ + sal) the operator + is recognized as
concatenation of strings, hence integer values returned by sal are implicitly coerced to strings. The presence of implicit
coercions is sometimes called ad hoc
polymorphism.
Another kind of implicit coercions concerns
changing bags or sequences into single elements, and v/v. For instance, in SQL
a select clause can occur within a where clause, but in this case the
result of the select clause is
automatically coerced to an individual value. Such coercions are typical for
SBQL, which unifies a structure, bag or sequence having one element x and this element x. For instance, one can use a query (get employees earning more
than Kim):
Emp where
sal > ((Emp where name = “Kim”).sal)
The query assumes implicit coercion of the bag
returned by (Emp where name = “Kim”).sal
into a single value (of the Kim’s salary). If this is impossible,
because the company does not employ a person named Kim, or the company employs
more than one person named Kim, the dynamic type check will return a typing
error (exception).
A consistent and universal handling of implicit
coercions requires a static strong typing system, which during the type checks
of a query recognizes the necessity of a coercion, inserts a proper explicit
coercion operator into the query syntactic tree and instantly augments the tree
with some parts performing dynamic type checks. In this way the static
(compile-time) strong typing system delegates corresponding type checks to
run-time. For instance (see Fig.2),
considering the query Dept where location = “
Another commonly assumed kind of implicit
coercions are implicit dereferences.
Assuming an object <i, n, v>,
the dereference of i returns v. For instance (see Fig.2), for the query Emp
where name = “Poe”
the subquery name returns a
reference; it is automatically dereferenced to a value of the object pointed to
by this reference. This is the job of the static strong type checker, which
must properly change the syntactic tree of a query, see Fig.27. Usually,
the implicit dereference concerns values of primitive types. However,
dereferences of identifiers of complex objects can be also consistently
defined; we explain this issue later.
For the programmers convenience coercions and
dereferences can be explicitly introduced as algebraic operators. The typical
syntactic convention for that is known from languages such as C, C++, Java,
etc. as cast. We use this convention
for run-time explicit conversions of types. Casts will be written in the syntax
query ::= (typeName)
query
Note however that in contrast to C, where
arbitrary casts are allowed (and this is rather a compile-time facility), we
assume that our cast are run-time conversions, thus can be allowed or
disallowed. Sometimes they may cause compile time-typing errors. For instance (int)Emp
is disallowed and will be discovered as a type error. In some cases coercions
may cause a run-time error. For instance, the query (int) X, where X is of type string, causes conversion of the string
stored at X into integer, but this is
of course not always possible.
We also use casts with collection constructors.
For instance, (sequence) Emp changes the bag returned by Emp into a sequence with an arbitrary
order of elements. Similarly, we can use
(bag) query to convert a sequence
into a bag; the operator has no effect if the query returns bag.
In the following (in the AS1, AS2 and AS3
models) such casts will also be used to convert an object into an object of a
more specific or a more general class. For instance, (Student)
(Emp where sal > 2000) converts references to objects of employees
earning more than 2000 into references to Student
objects, but this conversion rejects such Emp
objects that are not at the same time Student
objects.
In some cases there is also a need for explicit
dereference operators. This will be possible by the operator deref that takes a single references or
a collection of references and then returns a corresponding value or a
collection of values. For instance (see Fig.2):
result( Emp.name
) = bag{i2, i6, i10}
result( deref(Emp.name) ) = bag{
“Doe”, “Poe”, “Lee” }
Conditional Queries
The semantics of conditional queries is rather
obvious:
result( if
q1 then q2 ) = if
result(q1) = true
then result(q2) else bag{}
result( if
q1 then q2 else q3) = if result(q1) = true then result(q2) else result(q3)
Example
if count(Emp) < 10 then Emp.name else (bag)((sequence)Emp.name)[1..10])
As in Oracle, we can also consider a more
general case of conditional queries that take the form of a switch operator.
Defining
Auxiliary Names
Auxiliary names appear as a property of query
languages in connection with the relational calculus, considered a mathematical
basis for relational query languages. The relational calculus introduced such
names as “variables” iterating over relations (tuple relational
calculus) or over columns (domain relational calculus). The variables can be
then used in selection conditions, in a clause forming the output from a query,
etc. Similar variables occur in predicate calculus and in lambda calculus.
QUEL, one of the first relational query languages, makes the use of such
“tuple variables” obligatory. Query-By-Example is considered a
language based on domain relational calculus.
SQL introduces auxiliary names, but their use
is not obligatory. They may appear in a from
clause after names of relations. Sometimes they are called
“synonyms” of relation names, but this is improper association,
because in where clauses and in select clauses they behave just as
variables ranging over relations. In other terminology they are called
“correlation variables” or “correlation names”. For
instance, assume the relational schema (worksIn
is a foreign key for d#, boss is a
foreign key for e#):
Emp(e#,
name, job, sal, worksIn)
Dept(d#,
dname, location, boss)
We can formulate the request “get
designers earning more than their bosses” as the following SQL query:
select x.name
from Emp x, Dept y, Emp z
where x.job
= ‘designer’ and x.worksIn = y.d#
and y.boss
= z.e# and x.sal > z.sal
This query defines three correlation names x, y and z. The use of x and z cannot be avoided because both refer
to the same table Emp. However, the
use of such variables in SQL is not obligatory and one can formulate the same
query without the variable y:
select x.name from Emp x, Dept, Emp z
where x.job
= ‘designer’ and x.worksIn = Dept.d#
and Dept.boss
= z.e# and x.sal > z.sal
Sometimes this is explained by the rule that
each relation named R has by
definition the correlation name named R.
This idea of auxiliary naming in SQL seems to
be quite semantically clear. However, in other query languages the situation is
not obvious. For instance, such variables are necessary in quantified
predicates as “variables bound by quantifiers”, in for each iterators as a variable iterating
over some calculated domain, in group by
clauses, in order by clauses, etc.
Many contexts when such “variables” may appear and complex data
structures that are to be served by such variables cause the necessity to
explain their semantics on the more general ground that is done in the
relational calculus.
An example of difficulties with correct
definition of the semantics of such auxiliary names is supplied by the ODMG
standard. Consider the following OQL query, p.104 of [ODMG00]:
select * from Students as x, x.takes as y, y.taught_by
as z
where z.rank
= "full professor"
The query defines three names x,
y, z, considered “iteration variables”. According to the scope
rules (p. 112), their scope is limited to this query (they have no meaning
outside it). But it appears (p. 105) that the type returned by this query is:
bag<struct (x:Students,
y:Section,
z:Professor)>
The semantic qualification of these names is
changed: instead of being “iteration variables” they become
structure field labels. Because the output of the query can be used in another
(outer) query, the names can be used outside the query where they have been
defined (which is inconsistent with the assumed scope rules).
This example rises very fundamental questions:
(1) Which formal model makes it possible
to change iteration variables into structure field labels? Clearly, the
relational calculus does not provide such situations.
(2) What is the real scope for x, y, z?
(3) How the semantics can be correctly
formalized? How it can be generally and consistently implemented?
(4) How this situation can be correctly
statically typed?
The ODMG OQL grammar introduces at least 7
different contexts where such auxiliary naming can be used. Till now, the
formal semantics of all these contexts is non-existent (explained by toy
examples rather than by formal specification). An attempt to define the
semantics of one of these contexts (on the level of static typing only)
obviously failed, as shown above.
Indeed, SBQL is the first query language where
the semantics of such auxiliary naming is generally and correctly defined. The
semantic definition is very simple, but has no precedents in any mathematical
theory, in programming languages and in former query languages. This idea was
implemented in the Loqis system in 1988, at least five year earlier than the
ODMG proposed the first version of the object-oriented database standard. Our
solution of the problem requires the notion of binder
(introduced previously) that occurs only in SBA and SBQL. The semantics covers
static strong typing (as will be shown later) and is fully consistent with the
introduced two stack apparatus. Moreover, it makes no special situation for
query optimization methods.
In our attempts to formalize the semantics of
auxiliary naming we have rejected the relational calculus as not sufficiently
precise for all the contexts where such naming can occur. Our idea was to
integrate smoothly auxiliary naming with the stack-based idea. The lighthouse
that we would like to follow was the following:
Each name occurring in
a query or in a program must be subordinated to the same scoping and binding
rules that are accomplished by the environment stack mechanism. This also
concerns all auxiliary names defined inside queries.
After taking this tenet, the questions that we
would like to solve were the following:
·
How
definitions of auxiliary names have to be internally represented?
·
Where
the definitions have to be stored?
·
How
the definitions have to be used when a particular auxiliary name occurring in a
query has to be bound?
·
When
the definition of an auxiliary name has to be removed, assuming it is no more
valid?
There is a long history of several ideas that
we tried to follow. In the Linda system we have assumed that the definition of
an auxiliary name in a query is equivalent to the declaration of a local
programming variable. Unfortunately, this was inconsistent; in particular,
makes no answer to the last question.
The answer for all the above questions was
invented in 1988, during the implementation of the Loqis system. The solution
appears to be simple, universal (covering all the cases of auxiliary naming),
very easy to implement, and free from disadvantages. The solution can be very
surprising for all the fans of the relational calculus, because it totally
rejects the concept of “variable”.
Defining an auxiliary name within a
query is a unary algebraic operator parameterized by the name.
The operator sticks
the name with each element of its argument (i.e. it makes binders).
Following OQL we assume the syntax:
query ::= query
as name
As usual in SBQL, such a query has the
semantics independent from other operators. It is as follows:
·
Let
query q returns bag{a1,
a2, a3,...}.
Then, the query q as
n
returns bag{n(a1), n(a2), n(a3),...}.
·
Let
query q returns sequence{a1,
a2, a3,...}. Then, the query q as n returns
sequence{n(a1), n(a2), n(a3),...}.
·
Let
query q returns an individual element
a. Then, the query q as n returns
n(a).
As previously, the operator may occur in any
place of a query if it makes sense for the programmer and the strong typing is
not violated. In particular, the operator can be nested. For instance, if q returns an individual element a, then
(q as n1) as n2 returns a nested binder n2(n1(a)) . As will be shown,
nesting of this operator is important for creating complex values (nested
structures). There is no constraint concerning the argument query q. For instance, 2 as two is a legal query that returns the binder two(2).
We will show in the following that this simple
operator in connection with other SBQL operators presents a very powerful and
universal facility to deal with almost all the contexts when auxiliary names
for queries or sub-queries are needed. This concerns the following cases:
·
(Student as x join
(x.takes.Lecture) as y join
( y.taught_by.Professor) as z) where
z.rank = "full professor"
One of the SBQL contexts of auxiliary naming is
not covered by the operator as. Thus
we have introduced in SBQL another naming operator that we denote group as. As before, group as is a unary algebraic operator
parameterized by a name. The definition of the group as operator is even simpler that the definition of the as operator.
We assume the syntax:
query ::= query
group as name
As usual in SBQL, such a query has
the semantics independent from other operators. Semantics is the following:
result( q group
as n ) = n(result( q))
The query q group as n returns a single binder
named n, which wraps the entire
result of the query q. For instance,
if result( Emp ) = bag{i1, i5, i9}
then result( Emp group as e ) = n( bag{i1, i5,
i9} ). The operator is necessary to define complex values that
include named collections treated as single elements. It has also some meaning
in conceptual modeling of queries, as it allows one to subdivide a complex
query into several simpler queries (in a way that is different from the define clause of OQL). There is come
conceptual intersection with the group by
operator of SQL and OQL. In general, however, pragmatic contexts (practical
applications) of using the group by
operator and our group as operator
are totally different. As we noted before, in SBQL we do not introduce the
group by operator because it is not necessary and causes far more problems than
it is worth. An example of the use:
(Emp
where sal > 4000) group as FatGuys .
( (FatGuys where job = “programmer”) group
as FatProgs , (FatGuys where job =
“manager”) group as FatMgrs ).
bag(FatProgs.(
“Programmer ” + name +
“ is well paid”), FatMgrs.(
“Manager ” + name +
“ has good salary”) )
Last modified: December 31, 2007