SBQL for the AS0 Store Model -
Syntax
Back to Description of SBA and SBQL.
The Stack-Based Query Language (SBQL) is a
formalized object-oriented query language in the SQL or OQL style. It contains
semantic counterparts of all the constructs of these languages, but in
essentially different configuration and semantic mechanisms. SBQL may concern
many object store models, in particular AS0-AS3 models. Thus we claim that SBQL
is the most complete and universal query language among all that can be found
in the literature and practice.
The concrete syntax of SBQL is not fixed. In
SBQL we strive to reduce syntactic considerations only to some abstract syntax.
In concrete implementation any designer can invent an own concrete syntax that
he/she considers the most adequate for the given purpose. So far the syntactic
standard of SBQL does not exist. SBQL we consider as a theoretical frame or
pattern for object-oriented languages, similar to the relational algebra or
relational calculus. With no doubt, however, SBQL is much more powerful,
strong, consequent and consistent in comparison to any such theory, including
relational and object algebras. The definition of SBQL explicitly involves the
concept of state
that is absent in major database theories. Lack of the state concept is a
severe flaw of these theories, because it reduces the possibility to define
many useful operators and from the very beginning forces some false view on
query languages’ semantics. As we argued previously, the concept of the
state and the naming-scoping-binding issues must lead to the concept of
environment stack, which is absent in major database theories too.
Semantics of SBQL is expressed operationally by
a machine, which accomplishes abstract implementation of SBQL constructs.
Essentially, this decision is motivated by didactics, because operational
semantics for so complex artifacts as query languages is much easier to grasp
by the average reader. In our early work we tried to use denotational
semantics, which is apparently “more elegant” (according to some
theoreticians). We abandoned this specification style because it was illegible,
non-intuitive and caused additional barriers in the understanding of
implementation.
In SBQL we introduce a new kind of query
operators, which we call “non-algebraic” for some tradition in
mathematics which treats some operators (such as quantifiers) as non-algebraic
ones. Selection, projection/navigation, join, quantifiers, ordering and
transitive closures are all the non-algebraic operators that we introduce. Some
of them (selection, projection and join) were introduced in the relational
algebra as “algebraic” operators. However, their
”algebraization” was at the cost of a bit unfair formal trick,
where a part of their semantics (for instance, a selection predicate) was
shifted to the informal meta-language of the mathematics. We have definitely
rejected such a trick as frustrating, unacceptable and completely unnecessary.
Mathematical solutions concerning semantics of query languages are not limited
to such limited, simplistic and primitive theories as the relational algebra or
relational calculus. In particular, the denotational semantics is much more
adequate notion. However, as we have mentioned before, we do not follow
strictly the mathematical method because essentially it has no advantages and
has severe
disadvantages. Our formal approach relies in very precise definition of an
abstract machine that will perform operations implied by query operators. This
level of formality is adequate to reason about any properties of query
languages, including implementation of the language, possible methods of query
optimization, strong typing, querying distributed databases and other issues.
SBQL
Syntax
We assume that some elements of the set of
atomic values V introduced in the AS0
model have the external form, which allows representing them as sequences of
characters in a query source code. The representation is sufficient to
distinguish such an element by some lexer and then, to assign to it a proper
value of V (and an atomic type). Note
that many elements of V, such as
graphics and compiled procedures, have no such source code representation. Such
external representation elements we traditionally call literals[1].
The set of literals we denote L. We
assume that the lexer accomplishes the function:
lexval: L →
V
Usually there are few types of literals: integer
numbers, real numbers, strings, boolean values and dates.
We assume that identifiers from the set I introduced in the AS0 model have no
such external representations, hence they cannot be directly used in source
queries. The only way to deal with such identifiers is to use in queries
external names from the set N. They are to be bound according to the rules explained in the previous section. In this
way we are ready to define the simplest queries of SBQL:
·
Any
literal belonging to L is a query.
For instance, 2, 3.14,
“Doe”, true
are queries.
·
Any
name belonging to N is a query. For
instance, Emp, sal, worksIn, e, d are
queries.
One may be surprised that a query such as sal is the query of its own rights. We
understand that this can be unusual for the SQL lovers. However note that each
query is evaluated relatively to the
state, in particular, to the state of ENVS. If ENVS contains a binder sal( x
), then the query sal makes a sense
and its result can be formally determined as x.
Queries can be joined in more complex queries
by operators. SBQL is perhaps the only practical query language which assumes
that all operators are unary or binary. In this way the SBQL grammar is very
simple. Moreover, the grammar in the maximal extent supports orthogonality and
compositionality principles: queries can be freely connected by operators
(providing typing constraints are not violated). We subdivide binary operators
into algebraic and non-algebraic. In this grammar we
neglect parentheses assuming that they can be freely inserted if there are
doubts concerning their order or precedence during the parsing. Then, the
syntactic rules of SBQL are as follows:
·
If
∆ is a symbol denoting a unary algebraic
operator and q is a query, then ∆q is a
query. Examples of unary algebraic operators are: count, sum, avg, max, median,
-, log, sqrt, not, etc. Because SBQL is an abstract language, we do not fix which
unary operators belong to SBQL and which do not belong. We simply assume that
SBQL accepts any unary operator if some designer wants to introduce it.
·
If
∆ is a symbol denoting a binary algebraic
operator and q1, q2
are queries, then q1 ∆ q2
is a query. Examples of binary algebraic operators are: =, <, >,
+, -, *, /, and, or, intersection,
concatenation, etc. As previously, we
do not fix which binary algebraic operators belong to SBQL. Any of them will be
accepted by us if it is necessary in examples or in a particular
implementation. For binary operators we assume the traditional infix notation
(an operator will be written between its arguments).
·
If
θ is a symbol denoting a non-algebraic operator
and q1, q2 are
queries, then q1 θ q2
is a query. Non-algebraic operators are the following: where (selection), dot (projection,
navigation), join (dependent or
navigational join), ∀, ∃ (universal and existential quantifiers), order by (sorting) and operators of
transitive closures (will be introduced later). For quantifiers we also assume
the traditional prefix notation: ∀q1 q2 and ∃q1 q2, which is equivalent to the universal infix notation q1 ∀ q2 and q1 ∃ q2.
·
If
q is a query and n ∈ N then q as n is a query. The operator as is a unary algebraic operator
parameterized by a name. This operator will be used in most cases which require
defining an auxiliary name within a query. The operator as in other languages allows to define “correlation
variables” (SQL, OQL), variables bound by quantifiers, cursors in for statements, etc. In ODMG OQL there
are eight situations when one can define an auxiliary name. Our definition of
the operator as will cover all such
cases and more. As will be shown, the operator is orthogonal to any other
operators and its semantics is very simple.
·
If
q is a query and n ∈ N then q group as n is a query. This is
another naming operator similar to as,
but with different semantics and pragmatics.
·
If q1,
q2 are queries, then if q1
then q2 is a query.
If q1, q2, q3
are queries, then if q1
then q2 else q3
is a query. The
last query has three arguments, but it is a shorthand - it can be substituted
by the composition of queries having one-argument and two-argument
operators: if q1 then q2 else q3
is equivalent to bag((if q1 then q2), (if not
q1 then q3)).
·
If
q1, q2, q3, ... are
queries, then struct( q1, q2,
q3,
... ) is a query. The
operator struct has many arguments,
but as in the previous case, it can be understood as a superposition of several
binary operators; for instance, struct(
q1, q2, q3, q4 ) = struct(struct (struct ( q1, q2),
q3,), q4 ). In the SQL and OQL terminology the operator is
called cartesian product; we do not
use this term. The keyword struct
can be skipped by default, ( q1,
q2, q3, ... ) is equivalent to struct( q1, q2, q3, ... ).
·
If
q1, q2, q3, ... are queries, then bag(
q1, q2, q3,
... ) and sequence(
q1, q2, q3,
... ) are queries. As
previously, the operators have many arguments, but can be understood as a
superpositions of several binary operators; for instance, bag( q1, q2,
q3, q4 ) = bag(
bag( bag( q1, q2),
q3,), q4 ).
These syntactic rules will be further extended
by several more advanced constructs related to higher-level store models
(AS1-AS3) and introducing programming constructs and abstractions such as
procedures. Fig.21 contains the summary of the SBQL syntax for the AS0 model.
|
query
::= literal |
The set
L |
|
query
::= name |
The
set N |
|
query
::= unaryAlgOperator query |
Unary
algebraic operators |
|
unaryAlgOperator
::= count | sum | max | - | sqrt | not | ... |
|
|
query
::= query binaryAlgOperator query |
Binary
algebraic operators |
|
binaryAlgOperator
::= =|<| >| +| -| *| /| and| or| intersect|... |
|
|
query
::= query NonAlgOperator query |
Non-algebraic
operator |
|
NonAlgOperator
::= where | . | join | ∀ | ∃ |
|
|
query ::= ∀query query | ∃query query |
Alternative
(traditional) syntax for quantifiers |
|
query
::= query as name |
Name
definition |
|
query
::= query group as name |
Grouping
and name definition |
|
query
::= if query then query |
Conditional
query |
|
query
::= if query then query else query |
Another
conditional query |
|
querySeq
::= query | query, querySeq |
Sequence
of queries |
|
query
::= struct( querySeq ) |
(querySeq) |
Structure
constructor |
|
query
::= bag( ) | bag( querySeq ) |
Bag
constructor |
|
query ::= sequence( ) | sequence( querySeq ) |
Sequence
constructor |
Fig.21. The SBQL basic syntax
Examples of SBQL
queries (c.f. Fig.3):
|
1. |
2000 |
Literal |
|
2. |
Emp |
Name |
|
3. |
sal |
Name |
|
4. |
2+2 |
algebraic
operator |
|
5. |
sal > 2000 |
algebraic
operator |
|
6. |
Emp where
(sal > 2000) |
Non-algebraic
and algebraic operator |
|
7. |
Emp where
(sal > 2000) . (name, (worksIn.Dept)) |
As
before, plus projection and structure composition |
|
8. |
((Emp as e) where ((e.sal) > (2000 + x +
y))).e.name |
Auxiliary
name e |
|
9. |
((Emp as e) join ((e.worksIn.Dept) as d)) . (e.name,
d.dname) |
Dependent
join with auxiliary naming, path expression, projection and struct |
|
10. |
∀ (Emp where sal < 1000) ((worksIn.Dept.dname) = “Ads”) |
Universal
quantifier with no “variable” |
|
11. |
∃ (Emp as e ) count(e.address) = 0 |
Existential
quantifier with a “variable” e
and counting function |
|
12. |
bag( 1, 1, 2, 3, 5, 8, 3, 13) |
Bag
constructor |
Looking at these examples we can see that SBQL
is a very powerful query language. In fact, although its syntax is extremely
simple, it is much more powerful than OQL and XQuery. All the features and
examples of SBQL will be presented after explaining its semantics and
pragmatics.
Note that in comparison to SQL and OQL we do
not introduce operators for processing null values and a group by (having)
operators. In SBA we avoid null values at all. Following Date, we consider null
values as dangerous feature, leading to more problems that it is worth. Null
values are substituted by absent (optional) data. Concerning the group by operator, an attempt to define
consistent and universal semantics firmly convinced us that the operator is
problematic (leading to semantic reefs and difficulties with query optimization)
and unnecessary for object-oriented query languages - it can be
substituted by other operators. We will show this further on examples.
Last modified: December 31, 2007
[1] Note that the term literal is used in the ODMG standard in a different meaning, out of
the traditional terminology of programming languages. We do not follow the
ODMG terminology, retaining the traditional one.