SBA:
Environment Stack in the AS0 Store Model
by Kazimierz Subieta (January 2006)
Back to Description of SBA and SBQL.
Back to Environment Stack in
Programming Languages
In our presentation we focus on the conceptual
view of the environment stack rather than its physical implementation. The
construction of the stack will reflect basic concepts of the query
languages’ semantics that we have introduced in the AS0 store model. In
the following, operations on the stack will be explained for AS1, AS2 and AS3
models, but essentially the construction of the stack for these models remains
the same. In contrast to typical programming languages the stack will be
prepared to treat uniformly individual data and collections. The stack is a
client-side main-memory structure, hence its size is somehow limited. However,
this is not a significant limitation, because nowadays main memories are very
large and can be extended by virtual memories. For instance, in Loqis all main
memory structures, including stacks and indices, can be partly stored on a
disc, depending on some policy involving the statistical frequency of the use.
Moreover, there are a lot of possible optimizations aiming reduction of the
stack size. Thus the danger that the main-memory stack implies some limits in
scalability of applications in our opinion does not occur. Actually, we see no
reason why e.g. SQL is scalable and query languages based on SBA are not
scalable (as claimed by some authors); such opinions are based on immature
superficial reasoning rather than on deep analysis of technical properties.
Because the stack is a client-side structure, its idea is orthogonal to
geographical distribution of database resources. Hence the argument of some
authors that the stack implies no possibility of processing distributed
databases has no foundation at all (what we have also confirmed by
implementation of distributed tools and applications).
The environment stack in SBA is abbreviated
ENVS. It consists of sections, which are ordered. The newest section will be named the top, the oldest one will be named the bottom. As in case of programming languages, each section presents
information on some run-time environment, for instance, a local environment of
a procedure, a local environment of an object, of a class, of a database, etc.
The size of a section is conceptually unlimited.
The bottom of ENVS will contain global
sections, in particular, a database section, a section related to the current
user session, libraries that are available in the computer environment and some
computer environment variables, such as date, time, user login information,
etc. We assume that each name that may occur in source query must possess its
counterpart on ENVS, hence the stack will be the only name binding mechanism;
no name occurring in a query can be bound otherwise. For simplicity, in
examples we sometimes consider only the database section, neglecting other
sections. In real implementations these sections must be of course taken into
account.
ENVS will be prepared to store uniformly
information on persistent and volatile objects. Persistent objects are usually
shared among many user sessions. Volatile objects are properties of a single
user session and are not available for other sessions. Local objects of
procedures are also considered volatile.
ENVS will also be prepared to store all
information that stems from definitions that are local for queries. Almost all
query languages introduce such definitions; for instance, “correlation
variables” in SQL, names defined by as
operator in OQL and for operator in
XQuery, names bound by quantifiers, cursors in for each statements, etc. All such names occurring in queries will
be bound according to the same standard routine.
In programming languages the run-time
environment stack usually plays also the role of the store of values of all volatile entities (e.g. local variables of
procedures). In contrast, we assume that the stack and the object store are
separate notions. We have several reasons to separate the notions. The main
reason is conceptual - the separation is quite convenient for the definition of
the semantics of a query language. The second reason is that we consider
persistent objects which live on a database server machine, while the stack
lives on a client machine. Hence the stack may contain references to persistent objects, but not the objects themselves.
The third reason is that the same object can be referred from several stack
sections, thus it is impossible to include it to a single section.In
implementation it is possible to make some optimizations which assume that some
volatile objects live on the stack. Such optimization was implemented in Loqis.
However, in this presentation we do not deal with implementation issues.
A basic structure that is stored at ENVS is
called binder. The concept is
specific only for SBA, it does not occur in other approaches to query
languages. Binder is also a key concept for query languages defined through
SBA. It is incredible how many semantic issues can be explained through such a
simple notion. It is also incredible that it was not introduced in any previous
theory devoted to query languages. Binder supports universal solutions to all
semantic issues related to naming, scoping, binding, parameter passing, and
definition of many query operators. Lack of this concept in former theories,
(such as algebras and calculi) is caused by the fact that naming issues have
the second-class citizenship in these theories: names are properties of
informal meta-languages rather than the defined languages. In contrast, naming
issues are fundamental in all known query languages, including SQL, OQL and
XQuery, hence lack of formal treatment of names inevitably results in
imprecise, underspecified semantics.
Binder is a pair n(x),
where n Î N is any external name which can be written in a source query or
program, and x is any result of a
query, in particular, a reference to an object or a value.
Results of queries (i.e. the domain Result) will be
defined a bit later. For better legibility the binder is written n(x)
rather than <n, x>. The idea of
a binder is very simple. Its role is to binding names, hence it joins an
external name n with a run-time
entity x that is denoted by this
name. Binding name n means that we
are looking at ENVS for a binder (binders) n(x). The result of the binding is x. If the stack contains no such a
binder, the situation can be qualified as a binding error. Alternatively, for
semi-structured data the situation can be qualified as correct, but the result
of the binding is the empty bag. However, we reject such an alternative,
because it makes impossible to distinguish absent data (normal in
semi-structured data) from spelling mistakes in names. We return to this issue much
later, when we will consider processing semi-structured data in SBA.
In our semantics binders will be present not
only on the environment stack. Some query operators will return binders as
query results. However, eventually the mission of every binder is to appear
sooner on later on the environment stack.
Definition
of Environment Stack
In SBA the environment
stack ENVS is a sequence of sections, where each section is a set of binders.
The sections correspond to particular run-time environments.
Fig.15 presents a sample state of the object
store (see Fig.2 and Fig.3 of the AS0 description) and
Fig.16 presents a sample state of the environment stack that address this
store. In this example binders contain object identifiers (references) and
values. Single references and single values are particular cases of query
results, but in general, because query results can be nested and arbitrarily
complex, binders can be nested and complex too.

Fig.15. A sample state of an object store

Fig.16. A sample state of an environment stack
ENVS
Note that the stack mixes and unifies volatile
and persistent data. This is due to the orthogonal persistence principle which
requires no difference between querying and operations on persistent and
volatile data.
We remind that in SBA the concept of State consists of
the state of an object store, as shown in Fig.15, and the state of the
environment stack, as shown in Fig.16. Lack of formal and precise definition of
the concept of state (the state of an
object store and the state of and environment stack) is one of the reasons that
the semantics of query language standards, such SQL-99, OQL and XQuery, is
for sure imprecise and leading to a lot of incompatible decisions in
different implementations.
We remind that an environment stack is a
volatile data structure that is maintained at the side of a client (a user
session). In case of concurrent access of many clients, each of them maintains
an own stack. If at some time there is no client session, then there is no
environment stack. Moreover, if a client session has some parallel processes,
threads or transaction, then it is possible that there are as many stacks as
processes/threads/transaction within the session. When a session
(process/thread/transaction) is started a stack dedicated to it is created and
fulfilled in advance by global sections, as shown in Fig.16. Then the stack is
maintained by the query execution engine, as will be determined in the
following parts of this SBA description.
Relational systems based on SQL assume that all
the query processing is performed on the side of a database server rather than
on a client. This architecture has some advantages, because source SQL queries
(essentially, strings) and query outputs are the only units that circulate
among clients and the server. The server centralizes all the mechanisms of
query optimization and processing. The
architecture has also severe disadvantage if there are many (hundreds or
thousands) of clients, because query processing could become a bottleneck compromising
the client-side performance. Thus we will strive to develop an architecture in
which clients are responsible for query optimization and processing as much as
possible, reducing the server workload. However, this does not mean that the
environment stack concept enforces such architecture. If all the query
processing has to be done on the server, then each transaction on the side of
the server should maintain an own environment stack, as described above.
ENVS
and Name Binding
As we have presented before, semantics of a query
is a function which maps a state into a query result. Because in SBA each
single name is a legal query, the question is how the mapping looks for it.
This mapping we call binding. Binding
is performed on ENVS according to the very simple “search from the
top” rule. As we have argued in the part devoted to the environment stack
in programming languages, some sections must be skipped because of scoping
rules; so far we abstract from this feature. Fig.17 presents the order of
searching within ENVS. Binding name n
requires searching for the binder n(x) which is present on the stack,
according to the rule. The first success stops the search. The result of the
binding is x. To take into account
collections we assume multi-valued
binding. If the binding of a name n
succeeds in some section, and it contains more binders named n: n(x1),
n(x2),
..., n(xk), then all such binders are taken into account and
the result of the binding is the bag{x1,
x2, ..., xk}.

Fig.17. The order of searching ENVS during the
binding a name
The function that does the search we call bind. It has a name as an argument and
returns a value being the result of the binding. The result will be stored at
another query result stack that will
be explained later. The function is quite easy to explain, for instance
(Fig.17):
bind( X ) = i127
bind( I ) = “Maria”
bind( sal ) = i11
bind( Dept ) = bag{ i17, i22, ...}
bind( Emp
) = i1
Note that in the last case there is a section
that contains more binders named Emp;
however, because the top section contains a single binder named Emp, the search is terminated on this
binder. This feature is known as overriding;
it has significance in query languages that we intend to cover and more
generally, in object-oriented query/programming languages. The overriding
accomplishes the principle of a local context (lexical scoping), which provides
that a particular name is bound in the environment that is currently most close
according to the programmer thinking. Note also that the binder Emp(i1)
occurs two times on the stack. Such a case is normal in query languages that we
want to cover.
Now we are prepared to define the domain Result of
queries, as explained previously. As we noted before, in SBA we assume that
queries never return objects but references to objects, sometimes within more
complex structures. Objects live in the object store; no entity called object
occurs elsewhere. Queries can also return values stored in objects and values
calculated by some functions or algorithms. As in other approaches, we
introduce structures, bags and sequences as results of queries. An essential
novelty of SBA in comparison to other theories and proposals of query languages
is that queries can return named structures that we already know as binders. Binders, as a structures
returned by queries allow us to deal with naming issues that occur in the
semantics of query languages.
Generalization of the above assumptions leads
us to the following recursive definition (c.f. the AS0 store model):
·
Any
atomic value belonging to V, or a
value calculated by query operators, belongs to Result.
·
Each
reference to an object (object identifier) belonging to I belongs to Result. In
particular, the domain Result includes
also references to procedures, functions, views, methods and other behavioral
entities if they will be introduced in a particular model of the object store.
·
If
x Î Result
and n is an external name, n Î N,
then the binder n(x) belongs to Result. Such a result we will also call named value.
·
If
x1, x2, x3, ... belong to Result, then struct{ x1, x2, x3, ... } belongs to Result. struct is
a structure constructor, which can be implemented as a flag decorating a
result. The order of elements in a structure can be significant. In contrast to
typical structures known from e.g. C/C++, we do not assume that all elements of
structures must be named (i.e. in our terminology, elements need not to be
binders). We assume full freedom in this respect, which is constrained only by
assumed query operators and typing (which will be introduced later). In
particular, we assume that some or all elements of structures can be unnamed,
and we do not exclude the situation when two or more elements of a structure
have a same name. A structure will be considered as a single (but composite)
element, i.e. a structure is not a collection. As usual, types of x1, x2, x3, ... can be different. Implicitly, we assume that
if a structure has a single element, struct{ x1 }, then it is equivalent
to this element x1.
Structures having no element are not allowed. Our structure generalizes the
concept of tuple, as known from relational systems.
·
If
x1, x2, x3, ... belong to Result, then bag{ x1, x2, x3, ... } belongs to Result and sequence{ x1, x2, x3, ... } belongs to Result. bag and sequence are collection
constructors, which can be implemented similarly as struct, by a flag decorating a query result.
·
The
set Result has no other elements.
Note that struct,
bag and sequence
are constructors of values rather
than constructors of types. We underline names of these constructors to avoid
confusing them with keywords struct, bag
and sequence that will be introduced
in SBQL. The definitions of struct,
bag and sequence
constructors are unified, what supports uniformity and compact definition of
semantics. Note however that:
struct{ x1, x2, x3, ... } is considered an individual
element (it is not a collection), while bag{ x1, x2, x3, ... } and sequence{ x1,
x2, x3, ... } are
collections.
Although in the AS0 store model we do not
introduce sequences, some operators can make sequences from bags (e.g. the order by operator); hence query results possess
such an option. We do not introduce other collections, for instance sets, because we consider them
particular cases of the two introduced collections. If one will consider the
necessity of the set collection concept in some query language, it can be
introduced on the level of types (rather then on the level of query results),
providing automatic coercions between types which enforce checking for no
duplicates and/or removing duplicates. In this model we do not assume that
elements of collections must be of the same type. So far we do not determine
what actually is a type, thus such assertions might express some informal
intention rather than a formal constraint. As we will see, the concept of type
is more complex than one can expect, thus we return to this issue much later.
Names used within named results (binders) are
not constrained to names from the object store - they can determined ad hoc by
the programmer through operators as
and group as (similarly to auxiliary
names in OQL). Moreover, if a query result is a binder n(i), where i is a reference to an object <i, m,
...>, then it is possible that n ≠
m, i.e. name of the binder is
different from the name of an object. This property is obviously necessary for
query languages; however it becomes difficult to introduce assuming classical
typing systems. Hence the typing system for SBA must be significantly altered
in comparison to classical ones; we discuss and introduce all such typing
issues later.
Below we present several examples of query
results:
|
Atomic: |
25 “Maria” true |
|
Identifiers: |
i1 i127 |
|
Binders: |
Dept(i22) d(i22) sal(i11) counter( 5 ) |
|
Complex: |
struct{
i1, i17} sequence{
i1, i5, 19 } bag{ struct{ i1, i17}, struct{ i5, i22}, struct{ i9,
i22} } bag{ struct{
n(„Doe”), sal(2500), d(i22)
} } bag{ struct{
Dept(i17), Emp( bag{ struct{ n(“Doe”), s(2500) } } ) }, struct{
Dept(i22), Emp( bag{ struct{ n(“Poe”), s(2000) },
struct{
n(“Lee”), s(900) } } ) }
} |
As seen from the above examples, we can create
results which much remain objects. However our results never have their own
identifier (identity) and they cannot be directly members of classes. However,
they may contain identifiers of objects, which in turn can be members of
classes. This will be explained in detail during discussion concerning the
AS1-AS3 store models
To make some association with relational query
languages some query results can be represented as tables. This representation
may simplify some examples by more legible visual representation. In Fig.18 we
show how some query results can be depicted in this way. Such a representation
may also have some meaning in implementation, as an optimization method.
However, it has no meaning for formal semantics of query languages. Note that
our query results cover the nested relational model (see last examples) and
much more.

Fig.18. Query results depicted as tables
The presented above definition of query results
is not dedicated to any specific query language. It is a sufficiently abstract
and universal, thus can be adopted by any of them, including SQL, OQL and
XQuery (providing their semantics will be expressed in terms of SBA). This
definition will be the basis for the Stack-Based Query Language (SBQL) - a
query language based on abstract syntax and the SBA method of specification of
semantics.
Opening
a New Section of ENVS
In the following we assume that the terms environment, name scope and stack section
mean essentially the same. In classical programming languages opening a new
scope on the environment stack is caused by entering a new procedure (function,
method) or entering a new block. Respectively, removing the scope is performed
when the control leaves the body of the procedure or the body of the block.
To these classical situations of opening a new
scope on the environment stack we add a new one. It is the essence and motive
of SBA. The idea is thet some query operators (called non-algebraic) behave on
the stack similarly to program blocks. For instance, in the SBQL query:
Emp where
( name = “Poe” and sal
> 1000 )
the part ( name
= “Poe” and sal > 1000 ) behaves as a program
block executed in an environment consisting of the interior of an Emp object. Because we have assumed that
each name occurring in a query has to be bound on the stack, this concerns also
names name and sal. Hence, to make the binding possible we push on ENVS a section
with the interior of the currently processed Emp object. The situation is
illustrated in Fig.19 (c.f. Fig.3).
The operator where iterates over the result of the query Emp. In
each cycle it pushes onto the ENVS a section with the internal environment of a
next Emp object. After processing of the condition the section is
removed from the stack. Note that the interior of an Emp objects is represented on the stack as a set of binders leading
to properties of the object.

Fig.19. Pushing a new section on ENVS to
evaluate a condition
Although the idea seems to be very simple, we
will show that after generalization it is quite universal for many classical
and non-classical query operators that occur in SQL, OQL and other query
languages. The idea makes it possible to make precise, formal and general
specification of the semantics of these operators. Moreover, the idea makes it
possible to accommodate in query languages all the features and principles of
the object-orientedness, such as inheritance, polymorphism and encapsulation.
The idea greatly supports uniformity, orthogonality and non-redundancy of query
languages’ constructs. This will be shown on the SBQL case. We emphasize
once again that the situation presented on ENVS is an abstract conceptual view
that is necessary to define the semantics. In implementation the construction
of the stack and operations on the stack can be significantly optimized.
In the previous part we have introduced the
concept of interior of an object, which
is to be pushed on the top of ENVS. Now we formalize the concept. In the
formalization we strive to generalize it to cover all the cases that may
require pushing on the stack some new environment. The formalization is quite
simple and it is expressed through the function named nested. The function takes any query result (as defined previously)
as an argument and is implicitly parameterized by an object store. For the
argument it returns a set of binders. This set is assumed to be pushed at the
top of ENVS, but the function nested does not do that. Pushing the result of
the function nested on ENVS will be the subject of other mechanisms of the
query execution engine. The function nested
is defined as follows:

Fig.20. Illustration of the function nested
Examples of the results returned by the
function nested (c.f. Fig.3):
nested( struct{i9, 55, i16,
“Maria”} ) = { name( i10 ), sal( i11
), address( i12 ), worksIn( i16
), Dept( i22 ) }
nested( struct{n(“Doe”), sal(2500 ), d( i22)
} ) = { n(“Doe”), sal(2500
), d( i22) }
nested( struct{i1, i5, i9} ) = Ø
Last modified: December 31, 2007