SBA Architecture for Query
Processing
by Kazimierz Subieta (February 2006)
Back to Description
of SBA and SBQL.
There are several views on architecture of
query processing. Fig.24 presents basic dependencies between three fundamental
data structures involved in SBA: an object store, an environment stack and a
query results stack. An object store contains volatile (non-shared) objects and
persistent (shared) objects. Both stacks contain references to objects. Non-algebraic operators act on
the query result stack and the object store and affect the environment stack.
Query evaluation takes the state of the environment stack and the state of the
objects store and puts query results on the query result stack.

Fig.24. Dependencies between object store,
environment stack and query result stack
In Fig.25 we present a more detailed view on
the architecture, which involves more data structures (figures with dashed
lines) and program modules (grey boxes). This architecture is implemented in
our newest project ODRA. The architecture takes into account the subdivision of
the storage and processing between client and server, strong typing and query
optimization (by rewriting and by indices).

Fig.25. Architecture of query processing with
strong type checking and query optimization
Below we present a short description of the
architecture elements presented in Fig.25. On the side of the client
application we have the following elements.
·
A
source code of a query is created within the software development environment, which includes an editor, a debugger, storage of source
programs, storage of compiled programs, etc.
·
A parser of queries and
programs takes a query source as input, makes syntactic analysis and returns a
query/program syntactic tree.
·
A query/program syntactic tree
is a data structure which keeps the abstract query syntax in a well-structured
form, allowing for easy manipulation (e.g. inserting new nodes or subtrees,
moving some subtree to another part of the tree, removing some subtrees, etc.).
Each node of the tree contains a free space for writing various query
optimization information.
·
The strong type checker takes
a query/program syntactic tree and checks if it conforms to the declared types.
Types are recorded within a client local metabase and within the metabase of
persistent objects that is kept on the server. The metabases contain
information from declarations of volatile object types (that are a part of
source programs) and from a database schema. The module that organizes the
metabases is not shown. The strong type checker uses two stacks, static ENVS
and static QRES, which simulate actual execution of a query during compile
time. Static stacks contain signatures of environments and signatures of
results (will be explained much later). The static type checker has several
other functions. In particular, it changes the query syntactic tree by
introducing new nodes that allow, in particular, for automatic dereferences,
automatic coercions, for resolving ellipses and for dynamic type checks (if
static checks are impossible). The checker introduces also additional
information to the nodes of the query syntactic tree that is necessary further
for query optimization. The most important information is the level of the ENVS
stack on which a particular name will be bound.
·
Static ENVS (S_ENVS) - static environment
stack (will be explained much later).
·
Static QRES (S_QRES) - static
result stack (will be explained much later).
·
Local metabase - a data structure
containing information of types introduced in source programs.
·
Optimization by rewriting - this is a program module that
changes the syntactic tree that is already decorated by the strong type checker. There are several rewriting methods
that are developed for SBA, in particular:
§
Performing calculations on literals.
§
Changing the order of execution of algebraic operators.
§
Application of the query modification technique, which changes
invocations of views into view bodies. To this end, the optimization module
refers to the register of views that
is kept on the server.
§
Removing dead subqueries, i.e. subqueries that do not influence the
final query result.
§
Factoring out independent subqueries, that is, subqueries whose result
is not changed within some loop.
§
Shifting conditions as close as possible to the proper operator, e.g.
shifting selection condition before a join.
§
Methods based on distributivity property of some query operators.
§
Perhaps other rewriting methods that are currently under investigation.
On the side of the database server
we have the following architectural elements:
In this
architecture we assume that the query/program interpreter (which will be
further formalized as the procedure
eval) acts directly on a syntactic tree. Syntactic trees are the most convenient way
to represent and to process query
evaluation plans, perhaps, involving some optimizations. Cost-based query
optimizers can generate several such query evaluation plans to take out one of
them that is the most promising in terms of the anticipated query evaluation
cost. According to our implementation experience, the interpreter acting
directly on a query syntactic tree (transformed by strong typing and
optimization modules) is the most convenient, flexible and easy to implement
solution. However, we can also assume that this tree will be further compiled
to some bytecode and the interpreter will act on this bytecode. This solution
was implemented in Loqis. It is also possible that the tree will be further
converted to the machine code. This solution was implemented in Linda, however,
actually we see no essential advantages of it and a lot of disadvantages.
This view on the query processing architecture,
although quite detailed, still can be augmented by new architectural elements,
e.g. by a cost-based query optimizer, by introducing user sub-schemas, and many
others. They will be refined in further parts of the SBA description.
Last modified: July 14, 2006