SBA Architecture for Query Processing
by Kazimierz Subieta (February 2006)
Back to Description of SBA and SBQL.
There are several views on architecture of query processing. Fig.24 presents basic dependencies between three fundamental data structures involved in SBA: an object store, an environment stack and a query results stack. An object store contains volatile (non-shared) objects and persistent (shared) objects. Both stacks contain references to objects. Non-algebraic operators act on the query result stack and the object store and affect the environment stack. Query evaluation takes the state of the environment stack and the state of the objects store and puts query results on the query result stack.
Fig.24. Dependencies between object store, environment stack and query result stack
In Fig.25 we present a more detailed view on the architecture, which involves more data structures (figures with dashed lines) and program modules (grey boxes). This architecture is implemented in our newest project ODRA. The architecture takes into account the subdivision of the storage and processing between client and server, strong typing and query optimization (by rewriting and by indices).
Fig.25. Architecture of query processing with strong type checking and query optimization
Below we present a short description of the architecture elements presented in Fig.25. On the side of the client application we have the following elements.
· A source code of a query is created within the software development environment, which includes an editor, a debugger, storage of source programs, storage of compiled programs, etc.
· A parser of queries and programs takes a query source as input, makes syntactic analysis and returns a query/program syntactic tree.
· A query/program syntactic tree is a data structure which keeps the abstract query syntax in a well-structured form, allowing for easy manipulation (e.g. inserting new nodes or subtrees, moving some subtree to another part of the tree, removing some subtrees, etc.). Each node of the tree contains a free space for writing various query optimization information.
· The strong type checker takes a query/program syntactic tree and checks if it conforms to the declared types. Types are recorded within a client local metabase and within the metabase of persistent objects that is kept on the server. The metabases contain information from declarations of volatile object types (that are a part of source programs) and from a database schema. The module that organizes the metabases is not shown. The strong type checker uses two stacks, static ENVS and static QRES, which simulate actual execution of a query during compile time. Static stacks contain signatures of environments and signatures of results (will be explained much later). The static type checker has several other functions. In particular, it changes the query syntactic tree by introducing new nodes that allow, in particular, for automatic dereferences, automatic coercions, for resolving ellipses and for dynamic type checks (if static checks are impossible). The checker introduces also additional information to the nodes of the query syntactic tree that is necessary further for query optimization. The most important information is the level of the ENVS stack on which a particular name will be bound.
· Static ENVS (S_ENVS) - static environment stack (will be explained much later).
· Static QRES (S_QRES) - static result stack (will be explained much later).
· Local metabase - a data structure containing information of types introduced in source programs.
· Optimization by rewriting - this is a program module that changes the syntactic tree that is already decorated by the strong type checker. There are several rewriting methods that are developed for SBA, in particular:
§ Performing calculations on literals.
§ Changing the order of execution of algebraic operators.
§ Application of the query modification technique, which changes invocations of views into view bodies. To this end, the optimization module refers to the register of views that is kept on the server.
§ Removing dead subqueries, i.e. subqueries that do not influence the final query result.
§ Factoring out independent subqueries, that is, subqueries whose result is not changed within some loop.
§ Shifting conditions as close as possible to the proper operator, e.g. shifting selection condition before a join.
§ Methods based on distributivity property of some query operators.
§ Perhaps other rewriting methods that are currently under investigation.
On the side of the database server we have the following architectural elements:
In this architecture we assume that the query/program interpreter (which will be further formalized as the procedure eval) acts directly on a syntactic tree. Syntactic trees are the most convenient way to represent and to process query evaluation plans, perhaps, involving some optimizations. Cost-based query optimizers can generate several such query evaluation plans to take out one of them that is the most promising in terms of the anticipated query evaluation cost. According to our implementation experience, the interpreter acting directly on a query syntactic tree (transformed by strong typing and optimization modules) is the most convenient, flexible and easy to implement solution. However, we can also assume that this tree will be further compiled to some bytecode and the interpreter will act on this bytecode. This solution was implemented in Loqis. It is also possible that the tree will be further converted to the machine code. This solution was implemented in Linda, however, actually we see no essential advantages of it and a lot of disadvantages.
This view on the query processing architecture, although quite detailed, still can be augmented by new architectural elements, e.g. by a cost-based query optimizer, by introducing user sub-schemas, and many others. They will be refined in further parts of the SBA description.
Last modified: July 14, 2006