©Copyright by Kazimierz Subieta.

SBA: Environment Stack in the AS0 Store Model

by Kazimierz Subieta (January 2006)

Back to Description of SBA and SBQL.

Back to Environment Stack in Programming Languages


In our presentation we focus on the conceptual view of the environment stack rather than its physical implementation. The construction of the stack will reflect basic concepts of the query languages’ semantics that we have introduced in the AS0 store model. In the following, operations on the stack will be explained for AS1, AS2 and AS3 models, but essentially the construction of the stack for these models remains the same. In contrast to typical programming languages the stack will be prepared to treat uniformly individual data and collections. The stack is a client-side main-memory structure, hence its size is somehow limited. However, this is not a significant limitation, because nowadays main memories are very large and can be extended by virtual memories. For instance, in Loqis all main memory structures, including stacks and indices, can be partly stored on a disc, depending on some policy involving the statistical frequency of the use. Moreover, there are a lot of possible optimizations aiming reduction of the stack size. Thus the danger that the main-memory stack implies some limits in scalability of applications in our opinion does not occur. Actually, we see no reason why e.g. SQL is scalable and query languages based on SBA are not scalable (as claimed by some authors); such opinions are based on immature superficial reasoning rather than on deep analysis of technical properties. Because the stack is a client-side structure, its idea is orthogonal to geographical distribution of database resources. Hence the argument of some authors that the stack implies no possibility of processing distributed databases has no foundation at all (what we have also confirmed by implementation of distributed tools and applications).

The environment stack in SBA is abbreviated ENVS. It consists of sections, which are ordered. The newest section will be named the top, the oldest one will be named the bottom. As in case of programming languages, each section presents information on some run-time environment, for instance, a local environment of a procedure, a local environment of an object, of a class, of a database, etc. The size of a section is conceptually unlimited.

The bottom of ENVS will contain global sections, in particular, a database section, a section related to the current user session, libraries that are available in the computer environment and some computer environment variables, such as date, time, user login information, etc. We assume that each name that may occur in source query must possess its counterpart on ENVS, hence the stack will be the only name binding mechanism; no name occurring in a query can be bound otherwise. For simplicity, in examples we sometimes consider only the database section, neglecting other sections. In real implementations these sections must be of course taken into account.

ENVS will be prepared to store uniformly information on persistent and volatile objects. Persistent objects are usually shared among many user sessions. Volatile objects are properties of a single user session and are not available for other sessions. Local objects of procedures are also considered volatile.

ENVS will also be prepared to store all information that stems from definitions that are local for queries. Almost all query languages introduce such definitions; for instance, “correlation variables” in SQL, names defined by as operator in OQL and for operator in XQuery, names bound by quantifiers, cursors in for each statements, etc. All such names occurring in queries will be bound according to the same standard routine.

In programming languages the run-time environment stack usually plays also the role of the store of values of all volatile entities (e.g. local variables of procedures). In contrast, we assume that the stack and the object store are separate notions. We have several reasons to separate the notions. The main reason is conceptual - the separation is quite convenient for the definition of the semantics of a query language. The second reason is that we consider persistent objects which live on a database server machine, while the stack lives on a client machine. Hence the stack may contain references to persistent objects, but not the objects themselves. The third reason is that the same object can be referred from several stack sections, thus it is impossible to include it to a single section.In implementation it is possible to make some optimizations which assume that some volatile objects live on the stack. Such optimization was implemented in Loqis. However, in this presentation we do not deal with implementation issues.


The Concept of binder

A basic structure that is stored at ENVS is called binder. The concept is specific only for SBA, it does not occur in other approaches to query languages. Binder is also a key concept for query languages defined through SBA. It is incredible how many semantic issues can be explained through such a simple notion. It is also incredible that it was not introduced in any previous theory devoted to query languages. Binder supports universal solutions to all semantic issues related to naming, scoping, binding, parameter passing, and definition of many query operators. Lack of this concept in former theories, (such as algebras and calculi) is caused by the fact that naming issues have the second-class citizenship in these theories: names are properties of informal meta-languages rather than the defined languages. In contrast, naming issues are fundamental in all known query languages, including SQL, OQL and XQuery, hence lack of formal treatment of names inevitably results in imprecise, underspecified semantics.

Binder is a pair n(x), where n Î N is any external name which can be written in a source query or program, and x is any result of a query, in particular, a reference to an object or a value.

Results of queries (i.e. the domain Result) will be defined a bit later. For better legibility the binder is written n(x) rather than <n, x>. The idea of a binder is very simple. Its role is to binding names, hence it joins an external name n with a run-time entity x that is denoted by this name. Binding name n means that we are looking at ENVS for a binder (binders) n(x). The result of the binding is x. If the stack contains no such a binder, the situation can be qualified as a binding error. Alternatively, for semi-structured data the situation can be qualified as correct, but the result of the binding is the empty bag. However, we reject such an alternative, because it makes impossible to distinguish absent data (normal in semi-structured data) from spelling mistakes in names. We return to this issue much later, when we will consider processing semi-structured data in SBA.

In our semantics binders will be present not only on the environment stack. Some query operators will return binders as query results. However, eventually the mission of every binder is to appear sooner on later on the environment stack.


Definition of Environment Stack

In SBA the environment stack ENVS is a sequence of sections, where each section is a set of binders. The sections correspond to particular run-time environments.

Fig.15 presents a sample state of the object store (see Fig.2 and Fig.3 of the AS0 description) and Fig.16 presents a sample state of the environment stack that address this store. In this example binders contain object identifiers (references) and values. Single references and single values are particular cases of query results, but in general, because query results can be nested and arbitrarily complex, binders can be nested and complex too.

 

Fig15

Fig.15. A sample state of an object store

 

Fig16

Fig.16. A sample state of an environment stack ENVS

Note that the stack mixes and unifies volatile and persistent data. This is due to the orthogonal persistence principle which requires no difference between querying and operations on persistent and volatile data.

We remind that in SBA the concept of State consists of the state of an object store, as shown in Fig.15, and the state of the environment stack, as shown in Fig.16. Lack of formal and precise definition of the concept of state  (the state of an object store and the state of and environment stack) is one of the reasons that the semantics of query language standards, such SQL-99, OQL and XQuery, is for sure imprecise and leading to a lot of incompatible decisions in different implementations.

We remind that an environment stack is a volatile data structure that is maintained at the side of a client (a user session). In case of concurrent access of many clients, each of them maintains an own stack. If at some time there is no client session, then there is no environment stack. Moreover, if a client session has some parallel processes, threads or transaction, then it is possible that there are as many stacks as processes/threads/transaction within the session. When a session (process/thread/transaction) is started a stack dedicated to it is created and fulfilled in advance by global sections, as shown in Fig.16. Then the stack is maintained by the query execution engine, as will be determined in the following parts of this SBA description.

Relational systems based on SQL assume that all the query processing is performed on the side of a database server rather than on a client. This architecture has some advantages, because source SQL queries (essentially, strings) and query outputs are the only units that circulate among clients and the server. The server centralizes all the mechanisms of query optimization and processing.  The architecture has also severe disadvantage if there are many (hundreds or thousands) of clients, because query processing could become a bottleneck compromising the client-side performance. Thus we will strive to develop an architecture in which clients are responsible for query optimization and processing as much as possible, reducing the server workload. However, this does not mean that the environment stack concept enforces such architecture. If all the query processing has to be done on the server, then each transaction on the side of the server should maintain an own environment stack, as described above.


ENVS and Name Binding

As we have presented before, semantics of a query is a function which maps a state into a query result. Because in SBA each single name is a legal query, the question is how the mapping looks for it. This mapping we call binding. Binding is performed on ENVS according to the very simple “search from the top” rule. As we have argued in the part devoted to the environment stack in programming languages, some sections must be skipped because of scoping rules; so far we abstract from this feature. Fig.17 presents the order of searching within ENVS. Binding name n requires searching for the binder n(x) which is present on the stack, according to the rule. The first success stops the search. The result of the binding is x. To take into account collections we assume multi-valued binding. If the binding of a name n succeeds in some section, and it contains more binders named n: n(x1), n(x2), ..., n(xk), then all such binders are taken into account and the result of the binding is the bag{x1, x2, ..., xk}.

Fig17

Fig.17. The order of searching ENVS during the binding a name

The function that does the search we call bind. It has a name as an argument and returns a value being the result of the binding. The result will be stored at another query result stack that will be explained later. The function is quite easy to explain, for instance (Fig.17):

bind( X ) = i127

bind( I ) = “Maria”

bind( sal ) = i11

bind( Dept ) = bag{ i17, i22, ...}

bind( Emp ) = i1

Note that in the last case there is a section that contains more binders named Emp; however, because the top section contains a single binder named Emp, the search is terminated on this binder. This feature is known as overriding; it has significance in query languages that we intend to cover and more generally, in object-oriented query/programming languages. The overriding accomplishes the principle of a local context (lexical scoping), which provides that a particular name is bound in the environment that is currently most close according to the programmer thinking. Note also that the binder Emp(i1) occurs two times on the stack. Such a case is normal in query languages that we want to cover.


Results Returned by Queries

Now we are prepared to define the domain Result of queries, as explained previously. As we noted before, in SBA we assume that queries never return objects but references to objects, sometimes within more complex structures. Objects live in the object store; no entity called object occurs elsewhere. Queries can also return values stored in objects and values calculated by some functions or algorithms. As in other approaches, we introduce structures, bags and sequences as results of queries. An essential novelty of SBA in comparison to other theories and proposals of query languages is that queries can return named structures that we already know as binders. Binders, as a structures returned by queries allow us to deal with naming issues that occur in the semantics of query languages.

Generalization of the above assumptions leads us to the following recursive definition (c.f. the AS0 store model):

·         Any atomic value belonging to V, or a value calculated by query operators, belongs to Result.

·         Each reference to an object (object identifier) belonging to I belongs to Result. In particular, the domain Result includes also references to procedures, functions, views, methods and other behavioral entities if they will be introduced in a particular model of the object store.

·         If x Î Result and n is an external name, n Î N, then the binder n(x) belongs to Result. Such a result we will also call named value.

·         If x1, x2, x3, ... belong to Result, then struct{ x1, x2, x3, ... } belongs to Result. struct is a structure constructor, which can be implemented as a flag decorating a result. The order of elements in a structure can be significant. In contrast to typical structures known from e.g. C/C++, we do not assume that all elements of structures must be named (i.e. in our terminology, elements need not to be binders). We assume full freedom in this respect, which is constrained only by assumed query operators and typing (which will be introduced later). In particular, we assume that some or all elements of structures can be unnamed, and we do not exclude the situation when two or more elements of a structure have a same name. A structure will be considered as a single (but composite) element, i.e. a structure is not a collection. As usual, types of x1, x2, x3, ...  can be different. Implicitly, we assume that if a structure has a single element, struct{ x1 }, then it is equivalent to this element x1. Structures having no element are not allowed. Our structure generalizes the concept of tuple, as known from relational systems.

·         If x1, x2, x3, ... belong to Result, then bag{ x1, x2, x3, ... } belongs to Result and sequence{ x1, x2, x3, ... } belongs to Result. bag and sequence are collection constructors, which can be implemented similarly as struct, by a flag decorating a query result.

·         The set Result has no other elements.

Note that struct, bag and sequence are constructors of values rather than constructors of types. We underline names of these constructors to avoid confusing them with keywords struct, bag and sequence that will be introduced in SBQL. The definitions of struct, bag and sequence constructors are unified, what supports uniformity and compact definition of semantics. Note however that:

struct{ x1, x2, x3, ... } is considered an individual element (it is not a collection), while bag{ x1, x2, x3, ... } and sequence{ x1, x2, x3, ... } are collections.

Although in the AS0 store model we do not introduce sequences, some operators can make sequences from bags (e.g. the order by operator); hence query results possess such an option. We do not introduce other collections, for instance sets, because we consider them particular cases of the two introduced collections. If one will consider the necessity of the set collection concept in some query language, it can be introduced on the level of types (rather then on the level of query results), providing automatic coercions between types which enforce checking for no duplicates and/or removing duplicates. In this model we do not assume that elements of collections must be of the same type. So far we do not determine what actually is a type, thus such assertions might express some informal intention rather than a formal constraint. As we will see, the concept of type is more complex than one can expect, thus we return to this issue much later.

Names used within named results (binders) are not constrained to names from the object store - they can determined ad hoc by the programmer through operators as and group as (similarly to auxiliary names in OQL). Moreover, if a query result is a binder n(i), where i is a reference to an object <i, m, ...>, then it is possible that nm, i.e. name of the binder is different from the name of an object. This property is obviously necessary for query languages; however it becomes difficult to introduce assuming classical typing systems. Hence the typing system for SBA must be significantly altered in comparison to classical ones; we discuss and introduce all such typing issues later.

Below we present several examples of query results:

Atomic:

25

“Maria”

true

Identifiers:

i1

i127

Binders:

Dept(i22)

d(i22)

sal(i11)

counter( 5 )

Complex:

struct{ i1, i17}

sequence{ i1,  i5,  19  }

bag{ struct{ i1, i17}, struct{ i5, i22}, struct{ i9, i22} }

bag{ struct{ n(„Doe”), sal(2500), d(i22) } }

bag{ struct{ Dept(i17), Emp( bag{ struct{ n(“Doe”), s(2500) } } ) },

         struct{ Dept(i22), Emp( bag{ struct{ n(“Poe”),  s(2000) },

                                                         struct{ n(“Lee”),  s(900)   } } ) }  }

 

As seen from the above examples, we can create results which much remain objects. However our results never have their own identifier (identity) and they cannot be directly members of classes. However, they may contain identifiers of objects, which in turn can be members of classes. This will be explained in detail during discussion concerning the AS1-AS3 store models

To make some association with relational query languages some query results can be represented as tables. This representation may simplify some examples by more legible visual representation. In Fig.18 we show how some query results can be depicted in this way. Such a representation may also have some meaning in implementation, as an optimization method. However, it has no meaning for formal semantics of query languages. Note that our query results cover the nested relational model (see last examples) and much more.

 

Fig18

Fig.18. Query results depicted as tables

The presented above definition of query results is not dedicated to any specific query language. It is a sufficiently abstract and universal, thus can be adopted by any of them, including SQL, OQL and XQuery (providing their semantics will be expressed in terms of SBA). This definition will be the basis for the Stack-Based Query Language (SBQL) - a query language based on abstract syntax and the SBA method of specification of semantics.


Opening a New Section of ENVS

In the following we assume that the terms environment, name scope and stack section mean essentially the same. In classical programming languages opening a new scope on the environment stack is caused by entering a new procedure (function, method) or entering a new block. Respectively, removing the scope is performed when the control leaves the body of the procedure or the body of the block.

To these classical situations of opening a new scope on the environment stack we add a new one. It is the essence and motive of SBA. The idea is thet some query operators (called non-algebraic) behave on the stack similarly to program blocks. For instance, in the SBQL query:

Emp where ( name = “Poe” and sal > 1000 )

the part ( name = “Poe” and sal > 1000 ) behaves as a program block executed in an environment consisting of the interior of an Emp object. Because we have assumed that each name occurring in a query has to be bound on the stack, this concerns also names name and sal. Hence, to make the binding possible we push on ENVS a section with the interior of the currently processed Emp object. The situation is illustrated in Fig.19 (c.f. Fig.3). The operator where iterates over the result of the query Emp. In each cycle it pushes onto the ENVS a section with the internal environment of a next Emp object. After processing of the condition the section is removed from the stack. Note that the interior of an Emp objects is represented on the stack as a set of binders leading to properties of the object.

Fig19

Fig.19. Pushing a new section on ENVS to evaluate a condition

Although the idea seems to be very simple, we will show that after generalization it is quite universal for many classical and non-classical query operators that occur in SQL, OQL and other query languages. The idea makes it possible to make precise, formal and general specification of the semantics of these operators. Moreover, the idea makes it possible to accommodate in query languages all the features and principles of the object-orientedness, such as inheritance, polymorphism and encapsulation. The idea greatly supports uniformity, orthogonality and non-redundancy of query languages’ constructs. This will be shown on the SBQL case. We emphasize once again that the situation presented on ENVS is an abstract conceptual view that is necessary to define the semantics. In implementation the construction of the stack and operations on the stack can be significantly optimized.


Function nested

In the previous part we have introduced the concept of interior of an object, which is to be pushed on the top of ENVS. Now we formalize the concept. In the formalization we strive to generalize it to cover all the cases that may require pushing on the stack some new environment. The formalization is quite simple and it is expressed through the function named nested. The function takes any query result (as defined previously) as an argument and is implicitly parameterized by an object store. For the argument it returns a set of binders. This set is assumed to be pushed at the top of ENVS, but the function nested does not do that. Pushing the result of the function nested on ENVS will be the subject of other mechanisms of the query execution engine. The function nested is defined as follows:

 

Fig20

Fig.20. Illustration of the function nested

Examples of the results returned by the function nested (c.f. Fig.3):

nested( struct{i9, 55, i16, “Maria”} ) = { name( i10 ), sal( i11 ), address( i12 ), worksIn( i16 ), Dept( i22 ) }

nested( struct{n(“Doe”), sal(2500 ), d( i22) } ) = { n(“Doe”), sal(2500 ), d( i22) }

nested( bag{i1, i5, i9} ) = Ø


Last modified: October 28, 2010