Environment Stack, Name Scoping and Binding
by Kazimierz Subieta (January 2006)
Back to Description of SBA and SBQL.
The concept of environment stack (a.k.a environmental stack or call stack) appeared in 50-ties or early 60-ties of the previous century, together with compilers of oldest high-level programming languages such as Algol-60. From that time the stack is a basic concept of semantics and implementation of majority of well-known languages, including Pascal, C, C++, Smalltalk, Java, C#, etc. The idea of the stack is well-known for the developers of compilers. It is simple and obvious, but not always well explained in manuals and textbooks. According to our experience, the database community folks rarely realize what the stack is for, what problems it solves and which properties it has. It is quite common confusing it with another stack known as arithmetic stack or result stack which is necessary for quite different reasons. Lack of proper knowledge, simplistic reasoning, neglecting hard semantic problems and hurrah-theoretical attitude are probably the reasons that developers of popular theories of query languages (such as relational algebra, relational calculus and formal logic) totally ignore this concept, or consider it an implementation issue having no relevance to high-level semantics. Unfortunately for the world (and fortunately for us) such theories are dramatically limited w.r.t. to a lot of semantic phenomena that occur in query languages such as SQL. Moreover, richer database models, such us object-oriented and XML-oriented models, show total inadequacy of these theories. Even simplest queries, such as 2+2 or select Salary - Tax from Employees are non-expressible in these theories. However real problems with these theories appear when one would like to describe unified semantics for languages that integrate querying and programming. Nowadays there are many such languages, including recent SQL standards, Oracle PL/SQL, MS Transact SQL, some 4GLs, and others. Therefore we abandon these theories as totally worthless for our purposes. Instead of them we introduce concepts with no such limitations. We will show that all advantages of these theories over our concepts, including their potential for query optimization, distributed query processing, etc., are overrated and inessential. We do that in a way that is better, more general and corresponding to real practical query languages.
In this way we are coming back to the idea of an environment stack. We will consider its roles in programming languages to conclude that the same reasons for introducing it occur in query languages. To this end, however, we will need to modify the stack structure to accomplish new features stemming from the properties of query languages.
Environment Stack in Programming Languages
The concept of environment of program execution denotes all the run-time entities (variables, constants, objects, functions, procedures, types, classes, etc.) that are available at the given point of the program control. The environment is not a flat structure and is changing during the run of the program. The most convenient (and clear for the programmers) assumption is that that the environment is subdivided into sub-environments that appear and disappear during the program run. From the ergonomic point of view there is a need to follow some principles that govern the appearance and disappearance of the sub-environments, as follows:
These principles lead to the concept of environment stack, which is just responsible for appearance and disappearance of sub-environments related to particular procedures. The stack fulfills the following roles (in our terminology functions and methods are procedures too):
A stack is a main memory (or virtual memory) data structure and is assigned to a single client application program or to a single process or thread. It is subdivided into sections, where each section (also called activation record) corresponds to a particular sub-environment. A section is associated with a particular procedure call or an executed program block. When the control is shifted to a procedure call, a new section with all entities local to this call is pushed on the top of the stack. The same is done when the control is shifted to a module (except that a return track is unnecessary). The section is popped from the sack when the procedure or program block is terminated. For the procedure that is currently running all values of parameters, local variables/objects and any other local entities are stored within the top stack section.
Note that the environment stack has more functionality than a typical stack understood as an abstract data type with four operators: push (push a new element on the top), pop (remove an element from the top), top (read top element) and isEmpty? (checking if the stack is empty). Additional functionalities concern name binding (which implies the search on the entire stack), scoping rule (skipping visiting some sections) and (in some cases) inserting new sections in the middle of the stack. All these additional functionalities stem from the definition of semantics and will be explained later.
For instance, binding of a variable X is a substitution of this name by a main-memory address, where the value of X will be recorded. All occurrences of X in a source code are substituted by this address.
Providing an environment stack stores environments together with all source names of run-time entities, a binding action for some name X is performed according to the following steps:
In fact, this search is usually done during compile time on the static environment stack rather than on run-time stack; we will return to this issue a bit later. In Fig.14 we show the case when name X occurs within a block b that is a part of procedure p2 that is called from procedure p1. The control is within the block b. Arrows show the order of searching the stack. The search is terminated when X is found.
Fig.14. Example situation on the environment stack
In Fig.14 we also show the scoping rule: variables and actual parameters of the procedure p1, and perhaps former sections of procedures, are not seen. The search is going directly to global variables. Procedures p1 and p2 can be written at different time, by different programmers who do not and need not to communicate and make some agreement. The programmer writing procedure p1 can use the name X and the programmer writing procedure p2 can use X, but these two X denote different things. Due to scoping rules X occurring within p1 is not seen when p2 is processed, thus there is no conflict.
If a programming language introduces modules, classes, inheritance and perhaps other notions the situation on the environment stack and scoping rules can be much more complex. We will discuss this issue after we introduce AS0-AS3 store models.
The stack mechanism makes it possible to accomplish the following features of programming languages:
As we will show, the above features have fundamental meaning for query languages, allowing one to accomplish their properties in a simple, consistent and universal way. This concerns, in particular, the possibility to define auxiliary names (“correlation variables”) that are defined within queries, the possibility of nesting queries, scope rules for all names occurring in queries, combining in queries names of database properties with names of applications’ properties, and so on. The stack mechanism is absolutely inevitable if one seriously thinks on integration of queries with programming constructs and abstractions. Lack of the environment stack concept in artifacts such as various algebras (object algebras, in particular), F-logic, Datalog, monoid calculus, OQL, XQuery, etc. in advance sentences them to limitations and handicaps.
Static and Dynamic Environment Stack
In languages with early (compile-time) binding names occurring in a program have the second-class citizenship, i.e. they are lost after compilation. The names do not exist during run-time. For such languages the environment stack must exists in two forms: a static stack that is managed during compilation, and a dynamic stack managed during run-time. The binding of source program names is performed on the static stack. During compilation this stack simulates the behavior of the dynamic stack, i.e. it grows and shrinks according to the moving of the syntactic analysis control point along the source program being compiled. In such cases binding is subdivided into two phases. In the static phase binding of variables is expressed relatively to the size of stack, i.e. for each variable name the static binding returns so-called offset, e.g. the distance in bytes between the top of the dynamic stack and the address of the run-time entity corresponding to this name. Eventually the binding is performed during run-time, when the address of the stack top is known. For instance, the address of the variable name is calculated by subtracting the offset from the address of the stack top.
Due to some irregularities of data structures practically all languages work with more advanced dynamic binding, when variable names (in some representation) still exist on the dynamic stack. In interpreted (script) languages the binding is fully dynamic, i.e. names of variables, together with their values, physically exist on the dynamic stack. The subdivision between static binding (during compile time) and dynamic binding (during run time) is almost always a bit rough and implementation dependent: usually the static binding is supported by calculations during run time, and dynamic binding operations (for optimization) are partly done during compile time.
Query languages are interpreted hence for them the dynamic binding and dynamic stack is relevant. To avoid considerations related to physical implementation of the static and dynamic stacks we assume that all bindings are dynamic. Hence at least at the beginning we avid the static stack. All run time entities that are necessary for query processing will be stored at the run-time stack together with their source names. In fact, majority of the semantics of query languages we are able to explain without referring to the static environment stack.
However, some issues in query languages require the static stack. These are the following:
In our further discussion we present the stack in conceptual rather than physical form. In this way our considerations will be still abstract and high-level. Of course, physical implementation of the stack can much differ in details from our conceptual view.
Go to Environment Stack in SBA in the AS0 Store Model.
Last modified: December 31, 2007