©Copyright by Kazimierz Subieta

Procedures and Methods in SBQL (under construction)

by Kazimierz Subieta

Back to Description of SBA and SBQL.

Procedures are the most important programming constructs for ensuring abstraction, encapsulation of code and program reuse. Practically all programming languages introduce procedures as a fundamental feature. Procedures encapsulate any complex calculations and hide their code from the programmer. They can be called from many places of applications and (in remote procedure call technologies) can be called from outside of the address space that a procedure is located in. Procedures can be parameterized explicitly, by parameters specified within their declarations, or implicitly, by an external environment that the body of a procedure can access and update (so-called side effects).

Depending on their properties procedures can be subdivided according to the following criteria (which can be orthogonally combined):

·         Proper procedures and functional procedures (functions). Proper procedures belong to the category of program statements. They do not return a value, thus cannot be called in expressions or queries. Functional procedures return a value, thus can be called within expressions and queries. In majority of solutions functional procedures can be used as proper procedures, i.e. their calls can be considered statements. In this case the return value of a procedure is ignored. In some languages (e.g. C/C++) all procedures are functional ones, but concerning conceptual issues and strong typing this assumption is controversial.

·         Procedures and methods. There are opinions that methods constitute “behaviour” of objects and they are activated by message passing. There are suggestions that message passing reminds communication in societies, thus presents a totally new paradigm of programming that contributes to parallel execution of programs. Such opinions are based on superficial observations and misleading associations. Parallel programming is a very important issue, but methods and message passing introduce to it exactly nothing. We follow the assumption that methods are ordinary procedures with a specifically constructed program environment (we explain it later) and message passing is equivalent to a procedure call (within special context that will be explained too). The most obvious difference between procedures and methods is their conceptual location. Procedures are located within program modules (sometimes within program blocks, e.g. Pascal), while methods are located within classes.

·         Procedures located within an application code and procedures stored in a database. In traditional programming languages procedures are properties of an application code. Relational databases introduced a new kind of procedures, called stored procedures or database procedures, that can be stored within a database. This issue is directly related to the phase of binding procedure names. Majority of programming languages assume early binding, i.e. binding during compilation and linking, before the program is executed. Stored procedures require late binding, i.e. binding during run time. The major advantages of early binding is much better performance and better opportunities for strong type checking. The advantage of late binding is flexibility and more possibility of generic programming, in particular, programming with linguistic reflection[1].

·         Ordinary procedures and higher-order procedures. Higher-order procedures can be parameterized by some programming entities that are have properties of the source code, for instance, by types, by procedures, by classes, by lambda expressions, etc. Higher order procedures much increase flexibility and power, especially related to generic programming. In contrast to linguistic reflection, which actually disables compile-time strong type checking, higher-order procedures can be strongly typed as well. There are various forms of type polymorphism proposed in different languages. This kind of procedures much fascinated academicians and is practically ignored by industrial professionals. There is old and perhaps already obsolete hope that higher-order procedures will create a new quality in program development. Despite many attempts (c.f. languages such as Scheme, ML, Haskell, F#) till now it is unclear if indeed they present a new quality that can be sufficiently attractive for the common programmer. Recently this kind of paradigm has been implemented within the LINQ project by Microsoft Research (lambda expressions), but after some experience our impression is that real advantage of this style of programming is difficult to identify. The obvious disadvantage concerns query optimization, which is still much underdeveloped in comparison to SQL and SBQL.

SBQL presents a new quality w.r.t. procedures and methods because we unify expressions and queries. Hence queries can be parameters of procedures or methods, can be used within return statements of functional procedures/methods and can be used within imperative statements in bodies of procedures/ methods. Although considering the theory and strong typing this is rather a well-recognized issue, none of the existing programming languages that deal with queries (e.g. PL/SQL of Oracle, T-SQL of Sybase and Microsoft, ABAP of SAP, SQL-2008 standard, C#/LINQ) does not assume such radical treatment of queries in all contexts. We show on examples that this SBQL feature indeed presents the new quality in application program development.

Semantics of procedures is based on the environment stack mechanism that we have introduced before for non-algebraic operators. The environment stack (known also as call stack) was introduced in early 1960-ties for the programming language Algol-60. Thus the history of this concept is almost as old as the idea of high-level programming languages. Our original contribution is to use this stack (ENVS) to define non-algebraic query operators, such as selection, projection, navigation, join, quantifiers, etc., see the section SBQL - Non-Algebraic Operators. The semantics of queries based on ENVS is a sound substitution of very limited, inconsistent, and usually mathematically incorrect semantics based on relational algebra, relational calculi, functional approaches, object algebras, object calculi, monoid comprehensions, F-logic and other concepts that were invented to deal with queries. A serious critics of object algebras can be found in [SuLe95], but majority of this critics is relevant to other formal approaches the we have listed above. The stack-based approach that we describe on these pages presents a specific conceptual construction that is motivated by the semantics of query languages. The thorough description can be found in the section Environment Stack, Name Scoping and Binding.

Usually developers of programming languages assume that procedures or methods can access and update external entities, including application data, databases, files, external devices, etc. directly, without use of parameters. This is known as side effects of procedures. Side effects can be passive, that is, a procedure can access external entities, but cannot updated them, and active, when a procedure can updated external entities. Currently popular programming languages have no means to specify side effects of procedures. Originally, in the concept of module that was proposed by David Parnas, side effects were specified in the form of a so-called import list. This was accomplished in the programming languages Modula-2 and Eiffel. Import list were the components of types and access to external entities was strongly typechecked. There is a similar concept of import in languages such as Java, however, not as a component of a procedure type. Interfaces in Java that specify signatures of procedures do not contain information on side effects. This is sometimes criticized as a feature leading to errors. Methodologies such as Design by Contract provide specification of side effects of procedures (an a lot of other features such as preconditions and postconditions).

Syntactically, we distinguish declarations and calls (invocations) of a procedure/method. A declaration of a procedure consists of procedure name, formal parameters (single names) usually associated with their types, the type of its output (for functional procedures) and a procedure body where the programmer specifies a sequence of instruction that is to be performed. Within the body there are declarations of local objects or variables or sometimes other declarations (e.g. local procedures). A call of a procedure consists of procedure name and procedure actual parameters which are expressions. Types of the expressions must conform to the declared types of corresponding formal parameters. A procedure is encapsulated: the programmer that uses a procedure needs to know only its signature which consists of procedure name, formal parameters with their types and the type of a procedure output. The signature contains also some other information, for instance, concerning the method of parameter passing. For instance, there is a declaration of a procedure square:

square( a: real): real {

   counter: integer; x: integer;

   counter := 0; x := 1;

   while counter < 100 do {

      x := (a/x + x)/2;

      counter := counter +1;

   }

   return x;

}

The signature of the procedure is

square( a: real): real

An example call of the procedure

square( 5+ z*t ) / 120

When a procedure or a method is called, a new section (so-called activation record) on the environment stack is pushed. The section contains three kinds of entities:

·         Local procedure environment, i.e. all local objects/variables that are declared in the body of the procedure/method.

·         Calculated actual parameters of the call. In some languages (e.g. C) calculated parameters are equivalent to variables, but this is not a rule.

·         Return path, i.e. an address of the program code when the control should be passed after the procedure is terminated.

Sometimes a call implies pushing on the environment stack more sections. When a procedure or method is terminated, its activation record is popped. When a procedure is running, local environments of a procedure that called it is unavailable. This is due to the lexical scoping principle which assumes no bindings to entities of any environment that the programmer was not aware during writing a procedure. This is illustrated in Fig.64. Black sections denote entities unavailable for bindings. Bindings are performed from top of the stack to its bottom, as usually. In case of query languages the situation can be a bit more complex, because sections implied by procedure/method calls can be mixed with sections implied by non-algebraic operators. This will be presented later.

Fig64 proc calls

Fig.64. ENVS stack: procedure P1 calls procedure P2, which calls procedure P3

Procedures can be called from procedures without limitations, in particular, any recursive calls are allowed. Due to the stack-based semantics recursive calls are not extraordinary and in majority of languages recursive calls are not distinguished syntactically. However the following cautions should be observed:

·         The size of the environment stack. It limits the depth of recursive calls, especially if the stack is organized in RAM.

·         Within the procedure there should be a condition ensuring the end of recursion. Otherwise the recursive calls must result in failure, but if the stack is large, this may happen after long time. For this reason some languages limit the depth of recursion.

·         Active side effects should be avoided or limited, because a next recursive call can overwrite the results of previous calls. Procedures that can be recursive without inconsistencies are called reentrant.

In SBQL we do not assume any limitations to the recursion. The programmer is responsible for avoiding stack overflows and for ensuring the end of recursion. Examples of recursive procedures in SBQL are presented in the chapter Recursive Queries in SBQL.

 

Parameters of Procedures

Procedures and methods can be parameterized. In contrast to mathematical functions, where a parameter is always evaluated to a value, parameters of procedures/methods can posses different semantics which influences their functionality and pragmatics of the use. Below we list several popular parameter passing methods. They can coexist within one language but usually must be syntactically distinguished.

·         Call-by-value. This method assumes that the parameter is evaluated before the procedure code is started. The parameter should result in a value (perhaps a complex one). If evaluation of the parameter results in a reference of object/variable, the dereferencing operation is performed. There are two a bit different treatments of the parameter within the procedure body. In languages such as Pascal the parameter is not a variable, thus it is impossible to assign to it a new value. In C/C++ inside the body of the procedure the parameter is treated in the same way as a local variable. In some systems, e.g. CORBA, a call-by-value parameter in the declaration of a procedure and in its signature is preceded by the keyword in.

·         Call-by-reference. As before, the method assumes that the parameter is evaluated before the procedure code is started, but the evaluation must result in a reference to an object (variable). Within the procedure body the reference is used to update or delete the object. Hence the method assume updates of an object from outside of the local procedure environment. Such a parameter is preceded by a special keyword: e.g. var in Pascal. In other systems (in particular built according to the CORBA standard) the parameter is preceded by keyword out or inout. There is a small difference in treatment of these two keywords: out means that the parameter is used for output only, while inout means that the parameter can be used for input and for output. This difference can be checked by the strong typing system, which should forbid the dereferencing operation for out parameters.

·         Strict-call-by-value. As for call-by-value, but no dereferencing operation is performed. A reference is treated in the same way as a value, hence the method can be applied without a special syntax as call-by-value or call-by-reference. The method implies less opportunities for strong type checking, but lack of a special syntax for distinguishing kinds of parameter passing is an advantage. The method is used in such languages as C. It has some meaning for query languages if the programmer wants to use a parameter which is evaluated to a bag of structures joining values and references.

·         Call-by-value-return. As for call-by-reference, but a copy of a referenced object within local procedure environment is created. All operations on the referenced object are then performed on this local copy. At the end of the procedure, the value of the copy is sent as the value of the original object. The method has a great meaning in distributed environments, as it minimizes the number of connections to remote objects. However, the method is not semantically clean, if two or more call-by-value-return parameters refer to the same object (what may happen due to nested loops, processing bags, etc.) In this case two or more copies of the same object will be created, all copies can be updated, but only updates of the copy that is sent as the last will be indeed recorded in the original object. Other updates will be lost, which as a rule means an error in the program.

·         Call-by-name. The method assumes that the parameter is not evaluated before the procedure body is started. Instead, an expression being the parameter is identified. A pointer to the expression code is passed to the body of the procedure. When the control meets this pointer within the procedure body, the expression is evaluated (in the caller environment). The method has some advantage: it avoids evaluation of the parameter if the control does not meet the above pointer. However, the method may require evaluation of the parameter many times. Moreover, each evaluation can return a different result (if the caller environment is changing), thus the method requires attention from the programmer. The method (implemented in Algol-60) is considered historical and obsolete, but the come-back can be considered in the context of query languages. If the environment necessary for evaluation of the parameter is not changed (e.g. it is a database) then the method can be implemented in such a way that each occurrence of the parameter within the procedure body is macro-substituted by the expression communicated as a parameter. This would give more opportunities for query optimization (rewriting, indices, etc.), because a query being the parameter is joined with a query that uses the parameter. We illustrate this method in example E.PR.1. The method is similar to the method of processing views known as query modification (explained later).

E.PR.1

C.f. Fig.63. Procedure MyEmp returns references to Emp objects communicated as the first parameter having names communicated as a second parameter. Assume there is an index for name and no index for sal.

SBQL:

MyEmp ( who: EmpType[0..*]; myNames: string[0..*] ): EmpType [0..*] {

   return who where name in myNames;

}

 

Within the query who where name in myNames no optimization is possible because the index for name cannot be applied.

 

Call the procedure MyEmp for employees earning more than 1000 and for names “Doe”, “Poe” and “Lee”.

SBQL:

print( MyEmp( Emp where sal > 1000; bag{“Doe”, “Poe”, “Lee” } ) );

 

Note that the call cannot be optimized too. However, when the call-by-name method is applied, the query who where name in myNames is rewritten to:

(Emp where sal > 1000) where name in bag{“Doe”, “Poe”, “Lee” }

This query can be rewritten to the semantically equivalent form:

(Emp where name in bag{“Doe”, “Poe”, “Lee” }) where sal > 1000

The query can be optimized by using the index for name.

 

·         Call-by-need. Sometimes the method is called lazy evaluation. Its assumptions are similar as for call-by-name, but the parameter is evaluated only once, first time when the control meets the pointer to an expression being the parameter. After evaluation, its result is stored within the local procedure environment and then reused each time when the control meets the pointer an expression. It has some performance advantage over call-by-name, but in comparison it gives less opportunities for query optimization that we have mentioned for call-by-name.

If we assume that parameters can be queries, the situation concerning parameter passing methods is changed. A query can return a complex value which consists of nested bags, sequences and structures. Moreover, the value can contain values and references. As we can see from example E.PR.1, some parameter passing methods can be much better to optimize than the traditional methods. For this reason the decision which method can be chosen for parameters being queries requires more research and experience.

 

Syntax and Semantics of SBQL Procedures

In SBQL there is little distinction between procedures and functional procedures. Semantically, the only difference is that a functional procedure name is eventually used to store the output from the procedure. Thus, an invocation of a functional procedure can be treated as a query and can be nested as a part of other queries. In principle, a value of any type (including nested structures, bags, sequences, etc.) can be allowed as an output from a functional procedure. Any limitations  in this respect violate the orthogonality principle and are not justified by implementation (which is equally easy for any type). A non-functional procedure cannot be a part of a query because  no value is assigned to its name. Typing of procedures and functional procedures is a bit different: the type of a proper (non-functional procedure does not contain a return type. All procedures can call procedures without limitation; arbitrary recursive calls are allowed with no syntactic distinction. A procedure is uniquely identified by its name. Overloading of procedure names is not supported (but this of course can be changed, e.g. in the spirit of Java). The result of a functional procedure is typed similarly to other programming languages. Each functional procedure can also be used as a proper procedure; in this case its return value has no meaning (is dropped).

A procedure declaration consists of a procedure name, a typed parameter list, a type of its output and a list of imperative SBQL statements.

proc_declaration ::= name ([formal_par_list]) [: return_type] { [statements] }

formal_par_list ::= par_declaration [; parameter_list]

statements ::= instruction [ statements ]

instruction ::= …any instructions defined previously and procedure calls

instruction ::= return ; | return query ;

par_declaration ::= [par_transmission_method] par_name : type [ cardinality ]

par_transmission_method ::= in | out | inout | strict | macro | …

Procedure parameter declaration syntax determines its name, type and cardinality. For a given procedure names of parameters must be unique. If the cardinality is not specified, the default cardinality [1..1] is assumed. In the procedure signature parameter declarations are separated by semicolons. Instruction return concerns proper procedures and causes immediate passing control to the caller of a procedure. It need not necessary if execution of the body of the procedure is terminated naturally. Instruction return query causes the same, but with storing the result of query on top of the QRES stack. This instruction must occur at least once within a functional procedure. In this case instruction return without query is not allowed and ending the procedure without return query is not allowed too.  Parameter transmission methods are distinguished syntactically by keywords. The default method (no keyword) is equivalent to in and denotes call-by-value. Keywords out and inout denote two variants of call-by-reference. Keyword strict denotes strict-call-by-value. Parameter macro denotes a variant of call-by-name in which the parameter is treated as a macro-definition. Other parameter passing methods can also be defined, depending on the research and experience[2].

Procedure invocation syntax is the following:

proc_invocation ::= name [([actual_par_list ])]

actual_par_list ::= parameter [; actual_par_list]     

parameter ::= query | par_name : query

Syntactically, a call of a procedure consists of its name and parameters. If the list of parameters is empty, it is allowed to drop parentheses (as in Pascal). Lack of parentheses may have some meaning, in particular, it allows to abstract from the distinction between procedure or method call and binding to stored objects.  For instance, if the class of objects Person has the functional method age, then the query Person where age > 25 looks more naturally than Person where age() > 25. For better legibility and reducing possible errors each actual parameter can be preceded by the corresponding name of a formal parameter. If all actual parameters are preceded by names of formal parameters, then the order of actual parameters is inessential. Additionally, if a mechanism of parameter default values would be implemented (we explain it later) then some or all parameters can be dropped. Otherwise, the order and the number of actual parameters must correspond to the order and the number of formal parameters.

Inferred types of actual parameters must conform to declared types of formal parameters, the conformity is determined by the substitutability principle or by some implemented forms of ad hoc conformity (e.g. the integer type can be used when the real type is expected).  So far we do not determine precise rules of ad hoc conformity, as the typing system of SBQL can differ in details (one of the variants is implemented in ODRA, but this is not a standard).

Semantics of a procedure call follows the semantics explained above. Parameter transmission methods call-by-value, call-by-reference and strict-call-by-value require evaluation of the actual parameter before the control is passed to the body of the procedure. The evaluation of a parameter results by a new section on the query result stack QRES. Then, a new section (activation record) is pushed on the environment stack; the section consists of actual parameters (represented as binders), local procedure objects (represented as binders too; the corresponding objects are stored in a special part of a volatile store) and return path (represented as an address of a compiled bytecode). Then, values of the parameters are popped from QRES. After this section is created, the control is passed to the body of the procedure, i.e. sequence of instructions (statements). Instructions from the body are executed; the order is determined by their sequence and by imperative control statements explained in the chapter Imperative Constructs of SBQL. When the control meets the end of the sequence of instruction or a return statement, the procedure is terminated. Termination means the following actions:

·         For functional procedures and methods the result of the query within the return statement is pushed at the top of QRES.

·         The return path is stored within some auxiliary variable.

·         All local objects that were created for this procedure are removed.

·         The environment stack ENVS is popped – the section that was pushed at the stack when the procedure was started is removed.

·         The control is passed to the caller code according to the return path.

 

 

 

 


Last modified: March 8, 2010



[1] Linguistic reflection is a property of a programming environment that makes it possible to dynamically generate a program that can be executed as a part of the generating program. The best-known language with linguistic reflection is LISP.

[2] In ODRA only strict-call-by-value is implemented, hence no keyword precedes a parameter declaration.