©Copyright by Kazimierz Subieta

Recursive Queries in SBQL

by Tomasz Pieciukiewicz, Krzysztof Stencel, Kazimierz Subieta

Back to Description of SBA and SBQL.

 

There are many important tasks that require recursive processing. The most widely known is Bill-Of-Material (BOM), a part of Materials Requirements Planning (MRP) systems. BOM acts on a recursive data structure representing a hierarchy of parts and subparts of some complex material products, e.g. cars or airplanes. Typical BOM software processes such data structures by proprietary routines and applications implemented in a classical programming language. Frequently, however, there is a need to use ad hoc queries or programs addressing such structures. In such cases the user requires special user-friendly facilities dealing with recursion in a query language. Similar problems concern processing and optimization tasks on genealogic trees, stock market dependencies, various types of networks (transportation, telecommunication, electricity, gas, water, and so on), etc. Recursion is also necessary for internal purposes of computer systems, such as processing recursive metadata structures (e.g. Interface Repository of the CORBA standard), software configuration management repositories, hierarchical structures of XML or RDF files, and others. 

In many cases recursion can be substituted by iteration, but this implies much lower programming level and less elegant problem specification. Iteration may also cause higher cost of program maintenance since it implies a clumsy code that is more difficult to debug and change.

Despite importance, recursion is not supported in SQL standards SQL-89 and SQL-92. Beyond the standards, it is implemented (differently) in relational database management systems, in particular, in Oracle and DB2, in the form of transitive closure and linear recursion. The newer standards SQL-3, SQL-99, SQL-2003, SQL-2008 introduce both transitive closure and deductive rules a la Datalog. Unfortunately these standards are very huge and eclectic thus there are doubts among database professionals if they will ever be fully implemented, especially concerning such advanced properties.

The ODMG standard for OO databases does not mention corresponding facilities. This will obviously lead to extensive programming in programming languages such as C++, Java and Smalltalk. Recursion is considered a desirable feature of XML and RDF-oriented query languages, but current proposals and implementations do not introduce corresponding features to the corresponding query languages or introduce them with limitations.

The possibility of recursive processing has been highlighted in the deductive databases paradigm, notably Datalog. The paradigm has roots in logic programming and has several variants. Some time ago it was advocated as a true successor of relational databases, as an opposition to the emerging wave of object-oriented databases. Despite high hype and pressure of academic communities it seems that Datalog falls short of the software engineering perspective. It has several recognized disadvantages, in particular:

·        Lack of efficient methodology supporting the developers of applications in transition from business conceptual models to Datalog programs. For real applications an average developer or programmer has no idea how to formulate Datalog rules in response to typical results of analysis and design processes (stated e.g. in UML).

·        Although Datalog is claimed to be a universal query language, its real application area is very limited to niche applications requiring some “intelligence” expressed through sylogisms and recursive rules.

·        Limitations of data structures that Datalog deals with. Current object-oriented analysis and design methodologies as well as programming languages and environments deal with much more sophisticated data structures (e.g. complex objects with associations, classes, inheritance, etc.) than relational structures that Datalog deals with. Complex data structures allow one to get the software complexity under control.

·        „Flatness” of Datalog programs, i.e. lack of abstraction and encapsulation mechanisms, such as procedures, views or classes. This flaw means no support for hierarchical decomposition of a big problem to sub-problems and no support for top-down program design and refinement and encapsulation of problem details.

·        Datalog is stateless thus it gives no direct possibility to express data manipulation operations. Majority of applications require update operations, which are possible to express in Datalog only via side effects, with no clear formal semantics.

·        Datalog, in principle, does not separate data and programs. Both are “rules” in the same abstraction frame. However, the fundamental differences between these two aspects concern the following factors: design of programs (databases are designed before programs that operate on them), client-server architecture (databases are on servers, programs are usually on clients); structural constructs (data structures are specifically constructed and constrained, especially in object-oriented databases; this does not concern programs); updating (data can be updated, while it makes little sense for programs). These fundamental differences cause that the unification of data and programs must meet severe limitations.

·        Datalog implies significant problems with performance. Current optimization methods, such as magic sets, do not seem to be sufficiently mature and efficient. 

Thus Datalog applications supporting all the scale that databases require are till now unknown. Nevertheless, the idea of Datalog semantics based on fixed-point equations seems to be very attractive to formulate complex recursive tasks. Note however that fixed-point equations can be introduced not only to languages based on logic programming, but to any query language, including SQL, OQL and XQuery. Below we show how to incorporate fixed point equations in SBQL. In contrast to Datalog, we assume that such a feature can be used by the SBQL programmer only when needed rather than as the major programming paradigm.

Besides transitive closures and fixed-point equations there are classical methods for recursive processing known from programming languages, namely recursive functions (procedures, methods). In the database domain a similar concept is known as recursive views. Integration of recursive functions or recursive views with a query language requires generalizations beyond the solutions known from typical programming languages or databases. First, functions have to be prepared to return bulk types that a corresponding query language deals with, i.e. a function output should be compatible with the output of queries. Second, both functions and views should possess parameters, which could be bulk types compatible with query output too. Currently very few existing query languages have such possibilities, thus using recursive functions or views in a query language is practically unexplored.

This chapter discusses three different approaches to recursive queries:

The first approach is implemented in a few industrial commercial systems with severe limitations [PiSu04]. Two next approaches are implemented so far in research prototype systems only, with limitations too. Till now no system implements all three approaches in a sufficiently robust version, thus comparison of them from the software designer and programmer point of view is unfeasible. Moreover, all widely known implementations concerns relational database systems which are successful, but with well-recognized disadvantages such as impedance mismatch and poor conceptual modeling. There are few theoretical proposals concerning recursive queries for object-oriented and XML-oriented systems, but implementations are not widely known or are very limited. Only SBQL implements all three approaches in a sufficiently robust and simple versions.

Recursive queries can imply performance problems. There are several typical optimization methods related to recursive queries, such as factoring out independent subqueries, indices, or caching (materializing) results of queries to utilize them during evaluation of next (identical) queries. For fixpoint equations there are special methods know as magic sets. In the chapter we briefly discuss the above issues.

 

Transitive Closures

A transitive closure in SBQL is a non-algebraic operator having the following syntax:

                      query ::= query close by query

                      query ::= query leaves by query

The query q1 close by q2 is evaluated as follows. Let final_result be the final result of the query and union be the bag union. The pseudo-code accomplishing abstract implementation of q1 close by q2 is the following:

    final_result := result_of (q1);

    for each r in final_result do {

        push nested(r) at top of ENVS;

        final_result := final_result union result_of (q2);

        pop ENVS;

    }

Note that each element r added to final_result by q2 is subsequently processed by the for each command. The above operational semantic can be described in the denotational setting as the least fixed point equation (started from final_result = empty_bag and continued till fixpoint):

    final_result = q1 union final_result.q2

where dot is identical with the dot operator in SBQL. Similarly, the semantics can be expressed by iteration (continued till result_of (q2) = empty_bag):

    final_result =  q1 union q1.q2 union q1.q2.q2 union q1.q2.q2.q2 union ....

Naive implementation of the close by operator is as easy as the implementation of the dot operator. Note that if q2 returns a previously processed element, an infinite loop will occur. Checking for such situations in queries is sometimes troublesome and may introduce unnecessary complexity into the queries. Another operator close unique by has been introduced to avoid infinite loops due to duplicates returned by q2.

The query q1 leaves by q2 is evaluated as q1 close by q2, but in the result only leaves of the transitive closure graph are left. The semantics can be expressed as follows:

    final_result := result_of (q1);

    for each r in final_result do {

        push nested(r) at top of ENVS;

        temporary_result := result_of (q2);

        final_result := final_result union temporary_result;

        if temporary_result <> bag() then final_result := final_result minus { r };

        pop ENVS;

    }

bag() stands for empty bag. As q1 and q2 can be any queries, simple or complex, the relation between elements which is used for transitive closure is calculated on the fly during the query evaluation; thus the relation need not be explicitly stored in the database.

Fig. 55. A sample recursive database schema

Fig.55 depicts a simple database schema used in our examples. It is a description of parts, similar to descriptions used in Bill of Material (BOM) applications. Each Part has a name. The part may be either a Detail or Aggregate. Each detail has attributes cost and mass (its cost and mass). Each Aggregate has attributes assemblyCost and assemblyMass which represent the cost of assembling this aggregate and mass added to the mass of its components as the result of the assembly process. Aggregates have one or more Component subobjects. Each Component has the attribute amount (number of components of specific type in a part), and a pointer object leadsTo leading to a Part object.

The following SBQL query with a transitive closure over this schema finds all components of a part named “engine”.

E.TC.1

All components of a part named “engine”.

SBQL:

(Part where name = ”engine”) close by Component.leadsTo.Part

 

This query first selects parts having the attribute name equal to “engine”. The transitive closure relation is described by the subquery Component.leadsTo.Part. It returns all Part objects which can be reached by the pointer leadsTo from already selected objects.

This query takes advantage of an assumption made in the AS1 model. After reaching an object by its name all its attributes are accessible, not only those of the class whose name has been used to retrieve the object. The assumption allows us to simplify queries by removing the necessity to use cast operators.

If an object does not have the queried attribute (because it belongs to a subclass which does not have this attribute or the attribute is optional), the empty collection is returned as the result of binding of the name of this attribute. In this query, the subquery Component.leadsTo.Part is evaluated for Part objects which can be either Detail or Aggregate. In case of Aggregate, the name Component is properly bound and returns the collection of the Component subobjects. In case of Detail, the name Component cannot be bound and returns the empty bag.

Assuming there is no access from a Part object to Aggregate and Detail attributes the query can be formulated with a cast:

E.TC.2

All components of a part named “engine”.

SBQL:

((Part where name = ”engine”) as e) close by ((Aggregate)e). Component.leadsTo.Part

 

One of the basic BOM problems, i.e. finding all components of a specific part, together with their amount, may be formulated using the transitive closure as follows:

E.TC.3

Find all components of a specific part, along with their amount required to make this part.

SBQL:

((Part where name=”engine”), 1 as qty)

close by Component.((leadsTo.Part), (qty*amount) as qty)

 

The query uses a named value in order to calculate the number of components. The number of parts the user wants to assemble (in this case 1) is named qty and paired with the found part. In subsequent iterations the qty value from parent object is used to calculate the required amount of child elements. It is also named qty and paired with the child object.

The above query does not sum up amounts of identical sub-parts from different branches of the BOM tree. Below we present a modified query which returns aggregated data – sums of distinct components from all branches of the BOM tree:

E.TC.4

Find all components of a specific part, along with their amount required to make this part, summing quantities of identical parts.

SBQL:

((((Part where name=”engine”) as x, 1 as qty)

  close by x.Component.((leadsTo.Part) as x, (qty*amount) as qty))

    group as allEngineParts ).

      ((distinct(allEngineParts.x) as y).

        (y, sum((allEngineParts where x=y).qty)))

 

This query uses grouping in order to divide the problem into two parts. First, all the components named x, along with their amounts named qty are found. The pairs are then grouped and named allEngineParts. The grouped pairs are further processed by finding all distinct elements and summing the amounts for each distinct element.

This query could be further refined in order to remove all aggregate parts (so only the detail parts will be returned). There are many ways to accomplish this goal. One of them is to use the operator leaves by in place of close by. The operator leaves by returns only leaf objects, i.e. objects which do not result in adding any further objects to the result bag:

E.TC.5

Find all details of a specific part, along with their amount required to make this part, summing quantities of identical details.

SBQL:

((((Part where name=”engine”) as x, 1 as qty)

  leaves by x.Component.((leadsTo.Part) as x, (qty*amount) as qty))

    group as allEngineDet ).

      ((distinct(allEngineDet.x) as y).

        (y, sum((allEngineDet where x=y).qty)))

 

The other way to sort the aggregates out of the result of the previous query is to use a cast. The cast operator takes a collection of objects and returns only these items of it which belong to the given class. The cast applied to a single object returns this object if it belongs to the class; otherwise it returns the empty bag. Here is a query with the same retrieval goal but written using a cast.

E.TC.6

Find all details of a specific part, along with their amount required to make this part, summing quantities of identical details.

SBQL:

((((Part where name=”engine”) as x, 1 as qty)

  close by x.Component.((leadsTo.Part) as x, (qty*amount) as qty))

    group as allEnginePar).

      ((distinct((Detail) (allEnginePar.x)) as y).

        (y, sum((allEnginePar where x=y).qty)))

 

Here the result of the subquery whose result is further named allEngineDet is first cast to the class Detail. This cast drops all objects which do not belong to this class.

Although the complexity of the SBQL solution is still high, SBQL supports facilities to manage the complexity. In this case the grouping operator allows decomposing the problem into easier subproblems.

SBQL queries may be used to perform even more complex tasks. The query below calculates the cost and mass of the part named “engine”, taking into account cost and mass of each engine part, amount of engine parts and cost and mass increment connected with assembly. This task has been used in [AtBu87] as an example of lack of power and flexibility of currently used query languages. In SBQL the task can be formulated with no essential problems:

E.TC.7

Calculate the cost and mass of the part named “engine”, taking into account cost and mass of each engine part, amount of engine parts and cost and mass increment connected with the assembly.

SBQL:

((((Part where name=”engine”) as x, 1 as qty)

close by x.Component.((leadsTo.Part) as x, (amount*qty) as qty))

group as allEngineParts ).

 

(allEngineParts.((Detail) x as d).

                  ((qty * d.cost) as c, (qty * d.mass) as m)

group as CostMassIncreaseD,

 

allEngineParts.((Aggregate) x as a).

                  ((qty*a.assemblyCost) as c, (qty*a.assemblyMass) as m)

group as CostMassIncreaseA ).

 

(   ((sum(CostMassIncreaseD.c) + sum(CostMassIncreaseA.c)) as engineCost,

(sum(CostMassIncreaseD.m) + sum(CostMassIncreaseA.m)) as engineMass)

 

Such a typical BOM task cannot be formulated so comprehensively as a single query in other query languages. For example, this query formulated in SQL-99 exceeds 100 lines of code and is totally illegible. Furthermore, some of its subqueries have to be repeated several times in its text.

SBQL can also perform quite complex calculations (not even considered in other query languages). Let the object named a store some non-negative number. The query below calculates approximation of the square root of a, using the fixed point equation x = (a/x + x)/2, starting from x=1 and making 50 iterations:

E.TC.8

Calculate the approximation of the square root of a.

SBQL:

((1 as x, 1 as counter)

leaves by (((a/x + x)/2 as x, counter+1 as counter) where counter£50)).

(x where counter = 50)

 

In this example we use a kind of internal variable counter as a counter of iterations.

Another example of a recursive task:

E.TC.9

Calculate 50 Fibonacci numbers according to the recursive definition:

F(0) = 0; F(1) = 1; F(n) = F(n-2) + F(n-1). The output should contain the argument of F and the result of F.

SBQL:

((0 as arg, 0 as fib, 1 as nextfib)

close by ((arg+1) as arg, nextfib as fib, (fib + nextfib) as nextfib)

                                                                                        where arg < 50)).

(arg as arg, fib as F)

 

Although the calculations remind programming, we underline that we are still using query operators only.  The examples show the power of recursive queries in SBQL so far not even considered in other query languages.

Cycles in the queried graph can be easily dealt with by means of another variant of the close by operator – close unique by. This variant removes duplicates on the fly after each closure iteration; thus cycles do not imply infinite loops. Another variant of the close by operator is the leaves unique by operator. It is a combination of the two previous variants. It returns only leaf objects, while preventing problems with cycles in graphs. Note that cycles in the queried graph do not have to be an effect of database inconsistency. It is easy to imagine a database which contains a graph with cycles that is consistent with the real world situation (see Fig. 56).

Fig. 56. A sample structure for the examples of an application of close unique by

A Company has name and may own shares in other Companies (in this example we abstract from the amount of shares). Other companies in turn may own shares in further companies; and so on. It is possible for a company (lets call it “ACME”) to own shares in another company which in turn owns shares in “ACME” (directly or by owning shares in other companies). To find the names of all companies owned (directly or indirectly) by “ACME” we cannot use a simple query using the close by operator as it would not function correctly upon encountering the cycle. We can use the close unique by operator instead, as shown in the example below:

E.TC.10

Find the names of all companies owned (directly or indirectly) by “ACME”.

SBQL:

(Company where name = ”ACME”)

close unique by hasSharesIn.Company

 

A simple case of using the operator leaves unique by is shown below.

E.TC.11

Calculate the approximation of the square root of a up to 5-th position after the dot.

SBQL:

1 as x

leaves unique by ((integer)(100000 * (a/x + x)/2)/100000) as x

 

In the above case we have no counter variable that terminates the closure. The termination is achieved after the expression following leaves unique by returns the same value as previously. However, the case is less safe, as it is possible in general that each next value of the closure will always be different than the previous one (the final results may form a cycle).

As shown above, some advanced tasks may lead to very complex queries which semantics could be difficult to grasp for the programmers. However, the complexity is caused by the tasks rather than by the language. According to our tests, such tasks written e.g. in C++ may take several source code pages. SBQL offers also other recursive querying options which may support easier problem decomposition, in particular fixed point equation and recursive procedures.

Sometimes, apparently quite easy tasks cannot be formulated without a transitive closure. In example E.TC.12 we show the case when calculating a proper bag of numbers require such an operator.

E.TC.12

Assume Emp objects with an attribute salary. For each interval [i, i+100), i = 0, 100, 200, ... get the number of employees having the salary within this interval. The maximal salary is unknown for the person asking the query.

SBQL:

(0 as i close by ((i+100) where i <= max(Emp.salary)) as i)

    join

(count(Emp where salary >= i and salary < i+100) as c)

 

The result is a bag of structures { i(lower_bound), c(nbr_of_emps) }. The transitive closure is used to calculate the bag {i(0), i(100), i(200), … i(maxsal)}, where maxsal is the maximal salary rounded to full hundreds. Note that the number of elements in this bag is initially unknown, hence the use of the transitive closure. Even such a simple query is impossible to express in SQL.

 

Fixed Point Equations

 

In mathematics a fixed point equation has the form x = f(x), where x is a variable or (a vector of variables) of type T, f is a function TT. For instance, such a function can look like x = (2/x + x)/2. The solution of this equation is known as a least fixed point. For our example function the solution is the square root of 2. Well-known context-free grammars can be considered least fixed point equations. Recursive procedures can also be described as least fixed point equations. The essence of such equations is that they precisely model recursive processes and structures. Starting from some initial value x0 we calculate x1 = f(x0), x2 = f(x1), x3 = f(x2), … etc. till reaching the fixed point, i.e. when xn = xn-1. Sometimes reaching the fixed point requires infinitely many steps, but mathematical apparatus is well prepared for such situations. It may also happen that the fixed point cannot be achieved in this way (hence the above procedure will result in an infinite loop). This means that function f must be specifically constrained; these constraints also form a mathematical theory (unfortunately, inapplicable for our purposes).

The theory of fixed point equations has great meaning in mathematics (in particular, it is the foundation of a semantic specification method known as denotational semantics). In case of SBQL we have introduced fixed point equations as a facility for programmers to formulate recursive queries. For some tasks such equations could be more legible than transitive closures or recursive procedures and could better support decomposition of the task into subtasks with clear conceptual meaning.

Essentially, the evaluation mechanism of our fixed point equations is the same as the described above mathematical procedure. However, we must define some important syntactic and semantic details, such as “variables”, because SBQL binding rules for names are more semantically specific than the simple semantics of  mathematical variables. The syntax of an SBQL fixpoint equations is as follows:

          query ::= fixpoint [‘(‘set_of_variables’)’] ‘{‘set_of_rules’}’

          set_of_variables ::= variable {‘,’ variable}

          set_of_rules ::= rule{rule}

          rule ::= variable ‘:-‘ query ’;’

Let  fixpoint(xi1, xi2,…, xin) {x1 :- q1; x2 :- q2; xm :- qm;} be a fixpoint system, where:

·    x1, x2,, xm are names of variables in this equation system,

·    xi1, xi2,…, xin are returned variables, {xi1, xi2,…, xin} is a subset of {x1, x2,…, xm},

·    q1, q2,, qm are SBQL queries with free variables x1, x2,…, xm;

The basic semantics of this construct is the following:

1.      Variables x1, x2,, xm are initialized to empty bags. More precisely, a new section with binders x1(bag()), x2(bag()),…, xm(bag()) is pushed at top of ENVS; (bag() stands for empty bag).

2.      Queries q1, q2,, qm are evaluated. Their results are on the top of QRES.

3.      Pop ENVS.

4.      If the results of q1, q2,, qm are equal to the values of x1, x2,, xm, correspondingly, then the fixpoint is reached, hence go to step 5. Otherwise assign the results of q1, q2,, qm to the values of x1, x2,, xm. More precisely, a new section with binders x1(result of q1), x2(result of q2),…, xm(result of qm) is pushed at top of ENVS. Then remove results of  q1, q2,, qm from QRES and go to step 2.

5.      Results of queries that are not among qi1, qi2,…, qin (corresponding to variables xi1, xi2,…, xin) are dropped. Remaining results are changed into binders xi1(result of qi1), xi2(result of qi2),…, xin(result of qin) and left at the top of the QRES stack. Hence only variables  xi1, xi2,…, xin can be used in an outside query that consumes the results of the fixpoint operator. If set_of_variables is absent (in our syntax it is optional), then all the variables x1, x2,…, xm form the final result.

The symbol :- we use instead of = to distinguish the fact that we are dealing with equations having the above semantics rather than with regular equality predicates. As queries q1, q2,, qm can reference variables x1, x2,, xm, fixpoint equations provides recursive capabilities. A fixpoint query should be used within some non-algebraic operator (or within a for each statement) which pushes the binders from top of QRES onto the top of ENVS.

Note that a transitive closure query q1 close by q2 can be written as a fixed point equation x = q1 union x.q2, hence the general fixed point equation x = f(x) offers the power of transitive closures. However, it is not sure if such general fixed point equations would be sufficiently usable and understandable by the programmers. Moreover, such complex equations could be difficult to optimize and could result in unpredictable infinite loops. It also easy to see that the power of the transitive closure operator is sufficient to express any system of fixpoint equations. The fixpoint system

    fixpoint(xi1, xi2,…, xin) {x1 :- q1; x2 :- q2; xm :- qm;}

is semantically equivalent to the following query with the leaves unique by operator:

   (struct( bag() group as x1, bag() group as x2, … , bag() group as xm )

           leaves unique by

   struct( q1 group as x1,  q2 group as x2, … , qm group as xm ) ).

   struct( xi1 group as xi1,  xi2 group as xi2, …  ,  xin group as xin)

The equivalence stems from the fact that in both cases the evaluation procedure is the same. However, fixpoint equations offer much more compact and legible formulation of some recursive problems.

Below we present examples of SBQL fixed point equations; they refer to the schema from Fig.55.

 

E.FIX.1

All components of a part named “engine”.

SBQL:

fixpoint(engparts){

    myPart :- Part where name=”engine”;

    engparts:- myPart union (engparts.Component.leadsTo.Part);}

 

The first rule returns a reference to a part named engine. The second rule involves recursion on the engparts variable. The operator union in this query is the union of bags. Indeed, it may happen that the fixpoint equation returns a bag rather than a set. This makes our construct different from Datalog, which assumes sets (relations) as results of calculations. To illustrate this situation consider the parts graph from Fig.60.

 

  Fig.60. Example of a graph of parts

In Fig.60 arrows represent instances of the relationship Component. We can see that the part “bolt10x30” is a component of (at least) two sub-parts of the “engine” part. Hence, to be consistent, the reference to the bolt10x30 object should occur in the engparts variable (at least) two times. The necessity of such semantics will be more evident in next examples.

Fixpoint equations are regular SBQL queries, fully consistent with the entire SBQL semantics. As such may be used as parts of other SBQL queries, for instance:

E.FIX.2

All unique components of a part named “engine”.

SBQL:

distinct( fixpoint(engparts){

    myPart :- Part where name=”engine”;

    engparts:- myPart union (engparts.Component.leadsTo.Part);})

 

E.FIX.3

From all components of a part named “engine” take parts named “carburettor” (two queries below are equivalent).

SBQL:

(fixpoint( engparts ){

    mypart :- Part where name=”engine”;

    engparts :- mypart union (engparts.Component.leadsTo.Part);}.

engparts where name = “carburettor) as engcarb

SBQL:

fixpoint( engcarb ){

    mypart :- Part where name=”engine”;

    engparts :- mypart union (engparts.Component.leadsTo.Part);

    engcarb :- engparts where name = “carburettor”;}

 

In the first query we use the fixpoint query within an outer query. In the second query the same effect is achieved by three rules.

A fixpoint construct may use some variables as a way to break down the problem into smaller, more manageable parts, for instance (c.f. Fig.56):

E.FIX.4

For each component of the engine get the number of identical elements that the engine consists of.

SBQL:

fixpoint (final) {

engine :- (Part where name=”engine”) as x, 1 as howMany;

engineParts :- engine union

(engineParts.Component.((leadsTo.Part) as x,

        (amount*howMany) as howMany);

final :- ((distinct(engineParts.x) as y).

(y,sum(engineParts where x=y).howMany));

}

 

Only variable final is returned as the fixpoint. The other two variables are used to perform calculations, but never returned, as their final values are inessential to the user. Variable engine is used to find the top element of the hierarchy (the “engine” part), while engineParts is the variable in which the results of recursive calculations are stored. Variables final and engine are not defined recursively – their purpose is to make the query easier to read and create. Note that in this case the fact that variables store bags (and operator union is the union of bags) is essential. If duplicates are removed, the result would be warped. Assume that in Fig.60 bolt10x30 occurs once in carburetor and once in starter. Hence the structure struct{x(ibolt10x30), howMany(1)} occurs two times within the result of the variable engineParts; ibolt10x30 is the identifier of the bolt10x30 object (finally, 2 such bolts are components of the engine, while removing duplicates will cause that the number of such bolts will be 1).

The same principle is used in the next example.

 

E.FIX.5

Get the total cost and mass of the engine.

SBQL:

fixpoint (cost, mass){

   engine :- (Part where name=”engine”) as x, 1 as howMany;

   engineParts :- engine union

       engineParts.Component.((leadsTo.Part) as x,

       (amount*howMany) as howMany);

   detailsMass :- sum((Detail)engineParts. (howMany*x.mass));

   detailsCost :- sum((Detail)engineParts.(howMany*x.cost));

   addedMass :- sum((Aggregate)engineParts.

                                                         (howMany*x.assemblyMass));

   addedCost:- sum((Aggregate)engineParts.

                                                           (howMany*x.assemblyCost));

   cost :- detailsCost + addedCost;

   mass :- detailsMass + addedMass;}

 

Queries utilizing fixpoint equations, unlike those utilizing transitive closure, are capable of evaluating more than one recursive problem in each recursion step, in a manner similar to the Datalog. It is believed that such capabilities will allow users to formulate complex and intelligent “business rules”. This topic may be an interesting area for further research, although most of the practical recursive problems authors are aware of can be solved using only a single recursion.

Similarly to transitive closure operator, fixpoint equations may be used to perform recursive calculations without referring to the database at all. For instance, the example below shows a fixpoint equations version similar to Example E.TC.8:

 

E.FIX.6

Calculate the cubic root of a, according to the fixpoint equation

x = (a/x2 + x)/2, starting from x = 1 and making 50 iterations

SBQL:

fixpoint( x ){

 y :- (1 as r, 1 as c) union y.(((a/(r*r)+r)/2 as r, c+1 as c)

                                                                                      where c <=50);

 x :- (y where c = 50).r; }

 

Next example is analogous to E.TC.9:

E.FIX.7

Calculate 50 Fibonacci numbers. The output should contain arguments and results of the Fibonacci function.

SBQL:

fixpoint( Fib ){

    init :- (0 as a, 0 as fib, 1 as n);

    f :- init union f.(((a+1) as a, n as fib, (fib+n) as n) where a < 50);

    Fib :- f.(a as arg, fib as F);

}

In the example variable a is a counter (which is the argument of the function), fib represents a Fibonacci number, n represents a next Fibonacci number.

If the graph to be closed contains cycles then the transitive closure is unsafe  (thus the special closure variant close unique by). This problem does not occur for systems of fixed point equations, as function distinct can be used as usual for removing duplicates. For instance (c.f. Fig.56):

 

E.FIX.8

Find the names of all companies owned (directly or indirectly) by “ACME”.

SBQL:

fixpoint( comps ) {

   acme :- Company where name = ”ACME”;

   comps :- distinct(acme union comps.hasSharesIn.Company);}

 

Assume the following class definition (ODRA) for typical genealogical trees:

class PersonClass {

      name: string;

      birthYear: integer;

      sex: enum{“m”, “f”};

      alive: boolean;

      parent: ref Person [0..2] reverse child;

      child: ref Person [0..*] reverse parent;

}

 

Person: PersonClass[0..*];

 

In PersonClass we have used bidirectional pointers parent/child that represent relationships in the genealogical tree. Then we declare a collection of objects named Person. This structure can be used to formulate interesting queries with the help of fixed point equations.

 

E.FIX.9

Get names and sex for all living and younger cousins of  “Doe”.

SBQL:

fixpoint( result ) {

   myperson :- Person where name = ”Doe”;

   ancestors :- myperson union ancestors.parent.Person;

   allcousins :- ancestors union allcousins.child.Person;

   livingYoungerCousins :- allcousins where alive

                                                  and birthYear > myperson.birthYear ;

   result :-  livingYoungerCousins.(name, sex);

}

 

E.FIX.10

Get all cousins of  “Doe” of the same generation.

SBQL:

fixpoint( result ) {

   myperson :- (Person where name = ”Doe”) as p, 1 as gen;

   ancestors :- myperson union

                                  (ancestors.((p.parent.Person) as p, (gen-1) as gen);

   allcousins :- ancestors union

                                  (allcousins.((p.child.Person) as p, (gen+1) as gen);

   result :-  ((allcousins where gen = myperson.gen) minus myperson).p;

}

 

These examples much remind typical cases that can be found in the Datalog literature. Of course, we are not sure if all examples of Datalog programs can be formulated in this way. However, we do not expect that SBQL fixed point equations alone would have the full algorithmic power, as they are one of many SBQL constructs and features.

Strong Typing

For simplicity, in the presented constructs we skip typing issues. Because fixed point rules can be recursive, type inferences can be problematic (as initially the type of a variable on the right side of a rule is unknown). Probably this is not a very difficult problem to solve, however to avoid it each variable could be typed. This makes the syntax of the rules a bit more complex, however, the strong type checking would be more efficient.

Optimization

Evaluation of a fixpoint system may be expensive, especially when bags to be assigned to variables are very large. There are several opportunities for optimization. All optimization methods that are developed for SBQL can be used, in particular, indices, factoring out independent subqueries, removing dead subqueries, pushing selections before structure constructors (joins), caching, factoring out weakly dependent subqueries and pipelining. 

There are also methods specific for fixed point equations. The method from Datalog is known as seminaive evaluation. The main idea of this technique is to divide the fixpoint rules into groups, which may be evaluated independently from the other groups, utilizing only the final results of previous group evaluation. For instance, in E.FIX.9 we can evaluate myperson at the beginning, because this rule does not depend on other variables. Then, we can evaluate ancestors, as this variable depends only on myperson and ancestors. Then, we can evaluate allcousins, which depends on ancestors and allcousins. Finally, we evaluate result that depend on allcousins and myperson. In general, the method is easy to implement through constructing a dependency graph that for our example is shown in Fig.61. Calculations of a variable can start when all its predecessor variables are calculated.

Fig.61. A dependency graph between variables in E.FIX.9,

Another optimization opportunity concerns rules of the form

         x :- c union f(x)

where c is some already counted part independent from x. All recursive rules that we have introduced in our examples have this form. Under some assumptions (distributivity of f w.r.t. elements of its bag argument) such a recursion can be turned into the iteration:

      x0 = c; x1 = f(x0); ; x2 = f(x1); x3 = f(x2); x4 = f(x3); …

The iteration is continued till some next component returns the empty bag. Then,

      x = x0 union x1 union x2 union x3 union x4 union …

Probably the magic sets technique, also known from Datalog, can be adapted to SBQL queries – further research in this direction is needed.

Convergence of Fixpoint Equations

Evaluation of fixed point equations can result in an infinite loop. That is, the fixed point is never reached. For instance (c.f. Fig.56):

E.FIX.11

Find the names of all companies owned (directly or indirectly) by “ACME” (without removing duplicates).

SBQL:

fixpoint( comps ) {

   acme :- Company where name = ”ACME”;

   comps :- acme union comps.hasSharesIn.Company;}

 

If objects Company form a cycle that can be accessed from the ACME object via the hasSharesIn relationship, then the evaluation process of comps will never be finished. This situation can be tested, as usual, but without any guarantee that a test will be reliable. For instance, it is possible that the above example is convergent for “ACME” as a parameter, but it leads to an infinite loop for another parameter.

In general,  the convergence of fixpoint equations depends both on data structures to be processed and on query operators that are used in queries on right of the rules. For instance, the equation x = 1 – x starting from x = 1 will produce an infinite loop 1, 0, 1, 0, 1, 0,….  (despite there is a fixpoint x = 0.5).

In our opinion, there is no sufficiently general theory or algorithm that could discover and prevent all such cases. In Datalog there is the stratification concept that prevents infinite loops in case when a negation operator is used. However, there are a lot of other operators that may cause infinite loops, for instance, arithmetic minus, divide, set-theoretic difference, function sinus, etc. Moreover, as shown in E.FIX.10, infinite loops can be caused also by cyclic data structures, which can be explicit (navigation via pointers) or implicit (a cycle due to some non-evident dependencies among data). Hence some “rules of thumb” are necessary to prevent such cases, in particular:

  • Clear understanding of a problem to be solved by a system of fixpoint equations,
  • Being suspicious to all implicit or explicit cyclic dependencies in data to be processed,
  • Testing the equations for various cases.

Of course, these rules do not prevent all such bugs, hope that eliminate them in sufficiently reliable manner.

 

Recursive Procedures and Functions

 

As almost all popular programming languages, SBQL allows for declaring procedures (and methods), including recursive ones. Any procedure can be recursive due to the stack-based semantics of procedure calls. There is no special syntax separating recursive and non-recursive procedures. In principle, the number of levels of recursion is unlimited, however, there could be some physical and performance constraints. In contrast to languages such as Java, C++, C#, etc., SBQL is unique due to the following features:

·        SBQL functional procedures (methods) can return a bulk output and can be called within SBQL queries. Hence, SBQL functional procedures can encapsulate queries.

·        Parameters of SBQL procedures can be SBQL queries too. Parameters can be transmitted in the call-by-value, call-by-reference and strict-call-by-value modes.

·        Output returned by SBQL functional procedures can contain references. Hence, the output from a procedure calls can be used for updating.

·        As follows from the previous features, SBQL functional procedures can work as virtual updatable views. However, for the well recognized view updating problem we consider procedures and views as separate notions. (SBQL updatable views are the subject of a separate chapter).

In this chapter we do not specify precisely all the syntactic and semantic details of SBQL procedures. This is the subject of a separate chapter. Below we show on examples how SBQL recursive procedures can be used to formulate recursive tasks.

The syntax for SBQL procedures is shown below.

Procedure declaration

A procedure declaration consists of a procedure name, a typed parameter list, a type of its output and a list of imperative SBQL statements.

             procDeclaration ::= name’(‘[parameter_list]’)’[‘:’returntype] ‘{‘statement_list’}’

             parameterList ::= parDeclaration {‘;’ parDeclaration}

 parDeclaration ::= name ‘:’ type [ cardinality ]

If the cardinality is not specified, the default cardinality [1..1] is assumed.

Procedure invocation

             procInvocation ::= name’(‘[actual_par_list]’)’

             actual_par_list ::= parameter {‘;’ parameter}

             parameter ::= query

Return statement

 statement ::= return [query]’;’

Semantics: When a procedure is called, the statements in the procedure are evaluated in the consecutive order. When the return statement is reached, the result of the query being parameter of that statement is put on top of QRES. If no return statement is reached before the end of procedure evaluation, the procedure returns nothing (thus it is not a functional procedure).

Statements in SBQL procedures use SBQL queries. SBQL includes several imperative operators (object creation, assignment, deletion, etc.) and control flow statements (including loops) such as if, while, for each, etc.

A simplest recursive procedure consists of a single return statement, which depends on a procedure parameter either returning an empty collection or returning the result from invocation of the same procedure with different parameter value. A sample recursive procedure finding all components of specified parts, along with their amount required to make these parts, is shown below (see Fig.55):

E.PRO.1

Find all components of specified parts, along with their amount required to make these parts.

SBQL:

type PCType is record { p: PartType; c: integer; }

SubPartsPC( myPC: PCType[0..*] ): PCType[0..*] {

   return

      if exists myPC then bag(myPC, SubPartsPC (

            myPC.p.Component.((leadsTo.Part) as p, (c * amount) as c )))

      else bag()}

 

The procedure takes a bag of structures as the parameter (named myPC). Each structure contains a reference to a part object named p and the amount of parts of this type named c. Note that a query being the parameter of this procedure can have unlimited complexity.

E.PRO.2

Find all the details necessary to make 125 engines and 150 gear boxes.

SBQL:

(Detail) SubPartsPC ( bag(

           ((Part where name = ”engine”) as p, 125 as c),

           ((Part where name = ”gear box”) as p, 150 as c)))

 

E.PRO.3

How many bolts 5X50 is necessary to make 125 engines and 150 gear boxes?

SBQL:

sum((SubPartsPC ( bag(

           ((Part where name = ”engine”) as p, 125 as c),

           ((Part where name = ”gear box”) as p, 150 as c)))

                                                            where p.name = “bolt5X50”).c)

 

The advantage of recursive procedures is simplicity of problem decomposition. A recursive task can be easily distributed among several procedures (some of which may be reused in other tasks), and calculations within a single procedure can be performed in easy to comprehend steps, comparable to the simplicity of fixed point equations. A procedure calculating the cost and mass of a part illustrating this possibility is shown in the next example. The procedure utilizes the previously defined SubPartsPC procedure in order to perform the recursive processing and then performs calculations, utilizing local variables.

E.PRO.4

Get the total cost and mass of a part given as a parameter.

SBQL:

PartCostMass( myPart: PartType ): 

                                        record{ p: PartType; cost: real; mass: real; }{

    allSubParts: PCType[0..*];

    detailsMass: real; detailCost: real;

    addedMass; real; addedCost: real;

    create local allSubParts( SubPartsPC( bag((myPart as p; 1 as c)) ));

    detailMass := sum((Detail) allSubParts.(c * p.mass));

    detailCost := sum((Detail) allSubParts.(c * p.cost));

    addedMass := sum((Aggregate) allSubParts.(c * p.assemblyMass));

    addedCost := sum((Aggregate) allSubParts.(c * p.assemblyCost));

return (myPart as p, (addedCost+detailsCost) as cost,

                    (addedMass+detailsMass) as mass);

}

 

E.PRO.5

Get the cost for all parts having the mass higher than 100.

SBQL:

(Part as p).(PartCostMass(p) where mass > 100).(p, cost))

 

Recursive procedures in SBQL offer many advantages, when compared to stored procedures in relational DBMSs. Most of them are consequences of the fact that recursive procedures in SBQL are a natural extension of the SBA, working on the same principles and evaluated by the same evaluation engine – unlike relational systems, where stored procedures are an addition to the system, but are evaluated separately from SQL queries. The main advantages of SBQL recursive procedures are the following:

·        SBQL queries are valid as expressions, procedure parameters, etc.;

·        The type system is the same;

·        There is no impedance mismatch.

SBQL procedures may be also used as a way to reuse queries or fixpoint equations. Instead of writing a stand-alone fixpoint equation for a single use, it is possible to write a procedure utilizing the fixpoint equation, while providing a way to parameterize it easily. A sample procedure doing this is shown below:

E.PRO.6

Get all subparts of a part being a parameter.

SBQL:

subParts(myPart: PartType): PartType[0..*] {

return distinct(fixpoint (parts){

              parts :-  myPart union (parts.Component.leadsTo.Part); });}

 

Recursive procedures can also substitute fixed point equations or transitive closures, for example:

E.PRO.7

Calculate the square root of a, according to the fixpoint equation

x = (a/x + x)/2, starting from x = 1 and making 50 iterations.

SBQL:

Root( a: real ): real{

    return internalRoot(a; 1; 50);}

internalRoot(b:real; x:real; c:integer): real {

    if c = 0 then return x;

    else return internalRoot(b; (b/x + x)/2 ; c - 1); }

 

Formulation of recursive definitions is equally simple:

E.PRO.8

Calculate n-th Fibonacci number.

SBQL:

Fib(n: integer):integer {

   return if n = 0 or n = 1 then n else Fib(n-2) + Fib(n-1); }

 

Similarly, recursive procedures can be used for processing cyclic structures, for instance (c.f. Fig.56):

E.PRO.9

Find the names of all companies owned (directly or indirectly) by “ACME”.

SBQL:

visited: string[0..*]; //global variable storing names of visited objects

allOwnedBy( comp: CompType): CompType[1..*] {

    delete visited;

    intAllOwned(comp);

    return Company where name in visited;

}

intAllOwned( comp: CompType) {

    if not comp.name in visited then {

        create visited( comp.name );

        for each (comp.hasSharesIn.Company) as own do

            intAllOwned ( own );

    }

}

SBQL:

allOwnedBy(Company where name = ”ACME”)

 

In the example we use a global variable visited to avoid starting next navigation via hasSharesIn from an already visited object. This task can be of course solved in many different ways.

Conclusion

In this chapter we have presented and compared several implemented and postulated techniques for expressing recursive problems in database query and programming languages. We have considered three techniques: transitive closures (four variants), fixed point equations and recursive procedures and  functions. All of them are implemented in the ODRA system, hence we have gained some experience. Practical examples have shown that usability of these techniques can be different in various situations and none of them is dominating or disadvantageous. Although recursive tasks are not much frequent, in some important situations presence of the techniques in a database programming language can much simplify the programmers’ effort.


Last modified: November 24, 2009