©Copyright by Kazimierz Subieta.

SBQL order by Operator and Range Queries

by Kazimierz Subieta

Back to Description of SBA and SBQL.

A sorting operator order by is introduced in SQL and is proposed in other query languages, in particular in ODMG OQL. The operator makes it possible to determine sorting of a query result according to some chosen keys. Sorting is a very important operation in query languages for the following reasons:

In the relational model sorting was considered auxiliary, implementation feature, which can be used only for forming final output from a query. The sorting cannot be covered by the relational model. Relations are sets and the order of tuples in a relation is inessential. Hence the sorting operator and operations relying on sorted collections are not expressible in the relational model theory. They are also not expressible in other theories developed for the relational model, such as relational calculus and formal logic. Fortunately, majority of implementations of SQL do not consider theories as a serious obstacle in developing programmer’s interfaces. Some SQL implementations provide also range queries allowing to retrieve i-th element from a sorted relational table (or to retrieve elements from i-th to j-th).

In object-oriented databases sorting operator is provided as well. In ODMG OQL the sorting operator is introduced exactly as in SQL, including range queries. Object-oriented databases usually introduce sequences as a kind of stored collections, hence sorting operator and range queries become necessary. Similarly, XML assumes the order of its parts, hence languages such as XQuery can be used to formulate range queries.

However, some aspects of the sorting operator and range queries are not considered systematically in the literature and in proposed languages. Some range queries are still impossible to formulate. For instance, we a query “Get employees in the alphabetic order according to last names, together with the rank number of their salaries (the lowest salary obtains the rank number 1, the highest salary obtains the rank number count(Employees) )”. Such a query cannot be formulated in SQL, OQL and XQuery. The reason is that in these languages a rank number can be used in queries as input, but not as output. SBQL is the first language that allows for both cases.

In this chapter we present thoroughly major issues related to the sorting operator and range queries in the object query language SBQL.

 

Sorting Operator in SBQL

In SBQL the sorting operator belongs to non-algebraic operators. We assume the following syntax:

query ::= query order by query

Semantics of the query q1 order by q2 is as follows. As usually, a collection being sorted is determined by q1 and a sorting key by q2.

The precise semantics is explained in the following steps:

1.      First q1 join q2 is evaluated (see the semantics of the non-algebraic join operator). Let q1 returns bag{r1, r2, …} or sequence{r1, r2, …}, a sequence is converted to bag. In the result of the join we obtain a bag of structures

bag{ struct{ r1, v11, v21, ..., vk1 }, struct{ r2, v12, v22, ..., vk2 }, .... }

where struct{ v1i, v2i, ..., vki } is a structure returned by q2 for ri.

2.      If some vji is a reference, a dereference operator is performed which changes the reference to a value. If a dereference concerns a reference to a complex object then it is treated as a failure. This situation can be considered a typing error discovered during static type checking. If the dereference is impossible or the value after the dereference does not belong to the type with a natural linear ordering, then such a situation is considered a failure too. For instance, this is the case of multimedia data types. This situation can also be considered as typing error discovered during static analysis of a query.

3.      The bag resulted from q1 order by q2 is sorted according to v1i, then for identical v1i is sorted according v2i, etc. till performing the sorting for the last sorting key vki.

4.      After sorting we obtain a sequence

sequence{ struct{ s1, vs11, vs21, ..., vsk1 }, struct{ s2, vs12, vs22, ..., vsk2 }, .... }

which differs from the previous bag only by the order of structures.

1.      The final result of q1 order by q2

sequence{ s1 , s2 , .... }

is obtained from the previous sequence by removing results returned by q2 (equivalently, projection of the sequence to the results returned by q1).

Examples (see Fig.35 in the Chapter SBQL for the AS1 model)

 

E.RAN.1

Get references to employees sorted according to names

SBQL:

Emp order by name

Note that the query is formulated with the use of inheritance.

E.RAN.2

Get references to employees sorted according to their ages and then according to names

SBQL:

Emp order by (age, name)

Note the use of the inherited method.

 

E.RAN.3

Get departments sorted according to the number of employees and then according to boss names; return names of departments, and their locations sorted alphabetically.

SBQL:

(Dept order by (count(employs), (boss.Emp.name))).

(dname, ((loc as x) order by x).x) group as locations)

We use an auxiliary name x as an ordering key. Then, to remove x from the result we make projection on x. All locations of a department are grouped under a single name locations. Note that in the result bag we return references rather than values.

 

E.RAN.4

As above, but return only departments with the budget greater than 1 mln. The output should contain the number of employees and name of a boss.

SBQL:

(((Dept where budget > 1000000)

join count(employs) as c join (boss.Emp.name) as b) order by (c, b)).

(dname, c, b, ((loc as x) order by x).x) group as locations)

We assume that all algebraic and non-algebraic operators preserve the order of elements in a processed sequence. This concerns such operators as where, dot, as, for each, etc. For some operators, e.g. for join, this is not obvious. In general, for each SBQL operator it should be determined how it behaves for combination of kinds of collections being its arguments.

Sometimes sorting of strings require regarding/disregarding small and capital letters. This effect can be achieved through a typical operations on strings.

E.RAN.5

Get references to employees sorted according to names; disregard lower and upper letter cases

SBQL:

Emp order by toUpper(name)

 

Empty and Multi-Valued Keys

The presented semantics of the order by operator has several peculiarities which should be well understood. The first of them concerns the situation when the sorting attribute is optional, for instance sal in Fig.35. Consider the query

 

E.RAN.6

Get references to employees sorted according to salaries

SBQL:

Emp order by sal

According to the semantics of the join operator on which the order by operator is based on, the query omits all employees for which the subobject sal does not exist. Hence the sorted result will contain only references to Emp objects having sal attribute. If the programmer wants to take into account all Emp objects, he/she can use, in particular, one of the methods that are presented in the chapter Storing and Processing Irregular Data. For instance, assuming that for employees having no salary the sorting key is 0, the above query can be formulated as follows:

 

E.RAN.7

Get references to all employees sorted according to salaries; assume sal = 0 for employees having no salary

SBQL:

Emp order by max(bag(0, sal))

SBQL:

Emp order by if exists(sal) then sal else 0

A similar situation arises with multi-valued keys. Consider the query

 

E.RAN.8

Get references to employees sorted according to jobs

SBQL:

Emp order by job

Attribute job is multi-valued, hence according to the join operator each reference to an employee object having n jobs, n >1, will be repeated n times in the final result. Such interpretation of the sorting operator seems to be the most logical and reasonable, however, some programmers could be surprised with such a result, hence they should be aware of the semantics. Of course, there are many ways to avoid repetitions of references. For instance, the programmer can write a query in such a way that for sorting only the first job is considered (providing jobs are typed as sequences):

 

E.RAN.9

Get references to employees sorted according to jobs, return employee name and jobs (no repetition of Emp references).

SBQL:

(Emp order by job[1]).(name, job group as jobs)

If jobs are not typed as sequences, depending on the type constraint job[1] will return a type error or a random job. It is also possible to construct a query that takes some additional sorting logic. Assume there is a collection of objects JobRange(j: string, r: integer) that assign to jobs range number (1 is the range of the most important job, etc.). Now we can ask the following query:

 

E.RAN.10

Sort employees according to their most important jobs; return employee name and jobs sorted according to their range (no repetition of Emp references).

SBQL:

(Emp order by min((JobRange where j in job).r).

(name, ((job as k order by ((JobRange where j = k).r).k) group as jobs

The examples show that there is a lot of semantic peculiarities with the sorting operator. However, the power of SBQL gives the hope that almost all of them can be efficiently formulated in the query language.

 

Sorting in Ascending and Descending Order

In SQL and OQL there are special keywords asc and desc placed after a sorting key; asc is optional. Mathematically, asc is an identity function, while desc is a generic function that returns the reverse of an argument (assuming the linear ordering of the sorting key domain). The function desc for a number X is simply –X. Alternatively, assuming that a sorting key should be always a positive number, we can assume that desc(X) = MaxPositiveNumber – X. For strings the function desc can be defined as a conversion of the string characters where each character with ASCII code X is substituted by a character with ASCII code 256 –X. Similarly for other coding standards. The function desc should be defined for each domain that values can be sorting keys. It is a secondary issue if such a function should be written in a typical syntax desc(X) or in the syntax assumed by SQL and OQL: X desc.

E.RAN.11

Get references to employees sorted according to age, descending, and then according names, ascending.

SBQL:

(Emp order by (desc(age), asc(name))

 

Alphabetic Order in Native Languages

In early relational systems the SQL order by operator caused a lot of controversy concerning sorting strings in native languages different than English (especially in German and French). The SQL developers assumed the ASCII coding for English characters, which exactly corresponds to the alphabetic order of strings in English. However, the rules of alphabetic order for other languages (German, French, Polish, etc.) are different. Moreover, alphabets of these languages contain characters absent in the English alphabet. Sometimes sorting rules have anomalies, for instance in German double s is equivalent to the character β. Changing the rules of ordering by SQL was totally unacceptable for many users (and office rules), who wanted to follow several hundreds years of tradition concerning various kinds of alphabetically ordered lists, reports, dictionaries, etc. In effect, vendors of relational systems were forced to change the rules of sorting assumed by the SQL order by operator. The solution gives the decision in hands of the database administrator, who has proper options to determine the order of strings for a particular application according to the given native language. This method has solved majority of sorting problems, but not all of them. For instance, one application can require (in the same run) sorting according to English, according to German and according to Polish. Only the programmer of a database application (not the database administrator) is able to determine which kind of ordering is required in a particular place of the program.

There is a simple solution of the problem, which consists in implementing a family of functions with string arguments that return the ordering key corresponding to a particular native language. Such a function takes a parameter being a string in a native language and returns a string of numbers. For each character in the input string the function returns a number being its position in the alphabetic rank. For instance, for Polish the alphabet the function mapping can be as follows: a→1, →2, b→3, c→4, →5, d→6, e→7, →8, f→9, g→10, h→11, … This reminds the ASCII coding, but the system can be different than ASCII due to some exceptional cases (e.g. equivalence of –ss- and –β- in German). Because such functions are an internal sorting feature, correspondence to any standard is less essential.

An additional parameter of such a function can determine an encoding standard of an input string, for instance, ISO, UTF-8, etc.

 

E.RAN.12

Get references to employees sorted according to department names in English (ISO), and then, according to employee names in Polish (UTF-8).

SBQL:

(Emp order by (English(ISO, worksIn.Dept.dname), Polish(UTF-8, name))

 

Range Queries

In general, range queries are properties of sequences (as collections) rather than directly properties of ordering query operator. Because the order by operator always returns a sequence, range queries are naturally associated with this operator. In the simplest variant a range query means the possibility to choice an i-th element of a sequence, where i is a parameter that can be determined by an expression (a query). The typical syntax is as follows:

query ::= query [ query ]

The query q1[q2] returns i-th element of q1, where i is determined by q2. In a more general variant, q2 can return a bag of numbers. In this case q1[q2] returns all elements that are pointed by q2. We can also assume cardinalities written in the form i..j or i..*, which return a bag of numbers starting from i and ending by j, or starting by i and ending by infinity. This convention gives us a lot of possibility to write range queries, for instance:

E.RAN.13

Get references to 50 best-paid employees.

SBQL:

(Emp order by desc(sal) )[1..50]

 

E.RAN.14

Get the median salary of employees

SBQL:

((Emp order by sal )[(integer)(count(Emp)/2)]).sal

 

Note that in the above examples we assume that counting of sequence elements starts from 1. The languages C, C++, Java, C# and standards CORBA and ODMG assumed that the counting starts from 0. This solution was reasonable in assembler and C/C++ due to accordance of indices of arrays with pointer arithmetic. However, currently developed languages are very far from assemblers, hence such a convention is a kind of unnecessary atavism what is illogical and error prone. In SBQL we have rejected it.

E.RAN.15

Get references to 50 best-paid employees and 50 worst-paid employees.

SBQL:

(Emp order by desc(sal) )[bag(1..50, (count(Emp) – 49)..count(Emp)))]

 

If we would use the convention that ordering of sequences starts form 0, the above query must be reformulated to a less legible form:

E.RAN.16

Get references to 50 best-paid employees and 50 worst-paid employees.

???

(Emp order by desc(sal))[bag(0..49,(count(Emp)-50)..(count(Emp)-1))]

 

Queries written in this syntax are able to return elements of a sequence if their order numbers are known, but are unable to return order numbers for sequence elements that are selected in another way. For instance the query “Which rank number takes Brown in the ranking concerning salary?” is to be formulated as a program (a sequence of instructions) rather than a single query. For this reason in the Loqis and ODRA system we have introduced a special operator range as. The syntax is similar to as and group as operators:

query ::= query range as name

Semantics of the query q range as n is as follows. Let q returns sequence{r1, r2, r3, …}. Then, the query q range as n returns

sequence{ struct{ r1, n(1), struct{ r2, n(2) , struct{ r3, n(3) }, .... }

Each element ri returned by q is associated with the binder n(i), where n is a name determined by the programmer, i is a rank number of this element in the sequence, starting from 1. This makes it possible to formulate any conditions based on this rank number, as well as returning it as an output of a query.

E.RAN.17

Which rank number takes Brown in the ranking concerning salary?

SBQL:

(((Emp order by desc(sal)) range as s) where name = “Brown” ).s

 

E.RAN.18

Get a report sorted by departments names and returning the names and the rank of the department concerning the number of employees, descending.

SBQL:

(((Dept order by desc(count(employs))) range as c)

order by dname) “Brown” ).(dname, c)

 

E.RAN.19

Return employees having the rank in the salary category (descending) at least on 10 higher from the rank in the age category (descending):

SBQL:

((((Emp order by -sal) as e range as i) order by -e.age) range as j )

where i >= j+10).e

 

E.RAN.20

Return the average salary of employees, disregarding 25% of the worst paid and 25% of the best paid ones.

SBQL:

avg((((Emp orderby sal) range as r)

where r > (integer)(0.25 * count(Emp))

and r < (integer)(0.75 * count(Emp))).sal)

 


Last modified: August 6, 2009