©Copyright by Kazimierz Subieta.

What are query languages?

by Kazimierz Subieta

Back to Description of SBA and SBQL.

There are several views on what are query languages. In 1970-ties, when the term was coined, query languages were considered user-friendly tools that allow easy access to databases by non-professionals (or so-called “naive users” or “end users”). At that time SQL, QUEL and Query-By-Example were considered typical query languages fulfilling this goal. This view, however, was not justified by common practice and has been changed. Currently, query languages such as SQL or OQL have apparently user-friendly syntax (i.e. introduce keywords imitating the commands of the natural language), but essentially they are complex machine languages with strong syntactic rules, sophisticated semantics, not obvious pragmatics (c.f. “SQL puzzles”) and a lot of options that are usually properties of programming languages (create, update, delete, stored procedures, etc.).

It is a common opinion that such query languages address programmers rather than end users. Non-professional users prefer mouse-oriented interfaces, with friendly, easy to use and self-explained graphical metaphors based on choices from a menu, clicking buttons or hyperlinks, dragging and dropping, navigation in a visual graph, etc. In this meaning the term “query language” denotes easy (preferably graphical) end user tool allowing the user to ask some queries to databases, to browse through the database and to obtain their results in some easy visual form. There are a lot of examples of such visual user interfaces, in particular, QBE in MS Access, SKGN and Mavigator developed by M.Trzaska, and many others.

Yet another meaning of the term “query language” is introduced by proponents of mathematical theories, such as relational algebra, relational calculi, formal logic, and others (including some theories for object-oriented models). Theoreticians are trying to work out a (false) stereotype that query languages are syntactic manifestations of these theories. However, the theses that SQL is a syntactic variant of the relational algebra (or the mathematical logic) are worthless. Approximately, the relational algebra covers not more than 5% of the functionality of SQL. The rest is not founded on any theories. The 5% presents too narrow basis to claim that it is the core. In particular, updating operations, or even simple arithmetic + operator, are outside of these theories. In general, these theories do not address any issue relevant to programming constructs and programming abstractions (such as procedures, functions, methods, classes, views, triggers, and perhaps others) that are more and more often properties of query languages such as SQL.

The stack-based approach deals with query languages in the spirit of SQL, i.e. being essentially high-level programming interfaces. “High-level” means that they offer facilities for the database programming that are concise, conceptual and free of internal computer concepts. Below we present features of query languages of this kind.

There are two general approaches to the interoperation of query languages with programming languages. The traditional one assumes that a query language is much less powerful than a programming language. It is a sublanguage, i.e. it offers only access to a database and nothing more. To obtain full functionality, queries have to be embedded into a universal programming language. The approach has advantages, in particular, programmatic access to databases is done in a single language (typically SQL) and programming of database applications can be done in many popular programming languages, such as C, C++, Pascal, Java, etc. Sometimes the approach is claimed to be obligatory due to query optimization (which is easier if the power of a query language is reduced), but such a claim we consider another false stereotype. The approach has severe disadvantages relying on various incompatibilities between a query language and a programming language. The incompatibilities are collectively named “impedance mismatch”, which exposes some essential aesthetical and technical faults that are associated with the approach.

The novel approach to query languages assumes smooth (seamless) integration with programming constructs. The approach avoids the impedance mismatch. It results in novel programming languages having queries as basic building blocks of imperative commands and procedural abstractions. This approach can be observed in Oracle PL/SQL. MS SQL Server Transact-SQL, in the standard SQL-99, in our Loqis prototype, etc. The approach has only one disadvantage (for some people): it implies totally new programming languages, based on new principles and constructs. Some people, usually from commercial circles, are trying to convince the community that 2000 - 5000 of already existing computer languages are enough: Pascal, C++, Java, SQL and many other programming languages end the history of the domain. Obviously, this view is not reasonable (and even idiotic): nobody can claim that the progress in programming languages is over. We can remind that the history of Java is about 10 years, and the history of PHP is even much shorter. In this short time these languages have enjoyed the great success. This convinces us that new and new languages will constantly appear during perhaps the nearest 100 years, despite various claims that they are already unnecessary. Among them there will better database programming languages based on full and seamless integration with query languages. The Stack-Based Approach follows this trend in the most revolutionary way, by rejecting majority of ideological “roots” and false stereotypes stemming from relational theories and existing programming languages. In SBA we assume full unification of programming language expressions and queries in all the program contexts, including imperative statements (update, insert, create, delete, etc.), programming abstractions (procedures, functions, classes, methods, types, etc.) and database abstractions (views, integrity constraints, triggers, etc.).

Basic properties of query languages (understood as a kind of programming languages)

·        Conceptualization and abstraction: A query languages should support conceptual modeling of an application and should allow the programmer to access to databases on a very high abstraction level. They should follow the data independence principle, i.e. should not involve any technical details (such as file organization, buffers, indices, storage devices, etc.), making them transparent for the programmer.

·        Non-procedurality (declarativity): A query should specify directly the goal of the retrieval rather than actions of the machine leading to the goal. In other words, queries should specify what has to be done rather than how it has to be done by the machine. Essentially, this means that queries should be focused more on the business logic and should not involve extensively computer-oriented notions. Non-procedurality/declarativity is to be considered as impression of the programmer; in fact, semantics of queries can be expressed in a procedural form, c.f. the relational algebra. Associations of declarativity with any kind of mathematical logic (e.g. programming in logic or Datalog) are attempts to make fake stereotypes. No theory has preferences in this respect. The programmer’s feeling and the mode of his/her work is the only valid criterion.

·        Macroscopic processing: The programmer should have feeling that a query processes in parallel any number of data. For instance, in the SQL query 

select * from Emp where Sal > 5000

the programmer has the feeling that he/she processes the condition Sal > 5000 in parallel on any number of Emp data.

·        Naturalness: A query language should follow innate human psychology and the ways of programmers’ thinking. It should introduce macroscopic operators, such as selection, projection, navigation, union, etc., that have natural conceptual meaning for the programmers.

·        Efficiency: Execution of queries should be efficient for an application, i.e. the processing time should be reasonable, not violating expectations of programmers and users. Efficiency implies the necessity of very powerful query optimization methods, which are able to reduce the query processing time from e.g. 100 years (in a non-optimized, naive case) to 5 seconds (in the optimized case).

·        Universality: A query language should be universal enough to support any kind of retrieval goal that the programmer wishes to accomplish. The so-called “relational completeness” is far from being a proper measure of the universality, as it does not cover even such simple operations as +, e.g. 2+2 is not expressible in the relational algebra. The power of the Turing machine is also a false measure, as query languages are not intended to compute recursive functions but express mappings of any database state into a result expected by the programmer. For instance, if some database model introduces inheritance and polymorphism, but our query language does not address these concepts, it is not efficiently universal, even if it has the Turing power[1] (b.t.w. the Turing power can be achieved in query languages by quite banal extensions).  The proper measure of the universality should refer not to a mathematical concept, but to programmers feeling and needs: everything that the programmer needs to retrieve should be expressible in a query language. Universality is also associated with naturalness and efficiency: it not an issue that the retrieval goal is theoretically achievable - it must be achievable by normal programmer’s routine, in reasonable time, with reasonable effort and the resulting query should be executed in reasonable time.  This measure of the universality is of course relative and subjective. Nevertheless, it is easy to express lack of the universality by presenting some useful queries that cannot be expressed in a particular query language. For instance, providing a typical BOM database, SQL is not able to express the query “Get the total cost and mass of the engine” (providing the hierarchy of engine parts and the cost and mass associated with all the atomic parts).

·        Independence from an application domain: A query language, similarly to programming languages, should be independent from a particular application domain, i.e. should not introduce notions that are valid on one domain but are meaningless in another domain.

·        Interpreting mode of execution, late binding: Queries can be issued dynamically during run-time of an application, hence all the binding should be done during run-time. For instance, some application can create a view consisting of a query. The view can be added during processing of an application, hence all the operations of the query are performed during the run-time and all the bindings of the names occurring in the query are dynamic (run-time). In applications a query is frequently given as a string, hence the dynamic binding of all names occurring in the query is the only possible mode.

Desired features of approaches to query languages

·        Conceptual simplicity, clean and precise semantics.

·        A uniform approach to individual and bulk data.

·        A uniform approach to volatile and persistent data.

·        Minimal set of concepts necessary for the definition of semantics.

·        Universality of the approach to the semantics, covering all the language constructs that are to be introduced in the query language (including extensions to programming languages).

·        Compositionality and orthogonality, minimizing the syntactic/semantic constructs.

·        Uniform and homogeneous approach to all the concepts of object-oriented database models, such as complex nested objects, object identity, classes, static inheritance (between classes and interfaces), collections (bags, sequences, arrays), encapsulation, polymorphism, relationships (links among objects), dynamic object roles, dynamic inheritance (between objects), null values, types and typechecking, abstract data types, and perhaps others.

·        Uniform and homogeneous approach to all imperative programming constructs and abstractions based on a query language, such as database updates, program control statements, database views, procedures, functions, methods, parameters of procedures (functions, methods), and so on.

·        Modularity of query language, i.e. the possibility of encapsulation and hiding details of any of its constructs.

·        High potential for query optimization through strong and universal formal model.

·        Special attention to all semantic details: even smallest semantic problem is a big problem!

Last modified: December 14, 2005

[1] The Turing power can be achieved by quite banal features, thus it cannot be considered an efficient measure of the power of any programming or query languages. Pragmatic goals of query/programming languages (making programming efficient) are quite different than theoretical goals of a Turing machine (counting recursive functions), thus using Turing machine as a yardstick of  the universality of query/programming languages is total nonsense. This nonsense is unfortunately cultivated by so-called “theoreticians”, who are trying to establish this false stereotype as an axiom.