What are query languages?
Back to Description
of SBA and SBQL.
There are several views on what are query
languages. In 1970-ties, when the term was coined, query languages were
considered user-friendly tools that allow easy access to databases by
non-professionals (or so-called “naive users” or “end users”).
At that time SQL, QUEL and Query-By-Example were considered typical query
languages fulfilling this goal. This view, however, was not justified by common
practice and has been changed. Currently, query languages such as SQL or OQL
have apparently user-friendly syntax (i.e. introduce keywords imitating the
commands of the natural language), but essentially they are complex machine
languages with strong syntactic rules, sophisticated semantics, not obvious
pragmatics (c.f. “SQL puzzles”) and a lot of options that are
usually properties of programming languages (create, update, delete, stored procedures, etc.).
It is a common opinion that such query
languages address programmers rather than end users. Non-professional users
prefer mouse-oriented interfaces, with friendly, easy to use and self-explained
graphical metaphors based on choices from a menu, clicking buttons or
hyperlinks, dragging and dropping, navigation in a visual graph, etc. In this
meaning the term “query language” denotes easy (preferably
graphical) end user tool allowing the user to ask some queries to databases, to
browse through the database and to obtain their results in some easy visual
form. There are a lot of examples of such visual user interfaces, in
particular, QBE in MS Access, SKGN and Mavigator developed by M.Trzaska,
and many others.
Yet another meaning of the term “query
language” is introduced by proponents of mathematical theories, such as
relational algebra, relational calculi, formal logic, and others (including
some theories for object-oriented models). Theoreticians are trying to work out
a (false) stereotype that query languages are syntactic manifestations of these
theories. However, the theses that SQL is a syntactic variant of the relational
algebra (or the mathematical logic) are worthless. Approximately, the
relational algebra covers not more than 5% of the functionality of SQL. The
rest is not founded on any theories. The 5% presents too narrow basis to claim
that it is the core. In particular, updating operations, or even simple
arithmetic + operator, are outside of these theories. In general, these
theories do not address any issue relevant to programming constructs and programming
abstractions (such as procedures, functions, methods, classes, views, triggers,
and perhaps others) that are more and more often properties of query languages
such as SQL.
The stack-based approach deals with query
languages in the spirit of SQL, i.e. being essentially high-level programming
interfaces. “High-level” means that they offer facilities for the
database programming that are concise, conceptual and free of internal computer
concepts. Below we present features of query languages of this kind.
There are two general approaches to the
interoperation of query languages with programming languages. The traditional
one assumes that a query language is much less powerful than a programming
language. It is a sublanguage, i.e.
it offers only access to a database and nothing more. To obtain full
functionality, queries have to be embedded
into a universal programming language. The approach has advantages, in
particular, programmatic access to databases is done in a single language
(typically SQL) and programming of database applications can be done in many
popular programming languages, such as C, C++, Pascal, Java, etc. Sometimes the
approach is claimed to be obligatory due to query optimization (which is easier
if the power of a query language is reduced), but such a claim we consider
another false stereotype. The approach has severe disadvantages relying on
various incompatibilities between a query language and a programming language.
The incompatibilities are collectively named “impedance mismatch”, which
exposes some essential aesthetical and technical faults that are associated
with the approach.
The novel approach to query languages assumes
smooth (seamless) integration with programming constructs. The approach avoids
the impedance mismatch. It results in novel programming languages having
queries as basic building blocks of imperative commands and procedural
abstractions. This approach can be observed in Oracle PL/SQL. MS SQL Server
Transact-SQL, in the standard SQL-
Basic
properties of query languages (understood as a kind of programming languages)
·
Conceptualization and abstraction: A query languages
should support conceptual modeling of an application and should allow the
programmer to access to databases on a very high abstraction level. They should
follow the data independence principle, i.e. should not involve any
technical details (such as file organization, buffers, indices, storage
devices, etc.), making them transparent for the programmer.
·
Non-procedurality (declarativity): A query should specify directly
the goal of the retrieval rather
than actions of the machine leading
to the goal. In other words, queries should specify what has to be done rather than how it has to be done by the machine. Essentially, this means that
queries should be focused more on the business
logic and should not involve extensively computer-oriented notions.
Non-procedurality/declarativity is to be considered as impression of the
programmer; in fact, semantics of queries can be expressed in a procedural
form, c.f. the relational algebra. Associations of declarativity with any kind
of mathematical logic (e.g. programming in logic or Datalog) are attempts to
make fake stereotypes. No theory has preferences in this respect. The
programmer’s feeling and the mode of his/her work is the only valid
criterion.
·
Macroscopic processing: The programmer should have feeling
that a query processes in parallel any number of data. For instance, in the SQL
query
select * from
Emp where Sal > 5000
the
programmer has the feeling that he/she processes the condition Sal >
·
Naturalness: A query language should follow innate human psychology and the ways of
programmers’ thinking. It should introduce macroscopic operators, such as
selection, projection, navigation, union, etc., that have natural conceptual
meaning for the programmers.
·
Efficiency: Execution of queries should be
efficient for an application, i.e. the processing time should be reasonable,
not violating expectations of programmers and users. Efficiency implies the
necessity of very powerful query optimization methods, which are able to reduce
the query processing time from e.g. 100 years (in a non-optimized, naive case)
to 5 seconds (in the optimized case).
·
Universality: A query language should be
universal enough to support any kind of retrieval goal that the programmer
wishes to accomplish. The so-called “relational completeness” is
far from being a proper measure of the universality, as it does not cover even
such simple operations as +, e.g. 2+2 is not expressible in the relational
algebra. The power of the Turing machine is also a false measure, as query
languages are not intended to compute recursive functions but express mappings
of any database state into a result
expected by the programmer. For instance, if some database model introduces
inheritance and polymorphism, but our query language does not address these
concepts, it is not efficiently universal, even if it has the Turing power[1]
(b.t.w. the Turing power can be achieved in query languages by quite banal
extensions). The proper measure of the
universality should refer not to a mathematical concept, but to programmers
feeling and needs: everything that the programmer needs to retrieve should be
expressible in a query language. Universality is also associated with
naturalness and efficiency: it not an issue that the retrieval goal is
theoretically achievable - it must be achievable by normal programmer’s
routine, in reasonable time, with reasonable effort and the resulting query
should be executed in reasonable time.
This measure of the universality is of course relative and subjective.
Nevertheless, it is easy to express lack of the universality by presenting some
useful queries that cannot be expressed in a particular query language. For
instance, providing a typical BOM database, SQL is not able to express the
query “Get the total cost and mass of the engine” (providing the
hierarchy of engine parts and the cost and mass associated with all the atomic
parts).
·
·
Interpreting mode of execution, late
binding: Queries
can be issued dynamically during run-time of an application, hence all the
binding should be done during run-time. For instance, some application can
create a view consisting of a query. The view can be added during processing of
an application, hence all the operations of the query are performed during the
run-time and all the bindings of the names occurring in the query are dynamic
(run-time). In applications a query is frequently given as a string, hence the
dynamic binding of all names occurring in the query is the only possible mode.
Desired
features of approaches to query languages
·
Conceptual simplicity, clean and precise semantics.
·
A uniform approach to individual and bulk data.
·
A uniform approach to volatile and persistent data.
·
Minimal set of concepts necessary for the definition
of semantics.
·
Universality of the approach to the semantics,
covering all the language constructs that are to be introduced in the query
language (including extensions to programming languages).
·
Compositionality and orthogonality, minimizing the
syntactic/semantic constructs.
·
Uniform and homogeneous approach to all the concepts
of object-oriented database models, such as complex nested objects, object
identity, classes, static inheritance (between classes and interfaces),
collections (bags, sequences, arrays), encapsulation, polymorphism,
relationships (links among objects), dynamic object roles, dynamic inheritance
(between objects), null values, types and typechecking, abstract data types,
and perhaps others.
·
Uniform and homogeneous approach to all imperative
programming constructs and abstractions based on a query language, such as
database updates, program control statements, database views, procedures,
functions, methods, parameters of procedures (functions, methods), and so on.
·
Modularity of query language, i.e. the possibility
of encapsulation and hiding details of any of its constructs.
·
High potential for query optimization through strong
and universal formal model.
·
Special attention to all semantic details: even
smallest semantic problem is a big problem!
Last modified: December 14, 2005
[1] The Turing power can be achieved by quite
banal features, thus it cannot be considered an efficient measure of the power
of any programming or query languages. Pragmatic goals of query/programming
languages (making programming efficient) are quite different than theoretical
goals of a Turing machine (counting recursive functions), thus using Turing
machine as a yardstick of the
universality of query/programming languages is total nonsense. This nonsense is unfortunately cultivated by
so-called “theoreticians”, who are trying to establish this false
stereotype as an axiom.