Developed at the

Polish-Japanese Institute of Information Technology

Chair of Software Engineering

SBA and SBQL home pages

© Copyright by ODRA team, © Copyright by PJIIT

 

 

ODRA – Object Database for Rapid Application development

Description and Programmer Manual

 

 

by Tomasz Kowalski and the ODRA team

 

18. ODRA Indexing

Indices are auxiliary (redundant) database structures stored at a server. A database administrator manages a pool of indices generating a new index or removing an index depending on the current need. As indices at the end of a book are used for quick page finding, a database index makes quick retrieving objects (or records) matching given criteria possible. Because indices have relatively small size (comparing to a whole database) the gain in performance fully justifies some extra storage space. Due to single aspect search, which allows one for very efficient physical organization and very fast search algorithms, the gain in performance can be even several orders of magnitude. Indices, however, consume some extra time when a database is updated. Hence introducing indices is mainly constrained by the proportion of searches and updates. For ODRA no simple rule for decisions concerning the presence or absence of indices is available; they depend on the experience of a database administrator.

18.1 General Idea of ODRA indexing

Objects can be indexed using a wide range of selection criteria (i.e. search key). The value of a search key must depend on a current object. It can be:

  • An objects attribute or a sub-object attribute (using path-expressions),
  • The result of an expression, which can contain build-in query language functions or user defined functions calculated from objects attributes or sub-objects attributes (function-based indices).

The last approach enables the administrator to create an index matching exactly predicates within frequently occurring queries, so their evaluation is faster and uses the minimal amount of I/O operations.

Currently the implementation supports indices based on Linear Hashing structures which can be easily extended to its distributed version SDDS in order to optimally utilize data grid computational resources.

18.1.1 Example on indexing

In query optimization indices are used in the context of the where operator, when the left-hand operand is indexed by key values of the right-hand selection predicate. For instance, if the administrator will establish the index named idxEmpSalary returning references to Emp objects depending on their salaries, then the following queries will produce the same result (the second query is generated by the automatic query optimizer).

(Emp where sal = 2000 and worksIn.Dept.dName = “Sales”).fName;

(idxEmpSalary(2000) where worksIn.Dept.dName = “Sales”).fName;

In case of big databases replacing the where evaluation with an index function call may cause performance gain in orders of magnitude. However to achieve this effect, the database should ensure index transparency and automatic index updating.

18.1.2 Physical Properties of Indices

The idea of indexing implies two important properties:

  • Index transparency from a point of view of a database application programmer. The programmer should not be aware of the indices existence. They are used automatically during query evaluation. Therefore, the administrator of a database can freely generate new indices and remove them without a need to change the code of applications. The mechanism responsible for index transparency during query evaluation is called index optimizer. Its function is to replace a part of a query (transparently for a user) with an index call in order to minimize the amount of processed data.
  • Automatic index updating which is the consequence of changes in the database. Indices, like all redundant structures, can lose cohesion when the database is updated. An automatic mechanism should improve, eliminate or generate a new index in case of database updates. Automatic index updating makes modifying, removing and inserting objects slower. An index makes no sense and should be removed when the indexed objects are modified more frequently than queried via the index.

18.2 Index Management – creating and removing

All indices existing in a database are registered and managed by the ODRA Index Manager. Each index is associated with a module where it has been created. Its name is to be unique. The index manager will be described in detail in a separate technical documentation.

The administrator issues the 'add index' command in CLI to create an index in the database. The syntax of this command is the following:

add index <indexname> [<type_indicators>] on <creating_query>

This is the only command necessary to install new index in the databse. The type indicators are optional and are described more in the further sections.

The next section briefly discusses a creating query structure and current rules and constrains concerning creating single-key indices.

The syntax of command for removing an index from the register is simple:

remove index <indexname>

18.2.1 Creating a single-key index

<creating_query> is an SBQL query, which returns references to objects with associated key values. In order to create an index the administrator must provide the 'add index' command with such a query as a parameter.

Syntax of <creating_query> is the following:

<object_expression> join <key_expression>;

where:

  • <object_expression> - generates references to objects.
  • <key_expression> - generates key values for given objects.

add index idxPerCity on Person join birthYear;

E.g. if such an index is added it is transparently used in optimization of the following queries evaluation:

Person where birthYear = 1980;

Person whereWarsawin address.city and birthYear = 1980;

The current syntax can be changed in the future to be more user friendly (e.g. join can be replaced with parentheses embracing <key_expression>).

Indexed objects are defined by <object_expression> which is to be bound in the database section (root objects). The <object_expression> is to be built using only dot operators and names (i.e., by a path expression).

<key_expression> sub-expressions should be bound in the join operator stack section. They are allowed to return a value of the following types: integer, double, string, reference and boolean. In the simplest case <key_expression> can be an objects attribute, however derived attributes and expressions containing procedure calls are allowed.

An important property of a created index is a cardinality of a key. It indicates the number of key values, which can be returned for the given object. The index optimization is simplest if one key value is always returned for an indexed object (singular cardinality [1..1]). Currently the optimization for keys with a maximal cardinality greater then 1 is not supported. The optional cardinality [0..1] of a key enforces more strict rules for query optimization utilizing index in order to preserve query semantics after optimization. In a following example a key cardinality is optional because manages attribute is optional for Emp objects.

add index idxManagerDName on Emp join manages.Dept.dName;

Example queries which can be transparently optimized by applying idxManagerDName index are shown in section 1.3.

18.2.2 Dense, Range and Enum type indicators

The creating index syntax allows the administrator to specify general index key properties, i.e. concerning key values or the optimization goal. These are achieved by introducing optional <type_indicators>: dense, range and enum.

The dense indicator implies that the optimization of selecting queries which use a given key value as a condition will be used only for selection predicates based on '=' or in operators. Therefore the distribution of indexed objects in an index (e.g. in hash table) can be more efficient for optimization of such cases.

add index EmpSal(dense) on Emp join sal;

The range indicator implies that optimization will concern selection predicates based not only on '=' or in operators but also on range operators: '>', '', '<' and ''.

add index idxperage(range) on Person join (2007 - birthYear);

E.g. if administrator would like to optimize evaluation of following range queries:

Person where 23 <= 2007 – birthYear and 2007 - birthYear < 23;

Person where “Smith” = lName where 2007 - birthYear > 50;

she should issue the command mentioned above, which adds the idxperage index.

The enum indicator was introduced in order to take advantage of keys with countably limited set of distinct values. The performance of an index can be strongly deteriorated if key values have low cardinality e.g. person eye colour, a marriage status (boolean value) or the year of birth. Using the enum key type, an index internally stores all possible key values and uses this information to optimize the index structure.

The enum key type can deal with optimizing selection predicates exactly like in the case of the range indicator, i.e. for: '=', in, '>', '', '<' and '' operators.

add index idxempdnbr(enum) on Emp join worksIn.Dept.dNbr;

The default type indicator for integer, string, double or reference values is dense. In case of boolean values, the enum type is always used. The dense indicator should always be used for reference values.

add index idxempdept on Emp join worksIn;

The next section describes ODRA's optimization rules which can be helpful in applying good indexing.

18.3 Query optimization tips

Currently the index optimizer analysing the right operand of the where operator takes into consideration all selection predicates joined with an and and or operators.

Building selection criteria with the non [1..1] key cardinalities may cause runtime errors. Selection predicates based on '=', '>', '', '<' and '' operators force using single values as left and right operands. Unexpected number of operand values causes a runtime error. More operand values are allowed only if the in operator is used as a predicate because it does not constrain the cardinality of a right operand. To enable full optimization of queries with optional cardinality [0..1] suitable predicate based on exists expression should be used.

Example queries

  • unsafe predicate evaluation (may cause run-time error): the left side of the equality expression has cardinality [0..1] like the manages attribute.

Emp where manages.Dept.dName =Sales”;

  • safe predicate evaluation as in operator is used (this query is optimizable with idxManagerDName index).

Emp where “Sales” in manages.Dept.dName;

  • safe predicate evaluation as exists operator ensures it (this query is optimizable with idxManagerDName index).

Emp where exists(manages.Dept.dName) where “Sales” = manages.Dept.dName;

18.4 Examples

More examples are available through ODRA SVN repository in batch files stored in the folder EGB/res/index/batch/”. Batch files include queries and CLI commands and can be executed using following syntax:

batch <batch_file_name>

            Short description of batch files contents:

l        res\sampledata\batch\createM0.cli - creates a module with sample set of data necessary to perform batch test's and examples.

l        res\index\batch\add.cli – creates a set of indices.

l        res\index\batch\remove.cli – removes a set of indices.

l        res\index\batch\test-error_idxmgr.cli – test for index manager containing several errors.

l        res\index\batch\test-xml.cli – test for optimizing data from XML import (done for medium database).

l        res\index\batch\startall.cli starts all mentioned above tests.

l        res\index\batch\start.cli starts some of tests mentioned above.

 

 

Last modified: July 14, 2008