Developed at the

Polish-Japanese Institute of Information Technology

Chair of Software Engineering


© Copyright by ODRA team, © Copyright by PJIIT

 

 

 

 

ODRA – Object Database for Rapid Application development

Description and Programmer Manual

 

 

 

by Kazimierz Subieta and the ODRA team

 

4. ODRA Object-Oriented Store Model

ODRA is based on a UML-like object model with complex objects, (nested) collections, classes, methods, static inheritance and binary associations. We plan to extend the UML object model with dynamic object roles and dynamic inheritance. The model fully covers the relational model, which can be considered a primitive object model where each tuple is an object and no inheritance, methods or associations are supported. This observation is important for building object-relational wrappers for ODRA. The ODRA model also covers the XML model, which (conceptually) offers hierarchies of nested objects with no classes, inheritance or associations. However, some minor features of the XML model have no direct counterparts in the ODRA store model; in particular, the order of XML subobjects is not supported in ODRA. In the same way, the ODRA store model can be 1:1 compatible with the RDF model, the Topic Maps model, etc. These properties of the ODRA store model make it possible to implement SBQL for many different data environments. Moreover, such an implementation can be strongly type checked and optimized by powerful SBQL query optimization methods.

4.1 Modules

In ODRA the basic unit of database organization is a module. As in popular object-oriented languages, a module is a separate component of an application. An ODRA module groups a set of database objects and compiled programs and can be a basis for reuse and for separation of programmers' workspaces. From the technical point of view, and according to the assumed object relativity principle, modules can be perceived as special-purpose complex objects that store metadata and data.

4.1.1 Module metabase and database

Each module includes (apart from some internal system data) two kinds of information: metadata stored in a metabase and data stored in a module database.

A metabase stores information needed during compilation of SBQL source code. It is used for query analysis, type checking and optimization. Objects stored in the metabase contain meta-information about objects stored in the database. For example, for a declaration of a particular object (a variable) in the module source code, the metabase stores the name of the object, its type and its cardinality. Thanks to this information many type errors can be detected during compilation[1]. Moreover, the information stored in the metabase is essential for query optimization. The module metabase is used both during compilation and during run time. In contrast, a module database stores only the data needed at run time.
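The role of the metabase can be illustrated with a minimal Python sketch. It is not ODRA code: the names MetaEntry, metabase and check_assignment are hypothetical, and the sketch only assumes what the text states, namely that a metabase entry records the name, type and cardinality of a declared object and that this information is enough to detect some type errors before run time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetaEntry:
    """Hypothetical metabase record for one declared object."""
    name: str
    type_name: str
    min_card: int = 1
    max_card: Optional[int] = 1  # None stands for "*"


# Hypothetical metabase of one module: declarations keyed by object name.
metabase = {
    "name": MetaEntry("name", "string"),
    "sal": MetaEntry("sal", "integer"),
    "hobby": MetaEntry("hobby", "string", 0, None),
}


def check_assignment(metabase, target, value_type):
    """Return the compile-time errors found for `target := <value of value_type>`."""
    errors = []
    entry = metabase.get(target)
    if entry is None:
        errors.append(f"unknown name: {target}")
    elif entry.type_name != value_type:
        errors.append(
            f"type mismatch for {target}: expected {entry.type_name}, got {value_type}"
        )
    return errors
```

The check consults only the metabase; no stored data is touched, which mirrors the separation between the metabase and the module database.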

4.1.2 System module

A new database contains a single default module called the system module. The system module is the root for all user defined modules. Additionally, it stores the data and metadata that can be perceived as ODRA standard library objects. All user defined modules automatically import the system module.

4.1.3 User defined modules

Each new user account added to a database server is assigned a default module that represents the root of the user defined database. The name of the module is the same as the name of the user account. All data created inside this module belong to the corresponding user. The user data can be additionally organized into sub-modules.

4.2 Objects, Nested Objects

In this document we primarily use the term object to denote stored data structures. Frequently there is a correspondence between such data structures and real world objects, but this is a rather informal relationship that does not always hold. We make no distinction between objects and variables known from many programming languages. Sometimes the concepts are distinguished according to membership in classes: objects must be members of classes, while variables need not. Because the difference between the class and type concepts is very subtle, such a criterion is not firm. Hence we call any stored data structure an object or a variable, without assuming any syntactic or semantic difference between the concepts.

Our objects inherit a property of programming variables: objects can be stored structures only. SBQL queries never return objects; they return values of objects and references to objects, perhaps within some complex structures such as records and bags. We totally reject the so-called closure property, which claims that the input to queries (i.e. objects) and the output from queries belong to the same conceptual domain. Careful analysis of semantic situations convinced us that the closure property, understood in this way, is conceptual nonsense.

During the design of our data model we assumed important principles that govern semantic properties of objects. They are known as object relativity, total internal identification and orthogonal persistence. The principles are formulated as follows:

Object relativity: If some object O1 can be defined, then an object O2 having O1 as a component can also be defined. There are no limitations concerning the number of hierarchy levels of objects. Objects on any hierarchy level are treated uniformly. In particular, an atomic object (having no sub-objects inside) should be allowed as a regular data structure, independent of other structures. The relativity of objects implies the relativity of the corresponding query capabilities, i.e. there should be no difference in language concepts and constructs acting on different object hierarchy levels. Traditionally, an object consists of attributes, an attribute consists of sub-attributes, etc. In SBQL there is no need for such a distinction: attributes, sub-attributes, pointer links between objects, procedures, methods, views, etc. are objects too. The principle cuts the size of the database model, the size of the specification of query languages addressing the model, the size of the implementation, and the size of the documentation. It also supports easier learning of both a database model and a corresponding query language. By minimizing the number of concepts, the principle of object relativity supports the development of a universal theory of query languages, which is necessary to reason about query optimization methods.

Total internal identification: Each object that could be separately retrieved, updated, inserted, deleted, authorized, indexed, protected, locked, etc. should possess a unique internal identifier. The identifier is not printable and the programmer never uses it explicitly. A unique internal identifier should be assigned not only to objects on the top level of their hierarchy, but to all sub-objects, including atomic ones. If some atomic objects create a repeating group, e.g. a person has many hobbies, each object in the group should possess a unique identifier. For persistent objects (i.e. database objects) their identifiers should be persistent too, i.e. invariant during the whole life of the objects. We are not interested in the structure and meaning of internal identifiers. For us it is essential that all objects and all their sub-objects can be unambiguously identified through their unique internal identifiers. The principle makes it possible to create references and pointers to all possible objects, and thus to avoid conceptual problems with binding, scoping, updating, deleting, parameter passing, and other functionalities that require object references as query primitives. Note that object identifier is a purely technical term, in contrast to object identity, which belongs to another domain of discourse, related to business modeling rather than to data structures.

Orthogonal persistence: There is no conceptual difference in typing and accessing persistent and volatile objects. In particular, a database can store individual objects (not only collections) and the volatile main memory of an application can contain collections of objects. Persistent objects are usually shared among many clients, hence they must obey the transactional semantics. However, persistent (but non-shared) objects can also be stored at a client side; in this case the transactional semantics is not necessary. ODRA introduces three kinds of persistence: permanent, for objects that are stored on a server and shared; temporal, for objects that are stored at a client and not shared; and local, for objects that are assigned to a particular procedure, function, method or transaction call.

According to the object relativity principle, each ODRA data element is an object with an internal identifier i, an external name n and a value v. At the lowest (physical) level there are three kinds of objects.

  1. Atomic (simple) objects represented by a triple < i, n, v >. The supported value types in ODRA are: integer, real, boolean, string, date and binary.
  2. Pointer objects represented by a triple < i, n, i1 >, where i1 is an object identifier of the pointed object.
  3. Complex objects represented by a triple < i, n, T> where T is a set of objects (of any kind).
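The three kinds of objects listed above can be sketched as plain Python data classes. This is an illustration only, not the ODRA storage format: the class names, the field names and the identifier i22 are assumptions, and a Python list stands in for the set T of sub-objects.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class AtomicObject:
    oid: str                                  # internal identifier i
    name: str                                 # external name n
    value: Union[int, float, bool, str, bytes]  # atomic value v


@dataclass
class PointerObject:
    oid: str
    name: str
    target: str                               # identifier i1 of the pointed object


@dataclass
class ComplexObject:
    oid: str
    name: str
    children: List["StoreObject"] = field(default_factory=list)  # the set T


StoreObject = Union[AtomicObject, PointerObject, ComplexObject]

# A hypothetical Emp object; the identifiers are invented for the example.
emp = ComplexObject("i5", "Emp", [
    AtomicObject("i6", "name", "Poe"),
    AtomicObject("i7", "sal", 2000),
    PointerObject("i8", "worksIn", "i22"),
])


def all_oids(obj):
    """Collect the identifiers of an object and of all its sub-objects."""
    oids = [obj.oid]
    if isinstance(obj, ComplexObject):
        for child in obj.children:
            oids.extend(all_oids(child))
    return oids
```

Every sub-object, including the atomic ones, carries its own identifier, as required by the total internal identification principle.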

The basic data store model (called M0) is a set of objects described above together with a set of identifiers of root objects (starting points for navigation in the database object graph). Usually the starting points are identifiers of modules. At the higher logical level a complex object is used to represent different kinds of conceptual objects: modules, metabases, classes, views, procedures, database links, indexes, and so on. An example of an ODRA object is presented in Fig.1-4. In Fig.4-1 we present an object schema and 5 objects that correspond to it. In Fig. 4-2 we present how a relational database can be represented in the ODRA data model.

Fig. 4-1. A database schema and corresponding objects

 

Fig. 4-2. A relational database represented as ODRA objects

 

4.3 Structures

A structure in ODRA differs from Pascal records or C/C++ structures. Concerning stored objects, we distinguish structures in the typing system. For instance, a sequence of objects (<i6, name, “Poe”>, <i7, sal, 2000>, <i8, worksIn, “Sales”>) can be considered a structure of the type record{name:string, sal:integer, worksIn:string}. In structure types the number of elements, their order, their names and their types are fixed. However, a structure is a concept related to the typing system only. In the object store model such a concept is not necessary: structures are simply ordered collections of objects.

In the case of query results, structures are sequences of elements that are not collections and that are results of queries. ODRA does not require each structure element to be named. Any result of a query, except collections, can be an element of a structure, in particular atomic values, references to objects and binders. For instance, <i1, i2, x(5)> is a structure instance having three elements: identifiers i1 and i2 and binder x(5).

 

4.4 Collections and Cardinalities

In the ODRA store model we assume no uniqueness of external names on any level of the object hierarchy. For instance, in Fig.4-1 the name Emp is assigned to three objects and the name Dept is assigned to two objects. Within the “Trade” Dept object the name location is assigned to two atomic sub-objects, and within the “Ads” Dept object the name employs is assigned to two pointer sub-objects. This is the way in which we deal with collections. Note that similar assumptions are taken for XML. In this way we unify several concepts related to collections, such as sets, bags, extents and repeating attributes. We also abstract from the concepts of structure, record and tuple, as known e.g. from C/C++, Pascal and relational systems. For the goal of building the formal semantics of query and programming languages such notions are secondary and can be expressed in the terms of the ODRA store model as complex objects.
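A short Python sketch shows how non-unique names give rise to collections. The triples follow the < i, n, v > notation of section 4.2; the identifiers, the attribute name dname and the two location values are invented for the example, and by_name is a hypothetical helper, not an ODRA function.

```python
# Each object is an (identifier, name, value) triple; the value of a complex
# object is a list of such triples. All identifiers and values are invented.
dept = ("i30", "Dept", [
    ("i31", "dname", "Trade"),
    ("i32", "location", "Paris"),
    ("i33", "location", "London"),
])


def by_name(complex_obj, name):
    """Return identifiers of all sub-objects bound to `name` (none, one or many)."""
    _, _, children = complex_obj
    return [oid for (oid, child_name, _) in children if child_name == name]
```

Here by_name(dept, "location") yields two identifiers, so the repeating attribute behaves as a collection although no separate collection object with its own identifier exists in the store.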

In the ODRA store model a collection does not occur as a single entity having its own unique identifier. However, it is possible to create a complex object with sub-objects of the same type. For instance, one can create an object Employees having many Emp objects. This is the only way in which a collection may obtain a unique identifier.

Because each object differs from other objects at least by its object identifier, it makes little sense to distinguish stored collections by their kinds, such as sets and bags (cf. the ODMG standard). The current ODRA version does not support the stored collection kinds known as sequence and array. Such extensions are planned for the next release.

The situation with collections is a bit different when we consider results returned by queries. In general, we consider the unification of collections stored in an object store and collections returned by queries as conceptually doubtful[2]. For query results, the current ODRA version supports the collection types bag and sequence. Sequences may appear as the result of the order by (sorting) operator. The collection type set is not supported by the ODRA typing system; however, the programmer can make a set from a bag by applying the function distinct, just like in SQL.

In the ODRA typing system collections are constrained by cardinalities (known e.g. from UML). A cardinality is a pair of symbols written as [min..max], where min is a non-negative integer denoting the minimal number of collection elements and max is a natural number or * denoting the maximal number of collection elements. The symbol * denotes “as many as you like”. For instance, [0..1] denotes a collection that is empty or contains one element, [1..1] is a collection having exactly one element, [0..*] is a collection having any number of elements and [1..*] is a non-empty collection having any number of elements. Other cardinalities are possible. If max is a number, then min ≤ max. Cardinality [1..1] is the default and can be omitted. Moreover, a collection with exactly one element is considered by the typing system as identical to that element. A cardinality [0..1] denotes an element that may or may not occur. This is how ODRA deals with the concept known from relational systems as NULL. In SBQL we apply a liberal typing system (called semi-structured) where any collection having exactly one element e is equivalent to this element e (thus e.g. comparisons of elements and one-element collections are possible) and each single element e can be considered a bag with e as its single element. Note that similar coercion rules are also adopted by SQL.
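The cardinality rules and the two coercions can be sketched in Python as follows. The class Cardinality and the helpers as_bag and as_element are hypothetical names introduced for this illustration; None stands for the symbol *, and a Python list stands for a bag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """A [min..max] constraint; max_count=None stands for '*'."""
    min_count: int = 1
    max_count: Optional[int] = 1   # the default cardinality is [1..1]

    def __post_init__(self):
        if self.min_count < 0:
            raise ValueError("min must be non-negative")
        if self.max_count is not None and self.min_count > self.max_count:
            raise ValueError("min must not exceed max")

    def admits(self, count: int) -> bool:
        """Check whether a collection of `count` elements satisfies the constraint."""
        upper_ok = self.max_count is None or count <= self.max_count
        return count >= self.min_count and upper_ok


def as_bag(value):
    """Coerce a single element to a one-element bag; leave a list unchanged."""
    return value if isinstance(value, list) else [value]


def as_element(bag):
    """Coerce a one-element bag to its element; any other size is an error."""
    if len(bag) != 1:
        raise TypeError("only a one-element collection can be coerced to an element")
    return bag[0]
```

With these definitions Cardinality(0, 1) admits an absent value (zero elements), which corresponds to the treatment of NULL described above.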

 

4.5 Links

In ODRA links are understood as triples <i1, n, i2>, where i1 is a reference to the link, n is an external name used in source code and i2 is a reference to the object that the link leads to. For instance, <i21, employs, i1> is a link (having the reference i21) that can be inserted into a Dept object and leads to an Emp object with the reference i1. Currently directed links (i.e. pointers) and bidirectional links (i.e. twin pointers) are supported. Bidirectional links are instances of the concept that is known as relationship (in the Entity-Relationship Model or the ODMG standard) or association (in UML).

Links are strongly typed and can be updated, inserted and deleted. Links follow the orthogonal persistence principle, i.e. we do not restrict links to persistent and shared objects only. Links implement association instances known from UML; however, only binary associations with no properties and no association classes are supported. Deleting any object A implies that all links leading to A are deleted (or nullified) too; hence no dangling links (links leading to garbage or improper objects) can appear. Note that we do not follow the opposite idea (cf. Java), in which the programmer must first remove or nullify all the links that lead to A, and A is then removed by an automatic garbage collector. For several reasons, e.g. a restricted client subschema, such an idea is inconsistent for database objects: due to a limited view and limited access rights, the application programmer may be unable to remove or nullify all the links that lead to an object that he/she wants to delete. Hence ODRA and SBQL provide an explicit deletion operator, just like SQL.
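The deletion rule can be sketched in Python under simplifying assumptions. The store is modeled as a dictionary from identifiers to records; the record layout and the function delete_object are hypothetical, and the sketch deletes the incoming links rather than nullifying them.

```python
# Hypothetical store: identifier -> record. A complex object holds the
# identifiers of its sub-objects; a pointer holds the identifier of its target.
store = {
    "i1":  {"name": "Emp",     "kind": "complex", "value": ["i2"]},
    "i2":  {"name": "name",    "kind": "atomic",  "value": "Poe"},
    "i20": {"name": "Dept",    "kind": "complex", "value": ["i21"]},
    "i21": {"name": "employs", "kind": "pointer", "value": "i1"},
}


def delete_object(store, oid):
    """Delete `oid`, its sub-objects, and every pointer leading to a deleted object."""
    doomed = set()
    stack = [oid]
    while stack:                       # collect the object and all its sub-objects
        current = stack.pop()
        if current in doomed:
            continue
        doomed.add(current)
        if store[current]["kind"] == "complex":
            stack.extend(store[current]["value"])
    # Links that would become dangling are deleted together with the object.
    doomed |= {
        link_id for link_id, obj in store.items()
        if obj["kind"] == "pointer" and obj["value"] in doomed
    }
    for dead in doomed:
        del store[dead]
    for obj in store.values():         # detach deleted sub-objects from their parents
        if obj["kind"] == "complex":
            obj["value"] = [c for c in obj["value"] if c not in doomed]


delete_object(store, "i1")
```

After the call only the Dept object i20 remains and it no longer contains the employs link, so no dangling link is left in the store.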

 

4.6 Procedures, Functions and Transactions

ODRA supports procedures and functions in the classical variant known from the majority of programming languages; arbitrary calls of procedures/functions from procedures/functions are supported, including any recursive calls. The novelty of ODRA procedures and functions concerns parameter passing and the return from a function (a functional procedure). Both the parameters and the returned result can be determined by SBQL queries. This allows one to make programs much more conceptual and shorter. ODRA basically supports the parameter passing method that is known as strict-call-by-value. In this method the actual parameter is calculated before the function call, then the result is named with the name of the formal parameter, and then the body of the procedure/function is executed. The method combines call-by-value and call-by-reference known e.g. from Pascal. No syntax distinguishes call-by-value from call-by-reference, just like in C/C++. The big advantage of the method is that it is simple to implement, fully consistent and allows for the declarative and macroscopic (many-data-at-a-time) processing that is implied by queries.
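The strict-call-by-value scheme can be mimicked in Python as a rough analogy. Nothing below is SBQL: the class Ref stands in for an object reference, a lambda stands in for a query used as an actual parameter, and the names call, raise_salaries and salaries are hypothetical.

```python
class Ref:
    """Stand-in for a reference to a stored atomic object."""

    def __init__(self, store, oid):
        self.store = store
        self.oid = oid

    def get(self):
        return self.store[self.oid]

    def set(self, value):
        self.store[self.oid] = value


def call(procedure, **actual_queries):
    """Evaluate each actual-parameter query first, then run the body with the
    results bound to the formal parameter names."""
    bound = {formal: query() for formal, query in actual_queries.items()}
    return procedure(**bound)


salaries = {"i7": 2000, "i12": 2500, "i17": 900}   # invented data


def raise_salaries(emps, amount):
    """The body sees a bag of references (emps) and a plain value (amount)."""
    for ref in emps:
        ref.set(ref.get() + amount)
    return len(emps)


changed = call(
    raise_salaries,
    emps=lambda: [Ref(salaries, oid) for oid, sal in salaries.items() if sal < 2200],
    amount=lambda: 100,
)
```

The same calling mechanism passes a value (amount) and a collection of references (emps). Whether the effect resembles call-by-value or call-by-reference depends only on what the query returns, which is the point made in the paragraph above.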

Parameters of ODRA procedures and functions are typed, and so is the result of a function. Types are strongly checked at compile time; when necessary, checking is delegated to run time.

Procedures and functions can be persistent, i.e. they can be stored at a database server and shared among many clients. This accomplishes the paradigm that is known from relational database systems as database procedures.

Procedures and functions are stored within modules or within classes. In the latter case they are called methods and by default they act on an environment that includes the internals of a class member object.

Concerning the source code, transactions in ODRA are similar to procedures. Except for the keyword transaction and the command abort, their semantic and pragmatic properties are the same as for procedures. Transactions are strongly type checked, may have parameters that are queries, may have a local data environment and may return a result. Like procedures, transactions can be stored within modules or within classes, and can be stored on the server side (within the database) or on the client application side. Transactions can invoke other transactions without limitations (hence nested transactions are supported). Transaction invocations differ slightly from procedure invocations during run time because of the ACID semantics on shared resources. A transaction invocation can be aborted, and in this case its updates are canceled (rolled back). During run time a transaction invocation is represented by a special object. ODRA uses the traditional (pessimistic) 2PL transaction processing algorithm, with deadlocks prevented by the wait-die method. A more detailed description of procedures, functions and transactions will be presented in the proper chapters of this documentation.
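For readers unfamiliar with wait-die, the following Python sketch shows the textbook form of the rule. It is not the ODRA implementation: the class LockTable is hypothetical, it handles exclusive locks only, and it assumes that each transaction carries a unique start timestamp where a smaller timestamp means an older transaction.

```python
class LockTable:
    """Exclusive locks with the textbook wait-die rule (smaller timestamp = older)."""

    def __init__(self):
        self.holder = {}   # resource -> timestamp of the transaction holding the lock

    def request(self, resource, ts):
        """Return 'granted', 'wait' or 'die' for transaction `ts` asking for `resource`."""
        holder = self.holder.get(resource)
        if holder is None or holder == ts:
            self.holder[resource] = ts
            return "granted"
        # An older requester may wait for a younger holder; a younger requester
        # is aborted ("dies") and is restarted later with its original timestamp.
        return "wait" if ts < holder else "die"

    def release(self, resource, ts):
        """Release the lock if `ts` holds it."""
        if self.holder.get(resource) == ts:
            del self.holder[resource]
```

Because a transaction only ever waits for a younger one, a cycle of waiting transactions cannot form, which is why the scheme prevents deadlocks.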

4.7 Views

For the Virtual Repository concept within the eGov Bus project we have applied a new approach to database views that allows us to achieve a power of updatable views that has not even been considered so far in the database domain. Our method has some commonalities with the INSTEAD OF trigger views implemented in Oracle, SQL Server and DB2, but it is based on different principles, is much more powerful and efficient, and may address any object-oriented database model, including an XML data model. In general, the method is based on overloading generic updating operations (create, delete, update, insert, etc.) acting on virtual objects by invocations of procedures that are written by the view definer. The procedures are an inherent part of the view definition. The procedures have full algorithmic power, thus there are no limitations concerning the mapping of view updates into updates of stored data. ODRA updatable views allow one to achieve full transparency of virtual objects: they cannot be distinguished from stored objects by any programming option. This feature is very important for distributed and heterogeneous databases.
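The idea of overloading update operations with procedures supplied by the view definer can be illustrated with a small Python analogy. It does not show ODRA view syntax: the class VirtualAnnualSal, the function annual_sal_view and the sample data are hypothetical, and only the retrieve and update operations are overloaded.

```python
# Stored data: monthly salaries (invented values).
stored_emps = [{"name": "Poe", "sal": 2000}, {"name": "Lee", "sal": 3000}]


class VirtualAnnualSal:
    """A virtual object exposing the annual salary of one stored employee."""

    def __init__(self, emp):
        self._emp = emp                      # link back to the stored object

    def retrieve(self):
        """Procedure run when the virtual object is dereferenced."""
        return self._emp["sal"] * 12

    def update(self, new_annual):
        """Procedure run when the virtual object is updated; it maps the
        view update onto an update of the stored monthly salary."""
        self._emp["sal"] = new_annual // 12


def annual_sal_view():
    """The view: one virtual object per stored employee."""
    return [VirtualAnnualSal(emp) for emp in stored_emps]
```

A client that updates a virtual annual salary never touches the stored sal attribute directly; the mapping is fully determined by the procedure written by the view definer.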

ODRA views can be used as mediators on top of local resources to convert them virtually to the required format, as integrators that fuse fragmented data from different sources, and as customizers that adapt the data to the needs of a particular end user application. ODRA views are the basis for the Virtual Repository Management System that lies at the centre of the eGov Bus software.

Concerning storage, views share the properties of procedures, functions and transactions. In particular, they can be stored within modules on a database server, within modules of client applications or within classes. In the latter case views accomplish the feature that is known as virtual attributes. Views are first-class entities that can be dynamically inserted into or removed from a particular environment.

A more detailed description of ODRA views will be presented in the proper chapters of this documentation.

4.8 Classes, Inheritance, Polymorphism, Types and Schemata

A class in ODRA is a programming entity having two forms:

  1. A class is an encapsulated and named piece of source code containing the specification of class members (their type) and the specification of the methods that can be performed on the members.
  2. After compilation a class is a special run-time object that stores invariant properties of objects, in particular, compiled methods.

A class has some number of member objects. During processing of a member object the programmer can use all properties stored within its class. Classes can be connected into an ODRA schema, as shown in Fig.2-4.

As in the UML object model, classes inherit properties of their superclasses. Multiple inheritance is allowed, but name conflicts are not automatically resolved (similarly to UML). A method from a class hierarchy can be overridden. An abstract method can be instantiated differently in different specialized classes (due to late binding); this feature is known as polymorphism.

ODRA assumes strong or semi-strong type checking of all the programming entities and contexts. Strong typing is a prerequisite for query optimization and for resolving some ambiguities or ellipses that may occur in SBQL queries. For some purposes, however, strong typing can be switched off. The ODRA typing system includes atomic types (integer, real, string, date, boolean) that are known from other programming languages. Further atomic types are considered, but not implemented yet. The programmer can also define his/her own complex types known as records. All type constructors can be nested with no limitations. Collection types are specified by cardinality numbers, for instance, [0..*], [1..*], [0..1], etc.

The ODRA internal typing system checks some attributes that are assigned to type signatures. Currently the following attributes are supported:

  • Mutability: some operations, e.g. updating, require that the argument must be a reference to an object rather than some value. This is checked statically (during compilation time).
  • Cardinality: cardinality constraints are checked, mostly dynamically (during run time).
  • Collection kind: some operations are improper for some kinds of collections, for instance, extraction of i-th element is valid for a sequence but invalid for a bag. This is checked statically.
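Two of the signature attributes listed above can be sketched as static checks in Python. The class Signature and the functions check_update and check_index are hypothetical; the sketch only assumes what the list states, namely that updating needs a reference and that extraction of the i-th element needs a sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical type signature attached to a query during static analysis."""
    type_name: str
    is_reference: bool             # mutability: reference to an object vs. plain value
    collection_kind: str = "bag"   # "bag" or "sequence"


def check_update(target: Signature):
    """Updating requires a reference to an object rather than a value."""
    if target.is_reference:
        return []
    return ["update requires a reference to an object, got a value"]


def check_index(arg: Signature):
    """Extraction of the i-th element is valid for a sequence only."""
    if arg.collection_kind == "sequence":
        return []
    return ["i-th element extraction requires a sequence"]
```

Both functions return a list of error messages, so an empty list means that the operation passes the static check.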

Other type signature attributes are being considered, e.g. type name (for type equivalence based on type names), binary large object (for checking operations on multimedia) and side effects of queries and functions. The typing system also makes several automatic coercions (type changes) and automatic dereferences. For instance, a bag can be coerced to an element of this bag. If necessary, coercions are checked dynamically.

A database schema in ODRA is a specification of object types, classes and declarations that supports the majority of elements known from UML.

A more detailed specification of the ODRA types, classes and schemata will be given in the next chapters of the document.

 

 

Last modified: June 17, 2008

 

 



[1] It is also possible to execute the system in the special “unsafe”, un-optimized mode with compile-time query analysis switched off and all the control moved to the runtime environment.

[2] See SQL, where stored collections (tables) are unordered sets, but collections returned by an SQL query can be sets (application of the distinct operator), bags (in a typical case) and sequences (application of the order by operator).