©Copyright by Kazimierz Subieta.

AS0 Store Model: Complex Objects and Pointer Links

by Kazimierz Subieta (January 2006)



Each object is represented as a triple with three components: an internal object identifier, an external object name and an object value, which can be an atomic value, an object identifier (a pointer value) or a set of objects. More formally:

·         A triple <i, n, v> is an object that we call an atomic object. The object has the internal identifier i, the external name n and the atomic value v.

·         A triple <i1, n, i2> is an object that we call a pointer object. The object has the internal identifier i1, the external name n and the value i2, an internal identifier which is understood as a pointer (or reference) to another object.

·         A triple <i1, n, T>, where T is a set of objects, is an object that we call a complex object. Note that this rule is recursive, thus it allows one to create complex objects with an unlimited number of sub-objects and many hierarchy levels.

In the model AS0 an object store is defined as a pair <S, R>,

where S is a set of objects and R is a set of root (start) identifiers.

The set R determines entry points to the object store, that is, identifiers of objects that can serve as starting points for searching and navigation by a query language. Usually this concerns objects on the top hierarchy level, i.e. objects that are not nested in other objects. However, this is not a requirement. In some cases, e.g. in an object store having modules or extents, the modules or extents form the top level of the object hierarchy, but root identifiers may include objects that are stored within the modules or extents.

The object store is also the subject of some obvious constraints:

·         Each object and sub-object in an object store has a unique identifier.

·         If the store contains a pointer object <i1, n, i2> then there is an object in the store with the identifier i2.

·         If i ∈ R, then there is an object in the store with the identifier i.

·         Each object in the store should be reachable from the objects whose identifiers belong to R; that is, an object should be a sub-object (perhaps recursively) of an object having an identifier in R, or should be accessible through pointer objects. Objects that are not accessible from R are considered non-existent (they have to be removed from the store as garbage).

An example of a complex object having three sub-objects is presented below:

<i5, Emp, {<i6, Name, “Doe”>, <i7, Sal, 2000>, <i8, worksIn, i22>}>

The objects named Name and Sal are atomic objects; the object named worksIn is a pointer object. Each object has a unique identifier.
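
The definition of the store and its constraints can be sketched in Python as follows. This is only an illustration under our own assumptions: the names Obj, Ref, all_objects and check_store are hypothetical, the set of sub-objects is held in a Python list, and the Dept object i22 (the target of worksIn) is added just to make the example store closed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ref:
    """A pointer value: the identifier of another object."""
    target: str


@dataclass
class Obj:
    """An AS0 triple <oid, name, value>."""
    oid: str
    name: str
    value: Union[int, float, str, Ref, list]  # atomic | pointer | sub-objects


def all_objects(objs):
    """Yield every object and, recursively, every sub-object."""
    for o in objs:
        yield o
        if isinstance(o.value, list):
            yield from all_objects(o.value)


def check_store(S, R):
    """Return a list of violated constraints for the store <S, R>."""
    errors, index = [], {}
    for o in all_objects(S):
        if o.oid in index:
            errors.append(f"duplicate identifier {o.oid}")
        index[o.oid] = o
    for o in index.values():
        if isinstance(o.value, Ref) and o.value.target not in index:
            errors.append(f"dangling pointer {o.oid} -> {o.value.target}")
    for r in R:
        if r not in index:
            errors.append(f"root {r} is not in the store")
    # Reachability from R through nesting and through pointer objects.
    reached, todo = set(), [r for r in R if r in index]
    while todo:
        oid = todo.pop()
        if oid in reached:
            continue
        reached.add(oid)
        value = index[oid].value
        if isinstance(value, Ref):
            if value.target in index:
                todo.append(value.target)
        elif isinstance(value, list):
            todo.extend(sub.oid for sub in value)
    errors.extend(f"unreachable object {oid}" for oid in index if oid not in reached)
    return errors


emp = Obj("i5", "Emp", [Obj("i6", "Name", "Doe"),
                        Obj("i7", "Sal", 2000),
                        Obj("i8", "worksIn", Ref("i22"))])
dept = Obj("i22", "Dept", [Obj("i23", "dname", "Ads")])  # assumed pointer target

print(check_store([emp, dept], {"i5", "i22"}))  # []
print(check_store([emp], {"i5"}))               # ['dangling pointer i8 -> i22']
```

Note that in this sketch the Dept object stays reachable even when only i5 is a root, because the pointer object worksIn leads to it.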

In Figure 2 we present a database schema (which is an informal notion in AS0) and one of the possible database states according to the schema. Object specifications presented in the schema are associated with cardinalities, i.e. minimal and maximal number of occurrences in an actual database. Unlimited maximal cardinality is presented by *. The cardinalities [1..1] are dropped. In Figure 3 we present the same database state in a graphical notation, where objects are boxes with round corners, sub-object boxes are depicted within their parent object boxes, pointer objects are represented by arrows and root identifiers are presented within circles.

Fig. 2. The AS0 model - a small database

 

Fig. 3. The AS0 model - a graphical representation of a small database

The notation and representation of a database presented above is a conceptual (but algorithmically precise) view that is addressed to imagination of designers of query language interpreters (and perhaps, application programmers). It is not our intention to organize physical databases in this way. The favorite critical argument of some professionals concerning inefficiency of the organization is irrelevant for the goals that we follow. We will try to show that every known database can be conceptually perceived as a particular case of AS0-AS3.

The AS0 model does not deal with types. Many database models start from defining types, but we are against such an approach. The notion of type is quite sophisticated and must be preceded by a lot of explanations and definitions. We consider the issues related to types very important, but we are against a simplistic approach (presented, in particular, in the ODMG standard) where types are introduced at the beginning, but chaotically and inconsistently. We plan to prepare all the notions necessary to introduce types, hence we shift the explanation of types and strong type checking to the end of our discussion concerning SBA. The basic semantics of query languages can be explained without introducing types. Types, however, are necessary for the majority of query optimization methods. Types are also important for resolving some anomalies during processing of semi-structured data.

As we have said before, we are not interested in the syntax and construction of object identifiers. However, we strongly rely on the total internal object identification principle, because a fundamental concept of the semantics of query languages defined through SBA is a reference, i.e. an object identifier used as an internal object name in some place of an application program. The construction of object identifiers can be different. For instance, for an object with a fixed format an identifier of an attribute A can be a pair <OID, offset_A>, where OID is an identifier of a root object, and offset_A is the number of bytes preceding the representation of A in the object. Similarly, we can assume that an identifier of an attribute is a pair <OID, attribute_name>. However, the construction of identifiers may matter for the performance and pragmatic properties of a query/programming language. For instance, if one assumes that an object identifier includes the identifier of the object class, then changing the class of an object without changing its identifier becomes impossible. Identifiers that are too long may also compromise performance.


Programming Variables and the Difference between Volatile and Persistent Data

In all store models we make no distinction between programming languages’ variables and objects. For instance, a variable x defined in some programming language and having the value 5 will be represented in our convention as the triple <i, x, 5>, where i is an internal identifier of the variable, for instance, its address in the main memory. Similarly, we do not distinguish variables from objects on the principle: “unlike a variable, an object must be a member of a class”. Such a distinction has no conceptual significance for us; it is based on the traditional terminology of particular programming languages rather than on different qualities. For instance, assuming that x is of the type integer, we can conclude that x is also a member of a class, but the class is reduced (in this case) to the type. (In general, however, we will distinguish the type and class concepts; we discuss this issue elsewhere.) We can also assume that all variables/objects have their classes, but for some of them the classes are empty, hence are omitted by default.

The AS0-AS3 models also do not deal with the persistence of objects. We assume the orthogonal persistence principle, i.e. the persistence property is orthogonal to all other properties of object-oriented models and is not taken into account in the specification of semantics of query/programming languages. In contrast to SQL, OQL, XQuery and other database languages, in our approach the persistence property has practically no meaning for the definition of the semantics of query languages.

Nowadays more and more databases are stored within main memory (and magnetic media are used as back-up devices only), thus one may ask about the difference between persistent and volatile data. Indeed, the difference concerns not the kind of media on which the data is stored, but the mode of use. Data can be a property of one user (client) session and disappear when the session is terminated; such data is volatile. Persistent data is shared among many users and in typical terminology is a property of a server rather than a client. Such data may exist even if there is no open session in the system.

Looking at a query language from this side, we can conclude that a query (query execution) is a property of a client session rather than of the server (with the exception of some internal server processes). Hence from this point of view volatile data of the session have the same standing as persistent server data. In conclusion, the semantics of query languages should not differentiate the access to the two kinds of data.

Of course, there should be special constructs and constraints in the languages to deal with the persistence property. For instance, there can be a programming statement which makes volatile objects persistent (i.e. causes storing them on the server) or some constraint saying that no sub-object of a volatile object can be persistent. Nevertheless, we consider these properties and constraints secondary issues for the definition of semantics.


Object Relativism

In all store models we make no distinction between objects stored on different object hierarchy levels. The subdivision of objects into simple and complex will also be secondary. We do not introduce special terminology for objects on different object hierarchy levels and objects of different complexity (such as “variable”, “object”, “composite object”, “attribute”, “sub-attribute”, “repeating attribute”, “complex attribute”, “structure”, “tuple”, “record”, “collection”, “extension”, etc.). All these notions have direct counterparts in our store models. Such distinctions may have some meaning for business-oriented external data models, but they are inessential for the definition of query languages’ semantics. Similarly to Smalltalk, an object, if it is complex, consists of objects; other terms are not necessary. Sometimes we use them to make things clearer, for instance, we sometimes use “attribute” to denote a direct sub-object of some object, but such terminology has no meaning for our formal semantics.

The object relativism has a principal meaning for simplification of the proposed languages, much simplifies a database metamodel and operations on the metamodel, increases the universality of the language and makes the syntax, semantics and pragmatics of the language more clear. These advantages in turn have a positive impact on implementation effort, development, implementation of query optimization methods and generality of the methods. Simplification of the pragmatics results in much shorter documentations and user manuals.

Unfortunately, many proposals of object-oriented standards, languages and systems do not follow object relativism. For instance, in the ODMG standard an object attribute is “literal”, which is not an object. In many other proposals all objects must be complex, i.e. there is no possibility to consider atomic objects having no attributes. As shown above, we do not follow such crippled conceptions and constraints, because we believe they are not reasonable and lead to disadvantages concerning all the technical aspects around some object-oriented idea or product.

In this idea a module is simply an object storing other objects. One can assume some additional properties and functions of modules, e.g. they can be considered conceptual units of a database, units of reuse, units of compilation, units of exchange and substitution, units of encapsulation, but all such properties we consider orthogonal to the defined semantics of query languages.


Collections and Structures

In the AS0-AS3 models we assume no uniqueness of external names on any level of the object hierarchy. For instance, in Figures 2 and 3 the name Emp is assigned to three objects and the name Dept to two objects; within the Trade Dept object the name location is assigned to two atomic sub-objects, and within the Ads Dept object the name employs is assigned to two pointer sub-objects. This is the way in which we deal with collections. Similar assumptions are taken by XML. In this way we unify several concepts related to collections, such as sets, bags, extents and repeating attributes. We also abstract from the concepts of structure, record and tuple, as known e.g. from C/C++, Pascal and relational systems. For the goal of building the formal semantics of query languages such notions are secondary and can be expressed in the terms of the AS0 store model as complex objects too.

In all the store models the collection concept does not occur as a single entity having its own identity.

The apparently innocent notion of collection, as introduced in many object-oriented proposals, e.g. the ODMG standard, is inconsistent with fundamental principles of object-orientedness, such as the substitutability principle and the open-close principle. There are several signs of weakness of the collection concept. For instance, in the ODMG standard one can specify in the schema five kinds of collections, in particular, sets and bags. The same standard ensures that each object has its own identity, hence there are no two identical objects. So the question is: how can we create a bag of objects in the store? This is nonsense. Careful analysis of the standard and attempts to make all the concepts semantically clear and consistent have led us to the conclusion that the collection concept, as a database entity having its own identity, is inconsistent and unnecessary. All that we need is the possibility to introduce many objects having the same name, just like in XML.

Obviously, in the AS0 model one can create an object named Employees and then insert into this object 10 000 objects named Emp. In this way we obtain the desired effect, i.e. we have created a collection having its own identity, but without introducing the collection concept explicitly. As follows from the definition of AS0, such an object can have a value being an empty set, i.e. in this way we represent an empty collection. During the presentation of the model AS2 we discuss more precisely some pitfalls connected with the collection concept and the methods of avoiding them.
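
A minimal sketch of this idea, under our own assumptions: objects are written as plain Python tuples (identifier, name, value), sub-objects are kept in a list, and the helper by_name and all identifiers are hypothetical. A "collection" is nothing more than the objects that share a name within some environment.

```python
def by_name(objects, name):
    """Return all objects in `objects` whose external name equals `name`."""
    return [o for o in objects if o[1] == name]


# An object named Employees that stores several objects named Emp.
employees = ("i1", "Employees", [
    ("i2", "Emp", [("i3", "Name", "Doe")]),
    ("i4", "Emp", [("i5", "Name", "Poe")]),
    ("i6", "Emp", [("i7", "Name", "Lee")]),
])

# The same kind of object with an empty set as its value: an empty collection.
no_employees = ("i8", "Employees", [])

emps = by_name(employees[2], "Emp")
print(len(emps))                         # 3
print(by_name(no_employees[2], "Emp"))   # []
```

The Employees object has its own identifier (i1 here), so the collection can be referred to as a whole, although no collection concept appears in the store model.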

The collection concept, however, we will introduce in two other contexts: as a feature of the set Result, presenting all results returned by a query, and for a database schema language, as a type constructor. We’ll return to these notions later.

AS0 deals only with collections that are sets. Another interesting collection kind is the sequence, and this kind is specific for XML. By assuming that a complex object, as a value, has a sequence of objects rather than a set of objects we obtain another store model AS0seq that is close to XML. It is also possible to create the model AS0set&seq, where complex objects can be qualified by special flags saying whether the objects store sets or sequences. However, we have concluded that such a modification (although perhaps important for practice) adds no essential quality to the semantics of query languages that we have to define. Thus at this stage of our explanation we do not burden it with such secondary issues.


Links Between Objects

Pointer objects assumed in AS0-AS3 are introduced to cover links between objects. In Fig. 2 and Fig. 3 each pointer object worksIn leads to the corresponding object Dept, and each pointer object employs leads to an object Emp. So far we take no care about the fact that such a structure is redundant, because at this stage we are interested in the conceptual picture of the database state rather than in the possible redundancy that we have involved. The redundancy is justified by the freedom in navigation from objects Dept to Emp and vice versa.

Pointer objects can be considered as an abstract implementation of the feature that in UML is known as association and in the entity-relationship model (ERM), in the OMG CORBA standard and in the ODMG standard is named relationship. (We will use relationship.) In ERM, UML and CORBA n-ary relationships are possible, i.e. a relationship that joins two, three or more classes (entities) and can be decorated by attributes or classes. Similarly to ODMG we have assumed in AS0-AS3 that associations can be binary only, with no decorations. We met severe difficulties in defining a store model that would introduce relationships of arbitrary arities and would be consistent and universal concerning all aspects of query/programming languages. The universality should concern not only retrieval (which is a relatively easy problem) but also all updating operations, in particular, switching an end of an association (a role) to another object. We do not want to present here all the considerations that have led us to the conclusion. Eventually we have concluded that if an n-ary association has to be updated, it must have an identifier. Moreover, because it can be decorated by attributes, it must have methods to serve them. Hence n-ary associations should have identifiers, similarly to objects, and can be served by methods, similarly to objects too. Does it make sense to introduce two different, but very similar data structures, i.e. objects and relationships, on the level of the store model? We have concluded that it does not, hence n-ary associations, n > 2, possibly with attributes, we consider equivalent to objects. In effect, we obtain objects connected by binary, non-decorated relationships, which is exactly what is materialized in AS0 in the form of pointer objects.

Note that the only known artifact where n-ary decorated relationships are proposed on the level of data structures (with algorithmic precision) is the CORBA Relationship Service. This proposal is extremely clumsy and it just convinced us that the idea makes little sense. Because each n-ary and decorated relationship can be easily substituted by one more class and n binary non-decorated relationships, and because of the unclear options for updating such n-ary relationships, we reject the idea. Our final conclusion is that

n-ary relationships, where n > 2, possibly decorated by attributes, present an unacceptable conceptual knot for the programmers.
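
The substitution mentioned above (one more object plus n binary, non-decorated links) can be sketched as follows. Everything here is an assumption made for illustration: the functions reify and switch_role, the tuple encoding of objects, the ("ref", identifier) encoding of pointer values, and the Supply example with its identifiers.

```python
from itertools import count

_ids = count(100)  # hypothetical generator of fresh identifiers


def new_id():
    return f"i{next(_ids)}"


def reify(name, roles, attributes):
    """Represent an n-ary decorated relationship as one complex object.

    roles:      role name -> identifier of the participating object
    attributes: attribute name -> atomic value (the "decorations")
    """
    subs = [(new_id(), role, ("ref", target)) for role, target in roles.items()]
    subs += [(new_id(), attr, value) for attr, value in attributes.items()]
    return (new_id(), name, subs)


def switch_role(rel, role, new_target):
    """Redirect one end of the relationship; all identifiers are preserved."""
    oid, name, subs = rel
    return (oid, name, [(sid, n, ("ref", new_target)) if n == role else (sid, n, v)
                        for sid, n, v in subs])


# A ternary relationship with one attribute, stored as an ordinary object.
supply = reify("Supply",
               {"supplier": "i1", "part": "i2", "project": "i3"},
               {"qty": 40})
moved = switch_role(supply, "project", "i9")

print([n for _, n, _ in supply[2]])  # ['supplier', 'part', 'project', 'qty']
print(moved[2][2][2])                # ('ref', 'i9')
```

Because the relationship is an ordinary object, updating one of its ends is just an update of one pointer sub-object; no separate machinery for relationships is needed.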


Null Values, Variants, Semi-structured Data and Types

The AS0-AS3 models deal formally with the concept of null values and unions (variants). There is no requirement that objects having the same name should possess the same structure. The structure will be later constrained by types, but the type system that we have developed (and implemented) is not as restrictive as e.g. the Pascal or Java typing systems and allows for a lot of irregularities in data structures. In Fig. 2 and Fig. 3 the sub-object address is optional, hence it may or may not occur in a particular Emp object. This is the way in which we are dealing with null values.

In contrast to the relational model we do not introduce the null value concept explicitly. As argued by Date and Subieta, a special null value introduced explicitly works as a devil which is able to spoil clarity and consistency of almost all language constructs. Null values in SQL are frequently given as an example of schizophrenic inconsistency and chaotic design. A thorough discussion of the issue can be found in [Subi96a] (postscript).

Our method of dealing with null values based on optional data does not lead to any inconsistency. Note that in this way we can deal with optional data on any level of object granularity. For instance, in Fig.2 and Fig.3 an optional sub-object address is a complex object.

In a similar way we can deal with unions or variants known from C/C++ and Pascal. Pascal introduces an additional notion of discriminator of a variant, i.e. a special value which allows during run time to recognize the actual variant and to prevent (through dynamic type checking) its illegal use. In our case such a discriminator is not obligatory, but can be introduced as an ordinary sub-object. Again, one can design a special syntax of types which would make it possible to inform the type checker that a particular object has variants and that some sub-object is the discriminator of the variants. This makes it possible to shift proper type checking to run time.

As shown in Fig.2 (left), on the type level irregularities in data are covered by cardinality constraints. For instance, address is constrained by cardinalities [0..1] and employs is constrained by cardinalities [1..*]. The cardinalities introduce the necessary discipline to the concept that is commonly referred to as semi-structured data. XML is frequently associated with this concept. Without such a discipline programming of large semi-structured data would be very difficult (or impossible). Eventually, the programmer must be aware what the database contains and how it is organized. This awareness must be supported on the level of algorithmic precision. The typing system involving cardinalities fulfills this goal. It will be introduced later.
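
How cardinalities discipline irregular data can be sketched with a small checker. This is not the typing system announced above, only an illustration: the function check_cardinalities and the schema dictionaries are hypothetical, and apart from address [0..1] and employs [1..*] the cardinalities are our assumptions (a dropped cardinality is read as [1..1]).

```python
from collections import Counter

# name of sub-object -> (minimum, maximum); None stands for the unlimited "*".
EMP_SCHEMA = {"name": (1, 1), "address": (0, 1)}
DEPT_SCHEMA = {"dname": (1, 1), "employs": (1, None)}


def check_cardinalities(sub_names, schema):
    """Compare the names of the direct sub-objects of one object with a schema."""
    counts = Counter(sub_names)
    problems = []
    for name, (lo, hi) in schema.items():
        n = counts.get(name, 0)
        if n < lo or (hi is not None and n > hi):
            upper = "*" if hi is None else hi
            problems.append(f"{name}: {n} occurrence(s), expected [{lo}..{upper}]")
    for name in counts:
        if name not in schema:
            problems.append(f"{name}: not declared in the schema")
    return problems


print(check_cardinalities(["name"], EMP_SCHEMA))             # [] (address absent)
print(check_cardinalities(["name", "address"], EMP_SCHEMA))  # []
print(check_cardinalities(["dname"], DEPT_SCHEMA))
# ['employs: 0 occurrence(s), expected [1..*]']
```

An Emp object without address passes the check, which corresponds to treating a null value as an absent optional sub-object.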

The problem of specification, representation and processing of semi-structured data will be presented much later, after introducing classes, types and after defining all the necessary constructs of SBQL.


Relational Model and Nested Relational Model

The AS0 model covers the relational data structures as a particular case. In Fig. 4 we present a relational schema, a relational table and the AS0 representation of the table.

Fig. 4. A relational database represented in the AS0 store model
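
The mapping of a relational table to AS0 objects can be sketched as below. The sketch rests on our own assumptions: each tuple becomes a complex object named after the table, each attribute value becomes an atomic sub-object named after the column, and the identifiers of the tuple objects become root identifiers. The function table_to_as0 and the sample table are hypothetical and are not meant to reproduce Fig. 4.

```python
from itertools import count


def table_to_as0(table_name, columns, rows, first_id=1):
    """Convert a relational table into an AS0 store <S, R>."""
    ids = count(first_id)
    S = []
    for row in rows:
        oid = f"i{next(ids)}"  # identifier of the tuple object
        fields = [(f"i{next(ids)}", col, val) for col, val in zip(columns, row)]
        S.append((oid, table_name, fields))
    R = [obj[0] for obj in S]  # every tuple object is an entry point
    return S, R


S, R = table_to_as0("Emp", ["name", "sal"], [("Doe", 2000), ("Poe", 1500)])
for obj in S:
    print(obj)
# ('i1', 'Emp', [('i2', 'name', 'Doe'), ('i3', 'sal', 2000)])
# ('i4', 'Emp', [('i5', 'name', 'Poe'), ('i6', 'sal', 1500)])
print(R)  # ['i1', 'i4']
```

Column names, which are schema-level information in the relational model, become ordinary object names here and are thus available to queries on the same level as values.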

As we will show later, for such a data model SBQL queries are similar to SQL, except for minor syntactic differences. Thus we can claim that SBA also supports the formal semantics of SQL. However, we do not strive to give the formal semantics of all the constructs of SQL. SQL is a very irregular language, with a lot of anomalies, special cases, semantic reefs and its own peculiarities. Defining its formal semantics without making some order within its design makes little sense. On the other hand, SQL is a closed language, surrounded by a lot of documents, implementations and its own user culture, thus any discussion on changing its features is at least 20 years too late. The SQL-99 and SQL 2003 standards, which much extend the kinds of data structures that are addressed by the language and introduce a lot of constructs specific to programming languages, also do not present artifacts for which we would consider it attractive to propose changes or to contribute in any way. In our opinion, these standards will play the role of monuments showing (for future computer professionals) that chaotic design done by large committees, but not supported by any essential theory, leads to useless artifacts.

Note that AS0 is richer than the relational model. The so-called natural join is not expressible in the pure relational model, because it is based on names of attributes, which are second-class citizens in the model, i.e. they are on the level of its informal meta-language. Such a definition presents no problem in AS0, because external names of objects are at the same semantic level as values. Similarly, because the pure relational model is value-oriented, i.e. it does not involve identifiers of relations, tuples and values of tuples, expressing updating operations is impossible or at least requires additional notions, which would be outside the relational theory. AS0 involves no such problems because of the assumed principle of total internal identification.

The AS0 model also covers the idea of nested relations (commonly referred to as Non-First-Normal-Form, NF2) as a particular case. AS0 makes no limitations concerning the complexity of objects and the number of object hierarchy levels. The idea of how to represent an NF2 database in AS0 is the same as shown in Fig. 4. Also some models known as functional or semantic (e.g. data structures implied by the entity-relationship model) are covered by AS0.


XML Data Model

XML is a syntactic convention that makes it possible to unify some protocols of data exchange or to parameterize some system in some unified way. However, all the noise around XML as a new database model (or format) is in our opinion exaggeration. XML representation of data is legible for humans when the size of XML files is reasonable (say, not larger than 100 KB). For large databases storing megabytes or gigabytes of information XML representation is nonsense. For several important reasons (security, buffering, transaction processing, indexing, query optimization, etc.) such databases should be organized according to the databases state-of-the-art, as relational, object-relational, object-oriented or a database following another paradigm.

For this reason special query languages addressing XML files have very narrow meaning. For large databases XML should not be considered as internal data representation paradigm, hence XML query languages are inapplicable. Obviously, any database application can be equipped with wrappers that accept XML files and convert the data from the files to the assumed database format. Similarly, any database application can be equipped with XML generators that convert data stored in the database to an XML file. Such wrappers and generators present no conceptual or implementation problem. Assuming this simple idea, no XML will be present inside the database, hence no XML-oriented query language is necessary.

We also note that some special requirements concerning XML query languages do not always look reasonable, for instance, the requirement that an XML query should have the XML syntax. Translating this requirement to SQL, one can claim that SQL queries should be written within relational tables, which is possible, but obviously idiotic.

Taking into account the above cautions and the limited role of XML query languages, we show that the XML data model can also be considered as a particular case of AS0. We also show some features of XML that are problematic for AS0 and therefore are problematic for query languages addressing XML. In Fig.5 we present a simple XML file and its counterpart written as an object in the AS0 store model.

 

<Dept>
    <dname> Trade </dname>
    <location> Paris </location>
    <location> London </location>
    <employs> Doe </employs>
</Dept>
<Dept>
    <dname> Ads </dname>
    <location> Rome </location>
    <employs> Poe </employs>
    <employs> Lee </employs>
</Dept>

S – Objects

<i17, Dept, { <i18, dname, “Trade”>,
              <i19, location, “Paris”>,
              <i20, location, “London”>,
              <i21, employs, “Doe”> }>,

<i22, Dept, { <i23, dname, “Ads”>,
              <i24, location, “Rome”>,
              <i25, employs, “Poe”>,
              <i26, employs, “Lee”> }>

R – Start identifiers

i17, i22

Fig.5. An XML file and its AS0 representation
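
The correspondence shown in Fig. 5 can be reproduced mechanically. The following sketch uses the standard Python module xml.etree.ElementTree; the function xml_to_as0 and the wrapping element db are our assumptions (the wrapper is needed only because a well-formed XML document must have a single root element). The sketch ignores XML attributes and mixed content, and it keeps sub-objects in a list.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_as0(elem, ids):
    """Convert an XML element into a triple (identifier, name, value)."""
    oid = f"i{next(ids)}"  # the parent is numbered before its children
    children = list(elem)
    if children:
        return (oid, elem.tag, [xml_to_as0(c, ids) for c in children])
    return (oid, elem.tag, (elem.text or "").strip())


doc = """<db>
<Dept><dname> Trade </dname><location> Paris </location>
<location> London </location><employs> Doe </employs></Dept>
<Dept><dname> Ads </dname><location> Rome </location>
<employs> Poe </employs><employs> Lee </employs></Dept>
</db>"""

ids = count(17)  # start numbering at i17, as in Fig. 5
S = [xml_to_as0(dept, ids) for dept in ET.fromstring(doc)]
R = [obj[0] for obj in S]

print(S[0])
# ('i17', 'Dept', [('i18', 'dname', 'Trade'), ('i19', 'location', 'Paris'),
#                  ('i20', 'location', 'London'), ('i21', 'employs', 'Doe')])
print(R)  # ['i17', 'i22']
```

Note that in this plain conversion employs holds the string found in the file, as in Fig. 5, and not a pointer to an Emp object.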

Some XML features are not covered by AS0 and thus (as a rule) present problems for a clean and consistent definition of an XML query language.

A basic disadvantage of XML as a target of a query language is that it does not follow the total identification principle. XML objects (logical parts of an XML file) have no unique internal identifiers. Because sooner or later programs have to refer to such objects, some identification method is a must. This is done by the XPath language, which uses path expressions consisting of tag names. However, because XML objects may have names that are not unique at the same level of the object hierarchy, XPath involves some tricks, for instance, determining the order number of a required object (e.g. the 5th object) or special predicates. While such methods are acceptable for relatively short XML files, they could be difficult for large parsed XML files stored in a structural database. Two potential disadvantages concern poor performance and updating anomalies (e.g. the order number n of some object A will change if another object A having an order number m < n is deleted).
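
The updating anomaly can be observed with the limited XPath support of the standard Python module xml.etree.ElementTree, which accepts positional predicates such as employs[2]. The sample data (including the third employee) is our assumption.

```python
import xml.etree.ElementTree as ET

dept = ET.fromstring(
    "<Dept><employs>Poe</employs><employs>Lee</employs><employs>Kim</employs></Dept>"
)

before = dept.find("employs[2]").text   # positions are counted from 1
dept.remove(dept.find("employs[1]"))    # delete an object with a smaller order number
after = dept.find("employs[2]").text    # the same path now denotes another object

print(before, after)  # Lee Kim
```

A program that remembered the path employs[2] as a reference to Lee would silently address Kim after the deletion; an internal identifier, as assumed in AS0, does not change in such a case.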

The order of XML objects may bear information, hence in this aspect the XML data model and its AS0 representation are different. As we said before, there is no problem in creating a variant of the AS0 model with complex objects storing sequences rather than sets. Alternatively, designers of XML representations may be discouraged from relying on the order of XML objects as an information-bearing feature. Note that ordering of objects generally decreases the query optimization potential, because some query optimization methods, e.g. indices, can lose the ordering.

XML tags may include so-called attributes, whose intention is to represent meta-information rather than information. The subdivision, however, is volatile and arbitrary, thus it is indeed difficult to realize what the feature is for. In our opinion, the feature is redundant and has additional disadvantages (e.g. no DTD support), thus should be avoided. The only reasonable method is to convert attributes to some regular XML representation (and vice versa). This requires no special extensions to a query language. One such method is illustrated in Fig. 6.

<emp version="1.2" date="2006.01.10">
    <name> Doe </name>
    ...
</emp>

<emp>
    <@version> 1.2 </@version>
    <@date> 2006.01.10 </@date>
    <name> Doe </name>
    ...
</emp>

Fig. 6. XML file with attributes and its equivalent without attributes
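
The conversion of Fig. 6 can be sketched at the level of the store rather than at the level of XML text (a name such as @version is not a legal XML tag name, but nothing prevents it from being an object name). The function attrs_to_subobjects is hypothetical; for brevity it omits object identifiers and returns pairs (name, value), and it drops the text of an element that has attributes but no child elements.

```python
import xml.etree.ElementTree as ET


def attrs_to_subobjects(elem):
    """Turn XML attributes into ordinary sub-objects whose names start with '@'."""
    subs = [("@" + key, value) for key, value in elem.attrib.items()]
    subs += [attrs_to_subobjects(child) for child in elem]
    if subs:
        return (elem.tag, subs)
    return (elem.tag, (elem.text or "").strip())


emp = ET.fromstring('<emp version="1.2" date="2006.01.10"><name> Doe </name></emp>')
print(attrs_to_subobjects(emp))
# ('emp', [('@version', '1.2'), ('@date', '2006.01.10'), ('name', 'Doe')])
```

After such a conversion a query language needs no special construct for attributes: @version is reached in the same way as name.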

XML also has a quite strange feature allowing for mixing atomic and complex objects: an XML object can be inserted within a string being a value of an XML object. This feature can be difficult for a query language, hence as before we suggest reducing it to regular XML, for instance, as shown in Fig. 7.

<emp>
    John Doe, born 1973
    <address> Warsaw, Sienna 5 </address>
    His salary is 2500
</emp>

<emp>
    <&info> John Doe, born 1973 </&info>
    <address> Warsaw, Sienna 5 </address>
    <&info> His salary is 2500 </&info>
</emp>

Fig.7. XML file mixing strings and objects and its equivalent with no mix
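
The reduction of Fig. 7 can be sketched in the same spirit. In the standard Python module xml.etree.ElementTree the text before the first child is available as elem.text and the text following a child as child.tail; the function unmix, which wraps such fragments into sub-objects named &info, is our hypothetical illustration and again omits object identifiers.

```python
import xml.etree.ElementTree as ET


def unmix(elem):
    """Wrap text fragments interleaved with child elements as '&info' sub-objects."""
    children = list(elem)
    if not children:
        return (elem.tag, (elem.text or "").strip())
    subs = []
    if elem.text and elem.text.strip():
        subs.append(("&info", elem.text.strip()))
    for child in children:
        subs.append(unmix(child))
        if child.tail and child.tail.strip():
            subs.append(("&info", child.tail.strip()))
    return (elem.tag, subs)


emp = ET.fromstring(
    "<emp>John Doe, born 1973<address> Warsaw, Sienna 5 </address>"
    "His salary is 2500</emp>"
)
print(unmix(emp))
# ('emp', [('&info', 'John Doe, born 1973'), ('address', 'Warsaw, Sienna 5'),
#          ('&info', 'His salary is 2500')])
```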

 


Arrays in AS0

A data structure known as an array can be modeled in AS0 by the assumption that names of objects can be numbers. For instance, an array A[1..5] having the subsequent values “Doe”, “Poe”, “Lee”, “Kim”, “Noe” can be represented as an AS0 complex object:

<i, A, { <i1, 1, “Doe”>, <i2, 2, “Poe”>, <i3, 3, “Lee”>, <i4, 4, “Kim”>, <i5, 5, “Noe”> }>

Under this assumption the access to an array element would be possible through its index being a number; for instance, A.2 identifies the Poe element and A.5 identifies the Noe element. Usually in programming languages an index of an array element can be calculated by some expression. Such a possibility was implemented in Loqis by assuming that the result of an expression that is put in square brackets is interpreted as an object name. For instance, we can write A.[x+1], which for x = 3 returns the Kim element. In Loqis this convention was assumed as general, hence any name (including string names) can be calculated by an expression. This possibility somewhat resembles the feature that is known as “reflection”. Unfortunately, it is incompatible with strong static typing.

Such interpretation of arrays can be associated with a typing system which will assure that for each allowed index there is a corresponding array element. Alternatively, (e.g. for semi-structured data) we can allow for “thin” arrays where some indices have no corresponding elements.

Some proposals, e.g. the ODMG standard, introduce arrays and sequences as different, but a bit similar collection kinds. The idea of arrays, as presented above, is however not quite good for sequences. The difference is that sequences are always “dense”, i.e. removing an element from a sequence causes that indices of all its subsequent elements are reduced by 1; analogously for inserting an element into the sequence. As we can see from the above array example, arrays behave differently: removing an element from an array causes no changes of indices of other elements and inserting a new element into an array can be impossible (we can only insert an element with a known index, if an element with this index is absent). Hence sequences in data structures still require a new store model different from AS0.
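
The difference between the two behaviours can be sketched as follows. The array is simplified to a Python dictionary from element names (the indices) to values, with object identifiers omitted, and the sequence to a Python list; the helper nth, which counts positions from 1, is our assumption.

```python
# Array A[1..5]: element names are the indices.
A = {1: "Doe", 2: "Poe", 3: "Lee", 4: "Kim", 5: "Noe"}

x = 3
second = A[2]           # corresponds to A.2
computed = A[x + 1]     # corresponds to A.[x+1]

del A[2]                # array: the other indices do not change
third_in_array = A[3]   # still the Lee element; index 2 is now absent


def nth(seq, i):
    """Return the i-th element of a sequence, counting from 1."""
    return seq[i - 1]


seq = ["Doe", "Poe", "Lee", "Kim", "Noe"]
del seq[1]                  # sequence: remove the 2nd element
third_in_seq = nth(seq, 3)  # subsequent elements moved down by one

print(second, computed)              # Poe Kim
print(third_in_array, third_in_seq)  # Lee Kim
```

After the deletion the array has a gap at index 2 (a “thin” array), while the sequence remains dense and its third element is a different one than before.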

The above representation is good for any type of an array element. In particular, in this way we can create multi-dimensional arrays. We underline, however, that this view is abstract and concerns only the definition of query languages’ semantics. In implementation arrays can be physically represented in a more compact way.

 

Variants of the AS0 model

During the work on object store models and related topics we have investigated several candidate variants that were eventually rejected as introducing limitations or the necessity of further (less intuitive) definitional elements.

Variant 1: (from [Subi85]):

A database content is defined as a set of triples of the form <i1, n, i2> and <i, n, v>. A database is defined as previously, as a pair <S, R>, where S is a set of the above triples and R is the set of starting (root) identifiers. Complex objects are represented as collections of triples, e.g. a complex object named n having attributes a1, …, ak, pointer links p1, …, pt and identifier i can be represented as:

<i, n, ia1>, …, <i, n, iak>, <i, n, ip1>, …, <i, n, ipt>, <ia1, a1, va1>, …, <iak, ak, vak>, <ia1, p1, ip1>, …, <iak, pk, ipk>

The main problem with this model is that it does not determine the boundary of a complex object, and thus causes problems (and the necessity of additional definitions) for updating operations, strong typing, parameter passing and other features that need encapsulated complex objects as primitive data manipulation units.

Variant 2: (influenced by Java):

A database content is defined as a set of triples <i, n, {i1, i2, …, ik}> (representing complex objects with references to their sub-objects), <i, n, v> (representing atomic objects or attributes/sub-attributes) and <i1, n, i2> (representing reference objects). A database is defined as previously, as a pair <S, R>. For instance, a complex object described previously can be represented in this formalism as:

<i, n, {ia1, …, iak, ip1, …, ipt}>, <ia1, a1, va1>, …, <iak, ak, vak>, <ia1, p1, ip1>, …, <iak, pk, ipk>

The model makes it possible to express complex objects, but only with one object hierarchy level. To represent objects with more hierarchy levels some additional assumptions or additional elements of the formalism are required. As previously, there are doubts concerning object boundaries. In general, concerning updating of complex objects, this variant inherits disadvantages of the previous one.

Some modifications of the above variants were also investigated. As a result we have chosen AS0 as the simplest and most homogeneous model, which does not require too many explanations or formal elements. So far none of its variants has introduced a new quality to formal definitions; they have only resulted in limitations or the necessity of defining new elements (such as flags representing boundaries of complex objects). Such add-ons are awkward considering definitions of query/programming languages’ features such as updating, strong typing and parameter passing. We have also investigated store models where (from the very beginning) objects are associated with types; thus, for instance, object names are properties of types rather than properties of objects. We abandoned this idea for two reasons. First, it is possible that our data model and a query language would be untyped (see, for instance, XQuery). Second, types present a complex issue; thus introducing them too early in the definition, without understanding them in their full complexity, might introduce limitations or inconsistencies.


Last modified: January 18, 2008