AS0 Store Model: Complex Objects and Pointer Links
by Kazimierz Subieta (January 2006)
Object Store Models
Each object is represented as a triple: an internal object identifier, an external object name and an object value, which can be an atomic value, an object identifier (a pointer value) or a set of objects. More formally:
· A triple <i, n, v> is an object that we call an atomic object. The object has the internal identifier i, the external name n and the atomic value v.
· A triple <i1, n, i2> is an object that we call a pointer object. The object has the internal identifier i1, the external name n and the internal identifier i2, which is understood as a pointer (or reference) to another object.
· A triple <i1, n, T>, where T is a set of objects, is an object that we call a complex object.
Note that this rule is recursive and thus allows one to create complex objects with an unlimited number of sub-objects and many hierarchy levels.
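The three kinds of triples can be sketched in Python as follows. This is only an illustration: the names Obj, Ref, kind and depth are assumptions made for this sketch and are not part of AS0, and a Python list stands in for the set T of sub-objects. The sample object is the Emp object used later in the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ref:
    """A pointer value: the identifier of another object."""
    target: str


@dataclass
class Obj:
    """An AS0 triple <i, n, v>."""
    oid: str    # internal identifier i
    name: str   # external name n
    value: Union[int, float, str, Ref, list]  # atom, pointer or sub-objects


def kind(obj: Obj) -> str:
    """Classify an object by the shape of its value."""
    if isinstance(obj.value, Ref):
        return "pointer"
    if isinstance(obj.value, list):
        return "complex"
    return "atomic"


def depth(obj: Obj) -> int:
    """Number of hierarchy levels, counting obj itself (the recursive rule)."""
    if kind(obj) != "complex":
        return 1
    return 1 + max((depth(sub) for sub in obj.value), default=0)


emp = Obj("i5", "Emp", [
    Obj("i6", "Name", "Doe"),
    Obj("i7", "Sal", 2000),
    Obj("i8", "worksIn", Ref("i22")),
])
print([kind(sub) for sub in emp.value])  # ['atomic', 'atomic', 'pointer']
print(kind(emp), depth(emp))             # complex 2
```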
In the AS0 model an object store is defined as a pair <S, R>, where S is a set of objects and R is a set of root (start) identifiers.
The set R determines entry points to the object store, that is, identifiers of objects that can be starting points for searching and navigation by a query language. Usually these are objects on the top hierarchy level, i.e. objects that are not nested in other objects. However, this is not a requirement. In some cases, e.g. in an object store with modules or extents, the modules or extents form the top level of the object hierarchy, but root identifiers may include objects that are stored within the modules or extents.
The object store is also subject to some obvious constraints:
· Each object and sub-object in an object store has a unique identifier.
· If the store contains a pointer object <i1, n, i2>, then there is an object in the store with the identifier i2.
· If i ∈ R, then there is an object in the store with the identifier i.
· Each object in the store should be reachable from objects whose identifiers belong to R; that is, an object should be a sub-object (perhaps recursively) of an object having an identifier in R, or should be accessible through pointer objects. Objects that are not accessible from R do not exist (they have to be removed from the store as garbage).
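The four constraints can be checked mechanically. The sketch below is a minimal illustration, assuming that objects are written as nested Python tuples (identifier, name, value), that a list stands for a set of sub-objects and that a hypothetical Ref class marks pointer values; check_store is a name invented here, not an SBA operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    target: str  # identifier of the object pointed to


def walk(obj):
    """Yield obj and, recursively, all of its sub-objects."""
    yield obj
    if isinstance(obj[2], list):
        for sub in obj[2]:
            yield from walk(sub)


def check_store(S, R):
    """Return a list of messages, one per violated constraint of <S, R>."""
    errors, index = [], {}
    for top in S:
        for obj in walk(top):
            if obj[0] in index:
                errors.append(f"duplicate identifier {obj[0]}")
            index[obj[0]] = obj
    for oid, _, value in index.values():
        if isinstance(value, Ref) and value.target not in index:
            errors.append(f"dangling pointer {oid} -> {value.target}")
    for r in R:
        if r not in index:
            errors.append(f"root {r} is not in the store")
    # Reachability: follow nesting and pointers starting from the roots.
    seen, todo = set(), [r for r in R if r in index]
    while todo:
        oid = todo.pop()
        if oid in seen:
            continue
        seen.add(oid)
        value = index[oid][2]
        if isinstance(value, list):
            todo.extend(sub[0] for sub in value)
        elif isinstance(value, Ref) and value.target in index:
            todo.append(value.target)
    errors.extend(f"unreachable object {oid}" for oid in index if oid not in seen)
    return errors


S = [
    ("i5", "Emp", [("i6", "Name", "Doe"), ("i8", "worksIn", Ref("i22"))]),
    ("i22", "Dept", [("i23", "dname", "Trade")]),
]
print(check_store(S, ["i5"]))   # []
print(check_store(S, ["i22"]))  # i5, i6 and i8 are reported as unreachable
```

With the root i5 every object is reachable, because the pointer worksIn leads to the Dept object. With the root i22 alone, the Emp object and its sub-objects become garbage.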
An example of a complex object having three sub-objects is presented below:
<i5, Emp, {<i6, Name, “Doe”>, <i7, Sal, 2000>, <i8, worksIn, i22>}>
The objects named Name and Sal are atomic objects; the object named worksIn is a pointer object. Each object has a unique identifier.
In Figure 2 we present a database schema (which is an informal notion in AS0) and one of the possible database states according to the schema. Object specifications presented in the schema are associated with cardinalities, i.e. the minimal and maximal number of occurrences in an actual database. An unlimited maximal cardinality is denoted by *. The cardinalities [1..1] are omitted. In Figure 3 we present the same database state in a graphical notation, where objects are boxes with round corners, sub-object boxes are depicted within their parent object boxes, pointer objects are represented by arrows and root identifiers are presented within circles.
Fig. 2. The AS0 model - a small database
Fig. 3. The AS0 model - a graphical representation of a small database
The notation and representation of a database presented above is a conceptual (but algorithmically precise) view that is addressed to the imagination of designers of query language interpreters (and perhaps application programmers). It is not our intention to organize physical databases in this way. The favorite critical argument of some professionals, that such an organization is inefficient, is irrelevant to the goals that we pursue. We will try to show that every known database can be conceptually perceived as a particular case of AS0-AS3.
The AS0 model does not deal with types. Many database models start by defining types, but we are against such an approach. The notion of type is quite sophisticated and thus must be preceded by a lot of explanation and definitions. We consider the issues related to types very important, but we are against a simplistic approach (presented, in particular, in the ODMG standard) where types are introduced at the beginning, but chaotically and inconsistently. We intend first to prepare everything that is necessary to introduce types; hence we defer the explanation of types and strong type checking to the end of our discussion of SBA. The basic semantics of query languages can be explained without introducing types. Types, however, are necessary for the majority of query optimization methods. Types are also important for resolving some anomalies during the processing of semi-structured data.
As we have said before, we are not interested in the syntax and construction of object identifiers. However, we strongly rely on the total internal object identification principle, because a fundamental concept of the semantics of query languages defined through SBA is the reference, i.e. an object identifier used as an internal object name in some place of an application program. The construction of object identifiers can vary. For instance, for objects with a fixed format an identifier of an attribute A can be a pair <OID, offset_A>, where OID is an identifier of a root object, and offset_A is the number of bytes preceding the representation of A in the object. Similarly, we can assume that an identifier of an attribute is a pair <OID, attribute_name>. However, the construction of identifiers may matter for the performance and pragmatic properties of a query/programming language. For instance, if one assumes that an object identifier includes the identifier of the object class, then changing the class of an object without changing its identifier becomes impossible. Overly long identifiers may also compromise performance.
Programming Variables and the Difference between Volatile and Persistent Data
In all store models we make no distinction between programming languages’ variables and objects. For instance, a variable x defined in some programming language and having 5 as its value will in our convention be represented as a triple <i, x, 5>, where i is an internal identifier of the variable, for instance, its address in main memory. Similarly, we will not distinguish between the concepts of variable and object on the principle: “unlike a variable, an object must be a member of a class”. Such a distinction has no conceptual significance for us; it is based on the traditional terminology of particular programming languages rather than on different qualities. For instance, assuming that x is of type integer, we can conclude that x is also a member of a class, but the class is reduced (in this case) to the type. (In general, however, we will distinguish the type and class concepts; we discuss this issue elsewhere.) We can also assume that all variables/objects have their classes, but for some of them the classes are empty and hence are omitted by default.
The AS0-AS3 models also do not deal with the persistence of objects. We assume the orthogonal persistence principle, i.e. the persistence property is orthogonal to all other properties of object-oriented models and is not taken into account in the specification of the semantics of query/programming languages. Unlike in SQL, OQL, XQuery and other database languages, for us the persistence property has practically no bearing on the definition of the semantics of query languages.
Nowadays more and more databases are stored in main memory (with magnetic media used as back-up devices only), so one may ask about the difference between persistent and volatile data. Indeed, the difference concerns not the kind of media on which the data is stored, but the mode of use. Data can be a property of one user (client) session and disappear when the session is terminated; such data is volatile. Persistent data is shared among many users and in typical terminology is a property of a server rather than a client. Such data may exist even if there is no open session in the system.
Looking at a query language from this side, we can conclude that a query (query execution) is a property of a client session rather than of the server (with the exception of some internal server processes). Hence from this point of view the volatile data of the session have the same standing as persistent server data. In conclusion, the semantics of query languages should not differentiate the access to the two kinds of data.
Of course, there should be special constructs and constraints in the languages to deal with the persistence property. For instance, there can be a programming statement which makes volatile objects persistent (i.e. causes them to be stored on the server) or some constraint saying that no sub-object of a volatile object can be persistent. Nevertheless, we consider these properties and constraints to be secondary issues in the definition of semantics.
Object Relativism
In all store models we make no distinction between objects stored on different object hierarchy levels. The subdivision of objects into simple and complex will also be secondary. We do not introduce special terminology for objects on different object hierarchy levels and objects of different complexity (such as “variable”, “object”, “composite object”, “attribute”, “sub-attribute”, “repeating attribute”, “complex attribute”, “structure”, “tuple”, “record”, “collection”, “extension”, etc.). All these notions have direct counterparts in our store models. Such distinctions may have some meaning for business-oriented external data models, but they are inessential for the definition of query languages’ semantics. As in Smalltalk, an object, if it is complex, consists of objects; other terms are not necessary. Sometimes we use them to make things clearer; for instance, we sometimes use “attribute” to denote a direct sub-object of some object, but such terminology has no meaning for our formal semantics.
Object relativism is of principal importance for simplifying the proposed languages: it greatly simplifies the database metamodel and the operations on the metamodel, increases the universality of the language and makes the syntax, semantics and pragmatics of the language clearer. These advantages in turn have a positive impact on the implementation effort, on the development and implementation of query optimization methods, and on the generality of the methods. Simplification of the pragmatics results in much shorter documentation and user manuals.
Unfortunately, many proposals of object-oriented standards, languages and systems do not follow object relativism. For instance, in the ODMG standard an object attribute is a “literal”, which is not an object. In many other proposals all objects must be complex, i.e. there is no possibility to consider atomic objects having no attributes. As shown above, we do not follow such crippled conceptions and constraints, because we believe they are not reasonable and lead to disadvantages in all the technical aspects of an object-oriented idea or product.
Under object relativism a module is simply an object storing other objects. One can assume some additional properties and functions of modules, e.g. they can be considered conceptual units of a database, units of reuse, units of compilation, units of exchange and substitution, or units of encapsulation, but we consider all such properties orthogonal to the defined semantics of query languages.
Collections and Structures
In the AS0-AS3 models we assume no uniqueness of external names on any level of the object hierarchy. For instance, in Figures 2 and 3 the name Emp is assigned to three objects and the name Dept to two objects; within the Trade Dept object the name location is assigned to two atomic sub-objects, and within the Ads Dept object the name employs is assigned to two pointer sub-objects. This is the way in which we deal with collections. XML makes similar assumptions. In this way we unify several concepts related to collections, such as sets, bags, extents and repeating attributes.
We also abstract from the concepts of structure, record and tuple, as known e.g. from C/C++, Pascal and relational systems. For the goal of building the formal semantics of query languages such notions are secondary and can also be expressed in terms of the AS0 store model as complex objects.
In all the store models the collection concept does not occur as a single entity having its own identity.
The apparently innocent notion of collection, as introduced in many object-oriented proposals, e.g. the ODMG standard, is inconsistent with fundamental principles of object-orientedness, such as the substitutability principle and the open-closed principle. There are several signs of weakness of the collection concept. For instance, in the ODMG standard one can specify in the schema five kinds of collections, in particular, sets and bags. The same standard ensures that each object has its own identity, hence there are no two identical objects. So the question is: how can we create in the store a bag of objects? This is nonsense. Careful analysis of the standard and attempts to make all the concepts semantically clear and consistent have led us to the conclusion that the collection concept, as a database entity having its own identity, is inconsistent and unnecessary. All that we need is the possibility to introduce many objects having the same name, just like in XML.
Obviously, in the AS0 model one can create an object named Employees and then insert into this object 10 000 objects named Emp. In this way we obtain the desired effect, i.e. we have created a collection having its own identity, but without introducing the collection concept explicitly. As follows from the definition of AS0, such an object can have a value that is an empty set; in this way we represent an empty collection. During the presentation of the AS2 model we discuss more precisely some pitfalls connected with the collection concept and the methods of avoiding them.
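The effect of non-unique names can be illustrated with a small Python sketch. Objects are written as tuples (identifier, name, value); the helper by_name and the identifiers i100 to i200 are assumptions made for this illustration only.

```python
def by_name(complex_obj, name):
    """Return every direct sub-object of complex_obj bearing the given name."""
    _, _, subs = complex_obj
    return [sub for sub in subs if sub[1] == name]


# One object named Employees holding several objects that share the name Emp.
employees = ("i100", "Employees", [
    ("i101", "Emp", [("i102", "Name", "Doe")]),
    ("i103", "Emp", [("i104", "Name", "Poe")]),
    ("i105", "Emp", [("i106", "Name", "Lee")]),
])
# An empty collection is simply a complex object whose value is the empty set.
no_employees = ("i200", "Employees", [])

print(len(by_name(employees, "Emp")))  # 3
print(by_name(no_employees, "Emp"))    # []
```

The collection has an identity (i100) only because it is an ordinary complex object; no separate collection construct is involved.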
We will, however, introduce the collection concept in two other contexts: as a feature of the set Result, representing all results returned by a query, and as a type constructor for a database schema language. We’ll return to these notions later.
AS0 deals only with collections that are sets. Another interesting collection kind is the sequence, which is specific to XML. By assuming that a complex object has as its value a sequence of objects rather than a set of objects, we obtain another store model, AS0seq, that is close to XML. It is also possible to create the model AS0set&seq, where complex objects can be qualified by special flags saying whether the objects store sets or sequences. However, we have concluded that such a modification (although perhaps important in practice) brings no essential quality to the semantics of query languages that we have to define. Thus at this stage of our explanation we do not burden it with such secondary issues.
Links Between Objects
Pointer objects assumed in AS0-AS3 are introduced to cover links between objects. In Fig. 2 and Fig. 3 each pointer object worksIn leads to the corresponding object Dept, and each pointer object employs leads to an object Emp. For now we do not care that such a structure is redundant, because at this stage we are interested in the conceptual picture of the database state rather than in the redundancy that it involves. The redundancy is justified by the freedom of navigation from objects Dept to Emp and vice versa.
Pointer objects can be considered an abstract implementation of the feature that in UML is known as association and in the entity-relationship model (ERM), in the OMG CORBA standard and in the ODMG standard is named relationship. (We will use relationship.) In ERM, UML and CORBA n-ary relationships are possible, i.e. a relationship that joins two, three or more classes (entities) and can be decorated by attributes or classes. Similarly to ODMG, we have assumed in AS0-AS3 that associations can be binary only, with no decorations. We met severe difficulties in defining a store model that would introduce relationships of arbitrary arities and would be consistent and universal concerning all aspects of query/programming languages. The universality should concern not only retrieval (which is a relatively easy problem) but also all updating operations, in particular, switching an end of an association (a role) to another object. We do not want to present here all the considerations that have led us to the conclusion. Eventually we have concluded that if an n-ary association has to be updated, it must have an identifier. Moreover, because it can be decorated by attributes, it must have methods to serve them. Hence n-ary associations should have identifiers, similarly to objects, and can be served by methods, again similarly to objects. Does it make sense to introduce two different but very similar data structures, i.e. objects and relationships, on the level of the store model? We have concluded that it does not; hence we consider n-ary associations, n > 2, possibly with attributes, equivalent to objects. In effect, we obtain objects connected by binary, non-decorated relationships, which is exactly what is materialized in AS0 in the form of pointer objects.
Note that the only known artifact where n-ary decorated relationships are proposed on the level of data structures (with algorithmic precision) is the CORBA Relationship Service. This proposal is extremely clumsy, and it only convinced us that the idea makes little sense. Because each n-ary decorated relationship can easily be substituted by one more class and n binary non-decorated relationships, and because of the unclear options for updating such n-ary relationships, we reject the idea. Our final conclusion is that n-ary relationships, where n > 2, possibly decorated by attributes, present an unacceptable conceptual knot for programmers.
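The substitution of an n-ary decorated relationship by one object and n binary links can be sketched as follows. Everything in the example is hypothetical: the Supply relationship, its roles, the qty attribute, the helper names reify and new_id, and the Ref class that marks pointer values.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Ref:
    target: str  # identifier of the object pointed to


_ids = count(1)


def new_id():
    """Generate a fresh identifier; the r-prefix is arbitrary."""
    return f"r{next(_ids)}"


def reify(name, roles, attributes):
    """Build one complex object that stands for an n-ary decorated relationship.

    roles:      role name -> identifier of the participating object
    attributes: attribute name -> atomic value
    """
    subs = [(new_id(), role, Ref(target)) for role, target in roles.items()]
    subs += [(new_id(), attr, value) for attr, value in attributes.items()]
    return (new_id(), name, subs)


# A ternary relationship with one attribute becomes an object with three
# pointer sub-objects (binary, non-decorated links) and one atomic sub-object.
supply = reify(
    "Supply",
    roles={"supplier": "i1", "part": "i2", "project": "i3"},
    attributes={"qty": 40},
)
print(supply[0], len(supply[2]))  # r5 4
```

Because the relationship is now an ordinary object with its own identifier, switching a role to another object is an ordinary update of one pointer sub-object.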
Null Values, Variants, Semi-structured Data and Types
The AS0-AS3 models deal formally with the concept of null values and
unions (variants). There is no requirement that objects having the same name
should possess the same structure. The structure will be later constrained by
types, but the type system that we have developed (and implemented) is not as
restrictive as e.g. the Pascal or Java typing systems and allows for a lot of
irregularities in data structures. In Fig. 2 and Fig. 3 the sub-object address is optional, hence it may or may not occur in a particular Emp object. This is the way in which we deal with null values.
In contrast to the relational model, we do not introduce the null value concept explicitly. As argued by Date and Subieta, a special null value introduced explicitly works like a devil that is able to spoil the clarity and consistency of almost all language constructs. Null values in SQL are frequently given as an example of schizophrenic inconsistency and chaotic design. A thorough discussion of the issue can be found in [Subi96a].
Our method of dealing with null values, based on optional data, does not lead to any inconsistency. Note that in this way we can deal with optional data on any level of object granularity. For instance, in Fig. 2 and Fig. 3 the optional sub-object address is a complex object.
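A short Python sketch shows how absence replaces an explicit null. The tuples (identifier, name, value), the helper subs_named and the city and street values are assumptions made for illustration only.

```python
def subs_named(obj, name):
    """Direct sub-objects of obj with the given name (possibly none)."""
    _, _, subs = obj
    return [sub for sub in subs if sub[1] == name]


emps = [
    ("i1", "Emp", [
        ("i2", "name", "Doe"),
        # The optional address is present here, and it is a complex object.
        ("i3", "address", [("i4", "city", "Rome"), ("i5", "street", "Boogie")]),
    ]),
    # This Emp simply has no address sub-object; no null marker is stored.
    ("i6", "Emp", [("i7", "name", "Poe")]),
]

with_address = [subs_named(e, "name")[0][2] for e in emps if subs_named(e, "address")]
without_address = [subs_named(e, "name")[0][2] for e in emps if not subs_named(e, "address")]
print(with_address)     # ['Doe']
print(without_address)  # ['Poe']
```

A test for missing data is an emptiness test on an ordinary result, so no three-valued logic is needed.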
In a similar way we can deal with unions or variants known from C/C++ and Pascal. Pascal introduces the additional notion of a discriminator of a variant, i.e. a special value which makes it possible at run time to recognize the actual variant and to prevent (through dynamic type checking) its illegal use. In our case such a discriminator is not obligatory, but can be introduced as an ordinary sub-object. Again, one can design a special syntax of types which would make it possible to inform the type checker that a particular object has variants and that some sub-object is the discriminator of the variants. This makes it possible to shift proper type checking to run time.
As shown in Fig. 2 (left), on the type level irregularities in data are covered by cardinality constraints. For instance, address is constrained by the cardinality [0..1] and employs is constrained by the cardinality [1..*]. The cardinalities introduce the necessary discipline to the concept that is commonly referred to as semi-structured data. XML is frequently associated with this concept. Without such a discipline, programming with large semi-structured data would be very difficult (or impossible). Ultimately, the programmer must be aware of what the database contains and how it is organized. This awareness must be supported on the level of algorithmic precision. The typing system involving cardinalities fulfills this goal. It will be introduced later.
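A cardinality check of this kind can be sketched in a few lines of Python. This is not the SBA typing system, only an illustration; the function check_cardinalities and the two schema dictionaries are assumptions. The bounds for address and employs come from the text, the [1..1] bounds for name and dname are assumed.

```python
import math
from collections import Counter


def fmt(bound):
    """Print an unlimited upper bound as *."""
    return "*" if bound == math.inf else str(bound)


def check_cardinalities(obj, schema):
    """schema maps a sub-object name to its (min, max) number of occurrences."""
    _, _, subs = obj
    counts = Counter(sub[1] for sub in subs)
    problems = []
    for name, (lo, hi) in schema.items():
        n = counts[name]  # 0 when the name does not occur
        if not lo <= n <= hi:
            problems.append(f"{name}: found {n}, allowed [{lo}..{fmt(hi)}]")
    for name in counts:
        if name not in schema:
            problems.append(f"{name}: not declared")
    return problems


EMP = {"name": (1, 1), "address": (0, 1)}
DEPT = {"dname": (1, 1), "employs": (1, math.inf)}

print(check_cardinalities(("i1", "Emp", [("i2", "name", "Doe")]), EMP))
# []
print(check_cardinalities(("i3", "Dept", [("i4", "dname", "Ads")]), DEPT))
# ['employs: found 0, allowed [1..*]']
```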
The problem of specification, representation and processing of
semi-structured data will be presented much later, after introducing classes,
types and after defining all the necessary constructs of SBQL.
Relational Model and Nested Relational Model
The AS0 model covers the relational data structures as a particular
case. In Fig. 4 we present a relational schema, a relational table and the AS0
representation of the table.
Fig. 4. A relational schema, a relational table and the AS0 representation of the table
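One plausible mapping from a relational table to AS0 is sketched below in Python: each tuple becomes a complex object named after the table, each column value becomes an atomic sub-object named after the column, and the tuple identifiers become the root identifiers. The function table_to_as0, the identifier scheme and the sample Emp table are assumptions made for this sketch.

```python
from itertools import count


def table_to_as0(table_name, columns, rows, ids=None):
    """Return (S, R): the AS0 objects for the table and their root identifiers."""
    if ids is None:
        ids = count(1)
    objects = []
    for row in rows:
        tuple_id = f"i{next(ids)}"
        fields = [(f"i{next(ids)}", col, val) for col, val in zip(columns, row)]
        objects.append((tuple_id, table_name, fields))
    roots = [obj[0] for obj in objects]
    return objects, roots


S, R = table_to_as0("Emp", ["name", "sal"], [("Doe", 2000), ("Poe", 2500)])
for obj in S:
    print(obj)
# ('i1', 'Emp', [('i2', 'name', 'Doe'), ('i3', 'sal', 2000)])
# ('i4', 'Emp', [('i5', 'name', 'Poe'), ('i6', 'sal', 2500)])
print(R)  # ['i1', 'i4']
```

Note that the column names, which in the relational model belong to the schema, become ordinary data (external names) in every object.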
As we will show later, for such a data model SBQL queries are similar to SQL, except for minor syntactic differences. Thus we can claim that SBA also supports the formal semantics of SQL. However, we do not strive to give the formal semantics of all the constructs of SQL. SQL is a very irregular language, with a lot of anomalies, special cases, semantic reefs and peculiarities of its own. Defining its formal semantics without first putting its design in some order makes little sense. On the other hand, SQL is a closed language, surrounded by a lot of documents, implementations and its own user culture, so any discussion of changing its features is at least 20 years too late. The SQL-99 and SQL 2003 standards, which greatly extend the kinds of data structures addressed by the language and introduce a lot of constructs specific to programming languages, also do not present artifacts for which we would find it attractive to propose changes or to contribute in any way. In our opinion, these standards will play the role of monuments showing (to future computer professionals) that chaotic design done by large committees, but not supported by any essential theory, leads to useless artifacts.
Note that AS0 is richer than the relational model. The so-called natural join is not expressible in the pure relational model, because it is based on names of attributes, which are second-class citizens in the model, i.e. they are on the level of its informal meta-language. Such a definition presents no problem under AS0, because external names of objects are at the same semantic level as values. Similarly, because the pure relational model is value-oriented, i.e. it does not involve identifiers of relations, tuples and values of tuples, expressing updating operations is impossible or at least requires additional notions, which would be outside the relational theory. AS0 involves no such problems because of the assumed principle of total internal identification.
The AS0 model also covers the idea of nested relations (commonly referred to as Non-First-Normal-Form, NF2) as a particular case. AS0 imposes no limitations on the complexity of objects and the number of object hierarchy levels. The idea of how to represent an NF2 database in AS0 is the same as shown in Fig. 4. Some models known as functional or semantic (e.g. data structures implied by the entity-relationship model) are also covered by AS0.
XML Data Model
XML is a syntactic convention that makes it possible to unify some protocols of data exchange or to parameterize some system in a unified way. However, all the noise around XML as a new database model (or format) is in our opinion an exaggeration. An XML representation of data is legible for humans when the size of the XML files is reasonable (say, not larger than 100 KB). For large databases storing megabytes or gigabytes of information an XML representation is nonsense. For several important reasons (security, buffering, transaction processing, indexing, query optimization, etc.) such databases should be organized according to the state of the art in databases, as relational, object-relational or object-oriented databases, or as databases following another paradigm.
For this reason special query languages addressing XML files are of very narrow significance. For large databases XML should not be considered an internal data representation paradigm, hence XML query languages are inapplicable. Obviously, any database application can be equipped with wrappers that accept XML files and convert the data from the files to the assumed database format. Similarly, any database application can be equipped with XML generators that convert data stored in the database to an XML file. Such wrappers and generators present no conceptual or implementation problem. Assuming this simple idea, no XML will be present inside the database, hence no XML-oriented query language is necessary.
We also note that some special requirements concerning XML query languages do not always look reasonable, for instance, the requirement that an XML query should have XML syntax. Translating this requirement to SQL, one could claim that SQL queries should be written within relational tables, which is possible, but obviously idiotic.
Taking into account the above cautions and the limited role of XML query
languages, we show that the XML data model can also be considered as a
particular case of AS0. We also show some features of XML that are problematic
for AS0 and therefore are problematic for query languages addressing XML. In
Fig.5 we present a simple XML file and its counterpart written as an object in
the AS0 store model.
XML file:
<Dept>
  <dname> Trade </dname>
  <location> <location>
  <employs> Doe </employs>
</Dept>
<Dept>
  <dname> Ads </dname>
  <location>
  <employs> Poe </employs>
  <employs> Lee </employs>
</Dept>
AS0 representation:
S – Objects
< i17, Dept, { < i18, dname, ”Trade”>, < i19, location, “ < i20, location, “ < i21, employs, ”Doe”> } >,
< i22, Dept, { < i23, dname, ”Ads”>, < i24, location, “ < i25, employs, ”Poe”>, < i26, employs, ”Lee”> } >
R – Start identifiers
i17, i22
Fig.5. An XML file and its AS0 representation
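A wrapper of the kind mentioned above can be sketched with Python's standard xml.etree.ElementTree module. The function xml_to_as0, the wrapping db element (an XML document needs a single root) and the starting identifier are assumptions of this sketch; the location elements are left out of the sample.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_as0(elem, ids):
    """Convert an XML element to an AS0 triple (identifier, name, value)."""
    oid = f"i{next(ids)}"  # identifiers are assigned in document order
    children = list(elem)
    if children:
        return (oid, elem.tag, [xml_to_as0(child, ids) for child in children])
    return (oid, elem.tag, (elem.text or "").strip())


doc = ET.fromstring(
    "<db><Dept>"
    "<dname> Ads </dname>"
    "<employs> Poe </employs>"
    "<employs> Lee </employs>"
    "</Dept></db>"
)
ids = count(22)
S = [xml_to_as0(dept, ids) for dept in doc]
R = [obj[0] for obj in S]
print(S)
# [('i22', 'Dept', [('i23', 'dname', 'Ads'), ('i24', 'employs', 'Poe'),
#                   ('i25', 'employs', 'Lee')])]
print(R)  # ['i22']
```

The Python list keeps the document order of the sub-elements, whereas AS0 proper treats the sub-objects as a set.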
Some XML features are not covered by AS0 and thus (as a rule) present problems for a clean and consistent definition of an XML query language.
A basic disadvantage of XML as a target of a query language is that it does not follow the total identification principle. XML objects (logical parts of an XML file) have no unique internal identifiers. Because sooner or later programs have to refer to such objects, some identification method is a must. This is done by the XPath language, which uses path expressions consisting of tag names. However, because XML objects may have names that are not unique at the same level of the object hierarchy, XPath involves some tricks, for instance, specifying the order number of a required object (e.g. the 5th object) or special predicates. While such methods are acceptable for relatively short XML files, they could be difficult for large parsed XML files stored in a structural database. Two potential disadvantages are poor performance and updating anomalies (e.g. the order number n of some object A will change if another object A having order number m < n is deleted).
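The updating anomaly is easy to reproduce with the limited XPath subset supported by Python's xml.etree.ElementTree, which includes positional predicates such as employs[2]. The sample Dept element below is an assumption based on the Ads department used in this text.

```python
import xml.etree.ElementTree as ET

dept = ET.fromstring(
    "<Dept><dname>Ads</dname>"
    "<employs>Poe</employs><employs>Lee</employs></Dept>"
)

# A positional path serves as the "identifier" of the second employs object.
before = dept.find("employs[2]").text

# Deleting the first employs object silently renumbers the remaining one.
dept.remove(dept.find("employs[1]"))
after = dept.find("employs[2]")
first = dept.find("employs[1]").text

print(before)  # Lee
print(after)   # None
print(first)   # Lee
```

The path employs[2], which previously denoted Lee, now denotes nothing, while employs[1] has changed its meaning from Poe to Lee. An internal identifier would have remained stable.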
The order of XML objects may bear information, hence in this respect the XML data model and its AS0 representation are different. As we said before, there is no problem in creating a variant of the AS0 model with complex objects storing sequences rather than sets. Alternatively, designers of XML representations may be discouraged from relying on the order of XML objects as an information-bearing feature. Note that ordering of objects generally decreases the query optimization potential, because some query optimization methods, e.g. indices, can lose the ordering.
XML tags may include so-called attributes, whose intention is to represent meta-information rather than information. The subdivision, however, is fluid and arbitrary, so it is indeed difficult to tell what the feature is for. In our opinion, the feature is redundant and has additional disadvantages (e.g. no DTD support), thus it should be avoided. The only reasonable method is to convert attributes to some regular XML representation (and vice versa). This way no special extensions to a query language are needed. One such method is illustrated in Fig. 6.
XML file with attributes:
<emp version=” <name> Doe </name> ... </emp>
Equivalent without attributes:
<emp> <@version> 1.2 </@version> <@date> 2006.01.10 </@date> <name> Doe </name> ... </emp>
Fig. 6. XML file with attributes and its equivalent without attributes
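The conversion can be sketched in Python with xml.etree.ElementTree. Since names such as @version are not legal XML tag names for a standard parser, the sketch converts directly to AS0 triples, where any string can serve as an external name. The function to_as0, the identifiers and the attribute names in the sample (taken from the attribute-free form in Fig. 6) are assumptions of this illustration.

```python
import xml.etree.ElementTree as ET
from itertools import count


def to_as0(elem, ids):
    """Convert an element to an AS0 triple; attributes become @-named atoms."""
    oid = f"i{next(ids)}"
    subs = [(f"i{next(ids)}", "@" + key, val) for key, val in elem.attrib.items()]
    subs += [to_as0(child, ids) for child in elem]
    if subs:
        return (oid, elem.tag, subs)
    return (oid, elem.tag, (elem.text or "").strip())


emp = ET.fromstring('<emp version="1.2" date="2006.01.10"><name> Doe </name></emp>')
obj = to_as0(emp, count(1))
for sub in obj[2]:
    print(sub[1], "=", sub[2])
```

After the conversion, attributes are queried exactly like any other sub-object, so the query language needs no separate attribute axis.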
XML also has a quite strange feature that allows mixing atomic and complex objects: an XML object can be inserted within a string that is the value of an XML object. This feature can be difficult for a query language, hence, as before, we suggest reducing it to regular XML, for instance, as shown in Fig. 7.
XML file mixing strings and objects:
<emp> John Doe, born 1973 <address> His salary is 2500 </emp>
Equivalent with no mix:
<emp> <&info> John Doe, born 1973 </&info> <address> Sienna 5, <&info> His salary is 2500 </&info> </emp>
Fig. 7. XML file mixing strings and objects and its equivalent with no mix
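A possible reduction of mixed content is sketched below with Python's xml.etree.ElementTree, which exposes the text before the first child as text and the text after each child as tail. The function unmix and the shortened address value in the sample are assumptions; the result is given as AS0 triples because &info is not a legal XML tag name for a standard parser.

```python
import xml.etree.ElementTree as ET
from itertools import count


def unmix(elem, ids):
    """Convert an element to an AS0 triple, wrapping loose text in &info atoms."""
    oid = f"i{next(ids)}"
    children = list(elem)
    if not children:
        return (oid, elem.tag, (elem.text or "").strip())
    subs = []

    def add_text(text):
        # Only non-blank text fragments become &info sub-objects.
        if text and text.strip():
            subs.append((f"i{next(ids)}", "&info", text.strip()))

    add_text(elem.text)
    for child in children:
        subs.append(unmix(child, ids))
        add_text(child.tail)
    return (oid, elem.tag, subs)


emp = ET.fromstring(
    "<emp> John Doe, born 1973 <address> Sienna 5 </address>"
    " His salary is 2500 </emp>"
)
obj = unmix(emp, count(1))
for sub in obj[2]:
    print(sub)
# ('i2', '&info', 'John Doe, born 1973')
# ('i3', 'address', 'Sienna 5')
# ('i4', '&info', 'His salary is 2500')
```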
Arrays in AS0
The kind of data structure known as an array can be modeled in AS0 by assuming that names of objects can be numbers. For instance, an array A[1..5] holding the subsequent values “Doe”, “Poe”, “Lee”, “Kim”, “Noe” can be represented as an AS0 complex object:
<i, A, { <i1, 1, “Doe”>, <i2, 2, “Poe”>, <i3, 3, “Lee”>, <i4, 4, “Kim”>, <i5, 5, “Noe”> }>
Under this assumption an array element can be accessed through its index, which is a number; for instance, A.2 identifies the Poe element and A.5 identifies the Noe element. Usually in programming languages an index of an array element can be calculated by some expression. Such a possibility was implemented in Loqis by assuming that the result of an expression put in square brackets is interpreted as an object name. For instance, we can write A.[x+1], which for x = 3 returns the Kim element. In Loqis this convention was assumed as general, hence any name (including string names) can be calculated by an expression. This possibility somewhat resembles the feature known as “reflection”. Unfortunately, it is incompatible with strong static typing.
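The access by a numeric or computed name can be mimicked in Python as follows. The function select plays the role of the dot operator in A.2 and A.[x+1]; it is a name invented for this sketch and does not reproduce the Loqis implementation.

```python
# The array A[1..5] as an AS0 complex object: sub-object names are numbers.
A = ("i", "A", [
    ("i1", 1, "Doe"),
    ("i2", 2, "Poe"),
    ("i3", 3, "Lee"),
    ("i4", 4, "Kim"),
    ("i5", 5, "Noe"),
])


def select(obj, name):
    """All direct sub-objects of obj that bear the given name."""
    return [sub for sub in obj[2] if sub[1] == name]


x = 3
print(select(A, 2)[0][2])      # Poe  (corresponds to A.2)
print(select(A, x + 1)[0][2])  # Kim  (corresponds to A.[x+1] for x = 3)
print(select(A, 7))            # []   (no element with index 7)
```

Because the name is computed at run time, a static type checker cannot know in advance which sub-object, if any, will be selected.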
Such an interpretation of arrays can be associated with a typing system which assures that for each allowed index there is a corresponding array element. Alternatively (e.g. for semi-structured data), we can allow for “thin” arrays, where some indices have no corresponding elements.
Some proposals, e.g. the ODMG standard, introduce arrays and sequences as different but somewhat similar collection kinds. The idea of arrays presented above is, however, not well suited to sequences. The difference is that sequences are always “dense”, i.e. removing an element from a sequence reduces the indices of all its subsequent elements by 1; analogously for inserting an element into the sequence. As we can see from the above array example, arrays behave differently: removing an element from an array causes no changes to the indices of other elements, and inserting a new element into an array can be impossible (we can only insert an element with a known index, if an element with this index is absent). Hence sequences in data structures still require a new store model different from AS0.
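The different behaviour of arrays and sequences under deletion and insertion can be seen in a small Python comparison. A dictionary from index to value stands in for the AS0 array and a list for a dense sequence; both are stand-ins chosen for this sketch, and array_insert is a hypothetical helper.

```python
array = {1: "Doe", 2: "Poe", 3: "Lee", 4: "Kim", 5: "Noe"}  # index -> value
sequence = ["Doe", "Poe", "Lee", "Kim", "Noe"]               # dense, ordered

del array[2]     # the other indices are unchanged; index 2 is now a gap
del sequence[1]  # all subsequent elements move down by one position

print(array[4])         # Kim  (still at index 4)
print(sequence[4 - 1])  # Noe  (the 4th element used to be Kim)


def array_insert(arr, index, value):
    """Insert into an array only if the index is currently absent."""
    if index in arr:
        raise KeyError(f"index {index} is already occupied")
    arr[index] = value


array_insert(array, 2, "Roe")  # allowed: index 2 is free after the deletion
```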
The above representation is good for any type of an array element. In
particular, in this way we can create multi-dimensional arrays. We underline,
however, that this view is abstract and concerns only the definition of query
languages’ semantics. In implementation arrays can be physically
represented in a more compact way.
Variants of the AS0 model
During the work on object store models and related topics we have
investigated several candidate variants that were eventually rejected as
introducing limitations or the necessity of further (less intuitive)
definitional elements.
Variant 1 (from [Subi85]): A database content is defined as a set of triples of the form <i1, n, i2> and <i, n, v>. A database is defined as previously, as a pair <S, R>, where S is a set of the above triples and R is the set of starting (root) identifiers. Complex objects are represented as collections of triples, e.g. a complex object named n having attributes a1, …, ak, pointer links p1, …, pt and identifier i can be represented as:
<i, n, ia1>, …, <i, n, iak>,
<i, n, ip1>, …, <i, n, ipt>,
<ia1, a1, va1>, …, <iak, ak, vak>,
<ia1, p1, ip1>, …, <iak, pk, ipk>
The main problem with this model is that it does not determine the boundary of a complex object, and thus causes problems (and the need for additional definitions) for updating operations, strong typing, parameter passing and other features that need encapsulated complex objects as primitive data manipulation units.
Variant 2 (influenced by Java): A database content is defined as a set of triples <i, n, {i1, i2, …, ik}> (representing complex objects with references to their sub-objects), <i, n, v> (representing atomic objects or attributes/sub-attributes) and <i1, n, i2> (representing reference objects). A database is defined as previously, as a pair <S, R>. For instance, the complex object described previously can be represented in this formalism as:
<i, n, {ia1, …, iak, ip1, …, ipt}>,
<ia1, a1, va1>, …, <iak, ak, vak>,
<ia1, p1, ip1>, …, <iak, pk, ipk>
The model makes it possible to express complex objects, but only with one object hierarchy level. To represent objects with more hierarchy levels some additional assumptions or additional elements of the formalism are required. As previously, there are doubts concerning object boundaries. In general, concerning the updating of complex objects, this variant inherits the disadvantages of the previous one.
Some modifications of the above variants were also investigated. As a result we have chosen AS0 as the simplest and most homogeneous model, one that does not require too much explanation or too many formal elements. So far none of its variants has introduced a new quality to formal definitions; they resulted only in limitations or in the necessity of defining new elements (such as flags representing boundaries of complex objects). Such add-ons are awkward when defining features of query/programming languages such as updating, strong typing and parameter passing. We have also investigated store models where (from the very beginning) objects are associated with types; thus, for instance, object names are properties of types rather than properties of objects. We abandoned this idea for two reasons. First, it is possible that our data model and a query language would be untyped (see, for instance, XQuery). Second, types present a complex issue; thus introducing them too early in the definition, without understanding them in their full complexity, might introduce limitations or inconsistencies.
Last modified: January 18, 2008