Object-Oriented Concepts
The understanding of object-oriented
concepts can differ significantly. There is no common view, in
particular, on the definition and meaning of such basic concepts as object,
attribute, class, type, collection, association, etc. We will try to present
the variety of views and definitions; eventually, however, we would
like to promote a reasonable view that we will follow when we define the
semantics of a query language. We cannot present and discuss all such views.
In our presentation of object-oriented
concepts we sometimes criticize imprecise, superficial or
inconsistent definitions. This concerns such fundamental concepts as class, type,
ADT, interface, collection, extent, encapsulation, etc. We will present our
arguments against some definitions and will propose definitions that have no
obvious pitfalls or disadvantages.
In the object-oriented literature
"object" has three different meanings:
· As a real, abstract or imaginable real world object, having its boundaries and
state. For instance, a car, a person, an ant, today's weather, John's trip
yesterday, Zeus or an angel can be considered objects. This point of view on objects
is important for philosophers, businessmen, software engineering methodologists
and database designers, but it is inessential for the semantic definitions of
query and programming languages.
· As an abstract imaginable entity considered in software engineering methodologies,
languages and tools. Usually there are some associations between real world
objects and abstract imaginable objects, but this is not obligatory. Such
objects abstract from many features of real objects, focusing on the features
that are important for the particular business goal or information system. Such
objects are "used" mainly in human reasoning processes, but they are
not defined precisely enough to be the foundation of the semantic definitions
of query and programming languages. Examples are the objects considered in UML
object diagrams - their features are sufficient for thinking, but not
sufficient for developing formal semantics.
· As a data structure supported by a
programming language, a query language or another software tool. It is
desirable that such data structures are 1:1 compatible with objects understood
as abstract imaginable entities or even with real world objects. However, this
is not a requirement.
In SBA we are interested only in
objects as data structures, similar to "variables" in classical
programming languages. Actually, we do not strive to make a conceptual
distinction between a "variable" and an "object". Some
professionals make this distinction assuming that objects must be members of
classes, which variables are not. This distinction introduces an additional
constraint on the object concept that does not always hold. We assume that
there can be objects that are not members of classes. Thus we consider the distinction
between objects and variables on this ground superficial.
Objects understood as data
structures need not be counterparts of some real-world material or abstract
objects. Data structures are the subject of various design processes where
modeling abstract or real objects is one of the criteria. Other criteria, e.g.
performance, scalability and maintainability, might be more important and might
considerably distort the original structure of objects. Moreover, the programmer can
assign to an object a state that is essentially different from the state of
some real-world object. For instance, the programmer can assign to an object
"car" the entire history of the car that is interesting from the
point of view of some decision processes. A real-world object "car"
is represented only by its current state.
Usually we must assume that objects
as data structures are incompatible across different languages or tools. For
instance, Java objects, C++ objects and CORBA objects differ in conceptual and
technical details; thus there is no direct portability between them.
From the programming point of view
it is important that each object has
clearly determined boundaries and structure. Thus, an object can be
manipulated as a whole, e.g. created or deleted. The structure of an object is
usually determined by its type. However, we do not associate a type with an
object from the very beginning because there are several object-oriented
languages and systems that have no types. Of course, types are very important,
especially for static (compile time) type checking of queries and programs
addressing objects. Note that clear boundaries of an object can be considered
a major factor distinguishing object-oriented systems from others. In
particular, in relational databases objects are dispersed among many tables and
boundaries of objects are not supported by the system, thus objects cannot be
manipulated as wholes. Similar remarks concern the Resource Description Framework
(RDF) by W3C, which is fact-oriented rather than object-oriented.
Object Identity, Identifier, Reference
An object has its identity, i.e. it is recognizable
independently of its current state. For material objects, identity is
a philosophical category related to humans’ understanding of the real
world. This point of view, however, is not interesting for objects understood
as data structures. In this case objects simply have object identifiers (OIDs)
that are considered unique, at least within some programming or database
environment. OIDs are usually assigned to objects automatically and thus have no
meaning in the real world. However, there are object-oriented systems where an
OID bears some business information too. It is possible for the same object
to have more than one OID, but such a feature is error prone and thus not
recommended. Object identifiers in some contexts are also called references. The possibility to obtain
a reference to an object is crucial for many features of a corresponding
query or programming language. For instance, an object reference is necessary
as an l-value in updating statements,
as an argument of a delete statement,
as a parameter of a procedure passed by the call-by-reference method, etc. The programmer
almost never uses a reference explicitly. It is the result produced by a query
or programming language mechanism known as binding,
or by other mechanisms. A reference can also be used as a pointer value, i.e. a link leading from one object to another. A
pointer is understood as an object identifier stored within some other
object (as a value). An object identifier is immutable, i.e. it cannot be changed by any programming operation. More
precisely, changing an object identifier is equivalent to
deleting the object and at the same time creating a new object with the
same state.
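The notions above can be illustrated with a minimal Python sketch. It is not part of SBA; the class `Store`, its methods and the sample data are hypothetical names chosen for illustration, and OIDs are assumed to be opaque integers generated by the store.

```python
import itertools


class Store:
    """A toy object store; OIDs are opaque integers assigned automatically."""

    def __init__(self):
        self._next_oid = itertools.count(1)
        self._objects = {}  # OID -> object state (a plain dict in this sketch)

    def create(self, state):
        oid = next(self._next_oid)  # no real-world meaning
        self._objects[oid] = state
        return oid  # the OID serves as the reference to the object

    def deref(self, oid):
        return self._objects[oid]

    def delete(self, oid):
        del self._objects[oid]


store = Store()
doe = store.create({"name": "Doe", "salary": 2500})
boss = store.create({"name": "Smith", "salary": 4000})

# A pointer: an OID stored as a value inside another object.
store.deref(doe)["boss"] = boss

# Updating goes through the reference; the identity does not depend on the state.
store.deref(doe)["salary"] = 2700

# "Changing" an identifier amounts to deleting the object and creating
# a new object with the same state.
state = store.deref(doe)
store.delete(doe)
new_doe = store.create(state)
```

Note that after the last step any pointer that still holds the old OID no longer leads to an object, which is one reason why identifiers are treated as immutable.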
There is an important intention behind
object identifiers concerning the performance of the programming (or database)
system. If one knows an object identifier then access to the corresponding
object should be very fast. This property of object identifiers is crucial e.g.
for using them as non-key values within indices. Very frequently object
identifiers are simply machine addresses (disk addresses, in particular). Such
a solution, however, has disadvantages concerning the maintenance of the
storage space when objects can grow, shrink and move. Some authors
propose using symbolic object identifiers (having no associations with
machine addresses), but this solution also has disadvantages (for instance,
very large hash tables). Several tradeoffs between these two extremes are known
in the literature.
Objects have names that are assigned by software designers, database designers
or programmers and used by programmers to identify them within the programming
environment. Names usually bear some external world semantics, for instance
“customer”, “car” or “account”, but some
meaningless names, such as “x” or “J23”, are also
possible. Similarly to names of material objects, object names need not be
unique even in the same environment. For instance, there could be many objects
named “customer”. In typical object-oriented programming
languages, however, object names must be unique within the same environment.
Objects with the same name overlap conceptually with a collection or can be considered
an equivalent of a collection. Allowing the same name for many objects is a
property of XML.
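The consequence of non-unique names can be sketched as follows: binding a name yields a collection of references rather than a single one. This is only an illustrative Python sketch; `Environment`, `create`, `bind` and `value` are hypothetical names, not SBQL constructs.

```python
from collections import defaultdict


class Environment:
    """A toy environment in which many objects may share the same name."""

    def __init__(self):
        self._by_name = defaultdict(list)  # name -> list of OIDs
        self._values = {}                  # OID -> value
        self._next_oid = 1

    def create(self, name, value):
        oid = self._next_oid
        self._next_oid += 1
        self._values[oid] = value
        self._by_name[name].append(oid)
        return oid

    def bind(self, name):
        """Binding: replace a name by the references of all objects so named."""
        return list(self._by_name.get(name, []))

    def value(self, oid):
        return self._values[oid]


env = Environment()
env.create("customer", "Doe")
env.create("customer", "Smith")
env.create("x", 7)

customers = env.bind("customer")  # two references, i.e. a collection
```

Under this reading, a name that must be unique is just the special case where binding always returns at most one reference.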
Note that in many programming
languages object names are second-class citizens, i.e. they are properties of
the source program code rather than of the run-time environment. However, the second-class
citizenship of object names is irrelevant for conceptual modeling – it
can be considered a pure optimization issue. In object-oriented databases
one must assume the first-class citizenship of object names, because objects
exist independently of any programs.
As with material objects, objects may
have several names (known as synonyms). Usually programming languages and
programming discipline avoid such freedom. However, the possibility to assign many names to
objects opens new possibilities for conceptual modeling; for instance, the same
object can be simultaneously named “person”,
“employee”, “student” and “patient”. The
disciplined version of such a possibility does not treat such names as
synonyms, but as roles of an object.
The same object is perceived and processed differently depending on the role name
used by the programmer.
Objects are usually containers of
named values known as attributes.
Values of attributes form an object state.
Attributes represent some properties of objects. There are several kinds of
attributes (not always disjoint) that occur in different languages and systems,
in particular the following ones:
· Atomic attribute that is characterized by a single atomic value of some type. For
instance, the value is “Doe” for the attribute named NAME of an
object named PERSON.
· Complex attribute that consists of several atomic or complex sub-attributes.
· Optional attribute that may or may not occur in a particular object. Optional attributes have
some conceptual connotations with null values, known from relational databases.
· Repeating attribute, i.e. an attribute that may possess many values, in many cases with
an unpredictable number of occurrences.
· Binary large or multimedia attribute that is an
atomic attribute, but the very large size of its value requires special treatment
during processing.
· Pointer attribute that contains as a value an OID of some object.
Some combinations of these kinds are
possible, for instance, a repeating complex attribute. In a system with types
each value of an attribute has a determined type.
There are also some other attribute
kinds that are actually some database or programming abstractions, in
particular:
· Default attribute: its value is taken for the given object when a corresponding optional
attribute is absent.
· Derived attribute, whose value is calculated on-the-fly during accesses to it. A derived
attribute is equivalent to a method
attached to the corresponding object.
· Class attribute: an attribute that is stored within a class of an object and is the same for
all the objects of this class.
· Procedural attribute: an attribute whose value is some executable code.
· Dictionary attribute: an attribute whose value is a reference to some predefined set of
values known as a dictionary. A dictionary may contain atomic or complex values.
If a dictionary is predefined and cannot be updated then the attribute type is
known as an enumeration. A dictionary
can also be updatable.
This list of possible kinds of attributes is
probably not complete. Such a classification of attributes may have some meaning
for explaining the conceptual structure of objects. However, if we treat an
object as a data structure we take the position that the distinction between
objects and attributes is superficial, unnecessary and leads to extra complexity
of specification, implementation and documentation. The distinction was made
in the ODMG standard, where attributes were considered literals having no identities. Such a point of view is correct when
we consider objects on the conceptual level, but it is nonsense when we treat objects as data structures that have to be
updated.
It is more reasonable to consider attributes as
objects nested in other objects. The semantic
relativity of objects calls for the same syntax, semantics and
pragmatics of language constructs for all the object nesting levels. When we
consider conceptual objects, updating them is usually not a relevant issue.
This point of view is unacceptable for objects treated as data structures,
because in this case updating any of their atomic or complex parts is inevitable.
Hence the total internal identification
principle, which calls for an internal identifier for each, even the smallest and
atomic, part of an object. An internal identifier is necessary as a reference,
which is the heart of updating constructs. Taking both these claims together, the best and
most universal assumption is that a complex object consists of objects, which
may consist of objects, etc. A consequence of this assumption is the existence
of objects that have no attributes but only an atomic value. Each external and
nested object has the same general properties, i.e. an internal identifier,
an external name and some value, perhaps complex or encapsulating some
abstraction. From the point of view of a query/programming language all objects,
including nested ones, are to be served by the same binding mechanism. In this
setting the concept of attribute is unnecessary. So far, we have not discovered
any serious disadvantage of such an assumption.
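A minimal Python sketch of this assumption is given below: every object, at any nesting level, has an internal identifier, an external name and a value that is either atomic or a list of nested objects. The names `Obj`, `new` and `bind`, as well as the sample data, are hypothetical and serve only as an illustration.

```python
import itertools
from dataclasses import dataclass
from typing import List, Union

_oids = itertools.count(1)  # generator of internal identifiers


@dataclass
class Obj:
    oid: int                            # internal identifier
    name: str                           # external name
    value: Union[int, str, List["Obj"]]  # atomic value or nested objects


def new(name, value):
    return Obj(next(_oids), name, value)


def bind(obj, name):
    """The same binding at every nesting level: sub-objects called `name`."""
    if not isinstance(obj.value, list):
        return []  # an atomic object has no sub-objects
    return [sub for sub in obj.value if sub.name == name]


person = new("PERSON", [
    new("NAME", "Doe"),
    new("ADDRESS", [new("CITY", "Warsaw"), new("STREET", "Main")]),
    new("PHONE", "111"),
    new("PHONE", "222"),  # a repeating "attribute" is just two objects
])

# Because even the atomic part CITY has its own identifier, it can be
# updated through a reference exactly like a top-level object.
address = bind(person, "ADDRESS")[0]
city = bind(address, "CITY")[0]
city.value = "Krakow"
```

In this sketch there is no separate notion of attribute: `NAME`, `ADDRESS` and `PHONE` are ordinary objects that happen to be nested in `PERSON`.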
Any operation on an object is
performed by some language construct that in object-oriented models is called a method. Typical object-oriented
literature couples all the methods that are relevant for some objects within classes. However, as we said before, not
all object-oriented models and systems deal with classes. Hence our view on
methods is wider. We subdivide operations on an object into two groups:
· Generic operations that are built-in
for the given language.
· Added-on operations that are defined
by a software designer or programmer and somehow associated with an object. The
storage for added-on operations is usually called a class, but the nature of classes is complex, thus we discuss it
separately.
It is possible to build an
object-oriented language or system having only generic operations and no
classes. Moreover, it is impossible
to build an object-oriented system having no generic operations, even if the
system introduces classes. Several generic operations are inevitable, in
particular, binding (changing an object name into an object reference), creating,
updating (the so-called setter),
deleting, inserting, and perhaps others. A very shallow view on encapsulation
and demagogic rhetoric call for no generic operations on objects –
everything must be done by user-defined methods. This is, however, technically
inconsistent and based on some mental and technical shortcuts. We return to this
issue when we discuss the concept of encapsulation.
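One way to read this argument is sketched below in Python: even when all external access goes through user-defined methods, the bodies of those methods still rely on generic operations (creation, binding, updating). The class `Person` and its method names are hypothetical and serve only as an illustration.

```python
class Person:
    def __init__(self, name):
        # Generic operations: creating the sub-object and assigning its value.
        self._name = name

    def get_name(self):
        # Generic operation: binding the name _name to the stored sub-object.
        return self._name

    def set_name(self, new_name):
        # Generic operation: updating through the reference obtained by binding.
        self._name = new_name


doe = Person("Doe")
doe.set_name("Smith")
current = doe.get_name()
```

The user-defined methods thus restrict who may use the generic operations, but they do not eliminate them.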
Several authors (in particular,
Darwen and Date) claim that the lack of generic operations, such as binding to
attributes, makes it impossible to build a query language acting on an
object-oriented database. We strongly disagree. Their arguments concern not
object-orientedness as such, but some community of professionals that are
trying to promote object-orientedness in an ideological, shallow and
inconsistent way. In purely technical terms (outside of any shallow ideology) there
is no contradiction between object-orientedness and query languages. On these
pages we present SBA - the most universal, consistent and formal methodology of
constructing query languages addressing any object model, starting from very
simple and ending at very sophisticated ones.
Because of the object relativity
principle some objects, especially atomic ones, have predefined operations. For
instance, an object having the type integer has predefined operations such as
+, -, *, /, <, =, >, etc. Such an object is considered the same kind of entity as a
“variable” known from classical programming languages.
An important generic operation on an
object is binding to its subobjects (attributes). There is an ideological
tenet that such binding is improper for object-orientedness and should be
substituted by two methods: some getter
(returning the value of an attribute) and some setter (assigning a new value to an attribute). It is not clear
why such a doubtful ideological assumption is promoted; the arguments in favor
are religious rather than technical. The assumption, however, is justified in
the CORBA standard, because this standard is based on adapters (wrappers) on
the server side that “understand” only methods. The assumption is
not relevant for object-oriented databases; we will return to this issue when
we discuss encapsulation.
Links (relationships, associations) Between Objects, Pointers
The entity-relationship model by
P. Chen introduced the concept of relationship
as an important conceptual modeling feature. Similar features are introduced in
modern object-oriented methodologies and notations, in particular in UML as associations (we use this term in the
following). On the conceptual modeling level associations are depicted as lines
connecting symbolic representations of real-world entities or classes. These
lines are named; for instance, the association connecting entities Person and Company is named worksFor,
Fig.44. Besides this name, UML introduces names of roles that are attached to the ends
of a line, for instance the role Employee
attached to Person and the role Employer attached to Company. Associations can be binary,
i.e. they connect two entities or classes, or can be n-ary, n > 2, when they
connect more than two entities or classes. Binary associations are additionally
qualified by cardinalities, i.e. a
symbolic description of the number of occurrences of instances on both sides of
an association. In some proposals associations can be qualified by attributes
(CORBA Relationship Service, OMT) or by a class (UML), e.g. EmploymentDetails in Fig.44.

Figure 44. Association worksFor with attached roles,
cardinalities and class EmploymentDetails
We do not describe the issue in more
detail, as it is the subject of many popular textbooks. For us, the only
interesting thing is how associations will be represented on the level of
object-oriented data structures. While (strangely enough) updating is not
considered an issue on the conceptual level, it is necessary for any data
structures, including the machine representation of an association. The problem has
not been investigated by many researchers. To the best of our knowledge, the
CORBA Relationship Service is one of the few solutions (if not the only one). The
solution is very sophisticated and clumsy. It strongly resembles the old CODASYL DBTG
standard for the network model, which eventually lost the competition with
the relational model.
The most straightforward method is
to consider the machine representation of an association instance as a structure of
references leading to the proper objects. For example, in Fig.44 an instance of the worksFor association will be considered
as a pair <iP, iC>,
where iP is a reference to a Person
object and iC is a reference to a Company
object. This approach is quite universal, making it possible to deal with attributes of
associations and n-ary associations. However, when we assume that associations
can be decorated by attributes or classes (as in UML) it will be very difficult
to distinguish (on the level of data structures) such an association instance
from an object. At least, it seems that making such a distinction is
unreasonable from the point of view of programmers, as it leads to two
different notions – objects and association instances – having
practically no structural, semantic and pragmatic differences.
In conclusion, on the level of data
structures it makes sense to leave only binary associations with no associated
attributes or classes. We propose to name them links. The most natural and efficient way to implement links is to
consider them as references (actually, pointers).
Such an approach minimizes the number of concepts, makes updating of
links easy, and is easy to implement and to optimize in queries. The approach is
also quite universal, as any n-ary
association with associated attributes or a class can be reduced to an
additional class with n links. The
approach is assumed in the ODMG standard and in our opinion this was a very
reasonable decision. In Fig.45 we show how a ternary association Deal with attributes joining Buyer, Seller and Broker classes can be
changed into an additional class Deal
and three binary associations (i.e. links). In Fig.46 we show the same on the
level of data structures.
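The reduction can be sketched in Python as follows. The store layout, the OIDs, the attribute `price` and the party names are hypothetical illustration data; only the classes Buyer, Seller, Broker and Deal come from the text.

```python
# OID -> object; each Deal object replaces one instance of the ternary
# association and holds three links (OIDs) plus the association's attributes.
store = {
    1: {"class": "Buyer", "name": "Adams"},
    2: {"class": "Seller", "name": "Brown"},
    3: {"class": "Broker", "name": "Clark"},
    4: {"class": "Deal", "price": 1000, "buyer": 1, "seller": 2, "broker": 3},
    5: {"class": "Deal", "price": 250, "buyer": 1, "seller": 2, "broker": 3},
}


def follow(oid, link):
    """Navigate along a link (pointer) stored in the object `oid`."""
    return store[store[oid][link]]


def deals_of_buyer(buyer_oid):
    """All Deal objects whose buyer link leads to the given Buyer."""
    return [oid for oid, obj in store.items()
            if obj["class"] == "Deal" and obj["buyer"] == buyer_oid]


broker_of_first_deal = follow(4, "broker")["name"]
adams_deals = deals_of_buyer(1)
```

Updating one end of a former association instance is then an ordinary assignment to a pointer, e.g. `store[5]["broker"] = some_other_oid`.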

Figure 45. Changing a ternary
association Deal into a new class Deal and three links

Figure 46. Changing two instances of
a ternary association Deal into two
additional objects Deal and binary
links
Hence, we advocate using the well known concept of pointer as a
representation of links. From the programmer's point of view, a pointer is quite an
easy notion. Pointers are an excellent facility for navigation in
object-oriented databases and in the following we will show how this is
utilized in SBQL. Updating of association instances is so far an unknown
feature, while updating of pointers does not cause essential conceptual,
implementation or pragmatic problems. An object-oriented data structure with
pointers representing links is depicted in Fig.47.

Figure 47. Objects linked by
pointers worksIn, employs and Boss
Pointers have a somewhat bad reputation.
Pointers in C/C++ (more specifically, pointer arithmetic) are
considered error prone. In our case pointers lead to objects rather than
to memory addresses, hence they have little in common with C/C++ pointers. Thus
there is no reason for such pointers to be error prone. Another concern
is so-called dangling pointers,
i.e. pointers that lead to an improper place or a non-existent object due to
the removal of some objects. Careful design of this feature (proper garbage
collection or so-called backward pointers)
makes it possible to avoid this danger. Yet another objection, promoted by advocates
of the relational model, is that pointers decrease data independence and should
be removed (changed into values) during the process of
“normalization”. We strongly disagree with this false stereotype.
Because data independence is not defined in technical terms (there is no
objective measure of it), what “decreasing data independence” means
is a matter of rhetorical ability or some demagogy. But for sure,
pointers, as an implementation of associations, decrease the distance between
conceptual and implementation models, and thus strongly support conceptual modeling.
Moreover, they support performance by reducing demands on costly join operators
(as will be shown) and considerably simplify queries by extending the capabilities of path expressions.
Pointers can be defined alone, as Boss in Fig.47, or can be paired into
twins, as worksIn and employs. Pairs of pointers are the
subject of a special integrity constraint saying that if one of the twin
pointers is updated, the other twin is immediately updated too. This is well
described in the ODMG standard in the part devoted to the C++ binding. For
instance (Fig.47), if one updates the worksIn
pointer within the Doe object and the update moves Doe from the
Syntex to the Poltex company, then the twin pointer employs within the Syntex object is removed and the pointer employs leading to the Doe object is
inserted into the Poltex object.
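The twin-pointer constraint can be sketched in Python as below. This is a simplified illustration, not the ODMG C++ binding: pointers are modelled as ordinary Python references, and the function `set_works_in` is a hypothetical name for the single place where the worksIn pointer is updated.

```python
class Company:
    def __init__(self, name):
        self.name = name
        self.employs = []      # twin of Person.works_in (many pointers)


class Person:
    def __init__(self, name):
        self.name = name
        self.works_in = None   # twin of Company.employs (one pointer)


def set_works_in(person, company):
    """Update works_in and immediately keep the twin pointers consistent."""
    old = person.works_in
    if old is not None:
        old.employs.remove(person)      # remove the twin in the old company
    person.works_in = company
    if company is not None:
        company.employs.append(person)  # insert the twin into the new company


syntex = Company("Syntex")
poltex = Company("Poltex")
doe = Person("Doe")

set_works_in(doe, syntex)
set_works_in(doe, poltex)  # moving Doe from Syntex to Poltex
```

Routing every update through one operation is the design choice that keeps the constraint from being violated by accident.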
Cardinalities of an association
belong to typing information. We will return to this issue when we consider
types.
The concept of class is an
abstraction in thinking, understanding the world and programming, whose intention
is to capture the structure of objects as well as the operations (methods) that are
attached to objects. Classes are perhaps the oldest mental abstraction invented
by humans and exist in all known human languages, usually as nouns denoting
abstract concepts such as animal, stone, food, fire and home. Abstract concepts
denote some “ideals” that have many concrete instantiations.
There are many definitions of the
class concept, as well as many misunderstandings. A class is frequently
confused (or identified) with a class
extent (i.e. the currently stored collection of objects being members of the
class), with an object domain (i.e. the
infinite collection of all objects that can potentially be members of the
class), with a type, with an abstract data type and with an interface. Unfortunately, in general
these concepts are not synonyms and in different contexts have different
meanings (sometimes – fundamentally different meanings). An additional
complication to the class concept is introduced by the fact that classes may
have three incarnations: (1) as pieces of source code; (2) as elements of some
metamodel (i.e. some internal data structure bearing the information stored in
classes); (3) as run-time entities that participate in the evaluation of
expressions, queries or programs.
In this text we bring some order to the
definitions of classes, types, extents, domains and interfaces by assigning to
these concepts the most expected pragmatic roles. However, we are fully aware
that our definitions can be incompatible with many sources where
these concepts are defined and discussed. For instance, in OMG UML/OCL class is equivalent to type. Such unification can be justified
by the necessity of reducing the number of conceptual modeling notions, but it is not
reasonable for query/programming languages.
The most essential distinction of
the class concept from concepts such as types, interfaces, extents and domains
is that classes contain essential program
code, i.e. implementation. Hence,
classes can be the subject of trade (because manufacturing them is usually
costly), while types, interfaces, domains and extents usually cannot occur in
this role.
We assume the definition of a class
that is most common and most relevant to all popular cases of object-oriented
modeling tools and object-oriented programming languages.
A class is an
entity that stores invariants of
objects, that is, properties or features that are common and constant for each
member of some population of objects.
The relationship between a class and
objects is usually called membership.
An object being a member of a class is equipped with all the invariants that are
stored within the class.
Invariants concerning a particular
object can be stored in several classes, which create some structure known as inheritance or generalization/specialization. Conceptually, more general classes
contain more general invariants, which (usually) concern a richer population of
member objects. In many programming cases classes are simply some sets of
invariants. In conceptual modeling classes usually bear essential (business)
semantics that correspond to human understanding of some universe of discourse.
Two kinds of invariants that are
stored within classes are the most common:
· Structure of
member objects. Usually a class determines names of objects’ components
(attributes) and their types.
· Methods
(sometimes called operations or behavior) that can be executed on
objects. The specification of methods includes typing information (types of parameters
and a type of a returned result), implementation code and possible exceptions.
Some authors additionally postulate introducing some constraints on methods,
such as pre-conditions (constraints on a state and parameters when a method
execution is started) and post-conditions (constraints on a state and
parameters when a method execution is terminated).
Some explanations take the point of
view that classes are blueprints for
objects, i.e. determine the object’s conceptual structure. This point of view,
however, does not take into account many other aspects of classes.
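The two most common kinds of invariants can be sketched in Python as follows. A class is modelled here as a run-time entity storing the structure and the methods, while a member object stores only its state and a membership link. The names `ClassObject`, `new`, `call` and `raise_salary` and the sample data are hypothetical.

```python
class ClassObject:
    """A class as a store of invariants common to all its member objects."""

    def __init__(self, name, structure, methods):
        self.name = name
        self.structure = structure  # invariant 1: attribute names and types
        self.methods = methods      # invariant 2: methods (with implementation)

    def new(self, **state):
        # A member object must conform to the structure stored in the class.
        for attr, attr_type in self.structure.items():
            if not isinstance(state.get(attr), attr_type):
                raise TypeError(f"{attr} must be of type {attr_type.__name__}")
        return {"member_of": self, **state}  # membership link plus own state


def call(obj, method_name, *args):
    """Methods are found in the class of the object, not in the object itself."""
    return obj["member_of"].methods[method_name](obj, *args)


def raise_salary(emp, amount):
    emp["salary"] += amount
    return emp["salary"]


EmployeeClass = ClassObject(
    "EmployeeClass",
    structure={"name": str, "salary": int},
    methods={"raiseSalary": raise_salary},
)

doe = EmployeeClass.new(name="Doe", salary=2500)
call(doe, "raiseSalary", 200)
```

The sketch treats the class as more than a blueprint: it is consulted both at object creation and at every method call.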
There are many other invariants that
can be stored inside classes:
· Objects’ name. A class determines the “external” name of all objects
being members of the class. This is implicitly assumed in several proposals,
including UML, the ODMG standard and the CORBA standard. Indeed, if we consider
a schema of object instances, the name of an object is the major conceptual
information. In programming languages, however, a class does not determine a
member object name. This follows from two different views on programming: data centric (specific to databases and
business applications) and program
centric (specific to object-oriented PLs). In the data-centric view objects
have a predefined name determined by a class. When a class determines the
object name, some issues are simpler, e.g. how to type pointer links.
· Persistence status. In some proposals a class bears information on whether member objects are to
be persistent (e.g. stored in a database) or not. In proposals and systems
based on orthogonal persistence, classes do not involve this feature;
persistence is determined e.g. during object creation.
· Links (associations,
relationships) with member objects of the same or (usually) other classes.
Links are usually named, belong to the structure of objects and are typed
according to the objects’ type system.
· Interface, export list, private/public declarations or other means that subdivide
properties of classes and their member objects into private and public
(sometimes with some tradeoffs like protected).
The consequence of the subdivision concerns scoping rules for names of
particular object properties (known also as encapsulation
or information hiding). Names of
public properties can be used externally. Private properties can be used only
internally by program entities (usually methods) stored inside classes.
· Default values for object attributes. They are used when a new object is created or
assigned when an attribute takes a null value.
· Allowance for null values. Null values can be considered properties of attributes’ types.
· Events or exceptions that can occur within the methods of a class, and code blocks (triggers)
that are provided to react to them.
· Integrity constraints that restrict possible object states (or possible state changes) and
define the reaction to operations that violate the constraints.
· Derived attributes that are to be dynamically calculated during run time; they can also be
considered functional methods.
· Much other information, including icons for visualization of objects, security, safety,
privacy and business rules, ontological information, help information, etc.
A class can also store some
information that is invariant not just to the class members, but to a class
extent, i.e. to the collection of currently stored members. This concerns
so-called static attributes and static methods. For instance, the attribute number of objects is information on
the entire population of class members, and the method average age of employees concerns the entire population of the
members of the class EmployeeClass.
In general, this feature can be questioned, since it mixes up different levels
of discourse: the level of objects and the level of collections of objects.
A cleaner situation is obtained when a special class, known as a power set of
class, is defined for collections of objects. For instance, we can have a class EmployeeClass and a power set of class EmployeeCollectionClass, where we store all static attributes and
methods.
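The separation of the two levels of discourse can be sketched in Python as below. The class names `Employee` and `EmployeeCollection` stand for EmployeeClass and EmployeeCollectionClass; the method names and sample data are hypothetical.

```python
class Employee:
    """Invariants of single member objects only."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


class EmployeeCollection:
    """Invariants of the collection of employees (the former 'static' part)."""

    def __init__(self):
        self.members = []

    def add(self, employee):
        self.members.append(employee)

    def number_of_objects(self):
        return len(self.members)

    def average_age(self):
        return sum(e.age for e in self.members) / len(self.members)


employees = EmployeeCollection()
employees.add(Employee("Doe", 30))
employees.add(Employee("Smith", 40))

count = employees.number_of_objects()
avg = employees.average_age()
```

In this arrangement nothing in `Employee` refers to the population as a whole, so the object level and the collection level are not mixed.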
An essential issue concerns class citizenship. A class frequently
has second-class citizenship,
which means that it exists in source code rather than at run time. This
concerns the majority of object-oriented programming languages. For object-oriented
databases classes should exist as run-time database entities, i.e. they should
have first-class citizenship. This
makes it possible to introduce many capabilities that are typical for database
management systems, such as dynamically inserting methods into a class (e.g.
virtual attributes), dynamically inserting constraints or business rules, changing
default values for object creation, etc. A class, as a first-class citizen, is
stored in some environment (e.g. in a database) as a special object and can be
processed by typical capabilities, e.g. by a query language. Mixed
solutions are possible in which some properties of the class exist only in its
source code (e.g. information on attributes) while others exist in source
code and are also accessible and can be manipulated at run time (e.g. methods).
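Python is one language in which classes exist as run-time entities, so it can serve for a small sketch of what first-class citizenship makes possible. The class Person, the method age and the sample values are assumptions made for this illustration.

```python
class Person:
    def __init__(self, name: str, birth_year: int) -> None:
        self.name = name
        self.birth_year = birth_year


def age(self, current_year: int) -> int:
    """A derived ('virtual') attribute, computed on demand."""
    return current_year - self.birth_year


# The object is created before the class is changed.
brown = Person("Brown", 1975)

# The class is an ordinary run-time object, so it can be inspected and updated.
had_age_before = hasattr(Person, "age")
setattr(Person, "age", age)   # dynamically insert a method into the class
```

After the update the already existing object brown responds to age, because the method is looked up in the class at the moment of the call.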
Class Hierarchy and Inheritance
A class can be built by
formally factoring out invariant properties from some population of objects.
Frequently, however, classes have some conceptual meaning for the business
domain, as a general abstraction that is used by designers or programmers to
reflect properties of some population of objects. Such classes have strong
relationships to the domain of discourse and correspond to the system of
notions and concepts that is developed for the domain. For instance, some
designer has developed a class Student
with attributes firstName, lastName, dateOfBirth, student#, yearOfStudy, faculty, and a class Employee
with attributes firstName, lastName, dateOfBirth, employee#, job, salary,
company#. Then, he/she realizes that
these classes have something in common: conceptually, both kinds of objects are
special cases of an object Person,
and technically, attributes firstName,
lastName, dateOfBirth are the same in both classes. Hence, he/she decides to
create the class Person with
attributes firstName, lastName, dateOfBirth and to establish a special kind of relationship between
classes Person, Student and Employee, as
shown in Fig. 48. This relationship is called inheritance.

Fig.48. Class hierarchy and
inheritance
Inheritance introduces some
structure to classes. The number of levels in a class hierarchy is usually
unlimited (although for human perception it should not be too big). Inheritance
means that during processing of objects of the class Student the programmer can
use all the properties of the class Person.
Inheritance reduces the amount of conceptual information: the properties from
higher-level classes are reused by lower-level classes. This also concerns
code, for instance in Fig.3 the code for the method age is reused by classes Student
and Employee.
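The factoring described above can be sketched in Python as follows. The attribute names follow the text (with student#, employee# and company# spelled student_no, employee_no and company_no), while the body of the method age and the sample data are our assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    first_name: str
    last_name: str
    date_of_birth: date

    def age(self, today: date) -> int:
        """Full years between date_of_birth and today."""
        years = today.year - self.date_of_birth.year
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return years if had_birthday else years - 1


@dataclass
class Student(Person):
    student_no: str
    year_of_study: int
    faculty: str


@dataclass
class Employee(Person):
    employee_no: str
    job: str
    salary: float
    company_no: str


# Hypothetical sample object.
s = Student("Ann", "Smith", date(2000, 6, 15), "S-1", 2, "Physics")
```

The method age is written once in Person and is available for Student and Employee objects without being repeated.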
The hierarchy of classes is a graph
without cycles. Multiple inheritance
means that a class can inherit properties of more than one class. For
instance, the Employee class can inherit
not only from the Person class, but
also from a TaxPayer class. Multiple
inheritance is popular for modeling business domains, but it is less popular
in programming languages. In particular, Java has single inheritance of classes only.
This limitation can be regarded as not severe; however, it increases the distance
between conceptual and implementation models.
Multiple inheritance leads to a
problem when properties inherited from different classes have the same name.
There are some methods to resolve this situation (e.g. in C++), but in general
there is no good solution. Name conflicts lead to violation of the
substitutability and/or open-close principles, the basic principles of
object-orientedness.
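The name conflict can be observed in any language with multiple inheritance. The sketch below uses Python, which resolves the conflict silently by the order in which the superclasses are listed; the method identifier and its return values are invented for the illustration.

```python
class Person:
    def identifier(self) -> str:
        return "national-id"


class TaxPayer:
    def identifier(self) -> str:
        return "tax-id"


class Employee(Person, TaxPayer):
    """Inherits identifier from both superclasses; Person is listed first."""


class EmployeeReversed(TaxPayer, Person):
    """The same superclasses in the opposite order."""


def tax_report(payer: TaxPayer) -> str:
    # Code written for TaxPayer expects the tax identifier.
    return payer.identifier()
```

An Employee object passed to tax_report answers with the Person version of identifier, so it is not a faithful substitute for a TaxPayer, although formally it is one. Swapping the order of superclasses only moves the problem to the other superclass.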
Inheritance usually determines one
aspect among several possible aspects. For instance, employees can be specialized
according to profession, according to experience, according to gender,
according to education, etc. In some real situations two or more aspects have to be
considered. Multiple aspects lead to multiple inheritance and/or to an explosion
of permuted classes. Multiple-aspect inheritance leads to the concept of object
roles and dynamic inheritance.
Another disadvantage of inheritance
concerns repeating inheritance. One can easily cope with the situation when a Student class inherits from a Person class, but this simple dependency
does not work if we want to cover the case that one person can simultaneously
study at more than one faculty or university. In this case the inheritance must
be substituted by association, which makes the model more complex to understand
and manipulate (e.g. no substitutability). As before, the issue can be
resolved by dynamic object roles and dynamic inheritance.
Substitutability and Open-Close Principles
The substitutability
principle is formulated as follows. If subclass B inherits from a superclass A,
then an object b being an
instantiation of B can be used in all
places of a query/program in which object a
being an instantiation of A can be
used. For instance, an object Student
can be used in all places where the object Person
can be used, because a class StudentClass
inherits from the class PersonClass.
Although this principle (also known as LSP, the Liskov Substitution Principle)
seems to be obvious, it must be taken with care, at least for three reasons.
The first concerns updating, where straightforward application of this
principle leads to anomalies or (at least) different semantic interpretations.
The second case concerns parameter passing, where the requirement of strong
type checking leads to a long (and a bit academic) dilemma concerning
co-variance and contra-variance. The last, most difficult case concerns the
case when an object name is a property of its type (this does not occur in
programming languages, but is a strong rule in database schemata). For such a
case the object-oriented model based on substitutability makes little sense and
must be superseded by some more general model. Substitutability, together with
the open-close principle, is also contradictory to the
straightforward concept of collection, fundamental for databases. From these
three concepts you can take any two, but not the three together. The
alternative for substitutability is an object model with dynamic object roles.
The open-close
principle is formulated as follows. Each class can be closed for
modifications, but it is still open for specializations. The principle is the
basis for program reuse and method polymorphism. For instance, we can buy
within some compiled library a class Person,
together with all the methods that are implemented for the class. We have no
possibility to change anything in this class, because its source code is
unavailable. Nevertheless, we can specialize this class by classes Student, Employee, Customer, etc.
Each specialization inherits the implementation that is already done within the
class Person and augments the
implementation by new methods, specific for the given specialization. Moreover,
each specialization can override some
method implemented within the class Person
by a specialized method implemented within this specialized class and having
the same name (plus compatible parameters) as the name of the method within Person.
As we have
noted before, the open-close principle is contradictory to the substitutability principle and the straightforward
concept of collections of objects.
Interfaces
Standards and languages such as
CORBA, ODMG, Java, COM/DCOM and C# introduce the concept of interface, which
contains the full information on properties of objects and classes that are available
as retrieval or manipulation capabilities that can be used by programmers.
Usually an interface corresponds to a class. An interface has a conceptual
meaning for the programmer, as it specifies with the necessary precision what
an object of the given class contains and how it can be manipulated. An
interface does not present all the knowledge on a corresponding class.
The fundamental difference between
an interface and a class is that a class can be the subject of trade (because
it contains an executable code), while an interface cannot; it can be only
published.
One interface can be associated with
many implementations, for instance, different classes offered by different
companies. Similarly, one class can be associated with many interfaces. This
feature can be used to restrict rights of particular users to use properties of
classes.
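The association of one class with several interfaces can be sketched with Python protocols. The interface names SalaryReader and SalaryManager, the class Employee and the functions below are hypothetical. Note that in Python such a restriction is verified only by an external static type checker and is not enforced at run time.

```python
from typing import Protocol


class SalaryReader(Protocol):
    """Interface for users who may only read the salary."""

    def get_salary(self) -> float: ...


class SalaryManager(Protocol):
    """Interface for users who may also change the salary."""

    def get_salary(self) -> float: ...

    def set_salary(self, value: float) -> None: ...


class Employee:
    """One class; it satisfies both interfaces above."""

    def __init__(self, salary: float) -> None:
        self._salary = salary

    def get_salary(self) -> float:
        return self._salary

    def set_salary(self, value: float) -> None:
        self._salary = value


def report(e: SalaryReader) -> str:
    # Code written against SalaryReader is entitled to call get_salary only.
    return f"salary={e.get_salary()}"


def give_raise(e: SalaryManager, amount: float) -> None:
    e.set_salary(e.get_salary() + amount)


emp = Employee(2500.0)
give_raise(emp, 100.0)
```

Any other class with the same two methods, possibly delivered by a different vendor, could be passed to report and give_raise as well, which corresponds to one interface having many implementations.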
Although very similar, the concept
of interface is different from the concept of type. The main difference concerns the pragmatic roles: interfaces
are to inform the programmer on the structure of objects and possibilities to
manipulate them, while types are constraints concerning the contexts in which
the corresponding objects can be used in programs. There could be also some
syntactical and functional differences, for instance, an interface can specify
exceptions or run-time constraints that are not relevant to types. Moreover,
there are types that are independent from interfaces, for instance, atomic
types. Types can be defined recursively, interfaces cannot.
Interfaces can be considered as the
specification of public (exported) properties of classes. However, inheritance
of interfaces has different properties than inheritance of classes. Actually,
inheritance of interfaces is based on operations on the text: if interface A
inherits from B it means that the text B is somehow included into the text A.
This may or may not be associated with the inheritance of implementation.
The currently popular concept of
interfaces usually neglects a very important feature, namely, side effects that
can be caused by a class. Side effects concern everything that can be retrieved
(passive side effects) or updated (active side effects). Checking side effects
is crucial for the reliability of programs, thus programmers should be fully
aware of them. This was the idea of import
lists in Modula-2 and other languages. Unfortunately, designers of currently
popular programming languages treat this feature as secondary.
In many sources little distinction is made
between classes, interfaces and types. Often, these concepts are
unified by some common syntax, e.g. declaration of classes. However, from the
pragmatic point of view, the concept of type is different from the concepts of
classes and interfaces. The main pragmatic role of classes is to keep the implementation
of methods and to determine object membership. The main pragmatic role of
interfaces is to specify all the possibilities that programmers can use to
retrieve and manipulate objects. Types
are determined by interfaces, but types can also exist without interfaces.
Moreover, interfaces specify only public properties of objects, but type
checking must concern also private properties. Hence the following main
pragmatic role of types:
Types are intended as constraints on
the construction and behavior of any program entities (in particular, modules,
objects, values, links, procedures, methods, etc.) and static constraints on the query/programming context in which these
entities can be used.
A static constraint concerns the source
program and is checked before the program is executed. Some programming
languages, e.g. Smalltalk and Ruby, check types during
run time. However, dynamic type checking is less reliable than static type
checking, because some type errors may not be discovered during testing.
Additionally, dynamic type checking makes programs slower, because
type checking is an additional run-time activity. On the other hand,
programming languages without types or with dynamic type checking are much
easier to develop and implement.
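The reliability problem of dynamic type checking can be shown on a small example in Python, a dynamically checked language. The function bonus and its business rule are invented for the illustration.

```python
def bonus(salary: float, years: int) -> float:
    if years > 30:
        # Type error: a number is added to a string.
        return salary + "1000"
    return salary / 10


# This path works, so a test that uses only such data passes.
ok = bonus(2500.0, 5)

# The faulty path fails only when it is actually executed.
try:
    bonus(2500.0, 31)
    failed_at_run_time = False
except TypeError:
    failed_at_run_time = True
```

A static type checker would reject the faulty branch before execution, independently of the test data.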
The object-oriented literature
contains many patterns concerning static typing systems. For instance,
interfaces written in IDL (Interface Definition Language) of the CORBA standard
contain mainly typing information. However, making a consistent and useful
typing system is not an easy task. For instance, the typing system assumed by
ODL (Object Definition Language) of the ODMG standard is criticized as
inconsistent and thus non-implementable.
The
typing community attempts to give the impression that the issue of types is
exhausted: everything that is necessary in this respect has already been discovered
and investigated. Strong typing has a history of more than 30 years and has evolved from the
Pascal type system based on name equivalence up to modern concepts based on
structural equivalence and inclusion/parametric polymorphism. There are
thousands of papers devoted to types and a lot of strong mathematical theories.
Many of them deal with bulk types (collections) typical for databases and
database programming languages, including query languages. There are also many
practical proposals of typing systems implemented in popular object-oriented
programming languages (e.g. C++ or Java) as well as in research prototypes, e.g.
PS-Algol, DBPL, Galileo, Napier88, Fibonacci, and many others. An overview of
these proposals from the strong typing perspective can be found in [AtBu87,
Atki95].
Experts in strong typing
usually adopt the simpleminded notion that a type is a (possibly infinite) set
of values and that a variable of that type is constrained to assume values of
that type [AtBu87]. Mathematically, the notion is clear and (what is important
for formal research) makes it possible to develop very advanced type theories
with a lot of formal properties and theorems. However, there are features of
query/programming languages and environments that make it difficult to adopt
the above notion. We subdivide these features into four groups:
· Irregularities in
data structures, such as null values (optional data), repeating data
(collections with various cardinality constraints), exclusive variants/unions,
pointer links between data, unconstrained data names (e.g. CORBA attributes in
XML), and perhaps others. Such irregularities make the strong typing much more
difficult (or even impossible) and may cause the necessity for shifting type
checking to run time.
· Ellipses, automatic
coercions, automatic dereferences and other options of query/programming
languages. Such features are introduced by designers of database
query/programming languages to increase user friendliness of programming
interfaces and to relieve the programmers from annoying, too formalistic and
too verbose style of writing queries/programs.
· Features of types
not covered by the above simpleminded type notion, such as mutability, collection
cardinality constraints, collection types (set, bag, sequence, etc.), type
names (for name type equivalence), methods/procedures/functions, and perhaps
others.
· Other properties of
the programming environment, such as interfaces, classes, abstract data types,
inheritance and multiple inheritance, dynamic object roles and dynamic
inheritance, modules, export and import lists, scoping rules for names
occurring in queries, etc.
Although the
literature contains explanations of some notions (e.g. multiple inheritance
[Card84], or abstract data types [Mitc88]),
such explanations suffer as a rule from two kinds of drawbacks:
· They assume a very
strict formal model which hardly meets practical situations;
· They isolate some
particular aspect from other aspects of the strong typing problem. In practice,
however, all aspects must be considered together and no such isolation of a
particular aspect is possible.
Such papers are
like islands in the sea of decisions that developers must take during the
development and implementation of a type system. As our experience has shown,
typing environments for object-oriented or XML-oriented databases and their
query languages present so many peculiarities that the ideas presented in these
papers are of little help and in most cases inapplicable.
Roles of a typing system are the following:
·
Compile-time type checking
of query operators, imperative constructs, procedures, functions, methods,
views and modules;
·
User-friendly, context
dependent reporting on type errors;
·
Resolving ambiguities with
automatic type coercions, ellipses, dereferences, literals and binding
irregular data structures;
·
Shifting type checks to run-time,
if it is impossible to do them during compile time;
·
Restoring a type checking
process after a type error, to discover more than one type error in one run;
·
Preparing information for
query optimization by properly decorating a query syntax tree. Without a typing
system major query optimization methods, such as query rewriting and using
indices, are much more difficult or impossible to develop.
In general, we must distinguish internal
and external type systems. The
internal type system reflects the behavior of the type checking mechanism,
while the external type system is used by the programmer. A static strong type
checking mechanism simulates run-time computations during compile time by
reflecting the run-time semantics with the precision that is available at the
compile time. The internal type system operates on signatures, i.e. internal
representation of types. Signatures are additionally associated with
attributes, such as mutability, cardinality, collection kind, type name,
multimedia, side effects, etc. For each query/program operator a decision table
is provided, which determines allowed combinations of signatures and
attributes, the resulting signature and its attributes, and additional actions.
Such a type mechanism will be described in detail in the next chapters.
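To give a first intuition of a decision table, below is a minimal sketch in Python for one operator only. It is not the mechanism described in the next chapters: the class Signature, the table PLUS_TABLE, the chosen attributes and the coercion rules are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical internal representation of a type with one attribute."""
    kind: str          # e.g. "integer", "real", "string"
    cardinality: str   # "1..1" for a single value, "0..*" for a collection


# Hypothetical decision table for the binary operator "+":
# (left kind, right kind) -> (result kind, additional action)
PLUS_TABLE = {
    ("integer", "integer"): ("integer", None),
    ("real", "real"): ("real", None),
    ("integer", "real"): ("real", "coerce left operand to real"),
    ("real", "integer"): ("real", "coerce right operand to real"),
    ("string", "string"): ("string", None),
}


def check_plus(left: Signature, right: Signature):
    """Return (result signature, action) or raise TypeError before execution."""
    if left.cardinality != "1..1" or right.cardinality != "1..1":
        raise TypeError("'+' expects single values, not collections")
    try:
        kind, action = PLUS_TABLE[(left.kind, right.kind)]
    except KeyError:
        raise TypeError(
            f"'+' is not defined for {left.kind} and {right.kind}"
        ) from None
    return Signature(kind, "1..1"), action
```

The checker works on signatures only, never on run-time values; the returned action (here an automatic coercion) is the kind of information that can later decorate the syntax tree.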
In the object-oriented literature the term polymorphism has two different meanings. The strong typing
community defines type polymorphism as the ability to write procedures that
accept parameters of different types (possibly infinitely many types) and
return an output of many types. Such a feature is necessary for generic
programming, i.e. programming with unknown types. For instance, one wants to
write a sorting procedure that sorts everything, independently of the type of
the collection element. Generic programming is difficult or impossible in
statically bound and strongly typed languages, hence the demand for new, more
flexible typing systems. The two best-known kinds of type polymorphism are inclusion polymorphism (equivalent to the
substitutability
principle) and parametric
polymorphism, where types or classes are parameterized by types. Parametric
polymorphism is introduced in such programming languages as SML and Quest;
however, the idea is actually advocated within a narrow academic circle, probably
with no chance of implementation in widely used programming languages. An
alternative to parametric polymorphism is linguistic
reflection, which means the possibility to write a program that generates a
program that can be immediately executed. Linguistic reflection is easy to
implement, especially in scripting languages, and it is fully universal concerning
generic programming. Unfortunately, it undermines strong typing. Reflection is
known from such facilities as dynamic SQL, ODBC, JDBC, DII of CORBA and others.
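The two routes to generic programming can be contrasted in a short Python sketch. The functions insertion_sort and make_sorter are hypothetical; the type annotations are meaningful only for an external static checker.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def insertion_sort(items: list[T], less: Callable[[T, T], bool]) -> list[T]:
    """Parametric polymorphism: one procedure for every element type T."""
    result: list[T] = []
    for item in items:
        pos = len(result)
        while pos > 0 and less(item, result[pos - 1]):
            pos -= 1
        result.insert(pos, item)
    return result


def make_sorter(field_name: str) -> Callable[[list[dict]], list[dict]]:
    """Linguistic reflection: generate program text, then execute it at once."""
    source = (
        "def sorter(rows):\n"
        f"    return sorted(rows, key=lambda r: r[{field_name!r}])\n"
    )
    namespace: dict = {}
    exec(source, namespace)   # the generated text escapes static type checking
    return namespace["sorter"]
```

In insertion_sort the relation between the element type and the comparison function is visible in the signature. In make_sorter the generated program exists only as a string until run time, which is exactly why reflection undermines strong typing.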
In the second meaning polymorphism means the possibility to create
different methods with the same name and signature. Each such method is
assigned to a specific class and the binding is dynamic: after
recognizing the class of an object during run time, the corresponding method is
executed. For instance, in an abstract class PersonClass there is a method Income,
perhaps with no body. In specialized classes EmployeeClass, StudentClass
and OldAgePensionierClass there is a
method Income, each with its own
(different) code. When the message Income
is directed to an object x, the
system recognizes to which class x belongs and then invokes the proper Income method.
In the second meaning the polymorphism is considered an important
facility for abstraction and code reuse. For instance, the method Income can be designed on the abstract PersonClass level and a programmer that
works on this level can use it without thinking about specialized classes. On a
lower abstraction level each specialized class must implement this method
according to the business logic of the class.
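The Income example can be sketched in Python as follows. The class names are shortened, and the attributes salary, scholarship and pension, the method is_wealthy, its threshold and the sample values are our assumptions.

```python
from abc import ABC, abstractmethod


class Person(ABC):
    @abstractmethod
    def income(self) -> float:
        """Declared at the abstract level, with no body."""

    def is_wealthy(self) -> bool:
        # Written once at the Person level; works for every specialization.
        return self.income() > 5000


class Employee(Person):
    def __init__(self, salary: float) -> None:
        self.salary = salary

    def income(self) -> float:
        return self.salary


class Student(Person):
    def __init__(self, scholarship: float) -> None:
        self.scholarship = scholarship

    def income(self) -> float:
        return self.scholarship


class OldAgePensioner(Person):
    def __init__(self, pension: float) -> None:
        self.pension = pension

    def income(self) -> float:
        return self.pension


people = [Employee(6000), Student(800), OldAgePensioner(2000)]
# The method to execute is chosen for each object at run time.
incomes = [p.income() for p in people]
```

The method is_wealthy calls income without knowing which specialized class a given object belongs to.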
Some authors confuse the above two meanings of polymorphism or
(hopelessly) try to unify them.
Dynamic object roles and dynamic inheritance
In the popular object-oriented literature we frequently
meet the statement “a student is a person”, emphasizing
the obvious fact that at a given moment in time a student is always a person.
Such a statement expresses static inheritance, that is, inheritance that is
independent of time (or inheritance which neglects time). However, from the
point of view of conceptual modeling, considering some time scale, it is much
better to say that “a person becomes a student”. Indeed, a
person takes the role of a student only for some time; after that time the
person is no longer a student. This is the case of dynamic inheritance, which is
dependent on time. Equivalently, we can say that a person for some time takes the
role “student”. At the same time, a person can take many other
roles.
The idea of dynamic object roles and dynamic
inheritance assumes that a real or abstract entity during its life can acquire
and lose many roles without changing identity. The roles appear during the life
of a given object, they can exist simultaneously, and they can disappear at any
moment. For example, a certain person can at the same time be a student, a
worker, a patient, a club member, etc., Fig.49. Similarly, a building can be an
office, a house, a magazine, etc.

Fig.49. Roles of an object
Dynamic object roles are useful both for conceptual modeling and for implementation. The concept could greatly facilitate such modeling tools as UML and could be an important paradigm for object databases. Note that we do not postulate replacing static inheritance by dynamic inheritance with roles. Static inheritance is useful in the majority of cases. Dynamic roles and dynamic inheritance can complement static inheritance in many important cases.
Dynamic object roles have had for several years the reputation of a notion on the brink of acceptance. There are many papers advocating the concept. On the other hand, many researchers consider applications of the concept not sufficiently wide to justify the extra complexity of conceptual modeling facilities. Moreover, the concept is neglected on the implementation side: as far as we know, none of the popular object-oriented programming languages or database systems introduces it explicitly. Some authors assume a tradeoff, where the role concept is the subject of special design patterns, applied both on the conceptual modeling side and on the implementation side. The disadvantage of design patterns is that this simple notion is substituted by a combination of other notions (e.g. aggregations), which actually means warping the original concept of a modeler, designer or implementor.
The role concept assumes that an object is associated with other objects (subobjects) that model its roles. Object-roles cannot exist without their parent object. Deleting an object causes deleting all of its roles. Roles can exist simultaneously and independently. A role can have its own additional attributes and methods. Two roles can contain attributes and methods with the same names, and this does not lead to conflict. This is a fundamental difference in comparison to the concept of multiple inheritance. Relationships (associations) between objects can connect not only objects with objects, but also objects with roles and roles with roles. For example, a relationship works_in connects an Employee role with a Company object. This makes the referential semantics clean in comparison to the traditional object models. Roles can be further specialized as subroles, sub-subroles, etc.; e.g. Club_Member can be specialized by a role Club_President.
The role concept requires introducing composite objects with a special structure, semantics and generic operations. In the following we describe the structure formally and present assumptions of a query/programming language supporting generic operations to process such structures. Our idea to deal with dynamic roles in a query language is based on the stack-based approach and is probably the only current approach which can naturally adopt the concept.
We assume that an object can contain many sub-objects called roles.
These sub-objects can be inserted and removed at run time. Roles have different
types. A role has its own attributes and behaviour. Identical names in two or more
roles of different types do not imply any semantic dependency between
corresponding properties. For example, a person can play simultaneously the
role of an employee of a research institute with the attribute Salary, and the role of an employee of a
service company with the attribute Salary
too. These two attributes exist at the same time, but except for the name
no other feature is shared, including types, semantics and business ontologies.
A role dynamically “imports” attributes (values) and behavior from
its super-roles, in particular, from its parent object. In Fig.50 we present an
example showing basic features of the store model with dynamic roles. The
following features are presented:
·
An
object (shown as a grey rectangle with round corners) has one main role (Person) and any number of specializing
roles (Employee and Student).
· Each role has its own name, which can be
used to bind the role from a program or a query. The presented objects can be
bound through name Person (each),
through name Employee (2nd and 3rd) and
through name Student (3rd and 4th).
Each binding returns the identifier of a proper role (or the identifiers of
proper roles in case of multi-valued bindings).
· Each role is encapsulated, i.e. its properties
are not seen from other roles unless it is explicitly stated by a special link
(shown as a double-line with the black diamond end). In particular, a role Employee imports all properties of its
parent role Person. For example, if
the second object is bound by name Person,
then the properties {Name Brown,
BirthYear 1975} are available;
however if the same object is bound by name Employee,
then the properties {Salary 2500, Job
analyst, Name Brown, BirthYear 1975}
are available.
· Each role is connected to its own class.
The connection is shown as a grey thick continuous arrow. Classes contain
invariant properties of corresponding roles, in particular, names (first
section), attributes and their types (second section; attribute types are not
shown) and methods (third section).

Fig.50. Objects and roles as data structures
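The features listed above can be approximated by a small data structure. The following Python sketch is only an illustration and not the store model defined formally later: the names Role, RoleObject, bind and visible_properties are hypothetical, and only the data of the object Brown come from the text (the data of Smith are invented).

```python
class Role:
    """A role (sub-object) with its own name and its own properties."""

    def __init__(self, name, properties, parent=None):
        self.name = name
        self.properties = dict(properties)
        self.parent = parent   # the special link to the parent role

    def visible_properties(self):
        """Own properties plus those imported along the chain of parent roles."""
        inherited = self.parent.visible_properties() if self.parent else {}
        return {**inherited, **self.properties}


class RoleObject:
    """An object: one main role and any number of specializing roles."""

    def __init__(self, main_name, properties):
        self.main = Role(main_name, properties)
        self.roles = [self.main]

    def add_role(self, name, properties):
        role = Role(name, properties, parent=self.main)
        self.roles.append(role)
        return role


def bind(store, name):
    """Binding a name returns roles, not entire objects."""
    return [role for obj in store for role in obj.roles if role.name == name]


brown = RoleObject("Person", {"Name": "Brown", "BirthYear": 1975})
brown.add_role("Employee", {"Salary": 2500, "Job": "analyst"})

# Invented values for a second object with two specializing roles.
smith = RoleObject("Person", {"Name": "Smith", "BirthYear": 1980})
smith.add_role("Employee", {"Salary": 3000, "Job": "clerk"})
smith.add_role("Student", {"StudentNo": 1234})

store = [brown, smith]
```

Binding the name Person yields only the main roles with Name and BirthYear; binding Employee yields roles that additionally see Salary and Job; a Student role does not see the properties of the Employee role of the same object.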
Links can join not only objects with
objects, but also objects with roles and roles with roles. For example, a link works_in joins an object Company with a role Employee, Fig.50. A similar link studies_at joins a role Student
with an object School. If such a link
leads to Employee, it indirectly
leads to Person, because the role Employee imports the properties of its
parent object Person. However, after
accessing the object via such a link, the properties of the role Student remain invisible. It follows that a
role identifier must be different from the identifier of the corresponding
object. Fig. 50 also shows a link is_a_customer_of
between objects Person and Company. Accessing the object Person through this link implies that
any role of the object Person remains
invisible.
The possibility to create links between roles
is a new quality for analysis and design
methodologies and notations (such as UML). Links must lead to parts of
objects, not to entire objects. To model this situation the methodologies
suggest using aggregation/composition. Such an approach implicitly assumes that
e.g. an Employee is a part of a Person on the same principle as an Engine is a part of a Car. Although the approach achieves the
goal (e.g. we can connect the relationship works_in
directly to the Employee sub-object
of Person), it obviously misuses the
concept of aggregation, which normally is provided for modeling
“whole-part” situations.
Note that in the case presented in Fig.50
static inheritance known from UML is inessential (hence dashed lines), because
it is fully substituted by dynamic inheritance. For instance, Student objects
are members of the StudentClass (hence inherit all properties of the class),
but also dynamically inherit from Person objects, which in turn are
members of the PersonClass. Hence, during processing of a Student object the
inheritance mechanism ensures access to properties of the corresponding Person
object, as well as properties of the StudentClass and PersonClass.
Below
we list several features, which make the concept of dynamic roles different in
comparison to the classical object-oriented concepts.
·
Multiple inheritance: Because roles are encapsulated there is
no name conflict even if the super classes would have different properties with
the same name. There is no need for EmployeeStudentClass,
which inherits both from EmployeeClass
and StudentClass.
·
Repeating inheritance: An object can have two or more roles with
the same name; for instance, Brown
can be an employee in two companies, with different Salary and Job. Such a
feature cannot be expressed by the traditional inheritance or multi-inheritance
concepts.
·
Multiple-aspect inheritance: A class can be specialized according to
many aspects. For example, a vehicle can be specialized according to
environment (ground, water, air) and/or according to a drive (horse, motor,
jet, etc.). Some modeling tools (e.g. UML) cover this feature, but it is
neglected in object-oriented programming and database models. One-aspect
inheritance causes problems in conceptual modeling and usually requires
multiple inheritance. Roles avoid problems with this feature.
·
Variants (unions): This feature, introduced e.g. in C++, CORBA
and ODMG object models, leads to a lot of semantic and implementation problems.
Some professionals argue that it is unnecessary, as it could be substituted by
specialized classes. However, if a given class can possess many properties with
variants, then modeling this situation by specialized classes leads to the
combinatorial explosion of classes (e.g. for 5 properties with binary variants
- 32 specialized classes). Dynamic object roles avoid this problem. Each branch
of a variant can be considered a role of an object.
·
Object migration: Roles may appear and disappear at run
time without changing identifiers of other roles. In terms of classical object
models it means that an object can change its classes without changing its identity.
This feature can hardly be available in classical object models, especially in
models where binding objects is static.
·
Referential consistency: In the presented model relationships are
connected to roles, not to the entire objects; thus, e.g. it is impossible to
refer to Salary and Job of Smith when one navigates to its object from the object School. In classical object-oriented
models this consistency is enforced by strong typing, but is problematic if the
typing is weak.
·
Overriding: Properties of a super-role can be overridden by
properties of a sub-role. The possibilities of overriding are extended in
comparison to the classical object models: not only methods but also attributes
(with values) can be overridden.
·
Binding: An object can be bound by the name of any of its
roles, but the binding returns the identifier of a role rather than the
identifier of the object. By definition, the binding is dynamic, because in a
general case during compilation it is impossible to decide that a particular object
has a role with a given name.
·
Typing: A role must be associated with a name, because this is
the only feature allowing the programmer to distinguish a role from another
one. Hence, the role name is a property of its type (unlike classical
programming languages, where a type usually does not determine the name of a
corresponding object/variable). Because an object is seen through the names of
its roles, it has as many types as it has different names for roles.
·
Subtyping: It can be defined as usual; for instance, the Employee type is defined with the use of
the Person type. However, there is no
sense to introduce the StudentEmployee
type. Due to encapsulated roles, properties of a Student object and properties of an Employee object are not mixed up within a single structure.
·
Substitutability: Since names of roles are determined
within types, it makes little sense to say, e.g., that the Employee type can be used in all places where the Person type can be used. Thus the
substitutability principle must at least be reformulated.
·
Temporal properties: Dynamic object roles are enormously
useful for temporal databases, as roles can represent any past facts concerning
objects, e.g. the employment history through many Employee roles within one Person
object. Without roles, historical objects present a hard design problem,
especially if one wants to avoid redundancy, preserve reuse of unchanged
properties through standard inheritance, and avoid changing objects’
identifiers.
·
Aspects of objects and heterogeneous collections. A big problem with classical database object models is
that an object belongs to at most one collection. This contradicts
both multiple inheritance and substitutability. For instance, we can include a
StudentEmployee object in the extent Students, but we cannot include it at
the same time in the extent Employees (and vice versa). This violates
substitutability and leads to inconsistent processing. Dynamic roles have a
natural ability to model heterogeneous collections: an object is automatically
included in as many collections as there are types of roles it contains.
·
Aspect-Oriented Programming. AOP makes it possible to encapsulate
cross-cutting (tangled) concerns within separate modules. For example, such concerns
are: history of changes, security and privacy rules, visualization,
synchronization, etc. As follows from the previous feature, dynamic object
roles have conceptual similarities with AOP or can be considered as a technical
facility supporting AOP.
·
Metadata support. Metadata are a particular case of
crosscutting concerns. Meta-information, such as authorship, validity, legal
status, ownership, coding, etc., can be implemented as dynamic roles of
information objects.
As follows from the above, dynamic object roles have
the potential to create new powerful qualities, which are difficult or
impossible to achieve in classical object models. More about our approach to
dynamic object roles is presented in the sections devoted to the AS2 store
model and corresponding properties of SBQL.
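Several of the listed features (object migration, overriding of attributes, binding by role name, heterogeneous collections) can be illustrated by a minimal sketch. It is only an illustration in Python and not the AS2 store model or SBQL; the names Role, RoleObject, add_role, drop_role and bind, as well as the sample data, are our assumptions introduced for this example.

```python
import itertools

_ids = itertools.count(1)  # hypothetical source of unique identifiers


class Role:
    """A role with its own identifier, name and properties."""

    def __init__(self, name, super_role=None, **props):
        self.oid = next(_ids)
        self.name = name
        self.super_role = super_role  # the owning object or another role
        self.props = dict(props)

    def get(self, prop):
        # Look in this role first, then up the super-role chain, so an
        # attribute of a sub-role overrides the one of its super-role.
        if prop in self.props:
            return self.props[prop]
        if self.super_role is not None:
            return self.super_role.get(prop)
        raise KeyError(prop)


class RoleObject(Role):
    """An object that owns dynamically attached roles."""

    def __init__(self, name, **props):
        super().__init__(name, None, **props)
        self.roles = []

    def add_role(self, name, **props):
        role = Role(name, super_role=self, **props)
        self.roles.append(role)
        return role

    def drop_role(self, role):
        # Removing a role changes no other identifier.
        self.roles.remove(role)


def bind(store, name):
    """Dynamic binding: return every object or role carrying this name."""
    found = []
    for obj in store:
        if obj.name == name:
            found.append(obj)
        found.extend(r for r in obj.roles if r.name == name)
    return found


smith = RoleObject("Person", surname="Smith", phone="111")
brown = RoleObject("Person", surname="Brown", phone="222")
store = [smith, brown]

job1 = smith.add_role("Employee", salary=2500, phone="999")  # overrides phone
job2 = smith.add_role("Employee", salary=1200)               # inherits phone
study = smith.add_role("Student", school="School A")

employees = bind(store, "Employee")  # roles, not Person objects
students = bind(store, "Student")
```

In this sketch binding the name Employee returns the two Employee roles of Smith rather than the Person object itself, and an Employee role gives no access to the properties of the Student role. Calling smith.drop_role(study) removes the Student role while smith.oid stays the same.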
Encapsulation and information hiding
Encapsulation denotes grouping components and details within some box
and then making it possible to manipulate this box as a whole, without considering
its internals. Encapsulation is usually associated with information hiding: the
content, structure and implementation of the box are hidden (it is a
“black box”). In the following we use the term encapsulation to denote the combination of both concepts. Encapsulation is
not an invention of object-orientedness. It is a general principle of software
engineering that was originally formulated by David Parnas in
The encapsulation principle states that the
programmer should know only as much about a software entity as is necessary to
use it effectively.
Everything that can be hidden
from the programmer should be hidden.
This is desirable both to avoid burdening the programmer with
inessential details and to reduce
potential errors in the software. The specification of a software entity should be
separated from its implementation. This separation is necessary to isolate local
changes in the implementation of the given entity from the rest of the software:
changes in implementation should not influence the
external semantics of the entity. (Unfortunately, this
assumption is sometimes difficult to assure and thus does not always hold. In practice, each
change in implementation requires testing of the whole software because of many
undocumented dependencies.) A programmer uses encapsulated entities on the
“black box” principle, usually without the possibility of seeing their
internal construction and without the possibility of doing anything to them except
what is allowed by their externally specified operations.
Encapsulation and information hiding are the motivation for important
concepts in programming and databases, such as procedure, function
(functional procedure), object, class, method, module, procedure and class library, abstract data type (ADT), application programming interface (API),
database view and perhaps others.
Encapsulation is also the basis for a database query language, which can be
considered an abstract interface to data that hides many details of data implementation
and organization.
In object-oriented models encapsulation is understood in two ways, as
follows:
·
Orthodox
encapsulation (known from Smalltalk, CORBA and ADTs): externally, objects are seen
and processed only through methods/operations. Other features of an object (its state), in
particular all its attributes, are hidden.
·
Orthogonal
encapsulation (known from Modula-2, C++, Eiffel and Java): any property of an object,
in particular its attributes and methods, can be private (hidden from external
access) or public (available to external program entities). Technically, this
feature is introduced in different ways: as an export
list (Modula-2, Eiffel), as a special language construct known as an interface (Java), or as special keywords
attached to class/type properties, such as private
and public (C++). C++ also has
notions that are a compromise between private and public, such as protected and friend class.
Orthodox encapsulation is more restrictive, as it disallows generic
operations on an object state, such as direct bindings to attributes and
updates of attributes. Some professionals consider these constraints on the access to object
properties a positive feature of
object-orientedness. In their rhetoric, more constrained access to object
properties reduces the possibility of errors in software. Moreover, such
restrictive encapsulation implies a higher level of abstraction, as the implementation
of an object (i.e. its attributes and their types) is totally hidden from
external access. Hence it is argued that orthodox encapsulation supports
the safety and quality of the produced software.
Because some access to attributes may be necessary anyway, the advocates
of orthodox encapsulation postulate that each attribute attr that is to be available for public use should be externally
substituted by two methods: get_attr
(a so-called getter), to read the
attribute value, and set_attr
(a so-called setter), to change the
attribute value; the latter method has a parameter with the new attribute value. In
this way the class creator ensures full control over external access to the
components of an object state.
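The getter/setter convention described above can be sketched as follows. This is a minimal Python illustration under our own assumptions: the class Employee, the attribute salary and the validation rule are hypothetical, and Python marks hidden attributes only by the leading-underscore convention rather than enforcing it.

```python
class Employee:
    """Orthodox style: the state is reachable only through methods."""

    def __init__(self, salary):
        self._salary = salary  # leading underscore: hidden by convention

    def get_salary(self):
        # Getter: the only way to read the attribute from outside.
        return self._salary

    def set_salary(self, new_salary):
        # Setter: the class author controls every update, e.g. validates it.
        if new_salary < 0:
            raise ValueError("salary must not be negative")
        self._salary = new_salary


emp = Employee(2500)
emp.set_salary(emp.get_salary() + 100)  # a raise goes through both methods
```

Even a simple increment requires one call of the getter and one call of the setter, because no generic operation can be applied to the attribute itself.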
We claim that the arguments for orthodox encapsulation are only
apparently valid. In our opinion they are based on wishful thinking and are
technically inconsistent with many properties that we require from object
models, especially database object models. We present the following arguments
against orthodox encapsulation:
·
Inconsistency with
the principle of orthogonality of bulk types: Let the attribute jobs be a collection of all jobs of an
employee. In this case it is not enough to have two methods get_jobs and set_jobs, because we have to perform operations on every single
job value, not only on the collection of values treated as a whole. For instance,
the programmer may need to read or update a particular job value. It is frequently argued that
such situations call for the concept of an iterator, i.e. generic methods such as get_first_value, get_next_value, check_existence_of_next_element,
delete_current, insert_after_current, etc. Semantic consistency of these operators
requires that some of these methods return a reference to a value rather
than the value itself. However, references violate orthodox
encapsulation, because its essence is to avoid generic operations on
references to attributes. The above example shows that such an assumption is very
inconvenient for processing nested collections.
· Inconsistency with the principle of the semantic
relativity of objects. The principle claims that attributes within an object are objects too,
perhaps with their own classes and methods. (This principle is relevant only to
objects understood as data structures.) Orthodox encapsulation requires serving
all objects by methods only, but this would be impossible for attributes that are
objects, because there would be no way to identify them in order to send a
message to them (i.e. to call their methods).
·
Inconsistency with null values
and/or variants. If an
attribute attr can take a null value or can be present or absent in a given
variant, then a getter and a setter are insufficient. At least two further methods
are necessary, test_attr_if_null and set_attr_to_null.
· Inconsistency with the idea of query languages. Query languages require direct
binding both to objects and to their attributes. This inconsistency has given rise to
the view that the idea of query languages is inconsistent with
object-orientedness (see e.g. [Darw95, Date98]). We consider such a view nonsense
that comes directly from treating orthodox encapsulation as a principle. There
are many query languages addressing object-oriented models, in particular
SBQL introduced on these pages, that follow the idea of encapsulation.
· Inconsistency with the idea of database schema. A programmer who works on a
database application must see and use object structures in the form of
objects, attributes and links to other objects. Any artificial warping of this
programmer view will result in lower productivity, lower legibility of
programs, more errors and more difficulties with program maintenance.
· Conceptual limitations and inconsistencies. How should setters
and getters process pointer links (relationships, associations) that
are stored inside objects? How should they process special attributes such as BLOBs
(multimedia)? In the latter case both getters and setters must work on references
to long values rather than on the values themselves. How can one write a generic
procedure that processes a data type that can be assigned to many
attributes in many objects of different classes, for instance, date? Provided
there is some generic procedure that checks and converts dates, the natural way
is to use the call-by-reference
parameter passing method. The idea of getters and setters excludes call-by-reference applied to attributes
of objects.
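The first of the above arguments, concerning bulk types, can be sketched in Python as follows. The class Employee and the job values are hypothetical, and we assume that the getter returns a copy of the collection, since handing out the internal list would amount to exposing a reference to the hidden state.

```python
class Employee:
    """Orthodox style applied to a collection-valued attribute."""

    def __init__(self, jobs):
        self._jobs = list(jobs)

    def get_jobs(self):
        # Return a copy; returning the internal list itself would expose
        # a reference to the hidden state.
        return list(self._jobs)

    def set_jobs(self, jobs):
        self._jobs = list(jobs)


emp = Employee(["clerk", "analyst"])

# Changing the copy returned by the getter has no effect on the object.
emp.get_jobs()[1] = "designer"
unchanged = emp.get_jobs()

# Updating one element means reading and rewriting the whole collection.
jobs = emp.get_jobs()
jobs[1] = "designer"
emp.set_jobs(jobs)
```

With only get_jobs and set_jobs, a change of a single job value becomes a read and a rewrite of the entire collection, which illustrates why iterators returning references are usually requested in such cases.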
As follows from the above arguments, orthodox encapsulation leads to
serious limitations and inconsistencies, especially in object database models.
Sometimes such an understanding of encapsulation results in the conclusion that the
idea of encapsulation is wrong and should be forgotten. Such conclusions are at
least strange, taking into account the fact that encapsulation (in different
forms) is the basic principle of software engineering (e.g. modules) and, even
more, it is the basic principle of any
engineering (consider your encapsulated computer, TV set or headphones).
The only reasonable alternative is orthogonal encapsulation, where
each property of a class (of an object) can be declared as public or private
(perhaps with some middle options such as protected).
Orthogonal encapsulation poses no problems for defining query languages
that address powerful object-oriented database models. Attributes that are public
can be served by generic operations such as binding, dereferencing,
arithmetic and string comparisons and operators, etc. Attributes can be the
subject of updating operations such as assignment and deletion. This feature
requires that binding to an attribute returns its reference rather than its
value. A reference can be used as an argument of the mentioned updating
operations and can also be passed as a call-by-reference parameter to
procedures, functions and methods.
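The role of references can be sketched in Python as follows. This is only an illustration and not the SBQL binding mechanism: the class Ref, the functions bind and normalize_date, and the date format are our assumptions for the example.

```python
class Ref:
    """A reference to one attribute of one object."""

    def __init__(self, owner, attr):
        self.owner = owner
        self.attr = attr

    def deref(self):
        return getattr(self.owner, self.attr)

    def assign(self, value):
        setattr(self.owner, self.attr, value)

    def delete(self):
        delattr(self.owner, self.attr)


class Employee:
    def __init__(self, name, hired):
        self.name = name    # public attribute
        self.hired = hired  # public attribute holding a date as text


def bind(obj, attr):
    """Binding returns a reference to the attribute, not its value."""
    if not hasattr(obj, attr):
        raise NameError(attr)
    return Ref(obj, attr)


def normalize_date(ref):
    """Generic procedure: takes any date attribute by reference."""
    day, month, year = ref.deref().split(".")
    ref.assign(f"{year}-{month.zfill(2)}-{day.zfill(2)}")


emp = Employee("Smith", "7.3.2005")
normalize_date(bind(emp, "hired"))        # call-by-reference on an attribute
bind(emp, "name").assign("Smith-Jones")   # assignment through a reference
```

The same normalize_date procedure can be applied to a date attribute of any object of any class, because it operates on a reference and knows nothing about the class of the owner.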
Orthogonal encapsulation does not contradict getters and setters. If
for any reason the programmer of a class wants to hide its attribute attr, he/she declares it as
private and then writes two public methods: get_attr
and set_attr. This can be done with
any attribute of the class, including attributes that can store null values,
pointer-valued attributes, etc. However, due to generic operations acting on
attributes, getters and setters will be unnecessary in the majority of cases.
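A class that mixes public and private attributes can be sketched in Python as follows. The class Employee and its attributes are hypothetical. Note that Python does not enforce privacy: the double-underscore prefix only triggers name mangling, which we use here as a stand-in for a private declaration.

```python
class Employee:
    def __init__(self, name, salary):
        self.name = name        # public: served by generic operations
        self.__salary = salary  # private: name-mangled by Python

    def get_salary(self):
        return self.__salary

    def set_salary(self, new_salary):
        if new_salary < 0:
            raise ValueError("salary must not be negative")
        self.__salary = new_salary


emp = Employee("Smith", 2500)
emp.name = emp.name.upper()  # direct read and assignment, no getter/setter
emp.set_salary(2600)         # the hidden attribute goes through its methods
```

Only the attribute that the class author decided to hide needs a getter and a setter; the public attribute is read and updated directly.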
The concept of module (1972, D.Parnas) is motivated by several software
engineering factors that support software manufacturing and maintenance. They
are the following:
·
Independent development (isolation):
modules make it possible for many programmers to develop the software with the
minimal communication overhead that is necessary to synchronize their work.
Each module has its own namespace that does not collide with the namespaces of
other modules, even if the same names are used for different purposes.
·
Independent compilation:
modules can be compiled independently from other modules. Compiled modules are
joined into an executable program by a special utility (linker).
·
Program store units: modules are independently
stored, administered and maintained by software maintenance and administration
staff.
·
Encapsulation and
information hiding: modules encapsulate and hide the details of the
software units. In this way the complexity of the software is reduced through
subdividing among many software development phases.
·
Comprehensibility:
modules support conceptual modeling of the software through units that are
understandable from the point of view of their meaning, semantics and business
logic.
·
Abstraction: modules allow for
designing and implementing software on several abstraction levels, due to
encapsulation and information hiding.
·
Decomposition: modules decompose
a program into many software units that can be designed and implemented
independently.
·
Safety: modules reduce
the possibility of critical bugs in the software through well defined
interfaces that determine exported functions (through export lists) and well
defined back-end specifications that define possible side effects (through
import lists). Exported and imported features can be strongly (statically)
typechecked.
·
Reuse: modules present
ready-to-use software units that can be reused in many configurations for
different applications.
·
Changeability: it is possible to substitute
a module by another module, provided that its specification and functionality are
compatible. The new module may differ in other aspects
(e.g. performance).
Almost all programming languages allow some form of modules, but not
all the above qualities are supported. For instance, in C/C++ modules are
equivalent to source code files, perhaps with different roles (such as program and
header files). Modula-2 is considered
a mature realization of the idea of modules, in which almost all the above
qualities are supported. In Modula-2 modules can be nested (unlike C/C++). Each
module has a specification and an implementation. The implementation is assumed to be
unavailable externally; it is a “private” part with hidden
details. The specification consists of an export list (currently known
from Java, ODMG, CORBA, etc. as an interface)
and an import list that specifies
properties of other modules that can be used in the given module. (Unfortunately,
the idea of the import list seems to have been forgotten.)
Programmers can use the exported features of a module, but the internals of the
module are hidden from external programmers. External programmers are informed
about the side effects of a module by inspecting its import list. The
programmer of the module is constrained to use only those properties of other
modules that are specified on its import list.
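The interplay of export and import lists can be sketched as follows. This is a simulation in Python under our own assumptions and does not reproduce Modula-2 syntax or semantics: the class Module, its lookup method, and the sample modules payroll, reports and audit are hypothetical.

```python
class Module:
    """A module with a hidden implementation, an export list and an import list."""

    def __init__(self, name, implementation, exports, imports=()):
        self.name = name
        self._impl = dict(implementation)  # all entities, hidden by default
        self.exports = frozenset(exports)  # names visible to other modules
        self.imports = frozenset(imports)  # (module, name) pairs it may use

    def lookup(self, client, name):
        """Resolve `name` of this module on behalf of the module `client`."""
        if name not in self.exports:
            raise PermissionError(f"{self.name}.{name} is not exported")
        if (self.name, name) not in client.imports:
            raise PermissionError(
                f"{client.name} does not import {self.name}.{name}"
            )
        return self._impl[name]


def _round_cents(x):
    # Internal helper of the payroll module, never exported.
    return round(x, 2)


def net_salary(gross):
    # Hypothetical flat 20% deduction, used only for illustration.
    return _round_cents(gross * 0.8)


payroll = Module(
    "payroll",
    {"net_salary": net_salary, "_round_cents": _round_cents},
    exports={"net_salary"},
)
reports = Module("reports", {}, exports=(), imports={("payroll", "net_salary")})
audit = Module("audit", {}, exports=())  # empty import list

result = payroll.lookup(reports, "net_salary")(1000)
```

Here reports may call net_salary because the name is both exported by payroll and present on the import list of reports. The same lookup made on behalf of audit fails, and no module can reach _round_cents, which is not exported.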
Considering object databases, the concept of module should combine two
aspects. On the one hand, a module should be an encapsulated piece of source
code, just as in Modula-2 or similar modular languages. On the other hand, a
module is a database object that encapsulates all the programs and data that
are devoted to some application goal. Note that in databases the situation is
different from that in programming languages because of late binding. Many features
(e.g. views, stored procedures) can be dynamically inserted while a
database server is running, hence the classical subdivision into compilation, linking and
execution is no longer valid. In the ODRA system a module is a specialized object
that stores and encapsulates different database or program entities, including
(compiled) classes and their methods, (compiled) procedures, views, proper
objects, etc. Source code is stored independently as operating system files,
but is connected with the stored modules by a naming convention.
Last modified: January 10, 2008