©Copyright by Kazimierz Subieta.

Object-Oriented Concepts

by Kazimierz Subieta



Understanding of object-oriented concepts differs significantly among authors, languages and systems. There is no common view, in particular, on the definition and meaning of such basic concepts as object, attribute, class, type, collection, association, etc. We cannot present and discuss all such views. We will try to present the variety of views and definitions; eventually, however, we would like to promote a reasonable view that we will follow when we define the semantics of a query language.

In our presentation of object-oriented concepts we sometimes criticize imprecise, superficial or inconsistent definitions. This concerns such fundamental concepts as class, type, ADT, interface, collection, extent, encapsulation, etc. We present our arguments against some definitions and propose definitions that have no obvious pitfalls or disadvantages.


Objects

In the object-oriented literature "object" has three different meanings:

·       As a real, abstract or imaginable real-world object having its own boundaries and state. For instance, a car, a person, an ant, today's weather, John's travel yesterday, Zeus or an angel can be considered objects. This point of view on objects is important for philosophers, businessmen, software engineering methodologists and database designers, but it is inessential for the semantic definition of query and programming languages.

·       As an abstract imaginable entity considered in software engineering methodologies, languages and tools. Usually there are some associations between real-world objects and abstract imaginable objects, but this is not obligatory. Such objects abstract from many features of real objects, focusing on the features that are important for a particular business goal or information system. Such objects are "used" mainly in human reasoning processes, but they are not defined precisely enough to be the foundation of the semantic definition of query and programming languages. An example is the objects considered in UML object diagrams - their features are sufficient for thinking, but not sufficient for developing a formal semantics.

·       As a data structure supported by a programming language, a query language or another software tool. It is desirable that such data structures are 1:1 compatible with objects understood as abstract imaginable entities or even with real world objects. However this is not a requirement.

In SBA we are interested only in objects as data structures, similar to "variables" in classical programming languages. Actually, we do not strive to make a conceptual distinction between a "variable" and an "object". Some professionals make this distinction by assuming that objects must be members of classes, whereas variables need not be. This introduces an additional constraint on the object concept that does not always hold. We assume that there can be objects that are not members of classes. Thus we consider the distinction between objects and variables made on this ground to be superficial.

Objects understood as data structures need not be counterparts of real-world material or abstract objects. Data structures are the subject of various design processes in which modeling abstract or real objects is only one of the criteria. Other criteria, e.g. performance, scalability or maintainability, might be more important and might significantly distort the original structure of objects. Moreover, the programmer can assign to an object a state that is essentially different from the state of the corresponding real-world object. For instance, the programmer can assign to an object "car" the entire history of the car that is interesting from the point of view of some decision processes, whereas a real-world object "car" is represented only by its current state.

Usually we must assume that objects as data structures are incompatible across different languages or tools. For instance, Java objects, C++ objects and CORBA objects differ in conceptual and technical details, thus there is no direct portability between them.

From the programming point of view it is important that each object has clearly determined boundaries and structure. Thus, an object can be manipulated as a whole, e.g. created or deleted. The structure of an object is usually determined by its type. However, we do not associate a type with an object from the very beginning, because there are several object-oriented languages and systems that have no types. Of course, types are very important, especially for static (compile-time) type checking of queries and programs addressing objects. Note that clear boundaries of objects can be considered a major factor distinguishing object-oriented systems from others. In particular, in relational databases objects are dispersed among many tables and their boundaries are not supported by the system, thus objects cannot be manipulated as wholes. Similar remarks concern the Resource Description Framework (RDF) by W3C, which is fact-oriented rather than object-oriented.

 

Object Identity, Identifier, Reference

An object has its identity, i.e. it is recognizable independently of its current state. For material objects, identity is a philosophical category related to humans’ understanding of the real world. This point of view, however, is not interesting for objects understood as data structures. In this case objects simply have object identifiers (OIDs) that are considered unique, at least within some programming or database environment. OIDs are usually assigned to objects automatically, thus they have no meaning in the real world. However, there are object-oriented systems where an OID bears some business information too. It is possible that the same object has more than one OID, but such a feature is error prone, thus not recommended.

Object identifiers in some contexts are also called references. The possibility of obtaining a reference to an object is crucial for many features of the corresponding query or programming language. For instance, an object reference is necessary as an l-value in updating statements, as an argument of a delete statement, as a parameter of a procedure transmitted through the call-by-reference parameter passing method, etc. The programmer almost never uses a reference explicitly. It is the result produced by a query or programming language mechanism known as binding, or by other mechanisms. A reference can also be used as a pointer value, i.e. a link leading from one object to another. A pointer is understood as an object identifier stored as a value within another object. An object identifier is immutable, i.e. it cannot be changed by any programming operation. More precisely, changing an object identifier is equivalent to deleting the object and simultaneously creating a new object with the same state.

There is an important intention behind object identifiers concerning the performance of the programming (or database) system. If one knows an object identifier, then access to the corresponding object should be very fast. This property of object identifiers is crucial e.g. for using them as non-key values within indices. Very frequently object identifiers are simply machine addresses (disk addresses, in particular). Such a solution, however, has disadvantages concerning the maintenance of the storage space when objects can grow, shrink and move. Some authors postulate using symbolic object identifiers (having no association with machine addresses), but this solution also has disadvantages (for instance, very large hash tables). Several tradeoffs between these two extremes are known in the literature.
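
To make the tradeoff concrete, here is a minimal sketch in Java (the class and method names are ours, not part of SBA): symbolic OIDs are resolved through an indirection table, so objects can be relocated without invalidating their identifiers, at the cost of one extra lookup; using physical addresses as OIDs would avoid the lookup but break when objects move.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical store: symbolic OIDs are resolved through an indirection table,
    // so an object may be moved (its location changed) without changing its OID.
    final class SymbolicOidStore {
        private final Map<Long, byte[]> locationOf = new HashMap<>(); // OID -> current storage
        private long nextOid = 1;

        long insert(byte[] objectBytes) {          // returns a new, immutable OID
            long oid = nextOid++;
            locationOf.put(oid, objectBytes);
            return oid;
        }

        byte[] access(long oid) {                  // fast, hash-based access by OID
            return locationOf.get(oid);
        }

        void relocate(long oid, byte[] newCopy) {  // the object moves; its OID stays valid
            locationOf.put(oid, newCopy);
        }
    }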

 

Object Name

Objects have names that are assigned by software designers, database designers or programmers and are used by programmers to identify objects within the programming environment. Names usually bear some external-world semantics, for instance “customer”, “car” or “account”, but meaningless names, such as “x” or “J23”, are also possible. Similarly to the names of material objects, object names need not be unique even in the same environment. For instance, there could be many objects named “customer”. In typical object-oriented programming languages, however, object names within the same environment must be unique. A set of objects with the same name conceptually overlaps with the notion of a collection, or can even be considered an equivalent of a collection. Allowing the same name for many objects is also a property of XML.
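
A minimal sketch in Java of how non-unique names could be handled (the names NameIndex, register and bind are illustrative, not SBA constructs): binding a name yields all references carrying that name, so many objects named "customer" are naturally processed as a collection.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical name index: one external name may denote many objects,
    // so binding a name yields a collection of references (OIDs).
    final class NameIndex {
        private final Map<String, List<Long>> byName = new HashMap<>();

        void register(String name, long oid) {
            byName.computeIfAbsent(name, n -> new ArrayList<>()).add(oid);
        }

        List<Long> bind(String name) {             // e.g. bind("customer") -> [101, 102, 103]
            return byName.getOrDefault(name, List.of());
        }
    }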

Note that in many programming languages object names are second-class citizens, i.e. they are properties of the source program code rather than of the run-time environment. However, the second-class citizenship of object names is irrelevant for conceptual modeling – it can be considered a purely optimization issue. In object-oriented databases one must assume first-class citizenship of object names, because objects exist independently of any programs.

As with material objects, objects may have several names (known as synonyms). Usually programming languages and programming discipline avoid such freedom. However, the possibility of assigning many names to an object opens new possibilities for conceptual modeling; for instance, the same object can simultaneously be named “person”, “employee”, “student” and “patient”. The disciplined version of this possibility treats such names not as synonyms, but as roles of an object. The same object is perceived and processed differently depending on the role name used by the programmer.

 

Attributes

Objects are usually containers of named values known as attributes. Values of attributes form an object state. Attributes represent some properties of objects. There are several kinds of attributes (not always disjoint) that occur in different languages and systems, in particular the following ones:

·      Atomic attribute that is characterized by a single atomic value of some type. For instance, the value is “Doe” for the attribute named NAME of an object named PERSON.

·      Complex attribute that consists of several atomic or complex sub-attributes.

·      Optional attribute that may or may not occur in a particular object. Optional attributes have some conceptual connotation with null values, known from relational databases.

·      Repeating attribute, i.e. an attribute that may possess many values, in many cases with an unpredictable number of occurrences.

·      Binary large (or multimedia) attribute, which is an atomic attribute whose very large value requires special treatment during processing.

·      Pointer attribute that contains as a value an OID of some object.

Some combinations of these kinds are possible, for instance a repeating complex attribute. In a typed system each attribute value has a determined type.

There are also some other attribute kinds that are actually some database or programming abstractions, in particular:

·       Default attribute: it is taken for the given object when a corresponding optional attribute is absent.

·       Derived attribute, whose value is calculated on the fly when it is accessed. A derived attribute is equivalent to a method attached to the corresponding object.

·       Class attribute: an attribute that is stored within a class of an object and is the same for all the objects of this class.

·       Procedural attribute: an attribute whose value is some executable code.

·       Dictionary attribute: an attribute whose value is a reference to some predefined set of values known as a dictionary. A dictionary may contain atomic or complex values. If a dictionary is predefined and cannot be updated, then the attribute type is known as an enumeration. A dictionary can also be updatable.

This list of possible kinds of attributes is probably not complete. Such a classification of attributes may have some value for explaining the conceptual structure of objects. However, if we treat an object as a data structure, we take the position that the distinction between objects and attributes is superficial, unnecessary and leads to extra complexity of specification, implementation and documentation. The distinction was made in the ODMG standard, where attributes were considered literals having no identities. Such a point of view is acceptable when we consider objects on the conceptual level, but it is nonsense when we treat objects as data structures that have to be updated.

It is more reasonable to consider attributes as objects nested in other objects. The semantic relativity of objects calls for the same syntax, semantics and pragmatics of language constructs at all object nesting levels. When we consider conceptual objects, updating them is usually not a relevant issue. This point of view is unacceptable for objects treated as data structures, because in this case updating of any of their atomic or complex parts is inevitable. Hence the total internal identification principle, which calls for an internal identifier for each, even the smallest atomic, part of an object. An internal identifier is necessary as a reference, which is the heart of updating constructs. Taking both these claims together, the best and most universal assumption is that a complex object consists of objects, which may consist of objects, and so on. A consequence of this assumption is the existence of objects that have no attributes, but only an atomic value. Each external and nested object has the same general properties, i.e. an internal identifier, an external name and some value, perhaps complex or encapsulating some abstraction. From the point of view of a query/programming language all objects, including nested ones, are to be served by the same binding mechanism. In this setting the concept of attribute is unnecessary. So far, we have not discovered any serious disadvantage of this assumption.
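
The assumption can be phrased as a tiny recursive data structure. The following Java sketch is only illustrative (the type names are ours and the model is simplified): every object, at any nesting level, is a triple of internal identifier, external name and value, and the value is either atomic, a reference to another object, or a set of sub-objects.

    import java.util.List;

    // Illustrative relativistic object model: attributes are just nested objects.
    sealed interface Value permits Atomic, Reference, Complex {}
    record Atomic(Object content) implements Value {}             // e.g. "Doe", 2500
    record Reference(long targetOid) implements Value {}          // pointer to another object
    record Complex(List<Obj> subObjects) implements Value {}      // nested objects, any depth

    record Obj(long oid, String name, Value value) {}             // <identifier, name, value>

    class ObjDemo {
        static Obj sampleEmp() {
            // An Emp object with nested Name and Salary objects ("attributes")
            return new Obj(1, "Emp", new Complex(List.of(
                new Obj(2, "Name", new Atomic("Doe")),
                new Obj(3, "Salary", new Atomic(2500)))));
        }
    }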

 

Operations and Methods

Any operation on an object is performed by some language construct that in object-oriented models is called a method. The typical object-oriented literature couples all the methods relevant for given objects within classes. However, as we said before, not all object-oriented models and systems deal with classes. Hence our view of methods is wider. We subdivide operations on objects into two groups:

·       Generic operations that are built-in for the given language.

·       Added-on operations that are defined by a software designer or programmer and somehow associated with an object. The storage for added-on operations is usually called a class, but the nature of classes is complex thus we discuss it separately.

It is possible to build an object-oriented language or system having only generic operations and no classes. Moreover, it is impossible to build an object-oriented system having no generic operations, even if the system introduces classes. Several generic operations are inevitable, in particular binding (changing an object name into an object reference), creating, updating (a so-called setter), deleting, inserting, and perhaps others. A very shallow view of encapsulation and demagogic rhetoric call for no generic operations on objects – everything must be done by user-defined methods. This is, however, technically inconsistent and based on some mental and technical shortcuts. We return to this issue when we discuss the concept of encapsulation.

Several authors (in particular, Darwen and Date) claim that the lack of generic operations, such as binding to attributes, makes it impossible to build a query language acting on an object-oriented database. We strongly disagree. Their arguments concern not object-orientedness as such, but a community of professionals who try to promote object-orientedness in an ideological, shallow and inconsistent way. In purely technical terms (outside any shallow ideology) there is no contradiction between object-orientedness and query languages. On these pages we present SBA - the most universal, consistent and formal methodology for constructing query languages addressing any object model, from very simple to very sophisticated ones.

Because of the object relativity principle some objects, especially atomic ones, have predefined operations. For instance, an object of the type integer has predefined operations such as +, -, *, /, <, =, >, etc. Such an object is considered the same entity as a “variable” known from classical programming languages.

An important generic operation on an object is binding to its subobjects (attributes). There is an ideological tenet that such a binding is improper for object-orientedness and should be substituted by two methods: a getter (returning the value of an attribute) and a setter (assigning a new value to an attribute). It is not clear why such a doubtful ideological assumption is promoted; the arguments in its favor are religious rather than technical. The assumption, however, is justified in the CORBA standard, because this standard is based on adapters (wrappers) on the server side that “understand” only methods. The assumption is not relevant for object-oriented databases; we return to this issue when we discuss encapsulation.
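
The difference can be shown in a few lines of illustrative Java (class and member names are ours): with generic binding the attribute is reached and updated directly through its reference, whereas the getter/setter style hides the attribute behind two methods.

    class EmpDirect {
        int salary;                                 // generic binding: emp.salary is a reference
    }

    class EmpEncapsulated {
        private int salary;
        int getSalary() { return salary; }          // getter
        void setSalary(int s) { salary = s; }       // setter
    }

    class BindingDemo {
        void demo() {
            EmpDirect d = new EmpDirect();
            d.salary = 2500;                // update through the attribute reference

            EmpEncapsulated e = new EmpEncapsulated();
            e.setSalary(2500);              // update only through the setter method
            int s = e.getSalary();
        }
    }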

 

Links (relationships, associations) Between Objects, Pointers

The entity-relationship model by P. Chen introduced the concept of relationship as an important conceptual modeling feature. Similar features are introduced in modern object-oriented methodologies and notations, in particular in UML as associations (we use this term in the following). On the conceptual modeling level associations are depicted as lines connecting symbolic representations of real-world entities or classes. These lines are named; for instance, the association connecting the entities Person and Company is named worksFor, Fig.44. Besides this name, UML introduces names of roles that are attached to the ends of a line, for instance the role Employee attached to Person and the role Employer attached to Company. Associations can be binary, i.e. they connect two entities or classes, or can be n-ary, n > 2, when they connect more than two entities or classes. Binary associations are additionally qualified by cardinalities, i.e. a symbolic description of the number of occurrences of instances on both sides of an association. In some proposals associations can be qualified by attributes (CORBA Relationship Service, OMT) or by a class (UML), e.g. EmploymentDetails in Fig.44.

 

Figure 44. Association worksFor with attached roles, cardinalities and class EmploymentDetails

 

We do not describe the issue in more detail, as it is the subject of many popular textbooks. For us, the only interesting question is how associations are represented on the level of object-oriented data structures. While (strangely enough) updating is not considered an issue on the conceptual level, it is necessary for any data structures, including the machine representation of an association. The problem has not been investigated by many researchers. To the best of our knowledge, the CORBA Relationship Service is one of the few solutions (if not the only one). The solution is very sophisticated and clumsy. It is much reminiscent of the old CODASYL DBTG standard for the network model, which eventually lost the competition with the relational model.

The most straightforward method is to consider the machine representation of an association instance as a structure of references leading to the proper objects. For example, Fig.44, an instance of the worksFor association will be considered as a pair <iP, iC>, where iP is a reference to a Person object and iC is a reference to a Company object. This approach is quite universal, allowing one to deal with attributes of associations and with n-ary associations. However, if we assume that associations can be decorated by attributes or classes (as in UML), it will be very difficult to distinguish (on the level of data structures) such an association instance from an object. At least, it seems that making such a distinction is unreasonable from the point of view of programmers, as it leads to two different notions – objects and association instances – having practically no structural, semantic or pragmatic differences.

In conclusion, on the level of data structures it makes sense to leave only binary associations with no associated attributes or classes. We propose to name them links. The most natural and efficient way to implement links is to consider them as references (actually, pointers). Such an approach minimizes the number of concepts, makes updating of links easy, and is easy for implementation and query optimization. The approach is also quite universal, as any n-ary association with associated attributes or a class can be reduced to an additional class with n links. The approach is assumed in the ODMG standard and in our opinion this was a very reasonable decision. In Fig.45 we show how a ternary association Deal with attributes, joining the Buyer, Seller and Broker classes, can be changed into an additional class Deal and three binary associations (i.e. links). In Fig.46 we show the same on the level of data structures.
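
In illustrative Java terms (names follow Fig.45; the commission attribute is a placeholder), the ternary association becomes an ordinary object holding three links:

    // The ternary association Deal, possibly with its own attributes,
    // becomes an additional object with three binary links (references).
    class Buyer {}
    class Seller {}
    class Broker {}

    class Deal {
        Buyer buyer;        // link to the Buyer object
        Seller seller;      // link to the Seller object
        Broker broker;      // link to the Broker object
        double commission;  // example attribute of the former association (placeholder)

        Deal(Buyer b, Seller s, Broker br, double commission) {
            this.buyer = b; this.seller = s; this.broker = br;
            this.commission = commission;
        }
    }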

 

Figure 45. Changing a ternary association Deal into a new class Deal and three links

 

Figure 46. Changing two instances of a ternary association Deal into two additional objects Deal and binary links

 

Hence, we advocate using the well-known concept of pointer as the representation of links. From the programmer's point of view, a pointer is quite an easy notion. Pointers are an excellent facility for navigation in object-oriented databases, and in the following we will show how this is utilized in SBQL. Updating of association instances is so far an unknown feature, while updating of pointers does not cause essential conceptual, implementation or pragmatic problems. An object-oriented data structure with pointers representing links is depicted in Fig.47.

 

Figure 47. Objects linked by pointers worksIn, employs and Boss

 

Pointers have a somewhat bad reputation. Pointers in C/C++ (more specifically, pointer arithmetic) are considered error prone. In our case pointers lead to objects rather than to memory addresses, hence they have little in common with C/C++ pointers. Thus there is no reason why such pointers have to be error prone. Another concern is so-called dangling pointers, i.e. pointers that lead to an improper place or to a non-existent object after some objects have been removed. Careful design of this feature (proper garbage collection or so-called backward pointers) makes it possible to avoid this danger. Yet another objection, promoted by advocates of the relational model, is that pointers decrease data independence and should be removed (changed into values) during the process of “normalization”. We strongly disagree with this false stereotype. Because data independence is not defined in technical terms (there is no objective measure of it), it is a matter of rhetorical ability or demagogy what “decreasing data independence” means. But for sure, pointers, as an implementation of associations, decrease the distance between conceptual and implementation models, thus strongly supporting conceptual modeling. Moreover, they support performance by reducing the demand for costly join operators (as will be shown) and much simplify queries by extending the capabilities of path expressions.

Pointers can be defined alone, as Boss in Fig.47, or can be paired into twins, as worksIn and employs. Pairs of pointers are the subject of a special integrity constraint saying that if one of the twin pointers is updated, the other twin is immediately updated too. This is well described in the ODMG standard in the part devoted to the C++ binding. For instance (Fig.47), if one updates the worksIn pointer within the Doe object and the update moves Doe from the Syntex company to the Poltex company, then the twin pointer employs within the Syntex object is removed and a pointer employs leading to the Doe object is inserted into the Poltex object.
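
A minimal sketch of this constraint in Java (names follow Fig.47; the update logic is a simplified illustration): updating worksIn in the Doe object automatically removes the old employs twin and inserts the new one.

    import java.util.HashSet;
    import java.util.Set;

    class Company {
        String name;
        Set<Emp> employs = new HashSet<>();      // twin pointers employs
        Company(String name) { this.name = name; }
    }

    class Emp {
        String name;
        Company worksIn;                         // twin pointer worksIn
        Emp(String name) { this.name = name; }

        // Updating worksIn keeps the twin pointers consistent:
        // the old company loses its employs pointer, the new one gains it.
        void moveTo(Company newCompany) {
            if (worksIn != null) worksIn.employs.remove(this);
            worksIn = newCompany;
            if (newCompany != null) newCompany.employs.add(this);
        }
    }

    class TwinDemo {
        static void demo() {
            Company syntex = new Company("Syntex"), poltex = new Company("Poltex");
            Emp doe = new Emp("Doe");
            doe.moveTo(syntex);
            doe.moveTo(poltex);   // employs removed from Syntex, inserted into Poltex
        }
    }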

Cardinalities of an association belong to typing information. We will return to this issue when we consider types.

 

Classes

The concept of class is an abstraction in thinking, understanding the world and programming, whose intention is to capture the structure of objects as well as the operations (methods) that are attached to objects. Classes are perhaps the oldest mental abstraction invented by humans and exist in all known human languages, usually as nouns denoting abstract concepts such as animal, stone, food, fire and home. Abstract concepts denote some “ideals” that have many concrete instantiations.

There are many definitions of the class concept, as well as many misunderstandings. A class is frequently confused (or identified) with a class extent (i.e. the currently stored collection of objects being members of the class), with an object domain (i.e. the infinite collection of all objects that can potentially be members of the class), with a type, with an abstract data type and with an interface. Unfortunately, in general all these concepts are not synonyms and in different contexts have different meanings (sometimes fundamentally different meanings). An additional complication of the class concept is introduced by the fact that classes may have three incarnations: (1) as pieces of source code; (2) as elements of some metamodel (i.e. an internal data structure bearing the information stored in classes); (3) as run-time entities that participate in the evaluation of expressions, queries or programs.

In this text we bring some order into the definitions of classes, types, extents, domains and interfaces by assigning to these concepts their most expected pragmatic roles. However, we are fully aware that our definitions can be incompatible with many sources where these concepts are defined and discussed. For instance, in OMG UML/OCL a class is equivalent to a type. Such unification can be justified by the necessity of reducing the number of conceptual modeling notions, but it is not reasonable for query/programming languages.

The most essential distinction between the class concept and concepts such as types, interfaces, extents and domains is that classes contain essential program code, i.e. implementation. Hence, classes can be the subject of trade (because manufacturing them is usually costly), while types, interfaces, domains and extents usually cannot play this role.

We assume the definition of a class that is most common and most relevant to all popular cases of object-oriented modeling tools and object-oriented programming languages.

A class is an entity that stores invariants of objects, that is, properties or features that are common and constant for each member of some population of objects.

The relationship between a class and objects is usually called membership. An object being a member of a class is equipped with all the invariants that are stored within the class.

Invariants concerning a particular object can be stored in several classes, which form a structure known as an inheritance or generalization/specialization hierarchy. Conceptually, more general classes contain more general invariants, which (usually) concern a richer population of member objects. In many programming cases classes are simply sets of invariants. In conceptual modeling classes usually bear essential (business) semantics that corresponds to the human understanding of some universe of discourse.

Two kinds of invariants that are stored within classes are the most common:

·       Structure of member objects. Usually a class determines names of objects’ components (attributes) and their types.

·       Methods (sometimes called operations or behavior) that can be executed on objects. The specification of methods includes typing information (types of parameters and the type of the returned result), implementation code and possible exceptions. Some authors additionally postulate introducing constraints on methods, such as pre-conditions (constraints on the state and parameters when a method execution starts) and post-conditions (constraints on the state and parameters when a method execution terminates).

Some explanations take the point of view that classes are blueprints for objects, i.e. they determine the objects' conceptual structure. This point of view, however, does not take into account many other aspects of classes.

There are many other invariants that can be stored inside classes:

·       Objects' name. A class determines the “external” name of all objects being members of the class. This is implicitly assumed in several proposals, including UML, the ODMG standard and the CORBA standard. Indeed, if we consider a schema of object instances, the name of an object is the major conceptual information. In programming languages, however, a class does not determine member object names. This follows from two different views on programming: data-centric (specific to databases and business applications) and program-centric (specific to object-oriented PLs). In the data-centric view objects have a predefined name determined by their class. When a class determines the object name, some issues become simpler, e.g. how to type pointer links.

·       Persistence status. In some proposals a class bears information on whether member objects are to be persistent (e.g. stored in a database) or not. In proposals and systems based on orthogonal persistence classes do not involve this feature; persistence is determined e.g. during object creation.

·       Links (associations, relationships) with member objects of the same or (usually) other classes. Links are usually named, belong to the structure of objects and are typed according to the objects' type system.

·       Interface, export list, private/public declarations or other means that subdivide properties of classes and their member objects into private and public (sometimes with some tradeoffs like protected). The consequence of the subdivision concerns scoping rules for names of particular object properties (known also as encapsulation or information hiding). Names of public properties can be used externally. Private properties can be used only internally by program entities (usually methods) stored inside classes.

·       Default values for object attributes. They are used when a new object is created or assigned when an attribute takes a null value.

·       Allowance for null values. Null values can be considered properties of attributes’ types.

·       Events or exceptions that can occur within the methods of a class, and code blocks (triggers) that are provided to react on them.

·       Integrity constraints that restrict possible object states (or possible state changes) and provide reaction on operations that violate the constraints.

·       Derived attributes that are to be dynamically calculated during run time; they can be also considered as functional methods.

·       Much other information, including icons for visualization of objects, security, safety, privacy and business rules, ontological information, help information, etc.

A class can also store some information that is invariant not for individual class members, but for a class extent, i.e. for the collection of currently stored members. This concerns so-called static attributes and static methods. For instance, the attribute number of objects is information on the entire population of class members, and the method average age of employees concerns the entire population of members of the class EmployeeClass. In general, this feature can be questioned, since it mixes up different levels of discourse: the level of objects and the level of collections of objects. A cleaner situation is obtained when a special class, known as the power set of a class, is defined for collections of objects. For instance, we can have a class EmployeeClass and a power set class EmployeeCollectionClass, where all static attributes and methods are stored.
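
Both variants can be sketched in a few lines of illustrative Java (class names are adapted from the text; the members are placeholders): static members live in the member class itself, whereas the cleaner variant moves them to a separate collection-level class.

    import java.util.List;

    class Employee {
        int age;
        Employee(int age) { this.age = age; }
    }

    // Variant 1: extent-level data kept as static members of the member class.
    class EmployeeWithStatics {
        static int numberOfObjects;              // property of the extent, not of one object
        int age;
    }

    // Variant 2: a separate "power set" class describing collections of employees.
    class EmployeeCollection {
        private final List<Employee> members;
        EmployeeCollection(List<Employee> members) { this.members = members; }

        int numberOfObjects() { return members.size(); }
        double averageAge() {
            return members.stream().mapToInt(e -> e.age).average().orElse(0);
        }
    }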

An essential issue concerns class citizenship. A class frequently has second-class citizenship, which means that it exists in the source code rather than at run time. This concerns the majority of object-oriented programming languages. For object-oriented databases classes should exist as run-time database entities, i.e. they should have first-class citizenship. This makes it possible to introduce many capabilities that are typical for database management systems, such as dynamically inserting methods into a class (e.g. virtual attributes), dynamically inserting constraints or business rules, changing default values for object creation, etc. A class, as a first-class citizen, is stored in some environment (e.g. in a database) as a special object and can be processed by typical capabilities, e.g. by a query language. Some mixed solutions are possible, where some properties of a class exist only in its source code (e.g. information on attributes) and others exist in the source code but are also accessible and can be manipulated at run time (e.g. methods).

 

Class Hierarchy and Inheritance

A class can be built by formally factoring out invariant properties from some population of objects. Frequently, however, classes have some conceptual meaning for the business domain, as a general abstraction that is used by designers or programmers to reflect the properties of some population of objects. Such classes have strong relationships to the domain of discourse and correspond to the system of notions and concepts that has been developed for the domain. For instance, a designer has developed a class Student with attributes firstName, lastName, dateOfBirth, student#, yearOfStudy, faculty, and a class Employee with attributes firstName, lastName, dateOfBirth, employee#, job, salary, company#. Then he/she realizes that these classes have something in common: conceptually, both kinds of objects are special cases of an object Person, and technically, the attributes firstName, lastName and dateOfBirth are the same in both classes. Hence, he/she decides to create the class Person with attributes firstName, lastName, dateOfBirth and to establish a special kind of relationship between the classes Person, Student and Employee, as shown in Fig.48. This relationship is called inheritance.

 

Fig.48. Class hierarchy and inheritance

 

Inheritance introduces some structure into classes. The number of class hierarchy levels is usually unlimited (although for human perception it should not be too large). Inheritance means that during processing of objects of the class Student the programmer can use all the properties of the class Person. Inheritance reduces the amount of conceptual information: the properties of higher-level classes are reused by lower-level classes. This also concerns code; for instance, in Fig.48 the code for the method age is reused by the classes Student and Employee.
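
A minimal illustrative Java counterpart of Fig.48 (attribute lists are shortened and the age computation is a placeholder): the method age is written once in Person and reused by Student and Employee.

    import java.time.LocalDate;
    import java.time.Period;

    class Person {
        String firstName, lastName;
        LocalDate dateOfBirth;

        int age() {                                  // written once, reused by subclasses
            return Period.between(dateOfBirth, LocalDate.now()).getYears();
        }
    }

    class Student extends Person {
        String studentNo;
        int yearOfStudy;
        String faculty;
    }

    class Employee extends Person {
        String employeeNo;
        String job;
        double salary;
    }

    class HierarchyDemo {
        static void demo() {
            Student s = new Student();
            s.dateOfBirth = LocalDate.of(1990, 5, 1);
            int a = s.age();                         // age inherited from Person
        }
    }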

The hierarchy of classes is a graph without cycles. Multiple inheritance means that a class can inherit properties from more than one class. For instance, the Employee class can inherit not only from the Person class, but also from a TaxPayer class. Multiple inheritance is popular in modeling business domains, but it is less popular in programming languages. In particular, Java has single inheritance only. This limitation can be treated as not severe; however, it increases the distance between conceptual and implementation models.

Multiple inheritance leads to a problem when properties inherited from different classes have the same name. There are some methods of resolving this situation (e.g. in C++), but in general there is no good solution. Name conflicts lead to violation of the substitutability and/or open-close principles, the basic principles of object-orientedness.

Inheritance usually reflects one aspect among several possible aspects. For instance, employees can be specialized according to profession, experience, gender, education, etc. In some real situations two or more aspects must be considered. Multiple aspects lead to multiple inheritance and/or to an explosion of permuted classes. Multiple-aspect inheritance leads to the concept of object roles and dynamic inheritance.

Another disadvantage of inheritance concerns repeated inheritance. One can easily cope with the situation where a Student class inherits from a Person class, but this simple dependency does not work if we want to cover the case where one person can simultaneously study at more than one faculty or university. In this case the inheritance must be substituted by an association, which makes the model more complex to understand and manipulate (e.g. no substitutability). As before, the issue can be resolved by dynamic object roles and dynamic inheritance.

 

Substitutability and Open-Close Principles

The substitutability principle is formulated as follows. If a subclass B inherits from a superclass A, then an object b being an instantiation of B can be used in all places in a query/program in which an object a being an instantiation of A can be used. For instance, an object Student can be used in all places where the object Person can be used, because the class StudentClass inherits from the class PersonClass. Although this principle (also known as LSP, the Liskov Substitution Principle) seems obvious, it must be taken with care, for at least three reasons. The first concerns updating, where a straightforward application of this principle leads to anomalies or (at least) different semantic interpretations. The second concerns parameter passing, where the requirement of strong type checking leads to a long (and a bit academic) dilemma concerning covariance and contravariance. The last, most difficult case occurs when an object name is a property of its type (this does not occur in programming languages, but is a strong rule in database schemata). In such a case the object-oriented model based on substitutability makes little sense and must be superseded by some more general model. Substitutability, together with the open-close principle, is also contradictory to the straightforward concept of collection, which is fundamental for databases. Of these three concepts you can take any two, but not all three together. The alternative to substitutability is an object model with dynamic object roles.

The open-close principle is formulated as follows. Each class can be closed for modifications, but it remains open for specializations. The principle is the basis of program reuse and method polymorphism. For instance, we can buy, within some compiled library, a class Person together with all the methods implemented for the class. We cannot change anything in this class, because its source code is unavailable. Nevertheless, we can specialize this class by classes Student, Employee, Customer, etc. Each specialization inherits the implementation already provided within the class Person and augments it with new methods specific to the given specialization. Moreover, each specialization can override a method implemented within the class Person by a specialized method implemented within the specializing class and having the same name (and compatible parameters) as the method within Person.

As we have noticed before, the open-close principle is contradictory to the substitutability principle and the straightforward concept of collections of objects.

 

Interfaces

Standards and languages such as CORBA, ODMG, Java, COM/DCOM and C# introduce the concept of interface, which contains the full information on those properties of objects and classes that are available to programmers as retrieval or manipulation capabilities. Usually an interface corresponds to a class. An interface has a conceptual meaning for the programmer, as it specifies with the necessary precision what an object of the given class contains and how it can be manipulated. An interface does not present all the knowledge about the corresponding class.

The fundamental difference between an interface and a class is that a class can be the subject of trade (because it contains an executable code), while an interface cannot; it can be only published.

One interface can be associated with many implementations, for instance, different classes offered by different companies. Similarly, one class can be associated with many interfaces. This feature can be used to restrict rights of particular users to use properties of classes.

Although very similar, the concept of interface is different from the concept of type. The main difference concerns their pragmatic roles: interfaces are to inform the programmer about the structure of objects and the possibilities of manipulating them, while types are constraints concerning the contexts in which the corresponding objects can be used in programs. There can also be some syntactic and functional differences; for instance, an interface can specify exceptions or run-time constraints that are not relevant to types. Moreover, there are types that are independent of interfaces, for instance atomic types. Types can be defined recursively; interfaces cannot.

Interfaces can be considered the specification of the public (exported) properties of classes. However, inheritance of interfaces has different properties than inheritance of classes. Actually, inheritance of interfaces is based on operations on text: if an interface A inherits from B, this means that the text of B is somehow included into the text of A. This may or may not be associated with inheritance of implementation.

The currently popular concept of interfaces usually neglects a very important feature, namely the side effects that can be caused by a class. Side effects concern everything that can be retrieved (passive side effects) or updated (active side effects). Checking side effects is crucial for the reliability of programs, thus programmers should be fully aware of them. This was the idea of import lists in Modula-2 and other languages. Unfortunately, designers of currently popular programming languages treat this feature as secondary.

 

Types

In many sources there is only a minor difference between classes, interfaces and types. Often these concepts are unified by some common syntax, e.g. the declaration of classes. However, from the pragmatic point of view the concept of type is different from the concepts of class and interface. The main pragmatic role of classes is to keep the implementation of methods and to determine object membership. The main pragmatic role of interfaces is to specify all the possibilities that programmers can use to retrieve and manipulate objects. Types are determined by interfaces, but types can also exist without interfaces. Moreover, interfaces specify only the public properties of objects, while type checking must also concern private properties. Hence the following main pragmatic role of types:

Types are intended as constraints on the construction and behavior of any program entities (in particular, modules, objects, values, links, procedures, methods, etc.) and static constraints on the query/programming context in which these entities can be used.

A static constraint concerns the source program and is checked before the program is executed. Some programming languages, e.g. Smalltalk and Ruby, offer the possibility of checking types at run time. However, dynamic type checking is less reliable than static type checking, because some type errors may not be discovered during testing. Additionally, dynamic type checking makes programs slower, because type checking is an additional run-time activity. On the other hand, programming languages without types or with dynamic type checking are much easier to develop and implement.

The object-oriented literature contains many patterns concerning static typing systems. For instance, interfaces written in IDL (Interface Definition Language) of the CORBA standard contain mainly typing information. However, making a consistent and useful typing system is not an easy task. For instance, the typing system assumed by ODL (Object Definition Language) of the ODMG standard has been criticized as inconsistent and thus non-implementable.

The typing community attempts to create the impression that the issue of types is exhausted: everything that is necessary in this respect has already been discovered and investigated. The history of strong typing spans more than 30 years and has evolved from the Pascal type system based on name equivalence up to modern concepts based on structural equivalence and inclusion/parametric polymorphism. There are thousands of papers devoted to types and many strong mathematical theories. Many of them deal with bulk types (collections) typical for databases and database programming languages, including query languages. There are also many practical proposals of typing systems implemented in popular object-oriented programming languages (e.g. C++ or Java) as well as in research prototypes, e.g. PS-Algol, DBPL, Galileo, Napier-89, Fibonacci, and many others. An overview of these proposals from the strong typing perspective can be found in [AtBu87, Atki95].

Experts in strong typing usually adopt the simpleminded notion that a type is a (possibly infinite) set of values and that a variable of that type is constrained to assume values of that type [AtBu87]. Mathematically, the notion is clear and (which is important for formal research) makes it possible to develop very advanced type theories with many formal properties and theorems. However, there are features of query/programming languages and environments that make it difficult to adopt the above notion. We subdivide these features into four groups:

·      Irregularities in data structures, such as null values (optional data), repeating data (collections with various cardinality constraints), exclusive variants/unions, pointer links between data, unconstrained data names (e.g. CORBA attributes in XML), and perhaps others. Such irregularities make strong typing much more difficult (or even impossible) and may cause the necessity of shifting type checking to run time.

·      Ellipses, automatic coercions, automatic dereferences and other options of query/programming languages. Such features are introduced by designers of database query/programming languages to increase the user friendliness of programming interfaces and to relieve programmers from an annoying, too formalistic and too verbose style of writing queries/programs.

·      Features of types not covered by the above simpleminded type notion, such as mutability, collection cardinality constraints, collection types (set, bag, sequence, etc.), type names (for name type equivalence), methods/procedures/functions, and perhaps others.

·      Other properties of the programming environment, such as interfaces, classes, abstract data types, inheritance and multiple inheritance, dynamic object roles and dynamic inheritance, modules, export and import lists, scoping rules for names occurring in queries, etc.

Although the literature contains explanations of some notions (e.g. multiple inheritance [Card84] or abstract data types [Mitc88]), such explanations as a rule suffer from two kinds of drawbacks:

·      They assume a very strict formal model which hardly meets practical situations;

·      They isolate some particular aspect from other aspects of the strong typing problem. In practice, however, all aspects must be considered together and no such isolation of a particular aspect is possible.

Such papers are like islands in the sea of decisions that developers must take during the development and implementation of a type system. As our experience has shown, typing environments for object-oriented or XML-oriented databases and their query languages present so many peculiarities that the ideas presented in these papers are not very helpful and in most cases inapplicable.

The roles of a typing system are the following:

·      Compile-time type checking of query operators, imperative constructs, procedures, functions, methods, views and modules;

·      User-friendly, context dependent reporting on type errors;

·      Resolving ambiguities with automatic type coercions, ellipses, dereferences, literals and binding irregular data structures;

·      Shifting type checks to run-time, if it is impossible to do them during compile time;

·      Restoring a type checking process after a type error, to discover more than one type error in one run;

·      Preparing information for query optimization by properly decorating a query syntax tree. Without a typing system major query optimization methods, such as query rewriting and using indices, are much more difficult or impossible to develop.

In general, we must distinguish internal and external type systems. The internal type system reflects the behavior of the type checking mechanism, while the external type system is used by the programmer. A static strong type checking mechanism simulates run-time computations at compile time by reflecting the run-time semantics with the precision available at compile time. The internal type system operates on signatures, i.e. internal representations of types. Signatures are additionally associated with attributes such as mutability, cardinality, collection kind, type name, multimedia, side effects, etc. For each query/program operator a decision table is provided which determines the allowed combinations of signatures and attributes, the resulting signature and its attributes, and additional actions. Such a type mechanism will be described in detail in the next chapters.
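
A highly simplified sketch of such a mechanism in Java (our own illustrative types, not the actual SBA type checker): signatures of the operands are looked up in a per-operator decision table that yields the result signature together with an action such as a coercion or a type error.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative internal type system: signatures plus a decision table for one operator.
    enum Signature { INT, REAL, STRING }
    enum Action { OK, COERCE_LEFT_TO_REAL, COERCE_RIGHT_TO_REAL, TYPE_ERROR }

    record Decision(Signature result, Action action) {}

    final class PlusDecisionTable {
        private final Map<String, Decision> table = new HashMap<>();

        PlusDecisionTable() {
            table.put("INT+INT",   new Decision(Signature.INT,  Action.OK));
            table.put("REAL+REAL", new Decision(Signature.REAL, Action.OK));
            table.put("INT+REAL",  new Decision(Signature.REAL, Action.COERCE_LEFT_TO_REAL));
            table.put("REAL+INT",  new Decision(Signature.REAL, Action.COERCE_RIGHT_TO_REAL));
            // "+" on strings could mean concatenation; other combinations are type errors.
            table.put("STRING+STRING", new Decision(Signature.STRING, Action.OK));
        }

        Decision check(Signature left, Signature right) {
            return table.getOrDefault(left + "+" + right,
                                      new Decision(null, Action.TYPE_ERROR));
        }
    }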

 

Polymorphism

In the object-oriented literature the term polymorphism has two different meanings. The strong typing community defines type polymorphism as the ability to write procedures that accept parameters of different types (possibly infinitely many types) and return outputs of many types. Such a feature is necessary for generic programming, i.e. programming with unknown types. For instance, one may want to write a sorting procedure that sorts everything, independently of the type of the collection elements. Generic programming is difficult or impossible in statically bound and strongly typed languages, hence the demand for new, more flexible typing systems. The two best-known kinds of type polymorphism are inclusion polymorphism (equivalent to the substitutability principle) and parametric polymorphism, where types or classes are parameterized by types. Parametric polymorphism has been introduced in such programming languages as SML and Quest; however, the idea is actually advocated within a narrow academic circle, probably with no chance of implementation in widely used programming languages. An alternative to parametric polymorphism is linguistic reflection, which means the possibility of writing a program that generates a program that can be immediately executed. Linguistic reflection is easy to implement, especially in script languages, and it is fully universal with respect to generic programming. Unfortunately, it undermines strong typing. Reflection is known from such facilities as dynamic SQL, ODBC, JDBC, the DII of CORBA and others.
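
For illustration only (Java generics provide a restricted form of parametric polymorphism; this is not the SML/Quest mechanism mentioned above): a single sorting procedure parameterized by the element type works for collections of any type.

    import java.util.Comparator;
    import java.util.List;

    final class GenericSort {
        // One procedure sorts lists of any element type T,
        // provided a comparator for T is supplied.
        static <T> void sort(List<T> items, Comparator<T> order) {
            items.sort(order);
        }
    }

    class SortDemo {
        static void demo() {
            List<Integer> numbers = new java.util.ArrayList<>(List.of(3, 1, 2));
            List<String> names = new java.util.ArrayList<>(List.of("Doe", "Brown"));
            GenericSort.sort(numbers, Comparator.naturalOrder());
            GenericSort.sort(names, Comparator.naturalOrder());
        }
    }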

In the second meaning, polymorphism means the possibility of creating different methods with the same name and signature. Each such method is assigned to a specific class and the binding is dynamic: after recognizing the class of an object at run time, the corresponding method is executed. For instance, in an abstract class PersonClass there is a method Income, perhaps with no body. In the specialized classes EmployeeClass, StudentClass and OldAgePensionierClass there is a method Income, each with its own (different) code. When the message Income is directed to an object x, the system recognizes to which class x belongs and then invokes the proper Income method.
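
The Income example translates almost directly into illustrative Java (method bodies and attribute values are placeholders): the call x.income() is bound dynamically to the implementation of the class to which x actually belongs.

    abstract class PersonClass {
        abstract double income();                 // declared here, perhaps with no body
    }

    class EmployeeClass extends PersonClass {
        double salary = 2500;
        double income() { return salary; }        // own implementation
    }

    class StudentClass extends PersonClass {
        double scholarship = 800;
        double income() { return scholarship; }   // own implementation
    }

    class OldAgePensionierClass extends PersonClass {
        double pension = 1200;
        double income() { return pension; }       // own implementation
    }

    class DispatchDemo {
        static double demo(PersonClass x) {
            return x.income();   // the proper Income method is chosen at run time
        }
    }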

In this second meaning polymorphism is considered an important facility for abstraction and code reuse. For instance, the method Income can be designed on the abstract PersonClass level, and a programmer who works on this level can use it without thinking about the specialized classes. On a lower abstraction level each specialized class must implement this method according to the business logic of that class.

Some authors confuse the above two meanings of polymorphism or (hopelessly) try to unify them.

 

Dynamic object roles and dynamic inheritance

In the popular object-oriented literature we frequently meet the statement “a student is a person”, emphasizing the obvious fact that at a given time moment a student is always a person. Such a statement expresses static inheritance, that is, inheritance that is independent of time (or inheritance that neglects time). However, from the point of view of conceptual modeling, considering some time scale, it is much better to say that “a person becomes a student”. Indeed, a person takes the role of a student only for some time; after that time the person is no longer a student. This is the case of dynamic inheritance, which is dependent on time. Equivalently, we can say that a person takes the role “student” for some time. At the same time, a person can take many other roles.

The idea of dynamic object roles and dynamic inheritance assumes that a real or abstract entity can, during its life, acquire and lose many roles without changing its identity. The roles appear during the life of a given object, can exist simultaneously, and can disappear at any moment. For example, a certain person can at the same time be a student, a worker, a patient, a club member, etc., Fig.49. Similarly, a building can be an office, a house, a warehouse, etc.

Fig.49. Roles of an object

Dynamic object roles are useful both for conceptual modeling and for implementation. The concept could much enrich such modeling tools as UML and could be an important paradigm for object databases. Note that we do not postulate replacing static inheritance by dynamic inheritance with roles. Static inheritance is useful in the majority of cases. Dynamic roles and dynamic inheritance can complement static inheritance in many important cases.

For several years dynamic object roles have had the reputation of a notion on the brink of acceptance. There are many papers advocating the concept. On the other hand, many researchers consider the applications of the concept not wide enough to justify the extra complexity of conceptual modeling facilities. Moreover, the concept is neglected on the implementation side - as far as we know, none of the popular object-oriented programming languages or database systems introduces it explicitly. Some authors assume a tradeoff, where the role concept is the subject of special design patterns, applied both on the conceptual modeling and the implementation side. The disadvantage of design patterns is that this simple notion is substituted by a combination of other notions (e.g. aggregations), which actually means warping the original concept of a modeler, designer or implementer.

The role concept assumes that an object is associated with other objects (subobjects), which model its roles. Object-roles cannot exist without their parent object. Deleting an object causes the deletion of all its roles. Roles can exist simultaneously and independently. A role can have its own additional attributes and methods. Two roles can contain attributes and methods with the same names, and this does not lead to a conflict. This is a fundamental difference in comparison to the concept of multiple inheritance. Relationships (associations) between objects can connect not only objects with objects, but also objects with roles and roles with roles. For example, a relationship works_in connects an Employee role with a Company object. This makes the referential semantics cleaner in comparison to traditional object models. Roles can be further specialized as subroles, sub-subroles, etc.; e.g. Club_Member can be specialized by a role Club_President.
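
A minimal illustrative sketch in Java of such composite objects (all names are ours; a real implementation would also cover classes, links and binding): roles are sub-objects inserted and removed at run time, each with its own attributes, and they cannot outlive their parent object.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // A role is a sub-object with its own name and attributes; it cannot outlive its parent.
    class Role {
        final String name;                                // e.g. "Employee", "Student"
        final Map<String, Object> attributes = new HashMap<>();
        Role(String name) { this.name = name; }
    }

    class RoleObject {                                    // the parent object, e.g. a Person
        final Map<String, Object> attributes = new HashMap<>();
        final List<Role> roles = new ArrayList<>();

        Role acquire(String roleName) {                   // roles appear at run time
            Role r = new Role(roleName);
            roles.add(r);
            return r;
        }

        void lose(Role r) { roles.remove(r); }            // and may disappear at any moment
    }

    class RolesDemo {
        static void demo() {
            RoleObject person = new RoleObject();
            person.attributes.put("Name", "Brown");

            Role instituteJob = person.acquire("Employee");
            instituteJob.attributes.put("Salary", 3000);  // two Salary attributes may coexist
            Role companyJob = person.acquire("Employee");
            companyJob.attributes.put("Salary", 2500);    // no conflict between the roles

            person.lose(instituteJob);                    // the role disappears, identity stays
        }
        // Deleting the whole person object would delete all of its remaining roles with it.
    }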

The role concept requires introducing composite objects with a special structure, semantics and generic operations. In the following we describe the structure formally and present assumptions of a query/programming language supporting generic operations to process such structures. Our idea to deal with dynamic roles in a query language is based on the stack-based approach and is probably the only current approach which can naturally adopt the concept.

We assume that an object can contain many sub-objects called roles. These subobjects can be inserted and removed at run time. Roles can have different types. A role has its own attributes and behaviour. Identical names in two or more roles of different types do not imply any semantic dependency between the corresponding properties. For example, a person can simultaneously play the role of an employee of a research institute with the attribute Salary, and the role of an employee of a service company, also with an attribute Salary. These two attributes exist at the same time, but except for the name no other feature is shared, including types, semantics and business ontologies. A role dynamically “imports” attributes (values) and behavior from its super-roles, in particular from its parent object. In Fig.50 we present an example showing the basic features of the store model with dynamic roles. The following features are presented:

·    An object (shown as a grey rectangle with round corners) has one main role (Person) and any number of specializing roles (Employee and Student).

·    Each role has its own name, which can be used to bind the role from a program or a query. The presented objects can be bound through name Person (each), through name Employee (2nd and 3rd) and through name Student (3rd and 4th). Each binding returns the identifier of a proper role (or the identifiers of proper roles in case of multi-valued bindings).

·    Each role is encapsulated, i.e. its properties are not seen from other roles unless it is explicitly stated by a special link (shown as a double-line with the black diamond end). In particular, a role Employee imports all properties of its parent role Person. For example, if the second object is bound by name Person, then the properties {Name Brown, BirthYear 1975} are available; however if the same object is bound by name Employee, then the properties {Salary 2500, Job analyst, Name Brown, BirthYear 1975} are available.

·    Each role is connected to its own class. The connection is shown as a grey thick continuous arrow. Classes contain invariant properties of corresponding roles, in particular, names (first section), attributes and their types (second section; attribute types are not shown) and methods (third section).

Fig.50. Objects and roles as data structures

Links can join not only objects with objects, but also objects with roles and roles with roles. For example, a link works_in joins an object Company with a role Employee, Fig.50. A similar link studies_at joins a role Student with an object School. If such a link leads to Employee, it indirectly leads to Person, because the role Employee imports the properties of its parent object Person. However, after accessing the object via such a link, the properties of the role Student remain invisible. It follows that a role identifier must be different from the identifier of the corresponding object. Fig. 50 also shows a link is_a_customer_of between objects Person and Company. When the object Person is accessed through this link, all roles of the object Person remain invisible.

The possibility to create links between roles is a new quality for analysis and design methodologies and notations (such as UML). In this situation links must lead to parts of objects rather than to entire objects. To model it, the methodologies suggest using aggregation/composition. Such an approach implicitly assumes that e.g. an Employee is a part of a Person, on a principle similar to an Engine being a part of a Car. Although the approach achieves the goal (e.g. we can connect the relationship works_in directly to the Employee sub-object of Person), it obviously misuses the concept of aggregation, which is normally provided for modeling "whole-part" situations.

Note that in the case presented in Fig.50 static inheritance known from UML is inessential (hence the dashed lines), because it is fully substituted by dynamic inheritance. For instance, Student roles are members of the StudentClass (hence inherit all properties of that class), but they also dynamically inherit from their parent Person objects, which in turn are members of the PersonClass. Hence, during processing of a Student role the inheritance mechanism ensures access to the properties of the corresponding Person object, as well as to the properties of the StudentClass and the PersonClass.
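
The following Java sketch illustrates two of the points above: binding by a role name returns a role identifier, and a role dynamically "imports" the attributes of its parent role. The names (RoleNode, get) and the lookup scheme are purely illustrative assumptions, not a description of SBQL; the data reproduce the Brown example of Fig.50.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a role refers to its parent role and resolves
// attribute reads locally first, then in its super-roles.
class RoleNode {
    final String roleName;
    final RoleNode parent;                    // null for the main role
    final Map<String, Object> attrs = new LinkedHashMap<>();

    RoleNode(String roleName, RoleNode parent) {
        this.roleName = roleName;
        this.parent = parent;
    }

    // Reading an attribute: look in this role, then in its super-roles.
    Object get(String attr) {
        for (RoleNode r = this; r != null; r = r.parent) {
            if (r.attrs.containsKey(attr)) return r.attrs.get(attr);
        }
        return null;                          // attribute not present
    }
}

class BindingDemo {
    public static void main(String[] args) {
        RoleNode person = new RoleNode("Person", null);
        person.attrs.put("Name", "Brown");
        person.attrs.put("BirthYear", 1975);

        RoleNode employee = new RoleNode("Employee", person);
        employee.attrs.put("Salary", 2500);
        employee.attrs.put("Job", "analyst");

        // Binding by name Employee yields the Employee role identifier; through
        // it both its own Salary/Job and the imported Name/BirthYear are visible.
        System.out.println(employee.get("Salary"));   // 2500
        System.out.println(employee.get("Name"));     // Brown (imported from Person)

        // Binding by name Person yields only the Person role's own properties;
        // Salary and Job of the Employee role remain invisible.
        System.out.println(person.get("Salary"));     // null
    }
}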

Below we list several features that distinguish the concept of dynamic roles from the classical object-oriented concepts.

·    Multiple inheritance: Because roles are encapsulated, there is no name conflict even if the superclasses have different properties with the same name. There is no need for an EmployeeStudentClass that inherits both from EmployeeClass and from StudentClass.

·    Repeating inheritance: An object can have two or more roles with the same name; for instance, Brown can be an employee in two companies, with different Salary and Job. Such a feature cannot be expressed by the traditional single or multiple inheritance concepts.

·    Multiple-aspect inheritance: A class can be specialized according to many aspects. For example, a vehicle can be specialized according to the environment (ground, water, air) and/or according to the drive (horse, motor, jet, etc.). Some modeling tools (e.g. UML) cover this feature, but it is neglected in object-oriented programming and database models. One-aspect inheritance causes problems in conceptual modeling and usually requires multiple inheritance. Roles avoid these problems.

·    Variants (unions): This feature, introduced e.g. in the C++, CORBA and ODMG object models, leads to many semantic and implementation problems. Some professionals argue that it is unnecessary, as it could be substituted by specialized classes. However, if a given class can possess many properties with variants, then modeling this situation by specialized classes leads to a combinatorial explosion of classes (e.g. for 5 properties with binary variants, 2^5 = 32 specialized classes). Dynamic object roles avoid this problem: each branch of a variant can be considered a role of an object.

·    Object migration: Roles may appear and disappear at run time without changing the identifiers of other roles. In terms of classical object models this means that an object can change its classes without changing its identity. This feature is hardly available in classical object models, especially in models where the binding of objects is static.

·    Referential consistency: In the presented model relationships are connected to roles, not to entire objects; thus, e.g., it is impossible to refer to the Salary and Job of Smith when one navigates to this object from the object School. In classical object-oriented models this consistency is enforced by strong typing, but it is problematic if the typing is weak.

·    Overriding: Properties of a super-role can be overridden by properties of a sub-role. The possibilities of overriding are extended in comparison to the classical object models: not only methods but also attributes (with values) can be overridden.

·    Binding: An object can be bound by the name of any of its roles, but the binding returns the identifier of a role rather than the identifier of the object. By definition, the binding is dynamic, because in the general case it is impossible to decide at compilation time whether a particular object has a role with a given name.

·    Typing: A role must be associated with a name, because this is the only feature allowing the programmer to distinguish one role from another. Hence, the role name is a property of its type (unlike in classical programming languages, where a type usually does not determine the name of the corresponding object/variable). Because an object is seen through the names of its roles, it has as many types as it has distinct role names.

·    Subtyping: It can be defined as usual; for instance, the Employee type is defined with the use of the Person type. However, it makes no sense to introduce a StudentEmployee type. Due to the encapsulation of roles, the properties of a Student object and the properties of an Employee object are not mixed up within a single structure.

·    Substitutability: Since the names of roles are determined within types, it makes little sense to say, e.g., that the Employee type can be used in all places where the Person type can be used. Thus the substitutability principle must at least be reformulated.

·    Temporal properties: Dynamic object roles are enormously useful for temporal databases, as roles can represent any past facts concerning objects, e.g. the employment history through many Employee roles within one Person object. Without roles, historical objects present a hard design problem, especially if one wants to avoid redundancy, preserve reuse of unchanged properties through standard inheritance, and avoid changing objects’ identifiers.

·    Aspects of objects and heterogeneous collections: A big problem with classical database object models is that an object belongs to at most one collection. This contradicts both multiple inheritance and substitutability. For instance, we can include a StudentEmployee object in the extent Students, but we cannot include it at the same time in the extent Employees (and vice versa). This violates substitutability and leads to inconsistent processing. Dynamic roles have a natural ability to model heterogeneous collections: an object is automatically included in as many collections as the types of roles it contains.

·    Aspect-Oriented Programming. AOP makes it possible to encapsulate cross-cutting (tangled) concerns within separate modules. For example, such concerns are: history of changes, security and privacy rules, visualization, synchronization, etc. As follows from the previous feature, dynamic object roles have conceptual similarities with AOP or can be considered as a technical facility supporting AOP.

·    Metadata support. Metadata are a particular case of crosscutting concerns. Meta-information, such as authorship, validity, legal status, ownership, coding, etc., can be implemented as dynamic roles of information objects.

As follows from the above, dynamic object roles have the potential to create new powerful qualities, which are difficult or impossible to achieve in classical object models. More about our approach to dynamic object roles is presented in the sections devoted to the AS2 store model and the corresponding properties of SBQL.

 

Encapsulation and information hiding

Encapsulation denotes grouping components and details within some box and then making it possible to manipulate this box as a whole, without considering its internals. Encapsulation is usually associated with information hiding: the content, structure and implementation of the box are hidden (it is a "black box"). In the following we use the term encapsulation to denote the combination of both concepts. Encapsulation is not an invention of object-orientedness. It is a general principle of software engineering that was originally formulated by David Parnas in 1972 in connection with the concept of module. We can even say more: encapsulation is a principle of human civilization, which, on different levels and in different media and entities, encapsulates details into some box and then treats the box as a unit. Your TV set is an example.

The encapsulation principle states that the programmer should know only as much about a software entity as is necessary to use it effectively.

Everything that can be hidden from the programmer should be hidden.

This is desirable both to avoid burdening the programmer with inessential details and to reduce potential errors in the software. The specification of a software entity should be separated from its implementation. This separation is necessary to isolate local changes in the implementation of a given entity from the rest of the software; it means that changes in the implementation do not influence the external semantics of the given software entity. (Unfortunately, the last assumption is sometimes difficult to assure and thus does not always hold. In practice, each change in implementation requires testing the whole software due to many undocumented dependencies.) A programmer uses the encapsulated entities on the "black box" principle, usually without the possibility to see their internal construction and without the possibility to do anything to them except what is allowed by their externally specified operations.

Encapsulation and information hiding are the motivation for important concepts in programming and databases, such as procedure, function (functional procedure), object, class, method, module, procedure and class library, abstract data type (ADT), application programming interface (API), database view and perhaps others. Encapsulation is also the basis for a database query language, which can be considered an abstract interface to data that hides many details of data implementation and organization.

In object-oriented models encapsulation is understood in two ways:

·      Orthodox encapsulation (known from Smalltalk, CORBA and ADTs): externally, objects are seen and processed only through methods/operations. Other object features (the state), in particular all attributes, are hidden.

·      Orthogonal encapsulation (known from Modula-2, C++, Eiffel and Java): any property of an object, in particular its attributes and methods, can be private (hidden from external access) or public (available to external program entities). Technically, this feature is introduced in different ways: as an export list (Modula-2, Eiffel), as a special language construct known as an interface (Java), or as special keywords attached to class/type properties, such as private and public (C++). In C++ there are also notions that are a compromise between private and public, such as protected and friend classes.

Orthodox encapsulation is more restrictive, as it disallows generic operations on an object's state, such as direct bindings to attributes and updates of attributes. Some professionals consider these constraints on access to object properties a positive feature of object-orientedness. In their rhetoric, more constrained access to object properties reduces the possibility of errors in software. Moreover, such restrictive encapsulation implies a higher abstraction level, as the implementation of an object (i.e. its attributes and their types) is totally hidden from external access. Hence it is argued that orthodox encapsulation supports the safety and quality of the produced software.

Because some access to attributes may nevertheless be necessary, the advocates of orthodox encapsulation postulate that each attribute attr that is to be available for public use should be externally substituted by two methods: get_attr (a so-called getter), to read the attribute value, and set_attr (a so-called setter), to change the attribute value; the latter method has a parameter with the new attribute value. In this way the class creator ensures full control over external access to the components of an object's state.
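
For illustration, a minimal Java sketch of this getter/setter discipline follows; the class and attribute names are hypothetical.

// Orthodox encapsulation: the attribute itself is hidden, and all external
// access goes through a getter and a setter.
class Employee {
    private int salary;                          // not visible externally

    public int get_salary() {                    // getter: read the attribute value
        return salary;
    }

    public void set_salary(int newSalary) {      // setter: change the attribute value
        this.salary = newSalary;
    }
}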

We argue that the arguments for orthodox encapsulation are only apparently true. In our opinion they are based on wishful thinking and are technically inconsistent with many properties that we require from object models, especially database object models. We present the following arguments against orthodox encapsulation:

·       Inconsistency with the principle of orthogonality of bulk types: Let the attribute jobs be a collection of all jobs for an employee. In this case it is not enough to have two methods get_jobs and set_jobs, because we have to perform operations on every single job value, not on the collection of values treated as a whole. For instance, the programmer may need to read or update a particular job value. Frequently it is argued that such situations require the concept of an iterator, i.e. generic methods such as get_first_value, get_next_value, check_existence_of_next_element, delete_current, insert_after_current, etc. Semantic consistency of these operators requires that some of them return a reference to a value rather than the value itself. However, references lead to a violation of orthodox encapsulation, because its essence is to avoid generic operations on references to attributes. The example shows that this assumption is very inconvenient for processing nested collections (see the sketch after this list).

·       Inconsistency with the principle of the semantic relativity of objects: The principle claims that attributes within an object are objects too, perhaps with their own classes and methods. (This principle is relevant only to objects understood as data structures.) Orthodox encapsulation requires serving all objects by methods only, but this would be impossible for attributes that are objects, because there would be no way to identify them in order to send them a message (i.e. to call their methods).

·       Inconsistency with null values and/or variants: If an attribute attr can take a null value or can be present or absent in a given variant, then a getter and a setter are insufficient. At least two further methods are necessary, e.g. test_attr_if_null and set_attr_to_null.

·       Inconsistency with the idea of query languages: Query languages require direct binding both to objects and to their attributes. This inconsistency has formed the view that the idea of query languages is inconsistent with object-orientedness (see e.g. [Darw95, Date98]). We consider such a view nonsense, coming directly from treating orthodox encapsulation as a principle. There are many query languages addressing object-oriented models, in particular SBQL introduced on these pages, that follow the idea of encapsulation.

·       Inconsistency with the idea of a database schema: A programmer who works on a database application must see and use object structures in the form of objects, attributes and links to other objects. Any artificial warping of this programmer's view will result in lower productivity, lower legibility of programs, more errors and more difficulties with program maintenance.

·       Conceptual limitations and inconsistencies: How are setters and getters to process pointer links (relationships, associations) stored inside objects? How are special attributes such as BLOBs (multimedia) to be processed? In the latter case both getters and setters must work on references to long values rather than on the values themselves. How can one write a generic procedure that processes data types assigned to many attributes in many objects of different classes, for instance dates? Provided there is some generic procedure that checks and converts dates, the natural way is to use the call-by-reference parameter passing method. The idea of getters and setters excludes call-by-reference applied to attributes of objects.
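
The Java sketch below illustrates the first of the above arguments (the jobs collection). The class and method names are hypothetical; the point is only that a whole-collection getter and setter cannot address a single element, so the class must either grow ad hoc operations or hand out references to its internal state, which is exactly what orthodox encapsulation tries to forbid.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the bulk-type problem under orthodox encapsulation.
class EmployeeWithJobs {
    private final List<String> jobs = new ArrayList<>();

    // Whole-collection getter/setter: cannot address a single job value.
    public List<String> get_jobs() {
        return new ArrayList<>(jobs);            // defensive copy: in-place updates are lost
    }
    public void set_jobs(List<String> newJobs) {
        jobs.clear();
        jobs.addAll(newJobs);
    }

    // To update one element the class must grow extra, ad hoc operations...
    public void set_job_at(int index, String job) {
        jobs.set(index, job);
    }
    // ...or return the internal list itself, i.e. a reference to the attribute,
    // which breaks the "no generic access to state" assumption.
}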

As follows from the above arguments, orthodox encapsulation leads to serious limitations and inconsistencies, especially in object database models. Sometimes such an understanding of encapsulation results in the conclusion that the idea of encapsulation is wrong and should be forgotten. Such conclusions are at least strange, taking into account the fact that encapsulation (in different forms) is the basic principle of software engineering (e.g. modules) and, even more, the basic principle of any engineering (see your encapsulated computer, TV set or headphones).

The only reasonable alternative is orthogonal encapsulation, where each property of a class (of an object) can be declared as public or private (perhaps with some intermediate options such as protected). Orthogonal encapsulation causes no problems with defining query languages addressing powerful object-oriented database models. Attributes that are public can be served by generic operations such as binding, dereferencing, arithmetic and string comparisons and operators, etc. Attributes can be the subject of updating operations such as assignment and deletion. This feature requires that binding to an attribute returns its reference rather than its value. A reference can be used as an argument for the mentioned updating operations, as well as in call-by-reference parameter passing to procedures, functions and methods.

Orthogonal encapsulation does not contradict getters and setters. If for any reason the programmer of a class wants to make its attribute attr hidden, he/she declares it as private and then writes two public methods: get_attr and set_attr. This can be done with any attribute of the class, including attributes that can store null values, pointer-valued attributes, etc. However, due to generic operations acting on attributes, getters and setters will be unnecessary in the majority of cases.
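
A minimal Java sketch of orthogonal encapsulation, with hypothetical names, is given below: public attributes remain open to generic access, while a single hidden attribute is served by a getter and a setter.

// Orthogonal encapsulation: each property is individually public or private;
// only the hidden attribute needs a getter/setter pair.
class Person {
    public String name;                          // public attribute: direct, generic access
    public int birthYear;

    private double salary;                       // hidden attribute, served by methods only

    public double get_salary() { return salary; }
    public void set_salary(double s) { this.salary = s; }
}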

 

Modules

The concept of module (D. Parnas, 1972) is motivated by several software engineering factors that support software manufacturing and maintenance. They are the following:

·         Independent development (isolation): modules make it possible for many programmers to develop the software with the minimal communication overhead necessary to synchronize their work. Each module has its own namespace that does not collide with the namespaces of other modules, even if the same names are used for different purposes.

·         Independent compilation: modules can be compiled independently of other modules. Compiled modules are joined into an executable program by a special utility (a linker).

·         Program storage units: modules are independently stored, administered and maintained by software maintenance and administration staff.

·         Encapsulation and information hiding: modules encapsulate and hide the details of software units. In this way the complexity of the software is reduced by subdividing it among many units and development phases.

·         Comprehensibility: modules support conceptual modeling of the software through units that are understandable from the point of view of their meaning, semantics and business logic.

·         Abstraction: modules allow for designing and implementing software on several abstraction levels, due to encapsulation and information hiding.

·         Decomposition: modules decompose a program into many software units that can be designed and implemented independently.

·         Safety: modules reduce the possibility of critical bugs in the software through well-defined interfaces that determine the exported functions (through export lists) and well-defined specifications of possible side effects (through import lists). Exported and imported features can be strongly (statically) typechecked.

·         Reuse: modules present ready-to-use software units that can be reused in many configurations for different applications.

·         Changeability: it is possible to substitute a module with another module, provided its specification and functionality are compatible; the replacement module may differ in other aspects (e.g. performance).

Almost all programming languages offer some form of modules, but not all the above qualities are supported. For instance, in C/C++ modules are equivalent to source code files, perhaps with different roles (such as program and header files). Modula-2 is considered a mature accomplishment of the idea of modules, where almost all the above qualities are supported. In Modula-2 modules can be nested (unlike C/C++). Each module has a specification and an implementation. The implementation is assumed to be unavailable externally; it is a "private" part with hidden details. The specification consists of an export list (currently known from Java, ODMG, CORBA etc. as an interface) and an import list that specifies the properties of other modules that can be used in the given module. (Unfortunately, the idea of an import list seems to be forgotten.) Programmers can use the exported features of a module, but the internals of the module are hidden from external programmers. External programmers are informed about the side effects of a module by inspecting its import list. The programmer of a module is constrained to use only those properties of other modules that are specified in its import list.
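
As a rough modern analogue (not a feature of Modula-2, and not related to ODRA), the Java module system can express both lists: the exports clause plays the part of an export list, and the requires clause that of an import list. The module and package names below are hypothetical; a real module would have to contain the exported package.

// module-info.java: a sketch of export/import lists in Java's module system.
module payroll.core {
    // "export list": only this package is visible to other modules;
    // everything else in the module stays hidden (the implementation part).
    exports payroll.api;

    // "import list": the module declares which other modules it depends on,
    // documenting its possible external dependencies and effects.
    requires java.sql;
}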

Considering object databases, the concept of module should combine two aspects. On one side, a module should be an encapsulated piece of source code, just as in Modula-2 or similar modular languages. On the other side, a module is a database object that encapsulates all the programs and data devoted to some application goal. Note that in databases the situation is different than in programming languages because of late binding. Many features (e.g. views, stored procedures) can be dynamically inserted while a database server is running, hence the classical subdivision into compilation, linking and execution is no longer valid. In the ODRA system a module is a specialized object that stores and encapsulates different database or program entities, including (compiled) classes and their methods, (compiled) procedures, views, proper objects, etc. Source codes are stored independently as operating system files, but connected with the stored modules by a naming convention.


Last modified: January 10, 2008