6. CAPSULES AND SEPARATE TRANSLATION
6.1 USE OF CAPSULES
6.1.1 INTRODUCTION
The capsule feature of RED, in accordance with Steelman (SM 3.5), is not simply
a facility for defining abstract data types, but a general mechanism for controlling
scopes of names. Among the encapsulation mechanisms provided by recent languages,
capsules are most like modules in Modula [Wi77]. In the languages CLU [LSA77],
Alphard [WLS76], Simula [DNM69], and Euclid [LHLMP76], the encapsulation mechanism
defines a new type. In both RED capsules and Modula modules there is no particular
association between the mechanism and abstract types; rather, a set of related
definitions can be defined, some of which are exported for general use. RED differs
from Modula primarily in the way that hiding of underlying properties for abstract
data types is accomplished, and in the way that the lifetime of definitions in a
capsule is defined.
RED capsules can be used to define a single abstract data type, a group of
related abstract data types, a group of related procedures (e.g., an arithmetic
package) and a group of related data (a compool). In addition, the capsule serves
as the unit of separate translation. In this section we discuss uses of the capsule
mechanism and some related technical issues. Separate translation is treated below,
in Section 6.2, including a discussion of the EXTERNAL feature.
6.1.2 EXPORTING FROM A CAPSULE
The capsule is a mechanism for controlling scope of names. Only those
definitions that are explicitly exported are available for use outside the capsule.
All definitions in the capsule are potential candidates for exporting except for
labels. Prohibiting the exporting of labels is consistent with RED's prohibition on
non-local goto's. Ordinarily definitions will be exported by listing their names
explicitly in the export list; variables can be exported readonly, if desired. For
some uses, e.g., libraries and compools, all definitions are intended to be
exported, which can be accomplished by saying "EXPORTS ALL".
In Modula, the lifetime of the definitions in a module is the lifetime of the
block that contains the module declaration. In RED, lifetime may be determined not
by where a capsule is defined, but by where the definitions it provides are useful.
These definitions are made available by exposing the capsule; e.g., "EXPOSE ALL FROM
NEW M". The lifetime of the definitions is the lifetime of the scope to which the
exposure is local, and the exported definitions are local to that scope.
Since it is explicitly exposed, a capsule can have parameters that permit it to
be specialized to the using environment. These parameters, for example, can be used
to determine the size of an internal table, or to initialize some data of the
capsule. The capsule may not export its parameters; this rule avoids a source of
aliasing.
6.1.3 ABSTRACT DATA TYPES
An abstract data type is defined by writing a capsule that exports the type and
the operations that determine the abstract behavior. For example, to define the
abstract type "intstack", one can define the following capsule:
Inside the capsule, the abstract type is defined in terms of an underlying type, and
the abstract operations are implemented in terms of that underlying type. In
addition, internal, helping routines can be defined; these are not visible
externally because only the exported operations are available for outside use.
The abstract type is defined using a TYPE declaration (not an ABBREV
declaration); for example:
Use of a TYPE declaration makes the abstract type a new type distinct from all other
types. In addition, the abstract type can have attributes (e.g., "size" above), and
attribute inquiry is automatically available for external use if the type is
exported.
A type declaration provides a few operations for the newly defined type; in
particular, .ALL, :=, =, and selectors if the underlying type is a composite type.
For example, intstack has operations .ALL, :=, =, .top, and .els. These operations
can be used in implementing the operations of the abstract type. However, with the
exception of := and possibly =, they should not be exported. The .ALL, .top, and
.els operators are not abstract operations; i.e., they make no sense as far as the
abstract type is concerned. Furthermore, they reveal information about the
underlying type. Sometimes even := does not make sense as an abstract operation, in
which case, it is not exported.
Occasionally, the definitions of := and = that are automatically provided for a
new type are not the correct ones for the abstract type. This is especially likely
when the underlying type is indirect, since := for indirect types introduces
sharing, and = tests pointer equality. The definer of the new type can override the
automatic definitions by providing explicit definitions of := and = for the
new type. The issue of user defined assignment is discussed further in Section 9.1.
In contrast with the approach taken in Modula, the semantics of an exported
type do not change when it crosses a capsule boundary. In Modula a type becomes
"opaque" as it crosses the module boundaries; i.e., knowledge of its underlying type
is lost. In RED, the type is the same both inside and outside the capsule, but some
operations available inside may not be available outside because they were not
exported.
A capsule can be used to define several abstract types. This is useful when
operations (such as conversion functions) need access to the underlying properties
of two or more abstract types.
Capsules can be nested in other capsules. This provides the ability to define
two or more types with underlying properties hidden from one another, but with
special access rights with respect to one another. For example:
Operation q3 for t2 is only available for use inside capsule def; it is known in
capsule deft1 and may be used in defining t1. Such an ability would be useful, for
example, in a capsule controlling the use of a group of abstract printers, where the
individual abstract printers can be used outside (i.e., written), but where creation
of new abstract printers can only be done by an operation on the group as a whole.
Another use of nested capsules is to define layers of abstract machines, where
some definitions provided by a lower level machine are available to a user of a
higher level machine, and others are not.
As shown in the following example, a capsule may be used to define a single
instance of an abstract object rather than to define a new abstract type:
This capsule defines a single object, which is represented by the data internal to
the capsule (the variable named hashtable). The size of hashtable is determined at
the time the capsule is EXPOSEd, at which time the body of the capsule (the REPEAT
statement) initializes hashtable. The advantage of such a capsule is that there is
no need to pass hashtable as an actual parameter to the operations.
6.1.4 OWN DATA
Capsules can have "own" data; i.e., local variables which are not exported.
The own data can be initialized by the capsule body when the capsule is exposed; the
initialization can be based on the values of the capsule parameters. One
illustration of this is the symbol_table capsule in the previous section. As
another example:
Note that random is declared to be an abnormal function, because different
invocations with the same actual parameter values (in this case, none) are intended
to provide different results.
Another use of own data is to collect statistics about the use of a capsule.
Consider the case of two capsules that both define the same abstract data type, but
one collects statistics about usage. It is desirable that both capsules appear the
same to the using programs: it should be possible to collect information about the
use of a module, provided the behavior of the module doesn't change in any other
way, without affecting using modules. To satisfy this goal, functions must be
permitted to modify data local to their capsule. Such an ability is required by
Steelman (SM 4C) and is provided by RED. Note that such modifications are
"invisible" side effects, and will not cause a function to become abnormal. See
also Section 5.2.
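The statistics-gathering case can be sketched as follows. This is only an illustration in Python, not RED: a closure stands in for the capsule, the closed-over dictionary stands in for own data, and the names expose_stack, collect_statistics, usage, and report are all hypothetical. It shows two variants that export the same operations and give the same results, while one of them silently updates its own data from inside a function.

```python
def expose_stack(collect_statistics=False):
    """Return the exported operations of a hypothetical stack capsule."""
    usage = {"push": 0, "top": 0}  # own data: local to the capsule, never exported

    def push(stack, value):
        if collect_statistics:
            usage["push"] += 1
        return stack + [value]

    def top(stack):
        if collect_statistics:
            usage["top"] += 1  # "invisible" side effect inside a function
        return stack[-1]

    def report():
        # Hypothetical hook for reading the collected statistics.
        return dict(usage)

    return {"push": push, "top": top}, report


def run(ops):
    """A using program; it is written the same way for either variant."""
    stack = ops["push"](ops["push"]([], 1), 2)
    return ops["top"](stack)


plain_ops, plain_report = expose_stack()
counted_ops, counted_report = expose_stack(collect_statistics=True)
result_plain = run(plain_ops)
result_counted = run(counted_ops)
```

The using program run cannot tell the two variants apart; only the report hook, which a real capsule might or might not export, reveals the difference.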
6.1.5 LIBRARIES AND COMPOOLS
Capsules can be used to provide libraries of useful procedures and data types.
For example, a capsule might provide a library of mathematical functions (e.g.,
various integration methods) and related data types (e.g., complex numbers).
Capsules can also be used to provide compools of related data items. The body of
the capsule can initialize the data, based on the capsule parameters if desired.
Capsule parameters can also be used to control the sizes of the data.
6.2 SEPARATE TRANSLATION
Block structure, or nested name scopes, is one of the major contributions of
ALGOL-60. Almost all of its successors have adopted this approach to name scoping.
In recent years, the need to hide declarations has become clear and has generated
interest in encapsulating some declarations so that they are not as generally
visible as the scope rules would otherwise have made them (capsules perform this
function in RED). Nonetheless, the basic concept of names being defined in outer
scopes and used in statically enclosed inner scopes remains the best way of sharing
declarations of both data and program objects.
The ALGOL-60 rules require that a name's declaration be visible (according to
the scope rules) at any point where the name is referenced. A direct consequence of
this rule is that everything must be translated together. In the world of embedded
applications, such an all-or-nothing philosophy is unacceptable:
- Embedded applications often involve tens of thousands or hundreds of
thousands of lines of code. Retranslating the entire program for each
change is not economically feasible.
- Large applications are typically built by large groups of people. It is an
organizational impossibility to have them all working on parts of the same
source text, and yet the partial and total integration of their components
must be easily achieved. This integration must be performed without
invalidating any component testing performed before integration.
- Since translators, linkage editors, and other support software can have
bugs, final verification must be performed on the embedded object code, not
on the source. If the whole program is retranslated, the great bulk of the
code must be retested even though no source level changes were made. Such
retesting is unacceptably costly in both money and time.
- Programs may contain classified information (e.g., encryption/decryption
routines). The invoker of a routine has no need to know about the internals
of the routine.
A language and associated translators for supporting embedded applications must
provide a separate translation capability. This capability must sacrifice none of
the convenience, reliability, efficiency, or access control obtainable by a single
unified translation scheme. Furthermore, the separate translation facility must
support the integration of components produced by diverse sources, including
libraries with conflicting name usages and programs written in other languages. RED
satisfies all these requirements.
6.2.1 THE TRANSLATION UNIT
The unit of translation is the capsule. The capsule is ideally suited for this
role as it can, with equal ease, be used to define:
- a shared pool of data;
- a shared routine or library of shared routines;
- an abstract data type or library of such types;
- an integrated module.
Capsules have the essential properties which facilitate program modularization:
- they can be combined with other capsules to form a larger, more powerful
capsule;
- they can completely control what subset of their capabilities is exported;
- they can be written to provide some collection of virtual machine
facilities (e.g., abstract data types) independent of their environment.
Elaboration of the main capsules of a system is initiated by some extra-lingual
facility (e.g., the operating system, the computer operator, or the power up
sequencing hardware). A capsule which is not a main capsule must be referenced from
some other capsule in order to be useful. In order for a capsule (or its exported
objects) to be referenced, its name must be visible. When two capsules are
translated together, name visibility is provided by the standard scope rules. When
the capsules are translated separately, the scope rules must be extended to provide
the required visibility.
6.2.2 EXTENSIONS OF THE SCOPE CONCEPT
Extending the scope rules can be done in several different ways. The FORTRAN
approach has the virtue of simplicity. External routines are declared implicitly by
calling them. External data are declared explicitly via COMMON declarations. All
bindings are performed by a linker, independently of the language. There is no
attempt to perform any type checks. In fact, type mismatches in COMMON declarations
are legal and often intentional. Since this approach is clearly incompatible with
the Steelman and RED philosophy, it is perhaps interesting to consider the logically
opposite extreme.
The system implementation language LIS [CII76] is a strongly typed
block-structured language supporting separate translation. The LIS philosophy is that an
application is a single (perhaps large) block structured unit. It is possible to
translate this whole object or to separately translate any properly nested subpart.
This philosophy has a great deal of intuitive appeal. Unfortunately, when all the
details are worked out, there are problems. Since the underlying idea is simply to
use block structure to provide name access, one would expect there to be few or no
rules pertaining specifically to separate translation. In fact, there are a large
number of complicated rules. These rules are so complicated that an otherwise very
favorable analysis of LIS [KL78] concluded that its separate translation facility
would have to be simplified before the language was usable in a production
environment. In spite of the language's intentions, an unacceptably large part of
the total application must be retranslated whenever a structural change is made
(e.g., adding a new routine).
6.2.3 ENVIRONMENT SPECIFICATIONS
The RED language provides a simple mechanism satisfying the requirements for
separate translation. Associated with each separately translated capsule is a
descriptor containing, among other things, a "template" of the capsule. The
template (described in more detail in Section 6.2.5) is automatically generated by
the translator when the capsule is translated. The environment of the compilation
is determined by which EXTERNAL capsules are exposed. The environment determines
which templates are to be visible when the unit is translated. By controlling the
use of EXTERNAL environment capsules, project management can completely control
access rights between separately translated capsules.
6.2.4 USE OF ENVIRONMENTS
Example 1
Many capsules import nothing. This is particularly true of library capsules.
Such an empty environment is created by not exposing any EXTERNAL capsules.
Example 2
Most large projects will define a library of utility routines available to any
capsule. Such a library might include the trigonometric functions, some abstract
mathematical types (e.g., complex, matrix, vector), and some simple arithmetic
functions (e.g., ceiling, floor, absolute value). If each of these groups is
provided through a separate capsule, an appropriate environment might be provided
by:
EXPOSE ALL FROM EXTERNAL trig;
EXPOSE ALL FROM EXTERNAL mathtype;
EXPOSE ALL FROM EXTERNAL arithfunc;
The separate translation facility provides the effect of a global scope containing
all of the separately translated capsules. Whether a capsule is defined locally or
separately, it still must be exposed before it is used. Occasionally, several
capsules each need their own instance of a separately translated capsule. To
implement this, each capsule performs an EXPOSE selecting the NEW option. However,
in our example it is more likely that all the capsules should share one instance of
the library capsules.
Not selecting the NEW option has the effect of performing an EXPOSE in the
scope containing the translation unit. Data are allocated and initialized in that
scope and routine names are made available. The capsule being translated may use
the routine names and import the data using exactly the same mechanisms as it would
if it were embedded in a larger capsule actually containing appropriate expose
declarations.
Example 3
The library capsules of Example 2 probably only export routine names. Embedded
applications frequently require pools of data to be shared among several separately
translated units. For instance, the guidance, navigation, and control software of
an airplane might use a shared pool of data defined in capsule gncpool, as well as
the library environment. The appropriate environment would be provided by:
EXPOSE ALL FROM EXTERNAL gncpool;
EXPOSE ALL FROM EXTERNAL trig;
EXPOSE ALL FROM EXTERNAL mathtype;
EXPOSE ALL FROM EXTERNAL arithfunc;
Example 4
When independently produced libraries are combined, name conflicts may arise.
For instance, both trig and arithfunc might provide square root routines. If both
capsules are exposed in the same scope, a name conflict will arise. RED offers two
solutions to this problem.
It is possible that each square root routine is needed. For instance, one
might be faster and the other more accurate. In that case, one or both routines
must be renamed. For instance:
EXPOSE ALL FROM EXTERNAL trig RENAMING sqrt TO fastsqrt;
EXPOSE ALL FROM EXTERNAL arithfunc RENAMING sqrt TO accsqrt;
If the two routines are redundant, the desired alternative is to eliminate one
of the square root routines. This is done by limiting the access to one of the
capsules:
EXPOSE sin, cos, tan, arctan FROM EXTERNAL trig;
Example 5
The main purpose of the visible list in a capsule invocation is to restrict a
particular translation unit's power rather than to resolve name conflicts. Suppose
a mission contains a capsule for logging events. That capsule might export routines
for reading the date and time of day, for rewinding the log, for positioning the
log, for modifying a log entry, and for appending an entry to the end of the log.
Most applications will require only a subset of these capabilities and project
management may well wish to guarantee that the log is not accessed improperly. This
may be accomplished through the definition of the following capsule:
CAPSULE log_subset EXPORTS ALL;
EXPOSE date, time, append FROM EXTERNAL log;
END CAPSULE log_subset;
If the users are informed only of the existence of log_subset, then improper access
to the log capsule is prevented.
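The effect of the visible list and of RENAMING in Examples 4 and 5 can be modeled with a small sketch. It is written in Python rather than RED and makes several assumptions: a scope and a capsule's export set are both plain dictionaries, the function expose and its parameters names and renaming are hypothetical, and the contents of the two library capsules are invented for illustration.

```python
import math


def expose(scope, exports, names=None, renaming=None):
    """Add a capsule's exported definitions to scope (a hypothetical model).

    names=None models EXPOSE ALL; otherwise only the listed names are exposed.
    renaming maps an exported name to the name it receives in the scope.
    """
    renaming = renaming or {}
    selected = exports if names is None else {n: exports[n] for n in names}
    for name, definition in selected.items():
        local = renaming.get(name, name)
        if local in scope:
            raise NameError(f"name conflict on {local!r}")
        scope[local] = definition
    return scope


# Invented export sets; both capsules happen to export sqrt.
trig = {"sin": math.sin, "cos": math.cos, "sqrt": math.sqrt}
arithfunc = {"floor": math.floor, "sqrt": math.sqrt}

# Exposing everything from both capsules in one scope produces a conflict.
try:
    expose(expose({}, trig), arithfunc)
    conflict = False
except NameError:
    conflict = True

# First remedy: keep both routines under different names.
renamed = expose({}, trig, renaming={"sqrt": "fastsqrt"})
expose(renamed, arithfunc, renaming={"sqrt": "accsqrt"})

# Second remedy: limit what is exposed from one capsule.
limited = expose({}, trig, names=["sin", "cos"])
expose(limited, arithfunc)
```

The same visible-list form restricts a translation unit's power in Example 5: a scope built with an explicit names list simply never contains the definitions that were left out.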
6.2.5 IMPLEMENTATION ISSUES OF SEPARATE TRANSLATION
The RED separate translation facility provides a convenient and flexible
facility for developing a large program as a collection of separately translated
modules. Of course separate translation raises several problems other than name
visibility. These issues are addressed in the following paragraphs.
Version Skew
If two capsules are translated together, with capsule1 using resources exported
by capsule2, the translator can guarantee that all interfacing between the capsules
is performed correctly. If the two capsules are translated separately, then
translating capsule2 automatically generates a template. Capsule2 must be EXPOSEd
EXTERNAL in capsule1. The translator is led, by capsule1's EXPOSE, to read the
template for capsule2 and can thus once again guarantee that all templates are
correct. The potential problem arises when capsule2 is modified. In the unified
translation case, translating capsule2 implies also translating capsule1 -- the
consistency checks are made every time. In the separate translation case, the
templates used in capsule1 are not rechecked. A mechanism must be provided which
guarantees that all the template information in a given program is consistent. The
various component modules of a complete program are brought together by the linker.
The linker is thus the appropriate place to check for consistency and these
consistency checks can be reduced to simply checking version numbers supplied by the
translator.
Whenever a capsule is translated, the translator generates a new template.
This template is compared to the old one. If the two are identical, no changes are
made. If there is any difference, the new template is installed in the descriptor
and the version number is incremented. Whenever a descriptor is used as part of an
environment, the version number of the imported template is recorded as part of the
descriptor of the capsule being translated. At link time, the version numbers of
the capsules being linked must match the version numbers specified in the importer's
descriptors. If a mismatch occurs, the linker reports a version skew error. Some
implementations may ease this problem by generating lists of capsules with template
mismatches or automatically retranslating importers whenever a template is changed.
Such facilities are convenience factors -- the linker check is all that is
necessary.
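The version number scheme just described can be sketched as follows. The sketch is in Python, not RED, and it is only a model under stated assumptions: the Descriptor class, the functions translate and link, and the representation of a template as an opaque string are all hypothetical; an actual implementation is free to store templates in any form.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """Hypothetical capsule descriptor: template, version, recorded imports."""
    name: str
    template: str = ""
    version: int = 0
    imports: dict = field(default_factory=dict)  # capsule name -> version seen


def translate(desc, new_template, environment=()):
    """Install new_template only if it differs, incrementing the version."""
    if new_template != desc.template:
        desc.template = new_template
        desc.version += 1
    # Record the template version of every capsule used as environment.
    desc.imports = {dep.name: dep.version for dep in environment}


def link(descriptors):
    """Return the list of version skew errors; an empty list means consistent."""
    by_name = {d.name: d for d in descriptors}
    errors = []
    for d in descriptors:
        for dep_name, seen in d.imports.items():
            current = by_name[dep_name].version
            if current != seen:
                errors.append(
                    f"version skew: {d.name} used {dep_name} v{seen}, "
                    f"current is v{current}"
                )
    return errors


capsule2 = Descriptor("capsule2")
capsule1 = Descriptor("capsule1")
translate(capsule2, "interface A")                  # opaque template text
translate(capsule1, "main", environment=[capsule2])
ok_before = link([capsule1, capsule2])

translate(capsule2, "interface A")                  # identical: version unchanged
ok_same = link([capsule1, capsule2])

translate(capsule2, "interface B")                  # template changed
skew = link([capsule1, capsule2])
```

Retranslating capsule2 without changing its template leaves the importer valid, which is why internal changes to a capsule need not force retranslation of its users.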
Interfacing to Foreign Code
The template part of a capsule's descriptor contains all the information
necessary to use the capsule. The information required is implementation dependent.
A routine's template might simply specify the types and binding classes of its
parameters. On the other hand, highly optimizing translators may use different
parameter passing conventions depending on user specified optimization criteria and
the use of a particular parameter. If there are options, the particular conventions
to be used must be specified. When linking to programs written in another language
the RED translator must know the parameter passing conventions. An implementation
is free to choose whatever mechanism seems appropriate. If there are only a few
other high level languages involved, the natural scheme would be to incorporate
knowledge about the other language processors' conventions into the RED translator.
If many hand crafted templates must be met, a template will have to specify detailed
calling sequence information. In any case, this is information provided by project
management in the template. The fact that a capsule is built out of foreign code is
nowhere reflected in the RED source code. If the foreign program is subsequently
rewritten in RED, all that is required is a retranslation of the callers.
The RED translator must be able to write a template description. By
restricting programmers' access to other template writing programs, project
management can effectively enforce control over the use of foreign code.
Inline and Generic Routines
The template part of a capsule's descriptor must contain all the information
necessary to translate uses of any exported object. For routines, this information
is typically limited to the types and binding classes of the parameters. Details
about the inner workings of the routine are unnecessary because they are contained
in the routine's object code and the linker will make the necessary connection
between the invocation and the routine.
Routines which are to be expanded inline or which have generic parameters
cannot be completely translated at their point of declaration. Only at the point of
invocation does enough information become available to complete the translation of
the routine. This implies that the template must contain a translatable
representation of the routine. The form of representation is totally implementation
dependent. Certainly, the complete source text of the capsule's declarations and
the body of the routine are sufficient. The appropriate division of labor between
declaration time processing and invocation time processing depends upon the expected
number and complexity of invocations.
Mutual Recursion
Although the capability is not required by Steelman, there are many cases where
it is appropriate to separately translate mutually recursive routines. For
instance, in a translator using a recursive descent parser, almost all routines are
mutually recursive. Nonetheless, the translator can be logically broken up into
modules.
The separate translation of capsules containing mutually recursive routines
poses no problems to the RED programmer. The key is to first provide appropriate
templates, either by translating capsules containing stubs for the mutually
recursive routines or by using the global translate facility (described below).
Development of the routines can then continue independently with changes in one of
the routines occasionally requiring translation of the other capsule, just as in the
non-recursive situation. It is possible that making massive internal changes, such
as replacing a stub by the actual routine, may modify a template. This is
particularly likely in optimizing translators. When such a template change occurs,
the other capsules must be retranslated to reflect the changes. This process is
inherently stable and converges in a small number (usually 1) of iterations.
Efficiency
Efficiency is a major concern in embedded applications. Although separate
translations and convenient library facilities are attractive, it is essential that
no run-time price be paid for them. The two possible sources of inefficiency are 1)
degradation of the quality of the object code and 2) the loading of unused routines.
RED suffers from neither of these problems.
It should be kept in mind that some of the best object code now produced is
produced by FORTRAN translators which translate every routine independently of all
others. The present state of the art in optimization does not avail itself of
information about a routine's calls. In fact, highly optimizing translators usually
go to great lengths to translate routines before translating the callers so that
information about the routine's operation can be used to optimize the caller. A RED
translator can put any useful information about a routine into the template, thus
enabling it to perform exactly the same optimizations as in the unified translation.
For generic and inline routines the required amount of information is certainly
large. This information is required, however, simply to support the separate
translation of these routines. No additional information is required to support
optimization. It is likely that in the near future improved optimization techniques
will be developed that use information about the caller. For instance, practical
algorithms for automatically deciding upon inline vs. out-of-line expansion may be
developed. The GLOBAL TRANSLATE facility described below supports even this extreme
level of optimization at some additional translation-time expense.
Not infrequently, changes are made in several separately translated but
interrelated capsules. Retranslating the capsules separately has several
disadvantages:
- if templates change, care must be taken to translate the capsules in the
right order;
- substantial calendar time and staff time may be wasted because some
translation requests cannot be submitted until others have been completed;
- substantial computer time is wasted because the translator must redo
substantial analysis for each separate module.
The GLOBAL TRANSLATE facility of a RED translator enables the user to translate any
subset (or all) of the capsules in a library. The translations are performed as if
all the translated capsules were enclosed in one large capsule exporting all of the
individual capsules, but several advantages accrue:
- template information for capsules being translated is taken from the
current translation, thereby eliminating the ordering problems;
- templates need to be interpreted only once regardless of the number of
capsules requiring them;
- the performance of the translator can be optimized by eliminating multiple
loadings of the same phases;
- the object code of routines can be optimized using knowledge about the
caller.
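The ordering problem that separate retranslation faces, and that GLOBAL TRANSLATE removes, amounts to a topological sort of the import relation. The following Python sketch (not RED) illustrates this under assumptions: the capsule names are borrowed from the earlier examples, but the import relationships among them and the capsule gnc are invented, and mutually recursive capsules (which would form a cycle) are not handled.

```python
from graphlib import TopologicalSorter

# Hypothetical import relation: each capsule maps to the capsules whose
# templates it needs, i.e. the capsules that must be translated first.
imports = {
    "gnc": {"gncpool", "trig", "mathtype"},
    "mathtype": {"arithfunc"},
    "trig": {"arithfunc"},
    "gncpool": set(),
    "arithfunc": set(),
}

# One valid order for separate retranslation when templates have changed.
order = list(TopologicalSorter(imports).static_order())
```

With separate translation the user must follow such an order by hand, waiting for each step to complete; a global translation takes template information from the current translation and so needs no such bookkeeping from the user.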
In present-day systems, routines are not loaded unless they are required. This
is equally true in RED. Although a capsule will often contain several routines, the
translator can generate a separate object module for each routine. When the linker
binds object modules into a load module, it need only link in object modules which
are called by some already required routine. If a capsule contains a library of
routines, only those routines actually required by a particular application are
linked in. Once a routine has been linked in, subsequent requests for the same
routine are always linked to the already linked copy. It is likely that a library
will contain several instantiations of a generic routine; nonetheless, the linker
will only link in the instantiations actually used.
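The selective linking described above is a reachability computation over the call relation. A minimal sketch in Python (not RED) follows; the function link_required, the call table, and the routine names are hypothetical, and each routine is assumed to have been emitted as its own object module.

```python
def link_required(calls, roots):
    """Return the set of object modules reachable from the root routines.

    calls maps each routine name to the routines it invokes (invented data).
    """
    linked = set()
    pending = list(roots)
    while pending:
        routine = pending.pop()
        if routine in linked:
            continue  # already linked: later requests reuse the same copy
        linked.add(routine)
        pending.extend(calls.get(routine, ()))
    return linked


# A library capsule with five routines, of which the application needs three.
library_calls = {
    "main": ["sin", "sqrt"],
    "sin": ["sqrt"],
    "cos": ["sqrt"],
    "tan": ["sin", "cos"],
    "sqrt": [],
}
load_module = link_required(library_calls, ["main"])
```

Here cos and tan are never linked, and sqrt is linked once even though two routines request it.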