Intermetrics, Inc. - RED Rationale - March 1979

.	Navigate

3. Types (Part 3)

4. Expressions & Statements

TYPES
(Part 4)

Indirect Types     Basic Properties     Explicit Decl     Properties     Indirect Subtypes     Alloc-Time     Recursive
Programming Example     Algorithm     Data Structure     Procedures     Putting It Together
Justification     Initialization     Dereferencing     Allocation Statement     Explicit Deallocation

3.   TYPES (Part 4)

3.7  INDIRECT TYPES

The RED language provides a versatile but straightforward facility for realizing indirect (i.e., "pointer") types as required by SM 3-3I,   3-3J, and 8E. In this section we give an informal discussion of the basic properties of indirect types, a programming example illustrating their application, and a justification for the design of the facility.

3.7.1  BASIC PROPERTIES

Explicit Declaration of Indirect Types

Each indirect type must be declared as a new type. For example, to obtain pointers to strings of length 5, the user can declare
      TYPE pstring5 : PTR STRING[ASCII](5);
      VAR p, q : pstring5;
An attempt to declare a variable of an "anonymous" indirect type, such as
      VAR r : PTR STRING[ASCII](5);
is in error, since the requirement for explicit declaration of indirect types as new types precludes such usage. This is not a serious restriction in practice and is justified by the resulting simplification with respect to the type equivalence rules (see Section 3.2).

Properties of Indirect Types

When an indirect type is declared, a number of behavioral properties are automatically provided. These include an assignment operator with "sharing" (i.e., pointer-copy) semantics, equality and inequality operators which check for pointer identity, a dereference form (the ".ALL" selector), a literal value (NIL), an allocation statement {ALLOC), and a procedure for explicitly deallocating pointed-to storage (FREE). The following statements, based on the declaration of p and q above, illustrate these properties:

An object having an indirect type is called a pointer. At any instant, a pointer either has the value NIL or references (i.e., "points to") an object called a dynamic variable. In the example above, p is a pointer, and p.ALL is its dynamic variable. Through sharing, two different pointers may reference the same dynamic variable. Moreover, even if a pointer is a constant, its dynamic variable may be the target of an assignment.

Indirect Subtypes

Like direct (i.e., non-indirect) types, an indirect type may be declared with subtype properties which can be interrogated via attribute inquiry. For example, to declare pointers to strings of different lengths, where the length of the string is set at the declaration of the pointer, the user could write the following:

Several aspects of the above example are worth emphasizing. First, note that the ".ALL" may be omitted when (and only when) a selector is being applied to a pointer's dynamic variable. That is, the assignments
      P4(1) := p3(1);
and
      p4.ALL(1) := p3.ALL(1);
are equivalent. Second, pointer assignment semantics require that source and target have the same subtype. That is why
      q3 := p3;
is legal (both q3 and p3 having subtype pstring(3)), whereas
      p4 := p3;
is illegal (p4 has subtype pstring(4)). It is important that this second assignment be illegal. If it were permitted, then either the subtype of the target would have to be changed to that of the source (in conflict with the notion of a subtype as a set of fixed properties of an object), or else a pointer which appears to be referencing a N-character string would in fact only be referencing a 3-character string (a subsequent assignment p4(4) := 'A thus modifying the contents of an unknown location).

Allocation-Time Properties

An indirect type may be declared with declaration-time properties (i.e., subtype constraints for the pointers, as described in the preceding paragraph), allocation-time properties (subtype constraints for the dynamic variables), or both. By specifying allocation-time properties for an indirect type, the user obtains the flexibility of having a pointer that can refer at different times to dynamic variables of different subtypes. For example, the following sequence could be written to declare pointers to strings so that the same pointer can refer to strings of different lengths at different times.

The syntactic position of the allocation-time formal and actual parameters -- after the keyword "PTR" in both the type declaration and the allocation statement -- was chosen to provide a clear distinction from the declaration time parameters.
Although a pointer may at different times reference dynamic variables of different subtypes, these subtypes will all be from the same type. Thus the RED indirect type facility provides a "typed pointer" mechanism consistent with the type checking in the rest of the language.
The allocation-time properties specified in the declaration of an indirect type are not interrogatable attributes of the pointer or the dynamic variable. To determine the length of the string referenced by p in the preceding example, one would select the length attribute of the string; i.e., p.LEN (or p.ALL.LEN), rather than p.n, should be specified. Allocation-time properties are not interrogatable as attributes for the same reasons that the formal parameters to ABBREViation declarations are not interrogatable: efficiency and simplicity. With regard to efficiency, if an allocation-time property could be interrogated via, say, the name of the formal parameter, then in general storage must be reserved with the dynamic variable for the value of this parameter. For example, with the declarations

the value of sp.n would have to be stored explicitly. The simplicity consideration is that attributes are treated consistently in RED as abstract properties of new types (for example, the subtype properties of indirect types). To allow attribute inquiry in other contexts would compromise this uniformity. In the preceding example, the subtype of sp is simply special_pstring, and the subtype of sp.ALL is STRING[ASCII](5). If the allocation-time property n were an interrogatable attribute, special rules would be required to answer what it is an attribute of.
It should also be noted that, from the point of view of the implementation, values for the subtype properties of an indirect type may be stored with the pointer, whereas the values for the allocation-time properties will typically be stored with the dynamic variable.

Recursive Data Structures

Recursive data structures are easily obtained in RED through indirect types. The following example creates a linked list of strings, where string length is determined at node allocation time.

3.7.2  PROGRAMMING EXAMPLE: ALPHABETIZER VIA BINARY TREE

This section illustrates the use of indirect types through a short but realistic sample program. In the discussion, we develop the program in "top-down" form to illustrate the applicability of the RED language to such methodology. The program also demonstrates the use of the language's facilities for exception handling, string processing, and input-output.
The purpose of the program is to read in and alphabetize a sequence of character strings, and then write out the strings in alphabetical order.
The input is assumed to be in the form of 80-character units (terminated by an end-of-file) on logical unit "TAPE_IN", and the output is to go to the system output device. The program must extract each input string from the corresponding 80-character unit. When a duplication is found (i.e., the same string occurring more than once), each occurrence after the first is to be output with an error message. Similarly, when a malformed string is found (e.g., not beginning with a letter) it is to be output with an error message. The exact rules for determining when a string is malformed are not specified, since the string extraction logic is not of primary interest in this example.

Selecting the Main Algorithm

The nature of the problem makes the basic algorithm fairly straightforward. The only decision to be made is whether to first read in the entire sequence of strings and then sort them, or to create an alphabetized data structure as they are being read. For efficiency reasons the second approach was chosen. An initial approximation to the algorithm is given below:

The above skeletal program handles the cases where the input strings are in proper form (i.e., no duplications, no malformed strings). We can now refine the program so that the exceptional cases are also taken into account. In particular, the extract_string procedure will check if the input string was properly formed and, if not, raise an exception named "malformed". Similarly, the insert_string procedure will check for duplication and raise an exception named "duplicate" if it occurs. We also note that, since the number of input strings is not known a priori, it is possible that more strings will be read in than the data structure will support (independent of the representation chosen for the data structure). The insert_string procedure will check for such a circumstance and, if the number of input strings exceeds the capacity of the data structure, it raises the "out_of_room" exception. In the case of either malformed or duplicate strings, the program is to continue processing input. When the "out_of_room" exception arises, however, no further input should be read; i.e., an exit from the repeat statement should be taken. The resulting refinement to our first algorithm is thus:

Selecting the Data Structure

The data structure required for this program must satisfy two criteria:

the number of string entries is not known until after all the strings have been read in, and

different strings in the structure can be of different lengths.

The first criterion suggests that an indirect type, rather than, say, an array type, be used for the structure. The second criterion suggests that, in the interest of space efficiency, string length be specified at the time that the particular entry of the data structure is allocated. (It would be possible to store the strings homogeneously with length 80, but at considerable storage overhead.)
In view of our intention to maintain the data structure in alphabetized form during the processing, it is natural to use a binary tree of strings to represent the structure. Thus the following type declaration is appropriate:

Note that binary_tree is an indirect type, and that the string length for each node is an allocation-time property. The binary_tree data structure can be declared and initialized as follows:
      VAR bt : binary_tree := NIL;
The invariant property of the data structure which is to be maintained during the program is that, for any node t in bt (including the root node bt itself), the string at t.data is always greater (alphabetically) than each string in its left subtree (t.left), and less than each string in its right subtree (t.right). The issue of representing the input string can now be addressed. First, we need a STRING[ASCII] of length 80 for holding the 80-character input:
      VAR current_string : STRING[ASCII](80);
Next, we have to take into account that the actual string extracted from current_string will be a contiguous section, say current_string(start..finish) where start and finish are determined by the extract_string procedure. Thus
      VAR start, finish : INT(1..80);

Writing the Procedures

Now that the basic data representation has been established, we can write the procedures which are invoked in the main algorithm. Let us consider these in turn.
The procedure read_in_string simply reads an 80-character record from logical device "TAPE_IN".

The variable tape_in will have to be appropriately declared as a file and opened. The procedure extract_string sets the variables start and finish to index the first and last characters of the relevant substring of current_string. The "malformed" exception may be raised. The extract_string procedure thus has the following skeletal structure:

In view of the recursive nature of the binary_tree data structure, recursive versions of insert_string and output_structure are natural. The only issue is whether data should be passed as free (i.e., imported) variables, or as explicit parameters. There are valid arguments on both sides, and the scheme we have chosen is to import the data into the two procedures, but then declare inner parameterized routines so that the binary_tree can be traversed recursively. Thus:

Note, however, that the allocation statement may raise the X_DYNAMlC_STORAGE exception. Since the insert_string procedure is supposed to raise the out_pf_room exception when the number of strings exceeds the capacity of the data structure, the allocation statement should be protected by a guard statement which handles the representation-specific X_DYNAMIC_STORAGE exception by simply raising the out_of_room exception:

The output_structure procedure is declared analogously:

The error procedure simply outputs its argument together with the erroneous input string:

Putting It Together

The following program is the result of the preceding analysis: CAPSULE alphabetizer;

3.7.3  JUSTIFICATION

In this section we explain the reasons for some of the major decisions underlying the design of the indirect type facility.

Initialization

The RED language guarantees that, in the absence of an initialization specification on the declaration of a variable from an indirect type, the initial value will be NIL. Though this seems to conflict with SM 5E ("There shall be no default initial values for variables"), strong arguments concerning program reliability support our decision. In particular, if the language did not guarantee an initial value for pointer variables, then it would be possible for an implementation to allow the user to suppress detection of the X_lNIT exception and then to treat an arbitrary bit pattern as the address of the destination for an assignment. (Although the same danger arises if the detection of the X_SUBSCRlPT, exception is suppressed, the programmer is much more likely to realize the potential danger in this case than if X_INIT detection is suppressed.)

Dereferencing

RED provides the ".ALL" selector as the means for dereferencing a pointer; the same notation is used to permit an object of a new direct type to be treated as though it were of the underlying type. The same notation was selected for the two cases, based on considerations of language uniformity and simplicity. For example, consider

Vector is a new type based on ARRAY INT OF FLOAT; i.e., v.ALL is an ARRAY INT OF FLOAT. Pvector is also a new type and is based on the same underlying type; i.e., p.ALL is an ARRAY INT OF FLOAT. Selector operations from the underlying type may be applied immediately to both v and p without an explicit ".ALL" qualification; the assignments
      v(1) := 0.0;
      p(1) := 0.0;
are equivalent to
      v.ALL(1) := 0.0;
      p.ALL(1) := 0.0;
The difference between vector and pvector is that the pvector type carries with it "sharing" assignment semantics, as well as other behavioral properties intrinsic to indirect types, whereas the vector type has "copying" semantics.

Allocation Statement

A syntactic issue with regard to dynamic allocation is whether to embody the behavior in a function, a procedure, or a statement. In RED, the statement syntax was chosen because the information which can be specified for an allocation is cumbersome to express via function or procedure invocations. In general, values for allocation-time properties must be provided, and an initial value for the new dynamic variable may be included.
Another reason for providing a distinctive syntactic form for allocation is to emphasize the analogy between an allocation and a declaration. The semantics of these two concepts are similar; in fact, one may regard an allocation as a dynamic declaration of an anonymous variable. When a non-dynamic (i.e., automatic) variable is declared, subtype constraints must be specified and an initial value may be supplied. Similarly, when a dynamic variable is allocated, its subtype constraints must be specified (these are the values of the allocation-time properties) and an initial value may be supplied. The main difference between automatic and dynamic variables is that the lifetime of an automatic variable is that of the scope in which it is declared, whereas the lifetime of a dynamic variable is independent of the scope in which it is allocated.

Explicit Deallocation

The RED language provides a. means for the user explicitly to deallocate a dynamic variable -- viz., the FREE procedure applied to the pointer referencing the dynamic variable. Use of this procedure is illustrated in the following routine, which releases the storage referenced via the binary tree created by the alphabetizer capsule in Section 3.7.2:

The invocation cleanup(bt) deallocates bt.ALL and all its descendant nodes.
The decision to provide an explicit FREE procedure was motivated by considerations of applicability and efficiency. In many embedded applications -- communications processing is an example -- dynamic storage management is necessary but the overhead of run-time "garbage collection" is not acceptable. In such environments a user-invocable FREE mechanism is essential. Such a facility is, in fact, implied by SM 8E: "There shall be a few low level operations to interrogate and control physical resources (e.g., memory or processors) that are managed (e.g., allocated or scheduled) by built-in features of the language."
The problem with an explicit FREE is that if it) is used erroneously, a "dangling pointer" could result. For example, if two pointers reference the same dynamic variable, and then FREE is applied to one of the pointers, the storage occupied by the dynamic variable may subsequently be allocated for other use even though another pointer is referencing this storage. In fact, such a situation is in conflict with SM 3-31, which requires that a dynamic variable "shall remain allocated as long as it can be referenced by the program." How is such a conflict to be resolved? Three alternatives were considered:

Raise an exception if a dynamic variable is used after a pointer which references it has been FREEd.

Raise an exception when an attempt is made to FREE a pointer whose dynamic variable is referenced from some other pointer.

Have no enforced checking (i.e., violate SM 3-3I).

Our choice of B was based on considerations of both efficiency and reliability. One of the fundamental intentions of our design is to make the language as secure as possible and to allow escapes (in the interest of efficiency) only when explicitly requested by the user. In particular, when run-time checking is deemed too costly, the user can (through a pragmat) suppress the generation of such code. Thus C was rejected, since, in addition to violating the Steelman, it would have resulted in a second (and implicit) mechanism for suppressing run-time checking.
Alternative A was rejected on efficiency grounds; with A, each use of a dynamic variable would have to be checked. In the general case, this implies a check each time a formal VAR or READONLY parameter is used, since a dynamic variable may be passed to a procedure which frees a pointer referencing this variable. As an example:

This is a prohibitive overhead to pay, especially since it is hard to avoid even when indirect types are not used. (Procedure q might be separately compiled and, instead of importing p, it could call another separately compiled routine which FREEs p.)
With B, the overhead occurs only when pointers are actually used. The most straightforward implementation is to maintain a "reference count" for each dynamic variable; the reference count for a dynamic variable is incremented by 1 when a new pointer references it and is decremented by 1 when a pointer no longer references it. Thus the reference count may change through allocation, pointer assignment, passing a dynamic variable VAR or READONLY, FREE, and scope exit.
According to the RED semantics, the invocation' FREE(p) raises the X_FREE exception if p is not the only pointer which references p.ALL. Two issues must be addressed. First, what happens in the presence of cycles in data structures? _ Second, how can the user avoid the overhead of reference counts when efficiency considerations so dictate?
Circular data structures create problems because an attempt to FREE a pointer which is part of a cycle will cause the X_FREE exception to be raised. A solution is for the user to "break" the cycle by setting one pointer to NIL, and then to FREE nodes one at a time. For example, given the declaration
      TYPE list : PTR RECORD[head: INT(0..100), tail: list];
the following routine deallocates the nodes in a circular list:

If the run-time expense of reference counts cannot be tolerated, the user can gain efficiency at the sacrifice of reliability through a pragmat suppressing the detection of the X_FREE exception. This will result in an unchecked FREE as in Alternative C above. Although such a practice is discouraged, we recognize that it is required in some embedded applications and have provided a mechanism for its realization -- exception detection suppression -- consistent with the rest of the language.
Considered from a slightly different perspective, implementations may provide three levels of support for dynamic storage management, and the programmer's decision whether or not to use an explicit FREE can be tuned to the support which is present. If an automatic garbage collection scheme is implemented, no explicit FREEs should be performed. If a reference count scheme exists, then the user should explicitly FREE cyclic data structures (and possibly non-cyclic structures as well, unless the implementation automatically deallocates a dynamic variable and all its descendants when the dynamic variable's reference count reaches 0). lf no system-provided support exists for storage reclamation, the user should suppress the X_FREE exception and explicitly FREE pointers when reclamation is needed.

3. Types (Part 3)

4. Expressions & Statements

Overview
Requirements
      Strawman
      Woodenman
      Tinman
      Ironman
      Steelman
RED Reference
RED Rationale
Types in RED
Time/Life Computer Languages
Memories
Site Index

Overview Reference ToC Rationale ToC Site Index