. navigate
 Navigate

Sigplan Notices cover

Types in Red
M. Van Deusen

ACM SIGPLAN Notices
Vol 16, Issue 12, (Dec 1981)
ISSN:0362-1340
pp.27-38









TYPES IN RED

1.  INTRODUCTION

The RED language [Nestor and Van Deusen 79, Brosgol 79] was designed as a candidate for the Department of Defense common hlgh-order Language, Ada, in Phase 2 of the language design competition. RED was deslgned to meet the requirements of Steelman [Fisher 78]. Although RED was not chosen by the Department of Defense, it was generally acknowledged to be an innovative and ambitious design. In spirit, it gained much from the TARTAN language [Shaw, Hilfinger, Wulf 78], a language designed to the Ironman specification [Fisher TT] following the review of the Red {REDL) [Brosgol 78], Yellow [Spitzen 78], Green [Ichbiah 78], and Blue [Goodenough 78] language designs. It should be noted that REDL, the Red language of Phase I, and RED, the Red Language of Phase 2, are two entirely dlfferent languages. No Implementation of RED, beyond the original translator delivered with the language design, has been planned [Davis and Levine 79]. This paper describes the type system of RED in the hopes that it will be of interest to the designers of future languages. The present section summarizes the design goals for the RED type system. The following sections discuss each of these design goals in greater depth.

The type system was designed to meet several goals:

  • clean distinction between compile-time and runtime information. A language with extensive compile-time type-checking increases the likelihood that errors will be caught during early development. To make the most reliable use of the type-checking properties of a language. it is necessary for the programmer to understand which properties are checked at compile-time and which are checked at runtime. The RED Language helps programmers by making a clear syntactic distinction between compile-time and runtime properties.

  • uniform syntax and semantics for built-in and user-defined types. No language can possibly build in all the types that will be needed for all applications. A language must include facilities for the definition of new, user-defined types. Because, in RED, built-in and user-defined types follow the same language rules, a project manager can predefine a set of both kinds of types and introduce application programmers to them as though they were built-in.

  • varying degrees of visibility into types. The robustness of a large system depends upon its ability to absorb the inevitable changes easily. The RED language was designed to simplify change by allowing the implementation considerations of data abstractions to be completely hidden from the user of the abstraction. The common example of this is the ability to implement a stack either as an array or a linked list, while not changing any uses of the stack. The language allows the user to control the amount of visibility into a type. The visibility can range from the completely hidden data abstraction to the simpler abbreviated type.

  • flexibility and explicit user control of the time at which type information is bound. If type information is bound early, the reliability of the system is increased. If type information is bound late, the flexibility of the system is increased. RED provides facilities to allow the tradeoffs for each type to be made by the user, rather than by the system. It provides a fixed level of type checking, and then allows the user to redeflne that level, usually to gain reliability. There are three possible times when type information may be needed: declaration evaluation during compilation, declaration evaluation at runtime, and storage allocation at runtime. The last two occur at the same time for stack-allocated objects, and at different times for heap-allocated oojects. RED takes advantage of the separation for heap-allocated objects and allows the delay of binding of type information.




2.  TYPES AND SUBTYPES

The RED Language distinguishes two concepts, types and subtypes, for specifying the properties of data items. Although these concepts are uniform over both built-in and user-defined types, only built-in type examples will be illustrated in this section. The properties control both the values that a data item may have, and the operations that may be applied to it.

The type of a data item is the collection of all properties which must be known during compilation. The subtype of a data item is the collection of all properties which must be known when a data item is created, and which remain constant throughout the lifetime of the object. To emphasize the distinction between compile-time and runtime, any information in addition to the type name which is needed at compile-time is enclosed in square brackets. A type, then, is a type name and an optional square-bracketed list. This square-bracketed list may, itself, contain types. The following examples show a fixed length STRING type whose characters are of type ASCII, a RECORD type whose components are of INT, BOOL, and STRING types, and an ENUM type, whose values are enumeration literals. All this information is known at compile-time.

STRING [ASCII]
RECORD [a : INT,
        b : BOOL,
        c : STRING [ASCII]]
ENUM   ['apple, 'orange, 'tangerine, 'lemon]

Each type is distinct in that assignment is legal only between data items of the same type. Type checking occurs at compile-time. For example, assignment between two STRING[ASCII] data items would be legal, but assignment between a STRING[ASCII] and a STRING[EBCDIC] would not be legal (such an assignment could be effected, however, by defining an explicit conversion function).

Additional information which can be deferred until data is actually created at runtime appears in round brackets following the type. A subtype, then, is a type name, an optional square-bracketed list, and an optional round-bracketed list. When the square-bracketed list of the type contains types, the square-bracketed list of the subtype must contain subtypes. For example,

STRING [ASCII] (lengthexpr)
RECORD [a : INT (1..10),
        b : BOOL,
        c : STRING [ASCII] (lengthexpr) ]
ENUM   ['apple, 'orange, 'tangerine, 'lemon]
       ('apple .. 'tangerine)

Assignment between data items of different subtypes of the same type is legal if the value of the source data item (right hand side of assignment) is within the value set of the subtype of the target data item (left hand side of assignment). All subtype checking which cannot be done at compile-time occurs at runtime. For example, if a variable is declared to be a STRING[ASCII] (10), then a data item having size 10 may be legally assigned, but a data item having size 12 would cause an exception to be raised at runtime. The type corresponding to a subtype can be determined by eliminating all round-bracketed information. Note that the name ASCII has neither a square-bracketed list nor a round-bracketed list, and can be used as either a type or a subtype.

When a data item is created, a subtype must be known for that data item. For example, it is not possible to obtain storage for an array until its size is known. A variable or constant declaration must explicitly specify the subtype of the variable or constant. In the case of formal parameters, the creation of any local storage does not take place until procedure or function invocation, so the needed subtype information does not have to be available until invocation. This means that there are two possible ways to associate subtype information with a formal parameter - explicitly from the formal parameter definition, or implicitly from the associated actual parameter. In the latter case, a type alone is explicitly specified with the formal parameter. If a subtype is explicitly specified, that subtype information specifies a constraint against which all actual parameters must be checked. It is also possible, in many cases, for the compiler to use the subtype information to generate better code. If a type is explicitly specified, the subtype of the actual parameter (which must have the same type as that of the associated formal parameter) is used for the creation of the formal parameter. Specifying a type means that a routine can be defined for a wider variety of possible values, for example, a sort routine which sorts arrays of any size.

There is one exception to this syntactic form for types and subtypes, the built-in ARRAY type. Rather than using the form

ARRAY [indices, element-type]

RED chooses the more common "sugared" notation

ARRAY indices OF element-type




3.  USER-DEFINED TYPES

The previous section illustrated the syntactic separation of compile-time and runtime information using built-in types. This section will show how the same separation is achieved for user-defined types using generic parameters for compile-time information and parameters for runtime information. It will also show how both built-in and user-defined types can be used to compose new types.




3.1  Type Declaration and Runtime Information

User-defined types maintain the same semantic and syntactic separation of compile-time and runtime information. The runtime information is provided with the same mechanism used for procedures -- parameter lists. A type declaration contains a type name, a list of compile-time properties (described later in this section), a formal parameter list enclosed in round brackets, and an underlying subtype to be used to represent objects of the new type.

TYPE type-name [ct-properties] (formal-parms) : underlying-subtype;

A type declaration specifies a set of types and a set of subtypes. (Because a type declaration specifies subtypes, as well as types, the type declaration is defined in terms of an underlying subtype.) Each type in the set is distinguished by a unique value for its compile-time property list. (The association of operations with types is discussed in the next section.) For each type in the set, there is a set of subtypes where each subtype is distinguished by a unique value of the parameter list. The built-in type STRING, for example, specifies a set of string types, each type having a different character type. Given a particular type, such as STRING[ASCII], there is a set of subtypes consisting of the ASCII strings of differing lengths.

Each type specified by a type declaration is unique, that is, a variable of one type cannot be assigned to a variable of another type. Type operations are described in Section 4.

A parameterized type is one which includes a formal parameter list in the definition of the type, and which must include an actual parameter corresponding to each formal parameter whenever a subtype of the type is used. Thus, the runtime information of both built-in and user-defined subtypes appears in round brackets.

If no square-bracketed list is part of the definition of the type, then only a single type is defined with a set of subtypes. For example,

TYPE ascii_string (j : INT) : STRING [ASCII] (j);
VAR x : ascii_string (20);

specifies a type, ascii_string, and the set of all subtypes of ascii_string. If neither a square-bracketed list, nor a round-bracketed list, is part of the definition of the type, then only a single type and subtype are defined. For example,

TYPE ascii_string_10 : STRING[ASCII] (10);

TYPE ebcdic_vstring_16 :
     RECORD[ cur_len : INT(0..16),
             st      : STRING[EBCDIC] (16) ];

specifies an ASCII fixed length string which is 10 characters long, and an EBCDIC varying length string, which can be at most 16 characters long. Note that the underlying subtype of ebcdic_vstring_16 includes a fixed length ebcdic string of length 16, and a current length field which can be used with a set of user-defined operations to implement a varying length string of maximum length. Both declarations declare single types having single subtypes.




3.2  Compile-time Information

Compile-time information for user-defined types is provided through the generic mechanism. In its most general form, the generic mechanism provides compile-time replication of declarations, particularized by generic parameters, which can be replaced by subtypes, types, procedures or functions, or values. A generic declaration provides a "pattern declaration" and a set of formal generic parameters which are together replaced at compile-time by a set of individualized declarations. The formal generic parameters, like the compile-time information of built-in types, appear in square brackets. A generic declaration of a type specifies the set of all types which have the same type name, but which differ in the values of their square-bracketed list. The following example is a more generalized version of the previously defined type, ebcdic_vstring_16. In this case, the character set has been generalized by a generic parameter and the maximum length has been generalized by a formal parameter. The result, vstring [s] (max_len), specifies the set of all varying length strings of s having maximum length max_len.

GENERIC s : SUBTYPE
  TYPE vstring [s] (max_len : INT) :
    RECORD[ cur_len : INT(0..max_len),
                 st : STRING [s] (max_len) ];

At compile-time, the compiler will remove "GENERIC s : SUBTYPE* from the front of this declaration and create copies of the vstring declaration which will differ only in the value of the generic parameter, s. Which copies are actually made depends upon the uses made of vstring in the program. if two uses of vstring appear in a program, and the users have different values of s, then two copies of declarations for vstring will be created. For example, the following uses of vstring

VAR x : vstring [ASCII] (15);
VAR y : vstring [EBCDIC] (10);

will result in the following copies of the vstring type declaration being made:

TYPE vstring [ASCII] (max_len : INT) :
  RECORD [ cur_len : INT (0..max_len),
                st : STRING  [ASCII] (max_len) ];
TYPE vstring [EBCDIC] (max_len : INT) :
  RECORD [ cur_len : INT(0..max_len),
                st : STRING [EBCDIC] (max_len) ];               

In the two examples, the value of the generic parameter replaces all instances of s in the pattern declaration. Note that the runtime information, the lengths 15 and 10, do not have any impact upon the replication which takes place at compile-time. Each copy of the declaration still contains a formal parameter, max_len, to be replaced at runtime.

The language provides overloading rules so that uses of two declarations of the same name, in this case vstring, can be differentiated. For types, two declarations can be differentiated if their names or compile-time information (the square-bracketed lists) differ. Since generic replication of a type produces copies having the same name, the generic parameter must appear in the square brackets so that each copy can be differentiated. In the above example, vstring[ASCII] can be differentiated from vstring[EBCDIC].

Multiple uses of a generic type which specify the same value for the generic parameter cause only a single declaration to be produced. For example,

TYPE ascii_string (j : INT) : vstring [ASCII] (j);
VAR x : ascii_vstring (15);
VAR y : vstring[ASCII] (15);

will result in only one copy of the vstring type declaration being made. Note that the types of ascii_vstring and vstring are different, so that it would not be legal to assign x to y, or vice versa (see Section 4).

It can be seen that built-in composite types, such as arrays and records, can be considered as generic types. No mechanism is needed to cause the creation of specific declarations of arrays and records from the generic patterns. Each copy, or instance, is created based on the uses made of arrays and records. This automatic creation of copies is known as implicit instantiation. A desire for symmetry with built-in types required implicit instantiation for user-defined types as well.



3.3  Composition

User-defined types, like built-in types, may be composed. A format parameter of one type may appear as an actual parameter of another type. The following example allows the specification of square matrices of any size, containing integer elements in the range 0 to 100.

TYPE vector (x:INT) : ARRAY INT(1..x) of INT(0..100);
TYPE matrix (y:INT) : ARRAY INT(1..y) OF vector(1..y);

The example is modified to allow the specification of square matrices of any size, containing elements of any subtype.

GENERIC s : SUBTYPE
  TYPE vector [s]  (x:INT) : ARRAY INT(1..x) OF s;
GENERIC ss : SUBTYPE
  TYPE vector [ss] (y:INT) : ARRAY INT(0..y) OF vector[ss] (1..y);
VAR square : matrix [INT(0..100] (3);



4.  TYPE OPERATIONS AND VISIBILITY

A type is associated with a set of operations as well as a set of values. An operation is considered to be associated with a type if the operation (a function, procedure, assignment, infix or prefix operator) takes a parameter of that type. A single operation, such as mixed mode arithmetic, may be associated with more than one type. Each built-in type has a set of operatlons which are associated with the type. One can think of built-in types as simply having been declared in a block surrounding the user program. In addition, by defining an operation which takes a parameter of the type, a user can define new associated operations. To be at all useful, a user-defined type must also have associated operations. The typing facility of RED provides ways of automatically associating operations with new types. In addition. a user can also explicitly define additional operations for a new type. The RED Language has two definitional mechanisms for types -- an abbreviation facility and a type facility. These mechanisms differ in the way operations are associated with the type being defined.



4.1  Operations of Abbreviated Types

The abbreviation provides a notation for the writing of long, complex data descriptions. It can be used to abbreviate either types or subtypes.

ABBREV abbreviation-name [ct-info] (formal-type-parms)
         : underlying type;
ABBREV abbreviation-name [ct-info] (formal-type-parms)
         : underlying subtype;
         

Since it is an abbreviation facility, any operations which are explicitly specified for the abbreviation are also associated with the underlying type or subtype. Note that abbreviations, like type declarations, can be parameterized and can be generic.

GENERIC s : SUBTYPE
  TYPE vstring [s] (max_len : INT) :
    RECORD[ cur_len : INT(0..max_len),
                 st : STRING [s] (max_len)];
ABBREV avstr : vstring [ASCII];      % type abbreviation                 
ABBREV avstr : vstring [EBCDIC];     % type abbreviation 
VAR x : avstr (15);
VAR y : evstr (10);

ABBREV flags (n : INT) :
  ARRAY INT (1..n) OF BOOL;          % subtype abbreviation
VAR z : flags (12);                  
         



4.2  Operations of Declared Types

The type declaration automatically associates four operations with the new type - access to the underlying type, access to any components of an aggregate underlying subtype, equality, and assignment for the new type defined in terms of assignment of the underlying subtype. Any other operations which are needed for the new type can be written as procedures and functions in terms of the automatically operations. For example,

CONST size := 100;
ABBREV elemtype : FLOAT (10, -1000.0 .. 1000.0);
TYPE stack : RECORD[ top  : INT (0..size),
                     elem : ARRAY INT (1..size) OF elemtype];
                     
PROC init (VAR s : stack);
  s.top := 0;
END PROC init;

PROC push (VAR s : stack, e : elemtype);
  s.top  := s.top + 1;
  s.elem (s.top) := e;
END PROC push;

PROC pop (VAR s : stack, OUT e : elemtype);
  e := s.elem (s.top);
  s.top := s.top -1;
END PROC pop;

specifies a stack with the four automatically-defined operations, and three additional user-defined ones: init, push, and pop. Notice that each user-defined operation is written in terms of the automatically defined ones. For example, init, push and pop are all written in terms of the automatically defined operations of access to the record components of stack - s.top and s.elem.



4.3  Type Equality

The rules for type equality, or type equivalence, are fundamentally structural comparison rules. Abbreviations are replaced by their underlying types, but user-defined types (defined by the type declaration described in the previous section) are not expanded. The resulting expansion is compared structurally and must be identical for the types to be the same. This rule does not lead to infinite expansion because of the requirement that pointer types (discussed below) be named.



4.4  Restricting Type Visibility

RED has been designed to simplify a programming style relying strongly on the use of data abstractions. As such, the ability to hide the representational properties of data from a user, and show only the abstract properties, is a crucial feature. To completely hide the representation of a stack, for example, it must be possible to hide the automatically-defined operations, and make visible only that set of automatically-defined and user-defined operations which represent the abstract operations. This can be done in RED by enclosing the stack definition within a capsule, and exporting (making available) only the abstract operations.

CAPSULE stack_cap EXPORTS stack, init, :=, push, pop;
   ...
END CAPSULE stack_cap;
         

RED allows the assignment operation to be defined by the user, overriding the automatically associated one. This means that it is possible to hide the representation of a new type completely, even when the representation has been changed from a non-pointer to a pointer representation. RED allows the implementation of two or more abstract data types within a single capsule. An example would be a capsule which implements vectors and matrices, and the mixed type operations on those types.



5.  INFORMATION BINDING

The type system allows flexibility in the movement of information from runtime to compile-time and, to a more limited extent, from compile-time to runtime. In the examples below, the runtime binding of length, (i), and range, (i..j), is restricted to compile-time, [i] and [i..j]. Moving information to compile-time has the advantage of reducing the amount of runtime checking required.

GENERIC   i : INT
  TYPE mstring [i] : STRING[ASCII] (i);
  
GENERIC i,j : INT
  TYPE compiletime_int [i,j] : INT (i..j);
         

It is also possible, in some cases, to delay binding information which would be known at compile-time until runtime. The following example shows a number which is either an integer or a floating point number.

TYPE number : UNION [a : INT   (0..1000),
                     b : FLOAT (4, 0..1000) ];
VAR x : number;         

The type system permits one other important flexibility - the deferral of information until allocation time for pointed-to objects. Pointers in the RED language are named types which may only be defined within a type declaration. It is the fact that pointers are named which simplifies the type equality rules for the language, since a named type is never expanded for type comparison. All pointers are typed, that is, a pointer variable must point at objects of a single type. The pointed-to object is created, and optionally initialized, by an allocation statement. For example,

TYPE list : PTR RECORD[  val : INT(0..100),
                        next : list];
VAR  lst  : list;

% now make the list circular
lst.next.next.next := lst;                                 

The compile-time information and runtime information which follow the type name are associated with the pointed-to object. For example,

TYPE prunstr (j : INT) : PTR STRING[ASCII] (j);

GENERIC i : INT
  TYPE pcompilestr [i] : PTR STRING[ASCII] (i);         

specifies a pointer type, prunstr, which points only to ASCII strings of a length fixed at runtime, and a pointer type, pcompilestr[i], which points only to ASCII strings of a length fixed at compile-time.

In addition to these pieces of information, a pointer type declaration can specify information to be supplied when the pointed to object is allocated by the ALLOC stateent. For example,

TYPE pallocationstr : PTR (len : INT) STRING[ASCII] (len);
VAR v1, v2 : pallocationstr;

ALLOC v1 PTR (3) := "abc";
ALLOC v2 PTR (4) := "abcd";         

specifies a pointer type, pallocationstr, which points to ASCII strings of any length.



6.  SUMMARY OF STRING TYPES

The language provides the user with rich facilities for defining types having exactly the abstract properties needed. This has been demonstrated in previous examples of string types. The following types are given to exemplify the wide range of user-defined types which can be created in RED. Required brackets are empty for notational convenience.

  • the set of types of all fixed length strings of any character type
        STRING []()

  • the type of all fixed length strings of a specific character type
        ascii_string ()

  • the type of fixed length strings of a specific character type and length
        ascii_string_10

  • the set of types of all varying length strings of any character types
        vstring []()

  • the type of all varying length strings of a specific character type
        ascii_vstring ()

  • the type of varying length strings of a specific character type and maximum length
        ebcdic_vstring_16

  • the set of types of strings with (one generic parameter) a specific character type, and the length known at compile-time, or (two generic parameters) any character type and the length known at compile-time. For example,
        mstring []

  • pointer types pointing to any of the previously described types with lengths known at compile-time, runtime, or object allocation time. For example,
        pcompilestr []
        prunstr ()
        pallocationstr



ACKNOWLEDGMENTS

The following individuals participated in the design and review of the RED language:

Paul Abrahams (NYU)
Peter Belmont (I2)
Toby Boyd (I2)
Benjamin Brosgol (project manager) (I2)
Robert Cassels (I2)
Mark Davis (I2)
Robert Dewar (NYU)
Bruce Knobe (I2)
David Levine (I2)
Barbara Liskov (MIT)
Fred Martin (I2)
Eliot Moss (I2)
John Nestor (chief designer) (I2)
Craig Schaffert
Michael Tighe (I2)
Mary Van Deusen (I2)
and William Wulf (CMU)



REFERENCE LIST

B. Brosgol. Informal Language Specification, RED, Intermetrics, Inc. February 1978.

B. Brosgol, RED Language Design Rationale, Intermetrics, Inc., IR-382, March 1979.

M. Davis and D. Levine. RED Translator, Intermetrics, Inc., March 1979.

D. Fisher, Ironman: Requirements for High Order Computer Programming Languages, Department of Defense, January 1977.

D. Fisher, Steelman: Requirements for High Order Computer Programming Languages, Department of Defense, June 1978.

J. Goodenough, Language Specification, BLUE, SofTech, Inc., February 1978.

J. Ichbiah, Language Specification, GREEN, Honeywell-Bull, February 1978.

J. Nestor and M. Van Deusen, RED Language Reference Manual, IR-310--2, March 1979.

M. Shaw, P. Hilfinger, and W. Wulf, TARTAN, Language Design for the Ironman Requirement: Reference Manual, Carnegie-Mellon University, CMU-CS-78--133, June 1978.

J. Spitzen, Language Specification, YELLOW, SRI, February 1978.



Requirements
      Strawman
      Woodenman
      Tinman
      Ironman
      Steelman

RED Reference
RED Rationale

Types in RED
Time/Life Computer Languages
Memories

Site Index

Overview             Reference ToC             Rationale ToC             Site Index


Home   Favorites   Map


IME logo Copyright © 2009, Mary S. Van Deusen