Red Reference Manual



2.   LEXICAL STRUCTURE


2.1  CHARACTER SET AND TRANSLATOR INPUT

Programs are composed of any sequence of characters from either the 95-character ASCII set or the basic 55-character set. Any program can be written using only the 55-character set given below. Rules for converting from the 95-character ASCII set to the 55-character set are given in the descriptions of the specific tokens and token separators.



RULES

No distinction is made between upper and lower case letters except within a string literal.



Basic 55-Character Set

basic 55 character set



95-Character ASCII Set

    All characters in the basic 55-character set plus

95 character ASCII set


NOTES

    This document uses the 95-character ASCII set to describe the language.
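As a rough illustration of the larger input alphabet, the following Python sketch checks that a source text stays within the 95-character ASCII set. It assumes that this set is the printable ASCII range (space through tilde), which is the usual meaning of the term. The basic 55-character set is not reproduced here, so it is not checked. The function name is hypothetical.

```python
# Assumption: the 95-character ASCII set is the printable range,
# space (code 32) through tilde (code 126).
ASCII_95 = frozenset(chr(code) for code in range(32, 127))


def characters_outside_ascii_95(source: str) -> list[str]:
    """Return the distinct characters of `source` that are not in the
    95-character set, in order of first appearance.

    End-of-line characters are skipped, since an end of line is a
    token separator rather than a member of the character set.
    """
    offending = []
    for ch in source:
        if ch in "\r\n" or ch in ASCII_95:
            continue
        if ch not in offending:
            offending.append(ch)
    return offending
```

An empty result means every character of the text belongs to the assumed 95-character set.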

RED RATIONALE

Steelman requires only the 55-character ASCII set. RED extends this to the 95-character set, and specifies the conversion from the 95-character set to the 55-character set.




2.2  TOKENS AND TOKEN SEPARATORS

token diagram
C - identifier,   D - literal,   K - comment,   L - pragmat

A token is the basic component used to build all constructs of the language. It is an indivisible lexical unit that is interpreted as a complete 'word' by the translator. A token separator is required in some cases between tokens and can be used otherwise to improve readability. A token or token separator is composed of a contiguous sequence of characters.



RULES

Input text is organized into lines, each of which is composed of tokens and token separators. No token or token separator can extend over more than one line of text. An end of line, eol, is a token separator.

A token separator must appear between any two adjacent tokens, unless one of the tokens is a special symbol, or an operator symbol (such as < or >) that does not have the form of an identifier (as, e.g., AND does). One or more token separators may appear between any two tokens.



EXAMPLES

example 1
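The separator rule above can be sketched as a small predicate. This is an illustration only: tokens are modelled as hypothetical (kind, text) pairs, and "the form of an identifier" is assumed to mean a letter followed by letters, digits, or underscores, since the identifier diagram is not reproduced in this text.

```python
import re

# Assumption: identifier form is a letter followed by letters, digits,
# or underscores.
IDENTIFIER_FORM = re.compile(r"[A-Za-z][A-Za-z0-9_]*\Z")


def is_exempt(kind: str, text: str) -> bool:
    """True for a token that never forces a separator: a special symbol,
    or an operator symbol that is not written like an identifier."""
    if kind == "special":
        return True
    return kind == "operator" and IDENTIFIER_FORM.match(text) is None


def needs_separator(left: tuple[str, str], right: tuple[str, str]) -> bool:
    """True when a token separator is required between two adjacent tokens."""
    return not (is_exempt(*left) or is_exempt(*right))
```

Under these assumptions, an identifier next to AND needs a separator, while an identifier next to < or := does not.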




2.3  TOKENS

2.3.1  RESERVED WORDS

Reserved words have a fixed meaning within the syntax of the language.



RULES

The following are reserved words:

reserved words



NOTES

     Reserved words may not be redefined.

    No distinction is made in the use of upper or lower case characters in a reserved word; thus end, End, END, and enD are all equivalent.




2.3.2  OPERATOR SYMBOLS

Operator symbols are names of functions which are invoked with a special prefix or infix syntax (see Section 5.2).



RULES

The operator symbols are:

operator symbols



NOTES

    The definition of operator symbol names is discussed in Section 13.2.

RED RATIONALE

MODULO AND INTEGER DIVISION
DIV was chosen over / for lexical consistency with MOD, and to make it easy for users to overload /.

CATENATION
A one character symbol (&) was preferred since it will be heavily used.

LOGICAL AND SET OPERATORS
Since there is no conventional symbol for symmetric difference, consistency and readability indicated the use of mnemonic operators (AND, OR, and NOT) rather than operator symbols (*, +, and -).

RELATIONAL OPERATORS
The relational operator symbols =, /=, <, <=, >, and >= are required by SM 3-1C.

Since = is used for equality, RED adopts the standard practice of using := for assignment, avoiding such error-prone constructs as X=Y=Z.






2.3.3  SPECIAL SYMBOLS

Special symbols are tokens which have special meaning in the syntax.



RULES

The table below lists the special symbols and their uses.

special symbols

All of the special symbols, except [, ], and #, consist of characters exclusively from the 55-character set. The following 55-character alternates are provided.

55-character alternates

RED RATIONALE

SPECIAL SYMBOLS

ASSIGNMENT
Since = is used for equality, RED adopts the standard practice of using := for assignment, avoiding such error-prone constructs as X=Y=Z.

SQUARE BRACKETS
Distinguishing "type" vs. "subtype" properties based on whether square or round brackets appear is an easily remembered convention, with "square" suggesting "hard" properties which are the same for each object of the type, and "round" suggesting "soft" properties which may differ from one object to the next.

POUND SIGN
The # sign denotes explicit literal resolution, and is less ambiguous than using a functional notation.



2.3.4  IDENTIFIERS

identifier diagram

An identifier is a name which is associated with a language construct by a definition (see Section 3.5).



RULES

Reserved words and operator symbols (e.g., AND) may not be used as identifiers.

All characters in an identifier, including underscore, are significant.



NOTES

     No distinction is made in the use of upper or lower case characters in an identifier (e.g., Abc, abc, and ABC are all equivalent).


EXAMPLES

identifier examples
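The identifier rules can be illustrated with a short Python sketch: case is folded, every other character (including the underscore) stays significant, and reserved words and word-like operators are excluded. The excluded set below is a sample only; it holds END and AND, the two words this chapter names, and is not the full list. All function names are hypothetical.

```python
# Sample only: END (a reserved word) and AND (an operator symbol) are the
# two words named in the text; the full lists are not reproduced here.
SAMPLE_NON_IDENTIFIERS = frozenset({"END", "AND"})


def canonical(name: str) -> str:
    """Fold case; keep every character, including underscores."""
    return name.upper()


def same_identifier(a: str, b: str) -> bool:
    """True when two spellings denote the same identifier."""
    return canonical(a) == canonical(b)


def usable_as_identifier(name: str, excluded=SAMPLE_NON_IDENTIFIERS) -> bool:
    """True when `name` does not collide with an excluded word."""
    return canonical(name) not in excluded
```

With this sketch, Abc and ABC compare equal, A_BC and ABC do not, and enD is rejected as an identifier.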

RED RATIONALE

The underscore serves as the break character in identifiers. The space character was considered for this role but was rejected on grounds of simplicity and readability.

The RED language, in compliance with SM 13F, does not impose an upper bound on the length of identifiers (other than requiring them to fit on one line (SM 2D)).





2.3.5  LITERALS

literal diagram
E - int literal,   F - float literal,   G - enum literal,   H - string literal,   I - boolean literal,   J - indirect literal

Literals are used to specify values for some built-in types and for user-defined indirect types (see Section 4.4.3).



RULES

The values of all literals are known at translation time.

The rules for resolution of the type and subtype of a literal are described in Section 5.7.



NOTES

     The following sections describe specific literals. User-defined literals, which are language constructs rather than tokens, are described in Sections 5.7 and 13.5.


2.3.6  NUMERIC LITERALS

numeric literal diagram

Numeric literals specify integer values for the INT type and floating point values for the FLOAT type.



RULES

A floating point literal in E form is interpreted as the decimal number times ten to the integer value following E. The default precision of a floating point number is the number of digits preceding E, minus any leading zeros to the left of the decimal point.



NOTES

     Numeric literals are always positive values. A negative literal is obtained by preceding the literal with the prefix minus operator. This operation is performed at translation time.

    Because a float literal is a token, blanks may not appear within the float literal.

    The precision of a float literal is first determined by context and, if that is not sufficient, the default precision is used (see Section 5.7).

    No distinction is made in the use of upper or lower case e in a float literal.



EXAMPLES

numeric literal examples
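The E-form rule can be worked through in a few lines of Python. This is a sketch under stated assumptions, not the translator's algorithm: it takes the literal to have the shape digits.digits followed by E and an optionally signed integer, and it applies the default-precision rule exactly as worded above (all digits before E, minus leading zeros to the left of the decimal point). The function name is hypothetical.

```python
from decimal import Decimal


def parse_e_form(literal: str) -> tuple[Decimal, int]:
    """Return (value, default_precision) for a float literal in E form.

    Assumes the shape <digits>.<digits>E<integer>; upper and lower case
    e are treated alike.
    """
    mantissa, exponent = literal.upper().split("E")
    int_part, frac_part = mantissa.split(".")

    # Value: the decimal number times ten to the integer following E.
    value = Decimal(mantissa) * Decimal(10) ** int(exponent)

    # Default precision: digits preceding E, minus leading zeros to the
    # left of the decimal point.
    digits_before_e = len(int_part) + len(frac_part)
    leading_zeros = len(int_part) - len(int_part.lstrip("0"))
    return value, digits_before_e - leading_zeros
```

For instance, 1.5E3 has the value 1500 with a default precision of 2 digits, and 0.0125e2 has the value 1.25 with a default precision of 4 digits, since only the zero to the left of the decimal point is discounted.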

RED RATIONALE

INTEGER LITERAL
The problems with using the space as a break character in integer literals are, first, that it would be a different break character from that used in identifiers, and second, that it would result in the treatment of errors as legal tokens (e.g., 1 2 where the user does not intend 12).

FLOATING POINT LITERAL
A break character for floating point literals was rejected, for consistency with integer literals. RED requires at least one digit before and after the decimal point, in the interest of readability.




2.3.7  ENUM LITERALS

enum literal diagram
C - identifier

An enum literal (enumeration literal) specifies a named value of an enumeration type.



NOTES

     The same enumeration literal may appear in several enumeration types. For example, the enumeration literal 'ORANGE may appear simultaneously in the ENUM types FRUIT and COLOR.

    Because an enum literal is a token, no blank may appear between the apostrophe and the identifier.

    Because the enum literal is distinguished by an apostrophe, the identifier following the apostrophe may be defined in the same scope (e.g., RED may be the name of a variable in the same scope in which 'RED is an enum literal).

    The language views a character set as an enumeration type, where each character corresponds to an enum literal.

    No distinction is made in use of upper or lower case (e.g., 'RED, 'Red, 'red, and 'reD are all equivalent).



EXAMPLES

enumeration literal examples
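The notes above imply that one enum literal can belong to several enumeration types, so its type has to be settled from context. The Python sketch below only lists the candidate types for a literal. The table of types is hypothetical apart from 'ORANGE appearing in both FRUIT and COLOR, which is the example given in the notes.

```python
# Hypothetical enumeration types; only 'ORANGE in FRUIT and COLOR comes
# from the text, the other members are made up for illustration.
ENUM_TYPES = {
    "FRUIT": {"APPLE", "ORANGE"},
    "COLOR": {"RED", "ORANGE"},
}


def candidate_types(literal: str, enum_types=ENUM_TYPES) -> list[str]:
    """Return the names of the enumeration types that contain `literal`.

    The literal must start with an apostrophe immediately followed by
    the identifier (no blank); case is not significant.
    """
    if len(literal) < 2 or literal[0] != "'" or literal[1].isspace():
        raise ValueError(f"not an enum literal: {literal!r}")
    name = literal[1:].upper()
    return sorted(t for t, members in enum_types.items() if name in members)
```

Here candidate_types("'orange") yields both COLOR and FRUIT, while candidate_types("'Red") yields COLOR alone.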

RED RATIONALE

Distinguishing enumeration literals from identifiers by requiring a leading apostrophe for enumeration literals is useful for readability, ease of writing, and language simplicity.

The programmer can compose enumeration types without worrying about name conflicts with identifiers.

The choice of a leading apostrophe as the means for distinguishing enumeration literals was motivated by the desire to use a character in the Basic 55 subset which would not visually overwhelm the remaining characters in the literal.




2.3.8  STRING LITERALS

string literal diagram

A string literal specifies a value of a STRING type. A string is a sequence of characters, each of which is an enumeration literal. If a string includes only characters in the 95-character set, the special literal form, the string literal, can be used. A string literal is considered to be a shorthand form for the concatenation of characters defined by enum literals; e.g., "ABC" is considered a shorthand form of

string example



RULES

'' (quote quote) is the 55-character set form of " (double quote).



NOTES

     A string literal may not extend over more than one line. If an end of line is found before the terminating quote, the string literal token is terminated and an error message is issued. The & infix operator can be used to obtain a long string by concatenation.

    Upper and lower case characters are distinguished in strings; e.g., "ABC" is not equivalent to "abc". All characters in the 95-character ASCII set have a corresponding enumeration literal (defined in Appendix C.15). For example, since 'lbracket is the enumeration literal for [ and 'number the enumeration literal for #, the string

string example

    is equivalent to

string example



EXAMPLES

string example

RED RATIONALE

To obtain a quotation mark, apostrophe, or a control character outside 95 ASCII as part of a string, the user can catenate onto the string the enumeration literal denoting the desired character. No run-time overhead is implied by the use of this technique to build strings out of string and enumeration literals.

When converting from the 95 ASCII set to the Basic 55 set, the surrounding double quotes are converted to two consecutive apostrophes. For this reason, apostrophes are excluded from string literals.

For each non-Basic-55 character within the string literal, a string escape (i.e., &) is employed, and the character is replaced by its corresponding enumeration literal.
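The conversion described in this rationale can be sketched in Python. The sketch is illustrative and rests on assumptions: the table of character names holds only the two given in the notes above ('lbracket for [ and 'number for #), the remaining names from Appendix C.15 are not reproduced, and every character absent from the table is treated as if it belonged to the Basic 55 set. The function name and the handling of an empty string are likewise assumptions.

```python
# Only these two names appear in the text; Appendix C.15 defines the rest.
KNOWN_NAMES = {"[": "'lbracket", "#": "'number"}


def to_basic_55(literal: str, names=KNOWN_NAMES) -> str:
    """Rewrite a double-quoted string literal in Basic 55 form.

    Surrounding double quotes become two apostrophes, and each character
    listed in `names` is spliced in with & as its enumeration literal.
    """
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError(f"not a string literal: {literal!r}")

    parts, run = [], ""
    for ch in literal[1:-1]:
        if ch in names:
            if run:  # close the pending run of ordinary characters
                parts.append("''" + run + "''")
                run = ""
            parts.append(names[ch])
        else:
            run += ch
    if run or not parts:  # assumption: an empty string stays one empty literal
        parts.append("''" + run + "''")
    return " & ".join(parts)
```

Under these assumptions, the literal "X#Y[" becomes ''X'' & 'number & ''Y'' & 'lbracket, and "ABC" becomes ''ABC''.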




2.3.9  BOOLEAN LITERALS

boolean literal diagram

A boolean literal specifies a value of type BOOL.


2.3.10  INDIRECT LITERALS

indirect literal diagram

The indirect literal NIL specifies that value of an indirect type (see Section 4.4.3) which points to no dynamic variable.


2.4  TOKEN SEPARATORS


2.4.1  COMMENT

comment diagram

A comment provides program documentation.


RULES

A comment is terminated by the end of the line on which it appears. Comments are ignored by the translator.

RED RATIONALE

Each comment begins with a percent sign and terminates at the first end-of-line. Embedded comments are thus excluded, but this was the explicit intent of Steelman.
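A translator front end might discard comments along the following lines. This Python sketch assumes the 95-character form of string literals (double quotes) and assumes that a percent sign inside a string literal does not start a comment; the text does not state that case explicitly. The function name is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Return `line` without its trailing comment, if it has one.

    A comment runs from a percent sign to the end of the line.
    Assumption: a percent sign inside a double-quoted string literal is
    part of the string, not the start of a comment.
    """
    in_string = False
    for index, ch in enumerate(line):
        if ch == '"':
            in_string = not in_string
        elif ch == "%" and not in_string:
            return line[:index]
    return line
```

For example, strip_comment('X := Y % copy Y') returns 'X := Y ', and a line with no percent sign is returned unchanged.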




2.4.2  PRAGMAT

pragmat diagram
82 - pragmats

Pragmats supply information to the translator which does not affect language semantics. Pragmats are described in Appendix B.







Copyright © 2009, Mary S. Van Deusen