


























April 23, Simon Peyton Jones, Thomas Nordin, Dino Oliva, Pablo Nogueira Iglesias
Program            program       ->  pal [program]

Pal                pal           ->  data
                                  |  [conv] Name(arg1, ..., argn) [data] block       n >= 0
                                  |  import Name1, ..., Namen;                       n >= 1
                                  |  export Name1, ..., Namen;                       n >= 1

Data               data          ->  data {datum1 ... datumn}                        n >= 1

Datum              datum         ->  Name:
                                  |  type[[sconst]][{expr1, ..., exprn}];            n >= 1
                                  |  type[]{expr1, ..., exprn};                      n >= 1
                                  |  word1[]AsciiString;                             Abbreviation
                                  |  word2[]UnicodeString;                           Abbreviation
                                  |  alignn;                                         Alignment directive

Simple Constants   sconst        ->  Num                                             Integer constant
                                  |  'char'                                          ASCII char. constant
                                  |  unicode('char')                                 Unicode char. constant

Constants          const         ->  sconst                                          Simple constants
                                  |  FNum                                            Float number constant
                                  |  Name                                            Symbolic constant
                                  |  AsciiString                                     String constant
                                  |  UnicodeString                                   Unicode string constant

Strings            AsciiString   ->  "char1 ... charn"                               n >= 0
                   UnicodeString ->  unicode("char1 ... charn")                      n >= 0

Convention         conv          ->  foreign convkind                                Convention declaration

Conventions        convkind      ->  C | Pascal | ...                                Calling conventions

Formal Arguments   arg           ->  type Name

Type               type          ->  wordn | floatm                                  n in {1, 2, 4, 8}, m in {4, 8}

Block              block         ->  {stm1 ... stmn}                                 n >= 0
Figure 1: C-- syntax
Statements      stm   ->  skip;                                                 Null statement
                       |  type Name1, ..., Namen;                               Var. decl., n >= 1
                       |  Name = expr;                                          Assignment
                       |  type[{alignn}][expr] = expr;                          Memory write, align: n
                       |  if expr rel expr block [else block]
                       |  switch [[sconst1..sconstn]] expr {swt1 ... swtn}      n >= 1
                       |  block                                                 Scoping
                       |  Name:                                                 Local control label
                       |  goto Name;                                            Goto local label
                       |  jump expr(expr1, ..., exprn);                         n >= 0, Jump to expr
                       |  [conv] [Name1, ..., Namem =] expr(expr1, ..., exprn); n, m >= 0
                       |  [conv] return(expr1, ..., exprn);                     n >= 0

Expressions     expr  ->  const
                       |  Name                                                  Variable or label
                       |  type[{alignn}][expr]                                  Memory read, align: n
                       |  (expr)
                       |  expr op expr
                       |  prim(expr1, ..., exprn)                               n >= 1

Operators       op    ->  +flag | -flag | *flag | /flag | %flag                 Arithmetic
                       |  & | | | ^ | << | >>flag | ~                           Bitwise

Primitives      prim  ->  negflag | absflag | signflag | exponentflag
                       |  fractionflag | scaleflag | succflag | predflag
                       |  ulpflag | truncflag | roundflag | intpartflag
                       |  fractpartflag
                       |  typeflag                                              Type casts

Flags           flag  ->  (empty)                                               No flag
                       |  o                                                     Unordered
                       |  u | t | ut                                            Unsigned and Trapping
                       |  f | fz | fn | fp                                      Floating and Rounding
                       |  ft | ftz | ftn | ftp                                  Floating and Trapping

Relations       rel   ->  ==flag | !=flag | >flag |

Switch branch   swt   ->  sconst1, ..., sconstn : block                         n >= 1
                       |  default : block
Figure 2: Statements in C--
Constants can be (signed) integers, (signed) floating point numbers, characters, strings and names. C-- follows C’s syntax for denoting integer, floating point, character, and string constants.
2.6.1 Integer and floating point numbers
Integer constants have type word. Floating point constants have type float. Their size is architecture-dependent.
2.6.2 Characters and strings
Character and string constants are treated as integers and as pointer labels respectively. Character constants are ASCII characters surrounded by single quotes. String constants are a sequence of ASCII characters surrounded by double quotes.
A character constant is treated as an integer whose value is the character’s 8-bit ASCII code. Therefore, character constants have type word1. C-- uses C’s escape sequences to denote special characters, such as \n for the new line and \t for the tabulator.
For example, character constant ’H’ is a word1 with value 72.
String constants are like labels that point to the first word1 of an array of word1s stored in static memory. Therefore, they have type wordn where n is the particular architecture’s natural pointer size. String constants are not automatically null-terminated.
For example, the string "Hello World" is viewed as a label that points to the first byte of the array of bytes with values 72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, stored in static memory.
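The byte values listed above can be checked with a short sketch. Python is used here only as an executable notation and is not part of C--; the variable names are illustrative. Note that no terminating zero byte appears, in line with the rule that string constants are not automatically null-terminated.

```python
# Byte values of the ASCII string constant "Hello World".
# Python stands in for the C-- static memory here; illustration only.
hello = "Hello World".encode("ascii")

byte_values = list(hello)
print(byte_values)

# The character constant 'H' corresponds to the integer 72.
h_code = ord("H")
print(h_code)
```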
It is possible to have UTF-8 integers for single characters and for string characters.^1
The syntax to specify a UTF-8 constant is:
unicode(constant)
where constant is a character constant or a string constant. The type of UTF-8 characters is word2, for it requires two bytes—two ASCII characters—to code a Unicode character. UTF-8 strings are pointers to the first word2 of an array of word2s stored in static memory, and therefore, they have type wordn, where n is the architecture’s natural pointer size.
Memory is an array of bytes from which different sized types (Section 3.4) can be read and written. The size of the addressable memory is implementation dependent (Section 3.6). All addresses and offsets are specified in bytes. No guarantee about endianness is given, i.e. a portable program should either not depend on a specific endianness or find it out.
(^1) UTF-8 is an encoding of Unicode characters into 8-bit ASCII characters that does not use any of the ASCII control characters to perform the coding. Unicode is an abbreviation for Universal Multiple-Octet Coded Character Set (UCS), and it is defined in ISO/IEC 10646. It is an international standard for encoding computer character sets that differs from historical ASCII. UTF-8 stands for Universal Transformation Format, 8-Bit form. See http://www.unicode.org/unicode/standard/utf8.html for more information.
The data segment is the part of memory where static data, initialised or uninitialised, is allocated. The data segment is read/write, so the values stored can be changed at runtime. The size and initial content of the data segment are determined at compile time (Section 4). C-- does not provide dynamic memory allocation natively; nonetheless, it can be accomplished with foreign language calls (Section 3.8 and Section 5.3).
The code segment is the part of memory where the executable program code is stored (Section 5). The code segment consists of a series of procedure definitions.
C-- does not currently provide a mechanism for creating code at runtime.
There are only two kinds of types provided by C--, namely word and float. These types can have different sizes.
The size of a word can be 1, 2, 4 or 8 bytes.
The size of a float can be 4 or 8 bytes.
A type must be qualified with a size. Thus word1 and word2 are different types.
There is no pointer type. The wordn type can be used for pointers (addresses), where n is the particular architecture’s natural pointer size, i.e. n is the number of bytes needed to hold a memory address in the particular architecture.
For example, a four byte word is specified as word4, an eight byte float is specified as float8 and so on.
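The sizing rules above can be summarised in a small Python sketch. The helper type_size and the table ALLOWED_SIZES are hypothetical names introduced for illustration; they are not part of C--.

```python
# Sizes (in bytes) that the text allows for each kind of C-- type.
ALLOWED_SIZES = {"word": (1, 2, 4, 8), "float": (4, 8)}

def type_size(type_name):
    """Return the size in bytes of a C-- type name such as 'word4' or 'float8'.

    Raises ValueError for names that are unqualified or that use a size
    the text does not list. (Hypothetical helper, not part of C--.)
    """
    for kind, sizes in ALLOWED_SIZES.items():
        suffix = type_name[len(kind):]
        if type_name.startswith(kind) and suffix.isdigit():
            if int(suffix) in sizes:
                return int(suffix)
    raise ValueError(f"not a valid C-- type: {type_name!r}")

print(type_size("word4"), type_size("float8"))
```

An unqualified name such as word, or an unlisted size such as float2, is rejected, which mirrors the rule that a type must be qualified with a size.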
Types are used in
Any number of local variable names may be declared inside procedure bodies. They are typed storage locations that don’t have an address. The term “local variable name” is interchangeable with the term “register”, since there is an unlimited supply of (virtual) registers: i.e. a local variable name will be
Labels are the means to refer to the allocated memory. They should be viewed as pointers and not as memory locations. A label declaration consists of a name followed by a colon. Once declared, a label is a name (and so an expression) that refers (points) to a memory address. Therefore it has wordn type, where n is the particular architecture’s natural pointer size. Labels may be used before they are declared, e.g. label ff is used in the initialisation of the data directive’s first datum before it is declared pointing to the third.
Note that labels do not provide any information about the type of the data pointed to by them.
A label points to the first byte after its declaration. Here is an example in which four labels point to the same datum:
data { foo:
       label1:
       label2:
       bar:    word4; }   /* just allocates */
Memory is always allocated without padding inside a single data layout directive, so it is possible to find any given data in the data segment by starting from a label and adding the right offset, as in, for example, the read expression word2[foo+4]. Indeed, foo+4 does not have to point to the beginning of a data element. It may point to any other data byte, but it is assumed by C-- that it is 2-byte aligned.
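The label-plus-offset addressing described above can be modelled in Python. This is a sketch under stated assumptions: the data directive shown in the comment is hypothetical, the segment is assumed to start at address 0, and little-endian byte order is chosen arbitrarily, since C-- guarantees no particular endianness. The helper read_word is an illustrative name, not a C-- construct.

```python
import struct

# A toy model of one data layout directive, laid out without padding.
# Hypothetical directive: data { foo: word4{1}; word2{0x1234}; word2{7}; }
# Assumptions: little-endian byte order, segment starting at address 0.
segment = struct.pack("<IHH", 1, 0x1234, 7)
labels = {"foo": 0}  # a label is just an address

def read_word(memory, address, size):
    """Model of the read expression word<size>[address]."""
    return int.from_bytes(memory[address:address + size], "little")

value = read_word(segment, labels["foo"] + 4, 2)  # models word2[foo+4]
print(hex(value))
```

Because the elements are packed back to back, the offset 4 skips exactly the one word4 element, and foo+4 happens to be 2-byte aligned, as the read assumes.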
To align a label (and hence the datum it points to) to a specific boundary, an alignment directive (Section 4.3) has to be placed before the label. In the following example, foo and bar might or might not be the same address, but bar is guaranteed to be aligned on an eight byte boundary.
data { foo: align8; bar: word4{0}; }
It is possible to write a data layout directive with no labels, but the data it allocates is then inaccessible.
Memory is allocated by specifying the type of the datum, the number of datum’s elements to allocate, and the initial value for each element. The particular syntax is:
type[n]{constant-list};
where n specifies how many elements of the type type have to be allocated, and constant-list provides the initial value (of type type) for each allocated element, in the form of a comma-separated list of constants or constant expressions (i.e. expressions whose value is known at compile time).
There are a number of possible variants:
data { lb1: word1; }            /* Allocates one byte (contains garbage) */
data { lb2: word1{17}; }        /* Allocates one byte and initialises it to (ASCII code) 17 */
data { lb3: word4{17}; }        /* Allocates one 4-byte word and initialises it to integer 17 */
data { lb1: word1[17]; }        /* Allocates 17 bytes (that contain garbage) */
data { lb2: word1[17]{0}; }     /* Allocates 17 bytes and initialises all of them to 0 */
data { lb3: word4[6]{1,2,3}; }  /* Allocates six 4-byte words and initialises them
type[]{constant-list};                is an abbreviation for   type[c]{constant-list};
word1[]"char1 ... charn";             is an abbreviation for   word1[n]{'char1', ..., 'charn'};
word2[]unicode("char1 ... charn");    is an abbreviation for   word2[n]{unicode('char1'), ..., unicode('charn')};
For example:
data { s1: word1[6]{'h','e','l','l','o','\0'}; }
data { s2: word1[]"hello\0"; }
/* Both directives allocate 6 bytes and initialise
Since the initialised value might depend on the endianness, the only way to guarantee that a memory read (Section 6.2) gets the same initialised (or written) value is to read the datum or the element with the same type with which it was initialised (or written). For example, if a datum was initialised with data {foo: word2{17};} and is read back with word1[foo], the value might be 0 or 17 depending on the architecture, but if it is read with word2[foo] it is guaranteed to be 17.
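The word2{17} example can be reproduced with Python's struct module, which lets the byte order be chosen explicitly. The two byte strings below stand for the same datum on two hypothetical architectures; the variable names are illustrative only.

```python
import struct

# data { foo: word2{17}; } on two hypothetical architectures.
little = struct.pack("<H", 17)  # little-endian: low byte first
big = struct.pack(">H", 17)     # big-endian: high byte first

# word1[foo] reads only the first byte, so the result depends on byte order.
first_byte_little = little[0]
first_byte_big = big[0]

# word2[foo] reads both bytes with the same byte order used to store them.
word2_little = struct.unpack("<H", little)[0]
word2_big = struct.unpack(">H", big)[0]

print(first_byte_little, first_byte_big, word2_little, word2_big)
```

The one-byte read yields 17 on the little-endian layout and 0 on the big-endian one, while the two-byte read yields 17 on both.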
The return type need not be specified in the definition.
For example, procedure foo is defined as a procedure that expects one word4 argument. Inside the procedure body, the local variable (or register) x is declared, followed by an assignment statement and a jump statement to procedure bar.
foo(word4 y) {
    word4 x;
    x = y + 1;
    jump bar(x);
}
5.2.1 skip;
This is just the null statement and can be inserted anywhere an ordinary statement can. It does not have any effects. It is used for clarity instead of the error-prone stand-alone semicolon.
5.2.2 Declaration
A declaration statement has the following syntax:
type name1, ..., namen;
It declares the local variable names name1 ... namen of type type. These names will be mapped to (virtual) machine registers. As names, they are also expressions of type type.
Local variables have to be declared before they are used.
A declaration statement may appear anywhere inside the procedure body. All declarations are treated as if they were declared at the beginning of the procedure body. All the local variable names must be unique. It is not possible to redeclare a name.
5.2.3 Assignment
An assignment statement has the following syntax:
name = expr;
It stores the value of expr in the local variable (or register) name, where expr has the same type as name.
5.2.4 Memory write
A memory write statement has the following syntax:
wordn[expr1] = expr2;
to write wordn values, or
floatn[expr1] = expr2;
to write floatn values.
Expression expr1 has type wordn, where n is the particular architecture’s natural pointer size, and its value is the memory address in which the value of expr2 is written. Expression expr1 will typically contain one or more labels. Expression expr2 should be of type wordn or floatn respectively, otherwise the value written in memory is unspecified.
The following example stores the ASCII integer code of ’A’ in the byte at offset 4 from the address that label points to (the fifth byte of the datum):
word1[label+4] = ’A’;
The address yielded by expr1 is assumed to be aligned to the size of the type, namely n. A memory write can optionally be qualified with an alignment flag {aligna}, so the syntax becomes:
wordn{aligna}[expr1] = expr2;
floatn{aligna}[expr1] = expr2;
A few examples of memory writes with flagged alignment:
float8{align4}[label] = expr does an 8-byte write but assumes that label is aligned to a 4-byte boundary.
word4{align1}[label] = expr does a 4-byte write but assumes that label is aligned to a byte boundary (pointer to a byte).
word1{align4}[label] = expr does a 1-byte write but assumes that label is aligned to a 4-byte boundary.
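The alignment assumptions in these examples can be expressed as a small Python sketch. The helpers assumed_alignment and satisfies are hypothetical names for illustration; C-- performs no such check at run time, it simply assumes the stated alignment.

```python
def assumed_alignment(size, align=None):
    """Alignment (in bytes) that a memory access of `size` bytes assumes.

    Without an {align a} qualifier the address is assumed to be aligned to
    the size of the type; with one, it is assumed to be aligned to `align`.
    """
    return size if align is None else align

def satisfies(address, size, align=None):
    """True if `address` meets the assumed alignment of the access."""
    return address % assumed_alignment(size, align) == 0

# Address 12 is not 8-byte aligned, but it is 4-byte aligned, so a
# float8 access there is only valid when written as float8{align4}.
print(satisfies(12, 8), satisfies(12, 8, align=4))
```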
5.2.5 if and relational operations
Conditional execution of code is accomplished with the if statement. It has the following syntax:
if expr1 rel expr2 { ... } else { ... }
The else branch is optional and the statement blocks may be empty, as in if x == 0 {}, but the curly braces are mandatory even for single statements, as in
if x == 0 { x = x + 1;}
The condition test is very simple: it consists of a relational operation, rel, that takes two expressions as arguments. The term “operation” is used instead of “operator” to avoid confusion with the C-- operators that are used in expressions (Section 6). Relational operations are only used in if condition tests; they cannot be used anywhere else.
This is the set of relational operations:
    sconst11, ..., sconst1i : { ... }
    ...
    sconstm1, ..., sconstmj : { ... }
    default : { ... }
}
where:
expr is an expression that yields a word value.
sconstk1, ..., sconstkl : { ... } is branch k (k : 1 ... m), in which multiple simple-constant^2 alternatives (l : 1 ... i, j, ...) may be specified. When the value of expr is any of sconstk1 ... sconstkl, branch k is taken, executing its block of statements and resuming control at the first statement after the switch. There is no fall through between different branches. C-- assumes that earlier branches are more likely to be taken.
default is the (optional) default branch that is taken when none of the others are taken. The effect is unspecified if none of the branches is taken (none of the sconstij matches the value of expr) and no default branch is provided.
[sconst1..sconstn] is an (optional) range of simple constants in which the value of expr is guaranteed to lie. This range is a hint to the compiler. No bounds checking is performed at run-time to see whether the value of expr is in the range.
In the following example, expression x+23 is assumed to yield a value between 0 and 7. If the value is 1, 2 or 3, then the first branch is taken. If the value is 5, then the second branch is taken. If the value is 0, 4, 6 or 7, then the default branch is taken.
switch [0..7] x + 23 {
    1,2,3   : { y = y + 1; }
    5       : { y = x + 1; x = y; }
    default : { y = f();
                if y == 0 { x = 1; }
              }
}
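The branch selection in this example can be modelled in Python. The function taken_branch is a hypothetical illustration of the dispatch only; it does not execute the branch bodies. It rejects out-of-range values purely to make the range hint visible, whereas C-- performs no such check.

```python
def taken_branch(value):
    """Return which branch of the example switch handles `value`.

    `value` stands for the result of x + 23, which the range hint
    [0..7] promises lies between 0 and 7. The hint is not checked at
    run time in C--; this model rejects other values only to make the
    assumption visible.
    """
    if not 0 <= value <= 7:
        raise ValueError("outside the range promised by the hint")
    if value in (1, 2, 3):
        return "first"
    if value == 5:
        return "second"
    return "default"

print([taken_branch(v) for v in range(8)])
```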
5.2.7 Local control labels and goto
Local control labels are used in conjunction with the goto statement to alter the control flow within a procedure body. A local control label declaration consists of a label name followed by a colon. This kind of control label is not a name in the sense of Section 3.7, and so it should not be confused with the pointer labels mentioned so far. The only thing that can be done with a local control label is to provide it as argument to goto statements.
In turn, a goto statement transfers control to the label it takes as argument. Only a local control label can be the argument of a goto.
(^2) That is, word integers or characters. See Figure 1.
In the following example, the goto statement makes control resume at the first statement after the label declaration.
bar() {
  label:
    word8[foo] = 18;
    word8[foo+4*8] = word8[bar];
    goto label;
    return();
}
5.2.8 Procedure call
A call statement invokes a procedure in the conventional way of function invocation, so all the invoking procedure’s local variables are saved across the call. The particular syntax is:
name1, ..., namen = conv expr(expr1, ..., exprm);
where:
name1, ..., namen = is the local variable name list. The results returned by the procedure are stored in each variable in the order in which they are returned, from left to right, by the invoked procedure’s return statement (Section 5.2.10). If the invoked procedure returns no values, the name list should be omitted, otherwise the values of the names are unspecified after the call.
conv is the (optional) calling convention declaration needed for inter-operating with foreign code. (Section 5.3)
expr is any expression that evaluates to a procedure address. It will typically be a (procedure) name.
expr1, ..., exprm is the (optional) actual argument list, where each actual argument expri is an expression. All the expressions are passed by value to the called procedure. If no arguments are passed, the list should be empty, as in, for example, x = f();.
It is unspecified what the effects are if the number and the types of the actual arguments in a call statement do not match the number and the types of the formal arguments of the invoked procedure. It is also unspecified what the effects are if the number and the types of the names in the name list do not match the number and the types of the results returned by the invoked procedure.
Call statements are not expressions and so cannot be used inside expressions. Procedure calls are complete statements. Things such as y = f(g(x)) + 1; are not allowed. Recall, however, that procedure names, as such, are expressions with the procedure address as value.
The following example is self-explanatory:
foo() {
    word4 x, y;
    x, y = bar(5);
}
bar(word4 z) {
    return (1+z, z/3);
}
foo(word4 z) {
    return ();
}
To use a foreign language calling convention for a procedure, the name of the calling convention should be declared before the procedure name with the foreign keyword. Here, foo uses the standard C calling convention.
export foo;
foreign C foo() {
    word4 x;
    jump bar(x);
}
The calling convention should be also specified in the same way in call statements and in return statements, if it is not C--’s calling convention.
import printf, fun;
goo() {
    word4 i;
    foreign C fun(5);                /* fun has type int -> void */
    foreign C i = printf(str, arg);  /* printf() returns an int */
    return ();
}
bar(word4 a) {
    a = a + 1;
    foreign C return (a);            /* uses C's convention to return 'a' */
}
The supported calling conventions are:
All foreign language functions/procedures must have been imported with import declarations. All C-- procedures directly invoked from a foreign language must have been exported with export declarations.
When calling a C-- procedure from a foreign program, the types and sizes of the actual arguments should match the types and sizes of the formal arguments in the particular platform, otherwise the effects are unspecified. The same applies for the types and sizes of returned values.
When inter-operating with foreign languages, since the size of a particular foreign language type may differ between platforms, and since C-- types always have fixed sizes, it is impossible for C-- to be completely platform independent.
A C-- expression can be a constant, a name, a memory read, a primitive, or an operator applied to other expressions. C-- makes a distinction between integer and floating point expressions, i.e., expressions that yield words or floats as result.
The integer and floating point model is based on the LIA-1 standard (ISO/IEC 10967-1:1994(E)). If there are any inconsistencies between this manual and LIA-1, the LIA-1 standard is correct, unless otherwise noted.
Signed and unsigned numbers are not distinguished. Instead, as in any assembler, it is the operations that are typed.
The type of any subexpression is always known and there are no automatic type casts or type conversions.
The following sections cover all the C-- operators, all the C-- primitives, and the memory read expression.
Memory read expressions have the following syntax:
wordm[expr]     Type: wordn -> wordm
to read a wordm value, and
floatm[expr]    Type: wordn -> floatm
to read a floatm value.
Expression expr has type wordn, where n is the particular architecture’s natural pointer size. Its value is the address of the memory location to read from. It will typically contain one or more labels. The size m indicates how many bytes to read from that location.
The following example expression reads a 4-byte word from the second byte pointed to by label p: