Generating Compilers with Coco R-Making a Compiler-Lecture Slides, Slides of Compiler Construction

This lecture was delivered by Ruiz Pereira at Chandra Shekhar Azad University of Agriculture

Typology: Slides

2011/2012

Uploaded on 07/11/2012

Generating Compilers with Coco/R
1. Compilers
2. Grammars
3. Coco/R Overview
4. Scanner Specification
5. Parser Specification
6. Error Handling
7. LL(1) Conflicts
8. Case Study

Compilation Phases

character stream

    v a l = 1 0 * v a l + i

lexical analysis (scanning)

token stream

    token number    token value
    (ident)         "val"
    (assign)
    (number)        10
    (times)
    (ident)         "val"
    (plus)
    (ident)         "i"

syntax analysis (parsing)

syntax tree (leaves first, root last)

    ident = number * ident + ident
    Term
    Expression
    Statement
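
The scanning step above can be sketched in C#. This is a minimal hand-written illustration, not the scanner that Coco/R generates: the `Kind` enumeration, the `Token` class and `MiniScanner` are hypothetical names introduced here, and only the symbols that occur in the example line are recognized.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for this sketch; Coco/R assigns its own numeric codes.
public enum Kind { Ident, Number, Assign, Plus, Times, Eof }

public sealed class Token {
    public readonly Kind kind;    // token number
    public readonly string val;   // token value (text of identifiers and numbers)
    public Token(Kind k, string v) { kind = k; val = v; }
}

public static class MiniScanner {
    // Splits the character stream into tokens; white space is skipped.
    public static List<Token> Scan(string src) {
        var tokens = new List<Token>();
        int i = 0;
        while (i < src.Length) {
            char c = src[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (char.IsLetter(c)) {
                // ident: a letter followed by letters or digits (assumed definition)
                int start = i;
                while (i < src.Length && char.IsLetterOrDigit(src[i])) i++;
                tokens.Add(new Token(Kind.Ident, src.Substring(start, i - start)));
            } else if (char.IsDigit(c)) {
                // number: one or more digits
                int start = i;
                while (i < src.Length && char.IsDigit(src[i])) i++;
                tokens.Add(new Token(Kind.Number, src.Substring(start, i - start)));
            } else {
                switch (c) {
                    case '=': tokens.Add(new Token(Kind.Assign, "")); break;
                    case '+': tokens.Add(new Token(Kind.Plus, "")); break;
                    case '*': tokens.Add(new Token(Kind.Times, "")); break;
                    default:
                        throw new FormatException("unexpected character '" + c + "' at position " + i);
                }
                i++;
            }
        }
        tokens.Add(new Token(Kind.Eof, ""));
        return tokens;
    }
}
```

Calling `MiniScanner.Scan("val = 10 * val + i")` yields the seven tokens listed in the token stream above, followed by the end-of-input token that this sketch appends.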

Structure of a Compiler

parser & sem. processing    "main program"; directs the whole compilation
scanner                     provides tokens from the source code
symbol table                maintains information about declared names and types
code generation             generates machine code

The arrows in the diagram denote two relations: "uses" and "data flow".

EBNF Notation

Extended Backus-Naur form for writing grammars

  John Backus: developed the first Fortran compiler
  Peter Naur: edited the Algol60 report

Productions

    Statement = "write" ident "," Expression ";".

  Statement                           left-hand side
  "write" ident "," Expression ";"    right-hand side
  "write"                             literal
  ident                               terminal symbol
  Expression                          nonterminal symbol
  .                                   terminates a production

Metasymbols

  |        separates alternatives    a | b | c    =  a or b or c
  (...)    groups alternatives       a (b | c)    =  ab | ac
  [...]    optional part             [a] b        =  ab | b
  {...}    iterative part            {a} b        =  b | ab | aab | aaab | ...

by convention

  • terminal symbols start with lower-case letters
  • nonterminal symbols start with upper-case letters

Example: Grammar for Arithmetic Expressions

Productions

    Expr   = ["+" | "-"] Term {("+" | "-") Term}.
    Term   = Factor {("*" | "/") Factor}.
    Factor = ident | number | "(" Expr ")".

Terminal symbols

  simple TS:           "+", "-", "*", "/", "(", ")"    (just 1 instance)
  terminal classes:    ident, number                   (multiple instances)

Nonterminal symbols

  Expr, Term, Factor

Start symbol

Expr
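
To show how the EBNF metasymbols map onto code, the following C# sketch recognizes the expression grammar above with one method per nonterminal: an optional part `[...]` becomes an `if`, an iterative part `{...}` becomes a `while`. The class name `ExprRecognizer` is hypothetical, and the lexical structure is an assumption of this sketch (ident is a letter followed by letters or digits, number is a digit sequence, blanks are skipped), since the grammar itself leaves the terminal classes undefined.

```csharp
using System;

// Recognizer for:
//   Expr   = ["+" | "-"] Term {("+" | "-") Term}.
//   Term   = Factor {("*" | "/") Factor}.
//   Factor = ident | number | "(" Expr ")".
public sealed class ExprRecognizer {
    private readonly string src;
    private int pos;

    private ExprRecognizer(string src) { this.src = src; }

    // Returns true if the whole input can be derived from the start symbol Expr.
    public static bool Accepts(string input) {
        var r = new ExprRecognizer(input);
        try {
            r.Expr();
            return r.Peek() == '\0';
        } catch (FormatException) {
            return false;
        }
    }

    // Current character after skipping blanks; '\0' marks the end of the input.
    private char Peek() {
        while (pos < src.Length && src[pos] == ' ') pos++;
        return pos < src.Length ? src[pos] : '\0';
    }

    private void Expr() {
        if (Peek() == '+' || Peek() == '-') pos++;      // ["+" | "-"]
        Term();
        while (Peek() == '+' || Peek() == '-') {        // {("+" | "-") Term}
            pos++;
            Term();
        }
    }

    private void Term() {
        Factor();
        while (Peek() == '*' || Peek() == '/') {        // {("*" | "/") Factor}
            pos++;
            Factor();
        }
    }

    private void Factor() {
        char c = Peek();
        if (char.IsLetter(c)) {                         // ident
            while (pos < src.Length && char.IsLetterOrDigit(src[pos])) pos++;
        } else if (char.IsDigit(c)) {                   // number
            while (pos < src.Length && char.IsDigit(src[pos])) pos++;
        } else if (c == '(') {                          // "(" Expr ")"
            pos++;
            Expr();
            if (Peek() != ')') throw new FormatException("\")\" expected");
            pos++;
        } else {
            throw new FormatException("ident, number or \"(\" expected");
        }
    }
}
```

With these assumptions, `ExprRecognizer.Accepts("-a * (b + 10) / c")` returns true, while an input such as `"a + "` is rejected because a Term is missing after the operator.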

Coco/R - Compiler Compiler / Recursive Descent

Facts

  • Generates a scanner and a parser from an attributed grammar
    • scanner as a deterministic finite automaton (DFA)
    • recursive descent parser
  • Developed at the University of Linz (Austria)
  • There are versions for C#, Java, C/C++, VB.NET, Delphi, Modula-2, Oberon, ...
  • GNU GPL open source: http://ssw.jku.at/Coco/

How it works

    attributed grammar  -->  Coco/R  -->  scanner, parser

    scanner, parser, main, user-supplied classes (e.g. symbol table)  -->  csc

A Very Simple Example

Assume that we want to parse one of the following two alternatives

    red apple
    orange

We write a grammar ...

    Sample = "red" "apple" | "orange".

and embed it into a Coco/R compiler description (file Sample.atg)

    COMPILER Sample
    PRODUCTIONS
      Sample = "red" "apple" | "orange".
    END Sample.

We invoke Coco/R to generate a scanner and a parser

    >coco Sample.atg
    Coco/R (Aug 22, 2006)
    checking
    parser + scanner generated
    0 errors detected

Generated Parser

Grammar

    Sample = "red" "apple" | "orange".

Generated parser (the numbers compared with la.kind are the token codes returned by the scanner)

    class Parser {
        ...
        void Sample () {
            if (la.kind == 1) {
                Get();
                Expect(2);
            } else if (la.kind == 3) {
                Get();
            } else SynErr(5);
        }
        ...
        Token la;   // lookahead token

        void Get () {
            la = Scanner.Scan();
            ...
        }

        void Expect (int n) {
            if (la.kind == n) Get(); else SynErr(n);
        }

        public void Parse () {
            Get();
            Sample();
        }
        ...
    }

A Slightly Larger Example

Parse simple arithmetic expressions

    calc 34 + 2 + 5
    calc 2 + 10 + 123 + 3

Coco/R compiler description (file Sample.atg)

    COMPILER Sample
    CHARACTERS
      digit = '0'..'9'.
    TOKENS
      number = digit {digit}.
    IGNORE '\r' + '\n'
    PRODUCTIONS
      Sample = {"calc" Expr}.
      Expr   = Term {'+' Term}.
      Term   = number.
    END Sample.

The generated scanner and parser will check the syntactic correctness of the input

    >coco Sample.atg
    >csc Compile.cs Scanner.cs Parser.cs
    >Compile Input.txt

Generated Parser

Attributed grammar (semantic actions are enclosed in (. and .))

    Sample                 (. int n; .)
    = { "calc"
        Expr<out n>        (. Console.WriteLine(n); .)
      }.
    ...

Generated parser

    class Parser {
        ...
        void Sample () {
            int n;
            while (la.kind == 2) {
                Get();
                Expr(out n);
                Console.WriteLine(n);
            }
        }

        void Expr (out int n) {
            int n1;
            Term(out n);
            while (la.kind == 3) {
                Get();
                Term(out n1);
                n = n + n1;
            }
        }

        void Term (out int n) {
            Expect(1);
            n = Convert.ToInt32(t.val);
        }
        ...
    }

Token codes

    1 ... number
    2 ... "calc"
    3 ... '+'

Building and running the generated compiler

    >coco Sample.atg
    >csc Compile.cs Scanner.cs Parser.cs
    >Compile Input.txt

Input.txt

    calc 1 + 2 + 3
    calc 100 + 10 + 1
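
To try the semantic actions without running Coco/R, the generated methods can be imitated in a self-contained C# class. This is a sketch under several assumptions: the `Scan` method is a hand-written stand-in for the generated scanner, token code 0 for end of input and the final `Expect(0)` are additions of this sketch, errors are reported by exceptions instead of `SynErr`, and the results are collected in a list (`Results`, a hypothetical name) instead of being written to the console.

```csharp
using System;
using System.Collections.Generic;

public sealed class CalcParser {
    // Token codes as on the slide: 1 = number, 2 = "calc", 3 = '+'.
    // Code 0 for end of input is an assumption of this sketch.
    private sealed class Token { public int kind; public string val = ""; }

    private readonly string src;
    private int pos;
    private Token t = new Token();    // most recently recognized token
    private Token la = new Token();   // lookahead token

    public readonly List<int> Results = new List<int>();

    public CalcParser(string src) { this.src = src; }

    // Hand-written stand-in for the scanner that Coco/R would generate.
    private Token Scan() {
        while (pos < src.Length && char.IsWhiteSpace(src[pos])) pos++;
        var tok = new Token();
        if (pos >= src.Length) { tok.kind = 0; return tok; }
        int start = pos;
        if (char.IsDigit(src[pos])) {
            while (pos < src.Length && char.IsDigit(src[pos])) pos++;
            tok.kind = 1;
        } else if (src[pos] == '+') {
            pos++;
            tok.kind = 3;
        } else if (pos + 4 <= src.Length && src.Substring(pos, 4) == "calc") {
            pos += 4;
            tok.kind = 2;
        } else {
            throw new FormatException("invalid character at position " + pos);
        }
        tok.val = src.Substring(start, pos - start);
        return tok;
    }

    private void Get() { t = la; la = Scan(); }

    private void Expect(int n) {
        if (la.kind == n) Get();
        else throw new FormatException("token with code " + n + " expected");
    }

    public void Parse() { Get(); Sample(); Expect(0); }

    // Sample = { "calc" Expr<out n> }: one result per "calc" line
    private void Sample() {
        int n;
        while (la.kind == 2) { Get(); Expr(out n); Results.Add(n); }
    }

    // Expr<out int n> = Term<out n> { '+' Term<out n1> }: sums the terms
    private void Expr(out int n) {
        int n1;
        Term(out n);
        while (la.kind == 3) { Get(); Term(out n1); n = n + n1; }
    }

    // Term<out int n> = number: converts the token text to an int
    private void Term(out int n) {
        Expect(1);
        n = Convert.ToInt32(t.val);
    }
}
```

Parsing the two lines of Input.txt with `new CalcParser(text).Parse()` leaves 6 and 111 in `Results`, the sums of the two `calc` lines.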

Structure of a Compiler Description

    [UsingClauses]               e.g. using System; using System.Collections;
    "COMPILER" ident
    [GlobalFieldsAndMethods]     e.g. int sum; void Add(int x) { sum = sum + x; }
    ScannerSpecification
    ParserSpecification
    "END" ident "."

ident denotes the start symbol of the grammar (i.e. the topmost nonterminal symbol)

Structure of a Scanner Specification

    ScannerSpecification =
      ["IGNORECASE"]
      ["CHARACTERS" {SetDecl}]
      ["TOKENS" {TokenDecl}]
      ["PRAGMAS" {PragmaDecl}]
      {CommentDecl}
      {WhiteSpaceDecl}.

  IGNORECASE        Should the generated compiler be case-sensitive?
  CHARACTERS        Which character sets are used in the token declarations?
  TOKENS            Here one has to declare all structured tokens (i.e. terminal symbols) of the grammar
  PRAGMAS           Pragmas are tokens which are not part of the grammar
  CommentDecl       Here one can declare one or several kinds of comments for the language to be compiled
  WhiteSpaceDecl    Which characters should be ignored (e.g. \t, \n, \r)?

Character Sets

Example

    CHARACTERS
      digit    = "0123456789".        the set of all digits
      hexDigit = digit + "ABCDEF".    the set of all hexadecimal digits
      letter   = 'A' .. 'Z'.          the set of all upper-case letters
      eol      = '\r'.                the end-of-line character
      noDigit  = ANY - digit.         any character that is not a digit
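
The set operations in these declarations can be mirrored with `HashSet<char>` in C#. The sketch below only illustrates what `+`, `..` and `-` mean; it says nothing about how Coco/R represents character sets internally. The class `CharSets` and its members are hypothetical names, and modelling `ANY` as all 16-bit characters is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class CharSets {
    // lo .. hi : the set of all characters from lo to hi (inclusive)
    public static HashSet<char> Range(char lo, char hi) {
        var s = new HashSet<char>();
        for (int c = lo; c <= hi; c++) s.Add((char)c);
        return s;
    }

    // a + b : set union
    public static HashSet<char> Union(HashSet<char> a, HashSet<char> b) {
        var r = new HashSet<char>(a);
        r.UnionWith(b);
        return r;
    }

    // a - b : set difference
    public static HashSet<char> Difference(HashSet<char> a, HashSet<char> b) {
        var r = new HashSet<char>(a);
        r.ExceptWith(b);
        return r;
    }

    // ANY is modelled here as all 16-bit characters (assumption of this sketch).
    public static readonly HashSet<char> Any = Range(char.MinValue, char.MaxValue);

    public static readonly HashSet<char> Digit = new HashSet<char>("0123456789");
    public static readonly HashSet<char> HexDigit = Union(Digit, new HashSet<char>("ABCDEF"));
    public static readonly HashSet<char> Letter = Range('A', 'Z');
    public static readonly HashSet<char> NoDigit = Difference(Any, Digit);
}
```

In this model `CharSets.HexDigit` contains 16 characters, and `CharSets.NoDigit` contains every character except the ten digits.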

Valid escape sequences in character constants and strings

    \\    backslash         \r    carriage return    \f        form feed
    \'    apostrophe        \n    new line           \a        bell
    \"    quote             \t    horizontal tab     \b        backspace
    \0    null character    \v    vertical tab       \uxxxx    hex character value