This content covers the following topics: role of the lexical analyzer, specification of tokens, recognition of tokens, lexical analyzer generator, and design of a lexical analyzer generator.
Typology: Slides
Why separate lexical analysis and parsing?
A token is a pair: a token name and an optional attribute value.
A pattern is a description of the form that the lexemes of a token may take.
A lexeme is a sequence of characters in the source program that matches the pattern for a token.
Example: the input 31 + 28 + 59 is transformed into the sequence <num, 31> <+> <num, 28> <+> <num, 59>.
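To make the token/lexeme distinction concrete, here is a minimal sketch that turns the example input into (token name, attribute value) pairs. It assumes only two token kinds, num and +, and uses Python's standard `re` module; the function name `tokenize` is hypothetical. The lexeme "31" becomes the token num with attribute value 31, while + carries no attribute.

```python
import re
from typing import Optional

# Hypothetical token set for this example only: integer literals ("num") and "+".
# Leading whitespace before each lexeme is skipped by the \s* prefix.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<plus>\+))")

def tokenize(text: str) -> list[tuple[str, Optional[int]]]:
    """Return (token name, attribute value) pairs; the value is None when unused."""
    tokens = []
    text = text.rstrip()          # trailing blanks produce no token
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        if m.group("num") is not None:
            # lexeme "31" -> token <num, 31>
            tokens.append(("num", int(m.group("num"))))
        else:
            # lexeme "+" -> token <+>, no attribute value
            tokens.append(("+", None))
        pos = m.end()
    return tokens

print(tokenize("31 + 28 + 59"))
```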
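Lexical errors: in fi (a == f(x)) …, the lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer returns id and leaves any error to a later phase.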
d = 2r
[Figure: input buffer containing E = M * C * * 2 eof, with pointers lexemeBegin and forward]
Two pointers to the input are maintained:
1. Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found.
Once the next lexeme is determined, forward is set to the character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin is set to the character immediately after the lexeme just found.
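The two-pointer scheme can be sketched as follows. This is a simplified illustration, not a buffered implementation: it assumes the whole input is already in one string (no buffer pairs or sentinels), and the lexeme rules (an identifier is a letter followed by letters or digits, a number is a run of digits, anything else is a one-character lexeme) are hypothetical choices for the example. The function name `lexemes` is also hypothetical.

```python
def lexemes(src: str) -> list[str]:
    """Split src into lexemes using lexemeBegin/forward pointers.

    Hypothetical rules: identifiers (letter, then letters/digits), digit runs,
    or a single other character. Whitespace separates lexemes.
    """
    result = []
    n = len(src)
    lexeme_begin = 0
    while lexeme_begin < n:
        if src[lexeme_begin].isspace():
            lexeme_begin += 1              # skip whitespace between lexemes
            continue
        forward = lexeme_begin
        ch = src[forward]
        if ch.isalpha():
            # forward scans ahead while the identifier pattern still matches
            while forward + 1 < n and src[forward + 1].isalnum():
                forward += 1
        elif ch.isdigit():
            while forward + 1 < n and src[forward + 1].isdigit():
                forward += 1
        # forward now points at the right end of the lexeme (inclusive)
        result.append(src[lexeme_begin:forward + 1])
        # the next lexeme begins immediately after the one just found
        lexeme_begin = forward + 1
    return result

print(lexemes("E = M * C * * 2"))
```

Keeping forward on the last character of the lexeme (rather than one past it) mirrors the description above; a real scanner would also need to handle buffer reloads, which this sketch omits.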
In compiler theory, regular expressions are used to formalize the specification of tokens. Regular expressions are a means of specifying regular languages, and each regular expression is a pattern specifying the form of strings.
Example: Id -> letter_ (letter_ | digit)*, where x* denotes zero or more strings matching x, and a+ denotes one or more strings matching a.
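The difference between * and + can be checked with Python's standard `re` module, whose `*` and `+` operators have the same meaning for these simple cases. The helper name `matches` is hypothetical; `re.fullmatch` is used so that the whole string, not just a prefix, must belong to the language of the pattern.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the entire string s is in the language described by pattern."""
    return re.fullmatch(pattern, s) is not None

# a* accepts zero or more a's, so it includes the empty string.
# a+ accepts one or more a's, so it excludes the empty string.
for s in ["", "a", "aaa", "b"]:
    print(repr(s), matches("a*", s), matches("a+", s))
```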
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Here letter_ stands for any letter or the underscore character; concatenation is written by juxtaposition. (Ref. table 3.8 in the textbook.)
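Regular definitions build larger patterns from named smaller ones. A minimal sketch of that idea in Python, assuming ASCII letters only, turns each definition into a regex fragment and splices earlier fragments into later ones with f-strings. The names `LETTER_`, `DIGIT`, `ID`, and `is_id` are hypothetical.

```python
import re

# Each regular definition becomes a named regex fragment (ASCII only, by assumption).
LETTER_ = r"[A-Za-z_]"                     # letter_ -> A | ... | Z | a | ... | z | _
DIGIT = r"[0-9]"                           # digit   -> 0 | 1 | ... | 9
ID = rf"{LETTER_}(?:{LETTER_}|{DIGIT})*"   # id      -> letter_ (letter_ | digit)*

def is_id(s: str) -> bool:
    """True if the whole string s is an identifier under the definitions above."""
    return re.fullmatch(ID, s) is not None

for s in ["x", "_tmp1", "rate2", "2rate", "a-b"]:
    print(s, is_id(s))
```

The non-capturing group `(?:...)` plays the role of the parentheses in the definition of id without creating a capture group.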
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | ε
expr -> term relop term
      | term
term -> id
      | number
digit → [0-9], i.e., 0 | 1 | … | 9
digits → digit+
number → digits optionalFraction optionalExponent (e.g., 2340, 0.012, 2.32E4, or 1.89E-4)
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z_]
id → letter (letter | digit)*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>
We also need to handle whitespace:
ws → (blank | tab | newline)+
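The patterns above can be combined into a small scanner. This is a sketch, not the textbook's generator: it assumes Python's `re` alternation (which tries alternatives left to right, not longest match), so the longer relops <=, >=, and <> are listed before < and >. Keywords are recognized by matching id first and then looking the lexeme up in a reserved-word set, so that ifx is still an identifier. The names `TOKEN_SPEC`, `MASTER`, and `tokens` are hypothetical, and whitespace is matched but produces no token.

```python
import re
from typing import Iterator, Optional

KEYWORDS = {"if", "then", "else"}   # reserved words, checked after matching id

# Order matters: Python tries alternatives left to right.
TOKEN_SPEC = [
    ("ws",     r"[ \t\n]+"),                               # ws -> (blank | tab | newline)+
    ("number", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),   # digits (. digits)? (E [+-]? digits)?
    ("id",     r"[A-Za-z_][A-Za-z_0-9]*"),                 # letter (letter | digit)*
    ("relop",  r"<=|>=|<>|<|>|="),                         # longer operators first
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokens(src: str) -> Iterator[tuple[str, Optional[str]]]:
    """Yield (token name, lexeme) pairs; keywords carry no attribute."""
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"illegal character {src[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                       # whitespace is stripped, not returned
        if kind == "id" and lexeme in KEYWORDS:
            yield (lexeme, None)           # keyword: token name is the lexeme itself
        else:
            yield (kind, lexeme)

print(list(tokens("if rate <= 2.32E4 then x else y")))
```

Because only the top-level alternatives are named groups and all inner groups are non-capturing, `m.lastgroup` reports exactly which token pattern matched.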
Transition diagram notation: a single circle denotes a state, an edge denotes a transition, and a double circle denotes an accepting state, indicating that a lexeme has been detected.
[Figure: transition diagram with start state 0 and states 1, 2, 3, 4]
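A transition diagram can be simulated directly with a state variable. The sketch below assumes the diagram is the textbook's transition diagram for relop (start state 0; accepting states for LE, NE, LT, EQ, GE, GT); since the figure is only partially recoverable here, the state numbers in the comments and the attribute names are assumptions. Accepting states that read one character too many (the "other" edges) are modeled by not consuming that character, which has the same effect as retracting the forward pointer. The function name `relop` is hypothetical.

```python
from typing import Optional

def relop(src: str, start: int) -> Optional[tuple[str, int]]:
    """Simulate a relop transition diagram beginning at src[start].

    Returns (attribute, position after the lexeme) on acceptance,
    or None if no relational operator starts at `start`.
    """
    def char(i: int) -> str:
        return src[i] if i < len(src) else ""   # "" stands for eof

    state, i = 0, start
    while True:
        c = char(i)
        if state == 0:
            if c == "<":
                state, i = 1, i + 1
            elif c == "=":
                return ("EQ", i + 1)            # accept "="
            elif c == ">":
                state, i = 6, i + 1
            else:
                return None                     # fail: not a relop
        elif state == 1:                        # seen "<"
            if c == "=":
                return ("LE", i + 1)            # accept "<="
            if c == ">":
                return ("NE", i + 1)            # accept "<>"
            return ("LT", i)                    # "other": accept "<", c not consumed
        elif state == 6:                        # seen ">"
            if c == "=":
                return ("GE", i + 1)            # accept ">="
            return ("GT", i)                    # "other": accept ">", c not consumed

print(relop("a <= b", 2))
```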