Lexical Analysis and Token Recognition, Slides of Compiler Design

This content covers the following topics: role of the lexical analyzer, specification of tokens, recognition of tokens, lexical analyzer generator, and design of a lexical analyzer generator.

Typology: Slides

2021/2022

Available from 09/23/2023

avishek-1 🇮🇳


Lexical Analysis

Overview

  • Role of the lexical analyzer
  • Specification of tokens
  • Recognition of tokens
  • Lexical analyzer generator
  • Design of a lexical analyzer generator

Why separate lexical analysis and parsing?

  1. Simplicity of design
  2. Improving compiler efficiency
  3. Enhancing compiler portability

Tokens, Patterns and Lexemes

A token is a pair: a token name and an optional token value. A pattern is a description of the form that the lexemes of a token may take. A lexeme is a sequence of characters in the source program that matches the pattern for a token.

Example: the input 31 + 28 + 59 is transformed into the sequence <num, 31> <+> <num, 28> <+> <num, 59>
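The transformation above can be sketched as a minimal pattern-directed tokenizer. This is an illustrative sketch, not code from the slides; the token names num and plus and the patterns are assumptions chosen to match the example input.

```python
import re

# Illustrative (token name, pattern) pairs for the input "31 + 28 + 59".
TOKEN_PATTERNS = [
    ("num", r"\d+"),   # pattern: one or more digits
    ("plus", r"\+"),   # pattern: the literal +
    ("ws", r"\s+"),    # whitespace: matched but not returned as a token
]

def tokenize(source):
    """Return (token name, optional token value) pairs for each lexeme."""
    tokens = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_PATTERNS:
            m = re.match(pattern, source[pos:])
            if m:
                lexeme = m.group(0)
                if name == "num":
                    tokens.append(("num", int(lexeme)))  # value carried along
                elif name != "ws":
                    tokens.append((name, None))          # no value needed
                pos += len(lexeme)
                break
        else:
            raise SyntaxError(f"no pattern matches at position {pos}")
    return tokens

print(tokenize("31 + 28 + 59"))
# [('num', 31), ('plus', None), ('num', 28), ('plus', None), ('num', 59)]
```

Each output pair corresponds to one <token, value> element of the sequence shown above.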

Lexical errors

Some errors are beyond the power of the lexical analyzer to recognize, because fi is a valid lexeme (an identifier):

fi (a == f(x)) …

However, it may be able to recognize errors like:

d = 2r

Such errors are recognized when no pattern for tokens matches a character sequence.

Error recovery

Panic mode: successive characters are ignored until we reach a well-formed token.

Other recovery actions transform the remaining input:

  • Delete one character from the remaining input
  • Insert a missing character into the remaining input
  • Replace a character by another character
  • Transpose two adjacent characters

(For example, the strings “ALISA” and “ALYSSA” are separated by an edit distance of 2: two such transformations turn one string into the other.)
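The edit distance mentioned above can be computed with the standard dynamic-programming recurrence; this sketch counts insertions, deletions, and replacements (Levenshtein distance), which suffices for the ALISA/ALYSSA example.

```python
def edit_distance(s, t):
    """Minimum number of single-character insertions, deletions,
    and replacements needed to transform s into t."""
    prev = list(range(len(t) + 1))          # distances from "" prefix of s
    for i, cs in enumerate(s, 1):
        cur = [i]                           # deleting i characters of s
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            cur.append(min(prev[j] + 1,         # delete cs
                           cur[j - 1] + 1,      # insert ct
                           prev[j - 1] + cost)) # replace (or keep) cs
        prev = cur
    return prev[-1]

print(edit_distance("ALISA", "ALYSSA"))  # 2: replace I with Y, insert S
```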

Input buffering

In the C language we need to look at the character after -, =, or < to decide what token to return, since it could be the start of a two-character operator like ==. There are many situations where we need to look at least one additional character ahead.

We introduce a two-buffer scheme, with the buffers alternately reloaded, to handle large lookaheads safely.

(Buffer illustration: E = M * C * * 2 eof, with the lexemeBegin and forward pointers into the buffer.)

Pointers in Input Buffer

Two pointers to the input are maintained:

1. Pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.
2. Pointer forward scans ahead until a pattern match is found. Once the next lexeme is determined, forward is set to the character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin is set to the character immediately after the lexeme just found.
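The two-pointer discipline can be sketched on a single in-memory buffer (the double-buffer reloading is omitted here; the scanning rules for identifiers, numbers, and single-character operators are simplified assumptions for illustration):

```python
def scan_lexemes(buffer):
    """Sketch of the two-pointer scheme: lexeme_begin marks the start of
    the current lexeme; forward scans ahead until the lexeme's extent is
    known, then lexeme_begin jumps past the recorded lexeme."""
    lexemes = []
    lexeme_begin = 0
    while lexeme_begin < len(buffer):
        if buffer[lexeme_begin].isspace():
            lexeme_begin += 1              # skip whitespace between lexemes
            continue
        forward = lexeme_begin
        if buffer[forward].isalpha():
            while forward < len(buffer) and buffer[forward].isalnum():
                forward += 1               # scan an identifier
        elif buffer[forward].isdigit():
            while forward < len(buffer) and buffer[forward].isdigit():
                forward += 1               # scan a number
        else:
            forward += 1                   # single-character operator
        lexemes.append(buffer[lexeme_begin:forward])   # record the lexeme
        lexeme_begin = forward             # continue just past it
    return lexemes

print(scan_lexemes("E = M * C ** 2"))
# ['E', '=', 'M', '*', 'C', '*', '*', '2']
```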

Specification of tokens

In the theory of compilation, regular expressions are used to formalize the specification of tokens. Regular expressions are a means for specifying regular languages. Each regular expression is a pattern specifying the form of a set of strings. Examples:

  • x* : zero or more strings matching x
  • a+ : one or more strings matching a
  • . : any character but newline, as in a.*b
  • id -> letter_ (letter_ | digit)*

Regular definitions

d1 -> r1

d2 -> r2

…

dn -> rn

 Example:

letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*

Here letter_ denotes a letter or an underscore, and juxtaposition denotes concatenation. Ref. table 3.8 in the text book.

Recognition of tokens

The starting point is the language grammar, from which we understand the tokens:

stmt -> if expr then stmt
      | if expr then stmt else stmt
      | Ɛ
expr -> term relop term | term
term -> id | number

Recognition of tokens (cont.)

The next step is to formalize the patterns:

digit → [0-9] (i.e., 0 | 1 | … | 9)
digits → digit+
number → digits (. digits)? (E [+-]? digits)? (ex: 2340, 0.012, 2.32E4 or 1.89E-4)
letter → [A-Za-z_]
id → letter (letter | digit)*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>

We also need to handle whitespace:

ws → (blank | tab | newline)+
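The patterns above can be transcribed into a single alternation of named regular expressions. This is a sketch, not a full lexer (no symbol table, no error recovery); the token names and the keyword handling are illustrative assumptions.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<ws>[ \t\n]+)                            # ws -> (blank|tab|newline)+
  | (?P<number>\d+(?:\.\d+)?(?:E[+-]?\d+)?)     # number -> digits(.digits)?(E[+-]?digits)?
  | (?P<id>[A-Za-z_][A-Za-z_0-9]*)              # id -> letter (letter|digit)*
  | (?P<relop><=|>=|<>|<|>|=)                   # longest alternatives first
""", re.VERBOSE)

KEYWORDS = {"if", "then", "else"}      # reserved words from the grammar

def tokens(source):
    out, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise SyntaxError(f"no token pattern matches at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "id" and lexeme in KEYWORDS:
            out.append((lexeme, None))           # keywords are their own tokens
        elif kind != "ws":
            out.append((kind, lexeme))           # whitespace is discarded
        pos = m.end()
    return out

print(tokens("if x1 <= 2.32E4 then y else z"))
# [('if', None), ('id', 'x1'), ('relop', '<='), ('number', '2.32E4'),
#  ('then', None), ('id', 'y'), ('else', None), ('id', 'z')]
```

Note that <= is listed before < in the relop alternation so the longer operator wins, mirroring the lookahead issue discussed under input buffering.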

Transition diagrams

Transition diagram for relop

Notation: a single circle is a state, an edge is a transition, and a double circle is an accepting state, indicating detection of a lexeme. An accepting state marked retract means the forward pointer must be moved back one character, since the last character read is not part of the lexeme. Note: the start state is entered by an edge from nowhere.
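The relop transition diagram can be encoded directly as branching on the current character, with retraction expressed by how far the position advances. The token names (LT, LE, NE, EQ, GT, GE) are illustrative.

```python
def relop(source, pos=0):
    """Encode the relop transition diagram: return (token, next_position)
    for a relational operator starting at pos, or None if none starts there."""
    def peek(i):
        return source[i] if i < len(source) else ""   # "" stands for end of input
    c = peek(pos)
    if c == "<":
        n = peek(pos + 1)
        if n == "=":
            return ("LE", pos + 2)
        if n == ">":
            return ("NE", pos + 2)
        return ("LT", pos + 1)   # other character: retract, consume only "<"
    if c == "=":
        return ("EQ", pos + 1)
    if c == ">":
        if peek(pos + 1) == "=":
            return ("GE", pos + 2)
        return ("GT", pos + 1)   # other character: retract, consume only ">"
    return None

print(relop("<="))   # ('LE', 2)
print(relop("<x"))   # ('LT', 1)  -- the retract case
```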

Transition diagrams (cont.)

Transition diagram for reserved words and identifiers.

Hypothetical transition diagram for the keyword then:

start → 0 --t--> 1 --h--> 2 --e--> 3 --n--> 4 (accepting)
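Rather than building one such diagram per keyword, a common alternative is a single identifier diagram plus a table of reserved words: scan a letter (letter | digit)* lexeme, then look it up. The table contents and token names below are illustrative assumptions.

```python
RESERVED = {"if": "IF", "then": "THEN", "else": "ELSE"}   # illustrative table

def id_or_keyword(source, pos=0):
    """Scan an identifier-shaped lexeme starting at pos and classify it;
    return (token, lexeme, next_position), or None if none starts there."""
    if pos >= len(source) or not (source[pos].isalpha() or source[pos] == "_"):
        return None
    end = pos
    while end < len(source) and (source[end].isalnum() or source[end] == "_"):
        end += 1                               # extend over letters and digits
    lexeme = source[pos:end]
    token = RESERVED.get(lexeme, "ID")         # keyword if reserved, else identifier
    return (token, lexeme, end)

print(id_or_keyword("then x"))    # ('THEN', 'then', 4)
print(id_or_keyword("then2 x"))   # ('ID', 'then2', 5) -- not a keyword
```

This keeps the diagrams small: one diagram recognizes all identifiers, and the table lookup distinguishes then from an identifier such as then2.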