module 3 notes of atc, Study notes of Theory of Automata

All the topics under ATC module 3 are covered.

Typology: Study notes (2022/2023), uploaded on 01/27/2023 by hethishe

Module - III

Context-Free Grammars and Pushdown Automata (PDA)

Syllabus of Module 3

i. Context-Free Grammars (CFG): Introduction to Rewrite Systems and Grammars
ii. CFGs and languages, designing CFGs
iii. Simplifying CFGs
iv. Proving that a grammar is correct
v. Derivations and parse trees, ambiguity
vi. Normal forms
vii. Pushdown Automata (PDA): Definition of non-deterministic PDA
viii. Deterministic and non-deterministic PDAs
ix. Non-determinism and halting, alternative equivalent definitions of a PDA
x. Alternatives that are not equivalent to PDA

Text Books:

  1. Elaine Rich, Automata, Computability and Complexity, 1st Edition, Pearson Education, 2012/2013. Text Book 1: Ch 11, 12: 11.1 to 11.8, 12.1 to 12.6 excluding 12.3.
  2. K L P Mishra, N Chandrasekaran, Theory of Computer Science, 3rd Edition, PHI, 2012

Reference Books:
  3. John E Hopcroft, Rajeev Motwani, Jeffery D Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd Edition, Pearson Education, 2013
  4. Michael Sipser, Introduction to the Theory of Computation, 3rd Edition, Cengage Learning
  5. John C Martin, Introduction to Languages and The Theory of Computation, 3rd Edition, Tata McGraw-Hill Publishing Company Limited, 2013
  6. Peter Linz, An Introduction to Formal Languages and Automata, 3rd Edition, Narosa Publishers, 1998
  7. Basavaraj S. Anami, Karibasappa K G, Formal Languages and Automata Theory, Wiley India, 2012

Learning Outcomes: At the end of the module, the student should be able to:

  1. Define context-free grammars and languages.
  2. Design a grammar for a given context-free language.
  3. Apply the simplification algorithm to simplify a given grammar.
  4. Prove the correctness of a grammar.
  5. Define leftmost derivation and rightmost derivation.
  6. Draw the parse tree for a string of a given grammar.
  7. Define ambiguous and inherently ambiguous grammars.
  8. Prove whether a given grammar is ambiguous or not.
  9. Define Chomsky normal form. Apply the normalization algorithm to convert a grammar to Chomsky normal form.
  10. Define pushdown automata (NPDA). Design an NPDA for a given CFG.
  11. Design a DPDA for a given language.
  12. Define alternative equivalent definitions of a PDA.

1. Introduction to Rewrite Systems and Grammars

What is a Rewrite System? A rewrite system (or production system, or rule-based system) is a list of rules, together with an algorithm for applying them. Each rule has a left-hand side (LHS) and a right-hand side (RHS): X → Y. Examples of rewrite-system rules: S → aSb, aS → ε, aSb → bSabSa. When a rewrite system R is invoked on some initial string w, it operates as follows: simple-rewrite(R: rewrite system, w: initial string) =

  1. Set working-string to w.
  2. Until told by R to halt do:
     2.1 Match the LHS of some rule against some part of working-string.
     2.2 Replace the matched part of working-string with the RHS of the rule that was matched.
  3. Return working-string. If simple-rewrite returns some string s, then R can derive s from w, i.e., there exists a derivation in R of s from w.

Examples:

  1. A rule is simply a pair of strings: if the string on the LHS matches, it is replaced by the string on the RHS.
  2. The rule axa → aa squeezes out whatever comes between a pair of a's.
  3. The rule ababa → aaa squeezes out b's between a's.
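The simple-rewrite procedure above can be sketched in Python. This is a deliberately simplified model: rules are plain (LHS, RHS) string pairs, and the sketch always applies the first rule that matches at its leftmost occurrence, so it explores only one of the possible derivation orders.

```python
def simple_rewrite(rules, w, max_steps=100):
    """Apply (lhs, rhs) rules to w until no rule matches or max_steps is hit."""
    working = w
    for _ in range(max_steps):
        for lhs, rhs in rules:
            i = working.find(lhs)
            if i != -1:
                # Replace the matched part of the working string with the RHS.
                working = working[:i] + rhs + working[i + len(lhs):]
                break
        else:
            return working  # no rule matched: halt and return the result
    return working

# The rule ababa -> aaa squeezes the b's out from between the a's:
print(simple_rewrite([("ababa", "aaa")], "ababa"))  # -> aaa
```

The max_steps cap matters because a rule set such as {S → aSb} never tells the system to halt on its own.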

When to Stop

Case 1: The working string no longer contains any nonterminal symbols (including when it is ε). In this case, we say that the working string is generated by the grammar.
Example: S ⇒ aSb ⇒ aaSbb ⇒ aabb

Case 2: There are nonterminal symbols in the working string, but none of them appears on the left-hand side of any rule in the grammar. In this case, we have a blocked or non-terminated derivation, but no generated string.
Given:
S → aSb ----- rule 1
S → bTa ----- rule 2
S → ε ----- rule 3
Derivation so far: S ⇒ aSb ⇒ abTab ⇒ (blocked, since T appears on no left-hand side)

Case 3: It is possible that neither Case 1 nor Case 2 is ever achieved.
Given:
S → Ba ----- rule 1
B → bB ----- rule 2
Then all derivations proceed as: S ⇒ Ba ⇒ bBa ⇒ bbBa ⇒ bbbBa ⇒ bbbbBa ⇒ ...
The grammar generates the language Ø.

2. Context-Free Grammars and Languages

Recall that a regular grammar has a left-hand side that is a single nonterminal, and a right-hand side that is ε, a single terminal, or a single terminal followed by a single nonterminal: X → Y, where X is a nonterminal (NT) and Y is ε, T, or T NT.
Example: L = {w ∈ {a, b}* : |w| is even}, RE = ((aa) ∪ (ab) ∪ (ba) ∪ (bb))*

Context-Free Grammars: X → Y, where X is a single nonterminal and Y is unrestricted. There are no restrictions on the form of the right-hand side, but the left-hand side must be a single nonterminal.
Example: S → ε, S → a, S → T, S → aSb, S → aSbbT are allowed; ST → aSb, a → aSb, ε → a are not allowed.
The name "context-free" makes sense because, under these rules, the decision to replace a nonterminal by some other sequence is made without looking at the context in which the nonterminal occurs.

Definition: Context-Free Grammar

A context-free grammar G is a quadruple (V, Σ, R, S), where:

  • V is the rule alphabet, which contains nonterminals and terminals,
  • Σ (the set of terminals) is a subset of V,
  • R (the set of rules) is a finite subset of (V − Σ) × V*,
  • S (the start symbol) is an element of V − Σ.

Given a grammar G, define x ⇒G y to be the binary relation derives-in-one-step, defined so that ∀ x, y ∈ V* (x ⇒G y iff x = αAβ, y = αγβ, and there exists a rule A → γ in RG).

Any sequence of the form w0 ⇒G w1 ⇒G w2 ⇒G ... ⇒G wn is called a derivation in G. Let ⇒G* be the reflexive, transitive closure of ⇒G; we call ⇒G* the derives relation. A derivation halts whenever no rule's left-hand side matches against the working string. At every step, any rule that matches may be chosen.

The language generated by G, denoted L(G), is: L(G) = {w ∈ Σ* : S ⇒G* w}. A language L is context-free iff it is generated by some context-free grammar G. The context-free languages (CFLs) are a proper superset of the regular languages.

Example: L = AnBn = {a^n b^n : n ≥ 0} = {ε, ab, aabb, aaabbb, ...}
G = ({S, a, b}, {a, b}, R, S), where R = { S → aSb, S → ε }
Example derivation in G: S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb, i.e., S ⇒* aaabbb

Recursive Grammar Rules

A grammar is recursive iff it contains at least one recursive rule. A rule is recursive iff it is of the form X → w1 Y w2, where Y ⇒* w3 X w4 for some w1, w2, w3, and w4 in V*. Expanding a nonterminal according to such rules can eventually lead to a string that includes the same nonterminal again.

Example 1: L = AnBn = {a^n b^n : n ≥ 0}. Let G = ({S, a, b}, {a, b}, {S → aSb, S → ε}, S).
Example 2: A regular grammar whose rules are {S → aT, T → aW, W → aS, W → a}.
Example 3: The balanced parentheses language Bal = {w ∈ { ), ( }* : the parentheses are balanced} = {ε, (), (()), ()(), (()()), ...}
G = ({S, ), ( }, { ), ( }, R, S), where R = { S → ε, S → SS, S → (S) }
Some example derivations in G:
S ⇒ (S) ⇒ ()
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()())
So, S ⇒* () and S ⇒* (()()).
Recursive rules make it possible for a finite grammar to generate an infinite set of strings.
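The derives relation can be explored mechanically. The sketch below brute-forces the strings of L(G) for the AnBn grammar, treating uppercase letters as nonterminals and always expanding the leftmost one; the length-based pruning assumes, as holds here, that no rule erases terminals.

```python
def generate(rules, start, max_len):
    """Collect all terminal strings of length <= max_len derivable from start."""
    results, frontier, seen = set(), [start], set()
    while frontier:
        form = frontier.pop()
        # Prune: with at most one nonterminal per form, anything longer than
        # max_len + 1 already carries too many terminals to yield a short word.
        if form in seen or len(form) > max_len + 1:
            continue
        seen.add(form)
        if not any(c in rules for c in form):  # no nonterminals left
            if len(form) <= max_len:
                results.add(form)
            continue
        i = next(i for i, c in enumerate(form) if c in rules)  # leftmost NT
        for alt in rules[form[i]]:
            frontier.append(form[:i] + alt + form[i + 1:])
    return results

print(generate({"S": ["aSb", ""]}, "S", 6))  # {'', 'ab', 'aabb', 'aaabbb'}
```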

Example 2: A CFG for C++ compound statements (the nonterminal names, shown in angle brackets, are illustrative):

<compound-stmt> → { <stmt-list> } | epsilon
<stmt> → if ( <expr> ) <stmt>
<stmt> → if ( <expr> ) <stmt> else <stmt>
<stmt> → while ( <expr> ) <stmt>
<stmt> → do <stmt> while ( <expr> ) ;
<stmt> → for ( <stmt> <expr> ; <expr> ) <stmt>
<stmt> → case <expr> : <stmt>
<stmt> → switch ( <expr> ) <stmt>
<stmt> → break ; | continue ;
<stmt> → return <expr> ; | goto <id> ;

Example 3: A fragment of an English grammar. Notational conventions used:

  • A nonterminal is a symbol whose first letter is uppercase
  • NP derives a noun phrase
  • VP derives a verb phrase

S → NP VP
NP → the Nominal | a Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | dogs | bear | girl | chocolate | rifle
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | shots | smells
PP → Prep NP
Prep → with

3. Designing Context-Free Grammars

If L has the property that every string in it has two regions, and those regions must bear some relationship to each other, then the two regions must be generated in tandem; otherwise, there is no way to enforce the necessary constraint.

Example 1: L = {a^n b^n c^m : n, m ≥ 0} = {ε, ab, c, abc, abcc, aabbc, ...}
The c^m portion of any string in L is completely independent of the a^n b^n portion, so we should generate the two portions separately and concatenate them together.
G = ({S, A, C, a, b, c}, {a, b, c}, R, S), where:
R = { S → AC /* generate the two independent portions
A → aAb | ε /* generate the a^n b^n portion, from the outside in
C → cC | ε } /* generate the c^m portion
Example derivation in G for the string abcc: S ⇒ AC ⇒ aAbC ⇒ abC ⇒ abcC ⇒ abccC ⇒ abcc

Example 2: L = {a^i b^j c^k : j = i + k, i, k ≥ 0}. Substituting j = i + k gives L = {a^i b^i b^k c^k : i, k ≥ 0} = {ε, abbc, aabbbbcc, abbbcc, ...}
The a^i b^i portion of any string in L is completely independent of the b^k c^k portion, so we should generate the two portions separately and concatenate them together.
G = ({S, A, B, a, b, c}, {a, b, c}, R, S), where:
R = { S → AB /* generate the two independent portions
A → aAb | ε /* generate the a^i b^i portion, from the outside in
B → bBc | ε } /* generate the b^k c^k portion
Example derivation in G for the string abbc: S ⇒ AB ⇒ aAbB ⇒ abB ⇒ abbBc ⇒ abbc

Example 3: L = {a^i b^j c^k : i = j + k, j, k ≥ 0}. Substituting i = j + k gives L = {a^k a^j b^j c^k : j, k ≥ 0} = {ε, ac, ab, aabc, aaabcc, ...}
The a^j b^j part is the inner portion, and the a^k ... c^k part is the outer portion, of any string in L.
G = ({S, A, a, b, c}, {a, b, c}, R, S), where:
R = { S → aSc | A /* generate the a^k ... c^k outer portion
A → aAb | ε } /* generate the a^j b^j inner portion
Example derivation in G for the string aabc: S ⇒ aSc ⇒ aAc ⇒ aaAbc ⇒ aabc

Example 4: L = {a^n w w^R b^n : w ∈ {a, b}*} = {ε, ab, aaab, abbb, aabbab, aabbbbab, ...}
The a^n ... b^n part is the outer portion, and w w^R is the inner portion, of any string in L.
G = ({S, A, a, b}, {a, b}, R, S), where:
R = { S → aSb ----- rule 1
S → A ----- rule 2
A → aAa ----- rule 3
A → bAb ----- rule 4
A → ε ----- rule 5 }
Example derivation in G for the string aabbab: S ⇒ aSb ⇒ aAb ⇒ aaAab ⇒ aabAbab ⇒ aabbab
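As a sanity check on the tandem-generation idea, the sketch below brute-forces the strings of the Example 2 grammar (S → AB, A → aAb | ε, B → bBc | ε) and confirms that every generated string a^i b^j c^k satisfies j = i + k.

```python
import re

def language(rules, start, max_len):
    """Brute-force all terminal strings of length <= max_len derivable from start."""
    out, stack, seen = set(), [start], set()
    while stack:
        form = stack.pop()
        # Forms here carry at most 2 nonterminals, and no rule erases terminals,
        # so anything longer than max_len + 2 cannot yield a short enough word.
        if form in seen or len(form) > max_len + 2:
            continue
        seen.add(form)
        nts = [i for i, c in enumerate(form) if c in rules]
        if not nts:
            if len(form) <= max_len:
                out.add(form)
            continue
        i = nts[0]  # expand the leftmost nonterminal
        for alt in rules[form[i]]:
            stack.append(form[:i] + alt + form[i + 1:])
    return out

words = language({"S": ["AB"], "A": ["aAb", ""], "B": ["bBc", ""]}, "S", 8)
for w in words:
    i, j, k = map(len, re.fullmatch(r"(a*)(b*)(c*)", w).groups())
    assert j == i + k  # every generated string has #b = #a + #c
```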

4. Simplifying Context-Free Grammars

a. Removing Unproductive Nonterminals

Example: G = ({S, A, B, C, D, a, b}, {a, b}, R, S), where
R = { S → AB | AC
A → aAb | ε
B → bA
C → bCa
D → AB }

  1. The terminal symbols a and b are productive.
  2. A is productive (because A → aAb).
  3. B is productive (because B → bA).
  4. S and D are productive (because S → AB and D → AB).
  5. C is unproductive.

On eliminating C from both the LHS and the RHS, the rule set R obtained is:
R = { S → AB
A → aAb | ε
B → bA
D → AB }

b. Removing Unreachable Nonterminals

removeunreachable(G: CFG) =
  1. G = G.
  2. Mark S as reachable.
  3. Mark every other nonterminal symbol as unreachable.
  4. Until one entire pass has been made without any new symbol being marked do: For each rule X  A (where A  V - ) in R do: If X has been marked as reachable and A has not then: Mark A as reachable.
  5. Remove from G every unreachable symbol.
  6. Remove from G every rule with an unreachable symbol on the left-hand side.
  7. Return G. Example G = ({S, A, B, C, D, a, b}, {a, b}, R, S), where R = {S  AB A  aAb |  B  bA D  AB } S, A, B are reachable but D is not reachable, on eliminating D from both LHS and RHS the rule set R is R = { S  AB A  aAb |  B  bA }

5. Proving the Correctness of a Grammar

Given some language L and a grammar G, can we prove that G is correct (i.e., that it generates exactly the strings in L)? To do so, we need to prove two things:

  1. Prove that G generates only strings in L.
  2. Prove that G generates all the strings in L.
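Both proof obligations can be checked empirically before attempting the formal proof. The sketch below compares the AnBn grammar against the language definition over every string in {a, b}* up to length 6; the length-based pruning assumes, as with this grammar, that sentential forms carry at most one nonterminal and that no rule erases terminals.

```python
from itertools import product

def derives(rules, start, target, depth=20):
    """True iff start =>* target under rules (uppercase symbols are NTs)."""
    if depth == 0 or len(start) > len(target) + 1:
        return False  # prune: too deep, or too many terminals already
    nts = [i for i, c in enumerate(start) if c in rules]
    if not nts:
        return start == target
    i = nts[0]
    return any(derives(rules, start[:i] + alt + start[i + 1:], target, depth - 1)
               for alt in rules[start[i]])

rules = {"S": ["aSb", ""]}
in_L = lambda w: len(w) % 2 == 0 and w == "a" * (len(w) // 2) + "b" * (len(w) // 2)
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        # Direction 1: everything derived is in L; direction 2: all of L is derived.
        assert derives(rules, "S", w) == in_L(w)
```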

6. Derivations and Parse Trees

Algorithms used for generation and recognition must be systematic, and the expansion order is important for algorithms that work with CFGs. To make it easier to describe such algorithms, we define two useful families of derivations:

a. A leftmost derivation is one in which, at each step, the leftmost nonterminal in the working string is chosen for expansion.
b. A rightmost derivation is one in which, at each step, the rightmost nonterminal in the working string is chosen for expansion.

Example 1: S → AB | aaB, A → a | Aa, B → b
Leftmost derivation for the string aab: S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab
Rightmost derivation for the string aab: S ⇒ AB ⇒ Ab ⇒ Aab ⇒ aab

Example 2: S → iCtS | iCtSeS | x, C → y
Leftmost derivation for the string iytiytxex: S ⇒ iCtS ⇒ iytS ⇒ iytiCtSeS ⇒ iytiytSeS ⇒ iytiytxeS ⇒ iytiytxex
Rightmost derivation for the string iytiytxex: S ⇒ iCtSeS ⇒ iCtSex ⇒ iCtiCtSex ⇒ iCtiCtxex ⇒ iCtiytxex ⇒ iytiytxex

Example 3: A fragment of an English grammar:
S → NP VP
NP → the Nominal | a Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | dogs | bear | girl | chocolate | rifle
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | shots | smells
PP → Prep NP
Prep → with

Example 1: S → AB | aaB, A → a | Aa, B → b
Leftmost derivation for the string aab: S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab
[Figure: parse tree for aab]

Example 2: S → iCtS | iCtSeS | x, C → y
Leftmost derivation for the string iytiytxex: S ⇒ iCtS ⇒ iytS ⇒ iytiCtSeS ⇒ iytiytSeS ⇒ iytiytxeS ⇒ iytiytxex
[Figure: parse tree for iytiytxex]

Example 3: Parse tree structure in English for the string "the smart cat smells chocolate". It is clear from the tree that the sentence is not about cat smells or about smart cat smells.
[Figure: parse tree for "the smart cat smells chocolate"]

A parse tree may correspond to multiple derivations. From the parse tree, we cannot tell which of the following was used in the derivation:
S ⇒ NP VP ⇒ the Nominal VP ⇒ ...
S ⇒ NP VP ⇒ NP V NP ⇒ ...
Parse trees capture the important structural facts about a derivation but throw away the details of the nonterminal expansion order. The order has no bearing on the structure we wish to assign to a string.

Generative Capacity

Because parse trees matter, it makes sense, given a grammar G, to distinguish between:

  1. G's weak generative capacity, defined to be the set of strings, L(G), that G generates, and
  2. G's strong generative capacity, defined to be the set of parse trees that G generates.

When we design grammars, it is important to consider both their weak and their strong generative capacities.
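The difference between the two expansion orders is easy to replay mechanically. The sketch below re-runs the leftmost and rightmost derivations of aab from Example 1 (S → AB | aaB, A → a | Aa, B → b) as explicit (position, replacement) steps and shows that both yield the same string.

```python
def apply(form, pos, rhs):
    """Replace the single nonterminal at index pos with rhs."""
    return form[:pos] + rhs + form[pos + 1:]

# Leftmost: always expand the leftmost nonterminal.
lm = "S"
for pos, rhs in [(0, "AB"), (0, "Aa"), (0, "a"), (2, "b")]:
    lm = apply(lm, pos, rhs)  # S => AB => AaB => aaB => aab

# Rightmost: always expand the rightmost nonterminal.
rm = "S"
for pos, rhs in [(0, "AB"), (1, "b"), (0, "Aa"), (0, "a")]:
    rm = apply(rm, pos, rhs)  # S => AB => Ab => Aab => aab

print(lm, rm)  # aab aab
```

Both derivations correspond to the same parse tree; only the order in which its nodes are expanded differs.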

7. Ambiguity

Sometimes a grammar may produce more than one parse tree for some (or all) of the strings it generates. When this happens, we say that the grammar is ambiguous. A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree.

Example 1: Bal = {w ∈ { ), ( }* : the parentheses are balanced}. G = ({S, ), ( }, { ), ( }, R, S), where R = { S → ε, S → SS, S → (S) }
Leftmost derivation 1 for the string (())(): S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
Leftmost derivation 2 for the string (())(): S ⇒ SS ⇒ SSS ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()
Since the two parse trees obtained for the same string (())() are different, the grammar is ambiguous.

Example 2: S → iCtS | iCtSeS | x, C → y
Leftmost derivation for the string iytiytxex: S ⇒ iCtS ⇒ iytS ⇒ iytiCtSeS ⇒ iytiytSeS ⇒ iytiytxeS ⇒ iytiytxex
Rightmost derivation for the string iytiytxex: S ⇒ iCtSeS ⇒ iCtSex ⇒ iCtiCtSex ⇒ iCtiCtxex ⇒ iCtiytxex ⇒ iytiytxex
Since the two parse trees obtained for the same string iytiytxex are different, the grammar is ambiguous.

Example 3: S → AB | aaB, A → a | Aa, B → b
Leftmost derivation for the string aab: S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab
Rightmost derivation for the string aab: S ⇒ aaB ⇒ aab
Since the two parse trees obtained for the same string aab are different, the grammar is ambiguous.
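Ambiguity can be tested for a particular string by counting its leftmost derivations, which correspond one-to-one with parse trees. The sketch below does this for the grammar of Example 3 and finds two parse trees for aab.

```python
from functools import lru_cache

def count_leftmost(rules, target):
    """Count leftmost derivations of target from S (= number of parse trees)."""
    @lru_cache(maxsize=None)
    def count(form):
        nts = [i for i, c in enumerate(form) if c in rules]
        if not nts:
            return 1 if form == target else 0
        if len(form) - len(nts) > len(target):
            return 0  # prune: more terminals than the target has
        i = nts[0]
        if form[:i] != target[:i]:
            return 0  # terminals left of the leftmost NT must already match
        return sum(count(form[:i] + alt + form[i + 1:]) for alt in rules[form[i]])
    return count("S")

rules = {"S": ["AB", "aaB"], "A": ["a", "Aa"], "B": ["b"]}
print(count_leftmost(rules, "aab"))  # 2 parse trees, so the grammar is ambiguous
```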

a. Eliminating ε-Rules

Let G = (V, Σ, R, S) be a CFG. The following algorithm constructs a G′ such that L(G′) = L(G) − {ε} and G′ contains no ε-rules:
removeEps(G: CFG) =

  1. Let G = G.
  2. Find the set N of nullable variables in G.
  3. Repeat until G contains no modifiable rules that haven’t been processed: Given the rule PQ, where Q  N, add the rule P if it is not already present and if    and if P  .
  4. Delete from G all rules of the form X  .
  5. Return G. Nullable Variables & Modifiable Rules A variable X is nullable iff either: (1) there is a rule X  , or (2) there is a rule X  PQR… and P, Q, R, … are all nullable. So compute N, the set of nullable variables, as follows: 2.1. Set N to the set of variables that satisfy (1). 2.2. Until an entire pass is made without adding anything to N do Evaluate all other variables with respect to (2). If any variable satisfies (2) and is not in N, insert it. A rule is modifiable iff it is of the form: P  Q, for some nullable Q. Example: G = {{S, T, A, B, C, a, b, c}, {a, b, c}, R, S), R = {S  aTa T  ABC A  aA | C B  Bb | C C  c |  } Applying removeEps Step2: N = { C } Step2.2 pass1: N = { A, B, C } Step2.2 pass2: N = { A, B, C, T } Step2.2 pass3: no new element found. Step2: halts. Step3: adds the following new rules to G. { S  aa T  AB | BC | AC | A | B | C A  a B  b } The rules obtained after eliminating -rules : { S  aTa | aa T  ABC | AB | BC | AC | A | B | C A  aA | C | a B  Bb | C | b C  c }

What If ε ∈ L?

Sometimes L(G) contains ε and it is important to retain it. To handle this case, the algorithm used is:
atmostoneEps(G: CFG) =

  1. G = removeEps(G).
  2. If SG is nullable then /* i. e.,   L(G) 2.1 Create in G a new start symbol S. 2.2 Add to RG the two rules:S   and S*  SG.
  3. Return G. Example: Bal={w  { ),(}*: the parenthesis are balanced}. R={ S  SS S  (S) S   }

R={ S  SS

S  (S)

S  ( ) }

R={ S*  

S*  S

S  SS

S  (S)

S  ( )}

The new grammar built is better than the original one. The string (())() has only one parse tree. But it is still ambiguous as the string ()()() has two parse trees? Replace S  SS with one of: S  S S 1 /* force branching to the left S  S 1 S /* force branching to the right So we get: S*   | S S  SS 1 /* force branching only to the left S  S 1 /* add rule S 1  (S) | ()

Proving that a Grammar is Unambiguous

A grammar is unambiguous iff for all strings w, at every point in a leftmost derivation (or rightmost derivation) of w, only one rule in G can be applied.
S* → ε ---(1)
S* → S ---(2)
S → SS1 ---(3)
S → S1 ---(4)
S1 → (S) ---(5)
S1 → () ---(6)
S* ⇒ S ⇒ SS1 ⇒ SS1S1 ⇒ S1S1S1 ⇒ ()S1S1 ⇒ ()()S1 ⇒ ()()()

Inherent Ambiguity

In many cases, for an ambiguous grammar G, it is possible to construct a new grammar G′ that generates L(G) with less or no ambiguity. However, not always: some languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.

Example: L = {a^i b^j c^k : i, j, k ≥ 0, i = j or j = k}. Every string in L has either (or both) the same number of a's and b's or the same number of b's and c's. L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}. One grammar for L has the rules:
S → S1 | S2
S1 → S1c | A /* Generate all strings in {a^n b^n c^m}.
A → aAb | ε
S2 → aS2 | B /* Generate all strings in {a^n b^m c^m}.
B → bBc | ε
Consider the string a^2 b^2 c^2 = aabbcc. It has two distinct derivations, one through S1 and the other through S2:
S ⇒ S1 ⇒ S1c ⇒ S1cc ⇒ Acc ⇒ aAbcc ⇒ aaAbbcc ⇒ aabbcc
S ⇒ S2 ⇒ aS2 ⇒ aaS2 ⇒ aaB ⇒ aabBc ⇒ aabbBcc ⇒ aabbcc
Given any grammar G that generates L, there is at least one string with two derivations in G. Both of the following problems are undecidable:

  • Given a context-free grammar G, is G ambiguous?
  • Given a context-free language L, is L inherently ambiguous?
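The two derivations of aabbcc can be replayed mechanically. In the sketch below the digits 1 and 2 stand in for the subscripted nonterminals S1 and S2, and each derivation is a list of (position, replacement) steps.

```python
def apply(form, pos, rhs):
    """Replace the single nonterminal at index pos with rhs."""
    return form[:pos] + rhs + form[pos + 1:]

# S => S1 => S1c => S1cc => Acc => aAbcc => aaAbbcc => aabbcc
via_s1 = "S"
for pos, rhs in [(0, "1"), (0, "1c"), (0, "1c"), (0, "A"),
                 (0, "aAb"), (1, "aAb"), (2, "")]:
    via_s1 = apply(via_s1, pos, rhs)

# S => S2 => aS2 => aaS2 => aaB => aabBc => aabbBcc => aabbcc
via_s2 = "S"
for pos, rhs in [(0, "2"), (0, "a2"), (1, "a2"), (2, "B"),
                 (2, "bBc"), (3, "bBc"), (4, "")]:
    via_s2 = apply(via_s2, pos, rhs)

print(via_s1, via_s2)  # aabbcc aabbcc
```

Two derivations through structurally different nonterminals means two distinct parse trees for the same string.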

8. Normal Forms

We have seen grammars in which some RHS is ε, which makes the grammar harder to use. Let's see what happens if we carry the idea of getting rid of ε-productions a few steps farther. To make our task easier, we define normal forms: when the grammar rules in G satisfy certain restrictions, G is said to be in a normal form.

  • Normal Forms for queries & data can simplify database processing.
  • Normal Forms for logical formulas can simplify automated reasoning in AI systems and in program verification system.
  • It might be easier to build a parser if we could make some assumptions about the form of the grammar rules that the parser will use.

Normal Forms for Grammars

Among several normal forms, two important ones are:

  • Chomsky Normal Form (CNF)
  • Greibach Normal Form (GNF)

Chomsky Normal Form (CNF)

In CNF we place restrictions on the length of the RHS and on the nature of the symbols on the RHS of the grammar rules. A context-free grammar G = (V, Σ, R, S) is said to be in Chomsky Normal Form iff every rule in R has one of the following forms:
X → a, where a ∈ Σ, or
X → BC, where B and C ∈ V − Σ
Example: S → AB, A → a, B → b
Every parse tree generated by a grammar in CNF has a branching factor of exactly 2, except at the branches that lead to the terminal nodes, where the branching factor is 1. Using this property, a parser can exploit efficient data structures for storing and manipulating binary trees. CNF also makes it possible to define a straightforward decision procedure to determine whether w can be generated by a CNF grammar G, and it becomes easier to define other algorithms that manipulate grammars.

Greibach Normal Form (GNF)

GNF is a context-free grammar G = (V, Σ, R, S) in which all rules have the following form:
X → aβ, where a ∈ Σ and β ∈ (V − Σ)*
Example: S → aA | aAB, A → a, B → b
In every derivation, precisely one terminal is generated for each rule application. This property is useful for defining a straightforward decision procedure to determine whether w can be generated by a GNF grammar G. GNF grammars can also easily be converted to PDAs with no ε-transitions.
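The CNF restriction is simple enough to check mechanically. The sketch below decides whether every rule of a grammar (a dict from nonterminals to lists of alternatives) is in Chomsky Normal Form:

```python
def is_cnf(rules, nonterminals):
    """True iff every RHS is a single terminal or exactly two nonterminals."""
    for lhs, alts in rules.items():
        for rhs in alts:
            one_terminal = len(rhs) == 1 and rhs not in nonterminals
            two_nts = len(rhs) == 2 and all(s in nonterminals for s in rhs)
            if not (one_terminal or two_nts):
                return False
    return True

cnf = {"S": ["AB"], "A": ["a"], "B": ["b"]}        # the CNF example above
not_cnf = {"S": ["aSb", ""]}                       # aSb is too long, and has an eps-rule
print(is_cnf(cnf, {"S", "A", "B"}), is_cnf(not_cnf, {"S"}))  # True False
```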