Chapter- 6
Regular Expressions
Regular Expression (RE)
A RE is a string that can be formed according to the following rules:
- ø is a RE.
- ε is a RE.
- Every element in ∑ is a RE.
- Given two REs α and β, αβ is a RE.
- Given two REs α and β, α U β is a RE.
- Given a RE α, α* is a RE.
- Given a RE α, α+ is a RE.
- Given a RE α, (α) is a RE.
If ∑ = {a, b}, the following strings are regular expressions: ø, ε, a, b, (a U b)*, abba U ε.
Semantic interpretation function L for the language of regular expressions:
- L (ø) = ø, the language that contains no strings.
- L (ε) = {ε}, the language that contains only the empty string.
- For any c ∈ ∑, L (c) = {c}, the language that contains the single-character string c.
- For any regular expressions α and β, L (αβ) = L (α) L (β).
- For any regular expressions α and β, L (α U β) = L (α) U L (β).
- For any regular expression α, L (α*) = (L (α))*.
- For any regular expression α, L (α+) = L (αα*) = L (α) (L (α))*.
- For any regular expression α, L ((α)) = L (α).
Analysing Simple Regular Expressions
1. L((a U b)*b) = L((a U b)*)L(b) = (L((a U b)))*L(b) = (L(a) U L(b))*L(b) = ({a} U {b})*{b} = {a, b}*{b}
(a U b)*b is the set of all strings over the alphabet {a, b} that end in b.
2. L(((a U b)(a U b))a(a U b)*) = L(((a U b)(a U b)))L(a)L((a U b)*) = L((a U b)(a U b)){a}(L((a U b)))* = L((a U b))L((a U b)){a}{a, b}* = {a, b}{a, b}{a}{a, b}*
((a U b)(a U b))a(a U b)* is {xay : x and y are strings of a's and b's and |x| = 2}.
Finding RE for a given Language
- Let L = {w ∈ {a, b}* : |w| is even}. L = {ε, aa, ab, abba, aabb, ba, baabaa, ...} RE = ((a U b)(a U b))* or (aa U ab U ba U bb)*
- Let L = {w ∈ {a, b}* : w starts with the string abb}. L = {abb, abba, abbb, abbab, ...} RE = abb(a U b)*
- Let L = {w ∈ {a, b}* : w ends with the string abb}. L = {abb, aabb, babb, ababb, ...} RE = (a U b)*abb
- L = {w ∈ {0, 1}* : w has 001 as a substring}. L = {001, 1001, 000101, ...} RE = (0 U 1)*001(0 U 1)*
- L = {w ∈ {0, 1}* : w does not have 001 as a substring}. L = {0, 1, 010, 110, 101, ...} RE = (1 U 01)*0*
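The semantic function L and the expressions found above can be checked mechanically on short strings. The following is a minimal sketch, not part of the original notes: the tuple encoding of REs and the names `lang`, `sym`, `cat`, `union`, `star` and `all_strings` are hypothetical choices made for illustration. It computes only the strings of L(α) up to a chosen length n, which is enough to compare an RE against a set-builder definition by brute force.

```python
from functools import reduce
from itertools import product

# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)       ø            ("eps",)         ε
#   ("sym", c)       character c  ("cat", r, s)    rs
#   ("union", r, s)  r U s        ("star", r)      r*

def lang(regex, n):
    """Return the strings of L(regex) whose length is at most n."""
    kind = regex[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {regex[1]} if n >= 1 else set()
    if kind == "union":                      # L(r U s) = L(r) U L(s)
        return lang(regex[1], n) | lang(regex[2], n)
    if kind == "cat":                        # L(rs) = L(r) L(s)
        left, right = lang(regex[1], n), lang(regex[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":                       # L(r*) = (L(r))*
        base = lang(regex[1], n)
        result, frontier = {""}, {""}
        while frontier:                      # keep appending until nothing new
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

def sym(c):
    return ("sym", c)

def cat(*parts):
    return reduce(lambda r, s: ("cat", r, s), parts)

def union(*parts):
    return reduce(lambda r, s: ("union", r, s), parts)

def star(r):
    return ("star", r)

def all_strings(alphabet, n):
    """All strings over alphabet of length at most n."""
    return {"".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)}

a_or_b = union(sym("a"), sym("b"))
even_length = star(cat(a_or_b, a_or_b))                        # ((a U b)(a U b))*
ends_abb = cat(star(a_or_b), sym("a"), sym("b"), sym("b"))     # (a U b)*abb
no_001 = cat(star(union(sym("1"), cat(sym("0"), sym("1")))),   # (1 U 01)*0*
             star(sym("0")))
```

For example, `lang(no_001, 6)` can be compared with `{w for w in all_strings("01", 6) if "001" not in w}`. Agreement on all short strings is evidence, not a proof, that the RE describes the intended language.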
Three operators of RE in precedence order (highest to lowest):
- Kleene star
- Concatenation
- Union
Eg: (a U bb*a) is evaluated as (a U (b(b*)a)).
Kleene's Theorem
Theorem 1: Any language that can be defined by a regular expression can be accepted by some finite state machine.
Theorem 2: Any language that can be accepted by a finite state machine can be defined by some regular expression.
Note: Both theorems are proved in the sections that follow.
Building an FSM from a RE
Theorem 1: For every RE, there is an equivalent FSM.
Proof: The proof is by construction. Given a RE α, we can construct an FSM M such that L (α) = L (M).
Steps:
- If α is any c ∈ ∑, we construct the simple FSM shown in Figure (1).
- If α is ø, we construct the simple FSM shown in Figure (2).
- If α is ε, we construct the simple FSM shown in Figure (3).
- Let β and γ be regular expressions. If L(β) is regular, then it is accepted by some FSM M1 = (K1, ∑, δ1, s1, A1). If L(γ) is regular, then it is accepted by some FSM M2 = (K2, ∑, δ2, s2, A2). If α is the RE β U γ, we construct an FSM M3 = (K3, ∑, δ3, s3, A3) such that L(M3) = L(α) = L(β) U L(γ): M3 = ({s3} U K1 U K2, ∑, δ3, s3, A1 U A2), where δ3 = δ1 U δ2 U {((s3, ε), s1), ((s3, ε), s2)}. (Figure: FSM for α = β U γ.)
- If α is the RE βγ, we construct an FSM M3 = (K3, ∑, δ3, s3, A3) such that L(M3) = L(α) = L(β)L(γ): M3 = (K1 U K2, ∑, δ3, s1, A2), where
Figures: An FSM for a. An FSM for ab. An FSM for (b U ab). An FSM for (b U ab)*.
2. Construct FSM for the RE (b(a U b)b)*
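The construction used in the proof can be sketched in code. This is a minimal illustration, not the notes' own implementation: the class `NFA` and the functions `symbol`, `union`, `concat`, `star`, `eps_closure` and `accepts` are hypothetical names. The union case follows the definition of M3 given above; the concatenation and Kleene star cases are the standard ε-transition constructions and are assumptions here, since the notes do not spell them out. The empty string `""` stands for ε in transition labels.

```python
from itertools import count

_ids = count()  # source of fresh state names

class NFA:
    def __init__(self, start, accepting, delta):
        self.start = start          # start state
        self.accepting = accepting  # set of accepting states
        self.delta = delta          # dict: (state, symbol or "") -> set of states

def _merge(*deltas):
    """Union of several transition relations."""
    merged = {}
    for d in deltas:
        for key, targets in d.items():
            merged.setdefault(key, set()).update(targets)
    return merged

def symbol(c):
    """FSM for a single character c: two states joined by a c-transition."""
    s, t = next(_ids), next(_ids)
    return NFA(s, {t}, {(s, c): {t}})

def union(m1, m2):
    """New start state s3 with ε-transitions to s1 and s2; accepting = A1 U A2."""
    s3 = next(_ids)
    delta = _merge(m1.delta, m2.delta, {(s3, ""): {m1.start, m2.start}})
    return NFA(s3, m1.accepting | m2.accepting, delta)

def concat(m1, m2):
    """Assumed construction: ε-transitions from every state in A1 to s2."""
    eps = {(q, ""): {m2.start} for q in m1.accepting}
    return NFA(m1.start, set(m2.accepting), _merge(m1.delta, m2.delta, eps))

def star(m1):
    """Assumed construction: new accepting start state, ε back-edges from A1 to s1."""
    s = next(_ids)
    eps = {(q, ""): {m1.start} for q in m1.accepting}
    eps[(s, "")] = {m1.start}
    return NFA(s, {s} | m1.accepting, _merge(m1.delta, eps))

def eps_closure(m, states):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in m.delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(m, w):
    """Simulate the nondeterministic machine on the string w."""
    current = eps_closure(m, {m.start})
    for c in w:
        moved = set()
        for q in current:
            moved |= m.delta.get((q, c), set())
        current = eps_closure(m, moved)
    return bool(current & m.accepting)

# FSM for (b U ab)*; every occurrence of a symbol gets its own fresh machine.
b_or_ab_star = star(union(symbol("b"), concat(symbol("a"), symbol("b"))))
```

With this machine, `accepts(b_or_ab_star, "bab")` is true and `accepts(b_or_ab_star, "ba")` is false, since in (b U ab)* every a must be followed immediately by b.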
Building a Regular Expression from an FSM
Building an Equivalent Machine M
Algorithm for FSM to RE (heuristic)
fsmtoregexheuristic(M: FSM) =
- Remove any unreachable states from M.
- If M has no accepting states, return the RE ø.
- If the start state of M has incoming transitions, create a new start state s.
- If M has more than one accepting state, or a single accepting state with outgoing transitions, create a new accepting state.
- If M has only one state, then L (M) = {ε}; return the RE ε.
- Until only the start state and the accepting state remain, do:
  6.1. Select some state rip of M.
  6.2. Remove rip from M.
  6.3. Modify the transitions. The labels on the rewritten transitions may be any regular expression.
- Return the regular expression that labels the transition from the start state to the accepting state.
Example 1 for building a RE from FSM
Let M be:
Step 1: Create a new start state and a new accepting state and link them to M.
After adding new start state 4 and accepting state 5
Step 2: Let rip be state 3.
Theorem 2: For every FSM, there is an equivalent regular expression.
Statement: Every regular language can be defined with a regular expression.
Proof: By construction. Let FSM M = (K, ∑, δ, s, A); construct a regular expression α such that L(M) = L(α).
Collapsing Multiple Transitions
If there are multiple transitions {C1, C2, C3, ..., Cn} between the same pair of states, delete them and replace them by a single transition labeled C1 U C2 U C3 U ... U Cn.
If any of the transitions are missing, add them without changing L(M) by labeling all of the new transitions with the RE ø.
Select a state rip, remove it, and modify the transitions as follows. Consider any states p and q. Once we remove rip, how can M get from p to q? Let R(p,q) be the RE that labels the transition in M from p to q. In the new machine M', obtained by removing rip, the label R'(p,q) is:
R'(p,q) = R(p,q) U R(p,rip)R(rip,rip)*R(rip,q)
Ripping States out one at a time
R'(1,3) = R(1,3) U R(1,rip)R(rip,rip)*R(rip,3) = R(1,3) U R(1,2)R(2,2)*R(2,3) = ø U ab*a = ab*a
Algorithm to build RE that describes L(M) from any FSM M = (K, ∑, δ, s, A)
Two Sub Routines:
- standardize: converts M to the required form
- buildregex: constructs the required RE from the modified machine M
1. standardize(M: FSM)
i. Remove unreachable states from M.
ii. Modify the start state.
iii. Modify the accepting states.
iv. If there is more than one transition between states p and q, collapse them into a single transition.
v. If there is no transition between p and q, p is not the accepting state, and q is not the start state, then create a transition from p to q labeled ø.
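The two subroutines can be sketched together in a few lines. This is a simplified illustration under stated assumptions, not the notes' algorithm verbatim: the function name `fsm_to_regex` and its helpers are hypothetical, a new start state and a new accepting state are always added (the notes add them only when needed), missing transitions are represented by `None` instead of explicit ø labels, and the result is written in Python `re` syntax, where `|` plays the role of U and `(?:...)` is a grouping parenthesis. Each removal applies the rule R'(p,q) = R(p,q) U R(p,rip)R(rip,rip)*R(rip,q).

```python
import re

EMPTY = None  # stands for ø: no transition between the two states

def union(r, s):
    if r is EMPTY:
        return s
    if s is EMPTY:
        return r
    return r if r == s else f"(?:{r}|{s})"

def concat(*parts):
    if any(p is EMPTY for p in parts):   # ø concatenated with anything is ø
        return EMPTY
    return "".join(parts)

def star(r):
    if r is EMPTY or r == "":            # ø* = ε and ε* = ε
        return ""
    return f"(?:{r})*"

def fsm_to_regex(states, transitions, start, accepting):
    """transitions is an iterable of (p, symbol, q) triples."""
    # standardize: fresh start and accepting states linked by ε (the empty label)
    new_start, new_accept = object(), object()
    R = {}

    def add(p, label, q):                # collapses parallel transitions with U
        R[(p, q)] = union(R.get((p, q), EMPTY), label)

    for p, c, q in transitions:
        add(p, re.escape(c), q)
    add(new_start, "", start)
    for q in accepting:
        add(q, "", new_accept)

    # buildregex: rip out the original states one at a time
    remaining = list(states)
    while remaining:
        rip = remaining.pop()
        loop = star(R.get((rip, rip), EMPTY))
        nodes = remaining + [new_start, new_accept]
        for p in nodes:
            for q in nodes:
                through = concat(R.get((p, rip), EMPTY), loop,
                                 R.get((rip, q), EMPTY))
                if through is not EMPTY:
                    R[(p, q)] = union(R.get((p, q), EMPTY), through)
    return R.get((new_start, new_accept), EMPTY)

# Three-state machine: 1 -a-> 2, 2 -b-> 2, 2 -a-> 3, start 1, accepting {3}
example = fsm_to_regex([1, 2, 3],
                       [(1, "a", 2), (2, "b", 2), (2, "a", 3)],
                       start=1, accepting={3})
```

For this machine the function returns `a(?:b)*a`, which corresponds to ab*a. The order in which states are ripped out affects the shape of the expression but not the language it describes.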
Step 3: Let rip be state 2.
1 - 3: (a U bb)ba
After removing rip state 2
RE = (a U bb)ba
Example 3: Build RE From FSM
Step 1: Remove state s, as it is a dead state.
After removing state s
Step 2: Add a new start state t and a new accepting state u.
After adding t and u
Step 3: Let rip be state q.
p-q-p: 01
After removing rip state q
Step 4: Let rip be state r.
p-r-p: 10
After removing rip state r
RE = (01 U 10)*
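A result such as (01 U 10)* can be sanity-checked by comparing it with the machine on all short strings. The sketch below is an illustration only: the figure for Example 3 is not reproduced in these notes, so the transition table `DELTA` is an assumption inferred from the paths p-q-p: 01 and p-r-p: 10 and from s being a dead state, with p taken as both the start state and the only accepting state. The names `dfa_accepts` and `agree_up_to` are hypothetical.

```python
import re
from itertools import product

# Assumed transition table for Example 3 (inferred, not taken from a figure)
DELTA = {
    ("p", "0"): "q", ("p", "1"): "r",
    ("q", "1"): "p", ("q", "0"): "s",
    ("r", "0"): "p", ("r", "1"): "s",
    ("s", "0"): "s", ("s", "1"): "s",   # s is the dead state
}
START, ACCEPTING = "p", {"p"}

def dfa_accepts(w):
    """Run the deterministic machine on w."""
    state = START
    for c in w:
        state = DELTA[(state, c)]
    return state in ACCEPTING

# (01 U 10)* written in Python re syntax
PATTERN = re.compile(r"(?:01|10)*")

def agree_up_to(n):
    """True if machine and RE agree on every binary string of length <= n."""
    for k in range(n + 1):
        for chars in product("01", repeat=k):
            w = "".join(chars)
            if dfa_accepts(w) != bool(PATTERN.fullmatch(w)):
                return False
    return True
```

Calling `agree_up_to(8)` compares the two on all binary strings of length at most 8. Agreement on short strings supports the result but does not replace the proof by construction.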