Docsity

P problem NP problem, Study notes of Algorithms and Programming

Given two strings s and t of lengths m and n, find the minimum window substring in s that contains all the characters of t (including duplicates). If no such substring exists, return an empty string "".

Typology: Study notes

2024/2025

Uploaded on 12/05/2024

yashika-jain-4

Topics:

  • Basics of Strings
  • Brute-force String Matcher
  • Rabin-Karp String Matching Algorithm

Basics of Strings:

  • In string-matching problems, it is required to find the occurrences of a pattern in a text.
  • These problems find applications in text processing, text editing, computer security, and DNA sequence analysis.

Brute force string matching algorithm:

  • The simplest approach to the string-matching problem is the brute-force algorithm, also known as the naïve (brute-force) approach/algorithm.

-- The basic idea is to compare the pattern and the text character by character.

-- If a character does not match, the pattern is shifted one position to the right and the comparison is repeated, until a match is found or the end of the text is reached.
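The shift-by-one idea can be sketched in Python. This is one possible rendering of the iterative approach, not the notes' own code: `naive_match_itr` is a hypothetical name, shifts are 0-based, and the function reports every valid shift instead of stopping at the first match.

```python
def naive_match_itr(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # try each alignment, left to right
        j = 0
        # compare character by character until a mismatch or a full match
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                      # all m characters matched
            shifts.append(s)
        # on a mismatch the loop simply moves the pattern to shift s + 1
    return shifts


print(naive_match_itr("abracadabra", "abra"))  # [0, 7]
```

Each of the n - m + 1 shifts may cost up to m comparisons, which is where the O((n - m + 1)m) worst case of the brute-force method comes from.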

Brute force string matching algorithm:

  • Text T[1..n] of length n and pattern P[1..m] of length m.
  • The elements of P and T are characters drawn from a finite alphabet Σ.
  • For example, Σ = {0, 1}, Σ = {a, b, …, z}, or Σ = {c, g, a, t}.
  • The character arrays P and T are also referred to as strings of characters.
  • Pattern P is said to occur with shift s in text T if $0 \le s \le n - m$ and $T[s+1..s+m] = P[1..m]$, that is, $T[s+j] = P[j]$ for $1 \le j \le m$. Such a shift is called a valid shift.

The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.
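The definition of a valid shift translates almost literally into code. In this sketch (hypothetical names `is_valid_shift` and `valid_shifts`), the main point is the index conversion: the notes index strings from 1, Python from 0, so T[s + j] becomes `T[s + j - 1]` and P[j] becomes `P[j - 1]`.

```python
def is_valid_shift(T: str, P: str, s: int) -> bool:
    """0 <= s <= n - m and T[s + j] = P[j] for 1 <= j <= m (1-based, as in the notes)."""
    n, m = len(T), len(P)
    if not 0 <= s <= n - m:
        return False
    # j runs over 1..m as in the definition; subtract 1 for Python's 0-based strings
    return all(T[s + j - 1] == P[j - 1] for j in range(1, m + 1))


def valid_shifts(T: str, P: str) -> list[int]:
    """The string-matching problem: all valid shifts of P in T."""
    return [s for s in range(len(T) - len(P) + 1) if is_valid_shift(T, P, s)]


print(valid_shifts("acaabc", "aab"))  # [2]
```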

Naïve (Brute-Force) Algorithm:

  • Check for the pattern starting at each text position:

-- Recursive formulation (naiveMatch_rec)

-- Iterative approach (naiveMatch_itr)

Figure: given a text T and a pattern P, find whether P is present in T or not.


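Algorithm naiveMatch_rec (T[ ], N, P[ ], M)
// N, M: indices of the last characters of T and P (0-based); returns 1 if P occurs in T[0..N], else 0
if (N < M) then return 0;                          // fewer text characters left than pattern characters
else if (T[N-M..N] == P[0..M]) then return 1;      // compare the window ending at T[N] character by character
else return (naiveMatch_rec (T, N-1, P, M));       // shift the window one position to the left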
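A Python sketch of the recursive formulation, assuming 0-based indices with N and M pointing at the last characters, so the first call is `naive_match_rec(T, len(T) - 1, P, len(P) - 1)`. It compares the whole window ending at N before shifting, so it tests for a contiguous occurrence; a version that keeps a partially advanced pattern index after a mismatch would instead test whether P is a scattered subsequence of T.

```python
def naive_match_rec(T: str, N: int, P: str, M: int) -> int:
    """Return 1 if P[0..M] occurs somewhere in T[0..N], else 0."""
    if N < M:                               # remaining text is shorter than the pattern
        return 0
    if T[N - M:N + 1] == P[:M + 1]:         # does P end exactly at text position N?
        return 1
    return naive_match_rec(T, N - 1, P, M)  # shift one position to the left
```

Each shift adds one stack frame, so for texts longer than Python's default recursion limit (1000 frames) the iterative version is the practical choice.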

Rabin-Karp Algorithm:

Let $\Sigma = \{0, 1, 2, \dots, 9\}$.

We can view a string of k consecutive characters as representing a length-k decimal number.

Let p denote the decimal value of P[1..m].

Let $t_s$ denote the decimal value of the length-m substring T[s+1..s+m] of T[1..n], for s = 0, 1, …, n - m.

Then $t_s = p$ if and only if T[s+1..s+m] = P[1..m], that is, if and only if s is a valid shift.

Using Horner's rule,

$p = P[m] + 10\bigl(P[m-1] + 10(P[m-2] + \cdots + 10(P[2] + 10\,P[1])\cdots)\bigr)$

so we can compute p in O(m) time. Similarly, we can compute $t_0$ from T[1..m] in O(m) time.
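Horner's rule evaluates the nested expression from the inside out, reading the digits left to right: multiply the running value by 10 and add the next digit. A minimal sketch, assuming decimal-digit strings; `horner` is a hypothetical helper name, and the sample strings are the ones used in the worked example of these notes.

```python
def horner(digits: str) -> int:
    """Decimal value of a digit string via Horner's rule: one multiply-add per digit."""
    value = 0
    for ch in digits:                 # P[1], P[2], ..., P[m] from left to right
        value = 10 * value + int(ch)  # shift one decimal place, then add the next digit
    return value


P, T = "26", "31415"
m = len(P)
p = horner(P)         # value of P[1..m]
t0 = horner(T[:m])    # value of T[1..m]
print(p, t0)          # 26 31
```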

Rabin-Karp Algorithm:

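To move from window s to window s + 1, drop the high-order digit T[s+1] and append the low-order digit T[s+m+1]:

$t_{s+1} = 10\,(t_s - 10^{m-1}\,T[s+1]) + T[s+m+1]$

If $10^{m-1}$ is precomputed, each $t_{s+1}$ is obtained from $t_s$ with a constant number of arithmetic operations.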
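Sliding the length-m window one digit to the right means removing the leading digit's contribution, shifting the rest up one decimal place, and appending the next digit. A minimal sketch without any modulus, assuming a decimal-digit string and m ≥ 1 (`window_values` is a hypothetical name); 0-based `T[s]` is the notes' T[s+1] and `T[s + m]` is T[s+m+1].

```python
def window_values(T: str, m: int) -> list[int]:
    """t_s = decimal value of T[s+1..s+m] for s = 0, ..., n - m, with no modulus."""
    n = len(T)
    if m < 1 or m > n:
        return []
    high = 10 ** (m - 1)              # place value of the leading digit
    t = int(T[:m])                    # t_0, computed directly
    values = [t]
    for s in range(n - m):
        # drop leading digit T[s+1], shift left one place, append digit T[s+m+1]
        t = 10 * (t - high * int(T[s])) + int(T[s + m])
        values.append(t)
    return values


print(window_values("31415", 2))  # [31, 14, 41, 15]
```

Without a modulus these values grow to m digits, which is why the algorithm switches to arithmetic modulo q.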

Rabin-Karp Algorithm:

p, $t_0$, and the recurrence are computed modulo q, because p and $t_s$ may be too large to work with conveniently.

In general, with a d-ary alphabet {0, 1, …, d - 1}, q is chosen so that $d \cdot q$ fits within a computer word.

The recurrence can then be rewritten as

$t_{s+1} = \bigl(d\,(t_s - T[s+1]\,h) + T[s+m+1]\bigr) \bmod q$,

where $h \equiv d^{m-1} \pmod q$ is the value of the digit "1" in the high-order position of an m-digit text window.
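The modular recurrence can be sketched as follows, assuming a digit string with digit values below d and m ≥ 1; `window_hashes` is a hypothetical name. `pow(d, m - 1, q)` computes h directly modulo q, and $t_0$ is built by Horner's rule with a reduction after every step so no intermediate value exceeds about $d \cdot q$.

```python
def window_hashes(T: str, m: int, d: int, q: int) -> list[int]:
    """t_s mod q for every length-m window of digit string T, via the rolling recurrence."""
    n = len(T)
    if m < 1 or m > n:
        return []
    h = pow(d, m - 1, q)              # h = d^(m-1) mod q
    t = 0
    for i in range(m):                # t_0 by Horner's rule, reduced mod q at each step
        t = (d * t + int(T[i])) % q
    hashes = [t]
    for s in range(n - m):
        # t_{s+1} = (d (t_s - T[s+1] h) + T[s+m+1]) mod q
        t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
        hashes.append(t)
    return hashes


print(window_hashes("31415", 2, 10, 11))  # [9, 3, 8, 4]
```

The subtraction can make the intermediate value negative; Python's `%` with a positive q always returns a result in 0..q-1, whereas in C or Java one would add q before reducing.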

Note that $t_s \equiv p \pmod q$ does not imply that $t_s = p$.

However, if $t_s \not\equiv p \pmod q$, then $t_s \ne p$, and the shift s is invalid.

We therefore use the test $t_s \equiv p \pmod q$ as a fast heuristic to rule out invalid shifts.

Any shift s that passes this test needs further testing to eliminate a spurious hit:

  • an explicit test to check whether P[1..m] = T[s+1..s+m].
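The two-stage test can be sketched as a classifier for a single shift. The name `classify_shift` and its string labels are hypothetical; for clarity it computes $t_s$ directly with `int(window) % q` rather than by the rolling recurrence, and it assumes a nonempty digit pattern and 0 ≤ s ≤ n - m (0-based shift s covers the notes' T[s+1..s+m]).

```python
def classify_shift(T: str, P: str, s: int, q: int) -> str:
    """Classify shift s of digit pattern P in digit text T using a modulus q."""
    m = len(P)
    window = T[s:s + m]
    if int(window) % q != int(P) % q:
        return "invalid"        # t_s is not congruent to p, so s cannot be valid
    if window == P:
        return "valid"          # modular test passed and the explicit check confirms it
    return "spurious hit"       # t_s ≡ p (mod q), but the strings differ


print(classify_shift("31415", "26", 3, 11))  # spurious hit
```

Here the window "15" and the pattern "26" are both congruent to 4 modulo 11, so only the explicit comparison reveals that the shift is invalid.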

Rabin-Karp Algorithm:

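$t_{s+1} = \bigl(d\,(t_s - T[s+1]\,h) + T[s+m+1]\bigr) \bmod q$, where $h = d^{m-1} \bmod q$.

Example: T = 31415, P = 26, n = 5, m = 2, q = 11 (so d = 10 and $h = 10 \bmod 11 = 10$).

$p = 26 \bmod 11 = 4$

$t_0 = 31 \bmod 11 = 9$

$t_1 = \bigl(10\,(9 - (3 \cdot 10 \bmod 11)) + 4\bigr) \bmod 11 = (10\,(9 - 8) + 4) \bmod 11 = 14 \bmod 11 = 3$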
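The same steps can be continued past $t_1$ to every shift of this example. A short script with the example's values (d = 10; variable names are illustrative) reproduces p, $t_0$, $t_1$ and extends the table:

```python
T, P, d, q = "31415", "26", 10, 11
n, m = len(T), len(P)
h = pow(d, m - 1, q)   # h = 10^(m-1) mod 11 = 10
p = int(P) % q         # p = 26 mod 11 = 4
t = int(T[:m]) % q     # t_0 = 31 mod 11 = 9
ts = []
for s in range(n - m + 1):
    ts.append(t)
    if s < n - m:      # roll the window: t_{s+1} from t_s
        t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q

print(ts)    # [9, 3, 8, 4]
hits = [s for s, t_s in enumerate(ts) if t_s == p]
print(hits)  # [3]
```

$t_3 = 4$ equals p, but the window at shift 3 is 15, not 26, so it is a spurious hit and P does not occur in T.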

Rabin-Karp Algorithm:

Figure: The same text string with values computed modulo 13 for each possible position of a length-5 window. Assuming the pattern P = 31415, we look for windows whose value modulo 13 is 7, since $31415 \equiv 7 \pmod{13}$. Two such windows are found, shown shaded in the figure. The first, beginning at text position 7, is indeed an occurrence of the pattern, while the second, beginning at text position 13, is a spurious hit.

$t_{s+1} = \bigl(d\,(t_s - T[s+1]\,h) + T[s+m+1]\bigr) \bmod q$
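The caption's numbers can be checked mechanically. The notes do not reproduce the figure's text string; the digits below are an assumption (the string from the standard textbook version of this figure). Window values are computed directly with `int(...) % q` instead of by rolling, so the check does not depend on the recurrence, and positions are reported 1-based to match the caption.

```python
# Assumption: the figure's text string, which these notes do not reproduce.
T = "2359023141526739921"
P, q = "31415", 13
m = len(P)
p = int(P) % q                                  # 31415 mod 13 = 7
values = [int(T[i:i + m]) % q for i in range(len(T) - m + 1)]
hits = [i + 1 for i, v in enumerate(values) if v == p]           # 1-based start positions
matches = [pos for pos in hits if T[pos - 1:pos - 1 + m] == P]   # explicit check
print(hits, matches)  # [7, 13] [7]
```

The window at position 13 is 67399, which is also congruent to 7 modulo 13; the explicit comparison rejects it.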

Rabin-Karp Algorithm:

Procedure RABIN-KARP-MATCHER(T, P, d, q)

Input: text T, pattern P, radix d (typically $d = |\Sigma|$), and a prime q.

Output: every valid shift s at which P occurs in T.

  1. n ← length[T];
  2. m ← length[P];
  3. h ← d^(m-1) mod q;
  4. p ← 0;
  5. t_0 ← 0;
  6. for i ← 1 to m
  7.     do p ← (d·p + P[i]) mod q;
  8.        t_0 ← (d·t_0 + T[i]) mod q;
  9. for s ← 0 to n - m
  10.    do if p = t_s
  11.          then if P[1..m] = T[s+1..s+m]
  12.                  then print "Pattern occurs with shift" s;
  13.       if s < n - m
  14.          then t_{s+1} ← (d(t_s - T[s+1]·h) + T[s+m+1]) mod q;

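The loop of lines 6-8 (preprocessing) takes $\Theta(m)$ time. In the worst case, the loop of line 9 takes $\Theta((n-m+1)m)$ time, because every shift may pass the modular test and need the explicit check of line 11 (for example, when P and T consist of one repeated digit). The overall worst-case running time is therefore $O((n-m+1)m)$.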
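RABIN-KARP-MATCHER translates almost line for line into Python. This is a sketch, not a definitive implementation: `rabin_karp_matcher` is a hypothetical name, shifts are 0-based, T and P are assumed to be digit strings whose digit values are below d, and q is assumed to be prime. The comments give the corresponding pseudocode line numbers.

```python
def rabin_karp_matcher(T: str, P: str, d: int, q: int) -> list[int]:
    """Return all valid shifts s (0-based) of digit pattern P in digit text T."""
    n, m = len(T), len(P)
    if m == 0:
        return list(range(n + 1))         # the empty pattern occurs at every shift
    if m > n:
        return []
    h = pow(d, m - 1, q)                  # line 3: h = d^(m-1) mod q
    p = t = 0                             # lines 4-5
    for i in range(m):                    # lines 6-8: Horner's rule, Theta(m)
        p = (d * p + int(P[i])) % q
        t = (d * t + int(T[i])) % q
    shifts = []
    for s in range(n - m + 1):            # line 9: every shift 0..n-m
        # lines 10-11: cheap modular test first, explicit comparison only on a hit
        if p == t and T[s:s + m] == P:
            shifts.append(s)              # line 12: P occurs with shift s
        if s < n - m:                     # lines 13-14: roll t_s to t_{s+1}
            t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
    return shifts


print(rabin_karp_matcher("1212121", "121", 10, 7))  # [0, 2, 4]
```

In this run, shifts 1 and 3 (window 212) are spurious hits, since 212 ≡ 121 ≡ 2 (mod 7); the explicit comparison discards them. For general text, one possible adaptation is to replace `int(ch)` with `ord(ch)` and set d to the alphabet size (for example 256 for byte-valued characters).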