The longest repeated substring problem asks for the longest substring that occurs at least twice in a given string. It is also a common interview question.
Problem Statement
Given a string S, consider all duplicated substrings: (contiguous) substrings of S that occur more than once. (The occurrences may overlap.) Return any duplicated substring that has the longest possible length. (If S does not have a duplicated substring, the answer is "".)
Example 1:
Input: “banana”, Output: “ana”
Example 2:
Input: “abcd”, Output: “”
Optimized Solution Using Binary Search & Rabin-Karp
The task of finding the longest repeated substring can be divided into the following two subtasks.
Subtask 1: Search for the substring length L in the interval 1 to N
A naïve solution that checks every possible length one by one would be inefficient. If there is a duplicated substring of length k, then there is also a duplicated substring of length k - 1, so the candidate lengths can be binary searched. Binary search reduces the number of lengths to check to O(log N).
Subtask 2: Check whether there is a duplicated substring of length L
An efficient way to check for a duplicated substring of a given length is the Rabin-Karp method. It uses hashing to find an exact match of a pattern string in a text.
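The binary search over the length can be sketched on its own, before any hashing is involved. The class and method names below (LengthBinarySearch, hasDuplicate, longestDuplicateLength) are hypothetical, and the check is a deliberately simple set of substrings rather than Rabin-Karp, so this only illustrates the search over L under those assumptions.

```java
import java.util.HashSet;
import java.util.Set;

class LengthBinarySearch {
    // Naive check (hypothetical helper): does any substring of length len occur at least twice?
    static boolean hasDuplicate(String s, int len) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i + len <= s.length(); i++) {
            // Set.add returns false when the substring was already present.
            if (!seen.add(s.substring(i, i + len))) return true;
        }
        return false;
    }

    // Largest len for which hasDuplicate is true, or 0 if there is none.
    static int longestDuplicateLength(String s) {
        // A duplicated substring must be shorter than the string itself.
        int left = 1, right = s.length() - 1, best = 0;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            if (hasDuplicate(s, mid)) {
                best = mid;       // a duplicate of length mid exists, try longer
                left = mid + 1;
            } else {
                right = mid - 1;  // no duplicate of length mid, so none longer either
            }
        }
        return best;
    }
}
```

Calling LengthBinarySearch.longestDuplicateLength("banana") gives 3, which matches the length of "ana" in Example 1. The search is only valid because the check is monotonic: once a length fails, every longer length fails too.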
The idea of the algorithm is:
- Calculate the hash of the pattern of length L
- Move a sliding window of length L along the string of length N
- Check if the hash of the string in the sliding window is equal to the hash of the pattern
- If yes, check if the two strings are equal
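The four steps above can be sketched as a plain single-pattern Rabin-Karp search. This is a minimal illustration, not the solution to the duplicate-substring problem itself: the class name RabinKarpSearch, the base 256 and the modulus 1,000,000,007 are assumptions chosen for the example.

```java
class RabinKarpSearch {
    static final long BASE = 256;          // assumed base for the polynomial hash
    static final long MOD = 1_000_000_007; // assumed prime modulus

    // Returns the first index where pattern occurs in text, or -1 if it does not occur.
    static int indexOf(String text, String pattern) {
        int n = text.length(), l = pattern.length();
        if (l > n) return -1;
        long patternHash = 0, windowHash = 0, high = 1; // high = BASE^(l-1) mod MOD
        for (int i = 0; i < l; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
            if (i > 0) high = (high * BASE) % MOD;
        }
        for (int i = 0; ; i++) {
            // Equal hashes are only a hint; confirm with a real character comparison.
            if (windowHash == patternHash && text.regionMatches(i, pattern, 0, l)) return i;
            if (i + l >= n) return -1;
            // Slide the window: remove text[i], then append text[i + l].
            windowHash = (windowHash - text.charAt(i) * high % MOD + MOD) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i + l)) % MOD;
        }
    }
}
```

For example, RabinKarpSearch.indexOf("banana", "ana") returns 1. The explicit comparison after a hash match is what keeps the result exact even when two different strings happen to share a hash.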
Improvement in Rabin-Karp for our problem
To solve the longest duplicate substring problem, we need to make the following improvements to Rabin-Karp:
- Search for multiple patterns instead of one by storing the hashes of the previous windows in a set.
- Use a rolling hash instead of recalculating the hash for every window.
- Use a large modulus so that hash collisions are rare. With the rolling hash, each window's hash is updated in constant time, so one check for a given length L takes O(N).
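To make the rolling-hash point concrete, the sketch below compares a hash computed from scratch with one obtained by a constant-time update. It assumes base 26, lowercase input and a modulus of 2^32, mirroring the snippet in this post; the class and method names (RollingHashDemo, directHash, roll, power) are hypothetical.

```java
class RollingHashDemo {
    static final long MOD = 1L << 32; // modulus, assumed to match the snippet in this post
    static final long BASE = 26;      // assumes lowercase 'a'..'z' input

    // Hash of s[start .. start + len) computed from scratch in O(len).
    static long directHash(char[] s, int start, int len) {
        long h = 0;
        for (int i = start; i < start + len; i++) {
            h = (h * BASE + (s[i] - 'a')) % MOD;
        }
        return h;
    }

    // BASE^len mod MOD, the weight of the character that leaves the window.
    static long power(int len) {
        long p = 1;
        for (int i = 0; i < len; i++) p = (p * BASE) % MOD;
        return p;
    }

    // O(1) update: turns the hash of the window starting at i - 1
    // into the hash of the window starting at i. aL must be power(len).
    static long roll(long h, char[] s, int i, int len, long aL) {
        h = (h * BASE - (s[i - 1] - 'a') * aL % MOD + MOD) % MOD;
        return (h + (s[i + len - 1] - 'a')) % MOD;
    }
}
```

Starting from directHash(s, 0, len) and calling roll for i = 1, 2, ... yields the same values as calling directHash for each window, but each step costs constant time instead of O(len).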
Java Code Snippet
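import java.util.HashSet;
import java.util.Set;

class Solution {
    // Modulus for the rolling hash. Hash values stay below 2^32, so h * 26 cannot overflow a long.
    private static final long MOD = 1L << 32;

    public String longestDupSubstring(String S) {
        int n = S.length();
        char[] nums = S.toCharArray();
        // Binary search on the length: a duplicate of length k implies one of length k - 1.
        int left = 1, right = n;
        int bestStart = -1, bestLen = 0;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            int start = search(mid, n, nums);
            if (start != -1) {
                // Remember the match so no second search is needed after the loop.
                bestStart = start;
                bestLen = mid;
                left = mid + 1;
            } else {
                right = mid - 1;
            }
        }
        return bestStart == -1 ? "" : S.substring(bestStart, bestStart + bestLen);
    }

    // Returns the start index of a repeated window of length l, or -1 if none is found.
    // Assumes lowercase letters 'a'..'z'. Windows are compared by hash only,
    // so a hash collision could, in rare cases, report a false match.
    int search(int l, int n, char[] nums) {
        long h = 0;
        for (int i = 0; i < l; i++) {
            h = (h * 26 + (nums[i] - 'a')) % MOD;
        }
        Set<Long> seen = new HashSet<>();
        seen.add(h);
        // aL = 26^l mod MOD, the weight of the character that leaves the window.
        long aL = 1;
        for (int i = 1; i <= l; ++i) aL = (aL * 26) % MOD;
        for (int i = 1; i < n - l + 1; i++) {
            // Shift the window: drop nums[i - 1], append nums[i + l - 1].
            h = (h * 26 - (nums[i - 1] - 'a') * aL % MOD + MOD) % MOD;
            h = (h + (nums[i + l - 1] - 'a')) % MOD;
            if (seen.contains(h)) return i;
            seen.add(h);
        }
        return -1;
    }
}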
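The search step in the snippet trusts the hash alone, whereas the Rabin-Karp outline earlier also compares the strings after a hash match. A variant that keeps that comparison might look like the sketch below. It is an illustration under assumptions rather than part of the original solution: the class VerifiedSearch and the helper sameWindow are hypothetical names, and the base 26 and modulus 2^32 simply mirror the snippet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VerifiedSearch {
    static final long MOD = 1L << 32; // assumed to match the snippet's modulus

    // Returns the start index of a window of length l that repeats an earlier
    // window, or -1. Every hash hit is confirmed character by character.
    static int search(int l, char[] s) {
        int n = s.length;
        if (l <= 0 || l >= n) return -1;
        long h = 0, aL = 1; // aL becomes 26^l mod MOD
        for (int i = 0; i < l; i++) {
            h = (h * 26 + (s[i] - 'a')) % MOD;
            aL = (aL * 26) % MOD;
        }
        // Map each hash to the start indices of the windows that produced it.
        Map<Long, List<Integer>> seen = new HashMap<>();
        seen.computeIfAbsent(h, k -> new ArrayList<>()).add(0);
        for (int i = 1; i + l <= n; i++) {
            h = (h * 26 - (s[i - 1] - 'a') * aL % MOD + MOD) % MOD;
            h = (h + (s[i + l - 1] - 'a')) % MOD;
            List<Integer> starts = seen.computeIfAbsent(h, k -> new ArrayList<>());
            for (int j : starts) {
                if (sameWindow(s, j, i, l)) return i; // genuine duplicate
            }
            starts.add(i); // new window, or a collision with different content
        }
        return -1;
    }

    // True if s[a .. a + l) and s[b .. b + l) contain the same characters.
    static boolean sameWindow(char[] s, int a, int b, int l) {
        for (int k = 0; k < l; k++) {
            if (s[a + k] != s[b + k]) return false;
        }
        return true;
    }
}
```

The extra comparison costs O(L) per hash hit, so this trades some speed for a result that cannot be a false positive.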
Performance
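This algorithm is much faster than the brute-force approach of comparing all pairs of substrings. The binary search tries O(log n) lengths, and each check scans the string once in O(n) expected time, so the time complexity is O(n log n). The set of hashes takes O(n) space.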
C P Gupta is a YouTuber and blogger. He is an expert in Microsoft Word, Excel and PowerPoint. His YouTube channel @pickupbrain is very popular and has crossed 9.9 million views.