Substring with Concatenation of All Words
Substring with Concatenation of All Words:
You are given a string s
and an array of strings words
. All the strings of words
are of the same length.
A concatenated substring in s is a substring that contains all the strings of any permutation of words
concatenated.
For example, if words = ["ab","cd","ef"]
, then "abcdef"
, "abefcd"
, "cdabef"
, "cdefab"
, "efabcd"
, and "efcdab"
are all concatenated strings. "acdbef"
is not a concatenated substring because it is not the concatenation of any permutation of words
.
Return the starting indices of all the concatenated substrings in s
. You can return the answer in any order.
Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Since words.length == 2 and words[i].length == 3,
the concatenated substring has to be of length 6.
The substring starting at 0 is "barfoo". It is the concatenation of
["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of
["foo","bar"] which is a permutation of words.
The output order does not matter. Returning [9,0] is fine too.
Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: Since words.length == 4 and words[i].length == 4,
the concatenated substring has to be of length 16.
There is no substring of length 16 is s that
is equal to the concatenation of any permutation of words.
We return an empty array.
Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation: Since words.length == 3 and words[i].length == 3,
the concatenated substring has to be of length 9.
The substring starting at 6 is "foobarthe".
It is the concatenation of ["foo","bar","the"] which is a permutation of words.
The substring starting at 9 is "barthefoo".
It is the concatenation of ["bar","the","foo"] which is a permutation of words.
The substring starting at 12 is "thefoobar".
It is the concatenation of ["the","foo","bar"] which is a permutation of words.
Constraints:
1 <= s.length <= 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 30
s
andwords[i]
consist of lowercase English letters.
Try this Problem on your own or check similar problems:
Solution:
- Java
- JavaScript
- Python
- C++
class Solution {
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
Map<String, Integer> wordFreq = new HashMap<>();
for(String word: words){
wordFreq.put(word, wordFreq.getOrDefault(word, 0) + 1);
}
int wordLength = words[0].length(), windowLength = wordLength * words.length;
for(int i = 0; i < wordLength; ++i){
checkWindow(i, s, wordFreq, windowLength, wordLength, words.length, result);
}
return result;
}
private void checkWindow(int start, String s, Map<String, Integer> freqMap, int windowLength, int wordLength, int numberOfWords, List<Integer> result){
Map<String, Integer> found = new HashMap<>();
int used = 0;
boolean tooManyOfAWord = false;
for (int end = start; end <= s.length() - wordLength; end += wordLength) {
String subString = s.substring(end, end + wordLength);
if (!freqMap.containsKey(subString)) {
found.clear();
used = 0;
tooManyOfAWord = false;
start = end + wordLength;
} else {
while (end - start == windowLength || tooManyOfAWord) {
String wordClosestToStart = s.substring(start, start + wordLength);
start += wordLength;
found.put(wordClosestToStart, found.get(wordClosestToStart) - 1);
if (found.get(wordClosestToStart) >= freqMap.get(wordClosestToStart)) {
tooManyOfAWord = false;
} else {
--used;
}
}
found.put(subString, found.getOrDefault(subString, 0) + 1);
if (found.get(subString) <= freqMap.get(subString)) {
++used;
} else {
tooManyOfAWord = true;
}
if (used == numberOfWords && !tooManyOfAWord) {
result.add(start);
}
}
}
}
}
/**
* @param {string} s
* @param {string[]} words
* @return {number[]}
*/
var findSubstring = function (s, words) {
const result = [];
const wordFreq = new Map();
for (const word of words) {
wordFreq.set(word, (wordFreq.get(word) || 0) + 1);
}
const wordLength = words[0].length;
const windowLength = wordLength * words.length;
for (let i = 0; i < wordLength; i++) {
checkWindow(i, s, wordFreq, windowLength, wordLength, words.length, result);
}
return result;
};
function checkWindow(
start,
s,
freqMap,
windowLength,
wordLength,
numberOfWords,
result
) {
const found = new Map();
let used = 0;
let tooManyOfAWord = false;
for (let end = start; end <= s.length - wordLength; end += wordLength) {
const subString = s.substring(end, end + wordLength);
if (!freqMap.has(subString)) {
found.clear();
used = 0;
tooManyOfAWord = false;
start = end + wordLength;
} else {
while (end - start === windowLength || tooManyOfAWord) {
const wordClosestToStart = s.substring(start, start + wordLength);
start += wordLength;
found.set(wordClosestToStart, found.get(wordClosestToStart) - 1);
if (found.get(wordClosestToStart) >= freqMap.get(wordClosestToStart)) {
tooManyOfAWord = false;
} else {
used--;
}
}
found.set(subString, (found.get(subString) || 0) + 1);
if (found.get(subString) <= freqMap.get(subString)) {
used++;
} else {
tooManyOfAWord = true;
}
if (used === numberOfWords && !tooManyOfAWord) {
result.push(start);
}
}
}
}
class Solution:
def findSubstring(self, s: str, words: List[str]) -> List[int]:
result = []
wordFreq = {}
for word in words:
wordFreq[word] = wordFreq.get(word, 0) + 1
wordLength = len(words[0])
windowLength = wordLength * len(words)
for i in range(wordLength):
self.checkWindow(i, s, wordFreq, windowLength, wordLength, len(words), result)
return result
def checkWindow(self, start, s, freqMap, windowLength, wordLength, numberOfWords, result):
found = {}
used = 0
tooManyOfAWord = False
for end in range(start, len(s) - wordLength + 1, wordLength):
subString = s[end : end + wordLength]
if subString not in freqMap:
found = {}
used = 0
tooManyOfAWord = False
start = end + wordLength
else:
while end - start == windowLength or tooManyOfAWord:
wordClosestToStart = s[start : start + wordLength]
start += wordLength
found[wordClosestToStart] = found.get(wordClosestToStart, 0) - 1
if found[wordClosestToStart] >= freqMap[wordClosestToStart]:
tooManyOfAWord = False
else:
used -= 1
found[subString] = found.get(subString, 0) + 1
if found[subString] <= freqMap[subString]:
used += 1
else:
tooManyOfAWord = True
if used == numberOfWords and not tooManyOfAWord:
result.append(start)
class Solution {
public:
vector<int> findSubstring(string s, vector<string>& words) {
int n = s.length(), k = words.size(), wordLength = words[0].length(), totalLength = wordLength * k;
unordered_map<string, int> wordCount;
for(string word: words)
++wordCount[word];
vector<int> result;
for(int i = 0; i < wordLength; ++i)
checkWindow(i, s, n, wordCount, k , wordLength, totalLength, result);
return result;
}
private:
void checkWindow(int l, string &s, int n, unordered_map<string, int> &wordCount, int k, int wordLength, int totalLength, vector<int> &result) {
unordered_map<string, int> wordsFound;
int wordsUsed = 0;
bool excessWord = false;
for(int r = l; r <= n - wordLength; r += wordLength){
string sub = s.substr(r, wordLength);
if(wordCount.find(sub) == wordCount.end()){
wordsFound.clear();
wordsUsed = 0;
excessWord = false;
l = r + wordLength;
}
else{
while(r - l == totalLength || excessWord) {
string leftmostWord = s.substr(l, wordLength);
l += wordLength;
--wordsFound[leftmostWord];
if(wordsFound[leftmostWord] >= wordCount[leftmostWord])
excessWord = false;
else
--wordsUsed;
}
++wordsFound[sub];
if(wordsFound[sub] <= wordCount[sub])
++wordsUsed;
else
excessWord = true;
if(wordsUsed == k && !excessWord){
result.push_back(l);
}
}
}
}
};
Time/Space Complexity:
- Time Complexity: O(n*w+x)
- Space Complexity: O(x+w)
Explanation:
So, the straightforward approach would be to utilize the information that substring will be of length equal to sum of all equal length words and create a window that's going to be shifted from the start to end of the input string each time checking if we have a valid concatenation.
- Java
- JavaScript
- Python
- C++
class Solution {
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
Map<String, Integer> wordFreq = new HashMap<>();
for(String word: words){
wordFreq.put(word, wordFreq.getOrDefault(word, 0) + 1);
}
int windowLength = words[0].length() * words.length;
for(int i = 0; i <= s.length() - windowLength; ++i){
Map<String, Integer> map = new HashMap<>(wordFreq);
if(isConcatenatedSubstring(i, s, map, windowLength, words[0].length())){
result.add(i);
}
}
return result;
}
private boolean isConcatenatedSubstring(int index, String s, Map<String, Integer> map, int windowLength, int wordLength){
for(int i = index; i < index + windowLength; i += wordLength){
String subString = s.substring(i, i + wordLength);
if(!map.containsKey(subString)) return false;
if(map.get(subString) == 1) map.remove(subString);
else map.put(subString, map.get(subString) - 1);
}
return map.isEmpty();
}
}
/**
* @param {string} s
* @param {string[]} words
* @return {number[]}
*/
var findSubstring = function (s, words) {
const result = [];
const wordFreq = new Map();
for (const word of words) {
wordFreq.set(word, (wordFreq.get(word) || 0) + 1);
}
const windowLength = words[0].length * words.length;
for (let i = 0; i <= s.length - windowLength; ++i) {
const map = new Map(wordFreq);
if (isConcatenatedSubstring(i, s, map, windowLength, words[0].length)) {
result.push(i);
}
}
return result;
};
function isConcatenatedSubstring(index, s, map, windowLength, wordLength) {
for (let i = index; i < index + windowLength; i += wordLength) {
const subString = s.substring(i, i + wordLength);
if (!map.has(subString)) return false;
if (map.get(subString) === 1) map.delete(subString);
else map.set(subString, map.get(subString) - 1);
}
return map.size === 0;
}
class Solution:
def findSubstring(self, s: str, words: List[str]) -> List[int]:
result = []
wordFreq = {}
for word in words:
wordFreq[word] = wordFreq.get(word, 0) + 1
windowLength = len(words[0]) * len(words)
for i in range(len(s) - windowLength + 1):
map = wordFreq.copy()
if self.isConcatenatedSubstring(i, s, map, windowLength, len(words[0])):
result.append(i)
return result
def isConcatenatedSubstring(self, index, s, map, windowLength, wordLength):
for i in range(index, index + windowLength, wordLength):
subString = s[i:i + wordLength]
if subString not in map:
return False
if map[subString] == 1:
del map[subString]
else:
map[subString] -= 1
return len(map) == 0
class Solution {
public:
vector<int> findSubstring(string s, vector<string>& words) {
vector<int> result;
unordered_map<string, int> wordFreq;
for (const string& word : words) {
wordFreq[word]++;
}
int windowLength = words[0].length() * words.size();
for (int i = 0; i <= s.length() - windowLength; ++i) {
unordered_map<string, int> map = wordFreq;
if (isConcatenatedSubstring(i, s, map, windowLength, words[0].length())) {
result.push_back(i);
}
}
return result;
}
private:
bool isConcatenatedSubstring(int index, const string& s, unordered_map<string, int>& map, int windowLength, int wordLength) {
for (int i = index; i < index + windowLength; i += wordLength) {
string subString = s.substr(i, wordLength);
if (map.find(subString) == map.end()) {
return false;
}
if (map[subString] == 1) {
map.erase(subString);
} else {
map[subString]--;
}
}
return map.empty();
}
};
The approach being used here is as follows:
- First, a frequency map
wordFreq
of the words in words is created. - Then, the length of the concatenated string is calculated as
windowLength
which is the length of each word in words multiplied by the number of words in words. - A sliding window of size
windowLength
is then moved over the string s from left to right and for each starting indexi
, the following steps are performed:
- A copy of wordFreq map is created, so we maintain count for the current window
- The function
isConcatenatedSubstring
is called with the current starting indexi
, the strings
, the copy ofwordFreq map
,windowLength
and thelength of each word
in words. If the functionisConcatenatedSubstring
returns true, it means that the substring of s starting at indexi
and of lengthwindowLength
is a valid concatenation of all the words in words. The starting indexi
is added to the list result. After the loop ends, the list result containing all the starting indices of the valid substrings is returned.
The function isConcatenatedSubstring
checks if the substring of s starting at index
and of length windowLength
is a valid concatenation of all the words in words. It does this by iterating over the substring and for each wordLength sized substring
, it checks if the substring is present in the map, decrementing it frequency each time we encounter it. We finally check if the map is empty (all words have been visited). The time complexity is the time required to iterate over O(n - w * x)
where n
is the length of the input string, w
is the length of single word and x
is the number of words, while we doing a isConcatenatedSubstring
check for each iteration which is O(w*x)
leading to O((n-w*x)*w*x)
time complexity. The space complexity is O(x)
number of words in the worst case and O(w)
since we're creating a substring so in total O(w+x)
.
In the approach given in the Solution section, a sliding window is used to try to find all the valid substrings in a single pass of the string s
, instead of checking for one valid substring at a time as in the previous approach.
The start
and end
bounds of the sliding window are initialized at 0
and a variable end
is used to iterate over the string s by wordLength
at a time, where wordLength
is the length of each word in words. For each iteration, a substring subString
is extracted from s
using the current value of end
as the starting index and wordLength
as the length.
If subString
is not present in words, the entire window is reset and the process starts again from the next iteration. If subString
is present in words, it is added to a hash table that is used to keep track of the words in the current window.
When the size of the current window reaches the maximum size, which is the length of the concatenation of all the words in words, it is checked if it is a valid substring. This is done by checking if the number of distinct words in the current window is equal to the number of words in words using a variable used
.
If the current window is a valid substring, the start
bound of the window is added to the list of results. Whether the current window is a valid substring or not, if its size has reached the maximum size, the start
bound is moved to the right by wordLength
. This is done by removing the word closest to the start
bound from the hash table and updating used
accordingly.
If at any point, a word is found in the current window that has a higher frequency in the current window than in words, it is marked as a tooManyOfAWord
and the start
bound is moved to the right until there valid number of that word occurrences. If the tooManyOfAWord
is true
in the current window, a valid substring cannot be formed.
The above process is repeated for each starting index of the sliding window, which is incremented by wordLength
each time. The starting indices for the sliding window are chosen such that the concatenation of all the words in words will not overlap with a previously checked substring.
We have the same space complexity as in previous approach, but for time complexity we have O(x + n * w)
since we will call the checkWindow
function w
times, and the complexity of it is n
since it's called n/w * w = n
times (w
since we're creating a substring of w
length and we have loop with at most n/w
iterations).