Skip to main content

Substring with Concatenation of All Words


Substring with Concatenation of All Words: You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated substring in s is a substring that contains all the strings of any permutation of words concatenated.

For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated substring because it is not the concatenation of any permutation of words. Return the starting indices of all the concatenated substrings in s. You can return the answer in any order.

Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Since words.length == 2 and words[i].length == 3,
the concatenated substring has to be of length 6.
The substring starting at 0 is "barfoo". It is the concatenation of
["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of
["foo","bar"] which is a permutation of words.
The output order does not matter. Returning [9,0] is fine too.

Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: Since words.length == 4 and words[i].length == 4,
the concatenated substring has to be of length 16.
There is no substring of length 16 is s that
is equal to the concatenation of any permutation of words.
We return an empty array.

Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation: Since words.length == 3 and words[i].length == 3,
the concatenated substring has to be of length 9.
The substring starting at 6 is "foobarthe".
It is the concatenation of ["foo","bar","the"] which is a permutation of words.
The substring starting at 9 is "barthefoo".
It is the concatenation of ["bar","the","foo"] which is a permutation of words.
The substring starting at 12 is "thefoobar".
It is the concatenation of ["the","foo","bar"] which is a permutation of words.

Constraints:
  • 1 <= s.length <= 10^4
  • 1 <= words.length <= 5000
  • 1 <= words[i].length <= 30
  • s and words[i] consist of lowercase English letters.

Try this Problem on your own or check similar problems:

  1. Minimum Window Substring
Solution:
class Solution {
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
Map<String, Integer> wordFreq = new HashMap<>();
for(String word: words){
wordFreq.put(word, wordFreq.getOrDefault(word, 0) + 1);
}

int wordLength = words[0].length(), windowLength = wordLength * words.length;
for(int i = 0; i < wordLength; ++i){
checkWindow(i, s, wordFreq, windowLength, wordLength, words.length, result);
}
return result;
}

private void checkWindow(int start, String s, Map<String, Integer> freqMap, int windowLength, int wordLength, int numberOfWords, List<Integer> result){
Map<String, Integer> found = new HashMap<>();
int used = 0;
boolean tooManyOfAWord = false;

for (int end = start; end <= s.length() - wordLength; end += wordLength) {
String subString = s.substring(end, end + wordLength);
if (!freqMap.containsKey(subString)) {
found.clear();
used = 0;
tooManyOfAWord = false;
start = end + wordLength;
} else {
while (end - start == windowLength || tooManyOfAWord) {
String wordClosestToStart = s.substring(start, start + wordLength);
start += wordLength;
found.put(wordClosestToStart, found.get(wordClosestToStart) - 1);

if (found.get(wordClosestToStart) >= freqMap.get(wordClosestToStart)) {
tooManyOfAWord = false;
} else {
--used;
}
}

found.put(subString, found.getOrDefault(subString, 0) + 1);
if (found.get(subString) <= freqMap.get(subString)) {
++used;
} else {
tooManyOfAWord = true;
}

if (used == numberOfWords && !tooManyOfAWord) {
result.add(start);
}
}
}
}
}

Time/Space Complexity:
  • Time Complexity: O(n*w+x)
  • Space Complexity: O(x+w)

Explanation:

So, the straightforward approach would be to utilize the information that substring will be of length equal to sum of all equal length words and create a window that's going to be shifted from the start to end of the input string each time checking if we have a valid concatenation.

class Solution {
public List<Integer> findSubstring(String s, String[] words) {
List<Integer> result = new ArrayList<>();
Map<String, Integer> wordFreq = new HashMap<>();
for(String word: words){
wordFreq.put(word, wordFreq.getOrDefault(word, 0) + 1);
}
int windowLength = words[0].length() * words.length;
for(int i = 0; i <= s.length() - windowLength; ++i){
Map<String, Integer> map = new HashMap<>(wordFreq);
if(isConcatenatedSubstring(i, s, map, windowLength, words[0].length())){
result.add(i);
}
}
return result;
}

private boolean isConcatenatedSubstring(int index, String s, Map<String, Integer> map, int windowLength, int wordLength){
for(int i = index; i < index + windowLength; i += wordLength){
String subString = s.substring(i, i + wordLength);
if(!map.containsKey(subString)) return false;
if(map.get(subString) == 1) map.remove(subString);
else map.put(subString, map.get(subString) - 1);
}

return map.isEmpty();
}
}

The approach being used here is as follows:

  • First, a frequency map wordFreq of the words in words is created.
  • Then, the length of the concatenated string is calculated as windowLength which is the length of each word in words multiplied by the number of words in words.
  • A sliding window of size windowLength is then moved over the string s from left to right and for each starting index i, the following steps are performed:
  1. A copy of wordFreq map is created, so we maintain count for the current window
  2. The function isConcatenatedSubstring is called with the current starting index i, the string s, the copy of wordFreq map, windowLength and the length of each word in words. If the function isConcatenatedSubstring returns true, it means that the substring of s starting at index i and of length windowLength is a valid concatenation of all the words in words. The starting index i is added to the list result. After the loop ends, the list result containing all the starting indices of the valid substrings is returned.

The function isConcatenatedSubstring checks if the substring of s starting at index and of length windowLength is a valid concatenation of all the words in words. It does this by iterating over the substring and for each wordLength sized substring, it checks if the substring is present in the map, decrementing it frequency each time we encounter it. We finally check if the map is empty (all words have been visited). The time complexity is the time required to iterate over O(n - w * x) where n is the length of the input string, w is the length of single word and x is the number of words, while we doing a isConcatenatedSubstring check for each iteration which is O(w*x) leading to O((n-w*x)*w*x) time complexity. The space complexity is O(x) number of words in the worst case and O(w) since we're creating a substring so in total O(w+x).

In the approach given in the Solution section, a sliding window is used to try to find all the valid substrings in a single pass of the string s, instead of checking for one valid substring at a time as in the previous approach.

The start and end bounds of the sliding window are initialized at 0 and a variable end is used to iterate over the string s by wordLength at a time, where wordLength is the length of each word in words. For each iteration, a substring subString is extracted from s using the current value of end as the starting index and wordLength as the length.

If subString is not present in words, the entire window is reset and the process starts again from the next iteration. If subString is present in words, it is added to a hash table that is used to keep track of the words in the current window.

When the size of the current window reaches the maximum size, which is the length of the concatenation of all the words in words, it is checked if it is a valid substring. This is done by checking if the number of distinct words in the current window is equal to the number of words in words using a variable used.

If the current window is a valid substring, the start bound of the window is added to the list of results. Whether the current window is a valid substring or not, if its size has reached the maximum size, the start bound is moved to the right by wordLength. This is done by removing the word closest to the start bound from the hash table and updating used accordingly.

If at any point, a word is found in the current window that has a higher frequency in the current window than in words, it is marked as a tooManyOfAWord and the start bound is moved to the right until there valid number of that word occurrences. If the tooManyOfAWord is true in the current window, a valid substring cannot be formed.

The above process is repeated for each starting index of the sliding window, which is incremented by wordLength each time. The starting indices for the sliding window are chosen such that the concatenation of all the words in words will not overlap with a previously checked substring.

We have the same space complexity as in previous approach, but for time complexity we have O(x + n * w) since we will call the checkWindow function w times, and the complexity of it is n since it's called n/w * w = n times (w since we're creating a substring of w length and we have loop with at most n/w iterations).