Hash table and set

2 minute read

The Problem

Leetcode 720. Longest Word in Dictionary: Given a list of strings words representing an English Dictionary, find the longest word in words that can be built one character at a time by other words in words. If there is more than one possible answer, return the longest word with the smallest lexicographical order.

If there is no answer, return the empty string.

Example 1:
words = ["w","wo","wor","worl", "world"]
Output: "world"
The word "world" can be built one character at a time by "w", "wo", "wor", and "worl".

Example 2:
words = ["a", "banana", "app", "appl", "ap", "apply", "apple"]
Output: "apple"
Both "apply" and "apple" can be built from other words in the dictionary. However, "apple" is lexicographically smaller than "apply".

Note: All the strings in the input will only contain lowercase letters. The length of words will be in the range [1, 1000]. The length of words[i] will be in the range [1, 30].

The Solution

There are 2 observations:

  1. Suppose we find a sequence of words, i.e. “w” -> “wo” -> “wor” -> “worl” -> “world”, then every words in the sequence are words that can be built one character at a time by other words.
  2. Since we need to find the longest word, in the worst case, we need to check every words in the list.

From observation 2, we can iterate the list then check whether each word can be built one character at a time by other words

Observation 1 tells us that for word “world”, we need to check that at least one of the substrings “wrld”, “worl”, “orld”, “wold”, “word” also can be build one character at a time by other words.

Now supposed we know the substring “worl” is a valid word, then when we check the work “world”, we need a quick way to tell that “worl” is valid word, one of the way is to use hash table or set.

def longestWord(words):
    res = ""
    my_set = set()
    for word in sorted(words):
        for i in range(len(word)):
            if word[:i] + word[i+1:] in my_set:
                if len(word) > len(res):
                    res = word
    return res

The Analysis

If the length of the array is n and the length of the longest word is m, then sorting time will be O(nlg(n)), and the complexity of looping will be O(n*m), hence the overall complexity will be O(nlg(n) + n*m).

Hash table or set is usually good when we need to keep track and retrieve the item quickly since their theoretical lookup time is O(1) (although it is not always true when it comes to real-world implementation). Hash table is only different from set if you need to look up value from key. Set can also be implemented with hash table by mapping from key to boolean value true/false.

Similar problems:


Leave a comment