In programming, we often encounter tasks related to string processing, and one such task is finding anagrams. This can be useful in creating games, text analyzers, and in algorithmic competitions and interviews.
In this article, we will explain what an anagram is, review an algorithm for finding anagrams, and implement the solution in Python with a detailed explanation of the code.
An anagram is a word formed by rearranging all the letters of another word.
For example: listen => silent; save => vase
This principle is often used in puzzles and logic tests. Finding anagrams is also a popular live-coding task in Python developer interviews.
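As a quick illustration of the definition, two words are anagrams of each other exactly when their sorted letters match. The helper name is_anagram below is hypothetical and is not part of the solution discussed in this article; this minimal sketch also assumes the comparison is case-sensitive.

```python
def is_anagram(first: str, second: str) -> bool:
    """Return True if the two words consist of exactly the same letters."""
    # sorted() turns a string into an ordered list of its characters,
    # so anagrams produce identical lists.
    return sorted(first) == sorted(second)


print(is_anagram("listen", "silent"))  # True
print(is_anagram("save", "vase"))      # True
print(is_anagram("save", "seven"))     # False
```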
The task usually goes as follows:
Write a method to find anagrams. The input of the method should be a list of words, and the output should be a list of words that are anagrams.
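from collections import defaultdict


def find_anagrams(*, original_word: list[str]) -> list[str]:
    # Maps a sorted-letter key to every input word that has those letters.
    anagram_groups = defaultdict(list)
    for word in original_word:
        # All anagrams of a word share the same sorted-letter key.
        sorted_word = "".join(sorted(word))
        anagram_groups[sorted_word].append(word)
    # Keep the first word of each group that contains more than one word.
    result = [group[0] for group in anagram_groups.values() if len(group) > 1]
    return result
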
>>> words = ["aba", "bac", "abb", "bab", "bba", "aab", "abca"]
>>> find_anagrams(original_word=words)
['aba', 'abb']
Let's break down the algorithm for finding anagrams in Python step by step:
Import defaultdict from the collections module. It creates a dictionary that supplies a default value for keys that do not exist yet.
Define the function find_anagrams, which takes one keyword-only argument, original_word, a list of strings (words), and returns a list of strings.
Use defaultdict to create the dictionary anagram_groups, where each key is the sorted letters of a word and each value is the list of words that are anagrams of each other.
Loop through every word in original_word. On each iteration, the letters of the word are sorted (so that all anagrams share the same key), and the resulting string is used as the key under which the original word is appended in anagram_groups.
Build the result list, which contains the first word from each anagram group that has more than one word.
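To make the grouping step more concrete, the sketch below prints the intermediate dictionary for the sample input. The function name group_by_sorted_letters is hypothetical and was introduced only for this illustration; it repeats the loop from find_anagrams but returns the groups instead of filtering them.

```python
from collections import defaultdict


def group_by_sorted_letters(words: list[str]) -> dict[str, list[str]]:
    """Map each sorted-letter key to the words that share it."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    # Convert to a plain dict so the result prints without the defaultdict wrapper.
    return dict(groups)


words = ["aba", "bac", "abb", "bab", "bba", "aab", "abca"]
groups = group_by_sorted_letters(words)
for key, group in groups.items():
    print(key, group)
# aab ['aba', 'aab']
# abc ['bac']
# abb ['abb', 'bab', 'bba']
# aabc ['abca']
```

Only the groups under the keys aab and abb contain more than one word, which is why find_anagrams returns their first words, aba and abb.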
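This algorithm shows how to group words by their letter composition in a single pass over the input. Using defaultdict simplifies the dictionary handling, and sorting the letters of each word gives every anagram the same key, so matches are found with one dictionary lookup per word. The main cost is sorting each word, which takes O(k log k) time for a word of length k.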
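One possible variation, which is not the method used in this article, is to build the key from letter counts instead of sorted letters, which avoids sorting each word. The sketch below assumes a hypothetical function name, find_anagrams_by_counts, and makes no claim about being faster in practice for short words.

```python
from collections import Counter, defaultdict


def find_anagrams_by_counts(words: list[str]) -> list[str]:
    """Group words by letter counts and return the first word of each group."""
    groups = defaultdict(list)
    for word in words:
        # Anagrams have identical letter counts. A frozenset of
        # (letter, count) pairs is hashable, so it can serve as a dict key.
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    return [group[0] for group in groups.values() if len(group) > 1]


print(find_anagrams_by_counts(["aba", "bac", "abb", "bab", "bba", "aab", "abca"]))
# ['aba', 'abb']
```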
The algorithm can be useful in word processing tasks and comes in handy in job interviews, especially in companies where knowledge of data structures is important. In real projects, similar methods are used in NLP, bioinformatics and automatic text processing.
If you have ideas on how to improve the code or better ways to accomplish this task, share them.
defaultdict automatically creates a new entry (an empty list) for a new key, which simplifies the code. With a regular dictionary, you would have to check whether the key exists and create the list manually.
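For comparison, here is a minimal sketch of the same grouping written with a regular dictionary. The function name find_anagrams_plain_dict is hypothetical; the explicit membership check is the part that defaultdict makes unnecessary.

```python
def find_anagrams_plain_dict(words: list[str]) -> list[str]:
    """Same grouping as find_anagrams, but with a regular dict."""
    groups: dict[str, list[str]] = {}
    for word in words:
        key = "".join(sorted(word))
        # A regular dict raises KeyError for a missing key,
        # so the empty list has to be created by hand first.
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    return [group[0] for group in groups.values() if len(group) > 1]


print(find_anagrams_plain_dict(["aba", "bac", "abb", "bab", "bba", "aab", "abca"]))
# ['aba', 'abb']
```

The check and the assignment could also be folded into a single call to dict.setdefault, but defaultdict keeps the loop body shortest.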