Efficient methods for finding anagrams in Python: algorithm walkthrough and code

String-processing tasks come up often in programming, and finding anagrams is one of them. It is useful when building games and text analyzers, and it appears regularly in algorithmic competitions and interviews.

In this article, we will define what an anagram is, review an algorithm for finding anagrams, and implement the solution in Python with a detailed explanation of the code.

What is an anagram?

An anagram is a word formed by rearranging all the letters of another word.

For example: listen => silent; save => vase
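
To make the definition concrete, here is a minimal sketch of a pairwise check. The helper name is_anagram is hypothetical and used only for illustration. The sketch assumes the comparison is case-sensitive, and it treats two identical words as anagrams of each other.

```python
def is_anagram(first: str, second: str) -> bool:
    """Return True if both words consist of exactly the same letters."""
    # Sorting puts the letters of each word into a canonical order,
    # so two anagrams produce identical sorted sequences.
    return sorted(first) == sorted(second)


print(is_anagram("listen", "silent"))  # True
print(is_anagram("save", "vase"))      # True
print(is_anagram("save", "vast"))      # False
```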

This principle is often used in puzzles and logic tests. Finding anagrams is also a popular live-coding task in Python developer interviews.

Problem statement

The task usually goes as follows:

Write a function that finds anagrams. It takes a list of words as input and returns a list of words that have anagrams in that list.

Example solution

from collections import defaultdict

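def find_anagrams(*, original_word: list[str]) -> list[str]:
    # Maps a sorted-letters key to every input word that has those letters.
    anagram_groups = defaultdict(list)

    for word in original_word:
        # All anagrams of a word share the same sorted-letters key.
        sorted_word = "".join(sorted(word))
        anagram_groups[sorted_word].append(word)

    # Keep the first word of each group that contains more than one word.
    result = [group[0] for group in anagram_groups.values() if len(group) > 1]

    return result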

>>> words = ["aba", "bac", "abb", "bab", "bba", "aab", "abca"]
>>> find_anagrams(original_word=words)
['aba', 'abb']

Code explanation

Let's break down the algorithm for finding anagrams in Python step by step:

Import defaultdict from the collections module. It creates a dictionary that supplies a default value for keys that do not exist yet.
Define the function find_anagrams, which takes a single keyword-only argument, original_word (a list of strings), and returns a list of strings.
Use defaultdict to create the dictionary anagram_groups, where the keys are the sorted letters of the words and the values are lists of words that are anagrams of each other.
Loop through every word in original_word. In each iteration, sort the letters of the word (so that all anagrams produce the same key) and use the sorted string as the key under which the original word is appended to anagram_groups.

Finally, build the result list from the first word of each group that contains more than one word. Each set of anagrams is therefore represented by a single word, and words that have no anagram in the input are dropped.
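
The steps above can be traced on the sample input. The sketch below repeats the grouping loop outside the function and prints the intermediate dictionary, which shows why only two words end up in the result.

```python
from collections import defaultdict

words = ["aba", "bac", "abb", "bab", "bba", "aab", "abca"]

# Group every word under its sorted-letters key.
anagram_groups = defaultdict(list)
for word in words:
    anagram_groups["".join(sorted(word))].append(word)

print(dict(anagram_groups))
# {'aab': ['aba', 'aab'], 'abc': ['bac'], 'abb': ['abb', 'bab', 'bba'], 'aabc': ['abca']}

# Only groups with more than one word contribute their first word.
result = [group[0] for group in anagram_groups.values() if len(group) > 1]
print(result)  # ['aba', 'abb']
```

The groups under 'abc' and 'aabc' contain a single word each, so "bac" and "abca" do not appear in the output.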

This algorithm groups words by their letter composition in a single pass over the input. Using defaultdict simplifies the dictionary handling, and using the sorted letters as the key keeps matching cheap: each word of length k is sorted once, which costs O(k log k), and the dictionary lookup that follows takes constant time on average.
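
Sorting is not the only way to build a group key. As an alternative to the approach in this article, the sketch below uses letter counts instead. The function name find_anagrams_by_count is hypothetical. Building the key is linear in the word length, although for short words the sorted-string key may well be just as fast in practice.

```python
from collections import Counter, defaultdict


def find_anagrams_by_count(words: list[str]) -> list[str]:
    """Return the first word of every anagram group with more than one word."""
    groups = defaultdict(list)
    for word in words:
        # A frozenset of (letter, count) pairs is hashable and does not
        # depend on letter order, so all anagrams share the same key.
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    return [group[0] for group in groups.values() if len(group) > 1]


words = ["aba", "bac", "abb", "bab", "bba", "aab", "abca"]
print(find_anagrams_by_count(words))  # ['aba', 'abb']
```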

The algorithm can be useful in word processing tasks and comes in handy in job interviews, especially in companies where knowledge of data structures is important. In real projects, similar methods are used in NLP, bioinformatics and automatic text processing.

If you have ideas on how to improve the code or better ways to accomplish this task, share them.

Questions

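How does the algorithm for finding anagrams in the given code work?

It sorts the letters of each word and uses the sorted string as a dictionary key, so all anagrams end up in the same list. It then returns the first word of every list that contains more than one word.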

Why does the code use defaultdict and not a regular dictionary?

defaultdict automatically creates a new empty list the first time a missing key is accessed, which simplifies the code. With a regular dictionary, you would have to check whether the key exists and create the list manually.
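
For comparison, here is a sketch of the same grouping written with a regular dictionary. The function name find_anagrams_plain_dict is hypothetical. The explicit membership check is exactly the part that defaultdict makes unnecessary.

```python
def find_anagrams_plain_dict(words: list[str]) -> list[str]:
    """Same grouping as find_anagrams, but with a regular dict."""
    anagram_groups: dict[str, list[str]] = {}
    for word in words:
        key = "".join(sorted(word))
        # A regular dict raises KeyError on a missing key,
        # so the empty list has to be created by hand first.
        if key not in anagram_groups:
            anagram_groups[key] = []
        anagram_groups[key].append(word)
    return [group[0] for group in anagram_groups.values() if len(group) > 1]


words = ["aba", "bac", "abb", "bab", "bba", "aab", "abca"]
print(find_anagrams_plain_dict(words))  # ['aba', 'abb']
```

The check and the assignment can also be folded into a single call, anagram_groups.setdefault(key, []).append(word), which is a common middle ground when importing defaultdict is not desired.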