Finding the most common items in a list in Python

I recently had a simple, but quite interesting task at a job interview. And it has to do with our favorite lists again.

Finding the most common items in a list is a popular task in Python interviews. In this tutorial, we'll look at two ways to solve it.

Task:

We have an array of integers with the following numbers: nums = [1, 1, 1, 2, 2, 3, 3, 3, 0, 5]

This array of numbers may or may not be sorted. You need to write a function that returns the k = 2 most frequent elements in the array num.

Expected Result: [1, 3] or [3, 1] — order is not important.

Explanation: number 1 occurs 3 times, number 2 occurs 2 times, number 3 — 3 times, number 0 — 1 time and number 5 1 time too. Based on the condition, that the number k = 2 is expected to find the 2 most frequently repeated elements of the array.

You can solve this problem in several ways: using built-in Python functions and libraries, or you can solve it differently — the very way you are expected to do in a job interview. Let's look at both.

Solution using the built-in Python libraries

# solution using built-in Python libraries
from collections import Counter


def top_numbers(nums: list[int], k: int) -> list[int]:
    x = Counter(nums)
    y = x.most_common(k)
    return [i[0] for i in y]

>>> nums = [1, 1, 1, 2, 2, 3, 3, 3, 0, 5]
>>> k = 2
>>> top_numbers(nums=nums, k=k)
[1, 3]

Explanation:

This solution uses a built-in library Python — collections, and specifically the class Counter. This class is specifically designed to count the number of items in a Python list

from collections import Counter — we import the Counter class to count the frequency of elements of the sequence

def top_numbers(nums: list[int], k: int) -> list[int]: — define the function top_numbers, which takes as input a list of numbers nums and an integer k, and returns a list of the most frequently occurring k numbers

x = Counter(nums) — create an object Counter. Now x is a dictionary where keys are numbers from nums and values are the number of occurrences of those numbers. For example, for an input list [1, 1, 1, 2, 2, 3, 3, 3, 0, 5], x will look as follows: {1: 3, 2: 2, 3: 3, 0: 1, 5: 1}

y = x.most_common(k) — call the most_common(k) method of object x. This method returns a list of k tuples sorted by descending frequency. Each tuple contains a number and its frequency. For example, if k = 2, for the previous example y would look like this: [(1, 3), (3, 3)]. (Note that 1 and 3 have the same frequency, the order may be different).

return [i[0] for i in y] — utilise list comprehension to extract only numbers from the tuple list y. For each tuple i in y, we take the first element i[0] (that is, the number itself) and add it to the new list. As a result, the function returns a list of k most frequent numbers. In our example, this would be [1, 3] or [3, 1] (the order may vary).

It's a great, readable solution, and most importantly, it's pretty optimal in terms of resource costs for this function.

Solution without Counter

At the interview it is important not only to be able to use the built-in methods of Python (which is a big plus, because, for example, the collections module is often overlooked), but also to demonstrate the ability to reason logically and apply algorithms to solve problems.

Let's break down the option you most often expect to hear in a job interview:

def top_numbers (nums: List[int], k: int) -> List[int]:
    ct = {}
    for i in nums:
        if i in ct:
            ct[i] += 1
         else:
            ct[i] = 1
        
    items = list(ct.items())
    items.sort(key=lambda item: item[1], reverse=True)
    return [i[0] for i in items[:k]]

Explanation:

ct = {} — initialize an empty dictionary ct, which will store the frequency of each number.

for i in nums: — search through all the numbers in the nums list.

if i in ct: ct[i] += 1 else: ct[i] = 1 — if a number i already exists in the ct dictionary, increment its counter by 1. Otherwise, add number i to the dictionary with counter 1. This code block counts the frequency of each number in the list.

items = list(ct.items()) — transform the dictionary ct into a list of tuples items, where each tuple contains a number and its frequency (e.g., [(1, 3), (2, 2), ...]).

items.sort(key=lambda item: item[1], reverse=True) — sort the list of items in descending order of frequency.

lambda item: item[1] — is an anonymous function that returns the frequency (second element) of the tuple, and this function is used as a key for sorting.

reverse=True indicates sorting in descending order.

return [i[0] for i in items[:k]] — extract the first k elements from the sorted list of items (i.e. k most frequent numbers) and create a new list containing only numbers (first elements of tuples).

If you look closely at the implementation, it is practically 1 to 1 like Counter. If we look at the source code of the implementation of this class, we will find similar methods. This means that in general, if you know how a particular built-in function or class works in Python, you can easily implement it and show it at a job interview.

The interviewer may specifically ask you not to use Counter and to implement the frequency counting logic yourself. In this case, the second option is a good answer. This option also demonstrates an understanding of dictionaries, lists, and sorting algorithms. You can discuss different sorting algorithms and their time complexity with your interviewer.

What in terms of performance and complexity

The second option is a bit more logically complex than the first. In the first option Counter «magically» does the frequency counting for us, while in the second option we have to implement this logic manually. This makes the second option more prone to implementation errors if you don't fully understand what you're doing.

Method comparison: how to count items in a Python list optimally?

Decision	Time complexity	Spatial complexity
With Counter	O(N log k)	O(N)
Without Counter	O(N log N)	O(N)

The first option (with Counter): O(N) to calculate frequencies + O(N log k) for most_common(k) (in the worst case, when all elements are unique). Total time complexity: O(N log k)
The second option: O(N) to calculate frequencies + O(N log N)to sort the list of items. Total time complexity: O(N log N)
Spatial Complexity: Both variants have O(N) spatial complexity for storing frequencies.

The second option has the worst time complexity (O(N log N)), than the first option (O(N log k)), especially when k is much smaller than N. This is because the second variant sorts all elements, whereas Counter.most_common(k) uses an algorithm that is more efficient for finding only the k largest elements.

The first option (with Counter) more readable and concise. It expresses the intent of the code more clearly («calculate the frequencies and take the k most frequent»).

That's all for now, have a great interview and subscribe to our channel to stay up to date with all the latest!