Why do the functions hash(-1) and hash(-2) in Python behave the same?

The hash() function in Python allows you to calculate hash values for various objects. For integers, hash is usually the same as the value, but there are exceptions that can surprise even experienced programmers.

Let's see why hash(-1) and hash(-2) return the same value in CPython. Let's look at the peculiarities of hash(), the internal implementation of hashing integers, and the reason for the special handling of -1.

Interview question

Once, during a job interview, I was asked a very obvious question:

Question:

What does hash() function output for the following values: 1, 0, -1, -2?

Confident in my knowledge and not expecting a trick, I answered daringly:

>>> hash(1)
1
>>> hash(0)
0
>>> hash(-1)
-1
>>> hash(-2)
-2

Oh, how wrong I was! My interviewer corrected me with a smile and demonstrated the unexpected result:

>>> hash(1)
1
>>> hash(0)
0
>>> hash(-1)
-2
>>> hash(-2)
-2

# in addition:
>>> hash(-1) == hash(-2)
True
>>> hash(-1) is hash(-2)
True

The world around me seemed to turn upside down. I even thought the interviewer was joking with me. But no, he was deadly serious. How is this possible? Let's see why hash(-1) and hash(-2) return the same value, -2.

Behind the scenes of hash(): Optimization and CPython

The point is that the behavior of hash() function for integers in Python (more precisely, in CPython implementation, which is the most common one) has its own specifics related to optimization. To understand the essence, we need to consider a few points.

1. Interning small integers

In an effort to be efficient, Python stores small integers in the range -5 to 256 inclusive. This means that for each integer in this range, only one object is created in memory. When you assign a value such as -1 to a variable, Python does not create a new object, but reuses the existing interned -1 object.

The is operator checks the identity of objects, that is, whether two variables refer to the same object in memory. Therefore, hash(-1) is hash(-2) returns True, because both calls to hash() return the same interned object -2.

2. Hash function for integers in CPython

CPython's implementation of the hash function for integers is defined so that hash(x) == x for all integers except -1. An exception is made for -1: hash(-1) == -2.

Why is -1 an exception? To avoid collisions!

The main reason for the -1 exception is to prevent potential collisions of hashes with objects of other types. Hash values play a key role in the operation of dictionaries and sets, data structures that rely on unique hash values to quickly find and store items.

Imagine a situation where you have a custom class and its __hash__ method returns -1:

class MyObject:
    def __hash__(self):
        return -1

obj = MyObject()

print(hash(obj))  # Outputs -1

If hash(-1) also returned -1, this would cause hash collisions when using a MyObject object and the integer -1 in a dictionary or set. To avoid this problem, hash(-1) was deliberately made equal to -2 to separate it from potential hash values of other object types.

How hash() is implemented in CPython source code

Let's take a look at the source code to understand how this is implemented in CPython.

Why source code? Because it is the final answer to the question of how a function is implemented. Documentation describes behavior, but source code shows how that behavior is achieved.

File longobject.c

The source code for CPython is available on GitHub. We need to find the file responsible for implementing integers and hashing them. This file is called longobject.c (previously, before Python 3, it may have been called intobject.c). It is located in the Objects/ directory of the CPython repository.

In the file longobject.c you will find a function called long_hash. This function is responsible for calculating the hash of integers (the long type in C, which is used to represent integers of arbitrary length in Python). For older versions of Python (before 3), the function can be called int_hash

Here's a variant of the long_hash code (the code may vary slightly depending on the Python version):

static Py_hash_t
long_hash(PyObject *obj)
{
    PyLongObject *v = (PyLongObject *)obj; // Convert PyObject in PyLongObject
    Py_uhash_t x; // Variable for storing hash value (unsigned type)
    Py_ssize_t i; // Index for finding digits of a number
    int sign;    // Number sign

    if (_PyLong_IsCompact(v)) { // Verification that the number is "compact"
        x = (Py_uhash_t)_PyLong_CompactValue(v); // We get a compact value
        if (x == (Py_uhash_t)-1) { // Check that the value is equal to -1
            x = (Py_uhash_t)-2; // If it is equal, change it to -2 (key point!)
        }
        return x; // Return hash value
    }
    // The next step is the processing of "non-compact" numbers (large numbers)
    i = _PyLong_DigitCount(v); // We get the number of digits in the number
    sign = _PyLong_NonCompactSign(v); // We get the sign of the number
    x = 0; // Initialize the hash value to zero
    while (--i >= 0) { // Going through the digits of the number from the end
        /* ... (complex hash calculations for large numbers) ... */
    }
    x = x * sign; // Considering the sign of the number
    if (x == (Py_uhash_t)-1) // Check that the final hash value is equal to -1
        x = (Py_uhash_t)-2; // If equal, change to -2 (key point!)
    return (Py_hash_t)x; // Return hash value
}

Code Explanation

static Py_hash_t long_hash(PyObject *obj): This function takes a PyObject, which is a common object type in Python. It then converts this to a PyLongObject, which represents a Python integer. The Py_hash_t type is the type used to store the hash value.
if (_PyLong_IsCompact(v)): This function determines if an integer is «compact». «Compact» integers are integers that can be represented as a single machine word (such as long in C). Interned numbers such as -1 and -2, would be compact. For compact numbers, a faster way of calculating the hash is used.
x = (Py_uhash_t)_PyLong_CompactValue(v);: If the number is compact, _PyLong_CompactValue(v) returns its value as an unsigned integer (Py_uhash_t). This value will be assigned to the variable x.
if (x == (Py_uhash_t)-1): This is where the magic happens! This check looks to see if the calculated hash value is equal to -1.
x = (Py_uhash_t)-2;: If the hash value is -1, it will be replaced by -2. This is the reason why hash(-1) returns -2!
return x;: The calculated (and possibly modified) hash value is returned by the function.
// Next is the processing of «non-compact» numbers (large numbers): If the number is not «compact» (meaning it is a large number that requires more memory to represent), then a more complex algorithm is used to compute the hash.
while (--i >= 0) { ... }: This loop goes through the «digits» of a large number and performs a series of operations to create a hash. The algorithm uses bit shifts and modulo operations to produce a well-distributed hash value.
x = x * sign;: Once the hash value has been calculated, it is multiplied by the sign of the number.
if (x == (Py_uhash_t)-1): Even for large numbers, it is checked that the final hash value is not -1. If it is, it will be changed to -2 as well.

What to expect and what to remember

While this hash() function may seem insignificant, it is important to keep it in mind when working with hash functions and hash-based data structures. In most cases, you won't run into problems, but knowing this detail will help you avoid potential mistakes and better understand the inner workings of Python.

Key insights:

For small integers, Python uses optimization. This is called interning.
hash(x) == x for most integers, but hash(-1) == -2 due to internal implementation and to avoid collision.
This behavior is specific to CPython and may be different in other Python (for example, PyPy).
Use == to compare values and is to compare the identity of objects.

I hope that the mystery of hash(-1) is a little bit clearer now!

Questions

Why can hash(-1) and hash(-2) values be equal in Python?

Python since version 3.3 uses a protection mechanism for the hash function of numbers. For cross-platform compatibility, hash(-1) always returns -2, so hash(-1) == hash(-2).

Is it possible to change the behavior of hash() for numbers?

No, the default hashing mechanism is built into Python, but you can override __hash__() in custom classes.