The hash()
function in Python
allows you to calculate hash values for various objects. For integers, hash
is usually the same as the value, but there are exceptions that can surprise even experienced programmers.
Let's see why hash(-1)
and hash(-2)
return the same value in CPython. Let's look at the peculiarities of hash()
, the internal implementation of hashing integers, and the reason for the special handling of -1
.
Once, during a job interview, I was asked a very obvious question:
Question:
What does hash() function output for the following values: 1, 0, -1, -2
?
Confident in my knowledge and not expecting a trick, I answered daringly:
>>> hash(1)
1
>>> hash(0)
0
>>> hash(-1)
-1
>>> hash(-2)
-2
Oh, how wrong I was! My interviewer corrected me with a smile and demonstrated the unexpected result:
>>> hash(1)
1
>>> hash(0)
0
>>> hash(-1)
-2
>>> hash(-2)
-2
# in addition:
>>> hash(-1) == hash(-2)
True
>>> hash(-1) is hash(-2)
True
The world around me seemed to turn upside down. I even thought the interviewer was joking with me. But no, he was deadly serious. How is this possible? Let's see why hash(-1)
and hash(-2)
return the same value, -2
.
The point is that the behavior of hash()
function for integers in Python
(more precisely, in CPython implementation, which is the most common one) has its own specifics related to optimization. To understand the essence, we need to consider a few points.
In an effort to be efficient, Python
stores small integers in the range -5
to 256
inclusive. This means that for each integer in this range, only one object is created in memory. When you assign a value such as -1
to a variable, Python
does not create a new object, but reuses the existing interned -1
object.
The is
operator checks the identity of objects, that is, whether two variables refer to the same object in memory. Therefore, hash(-1)
is hash(-2)
returns True
, because both calls to hash()
return the same interned object -2
.
CPython's implementation of the hash
function for integers is defined so that hash(x) == x
for all integers except -1
. An exception is made for -1: hash(-1) == -2
.
The main reason for the -1
exception is to prevent potential collisions of hashes with objects of other types. Hash values play a key role in the operation of dictionaries
and sets
, data structures that rely on unique hash values to quickly find and store items.
Imagine a situation where you have a custom class and its __hash__
method returns -1
:
class MyObject:
def __hash__(self):
return -1
obj = MyObject()
print(hash(obj)) # Outputs -1
If hash(-1)
also returned -1
, this would cause hash collisions when using a MyObject
object and the integer -1
in a dictionary or set. To avoid this problem, hash(-1)
was deliberately made equal to -2
to separate it from potential hash values of other object types.
Let's take a look at the source code to understand how this is implemented in CPython.
Why source code? Because it is the final answer to the question of how a function is implemented. Documentation describes behavior, but source code shows how that behavior is achieved.
The source code for CPython is available on GitHub. We need to find the file responsible for implementing integers and hashing them. This file is called longobject.c
(previously, before Python 3
, it may have been called intobject.c
). It is located in the Objects/
directory of the CPython repository.
In the file longobject.c
you will find a function called long_hash
. This function is responsible for calculating the hash of integers (the long
type in C
, which is used to represent integers of arbitrary length in Python
). For older versions of Python (before 3), the function can be called int_hash
Here's a variant of the long_hash
code (the code may vary slightly depending on the Python version):
static Py_hash_t
long_hash(PyObject *obj)
{
PyLongObject *v = (PyLongObject *)obj; // Convert PyObject in PyLongObject
Py_uhash_t x; // Variable for storing hash value (unsigned type)
Py_ssize_t i; // Index for finding digits of a number
int sign; // Number sign
if (_PyLong_IsCompact(v)) { // Verification that the number is "compact"
x = (Py_uhash_t)_PyLong_CompactValue(v); // We get a compact value
if (x == (Py_uhash_t)-1) { // Check that the value is equal to -1
x = (Py_uhash_t)-2; // If it is equal, change it to -2 (key point!)
}
return x; // Return hash value
}
// The next step is the processing of "non-compact" numbers (large numbers)
i = _PyLong_DigitCount(v); // We get the number of digits in the number
sign = _PyLong_NonCompactSign(v); // We get the sign of the number
x = 0; // Initialize the hash value to zero
while (--i >= 0) { // Going through the digits of the number from the end
/* ... (complex hash calculations for large numbers) ... */
}
x = x * sign; // Considering the sign of the number
if (x == (Py_uhash_t)-1) // Check that the final hash value is equal to -1
x = (Py_uhash_t)-2; // If equal, change to -2 (key point!)
return (Py_hash_t)x; // Return hash value
}
static Py_hash_t long_hash(PyObject *obj)
: This function takes a PyObject
, which is a common object type in Python
. It then converts this to a PyLongObject
, which represents a Python
integer. The Py_hash_t
type is the type used to store the hash value.
if (_PyLong_IsCompact(v))
: This function determines if an integer is «compact». «Compact» integers are integers that can be represented as a single machine word (such as long
in C
). Interned numbers such as -1
and -2
, would be compact. For compact numbers, a faster way of calculating the hash is used.
x = (Py_uhash_t)_PyLong_CompactValue(v);
: If the number is compact, _PyLong_CompactValue(v)
returns its value as an unsigned integer (Py_uhash_t)
. This value will be assigned to the variable x
.
if (x == (Py_uhash_t)-1)
: This is where the magic happens! This check looks to see if the calculated hash value is equal to -1
.
x = (Py_uhash_t)-2;
: If the hash value is -1
, it will be replaced by -2
. This is the reason why hash(-1)
returns -2
!
return x;
: The calculated (and possibly modified) hash value is returned by the function.
//
Next is the processing of «non-compact» numbers (large numbers): If the number is not «compact» (meaning it is a large number that requires more memory to represent), then a more complex algorithm is used to compute the hash.
while (--i >= 0) { ... }
: This loop goes through the «digits» of a large number and performs a series of operations to create a hash. The algorithm uses bit shifts and modulo operations to produce a well-distributed hash value.
x = x * sign;
: Once the hash value has been calculated, it is multiplied by the sign of the number.
if (x == (Py_uhash_t)-1)
: Even for large numbers, it is checked that the final hash value is not -1
. If it is, it will be changed to -2
as well.
While this hash()
function may seem insignificant, it is important to keep it in mind when working with hash functions and hash-based data structures. In most cases, you won't run into problems, but knowing this detail will help you avoid potential mistakes and better understand the inner workings of Python
.
Key insights:
For small integers, Python
uses optimization. This is called interning.
hash(x) == x
for most integers, but hash(-1) == -2
due to internal implementation and to avoid collision.
This behavior is specific to CPython
and may be different in other Python
(for example, PyPy
).
Use ==
to compare values and is
to compare the identity of objects.
I hope that the mystery of hash(-1)
is a little bit clearer now!
Python
since version 3.3 uses a protection mechanism for the hash
function of numbers. For cross-platform compatibility, hash(-1)
always returns -2
, so hash(-1) == hash(-2)
.__hash__()
in custom classes.