Consider this simple setup:

x = pd.Series([1, 2, 3], index=list('abc'))
y = pd.Series([2, 3, 3], index=list('bca'))

x

a    1
b    2
c    3
dtype: int64

y

b    2
c    3
a    3
dtype: int64

As you can see, the indexes are the same, just in a different order.

Now, consider a simple logical comparison using the equality (==) operator:

x == y
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

This raises a ValueError, most likely because the indexes do not match. On the other hand, calling the equivalent eq method works:

x.eq(y)

a    False
b     True
c     True
dtype: bool

The == operator also works, provided y is first reordered to match x:

x == y.reindex_like(x)

a    False
b     True
c     True
dtype: bool

My understanding was that the method and the operator should do the same thing, all other things being equal. What is eq doing that the == operator doesn't?

3 Answers

Viewing the whole traceback for a Series comparison with mismatched indexes, particularly focusing on the exception message:

In [1]: import pandas as pd
In [2]: x = pd.Series([1, 2, 3], index=list('abc'))
In [3]: y = pd.Series([2, 3, 3], index=list('bca'))
In [4]: x == y
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-73b2790c1e5e> in <module>()
----> 1 x == y
/usr/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
   1188 
   1189         elif isinstance(other, ABCSeries) and not self._indexed_same(other):
-> 1190             raise ValueError("Can only compare identically-labeled "
   1191                              "Series objects")
   1192 
ValueError: Can only compare identically-labeled Series objects

we see that this is a deliberate implementation decision. Also, this is not unique to Series objects - DataFrames raise a similar error.
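As a small sketch of the DataFrame case (the frames `df1` and `df2` and the column name `v` are made up for illustration, and the exact error message may differ between pandas versions), the `==` operator is expected to raise for differently ordered labels while the `eq` method aligns first:

```python
import pandas as pd

# Hypothetical frames: same labels, different row order.
df1 = pd.DataFrame({'v': [1, 2, 3]}, index=list('abc'))
df2 = pd.DataFrame({'v': [2, 3, 3]}, index=list('bca'))

# The operator form refuses to compare frames whose labels are not identical.
try:
    df1 == df2
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)  # wording varies by pandas version

# The flexible method aligns on labels before comparing.
flex = df1.eq(df2)
```

Here `raised` should end up `True`, and `flex` holds the label-wise comparison.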

Digging through the Git blame for the relevant lines eventually turns up some relevant commits and issue tracker threads. For example, Series.__eq__ used to completely ignore the RHS's index, and in a comment on a bug report about that behavior, Pandas author Wes McKinney says the following:

This is actually a feature / deliberate choice and not a bug-- it's related to #652. Back in January I changed the comparison methods to do auto-alignment, but found that it led to a large amount of bugs / breakage for users and, in particular, many NumPy functions (which regularly do things like arr[1:] == arr[:-1]; example: np.unique) stopped working.

This gets back to the issue that Series isn't quite ndarray-like enough and should probably not be a subclass of ndarray.

So, I haven't got a good answer for you except for that; auto-alignment would be ideal but I don't think I can do it unless I make Series not a subclass of ndarray. I think this is probably a good idea but not likely to happen until 0.9 or 0.10 (several months down the road).

This was then changed to the current behavior in pandas 0.19.0. Quoting the "what's new" page:

Following Series operators have been changed to make all operators consistent, including DataFrame (GH1134, GH4581, GH13538)

  • Series comparison operators now raise ValueError when index are different.
  • Series logical operators align both index of left and right hand side.

This made the Series behavior match that of DataFrame, which already rejected mismatched indices in comparisons.
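To illustrate the second bullet, here is a minimal sketch with two made-up boolean Series, `a` and `b`; it assumes a pandas version at or after 0.19.0. Unlike `==`, the logical operator `&` aligns on labels rather than raising:

```python
import pandas as pd

# Hypothetical boolean Series: same labels, different order.
a = pd.Series([True, False, True], index=list('abc'))
b = pd.Series([True, True, False], index=list('bca'))

# Logical operators align both sides on their index labels first,
# so each label is combined with its counterpart, not by position.
both = a & b

# Look results up by label to avoid depending on the result's row order.
by_label = both.to_dict()
```

Label `a` pairs `True` with `False`, label `b` pairs `False` with `True`, and label `c` pairs `True` with `True`, so only `c` comes out `True`.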

In summary, making the comparison operators align indices automatically turned out to break too much stuff, so this was the best alternative.

1 Comment

Great answer. There should be an investigator badge. Designed for answers like this where the answer author clearly has taken the time to research, read the code, dig through Git to find logical explanations. +1

One thing I love about Python is that you can peek into the source code of almost anything. The source of pd.Series.eq shows that it calls:

def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # other stuff
    # ...

    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)

which in turn leads to pd.Series._binop:

def _binop(self, other, func, level=None, fill_value=None):

    # other stuff
    # ...
    if not self.index.equals(other.index):
        this, other = self.align(other, level=level, join='outer',
                                 copy=False)
        new_index = this.index

That means the eq method aligns the two Series before comparing them (which, apparently, the == operator does not).
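The same effect can be reproduced by hand with the public `Series.align` method. This is only a sketch of the idea, not the internal code path, and the names `x_aligned`, `y_aligned`, and `manual` are introduced here for illustration:

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=list('abc'))
y = pd.Series([2, 3, 3], index=list('bca'))

# Align both Series on the union of their labels, as _binop does internally.
x_aligned, y_aligned = x.align(y, join='outer')

# After alignment the indexes are identical, so == no longer raises.
manual = x_aligned == y_aligned

# The flexible method gives the same label-wise answer in one call.
flexible = x.eq(y)
```

Comparing `manual.to_dict()` with `flexible.to_dict()` should show identical results for every label.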


Back in 2012, before eq, ne, and gt existed, pandas had a problem: comparing Series whose indexes were in a different order with the operators (>, <, ==, !=) returned unexpected output. The fix was to add new methods (gt, ge, ne, ...).

GitHub Ticket reference
