Why isn't pandas logical operator aligning on the index like it should?

Question

Consider this simple setup:

import pandas as pd

x = pd.Series([1, 2, 3], index=list('abc'))
y = pd.Series([2, 3, 3], index=list('bca'))

x

a    1
b    2
c    3
dtype: int64

y

b    2
c    3
a    3
dtype: int64

As you can see, the indexes are the same, just in a different order.

Now, consider a simple logical comparison using the equality (==) operator:

x == y
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

This throws a ValueError, most likely because the indexes do not match. On the other hand, calling the equivalent eq operator works:

x.eq(y)

a    False
b     True
c     True
dtype: bool

OTOH, the operator method works given y is first reordered...

x == y.reindex_like(x)

a    False
b     True
c     True
dtype: bool

My understanding was that the function and operator comparison should do the same thing, all other things equal. What is eq doing that the operator comparison doesn't?

Different issue, but just for network connectedness: stackoverflow.com/questions/41410178/…
– sanyassh, Commented Jun 1, 2019 at 9:13

user2357112 · Accepted Answer · 2019-06-01 00:56:31Z

Viewing the whole traceback for a Series comparison with mismatched indexes, particularly focusing on the exception message:

In [1]: import pandas as pd
In [2]: x = pd.Series([1, 2, 3], index=list('abc'))
In [3]: y = pd.Series([2, 3, 3], index=list('bca'))
In [4]: x == y
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-73b2790c1e5e> in <module>()
----> 1 x == y
/usr/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
   1188 
   1189         elif isinstance(other, ABCSeries) and not self._indexed_same(other):
-> 1190             raise ValueError("Can only compare identically-labeled "
   1191                              "Series objects")
   1192 
ValueError: Can only compare identically-labeled Series objects

we see that this is a deliberate implementation decision. Also, this is not unique to Series objects - DataFrames raise a similar error.
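A minimal sketch of the DataFrame case, assuming a pandas version from 0.19.0 onward. The frames df1 and df2 and the column name "v" are made up for illustration, and the exact wording of the error message may differ between releases:

```python
import pandas as pd

# Hypothetical frames: same row labels, different order.
df1 = pd.DataFrame({"v": [1, 2, 3]}, index=list("abc"))
df2 = pd.DataFrame({"v": [2, 3, 3]}, index=list("bca"))

try:
    df1 == df2
    raised = False
except ValueError as exc:
    # The == operator refuses to compare frames whose labels do not line up.
    raised = True
    print(exc)
```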

Digging through the Git blame for the relevant lines eventually turns up some relevant commits and issue tracker threads. For example, Series.__eq__ used to completely ignore the RHS's index, and in a comment on a bug report about that behavior, Pandas author Wes McKinney says the following:

This is actually a feature / deliberate choice and not a bug-- it's related to #652. Back in January I changed the comparison methods to do auto-alignment, but found that it led to a large amount of bugs / breakage for users and, in particular, many NumPy functions (which regularly do things like arr[1:] == arr[:-1]; example: np.unique) stopped working.

This gets back to the issue that Series isn't quite ndarray-like enough and should probably not be a subclass of ndarray.

So, I haven't got a good answer for you except for that; auto-alignment would be ideal but I don't think I can do it unless I make Series not a subclass of ndarray. I think this is probably a good idea but not likely to happen until 0.9 or 0.10 (several months down the road).

This was then changed to the current behavior in pandas 0.19.0. Quoting the "what's new" page:

Following Series operators have been changed to make all operators consistent, including DataFrame (GH1134, GH4581, GH13538)

Series comparison operators now raise ValueError when index are different.

Series logical operators align both index of left and right hand side.

This made the Series behavior match that of DataFrame, which already rejected mismatched indices in comparisons.
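The two rules quoted from the release notes can be sketched side by side. This assumes a pandas version that follows the 0.19.0 behavior; the boolean Series p and q are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical boolean Series: same labels, different order.
p = pd.Series([True, False, True], index=list("abc"))
q = pd.Series([True, True, False], index=list("bca"))

# Logical operators align on the index, so values are paired by label.
both = p & q
print(both)

# Comparison operators do not align; mismatched indexes raise instead.
try:
    p == q
    comparison_raised = False
except ValueError:
    comparison_raised = True
```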

In summary, making the comparison operators align indices automatically turned out to break too much stuff, so this was the best alternative.

Great answer. There should be an investigator badge, designed for answers like this one, where the author has clearly taken the time to research, read the code, and dig through Git to find logical explanations. +1

Quang Hoang · Accepted Answer · 2019-06-01 00:50:24Z

One thing I love about Python is that you can peek into the source code of almost anything. From the pd.Series.eq source code, we can see that it calls:

def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # other stuff
    # ...

    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)

which goes on to call pd.Series._binop:

def _binop(self, other, func, level=None, fill_value=None):

    # other stuff
    # ...
    if not self.index.equals(other.index):
        this, other = self.align(other, level=level, join='outer',
                                 copy=False)
        new_index = this.index

That means the eq method aligns the two Series before comparing them (which, apparently, the plain == operator does not).
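To see the difference from the outside, here is a small sketch using the Series from the question. It assumes a pandas version where == raises on mismatched indexes (0.19.0 onward, per the other answer); the variable names flexible and manual are only for illustration:

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=list("abc"))
y = pd.Series([2, 3, 3], index=list("bca"))

# eq aligns the two Series by label before comparing.
flexible = x.eq(y)

# Aligning by hand first lets the == operator do the same comparison.
manual = x == y.reindex_like(x)

# The bare operator, with no alignment, raises instead.
try:
    x == y
    operator_raised = False
except ValueError:
    operator_raised = True

print(flexible)
print(manual)
```

Sorting both results by index before comparing them avoids depending on the label order that alignment happens to produce.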

BENY · Accepted Answer · 2019-06-01 00:53:42Z

Back in 2012, before eq, ne, and gt existed, pandas had a problem: comparing Series whose indexes were in different orders with the comparison operators (>, <, ==, !=) returned unexpected output. The fix was to add new methods (gt, ge, ne, ...).

GitHub Ticket reference
