Multiarray searchsorted fails #14833

Open
@Anaphory

Description

@Anaphory

Code Sample, a copy-pastable example if possible

pandas.MultiIndex([[0],["a"]], [[0],[0]]).searchsorted((1,"b"))

Problem description

The entry (1,"b") should come after the existing (0,"a") in the MultiIndex. (Alternatively, MultiIndex could throw a clean error message.) Instead, an opaque exception is raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/site-packages/pandas/core/base.py", line 1156, in searchsorted
    return self.values.searchsorted(key, side=side, sorter=sorter)
TypeError: unorderable types: tuple() > str()

This is because Index.searchsorted naïvely passes its arguments to numpy.searchsorted, which is unaware that its second argument is a sequence of tuples, not a plain array just of dimension one higher.
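A minimal sketch of that behaviour, using only NumPy. The names `values`, `as_keys` and `key` are illustrative and not from the report, and the hand-built object array is only assumed to resemble what `MultiIndex.values` returns:

```python
import numpy as np

# A 1-D object array holding one tuple, mimicking MultiIndex.values.
values = np.empty(1, dtype=object)
values[0] = (0, "a")

# NumPy treats a bare tuple as an array-like of two separate keys.
as_keys = np.asarray((1, "b"), dtype=object)
print(as_keys.shape)  # (2,)

# Wrapping the tuple in a 1-element object array keeps it as one key,
# so the comparison is tuple against tuple.
key = np.empty(1, dtype=object)
key[0] = (1, "b")
print(values.searchsorted(key))  # [1]
```

With the bare tuple, each of `1` and `"b"` would be compared against the stored tuple, which is where the "unorderable types" error comes from.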

Expected Output

1
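The expected value follows from ordinary lexicographic tuple ordering. As a rough illustration, here is the same lookup with `bisect` from the standard library on a plain sorted list of tuples; the list `entries` is a stand-in for the MultiIndex, not a pandas API:

```python
import bisect

# Stand-in for the MultiIndex entries: a sorted list of plain tuples.
entries = [(0, "a")]

# Tuples compare element by element, so (1, "b") sorts after (0, "a").
position = bisect.bisect_left(entries, (1, "b"))
print(position)  # 1
```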

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.10-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 0.8.0
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Activity

jreback

jreback commented on Dec 8, 2016

@jreback
Contributor

I suppose we could just disable this. numpy doesn't understand object array searchsorted generally

In [4]: pandas.MultiIndex([[0],["a"]], [[0],[0]]).values.searchsorted((1,"b"))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b9352c7b8bea> in <module>()
----> 1 pandas.MultiIndex([[0],["a"]], [[0],[0]]).values.searchsorted((1,"b"))

TypeError: unorderable types: tuple() > str()

maybe just raise a NotImplementedError. This is pretty much a useless operation anyhow; you always want to search by levels via indexing

added this to the Next Major Release milestone on Dec 8, 2016
ritwickdsouza

ritwickdsouza commented on Feb 1, 2017

@ritwickdsouza

Is this issue still available to fix?

jreback

jreback commented on Feb 1, 2017

@jreback
Contributor

yes,

note that we should simply define .searchsorted in pandas/indexes/multi.py and use the direct indexers, .get_indexer, which is way more efficient (as it's hashtable based).

ritwickdsouza

ritwickdsouza commented on Feb 3, 2017

@ritwickdsouza

@jreback I am new to pandas, could you shed some light on how I can use .get_indexer to implement .searchsorted?

jreback

jreback commented on Feb 3, 2017

@jreback
Contributor

.get_indexer returns the indexer, IOW the location of each point; -1 marks items that were not found. This works on any multi-index. Note these don't even have to be sorted (but it is more efficient if they are).

In [2]: i = pd.MultiIndex.from_tuples([(0, 'a'), (0, 'b'), (1, 'a')])

In [3]: i
Out[3]: 
MultiIndex(levels=[[0, 1], ['a', 'b']],
           labels=[[0, 0, 1], [0, 1, 0]])

In [4]: i.values
Out[4]: array([(0, 'a'), (0, 'b'), (1, 'a')], dtype=object)

In [5]: i.get_indexer([(0,'b'), (1, 'a'), (2, 'c')])
Out[5]: array([ 1,  2, -1])

Here's what searchsorted does; I am using an integer array because numpy doesn't play nice with tuples. It returns the indexer of the match (IOW where it is in the array). Note that if something is not found, it returns the position where it would have to be inserted to keep the array sorted (3 for the value 5 below, i.e. one past the last element), which is really unintuitive!

In [6]: np.array([1, 2, 3]).searchsorted([2, 3])
Out[6]: array([1, 2])

In [7]: np.array([1, 2, 3]).searchsorted([2, 3, 5])
Out[7]: array([1, 2, 3])
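To make the difference between the two conventions concrete, here is a small sketch with plain NumPy. The helper `exact_positions` is hypothetical, written only for this illustration; it turns searchsorted insertion points into a get_indexer-style result with -1 for missing keys:

```python
import numpy as np

arr = np.array([1, 2, 3])

# side="left" (default): first position where the key could be inserted.
left = arr.searchsorted([0, 2, 5])
# side="right": insertion point after any equal entries.
right = arr.searchsorted([0, 2, 5], side="right")
print(left)   # [0 1 3]
print(right)  # [0 2 3]


def exact_positions(sorted_arr, keys):
    """Hypothetical helper: positions of exact matches, -1 where absent."""
    keys = np.asarray(keys)
    pos = sorted_arr.searchsorted(keys)
    # Clip so that pos == len(sorted_arr) can still be used for the lookup.
    clipped = np.minimum(pos, len(sorted_arr) - 1)
    found = sorted_arr[clipped] == keys
    return np.where(found, pos, -1)


print(exact_positions(arr, [2, 3, 5]))  # [ 1  2 -1]
```

A key that is absent still gets a valid insertion point from searchsorted, so an explicit equality check is needed to tell "found" from "would go here".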
bhavybarca

bhavybarca commented on Mar 2, 2018

@bhavybarca

@jreback is this issue still open?

jreback

jreback commented on Mar 2, 2018

@jreback
Contributor

yes

5 remaining items

SaturnFromTitan

SaturnFromTitan commented on Mar 4, 2020

@SaturnFromTitan
Contributor

take

removed their assignment
on Mar 4, 2020
Condielj

Condielj commented on Mar 28, 2022

@Condielj

take

removed this from the Contributions Welcome milestone on Oct 13, 2022
GSAUC3

GSAUC3 commented on May 10, 2025

@GSAUC3

Hi, is anyone still working on this, or may I take it up?
If no one is working on it, I have a couple of questions:

btel suggested an approach in #14833 (comment).

That could be one way to handle it, but it assumes the input array is 2-dimensional.
Should the input array be restricted to 2 dimensions?
May I go ahead with this implementation, or should I simply raise NotImplementedError?

GSAUC3

GSAUC3 commented on May 11, 2025

@GSAUC3

take

GSAUC3

GSAUC3 commented on May 22, 2025

@GSAUC3

@jreback hi, I am relatively new to open source. I saw there had been no activity on this since March 2022, so I thought of taking this issue up. I see that @mroeschke has removed this from the Contributions Welcome milestone, and I am not entirely sure what that means. Does it mean no contributions will be accepted? Apologies if this is a silly question, and thanks in advance for your guidance.


    Participants

    @btel @jreback @TomAugspurger @Anaphory @jbrockmendel
