Description
Code Sample, a copy-pastable example if possible
pandas.MultiIndex([[0],["a"]], [[0],[0]]).searchsorted((1,"b"))
Problem description
The entry (1,"b") should come after the existing (0,"a") in the MultiIndex. (Alternatively, MultiIndex could throw a clean error message.) Instead, an opaque exception is raised:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/site-packages/pandas/core/base.py", line 1156, in searchsorted
    return self.values.searchsorted(key, side=side, sorter=sorter)
TypeError: unorderable types: tuple() > str()
This is because Index.searchsorted naïvely passes its arguments to numpy.searchsorted, which is unaware that its second argument is a sequence of tuples rather than a plain array of one dimension higher.
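As a minimal sketch of the ordering this report expects, the same lookup can be done on plain Python tuples with the standard-library bisect module. This is only an illustration of lexicographic tuple comparison, not how pandas implements or should implement MultiIndex.searchsorted; the variable names are hypothetical.

```python
import bisect

# The single entry of the MultiIndex from the report, written as a plain tuple.
existing = [(0, "a")]

# Python compares tuples element by element (lexicographically),
# so (0, "a") < (1, "b") and the new entry belongs after the existing one.
position = bisect.bisect_left(existing, (1, "b"))
print(position)  # 1
```

The result matches the expected output of 1 given below.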
Expected Output
1
Output of pd.show_versions()
pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 0.8.0
xlsxwriter: None
lxml: None
bs4: 4.5.1
html5lib: 0.9999999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Activity
jreback commented on Dec 8, 2016
I suppose we could just disable this. numpy doesn't understand object array searchsorted generally; maybe just raise a NotImplementedError. This is pretty much a useless operation anyhow, you always want to search by levels via indexing.
ritwickdsouza commented on Feb 1, 2017
Is this issue still available to fix?
jreback commented on Feb 1, 2017
yes, note that we should simply define .searchsorted in pandas/indexes/multi.py and use the direct indexers, .get_indexer, which is way more efficient (as it's hashtable based).
ritwickdsouza commented on Feb 3, 2017
@jreback I am new to pandas, could you shed some light on how I can use .get_indexer to implement .searchsorted?
jreback commented on Feb 3, 2017
.get_indexer returns the indexer, IOW the location of the point. -1 marks not found items. This works on any multi-index. Note these don't even have to be sorted (but it is more efficient if they are).
Here's what searchsorted does; I am using an integer array because numpy doesn't play nice with tuples. It returns the indexer of the match (IOW where it is in the array). Note if something is not found it returns the last index before that (which is really unintuitive!)
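The difference between the two lookups described above can be sketched with standard-library analogues. This is not the example from the original comment and does not call pandas or numpy: a dict stands in for the hashtable-based .get_indexer, bisect stands in for searchsorted, and the names keys, positions, and lookup are hypothetical.

```python
import bisect

# Hypothetical sorted index entries, written as plain tuples.
keys = [(0, "a"), (0, "b"), (1, "a")]

# Hash-based lookup: maps each entry to its position.
positions = {key: i for i, key in enumerate(keys)}


def lookup(targets):
    """Return the position of each target, or -1 if it is not present."""
    return [positions.get(target, -1) for target in targets]


print(lookup([(0, "b"), (2, "z")]))  # [1, -1]

# Binary search: returns a position even for entries that are absent,
# namely the point where the entry would be inserted to keep the order.
print(bisect.bisect_left(keys, (0, "b")))  # 1 (present)
print(bisect.bisect_left(keys, (0, "c")))  # 2 (absent)
print(bisect.bisect_left(keys, (2, "z")))  # 3 (absent, past the end)
```

In this sketch the hash lookup reports a missing entry as -1 and needs no particular order, while the binary search never reports "not found" and relies on the entries being sorted.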
bhavybarca commented on Mar 2, 2018
@jreback is this issue still open?
jreback commented on Mar 2, 2018
yes
SaturnFromTitan commented on Mar 4, 2020
take
Condielj commented on Mar 28, 2022
take
GSAUC3 commented on May 10, 2025
Hi, is anyone still working on this, or may I take it up?
If no one is working on this, then I have a couple of questions:
btel suggested in #14833 (comment)
This can be one way to handle it, but it assumes the input array to be 2-dimensional.
Should the input array be restricted to 2 dimensions?
May I go ahead with this implementation, or should I simply raise NotImplementedError?
GSAUC3 commented on May 11, 2025
take
GSAUC3 commented on May 22, 2025
@jreback hi, I am relatively new to open source, and I saw there had been no activity on this since March 2022, so I thought of taking this issue up. I see that @mroeschke has removed this from contributions welcome. I am not entirely sure what exactly that means. Does it mean no contributions will be accepted? Apologies if this is a silly doubt, and thanks in advance for your guidance.