Closed
Description
DataFrame.lookup was deprecated in #35224 in 1.2. After some feedback (#39171) I opened this ticket to discuss re-implementation of lookup in a performant way. As mentioned in the discussion on 35244: "but it would have to be performant and not be yet another indexing api".
This ticket can be a starting point for proposed methods, although the old implementation was actually quite performant; see the tests given in the discussion of 35244:
Lines 3848 to 3861 in b5958ee
Activity
[-]ENH: re-implement DataFrame.lookup in a performant way.[/-][+]ENH: re-implement DataFrame.lookup.[/+]
challisd commented on Aug 4, 2021
I think there should definitely be a lookup function. Since the old one seems to work well, is un-deprecating it an option? I find the proposed alternative using melt to be unreadable, and based on the (sadly heated) discussion here the old lookup function is faster than the melt alternative suggested. Pandas is a module used by many thousands of programmers and scientists who often have only a vague (or no) idea what the melt function does. The ability to run a quick series of lookups using lists of row and column coordinates is a fairly ordinary task, but if you don't provide this lookup function most users will likely fall back on using a slow for loop; and if that's too slow for them, decide to forget it and just use NumPy where you can do
the_data[row_index_list, column_index_list]
Can we please keep this function?
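For reference, a minimal sketch of the NumPy indexing mentioned above, and of how the same pairing of row and column coordinates could be done by label on a DataFrame. The helper name lookup_by_label and the sample data are assumptions for illustration only; this is not a pandas API and not the old DataFrame.lookup implementation.

```python
import numpy as np
import pandas as pd

# NumPy: row and column positions are paired up element-wise.
the_data = np.arange(12).reshape(3, 4)
row_index_list = [0, 2, 1]
column_index_list = [3, 0, 1]
picked = the_data[row_index_list, column_index_list]  # values at (0,3), (2,0), (1,1)


def lookup_by_label(df, row_labels, col_labels):
    """Hypothetical helper (not part of pandas): label-based paired lookup.

    Assumes unique index and column labels; Index.get_indexer raises on
    non-unique indexes and returns -1 for labels it cannot find.
    """
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("one or more labels were not found")
    return df.to_numpy()[rows, cols]


df = pd.DataFrame(the_data, index=["x", "y", "z"], columns=["a", "b", "c", "d"])
by_label = lookup_by_label(df, ["x", "z", "y"], ["d", "a", "b"])
```

Both picked and by_label select the same three cells; the second form only adds a translation from labels to positions before handing off to NumPy.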
berkgercek commented on May 9, 2022
Going to throw my voice in here and say that this is a pretty important feature for dataframes that allows for numpy-like behavior with labeled complex indexes and columns.
My personal use case for Pandas is often reliant on using it to keep labels and data together, and working with a method like lookup is a part of how I use it. It also fits in the scope of the package description provided in the documentation:
It's not something I use often, but the proposed solution linked in the deprecation notice is very inelegant.
One use case I have today is to do something similar to the following (obviously with meaningful data), which I have done with the above solution:
This is not a readable solution for me and when others need to maintain this code it will be very much not obvious at a glance what I'm doing here.
If there is another solution that would work equally well but be more intuitive I am happy to use that instead, but I see no alternative to the .lookup method for this use case.
rhshadrach commented on May 11, 2022
+1 on re-implementing unless there is a more understandable alternative; I too find it hard to discern what the current alternative is doing.
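To make the readability complaint concrete, here is a sketch of a melt-based lookup of the kind discussed in this thread, next to a positional version. The frame, the column names, and the variable names are illustrative assumptions, and the melt form is written from memory of the deprecation notice rather than copied from it.

```python
import numpy as np
import pandas as pd

# Each row names, in "col", the column whose value should be picked for that row.
df = pd.DataFrame(
    {
        "col": ["A", "A", "B", "B"],
        "A": [80, 23, np.nan, 22],
        "B": [80, 55, 76, 67],
    }
)

# Melt-based form: reshape to long format, then keep the rows where the
# requested column name equals the melted variable name.
melted = df.melt("col")
via_melt = melted.loc[melted["col"] == melted["variable"], "value"]
via_melt = via_melt.reset_index(drop=True)

# Positional form: translate the column labels to positions and index the
# underlying array once, one (row, column) pair per row.
values = df[["A", "B"]]
rows = np.arange(len(df))
cols = values.columns.get_indexer(df["col"].to_numpy())
via_numpy = values.to_numpy()[rows, cols]
```

The two results agree here, but note that melt stacks the frame column by column, so the melt form returns values in column order rather than in the original row order; they only coincide in this example because the "A" rows happen to come before the "B" rows.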
challisd commented on Feb 22, 2023
Any word on if or when this feature will be added back in, or has anyone figured out a viable alternative?
erfannariman commented on Feb 22, 2023
Just to check before someone (or I) starts to work on it: is there agreement that this will be added back in if there's a viable PR? @jorisvandenbossche @mroeschke @rhshadrach
18 remaining items
challisd commented on Nov 7, 2023
I agree that we can't provide a ton of short combinations of pandas methods, but such a common and basic use case certainly should be included in my opinion. What about something like the following:
It allows greater control over the output data type and gives warnings when dealing with mixed data types. It gives an additional warning when attempting to coerce an object column to a non-object data type, as this can easily lead to an exception.
MarcoGorelli commented on Nov 7, 2023
just checking, are there any other dataframe libraries which include this?
challisd commented on Nov 7, 2023
Not sure, I've mostly only used Pandas in Python. I'm making the claim it's a basic feature based on the evidence that both R and NumPy support this functionality as part of the built-in [] indexing function.
challisd commented on Nov 8, 2023
Actually, it seems I remembered incorrectly and it is not a basic feature of R. Sorry for the mistake!
stevenae commented on Feb 21, 2025
Unless this has been deprioritized, I'll try out some optimizations and aim to put a PR up in the 1 week to 1 month time frame (sometime in March 2025).
stevenae commented on Mar 26, 2025
take
stevenae commented on Mar 27, 2025
Hi, I put up a PR (#61185). There is one CI test failing* but it appears to be unrelated to the change itself (perhaps a flaky test).
* Unit Tests / macos-13 actions-312.yaml (pull_request): https://github.com/pandas-dev/pandas/actions/runs/14094479324/job/39478857250?pr=61185
stevenae commented on Mar 27, 2025
Added another optimization for lookups on subset of columns.
stevenae commented on Mar 27, 2025
Added one final optimization -- subsetting rows as well when there is a mixture of types.
Also reduced complexity, on the assumption that lookup will be done for less than all rows and/or columns.
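The PR code is not reproduced in this thread, so the following is only a rough sketch of the general idea as described: restrict the frame to the rows and columns that are actually requested before converting to an array, so that unrelated columns of a different type do not force an object-dtype copy. The function name lookup_subset and the sample frame are hypothetical, and unique index and column labels are assumed.

```python
import numpy as np
import pandas as pd


def lookup_subset(df, row_labels, col_labels):
    """Hypothetical sketch, not the code from the PR.

    Subsets to the requested rows and columns first, then does one
    paired positional lookup on the smaller array.
    """
    row_labels = pd.Index(row_labels)
    col_labels = pd.Index(col_labels)
    # Only the labels that are actually used take part in the conversion.
    sub = df.loc[row_labels.unique(), col_labels.unique()]
    rows = sub.index.get_indexer(row_labels)
    cols = sub.columns.get_indexer(col_labels)
    return sub.to_numpy()[rows, cols]


# Mixed-type frame: converting all of it to one array gives object dtype.
frame = pd.DataFrame(
    {"name": ["u", "v", "w"], "x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]},
    index=["r0", "r1", "r2"],
)
result = lookup_subset(frame, ["r0", "r2"], ["x", "y"])
```

Because only the float columns "x" and "y" are touched, result keeps a float dtype, whereas frame.to_numpy() on the whole frame would be an object array.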
stevenae commented on May 21, 2025
Just a heads up: feedback on #61185 has led to a decision not to re-implement DataFrame.lookup. Instead I will add documentation recommending user code.
I recommend closing this issue.
stevenae commented on May 21, 2025
Adding documentation for user code instead, at #61471.