BUG: Ghost Column Generation Bug in .loc[] with Series Column Selector (Pandas 2.2.2) #61049

Closed
3 tasks done
Jeevacse07 opened this issue Mar 4, 2025 · 2 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@Jeevacse07

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
print(pd.__version__)  # 2.2.2

data = {'CustomerID': [101, 102, 103, 104, 105],
        'Name': ['John', 'Alice', 'Bob', 'David', 'Mike'],
        'CreditScore': [650, 720, 710, 600, 750],
        'LoanAmount': [40000, 70000, 80000, 30000, 120000],
        'AccountType': ['Savings', 'Current', 'Current', 'Savings', 'Current']}

df = pd.DataFrame(data)

# This line generates unexpected columns without warning
df.loc[(df['AccountType'] == "Current") & (df['CreditScore'] > 700), df['LoanAmount']] = 90000

print(df)

Issue Description

I found a hidden bug in the .loc[] indexer: it silently creates new columns when the column selector (the second argument) is a Series.

Expected Behavior

Only the LoanAmount column should be updated.
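If you would rather have an assignment fail loudly than grow the frame, one option is to check the column labels yourself before calling .loc[]. This is a minimal sketch using a hypothetical helper, set_existing_columns, which is not part of pandas. It assumes a flat (non-MultiIndex) column index and only handles column labels, not boolean masks or slices as the column selector.

```python
import pandas as pd
from pandas.api.types import is_list_like


def set_existing_columns(df, rows, columns, value):
    """Hypothetical helper (not a pandas API): behaves like
    df.loc[rows, columns] = value, but raises KeyError instead of
    creating new columns for labels that do not exist yet."""
    # Treat a single label and a list-like of labels the same way.
    labels = list(columns) if is_list_like(columns) else [columns]
    missing = [label for label in labels if label not in df.columns]
    if missing:
        raise KeyError(f"column labels not found: {missing}")
    df.loc[rows, columns] = value


data = {'CustomerID': [101, 102, 103, 104, 105],
        'Name': ['John', 'Alice', 'Bob', 'David', 'Mike'],
        'CreditScore': [650, 720, 710, 600, 750],
        'LoanAmount': [40000, 70000, 80000, 30000, 120000],
        'AccountType': ['Savings', 'Current', 'Current', 'Savings', 'Current']}
df = pd.DataFrame(data)
mask = (df['AccountType'] == "Current") & (df['CreditScore'] > 700)

# Passing the column name updates LoanAmount in place.
set_existing_columns(df, mask, 'LoanAmount', 90000)

# Passing the column's values is rejected instead of silently adding columns.
try:
    set_existing_columns(df, mask, df['LoanAmount'], 90000)
    rejected = False
except KeyError:
    rejected = True
print(rejected)  # True
```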

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2e
python : 3.12.7.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_India.1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 75.1.0
pip : 24.2
Cython : None
pytest : 7.4.4
hypothesis : None
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.1
html5lib : None
pymysql : 1.4.6
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.27.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.3.7
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
gcsfs : None
matplotlib : 3.9.2
numba : 0.60.0
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.13.1
sqlalchemy : 2.0.34
tables : 3.10.1
tabulate : 0.9.0
xarray : 2023.6.0
xlrd : None
zstandard : 0.23.0
tzdata : 2023.3
qtpy : 2.4.1
pyqt5 : None

@Jeevacse07 added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Mar 4, 2025
@goutam-kul

Hello @Jeevacse07, this is not really a bug. In your code, instead of passing df['LoanAmount'] as the column selector, you need to pass the column's name (a string, or a list of names). For example:

df.loc[(df['AccountType'] == "Current") & (df['CreditScore'] > 700), ['LoanAmount']] = 90000
print(df)

Output:

   CustomerID   Name  CreditScore  LoanAmount AccountType
0         101   John          650       40000     Savings
1         102  Alice          720       90000     Current
2         103    Bob          710       90000     Current
3         104  David          600       30000     Savings
4         105   Mike          750       90000     Current
  • As you can see, only the LoanAmount column is updated with the new value.
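The same fix also works with a plain string label instead of a one-element list. A short check, reusing the data from the report, that the assignment adds no columns and only touches the matching rows of LoanAmount:

```python
import pandas as pd

data = {'CustomerID': [101, 102, 103, 104, 105],
        'Name': ['John', 'Alice', 'Bob', 'David', 'Mike'],
        'CreditScore': [650, 720, 710, 600, 750],
        'LoanAmount': [40000, 70000, 80000, 30000, 120000],
        'AccountType': ['Savings', 'Current', 'Current', 'Savings', 'Current']}
df = pd.DataFrame(data)
original_columns = df.columns.tolist()

mask = (df['AccountType'] == "Current") & (df['CreditScore'] > 700)

# A single column label (a plain string) selects the existing column.
df.loc[mask, 'LoanAmount'] = 90000

# No new columns were created; only the masked rows of LoanAmount changed.
print(df.columns.tolist() == original_columns)  # True
print(df['LoanAmount'].tolist())  # [40000, 90000, 90000, 30000, 90000]
```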

The column selector of .loc[] is label-based: it expects column names (a string, or a list of strings in this case), not the data stored in a column.
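One way to see that the selector is matched against labels: reading with df['LoanAmount'] as the column selector raises KeyError because no such labels exist, while assigning with the same selector enlarges the frame instead. A minimal sketch on a two-row frame, chosen here only for brevity:

```python
import pandas as pd

df = pd.DataFrame({'LoanAmount': [40000, 70000], 'Name': ['John', 'Alice']})

# Reading: 40000 and 70000 are looked up as column labels and not found.
try:
    df.loc[:, df['LoanAmount']]
    read_failed = False
except KeyError:
    read_failed = True

# Writing with the same selector does not raise; pandas adds one column
# per missing label (40000 and 70000) and then assigns to them.
df.loc[:, df['LoanAmount']] = 0
print(read_failed)  # True
print(len(df.columns))  # 4
```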

Passing df['LoanAmount'] gives .loc[] the values stored in that column, so they are interpreted as column labels: 40000, 70000, 80000, 30000 and 120000. None of those labels exist, so the assignment creates five new columns with those names. The row selector is the condition:
(df['AccountType'] == "Current") & (df['CreditScore'] > 700)

so 90000 is written into the new columns only in the rows where that condition is True; their other rows are NaN, and LoanAmount itself is left unchanged.
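To see the ghost columns directly, this sketch reruns the original assignment on the data from the report and inspects what it added. Dropping the extra columns afterwards is just one possible cleanup, not something prescribed in this thread.

```python
import pandas as pd

data = {'CustomerID': [101, 102, 103, 104, 105],
        'Name': ['John', 'Alice', 'Bob', 'David', 'Mike'],
        'CreditScore': [650, 720, 710, 600, 750],
        'LoanAmount': [40000, 70000, 80000, 30000, 120000],
        'AccountType': ['Savings', 'Current', 'Current', 'Savings', 'Current']}
df = pd.DataFrame(data)
original_columns = df.columns.tolist()
mask = (df['AccountType'] == "Current") & (df['CreditScore'] > 700)

# The original (mistaken) call: the column selector is the *values* of
# LoanAmount, so pandas treats 40000, 70000, ... as column labels.
df.loc[mask, df['LoanAmount']] = 90000

# Columns that did not exist before the assignment.
ghost = [c for c in df.columns if c not in original_columns]
ghost_values = df[ghost].copy()
print(ghost_values)  # 90000 in the masked rows, NaN elsewhere

# LoanAmount itself is untouched.
print(df['LoanAmount'].tolist())  # [40000, 70000, 80000, 30000, 120000]

# One possible cleanup: drop the ghost columns again.
df = df.drop(columns=ghost)
```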

Hope this helps!

@rhshadrach
Member

Agreed @goutam-kul - this is expected behavior. Closing.

@rhshadrach closed this as not planned (Won't fix, can't repro, duplicate, stale) on Mar 4, 2025
@rhshadrach added the Indexing (Related to indexing on series/frames, not to indexes themselves) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Mar 4, 2025