Warning when sending only some column names to profile report #194

Closed
adamrossnelson opened this issue Jun 27, 2019 · 2 comments
Labels
bug 🐛 Something isn't working

Comments

@adamrossnelson

Describe the bug
I was interested in hand-selecting a list of columns to report. Example:

from pathlib import Path

import pandas as pd
import pandas_profiling

# if __name__ == "__main__":
df = pd.read_csv(
    "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
)

df[['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch']].profile_report()

The code above returns the profile, but it also emits a warning:

/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py:3781: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)

To Reproduce
The code above reproduces the warning using the Titanic dataset. Other code that also triggers it:

df[list(df.columns[0:8])].profile_report(title="Sliced Titanic Dataset")

Version information:

absl-py==0.3.0
alabaster==0.7.10
anaconda-client==1.6.14
anaconda-navigator==1.8.7
anaconda-project==0.8.2
appnope==0.1.0
appscript==1.0.1
asn1crypto==0.24.0
astor==0.7.1
astroid==1.6.3
astropy==3.0.2
atomicwrites==1.3.0
attrs==18.1.0
Babel==2.5.3
backcall==0.1.0
backports.shutil-get-terminal-size==1.0.0
beautifulsoup4==4.6.3
bitarray==0.8.1
bkcharts==0.2
blaze==0.11.3
bleach==2.1.3
bokeh==0.12.16
boto==2.48.0
boto3==1.9.78
botocore==1.12.78
Bottleneck==1.2.1
bz2file==0.98
certifi==2018.4.16
cffi==1.11.5
chardet==3.0.4
click==6.7
cloudpickle==0.5.3
clyent==1.2.2
colorama==0.3.9
conda==4.6.14
conda-build==3.10.5
conda-verify==2.0.0
confuse==1.0.0
contextlib2==0.5.5
contractions==0.0.16
cryptography==2.2.2
cycler==0.10.0
Cython==0.28.2
cytoolz==0.9.0.1
dask==0.17.5
datashape==0.5.4
decorator==4.3.0
Deprecated==1.2.4
distributed==1.21.8
docutils==0.14
entrypoints==0.2.3
et-xmlfile==1.0.1
fastcache==1.0.2
fbprophet==0.3.post2
filelock==3.0.4
flair==0.4.0
Flask==1.0.2
Flask-Cors==3.0.4
future==0.17.1
gast==0.2.0
gensim==3.4.0
gevent==1.3.0
glob2==0.6
gmpy2==2.0.8
graphviz==0.8.4
greenlet==0.4.13
grpcio==1.13.0
h5py==2.7.1
heapdict==1.0.0
html5lib==1.0.1
htmlmin==0.1.12
hyperopt==0.1.1
idna==2.6
imageio==2.3.0
imagesize==1.0.0
importlib-metadata==0.18
inflect==1.0.1
ipykernel==4.8.2
ipython==6.5.0
ipython-genutils==0.2.0
ipywidgets==7.2.1
isort==4.3.4
itsdangerous==0.24
jdcal==1.4
jedi==0.12.0
Jinja2==2.10
jmespath==0.9.3
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==5.2.0
jupyter-core==4.4.0
jupyterlab==0.32.1
jupyterlab-launcher==0.10.5
Keras==2.2.2
Keras-Applications==1.0.4
Keras-Preprocessing==1.0.2
kiwisolver==1.0.1
lazy-object-proxy==1.3.1
llvmlite==0.29.0
locket==0.2.0
lxml==4.2.1
Markdown==2.6.11
MarkupSafe==1.0
matplotlib==3.0.0
mccabe==0.6.1
missingno==0.4.1
mistune==0.8.3
mkl-fft==1.0.0
mkl-random==1.0.1
more-itertools==4.1.0
mpld3==0.3
mpmath==1.0.0
msgpack-python==0.5.6
multipledispatch==0.5.0
mysql-connector-python==8.0.12
navigator-updater==0.2.1
nbconvert==5.3.1
nbformat==4.4.0
networkx==2.1
nltk==3.3
nose==1.3.7
notebook==5.5.0
numba==0.44.1
numexpr==2.6.5
numpy==1.14.3
numpydoc==0.8.0
odo==0.5.1
olefile==0.45.1
openpyxl==2.5.3
packaging==17.1
pandas==0.23.4
pandas-datareader==0.7.0
pandas-profiling==2.0.3
pandocfilters==1.4.2
parso==0.2.0
partd==0.3.8
path.py==11.0.1
pathlib2==2.3.2
patsy==0.5.0
pep8==1.7.1
pexpect==4.6.0
phik==0.9.8
pickleshare==0.7.4
Pillow==5.2.0
pkginfo==1.4.2
pluggy==0.12.0
ply==3.11
prompt-toolkit==1.0.15
protobuf==3.6.0
psutil==5.4.5
psycopg2==2.7.5
ptyprocess==0.5.2
py==1.5.3
pycodestyle==2.4.0
pycosat==0.6.3
pycparser==2.18
pycrypto==2.6.1
pycurl==7.43.0.1
pyflakes==1.6.0
Pygments==2.2.0
pylint==1.8.4
pymongo==3.7.2
pyodbc==4.0.23
pyOpenSSL==18.0.0
pyparsing==2.2.0
PyPDF2==1.26.0
PySocks==1.6.8
pystan==2.18.0.0
pytest==4.6.3
pytest-arraydiff==0.2
pytest-astropy==0.3.0
pytest-doctestplus==0.1.3
pytest-openfiles==0.3.0
pytest-pylint==0.14.0
pytest-remotedata==0.2.1
python-dateutil==2.7.3
pytorch-pretrained-bert==0.3.0
pytz==2018.4
PyWavelets==0.5.2
PyYAML==3.12
pyzmq==17.0.0
QtAwesome==0.4.4
qtconsole==4.3.1
QtPy==1.4.1
regex==2018.8.29
requests==2.19.1
rope==0.10.7
ruamel-yaml==0.15.35
s3transfer==0.1.13
scikit-image==0.13.1
scikit-learn==0.19.1
scipy==1.1.0
seaborn==0.8.1
segtok==1.5.7
selenium==3.14.0
Send2Trash==1.5.0
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.11.0
sklearn==0.0
smart-open==1.7.1
snowballstemmer==1.2.1
sortedcollections==0.6.1
sortedcontainers==1.5.10
Sphinx==1.7.4
sphinxcontrib-websupport==1.0.1
spyder==3.2.8
SQLAlchemy==1.2.7
sqlitedict==1.6.0
stata-kernel==1.5.2
statsmodels==0.9.0
sympy==1.1.1
tables==3.4.3
tblib==1.3.2
tensorboard==1.9.0
tensorflow==1.9.0
termcolor==1.1.0
terminado==0.8.1
testpath==0.3.1
toolz==0.9.0
torch==1.0.0
tornado==5.0.2
tqdm==4.26.0
traitlets==4.3.2
typing==3.6.4
unicodecsv==0.14.1
urllib3==1.22
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.14.1
widgetsnbextension==3.2.1
wrapt==1.10.11
xlrd==1.1.0
XlsxWriter==1.0.4
xlwings==0.11.8
xlwt==1.2.0
zict==0.1.3
zipp==0.5.1

Additional context
For my purposes this code produced the desired output without throwing a warning:

df.iloc[:,0:8].profile_report(title="Sliced Titanic Dataset")
@adamrossnelson adamrossnelson added the bug 🐛 Something isn't working label Jun 27, 2019
@sbrugman
Collaborator

Thank you for reporting this. As I understand it, pandas does not recommend chaining with indexing.

Reference:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#why-does-assignment-fail-when-using-chained-indexing
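
A possible workaround, sketched under assumptions rather than taken from the pandas-profiling documentation: take an explicit copy of the column selection before passing it on, so that later in-place changes apply to an independent DataFrame. The small inline frame and the in-place rename are illustrative stand-ins (the rename mirrors the call shown in the warning's traceback). The commented-out `profile_report` line is the call from this issue.

```python
import pandas as pd

# Tiny stand-in for the Titanic data; the values are illustrative only.
df = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Survived": [0, 1, 1],
        "Pclass": [3, 1, 3],
        "Name": ["A", "B", "C"],
    }
)

columns = ["PassengerId", "Survived", "Pclass"]

# .copy() makes the selection an independent DataFrame, so pandas no longer
# treats later in-place changes as writes to a slice of df.
subset = df[columns].copy()

# Stand-in for the in-place rename seen in the traceback.
subset.rename(columns={"Pclass": "TicketClass"}, inplace=True)

print(list(subset.columns))
print(list(df.columns))  # the original frame is untouched

# With pandas-profiling installed, the report call would then be:
# subset.profile_report(title="Sliced Titanic Dataset")
```

Whether this silences the warning in a given pandas-profiling version is an assumption worth checking locally; the `.iloc` form reported above remains the confirmed alternative.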

@adamrossnelson
Author

I thought this would likely be a pandas issue, not a pandas-profiling issue.

Happy to have reported it. Nice to have it documented for others who might have thought to try a similar strategy.

Thanks for a great package. This is great work.

Closing.
