docs(introduction): intro and install improvements (#979)

* overall improvements to the docs initial page
* Adding details about missing report sections, ways to consume it, tweaking and uniformizing language
* details about file and image analysis
* installations: update install instructions for widgets version
ydataai · May 15, 2022 · c31b627
1 parent 188e02b

Showing 2 changed files with 43 additions and 37 deletions.
diff --git a/docsrc/source/pages/installation.rst b/docsrc/source/pages/installation.rst
@@ -17,22 +17,22 @@ Using pip
   :alt: PyPi Version
   :target: https://pypi.org/project/pandas-profiling/
 
-You can install using the pip package manager by running
+You can install using the ``pip`` package manager by running:
 
 .. code-block:: console
 
     pip install -U pandas-profiling[notebook]
     jupyter nbextension enable --py widgetsnbextension
 
-If you are in a notebook (locally, at LambdaLabs, on Google Colab or Kaggle), you can run:
+If you are in a notebook (locally or on LambdaLabs, Google Colab or Kaggle), you can run:
 
 .. code-block::
 
     import sys
     !{sys.executable} -m pip install -U pandas-profiling[notebook]
     !jupyter nbextension enable --py widgetsnbextension
 
-You may have to restart the kernel or runtime.
+You may have to restart the kernel or runtime for the package to work.
 
 Using conda
 -----------
@@ -45,53 +45,49 @@ Using conda
   :alt: Conda Version
   :target: https://anaconda.org/conda-forge/pandas-profiling
 
-You can install using the conda package manager by running
+A new conda environment containing the module can be created via: 
 
 .. code-block:: console
 
     conda create -n pandas-profiling
     conda activate pandas-profiling
     conda install -c conda-forge pandas-profiling
 
-This creates a new conda environment containing the module.
-
 .. hint::
 
-        Don't forget to specify the ``conda-forge`` channel. Omitting it won't result in an error, as an outdated package lives on the main channel. See `frequent issues <Support.rst#frequent-issues>`_
-
-Jupyter notebook/lab
---------------------
+        Don't forget to specify the ``conda-forge`` channel. Omitting it won't result in an error, as an outdated package lives on the ``main`` channel and will be installed. See `Frequent issues <Support.rst#frequent-issues>`_ for details. 
 
-For the Jupyter widgets extension to work, which is used for Progress Bars and the widget interface, you might need to activate the extensions. Installing with conda will enable the extension for you for Jupyter Notebooks (not lab).
+Widgets in Jupyter Notebook/Lab
+-------------------------------
 
-For Jupyter notebooks:
+For the Jupyter widgets extension to work (used for progress bars and the interactive widget-based report), you might need to activate the corresponding extensions. 
+This can be done via ``pip``: 
 
 .. code-block::
 
-  jupyter nbextension enable --py widgetsnbextension
+  pip install ipywidgets
 
-For Jupyter lab:
+Or via ``conda``: 
 
 .. code-block::
 
-  conda install -c conda-forge nodejs
-  jupyter labextension install @jupyter-widgets/jupyterlab-manager
-
+  conda install -c conda-forge ipywidgets
 
-More information is available at the `ipywidgets documentation <https://ipywidgets.readthedocs.io/en/stable/user_install.html>`_.
+In most cases, this will also automatically configure Jupyter Notebook and Jupyter Lab (``>=3.0``). For older versions of both or in more complex
+environment configurations, refer to `the official ipywidgets documentation <https://ipywidgets.readthedocs.io/en/stable/user_install.html>`_.
 
 From source
 -----------
 
 Download the source code by cloning the repository or by pressing `'Download ZIP' <https://github.com/ydataai/pandas-profiling/archive/master.zip>`_ on this page.
-Install by navigating to the proper directory and running
+Install it by navigating to the uncompressed directory and running:
 
 .. code-block:: console
 
     python setup.py install
 
-This can also be done in one line:
+This can also be done via the following one-liner: 
 
 .. code-block:: console
 
-    pip install https://github.com/ydataai/pandas-profiling/archive/master.zip
+    pip install https://github.com/ydataai/pandas-profiling/archive/master.zip
diff --git a/docsrc/source/pages/introduction.rst b/docsrc/source/pages/introduction.rst
@@ -25,19 +25,29 @@ Introduction
   :alt: Code style: black
   :target: https://github.com/python/black
 
-Generates profile reports from a pandas ``DataFrame``.
-The pandas ``df.describe()`` function is great but a little basic for serious exploratory data analysis.
-``pandas_profiling`` extends the pandas DataFrame with ``df.profile_report()`` for quick data analysis.
-
-For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:
-
-* **Type inference**: detect the types of columns in a dataframe.
-* **Essentials**: type, unique values, missing values
-* **Quantile statistics** like minimum value, Q1, median, Q3, maximum, range, interquartile range
-* **Descriptive statistics** like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
-* **Most frequent values**
-* **Histograms**
-* **Correlations** highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
-* **Missing values** matrix, count, heatmap and dendrogram of missing values
-* **Duplicate rows** Lists the most occurring duplicate rows
-* **Text analysis** learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
+``pandas-profiling`` generates profile reports from a pandas ``DataFrame``.
+The pandas ``df.describe()`` function is handy yet a little basic for exploratory data analysis. ``pandas_profiling`` extends pandas DataFrame with ``df.profile_report()``,  
+which automatically generates a standardized univariate and multivariate report for data understanding. 
+
+For each column, the following information (whenever relevant for the column type) is presented in an interactive HTML report:
+
+* **Type inference**: detect the types of columns in a ``DataFrame``
+* **Essentials**: type, unique values, indication of missing values
+* **Quantile statistics**: minimum value, Q1, median, Q3, maximum, range, interquartile range
+* **Descriptive statistics**: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
+* **Most frequent and extreme values**
+* **Histograms**: categorical and numerical
+* **Correlations**: high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér's V, Phik)
+* **Missing values**: through counts, matrix, heatmap and dendrograms
+* **Duplicate rows**: list of the most common duplicated rows
+* **Text analysis**: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
+* **File and Image analysis**: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
+
+
+The report contains three additional sections: 
+
+* **Overview**: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
+* **Warnings**: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
+* **Reproduction**: technical details about the analysis (time, version and configuration)
+
+The package can be used via code but also directly as a CLI utility. The generated interactive report can be consumed and shared as regular HTML or embedded in an interactive way inside Jupyter Notebooks.
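
To make the "Quantile statistics" and missing-value entries in the new introduction concrete, here is a minimal standard-library sketch of those quantities for a single numeric column. It is an illustration only, not how pandas-profiling computes its report: the helper name ``quantile_summary`` and the sample column are hypothetical, and quantile conventions differ between tools, so the figures a real report shows may not match exactly.

```python
import statistics
from typing import Optional, Sequence


def quantile_summary(column: Sequence[Optional[float]]) -> dict:
    """Summarize one numeric column; None marks a missing value.

    Hypothetical helper for illustration, not part of pandas-profiling.
    """
    present = sorted(v for v in column if v is not None)
    # method="inclusive" treats the observed minimum and maximum as the
    # 0th and 100th percentiles and interpolates linearly in between.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "n_missing": len(column) - len(present),
        "minimum": present[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": present[-1],
        "range": present[-1] - present[0],
        "iqr": q3 - q1,
    }


# Made-up sample column with two missing entries.
summary = quantile_summary([7, 1, None, 3, 9, 5, 2, None, 8, 4, 6])
for name, value in summary.items():
    print(f"{name}: {value}")
```

The helper needs at least two non-missing values, because ``statistics.quantiles`` raises ``StatisticsError`` on fewer data points.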