# what do we learn when _we look at the distributions on our system_
<!-- TEASER_END -->

[`importlib.metadata`](https://docs.python.org/3/library/importlib.metadata.html) is an addition to Python 3.8 that makes it easier to explore the packages installed in your current python environment. we are going to load this data into pandas and see what we can learn from it. this approach is a fabulous way to generate real data for a demonstration.
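
before loading everything into pandas, it helps to look at the per-package API. the sketch below is a minimal example, assuming Python 3.8 or newer: `describe` is a hypothetical helper, while `distribution`, `PackageNotFoundError`, and the `version`, `requires`, `files`, and `metadata` attributes come from `importlib.metadata` itself.

```python
import importlib.metadata


def describe(name):
    """Return a few fields for one installed distribution, or None if it is not installed."""
    try:
        dist = importlib.metadata.distribution(name)
    except importlib.metadata.PackageNotFoundError:
        return None
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        # requires is None when the package declares no dependencies
        "requires": dist.requires or [],
        # files is None when the file listing (RECORD or SOURCES.txt) is missing
        "n_files": len(dist.files or []),
    }


# prints a small dict if pip is installed here, otherwise None
print(describe("pip"))
```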

    # `operator` provides `attrgetter`, which is used when widening the data below
    import importlib.metadata, operator, pandas, toolz

create a series from `importlib.metadata.distributions()`, keyed by each package's name. each distribution contains information about one installed package.

    distributions = pandas.Series({x.metadata.get("Name"): x for x in importlib.metadata.distributions()})
    distributions.sample(2)

    sphinxcontrib-bibtex    <importlib.metadata.PathDistribution object at...
    pyasn1-modules          <importlib.metadata.PathDistribution object at...
    dtype: object
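
the dict comprehension keys each distribution on its `Name`, so if the same project shows up more than once on `sys.path` (for example in both a user site-packages and the environment), the later entry silently replaces the earlier one and the series holds only one of them. a small stdlib sketch to check for that; `duplicate_names` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `.metadata` attribute of a real distribution.

```python
from collections import Counter
from types import SimpleNamespace


def duplicate_names(dists):
    """Return {name: count} for every Name that occurs more than once in dists."""
    counts = Counter(d.metadata.get("Name") for d in dists)
    return {name: n for name, n in counts.items() if n > 1}


# stand-ins for importlib.metadata distributions, carrying only a metadata mapping
fake = [SimpleNamespace(metadata={"Name": n}) for n in ("six", "attrs", "six")]
print(duplicate_names(fake))  # {'six': 2}
```

on a real environment, `duplicate_names(importlib.metadata.distributions())` comes back empty when every project is installed exactly once.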

the distributions can be expanded into a tidy dataframe with the following `features`.

    features = ['files', 'version', 'requires', 'metadata']

next, `distributions.apply` widens the series into that tidy dataframe, with one column per feature.

    df = distributions.apply(
        toolz.compose_left(operator.attrgetter(*features), pandas.Series)
    ).rename(columns=dict(zip(range(len(features)), features)))
    df.sample(2)

| | files | version | requires | metadata |
|---|---|---|---|---|
| blinker | [blinker-1.4.dist-info/AUTHORS, blinker-1.4.di... | 1.4 | | [Metadata-Version, Name, Version, Summary, Hom... |
| conda-package-handling | [../../../bin/cph, conda_package_handling-1.7.... | 1.7.3 | [six] | [Metadata-Version, Name, Version, Summary, Hom... |

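with several attribute names, `operator.attrgetter` returns a tuple in the order the names were given, which is why the cell above gets columns numbered `0` to `3` and has to `rename` them. a sketch of an alternative that pairs names and values up front; `to_record` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in with the same four attributes as a real distribution.

```python
import operator
from types import SimpleNamespace

features = ["files", "version", "requires", "metadata"]

# one getter that fetches all four attributes at once, returned as a tuple
get_features = operator.attrgetter(*features)


def to_record(dist):
    """Pair each feature name with its value, so the columns come out already named."""
    return dict(zip(features, get_features(dist)))


fake = SimpleNamespace(files=None, version="1.0", requires=["six"], metadata={"Name": "demo"})
print(get_features(fake))          # (None, '1.0', ['six'], {'Name': 'demo'})
print(to_record(fake)["version"])  # 1.0
```

with real distributions, passing `{d.metadata["Name"]: to_record(d) for d in importlib.metadata.distributions()}` to `pandas.DataFrame.from_dict(..., orient="index")` should give an equivalent frame without the rename step.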

there are still some goodies nested in the `metadata` column of this dataframe. in the next cell, we build a wider dataframe that combines the `distribution` details with the package `metadata` fields.

    df = pandas.concat(dict(
        distribution=df,
        metadata=df["metadata"].apply(
            toolz.compose_left(dict, pandas.Series)
        )
    ), axis=1)
    df.sample(2)

| | (distribution, files) | (distribution, version) | (distribution, requires) | (distribution, metadata) | (metadata, Metadata-Version) | (metadata, Name) | (metadata, Version) | (metadata, Summary) | (metadata, Home-page) | (metadata, Author) | ... | (metadata, Requires-Dist) | (metadata, Project-URL) | (metadata, Description-Content-Type) | (metadata, Maintainer) | (metadata, Maintainer-email) | (metadata, License-File) | (metadata, Provides-Extra) | (metadata, Description) | (metadata, Download-URL) | (metadata, Provides) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sphinx-book-theme | [sphinx_book_theme-0.1.4.dist-info/INSTALLER, ... | 0.1.4 | [beautifulsoup4 (<5,>=4.6.1), click (~=7.1), d... | [Metadata-Version, Name, Version, Summary, Hom... | 2.1 | sphinx-book-theme | 0.1.4 | Jupyter Book: Create an online book with Jupyt... | https://jupyterbook.org/ | Project Jupyter Contributors | ... | beautifulsoup4 (<5,>=4.6.1) | Documentation, https://jupyterbook.org | text/markdown | | | | code_style | | | |
| html5lib | [html5lib-1.1.dist-info/AUTHORS.rst, html5lib-... | 1.1 | [six (>=1.9), webencodings, genshi ; extra == ... | [Metadata-Version, Name, Version, Summary, Hom... | 2.1 | html5lib | 1.1 | HTML parser based on the WHATWG HTML specifica... | https://github.com/html5lib/html5lib-python | | ... | six (>=1.9) | | | James Graham | james@hoppipolla.co.uk | | all | | | |

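`dict` on a metadata object keeps only the first value of a repeated field such as `Requires-Dist` or `Classifier`. that is why the `Requires-Dist` column above shows a single requirement for sphinx-book-theme while its `requires` list holds several. a sketch that keeps every value, assuming the metadata object behaves like `email.message.Message` (it has `keys()` and `get_all()`); `metadata_to_dict` is a hypothetical helper, and the hand-built message stands in for real package metadata.

```python
import email.message


def metadata_to_dict(md):
    """Flatten a metadata message, turning repeated fields into lists."""
    result = {}
    # keys() lists a field once per occurrence; dict.fromkeys de-duplicates while keeping order
    for key in dict.fromkeys(md.keys()):
        values = md.get_all(key)
        result[key] = values[0] if len(values) == 1 else values
    return result


# assigning the same header twice appends a second value instead of replacing the first
msg = email.message.Message()
msg["Name"] = "demo"
msg["Requires-Dist"] = "six"
msg["Requires-Dist"] = "attrs"
print(dict(msg))              # {'Name': 'demo', 'Requires-Dist': 'six'}
print(metadata_to_dict(msg))  # {'Name': 'demo', 'Requires-Dist': ['six', 'attrs']}
```

swapping `dict` for `metadata_to_dict` inside the `toolz.compose_left` call should keep the repeated fields as lists. the trade-off is that a column like `Requires-Dist` then mixes strings and lists, depending on how many values each package declares.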

## what can we learn about our environment?

### how many distributions does it contain?

    F"""in this environment, there are {len(df)} distributions installed."""

'in this environment, there are 363 distributions installed.'

### how many files are in each distribution?

    df["distribution"]["files"].apply(lambda x: len(x or [])).sort_values(ascending=False).iloc[:10].to_frame().T

| | mkdocs-material | bokeh | jedi | pandas | notebook | mypy | panel | numpy | Faker | holoviews |
|---|---|---|---|---|---|---|---|---|---|---|
| files | 8144 | 2244 | 1763 | 1661 | 1516 | 1511 | 1350 | 1194 | 1017 | 951 |

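the cell above counts files with `len(x or [])` because `files` is `None` when the file listing (RECORD or SOURCES.txt) is missing. a stdlib-only sketch of the same top-10 ranking, swapping `sort_values` for `heapq.nlargest`; `largest_by_file_count` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real distributions.

```python
import heapq
from types import SimpleNamespace


def largest_by_file_count(dists, n=10):
    """Return the n (name, file_count) pairs with the most files, largest first."""
    # a missing file listing (files is None) counts as zero files
    counts = ((d.metadata["Name"], len(d.files or [])) for d in dists)
    return heapq.nlargest(n, counts, key=lambda pair: pair[1])


fake = [
    SimpleNamespace(metadata={"Name": "small"}, files=["a.py"]),
    SimpleNamespace(metadata={"Name": "none"}, files=None),
    SimpleNamespace(metadata={"Name": "big"}, files=["a.py", "b.py", "c.py"]),
]
print(largest_by_file_count(fake, n=2))  # [('big', 3), ('small', 1)]
```

on a real environment, `largest_by_file_count(importlib.metadata.distributions())` walks every distribution on `sys.path`, so unlike the series it does not collapse duplicate names.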

### what is the distribution of licenses?

    df["metadata"]["License"].value_counts().iloc[:10].to_frame().T

| | MIT | BSD | BSD-3-Clause | UNKNOWN | MIT License | Apache 2.0 | ISC | BSD License | Apache License, Version 2.0 | Apache Software License |
|---|---|---|---|---|---|---|---|---|---|---|
| License | 100 | 62 | 37 | 36 | 17 | 13 | 6 | 6 | 5 | 5 |

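the `License` field is free text, so the table above splits one license across several spellings: `MIT` and `MIT License`, `BSD` and `BSD License`, `Apache 2.0` and `Apache License, Version 2.0`. a sketch that folds such aliases together before counting, using `collections.Counter` in place of `value_counts`. `ALIASES` and `license_counts` are hypothetical, and the alias table is an assumption to adjust for your own environment; `Apache Software License` is deliberately left out because the name alone does not say which Apache version is meant.

```python
from collections import Counter

# hypothetical alias table, keyed by lowercased spelling
ALIASES = {
    "mit license": "MIT",
    "bsd license": "BSD",
    "apache 2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
}


def license_counts(licenses):
    """Count license strings after folding known aliases together; skips empty values."""
    counts = Counter()
    for raw in licenses:
        if not raw or not raw.strip():
            continue
        text = raw.strip()
        counts[ALIASES.get(text.lower(), text)] += 1
    return counts


sample = ["MIT", "MIT License", "BSD", None, "Apache 2.0", "Apache License, Version 2.0", "MIT"]
print(license_counts(sample).most_common(2))  # [('MIT', 3), ('Apache-2.0', 2)]
```

against the real environment, the input would be `d.metadata.get("License") for d in importlib.metadata.distributions()`.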

### who authored my packages?

    df["metadata"]["Author"].value_counts().iloc[:10].to_frame().T

| | Jupyter Development Team | wxyz contributors | Sébastien Eustace | Georg Brandl | IPython Development Team | Chris Sewell | Kenneth Reitz | Executable Book Project | Armin Ronacher | Thomas Kluyver |
|---|---|---|---|---|---|---|---|---|---|---|
| Author | 16 | 12 | 10 | 8 | 7 | 6 | 6 | 6 | 5 | 4 |

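counting the `Author` column misses packages that never fill it in. as I understand the `pyproject.toml` metadata rules (PEP 621), an author given with both a name and an email is written to `Author-email` as `Name <email>`, and `Author` is left unset. a sketch that falls back to the display names in `Author-email`; `author_names` is a hypothetical helper, and plain dicts stand in for real metadata objects (both support `.get`).

```python
from email.utils import getaddresses


def author_names(md):
    """Return author display names from a metadata mapping, falling back to Author-email."""
    author = (md.get("Author") or "").strip()
    if author:
        return [author]
    # Author-email may hold one or more comma-separated "Name <address>" entries
    raw = md.get("Author-email") or ""
    return [name for name, _address in getaddresses([raw]) if name]


print(author_names({"Author": "Armin Ronacher"}))  # ['Armin Ronacher']
print(author_names({"Author-email": "Jane Doe <jane@example.com>, John Roe <john@example.com>"}))
# ['Jane Doe', 'John Roe']
```

feeding it every distribution, e.g. `collections.Counter(name for d in importlib.metadata.distributions() for name in author_names(d.metadata))`, should give a fuller author ranking than `value_counts` on `Author` alone.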

## conclusion

there is a juicy dataset in your environment just waiting for you to explore. what will you find?