# Creating a DataFrame

We began to consider this sort of data in [Lists](../../04/1/Lists.html#other-lists), with the distances of the planets from our sun. Let's expand on this example with the data below, adding each planet's mass, density and surface gravity.

In [1]:
```python
planets_features = [
    'name',                # familiar name
    'solar_distance_km_6', # distance from sun: 10**6 km
    'mass_kg_24',          # absolute mass: 10**24 kg
    'density_kg_m3',       # density: kg/m**3
    'gravity_m_s2',        # gravity: m/s**2
]

planets_data = [
    ['Mercury', 57.9, 0.33, 5427.0, 3.7],
    ['Venus', 108.2, 4.87, 5243.0, 8.9],
    ['Earth', 149.6, 5.97, 5514.0, 9.8],
    ['Mars', 227.9, 0.642, 3933.0, 3.7],
    ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],
    ['Saturn', 1433.5, 568.0, 687.0, 9.0],
    ['Uranus', 2872.5, 86.8, 1271.0, 8.7],
    ['Neptune', 4495.1, 102.0, 1638.0, 11.0]
]

planets_data
```

```
[['Mercury', 57.9, 0.33, 5427.0, 3.7],
 ['Venus', 108.2, 4.87, 5243.0, 8.9],
 ['Earth', 149.6, 5.97, 5514.0, 9.8],
 ['Mars', 227.9, 0.642, 3933.0, 3.7],
 ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],
 ['Saturn', 1433.5, 568.0, 687.0, 9.0],
 ['Uranus', 2872.5, 86.8, 1271.0, 8.7],
 ['Neptune', 4495.1, 102.0, 1638.0, 11.0]]
```
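With plain lists, every lookup is positional: to read a value we need both the row and the offset within the row that holds the feature we want. A minimal sketch, repeating the data above so it runs on its own, that finds Earth's density by translating a feature name into an offset with `list.index`:

```python
planets_features = ['name', 'solar_distance_km_6', 'mass_kg_24',
                    'density_kg_m3', 'gravity_m_s2']

# The data from above, repeated so this snippet runs on its own.
planets_data = [
    ['Mercury', 57.9, 0.33, 5427.0, 3.7],
    ['Venus', 108.2, 4.87, 5243.0, 8.9],
    ['Earth', 149.6, 5.97, 5514.0, 9.8],
    ['Mars', 227.9, 0.642, 3933.0, 3.7],
    ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],
    ['Saturn', 1433.5, 568.0, 687.0, 9.0],
    ['Uranus', 2872.5, 86.8, 1271.0, 8.7],
    ['Neptune', 4495.1, 102.0, 1638.0, 11.0],
]

# Find Earth's row by checking the 'name' field, which sits at offset 0.
earth = next(row for row in planets_data if row[0] == 'Earth')

# Translate the feature name into its offset within each row.
density_offset = planets_features.index('density_kg_m3')

earth_density = earth[density_offset]
print(earth_density)  # prints 5514.0
```

This bookkeeping of names versus offsets is exactly what a `DataFrame` takes off our hands.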

Now let's *construct* a `DataFrame` for these data.

First, we'll have to ensure that the <a href="https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html" target="_blank" rel="noopener">pandas library is installed</a>.

Then, we can tell Python to make `pandas` available to us using an `import` statement. For example:

    import pandas

Having done so, the `DataFrame` type is available as `pandas.DataFrame`.

That is, unlike the built-in `list`, we refer to it "under" the name `pandas`, with a dot between the two names.

Alternatively, we could import only `DataFrame`, so that it's available simply as `DataFrame`, without the rigmarole:

    from pandas import DataFrame

However, we'll be using `pandas` a lot! And not *just* `DataFrame`. Following a common convention, we'll tell Python to assign the library module the name `pd`.
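These are just different names for one and the same thing. A minimal sketch, assuming pandas is installed, that imports `DataFrame` all three ways and checks with `is` that every name refers to the same object:

```python
import pandas                 # DataFrame is reached as pandas.DataFrame
from pandas import DataFrame  # DataFrame is reached directly
import pandas as pd           # the conventional alias: pd.DataFrame

# `is` tests identity: True means both names point at the very same object.
print(pandas.DataFrame is DataFrame)  # prints True
print(pd.DataFrame is DataFrame)      # prints True
print(pd is pandas)                   # prints True
```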

In [2]:
```python
import pandas as pd

planets = pd.DataFrame(planets_data)

planets
```

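|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| 2 | Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| 3 | Mars | 227.9 | 0.642 | 3933.0 | 3.7 |
| 4 | Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |
| 5 | Saturn | 1433.5 | 568.0 | 687.0 | 9.0 |
| 6 | Uranus | 2872.5 | 86.8 | 1271.0 | 8.7 |
| 7 | Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |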

Above, we've:

* imported the `pandas` library under the name `pd`
* constructed a new `DataFrame` from our list-of-lists data
* assigned to the `DataFrame` the name `planets`

And this presentation of our data is already looking more like a spreadsheet.

However, there's something odd about the above. We're now accustomed to numbering the elements of a sequence by their *offset* (0, 1, 2, 3, …), and that scheme works well enough for numbering our rows. But it isn't a useful scheme for labeling our columns: nothing about the label `3` tells us that the column holds densities. We'll make this data easier to manipulate, and avoid confusion about what its values represent, by defining meaningful column labels.
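To see the offset numbering on both axes, a minimal sketch that peeks ahead at the frame's `columns` and `index` attributes (covered shortly), repeating the data so it runs on its own:

```python
import pandas as pd

# The data from above, repeated so this snippet runs on its own.
planets_data = [
    ['Mercury', 57.9, 0.33, 5427.0, 3.7],
    ['Venus', 108.2, 4.87, 5243.0, 8.9],
    ['Earth', 149.6, 5.97, 5514.0, 9.8],
    ['Mars', 227.9, 0.642, 3933.0, 3.7],
    ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],
    ['Saturn', 1433.5, 568.0, 687.0, 9.0],
    ['Uranus', 2872.5, 86.8, 1271.0, 8.7],
    ['Neptune', 4495.1, 102.0, 1638.0, 11.0],
]

# Without column labels, pandas numbers the columns by offset, just like the rows.
unlabeled = pd.DataFrame(planets_data)

print(list(unlabeled.columns))  # prints [0, 1, 2, 3, 4]
print(list(unlabeled.index))    # prints [0, 1, 2, 3, 4, 5, 6, 7]

# Selecting the column labeled 3 works, but the label says nothing about densities.
print(unlabeled[3].tolist())
```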

In [3]:
```python
planets = pd.DataFrame(planets_data, columns=planets_features)

planets
```

|   | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| 2 | Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| 3 | Mars | 227.9 | 0.642 | 3933.0 | 3.7 |
| 4 | Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |
| 5 | Saturn | 1433.5 | 568.0 | 687.0 | 9.0 |
| 6 | Uranus | 2872.5 | 86.8 | 1271.0 | 8.7 |
| 7 | Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |

That's better!
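The `columns=` keyword isn't the only route. As a sketch of an alternative, assuming we had already built the unlabeled frame, we can assign the labels afterwards through the frame's `columns` attribute; the list must supply exactly one label per column, or pandas raises a `ValueError`:

```python
import pandas as pd

planets_features = ['name', 'solar_distance_km_6', 'mass_kg_24',
                    'density_kg_m3', 'gravity_m_s2']

# The data from above, repeated so this snippet runs on its own.
planets_data = [
    ['Mercury', 57.9, 0.33, 5427.0, 3.7],
    ['Venus', 108.2, 4.87, 5243.0, 8.9],
    ['Earth', 149.6, 5.97, 5514.0, 9.8],
    ['Mars', 227.9, 0.642, 3933.0, 3.7],
    ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],
    ['Saturn', 1433.5, 568.0, 687.0, 9.0],
    ['Uranus', 2872.5, 86.8, 1271.0, 8.7],
    ['Neptune', 4495.1, 102.0, 1638.0, 11.0],
]

# Build the frame first, with the default offset column labels...
planets = pd.DataFrame(planets_data)

# ...then replace those labels by assigning a list of the same length.
planets.columns = planets_features

print(list(planets.columns))
```

Passing `columns=` at construction is usually tidier; the assignment form is handy when the labels only become known later.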

As we've seen with the `list` (and the string), the `DataFrame` can be manipulated by functions and built-in operators. Moreover, like those types, it offers special-purpose functions that are *bound* to its type, known as *methods*, which are invoked with expressions of the form below:

    name_of_dataframe.name_of_method(argument0, argument1, ..., keyword0=value0, ...)
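To make the template concrete, a minimal sketch using two real `DataFrame` methods: `head`, which returns the first n rows, and `sort_values`, which returns a copy sorted by a column. It rebuilds `planets` so it runs on its own; both calls return a new `DataFrame` and leave `planets` unchanged. Selecting the `name` column with square brackets is only there to print compactly.

```python
import pandas as pd

planets_features = ['name', 'solar_distance_km_6', 'mass_kg_24',
                    'density_kg_m3', 'gravity_m_s2']

# The data from above, repeated so this snippet runs on its own.
planets_data = [
    ['Mercury', 57.9, 0.33, 5427.0, 3.7],
    ['Venus', 108.2, 4.87, 5243.0, 8.9],
    ['Earth', 149.6, 5.97, 5514.0, 9.8],
    ['Mars', 227.9, 0.642, 3933.0, 3.7],
    ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],
    ['Saturn', 1433.5, 568.0, 687.0, 9.0],
    ['Uranus', 2872.5, 86.8, 1271.0, 8.7],
    ['Neptune', 4495.1, 102.0, 1638.0, 11.0],
]
planets = pd.DataFrame(planets_data, columns=planets_features)

# A method call with one positional argument: the first 3 rows.
first_three = planets.head(3)

# A positional argument plus a keyword argument: most massive first.
by_mass = planets.sort_values('mass_kg_24', ascending=False)

print(first_three['name'].tolist())  # prints ['Mercury', 'Venus', 'Earth']
print(by_mass['name'].tolist())
# prints ['Jupiter', 'Saturn', 'Neptune', 'Uranus', 'Earth', 'Venus', 'Mars', 'Mercury']
```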

And, similar to methods, there are *attributes* and *properties*. These are values which are similarly bound to the `DataFrame`, but which need not be called:

    name_of_dataframe.name_of_property
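The difference shows up in whether there is anything to call. A minimal sketch, rebuilding `planets` so it runs on its own, that reads the `columns` and `index` attributes without parentheses, and uses Python's built-in `callable` to contrast an attribute with a method named without parentheses:

```python
import pandas as pd

planets_features = ['name', 'solar_distance_km_6', 'mass_kg_24',
                    'density_kg_m3', 'gravity_m_s2']

# The data from above, repeated so this snippet runs on its own.
planets_data = [
    ['Mercury', 57.9, 0.33, 5427.0, 3.7],
    ['Venus', 108.2, 4.87, 5243.0, 8.9],
    ['Earth', 149.6, 5.97, 5514.0, 9.8],
    ['Mars', 227.9, 0.642, 3933.0, 3.7],
    ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],
    ['Saturn', 1433.5, 568.0, 687.0, 9.0],
    ['Uranus', 2872.5, 86.8, 1271.0, 8.7],
    ['Neptune', 4495.1, 102.0, 1638.0, 11.0],
]
planets = pd.DataFrame(planets_data, columns=planets_features)

# Attributes are read without parentheses; no call takes place.
print(list(planets.columns))
print(planets.index)

# A method named without parentheses is merely the bound method object...
print(callable(planets.head))     # prints True
# ...whereas an attribute such as columns is already the value itself.
print(callable(planets.columns))  # prints False
```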

Now we're ready to explore the dimensions of our data.