# Accessing rows

The second dimension of our data consists of its rows or *individuals*.

As with the columns, `pandas` has constructed an index for our rows. By default, this is the familiar `0, 1, 2, …`, represented by the `RangeIndex` type.

In [10]:
planets.index

RangeIndex(start=0, stop=8, step=1)

And, as with the `list`, we can use the built-in function `len` to see that there are eight planets.

In [11]:
len(planets)

8

We can also *slice* the `DataFrame`, for example to extract only its first three rows.

In [12]:
planets[:3]

| | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| 2 | Earth | 149.6 | 5.97 | 5514.0 | 9.8 |


Above, our slice has constructed a new `DataFrame`, consisting of only the data for the first three planets.

**However**: you *can't* access *individual* rows or elements in the same manner as with a `list`:

In [13]:
planets[2]

KeyError: 2

After all, the `DataFrame` is a more complex structure than the `list` – the above reference to the offset `2` was treated as a reference to a column!

Indeed, this is an alternate syntax *for extracting columns*, which is useful and even necessary when columns have complex names. (A complex name is one that is not a valid Python identifier, for example because it contains spaces, quotes, or brackets.)

In [14]:
planets['name']

0    Mercury
1      Venus
2      Earth
3       Mars
4    Jupiter
5     Saturn
6     Uranus
7    Neptune
Name: name, dtype: object
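As a minimal sketch of why the bracket syntax matters, assume a small hypothetical `DataFrame` named `demo` with a column whose name contains a space. (`'solar distance'` is an illustrative name, not a column of our data set.) Such a column can't be reached as an attribute, but bracket look-up works. The same sketch shows that an integer in brackets is looked up among the column labels:

```python
import pandas as pd

# Hypothetical two-row frame; 'solar distance' is an illustrative column name
demo = pd.DataFrame({
    'name': ['Mercury', 'Venus'],
    'solar distance': [57.9, 108.2],
})

# `demo.solar distance` would be a SyntaxError, since names can't contain spaces
distances = demo['solar distance']  # bracket look-up accepts any column label

# An integer in brackets is treated as a column label, not as a row offset
try:
    demo[1]
    found = True
except KeyError:
    found = False  # no column is labelled 1
```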

To access individual rows, `DataFrame` instead offers the properties `iloc` and `loc`, which may themselves be queried with a syntax similar to that of retrieving elements from a `list`.

`iloc` is intended for *integer-location* based look-up of elements by their position in the index.

We can reproduce our slice explicitly using `iloc`:

In [15]:
planets.iloc[:3]

| | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| 2 | Earth | 149.6 | 5.97 | 5514.0 | 9.8 |


We can also do something new – construct a new `DataFrame` consisting of only the rows at the specified offsets.

In [16]:
planets.iloc[[0, 7]]

| | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 7 | Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |


Above, we've passed a `list` – `[0, 7]` – to our `iloc`-based look-up, indicating that we are interested in selecting the rows at those offsets.

Note that in the new `DataFrame`, the planets' row index values have been preserved. This is highly useful – indeed, Neptune is still the same planet as it was before. But `iloc` is **strictly** intended for offsets, as in a `list`. If we were to repeat our selection of `[0, 7]` on this new `DataFrame`, it would fail, because there is no longer a row at offset `7`. Rather, the offsets of these two planets in the new `DataFrame` are `[0, 1]`, so `iloc` will now find Neptune at offset `1`.

In [17]:
bookends = planets.iloc[[0, 7]]

bookends.iloc[1]

name                   Neptune
solar_distance_km_6     4495.1
mass_kg_24                 102
density_kg_m3             1638
gravity_m_s2                11
Name: 7, dtype: object
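To make the distinction between preserved labels and offsets concrete, here is a minimal sketch. It assumes a cut-down `planets` frame holding only the `name` column (rebuilt from the output above), so that it runs on its own:

```python
import pandas as pd

# Cut-down stand-in for the planets data: names only, default RangeIndex
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars',
             'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
})

bookends = planets.iloc[[0, 7]]
labels = list(bookends.index)  # the original index labels are preserved

try:
    bookends.iloc[7]           # only offsets 0 and 1 exist in the new frame
    failed = False
except IndexError:
    failed = True              # offset 7 is out of bounds

neptune = bookends.iloc[1]     # Neptune now sits at offset 1
```

The row keeps its label `7` (visible as `neptune.name`), even though its offset in `bookends` is `1`.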

In the next section, we will explore selection of elements by index value or label.

But now we can retrieve the features of another individual of our data set – Earth, the third planet from the sun.

In [18]:
earth = planets.iloc[2]

earth

name                   Earth
solar_distance_km_6    149.6
mass_kg_24              5.97
density_kg_m3           5514
gravity_m_s2             9.8
Name: 2, dtype: object

We can further extract from this just the Earth's distance from the Sun.

In [19]:
earth.solar_distance_km_6

149.6

And we can get an idea from the above of how spread out the planets are.

The Earth is the third of eight planets, yet its distance from the sun is less than 12% of their average.

In [20]:
earth.solar_distance_km_6 / planets.solar_distance_km_6.mean()

0.11822231880908397

More generally, we might note that the median distance of the planets from the sun is less than half their mean distance.

In [21]:
planets.solar_distance_km_6.median() / planets.solar_distance_km_6.mean()

0.39769640334673473

`pandas` supports a number of statistical methods, such as standard deviation `std`, mean absolute deviation `mad` (available only in versions before 2.0, where it was removed), and more.
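As a brief sketch of these methods, the following assumes a standalone `Series` of the solar distances, rebuilt from the values and differences printed in this section. The mean absolute deviation is written out by hand, so that the example doesn't depend on `mad` being available:

```python
import pandas as pd

# Solar distances in millions of km, rebuilt from this section's outputs
distances = pd.Series(
    [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    name='solar_distance_km_6',
)

spread = distances.std()  # sample standard deviation (ddof=1 by default)

# Mean absolute deviation around the mean, computed by hand
mean_abs_dev = (distances - distances.mean()).abs().mean()

# count, mean, std, min, quartiles and max in a single call
summary = distances.describe()
```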

We know that the planets are spread out. Let's take a look at how their distances increase from one planet to the next.

In [22]:
planets.solar_distance_km_6.diff()

0       NaN
1      50.3
2      41.4
3      78.3
4     550.7
5     654.9
6    1439.0
7    1622.6
Name: solar_distance_km_6, dtype: float64

The above `Series` tells us that Venus is 50.3 million kilometers farther from the sun than Mercury, Earth is 41.4 million kilometers farther than Venus, *etc.* (The first value, for Mercury, is `NaN` – "not a number" – because there is no planet closer than it to the sun, with which to compare its distance.)
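One way to see what `diff` computes is to reproduce it by hand: each value minus the value before it. The sketch below assumes a standalone `Series` of the distances, rebuilt from the values and differences shown above, and uses `shift` to line each planet up with its predecessor:

```python
import pandas as pd

# Solar distances in millions of km, rebuilt from this section's outputs
distances = pd.Series(
    [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    name='solar_distance_km_6',
)

step = distances.diff()

# shift(1) moves every value down one row, leaving NaN in the first position
previous = distances.shift(1)
by_hand = distances - previous
```

Both `step` and `by_hand` start with `NaN`, since Mercury has no predecessor to subtract.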

And, indeed, the distances between the planets increase dramatically.

We can see roughly that the first big jump is in the distance between Mars and Jupiter. The distances between the outer planets then continue to be greater than those between the inner planets, and continue to increase. But there's another big intermediate jump, between Saturn and Uranus.

We can also express these changes in relative terms, as the fractional increase over the previous planet's distance:

In [23]:
distance_rel_change = planets.solar_distance_km_6.pct_change()

distance_rel_change

0         NaN
1    0.868739
2    0.382625
3    0.523396
4    2.416411
5    0.841125
6    1.003837
7    0.564874
Name: solar_distance_km_6, dtype: float64
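As a sketch of what `pct_change` computes, the relative change is the difference from the previous value divided by that previous value. The example assumes a standalone `Series` of the distances, rebuilt from the outputs above, and uses `idxmax` to find the largest relative jump:

```python
import pandas as pd

# Solar distances in millions of km, rebuilt from this section's outputs
distances = pd.Series(
    [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    name='solar_distance_km_6',
)

relative = distances.pct_change()

# The same quantity by hand: change since the previous row, over that row
by_hand = distances.diff() / distances.shift(1)

# Index label of the largest relative jump (NaN values are skipped)
largest = relative.idxmax()
```

Here `largest` is the index label `4`, Jupiter's row: in relative terms, the step from Mars to Jupiter is the biggest of all, even though the absolute steps between the outer planets are larger.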