DataFrames: split chapter into intro + six sections
Individual changes:

* DataFrames: added hidden ("removed") cell to re-create required variables
* DataFrames: added newly-created sections to table of contents
* DataFrames: tweaked section file names
* DataFrames: decremented split-up DataFrame chapter sections to begin with 1
* DataFrames: fixed up newly-added section titles and intro
* Split lesson 04 dataframes into seven pieces

Co-authored-by: W Trimble <wltrimbl@school.edu>
jesteria and W Trimble committed Nov 2, 2022
1 parent af84430 commit 26c4122
Showing 8 changed files with 4,741 additions and 4,397 deletions.
394 changes: 394 additions & 0 deletions textbook/06/1/Creating_DataFrame.ipynb
@@ -0,0 +1,394 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "33576a92-1c8e-45a6-b487-183137f22f81",
"metadata": {},
"source": [
"# Creating a DataFrame\n",
"\n",
"We began to consider this sort of data in [Sequences](/03/3/3/Sequences.html#other-lists), with the distances of planets from our sun. Let's expand on that example with the data below, adding the planets' masses, densities and gravities."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e5f7d27f-af8d-44ea-a2bf-7c646a8811c1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[['Mercury', 57.9, 0.33, 5427.0, 3.7],\n",
" ['Venus', 108.2, 4.87, 5243.0, 8.9],\n",
" ['Earth', 149.6, 5.97, 5514.0, 9.8],\n",
" ['Mars', 227.9, 0.642, 3933.0, 3.7],\n",
" ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],\n",
" ['Saturn', 1433.5, 568.0, 687.0, 9.0],\n",
" ['Uranus', 2872.5, 86.8, 1271.0, 8.7],\n",
" ['Neptune', 4495.1, 102.0, 1638.0, 11.0]]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"planets_features = [\n",
" 'name', # familiar name\n",
" 'solar_distance_km_6', # distance from sun: 10**6 km\n",
" 'mass_kg_24', # absolute mass: 10**24 kg\n",
" 'density_kg_m3', # density: kg/m**3\n",
" 'gravity_m_s2', # gravity: m/s**2\n",
"]\n",
"\n",
"planets_data = [\n",
" ['Mercury', 57.9, 0.33, 5427.0, 3.7],\n",
" ['Venus', 108.2, 4.87, 5243.0, 8.9],\n",
" ['Earth', 149.6, 5.97, 5514.0, 9.8],\n",
" ['Mars', 227.9, 0.642, 3933.0, 3.7],\n",
" ['Jupiter', 778.6, 1898.0, 1326.0, 23.1],\n",
" ['Saturn', 1433.5, 568.0, 687.0, 9.0],\n",
" ['Uranus', 2872.5, 86.8, 1271.0, 8.7],\n",
" ['Neptune', 4495.1, 102.0, 1638.0, 11.0]\n",
"]\n",
"\n",
"planets_data"
]
},
{
"cell_type": "markdown",
"id": "c565f02c-beb8-419b-88ce-15dd2b6dca80",
"metadata": {},
"source": [
"Now let's *construct* a `DataFrame` for these data.\n",
"\n",
"First, we'll have to ensure that the <a href=\"https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html\" target=\"_blank\" rel=\"noopener\">pandas library is installed</a>.\n",
"\n",
"Then, we can tell Python to make `pandas` available to us using an `import` statement. For example:\n",
"\n",
" import pandas\n",
"\n",
"Having done so, the `DataFrame` type would be available as: `pandas.DataFrame`.\n",
"\n",
"That is, unlike the built-in `list`, `DataFrame` lives \"under\" the name `pandas`, so we refer to it with a dot between the two names.\n",
"\n",
"Or, we could import just `DataFrame`, such that it's available as just `DataFrame`, without the rigmarole:\n",
"\n",
" from pandas import DataFrame\n",
"\n",
"However, we'll be using `pandas` a lot! And not *just* `DataFrame`. Following a common convention, we'll tell Python to assign the library module the name `pd`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "01bbbeaf-8766-43ef-869e-3ea3846cc9e4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Mercury</td>\n",
" <td>57.9</td>\n",
" <td>0.330</td>\n",
" <td>5427.0</td>\n",
" <td>3.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Venus</td>\n",
" <td>108.2</td>\n",
" <td>4.870</td>\n",
" <td>5243.0</td>\n",
" <td>8.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Earth</td>\n",
" <td>149.6</td>\n",
" <td>5.970</td>\n",
" <td>5514.0</td>\n",
" <td>9.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Mars</td>\n",
" <td>227.9</td>\n",
" <td>0.642</td>\n",
" <td>3933.0</td>\n",
" <td>3.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Jupiter</td>\n",
" <td>778.6</td>\n",
" <td>1898.000</td>\n",
" <td>1326.0</td>\n",
" <td>23.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Saturn</td>\n",
" <td>1433.5</td>\n",
" <td>568.000</td>\n",
" <td>687.0</td>\n",
" <td>9.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Uranus</td>\n",
" <td>2872.5</td>\n",
" <td>86.800</td>\n",
" <td>1271.0</td>\n",
" <td>8.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Neptune</td>\n",
" <td>4495.1</td>\n",
" <td>102.000</td>\n",
" <td>1638.0</td>\n",
" <td>11.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4\n",
"0 Mercury 57.9 0.330 5427.0 3.7\n",
"1 Venus 108.2 4.870 5243.0 8.9\n",
"2 Earth 149.6 5.970 5514.0 9.8\n",
"3 Mars 227.9 0.642 3933.0 3.7\n",
"4 Jupiter 778.6 1898.000 1326.0 23.1\n",
"5 Saturn 1433.5 568.000 687.0 9.0\n",
"6 Uranus 2872.5 86.800 1271.0 8.7\n",
"7 Neptune 4495.1 102.000 1638.0 11.0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"planets = pd.DataFrame(planets_data)\n",
"\n",
"planets"
]
},
{
"cell_type": "markdown",
"id": "ccdcf7e2-b474-4c3a-a083-3de292ff0585",
"metadata": {},
"source": [
"Above, we've:\n",
"\n",
"* imported the `pandas` library under the name `pd`\n",
"* constructed a new `DataFrame` from our list-of-list data\n",
"* assigned to the `DataFrame` the name `planets`\n",
"\n",
"And this presentation of our data is already looking more like a spreadsheet.\n",
"\n",
"However, there's something odd about the above. We're accustomed now to numbering elements of a sequence by their *offset* – 0, 1, 2, 3, … – and this works in this case for numbering our rows. But this isn't a useful scheme for labeling our columns. We'll make manipulation of this data easier, and avoid confusion about what these values represent, by defining useful column labels."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5140ea86-2e1a-4a72-b79a-8540bd2bbe40",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>solar_distance_km_6</th>\n",
" <th>mass_kg_24</th>\n",
" <th>density_kg_m3</th>\n",
" <th>gravity_m_s2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Mercury</td>\n",
" <td>57.9</td>\n",
" <td>0.330</td>\n",
" <td>5427.0</td>\n",
" <td>3.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Venus</td>\n",
" <td>108.2</td>\n",
" <td>4.870</td>\n",
" <td>5243.0</td>\n",
" <td>8.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Earth</td>\n",
" <td>149.6</td>\n",
" <td>5.970</td>\n",
" <td>5514.0</td>\n",
" <td>9.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Mars</td>\n",
" <td>227.9</td>\n",
" <td>0.642</td>\n",
" <td>3933.0</td>\n",
" <td>3.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Jupiter</td>\n",
" <td>778.6</td>\n",
" <td>1898.000</td>\n",
" <td>1326.0</td>\n",
" <td>23.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Saturn</td>\n",
" <td>1433.5</td>\n",
" <td>568.000</td>\n",
" <td>687.0</td>\n",
" <td>9.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Uranus</td>\n",
" <td>2872.5</td>\n",
" <td>86.800</td>\n",
" <td>1271.0</td>\n",
" <td>8.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Neptune</td>\n",
" <td>4495.1</td>\n",
" <td>102.000</td>\n",
" <td>1638.0</td>\n",
" <td>11.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name solar_distance_km_6 mass_kg_24 density_kg_m3 gravity_m_s2\n",
"0 Mercury 57.9 0.330 5427.0 3.7\n",
"1 Venus 108.2 4.870 5243.0 8.9\n",
"2 Earth 149.6 5.970 5514.0 9.8\n",
"3 Mars 227.9 0.642 3933.0 3.7\n",
"4 Jupiter 778.6 1898.000 1326.0 23.1\n",
"5 Saturn 1433.5 568.000 687.0 9.0\n",
"6 Uranus 2872.5 86.800 1271.0 8.7\n",
"7 Neptune 4495.1 102.000 1638.0 11.0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"planets = pd.DataFrame(planets_data, columns=planets_features)\n",
"\n",
"planets"
]
},
{
"cell_type": "markdown",
"id": "42b20307-55c2-438f-881a-6f9381e4bd86",
"metadata": {},
"source": [
"That's better!\n",
"\n",
"As we've seen with the `list` (and the string), the `DataFrame` can be manipulated by functions and built-in operators. Moreover, these types offer special-purpose functions which have been *bound* to them – that is, *methods* – which are invoked with expressions of the form below:\n",
"\n",
" name_of_dataframe.name_of_method(argument0, argument1, ..., keyword0=value0, ...)\n",
"\n",
"And, similar to methods, there are *attributes* and *properties*. These are values which are similarly bound to the `DataFrame`, but which need not be called:\n",
"\n",
" name_of_dataframe.name_of_property\n",
"\n",
"Now we're ready to explore the dimensions of our data."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
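
The notebook above builds a `DataFrame` twice, first without and then with column labels, and closes by distinguishing methods from attributes. As a rough standalone sketch of those steps (assuming `pandas` is installed, and using only the first three rows and columns of the notebook's data to keep it short), it might look like this. The names `unlabeled`, `first_two`, `n_rows`, `n_cols` and `column_labels` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# A trimmed subset of the notebook's data: three planets, three features.
planets_features = ['name', 'solar_distance_km_6', 'mass_kg_24']
planets_data = [
    ['Mercury', 57.9, 0.33],
    ['Venus', 108.2, 4.87],
    ['Earth', 149.6, 5.97],
]

# Without column labels, pandas numbers the columns by offset: 0, 1, 2, ...
unlabeled = pd.DataFrame(planets_data)

# With `columns=`, each column gets a descriptive label instead.
planets = pd.DataFrame(planets_data, columns=planets_features)

# A method is bound to the DataFrame and must be called, with parentheses.
first_two = planets.head(2)

# An attribute or property is also bound to the DataFrame, but is not called.
n_rows, n_cols = planets.shape
column_labels = list(planets.columns)

print(unlabeled.columns.tolist())
print(column_labels)
print(n_rows, n_cols)
```

Here `head` stands in for `name_of_method` and `shape` for `name_of_property` in the general forms given in the notebook's final cell.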
