diff --git a/_episodes/00-short-introduction-to-Python.md b/_episodes/00-short-introduction-to-Python.md index 435e70476..132224763 100644 --- a/_episodes/00-short-introduction-to-Python.md +++ b/_episodes/00-short-introduction-to-Python.md @@ -180,7 +180,7 @@ another_tuple = ('blue','green','red') a_list = [1,2,3] ``` -> ### Challenge +> ## Challenge - Tuples > 1. What happens when you type `a_tuple[2]=5` vs `a_list[1]=5` ? > 2. Type `type(a_tuple)` into python - what is the object type? > @@ -242,9 +242,7 @@ or >>> ``` -> ## Challenge -> -> Can you do reassignment in a dictionary? Give it a try. +> ## Challenge - Can you do reassignment in a dictionary? > > 1. First check what `rev` is right now (remember `rev` is the name of our dictionary). > @@ -264,6 +262,8 @@ sequence of their items (i.e. the order in which key:value pairs were added to the dictionary). Because of this, the order in which items are returned from loops over dictionaries might appear random and can even change with time. + + ## Functions Defining part of a program in Python as a function is done using the `def` diff --git a/_episodes/01-starting-with-data.md b/_episodes/01-starting-with-data.md index 5f4457098..d9bbe7921 100644 --- a/_episodes/01-starting-with-data.md +++ b/_episodes/01-starting-with-data.md @@ -223,18 +223,20 @@ information in the parens to control behaviour. Let's look at the data using these. -## Challenges +> ## Challenge - DataFrames +> +> Using our DataFrame `surveys_df`, try out the attributes & methods below to see +> what they return. +> +> 1. `surveys_df.columns` +> 2. `surveys_df.shape` Take note of the output of `shape` - what format does it +> return the shape of the DataFrame in? +> +> HINT: [More on tuples, here](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences). +> 3. `surveys_df.head()` Also, what does `surveys_df.head(15)` do? +> 4. 
`surveys_df.tail()` +{: .challenge} -Using our DataFrame `surveys_df`, try out the attributes & methods below to see -what they return. - -1. `surveys_df.columns` -2. `surveys_df.shape` Take note of the output of `shape` - what format does it - return the shape of the DataFrame in? - - HINT: [More on tuples, here](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences). -3. `surveys_df.head()` Also, what does `surveys_df.head(15)` do? -4. `surveys_df.tail()` ## Calculating Statistics From Data In A Pandas DataFrame @@ -275,13 +277,14 @@ array(['NL', 'DM', 'PF', 'PE', 'DS', 'PP', 'SH', 'OT', 'DO', 'OX', 'SS', 'PB', 'PL', 'PX', 'CT', 'US'], dtype=object) ``` -## Challenges - -1. Create a list of unique plot ID's found in the surveys data. Call it - `plot_names`. How many unique plots are there in the data? How many unique - species are in the data? - -2. What is the difference between `len(plot_names)` and `plot_names.nunique()`? +> ## Challenge - Statistics +> +> 1. Create a list of unique plot ID's found in the surveys data. Call it +> `plot_names`. How many unique plots are there in the data? How many unique +> species are in the data? +> +> 2. What is the difference between `len(plot_names)` and `plot_names.nunique()`? +{: .challenge} # Groups in Pandas @@ -358,32 +361,35 @@ M 29.709578 42.995379 The `groupby` command is powerful in that it allows us to quickly generate summary stats. -# Challenge - -1. How many recorded individuals are female `F` and how many male `M` -2. What happens when you group by two columns using the following syntax and - then grab mean values: - - `sorted_data2 = surveys_df.groupby(['plot_id','sex'])` - - `sorted_data2.mean()` -3. Summarize weight values for each plot in your data. HINT: you can use the - following syntax to only create summary statistics for one column in your data - `by_plot['weight'].describe()` - - -Did you get #3 right? 
**A Snippet of the Output from challenge 3 looks like:** - -``` - plot - 1 count 1903.000000 - mean 51.822911 - std 38.176670 - min 4.000000 - 25% 30.000000 - 50% 44.000000 - 75% 53.000000 - max 231.000000 - ... -``` +> ## Challenge - Summary Data +> +> 1. How many recorded individuals are female `F` and how many male `M` +> 2. What happens when you group by two columns using the following syntax and +> then grab mean values: +> - `sorted_data2 = surveys_df.groupby(['plot_id','sex'])` +> - `sorted_data2.mean()` +> 3. Summarize weight values for each plot in your data. HINT: you can use the +> following syntax to only create summary statistics for one column in your data +> `by_plot['weight'].describe()` +> +> +>> ## Did you get #3 right? +>> **A Snippet of the Output from challenge 3 looks like:** +>> +>> ``` +>> plot +>> 1 count 1903.000000 +>> mean 51.822911 +>> std 38.176670 +>> min 4.000000 +>> 25% 30.000000 +>> 50% 44.000000 +>> 75% 53.000000 +>> max 231.000000 +>> ... +>> ``` +> {: .solution} +{: .challenge} ## Quickly Creating Summary Counts in Pandas @@ -413,13 +419,12 @@ calculated from our data. surveys_df['weight']*2 -## Another Challenge - -1. What's another way to create a list of species and associated `count` of the - records in the data? Hint: you can perform `count`, `min`, etc functions on - groupby DataFrames in the same way you can perform them on regular - DataFrames. - +> ## Challenge - Make a list +> +> What's another way to create a list of species and associated `count` of the +> records in the data? Hint: you can perform `count`, `min`, etc functions on +> groupby DataFrames in the same way you can perform them on regular DataFrames. +{: .challenge} # Quick & Easy Plotting Data Using Pandas @@ -441,112 +446,114 @@ total_count = surveys_df['record_id'].groupby(surveys_df['plot_id']).nunique() total_count.plot(kind='bar'); ``` -# Challenge Activities - -1. Create a plot of average weight across all species per plot. -2. 
Create a plot of total males versus total females for the entire dataset. - - -# Summary Plotting Challenge - -Create a stacked bar plot, with weight on the Y axis, and the stacked variable -being sex. The plot should show total weight by sex for each plot. Some -tips are below to help you solve this challenge: - -* [For more on Pandas plots, visit this link.](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.core.groupby.DataFrameGroupBy.plot.html) -* You can use the code that follows to create a stacked bar plot but the data to stack - need to be in individual columns. Here's a simple example with some data where - 'a', 'b', and 'c' are the groups, and 'one' and 'two' are the subgroups. - -``` -d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])} -pd.DataFrame(d) -``` - -shows the following data - -``` - one two - a 1 1 - b 2 2 - c 3 3 - d NaN 4 -``` - -We can plot the above with - -``` -# plot stacked data so columns 'one' and 'two' are stacked -my_df = pd.DataFrame(d) -my_df.plot(kind='bar',stacked=True,title="The title of my graph") -``` - -![Stacked Bar Plot](../fig/stackedBar1.png) - -* You can use the `.unstack()` method to transform grouped data into columns -for each plotting. Try running `.unstack()` on some DataFrames above and see -what it yields. - -Start by transforming the grouped data (by plot and sex) into an unstacked layout, then create -a stacked plot. - - -## Solution to Summary Challenge - -First we group data by plot and by sex, and then calculate a total for each plot. 
- -```python -by_plot_sex = surveys_df.groupby(['plot_id','sex']) -plot_sex_count = by_plot_sex['weight'].sum() -``` - -This calculates the sums of weights for each sex within each plot as a table - -``` -plot sex -plot_id sex -1 F 38253 - M 59979 -2 F 50144 - M 57250 -3 F 27251 - M 28253 -4 F 39796 - M 49377 - -``` - -Below we'll use `.unstack()` on our grouped data to figure out the total weight that each sex contributed to each plot. - -```python -by_plot_sex = surveys_df.groupby(['plot_id','sex']) -plot_sex_count = by_plot_sex['weight'].sum() -plot_sex_count.unstack() -``` - -The `unstack` function above will display the following output: - -``` -sex F M -plot_id -1 38253 59979 -2 50144 57250 -3 27251 28253 -4 39796 49377 - -``` - -Now, create a stacked bar plot with that data where the weights for each sex are stacked by plot. - -Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows: - -```python -by_plot_sex = surveys_df.groupby(['plot_id','sex']) -plot_sex_count = by_plot_sex['weight'].sum() -spc = plot_sex_count.unstack() -s_plot = spc.plot(kind='bar',stacked=True,title="Total weight by plot and sex") -s_plot.set_ylabel("Weight") -s_plot.set_xlabel("Plot") -``` - -![Stacked Bar Plot](../fig/stackedBar.png) +> ## Challenge - Plots +> +> 1. Create a plot of average weight across all species per plot. +> 2. Create a plot of total males versus total females for the entire dataset. +{: .challenge} + +> ## Summary Plotting Challenge +> +> Create a stacked bar plot, with weight on the Y axis, and the stacked variable +> being sex. The plot should show total weight by sex for each plot. 
Some +> tips are below to help you solve this challenge: +> +> * [For more on Pandas plots, visit this link.](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.core.groupby.DataFrameGroupBy.plot.html) +> * You can use the code that follows to create a stacked bar plot but the data to stack +> need to be in individual columns. Here's a simple example with some data where +> 'a', 'b', and 'c' are the groups, and 'one' and 'two' are the subgroups. +> +> ``` +> d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])} +> pd.DataFrame(d) +> ``` +> +> shows the following data +> +> ``` +> one two +> a 1 1 +> b 2 2 +> c 3 3 +> d NaN 4 +> ``` +> +> We can plot the above with +> +> ``` +> # plot stacked data so columns 'one' and 'two' are stacked +> my_df = pd.DataFrame(d) +> my_df.plot(kind='bar',stacked=True,title="The title of my graph") +> ``` +> +> ![Stacked Bar Plot](../fig/stackedBar1.png) +> +> * You can use the `.unstack()` method to transform grouped data into columns +> for each plotting. Try running `.unstack()` on some DataFrames above and see +> what it yields. +> +> Start by transforming the grouped data (by plot and sex) into an unstacked layout, then create +> a stacked plot. +> +> +>> ## Solution to Summary Challenge +>> +>> First we group data by plot and by sex, and then calculate a total for each plot. +>> +>> ```python +>> by_plot_sex = surveys_df.groupby(['plot_id','sex']) +>> plot_sex_count = by_plot_sex['weight'].sum() +>> ``` +>> +>> This calculates the sums of weights for each sex within each plot as a table +>> +>> ``` +>> plot sex +>> plot_id sex +>> 1 F 38253 +>> M 59979 +>> 2 F 50144 +>> M 57250 +>> 3 F 27251 +>> M 28253 +>> 4 F 39796 +>> M 49377 +>> +>> ``` +>> +>> Below we'll use `.unstack()` on our grouped data to figure out the total weight that each sex contributed to each plot. 
+>> +>> ```python +>> by_plot_sex = surveys_df.groupby(['plot_id','sex']) +>> plot_sex_count = by_plot_sex['weight'].sum() +>> plot_sex_count.unstack() +>> ``` +>> +>> The `unstack` function above will display the following output: +>> +>> ``` +>> sex F M +>> plot_id +>> 1 38253 59979 +>> 2 50144 57250 +>> 3 27251 28253 +>> 4 39796 49377 +>> +>> ``` +>> +>> Now, create a stacked bar plot with that data where the weights for each sex are stacked by plot. +>> +>> Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows: +>> +>> ```python +>> by_plot_sex = surveys_df.groupby(['plot_id','sex']) +>> plot_sex_count = by_plot_sex['weight'].sum() +>> spc = plot_sex_count.unstack() +>> s_plot = spc.plot(kind='bar',stacked=True,title="Total weight by plot and sex") +>> s_plot.set_ylabel("Weight") +>> s_plot.set_xlabel("Plot") +>> ``` +>> +>> ![Stacked Bar Plot](../fig/stackedBar.png) +> {: .solution} +{: .challenge}
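The group, sum, and unstack steps in the solution can be tried without the survey file. The following is a minimal sketch, assuming pandas is installed; `toy_df` and its values are hypothetical stand-ins for `surveys_df`, not data from the lesson.

```python
import pandas as pd

# Hypothetical toy data standing in for the lesson's surveys_df
toy_df = pd.DataFrame({
    "plot_id": [1, 1, 1, 2, 2, 2],
    "sex": ["F", "M", "M", "F", "F", "M"],
    "weight": [10.0, 20.0, 5.0, 7.0, 3.0, 8.0],
})

# Sum weight for each (plot_id, sex) pair; the result is a Series
# indexed by both plot_id and sex
weight_by_plot_sex = toy_df.groupby(["plot_id", "sex"])["weight"].sum()

# Move the inner index level (sex) into columns, giving one row per
# plot and one column per sex, which is the layout a stacked bar plot needs
weight_table = weight_by_plot_sex.unstack()
print(weight_table)
```

With matplotlib available, calling `weight_table.plot(kind='bar', stacked=True)` should then draw one stacked bar per plot, as in the solution's final step.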