# Dataset: African American Poetry

The [**African American Periodical Poetry** dataset](https://www.responsible-datasets-in-context.com/posts/african-american-periodical-poetry/aa-periodical-poetry.html) contains information about poems published in African American periodicals. Its columns are described below.

### Description of Columns in the Dataset:

1. **title**:  
   The title of the poem.
2. **author (first last)**:  
   The full name of the author, in the format of "first name last name."
3. **author (last name)**:  
   The last name of the author.
4. **text**:  
   The full text of the poem.
5. **month**:  
   The month when the poem was published.
6. **year**:  
   The year when the poem was published.
7. **venue**:  
   The primary venue or periodical where the poem was published.
8. **edited by**:  
   The name of the editor or the person responsible for editing the publication in which the poem appeared.
9. **form (if known)**:  
   The poetic form used in the poem, if it is known (e.g., "Elegy", "Common Measure").
10. **gender (if known)**:  
    The gender of the poet, if known ("male" or "female").
11. **themes**:  
    A list of themes present in the poem, which may include multiple themes separated by commas (e.g., "Spanish-American War, Empire").
12. **second venue**:  
    A secondary venue where the poem may have been published, if applicable.
13. **published in (city)**:  
    The city where the poem was published.
14. **Magazine Type**:  
    Indicates the type of magazine, such as "Predom. Black" for predominantly Black magazines.
15. **Author Bio.**:  
    A link to the author’s biography or a brief note about their life.

This dataset captures details about poems by African American poets published between 1900 and 1928, including metadata on publication, themes, and author information.

It can be used to explore trends in African American poetry: analyzing poem lengths, tracing publication patterns, and comparing characteristics of poets across periodicals. It is especially valuable for historical and cultural analysis in the [digital humanities](https://en.wikipedia.org/wiki/Digital_humanities).

In [1]:
# load the dataset

import pandas as pd
aap = pd.read_csv("https://raw.githubusercontent.com/melaniewalsh/responsible-datasets-in-context/main/datasets/aa-periodical-poetry/African-American-Periodical-Poetry_1900-1928-Created-by-Amardeep-Singh-and-Kate-Hennessey,-Lehigh-University.csv")
aap.head()

Unnamed: 0,title,author (first last),author (last name),text,month,year,venue,edited by,form (if known),gender (if known),themes,second venue,published in (city),Magazine Type,Author Bio.
0,New Wars,Benjamin Griffith Brawley,Brawley,HURL on the lance! Break up the ancient peace!...,November,1900,Colored American,Walter W. Wallace,Common Measure,male,"Spanish-American War, Empire",,Boston,Predom. Black,https://en.wikipedia.org/wiki/Benjamin_Griffit...
1,A Picture,Olivia Ward Bush-Banks,Bush-Banks,I drew a picture long ago —\nA picture of a su...,June,1900,Colored American,Walter W. Wallace,Common Measure,female,,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Olivia_Ward_Bush...
2,The Christmas Reunion,Augustus M. Hodges,Hodges,"Twas a bright Christmas morning in ""Ole Kentuc...",December,1900,Colored American,Walter W. Wallace,,male,Slavery,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Augustus_M._Hodges
3,A Memorial of Frederick Douglass,C. Henry Holmes,Holmes,"He was a noble hero, born in an humble state,\...",September,1900,Colored American,Walter W. Wallace,Elegy,male,Frederick Douglass,,Boston,Predom. Black,
4,The Negro's Worth,Alonzo Milton Skrine,Skrine,"Who casts a slur on Negro worth, a stain on Ne...",December,1900,Colored American,Walter W. Wallace,,male,"Civil War, Spanish-American War, Labor, Slaver...",,Boston,Predom. Black,https://scalar.lehigh.edu/african-american-poe...


# Wrangling

**📝 Exercise 1**: Create a new column `Poem_Length` that calculates the length of each poem (number of words or characters).

In [2]:
# Count whitespace-separated words; missing text counts as 0 instead of as the string "nan"
aap['Poem_Length'] = aap['text'].fillna('').str.split().str.len()
aap.head()

Unnamed: 0,title,author (first last),author (last name),text,month,year,venue,edited by,form (if known),gender (if known),themes,second venue,published in (city),Magazine Type,Author Bio.,Poem_Length
0,New Wars,Benjamin Griffith Brawley,Brawley,HURL on the lance! Break up the ancient peace!...,November,1900,Colored American,Walter W. Wallace,Common Measure,male,"Spanish-American War, Empire",,Boston,Predom. Black,https://en.wikipedia.org/wiki/Benjamin_Griffit...,266
1,A Picture,Olivia Ward Bush-Banks,Bush-Banks,I drew a picture long ago —\nA picture of a su...,June,1900,Colored American,Walter W. Wallace,Common Measure,female,,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Olivia_Ward_Bush...,254
2,The Christmas Reunion,Augustus M. Hodges,Hodges,"Twas a bright Christmas morning in ""Ole Kentuc...",December,1900,Colored American,Walter W. Wallace,,male,Slavery,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Augustus_M._Hodges,835
3,A Memorial of Frederick Douglass,C. Henry Holmes,Holmes,"He was a noble hero, born in an humble state,\...",September,1900,Colored American,Walter W. Wallace,Elegy,male,Frederick Douglass,,Boston,Predom. Black,,199
4,The Negro's Worth,Alonzo Milton Skrine,Skrine,"Who casts a slur on Negro worth, a stain on Ne...",December,1900,Colored American,Walter W. Wallace,,male,"Civil War, Spanish-American War, Labor, Slaver...",,Boston,Predom. Black,https://scalar.lehigh.edu/african-american-poe...,280
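
The exercise allows either words or characters as the unit of length, and the cell above counts words. As a minimal sketch of how the two measures differ, the example below computes both on a toy dataframe. The rows and the column names `length_words` and `length_chars` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for the `text` column (hypothetical rows, not taken from the dataset).
toy = pd.DataFrame({"text": ["I drew a picture\nlong ago", "Hurl on the lance!", None]})

# Treat missing text as an empty string so that it gets length 0.
text = toy["text"].fillna("")

# Word count: split on any whitespace (spaces and line breaks alike).
toy["length_words"] = text.str.split().str.len()

# Character count: every character, including spaces and line breaks.
toy["length_chars"] = text.str.len()

print(toy[["length_words", "length_chars"]])
```

Word counts are usually easier to compare across poems, while character counts are sensitive to spelling and punctuation; which one fits best depends on the question being asked.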


**📝 Exercise 2**: Create a new dataframe `poems_themes` which "explodes" the column `themes`. Each poem can then contain multiple rows for its different themes.

*Attention*: split by `", "` (a comma followed by a space). If you split by a comma alone, the leading space stays attached to the second theme, and you end up with two separate categories such as "Religion" and " Religion".

In [3]:
poems_themes = aap.assign(themes=aap['themes'].str.split(', ')).explode('themes')
poems_themes.head()

Unnamed: 0,title,author (first last),author (last name),text,month,year,venue,edited by,form (if known),gender (if known),themes,second venue,published in (city),Magazine Type,Author Bio.,Poem_Length
0,New Wars,Benjamin Griffith Brawley,Brawley,HURL on the lance! Break up the ancient peace!...,November,1900,Colored American,Walter W. Wallace,Common Measure,male,Spanish-American War,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Benjamin_Griffit...,266
0,New Wars,Benjamin Griffith Brawley,Brawley,HURL on the lance! Break up the ancient peace!...,November,1900,Colored American,Walter W. Wallace,Common Measure,male,Empire,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Benjamin_Griffit...,266
1,A Picture,Olivia Ward Bush-Banks,Bush-Banks,I drew a picture long ago —\nA picture of a su...,June,1900,Colored American,Walter W. Wallace,Common Measure,female,,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Olivia_Ward_Bush...,254
2,The Christmas Reunion,Augustus M. Hodges,Hodges,"Twas a bright Christmas morning in ""Ole Kentuc...",December,1900,Colored American,Walter W. Wallace,,male,Slavery,,Boston,Predom. Black,https://en.wikipedia.org/wiki/Augustus_M._Hodges,835
3,A Memorial of Frederick Douglass,C. Henry Holmes,Holmes,"He was a noble hero, born in an humble state,\...",September,1900,Colored American,Walter W. Wallace,Elegy,male,Frederick Douglass,,Boston,Predom. Black,,199
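
To make the note about the separator concrete, here is a small sketch on a toy dataframe (the titles and themes are made up for illustration). It compares splitting on `","` with splitting on `", "`, and also shows one possible alternative, stripping whitespace after the split, which would tolerate inconsistent spacing.

```python
import pandas as pd

# Hypothetical rows; the third poem has no themes recorded.
toy = pd.DataFrame({
    "title": ["Poem A", "Poem B", "Poem C"],
    "themes": ["Religion, Nature", "Nature, Religion", None],
})

# Splitting on a bare comma leaves a leading space on the second theme.
by_comma = toy.assign(themes=toy["themes"].str.split(",")).explode("themes")

# Splitting on comma + space yields clean category names.
by_comma_space = toy.assign(themes=toy["themes"].str.split(", ")).explode("themes")

# Alternative: split on a bare comma, then strip surrounding whitespace.
stripped = toy["themes"].str.split(",").explode().str.strip()

print(sorted(by_comma["themes"].dropna().unique()))
print(sorted(by_comma_space["themes"].dropna().unique()))
print(sorted(stripped.dropna().unique()))
```

Note that the poem without themes is kept as a single row with a missing value in `themes`, and that exploded rows repeat the original index label.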


# Histograms

**📝 Exercise 3**: Using the dataframe `poems_themes`, plot a **histogram** showing the distribution of the `themes` column (i.e., how many times each theme is represented).

(Tip: You can make a horizontal histogram by assigning the column to the y axis.
Use `fig.update_yaxes(categoryorder='total ascending')` to sort the values.)

In [4]:
import plotly.express as px

fig = px.histogram(poems_themes, y='themes')
fig.update_yaxes(categoryorder='total ascending')
fig.show()
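
The bar lengths in this histogram are simply counts of rows per theme. As a rough cross-check, the same numbers can be computed directly with pandas. The sketch below uses a toy `themes` column (hypothetical values) rather than the real dataframe.

```python
import pandas as pd

# Toy stand-in for the exploded `themes` column (hypothetical values).
toy_themes = pd.DataFrame(
    {"themes": ["Slavery", "Religion", "Slavery", None, "Nature", "Slavery"]}
)

# value_counts() counts rows per theme, sorted from most to least frequent.
# Missing themes are excluded by default.
counts = toy_themes["themes"].value_counts()
print(counts)
```

On the real data, `poems_themes['themes'].value_counts()` should give the numbers behind each bar, which is handy when hovering over a crowded plot becomes tedious.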

**📝 Exercise 4**: Again using the dataframe `poems_themes`, plot a **histogram** of the `themes` column, but now group the counts by gender.

In [5]:
fig = px.histogram(poems_themes, y='themes', color='gender (if known)')
fig.update_yaxes(categoryorder='total ascending')
fig.show()

**📝 Exercise 5**: Now, use the same plot as before, but show it:
- as percentages (100% stacked) - use the argument `barnorm='percent'`
- in horizontal position

In [6]:
fig = px.histogram(poems_themes, y='themes', color='gender (if known)', barnorm='percent')
fig.update_yaxes(categoryorder='total ascending')
fig.show()
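
With `barnorm='percent'`, each bar is rescaled so that its gender segments sum to 100%. A minimal sketch of the same calculation in pandas, on toy data with made-up values, uses `pd.crosstab` with `normalize="index"`:

```python
import pandas as pd

# Hypothetical theme/gender pairs, one row per (poem, theme).
toy = pd.DataFrame({
    "themes": ["Slavery", "Slavery", "Slavery", "Slavery", "Nature", "Nature"],
    "gender (if known)": ["male", "male", "male", "female", "female", "male"],
})

# normalize="index" divides each row (theme) by its own total,
# so every theme sums to 1; multiply by 100 for percentages.
shares = pd.crosstab(toy["themes"], toy["gender (if known)"], normalize="index") * 100
print(shares.round(1))
```

Percentages hide how many poems sit behind each bar, so a theme with only a handful of rows can look as decisive as one with hundreds. Reading the normalized plot next to the raw counts helps avoid that trap.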

# Violin plots

**📝 Exercise 6**: Make a violin plot for the column `Poem_Length`, split by gender (i.e., show two violins, one for each gender).

In [8]:
fig = px.violin(aap, y='Poem_Length', color='gender (if known)', box=True, points='all')
fig.show()
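
A violin plot shows the shape of a distribution, and `box=True` adds a box plot inside each violin. As a sketch of the numbers one might read alongside the plot, the example below summarizes a toy `Poem_Length` column per gender (all values are hypothetical).

```python
import pandas as pd

# Hypothetical poem lengths; one long outlier in the "male" group.
toy = pd.DataFrame({
    "gender (if known)": ["male", "male", "male", "female", "female", "female"],
    "Poem_Length": [100, 200, 900, 120, 150, 180],
})

# Per-group count, median, mean, and maximum.
summary = (
    toy.groupby("gender (if known)")["Poem_Length"]
    .agg(["count", "median", "mean", "max"])
)
print(summary)
```

In this toy data a single long poem pulls the "male" mean well above its median, which is the kind of skew a violin plot makes visible as a long upper tail.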


**📝 Exercise 7**: Change the colors and opacity of the violins to whatever you like. Try to overlay the two violins by setting an option in `px.violin()`.

(Tips: search for an option to update opacity with `fig.update_traces()`. Colors can be changed by setting the `color_discrete_sequence` argument of `px.violin()` to a list such as `['green', 'purple']`.)

In [9]:
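# violinmode='overlay' draws the violins on top of each other instead of side by side
fig = px.violin(aap, y='Poem_Length', color='gender (if known)', box=True, points='all', color_discrete_sequence=['green', 'purple'], violinmode='overlay')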
fig.update_traces(opacity=0.7)
fig.show()

# Line plot

**📝 Exercise 8**: Make a line plot of the average `Poem_Length` for every year.

In [10]:
avg_length_by_year = aap.groupby('year')['Poem_Length'].mean().reset_index()
fig = px.line(avg_length_by_year, x='year', y='Poem_Length', title='Average Poem Length by Year')
fig.show()
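
A yearly average can be misleading when a year contains only a few poems. The sketch below, on toy data with made-up values, computes the mean together with the number of poems per year. The column names `avg_length` and `n_poems` are illustrative choices.

```python
import pandas as pd

# Hypothetical years and poem lengths.
toy = pd.DataFrame({
    "year": [1900, 1900, 1900, 1901, 1902, 1902],
    "Poem_Length": [100, 200, 300, 900, 50, 150],
})

# Named aggregation: one output column for the mean, one for the count.
by_year = (
    toy.groupby("year")["Poem_Length"]
    .agg(avg_length="mean", n_poems="count")
    .reset_index()
)
print(by_year)
```

Here the spike in 1901 comes from a single poem. Checking the count per year on the real data is a quick way to judge how much weight each point on the line deserves.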

# Scatter plot

**📝 Exercise 9**: Create a scatter plot where `Poem_Length` is mapped to the y-axis, `year` to the x-axis, and another column such as `venue` to the point color.
Experiment with changing the mappings to gain different insights.

In [11]:
fig = px.scatter(aap, x='year', y='Poem_Length', color='venue', title='Poem Length by Year and Venue')
fig.show()

**📝 Exercise 10**: First, filter `poems_themes` to keep only the themes that appear in more than 40 rows, and store the result as `poems_themes_filtered`.

Then, using `poems_themes_filtered`, make a faceted scatter plot with one facet per theme.

In [12]:
# Group by 'themes' and count occurrences, then filter.
theme_counts = poems_themes.groupby('themes').size()
themes_to_keep = theme_counts[theme_counts > 40].index.tolist()
poems_themes_filtered = poems_themes[poems_themes['themes'].isin(themes_to_keep)]
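
The count-then-filter pattern above can also be written by mapping each row to the size of its theme. The following is a minimal sketch on toy data; the threshold `min_rows = 2` and the rows are hypothetical, chosen so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical exploded themes; the last poem has no theme recorded.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "themes": ["Slavery", "Slavery", "Slavery", "Nature", "Nature", None],
})

min_rows = 2  # hypothetical threshold; the exercise uses 40

# Number of rows per theme (missing themes are not counted).
theme_counts = toy["themes"].value_counts()

# Look up each row's theme count; rows with a missing theme get NaN.
rows_per_theme = toy["themes"].map(theme_counts)

# NaN > min_rows evaluates to False, so rows without a theme are dropped too.
filtered = toy[rows_per_theme > min_rows]
print(filtered)
```

Both versions use a strict comparison, so a theme with exactly `min_rows` rows is excluded, and both discard rows whose theme is missing.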

In [13]:
fig = px.scatter(poems_themes_filtered, x='year', y='Poem_Length', color='venue', facet_col='themes', title='Poem Length by Year for Themes with More Than 40 Rows')
fig.show()