**SA433 &#x25aa; Data Wrangling and Visualization &#x25aa; Fall 2024**

# Lesson 23. Parting Thoughts

## We learned a lot this semester... 🤔

- We learned about **visualizing data** with Altair through a **grammar of graphics**:
    - *Encoding channels* map variables to visual attributes (e.g. x-position, y-position, color, shape)
    - *Graphical marks* specify how those attributes should be visually represented 
    - *Transformations* such as *aggregations* recast or filter the data before visualization
    - *Bindings*, *selections*, and *conditions* specify how to make a chart interactive
    - *Layering*, *concatenation*, and *faceting* specify how to combine or generate multiple related charts

- We can generate a broad range of charts by specifying these components 

- We also learned about **wrangling** data with Pandas through a **grammar of data manipulation**:
    - *Filtering rows* based on their values
    - *Selecting and dropping columns* based on their names
    - *Sorting rows* based on their values
    - *Creating new columns* that are functions of existing columns
    - Aggregating, transforming, and filtering *groups of data* through *split-apply-combine*
    - *Pivoting* data from long form to wide form and vice versa
    - *Merging* datasets together based on key columns

- We can perform a wide array of data wrangling tasks by combining these operations
    - For example, we learned how to wrangle tabular data into **tidy data**: each variable has its own column, each observation has its own row, and each value has its own cell

- These grammars aren't specific to Altair and Pandas, or even Python:
    - [ggplot2](https://ggplot2.tidyverse.org/) is a grammar of graphics package for R
    - [dplyr](https://dplyr.tidyverse.org/) is a grammar of data manipulation package for R
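
As a quick refresher, the data manipulation operations listed above can be chained together in just a few lines of Pandas. The sketch below is only an illustration: the store names, sales figures, and manager names are made up, and it is one of many reasonable ways to combine these operations.

```python
import pandas as pd

# Hypothetical wide-form data: one row per store, one column per year
# (sales in thousands of dollars).
wide = pd.DataFrame({
    "store": ["A", "B"],
    "2022": [10, 40],
    "2023": [12, 44],
})

# Pivot from wide form to long (tidy) form: one row per store-year observation.
tidy = wide.melt(id_vars="store", var_name="year", value_name="sales")
tidy["year"] = tidy["year"].astype(int)

# Create a new column that is a function of an existing column.
tidy = tidy.assign(sales_usd=tidy["sales"] * 1000)

# Filter rows based on their values, then sort them.
recent = tidy[tidy["year"] == 2023].sort_values("sales", ascending=False)

# Split-apply-combine: total sales for each store.
totals = tidy.groupby("store", as_index=False)["sales"].sum()

# Merge with a second (hypothetical) dataset on a key column.
managers = pd.DataFrame({"store": ["A", "B"], "manager": ["Kim", "Lee"]})
merged = totals.merge(managers, on="store", how="left")
```

Because `tidy` has one row per observation and one column per variable, it is also in the form that Altair expects when you map columns to encoding channels.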

## There's so much more to learn... 😃

*Here are a few examples...*

### Storytelling with data visualization

- Data visualization is often used to tell a story: 
    - analysts sharing the results of their work
    - staff convincing managers to make a particular decision
    - leaders proving their impact

- What are effective ways to tell a story with data visualization?

- This book provides some practical guidance on how to better communicate visually with data:

    > Cole Nussbaumer Knaflic. *Storytelling with Data: A Data Visualization Guide for Business Professionals*. Wiley, 2015.

### Getting data through web scraping

- **Web scraping** is the process of collecting structured data from websites in an automated fashion


- We learned about some rudimentary web scraping functionality in Pandas


- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) is a popular Python library for web scraping
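
To make the idea of web scraping concrete, here is a minimal sketch that pulls the links out of a small HTML snippet. Note the assumptions: it uses Python's built-in `html.parser` module as a stand-in rather than Beautiful Soup (so it runs without installing anything), and the HTML below is made up. A real scraper would first download the page from a website.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
HTML = """
<html><body>
  <h1>Course schedule</h1>
  <ul>
    <li><a href="/lesson22.html">Lesson 22</a></li>
    <li><a href="/lesson23.html">Lesson 23</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []    # pieces of text seen inside the current <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


parser = LinkCollector()
parser.feed(HTML)
links = parser.links
```

Here `links` is a list of tuples, which you could pass to `pd.DataFrame` and then wrangle like any other dataset. Libraries like Beautiful Soup let you express this kind of extraction much more concisely.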

### Getting data through website APIs

- Some websites have an **API** (application programming interface) that lets you access their data programmatically


- For example, [the Twitter (now X) API](https://developer.twitter.com/en/docs) lets you write code to retrieve tweets so that you can analyze them with the techniques of your choice
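
Working with most web APIs follows the same pattern: build a request with some parameters, send it, and parse the response (usually JSON). The sketch below illustrates that pattern using only Python's standard library. Everything specific in it is an assumption made for illustration: the endpoint `api.example.com`, the parameter names, the token, and the shape of the response are all hypothetical and do not describe any real API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com/v1/posts"  # hypothetical endpoint, not a real API


def build_request(query, limit, token):
    """Build a GET request with query parameters and a bearer token."""
    url = f"{BASE_URL}?{urlencode({'q': query, 'limit': limit})}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def parse_posts(payload):
    """Turn a JSON response body into a list of (id, text) tuples."""
    data = json.loads(payload)
    return [(post["id"], post["text"]) for post in data["results"]]


request = build_request("data wrangling", limit=2, token="YOUR_TOKEN")

# With a real API, you would send the request and read the response, e.g.:
#     from urllib.request import urlopen
#     with urlopen(request) as response:
#         payload = response.read().decode("utf-8")
# Here we use a canned (made-up) response so the sketch runs offline.
payload = '{"results": [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]}'
posts = parse_posts(payload)
```

A real API's documentation tells you the actual endpoints, parameters, authentication method, and response format to use in place of the made-up ones here.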

### Interoperability between R and Python

- You've used R in your other operations research classes. How can you tie what you've learned in this class to R?


- For example, suppose you wrangled some data in Python/Pandas and now you want to run a linear regression on that data using R


- You could write the Pandas DataFrame to a CSV file, and then read that CSV file back into R


- Or, you could use [rpy2](https://rpy2.github.io/doc.html), a Python library that lets you call R directly from inside Python!


- Alternatively, you could use [reticulate](https://rstudio.github.io/reticulate/), an R package that lets you do the opposite: call Python directly from inside R
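
The CSV route is the simplest of these options. Here is a minimal sketch of the Python side, with the R side shown in comments. The data, the column names `x` and `y`, and the file name `wrangled.csv` are all made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical wrangled data with a predictor x and a response y.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.1, 3.9, 6.2, 7.8]})

# Write the DataFrame to a CSV file. index=False leaves out the row labels,
# which R would otherwise read in as an extra column.
csv_path = os.path.join(tempfile.gettempdir(), "wrangled.csv")
df.to_csv(csv_path, index=False)

# In R, you could then run something like:
#     df <- read.csv("path/to/wrangled.csv")
#     fit <- lm(y ~ x, data = df)
#     summary(fit)

# Sanity check on the Python side: read the file back in.
round_trip = pd.read_csv(csv_path)
```

This approach needs no extra libraries, at the cost of an intermediate file. rpy2 and reticulate avoid that file by passing data between the two languages in memory.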

## I hope you found this course useful!

- Go be a data wrangling and visualization superstar in your academic, professional, and personal projects ⭐️