In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from numpy import nan as NaN  # the NaN alias was removed in NumPy 2.0
from glob import glob
import re

In [None]:
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 300)
pd.set_option('display.expand_frame_repr', True)

# Review of Pandas DataFrames

## Data ingestion & inspection

### pandas DataFrames

* Example: DataFrame of Apple Stock data

In [None]:
AAPL = pd.read_csv(r'DataCamp-master/11-pandas-foundations/_datasets/AAPL.csv',
                   index_col='Date', parse_dates=True)

In [None]:
AAPL.head()

* The rows are labeled by a special data structure called an Index.
    * Indexes in Pandas are tailored lists of labels that permit fast look-up and some powerful relational operations.
* The index labels in the AAPL DataFrame are dates in reverse chronological order.
* Labeled rows and columns improve the clarity and intuition of many data analysis tasks.
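
As a rough illustration of label-based look-up and a relational operation on an Index, here is a minimal sketch. The `prices` frame and its values are hypothetical and are not taken from `AAPL.csv`.

```python
import pandas as pd

# Hypothetical prices, used only to illustrate Index behaviour
prices = pd.DataFrame(
    {'Close': [10.0, 10.5, 11.0], 'Volume': [100, 150, 120]},
    index=pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06']),
)
prices.index.name = 'Date'

# Look up a row and a single cell by label instead of by position
day = pd.Timestamp('2020-01-03')
row = prices.loc[day]
close_on_day = prices.loc[day, 'Close']

# Relational operation: labels shared by two indexes
other_dates = pd.to_datetime(['2020-01-03', '2020-01-06', '2020-01-07'])
shared = prices.index.intersection(other_dates)

print(close_on_day)
print(shared)
```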

In [None]:
type(AAPL)

In [None]:
AAPL.shape

In [None]:
AAPL.columns

In [None]:
type(AAPL.columns)

In [None]:
AAPL.index

In [None]:
type(AAPL.index)

* DataFrames can be sliced like NumPy arrays or Python lists using colons to specify the start, end and stride of a slice.
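
A small sketch of start, end and stride in positional slicing, using a hypothetical ten-row frame rather than the AAPL data:

```python
import pandas as pd

# Hypothetical frame with a default integer index 0..9
df = pd.DataFrame({'a': range(10), 'b': range(10, 20)})

first_three = df.iloc[:3, :]            # rows at positions 0, 1, 2
every_third = df.iloc[::3, :]           # rows at positions 0, 3, 6, 9
last_two_reversed = df.iloc[:-3:-1, 0]  # first column, positions 9 then 8

print(every_third)
print(last_two_reversed)
```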

In [None]:
# Start of the DataFrame to the 5th row, inclusive of all columns
AAPL.iloc[:5,:]

In [None]:
# From the fifth-to-last row to the end of the DataFrame, using a negative index
AAPL.iloc[-5:,:]

In [None]:
AAPL.head()

In [None]:
AAPL.tail()

In [None]:
AAPL.info()

In [None]:
AAPL.Close.plot(kind='line')

# Add first subplot
plt.subplot(2, 1, 1)
AAPL.Close.plot(kind='line')

# Add title and specify axis labels
plt.title('Close')
plt.ylabel('Value - $')
plt.xlabel('Year')

# Add second subplot
plt.subplot(2, 1, 2)
AAPL.Volume.plot(kind='line')

# Add title and specify axis labels
plt.title('Volume')
plt.ylabel('Number of Shares')
plt.xlabel('Year')

# Display the plots
plt.tight_layout()
plt.show()

### Broadcasting

* Assigning a scalar value to a column slice broadcasts that value to every selected row

In [None]:
AAPL.iloc[::3, -1] = np.nan  # every 3rd row of Volume is now NaN

In [None]:
AAPL.head(7)

In [None]:
AAPL.info()

* Note that Volume now has fewer non-null values
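
The same effect can be checked on a small frame. This is a sketch with hypothetical values; the column names `price` and `volume` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical seven-row frame; volume is float so it can hold NaN directly
frame = pd.DataFrame({
    'price': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    'volume': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
})

# Broadcast a single scalar to every 3rd row of the last column
frame.iloc[::3, -1] = np.nan

# .count() reports non-null entries, the same figure .info() shows
non_null = frame['volume'].count()
missing = frame['volume'].isna().sum()
print(non_null, missing)
```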

### Series

In [None]:
low = AAPL.Low

In [None]:
type(low)

In [None]:
low.head()

In [None]:
lows = low.values

In [None]:
type(lows)

In [None]:
lows[0:5]

* A Pandas Series, then, is a 1D labeled NumPy array, and a DataFrame is a 2D labeled array whose columns are Series
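
A minimal sketch of that relationship, using a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the labels 'a', 'b', 'c' form the shared index
df = pd.DataFrame({'low': [1.5, 1.2, 1.8], 'high': [2.0, 2.1, 2.4]},
                  index=['a', 'b', 'c'])

low = df['low']          # one column of the DataFrame is a Series
lows = low.to_numpy()    # the underlying values, same data as low.values

print(type(low), type(lows))
print(df.to_numpy().shape)  # the whole DataFrame as a 2D array
```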

### Inspecting your data

You can use the DataFrame methods ```.head()``` and ```.tail()``` to view the first few and last few rows of a DataFrame. In this exercise, we have imported pandas as ```pd``` and loaded population data from 1960 to 2014 as a DataFrame ```df```. This dataset was obtained from the World Bank.

Your job is to use ```df.head()``` and ```df.tail()``` to verify that the first and last rows match a file on disk. In later exercises, you will see how to extract values from DataFrames with indexing, but for now, manually copy/paste or type values into assignment statements where needed. Select the correct answer for the first and last values in the ```'Year'``` and ```'Total Population'``` columns.

### Instructions

Possible Answers
* First: 1980, 26183676.0; Last: 2000, 35.
* First: 1960, 92495902.0; Last: 2014, 15245855.0.
* First: 40.472, 2001; Last: 44.5, 1880.
* First: CSS, 104170.0; Last: USA, 95.203.

In [None]:
wb_df = pd.read_csv(r'DataCamp-master/11-pandas-foundations/_datasets/world_ind_pop_data.csv')

In [None]:
wb_df.head()

In [None]:
wb_df.tail()

### DataFrame data types

Pandas is aware of the data types in the columns of your DataFrame. It is also aware of null and ```NaN``` ('Not-a-Number') types which often indicate missing data. In this exercise, we have imported pandas as ```pd``` and read in the world population data which contains some ```NaN``` values, a value often used as a place-holder for missing or otherwise invalid data entries. Your job is to use ```df.info()``` to determine information about the total count of ```non-null``` entries and infer the total count of ```'null'``` entries, which likely indicates missing data. Select the best description of this data set from the following:

### Instructions

Possible Answers
* The data is all of type float64 and none of it is missing.
* The data is of mixed type, and 9914 of it is missing.
* The data is of mixed type, and 3460 float64s are missing.
* The data is all of type float64, and 3460 float64s are missing.

```python
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13374 entries, 0 to 13373
Data columns (total 5 columns):
CountryName                      13374 non-null object
CountryCode                      13374 non-null object
Year                             13374 non-null int64
Total Population                 9914 non-null float64
Urban population (% of total)    13374 non-null float64
dtypes: float64(2), int64(1), object(2)
memory usage: 522.5+ KB
```

In [None]:
wb_df.info()
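
Rather than inferring the null count from the ```.info()``` output by subtraction, it can also be computed directly. The sketch below uses a hypothetical four-row frame, not the World Bank file.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing population values
df = pd.DataFrame({
    'Year': [1960, 1961, 1962, 1963],
    'Total Population': [100.0, np.nan, 120.0, np.nan],
})

missing_per_column = df.isna().sum()   # null entries in each column
non_null_per_column = df.count()       # non-null entries, as shown by .info()

print(missing_per_column)
```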

### NumPy and pandas working together
Pandas depends upon and interoperates with NumPy, the Python library for fast numeric array computations. For example, you can use the DataFrame attribute ```.values``` to represent a DataFrame ```df``` as a NumPy array. You can also pass pandas data structures to NumPy functions. In this exercise, we have imported pandas as ```pd``` and loaded world population data every 10 years since 1960 into the DataFrame ```df```. This dataset was derived from the one used in the previous exercise.

Your job is to extract the values and store them in an array using the attribute ```.values```. You'll then use those values as input into the NumPy ```np.log10()``` function to compute the base 10 logarithm of the population values. Finally, you will pass the entire pandas DataFrame into the same NumPy ```np.log10()``` function and compare the results.

### Instructions

* Import ```numpy``` using the standard alias ```np```.
* Assign the numerical values in the DataFrame ```df``` to an array ```np_vals``` using the attribute ```values```.
* Pass ```np_vals``` into the NumPy function ```log10()``` and store the results in ```np_vals_log10```.
* Pass the entire ```df``` DataFrame into the NumPy function ```log10()``` and store the results in ```df_log10```.
* Inspect the output of the ```print()``` code to see the ```type()``` of the variables that you created.

In [None]:
pop_df = pd.read_csv(r'DataCamp-master/11-pandas-foundations/_datasets/world_population.csv')

In [None]:
pop_df.info()

In [None]:
# Create array of DataFrame values: np_vals
np_vals = pop_df.values

In [None]:
np_vals

In [None]:
# Create new array of base 10 logarithm values: np_vals_log10
np_vals_log10 = np.log10(np_vals)

In [None]:
np_vals_log10

In [None]:
# Create a new DataFrame by passing pop_df to np.log10(): pop_df_log10
pop_df_log10 = np.log10(pop_df)

In [None]:
pop_df_log10

In [None]:
# Print the type of each original and new data container
for name, obj in [('np_vals', np_vals), ('np_vals_log10', np_vals_log10),
                  ('pop_df', pop_df), ('pop_df_log10', pop_df_log10)]:
    print(name, 'has type', type(obj))

As a data scientist, you'll frequently interact with NumPy arrays, pandas Series, and pandas DataFrames, and you'll leverage a variety of NumPy and pandas methods to perform your desired computations. Understanding how NumPy and pandas work together will prove to be very useful.

### Building DataFrames from Scratch

* DataFrames read in from CSV
```python
pd.read_csv()
```

* DataFrames from dict (1)

In [None]:
data = {'weekday': ['Sun', 'Sun', 'Mon', 'Mon'],
        'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
        'visitors': [139, 237, 326, 456],
        'signups': [7, 12, 3, 5]}

In [None]:
users = pd.DataFrame(data)

In [None]:
users

* DataFrames from dict (2)
    * lists

In [None]:
cities = ['Austin', 'Dallas', 'Austin', 'Dallas']
signups = [7, 12, 3, 5]
weekdays = ['Sun', 'Sun', 'Mon', 'Mon']
visitors = [139, 237, 326, 456]

list_labels = ['city', 'signups', 'visitors', 'weekday']
list_cols = [cities, signups, visitors, weekdays]  # list of lists

zipped = list(zip(list_labels, list_cols))  # list of (label, column) tuples
zipped

* DataFrames from dict (3)

In [None]:
data2 = dict(zipped)

In [None]:
users2 = pd.DataFrame(data2)

In [None]:
users2

### Broadcasting

* Saves time by generating long lists, arrays or columns without loops

In [None]:
users['fees'] = 0  # Broadcasts value to entire column

In [None]:
users

### Broadcasting with a dict

In [None]:
heights = [59.0, 65.2, 62.9, 65.4, 63.7, 65.7, 64.1]

In [None]:
data = {'height': heights, 'sex': 'M'}  # M is broadcast to the entire column

In [None]:
results = pd.DataFrame(data)

In [None]:
results

### Index and columns

* We can assign a list of strings to the ```columns``` and ```index``` attributes, as long as each list has the matching length.

In [None]:
results.columns = ['height (in)', 'sex']

In [None]:
results.index = ['A', 'B', 'C', 'D', 'E', 'F', 'G']

In [None]:
results
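
The length requirement can be seen in a short sketch. The three-row frame below is a hypothetical cut-down version of ```results```.

```python
import pandas as pd

# Hypothetical three-row frame; 'M' is broadcast to every row
df = pd.DataFrame({'height': [59.0, 65.2, 62.9], 'sex': 'M'})

df.columns = ['height (in)', 'sex']   # two labels for two columns
df.index = ['A', 'B', 'C']            # three labels for three rows

# A list of the wrong length is rejected and the index is left unchanged
try:
    df.index = ['A', 'B']
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(mismatch_error)
```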

### Zip lists to build a DataFrame

In this exercise, you're going to make a pandas DataFrame of the top three countries to win gold medals since 1896 by first building a dictionary. ```list_keys``` contains the column names ```'Country'``` and ```'Total'```. ```list_values``` contains the full names of each country and the number of gold medals awarded. The values have been taken from [Wikipedia](https://en.wikipedia.org/wiki/All-time_Olympic_Games_medal_table).

Your job is to use these lists to construct a list of tuples, use the list of tuples to construct a dictionary, and then use that dictionary to construct a DataFrame. In doing so, you'll make use of the ```list()```, ```zip()```, ```dict()``` and ```pd.DataFrame()``` functions. Pandas has already been imported as pd.

Note: The [zip()](https://docs.python.org/3/library/functions.html#zip) function in Python 3 returns a special zip object, which is a lazy iterator that can be consumed only once. To convert this ```zip``` object into a list, you'll need to use ```list()```. You can learn more about the ```zip()``` function as well as generators in [Python Data Science Toolbox (Part 2)](https://www.datacamp.com/courses/python-data-science-toolbox-part-2).
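
A short sketch of why the ```list()``` conversion matters, reusing the medal values from this exercise. The variable names ```pairs```, ```first_pass``` and ```second_pass``` are chosen only for this illustration.

```python
keys = ['Country', 'Total']
values = [['United States', 'Soviet Union', 'United Kingdom'], [1118, 473, 273]]

pairs = zip(keys, values)   # nothing is paired up yet
first_pass = list(pairs)    # consumes the zip object
second_pass = list(pairs)   # the zip object is now exhausted

data = dict(first_pass)     # the saved list can be reused freely
print(first_pass)
print(second_pass)
```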

### Instructions

* Zip the 2 lists ```list_keys``` and ```list_values``` together into one list of (key, value) tuples. Be sure to convert the ```zip``` object into a list, and store the result in ```zipped```.
* Inspect the contents of ```zipped``` using ```print()```. This has been done for you.
* Construct a dictionary using ```zipped```. Store the result as ```data```.
* Construct a DataFrame using the dictionary. Store the result as ```df```.

In [None]:
list_keys = ['Country', 'Total']
list_values = [['United States', 'Soviet Union', 'United Kingdom'], [1118, 473, 273]]

In [None]:
zipped = list(zip(list_keys, list_values))  # list of (key, values) tuples
zipped

In [None]:
data = dict(zipped)

In [None]:
data

In [None]:
data_df = pd.DataFrame.from_dict(data)

In [None]:
data_df

### Labeling your data

You can use the DataFrame attribute ```df.columns``` to view and assign new string labels to columns in a pandas DataFrame.

In this exercise, we have imported pandas as ```pd``` and defined a DataFrame ```df``` containing top Billboard hits from the 1980s (from [Wikipedia](https://en.wikipedia.org/wiki/List_of_Billboard_Hot_100_number-one_singles_of_the_1980s#1980)). Each row has the year, artist, song name and the number of weeks at the top. However, this DataFrame has the column labels ```a, b, c, d```. Your job is to use the ```df.columns``` attribute to re-assign descriptive column labels.

### Instructions

* Create a list of new column labels with ```'year'```, ```'artist'```, ```'song'```, ```'chart weeks'```, and assign it to ```list_labels```.
* Assign your list of labels to ```df.columns```.

In [None]:
billboard_values = np.array([['1980', 'Blondie', 'Call Me', '6'],
                             ['1981', 'Christopher Cross', 'Arthurs Theme', '3'],
                             ['1982', 'Joan Jett', 'I Love Rock and Roll', '7']]).transpose()
billboard_keys = ['a', 'b', 'c', 'd']

billboard_zipped = list(zip(billboard_keys, billboard_values))
billboard_zipped

In [None]:
billboard_dict = dict(billboard_zipped)

In [None]:
billboard_dict

In [None]:
billboard = pd.DataFrame.from_dict(billboard_dict)

In [None]:
billboard

In [None]:
# Build a list of labels: list_labels
list_labels = ['year', 'artist', 'song', 'chart weeks']

In [None]:
# Assign the list of labels to the columns attribute: billboard.columns
billboard.columns = list_labels

In [None]:
billboard

### Building DataFrames with broadcasting

You can implicitly use 'broadcasting', a feature of NumPy, when creating pandas DataFrames. In this exercise, you're going to create a DataFrame of cities in Pennsylvania that contains the city name in one column and the state name in the second. We have imported the names of 15 cities as the list ```cities```.

Your job is to construct a DataFrame from the list of cities and the string ```'PA'```.

### Instructions

* Make a string object with the value 'PA' and assign it to state.
* Construct a dictionary with 2 key:value pairs: 'state':state and 'city':cities.
* Construct a pandas DataFrame from the dictionary you created and assign it to df

In [None]:
cities = ['Manheim', 'Preston park', 'Biglerville',
          'Indiana', 'Curwensville', 'Crown',
          'Harveys lake', 'Mineral springs', 'Cassville',
          'Hannastown', 'Saltsburg', 'Tunkhannock',
          'Pittsburgh', 'Lemasters', 'Great bend']

In [None]:
# Make a string with the value 'PA': state
state = 'PA'

In [None]:
# Construct a dictionary: data
data = {'state': state, 'city': cities}

In [None]:
# Construct a DataFrame from dictionary data: df
pa_df = pd.DataFrame.from_dict(data)

In [None]:
# Print the DataFrame
print(pa_df)

### Importing & Exporting Data

* Dataset: Sunspot observations collected from SILSO

```python
Format: Comma Separated values (adapted for import in spreadsheets)
The separator is the semicolon ';'.

Contents:
Column 1-3: Gregorian calendar date
- Year
- Month
- Day
Column 4: Date in fraction of year.
Column 5: Daily total sunspot number. A value of -1 indicates that no number is available for that day (missing value).
Column 6: Daily standard deviation of the input sunspot numbers from individual stations.
Column 7: Number of observations used to compute the daily value.
Column 8: Definitive/provisional indicator. '1' indicates that the value is definitive. '0' indicates that the value is still provisional.
```

In [None]:
filepath = r'data/silso_sunspot_data_1818-2019.csv'

In [None]:
sunspots = pd.read_csv(filepath, sep=';')
sunspots.info()

In [None]:
sunspots.iloc[10:20, :]

#### Problems

* CSV file has no column headers
    * Columns 0-2: Gregorian date (year, month, day)
    * Column 3: Date as fraction of year
    * Column 4: Daily total sunspot number
    * Column 5: Daily standard deviation of the input sunspot numbers
    * Column 6: Number of observations used to compute the daily value
    * Column 7: Definitive / provisional indicator (1 or 0)
* Missing values in column 4: indicated by -1
* Date representation inconvenient

In [None]:
sunspots = pd.read_csv(filepath, sep=';', header=None)
sunspots.iloc[10:20, :]

#### Using names keyword

In [None]:
col_names = ['year', 'month', 'day', 'dec_date',
             'tot_sunspots', 'daily_std', 'observations', 'definite']

In [None]:
sunspots = pd.read_csv(filepath, sep=';', header=None, names=col_names)
sunspots.iloc[10:20, :]

#### Using na_values keyword (1)

In [None]:
sunspots = pd.read_csv(filepath, sep=';',
                       header=None,
                       names=col_names,
                       na_values='-1')
sunspots.iloc[10:20, :]

#### Using na_values keyword (2)

In [None]:
sunspots = pd.read_csv(filepath, sep=';',
                       header=None,
                       names=col_names,
                       na_values='  -1')
sunspots.iloc[10:20, :]

In [None]:
sunspots.info()

#### Using na_values keyword (3)

In [None]:
sunspots = pd.read_csv(filepath, sep=';',
                       header=None,
                       names=col_names,
                       na_values={'tot_sunspots':['  -1'],
                                  'daily_std':['-1']})
sunspots.iloc[10:20, :]

#### Using parse_dates keyword

In [None]:
sunspots = pd.read_csv(filepath, sep=';',
                       header=None,
                       names=col_names,
                       na_values={'tot_sunspots':['  -1'],
                                  'daily_std':['-1']},
                       parse_dates=[[0, 1, 2]])
sunspots.iloc[10:20, :]
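
Recent pandas releases deprecate the nested-list form of ```parse_dates``` and suggest combining the columns with ```pd.to_datetime()``` after reading. The following is a sketch of that alternative, not the approach used in this notebook. The three rows are hypothetical and only mimic the semicolon-separated layout described above, and the resulting column is named ```date``` here rather than ```year_month_day```.

```python
import io
import pandas as pd

# Hypothetical rows in the same 8-column, semicolon-separated layout
raw = io.StringIO(
    "1818;01;01;1818.001;-1;-1.0;0;1\n"
    "1818;01;08;1818.021;65;10.2;1;1\n"
    "1818;01;13;1818.034;37;7.7;1;1\n"
)
col_names = ['year', 'month', 'day', 'dec_date',
             'tot_sunspots', 'daily_std', 'observations', 'definite']

sunspots = pd.read_csv(raw, sep=';', header=None, names=col_names)

# Assemble a datetime column from the year, month and day columns
sunspots['date'] = pd.to_datetime(sunspots[['year', 'month', 'day']])

# Use the dates as the index and drop the now redundant date columns
sunspots = (sunspots.set_index('date')
                    .drop(columns=['year', 'month', 'day', 'dec_date']))
print(sunspots)
```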

#### Inspecting DataFrame

In [None]:
sunspots.info()

#### Using dates as index

In [None]:
sunspots.index = sunspots['year_month_day']
sunspots.index.name = 'date'
sunspots.iloc[10:20, :]

In [None]:
sunspots.info()

#### Trimming redundant columns

In [None]:
cols = ['tot_sunspots', 'daily_std', 'observations', 'definite']
sunspots = sunspots[cols]
sunspots.iloc[10:20, :]

#### Writing files

```python
out_csv = 'sunspots.csv'
sunspots.to_csv(out_csv)
out_tsv = 'sunspots.tsv'
sunspots.to_csv(out_tsv, sep='\t')
out_xlsx = 'sunspots.xlsx'
sunspots.to_excel(out_xlsx)
```

### Reading a flat file

In previous exercises, we have preloaded the data for you using the pandas function ```read_csv()```. Now, it's your turn! Your job is to read the World Bank population data you saw earlier into a DataFrame using ```read_csv()```. The file is available in the variable ```data_file```.

The next step is to reread the same file, but simultaneously rename the columns using the ```names``` keyword input parameter, set equal to a list of new column labels. You will also need to set ```header=0``` to rename the column labels.

Finish up by inspecting the result with ```df.head()``` and ```df.info()``` in the IPython Shell (changing ```df``` to the name of your DataFrame variable).

```pandas``` has already been imported and is available in the workspace as ```pd```.

### Instructions

* Use ***pd.read_csv()*** with the string ***data_file*** to read the CSV file into a DataFrame and assign it to ***df1***.
* Create a list of new column labels - ***'year'***, ***'population'*** - and assign it to the variable ***new_labels***.
* Reread the same file, again using ***pd.read_csv()***, but this time, add the keyword arguments ***header=0*** and ***names=new_labels***. Assign the resulting DataFrame to ***df2***.
* Print both the ***df1*** and ***df2*** DataFrames to see the change in column names. This has already been done for you.

In [None]:
data_file = 'DataCamp-master/11-pandas-foundations/_datasets/world_population.csv'

In [None]:
# Read in the file: df1
df1 = pd.read_csv(data_file)

In [None]:
# Create a list of the new column labels: new_labels
new_labels = ['year', 'population']

In [None]:
# Read in the file, specifying the header and names parameters: df2
df2 = pd.read_csv(data_file, header=0, names=new_labels)

In [None]:
# Print both the DataFrames
df1.head()

In [None]:
df2.head()
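
To see why ```header=0``` is needed alongside ```names```, here is a sketch on an in-memory CSV. The two data rows are hypothetical, and ```df3``` is an extra variable added for the comparison.

```python
import io
import pandas as pd

# Hypothetical file contents: one header row and two data rows
csv_text = "Year,Total Population\n1960,100\n1970,200\n"
new_labels = ['year', 'population']

# Default read keeps the labels found in the file
df1 = pd.read_csv(io.StringIO(csv_text))

# header=0 marks row 0 as the header so that names can replace it
df2 = pd.read_csv(io.StringIO(csv_text), header=0, names=new_labels)

# Without header=0 the original header row is read as a data row
df3 = pd.read_csv(io.StringIO(csv_text), names=new_labels)

print(df2)
print(df3)
```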

### Delimiters, headers, and extensions

Not all data files are clean and tidy. Pandas provides methods for reading those not-so-perfect data files that you encounter far too often.

In this exercise, you have monthly stock data for four companies downloaded from [Yahoo Finance](http://finance.yahoo.com/). The data is stored as one row for each company and each column is the end-of-month closing price. The file name is given to you in the variable ```file_messy```.

In addition, this file has three aspects that may cause trouble for lesser tools: multiple header lines, comment records (rows) interleaved throughout the data rows, and space delimiters instead of commas.

Your job is to use pandas to read the data from this problematic ```file_messy``` using non-default input options with ```read_csv()``` so as to tidy up the mess at read time. Then, write the cleaned up data to a CSV file with the variable ```file_clean``` that has been prepared for you, as you might do in a real data workflow.

You can learn about the option input parameters needed by using ```help()``` on the pandas function ```pd.read_csv()```.

### Instructions

* Use ***pd.read_csv()*** without using any keyword arguments to read ***file_messy*** into a pandas DataFrame ***df1***.
* Use ***.head()*** to print the first 5 rows of ***df1*** and see how messy it is. Do this in the IPython Shell first so you can see how modifying ***read_csv()*** can clean up this mess.
* Using the keyword arguments ***delimiter=' '***, ***header=3*** and ***comment='#'***, use ***pd.read_csv()*** again to read ***file_messy*** into a new DataFrame ***df2***.
* Print the output of ***df2.head()*** to verify the file was read correctly.
* Use the DataFrame method ***.to_csv()*** to save the DataFrame ***df2*** to the variable ***file_clean***. Be sure to specify ***index=False***.
* Use the DataFrame method ***.to_excel()*** to save the DataFrame ***df2*** to the file ***'file_clean.xlsx'***. Again, remember to specify ***index=False***.

In [None]:
# Read the raw file as-is: df1
file_messy = 'DataCamp-master/11-pandas-foundations/_datasets/messy_stock_data.tsv'
df1 = pd.read_csv(file_messy)

In [None]:
# Print the output of df1.head()
df1.head()

In [None]:
# Read in the file with the correct parameters: df2
df2 = pd.read_csv(file_messy, delimiter=' ', header=3, comment='#')

In [None]:
# Print the output of df2.head()
df2.head()
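
The effect of the three keyword arguments can be reproduced on a small in-memory file. This is a sketch only: the text below is hypothetical and merely imitates the layout of the messy file (three junk lines, a label row, comment records and space delimiters).

```python
import io
import pandas as pd

# Hypothetical contents imitating the messy layout
messy_text = (
    "Stock prices export\n"
    "Source unknown\n"
    "Values in USD\n"
    "name Jan Feb\n"
    "# first comment record\n"
    "IBM 156.08 160.01\n"
    "MSFT 45.51 43.08\n"
    "# second comment record\n"
    "GOOGLE 512.42 537.99\n"
)

# Default options: everything lands in a single, unusable column
df1 = pd.read_csv(io.StringIO(messy_text))

# Space delimiter, labels on row 3, and '#' records skipped
df2 = pd.read_csv(io.StringIO(messy_text), delimiter=' ', header=3, comment='#')

# With no path given, to_csv returns the CSV text instead of writing a file
clean_text = df2.to_csv(index=False)
print(df2)
```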

#### save files

```python
# Save the cleaned up DataFrame to a CSV file without the index
df2.to_csv(file_clean, index=False)
# Save the cleaned up DataFrame to an excel file without the index
df2.to_excel('file_clean.xlsx', index=False)
```

### Plotting with Pandas

In [None]:
cols = ['date', 'open', 'high', 'low', 'close', 'adj_close', 'volume']
aapl = pd.read_csv(r'DataCamp-master/11-pandas-foundations/_datasets/AAPL.csv',
                   names=cols,
                   index_col='date',
                   parse_dates=True,
                   header=0,
                   na_values='null')

In [None]:
aapl.head()

In [None]:
aapl.info()

In [None]:
aapl.tail()

#### Plotting arrays (matplotlib)

In [None]:
close_arr = aapl['close'].values

In [None]:
type(close_arr)

In [None]:
plt.plot(close_arr)

#### Plotting Series (matplotlib)

In [None]:
close_series = aapl['close']

In [None]:
type(close_series)

In [None]:
plt.plot(close_series)

#### Plotting Series (pandas)

In [None]:
close_series.plot()

#### Plotting DataFrames (pandas)

In [None]:
aapl.plot()

#### Plotting DataFrames (matplotlib)

In [None]:
plt.plot(aapl)

#### Fixing Scales

In [None]:
aapl.plot()
plt.yscale('log')
plt.show()

#### Customizing plots

In [None]:
aapl['open'].plot(color='b', style='.-', legend=True)
aapl['close'].plot(color='r', style='.', legend=True)
plt.axis(('2000', '2001', 0, 10))
plt.show()

#### Saving Plots

In [None]:
aapl.loc['2001':'2004', ['open', 'close', 'high', 'low']].plot()

plt.savefig('aapl.png')
plt.savefig('aapl.jpg')
plt.savefig('aapl.pdf')

plt.show()

### Plotting series using pandas

Data visualization is often a very effective first step in gaining a rough understanding of a data set to be analyzed. Pandas provides data visualization by both depending upon and interoperating with the matplotlib library. You will now explore some of the basic plotting mechanics with pandas as well as related matplotlib options. We have pre-loaded a pandas DataFrame ```df``` which contains the data you need. Your job is to use the DataFrame method ```df.plot()``` to visualize the data, and then explore the optional matplotlib input parameters that this ```.plot()``` method accepts.

The pandas ```.plot()``` method makes calls to matplotlib to construct the plots. This means that you can use the skills you've learned in previous visualization courses to customize the plot. In this exercise, you'll add a custom title and axis labels to the figure.

Before plotting, inspect the DataFrame in the IPython Shell using ```df.head()```. Also, use ```type(df)``` and note that it is a single column DataFrame.

### Instructions

* Create the plot with the DataFrame method ```df.plot()```. Specify a ```color``` of ```'red'```.
    * Note: ```c``` and ```color``` are interchangeable as parameters here, but we ask you to be explicit and specify ```color```.
* Use ```plt.title()``` to give the plot a title of ```'Temperature in Austin'```.
* Use ```plt.xlabel()``` to give the plot an x-axis label of ```'Hours since midnight August 1, 2010'```.
* Use ```plt.ylabel()``` to give the plot a y-axis label of ```'Temperature (degrees F)'```.
* Finally, display the plot using ```plt.show()```.
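
One possible shape for this exercise is sketched below. The temperature values are hypothetical, and the labels are set through the ```Axes``` object returned by ```.plot()```, which has the same effect as the ```plt.title()```, ```plt.xlabel()``` and ```plt.ylabel()``` calls on the current axes. The ```Agg``` backend line is an assumption for running as a script without a display; leave it out in a notebook and call ```plt.show()``` instead of closing the figure.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical hourly temperatures in a single-column DataFrame
df = pd.DataFrame({'Temperature (deg F)': [79.0, 77.4, 76.4, 75.7, 75.1]})

# .plot() returns the matplotlib Axes it drew on
ax = df.plot(color='red')
ax.set_title('Temperature in Austin')
ax.set_xlabel('Hours since midnight August 1, 2010')
ax.set_ylabel('Temperature (degrees F)')

plt.close(ax.figure)  # release the figure once it is no longer needed
```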

### Plotting DataFrames

Comparing data from several columns can be very illuminating. Pandas makes doing so easy with multi-column DataFrames. By default, calling ```df.plot()``` will cause pandas to over-plot all column data, with each column as a single line. In this exercise, we have pre-loaded three columns of data from a weather data set - temperature, dew point, and pressure - but the problem is that pressure has different units of measure. The pressure data, measured in Atmospheres, has a different vertical scaling than that of the other two data columns, which are both measured in degrees Fahrenheit.

Your job is to plot all columns as a multi-line plot, to see the nature of the vertical scaling problem. Then, use a list of column names passed into the DataFrame ```df[column_list]``` to limit plotting to just one column, and then just 2 columns of data. When you are finished, you will have created 4 plots. You can cycle through them by clicking on the 'Previous Plot' and 'Next Plot' buttons.

As in the previous exercise, inspect the DataFrame ```df``` in the IPython Shell using the ```.head()``` and ```.info()``` methods.

### Instructions

* Plot all columns together on one figure by calling ```df.plot()```, and note the vertical scaling problem.
* Plot all columns as subplots. To do so, you need to specify ```subplots=True``` inside ```.plot()```.
* Plot a single column of dew point data. To do this, define a column list containing a single column name ```'Dew Point (deg F)'```, and call ```df[column_list1].plot()```.
* Plot two columns of data, ```'Temperature (deg F)'``` and ```'Dew Point (deg F)'```. To do this, define a list containing those column names and pass it into ```df[]```, as ```df[column_list2].plot()```.
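
The steps above can be sketched as follows. The weather values are hypothetical and the pressure column label ```'Pressure (atm)'``` is an assumption, since the text does not give the exact name. As in any script-style example, the ```Agg``` backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weather readings; pressure sits on a very different scale
df = pd.DataFrame({
    'Temperature (deg F)': [79.0, 77.4, 76.4, 75.7],
    'Dew Point (deg F)': [70.8, 71.2, 71.3, 71.4],
    'Pressure (atm)': [1.0, 1.0, 1.0, 1.0],
})

# All columns on one axes: the pressure line is flattened near zero
ax_all = df.plot()

# One axes per column, each with its own vertical scale
axes = df.plot(subplots=True)

# Restrict plotting with lists of column names
column_list1 = ['Dew Point (deg F)']
ax1 = df[column_list1].plot()

column_list2 = ['Temperature (deg F)', 'Dew Point (deg F)']
ax2 = df[column_list2].plot()

plt.close('all')  # release all four figures
```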