# 5. DataFrame Attributes and Methods

### Objectives

In previous notebooks, we covered the most common and fundamental attributes and methods for a Series. This notebook will take on a similar task for DataFrames.

## DataFrames vs Series

DataFrames and Series are extremely similar objects. A Series is simply a single column of a DataFrame. Series have an index and values but no columns. A DataFrame can be thought of as a collection of Series objects each directly accessible by passing the column name to the brackets of the DataFrame.
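
As a quick illustration of that relationship, here is a minimal sketch using a tiny made-up DataFrame (the `city` and `population` columns are invented for this example and are not part of the bikes data). It selects one column with the brackets and checks that the result is a Series sharing the DataFrame's index.

```python
import pandas as pd

# Toy data invented for illustration; not one of the course datasets.
df = pd.DataFrame({'city': ['Chicago', 'Houston'],
                   'population': [2.7, 2.3]})

# Passing a column name to the brackets returns that column as a Series.
population = df['population']

print(type(population))    # the Series class
print(population.index)    # same index as the DataFrame
print(population.name)     # the column name becomes the Series name
```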

## View the API for complete list of functionality
Just as we did for Series, it can be helpful to see the entire list of attributes and methods for a DataFrame. Please visit the [DataFrame section][1] of the API.

## Best of DataFrame API
DataFrames, like Series, have an abundance of attributes and methods, many of which do not add any functionality to the library. We will focus on the core attributes and methods that give you the most power to complete a data analysis.

## Minimally Sufficient Pandas
I can't stress enough how important it is to stick with a minimal subset of Pandas when doing an analysis. Using more obscure methods does not make you a better analyst. This is akin to going to a party and using the most obscure and difficult words to impress the guests. The point of a data analysis is to clearly expose the information held within the data. Just about everything that you want to do can be clearly expressed with minimal Pandas syntax. It is this syntax that we focus on during class.

## Bikes Dataset
We will use the bikes dataset to introduce the core attributes and methods of DataFrames.

[1]: http://pandas.pydata.org/pandas-docs/stable/api.html#dataframe

In [1]:
import pandas as pd

bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])
bikes.head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
4,13168,Subscriber,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0,partlycloudy


## Core DataFrame Attributes

See the complete list of [DataFrame attributes][1].

* **`index`**
* **`columns`**
* **`values`**
* **`dtypes`**
* **`shape`**
* **`size`**

Let's further explore these attributes:

[1]: http://pandas.pydata.org/pandas-docs/stable/api.html#attributes-and-underlying-data

### Index Objects
The default index for both DataFrames and Series is a **`RangeIndex`**, an object specific to Pandas. All index objects have their own attributes and methods, which you can [view in the API][1]. There are about a dozen different Index objects, which sounds intimidating, but you will rarely need to interact with them directly, so it's not necessary to spend much time studying their attributes and methods.

[1]: http://pandas.pydata.org/pandas-docs/stable/api.html#attributes-and-underlying-data

In [2]:
bikes.index

RangeIndex(start=0, stop=50089, step=1)

## The columns are an Index object
The columns are always going to be some kind of Index object just like the index. You can more or less think of the Index objects as an array of data.

In [3]:
bikes.columns

Index(['trip_id', 'usertype', 'gender', 'starttime', 'stoptime',
       'tripduration', 'from_station_name', 'latitude_start',
       'longitude_start', 'dpcapacity_start', 'to_station_name',
       'latitude_end', 'longitude_end', 'dpcapacity_end', 'temperature',
       'visibility', 'wind_speed', 'precipitation', 'events'],
      dtype='object')

In [4]:
type(bikes.columns)

pandas.core.indexes.base.Index
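
To make the "array of data" comparison concrete, here is a small sketch on a made-up two-column DataFrame (the column names `a` and `b` are placeholders). It shows that the columns can be indexed by integer position, measured with `len`, and converted to a plain Python list.

```python
import pandas as pd

# Toy DataFrame; the column names are placeholders.
df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

columns = df.columns
first_column = columns[0]      # select by integer position, like an array
column_list = list(columns)    # convert to a plain Python list

print(isinstance(columns, pd.Index))   # True
print(first_column)                    # a
print(column_list)                     # ['a', 'b']
print(len(columns))                    # 2
```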

### `values` returns a 2-D NumPy array
The **`values`** attribute returns a 2-D NumPy array.

In [5]:
bikes.values

array([[7147, 'Subscriber', 'Male', ..., 12.7, -9999.0, 'mostlycloudy'],
       [7524, 'Subscriber', 'Male', ..., 6.9, -9999.0, 'partlycloudy'],
       [10927, 'Subscriber', 'Male', ..., 16.1, -9999.0, 'mostlycloudy'],
       ...,
       [17534972, 'Subscriber', 'Male', ..., 16.1, -9999.0,
        'partlycloudy'],
       [17535645, 'Subscriber', 'Female', ..., 11.5, -9999.0,
        'partlycloudy'],
       [17536246, 'Subscriber', 'Male', ..., 15.0, -9999.0,
        'partlycloudy']], dtype=object)
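
Notice the `dtype=object` at the end of the output above: a NumPy array holds a single data type, so a DataFrame that mixes numbers and strings falls back to the generic object type. The sketch below, using two made-up DataFrames, contrasts a mixed DataFrame with an all-numeric one.

```python
import pandas as pd

# Toy data: one DataFrame mixes integers and strings, the other is all numeric.
mixed = pd.DataFrame({'id': [1, 2], 'name': ['x', 'y']})
numeric = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

mixed_values = mixed.values
numeric_values = numeric.values

# Both arrays are two-dimensional, but only the numeric one keeps a numeric dtype.
print(mixed_values.ndim, mixed_values.dtype)       # 2 object
print(numeric_values.ndim, numeric_values.dtype)   # 2 float64
```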

### `dtypes` returns a Series of data types
The **`dtypes`** attribute returns a Series of data types, where the index of the Series holds the column names and the values are the data types themselves.

In [6]:
bikes.dtypes

trip_id                       int64
usertype                     object
gender                       object
starttime            datetime64[ns]
stoptime             datetime64[ns]
tripduration                  int64
from_station_name            object
latitude_start              float64
longitude_start             float64
dpcapacity_start            float64
to_station_name              object
latitude_end                float64
longitude_end               float64
dpcapacity_end              float64
temperature                 float64
visibility                  float64
wind_speed                  float64
precipitation               float64
events                       object
dtype: object
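
Because the result is an ordinary Series, you can select from it like any other Series. A minimal sketch on made-up data (columns `a` and `b` are placeholders):

```python
import pandas as pd

# Toy DataFrame with one integer and one float column.
df = pd.DataFrame({'a': [1, 2], 'b': [3.5, 4.5]})

dtypes = df.dtypes

print(type(dtypes))          # a Series
print(list(dtypes.index))    # ['a', 'b'], the column names
print(dtypes['b'])           # float64, the data type of column 'b'
```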

### `shape` returns a tuple of the number of rows and columns
The **`shape`** attribute will always return a Python tuple of length 2 containing the number of rows and the number of columns.

In [7]:
shape = bikes.shape
shape

(50089, 19)

In [8]:
type(shape)

tuple

You can get the number of rows and columns as an integer by selecting them from the tuple:

In [9]:
shape[0]

50089

In [10]:
shape[1]

19

### `size` returns the total number of elements in the DataFrame
The **`size`** attribute returns the total number of elements in the DataFrame, not the number of rows. It is simply the number of rows multiplied by the number of columns.

In [11]:
bikes.size

951691

Just the same as this:

In [12]:
shape[0] * shape[1]

951691

### The `len` function returns the number of rows
The built-in Python **`len`** function returns the number of rows.

In [13]:
len(bikes)

50089
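
The three ways of measuring a DataFrame are easy to mix up, so here is a compact side-by-side sketch on a made-up DataFrame with 3 rows and 2 columns.

```python
import pandas as pd

# Toy DataFrame: 3 rows, 2 columns.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

n_rows, n_cols = df.shape    # unpack the tuple directly

print(df.shape)    # (3, 2)
print(df.size)     # 6, rows multiplied by columns
print(len(df))     # 3, rows only
```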

## The `info` method
The **`info`** method summarizes the DataFrame, reporting the:
* data types of each column
* type of index
* number of columns
* number of non-missing values in each column
* frequency count of data types
* amount of memory used

This information is **printed** to the screen. There is no object that is returned.

In [14]:
bikes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50089 entries, 0 to 50088
Data columns (total 19 columns):
trip_id              50089 non-null int64
usertype             50089 non-null object
gender               50089 non-null object
starttime            50089 non-null datetime64[ns]
stoptime             50089 non-null datetime64[ns]
tripduration         50089 non-null int64
from_station_name    50089 non-null object
latitude_start       50083 non-null float64
longitude_start      50083 non-null float64
dpcapacity_start     50083 non-null float64
to_station_name      50089 non-null object
latitude_end         50077 non-null float64
longitude_end        50077 non-null float64
dpcapacity_end       50077 non-null float64
temperature          50089 non-null float64
visibility           50089 non-null float64
wind_speed           50089 non-null float64
precipitation        50089 non-null float64
events               50089 non-null object
dtypes: datetime64[ns](2), float64(10), int64(2), 
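
To confirm that nothing is returned, the sketch below assigns the result of `info` to a variable on a small made-up DataFrame. The summary is still printed as a side effect, but the variable holds `None`.

```python
import pandas as pd

# Toy DataFrame with one missing value.
df = pd.DataFrame({'a': [1.0, None], 'b': ['x', 'y']})

result = df.info()    # the summary is printed here as a side effect

print(result)         # None
```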

# Arithmetic Operations with a DataFrame
We now cover what happens when we use the arithmetic operators **`+, -, *, /, **, //`** on a DataFrame. Let's say we have a DataFrame, **`df`**, and execute **`df + 5`**. Pandas will attempt to add 5 to each value in the DataFrame. This operation will only work if all the columns are numeric (or boolean).

### Attempt to add 5 to bikes
If we try to add 5 to bikes, we will get an error because it has a mix of numeric, object, and datetime columns.

In [15]:
bikes + 5 # only works for df with all int/float

ValueError: Cannot add integral value to Timestamp without freq.
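
The exact exception type and message depend on your Pandas version and on which column types are involved, so the sketch below catches a general `Exception` rather than assuming one. It uses a made-up DataFrame with one numeric and one string column.

```python
import pandas as pd

# Toy DataFrame mixing a numeric column with a string column.
df = pd.DataFrame({'trips': [1, 2], 'station': ['A St', 'B St']})

try:
    df + 5
    failed = False
except Exception as exc:    # the exception type varies across versions
    failed = True
    print(type(exc).__name__)

print(failed)    # True
```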

### Select just numeric data with `select_dtypes`
DataFrames have a method unique to them called **`select_dtypes`**, which selects the subset of columns with the passed data type. Pass the data type you want as a string: int, float, bool, object, datetime, timedelta, or category.

Let's see some examples:

In [16]:
bikes.select_dtypes('int').head()

Unnamed: 0,trip_id,tripduration
0,7147,993
1,7524,623
2,10927,1040
3,12907,667
4,13168,130


#### Use the string 'number' to select all numeric data
This selects all int and float columns.

In [17]:
bikes_number = bikes.select_dtypes('number')
bikes_number.head()

Unnamed: 0,trip_id,tripduration,latitude_start,longitude_start,dpcapacity_start,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation
0,7147,993,41.88105,-87.61697,11.0,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0
1,7524,623,41.88338,-87.64117,31.0,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0
2,10927,1040,41.909592,-87.653497,15.0,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0
3,12907,667,41.894556,-87.653449,19.0,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0
4,13168,130,41.909396,-87.677692,19.0,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0
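
The first argument of `select_dtypes` is named `include`, and there is a matching `exclude` parameter for the opposite selection. The sketch below uses a made-up DataFrame to show both, and also shows that boolean columns are not counted as `'number'`.

```python
import pandas as pd

# Toy DataFrame with an integer, a float, and a boolean column.
df = pd.DataFrame({
    'trips': [1, 2],
    'temperature': [73.9, 69.1],
    'rainy': [True, False],
})

numeric = df.select_dtypes(include='number')       # int and float columns
non_numeric = df.select_dtypes(exclude='number')   # everything else

print(list(numeric.columns))       # ['trips', 'temperature']
print(list(non_numeric.columns))   # ['rainy']
```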


## Try adding 5 to `bikes_number`
Let's try adding 5 to the **`bikes_number`** DataFrame which consists of only numeric columns:

In [18]:
(bikes_number + 5).head()

Unnamed: 0,trip_id,tripduration,latitude_start,longitude_start,dpcapacity_start,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation
0,7152,998,46.88105,-82.61697,16.0,46.90096,-82.623777,20.0,78.9,15.0,17.7,-9994.0
1,7529,628,46.88338,-82.64117,36.0,46.89993,-82.63443,24.0,74.1,15.0,11.9,-9994.0
2,10932,1045,46.909592,-82.653497,20.0,46.88132,-82.629521,28.0,78.0,15.0,21.1,-9994.0
3,12912,672,46.894556,-82.653449,24.0,46.884576,-82.63189,36.0,77.0,15.0,21.1,-9994.0
4,13173,135,46.909396,-82.677692,24.0,46.909396,-82.677692,24.0,78.0,15.0,22.3,-9994.0


## Comparison Operators with DataFrames
The comparison operators work similarly to the mathematical ones and will return a DataFrame of all boolean columns:

In [19]:
(bikes_number > 5).head()

Unnamed: 0,trip_id,tripduration,latitude_start,longitude_start,dpcapacity_start,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation
0,True,True,True,False,True,True,False,True,True,True,True,False
1,True,True,True,False,True,True,False,True,True,True,True,False
2,True,True,True,False,True,True,False,True,True,True,True,False
3,True,True,True,False,True,True,False,True,True,True,True,False
4,True,True,True,False,True,True,False,True,True,True,True,False
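
Here is the same idea on a made-up DataFrame small enough to check by eye. Every column of the result has the boolean data type.

```python
import pandas as pd

# Toy DataFrame with values on both sides of 5.
df = pd.DataFrame({'a': [1, 10], 'b': [7.5, 2.5]})

mask = df > 5    # compare every value to 5

print(mask)
print(mask.dtypes)    # both columns are bool
```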


## Take Care when Working with entire DataFrame
When you do one of the above operations, you are applying that operation to every value in the DataFrame. Make sure this is what you want because all values will be affected.

## Changing Display Options
Pandas gives you several options to change the display of the DataFrames and Series. DataFrames are output in HTML tables while Series are output in plain text.

Let's read in the college dataset, put the institution name (`instnm`) in the index, and then display it on the screen without the `head` method.

In [20]:
college = pd.read_csv('../data/college.csv', index_col='instnm')
college

instnm,city,stabbr,hbcu,menonly,womenonly,relaffil,satvrmid,satmtmid,distanceonly,ugds,...,ugds_2mor,ugds_nra,ugds_unkn,pptug_ef,curroper,pctpell,pctfloan,ug25abv,md_earn_wne_p10,grad_debt_mdn_supp
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,...,0.0000,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,...,0.0368,0.0179,0.0100,0.2607,1,0.3460,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,...,0.0000,0.0000,0.2715,0.4536,1,0.6801,0.7795,0.8540,40100,23370
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,...,0.0172,0.0332,0.0350,0.2146,1,0.3072,0.4596,0.2640,45500,24097
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,...,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.1270,26600,33118.5
The University of Alabama,Tuscaloosa,AL,0.0,0.0,0.0,0,555.0,565.0,0.0,29851.0,...,0.0261,0.0268,0.0026,0.0844,1,0.2040,0.4010,0.0853,41900,23750
Central Alabama Community College,Alexander City,AL,0.0,0.0,0.0,0,,,0.0,1592.0,...,0.0000,0.0000,0.0019,0.3882,1,0.5892,0.3977,0.3153,27500,16127
Athens State University,Athens,AL,0.0,0.0,0.0,0,,,0.0,2991.0,...,0.0174,0.0057,0.0334,0.5517,1,0.4088,0.6296,0.6410,39000,18595
Auburn University at Montgomery,Montgomery,AL,0.0,0.0,0.0,0,486.0,509.0,0.0,4304.0,...,0.0297,0.0397,0.0246,0.2853,1,0.4192,0.5803,0.2930,35000,21335
Auburn University,Auburn,AL,0.0,0.0,0.0,0,575.0,588.0,0.0,20514.0,...,0.0000,0.0100,0.0140,0.0862,1,0.1610,0.3494,0.0415,45700,21831


## Not enough column space
If you scroll to the right, you'll notice that some columns are hidden. You will see a column filled with three dots denoting this.

## The most common options
I typically change only two of the available display options: **`max_columns`** and **`max_rows`**.

## Use dot notation to find, view, and change the options
You don't have to remember all the option names to make a change. Instead, you only need to remember that they are all located under **`pd.options`**. From here we choose **`display`** and then press **tab** to have all the options appear in a menu. The code will look like this:

```
>>> pd.options.display.<press tab>
```

### Output `max_columns` and `max_rows`
We can output the default values for these options.

In [21]:
pd.options.display.max_columns

20

In [22]:
pd.options.display.max_rows

60

## Change the options with an assignment statement
You can use an assignment statement to change the options directly. Let's find out the number of columns in our DataFrame and change the options so all of them will be visible.

Let's also change the `max_rows` to 12 to limit the long output.

In [23]:
college.shape

(7535, 26)

There are 26 columns, more than the default limit of 20, so let's go with a 40-column display:

In [24]:
pd.options.display.max_columns = 40
pd.options.display.max_rows = 12
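
Assignment changes an option for the rest of the session. If you only want a temporary change, one alternative (not used elsewhere in this notebook) is `pd.option_context`, which restores the previous value when its `with` block ends. A minimal sketch, where the value 5 is arbitrary:

```python
import pandas as pd

before = pd.get_option('display.max_rows')

# The option is changed only inside the with block.
with pd.option_context('display.max_rows', 5):
    inside = pd.get_option('display.max_rows')

after = pd.get_option('display.max_rows')

print(before, inside, after)    # the first and last values match
```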

## Use the new display settings
We can now view all of the columns and a fraction of the rows that fit on our screen.

In [25]:
college.head()

instnm,city,stabbr,hbcu,menonly,womenonly,relaffil,satvrmid,satmtmid,distanceonly,ugds,ugds_white,ugds_black,ugds_hisp,ugds_asian,ugds_aian,ugds_nhpi,ugds_2mor,ugds_nra,ugds_unkn,pptug_ef,curroper,pctpell,pctfloan,ug25abv,md_earn_wne_p10,grad_debt_mdn_supp
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5


## Many more options
There are a couple dozen more display options. Feel free to explore them.

# Data Dictionaries
A data dictionary is a very important element of a data analysis and at a minimum gives us the column name and description of each column. Other information on each column can be kept in it such as the data type of each column or number of missing values. This "data on the data" is often referred to as **metadata**.

The college dataset has an available data dictionary that can be read in as a DataFrame. It has the descriptions of each column, which is important with this dataset as the column names are not easily decipherable.

In [26]:
pd.read_csv('../data/college_data_dictionary.csv')
# data dictionary: lists every column along with its description

Unnamed: 0,column_name,description
0,instnm,Institution Name
1,city,City Location
2,stabbr,State Abbreviation
3,hbcu,Historically Black College or University
4,menonly,0/1 Men Only
5,womenonly,0/1 Women only
...,...,...
21,curroper,0/1 Currently Operating
22,pctpell,Percent Students with Pell grant
23,pctfloan,Percent Students with federal loan


# Extra

## More on selecting specific data types

In [27]:
bikes.select_dtypes('float').head()

Unnamed: 0,latitude_start,longitude_start,dpcapacity_start,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation
0,41.88105,-87.61697,11.0,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0
1,41.88338,-87.64117,31.0,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0
2,41.909592,-87.653497,15.0,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0
3,41.894556,-87.653449,19.0,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0
4,41.909396,-87.677692,19.0,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0


In [28]:
bikes.select_dtypes('datetime').head()

Unnamed: 0,starttime,stoptime
0,2013-06-28 19:01:00,2013-06-28 19:17:00
1,2013-06-28 22:53:00,2013-06-28 23:03:00
2,2013-06-30 14:43:00,2013-06-30 15:01:00
3,2013-07-01 10:05:00,2013-07-01 10:16:00
4,2013-07-01 11:16:00,2013-07-01 11:18:00


In [29]:
bikes.select_dtypes('object').head()

Unnamed: 0,usertype,gender,from_station_name,to_station_name,events
0,Subscriber,Male,Lake Shore Dr & Monroe St,Michigan Ave & Oak St,mostlycloudy
1,Subscriber,Male,Clinton St & Washington Blvd,Wells St & Walton St,partlycloudy
2,Subscriber,Male,Sheffield Ave & Kingsbury St,Dearborn St & Monroe St,mostlycloudy
3,Subscriber,Male,Carpenter St & Huron St,Clark St & Randolph St,mostlycloudy
4,Subscriber,Male,Damen Ave & Pierce Ave,Damen Ave & Pierce Ave,partlycloudy


Use a list to select multiple data types:

In [30]:
bikes.select_dtypes(['int', 'object']).head()

Unnamed: 0,trip_id,usertype,gender,tripduration,from_station_name,to_station_name,events
0,7147,Subscriber,Male,993,Lake Shore Dr & Monroe St,Michigan Ave & Oak St,mostlycloudy
1,7524,Subscriber,Male,623,Clinton St & Washington Blvd,Wells St & Walton St,partlycloudy
2,10927,Subscriber,Male,1040,Sheffield Ave & Kingsbury St,Dearborn St & Monroe St,mostlycloudy
3,12907,Subscriber,Male,667,Carpenter St & Huron St,Clark St & Randolph St,mostlycloudy
4,13168,Subscriber,Male,130,Damen Ave & Pierce Ave,Damen Ave & Pierce Ave,partlycloudy


### Other numeric operators
All the other numeric operators work in the same manner. They all apply the operation to every value in the DataFrame. For instance, the following does floor division to each value:

In [31]:
(bikes_number // 17).head()

Unnamed: 0,trip_id,tripduration,latitude_start,longitude_start,dpcapacity_start,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation
0,420,58,2.0,-6.0,0.0,2.0,-6.0,0.0,4.0,0.0,0.0,-589.0
1,442,36,2.0,-6.0,1.0,2.0,-6.0,1.0,4.0,0.0,0.0,-589.0
2,642,61,2.0,-6.0,0.0,2.0,-6.0,1.0,4.0,0.0,0.0,-589.0
3,759,39,2.0,-6.0,1.0,2.0,-6.0,1.0,4.0,0.0,0.0,-589.0
4,774,7,2.0,-6.0,1.0,2.0,-6.0,1.0,4.0,0.0,1.0,-589.0
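
Notice the negative values in the output above. Floor division rounds down toward negative infinity, not toward zero, so -87.6 divided by 17 (about -5.15) becomes -6.0. The sketch below reproduces three values from the first row of the bikes data in a one-row DataFrame.

```python
import pandas as pd

# Three values copied from the first row of the bikes output.
df = pd.DataFrame({
    'tripduration': [993],
    'longitude_start': [-87.61697],
    'precipitation': [-9999.0],
})

result = df // 17    # floor division applied to every value

print(result)
# tripduration: 993 / 17 is about 58.4, floored to 58
# longitude_start: -87.6 / 17 is about -5.15, floored to -6.0
# precipitation: -9999 / 17 is about -588.2, floored to -589.0
```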


### Strange addition and multiplication with objects
Addition actually does work with strings by appending the word being added to each value. You can also multiply strings by an integer which will make them repeat.

In [32]:
(bikes.select_dtypes('object') + 'SOMESTRING').head()

Unnamed: 0,usertype,gender,from_station_name,to_station_name,events
0,SubscriberSOMESTRING,MaleSOMESTRING,Lake Shore Dr & Monroe StSOMESTRING,Michigan Ave & Oak StSOMESTRING,mostlycloudySOMESTRING
1,SubscriberSOMESTRING,MaleSOMESTRING,Clinton St & Washington BlvdSOMESTRING,Wells St & Walton StSOMESTRING,partlycloudySOMESTRING
2,SubscriberSOMESTRING,MaleSOMESTRING,Sheffield Ave & Kingsbury StSOMESTRING,Dearborn St & Monroe StSOMESTRING,mostlycloudySOMESTRING
3,SubscriberSOMESTRING,MaleSOMESTRING,Carpenter St & Huron StSOMESTRING,Clark St & Randolph StSOMESTRING,mostlycloudySOMESTRING
4,SubscriberSOMESTRING,MaleSOMESTRING,Damen Ave & Pierce AveSOMESTRING,Damen Ave & Pierce AveSOMESTRING,partlycloudySOMESTRING


In [33]:
(bikes.select_dtypes('object') * 3).head()

Unnamed: 0,usertype,gender,from_station_name,to_station_name,events
0,SubscriberSubscriberSubscriber,MaleMaleMale,Lake Shore Dr & Monroe StLake Shore Dr & Monro...,Michigan Ave & Oak StMichigan Ave & Oak StMich...,mostlycloudymostlycloudymostlycloudy
1,SubscriberSubscriberSubscriber,MaleMaleMale,Clinton St & Washington BlvdClinton St & Washi...,Wells St & Walton StWells St & Walton StWells ...,partlycloudypartlycloudypartlycloudy
2,SubscriberSubscriberSubscriber,MaleMaleMale,Sheffield Ave & Kingsbury StSheffield Ave & Ki...,Dearborn St & Monroe StDearborn St & Monroe St...,mostlycloudymostlycloudymostlycloudy
3,SubscriberSubscriberSubscriber,MaleMaleMale,Carpenter St & Huron StCarpenter St & Huron St...,Clark St & Randolph StClark St & Randolph StCl...,mostlycloudymostlycloudymostlycloudy
4,SubscriberSubscriberSubscriber,MaleMaleMale,Damen Ave & Pierce AveDamen Ave & Pierce AveDa...,Damen Ave & Pierce AveDamen Ave & Pierce AveDa...,partlycloudypartlycloudypartlycloudy


# Explore selecting data types, arithmetic operations, and changing display options

# Mini Case Study: Finding the attributes and methods in common between DataFrames and Series
DataFrames and Series have most of their attributes and methods in common, so you won't have to remember too much more to use them.

Let's find all the public functionality that is shared by DataFrames and Series, as well as what is unique to each.

Use a set comprehension to get all public attributes and methods for each type:

In [None]:
df_public = {method for method in dir(pd.DataFrame) if method[0] != '_'}
series_public = {method for method in dir(pd.Series) if method[0] != '_'}

Output the total number of public names for each. Notice how they have nearly the same number:

In [None]:
len(df_public)

In [None]:
len(series_public)

Let's find the number of methods in common. About 90% of the methods are the same:

In [None]:
len(df_public & series_public)
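
To check the percentage yourself, divide the size of the intersection by the size of each set. The exact figures depend on your Pandas version, so treat the sketch below as a way to measure the overlap rather than as a fixed result.

```python
import pandas as pd

# Rebuild both sets so that this cell runs on its own.
df_public = {name for name in dir(pd.DataFrame) if not name.startswith('_')}
series_public = {name for name in dir(pd.Series) if not name.startswith('_')}

common = df_public & series_public

# Share of each type's public names that the other type also has.
share_of_df = len(common) / len(df_public)
share_of_series = len(common) / len(series_public)

print(round(share_of_df, 2), round(share_of_series, 2))
```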

#### Output the methods unique to DataFrames

In [None]:
df_public - series_public

#### Output the methods unique to Series

In [None]:
series_public - df_public