# SECTION 07: SQL DATABASES

- onl01-dtsc-pt-041320
- 05/08/20

## LEARNING OBJECTIVES:

- Understand what a database is and how it is different from a DataFrame/Excel sheet.
- Understand how to read a database map (schema)
    - Primary keys vs foreign keys
- Understand how to select, filter, order, and group data using SQL
- Understand the different types of joins


- Breakout Group Activity:  [Survive on sql-island](https://sql-island.informatik.uni-kl.de/)

## Questions:

- Can you explain the below solution when trying to find offices with 0 employees?

```python
cur.execute("""SELECT o.officeCode, o.city, COUNT(e.employeeNumber) AS n_employees
               FROM offices AS o 
               LEFT JOIN employees AS e
               USING(officeCode)
               GROUP BY officeCode
               HAVING n_employees = 0;""")
```

- What is the difference between `sqlite3.connect()` and `sqlite3.Connection()`?

- Subqueries Lab section 1.8 Can we go over this solution?

- Doing updates on multiple columns?

- I am not sure if I understand the difference between ‘altering’ versus ‘updating’ sql tables.
    - https://www.quora.com/What-is-the-difference-between-an-alter-and-an-update-command-in-SQL
    
- pandas .query and slicing

# SQL Databases

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-sql-introduction-online-ds-sp-000/master/images/Database-Schema.png">


- SQL is designed to work with **relational data**. 
- This really just means pieces of data that are **related to each other**.

- Each table has a **primary key** (like a DataFrame index), with a unique value for each row in the table.
- In the diagram, the name of the primary key is preceded by an asterisk (\*). 

- Columns that are the **primary key on one table** can also appear on **other tables**. 
    - There it is referred to as a **foreign key**, aka the primary key from a different ("foreign") table. 
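
As a minimal sketch of how a foreign key links two tables, the snippet below builds two throwaway in-memory tables. The table names `offices_demo` and `employees_demo` and their rows are made up for illustration; they are not the tables from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in RAM
cur = conn.cursor()

# officeCode is the primary key of offices_demo
cur.execute("CREATE TABLE offices_demo (officeCode INTEGER PRIMARY KEY, city TEXT)")
# officeCode appears again here as a foreign key pointing back to offices_demo
cur.execute("""CREATE TABLE employees_demo (
                   employeeNumber INTEGER PRIMARY KEY,
                   lastName TEXT,
                   officeCode INTEGER REFERENCES offices_demo(officeCode))""")

cur.executemany("INSERT INTO offices_demo VALUES (?, ?)", [(1, "Boston"), (2, "Paris")])
cur.executemany("INSERT INTO employees_demo VALUES (?, ?, ?)",
                [(100, "Smith", 1), (101, "Jones", 1), (102, "Martin", 2)])

# The shared key lets us look up each employee's city
rows = cur.execute("""SELECT lastName, city
                      FROM employees_demo
                      JOIN offices_demo USING(officeCode)
                      ORDER BY employeeNumber;""").fetchall()
print(rows)
conn.close()
```

Each employee row stores only the office's key, so the city is written once in `offices_demo` and looked up when needed.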

### ⨠ Q: Why do we need databases? Why can't we just use a bunch of Pandas DataFrames?

- 

## Querying Databases - `SELECT`ing data



- To retrieve data from one or more tables you usually use a `SELECT` statement. 
```SQL
SELECT * FROM table;
```


> - NOTE: SQL keywords do not _have_ to be all-caps, but it is a convention that helps differentiate SQL syntax from the names of tables/columns.



- A more advanced select query.
```SQL
SELECT col1, col2, col3
FROM table
WHERE records match criteria
LIMIT 100;
```

- **All select statements must:**
    1. **Start with `SELECT`**,
    2. followed by **what you want to select**. Separate multiple column names with a `,`.
    3. Then specify where the data is coming `FROM`, followed by the table name.
    4. **Afterward, you can optionally provide conditions such as filters or sorting**.

```SQL
SELECT *
FROM payments
ORDER BY amount DESC
LIMIT 10;
```
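
To see the clause order working end to end, here is a small self-contained sketch. The `payments_demo` table and its four rows are invented for this example, so the numbers do not come from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE payments_demo (customerNumber INTEGER, amount REAL)")
cur.executemany("INSERT INTO payments_demo VALUES (?, ?)",
                [(103, 6066.78), (112, 32641.98), (114, 82261.22), (119, 49523.67)])

# Clause order: SELECT ... FROM ... WHERE ... ORDER BY ... LIMIT
query = """SELECT customerNumber, amount
           FROM payments_demo
           WHERE amount > 10000
           ORDER BY amount DESC
           LIMIT 2;"""
top_payments = cur.execute(query).fetchall()
print(top_payments)
conn.close()
```

`WHERE` removes the rows that fail the condition, `ORDER BY` sorts what is left, and `LIMIT` keeps only the first rows of the sorted result.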



## SQL with `sqlite3`

- Use `sqlite3` for SQL queries in Python.
    1. Connect to database
    2. Create a cursor.
    3. Form your query
    4. Execute/fetch your results.

```python
import sqlite3
connection = sqlite3.connect('pet_database.db')  # creates pet_database.db if it does not exist; it stays empty until you create a table
cursor = connection.cursor()


# Select from table
cursor.execute('''SELECT name FROM cats;''').fetchall()

```
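
The four steps can be sketched in a form that runs on its own. This version assumes an in-memory database (`":memory:"`) and creates a small `cats` table first, since a brand-new database has no tables to select from.

```python
import sqlite3

# 1. Connect (":memory:" is a throwaway database that lives only in RAM)
connection = sqlite3.connect(":memory:")
# 2. Create a cursor
cursor = connection.cursor()

# Set up a small cats table so the query has something to return
cursor.execute("CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, breed TEXT)")
cursor.execute("INSERT INTO cats (name, age, breed) VALUES ('Maru', 3, 'Scottish Fold')")
connection.commit()  # save the inserted row

# 3. Form the query
query = "SELECT name FROM cats;"
# 4. Execute it and fetch the results (a list of tuples, one per row)
names = cursor.execute(query).fetchall()
print(names)
connection.close()
```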

In [1]:
# !pip install -U fsds
# from fsds.imports import *
import pandas as pd
import os,glob,sys
os.getcwd()

'/Users/jamesirving/Documents/GitHub/_STUDY GROUP PREP/online-dtsc-pt-041320-cohort-notes/Mod 1'

In [2]:
db = '../datasets/SQL/data.sqlite'

In [3]:
import sqlite3
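# sqlite3.Connection is the class of the object that sqlite3.connect() returns;
# call connect() instead of calling the class with no arguments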
# connect to database
conn = sqlite3.connect(db)

cur = conn.cursor()

In [4]:
type(conn)

sqlite3.Connection
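
On the `connect()` versus `Connection` question, a short check (a sketch using an in-memory database) shows that `sqlite3.connect()` is the function you call and `sqlite3.Connection` is the type of the object it hands back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")      # function call that opens the database
is_connection = isinstance(conn, sqlite3.Connection)

cur = conn.cursor()                      # cursors are created from a connection
is_cursor = isinstance(cur, sqlite3.Cursor)

print(type(conn), type(cur))
conn.close()
```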

### How to get all of the table names in a database

- SQLite keeps the schema of every table in a built-in table called `sqlite_master`.
- We can find the names of all of the tables in a database using:
```python
table_names = cur.execute("""
SELECT name 
FROM sqlite_master 
WHERE type='table';
""").fetchall()
```

In [5]:
# Get table names
table_names = cur.execute("""
SELECT name 
FROM sqlite_master 
WHERE type='table';
""").fetchall()
table_names

[('orderdetails',),
 ('payments',),
 ('offices',),
 ('customers',),
 ('orders',),
 ('productlines',),
 ('products',),
 ('employees',),
 ('cats',)]

In [6]:
table_names = [x[0] for x in table_names]
table_names

['orderdetails',
 'payments',
 'offices',
 'customers',
 'orders',
 'productlines',
 'products',
 'employees',
 'cats']

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-sql-introduction-online-ds-sp-000/master/images/Database-Schema.png" width=500>

### How to get the column names after executing a query:


In [7]:
df = pd.DataFrame(cur.execute("select * from products").fetchall())
df

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | S10_1678 | 1969 Harley Davidson Ultimate Chopper | Motorcycles | 1:10 | Min Lin Diecast | This replica features working kickstand, front... | 7933 | 48.81 | 95.70 |
| 1 | S10_1949 | 1952 Alpine Renault 1300 | Classic Cars | 1:10 | Classic Metal Creations | Turnable front wheels; steering function; deta... | 7305 | 98.58 | 214.30 |
| 2 | S10_2016 | 1996 Moto Guzzi 1100i | Motorcycles | 1:10 | Highway 66 Mini Classics | Official Moto Guzzi logos and insignias, saddl... | 6625 | 68.99 | 118.94 |
| 3 | S10_4698 | 2003 Harley-Davidson Eagle Drag Bike | Motorcycles | 1:10 | Red Start Diecast | Model features, official Harley Davidson logos... | 5582 | 91.02 | 193.66 |
| 4 | S10_4757 | 1972 Alfa Romeo GTA | Classic Cars | 1:10 | Motor City Art Classics | Features include: Turnable front wheels; steer... | 3252 | 85.68 | 136.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 105 | S700_3505 | The Titanic | Ships | 1:700 | Carousel DieCast Legends | Completed model measures 19 1/2 inches long, 9... | 1956 | 51.09 | 100.17 |
| 106 | S700_3962 | The Queen Mary | Ships | 1:700 | Welly Diecast Productions | Exact replica. Wood and Metal. Many extras inc... | 5088 | 53.63 | 99.31 |
| 107 | S700_4002 | American Airlines: MD-11S | Planes | 1:700 | Second Gear Diecast | Polished finish. Exact replia with official lo... | 8820 | 36.27 | 74.03 |
| 108 | S72_1253 | Boeing X-32A JSF | Planes | 1:72 | Motor City Art Classics | 10" Wingspan with retractable landing gears.Co... | 4857 | 32.77 | 49.66 |


- After a query runs, the cursor's `.description` attribute holds one tuple per column; the first item of each tuple is the column name.

In [8]:
cur.description

(('productCode', None, None, None, None, None, None),
 ('productName', None, None, None, None, None, None),
 ('productLine', None, None, None, None, None, None),
 ('productScale', None, None, None, None, None, None),
 ('productVendor', None, None, None, None, None, None),
 ('productDescription', None, None, None, None, None, None),
 ('quantityInStock', None, None, None, None, None, None),
 ('buyPrice', None, None, None, None, None, None),
 ('MSRP', None, None, None, None, None, None))

In [9]:
col_names =[col[0] for col in cur.description]
print(col_names)

['productCode', 'productName', 'productLine', 'productScale', 'productVendor', 'productDescription', 'quantityInStock', 'buyPrice', 'MSRP']


In [10]:
df = pd.DataFrame(cur.execute('select * from products').fetchall(),
                  columns=col_names)
df.head()

| | productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | S10_1678 | 1969 Harley Davidson Ultimate Chopper | Motorcycles | 1:10 | Min Lin Diecast | This replica features working kickstand, front... | 7933 | 48.81 | 95.7 |
| 1 | S10_1949 | 1952 Alpine Renault 1300 | Classic Cars | 1:10 | Classic Metal Creations | Turnable front wheels; steering function; deta... | 7305 | 98.58 | 214.3 |
| 2 | S10_2016 | 1996 Moto Guzzi 1100i | Motorcycles | 1:10 | Highway 66 Mini Classics | Official Moto Guzzi logos and insignias, saddl... | 6625 | 68.99 | 118.94 |
| 3 | S10_4698 | 2003 Harley-Davidson Eagle Drag Bike | Motorcycles | 1:10 | Red Start Diecast | Model features, official Harley Davidson logos... | 5582 | 91.02 | 193.66 |
| 4 | S10_4757 | 1972 Alfa Romeo GTA | Classic Cars | 1:10 | Motor City Art Classics | Features include: Turnable front wheels; steer... | 3252 | 85.68 | 136.0 |


In [12]:
cur.execute("""PRAGMA table_info(products)""").fetchall()

[(0, 'productCode', '', 0, None, 0),
 (1, 'productName', '', 0, None, 0),
 (2, 'productLine', '', 0, None, 0),
 (3, 'productScale', '', 0, None, 0),
 (4, 'productVendor', '', 0, None, 0),
 (5, 'productDescription', '', 0, None, 0),
 (6, 'quantityInStock', '', 0, None, 0),
 (7, 'buyPrice', '', 0, None, 0),
 (8, 'MSRP', '', 0, None, 0)]
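
Another way to keep column names attached to the results, shown here as a sketch rather than the approach used in these notes, is to set the connection's `row_factory` to `sqlite3.Row`. The `products_demo` table below is a made-up three-column stand-in for `products`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now support access by column name
cur = conn.cursor()
cur.execute("CREATE TABLE products_demo (productCode TEXT, productName TEXT, MSRP REAL)")
cur.execute("""INSERT INTO products_demo
               VALUES ('S10_1678', '1969 Harley Davidson Ultimate Chopper', 95.70)""")

row = cur.execute("SELECT * FROM products_demo").fetchone()
col_names = row.keys()   # the same names that cur.description provides
as_dict = dict(row)      # convert one row to a plain dict
print(col_names, as_dict["MSRP"])
conn.close()
```

A list of such dicts can be passed straight to `pd.DataFrame(...)`, which then picks up the column names without a separate `columns=` argument.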

# FILTERING AND ORDERING

- `ORDER BY` - `DESC`/`ASC`
- `LIMIT`
- `BETWEEN`
- `NULL`
- `COUNT`
- `GROUP BY`

In [13]:
query = """select * from products
GROUP BY productLine 
ORDER BY quantityInStock DESC;"""
data = cur.execute(query).fetchall()
col_names =[col[0] for col in cur.description]
df = pd.DataFrame(data,
                  columns=col_names)
df

| | productCode | productName | productLine | productScale | productVendor | productDescription | quantityInStock | buyPrice | MSRP |
|---|---|---|---|---|---|---|---|---|---|
| 0 | S18_1342 | 1937 Lincoln Berline | Vintage Cars | 1:18 | Motor City Art Classics | Features opening engine cover, doors, trunk, a... | 8693 | 60.62 | 102.74 |
| 1 | S10_1678 | 1969 Harley Davidson Ultimate Chopper | Motorcycles | 1:10 | Min Lin Diecast | This replica features working kickstand, front... | 7933 | 48.81 | 95.7 |
| 2 | S10_1949 | 1952 Alpine Renault 1300 | Classic Cars | 1:10 | Classic Metal Creations | Turnable front wheels; steering function; deta... | 7305 | 98.58 | 214.3 |
| 3 | S18_3259 | Collectable Wooden Train | Trains | 1:18 | Carousel DieCast Legends | Hand crafted wooden toy train set is in about ... | 6450 | 67.56 | 100.84 |
| 4 | S18_1662 | 1980s Black Hawk Helicopter | Planes | 1:18 | Red Start Diecast | 1:18 scale replica of actual Army's UH-60L BLA... | 5330 | 77.27 | 157.69 |
| 5 | S18_3029 | 1999 Yamaha Speed Boat | Ships | 1:18 | Min Lin Diecast | Exact replica. Wood and Metal. Many extras inc... | 4259 | 51.61 | 86.02 |
| 6 | S12_1666 | 1958 Setra Bus | Trucks and Buses | 1:12 | Welly Diecast Productions | Model features 30 windows, skylights & glare r... | 1579 | 77.9 | 136.67 |


## GROUPING DATA WITH SQL

- Like we do with Pandas, we can use GROUP BY statements in SQL and then apply **aggregate functions:**
    - `COUNT`
    - `MAX`
    - `MIN`
    - `SUM`
    - `AVG`
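
Here is a small sketch that applies all five aggregate functions in one grouped query. The `payments_demo` table and its values are invented so the arithmetic is easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE payments_demo (customerNumber INTEGER, amount REAL)")
cur.executemany("INSERT INTO payments_demo VALUES (?, ?)",
                [(1, 100.0), (1, 300.0), (2, 50.0), (2, 150.0), (2, 400.0)])

# One output row per customerNumber, with each aggregate computed within the group
summary = cur.execute("""SELECT customerNumber,
                                COUNT(*)    AS n_payments,
                                MIN(amount) AS smallest,
                                MAX(amount) AS largest,
                                SUM(amount) AS total,
                                AVG(amount) AS average
                         FROM payments_demo
                         GROUP BY customerNumber
                         ORDER BY customerNumber;""").fetchall()
print(summary)
conn.close()
```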

In [14]:
cur.execute("""SELECT city, COUNT(employeeNumber)
FROM offices 
JOIN employees
USING(officeCode)
GROUP BY city
ORDER BY count(employeeNumber) DESC;""")

df = pd.DataFrame(cur.fetchall())

df.columns = [x[0] for x in cur.description]

df.head()

| | city | COUNT(employeeNumber) |
|---|---|---|
| 0 | San Francisco | 6 |
| 1 | Paris | 5 |
| 2 | Sydney | 4 |
| 3 | Boston | 2 |
| 4 | London | 2 |


## ALIASING

- An alias (`AS`) assigns a temporary name to a column or table within a query.
- Useful for `JOIN`, `GROUP BY`, and aggregates.

In [15]:
cur.execute("""SELECT city, COUNT(employeeNumber) AS numEmployees
               FROM offices
               JOIN employees
               USING(officeCode)
               GROUP BY 1
               ORDER BY numEmployees DESC;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df.head()

| | city | numEmployees |
|---|---|---|
| 0 | San Francisco | 6 |
| 1 | Paris | 5 |
| 2 | Sydney | 4 |
| 3 | Boston | 2 |
| 4 | London | 2 |


In [16]:
cur.execute("""SELECT customerName,
               COUNT(customerName) AS number_purchases,
               MIN(amount) AS min_purchase,
               MAX(amount) AS max_purchase,
               AVG(amount) AS avg_purchase,
               SUM(amount) AS total_spent
               FROM customers
               JOIN payments
               USING(customerNumber)
               GROUP BY 1
               ORDER BY SUM(amount) DESC;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print(len(df))
df.head()

98


| | customerName | number_purchases | min_purchase | max_purchase | avg_purchase | total_spent |
|---|---|---|---|---|---|---|
| 0 | Euro+ Shopping Channel | 13 | 116208.4 | 65071.26 | 55056.844615 | 715738.98 |
| 1 | Mini Gifts Distributors Ltd. | 9 | 101244.59 | 85410.87 | 64909.804444 | 584188.24 |
| 2 | Australian Collectors, Co. | 4 | 44894.74 | 82261.22 | 45146.2675 | 180585.07 |
| 3 | Muscle Machine Inc | 4 | 20314.44 | 58841.35 | 44478.4875 | 177913.95 |
| 4 | Dragon Souveniers, Ltd. | 4 | 105743.0 | 44380.15 | 39062.7575 | 156251.03 |
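
In the output above, some `min_purchase` values are larger than `max_purchase`. One possible explanation, which is an assumption the notes do not confirm, is that `amount` is stored as text, so `MIN` and `MAX` compare the values as strings. The sketch below reproduces that behavior on a made-up `payments_demo` table with no declared column types and shows how `CAST` restores numeric comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# No declared column types, so each value keeps the type it was inserted with
cur.execute("CREATE TABLE payments_demo (customerNumber, amount)")
cur.executemany("INSERT INTO payments_demo VALUES (?, ?)",
                [(1, "116208.4"), (1, "65071.26"), (2, "900.5")])  # amounts inserted as strings

# String comparison: '116208.4' sorts before '65071.26' because '1' < '6'
text_min = cur.execute("SELECT MIN(amount) FROM payments_demo "
                       "WHERE customerNumber = 1").fetchone()[0]
# Casting to REAL gives the numeric minimum
real_min = cur.execute("SELECT MIN(CAST(amount AS REAL)) FROM payments_demo "
                       "WHERE customerNumber = 1").fetchone()[0]

# SQLite treats any text value as greater than any number, so this filter keeps every row
naive_count = cur.execute("SELECT COUNT(*) FROM payments_demo "
                          "WHERE amount >= 50000").fetchone()[0]
cast_count = cur.execute("SELECT COUNT(*) FROM payments_demo "
                         "WHERE CAST(amount AS REAL) >= 50000").fetchone()[0]
print(text_min, real_min, naive_count, cast_count)
conn.close()
```

If the real table behaves this way, filters such as `WHERE amount >= 50000` would also need the `CAST` to give meaningful counts.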


## The `WHERE` Clause

In general, the `WHERE` clause filters query results by some condition. As you are starting to see, you can also combine multiple conditions.

- 
```python
cur.execute("""SELECT * FROM customers WHERE city = 'Boston' OR city = 'Madrid';""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df
```


- To refine your searches, you can add `ORDER BY` and `LIMIT` clauses. 
    - The order by clause allows you to sort the results by a particular feature.
- Finally, the limit clause is typically the last argument in a SQL query and simply limits the output to a set number of results.



In [17]:
cur.execute("""SELECT * FROM customers WHERE city = 'Boston' OR city = 'Madrid';""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df

| | customerNumber | customerName | contactLastName | contactFirstName | phone | addressLine1 | addressLine2 | city | state | postalCode | country | salesRepEmployeeNumber | creditLimit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 141 | Euro+ Shopping Channel | Freyre | Diego | (91) 555 94 44 | C/ Moralzarzal, 86 | | Madrid | | 28034 | Spain | 1370.0 | 227600.0 |
| 1 | 237 | ANG Resellers | Camino | Alejandra | (91) 745 6555 | Gran Vía, 1 | | Madrid | | 28001 | Spain | | 0.0 |
| 2 | 344 | CAF Imports | Fernandez | Jesus | +34 913 728 555 | Merchants House | 27-30 Merchant's Quay | Madrid | | 28023 | Spain | 1702.0 | 59600.0 |
| 3 | 362 | Gifts4AllAges.com | Yoshido | Juri | 6175559555 | 8616 Spinnaker Dr. | | Boston | MA | 51003 | USA | 1216.0 | 41900.0 |
| 4 | 458 | Corrida Auto Replicas, Ltd | Sommer | Martín | (91) 555 22 82 | C/ Araquil, 67 | | Madrid | | 28023 | Spain | 1702.0 | 104600.0 |
| 5 | 465 | Anton Designs, Ltd. | Anton | Carmen | +34 913 728555 | c/ Gobelas, 19-1 Urb. La Florida | | Madrid | | 28023 | Spain | | 0.0 |
| 6 | 495 | Diecast Collectables | Franco | Valarie | 6175552555 | 6251 Ingle Ln. | | Boston | MA | 51003 | USA | 1188.0 | 85100.0 |


## The `HAVING` clause

The `HAVING` clause works similarly to the `WHERE` clause, except that it filters the grouped results **after** the `GROUP BY` clause has been applied, so its conditions can use aggregates such as `COUNT`.

In [27]:
cur.execute("""SELECT city, COUNT(customerNumber) AS number_customers
               FROM customers
               GROUP BY 1
               HAVING COUNT(customerNumber)>=5;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print(len(df))
df.head()

2


| | city | number_customers |
|---|---|---|
| 0 | Madrid | 5 |
| 1 | NYC | 5 |
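
To make the order of operations concrete, the sketch below runs `WHERE` and `HAVING` in the same query on a tiny invented `customers_demo` table: `WHERE` drops individual rows before grouping, and `HAVING` drops whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers_demo (customerName TEXT, city TEXT, creditLimit REAL)")
cur.executemany("INSERT INTO customers_demo VALUES (?, ?, ?)",
                [("A", "Madrid", 100.0), ("B", "Madrid", 0.0), ("C", "Madrid", 50.0),
                 ("D", "NYC", 75.0), ("E", "NYC", 0.0), ("F", "Boston", 20.0)])

# Step 1 (WHERE): rows with creditLimit = 0 are removed -> Madrid 2, NYC 1, Boston 1
# Step 2 (GROUP BY + HAVING): only cities with at least 2 remaining rows are kept
result = cur.execute("""SELECT city, COUNT(*) AS n_customers
                        FROM customers_demo
                        WHERE creditLimit > 0
                        GROUP BY city
                        HAVING COUNT(*) >= 2
                        ORDER BY city;""").fetchall()
print(result)
conn.close()
```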


## Combining `WHERE` and `HAVING`

We can also use the `WHERE` and `HAVING` clauses in conjunction with each other for more complex rules.

- For example, let's say we want a list of customers who have made at least 3 purchases of over 50K each.

In [28]:
cur.execute("""SELECT customerName,
               COUNT(amount) AS number_purchases_over_50K
               FROM customers
               JOIN payments
               USING(customerNumber)
               WHERE amount >= 50000
               GROUP BY 1
               HAVING count(amount) >= 3
               ORDER BY count(amount) DESC;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print(len(df))
df.head()

53


| | customerName | number_purchases_over_50K |
|---|---|---|
| 0 | Euro+ Shopping Channel | 13 |
| 1 | Mini Gifts Distributors Ltd. | 9 |
| 2 | Anna's Decorations, Ltd | 4 |
| 3 | Australian Collectors, Co. | 4 |
| 4 | Baane Mini Imports | 4 |


In [30]:
cur.execute("""SELECT customerName,
               COUNT(amount) AS number_purchases_over_50K
               FROM customers
               JOIN payments
               USING(customerNumber)
               WHERE amount >= 50000
               GROUP BY 1
               HAVING number_purchases_over_50K >= 3
               ORDER BY number_purchases_over_50K DESC;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [i[0] for i in cur.description]
print(len(df))
df.head()

53


| | customerName | number_purchases_over_50K |
|---|---|---|
| 0 | Euro+ Shopping Channel | 13 |
| 1 | Mini Gifts Distributors Ltd. | 9 |
| 2 | Anna's Decorations, Ltd | 4 |
| 3 | Australian Collectors, Co. | 4 |
| 4 | Baane Mini Imports | 4 |


## `JOIN` STATEMENTS

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-sql-introduction-online-ds-sp-000/master/images/Database-Schema.png">

### Task: Displaying product details along with order details

Let's say you need to generate some report that includes details about products from orders. To do that, we would need to take data from multiple tables in a single statement. 

In [None]:
# import sqlite3
# import pandas as pd
# conn = sqlite3.connect('/content/drive/My Drive/Datasets/data.sqlite')
# cur = conn.cursor()

In [None]:
cur.execute("""SELECT * 
               FROM orderdetails
               JOIN products
               ON orderdetails.productCode = products.productCode
               LIMIT 10;
               """)
df = pd.DataFrame(cur.fetchall()) #Take results and create dataframe
df.columns = [i[0] for i in cur.description]
df.head()

# TYPES OF JOINS

- Joins may be:
    - INNER (default)
    - OUTER
    - LEFT 
    - RIGHT

<img src="https://raw.githubusercontent.com/learn-co-students/dsc-join-statements-online-ds-sp-000/master/images/venn.png">
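
The difference between an inner and a left join is easiest to see on a case where one table has unmatched rows. The sketch below uses invented `offices_demo` and `employees_demo` tables in which the Tokyo office has no employees, which is also the idea behind the "offices with 0 employees" question at the top of these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE offices_demo (officeCode INTEGER PRIMARY KEY, city TEXT)")
cur.execute("CREATE TABLE employees_demo (employeeNumber INTEGER PRIMARY KEY, officeCode INTEGER)")
cur.executemany("INSERT INTO offices_demo VALUES (?, ?)",
                [(1, "Boston"), (2, "Paris"), (3, "Tokyo")])
cur.executemany("INSERT INTO employees_demo VALUES (?, ?)", [(100, 1), (101, 1), (102, 2)])

# Same query twice; only the join keyword changes (filled in from a fixed string, not user input)
count_query = """SELECT o.city, COUNT(e.employeeNumber) AS n_employees
                 FROM offices_demo AS o
                 {join_type} JOIN employees_demo AS e USING(officeCode)
                 GROUP BY o.officeCode
                 ORDER BY o.officeCode;"""

inner_rows = cur.execute(count_query.format(join_type="INNER")).fetchall()
left_rows = cur.execute(count_query.format(join_type="LEFT")).fetchall()

# Offices that survive only in the LEFT JOIN have a count of 0
empty_offices = [city for city, n in left_rows if n == 0]
print(inner_rows, left_rows, empty_offices)
conn.close()
```

The inner join drops Tokyo because it has no matching employee row. The left join keeps it with `NULL` in the employee columns, and `COUNT(e.employeeNumber)` skips `NULL`s, which is why the count comes out as 0.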


## Primary vs Foreign Keys
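- primary key: a column whose value uniquely identifies each row in its own table.
- foreign key: a column that stores the primary key of a row in another table, linking the two tables.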


## The `USING` clause

- If the join column has an identical name in both tables, you can use the `USING` clause.
- Rather than saying `ON tableA.column = tableB.column` we can simply say `USING(column)`.
- Only works if the column is **identically named** in both tables.
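
As a quick comparison of the two spellings, the sketch below joins two invented one-row tables with `ON` and then with `USING`. In SQLite, `SELECT *` returns the shared column twice with `ON` but only once with `USING`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders_demo (orderNumber INTEGER, productCode TEXT)")
cur.execute("CREATE TABLE products_demo (productCode TEXT, productName TEXT)")
cur.execute("INSERT INTO orders_demo VALUES (10100, 'S10_1678')")
cur.execute("INSERT INTO products_demo VALUES ('S10_1678', '1969 Harley Davidson Ultimate Chopper')")

# Join written with ON
cur.execute("""SELECT * FROM orders_demo
               JOIN products_demo
               ON orders_demo.productCode = products_demo.productCode;""")
on_cols = [col[0] for col in cur.description]
on_rows = cur.fetchall()

# Same join written with USING
cur.execute("SELECT * FROM orders_demo JOIN products_demo USING(productCode);")
using_cols = [col[0] for col in cur.description]
using_rows = cur.fetchall()

print(on_cols)
print(using_cols)
conn.close()
```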

### One-to-One, One-to-many, many-to-many Joins


- **Let's say we have tables A and B**


- **One-to-one joins:**
    - Each entry in table A aligns with exactly one entry in table B.
    - e.g. A person and their social security number.


- **One-to-many joins:**
    - One entry in table A matches multiple entries in table B.
    - e.g. Joining an order number from table A with the individual products in table B.


- **Many-to-many joins:**
    - Multiple entries in table A match multiple entries in table B.
    - e.g. A = classes at a college, B = students.
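
A many-to-many relationship is commonly stored with a third "junction" table that holds one row per pairing. That design is not described in these notes, so treat the sketch below, with its invented `students`, `classes`, and `enrollments` tables, as one possible way to model the classes/students example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE classes (class_id INTEGER PRIMARY KEY, title TEXT)")
# Junction table: one row per (student, class) pairing
cur.execute("CREATE TABLE enrollments (student_id INTEGER, class_id INTEGER)")

cur.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
cur.executemany("INSERT INTO classes VALUES (?, ?)", [(10, "SQL"), (20, "Stats")])
cur.executemany("INSERT INTO enrollments VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# Two joins walk from students through enrollments to classes
pairs = cur.execute("""SELECT name, title
                       FROM students
                       JOIN enrollments USING(student_id)
                       JOIN classes USING(class_id)
                       ORDER BY name, title;""").fetchall()
print(pairs)
conn.close()
```

Ana appears in two classes and the SQL class has two students, so neither side of the relationship is limited to a single match.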

## SQL Subqueries

```python
cur.execute("""SELECT lastName, firstName, officeCode
               FROM employees
               WHERE officeCode IN (SELECT officeCode
                                    FROM offices 
                                    WHERE country = 'USA');
                                    """)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df
```
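
The subquery pattern can be checked against an equivalent join. The sketch below uses invented `offices_demo` and `employees_demo` tables and shows that, for this kind of filter, both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE offices_demo (officeCode INTEGER PRIMARY KEY, country TEXT)")
cur.execute("CREATE TABLE employees_demo (lastName TEXT, officeCode INTEGER)")
cur.executemany("INSERT INTO offices_demo VALUES (?, ?)",
                [(1, "USA"), (2, "France"), (3, "USA")])
cur.executemany("INSERT INTO employees_demo VALUES (?, ?)",
                [("Murphy", 1), ("Bondur", 2), ("Firrelli", 3)])

# Subquery: the inner SELECT runs first and produces the list of US office codes
sub_rows = cur.execute("""SELECT lastName
                          FROM employees_demo
                          WHERE officeCode IN (SELECT officeCode
                                               FROM offices_demo
                                               WHERE country = 'USA')
                          ORDER BY lastName;""").fetchall()

# Join: the same filter expressed by joining the two tables
join_rows = cur.execute("""SELECT lastName
                           FROM employees_demo
                           JOIN offices_demo USING(officeCode)
                           WHERE country = 'USA'
                           ORDER BY lastName;""").fetchall()
print(sub_rows, join_rows)
conn.close()
```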

# Pandas + SQL

## SQL QUERY IN PANDAS - `df.query()`

- Pandas DataFrames have a method called `.query()`
- This allows us to use SQL-like commands to reference data.
```python
## Normal Pandas Syntax
foo_df = bar_df.loc[bar_df['Col_1']>bar_df['Col_2']]
```

```python
## Using .query()
foo_df = bar_df.query("Col_1 > Col_2")
```
- How to use:
    - Enter the query as a single string, using just column names to reference data.
    - To combine conditions with and/or, use `&` and `|`, respectively.

```python
foo_df = bar_df.query("Col_1 > Col_2 & Col_2 <= Col_3")
```
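
Here is a runnable sketch of the same comparison on a small made-up DataFrame (the values in `bar_df` are invented). It also shows the `@` prefix, which lets a query string refer to a regular Python variable.

```python
import pandas as pd

bar_df = pd.DataFrame({"Col_1": [5, 1, 9, 4],
                       "Col_2": [3, 2, 7, 4],
                       "Col_3": [3, 5, 6, 8]})

# Boolean-mask syntax: each condition needs its own parentheses
mask_df = bar_df.loc[(bar_df["Col_1"] > bar_df["Col_2"]) & (bar_df["Col_2"] <= bar_df["Col_3"])]

# Equivalent .query() string
query_df = bar_df.query("Col_1 > Col_2 & Col_2 <= Col_3")

# Prefix a Python variable with @ to use it inside the query string
threshold = 4
big_df = bar_df.query("Col_1 > @threshold")

print(query_df)
print(big_df)
```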

## Using SQL syntax with `pandasql`


- There is a library called [pandasql](https://pypi.org/project/pandasql/) that allows for SQL queries on pandas DataFrames.

We can install `pandasql` using the bash command `pip install pandasql`.

### Importing pandasql

In order to use `pandasql`, we need to start by importing a `sqldf` object from `pandasql`

```python
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
```

### Writing Queries
```python
# Assumes DataFrames named `meats` and `births` already exist in globals()
q = """SELECT
        m.date, m.beef, b.births
     FROM
        meats m
     INNER JOIN
        births b
           ON m.date = b.date;"""

results = pysqldf(q)

```

## SQL Data Types

- Data types in SQLite3:
    - https://www.sqlite.org/datatype3.html
    
- Data types:
    - TEXT
    - INTEGER
    - REAL
    - BLOB
    - NULL
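
SQLite's built-in `typeof()` function reports which of these storage classes a value ends up in. The sketch below passes one Python value of each kind as a query parameter to show how the `sqlite3` module maps Python types onto them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# str, int, float, bytes, and None are sent as query parameters
python_values = ("abc", 42, 3.14, b"\x00", None)
storage_classes = cur.execute(
    "SELECT typeof(?), typeof(?), typeof(?), typeof(?), typeof(?)",
    python_values).fetchone()
print(storage_classes)
conn.close()
```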

## DATABASE ADMIN 101

- `CREATE TABLE table_name`
    - Must include each column's name and datatype, and mark which column is the primary key.
```python
cur.execute("""CREATE TABLE cats (
    id INTEGER PRIMARY KEY,
    name TEXT,
    age INTEGER,
    breed TEXT)
    """)
```
- `DROP TABLE table_name`
    - `DROP TABLE IF EXISTS table_name`
- `INSERT INTO table_name`
    - list the columns to fill in first and then the VALUES()   
```python
cur.execute('''INSERT INTO cats (name, age, breed) 
                  VALUES ('Maru', 3, 'Scottish Fold');
            ''')
```
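- To add multiple rows, use `executemany`:
```python
# dict_cats is assumed to be a list of dicts with the keys 'name', 'age', and 'breed'
cur.executemany('''INSERT INTO cats (name, age, breed) 
            VALUES (:name, :age, :breed);
      ''', dict_cats)
```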
- `UPDATE`

```python
for dct in contacts:
    fname = dct['firstName']
    lname = dct['lastName']
    role = dct['role']
    phone = dct['telephone ']
    street = dct['street']
    city = dct['city']
    state = dct['state']
    z = dct['zipcode ']

    cur.execute('''INSERT INTO contactInfo (firstname, lastname, role, telephone, street, city, state, zipcode)
VALUES (?,?,?,?,?,?,?,?)''', (fname, lname, role, phone, street, city, state, z))
```
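
For the `UPDATE` bullet, and for the earlier questions about altering versus updating and about updating multiple columns, here is a minimal sketch on a throwaway `cats` table. The `owner` column and the value `'Sam'` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, breed TEXT)")
cur.execute("INSERT INTO cats (name, age, breed) VALUES ('Maru', 3, 'Scottish Fold')")

# ALTER changes the table's structure: here it adds a new, empty column
cur.execute("ALTER TABLE cats ADD COLUMN owner TEXT")

# UPDATE changes values in existing rows; several columns can be set in one statement
cur.execute("UPDATE cats SET age = ?, owner = ? WHERE name = ?", (4, "Sam", "Maru"))
conn.commit()

cat = cur.execute("SELECT name, age, breed, owner FROM cats").fetchone()
print(cat)
conn.close()
```

`ALTER TABLE` acts on the columns themselves, while `UPDATE` acts on the data stored in them. Without a `WHERE` clause, `UPDATE` would change every row in the table.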

In [None]:
# cur.execute("""CREATE TABLE cats (
#     id INTEGER PRIMARY KEY,
#     name TEXT,
#     age INTEGER,
#     breed TEXT)
#     """)

# Breakout Group Activity:  
- Survive on sql-island
    - https://sql-island.informatik.uni-kl.de/
- 3 min walk through together before breakout rooms

## .query Question

In [31]:
import fsds as fs

In [34]:
df =fs.datasets.load_titanic(read_csv_kwds={'index_col':0})

In [44]:
women_and_children_df = df.query('Sex == "female" | Age <=15')
women_and_children_df

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | | S |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | | S |
| 882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | | S |
| 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | | Q |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |


In [35]:
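# "Male" never matches because the Sex values are lowercase, and | means OR,
# so this keeps every passenger with Age > 15 regardless of sex.
# For adult males only, use: df.query('Sex == "male" & Age > 15')
adult_males_df = df.query('Sex == "Male" | Age > 15')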

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | ? | Johnston, Miss. Catherine Helen "Carrie" | female | | 1 | 2 | W./C. 6607 | 23.4500 | | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
