## DA 320 

| Key         | Value |
| ----------- | ----------- |
| Assignment  | Basics of Loading Data  |
| Author   | Ted Spence        |
| Date   | 2022-10-15        |

This example notebook contains tutorials on how to load data in Jupyter.

The table at the top of this page also serves as an example of a markdown table.

***
# Read data from a CSV file
***

In [17]:
import pandas as pd

# Read the IMDB CSV file into a dataframe
imdb = pd.read_csv("f:\\git\\da320\\resources\\imdb.csv")

# Show the first five rows to confirm the file loaded correctly
imdb.head()

|   | id | title | runtime | user_rating | votes | mpaa_rating | release_date | budget | opening_weekend | gross_sales | genres | cast | director | producer | company |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 77631 | Grease | 110.0 | 7.2 | 265183 | TV-14::(D) | 6/13/1978 | 6000000.0 | 60759.0 | 394955690.0 | Comedy, Musical, Romance | John Travolta, Olivia Newton-John, Stockard Ch... | Randal Kleiser | Allan Carr, Neil A. Machlis, Robert Stigwood | Paramount Pictures, Robert Stigwood Organizati... |
| 1 | 78346 | Superman | 143.0 | 7.4 | 172769 | TV-PG::(LV) | 12/10/1978 | 55000000.0 | 7465343.0 | 166200000.0 | Action, Adventure, Sci-Fi | Marlon Brando, Gene Hackman, Christopher Reeve... | Richard Donner | Charles Greenlaw, Richard Lester, Alexander Sa... | Dovemead Films, Film Export A.G., Internationa... |
| 2 | 77416 | The Deer Hunter | 183.0 | 8.1 | 334827 | R | 12/8/1978 | 15000000.0 |  |  | Drama, War | Robert De Niro, John Cazale, John Savage, Chri... | Michael Cimino | Joann Carelli, Michael Cimino, Michael Deeley,... | EMI Films, Universal Pictures |
| 3 | 77651 | Halloween | 91.0 | 7.7 | 267109 | TV-14 | 10/25/1978 | 300000.0 |  | 70000000.0 | Horror, Thriller | Donald Pleasence, Jamie Lee Curtis, Nancy Kyes... | John Carpenter | Moustapha Akkad, John Carpenter, Debra Hill, K... | Compass International Pictures, Falcon Interna... |
| 4 | 77975 | National Lampoon's Animal House | 109.0 | 7.4 | 119917 | TV-14::(DLSV, TV Rating.) | 7/27/1978 | 3000000.0 | 201747.0 | 3371006.0 | Comedy | Tom Hulce, Stephen Furst, Mark Metcalf, Mary L... | John Landis | Ivan Reitman, Matty Simmons | Universal Pictures, Oregon Film Factory, Stage... |
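
The `release_date` values in the output above are plain text such as `6/13/1978`. As a minimal sketch, assuming you only need a few columns and want real dates, the cell below reads a small in-memory sample instead of the real file. The `sample_csv` and `imdb_sample` names are made up for this illustration, and the two rows are copied from the output shown above.

```python
import io

import pandas as pd

# A two-row stand-in for imdb.csv so this cell runs without the real file.
sample_csv = io.StringIO(
    "id,title,runtime,release_date,budget\n"
    "77631,Grease,110.0,6/13/1978,6000000.0\n"
    "77416,The Deer Hunter,183.0,12/8/1978,15000000.0\n"
)

# usecols limits the load to the columns we actually need.
imdb_sample = pd.read_csv(sample_csv, usecols=["id", "title", "release_date"])

# Convert the month/day/year text into real datetime values.
imdb_sample["release_date"] = pd.to_datetime(
    imdb_sample["release_date"], format="%m/%d/%Y"
)

# Show the column types to confirm the conversion
print(imdb_sample.dtypes)
```

With real dates in place you can sort, filter by year, or group by month without any further string handling.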


***
# Retrieve connection strings, passwords, or secrets from a JSON file 
***

In [18]:
import json

# Demonstration of how to load a file that contains secrets without accidentally leaking those secrets
with open('f:\\git\\teds-secrets.json') as f:
    data = json.load(f)

    # If you want your data to be secure, don't print this variable out!
    # Jupyter saves the output of every cell inside the notebook file, so a
    # printed secret can be accidentally committed to version control.
    secret_key = data['my-secret-key']

# We can safely print the length of the secret key. That won't leak any sensitive information.
print(f"My secret key is {len(secret_key)} characters in length.")

My secret key is 56 characters in length.
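
If you read secrets in several notebooks, it may help to wrap the pattern above in a small function. The following is only a sketch: `load_secret` is a hypothetical helper name, and the demo writes a throwaway JSON file with a fake 56-character value so the cell runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def load_secret(path, key):
    """Return one value from a JSON secrets file without printing it."""
    with open(path, encoding="utf-8") as f:
        secrets = json.load(f)
    if key not in secrets:
        # Report only the key name, never the contents of the file.
        raise KeyError(f"{key!r} not found in secrets file")
    return secrets[key]


# Demo with a temporary file holding a fake secret (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo-secrets.json"
    demo_path.write_text(json.dumps({"my-secret-key": "x" * 56}), encoding="utf-8")
    secret_key = load_secret(demo_path, "my-secret-key")

# Printing the length is safe; printing the value is not.
print(f"My secret key is {len(secret_key)} characters in length.")
```

The error message names the missing key but never echoes the file contents, so a typo in the key name will not leak anything into the notebook output.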


***
# Connect to a MongoDB Server
***

In [19]:
import pymongo
import certifi

# Once you have retrieved your connection string from a secrets file, use it here
mongo_connection_string = data['mongo-connection-string']

# Connect to the database using known good certificates
client = pymongo.MongoClient(mongo_connection_string, tlsCAFile=certifi.where())

# Fetch the database named "DA320"
da320_database = client['DA320']

# Within the database we have "collections". Think of them as tables in SQL.
all_collections = da320_database.list_collection_names()

# Report the server version and the list of collections within my database
print(f"Using MongoDB version {client.server_info()['version']}.")
print(f"This database has the collections {all_collections}")


Using MongoDB version 5.0.13.
This database has the collections ['Metacritic', 'IMDB']
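
A connection string usually embeds a password, so the same caution that applies to secret keys applies here. If you ever need to show which server you connected to, one option is to mask the password first. This is a rough sketch using only the standard library; `redact_connection_string` is a hypothetical helper, and the user name, password, and host in the demo are invented.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_connection_string(uri):
    """Return the URI with any password replaced by '***'."""
    parts = urlsplit(uri)
    # The network location looks like "user:password@host1,host2".
    userinfo, at, hosts = parts.netloc.rpartition("@")
    if not at:
        return uri  # no credentials in this URI
    user, colon, _password = userinfo.partition(":")
    if not colon:
        return uri  # user name only, nothing to hide
    safe_netloc = f"{user}:***@{hosts}"
    return urlunsplit(parts._replace(netloc=safe_netloc))


# Invented example value; a real string would come from your secrets file.
demo_uri = "mongodb+srv://ted:hunter2@cluster0.example.mongodb.net/DA320?retryWrites=true"
print(redact_connection_string(demo_uri))
```

Only the redacted form should ever reach `print`; keep passing the original string to `pymongo.MongoClient`.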


***
# Fetch a collection from a MongoDB Server
***

In [20]:
import pandas as pd

# Retrieve all records from a collection - this can be a large amount of data!
cursor = da320_database["Metacritic"].find()

# Convert this information into a Pandas dataframe
metacritic = pd.DataFrame(cursor)

# Make sure we've read the information correctly
metacritic.head()

|   | _id | movie_id | title | release_date | description | score | thumbnail |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 6348c9f8c7a133e93876deeb | 11234 | Crouching Tiger, Hidden Dragon | December 8, 2000 | In 19th century China, a magical sword given b... | 94 | https://static.metacritic.com/images/products/... |
| 1 | 6348c9f8c7a133e93876deec | 11235 | Yi Yi | October 6, 2000 | This film portrays life through portraits of t... | 93 | https://static.metacritic.com/images/products/... |
| 2 | 6348c9f8c7a133e93876deed | 11236 | Beau Travail | March 31, 2000 | The soldiers of a small French Foreign Legion ... | 91 | https://static.metacritic.com/images/products/... |
| 3 | 6348c9f8c7a133e93876deee | 11237 | Almost Famous | September 13, 2000 | In the 1970's, a high school boy (Fugit) is gi... | 90 | https://static.metacritic.com/images/products/... |
| 4 | 6348c9f8c7a133e93876deef | 11238 | Chicken Run | June 21, 2000 | A claymation spoof of classic prison-camp flic... | 88 | https://static.metacritic.com/images/products/... |
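
Because `find()` with no arguments returns every document, it can help to cap the row count and request only the fields you need while exploring. The sketch below assumes `collection` behaves like a pymongo collection, whose `find(filter, projection)` returns a cursor with a `limit()` method. `fetch_sample` is a hypothetical helper name, not part of pymongo.

```python
def fetch_sample(collection, fields, limit=100):
    """Return up to `limit` documents containing only `fields`.

    `collection` is expected to behave like a pymongo Collection.
    """
    # A projection maps each wanted field to 1. MongoDB includes "_id"
    # by default, so it is switched off explicitly here.
    projection = {name: 1 for name in fields}
    projection["_id"] = 0

    # An empty filter matches every document; limit() caps the result size.
    cursor = collection.find({}, projection).limit(limit)
    return list(cursor)
```

For example, `pd.DataFrame(fetch_sample(da320_database["Metacritic"], ["title", "score"], limit=50))` would build a two-column dataframe of at most 50 rows.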


***
# Query a subset of data from a MongoDB Server
***

In [21]:
import pandas as pd

# Build and test your query in MongoDB Compass, on the MongoDB website, or in
# any other MongoDB tool, then paste it here. $regex is case-sensitive by default.
query = { "description": { "$regex": "(China|Chinese)" } }

# Execute this query and produce a cursor
cursor = da320_database["Metacritic"].find(query)

# Convert this information into a Pandas dataframe
metacritic = pd.DataFrame(cursor)

# Make sure we've read the information correctly
metacritic.head()

|   | _id | movie_id | title | release_date | description | score | thumbnail |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 6348c9f8c7a133e93876deeb | 11234 | Crouching Tiger, Hidden Dragon | December 8, 2000 | In 19th century China, a magical sword given b... | 94 | https://static.metacritic.com/images/products/... |
| 1 | 6348c9f8c7a133e93876df27 | 11294 | Shower | July 7, 2000 | Set in and around a traditional Chinese commun... | 74 | https://static.metacritic.com/images/products/... |
| 2 | 6348c9f8c7a133e93876dfb4 | 11435 | Romeo Must Die | March 22, 2000 | Loosely based on Shakespeare's "Romeo and Juli... | 52 | https://static.metacritic.com/images/products/... |
| 3 | 6348c9f8c7a133e93876dff9 | 11504 | Restless | November 10, 2000 | A young American woman (Kellner) and her Chine... | 39 | https://static.metacritic.com/images/products/... |
| 4 | 6348c9f8c7a133e93876e023 | 11546 | The Art of War | August 25, 2000 | Accused of killing the Chinese Ambassador, a s... | 30 | https://static.metacritic.com/images/products/... |
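
MongoDB's `$regex` operator is case-sensitive and matches anywhere in the field unless you anchor the pattern. Adding `"$options": "i"` makes the match case-insensitive. The sketch below builds the filter with a hypothetical helper, `description_query`, and previews the pattern locally with Python's `re` module. Python and MongoDB use different regex engines, so treat the preview as an approximation that holds for simple patterns like this one. The sample descriptions are copied from the outputs above.

```python
import re


def description_query(pattern, ignore_case=False):
    """Build a MongoDB filter that matches `description` against a regex."""
    condition = {"$regex": pattern}
    if ignore_case:
        # "i" is MongoDB's option flag for case-insensitive matching.
        condition["$options"] = "i"
    return {"description": condition}


pattern = "(China|Chinese)"
query = description_query(pattern)

# Rough local preview of which descriptions the pattern would match.
samples = [
    "In 19th century China, a magical sword given b...",
    "Set in and around a traditional Chinese commun...",
    "A claymation spoof of classic prison-camp flic...",
]
matches = [bool(re.search(pattern, text)) for text in samples]

print(query)
print(matches)
```

Pass the resulting dictionary to `find()` exactly as in the cell above.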


***
# Display images in a Pandas dataframe grid
***