# Querying [TMDB](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata) movie information from ApertureDB

This notebook runs against an instance of ApertureDB, which can be hosted in the [cloud](https://cloud.aperturedata.io) or run as [local Docker container(s)](https://docs.aperturedata.io/Setup/server/Local).

The dataset is hosted on Kaggle and is available via an mlcroissant link.


In [1]:
%pip install --quiet mlcroissant pandas dotenv


Note: you may need to restart the kernel to use updated packages.


## Import all the modules needed

In [2]:
import pandas as pd
from IPython.display import display


from aperturedb.CommonLibrary import (
    execute_query,
    create_connector
)
from aperturedb.Utils import Utils


In [3]:
client=create_connector()
utils = Utils(client)
utils.summary()


Database: garfield
Version: 0.18.1
Status:  0
Info:    OK
------------------ Entities -----------------
Total entities types:    2
Movie               
  Total elements: 4803
    Number   | budget      |      4803 (100%)
I   String   | id          |      4803 (100%)
  ! Number   | movie_id    |      4803 (100%)
    String   | overview    |      4803 (100%)
    Number   | popularity  |      4803 (100%)
    String   | title       |      4803 (100%)
Professional        
  Total elements: 104842
    Number   | gender  |    104842 (100%)
I   Number   | id      |    104842 (100%)
    String   | name    |    104842 (100%)
---------------- Connections ----------------
Total connections types: 2
CAST                
  Movie ====> Professional
  Total elements: 106257
I   Number   | cast_id    |    106257 (100%)
    String   | character  |    106257 (100%)
CREW                
  Movie ====> Professional
  Total elements: 129581
  ! String   | credit_id   |    129581 (100%)
    String   | departm

## Query time!
### Find all the movies Tom Hanks has been a part of

In [4]:
q = [
    {
        "FindEntity": {
            "_ref": 1,
            "with_class": "PROFESSIONAL",
            "constraints": {
                "name": ["==", "Tom Hanks"]
            },
            "results": {
                "all_properties": True
            }
        }
    },
    {
        "FindEntity": {
            "_ref": 2,
            "is_connected_to": {
                "ref": 1
            },
            "with_class": "MOVIE",
            "results": {
                # "list": ["id", "title"]
                "all_properties": True
            }
        }
    }
]

_, response, _ = execute_query(client, q)

display(pd.json_normalize(response[0]["FindEntity"]["entities"]))
display(pd.json_normalize(response[1]["FindEntity"]["entities"]))

movie_ids = [e["movie_id"] for e in response[1]["FindEntity"]["entities"]]
display(movie_ids)


Unnamed: 0,_uniqueid,gender,id,name
0,12.5826.53800,2,31,Tom Hanks


Unnamed: 0,_uniqueid,budget,id,movie_id,overview,popularity,title
0,11.179.53800,26000000,9800,9800,No one would take his case until one man was w...,44.301745,Philadelphia
1,11.529.53808,40000000,64685,64685,"A year after his father's death, Oskar, a trou...",31.066874,Extremely Loud & Incredibly Close
2,11.627.53810,70000000,857,857,"As U.S. troops storm the beaches of Normandy, ...",76.041867,Saving Private Ryan
3,11.634.53810,52000000,568,568,The true story of technical troubles that scut...,68.140214,Apollo 13
4,11.637.53810,65000000,9489,9489,"Book superstore magnate, Joe Fox and independe...",28.540267,You've Got Mail
5,11.689.53810,60000000,497,497,A supernatural tale set on death row in a Sout...,103.698022,The Green Mile
6,11.742.53812,200000000,10193,10193,"Woody, Buzz, and the rest of Andy's toys haven...",59.995418,Toy Story 3
7,11.773.53812,175000000,2698,2698,God contacts Congressman Evan Baxter and tells...,27.082182,Evan Almighty
8,11.790.53812,165000000,5255,5255,When a doubting young boy takes an extraordina...,47.323228,The Polar Express
9,11.1059.53818,1000000,13508,13508,"In 1996, electric cars began to appear on road...",5.323184,Who Killed the Electric Car?


[9800,
 64685,
 857,
 568,
 9489,
 497,
 10193,
 2698,
 5255,
 13508,
 13600,
 140823,
 5516,
 2619,
 20763,
 13448,
 13,
 109424,
 640,
 11631,
 862,
 9591,
 591,
 16523,
 59861,
 8346,
 83542,
 863,
 8358,
 2280,
 302688,
 9906,
 4147,
 11287,
 296098,
 594,
 6538,
 35,
 920]
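
The response is a list with one item per command, in the same order as the query, so `response[1]` holds the movies found by the second `FindEntity`. As a minimal sketch of that indexing, the hypothetical helper `entities_of` below pulls the entity list out of one command's result. The `sample_response` is a hand-written stand-in trimmed to the shape shown above, not real database output, and treating a missing `entities` key as an empty result is an assumption.

```python
def entities_of(response, index, command="FindEntity"):
    """Return the entity list of the command at `index` in a query response.

    Assumption: a command that matched nothing may omit the "entities" key,
    so a missing key is treated as an empty list.
    """
    return response[index][command].get("entities", [])


# Hand-written stand-in for a real response (hypothetical, trimmed values).
sample_response = [
    {"FindEntity": {"returned": 1, "entities": [{"id": 31, "name": "Tom Hanks"}]}},
    {"FindEntity": {"returned": 2, "entities": [
        {"movie_id": 9800, "title": "Philadelphia"},
        {"movie_id": 857, "title": "Saving Private Ryan"},
    ]}},
]

movie_ids = [e["movie_id"] for e in entities_of(sample_response, 1)]
print(movie_ids)  # [9800, 857]
```

With a live connection, the same helper could be applied to the `response` returned by `execute_query`.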

### Get more info.

The response from the Professional and Movie entities still lacks the character information, because that is stored as properties on the connection between the two. Let's merge that information in to get richer details about the movies Tom Hanks has been a part of.

In [5]:
professional = pd.json_normalize(response[1]["FindEntity"]["entities"])

professional_details = []
for p in response[0]["FindEntity"]["entities"]:
    src = p["_uniqueid"]
    for m in response[1]["FindEntity"]["entities"]:
        dst = m["_uniqueid"]
        q = [{
            "FindEntity": {
                "_ref": 1,
                "with_class": "PROFESSIONAL",
                "constraints": {
                    "_uniqueid": ["==", src]
                },
                "results": {
                    "all_properties": True
                }
            }
        },
        {
            "FindEntity": {
                "_ref": 2,
                "is_connected_to": {
                    "ref": 1
                },
                "with_class": "MOVIE",
                "constraints": {
                    "_uniqueid": ["==", dst]
                },
                "results": {
                    "all_properties": True
                }
            }
        },{
            "FindConnection": {
                "src": 2,
                "dst": 1,
                "results": {
                    "all_properties": True
                }
            }
        }]
        _, responsec, _ = execute_query(client, q)

        if responsec[2]["FindConnection"]["returned"] > 0:
            # Use the first connection between this movie and the professional.
            c = responsec[2]["FindConnection"]["connections"][0]
            if "character" in c:
                # The connection carries a character name, so this is an acting role.
                professional_details.append(f"as character: {c['character']}")
            else:
                # Otherwise describe the role by its job and department.
                professional_details.append(f"as {c['job']} in {c['department']}")
        else:
            # Keep one entry per movie so the list lines up with the DataFrame rows.
            professional_details.append("")
display(len(professional_details))
professional['details'] = professional_details

display(professional)

39

Unnamed: 0,_uniqueid,budget,id,movie_id,overview,popularity,title,details
0,11.179.53800,26000000,9800,9800,No one would take his case until one man was w...,44.301745,Philadelphia,as character: Andrew Beckett
1,11.529.53808,40000000,64685,64685,"A year after his father's death, Oskar, a trou...",31.066874,Extremely Loud & Incredibly Close,as character: Thomas Schell
2,11.627.53810,70000000,857,857,"As U.S. troops storm the beaches of Normandy, ...",76.041867,Saving Private Ryan,as character: Captain John H. Miller
3,11.634.53810,52000000,568,568,The true story of technical troubles that scut...,68.140214,Apollo 13,as character: Jim Lovell
4,11.637.53810,65000000,9489,9489,"Book superstore magnate, Joe Fox and independe...",28.540267,You've Got Mail,as character: Joe Fox
5,11.689.53810,60000000,497,497,A supernatural tale set on death row in a Sout...,103.698022,The Green Mile,as character: Paul Edgecomb
6,11.742.53812,200000000,10193,10193,"Woody, Buzz, and the rest of Andy's toys haven...",59.995418,Toy Story 3,as character: Woody (voice)
7,11.773.53812,175000000,2698,2698,God contacts Congressman Evan Baxter and tells...,27.082182,Evan Almighty,as Executive Producer in Production
8,11.790.53812,165000000,5255,5255,When a doubting young boy takes an extraordina...,47.323228,The Polar Express,as character: Hero Boy / Father / Conductor / ...
9,11.1059.53818,1000000,13508,13508,"In 1996, electric cars began to appear on road...",5.323184,Who Killed the Electric Car?,as character: Himself
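
The formatting step in the loop above can be separated from the querying, which makes it easy to check on its own. The sketch below uses a hypothetical helper, `describe_role`, that turns one connection's properties into the `details` string. The sample dictionaries are hand-written to match the property names used in the cell above (`character`, `job`, `department`). Falling back to "unknown" for a missing key is an added assumption, not something the original cell does.

```python
def describe_role(connection):
    """Build the 'details' string from the properties of one connection."""
    if "character" in connection:
        # The connection has a character name: an acting role.
        return f"as character: {connection['character']}"
    # Otherwise describe the role by job and department.
    # Assumption: fall back to "unknown" if either property is absent.
    job = connection.get("job", "unknown job")
    department = connection.get("department", "unknown department")
    return f"as {job} in {department}"


# Hand-written sample connections (hypothetical, trimmed to the relevant keys).
cast_edge = {"character": "Andrew Beckett"}
crew_edge = {"job": "Executive Producer", "department": "Production"}

print(describe_role(cast_edge))  # as character: Andrew Beckett
print(describe_role(crew_edge))  # as Executive Producer in Production
```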


### Find two cast members and the movies in which they both appear (logical AND)

Here we search for Tom Hanks and Meg Ryan. The `all` key inside `is_connected_to` means the final `FindEntity` returns only movies that are connected to both cast members.

In [6]:
from aperturedb.CommonLibrary import execute_query

q = [
    {
        "FindEntity": {
            "_ref": 1,
            "with_class": "PROFESSIONAL",
            "constraints":{
                "name": ["in", ["Tom Hanks"]]
            },
            "results": {
                "all_properties": True
                # "list": ["name", "_uniqueid"]
            }
        }
    },
    {
        "FindEntity": {
            "_ref": 2,
            "with_class": "PROFESSIONAL",
            "constraints":{
                "name": ["in", [ "Meg Ryan"]]
            },
            "results": {
                "all_properties": True
                # "list": ["name", "_uniqueid"]
            }
        }
    },
    {
        "FindEntity": {
            "is_connected_to": {
                "all": [
                {"ref": 1},
                {"ref": 2}
                ]
            },
            "with_class": "MOVIE",
            "results": {
                # "list": ["id", "title"],
                # "group_by_source": True
                "all_properties": True
            }
        }
    }
]

_, response, _ = execute_query(client, q)

pd.json_normalize(response[2]["FindEntity"]["entities"])


Unnamed: 0,_uniqueid,budget,id,movie_id,overview,popularity,title
0,11.637.53810,65000000,9489,9489,"Book superstore magnate, Joe Fox and independe...",28.540267,You've Got Mail
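
The same pattern extends to more than two people: one `FindEntity` per name, each with its own `_ref`, followed by a movie search whose `is_connected_to.all` lists every reference. The sketch below builds such a query with a hypothetical helper, `common_movies_query`. It only constructs the query list and does not contact the database. The class names and the `results` lists mirror the cells above, and using `==` instead of `in` for the name constraint is a choice made for this example.

```python
def common_movies_query(names):
    """Build a query for the movies connected to every named professional."""
    query = []
    # One FindEntity per person, numbered 1..N so later commands can refer to them.
    for ref, name in enumerate(names, start=1):
        query.append({
            "FindEntity": {
                "_ref": ref,
                "with_class": "PROFESSIONAL",
                "constraints": {"name": ["==", name]},
                "results": {"list": ["name"]},
            }
        })
    # The movie must be connected to all of the references found above.
    query.append({
        "FindEntity": {
            "with_class": "MOVIE",
            "is_connected_to": {
                "all": [{"ref": ref} for ref in range(1, len(names) + 1)]
            },
            "results": {"list": ["id", "title"]},
        }
    })
    return query


q = common_movies_query(["Tom Hanks", "Meg Ryan"])
print(len(q))  # 3
```

The resulting list could then be passed to `execute_query(client, q)`, with the movies found in the last item of the response.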
