# Querying [TMDB](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata) movie information from ApertureDB

This notebook works with an instance of ApertureDB, which can run in the [cloud](https://cloud.aperturedata.io) or as [local Docker container(s)](https://docs.aperturedata.io/Setup/server/Local).

The dataset is hosted on Kaggle and is available via an mlcroissant link.


In [1]:
%pip install --quiet mlcroissant pandas dotenv

Note: you may need to restart the kernel to use updated packages.


## Import all the modules needed

In [2]:
import pandas as pd
from IPython.display import display


from aperturedb.CommonLibrary import (
    execute_query,
    create_connector
)
from aperturedb.Utils import Utils


In [3]:
client = create_connector()
utils = Utils(client)
utils.summary()


Database: graph-examples-ulrn7q84.farm0000.cloud.aperturedata.io
Version: 0.18.9
Status:  0
Info:    OK
------------------ Entities -----------------
Total entities types:    6
GENRE               
  Total elements: 20
I   Number   | id        |        20 (100%)
    String   | label     |        20 (100%)
    String   | name      |        20 (100%)
  ! String   | uniqueid  |        20 (100%)
KEYWORD             
  Total elements: 9813
I   Number   | id        |      9813 (100%)
    String   | label     |      9813 (100%)
    String   | name      |      9813 (100%)
  ! String   | uniqueid  |      9813 (100%)
MOVIE               
  Total elements: 4803
    Number   | budget        |      4803 (100%)
I   String   | id            |      4803 (100%)
    String   | label         |      4803 (100%)
  ! Number   | movie_id      |      4803 (100%)
    String   | overview      |      4803 (100%)
    Number   | popularity    |      4803 (100%)
    String   | tagline       |      4803 (100%)
    S

## Query time!
### Find all the movies Tom Hanks has been a part of

In [4]:
q = [
    {
        "FindEntity": {
            "_ref": 1,
            "with_class": "PROFESSIONAL",
            "constraints": {
                "name": ["==", "Tom Hanks"]
            },
            "results": {
                "all_properties": True
            }
        }
    },
    {
        "FindEntity": {
            "_ref": 2,
            "is_connected_to": {
                "ref": 1
            },
            "with_class": "MOVIE",
            "results": {
                # "list": ["id", "title"]
                "all_properties": True
            }
        }
    }
]

_, response, _ = execute_query(client, q)

display(pd.json_normalize(response[0]["FindEntity"]["entities"]))
display(pd.json_normalize(response[1]["FindEntity"]["entities"]))

movie_ids = [e["movie_id"] for e in response[1]["FindEntity"]["entities"]]
display(movie_ids)


,_uniqueid,gender,id,label,name,uniqueid
0,8.11428.742,2,31,PROFESSIONAL,Tom Hanks,Tom hanks


,_uniqueid,budget,id,label,movie_id,overview,popularity,tagline,title,uniqueid,vote_average,vote_count
0,7.329.742,40000000,64685,MOVIE,64685,"A year after his father's death, Oskar, a trou...",31.066874,"This is not a story about September 11th, it's...",Extremely Loud & Incredibly Close,Extremely loud & incredibly close,6.9,708
1,7.579.746,26000000,9800,MOVIE,9800,No one would take his case until one man was w...,44.301745,No one would take on his case... until one man...,Philadelphia,Philadelphia,7.6,988
2,7.759.750,1000000,13508,MOVIE,13508,"In 1996, electric cars began to appear on road...",5.323184,A lack of consumer confidence... or conspiracy?,Who Killed the Electric Car?,Who killed the electric car?,7.2,59
3,7.827.752,70000000,857,MOVIE,857,"As U.S. troops storm the beaches of Normandy, ...",76.041867,The mission is a man.,Saving Private Ryan,Saving private ryan,7.9,5048
4,7.834.752,52000000,568,MOVIE,568,The true story of technical troubles that scut...,68.140214,"Houston, we have a problem.",Apollo 13,Apollo 13,7.3,1599
5,7.837.752,65000000,9489,MOVIE,9489,"Book superstore magnate, Joe Fox and independe...",28.540267,Someone you pass on the street may already be ...,You've Got Mail,You've got mail,6.3,838
6,7.889.752,60000000,497,MOVIE,497,A supernatural tale set on death row in a Sout...,103.698022,Miracles do happen.,The Green Mile,The green mile,8.2,4048
7,7.942.754,200000000,10193,MOVIE,10193,"Woody, Buzz, and the rest of Andy's toys haven...",59.995418,No toy gets left behind.,Toy Story 3,Toy story 3,7.6,4597
8,7.973.754,175000000,2698,MOVIE,2698,God contacts Congressman Evan Baxter and tells...,27.082182,A comedy of biblical proportions,Evan Almighty,Evan almighty,5.3,1151
9,7.990.754,165000000,5255,MOVIE,5255,When a doubting young boy takes an extraordina...,47.323228,This holiday season... believe.,The Polar Express,The polar express,6.4,1474


[64685,
 9800,
 13508,
 857,
 568,
 9489,
 497,
 10193,
 2698,
 5255,
 13600,
 140823,
 5516,
 2619,
 20763,
 13448,
 13,
 109424,
 640,
 11631,
 9591,
 862,
 591,
 16523,
 59861,
 83542,
 863,
 8358,
 2280,
 302688,
 8346,
 9906,
 4147,
 11287,
 296098,
 594,
 6538,
 35,
 920]

### Get more details

The response above, covering the cast member and the movie entities, still lacks the character information, because it is stored as properties on the connection between the two. Let's merge that information in to get richer details about the movies Tom Hanks has been a part of.

In [5]:
# One row per movie returned by the previous query
movies = pd.json_normalize(response[1]["FindEntity"]["entities"])

professional_details = []
for p in response[0]["FindEntity"]["entities"]:
    src = p["_uniqueid"]
    for m in response[1]["FindEntity"]["entities"]:
        dst = m["_uniqueid"]
        # Select this professional (ref 1) and this movie (ref 2) by _uniqueid,
        # then fetch the connection between them together with its properties.
        q = [{
            "FindEntity": {
                "_ref": 1,
                "with_class": "PROFESSIONAL",
                "constraints": {
                    "_uniqueid": ["==", src]
                },
                "results": {
                    "all_properties": True
                }
            }
        }, {
            "FindEntity": {
                "_ref": 2,
                "is_connected_to": {
                    "ref": 1
                },
                "with_class": "MOVIE",
                "constraints": {
                    "_uniqueid": ["==", dst]
                },
                "results": {
                    "all_properties": True
                }
            }
        }, {
            "FindConnection": {
                "src": 2,
                "dst": 1,
                "results": {
                    "all_properties": True
                }
            }
        }]
        _, responsec, _ = execute_query(client, q)

        if responsec[2]["FindConnection"]["returned"] > 0:
            c = responsec[2]["FindConnection"]["connections"][0]
            # Cast connections carry "character"; crew connections carry "job" and "department"
            if "character" in c:
                professional_details.append(f"as character: {c['character']}")
            else:
                professional_details.append(f"as {c['job']} in {c['department']}")
        else:
            # Keep the list aligned with the rows of `movies`
            professional_details.append(None)

display(len(professional_details))
movies["details"] = professional_details

display(movies)

39

,_uniqueid,budget,id,label,movie_id,overview,popularity,tagline,title,uniqueid,vote_average,vote_count,details
0,7.329.742,40000000,64685,MOVIE,64685,"A year after his father's death, Oskar, a trou...",31.066874,"This is not a story about September 11th, it's...",Extremely Loud & Incredibly Close,Extremely loud & incredibly close,6.9,708,as character: Thomas Schell
1,7.579.746,26000000,9800,MOVIE,9800,No one would take his case until one man was w...,44.301745,No one would take on his case... until one man...,Philadelphia,Philadelphia,7.6,988,as character: Andrew Beckett
2,7.759.750,1000000,13508,MOVIE,13508,"In 1996, electric cars began to appear on road...",5.323184,A lack of consumer confidence... or conspiracy?,Who Killed the Electric Car?,Who killed the electric car?,7.2,59,as character: Himself
3,7.827.752,70000000,857,MOVIE,857,"As U.S. troops storm the beaches of Normandy, ...",76.041867,The mission is a man.,Saving Private Ryan,Saving private ryan,7.9,5048,as character: Captain John H. Miller
4,7.834.752,52000000,568,MOVIE,568,The true story of technical troubles that scut...,68.140214,"Houston, we have a problem.",Apollo 13,Apollo 13,7.3,1599,as character: Jim Lovell
5,7.837.752,65000000,9489,MOVIE,9489,"Book superstore magnate, Joe Fox and independe...",28.540267,Someone you pass on the street may already be ...,You've Got Mail,You've got mail,6.3,838,as character: Joe Fox
6,7.889.752,60000000,497,MOVIE,497,A supernatural tale set on death row in a Sout...,103.698022,Miracles do happen.,The Green Mile,The green mile,8.2,4048,as character: Paul Edgecomb
7,7.942.754,200000000,10193,MOVIE,10193,"Woody, Buzz, and the rest of Andy's toys haven...",59.995418,No toy gets left behind.,Toy Story 3,Toy story 3,7.6,4597,as character: Woody (voice)
8,7.973.754,175000000,2698,MOVIE,2698,God contacts Congressman Evan Baxter and tells...,27.082182,A comedy of biblical proportions,Evan Almighty,Evan almighty,5.3,1151,as Executive Producer in Production
9,7.990.754,165000000,5255,MOVIE,5255,When a doubting young boy takes an extraordina...,47.323228,This holiday season... believe.,The Polar Express,The polar express,6.4,1474,as character: Hero Boy / Father / Conductor / ...
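
The rule that turns a connection into a credit can be pulled out of the loop and checked without a database. This is a minimal sketch: `describe_connection` is a hypothetical helper, and the sample dictionaries only mimic the connection properties seen above (`character` for cast, `job` and `department` for crew); they are not live query results.

```python
def describe_connection(conn):
    """Format one connection's properties as a short credit string.

    Assumes cast connections carry "character" and crew connections carry
    "job" and "department", as in the output above. Returns None when no
    connection was found.
    """
    if conn is None:
        return None
    if "character" in conn:
        return f"as character: {conn['character']}"
    # Fall back to "?" if a crew connection lacks one of the fields
    return f"as {conn.get('job', '?')} in {conn.get('department', '?')}"


# Illustrative inputs modeled on two rows of the table above
titles = ["Apollo 13", "Evan Almighty"]
connections = [
    {"character": "Jim Lovell"},
    {"job": "Executive Producer", "department": "Production"},
]

# Pair each title with its formatted credit, preserving row order
credits = {title: describe_connection(c) for title, c in zip(titles, connections)}
for title, credit in credits.items():
    print(f"{title}: {credit}")
```

In the notebook loop, the branch on `"character"` could then be replaced by `professional_details.append(describe_connection(c))`.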


### Find two cast people and the movies in which they both appear (logical AND)

Here we search for Tom Hanks and Meg Ryan. The `all` list inside `is_connected_to` in the third `FindEntity` means that only movies connected to both cast people are returned.

In [6]:
from aperturedb.CommonLibrary import execute_query

q = [
    {
        "FindEntity": {
            "_ref": 1,
            "with_class": "PROFESSIONAL",
            "constraints":{
                "name": ["in", ["Tom Hanks"]]
            },
            "results": {
                "all_properties": True
                # "list": ["name", "_uniqueid"]
            }
        }
    },
    {
        "FindEntity": {
            "_ref": 2,
            "with_class": "PROFESSIONAL",
            "constraints":{
                "name": ["in", [ "Meg Ryan"]]
            },
            "results": {
                "all_properties": True
                # "list": ["name", "_uniqueid"]
            }
        }
    },
    {
        "FindEntity": {
            "is_connected_to": {
                "all": [
                {"ref": 1},
                {"ref": 2}
                ]
            },
            "with_class": "MOVIE",
            "results": {
                # "list": ["id", "title"],
                # "group_by_source": True
                "all_properties": True
            }
        }
    }
]

_, response, _ = execute_query(client, q)

pd.json_normalize(response[2]["FindEntity"]["entities"])


,_uniqueid,budget,id,label,movie_id,overview,popularity,tagline,title,uniqueid,vote_average,vote_count
0,7.837.752,65000000,9489,MOVIE,9489,"Book superstore magnate, Joe Fox and independe...",28.540267,Someone you pass on the street may already be ...,You've Got Mail,You've got mail,6.3,838
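
The two-person query above generalizes to any number of people: one `FindEntity` per person, then a `MOVIE` lookup whose `is_connected_to` lists every ref under `all`. A minimal sketch, assuming only the query keys already used in this notebook; `build_common_movies_query` is a hypothetical helper, not part of the `aperturedb` package.

```python
import json


def build_common_movies_query(names):
    """Build a query for MOVIE entities connected to every named PROFESSIONAL.

    One FindEntity per person gets _ref 1..N; the final MOVIE FindEntity
    lists all of those refs under "is_connected_to" -> "all" (logical AND).
    """
    if not names:
        raise ValueError("need at least one name")
    query = []
    for ref, name in enumerate(names, start=1):
        query.append({
            "FindEntity": {
                "_ref": ref,
                "with_class": "PROFESSIONAL",
                "constraints": {"name": ["==", name]},
                "results": {"all_properties": True},
            }
        })
    query.append({
        "FindEntity": {
            "with_class": "MOVIE",
            "is_connected_to": {
                "all": [{"ref": ref} for ref in range(1, len(names) + 1)]
            },
            "results": {"all_properties": True},
        }
    })
    return query


q = build_common_movies_query(["Tom Hanks", "Meg Ryan"])
print(json.dumps(q[-1], indent=2))
```

Running it would follow the same pattern as the cell above: `_, response, _ = execute_query(client, q)`, with the common movies in `response[-1]["FindEntity"]["entities"]`.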


## We can write the same queries in SPARQL

Here we reproduce the examples above where possible. The character details cannot be retrieved this way, because SPARQL, as used here, does not handle properties on connections.

In [7]:
from aperturedb.SPARQL import SPARQL
import json


sparql = SPARQL(client, debug=True)
print("namespaces:", json.dumps({k: str(v) for k, v in sparql.namespaces.items()}, indent=2))

print("properties:", json.dumps({sparql.graph.qname(k): str(v)
      for k, v in sparql.properties.items()}, indent=2))

print("connections:", json.dumps({sparql.graph.qname(k): str(v)
      for k, v in sparql.connections.items()}, indent=2))


namespaces: {
  "t": "http://aperturedb.io/type/",
  "c": "http://aperturedb.io/connection/",
  "p": "http://aperturedb.io/property/",
  "o": "http://aperturedb.io/object/",
  "knn": "http://aperturedb.io/knn/",
  "GENRE": "http://aperturedb.io/object/GENRE/",
  "KEYWORD": "http://aperturedb.io/object/KEYWORD/",
  "MOVIE": "http://aperturedb.io/object/MOVIE/",
  "PRODUCTION_COMPANY": "http://aperturedb.io/object/PRODUCTION_COMPANY/",
  "PROFESSIONAL": "http://aperturedb.io/object/PROFESSIONAL/",
  "SPOKEN_LANGUAGE": "http://aperturedb.io/object/SPOKEN_LANGUAGE/"
}
properties: {
  "p:id": "{'KEYWORD', 'MOVIE', 'GENRE', 'PRODUCTION_COMPANY', 'PROFESSIONAL'}",
  "p:label": "{'SPOKEN_LANGUAGE', 'KEYWORD', 'MOVIE', 'GENRE', 'PRODUCTION_COMPANY', 'PROFESSIONAL'}",
  "p:name": "{'SPOKEN_LANGUAGE', 'KEYWORD', 'GENRE', 'PRODUCTION_COMPANY', 'PROFESSIONAL'}",
  "p:uniqueid": "{'SPOKEN_LANGUAGE', 'KEYWORD', 'MOVIE', 'GENRE', 'PRODUCTION_COMPANY', 'PROFESSIONAL'}",
  "p:budget": "{'MOVIE'}",
  "p:
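
The SPARQL queries below use short prefixed names such as `p:name` and `c:CAST`; the `namespaces` mapping printed above is what turns them into full IRIs, in the usual RDF way of appending the local name to the namespace IRI. A minimal sketch of that expansion, using a dictionary copied from part of the output; `expand_qname` is a hypothetical helper, not part of the `aperturedb` SPARQL module.

```python
# Subset of the namespaces printed above
NAMESPACES = {
    "t": "http://aperturedb.io/type/",
    "c": "http://aperturedb.io/connection/",
    "p": "http://aperturedb.io/property/",
    "MOVIE": "http://aperturedb.io/object/MOVIE/",
}


def expand_qname(qname, namespaces=NAMESPACES):
    """Expand 'prefix:local' into a full IRI (namespace IRI + local name)."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown prefix in {qname!r}")
    return namespaces[prefix] + local


print(expand_qname("p:name"))  # http://aperturedb.io/property/name
print(expand_qname("c:CAST"))  # http://aperturedb.io/connection/CAST
```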

### Find all the movies Tom Hanks has been a part of

In [8]:

query = """
SELECT  ?title ?pop ?budget
WHERE {
  ?p p:name "Tom Hanks" .
  ?m c:CAST ?p .
  ?m p:title ?title ;
    p:popularity ?pop ;
    p:budget ?budget .
}
"""

results = sparql.query(query)
df = sparql.to_dataframe(results)
display(df)


,title,pop,budget
0,Extremely Loud & Incredibly Close,31.066874,40000000
1,Philadelphia,44.301745,26000000
2,Who Killed the Electric Car?,5.323184,1000000
3,Saving Private Ryan,76.041867,70000000
4,Apollo 13,68.140214,52000000
5,You've Got Mail,28.540267,65000000
6,The Green Mile,103.698022,60000000
7,Toy Story 3,59.995418,200000000
8,The Polar Express,47.323228,165000000
9,Saving Mr. Banks,31.957947,35000000


### Find two cast people and the movies they have both been part of

In [9]:
query = """
SELECT  ?title ?pop ?budget
WHERE {
  ?m c:CAST [p:name "Tom Hanks"] , [p:name "Meg Ryan"] ;
    p:title ?title ;
    p:popularity ?pop ;
    p:budget ?budget .
}
"""

results = sparql.query(query)
df = sparql.to_dataframe(results)
display(df)
# print(json.dumps(sparql.input_query, indent=2))

,title,pop,budget
0,You've Got Mail,28.540267,65000000
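
In this query each bracketed term such as `[p:name "Tom Hanks"]` is a blank node standing for a separate, unnamed professional, and the comma repeats the `c:CAST` predicate for each of them, so a movie matches only if it has a CAST connection to every listed person: the same logical AND as the `all` list in the native query. A minimal sketch that builds this pattern for any number of names, assuming the `p:` and `c:` prefixes are supplied by the `SPARQL` helper as in the cells above; `build_common_cast_query` is a hypothetical helper.

```python
import json


def build_common_cast_query(names):
    """Build a SPARQL query for movies whose CAST includes every given name.

    Each name becomes a blank node [p:name "..."]; the comma-separated object
    list repeats the c:CAST predicate, so all patterns must match (logical AND).
    """
    if not names:
        raise ValueError("need at least one name")
    # json.dumps produces a double-quoted, escaped literal that SPARQL also accepts
    people = " , ".join(f"[p:name {json.dumps(n, ensure_ascii=False)}]" for n in names)
    return f"""
SELECT ?title ?pop ?budget
WHERE {{
  ?m c:CAST {people} ;
    p:title ?title ;
    p:popularity ?pop ;
    p:budget ?budget .
}}
"""


query = build_common_cast_query(["Tom Hanks", "Meg Ryan"])
print(query)
```

The resulting string can be passed to `sparql.query(query)` and converted with `sparql.to_dataframe(results)` exactly as in the cell above.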
