# Internet Memes: Knowledge connects culture and creativity

This notebook shows how KGTK supports the enrichment and analytics of an Internet Meme Knowledge Graph (IMKG).

We highlight how KGTK facilitates:
1. Enrichment with a public Knowledge Graph (Wikidata)
2. Scalable analytics and visualization of the resulting IMKG

## Step 0: Setup


### Notebook data setup

In [1]:
import os
import os.path

from kgtk.configure_kgtk_notebooks import ConfigureKGTK
from kgtk.functions import kgtk

In [2]:
# Parameters

# Local folders for the input data, the output, and temporary files:
input_path = "datasets"
output_path = "projects"
project_name = "memes"

In [3]:
# These are all the KG files that we use in this notebook:
additional_files = {
    "kym": "kym.kgtk.gz",
    "wiki": "wd_mini.kgtk.gz",
    "mappings": "mappings.kgtk.gz",
}

big_files = [
    "label",
]

ck = ConfigureKGTK(big_files)
ck.configure_kgtk(input_graph_path=input_path, 
                  output_path=output_path, 
                  project_name=project_name,
                  additional_files=additional_files)

User home: /Users/filipilievski
Current dir: /Users/filipilievski/mcs/kgtk-aaai2023
KGTK dir: /Users/filipilievski/mcs
Use-cases dir: /Users/filipilievski/mcs/use-cases


In [4]:
ck.print_env_variables()

STORE: projects/memes/temp.memes/wikidata.sqlite3.db
TEMP: projects/memes/temp.memes
EXAMPLES_DIR: /Users/filipilievski/mcs/examples
KGTK_GRAPH_CACHE: projects/memes/temp.memes/wikidata.sqlite3.db
GRAPH: data
kgtk: kgtk
USE_CASES_DIR: /Users/filipilievski/mcs/use-cases
KGTK_LABEL_FILE: data/labels.en.tsv.gz
KGTK_OPTION_DEBUG: false
kypher: kgtk query --graph-cache projects/memes/temp.memes/wikidata.sqlite3.db
OUT: projects/memes
label: data/labels.en.tsv.gz
kym: data/kym.kgtk.gz
wiki: data/wd_mini.kgtk.gz
mappings: data/mappings.kgtk.gz


In [5]:
ck.load_files_into_cache()

kgtk query --graph-cache projects/memes/temp.memes/wikidata.sqlite3.db -i "data/labels.en.tsv.gz" --as label  -i "data/kym.kgtk.gz" --as kym  -i "data/wd_mini.kgtk.gz" --as wiki  -i "data/mappings.kgtk.gz" --as mappings  --limit 3
input alias 'label' already in use


## Part 1: Enrichment of knowledge with Wikidata

Let's see how many Internet memes our Wikidata subset contains, counting the instances (`P31`) of `Q2927074` ("Internet meme"):

In [6]:
%%time
!kgtk query -i $wiki \
    --match '(im)-[:P31]->(:Q2927074)' \
    --return 'count (distinct im)'

count(DISTINCT graph_4_c1."node1")
277
CPU times: user 16.2 ms, sys: 11.6 ms, total: 27.8 ms
Wall time: 1.46 s


So our portion of Wikidata has 277 Internet Meme instances. Let's see how many of them have links to KnowYourMeme in Wikidata:

In [7]:
%%time
!kgtk query -i $wiki -i $mappings \
    --match 'wd: (im)-[:P31]->(:Q2927074), \
             mappings: (im)-[:P6760]->(imkym)' \
    --return 'count (distinct im)'

count(DISTINCT graph_4_c1."node1")
239
CPU times: user 18.2 ms, sys: 12.8 ms, total: 31.1 ms
Wall time: 1.48 s


Out of the 277 memes we have in Wikidata, 239 have a link to KnowYourMeme (KYM).

How many memes do we have in the KYM graph itself?

In [8]:
%%time
!kgtk query -i $kym \
    --match '(n1)-[r:`rdf:type`]->(:`kym:Meme`)' \
    --return 'count(distinct n1)'

count(DISTINCT graph_6_c1."node1")
12585
CPU times: user 16.8 ms, sys: 12.1 ms, total: 28.9 ms
Wall time: 1.49 s
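
To put these three counts in perspective, the coverage can be computed directly from the numbers reported above. This is only an arithmetic sketch: the variable names are ours, and the second ratio assumes that each linked Wikidata item points to a distinct KYM entry.

```python
# Counts copied from the query outputs above.
wikidata_memes = 277   # instances of Q2927074 in our Wikidata subset
linked_memes = 239     # of those, memes with a P6760 link to KYM
kym_memes = 12585      # nodes typed kym:Meme in the KYM graph

# Share of Wikidata memes that link to KYM.
wd_coverage = linked_memes / wikidata_memes
# Share of KYM memes reachable from Wikidata
# (assumption: one distinct KYM entry per linked Wikidata item).
kym_coverage = linked_memes / kym_memes

print(f"Wikidata memes linked to KYM: {wd_coverage:.1%}")
print(f"KYM memes linked from Wikidata: {kym_coverage:.1%}")
```

Under that assumption, most Wikidata memes are linked to KYM, but they cover only a small fraction of the KYM graph.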


Let's now connect KYM with Wikidata through their shared meme identifiers.

We first extract knowledge about memes that exist in both graphs. 

We store the information from KYM in a separate graph file called `kym_memes.kgtk.gz`:

In [9]:
%%time
!kgtk query -i $kym -i $mappings \
    --match 'mapping: (meme_qid)-[:P6760]->(kym_meme), \
            kym: (kym_meme)-[mrel]->(mval)' \
    --return 'kym_meme as node1, mrel.label as label, mval as node2' /\
    deduplicate -o $TEMP/kym_memes.kgtk.gz

CPU times: user 34.3 ms, sys: 17.3 ms, total: 51.6 ms
Wall time: 3.1 s
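
Conceptually, the two-graph match above behaves like a join on the KYM identifier followed by deduplication. The plain-Python sketch below illustrates that idea on toy edges; all identifiers in it are made up for illustration, and it is not how KGTK executes the query internally.

```python
# Toy edges as (node1, label, node2); identifiers are hypothetical.
mappings = [
    ("Q_toy1", "P6760", "kym:toy-meme-a"),
    ("Q_toy2", "P6760", "kym:toy-meme-b"),
]
kym = [
    ("kym:toy-meme-a", "rdf:type", "kym:Meme"),
    ("kym:toy-meme-a", "m4s:from", "Ylilauta"),
    ("kym:toy-meme-a", "m4s:from", "Ylilauta"),   # duplicate edge
    ("kym:toy-meme-c", "rdf:type", "kym:Meme"),   # no Wikidata link
]

# KYM identifiers that are the target of a P6760 mapping edge.
linked = {node2 for _, label, node2 in mappings if label == "P6760"}

# Keep every KYM edge whose subject is linked; the set removes duplicates,
# which plays the role of the `deduplicate` step.
kym_memes = sorted({edge for edge in kym if edge[0] in linked})

for edge in kym_memes:
    print("\t".join(edge))
```

Only the edges of `kym:toy-meme-a` survive: `kym:toy-meme-b` has a mapping but no edges in the toy KYM graph, and `kym:toy-meme-c` has edges but no mapping.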


In [10]:
%%time
!kgtk query -i $TEMP/kym_memes.kgtk.gz \
    --match '(n1)-[r]->()' \
    --return 'count(n1)'

count(graph_23_c1."node1")
17073
CPU times: user 19.6 ms, sys: 13.9 ms, total: 33.5 ms
Wall time: 1.63 s


In [11]:
%%time
!kgtk query -i $TEMP/kym_memes.kgtk.gz \
    --match '(n1)-[r]->()' \
    --limit 10

node1	label	node2
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:about	( ͡° ͜ʖ ͡°) is an emoticon created with unicode character symbols. The face is often used to spam forums and image boards, similar to the Japanese word \desu\". On 4chan, it has also come to be known as \"Le Lenny Face\" or \"Le Face Face.\""
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:added	nodeDW9ARPJSRQYEUQn4RyWQMF-16625
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:from	Ylilauta
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:fromAbout	Q238330
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:fromAbout	Q28135014
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:fromAbout	Q31963
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:fromAbout	Q5287
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:fromAbout	Q8819
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:fromImage	Q1027879
kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face	m4s:fromImage	Q10770146
CPU times: user 19.9 ms, sys: 12.6 ms, total: 32.5 ms
Wall time: 1.5 s
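
As the output shows, the edge file is tab-separated with `node1`, `label`, and `node2` columns. If you want to inspect such a file outside of KGTK, a minimal sketch with the Python standard library could look as follows. The helper `read_kgtk_edges` and the toy file are our own illustration, not part of KGTK, and the sketch assumes the file has exactly these three column names in its header.

```python
import csv
import gzip
import os
import tempfile


def read_kgtk_edges(path):
    """Yield (node1, label, node2) tuples from a gzipped, tab-separated edge file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters as ordinary text, not as CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row["node1"], row["label"], row["node2"]


# Toy file standing in for kym_memes.kgtk.gz (two rows copied from the output above).
toy_path = os.path.join(tempfile.mkdtemp(), "toy.kgtk.gz")
with gzip.open(toy_path, "wt", encoding="utf-8", newline="") as f:
    f.write("node1\tlabel\tnode2\n")
    f.write("kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face\tm4s:from\tYlilauta\n")
    f.write("kym:%CD%A1-%CD%9C%CA%96-%CD%A1-lenny-face\tm4s:fromAbout\tQ238330\n")

edges = list(read_kgtk_edges(toy_path))
print(len(edges), "edges read")
```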


So we have 17,073 triples from KYM about these memes.

We store the information from Wikidata in a separate graph file called `wd_memes.kgtk.gz`:

In [12]:
%%time
!kgtk query -i $wiki -i $mappings \
    --match 'mapping: (meme_qid)-[:P6760]->(), \
            wd: (meme_qid)-[mrel]->(mval)' \
    --return 'meme_qid as node1, mrel.label as label, mval as node2' /\
    deduplicate -o $TEMP/wd_memes.kgtk.gz

CPU times: user 36.1 ms, sys: 18.7 ms, total: 54.8 ms
Wall time: 2.96 s


In [13]:
%%time
!kgtk query -i $TEMP/wd_memes.kgtk.gz \
    --match '(n1)-[r]->()' \
    --return 'count(n1)'

count(graph_18_c1."node1")
1394
CPU times: user 19.4 ms, sys: 13.3 ms, total: 32.6 ms
Wall time: 1.52 s


In total, we get 1,394 triples from Wikidata about these memes. Let's look at some examples, with labels added for readability:

In [14]:
%%time
!kgtk query -i $TEMP/wd_memes.kgtk.gz \
    --match 'meme: (n1)-[r]->(n2)' \
    --limit 10 / add-labels

node1	label	node2	node1;label	node2;label
Q104005472	P1080	Q87609688	'Primitive Sponge'@en	'SpongeBob SquarePants universe'@en
Q104005472	P1340	Q17122834	'Primitive Sponge'@en	'blue'@en
Q104005472	P1441	Q83279	'Primitive Sponge'@en	'SpongeBob SquarePants'@en
Q104005472	P21	Q6581097	'Primitive Sponge'@en	'male'@en
Q104005472	P31	Q15711870	'Primitive Sponge'@en	'animated character'@en
Q104005472	P31	Q2927074	'Primitive Sponge'@en	'Internet meme'@en
Q104005472	P31	Q88560371	'Primitive Sponge'@en	'anthropomorphic sea sponge'@en
Q104005472	P3828	Q1130359	'Primitive Sponge'@en	'loincloth'@en
Q104005472	P4584	Q29566330	'Primitive Sponge'@en	'SB-129'@en
Q104005472	P462	Q68223248	'Primitive Sponge'@en	'light yellow'@en
CPU times: user 32.3 ms, sys: 16.1 ms, total: 48.4 ms
Wall time: 2.93 s


We then combine the KYM and Wikidata meme files into a single deduplicated graph, `combined.kgtk.gz`:


In [15]:
%%time
!kgtk cat -i $TEMP/kym_memes.kgtk.gz -i $TEMP/wd_memes.kgtk.gz / deduplicate -o $TEMP/combined.kgtk.gz

CPU times: user 37.8 ms, sys: 18.9 ms, total: 56.7 ms
Wall time: 3.2 s


The KYM memes have already been linked to Wikidata entities through information extraction. Let's use these links to get background information about the entities from Wikidata:

In [16]:
%%time
!kgtk query -i $TEMP/combined.kgtk.gz \
    --match '(x)-->(y)' \
     --return 'x as node1, "member" as label, "set1" as node2, y as node1, "member" as label, "set1" as node2' \
     --multi 2 \
     / deduplicate / add-id / \
     query -i - --as gnodes --idx mode:valuegraph -i $wiki --idx mode:graph \
     --match 'wd:  (x)-[r]->(y), \
              gnodes: (x)-->(), \
                      (y)-->()' \
    --return 'distinct x, r.label, y' \
    -o $TEMP/wikidata_ent.kgtk.gz

CPU times: user 50.5 ms, sys: 21.1 ms, total: 71.6 ms
Wall time: 4.04 s
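
The query above is dense, so here is the underlying idea as a plain-Python sketch: collect every node that occurs in the combined graph, then keep only those Wikidata edges whose subject and object both belong to that node set. The edges below are toy data for illustration (the Wikidata-style triples are borrowed from the sample output in this notebook; `kym:toy-meme` is hypothetical), and the sketch does not reflect how KGTK evaluates the query.

```python
# Toy stand-in for combined.kgtk.gz: (node1, label, node2).
combined = [
    ("kym:toy-meme", "m4s:fromAbout", "Q100"),
    ("kym:toy-meme", "m4s:fromAbout", "Q30"),
]

# Toy stand-in for the Wikidata subset.
wiki = [
    ("Q100", "P17", "Q30"),     # both endpoints occur in `combined`: kept
    ("Q100", "P30", "Q49"),     # Q49 does not occur in `combined`: dropped
    ("Q1", "P2670", "Q6999"),   # neither endpoint occurs: dropped
]

# Step 1: the node set, built from both ends of every combined edge
# (this corresponds to the "member of set1" edges in the query).
nodes = {n for node1, _, node2 in combined for n in (node1, node2)}

# Step 2: keep Wikidata edges with both endpoints in the node set.
wikidata_ent = sorted({e for e in wiki if e[0] in nodes and e[2] in nodes})

print(wikidata_ent)
```

Requiring both endpoints to be in the node set keeps the result focused on relations among entities already present in the meme graph, rather than pulling in each entity's full Wikidata neighborhood.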


To make the output more readable, we enhance it with entity labels from Wikidata:

In [17]:
%%time
!kgtk query -i $TEMP/wikidata_ent.kgtk.gz \
    --match '(n1)-[r]->()' \
    --limit 10 / add-labels

node1	label	node2	node1;label	node2;label
Q1	P2670	Q6999	'universe'@en	'astronomical object'@en
Q1	P3113	Q2051667	'universe'@en	'parallel universe'@en
Q100	P1376	Q771	'Boston'@en	'Massachusetts'@en
Q100	P17	Q30	'Boston'@en	'United States of America'@en
Q100	P30	Q49	'Boston'@en	'North America'@en
Q1001	P106	Q1930187	'Mohandas Karamchand Gandhi'@en	'journalist'@en
Q1001	P1412	Q1860	'Mohandas Karamchand Gandhi'@en	'English'@en
Q1001	P21	Q6581097	'Mohandas Karamchand Gandhi'@en	'male'@en
Q1001	P31	Q5	'Mohandas Karamchand Gandhi'@en	'human'@en
Q1001	P509	Q2140674	'Mohandas Karamchand Gandhi'@en	'ballistic trauma'@en
CPU times: user 34.4 ms, sys: 15.9 ms, total: 50.3 ms
Wall time: 3.01 s


Finally, we merge the three files and deduplicate the result, which gives us the combined Internet Meme KG (IMKG):

In [18]:
%%time
!kgtk cat -i $TEMP/kym_memes.kgtk.gz -i $TEMP/wikidata_ent.kgtk.gz -i $TEMP/wd_memes.kgtk.gz / deduplicate -o $TEMP/imkg.kgtk.gz

CPU times: user 38.1 ms, sys: 18.7 ms, total: 56.8 ms
Wall time: 3.34 s


## Part 2: Scalable analytics and visualization

Let's first compute global statistics of our IMKG graph:

In [19]:
!kgtk graph-statistics \
     -i $TEMP/imkg.kgtk.gz \
     --log-file $TEMP/imkg_summary.txt \
     --output-statistics-only \
     -o $TEMP/imkg_stats.tsv



In [20]:
!cat $TEMP/imkg_summary.txt

graph loaded! It has 8307 nodes and 25002 edges

*** Top relations:
rdfs:seeAlso	5725
m4s:fromImage	5404
m4s:tag	1857
P31	1277
m4s:fromAbout	1231
P279	719
rdf:type	487
P530	369
skos:broader	324
skos:narrower	317

*** Degrees:
in degree stats: mean=3.009751, std=0.123733, max=1
out degree stats: mean=3.009751, std=0.181207, max=1
total degree stats: mean=6.019502, std=0.223082, max=1

*** PageRank
Max pageranks
5	Q30	0.019263
9	Q1860	0.009866
11	Q5	0.007839
629	Q180910	0.007520
146	Q145	0.007274

*** HITS
HITS hubs
37	Q2927074	0.292369
2478	m4s:MediaFrame	0.259005
2476	kym:Meme	0.259005
2457	confirmed	0.244055
98	Q478798	0.191864
HITS auth
3489	kym:lolcats	0.200570
3395	kym:caturday	0.191058
3498	kym:nyan-cat	0.190469
3490	kym:longcat	0.186396
3478	kym:kitler	0.182390


Let's list the 10 entities that are mentioned most often in the memes' `m4s:fromAbout` metadata:

In [21]:
%%time
!kgtk query -i $TEMP/imkg.kgtk.gz  \
    --match 'imkg: ()-[:`m4s:fromAbout`]->(n)' \
    --return 'n, count(n) as c' \
    --order-by 'c desc' \
    --limit 10 / add-labels

node2	c	node2;label
Q6002242	29	'image macro'@en
Q238330	23	'4chan'@en
Q2708515	22	'catchphrase'@en
Q2927074	20	'Internet meme'@en
Q7889	13	'video game'@en
Q866	11	'YouTube'@en
Q75	11	'Internet'@en
Q8102	10	'slang'@en
Q5287	8	'Japanese'@en
Q1860	8	'English'@en
CPU times: user 36.2 ms, sys: 17.9 ms, total: 54.1 ms
Wall time: 3.09 s
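
The combination of `count(n)`, `--order-by 'c desc'`, and `--limit` is a group-by, sort, and truncate. In plain Python the same aggregation can be sketched with `collections.Counter`; the edges below are toy data, so the counts are illustrative and do not match the real output.

```python
from collections import Counter

# Toy IMKG edges as (node1, label, node2); the meme identifiers are hypothetical.
imkg = [
    ("kym:toy-a", "m4s:fromAbout", "Q6002242"),
    ("kym:toy-a", "m4s:fromAbout", "Q238330"),
    ("kym:toy-b", "m4s:fromAbout", "Q6002242"),
    ("kym:toy-b", "m4s:fromImage", "Q238330"),   # different relation: ignored
    ("kym:toy-c", "m4s:fromAbout", "Q2708515"),
    ("kym:toy-c", "m4s:fromAbout", "Q238330"),
    ("kym:toy-d", "m4s:fromAbout", "Q6002242"),
]

# Group by the target node of m4s:fromAbout edges and count occurrences.
counts = Counter(node2 for _, label, node2 in imkg if label == "m4s:fromAbout")

# Sort by count, descending, and keep the first two groups (the "limit").
top_two = counts.most_common(2)
print(top_two)
```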


Let's now run some queries for interesting use cases.

**Example: Who are the most meme-able people in Wikidata?**

In [22]:
%%time
!kgtk query -i $TEMP/imkg.kgtk.gz -i $label \
    --match 'imkg: (h)-[]->(person),\
            (h)-[:`rdf:type`]->(:`kym:Meme`),\
            (person)-[:P31]->(:Q5), \
            label: (person)-->(pname)' \
    --return 'pname, count(h) as c' \
    --order-by 'c desc' \
    --limit 3 

node2	c
'Kyle Craven'@en	4
'Adolf Hitler'@en	4
'Stefán Karl Stefánsson'@en	3
CPU times: user 499 ms, sys: 135 ms, total: 634 ms
Wall time: 42.5 s


**Example: How many memes are based on films?**

In [23]:
%%time
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromAbout`]->(t),\
             (t)-[:P31]->(:Q11424)' \
    --return 'count (distinct h)'

count(DISTINCT graph_20_c1."node1")
5
CPU times: user 19.5 ms, sys: 11.9 ms, total: 31.3 ms
Wall time: 1.53 s


Let's look at these memes together with their films:

In [24]:
%%time
!kgtk query -i $TEMP/imkg.kgtk.gz -i $label \
    --match 'imkg: (h)-[:`m4s:fromAbout`]->(t),\
             (t)-[:P31]->(:Q11424), \
             labels: (t)-->(tname)' \
    --return 'h, tname' \
    --limit 10

node1	node2
kym:hitlers-downfall-parodies	'Downfall'@en
kym:dramatic-chipmunk	'Inception'@en
kym:karen	'Goodfellas'@en
kym:bush-did-911	'Inside Job'@en
kym:dramatic-chipmunk	'Young Frankenstein'@en
kym:pepe-the-frog	'Feels Good Man'@en
CPU times: user 22.1 ms, sys: 16.8 ms, total: 38.9 ms
Wall time: 1.65 s
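
Note that this listing has six rows while the previous query counted five memes. The difference comes from `count (distinct h)`: a meme linked to several films produces several rows but is counted once. The sketch below checks this on the rows printed above (labels shown without their `@en` language tag).

```python
from collections import defaultdict

# (meme, film label) pairs copied from the output above.
rows = [
    ("kym:hitlers-downfall-parodies", "Downfall"),
    ("kym:dramatic-chipmunk", "Inception"),
    ("kym:karen", "Goodfellas"),
    ("kym:bush-did-911", "Inside Job"),
    ("kym:dramatic-chipmunk", "Young Frankenstein"),
    ("kym:pepe-the-frog", "Feels Good Man"),
]

# Group the films by meme to see which memes occur more than once.
films_by_meme = defaultdict(list)
for meme, film in rows:
    films_by_meme[meme].append(film)

distinct_memes = len(films_by_meme)
multi_film = {m: f for m, f in films_by_meme.items() if len(f) > 1}

print(len(rows), "rows,", distinct_memes, "distinct memes")
print(multi_film)
```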


**Example: How many memes depict the entity Q83279 ("SpongeBob SquarePants")?**

In [25]:
%%time
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromImage`]->(:Q83279),\
            (h)-[:`rdf:type`]->(:`kym:Meme`)' \
    --return 'count(distinct h)'

count(DISTINCT graph_20_c1."node1")
2
CPU times: user 20.4 ms, sys: 12.9 ms, total: 33.3 ms
Wall time: 1.53 s


Let's list these SpongeBob memes:

In [26]:
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromImage`]->(:Q83279),\
            (h)-[:`rdf:type`]->(:`kym:Meme`)' \
    --return 'distinct h' 

node1
kym:spongegar-primitive-sponge-caveman-spongebob
kym:the-ugly-barnacle


**Visualize memes about SpongeBob**

Let's visualize the knowledge graph connections of the memes that depict SpongeBob:

In [27]:
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromImage`]->(:Q83279),\
            (h)-[:`rdf:type`]->(:`kym:Meme`),\
            (h)-[r]->(t)' \
    --return 'distinct h,r.label,t'\
    / visualize-graph -o $TEMP/sponge.graph.html

In [28]:
from IPython.display import IFrame

IFrame(src="projects/memes/temp.memes/sponge.graph.html", width=500, height=250)