# Internet Memes: Knowledge connects culture and creativity

This notebook shows how KGTK supports the enrichment and analytics of an Internet Meme Knowledge Graph (IMKG).

We highlight how KGTK helps:

1. **Knowledge Engineers** to connect the dots between entities in in-house resources based on the large-scale KG, Wikidata
2. **Data Scientists** to perform scalable analytics and visualization of the resulting IMKG
3. **Knowledge Explorers** to use a customized browser to understand the contents of IMKG

## Step 0: Setup


### Notebook data setup

In [54]:
import os
import os.path

from kgtk.configure_kgtk_notebooks import ConfigureKGTK
from kgtk.functions import kgtk

In [55]:
# Parameters

# Local folders for the input Wikidata files and for the output and temporary files:
input_path = "wikidata"
output_path = "projects"
project_name = "memes"

In [56]:
# These are all the KG files that we use in this notebook:
additional_files = {
    "items": "claims.wikibase-item.tsv.gz",
}

big_files = [
    "label",
]

ck = ConfigureKGTK(big_files)
ck.configure_kgtk(input_graph_path=input_path, 
                  output_path=output_path, 
                  project_name=project_name,
                  additional_files=additional_files)

User home: /Users/filipilievski
Current dir: /Users/filipilievski/mcs/kgtk-tutorial-aaai23
KGTK dir: /Users/filipilievski/mcs
Use-cases dir: /Users/filipilievski/mcs/use-cases


In [57]:
ck.print_env_variables()

STORE: projects/memes/temp.memes/wikidata.sqlite3.db
TEMP: projects/memes/temp.memes
KGTK_OPTION_DEBUG: false
OUT: projects/memes
USE_CASES_DIR: /Users/filipilievski/mcs/use-cases
KGTK_LABEL_FILE: wikidata/labels.en.tsv.gz
GRAPH: wikidata
kgtk: kgtk
kypher: kgtk query --graph-cache projects/memes/temp.memes/wikidata.sqlite3.db
EXAMPLES_DIR: /Users/filipilievski/mcs/examples
KGTK_GRAPH_CACHE: projects/memes/temp.memes/wikidata.sqlite3.db
label: wikidata/labels.en.tsv.gz
items: wikidata/claims.wikibase-item.tsv.gz
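
The path variables printed above follow a simple layout under `output_path` and `project_name`. The sketch below reproduces that layout for `OUT`, `TEMP` and `STORE`. Note that `derive_paths` is a hypothetical helper written for illustration; it only mirrors the values printed here and is not how `ConfigureKGTK` is implemented. The `graph_name` default is likewise an assumption read off the printed `STORE` value.

```python
import posixpath


def derive_paths(output_path, project_name, graph_name="wikidata"):
    """Hypothetical helper: rebuild OUT, TEMP and STORE as printed above."""
    out = posixpath.join(output_path, project_name)
    temp = posixpath.join(out, f"temp.{project_name}")
    # Assumption: the graph cache file is named after the graph.
    store = posixpath.join(temp, f"{graph_name}.sqlite3.db")
    return {"OUT": out, "TEMP": temp, "STORE": store}


paths = derive_paths("projects", "memes")
for name, value in paths.items():
    print(f"{name}: {value}")
```

Knowing this layout helps when looking for intermediate files such as `$TEMP/imkg.kgtk.gz` in the later steps.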


## Step 1: Enrichment of knowledge with Wikidata

Let's see how much information Wikidata has about Internet Memes:

In [13]:
!kgtk query -i $items \
    --match '()-[:P31]->(:Q2927074)' \
    --limit 5

id	node1	label	node2	lang	rank	node2;wikidatatype
Q100270830-P31-Q2927074-6ea94891-0	Q100270830	P31	Q2927074		normal	wikibase-item
Q100324361-P31-Q2927074-b0d999a2-0	Q100324361	P31	Q2927074		normal	wikibase-item
Q104005472-P31-Q2927074-89740183-0	Q104005472	P31	Q2927074		normal	wikibase-item
Q104631975-P31-Q2927074-18555c03-0	Q104631975	P31	Q2927074		normal	wikibase-item
Q104713251-P31-Q2927074-247b7c6d-0	Q104713251	P31	Q2927074		normal	wikibase-item
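
Because of `--limit 5`, the output above is only a sample and does not say how many instances exist. As a minimal sketch of the counting step, the snippet below parses the five rows shown above as tab-separated KGTK edges and counts the distinct `node1` values that are an instance of (`P31`) Internet meme (`Q2927074`). It runs on this sample only; `count_instances` is a hypothetical helper, and the real total requires a pass over the whole `$items` file.

```python
import csv
import io

# The five edges printed above, reduced to the columns used here.
SAMPLE = (
    "id\tnode1\tlabel\tnode2\n"
    "Q100270830-P31-Q2927074-6ea94891-0\tQ100270830\tP31\tQ2927074\n"
    "Q100324361-P31-Q2927074-b0d999a2-0\tQ100324361\tP31\tQ2927074\n"
    "Q104005472-P31-Q2927074-89740183-0\tQ104005472\tP31\tQ2927074\n"
    "Q104631975-P31-Q2927074-18555c03-0\tQ104631975\tP31\tQ2927074\n"
    "Q104713251-P31-Q2927074-247b7c6d-0\tQ104713251\tP31\tQ2927074\n"
)


def count_instances(tsv_text, class_id="Q2927074"):
    """Count distinct node1 values with a P31 edge to class_id."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    instances = {
        row["node1"]
        for row in reader
        if row["label"] == "P31" and row["node2"] == class_id
    }
    return len(instances)


print(count_instances(SAMPLE))
```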


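The query above returns only the first five Internet Meme instances (`--limit 5`), so it does not yet tell us how many Wikidata has. Let's see which of them have links to KnowYourMeme in Wikidata (property `P6760`):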

In [None]:
!kgtk query -i $items \
    --match '(im)-[:P31]->(:Q2927074), \
             (im)-[:P6760]->(imkym)' \
    --limit 5
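
The two clauses share the variable `im`, so the match behaves like a join: an item is returned only if it has both a `P31` edge to `Q2927074` and a `P6760` edge. The sketch below shows that behaviour on a tiny in-memory edge list. The `P31` edges come from the output earlier in this step, while the `P6760` values are made-up placeholders and not real Wikidata content; `memes_with_kym_links` is a hypothetical helper.

```python
# Toy edges as (node1, label, node2).
EDGES = [
    ("Q100270830", "P31", "Q2927074"),
    ("Q100324361", "P31", "Q2927074"),
    ("Q104005472", "P31", "Q2927074"),
    ("Q100270830", "P6760", "placeholder-meme-a"),  # hypothetical value
    ("Q104005472", "P6760", "placeholder-meme-b"),  # hypothetical value
]


def memes_with_kym_links(edges):
    """Return (im, imkym) pairs for items that satisfy both clauses."""
    # First clause: (im)-[:P31]->(:Q2927074)
    memes = {n1 for n1, label, n2 in edges if label == "P31" and n2 == "Q2927074"}
    # Second clause: (im)-[:P6760]->(imkym), restricted to the same im
    return [(n1, n2) for n1, label, n2 in edges if label == "P6760" and n1 in memes]


print(memes_with_kym_links(EDGES))
```

In this toy data `Q100324361` drops out because it has no `P6760` edge.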

Let's now connect our in-house KG with Wikidata through these memes:

How many triples do we have about these memes in either of the sources?

In [None]:
!kgtk query -i $items \
    --match '(im)-[:P31]->(:Q2927074), \
             (im)-[r]->()' \
    --limit 5

Finally, we have already extracted the entities mentioned in memes using information extraction. Let's use them to get background information about these entities from Wikidata:

In [None]:
!kgtk query -i $TEMP/combined_with_wd.kgtk.gz \
    --match '(x)-->(y)' \
    --return 'x as node1, "member" as label, "set1" as node2, y as node1, "member" as label, "set1" as node2' \
    --multi 2 \
    / deduplicate / add-id / \
    query -i - --as gnodes --idx mode:valuegraph -i $items --idx mode:graph \
    --match 'item:  (x)-[r]->(y), \
             gnodes: (x)-->(), \
                     (y)-->()' \
    --return 'distinct x, r.label, y' \
    -o $TEMP/wikidata_ent.kgtk.gz
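
In words, the pipeline above first collects every node that appears in the combined graph into one set (`gnodes`), and then keeps only the Wikidata edges whose subject and object are both in that set. A minimal in-memory sketch of the same idea follows. All identifiers in the toy data are hypothetical, and `induced_edges` is an illustrative helper, not a KGTK function.

```python
def induced_edges(local_edges, wikidata_edges):
    """Keep Wikidata edges whose two endpoints both occur in the local graph."""
    # Step 1: gather every subject and object of the local graph into a set.
    nodes = set()
    for node1, node2 in local_edges:
        nodes.update((node1, node2))
    # Step 2: filter Wikidata edges on membership of both endpoints.
    return sorted(
        {(x, r, y) for x, r, y in wikidata_edges if x in nodes and y in nodes}
    )


# Hypothetical toy data.
local_edges = [("meme:a", "ent:1"), ("meme:b", "ent:2")]
wikidata_edges = [
    ("ent:1", "rel:x", "ent:2"),  # both endpoints known: kept
    ("ent:1", "rel:y", "ent:3"),  # ent:3 unknown: dropped
    ("ent:4", "rel:x", "ent:2"),  # ent:4 unknown: dropped
]

print(induced_edges(local_edges, wikidata_edges))
```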

To make the output more readable, we enhance it with entity labels from Wikidata:

## Step 2: Scalable analytics and visualization

### 2a. Compute global statistics

In [29]:
!kgtk graph-statistics \
     -i $TEMP/imkg.kgtk.gz \
     --log-file $TEMP/imkg_summary.txt \
     --output-statistics-only \
     -o $TEMP/imkg_stats.tsv



In [30]:
!cat $TEMP/imkg_summary.txt

graph loaded! It has 4850636 nodes and 16549810 edges

*** Top relations:
m4s:fromCaption	3344941
imgflipr:alt_text	1326032
imgflipr:image_url	1326032
imgflipr:template	1326032
imgflipr:templateId	1326032
imgflipr:template_title	1326032
imgflipr:upvote_count	1326032
imgflipr:view_count	1326032
imgflipr:title	1326021
imgflipr:author	1176414

*** Degrees:
in degree stats: mean=3.411885, std=0.202183, max=1
out degree stats: mean=3.411885, std=0.003003, max=1
total degree stats: mean=6.823769, std=0.202210, max=1

*** PageRank
Max pageranks
1895	Q2927074	0.006458
43756	Q978	0.006304
7339	Q336	0.003967
23	Q30	0.003257
678	Q11862829	0.002837

*** HITS
HITS hubs
1895	Q2927074	0.806012
43756	Q978	0.274611
86178	nan	0.259774
86169	1	0.245116
86229	2	0.155478
HITS auth
4716157	kym:memeing	0.002565
4752934	kym:expression-memes	0.002512
4724355	kym:nice-meme	0.002505
4715463	kym:1	0.002497
4827988	kym:x-shuts-up-the-queen-of-hearts	0.002493
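
As a quick sanity check on the summary, recall that in a directed graph every edge adds one to some node's out-degree and one to some node's in-degree. The mean in-degree and the mean out-degree therefore both equal the number of edges divided by the number of nodes, and the mean total degree is twice that. The arithmetic below uses only the node and edge counts from the first line of the log.

```python
NODES = 4_850_636
EDGES = 16_549_810

# Each edge contributes one unit of in-degree and one unit of out-degree.
mean_in_or_out = EDGES / NODES
mean_total = 2 * EDGES / NODES

print(f"mean in/out degree: {mean_in_or_out:.6f}")
print(f"mean total degree:  {mean_total:.6f}")
```

Rounded to six decimals, the two values agree with the means reported under `*** Degrees` above.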


Show the 20 most common relations:

In [31]:
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '()-[r]->()' \
    --return 'r.label, count(r.label) as c' \
    --order-by 'c desc' \
    --limit 20

label	c
m4s:fromCaption	3344941
imgflipr:view_count	1326032
imgflipr:upvote_count	1326032
imgflipr:template_title	1326032
imgflipr:templateId	1326032
imgflipr:template	1326032
imgflipr:image_url	1326032
imgflipr:alt_text	1326032
imgflipr:title	1326021
imgflipr:author	1176414
m4s:fromImage	388579
rdfs:seeAlso	196594
m4s:tag	73951
P31	58153
m4s:fromAbout	47455
P136	33657
m4s:structured_value	30440
m4s:structured_uri	30440
rdf:type	28939
P106	27605


Show the 10 most frequent entities from captions:

In [32]:
!kgtk query -i $TEMP/imkg.kgtk.gz -i $label \
    --match 'imkg: ()-[:`m4s:fromCaption`]->(n), \
            label: (n)-[]->(lbl)' \
    --return 'n, kgtk_lqstring_text_string(lbl), count(n) as c' \
    --order-by 'c desc' \
    --limit 10

node2	kgtk_lqstring_text_string(graph_2_c2."node2")	c
Q2927074	"Internet meme"	641428
Q978	"meme"	464579
Q11661	"information technology"	75104
Q44359	"bling-bling"	34331
Q2695156	"Batman"	30132
Q728183	"CAN bus"	27661
Q492038	"human brain"	26777
Q20992398	"Hotline Bling"	20873
Q1107971	"Kermit the Frog"	18217
Q1967556	"National Organization for Women"	17974
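
The label file stores labels as language-qualified strings of the form `'text'@lang`, and the query uses `kgtk_lqstring_text_string` to keep only the text part. A rough Python equivalent is sketched below, mainly to show what is being stripped. `lq_text` is a hypothetical helper and a simplification: it assumes a well-formed value and does not handle escaped quotes inside the text.

```python
import re

# 'text'@lang, where lang is a simple language tag such as "en".
_LQ_PATTERN = re.compile(r"^'(?P<text>.*)'@(?P<lang>[A-Za-z0-9-]+)$", re.DOTALL)


def lq_text(value):
    """Return the text part of a language-qualified string (simplified)."""
    match = _LQ_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a language-qualified string: {value!r}")
    return match.group("text")


print(lq_text("'Internet meme'@en"))
```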


Show the 10 most frequent entities from About:

In [33]:
!kgtk query -i $TEMP/imkg.kgtk.gz -i $label \
    --match 'imkg: ()-[:`m4s:fromAbout`]->(n), \
            label: (n)-[]->(lbl)' \
    --return 'n, kgtk_lqstring_text_string(lbl), count(n) as c' \
    --order-by 'c desc' \
    --limit 10

node2	kgtk_lqstring_text_string(graph_2_c2."node2")	c
Q6002242	"image macro"	1122
Q238330	"4chan"	902
Q2927074	"Internet meme"	833
Q2708515	"catchphrase"	650
Q866	"YouTube"	573
Q170539	"parody"	501
Q5287	"Japanese"	455
Q384060	"Tumblr"	426
Q978	"meme"	403
Q30	"United States of America"	373


### 2b. Queries for interesting use cases

**Example: What are the most memable people in Wikidata?**

In [59]:
!kgtk query -i $TEMP/imkg.kgtk.gz -i "wikidata/labels.en.tsv.gz"\
    --match 'imkg: (h)-[]->(person),\
            (h)-[:`rdf:type`]->(:`kym:Meme`),\
            (person)-[:P31]->(:Q5), \
            labels: (person)-->(pname)' \
    --return 'pname, count(h) as c' \
    --order-by 'c desc' \
    --limit 3

node2	c
'Donald Trump'@en	145
'Kyle Craven'@en	72
'Kanye West'@en	56


**Example: How many memes are based on films?**

In [43]:
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromAbout`]->(t),\
             (t)-[:P31]->(:Q11424)' \
    --return 'count (distinct h)' \
    --limit 10

count(DISTINCT graph_1_c1."node1")
413


Show me some instances of memes with their movies:

In [48]:
!kgtk query -i $TEMP/imkg.kgtk.gz -i "wikidata/labels.en.tsv.gz" \
    --match 'imkg: (h)-[:`m4s:fromAbout`]->(t),\
             (t)-[:P31]->(:Q11424), \
             labels: (t)-->(tname)' \
    --return 'h, tname' \
    --limit 10

node1	node2
kym:dark-knight-4-pane	'The Conversation'@en
kym:candy	'Candyman'@en
kym:confused-travolta	'Pulp Fiction'@en
kym:say-what-again	'Pulp Fiction'@en
kym:majiresu	'Unforgiven'@en
kym:i-believe-you-have-my-stapler	'Office Space'@en
kym:that-would-be-great	'Office Space'@en
kym:livetweeting	'Overheard'@en
kym:yngwie-malmsteen-unleashes-the-fucking-fury	'Overheard'@en
kym:alan-rickman-is-now-diamonds	'Die Hard'@en
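
Several films appear more than once in this sample. A small sketch of grouping memes by film is shown below, using six of the ten rows printed above with the language tag removed. `memes_by_film` is a hypothetical helper; on the full graph the same grouping would be done in the query itself.

```python
from collections import defaultdict

# Six of the rows printed above, as (meme, film label).
ROWS = [
    ("kym:candy", "Candyman"),
    ("kym:confused-travolta", "Pulp Fiction"),
    ("kym:say-what-again", "Pulp Fiction"),
    ("kym:i-believe-you-have-my-stapler", "Office Space"),
    ("kym:that-would-be-great", "Office Space"),
    ("kym:alan-rickman-is-now-diamonds", "Die Hard"),
]


def memes_by_film(rows):
    """Group meme identifiers under the film they are about."""
    grouped = defaultdict(list)
    for meme, film in rows:
        grouped[film].append(meme)
    return dict(grouped)


for film, memes in memes_by_film(ROWS).items():
    if len(memes) > 1:
        print(film, memes)
```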


**Example: How many memes depict the entity Q83279 ("SpongeBob SquarePants")?**

In [44]:
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromImage`]->(:Q83279),\
            (h)-[:`rdf:type`]->(:`kym:Meme`)' \
    --return 'count(distinct h)'

count(DISTINCT graph_1_c1."node1")
130


Show me some examples of SpongeBob memes:

In [52]:
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromImage`]->(:Q83279),\
            (h)-[:`rdf:type`]->(:`kym:Meme`)' \
    --return 'distinct h' \
    --limit 30

node1
kym:a-day-with-spongebob-squarepants-the-movie
kym:advanced-darkness
kym:allen
kym:are-there-any-other-squidwards-i-should-know-about
kym:are-you-feeling-it-now-mr-krabs
kym:at-night
kym:berto-bragaqvadra
kym:big-meaty-claws
kym:bitch-im-flawless
kym:bodyguard-legit
kym:bold-and-brash
kym:breath-in-boi
kym:campfire-song-song
kym:chinese-cartoons
kym:chocolate
kym:confused-mr-krabs
kym:could-you-play-that-song-again
kym:derp-sandy
kym:deuueaugh
kym:dont-say-youre-a-fan-if-you-dont-know-who-this-is
kym:doodlebob
kym:empty-page
kym:evil
kym:fake-history
kym:fooby-the-kamikaze-watermelon
kym:fuck-logic
kym:fun-song
kym:garys-song-gary-come-home
kym:get-a-job-soup
kym:giant-paper


### 2c. Visualize memes about SpongeBob

Let's visualize the knowledge graph connections between memes that depict SpongeBob:

In [35]:
!kgtk query -i $TEMP/imkg.kgtk.gz \
    --match '(h)-[:`m4s:fromImage`]->(:Q83279),\
            (h)-[:`rdf:type`]->(:`kym:Meme`),\
            (h)-[r]->(t)' \
    --return 'distinct h,r.label,t' \
    -o $TEMP/sponge.kgtk.gz

In [41]:
kgtk("""visualize-graph 
        -i $TEMP/sponge.kgtk.gz
        -o viz/iflip_kym.graph.html""")

In [42]:
!open viz/iflip_kym.graph.html
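
The `open` command used above behaves this way on macOS, which is the system this notebook was run on. On other systems, one possible alternative is Python's standard `webbrowser` module, sketched below. `open_html` is a hypothetical helper; it opens the file only if it exists and returns the file URI either way.

```python
import webbrowser
from pathlib import Path


def open_html(path):
    """Open a local HTML file in the default browser and return its file URI."""
    resolved = Path(path).resolve()
    uri = resolved.as_uri()
    # Only try to open the browser when the visualization was actually written.
    if resolved.exists():
        webbrowser.open(uri)
    return uri


uri = open_html("viz/iflip_kym.graph.html")
print(uri)
```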