In [1]:
!kaggle datasets download rounakbanik/ted-talks

ted-talks.zip: Skipping, found more recently modified local copy (use --force to force download)


In [2]:
!unzip -o ted-talks.zip

Archive:  ted-talks.zip
  inflating: ted_main.csv            
  inflating: transcripts.csv         


In [3]:
import pandas as pd

In [4]:
ted_main = pd.read_csv('ted_main.csv')
transcripts = pd.read_csv('transcripts.csv')
transcripts.head()

,transcript,url
0,Good morning. How are you?(Laughter)It's been ...,https://www.ted.com/talks/ken_robinson_says_sc...
1,"Thank you so much, Chris. And it's truly a gre...",https://www.ted.com/talks/al_gore_on_averting_...
2,"(Music: ""The Sound of Silence,"" Simon & Garfun...",https://www.ted.com/talks/david_pogue_says_sim...
3,If you're here today — and I'm very happy that...,https://www.ted.com/talks/majora_carter_s_tale...
4,"About 10 years ago, I took on the task to teac...",https://www.ted.com/talks/hans_rosling_shows_t...


In [5]:
ted = pd.merge(ted_main, transcripts, on='url')
ted['url'] = ted['url'].map(lambda url: url.strip())
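
The strip() call above suggests the url values may carry stray whitespace. Because pd.merge defaults to an inner join, any key that fails to match is dropped without warning. A minimal sketch on made-up frames (the names main, texts and the example.com URLs are hypothetical, not taken from the dataset) that normalizes the key first and then checks for unmatched rows:

```python
import pandas as pd

# Toy stand-ins for ted_main and transcripts; all values are invented.
main = pd.DataFrame({
    "title": ["A", "B"],
    "url": ["https://example.com/a\n", "https://example.com/b"],
})
texts = pd.DataFrame({
    "transcript": ["text a", "text b"],
    "url": ["https://example.com/a", "https://example.com/b\n"],
})

# Normalize the join key on both sides before merging.
for frame in (main, texts):
    frame["url"] = frame["url"].str.strip()

# how="left" keeps every row of `main`; indicator=True adds a `_merge`
# column saying whether each row found a partner.
merged = pd.merge(main, texts, on="url", how="left", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
print(len(merged), "rows,", len(unmatched), "unmatched")
```

Whether the real files need this depends on their contents; the check is cheap either way.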

In [6]:
ted.head()

,comments,description,duration,event,film_date,languages,main_speaker,name,num_speaker,published_date,ratings,related_talks,speaker_occupation,tags,title,url,views,transcript
0,4553,Sir Ken Robinson makes an entertaining and pro...,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,"[{'id': 7, 'name': 'Funny', 'count': 19645}, {...","[{'id': 865, 'hero': 'https://pe.tedcdn.com/im...",Author/educator,"['children', 'creativity', 'culture', 'dance',...",Do schools kill creativity?,https://www.ted.com/talks/ken_robinson_says_sc...,47227110,Good morning. How are you?(Laughter)It's been ...
1,265,With the same humor and humanity he exuded in ...,977,TED2006,1140825600,43,Al Gore,Al Gore: Averting the climate crisis,1,1151367060,"[{'id': 7, 'name': 'Funny', 'count': 544}, {'i...","[{'id': 243, 'hero': 'https://pe.tedcdn.com/im...",Climate advocate,"['alternative energy', 'cars', 'climate change...",Averting the climate crisis,https://www.ted.com/talks/al_gore_on_averting_...,3200520,"Thank you so much, Chris. And it's truly a gre..."
2,124,New York Times columnist David Pogue takes aim...,1286,TED2006,1140739200,26,David Pogue,David Pogue: Simplicity sells,1,1151367060,"[{'id': 7, 'name': 'Funny', 'count': 964}, {'i...","[{'id': 1725, 'hero': 'https://pe.tedcdn.com/i...",Technology columnist,"['computers', 'entertainment', 'interface desi...",Simplicity sells,https://www.ted.com/talks/david_pogue_says_sim...,1636292,"(Music: ""The Sound of Silence,"" Simon & Garfun..."
3,200,"In an emotionally charged talk, MacArthur-winn...",1116,TED2006,1140912000,35,Majora Carter,Majora Carter: Greening the ghetto,1,1151367060,"[{'id': 3, 'name': 'Courageous', 'count': 760}...","[{'id': 1041, 'hero': 'https://pe.tedcdn.com/i...",Activist for environmental justice,"['MacArthur grant', 'activism', 'business', 'c...",Greening the ghetto,https://www.ted.com/talks/majora_carter_s_tale...,1697550,If you're here today — and I'm very happy that...
4,593,You've never seen data presented like this. Wi...,1190,TED2006,1140566400,48,Hans Rosling,Hans Rosling: The best stats you've ever seen,1,1151440680,"[{'id': 9, 'name': 'Ingenious', 'count': 3202}...","[{'id': 2056, 'hero': 'https://pe.tedcdn.com/i...",Global health expert; data visionary,"['Africa', 'Asia', 'Google', 'demo', 'economic...",The best stats you've ever seen,https://www.ted.com/talks/hans_rosling_shows_t...,12005869,"About 10 years ago, I took on the task to teac..."


In [7]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [8]:
vectorizer = TfidfVectorizer(stop_words='english')
vectors = vectorizer.fit_transform(ted['transcript'])
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
feature_names = vectorizer.get_feature_names_out()
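
To make the scores used in the following cells easier to read, here is a small pure-Python sketch of the weighting that TfidfVectorizer applies with its default settings: raw term counts, smoothed IDF, then L2 normalization per document. The function tfidf and the toy documents are illustrative assumptions; tokenization and stop-word removal are left out.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (scikit-learn default): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in doc_freq.items()}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        raw = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so every document vector has unit length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        rows.append({t: w / norm for t, w in raw.items()} if norm else {})
    return rows

# Invented toy corpus, already tokenized.
toy_docs = [
    ["intelligence", "entropica", "entropica", "future"],
    ["intelligence", "future", "future"],
    ["intelligence", "machine"],
]
weights = tfidf(toy_docs)
best = max(weights[0], key=weights[0].get)
print(best)
```

A term that is frequent in one document but rare across the corpus gets the largest weight, which is why a talk-specific word can outrank a more common one.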

In [9]:
# One row per talk, one column per vocabulary term
df = pd.DataFrame(vectors.toarray(), columns=feature_names)
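
Densifying the TF-IDF matrix stores a value for every vocabulary term in every talk, almost all of them zero. If memory becomes a problem, the top terms of one row can be read straight from the sparse matrix. A sketch under the assumption that the matrix is a SciPy CSR matrix (which is what fit_transform returns); top_k_from_sparse and the toy data are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix

def top_k_from_sparse(matrix, row_id, k, feature_names):
    """Return [(term, score), ...] for the k largest nonzero entries of one row."""
    row = matrix.getrow(row_id)               # 1 x n_features CSR row
    order = np.argsort(row.data)[::-1][:k]    # positions of the largest stored values
    return [
        (feature_names[row.indices[i]], float(row.data[i]))
        for i in order
        if row.data[i] > 0
    ]

# Invented 2 x 4 matrix standing in for `vectors`.
toy = csr_matrix(np.array([[0.0, 0.5, 0.0, 0.8],
                           [0.3, 0.0, 0.0, 0.0]]))
names = ["alpha", "beta", "gamma", "delta"]
top = top_k_from_sparse(toy, 0, 3, names)
print(top)
```

With the real objects the call would look like top_k_from_sparse(vectors, 1593, 10, feature_names).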

In [10]:
def get_top_k_keywords(talk_id, k):
    """Return up to k highest-scoring TF-IDF terms for a talk, dropping zero scores."""
    keywords = df.loc[talk_id].sort_values(ascending=False)[:k]
    return keywords[keywords > 0]

In [11]:
def get_talks_with_keyword(keyword):
    """Return nonzero TF-IDF scores for a term, highest first (None if the term is unknown)."""
    if keyword not in df:
        return None
    scores = df[keyword].sort_values(ascending=False)
    return scores[scores > 0]
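
get_talks_with_keyword returns None for a term outside the vocabulary, so a caller that reads .index on the result would raise an AttributeError. One possible variant, shown on a made-up score table (talks_with_keyword and toy_scores are hypothetical names), always returns a Series and lowercases the query, since TfidfVectorizer lowercases text by default:

```python
import pandas as pd

# Invented TF-IDF table: rows are talks, columns are terms.
toy_scores = pd.DataFrame({
    "intelligence": [0.0, 0.4, 0.1],
    "entropy": [0.2, 0.0, 0.0],
})

def talks_with_keyword(scores, keyword):
    """Return nonzero scores for a term, highest first; empty Series if unknown."""
    keyword = keyword.lower()  # match the vectorizer's default lowercasing
    if keyword not in scores.columns:
        return pd.Series(dtype=float)
    column = scores[keyword]
    return column[column > 0].sort_values(ascending=False)

hits = talks_with_keyword(toy_scores, "Intelligence")
misses = talks_with_keyword(toy_scores, "submarine")
print(list(hits.index), misses.empty)
```

An empty Series lets downstream indexing such as ted.loc[result.index] return an empty frame instead of failing.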

In [12]:
pd.set_option('display.max_colwidth', None)

In [13]:
talks = get_talks_with_keyword("intelligence")
ted.loc[talks.index[:5], ["main_speaker", "title", "url"]]

,main_speaker,title,url
1593,Alex Wissner-Gross,A new equation for intelligence,https://www.ted.com/talks/alex_wissner_gross_a_new_equation_for_intelligence
2210,Sam Harris,Can we build AI without losing control over it?,https://www.ted.com/talks/sam_harris_can_we_build_ai_without_losing_control_over_it
1884,Nick Bostrom,What happens when our computers get smarter than we are?,https://www.ted.com/talks/nick_bostrom_what_happens_when_our_computers_get_smarter_than_we_are
1538,Mikko Hypponen,How the NSA betrayed the world's trust -- time to act,https://www.ted.com/talks/mikko_hypponen_how_the_nsa_betrayed_the_world_s_trust_time_to_act
2305,Grady Booch,Don't fear superintelligent AI,https://www.ted.com/talks/grady_booch_don_t_fear_superintelligence


In [14]:
get_top_k_keywords(1593, 10)

entropica       0.485239
intelligence    0.411859
maximize        0.241058
variety         0.133458
action          0.120513
entropy         0.116891
freedom         0.115266
threads         0.112236
future          0.110370
earth           0.109982
Name: 1593, dtype: float64

In [15]:
def get_transcript(talk_id):
    return ted.loc[talk_id]["transcript"]

In [16]:
def get_link(talk_id):
    return ted.loc[talk_id, "url"].strip()

In [17]:
print(get_link(1593))

https://www.ted.com/talks/alex_wissner_gross_a_new_equation_for_intelligence


In [18]:
get_transcript(1593)

'Intelligence — what is it? If we take a look back at the history of how intelligence has been viewed, one seminal example has been Edsger Dijkstra\'s famous quote that "the question of whether a machine can think is about as interesting as the question of whether a submarine can swim." Now, Edsger Dijkstra, when he wrote this, intended it as a criticism of the early pioneers of computer science, like Alan Turing. However, if you take a look back and think about what have been the most empowering innovations that enabled us to build artificial machines that swim and artificial machines that [fly], you find that it was only through understanding the underlying physical mechanisms of swimming and flight that we were able to build these machines. And so, several years ago, I undertook a program to try to understand the fundamental physical mechanisms underlying intelligence.Let\'s take a step back. Let\'s first begin with a thought experiment. Pretend that you\'re an alien race that doesn