# This demo illustrates the functionality of the TikTok Research API Python Wrapper

## Getting Started

The examples below show how to query the TikTok Research API with the Wrapper, illustrate its functionality, and demonstrate some of the types of analysis the API makes possible.

Information on the TikTok Research API can be found here: https://developers.tiktok.com/doc/research-api-codebook

### Let's install the package

In [None]:
pip install TikTokResearchApi

### Loading the Wrapper and other packages you might potentially need

In [None]:
from tiktok_research_api import *

import pandas as pd 

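import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
from nltk.corpus import stopwords

# Download the tokenizer models and stopword list used later in this demo
nltk.download('punkt')  # newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('stopwords')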

from datetime import datetime
from datetime import timedelta

from wordcloud import WordCloud
import matplotlib.pyplot as plt

First things first: you need to initialize the API client. For that you will need your client_key and client_secret, which appear in your TikTok for Developers portal (https://developers.tiktok.com/research) after your application has been approved.

In [None]:
# Initialize the API client
qps = 5 #rate limiter
client_key = 'your_client_key'
client_secret = 'your_client_secret'
research_api = TikTokResearchAPI(client_key, client_secret, qps)

## Querying 

Currently, the Research API Wrapper supports the following endpoints:

- Videos endpoint

- Users endpoint

- Comments endpoint

- User Pinned Videos endpoint

- User Liked Videos endpoint

- User Reposted Videos endpoint

- User Following endpoint

- User Followers endpoint



### Pagination

If you want to retrieve more than 100 videos (the max_count of videos per response), you need search_id and cursor.

search_id - a unique identifier assigned to a cached search result. It lets you resume a prior search and retrieve additional results based on the same search criteria.

cursor - the index assigned to a video in the results, which lets you continue your query where you left off.


The wrapper facilitates pagination through the following arguments:

'fetch_all_pages' (default is False): when set to True, the wrapper handles pagination automatically. When False, it returns only the initial API response.

'max_total' (default is 100000): caps the total number of results returned, for cases where you want more than max_count but not every available result.

For the Videos endpoint, the Wrapper lets you query a period longer than 30 days. To continue such a search, pass the start_date and end_date returned in the response.
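
To make the idea of a long date range concrete, here is a minimal sketch of how a range could be cut into consecutive windows. It assumes that a single request accepts an end_date at most 30 days after its start_date; `split_date_range` is a hypothetical helper written for this demo, not a function of the Wrapper, and the Wrapper's own internal logic may differ.

```python
from datetime import datetime, timedelta

def split_date_range(start_date, end_date, max_days=30):
    """Split [start_date, end_date] (YYYYMMDD strings) into consecutive windows
    whose end date is at most max_days after their start date."""
    fmt = "%Y%m%d"
    start = datetime.strptime(start_date, fmt)
    end = datetime.strptime(end_date, fmt)
    windows = []
    while start <= end:
        # Close the window after max_days, or at the overall end date if earlier
        window_end = min(start + timedelta(days=max_days), end)
        windows.append((start.strftime(fmt), window_end.strftime(fmt)))
        # The next window starts the day after this one ends
        start = window_end + timedelta(days=1)
    return windows

date_windows = split_date_range("20240101", "20240315")
for window in date_windows:
    print(window)
```

Each tuple could then serve as the start_date and end_date of one request.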

Automatic pagination through the "fetch_all_pages" argument is available for the wrapper functions listed below.

| Python Wrapper Function | API Endpoint |
| ----------- | ----------- |
| query_videos      | /v2/research/video/query/|
| query_user_followers   | /v2/research/user/followers/|
| query_user_following   | /v2/research/user/following/|
| query_user_liked_videos   | /v2/research/user/liked_videos/|
| query_video_comments   | /v2/research/video/comment/list/|
| query_user_reposted_videos   | /v2/research/user/reposted_videos/|
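
The interplay of cursor, has_more, search_id, max_count and max_total can be sketched with a stand-in for the API. Everything in this block is hypothetical: `fetch_page` fakes a single response from an in-memory list, and `fetch_all` only imitates what a loop behind fetch_all_pages=True might look like. It is not the Wrapper's actual implementation.

```python
# Hypothetical in-memory "search result" standing in for the API
FAKE_RESULTS = [{"id": str(i)} for i in range(250)]

def fetch_page(cursor, max_count, search_id=None):
    """Fake a single API response: one page of results plus paging fields."""
    page = FAKE_RESULTS[cursor:cursor + max_count]
    next_cursor = cursor + len(page)
    return {
        "videos": page,
        "cursor": next_cursor,
        "has_more": next_cursor < len(FAKE_RESULTS),
        "search_id": search_id or "demo-search-id",
    }

def fetch_all(max_count=100, max_total=100000):
    """Keep requesting pages until nothing is left or max_total is reached."""
    results = []
    cursor, search_id, has_more = 0, None, True
    while has_more and len(results) < max_total:
        # Never request more than what is still needed to reach max_total
        page_size = min(max_count, max_total - len(results))
        response = fetch_page(cursor, page_size, search_id)
        results.extend(response["videos"])
        cursor = response["cursor"]
        search_id = response["search_id"]
        has_more = response["has_more"]
    return results, search_id, cursor, has_more

videos_demo, search_id_demo, cursor_demo, has_more_demo = fetch_all(max_count=100, max_total=150)
print(len(videos_demo), cursor_demo, has_more_demo)
```

Because has_more is still True when max_total is reached, the returned search_id and cursor are what a follow-up query would need in order to continue.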


### To query Videos endpoint

When querying the Videos endpoint, you can search on one or more of the following fields:

"create_date": when the video was created

"username": the handle/username of the creator

"region_code": the region where the video was uploaded

"video_id": the unique ID of the video

"hashtag_name": indexed hashtag

"keyword": a string in the video description (can be a hashtag or something else)

"music_id": the unique ID of the audio

"effect_id": the unique ID of the effects used

"video_length": the length of the video in seconds

OPERATIONS:

"EQ" = equal to

"IN" = in a list

"GT" = greater than

"GTE" = greater than or equal to

"LT" = less than

"LTE" = less than or equal to

All search values should be input as a list of strings. E.g., region_code: ['US','CA']

All possible regions: 'FR', 'TH', 'MM', 'BD', 'IT', 'NP', 'IQ', 'BR', 'US', 'KW', 'VN', 'AR', 'KZ', 'GB', 'UA', 'TR', 'ID', 'PK', 'NG', 'KH', 'PH', 'EG', 'QA', 'MY', 'ES', 'JO', 'MA', 'SA', 'TW', 'AF', 'EC', 'MX', 'BW', 'JP', 'LT', 'TN', 'RO', 'LY', 'IL', 'DZ', 'CG', 'GH', 'DE', 'BJ', 'SN', 'SK', 'BY', 'NL', 'LA', 'BE', 'DO', 'TZ', 'LK', 'NI', 'LB', 'IE', 'RS', 'HU', 'PT', 'GP', 'CM', 'HN', 'FI', 'GA', 'BN', 'SG', 'BO', 'GM', 'BG', 'SD', 'TT', 'OM', 'FO', 'MZ', 'ML', 'UG', 'RE', 'PY', 'GT', 'CI', 'SR', 'AO', 'AZ', 'LR', 'CD', 'HR', 'SV', 'MV', 'GY', 'BH', 'TG', 'SL', 'MK', 'KE', 'MT', 'MG', 'MR', 'PA', 'IS', 'LU', 'HT', 'TM', 'ZM', 'CR', 'NO', 'AL', 'ET', 'GW', 'AU', 'KR', 'UY', 'JM', 'DK', 'AE', 'MD', 'SE', 'MU', 'SO', 'CO', 'AT', 'GR', 'UZ', 'CL', 'GE', 'PL', 'CA', 'CZ', 'ZA', 'AI', 'VE', 'KG', 'PE', 'CH', 'LV', 'PR', 'NZ', 'TL', 'BT', 'MN', 'FJ', 'SZ', 'VU', 'BF', 'TJ', 'BA', 'AM', 'TD', 'SI', 'CY', 'MW', 'EE', 'XK', 'ME', 'KY', 'YE', 'LS', 'ZW', 'MC', 'GN', 'BS', 'PF', 'NA', 'VI', 'BB', 'BZ', 'CW', 'PS', 'FM', 'PG', 'BI', 'AD', 'TV', 'GL', 'KM', 'AW', 'TC', 'CV', 'MO', 'VC', 'NE', 'WS', 'MP', 'DJ', 'RW', 'AG', 'GI', 'GQ', 'AS', 'AX', 'TO', 'KN', 'LC', 'NC', 'LI', 'SS', 'IR', 'SY', 'IM', 'SC', 'VG', 'SB', 'DM', 'KI', 'UM', 'SX', 'GD', 'MH', 'BQ', 'YT', 'ST', 'CF', 'BM', 'SM', 'PW', 'GU', 'HK', 'IN', 'CK', 'AQ', 'WF', 'JE', 'MQ', 'CN', 'GF', 'MS', 'GG', 'TK', 'FK', 'PM', 'NU', 'MF', 'ER', 'NF', 'VA', 'IO', 'SH', 'BL', 'CU', 'NR', 'TP', 'BV', 'EH', 'PN', 'TF', 'RU'

API Documentation: https://developers.tiktok.com/doc/research-api-specs-query-videos/
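
As a rough illustration of how fields, operations and values fit together, the sketch below builds conditions as plain dictionaries and rejects unknown operations. `make_condition` is a hypothetical helper, not part of the Wrapper, and the "and"/"not" layout is an assumption based on the request body described in the specification linked above; check the exact key names against that page. The hashtag and keyword values are made up.

```python
import json

VALID_OPERATIONS = {"EQ", "IN", "GT", "GTE", "LT", "LTE"}

def make_condition(operation, field_name, field_values):
    """Hypothetical helper: build one search condition as a plain dict."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"Unsupported operation: {operation}")
    # Search values are passed as a list of strings
    return {
        "operation": operation,
        "field_name": field_name,
        "field_values": [str(value) for value in field_values],
    }

conditions = {
    # All conditions under "and" must hold
    "and": [
        make_condition("IN", "region_code", ["US", "CA"]),
        make_condition("EQ", "hashtag_name", ["cooking"]),
    ],
    # No condition under "not" may hold
    "not": [make_condition("EQ", "keyword", ["giveaway"])],
}
print(json.dumps(conditions, indent=2))
```

In the Wrapper, the same structure is expressed with Criteria and Query objects, as in the next cell.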


In [None]:
# Define query object
query_criteria_1 = Criteria(
    operation="EQ", field_name="hashtag_name", field_values=["hashtag"]
)
query_criteria_2 = Criteria(
    operation="IN", field_name="region_code", field_values=["region"]
)

# You can define criteria using:
# 'and_criteria': all the conditions in the list must be met
# 'or_criteria': at least one of the conditions in the list must be met
# 'not_criteria': none of the conditions in the list may be met
query = Query(and_criteria=[query_criteria_1, query_criteria_2])

In [None]:
video_fields = "id,create_time,username,region_code,video_description,video_duration,hashtag_names,view_count,like_count,comment_count,share_count"
video_columns = video_fields.split(',')
video_request = QueryVideoRequest(
    fields=video_fields,    
    query = query,
    start_date="20240101",
    end_date="20240201",
    max_count=100,
    max_total = 1000
)

videos, search_id, cursor, has_more, start_date, end_date = research_api.query_videos(video_request, fetch_all_pages=True)


If you want to continue your search, pass the search_id, cursor, start_date, and end_date from the previous response in your next query. Make sure to use the search_id returned at the end of the previous query.

In [None]:
video_request = QueryVideoRequest(
    fields=video_fields,    
    query = query,
    start_date=start_date,
    end_date=end_date,
    max_count=100,
    max_total = 1000,
    search_id = search_id,
    cursor = cursor

)

videos, search_id, cursor, has_more, start_date, end_date = research_api.query_videos(video_request, fetch_all_pages=True)
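
If a search has to be continued in a later session, the values needed to resume it could be written to disk. This is only a sketch: `save_search_state` and `load_search_state` are hypothetical helpers, the file name is arbitrary, and the example values are placeholders rather than real API output.

```python
import json
import os
import tempfile

def save_search_state(path, search_id, cursor, start_date, end_date):
    """Write the values needed to resume a search to a JSON file."""
    state = {
        "search_id": search_id,
        "cursor": cursor,
        "start_date": start_date,
        "end_date": end_date,
    }
    with open(path, "w", encoding="utf-8") as state_file:
        json.dump(state, state_file)

def load_search_state(path):
    """Read the saved state back as a dict."""
    with open(path, encoding="utf-8") as state_file:
        return json.load(state_file)

state_path = os.path.join(tempfile.gettempdir(), "tiktok_search_state.json")
save_search_state(state_path, "example-search-id", 100, "20240101", "20240201")
restored_state = load_search_state(state_path)
print(restored_state)
```

The restored values can then be passed to the request as search_id, cursor, start_date and end_date.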

In [None]:
video_df = pd.DataFrame(videos, columns = video_columns)


In [None]:
# You might want to format the variables
column_types = {'id': 'str', 'create_time': 'int', 'username': 'str', 'region_code': 'str', 'video_description': 'str', 'voice_to_text': 'str', 'hashtag_names': 'object', 'view_count': 'int', 'like_count': 'int', 'comment_count': 'int', 'share_count': 'int', 'music_id': 'str', 'effect_ids': 'str', 'playlist_id': 'str'}
int_columns = ['view_count','like_count','comment_count','share_count']

# ID columns exist only if they were requested in video_fields, so guard each one
for id_column in ['music_id', 'playlist_id']:
    if id_column in video_df.columns:
        video_df[id_column] = video_df[id_column].apply(lambda x: '{:.0f}'.format(x) if not pd.isna(x) else '')
video_df[int_columns] = video_df[int_columns].fillna(0)
video_df = video_df.fillna('')
# Cast only the columns that are actually present in the DataFrame
video_df = video_df.astype({col: dtype for col, dtype in column_types.items() if col in video_df.columns})

# convert create_time to a human-readable datetime
video_df['datetime'] = pd.to_datetime(video_df['create_time'], unit='s').dt.strftime('%Y-%m-%d %H:%M:%S')

#### From here, for example, you can:

#### Select specific rows to investigate

In [None]:
pd.set_option('display.max_colwidth', 300) # use None for no boundaries

video_df[
    #(video_df['video_description'].str.contains('word', case=False, na=False)) #if you want to filter by a specific word
    (video_df['view_count'] > 1000) #if you want to filter by number of views
    #|(video_df['effect_ids'].apply(lambda x: '12345' in x)) #if you want to filter by effect_id, etc
    ]

#### Generate the TikTok link so you can view the video


In [None]:
#To get the video link for a specific video_id, you will need a username and the video_id:
def get_video_link(df, video_id):
    username_row = df[df['id'] == video_id]
    if not username_row.empty:
        username =username_row['username'].values[0]
        print(f"https://www.tiktok.com/@{username}/video/{video_id}")
    else:
        print(f"No user found with video {video_id}")

get_video_link(video_df, 'video_id')
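
To build links for many videos at once, the same URL pattern can be applied to each record. The sketch below assumes each video is a dict with 'id' and 'username' keys (the fields requested earlier); the sample records are made up and `build_video_link` is a hypothetical helper.

```python
def build_video_link(username, video_id):
    """Return the TikTok URL for a video, given its creator and ID."""
    return f"https://www.tiktok.com/@{username}/video/{video_id}"

# Made-up records with the same keys as the fields requested above
sample_videos = [
    {"id": "111", "username": "user_a"},
    {"id": "222", "username": "user_b"},
]

# Map each video ID to its link
video_links = {
    video["id"]: build_video_link(video["username"], video["id"])
    for video in sample_videos
}
for video_id, link in video_links.items():
    print(video_id, link)
```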

#### Look at the descriptive statistics of the dataframe

In [None]:
def describe(df, stats):
    d = df.describe()
    return pd.concat([d, df.reindex(d.columns, axis = 1).agg(stats)])
pd.options.display.float_format = '{:.2f}'.format
describe(video_df[['view_count','like_count','comment_count','share_count']], ['sum', 'skew', 'kurt'])

#### Get the most frequent words in the description

In [None]:
def get_most_frequent_words(df, text_column_name, num_words=10):
    """
    Get the most frequent words from a text column in a DataFrame.
    
    Parameters:
    - df: DataFrame
        The DataFrame containing the text data.
    - text_column_name: str
        The name of the text column in the DataFrame.
    - num_words: int
        The number of top words to retrieve (default is 10).

    Returns:
    - list of tuples
        A list of (word, frequency) tuples for the most frequent words.
    """
    # Combine all the text in the specified column into a single string
    all_text = ' '.join(df[text_column_name])

    # Tokenize the text into words
    words = word_tokenize(all_text)

    # Remove stopwords (common words like 'the', 'and', 'in', etc.)
    stop_words = set(stopwords.words('english'))
    stop_words.update(['fyp','tiktok','foryou','foryoupage','viral','follow']) # add more stopwords if needed
    filtered_words = [word.lower() for word in words if word.lower() not in stop_words and word.isalpha()]

    # Calculate word frequencies
    fdist = FreqDist(filtered_words)

    # Get the most frequent words
    most_common_words = fdist.most_common(num_words)

    return most_common_words

top_words_description = get_most_frequent_words(video_df, 'video_description', num_words=20)
print(top_words_description)
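
A similar count could be run on hashtags instead of description words. This sketch assumes the hashtag_names field holds one list of strings per video; the sample lists are made up and `count_hashtags` is a hypothetical helper that uses only the standard library.

```python
from collections import Counter

def count_hashtags(hashtag_lists, num_hashtags=10):
    """Count hashtags across videos, ignoring case, and return the most common."""
    counter = Counter(
        tag.lower()
        for tags in hashtag_lists
        for tag in tags
    )
    return counter.most_common(num_hashtags)

# Made-up example: one list of hashtags per video
sample_hashtags = [
    ["fyp", "cooking", "recipe"],
    ["cooking", "dinner"],
    [],
    ["Recipe", "cooking"],
]
top_hashtags = count_hashtags(sample_hashtags, num_hashtags=2)
print(top_hashtags)
```

With a DataFrame, the input could be the hashtag_names column, for example `count_hashtags(video_df['hashtag_names'])`, provided every entry is a list.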

### Query Users endpoint

In [None]:
username = "username"
user_info_request = QueryUserInfoRequest(
    username= username
)
userinfo = research_api.query_user_info(user_info_request)


In [None]:
user_df = pd.DataFrame(userinfo,index=[0])
user_df 

### Retrieving information for multiple accounts:

In [None]:
# For example, to query the users who created videos with more than 100000 views, you can do the following:
selected_df = video_df[video_df['view_count'] > 100000] #filter the video
print(f"New video count: {len(selected_df)}")
usernames = selected_df['username'].unique() # get the list of users
print(f"Unique creator count: {len(usernames)}")

In [None]:
# Create an empty DataFrame to store the user information
accounts_df = pd.DataFrame()
# Loop through the usernames and query the user information
for username in usernames:
    user_info_request = QueryUserInfoRequest(
        username= username
    )
    userinfo = research_api.query_user_info(user_info_request)
    user_df = pd.DataFrame(userinfo,index=[0])
    accounts_df = pd.concat([accounts_df, user_df])

accounts_df
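
When looping over many usernames, one failed lookup should not necessarily stop the whole run. The sketch below shows one way to collect results and failures separately. `collect_user_info` and `fake_fetch_user_info` are hypothetical; the fake function stands in for a real lookup, and which exceptions the Wrapper raises on failure is not covered here.

```python
def collect_user_info(usernames, fetch_user_info):
    """Call fetch_user_info for each username, keeping successes and failures apart."""
    rows, failed = [], []
    for username in usernames:
        try:
            info = fetch_user_info(username)
        except Exception as error:
            # Record the failure and move on to the next username
            failed.append((username, str(error)))
            continue
        rows.append({"username": username, **info})
    return rows, failed

def fake_fetch_user_info(username):
    """Stand-in for a real lookup: fails for one made-up username."""
    if username == "missing_user":
        raise LookupError("user not found")
    return {"follower_count": len(username) * 10}

user_rows, failed_users = collect_user_info(
    ["user_a", "missing_user", "user_b"], fake_fetch_user_info
)
print(user_rows)
print(failed_users)
```

Collecting rows in a list and building a single DataFrame at the end with `pd.DataFrame(user_rows)` also avoids repeated concatenation inside the loop.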

### Query Comments 

In [None]:
video_id="video_id"
video_comment_request = QueryVideoCommentsRequest(
    video_id=video_id,        
    max_count=100,
)
comments, cursor, has_more = research_api.query_video_comments(video_comment_request, fetch_all_pages=False)

In [None]:
video_comments = pd.DataFrame(comments)
video_comments

#### For instance, you might want to visualize the content of the comments through a wordcloud:

In [None]:
def create_wordcloud_from_column(df, column_name):
    """
    Create a word cloud from a column of text in a DataFrame.

    Parameters:
    - df: DataFrame
        The DataFrame containing the data.
    - column_name: str
        The name of the column with text data.

    Returns:
    - None
        Displays the word cloud plot.
    """
    # Combine all the text in the specified column into a single string
    all_text = ' '.join(df[column_name])

    # Create a WordCloud object
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(all_text)

    # Display the word cloud using matplotlib
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    #plt.title(f'Word Cloud from {column_name}')
    plt.show()

create_wordcloud_from_column(video_comments, 'text')
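
Comment text often contains links and @mentions that can dominate a word cloud. A light cleaning step before plotting might look like the sketch below; `clean_comment` is a hypothetical helper, the sample comments are made up, and the patterns are deliberately simple.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")

def clean_comment(text):
    """Remove URLs and @mentions, then collapse repeated whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    return " ".join(text.split())

sample_comments = [
    "@friend look at this https://example.com/x so funny",
    "  great   video  ",
]
cleaned_comments = [clean_comment(comment) for comment in sample_comments]
print(cleaned_comments)
```

The cleaned strings could be stored in a new DataFrame column and passed to the word cloud function in place of the raw text.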

### Query User Pinned Videos

In [None]:
username = "username"
user_pinned_videos_request = QueryUserPinnedVideosRequest(
    username= username
)
pinned_videos = research_api.query_user_pinned_videos(user_pinned_videos_request)

In [None]:
pd.DataFrame.from_dict(pinned_videos)

### Query User Liked Videos

In [None]:
username = "username"
user_liked_videos = QueryUserLikedVideosRequest(
    username= username
)
liked_videos, cursor, has_more = research_api.query_user_liked_videos(user_liked_videos)

In [None]:
liked_videos_df = pd.DataFrame(liked_videos)
liked_videos_df

### Query Reposted Videos

In [None]:
username="username"
reposted_videos_request = QueryUserRepostedVideosRequest(
    username=username                                                
)
reposted_videos, cursor, has_more  = research_api.query_user_reposted_videos(reposted_videos_request)

In [None]:
reposted_videos = pd.DataFrame(reposted_videos)
reposted_videos

### Query User Following

In [None]:
username = "username"
user_following_request = QueryUserFollowingRequest(
    username= username
)
user_following, cursor, has_more = research_api.query_user_following(user_following_request, fetch_all_pages=False)

In [None]:
following = pd.DataFrame(user_following)
following

### Query User Followers

In [None]:
username = "username"
user_followers_request = QueryUserFollowersRequest(
    username= username
)
user_followers = research_api.query_user_followers(user_followers_request, fetch_all_pages=False)

In [None]:
followers = pd.DataFrame(user_followers)
followers