# Scraping Data From TikTok
* This notebook contains two scripts. They can be run separately, and their results can be combined to build a larger dataset.
* First script: it gathers data from the main page of a TikTok profile: the profile name, the follower, following, and like counts, the person's website, the number of videos, the link to each video, and the views of each video.
* Second script: it gathers the metadata of each video in the profile: the video link, the author name, the number of likes, comments, and shares, and the duration and release date of the video.

**We will use Selenium**    
Selenium automates a real web browser, so it can interact with web applications that rely on JavaScript: the page's scripts run just as they would for a human visitor. 
This makes it particularly useful for web scraping and data extraction on pages whose content is rendered by JavaScript. 

In [398]:
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd 
import re 
import warnings
warnings.filterwarnings("ignore")


**How does Selenium work?**     
You need a web driver that matches the browser you plan to automate. For instance, to scrape a webpage with Google Chrome you need the Chrome driver (in this notebook, `webdriver_manager` downloads it). Once the driver is available, Selenium can open a browser window and navigate to the webpage of interest. Using its element-locating methods, such as `find_element`, you can access the specific data you want to scrape, and then save it in a suitable format such as a CSV or JSON file.    
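
The last step of that workflow, saving the scraped values, can be sketched without a browser. The snippet below is a minimal illustration using only the standard library. The `records` list and its field names are made-up placeholders standing in for scraped values, and `to_csv_text` and `to_json_text` are hypothetical helper names; the notebook itself uses pandas for this step.

```python
import csv
import io
import json

# Hypothetical records standing in for values scraped from a page.
records = [
    {"name": "example_user", "followers": "1.2M"},
    {"name": "another_user", "followers": "980"},
]

def to_csv_text(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_json_text(rows):
    """Serialize the same rows to a JSON array."""
    return json.dumps(rows, indent=2)

csv_text = to_csv_text(records)
json_text = to_json_text(records)
print(csv_text)
print(json_text)
```

To write to disk instead, the same text could be passed to a file opened with `open(path, "w", newline="")`.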

The `Options` object in Selenium lets you customize how the browser is launched and how it behaves, including settings that are awkward to change by hand in a normal browser session.        
* Window size: you can specify the size of the browser window when it opens. This is particularly useful if you are automating tests on different screen resolutions.     
* User-agent string: you can change the user-agent string the browser sends, which can be useful for scraping websites that block requests from bots.       
* Headless mode: you can run the browser without a GUI. This is useful for running automated tests on servers or for scraping websites without displaying the browser window.
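
The three settings above correspond to Chrome command-line switches. As a small sketch, the hypothetical helper `build_chrome_args` below (it is not part of Selenium) only assembles the switch strings; each string would then be passed to `options.add_argument`. The switches `--headless`, `--window-size=W,H`, and `--user-agent=...` are Chrome switches, while the default window size and the sample user-agent value are illustrative assumptions.

```python
def build_chrome_args(headless=True, window_size=(1920, 1080), user_agent=None):
    """Return Chrome command-line switches for the given settings.

    Hypothetical helper for illustration; it only builds strings.
    """
    args = []
    if headless:
        args.append("--headless")
    if window_size is not None:
        width, height = window_size
        args.append(f"--window-size={width},{height}")
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args

chrome_args = build_chrome_args(user_agent="my-scraper/0.1")
print(chrome_args)

# With Selenium installed, each switch would be applied like this:
# options = Options()
# for arg in chrome_args:
#     options.add_argument(arg)
```

An explicit window size is often preferred for headless runs, since there is no visible window to maximize.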






In [None]:
# customize the Chrome session
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
options.add_argument('--disable-notifications')
options.add_argument('--disable-infobars')
options.add_argument("--start-maximized")

# initialize the driver (Selenium 4 style: driver path via Service, options via `options=`)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

Names=[]
Followers=[]
Followings=[]
Likes=[]
Url=[]
Links=[]
video_num=[]
video_views=[]
video_href=[]
tiktoc='www.tiktok.com'

for username in ['taylorswift']:
    link = f"https://{tiktoc}/@{username}"
    print(link)
    driver.get(link)
    time.sleep(2)  # crude pause so the page can render before we query it

    Links.append(link)
    Names.append(driver.find_element(by=By.XPATH, value='//*[@id="main-content-others_homepage"]/div/div[1]/div[1]/div[2]/h1').text) 
    Followers.append(driver.find_element(by=By.XPATH, value='//*[@id="main-content-others_homepage"]/div/div[1]/h3/div[2]/strong').text)
    Followings.append(driver.find_element(by=By.XPATH, value='//*[@id="main-content-others_homepage"]/div/div[1]/h3/div[1]/strong').text)
    Likes.append(driver.find_element(by=By.XPATH, value='//*[@id="main-content-others_homepage"]/div/div[1]/h3/div[3]/strong').text)
    Url.append(driver.find_element(by=By.XPATH, value='//*[@id="main-content-others_homepage"]/div/div[1]/div[2]/a/span').text)
    
    all_posts=driver.find_elements(by=By.XPATH,value="//div[contains(@class,'tiktok-x6y88p-DivItemContainerV2 e19c29qe7')]")
    video_num.append(len(all_posts))
    for ln in all_posts: 
        video_views.append(ln.find_element(by=By.TAG_NAME, value='strong').text)
        video_href.append(ln.find_element(by=By.TAG_NAME, value='a').get_attribute('href'))    
            
       
profile={'Links':Links,
         'Names':Names,
         'Followers':Followers,
         'Followings':Followings,
         'Likes':Likes,
         'Url':Url,
         'video_num':video_num,
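         # nested lists: one cell per profile, expanded later with explode
         # (as written, this works for a single profile)
         'video_views':[video_views],
         'video_href':[video_href],
}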
    
driver.quit() # don't forget to quit the driver 
#print(profile)

In [None]:
# load the scraped profile data into a DataFrame
df2=pd.DataFrame.from_dict(profile)
df2.sample()

In [373]:

df1=df2.explode(column=['video_href','video_views'],ignore_index=True)
df1.head(2)

| | Links | Names | Followers | Followings | Likes | Url | video_num | video_views | video_href |
|---|---|---|---|---|---|---|---|---|---|
| 0 | https://www.tiktok.com/@taylorswift | Taylor Swift | 16.9M | 0 | 158.2M | taylor.lnk.to/taylorswiftmidnights | 30 | 4.8M | https://www.tiktok.com/@taylorswift/video/7192... |
| 1 | https://www.tiktok.com/@taylorswift | Taylor Swift | 16.9M | 0 | 158.2M | taylor.lnk.to/taylorswiftmidnights | 30 | 125.3M | https://www.tiktok.com/@taylorswift/video/7164... |


In [399]:
# keep only the text columns (this drops the integer column video_num) and strip whitespace
df=df1.select_dtypes('object').apply(lambda x: x.str.strip())
df.to_csv('tiktok_file1.csv')
df.sample(3)

| | Links | Names | Followers | Followings | Likes | Url | video_views | video_href |
|---|---|---|---|---|---|---|---|---|
| 1 | https://www.tiktok.com/@taylorswift | Taylor Swift | 16.9M | 0 | 158.2M | taylor.lnk.to/taylorswiftmidnights | 125.3M | https://www.tiktok.com/@taylorswift/video/7164... |
| 16 | https://www.tiktok.com/@taylorswift | Taylor Swift | 16.9M | 0 | 158.2M | taylor.lnk.to/taylorswiftmidnights | 6.1M | https://www.tiktok.com/@taylorswift/video/7147... |
| 11 | https://www.tiktok.com/@taylorswift | Taylor Swift | 16.9M | 0 | 158.2M | taylor.lnk.to/taylorswiftmidnights | 7M | https://www.tiktok.com/@taylorswift/video/7151... |
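
The counts in this output are still strings such as `16.9M` or `7M`, so they cannot be sorted or summed as numbers. A possible follow-up cleaning step is sketched below. `parse_count` is a hypothetical helper, not part of the notebook, and the `B` suffix is an assumption, since only `K` and `M` appear in the scraped data.

```python
from decimal import Decimal

# Multipliers for the suffixes seen in the scraped counts ("B" is an assumption).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_count(text):
    """Convert an abbreviated count such as '16.9M' or '6171' to an integer."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        # Decimal avoids binary floating-point rounding (e.g. 16.9 * 1e6).
        return int(Decimal(text[:-1]) * SUFFIXES[text[-1]])
    return int(Decimal(text))

for raw in ["16.9M", "158.2M", "7M", "680.5K", "6171", "0"]:
    print(raw, "->", parse_count(raw))
```

With pandas, the helper could be applied to a column with `df['Followers'].map(parse_count)`.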


### Metadata of TikTok videos
The metadata of a TikTok video includes various pieces of information about the video, such as:

* Video ID: a unique identifier for the video.
* Author username and ID: the username and unique identifier of the user who posted the video.
* Hashtags: any hashtags used in the video's caption or comments.
* Music ID: the unique identifier of the music track used in the video.
* Video description: a description of the video, written by the author.
* Creation time: the date and time the video was created.
* Duration: the length of the video, measured in seconds.
* View count: the number of times the video has been viewed.
* Likes, comments, and shares: the number of times the video has been liked, commented on, and shared.
* Device information: information about the device used to create the video, such as the type of device, operating system, and camera settings.
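
Two of these fields, the author username and the video ID, can be read directly from a video link without loading the page. The sketch below assumes links shaped like `https://www.tiktok.com/@<username>/video/<digits>`, as in the links collected earlier; `parse_video_url` is a hypothetical helper. Note that it recovers the username only, not the author's numeric ID.

```python
import re

# Pattern for links shaped like https://www.tiktok.com/@<username>/video/<digits>
VIDEO_URL = re.compile(r"tiktok\.com/@(?P<username>[^/]+)/video/(?P<video_id>\d+)")

def parse_video_url(url):
    """Return (username, video_id) from a video link, or None if it does not match."""
    match = VIDEO_URL.search(url)
    if match is None:
        return None
    return match.group("username"), match.group("video_id")

print(parse_video_url("https://www.tiktok.com/@taylorswift/video/7033499714498661638/"))
# ('taylorswift', '7033499714498661638')
```

The ID is kept as a string so that long identifiers are never altered by numeric conversion.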

* The following code extracts some of these metadata fields.

In [None]:
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
all_link,name,date,video_duration,likes,comments,shares=[],[],[],[],[],[],[]
for link in df['video_href'].values.tolist():
    # e.g. 'https://www.tiktok.com/@taylorswift/video/7033499714498661638/'
    print(link)
    driver.get(link)
    # wait up to 30 seconds until the <strong> counters are present in the page
    WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.TAG_NAME, 'strong')))
    numbers=[strong.get_attribute('innerHTML') for strong in driver.find_elements(by=By.TAG_NAME,value='strong')]
    all_link.append(link)
    # the positions of the counters depend on the page layout at the time of scraping
    likes.append(numbers[1])
    comments.append(numbers[2])
    shares.append(numbers[3])
    duration=driver.find_element(by=By.XPATH,value="//div[contains(@class,'tiktok-15xowx1-DivSeekBarTimeContainer e123m2eu1')]").get_attribute('innerHTML')
    video_duration.append(duration)
    all_prof=[prof.get_attribute('innerHTML') for  prof in driver.find_elements(by=By.XPATH,value="//span[contains(@class,'tiktok-lh6ok5-SpanOtherInfos e17fzhrb2')]//span")]
    date.append(all_prof[2])
    name.append(all_prof[0])
 
    


In [None]:
driver.quit()  # release the browser now that scraping is done
video_data={'video_link':all_link,
            'name':name,
            'date':date,
            'video_duration':video_duration,
            'likes':likes,
            'comments':comments,
            'share':shares,
}

video_data

In [400]:
df1=pd.DataFrame(video_data)
df=df1.select_dtypes('object').apply(lambda x: x.str.strip())
df.to_csv('tiktok_file2.csv')
df.sample(3)
        

| | video_link | name | date | video_duration | likes | comments | share |
|---|---|---|---|---|---|---|---|
| 6 | https://www.tiktok.com/@taylorswift/video/7151... | Taylor Swift | 2022-10-7 | 00:03/00:48 | 1.7M | 89K | 83K |
| 7 | https://www.tiktok.com/@taylorswift/video/7151... | Taylor Swift | 2022-10-7 | 00:02/00:48 | 680.5K | 17.6K | 6171 |
| 13 | https://www.tiktok.com/@taylorswift/video/7150... | Taylor Swift | 2022-10-2 | 00:01/00:37 | 602.4K | 16.2K | 9273 |
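
In this output, `video_duration` holds the player's `current/total` time (for example `00:03/00:48`) and `date` is not zero-padded (`2022-10-7`). A possible post-processing step is sketched below. Both helpers are hypothetical, and they assume every row follows the formats shown in the table above; other formats would need extra handling.

```python
from datetime import date

def parse_duration_seconds(text):
    """Return the total length in seconds from a 'current/total' string like '00:03/00:48'."""
    total = text.strip().split("/")[-1]   # keep the part after the slash
    seconds = 0
    for part in total.split(":"):         # handles MM:SS and HH:MM:SS
        seconds = seconds * 60 + int(part)
    return seconds

def parse_date(text):
    """Parse a 'YYYY-M-D' string such as '2022-10-7' into a date object."""
    year, month, day = (int(piece) for piece in text.strip().split("-"))
    return date(year, month, day)

print(parse_duration_seconds("00:03/00:48"))  # 48
print(parse_date("2022-10-7"))                # 2022-10-07
```

Applied to the DataFrame, these would turn `video_duration` into a number of seconds and `date` into a sortable date.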
