## Use Case: Tweets NLP
This notebook addresses the use case of analyzing tweets related to **Inwi** and the **Moroccan telecommunications sector**.

This is the **first notebook**, which extracts tweets from Twitter (X) and stores them in a SQLite database.

The **second notebook** of this use case (**tweets-NLP.ipynb**) applies NLP to the tweets for sentiment analysis.

### Objectives:
- Ingest fresh tweets and their metadata using Selenium for web scraping.
- Extract 10 tweets discussing Inwi and 10 tweets discussing the Moroccan telco sector.
- Store the tweets and their metadata in a SQLite database.
- Analyze the sentiment of tweets and score them from 0 (negative) to 1 (positive) (second notebook).
- Investigate common topics in negative tweets (second notebook).

## Installing Dependencies
Install the Python libraries needed for web scraping, database interaction, and secrets management.

In [4]:
!pip install selenium webdriver-manager pandas python-dotenv






## Authentication Setup
Load the Twitter (X) login credentials from environment variables defined in a `.env` file.

In [7]:
from dotenv import load_dotenv
import os
# Load environment variables from the .env file
load_dotenv()

# Retrieve the value of MY_USER & MY_PASS from the environment
my_user = os.getenv("MY_USER")
my_pass = os.getenv("MY_PASS")
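
`os.getenv` returns `None` when a variable is missing, and that would only surface later, when the value is typed into the login form. A minimal sketch of an early check, assuming the same `MY_USER` and `MY_PASS` variable names; the helper `require_env` is hypothetical and not part of the original notebook:

```python
import os


def require_env(*names):
    """Return the values of the given environment variables.

    Raises RuntimeError early if any of them is unset or empty.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(os.environ[name] for name in names)
```

After `load_dotenv()`, it could be called as `my_user, my_pass = require_env("MY_USER", "MY_PASS")`.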

## Part 1: Selenium Web Scraping
1. Configure the Selenium WebDriver and open the Twitter login page.
2. Search for INWI.
3. Define functions to extract nbr_likes, nbr_retweets, and author.
4. Extract tweets and metadata.
5. Repeat the search and extraction for the search term Maroc Telecom.

#### Configure the Selenium WebDriver to access the Twitter login page

In [20]:
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By   #locate elements on a webpage
from selenium.webdriver.common.keys import Keys   #simulate keyboard keys on web browser
from time import sleep

# Service starts, stops, and manages the ChromeDriver process,
# a separate executable that lets Selenium control Chrome.
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

In [45]:
#download the correct ChromeDriver version for the scraping
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=Options())
#open Twitter login in the browser
driver.get("https://twitter.com/login")

In [47]:
# automatically enter the username and press Enter
user_id = driver.find_element(By.XPATH,"//input[@type='text']")
user_id.send_keys(my_user)
user_id.send_keys(Keys.ENTER)

In [48]:
# automatically enter the password and press Enter to log in
password = driver.find_element(By.XPATH,"//input[@type='password']")
password.send_keys(my_pass)
password.send_keys(Keys.ENTER)

#### Search for INWI

##### Function to run a search for a given search term

In [78]:
def perform_search(driver, search_term):
    """
    Clear the search box and run a search for search_term.
    """
    search_box = driver.find_element(By.XPATH, "//input[@data-testid='SearchBox_Search_Input']")
    
    search_box.send_keys(Keys.CONTROL + "a")  # Select all text
    search_box.send_keys(Keys.DELETE)         # Delete selected text
    
    # Small delay to ensure clearing is complete
    sleep(0.5)
    
    # Enter new search term
    search_box.send_keys(search_term)
    search_box.send_keys(Keys.ENTER)
    
    # Small delay after search to ensure page responds
    sleep(1)

In [82]:
#searching for INWI
perform_search(driver, "INWI")

#### Define functions to extract metadata (nbr_likes, nbr_retweets, and author) from tweets.

In [83]:
def get_like_count(container):
    """
    Extract the number of likes of a tweet.
    """
    try:
        # using aria-label
        like_button = container.find_element(By.CSS_SELECTOR, "button[data-testid='like'][aria-label*='Likes']")
        likes_text = like_button.get_attribute('aria-label')
        # extract number from aria-label text: "6 Likes. Like"
        likes_number = likes_text.split()[0]
        return likes_number
    except Exception as e:
        print(f"Error getting like count: {str(e)}")
        return '0'

In [84]:
def get_retweet_count(container):
    """
    Extract the number of retweets of a tweet.
    """
    try:
        # using aria-label
        retweet_button = container.find_element(By.CSS_SELECTOR, "button[data-testid='retweet'][aria-label]")
        retweets_text = retweet_button.get_attribute('aria-label')
        # extract number from text like: "2 reposts. Repost"
        retweets_number = retweets_text.split()[0]
        return retweets_number
    except Exception as e:
        print(f"Error getting retweet count: {str(e)}")
        return '0'

In [85]:
def get_author(container):
    """
    Extract the author of a tweet.
    """
    try:
        # Try to find the author (username) of the tweet
        author_element = container.find_element(By.CSS_SELECTOR, "div[data-testid='User-Name'] span")
        author_name = author_element.text
        return author_name
    except Exception as e:
        # Return None if there's an error (e.g., if the element is not found)
        print(f"Error extracting author: {str(e)}")
        return None
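
`get_like_count` and `get_retweet_count` return the first token of the `aria-label` as a string, and Part 2 later converts it with `int(...)`. A minimal sketch of a more tolerant conversion follows. It assumes that labels may carry thousands separators or abbreviated suffixes such as `1.2K`; that label format is an assumption, not something observed in this notebook, and the helper name `parse_count` is hypothetical.

```python
import re

# Multipliers for abbreviated counts (assumed format, e.g. "1.2K")
_SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(label):
    """Convert the leading number of an aria-label (e.g. "6 Likes. Like") to an int.

    Returns 0 when the label is empty or does not start with a number.
    """
    if not label:
        return 0
    match = re.match(r"\s*(\d+(?:[.,]\d+)*)([KkMm])?\b", label)
    if not match:
        return 0
    number, suffix = match.groups()
    try:
        if suffix:
            # Abbreviated form such as "1.2K": the dot is a decimal point
            return int(round(float(number.replace(",", "")) * _SUFFIXES[suffix.upper()]))
        # Plain form such as "1,234": dots and commas are treated as grouping characters
        return int(re.sub(r"[.,]", "", number))
    except ValueError:
        return 0
```

Returning an integer directly from the helpers would also make the later `int(...)` conversions in Part 2 unnecessary.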

#### Extract tweets and metadata

In [90]:
all_tweets = []

while len(all_tweets) < 10:
    # identify the containers that contain the tweets
    tweet_containers = driver.find_elements(By.CSS_SELECTOR, "article[data-testid='tweet']")

    for container in tweet_containers:
        try:
            tweet_data = {}

            # extract tweet text & date
            tweet_text = container.find_element(By.CSS_SELECTOR, '[data-testid="tweetText"]').text
            tweet_date = container.find_element(By.XPATH, ".//time").get_attribute('datetime')
            # extract likes, author, nbr_retweets using the functions
            nbr_likes = get_like_count(container)
            nbr_retweets = get_retweet_count(container)
            author = get_author(container)

            # skip if tweet exists
            if any(t['tweet_text'] == tweet_text for t in all_tweets):
                continue

            tweet_data['tweet_text'] = tweet_text
            tweet_data['tweet_date'] = tweet_date
            tweet_data['nbr_characters'] = len(tweet_text)
            tweet_data['nbr_likes'] = nbr_likes
            tweet_data['nbr_retweets'] = nbr_retweets
            tweet_data['author'] = author

            all_tweets.append(tweet_data)

            # stop once 10 tweets have been collected
            if len(all_tweets) >= 10:
                break

        except Exception as e:
            # e.g. a tweet without a text element: report it and move on
            print(f"Error processing tweet: {str(e)}")
            continue

    # Scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(3)
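
The `while` loop above stops only once 10 tweets have been collected, so a search that returns fewer results would keep scrolling indefinitely. A sketch of a bounded variant follows. The browser is replaced by two hypothetical callables (`fetch_batch`, `scroll`) so that the logic can be read and run in isolation. The deduplication key (text, date, author) and the name `collect_unique` are assumptions, not the notebook's own code.

```python
def collect_unique(fetch_batch, scroll, limit=10, max_rounds=20):
    """Collect up to `limit` unique tweets, giving up after `max_rounds` scrolls.

    fetch_batch: callable returning the tweet dicts currently visible on the page
    scroll: callable that loads more tweets (for example, scrolls the page)
    """
    collected, seen = [], set()
    for _ in range(max_rounds):
        for tweet in fetch_batch():
            # A set lookup replaces the linear scan over all collected tweets
            key = (tweet["tweet_text"], tweet["tweet_date"], tweet["author"])
            if key in seen:
                continue
            seen.add(key)
            collected.append(tweet)
            if len(collected) >= limit:
                return collected
        scroll()
    # Fewer than `limit` tweets were found within `max_rounds` scrolls
    return collected
```

In the notebook, `fetch_batch` would wrap the `find_elements` call and the per-container extraction, and `scroll` would wrap the `execute_script` call followed by `sleep(3)`.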


In [91]:
# Print results
for tweet in all_tweets:
    print("\nTweet:")
    print(f"Text: {tweet['tweet_text']}")
    print(f"Date: {tweet['tweet_date']}")
    print(f"Characters: {tweet['nbr_characters']}")
    print(f"Likes: {tweet['nbr_likes']}")
    print(f"Retweets: {tweet['nbr_retweets']}")
    print(f"Author: {tweet['author']}")


Tweet:
Text:  Après la LNFP, c'est au la Botola Pro INWI qui aura très prochainement un compte officiel sur les réseaux sociaux.
Date: 2024-11-09T22:07:54.000Z
Characters: 115
Likes: 8
Retweets: 0
Author: BotolaNews

Tweet:
Text: Official Botola Pro Inwi page will follow soon
Date: 2024-11-09T21:38:11.000Z
Characters: 46
Likes: 30
Retweets: 3
Author: 𝗠𝗩𝗡_𝗘𝗡 | 𝗙𝗼𝗼𝘁𝗯𝗮𝗹𝗹 𝗡𝗲𝘄𝘀

Tweet:
Text: Je pense très sincèrement que l’absurdité est devenue organique chez 
@inwi
 , après des dizaines d’appels et de réclamations depuis 3 semaines, une responsable du CRC me dit à nouveau “on essaie de vous joindre sans succès”
Date: 2024-11-09T10:59:35.000Z
Characters: 224
Likes: 2
Retweets: 1
Author: from 04 with love

Tweet:
Text: Botola Inwi Pro is corrupted 
@fifacom_fr
Date: 2024-11-09T21:50:20.000Z
Characters: 41
Likes: 2
Retweets: 0
Author: 🅜🅞🅝🅒🅔🅕

Tweet:
Text: Programme de la 5eme journée du championnat du Maroc de Futsal !!!!!

Vu qu'il n'y a pas ce weekend de Botola Pro Inwi, Arryadia est dans 

#### Extract tweets about Maroc Telecom

In [92]:
#searching for Maroc Telecom and clicking enter
perform_search(driver, "Maroc Telecom")

In [93]:
while len(all_tweets) < 20:  
    tweet_containers = driver.find_elements(By.CSS_SELECTOR, "article[data-testid='tweet']")
    
    for container in tweet_containers:
        try:
            tweet_data = {}
            
            # extract tweet text & date
            tweet_text = container.find_element(By.CSS_SELECTOR, '[data-testid="tweetText"]').text
            tweet_date = container.find_element(By.XPATH, ".//time").get_attribute('datetime')
            # extract likes, author, nbr_retweets using the functions
            nbr_likes = get_like_count(container)
            nbr_retweets = get_retweet_count(container)
            author = get_author(container)
            
            # skip if tweet exists
            if any(t['tweet_text'] == tweet_text for t in all_tweets):
                continue
                
                
            tweet_data['tweet_text'] = tweet_text
            tweet_data['tweet_date'] = tweet_date
            tweet_data['nbr_characters'] = len(tweet_text)
            tweet_data['nbr_likes'] = nbr_likes
            tweet_data['nbr_retweets'] = nbr_retweets
            tweet_data['author'] = author
            

            all_tweets.append(tweet_data)
            
            # stop once 20 tweets in total (10 per search term) have been collected
            if len(all_tweets) >= 20:
                break
            
        except Exception as e:
            print(f"Error processing tweet: {str(e)}")
            continue
    
    # Scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(3)


In [99]:
# Print results
for tweet in all_tweets:
    print("\nTweet:")
    print(f"Text: {tweet['tweet_text']}")
    print(f"Date: {tweet['tweet_date']}")
    print(f"Characters: {tweet['nbr_characters']}")
    print(f"Likes: {tweet['nbr_likes']}")
    print(f"Retweets: {tweet['nbr_retweets']}")
    print(f"Author: {tweet['author']}")



#### -------------------------------------------------------------------------------

## Part 2: Storing Tweets in SQLite

In [96]:
import sqlite3
from datetime import datetime

# SQLite database connection
conn = sqlite3.connect('tweetsDB.db')
cursor = conn.cursor()

# Create the table 
cursor.execute("""
CREATE TABLE IF NOT EXISTS tweets (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  
    tweet_text TEXT NOT NULL,              
    tweet_date DATETIME,                   
    nbr_characters INTEGER,                
    nbr_retweets INTEGER,                  
    nbr_likes INTEGER,                     
    author TEXT
);
""")

# Prepare insert statement for tweets
insert_query = """
INSERT INTO tweets (tweet_text, tweet_date, nbr_characters, nbr_retweets, nbr_likes, author)
VALUES (?, ?, ?, ?, ?, ?);
"""

# Iterate over the tweets and insert them into the database
for tweet in all_tweets:
    tweet_text = tweet['tweet_text']
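    # Parse the ISO timestamp and store it as 'YYYY-MM-DD HH:MM:SS' text
    # (sqlite3's implicit datetime adapter is deprecated since Python 3.12)
    tweet_date = datetime.strptime(tweet['tweet_date'], '%Y-%m-%dT%H:%M:%S.000Z').strftime('%Y-%m-%d %H:%M:%S')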
    nbr_characters = int(tweet['nbr_characters'])
    nbr_retweets = int(tweet['nbr_retweets'])  # Convert to integer
    nbr_likes = int(tweet['nbr_likes'])  # Convert to integer
    author = tweet['author']

    cursor.execute(insert_query, (tweet_text, tweet_date, nbr_characters, nbr_retweets, nbr_likes, author))

# Commit and close the connection
conn.commit()
cursor.close()
conn.close()

print("Data inserted successfully!")

Data inserted successfully!
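
Re-running the insert cell appends the same tweets again, because the table has no uniqueness constraint. A minimal sketch of an idempotent insert follows. It runs against an in-memory database with made-up sample rows, so it does not touch `tweetsDB.db`. The `UNIQUE (tweet_text, tweet_date, author)` constraint and the helper name `save_tweets` are assumptions, not part of the original schema.

```python
import sqlite3

# Made-up sample rows for illustration; not scraped data
sample_tweets = [
    {"tweet_text": "first sample", "tweet_date": "2024-01-01 10:00:00",
     "nbr_characters": 12, "nbr_retweets": 0, "nbr_likes": 1, "author": "user_a"},
    {"tweet_text": "second sample", "tweet_date": "2024-01-02 11:30:00",
     "nbr_characters": 13, "nbr_retweets": 2, "nbr_likes": 5, "author": "user_b"},
]


def save_tweets(conn, tweets):
    """Insert tweets, skipping rows that are already stored."""
    with conn:  # commits on success, rolls back on error
        conn.execute("""
            CREATE TABLE IF NOT EXISTS tweets (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                tweet_text TEXT NOT NULL,
                tweet_date DATETIME,
                nbr_characters INTEGER,
                nbr_retweets INTEGER,
                nbr_likes INTEGER,
                author TEXT,
                UNIQUE (tweet_text, tweet_date, author)
            )
        """)
        # Named placeholders are filled from the keys of each tweet dict
        conn.executemany("""
            INSERT OR IGNORE INTO tweets
                (tweet_text, tweet_date, nbr_characters, nbr_retweets, nbr_likes, author)
            VALUES
                (:tweet_text, :tweet_date, :nbr_characters, :nbr_retweets, :nbr_likes, :author)
        """, tweets)


conn = sqlite3.connect(":memory:")
save_tweets(conn, sample_tweets)
save_tweets(conn, sample_tweets)  # the second run adds nothing
row_count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(row_count)  # 2
conn.close()
```

Note that SQLite treats `NULL` values as distinct in a `UNIQUE` constraint, so rows whose author is `None` would still be inserted on every run.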


#### Verify that the tweets were ingested into the database

In [98]:
import sqlite3

# Connect to SQLite database
conn = sqlite3.connect('tweetsDB.db')
cursor = conn.cursor()

# Select all rows from the 'tweets' table
cursor.execute("SELECT * FROM tweets;")
rows = cursor.fetchall()

# Print each row
for row in rows:
    print(row)

# Close the cursor and connection
cursor.close()
conn.close()

(1, " Après la LNFP, c'est au la Botola Pro INWI qui aura très prochainement un compte officiel sur les réseaux sociaux.", '2024-11-09 22:07:54', 115, 0, 8, 'BotolaNews')
(2, 'Official Botola Pro Inwi page will follow soon', '2024-11-09 21:38:11', 46, 3, 30, '𝗠𝗩𝗡_𝗘𝗡 | 𝗙𝗼𝗼𝘁𝗯𝗮𝗹𝗹 𝗡𝗲𝘄𝘀')
(3, 'Je pense très sincèrement que l’absurdité est devenue organique chez \n@inwi\n , après des dizaines d’appels et de réclamations depuis 3 semaines, une responsable du CRC me dit à nouveau “on essaie de vous joindre sans succès”', '2024-11-09 10:59:35', 224, 1, 2, 'from 04 with love')
(4, 'Botola Inwi Pro is corrupted \n@fifacom_fr', '2024-11-09 21:50:20', 41, 0, 2, '🅜🅞🅝🅒🅔🅕')
(5, "Programme de la 5eme journée du championnat du Maroc de Futsal !!!!!\n\nVu qu'il n'y a pas ce weekend de Botola Pro Inwi, Arryadia est dans l'obligation de nous retransmettre quelques matchs.", '2024-11-16 08:35:29', 189, 0, 4, 'Saad M')
(6, 'Tous les buts de la 10ème journée de Botola Pro INWI :', '2024-11-10 19:25:57', 54, 0
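
The tuples printed above have to be read by position (retweets come before likes in the table), which is easy to mix up. A small sketch of reading rows by column name with `sqlite3.Row` follows. It uses an in-memory table with a reduced set of columns and one made-up row, both chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name

conn.execute("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        tweet_text TEXT NOT NULL,
        nbr_retweets INTEGER,
        nbr_likes INTEGER,
        author TEXT
    )
""")
# Made-up row for illustration; not scraped data
conn.execute(
    "INSERT INTO tweets (tweet_text, nbr_retweets, nbr_likes, author) VALUES (?, ?, ?, ?)",
    ("sample text", 3, 30, "sample_author"),
)

# Convert each Row to a plain dict so it can outlive the connection
rows = [dict(row) for row in conn.execute("SELECT * FROM tweets")]
for row in rows:
    print(f"{row['author']}: {row['nbr_likes']} likes, {row['nbr_retweets']} retweets")

conn.close()
```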

## Conclusion
This notebook includes the first two parts of the use case. 

I extracted tweets from Twitter (X) through web scraping and stored them in a SQLite database.

For the NLP part of the use case, see the notebook **tweets-NLP.ipynb**.