This project analyzes the sentiment of tweets about Covid-19. It is written in Python and uses gabrielpreda's dataset, which contains 14,000 tweets about Covid-19 and is available on Kaggle and on GitHub. The dataset contains the following columns:
- user_name: The name of the user
- user_location: The location of the user
- user_description: The description of the user
- user_created: The date the user created their account
- user_followers: The number of followers the user has
- user_friends: The number of friends the user has
- user_favourites: The number of tweets the user has liked
- user_verified: Whether the user is verified or not
- date: The date the tweet was created
- text: The text of the tweet
- hashtags: The hashtags in the tweet
- source: The app used to post the tweet
- is_retweet: Whether the tweet is a retweet or not
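
As a minimal sketch of how the columns above might be loaded and checked, the snippet below reads a CSV with pandas and fails early if any listed column is absent. The function name `load_tweets` and the file name in the usage note are assumptions made for illustration; neither comes from the project itself.

```python
import pandas as pd

# Column names as listed in the dataset description above.
EXPECTED_COLUMNS = [
    "user_name", "user_location", "user_description", "user_created",
    "user_followers", "user_friends", "user_favourites", "user_verified",
    "date", "text", "hashtags", "source", "is_retweet",
]


def load_tweets(source):
    """Read the tweets CSV and verify that every expected column exists.

    `source` may be a file path or any file-like object accepted by
    pandas.read_csv. This helper is hypothetical, not part of the project.
    """
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df
```

A call such as `load_tweets("covid19_tweets.csv")` would then return the full table; the file name here is a placeholder for wherever the downloaded CSV is stored.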
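
The project uses the following Python libraries and standard-library modules:
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Plotly
- string (standard library)
- collections (standard library)
- NLTK
- scikit-learn (sklearn)
- re (standard library)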

The data is cleaned with the following steps:
- Removing rows with missing values
- Removing duplicate rows
- Removing retweets
- Removing tweets that are not in English
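
The four cleaning steps above can be sketched with pandas roughly as follows. Several details are assumptions, since the description does not state them: missing values and duplicates are checked on the `text` column only, and `looks_english` is a crude, hypothetical stand-in for a real language detector, which the dataset does not provide as a column.

```python
import pandas as pd


def looks_english(text: str) -> bool:
    """Hypothetical placeholder: treat mostly-ASCII text as English.

    A real project would likely use a proper language-detection library.
    """
    if not text:
        return False
    ascii_chars = sum(1 for ch in text if ch.isascii())
    return ascii_chars / len(text) >= 0.9


def clean_tweets(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a tweets DataFrame."""
    # 1. Drop rows with a missing tweet text (assumption: only `text` is required).
    df = df.dropna(subset=["text"])
    # 2. Drop duplicate tweets (assumption: duplicates are judged by text).
    df = df.drop_duplicates(subset=["text"])
    # 3. Drop retweets; rows where `is_retweet` is missing are kept.
    df = df[~df["is_retweet"].eq(True)]
    # 4. Keep only tweets that pass the placeholder English check.
    df = df[df["text"].map(looks_english)]
    return df.reset_index(drop=True)
```

Restricting the missing-value check to `text` is a design choice of this sketch: columns such as `user_location` and `hashtags` can be empty for otherwise usable tweets, so dropping on every column would discard more rows than the analysis may need.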