# Motivation

The emergence and ubiquity of social media platforms such as Facebook, Twitter and Instagram have given each individual a platform to connect with people and to speak for themselves. People are increasingly used to expressing their thoughts through social media, which has already become a form of "We media". People are also affected by other people's comments and opinions, which can reach a wider audience through social media. Interestingly, one can also get a sense of whether a certain product, organization or person is liked or not by monitoring social media and forums. In fact, companies are already taking steps to manage their branding and reputation over the internet.

In the meantime, the wide use of social media has also provided researchers with a large amount of data and interesting topics to dive into, for example social network analysis. Twitter in particular, being a platform where people post comments, produces a large volume of natural language data every day. Using NLP and machine learning algorithms, we can do many interesting things with the tweets people post. In this project, we are interested in analyzing the reputation of certain entities through sentiment analysis on tweets.

# Approach

Sentiment analysis is essentially a sentence classification problem (positive, negative and/or neutral) to which machine learning algorithms such as neural networks can be applied. In this project, we use Bidirectional Encoder Representations from Transformers (BERT) [1] as a sentence encoder and add a classification layer to predict the final class label.

After training, the model can be applied to tweets crawled from Twitter that @-mention a certain account to produce a "reputation score". We compute the scores by week and show the absolute level as well as the relative variation of the score.

## BERT with classifier

We use the base uncased version of BERT, initialized with pretrained weights. The input (a tweet) is preprocessed and tokenized (details below). The tokens `[CLS]` and `[SEP]` are added to the beginning and end of the input tokens, respectively. The inputs are truncated or padded, depending on the number of tokens, to a fixed length of 100, including the two added tokens. An input mask is created, with 1 indicating a real token and 0 indicating padding. The tokens are then converted to ids using the BERT vocabulary and fed to BERT together with the mask.
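
The input preparation described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the function name `build_inputs` and the `[PAD]` padding token are assumptions, and the conversion of tokens to vocabulary ids is left out because it depends on the BERT vocabulary file.

```python
MAX_LEN = 100  # fixed input length from the text, including [CLS] and [SEP]


def build_inputs(tokens, max_len=MAX_LEN, pad_token="[PAD]"):
    """Add special tokens, then truncate or pad to max_len and build the mask.

    `tokens` is a list of already-tokenized word pieces (hypothetical input).
    """
    # Reserve two positions for [CLS] and [SEP], truncating if necessary.
    body = tokens[: max_len - 2]
    seq = ["[CLS]"] + body + ["[SEP]"]

    # 1 marks a real token, 0 marks padding.
    mask = [1] * len(seq)

    # Pad both lists up to the fixed length.
    n_pad = max_len - len(seq)
    seq = seq + [pad_token] * n_pad
    mask = mask + [0] * n_pad
    return seq, mask


example_tokens, example_mask = build_inputs(["great", "phone", "!"])
```

Every sequence returned this way has exactly `MAX_LEN` entries, so tweets of different lengths can be batched together.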

BERT produces a sentence encoding, which is the 768-dimensional output vector corresponding to `[CLS]`. A dropout layer with probability 0.1 is applied to this output, followed by a fully connected layer that outputs the logits of the labels.

## Compute "reputation score"

We crawl the tweets that @-mention a certain account, separated by week, preprocess them in the same way as the training data, and feed them to the model. The outputs of the model are the logits of the corresponding labels (positive, negative and neutral). We then apply softmax to obtain probabilities. The final "reputation score" is the mean positive probability minus the mean negative probability.
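
As a rough sketch of this computation, the snippet below turns per-tweet logits into a weekly score. The label order (positive, negative, neutral), the function names and the example logits are assumptions made for illustration only.

```python
import math

# Assumed index of each label in the logit vector (not stated in the text).
POSITIVE, NEGATIVE, NEUTRAL = 0, 1, 2


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reputation_score(logits_per_tweet):
    """Mean positive probability minus mean negative probability."""
    if not logits_per_tweet:
        raise ValueError("need at least one tweet to compute a score")
    probs = [softmax(logits) for logits in logits_per_tweet]
    n = len(probs)
    mean_pos = sum(p[POSITIVE] for p in probs) / n
    mean_neg = sum(p[NEGATIVE] for p in probs) / n
    return mean_pos - mean_neg


# Hypothetical logits for two weeks of tweets.
weekly_logits = {
    "week 1": [[2.0, -1.0, 0.5], [0.1, 0.3, 1.2]],
    "week 2": [[-0.5, 1.5, 0.2]],
}
weekly_scores = {week: reputation_score(l) for week, l in weekly_logits.items()}
```

Because each probability lies between 0 and 1, the score always falls between -1 (all tweets confidently negative) and 1 (all tweets confidently positive).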

# Data
We train the model on the training data from SemEval 2017 Task 4A and evaluate on its test set. The SemEval 2017 Task 4A data has 50k training examples with 3 labels, i.e. negative, positive and neutral.

We also ran experiments on the sentiment140 dataset. The dataset has 1.6M examples, and we split it into 90% training and 10% testing.

All the data are preprocessed before being input to the network. The data are first cleaned as follows:

- All the @s are removed.
- HTTP addresses are also removed.
- Words that contain invalid ASCII symbols are removed.
- All characters that are not alphanumeric and not one of `'"?!` are converted to a space.
- After the above steps, tweets with fewer than 1 character are removed.
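
The cleaning steps above could be sketched with regular expressions as shown below. This is one possible reading, not the project's actual code: it assumes that "@s" means whole @-mentions, that "HTTP addresses" covers both `http://` and `https://` URLs, and that "invalid ASCII" means non-ASCII characters. The final whitespace collapsing is an extra convenience step, and `clean_tweet` is a hypothetical name.

```python
import re

MENTION_RE = re.compile(r"@\w+")            # assumed: drop the whole @mention
URL_RE = re.compile(r"https?://\S+")        # assumed: http and https URLs
DISALLOWED_RE = re.compile(r"[^A-Za-z0-9'\"?!]")  # everything except allowed chars


def clean_tweet(text):
    """Apply the cleaning steps; return None if nothing is left."""
    text = MENTION_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    # Drop words that contain any non-ASCII character.
    text = " ".join(word for word in text.split() if word.isascii())
    # Replace every remaining disallowed character with a space.
    text = DISALLOWED_RE.sub(" ", text)
    # Collapse repeated whitespace (not listed in the steps above).
    text = " ".join(text.split())
    # Tweets with fewer than 1 character are dropped.
    return text if len(text) >= 1 else None


raw = ["@apple I love the new phone!!! http://t.co/abc", "@user http://x.y"]
cleaned = [t for t in (clean_tweet(r) for r in raw) if t is not None]
```

Lowercasing is not done here because the tokenizer described next already handles it.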

Then, the sentences are tokenized with BertTokenizer. BertTokenizer consists of a basic tokenizer, which does simple splitting and lowercasing, and a WordPiece tokenizer [2].

# Code

# Experimental Setup

We use the Adam optimizer with a learning rate of 2e-5. Training was done on 4 NVIDIA GTX 1080 Ti GPUs. For the SemEval data, we train for 5 epochs with batch size 128, which takes about 20 minutes. For the sentiment140 data, we train for 5 epochs with batch size 128, which takes about 10 hours.


# Results

## SemEval Result

We compare our result with those of the SemEval 2017 participants. As shown in the table below, our system achieved the best result among the participants of that shared task.

| Rank       | System          | AvgRec    | F1        | Accuracy  | Architecture                     |
| ---------- | --------------- | --------- | --------- | --------- | -------------------------------- |
| Our System | Windows Vista   | **0.701** | **0.702** | **0.714** | BERT for Sequence Classification |
| 1          | DataStories [3] | 0.681     | 0.677     | 0.651     | LSTM with attention              |
| 1          | BB twtr [4]     | 0.681     | 0.685     | 0.658     | LSTM and CNNs                    |
| 3          | LIA [5]         | 0.676     | 0.674     | 0.661     | LSTM and CNNs                    |
| Baseline   | All Positive    | 0.333     | 0.162     | 0.193     |                                  |
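
The three metrics in the table can be sketched as follows, assuming the usual SemEval 2017 Task 4A definitions, which this document does not spell out: AvgRec is recall averaged over the three classes, and F1 is the average of the F1 scores of the positive and negative classes only. The function names and the toy label distribution are illustrative.

```python
LABELS = ("positive", "negative", "neutral")


def recall(gold, pred, label):
    """Fraction of gold items with this label that were predicted correctly."""
    relevant = [p for g, p in zip(gold, pred) if g == label]
    return sum(p == label for p in relevant) / len(relevant) if relevant else 0.0


def precision(gold, pred, label):
    """Fraction of predictions of this label that were correct."""
    selected = [g for g, p in zip(gold, pred) if p == label]
    return sum(g == label for g in selected) / len(selected) if selected else 0.0


def f1(gold, pred, label):
    p, r = precision(gold, pred, label), recall(gold, pred, label)
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(gold, pred):
    avg_rec = sum(recall(gold, pred, l) for l in LABELS) / len(LABELS)
    # Assumed: F1 averaged over the positive and negative classes only.
    f1_pn = (f1(gold, pred, "positive") + f1(gold, pred, "negative")) / 2
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return {"AvgRec": avg_rec, "F1": f1_pn, "Accuracy": accuracy}


# Toy gold labels with 19.3% positives; the negative/neutral split is made up.
gold = ["positive"] * 193 + ["negative"] * 400 + ["neutral"] * 407
all_positive = ["positive"] * len(gold)
baseline = evaluate(gold, all_positive)
```

Under these assumptions, an all-positive predictor on a set with 19.3% positive examples yields values that round to the baseline row above (0.333, 0.162, 0.193).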

## Sentiment140 Result

This dataset does not come from a shared task, so no direct comparison can be made. However, we hereby report our results on the test set (10%) that we split off ourselves.

`Accuracy: 86.48% F1 score: 0.8648 AvgRec: 0.8648`

# Analysis of the Results

The result on the SemEval task shows the power of BERT as a large-scale pretrained language representation. Initialized with pretrained weights, fine-tuning BERT on other tasks is simple and feasible, yet produces very good results.


# Reputation Analysis


# Future Work

Conducting sentiment analysis on Twitter is somewhat difficult because the language used in tweets is informal. There are slang, misspellings, abbreviations, emojis, multimedia content and so on. A challenging task is to recognize such informal language usage and use it for prediction. For example, emojis are often obvious indicators of sentiment and are also ubiquitous on social media. It would be very useful if this information could be captured and used.

Besides text, visual content is also an important part of social media content, and it also carries many sentiment cues. In fact, there is already work on combining text and visual content for sentiment analysis on Twitter.

# References

[1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

[2] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.

[3] Baziotis C, Pelekis N, Doulkeridis C. DataStories at SemEval-2017 Task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017 (pp. 747-754).

[4] Mathieu Cliche. 2017. BB twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs. In Proceedings of the 11th International Workshop on Semantic Evaluation. Vancouver, Canada, SemEval ’17, pages 572–579.

[5] Rouvier M. LIA at SemEval-2017 Task 4: An Ensemble of Neural Networks for Sentiment Classification. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) 2017 (pp. 760-765).
