<a href="https://colab.research.google.com/github/textspur/testland/blob/main/Twitter_Demographer_test01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Twitter-Demographer

Installation takes a couple of minutes. A GPU runtime is recommended if you plan to use the Hugging Face (HF) classifiers.


In [None]:
%%capture
!pip install -U twitter-demographer
!pip install -U tweepy

### **NOTE**: Remember to **restart** the runtime after installation

## Imports

In [None]:
from twitter_demographer.twitter_demographer import Demographer
from twitter_demographer.components import Rehydrate
from twitter_demographer.classification.transformers import HuggingFaceClassifier
from twitter_demographer.geolocation.nominatim import NominatimDecoder
import pandas as pd

## Tokens

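You need a Twitter API bearer token to run this pipeline; it is passed to the `Rehydrate` component. The `NominatimDecoder` geolocation component used below is constructed without credentials, so no GeoNames token is set in this notebook.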

In [None]:
BEARER_TOKEN = ""
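Rather than pasting the token into a cell, one option is to read it from an environment variable. The following is a minimal sketch, not part of twitter-demographer: the helper `load_bearer_token` and the variable name `TWITTER_BEARER_TOKEN` are assumptions made for illustration.

```python
import os


def load_bearer_token(env_var: str = "TWITTER_BEARER_TOKEN") -> str:
    """Return the token stored in `env_var`; raise if it is missing or empty.

    Both this helper and the default variable name are hypothetical.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the pipeline."
        )
    return token


# Usage (assumes the variable was set before the notebook started):
# BEARER_TOKEN = load_bearer_token()
```

Failing early with a clear message avoids running the whole pipeline only to hit an authentication error in the first component.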

In [None]:
data = pd.DataFrame({"tweet_ids": ["1545401081542443008", "253685732777009152", "1477976329710673921", "1467887350084689928", "1467887352647462912", "1290664307370360834", "1465284810696445952"]})

In [None]:
data

Unnamed: 0,tweet_ids
0,1477976329710673921
1,1467887350084689928
2,1467887352647462912
3,1290664307370360834
4,1465284810696445952


## Using the Demographer

In [None]:
demo = Demographer()

In [None]:
component_one = Rehydrate(BEARER_TOKEN)
component_two = NominatimDecoder()
component_three = HuggingFaceClassifier("cardiffnlp/twitter-roberta-base-sentiment")

In [None]:
demo.add_component(component_one)
demo.add_component(component_two)
demo.add_component(component_three)

In [None]:
new_data = demo.infer(data)

Running Demographer:   0%|          | 0/3 [00:00<?, ?it/s]
Running Hydrate: 100%|██████████| 5/5 [00:00<00:00, 10.48it/s]
Running Demographer:  33%|███▎      | 1/3 [00:00<00:00,  2.02it/s]
Geocoder:  60%|██████    | 3/5 [00:00<00:00,  3.72
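The progress output shows the three components running one after another, in the order they were added. The toy pipeline below sketches that pattern in plain Python. It is a hypothetical stand-in, not the library's implementation: `ToyPipeline`, `add_text`, and `add_length` are invented names, and rows are plain dictionaries rather than a DataFrame.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Component = Callable[[List[Row]], List[Row]]


class ToyPipeline:
    """Hypothetical stand-in that runs components in insertion order."""

    def __init__(self) -> None:
        self.components: List[Component] = []

    def add_component(self, component: Component) -> None:
        self.components.append(component)

    def infer(self, rows: List[Row]) -> List[Row]:
        # Each component receives the output of the previous one.
        for component in self.components:
            rows = component(rows)
        return rows


def add_text(rows: List[Row]) -> List[Row]:
    # Stand-in for rehydration: attaches placeholder text to each ID.
    return [{**row, "text": f"text of {row['tweet_ids']}"} for row in rows]


def add_length(rows: List[Row]) -> List[Row]:
    # Depends on the "text" field added by the earlier component.
    return [{**row, "text_length": str(len(row["text"]))} for row in rows]


pipeline = ToyPipeline()
pipeline.add_component(add_text)
pipeline.add_component(add_length)
result = pipeline.infer([{"tweet_ids": "1"}])
```

In this sketch, swapping the two `add_component` calls would raise a `KeyError`, because `add_length` needs the field that `add_text` creates. By analogy, the notebook adds the rehydration step first so that later components have the tweet text and location to work with.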

In [None]:
new_data

Unnamed: 0,screen_name,location,created_at,text,geo_location_country,geo_location_address,cardiffnlp/twitter-roberta-base-sentiment
1,c3662a3720829f818ff78d5dd1d22d469488244efdbd21...,"Milan, Lombardy",2021-12-06 16:03:10+00:00,📢 📢 Evaluating #RecSys is difficult: accuracy ...,Italy,Milan,1
4,d0bf91186c88cb5dd886259ee503c309735cce25ac4b38...,"Milan, MI",2021-11-29 11:41:37+00:00,Excited to talk at the NLP4AI workshop today (...,Italy,Milan,2
2,c3662a3720829f818ff78d5dd1d22d469488244efdbd21...,"Milan, Lombardy",2021-12-06 16:03:11+00:00,"#RecList provides behavioral, ""black-box"" test...",Italy,Milan,1
0,9b61fff64038f969a863a61e28358a7ba13b4517dd7638...,"Zurich, Switzerland",2022-01-03 12:13:11+00:00,Just received this super cool swag kit! Many t...,Switzerland,Zurich,2
3,589b556f0c7c08b1623a5dabb9b5f16c4210019e07853e...,Lugano - Viganello,2020-08-04 15:02:04+00:00,"Thrilled to announce that our paper ""Tough Tab...",Switzerland,Viganello,2
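The last column holds integers rather than label names. Assuming these are the class indices of `cardiffnlp/twitter-roberta-base-sentiment` (whose model card lists 0 as negative, 1 as neutral, and 2 as positive), they can be mapped to readable names. The sketch below uses the five values from the table above; `SENTIMENT_LABELS` and `label_name` are illustrative names, not part of the library.

```python
# Assumed mapping from class index to label name for
# cardiffnlp/twitter-roberta-base-sentiment.
SENTIMENT_LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def label_name(label_id: int) -> str:
    """Translate a class index into its sentiment label."""
    return SENTIMENT_LABELS[int(label_id)]


# Values of the sentiment column in the output above, in row order.
predicted = [1, 2, 1, 2, 2]
names = [label_name(label_id) for label_id in predicted]
```

On the DataFrame itself, the same mapping could be applied with `new_data["cardiffnlp/twitter-roberta-base-sentiment"].map(SENTIMENT_LABELS)`, again assuming the column stores class indices.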
