## Python - Fundamentals for Data Analysis

### Mini-project - Twitter Data Stream

In this project we will build a data stream from [Twitter](https://twitter.com/explore), store the tweets in the MongoDB database, and analyze them with the [Pandas](https://pandas.pydata.org/) and [Scikit-Learn](https://scikit-learn.org/stable/) libraries.

This project applies natural language processing techniques and analytical methods to extract relevant information from text data, an approach known as [Text Mining](https://www.linguamatics.com/what-text-mining-text-analytics-and-natural-language-processing).

**Twitter**

Twitter is a rich source of information on a wide range of subjects. We can use its data to analyze trends around a keyword, measure the sentiment about a given topic, and gather feedback about brands.

**MongoDB**

A NoSQL database that stores JSON-like documents, which makes it quick and easy to integrate with certain kinds of applications, such as one that receives tweets as JSON.

To obtain these data you need to register an application and get API credentials. Here we will use the [Twitter Stream API](https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters).

**Getting the API keys:**

* Create a Twitter account if you don't have one yet.
* Go to [apps](https://developer.twitter.com/en/apps) and sign in with your Twitter user.
* Click 'Create a new application'.
* Fill in the details and click "Criar T…"

**Installing the required libraries**

For this analysis we need the Tweepy package, which handles the communication with Twitter, and PyMongo, the MongoDB driver for Python.

MongoDB must be running, because we will store our data in it.
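
Before starting the stream you may want to confirm that something is listening on MongoDB's default port. The sketch below assumes a local server on `localhost:27017`, the address `MongoClient()` uses when called without arguments; `port_is_open` is a hypothetical helper, and it only shows that some process accepts TCP connections on that port, not that the process is MongoDB.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: 27017 is MongoDB's default port, but a successful
    connection only proves that *something* is listening there.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out or host unreachable
        return False


print("MongoDB port open:", port_is_open())
```

With PyMongo installed, a stricter check is to ask the server itself, for example with `client.admin.command("ping")`.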

In [1]:
# Install Tweepy 3.x (StreamListener, used below, was removed in Tweepy 4.0)
# and PyMongo, the MongoDB driver
!pip install "tweepy<4" pymongo



In [2]:
# Import the Tweepy classes and the json module
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json

**Adding the API keys**

When you create an app with your Twitter developer account you receive four credentials: the API key, the API secret key, the access token and the access token secret.

In [3]:
#API key
consumer_key = "#################"

In [4]:
#API secret key
consumer_secret ="###############################"

In [5]:
#Access Token
access_token ="###############################################"

In [6]:
#Access Token Secret
token_secret="##################################################"
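
Hard-coding the four credentials in the notebook makes them easy to leak when it is shared. One alternative, sketched here under the assumption that you export them as environment variables with the hypothetical names below, is to read them at runtime and fail early if any is missing:

```python
import os

# Hypothetical environment-variable names; use whatever names you export in your shell
ENV_VARS = {
    "consumer_key": "TWITTER_API_KEY",
    "consumer_secret": "TWITTER_API_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_keys(environ=os.environ) -> dict:
    """Read the four Twitter credentials from environment variables.

    Raises KeyError naming every missing variable, so the notebook stops
    here instead of failing later during authentication.
    """
    missing = [name for name in ENV_VARS.values() if not environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in ENV_VARS.items()}
```

The result could then feed the handler, e.g. `keys = load_twitter_keys()` followed by `OAuthHandler(keys["consumer_key"], keys["consumer_secret"])`.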

**Creating the authentication objects**

In [7]:
# Create the OAuth handler with the API key and API secret key
auth = OAuthHandler(consumer_key, consumer_secret)

In [8]:
# Attach the access token and access token secret to the handler
auth.set_access_token(access_token, token_secret)

In [9]:
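# Class that captures the Twitter data stream and stores each tweet in MongoDB.
# `col` is the MongoDB collection created further below; it only needs to
# exist before the stream is started.
class MyListener(StreamListener):
    def on_data(self, dados):
        tweet = json.loads(dados)
        # The stream also delivers control messages (e.g. deletion or limit
        # notices) without a "text" field; skip them instead of raising KeyError
        if "text" not in tweet:
            return True
        obj = {
            "created_at": tweet["created_at"],
            "id_str": tweet["id_str"],
            "text": tweet["text"],
        }
        # insert_one adds the generated "_id" to obj, which is why it appears in the printout
        col.insert_one(obj)
        print(obj)
        # Returning True keeps the stream open
        return True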
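
The logic inside `on_data` can be tried without Twitter or MongoDB. In this sketch `extract_tweet` is a hypothetical helper that keeps the same three fields as the listener, a plain list stands in for the MongoDB collection, and the two messages are made up for illustration rather than real stream data:

```python
import json
from typing import Optional


def extract_tweet(raw: str) -> Optional[dict]:
    """Parse one raw stream message and keep only the fields stored by MyListener.

    Returns None for messages that are not tweets (such as deletion or limit
    notices), since they carry no "text" field.
    """
    message = json.loads(raw)
    if "text" not in message:
        return None
    return {
        "created_at": message["created_at"],
        "id_str": message["id_str"],
        "text": message["text"],
    }


# Made-up messages for illustration only
messages = [
    '{"created_at": "Fri Jun 26 13:09:12 +0000 2020", "id_str": "1", "text": "ola", "lang": "pt"}',
    '{"delete": {"status": {"id_str": "2"}}}',
]

fake_collection = []  # stands in for the MongoDB collection `col`
for raw in messages:
    doc = extract_tweet(raw)
    if doc is not None:
        fake_collection.append(doc)

print(fake_collection)
```

Only the first message is kept, and extra fields such as `lang` are dropped, just as the listener stores only `created_at`, `id_str` and `text`.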

In [10]:
# Create the listener object
mylistener = MyListener()

In [11]:
# Create the stream object: it authenticates with `auth` and hands each message to mylistener
mystream = Stream(auth, listener=mylistener)

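### Preparing the connection to MongoDB

In [12]:
from pymongo import MongoClient

In [13]:
# Connect to the MongoDB server on localhost:27017 (the default address)
client = MongoClient()

In [14]:
# Select the database (MongoDB creates it on the first insert)
db = client.twitterdb

In [15]:
# Select the collection (also created on the first insert)
col = db.tweets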

In [16]:
# List of keywords to search for in the tweets
keywords = ['Corona Virus', 'Brazil', 'Quarentena', 'Fique em casa']
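
According to the Twitter documentation for the `track` parameter, each entry is a phrase whose space-separated terms must all appear in the tweet, in any order and ignoring case, while separate entries are combined with OR. So `'Corona Virus'` matches any tweet containing both words, not only the exact phrase. `matches_track` below is a hypothetical, simplified approximation of that rule for checking keywords offline; Twitter's real matching also treats punctuation, hashtags, mentions and URLs in ways this sketch does not reproduce.

```python
import re


def matches_track(text: str, track: list) -> bool:
    """Simplified approximation of the streaming `track` rule.

    A phrase matches if all of its terms appear in the text (any order,
    case-insensitive); the tweet matches if any phrase matches.
    """
    words = set(re.findall(r"\w+", text.lower()))
    return any(
        all(term in words for term in phrase.lower().split())
        for phrase in track
    )


keywords = ['Corona Virus', 'Brazil', 'Quarentena', 'Fique em casa']
print(matches_track("The virus called corona", keywords))  # True
print(matches_track("Só corona hoje", keywords))           # False
```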

### Collecting the tweets

In [17]:
# Start the filter and store the matching tweets in MongoDB (this call blocks until interrupted)
mystream.filter(track=keywords)

{'created_at': 'Fri Jun 26 13:09:12 +0000 2020', 'id_str': '1276502774151176192', 'text': '@realDonaldTrump Really? Bullshit you LYING MORON!\n\nAmerica has 4% of the world’s population but 25% of the world’s… https://t.co/kduWFUMhfl', '_id': ObjectId('5ef5f37ceaa1177f658d90dc')}
{'created_at': 'Fri Jun 26 13:09:12 +0000 2020', 'id_str': '1276502776923803649', 'text': 'RT @ateez_charts: #ATEEZ is trending  \n\n#11 Brazil\n#13 Mexico\n#15 France\n#22 Malaysia\n#32 United States\n#40 Singapore\n\n @ATEEZofficial http…', '_id': ObjectId('5ef5f37deaa1177f658d90dd')}
{'created_at': 'Fri Jun 26 13:09:13 +0000 2020', 'id_str': '1276502777368281088', 'text': 'RT @colelilivids: oxente o cole já foi em tantas festas durante essa quarentena, pq que cês tão brigando com ele dps de umas 6.000 festas q…', '_id': ObjectId('5ef5f37deaa1177f658d90de')}
...

KeyboardInterrupt: 

**Press STOP (interrupt the kernel) to end the data capture**
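
Instead of interrupting the kernel, the listener can end the capture itself: in Tweepy 3.x, returning `False` from `on_data` disconnects the stream. The sketch below isolates only the counting logic in a hypothetical class `TweetLimiter` (not part of Tweepy), so it runs without a connection; in the notebook the same counter would live inside `MyListener.on_data`, and `max_tweets` is an assumed parameter.

```python
class TweetLimiter:
    """Counts received messages and tells the caller when to stop.

    Follows the Tweepy 3.x convention: on_data returns True to keep the
    stream open and False to disconnect it.
    """

    def __init__(self, max_tweets: int = 100):
        self.max_tweets = max_tweets
        self.received = 0

    def on_data(self, data: str) -> bool:
        self.received += 1
        # Keep streaming while the limit has not been reached
        return self.received < self.max_tweets


limiter = TweetLimiter(max_tweets=3)
results = [limiter.on_data("{}") for _ in range(3)]
print(results)  # [True, True, False]
```

In a real listener the tweet would be stored before the `return`, so the third tweet is still saved and the stream closes right after it.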

### Querying the data

Now let's query the data. Keep in mind that the tweets are collected at the moment the notebook runs, so your results will differ from the ones shown here.

# Close the connection to the Twitter stream
mystream.disconnect()

In [18]:
# Inspect one document in the collection
col.find_one()

{'_id': ObjectId('5ef5f37ceaa1177f658d90dc'),
 'created_at': 'Fri Jun 26 13:09:12 +0000 2020',
 'id_str': '1276502774151176192',
 'text': '@realDonaldTrump Really? Bullshit you LYING MORON!\n\nAmerica has 4% of the world’s population but 25% of the world’s… https://t.co/kduWFUMhfl'}
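
`find_one` returns a single document; to select specific tweets you pass a query document. As a sketch, the hypothetical helper `keyword_query` builds a case-insensitive `$regex` filter on the `text` field. It only returns a dictionary, so it can be inspected without a database:

```python
import re


def keyword_query(word: str) -> dict:
    """Build a MongoDB filter for documents whose "text" contains `word`, ignoring case.

    re.escape keeps characters such as "." or "+" literal inside the pattern.
    """
    return {"text": {"$regex": re.escape(word), "$options": "i"}}


query = keyword_query("quarentena")
print(query)  # {'text': {'$regex': 'quarentena', '$options': 'i'}}
```

With the collection from this notebook, `col.count_documents(keyword_query("quarentena"))` would count the matching tweets and `col.find(keyword_query("Brazil")).limit(5)` would iterate over the first five.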

### Data Analysis with Pandas and Scikit-Learn

Let's analyze the collected data.

In [19]:
# Build a dataset (a list of dicts) from the documents returned by MongoDB
dataset = [{"created_at": item["created_at"], "text": item["text"]} for item in col.find()]

In [20]:
# Import pandas to work with datasets in Python
import pandas as pd

In [21]:
df = pd.DataFrame(dataset)

In [22]:
df

|     | created_at | text |
|-----|------------|------|
| 0 | Fri Jun 26 13:09:12 +0000 2020 | @realDonaldTrump Really? Bullshit you LYING MO... |
| 1 | Fri Jun 26 13:09:12 +0000 2020 | RT @ateez_charts: #ATEEZ is trending \n\n#11 ... |
| 2 | Fri Jun 26 13:09:13 +0000 2020 | RT @colelilivids: oxente o cole já foi em tant... |
| 3 | Fri Jun 26 13:09:13 +0000 2020 | @Lu_S_Maia A quarentena precisa acabar pelos b... |
| 4 | Fri Jun 26 13:09:13 +0000 2020 | 100 dias presa dentro de casa, respeitando a q... |
| ... | ... | ... |
| 388 | Fri Jun 26 13:10:53 +0000 2020 | RT @YoloAkili: Fam. HIV is a virus. like the F... |
| 389 | Fri Jun 26 13:10:53 +0000 2020 | RT @skzwng: o minho é meu mood nessa quarenten... |
| 390 | Fri Jun 26 13:10:54 +0000 2020 | RT @Pedriell: Dá pra você entender a pandemia ... |
| 391 | Fri Jun 26 13:10:54 +0000 2020 | RT @Y15299: #save_GTU_students\n@ugc please re... |
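
The `created_at` column holds strings in Twitter's fixed layout, e.g. `Fri Jun 26 13:09:12 +0000 2020`. Converting them to real timestamps makes it possible to count tweets per minute or hour. A minimal sketch with the standard library, assuming that layout stays as shown and an English (C) locale for the weekday and month names:

```python
from datetime import datetime

# Layout of created_at: weekday, month, day, time, UTC offset, year
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value: str) -> datetime:
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_DATE_FORMAT)


ts = parse_created_at("Fri Jun 26 13:09:12 +0000 2020")
print(ts.isoformat())  # 2020-06-26T13:09:12+00:00
```

For the whole column, `pd.to_datetime(df["created_at"], format=TWITTER_DATE_FORMAT)` applies the same format in one call.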


In [23]:
from sklearn.feature_extraction.text import CountVectorizer

In [24]:
# Use CountVectorizer to build a document-term matrix (one row per tweet, one column per word)
cv = CountVectorizer()
count_matrix = cv.fit_transform(df.text)
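
To see what `CountVectorizer` produces, here is a pure-Python sketch of the same idea under its default settings: text is lowercased and split with the default token pattern `(?u)\b\w\w+\b` (words of two or more characters), and each document becomes a row of counts over a shared, alphabetically sorted vocabulary. It is only an illustration; scikit-learn returns a sparse matrix, not nested lists. The token pattern also explains why URL pieces such as `https` and `co` (from `https://t.co/...`) end up counted as words.

```python
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # CountVectorizer's default token_pattern


def document_term_matrix(docs):
    """Return (vocabulary, rows), with one list of word counts per document."""
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


vocab, matrix = document_term_matrix(["RT corona virus", "Fique em casa, corona"])
print(vocab)   # ['casa', 'corona', 'em', 'fique', 'rt', 'virus']
print(matrix)  # [[0, 1, 0, 0, 1, 1], [1, 1, 1, 1, 0, 0]]
```

Summing each column gives the per-word totals, which is what `count_matrix.sum(axis=0)` computes in the next cell.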

In [25]:
# Count the occurrences of each word across the dataset and list the 50 most frequent
# (get_feature_names_out replaces get_feature_names, which was removed in scikit-learn 1.2)
word_count = pd.DataFrame(cv.get_feature_names_out(), columns=["word"])
word_count["count"] = count_matrix.sum(axis=0).tolist()[0]
word_count = word_count.sort_values("count", ascending=False).reset_index(drop=True)
word_count[:50]

|   | word | count |
|---|------|-------|
| 0 | rt | 260 |
| 1 | the | 190 |
| 2 | quarentena | 124 |
| 3 | de | 118 |
| 4 | https | 109 |
| 5 | co | 101 |
| 6 | virus | 85 |
| 7 | corona | 83 |
| 8 | like | 83 |
| 9 | is | 80 |
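
The top of this ranking is dominated by `rt` (the retweet marker), URL fragments (`https`, `co`) and very common words such as `the` and `de`. One way to reduce that noise is sketched below: strip URLs and @mentions before tokenizing, then drop stop words. `STOP_WORDS` is a small illustrative list chosen for this example, not a complete Portuguese or English stop-word list.

```python
import re
from collections import Counter

# Illustrative stop words only; a real analysis would use a fuller list
STOP_WORDS = {"rt", "the", "de", "is", "a", "o", "que", "em", "and", "to"}
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def clean_tweet(text: str) -> str:
    """Lowercase the tweet and replace URLs and @mentions with spaces."""
    return URL_OR_MENTION.sub(" ", text.lower())


def top_words(texts, n=10):
    """Count words across all tweets, ignoring stop words."""
    counts = Counter(
        token
        for text in texts
        for token in TOKEN_PATTERN.findall(clean_tweet(text))
        if token not in STOP_WORDS
    )
    return counts.most_common(n)


sample = [
    "RT @user: the corona virus https://t.co/abc",
    "Quarentena de novo, corona",
]
print(top_words(sample, n=3))  # [('corona', 2), ('virus', 1), ('quarentena', 1)]
```

The same cleaning could be plugged into the scikit-learn pipeline through `CountVectorizer`'s `preprocessor` and `stop_words` parameters, e.g. `CountVectorizer(preprocessor=clean_tweet, stop_words=list(STOP_WORDS))`; since a custom preprocessor replaces the built-in lowercasing, `clean_tweet` lowercases the text itself.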


## References

Tweepy documentation: https://www.tweepy.org/