
Collection of Python desktop (tkinter) apps that let you: (1) find (English) Wikipedia pages and look at their page views over time, (2) look at the number of tweets containing substrings over time, (3) use GPT-3 to search for keywords and Twitter to see if those keywords occur in the wild, and (4) select terms to download tweets to a DB.

SpeedCoder5/KeywordExplorer

Explorer Apps

There are six(!) applications in this project: KeywordExplorer, TweetCountsExplorer, TweetDownloader, WikiPageviewExplorer, TweetEmbedExplorer, and ModelExplorer. The latest stable version can be installed with pip:

pip install keyword-explorer

A brief overview of each app is given below.

KeywordExplorer is a Python desktop app that lets you use GPT-3 to search for keywords and Twitter to check whether those keywords actually appear in tweets.

TweetCountsExplorer is a Python desktop app that lets you explore the number of tweets containing keywords over days, weeks, or months.
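Viewing counts by week or month implies rolling per-day counts up into coarser buckets. The following is a minimal sketch of that step, assuming the daily counts are already available as a dict mapping dates to integers; `aggregate_counts` is a hypothetical helper for illustration, not a function from the keyword-explorer package.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def aggregate_counts(daily: dict[date, int], period: str) -> dict[str, int]:
    """Roll daily tweet counts up into 'day', 'week' (ISO week) or 'month' buckets."""
    totals: dict[str, int] = defaultdict(int)
    for day, count in daily.items():
        if period == "day":
            key = day.isoformat()
        elif period == "week":
            iso_year, iso_week, _ = day.isocalendar()  # ISO weeks start on Monday
            key = f"{iso_year}-W{iso_week:02d}"
        elif period == "month":
            key = f"{day.year}-{day.month:02d}"
        else:
            raise ValueError(f"unknown period: {period!r}")
        totals[key] += count
    # Sorting the string keys also sorts the buckets chronologically.
    return dict(sorted(totals.items()))
```

ISO week keys are used so that a week spanning a month or year boundary stays in one bucket.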

TweetDownloader is a Python desktop app that lets you select and download tweets containing keywords into a database. The number of tweets downloaded can be set to be the same for each day or proportional. Users can apply daily and overall limits to each keyword corpus.
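One way to read the "same for each day or proportional" option together with the daily and overall limits is as a per-day download plan. The sketch below assumes that "proportional" means proportional to each day's share of all matching tweets; that interpretation and the `allocate_tweets` helper are hypothetical and not taken from TweetDownloader's code.

```python
from __future__ import annotations


def allocate_tweets(daily_counts: dict[str, int], total_limit: int,
                    daily_limit: int, proportional: bool) -> dict[str, int]:
    """Plan how many tweets to download per day for one keyword.

    Even mode splits total_limit equally across days; proportional mode
    weights each day by its share of all matching tweets. Every day is
    capped by daily_limit and by the number of tweets that exist that day.
    """
    if not daily_counts:
        return {}
    grand_total = sum(daily_counts.values())
    plan = {}
    for day in sorted(daily_counts):
        available = daily_counts[day]
        if proportional:
            # Floor division keeps the planned total at or below total_limit.
            share = total_limit * available // grand_total if grand_total else 0
        else:
            share = total_limit // len(daily_counts)
        plan[day] = min(share, daily_limit, available)
    return plan
```

With a daily cap, part of the overall budget can go unused on busy days; redistributing that remainder would be a reasonable extension.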

WikiPageviewExplorer is a Python desktop app that lets you explore keywords that appear as articles in Wikipedia, and chart their relative page views.

TweetEmbedExplorer is a Python desktop app for analyzing, filtering, and augmenting tweet information. The augmented information can then be used to create a train/test corpus for finetuning language models such as GPT-2.

ModelExplorer is a Python desktop app that lets a user interact with a GPT-2 model finetuned on corpora generated by TweetEmbedExplorer.

Before Using!

Most of these apps require that you have an OpenAI account and/or a Twitter developer account:

  • KeywordExplorer requires a Twitter developer account and an OpenAI account
  • TweetCountsExplorer requires a Twitter developer account
  • WikiPageviewExplorer uses the wikipedia API (pip install wikipedia), and requires a user agent
  • TweetDownloader requires additional elements, such as a database, which are discussed in its own section rather than here.
  • TweetEmbedExplorer requires a Twitter account, an OpenAI account, and a MariaDB/MySQL database
  • ModelExplorer uses the HuggingFace transformers API (pip install transformers), and a MariaDB/MySQL database
  • ModelExplorer requires GPT-2 models trained on corpora generated by TweetEmbedExplorer. To train a model, follow the steps in How to train a model
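Wikimedia asks API clients to send a descriptive User-Agent with contact information, which is why an email address works as a value. As an illustration only (the apps themselves go through the wikipedia package, and this is not their code), here is how a USER_AGENT value can be attached to a request for the public Wikimedia pageviews REST API; `pageview_request` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import os
import urllib.parse
import urllib.request

# Public Wikimedia REST endpoint for daily per-article page views (English Wikipedia).
PAGEVIEWS = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
             "en.wikipedia/all-access/user/{article}/daily/{start}/{end}")


def pageview_request(article: str, start: str, end: str) -> urllib.request.Request:
    """Build (but do not send) a pageview request; dates use YYYYMMDD."""
    user_agent = os.environ.get("USER_AGENT")
    if not user_agent:
        raise RuntimeError("USER_AGENT is not set")
    # Wikipedia titles use underscores for spaces; quote everything else.
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    url = PAGEVIEWS.format(article=title, start=start, end=end)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```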

The following links are very helpful:

In each case you'll have to get an ID (an API key or token) and set it as an environment variable. The variable names must be OPENAI_KEY for your OpenAI (GPT-3) account and BEARER_TOKEN_2 for your Twitter account, as shown below for a Windows environment:

Environment variables
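After setting the variables (and opening a new terminal so they are picked up), a quick check can confirm which apps have what they need. This is a sketch: the `REQUIRED_KEYS` mapping is hypothetical, transcribed by hand from the "Before Using!" list rather than read from the package.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical mapping based on the requirements list; not read from the package.
REQUIRED_KEYS = {
    "KeywordExplorer": ["OPENAI_KEY", "BEARER_TOKEN_2"],
    "TweetCountsExplorer": ["BEARER_TOKEN_2"],
    "WikiPageviewExplorer": ["USER_AGENT"],
}


def missing_keys(app: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty for `app`."""
    return [name for name in REQUIRED_KEYS[app] if not env.get(name)]


if __name__ == "__main__":
    for app in REQUIRED_KEYS:
        missing = missing_keys(app)
        print(f"{app}: {'ready' if not missing else 'missing ' + ', '.join(missing)}")
```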

If you don't have permission to set environment variables, or just don't want to, you can create a JSON file and load that instead:

{
  "BEARER_TOKEN_2": "AAAAAAAAAAAAAAAAAAAAAC-----------------------",
  "OPENAI_KEY": "sk-s------------------------------------",
  "USER_AGENT": "xyz@xyz.com"
}

Here, BEARER_TOKEN_2 is the bearer token for the Twitter API v2, OPENAI_KEY is the key for GPT-3, and USER_AGENT is the user agent used when accessing Wikipedia.

To load the file, open the "File" menu and select "Load IDs", then navigate to the JSON file and select it. Once the IDs are loaded, any application that depends on them will run. If you try to use an app that doesn't have an active ID, it will complain.

LoadID
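The effect of "Load IDs" can be approximated outside the GUI. The sketch below assumes the loader simply copies each key/value pair into the process environment; `load_ids` is a hypothetical name, not the app's actual function. Note that `json.load` rejects a trailing comma after the last entry, so the file must be strictly valid JSON.

```python
import json
import os


def load_ids(path: str) -> list[str]:
    """Copy every key/value pair in the JSON file into os.environ."""
    with open(path, encoding="utf-8") as f:
        ids = json.load(f)  # raises json.JSONDecodeError on invalid JSON
    for name, value in ids.items():
        os.environ[name] = str(value)
    return sorted(ids)  # names that were loaded, for reporting
```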

Alternatively, you can create a .env file in the folder from which you run the apps, or in any parent folder. An example of this file is provided as .env_example.

To use this method, copy .env_example to .env, enter your keys, and save the file.

The apps use dotenv to automatically search for this file and load its environment variables. .env is ignored by git to make sure it is not committed.

DATABASE_USER=root
DATABASE_PASSWORD=password
DATABASE_HOST=localhost
DATABASE_SSL_CA=/home/username/.ssl/DigiCertGlobalRootG2.crt.pem
OPENAI_KEY=AAAAAAAAAAAAAAAAAAAAAC-----------------------
BEARER_TOKEN_2=AAAAAAAAAAAAAAAAAAAAAC-----------------------
USER_AGENT=xyz@xyz.com

Default values are used if an environment variable is omitted.
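The lookup described above (current folder first, then each parent, with defaults for anything omitted) can be sketched with the standard library alone. This is a simplified stand-in for what dotenv does, not its implementation; like dotenv's default behaviour, variables already present in the environment are not overridden. The "localhost" fallback is a hypothetical example, since the apps' actual defaults are not listed here.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_env_file(start: Path | None = None) -> Path | None:
    """Return the nearest .env in `start` (default: cwd) or any parent folder."""
    folder = Path(start or Path.cwd()).resolve()
    for candidate_dir in (folder, *folder.parents):
        candidate = candidate_dir / ".env"
        if candidate.is_file():
            return candidate
    return None


def load_env_file(path: Path) -> None:
    """Parse KEY=VALUE lines; variables already in the environment win."""
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        name, value = line.split("=", 1)
        os.environ.setdefault(name.strip(), value.strip())


if __name__ == "__main__":
    env_file = find_env_file()
    if env_file is not None:
        load_env_file(env_file)
    # Hypothetical fallback value; the apps' real defaults may differ.
    print(os.getenv("DATABASE_HOST", "localhost"))
```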

DATABASE_SSL_CA is only required if you are connecting to a non-local database via SSL. It is the path to a file containing a PEM-formatted CA certificate. For example, if you are using an Azure MySQL database, the .pem file can be obtained here. The root CA sometimes changes, as happened recently with Azure MySQL; the new certificates can be found here.
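To show what the PEM path is used for, here is a minimal sketch that turns DATABASE_SSL_CA into a standard-library SSL context trusting only that CA. How the context (or the path itself) is handed to a MySQL driver depends on the driver and is not shown; `ssl_context_from_env` is a hypothetical helper, not part of the apps.

```python
from __future__ import annotations

import os
import ssl


def ssl_context_from_env() -> ssl.SSLContext | None:
    """Build a verifying TLS context from DATABASE_SSL_CA, or None if unset."""
    ca_path = os.environ.get("DATABASE_SSL_CA")
    if not ca_path:
        return None  # e.g. a local database: no TLS settings needed
    # Loads only the given PEM CA; hostname checking and certificate
    # verification are on by default. A missing file raises OSError.
    return ssl.create_default_context(cafile=ca_path)
```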

MySQL Database Setup

  1. Create a .my.cnf file in your home directory:
     touch ~/.my.cnf
  2. Open the .my.cnf file and paste the contents of .my.cnf_example into it.
  3. Modify the values in .my.cnf to match your MySQL configuration.
  4. Set the correct file permissions:
     chmod 600 ~/.my.cnf
  5. Verify the setup by running mysql without specifying a user and password:
     make show-databases
  6. Create the databases:
     make create-databases
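If `make show-databases` fails, a quick check of the option file can narrow down the cause. This sketch assumes the file follows the usual MySQL option-file layout with a `[client]` section holding `user` and `password`; the actual contents of .my.cnf_example may differ, and `check_my_cnf` is a hypothetical helper.

```python
from __future__ import annotations

import configparser
import stat
from pathlib import Path


def check_my_cnf(path: Path | None = None) -> list[str]:
    """Return a list of problems found in ~/.my.cnf (empty if it looks fine)."""
    path = path or Path.home() / ".my.cnf"
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode != 0o600:  # the chmod 600 step above
        problems.append(f"permissions are {oct(mode)}, expected 0o600")
    # interpolation=None so a '%' in a password is not treated specially.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read(path)
    if not parser.has_section("client"):
        problems.append("no [client] section")
    else:
        for key in ("user", "password"):
            if not parser.has_option("client", key):
                problems.append(f"[client] is missing {key}")
    return problems
```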

make commands explained

make command        Description
help                Show this help
clean               Remove build artifacts
clean-all           Remove the virtual environment and build artifacts
venv                Create/update the project's virtual environment. To activate, run: source activate
test                Run unit tests
dist                Create the Python package and run unit tests
publish             Create/publish the Python package to the test PyPI repo
show-databases      Show existing databases (also tests the database connection)
create-databases    Create databases
drop-databases      Drop databases
get-corpora         Download some books from Project Gutenberg to corpora

You should be good to use the apps!
