ArxivBot is a Python script that automates the process of staying up-to-date with the latest research in your field. It fetches papers from arxiv.org, filters them based on your interests, summarizes them using a language model, and sends you a daily email with the results.
- Fetches the latest research papers from arxiv.org.
- Filters papers based on predefined research interests.
- Uses a language model to generate summaries of relevant papers.
- Extracts the teaser figure from relevant papers.
- Sends an email with the summaries, teaser figures and attached logs.
- Easy to configure and automate with cron.
- Conda (Miniconda or Anaconda)
- Clone the repository:
git clone https://github.com/yourusername/arxivbot.git
cd arxivbot
- Create and activate the Conda environment:
conda env create -f environment.yml
conda activate arxivbot
- Create a Mailgun account or set up an email server:
Mailgun offers a free plan that covers a small number of emails per month, but you can also use your own email server.
  - Create a Mailgun account:
    - Go to the Mailgun website and sign up for an account.
    - Follow the instructions to verify your email address and set up your domain.
  - Get your API key:
    - Once your account is set up, log in to the Mailgun dashboard.
    - Navigate to the "API" section.
    - Copy the "Private API Key" and use it as EMAIL_PASSWORD in your .env file.
    - Use your Mailgun domain and email address as EMAIL_FROM.
- Generate a token for Gemini Flash 1.5:
Gemini Flash is recommended for its large context window and free tier, but you can use other language models. To do so, add them to the LLMs directory with the same interface the existing models implement.
  - Go to Google AI Studio and sign up for an account if you don't already have one.
  - Press "Get API Key".
  - Create a new API key.
  - Copy the API key and use it as GENAI_API_TOKEN in your .env file.
- Set up environment variables: fill in the token and email server details in the .env_template file and rename it to .env.
EMAIL_FROM=<your Mailgun or email server email>
EMAIL_TO=recipient@example.com
EMAIL_SMTP_SERVER=<smtp.mailgun.com or email server smtp>
EMAIL_SMTP_PORT=587
EMAIL_USERNAME=<your Mailgun or email server email>
EMAIL_PASSWORD=<your Mailgun or email server password>
GENAI_API_TOKEN=<your Gemini API token>
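A minimal sketch of how these settings could be loaded and checked before the script does any work. It assumes python-dotenv (listed in the dependencies) reads the .env file; the `load_settings` helper and the `REQUIRED_VARS` tuple are hypothetical and not part of ArxivBot.

```python
import os

# Hypothetical helper: the names mirror the .env example above.
REQUIRED_VARS = (
    "EMAIL_FROM",
    "EMAIL_TO",
    "EMAIL_SMTP_SERVER",
    "EMAIL_SMTP_PORT",
    "EMAIL_USERNAME",
    "EMAIL_PASSWORD",
    "GENAI_API_TOKEN",
)


def load_settings(env=None):
    """Return the required settings, raising RuntimeError if any are missing or empty."""
    if env is None:
        try:
            # python-dotenv copies the values from .env into os.environ
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            pass  # fall back to variables already set in the environment
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # the SMTP port is stored as text in .env, so convert it for smtplib
    settings["EMAIL_SMTP_PORT"] = int(settings["EMAIL_SMTP_PORT"])
    return settings
```

Failing early with a list of every missing name is friendlier than a KeyError halfway through a cron run.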
The configuration is managed in config.py. The defaults target the Computer Vision section; to follow a different section, update the arxiv_section variable.
- arxiv_section: Section URL for fetching papers from arxiv.org (default: Computer Vision and Pattern Recognition, cs.CV).
- interests: List of research interests to filter the papers.
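To illustrate what these two settings are for, here is a hypothetical config.py paired with a simple keyword filter. The listing URL and the interest strings are examples only, and case-insensitive substring matching is a stand-in: ArxivBot's actual filtering logic may differ.

```python
# Hypothetical config.py values (example URL and interests, not the project's defaults).
arxiv_section = "https://arxiv.org/list/cs.CV/new"
interests = ["3D reconstruction", "diffusion models", "video generation"]


def matches_interests(title, abstract, interests):
    """Return the interests mentioned in a paper's title or abstract (case-insensitive)."""
    text = f"{title} {abstract}".lower()
    return [topic for topic in interests if topic.lower() in text]


def filter_papers(papers, interests):
    """Keep papers (dicts with 'title' and 'abstract') that match at least one interest."""
    return [p for p in papers if matches_interests(p["title"], p["abstract"], interests)]
```

Returning the matched topics rather than a boolean makes it easy to show in the email why a paper was selected.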
To run the script, simply execute:
python main.py
The script will fetch the latest papers, filter them, generate summaries, and send an email with the results.
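The summarization step is where a language model plugs in. The installation notes say other models can be added to the LLMs directory with a proper interface; the sketch below shows one possible shape for such an interface. The `LLM`, `GeminiLLM`, `TruncatingLLM` and `summarize_all` names, the prompt, and the "gemini-1.5-flash" model name are assumptions. `genai.configure`, `genai.GenerativeModel` and `generate_content` are public calls of the google-generativeai package.

```python
from abc import ABC, abstractmethod


class LLM(ABC):
    """Hypothetical interface that every model wrapper in the LLMs directory implements."""

    @abstractmethod
    def summarize(self, text: str) -> str:
        """Return a short summary of the given paper text."""


class GeminiLLM(LLM):
    """Wrapper around google-generativeai (needs network access and a valid token)."""

    def __init__(self, api_token: str, model_name: str = "gemini-1.5-flash"):
        # imported here so the rest of the module works without the package installed
        import google.generativeai as genai

        genai.configure(api_key=api_token)
        self._model = genai.GenerativeModel(model_name)

    def summarize(self, text: str) -> str:
        prompt = "Summarize the following paper in three sentences:\n\n" + text
        return self._model.generate_content(prompt).text


class TruncatingLLM(LLM):
    """Offline stand-in for testing: 'summarizes' by keeping the first sentence."""

    def summarize(self, text: str) -> str:
        return text.split(". ")[0].strip()


def summarize_all(llm: LLM, papers):
    """Map each paper title to the model's summary of its abstract."""
    return {p["title"]: llm.summarize(p["abstract"]) for p in papers}
```

Because the pipeline only depends on `LLM.summarize`, swapping Gemini for another model means adding one new subclass.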
The script generates a log file in the logs directory. This log file is also attached to the email sent by the script.
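A minimal sketch of building such an email with Python's standard library, assuming SMTP with STARTTLS on port 587 as in the .env example. The subject line, the plain-text fallback, and the `build_report` and `send_report` helpers are hypothetical.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_report(sender, recipient, html_body, log_path):
    """Create the daily email: an HTML body plus the log file as a text attachment."""
    msg = EmailMessage()
    msg["Subject"] = "ArxivBot daily digest"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("This digest is best viewed in an HTML-capable mail client.")
    msg.add_alternative(html_body, subtype="html")
    log = Path(log_path)
    # a str payload becomes a text/plain attachment named after the log file
    msg.add_attachment(log.read_text(encoding="utf-8"), filename=log.name)
    return msg


def send_report(msg, server, port, username, password):
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(server, port) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

Keeping message construction separate from sending lets the email be inspected or tested without contacting a mail server.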
You can automate the execution of the script using cron jobs.
- Create a runme.sh script with the following content:
#!/bin/bash
source /path/to/your/anaconda/bin/activate arxivbot
cd /path/to/your/arxivbot
python /path/to/your/arxivbot/main.py
Replace /path/to/your/arxivbot and /path/to/your/anaconda with the correct paths, then make the script executable:
chmod +x runme.sh
- Open your crontab file:
crontab -e
- Add a cron job that runs runme.sh at the desired frequency. For example, to run it every day at 9 AM, Monday through Friday:
0 9 * * 1-5 /path/to/your/arxivbot/runme.sh
Make sure to replace /path/to/your/arxivbot/ with the actual path to your script.
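The five cron fields are minute, hour, day of month, month, and day of week, so `0 9 * * 1-5` means 09:00 on days 1 through 5 (Monday to Friday, where 0 is Sunday). The hypothetical helper below computes the next matching run time, which can help sanity-check the schedule; cron itself does not need it.

```python
from datetime import datetime, timedelta


def next_weekday_run(now: datetime, hour: int = 9, minute: int = 0) -> datetime:
    """Next time strictly after `now` matching '0 9 * * 1-5' (09:00, Monday to Friday)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    # Python's weekday() numbers Monday as 0 and Sunday as 6, so 0-4 matches cron's 1-5.
    while candidate.weekday() > 4:
        candidate += timedelta(days=1)
    return candidate
```

For example, a call on a Friday at 10:00 returns the following Monday at 09:00, since the weekend is skipped.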
The following Windows Task Scheduler setup has not been tested.
- Open Task Scheduler and create a new basic task.
- Follow the wizard to set the trigger (e.g., daily at a specific time). To run the task only on weekdays, choose a weekly trigger and check Monday through Friday.
- For the action, choose "Start a Program" and browse to your runme.bat file (you'll need to create a batch file that activates the Conda environment and runs the script). Example runme.bat content:
@echo off
cd C:\path\to\your\arxivbot
call conda activate arxivbot
python main.py
The dependencies are listed in the environment.yml file and include:
- python=3.9
- requests
- beautifulsoup4
- google-generativeai
- python-dotenv
- PyMuPDF
- numpy
- opencv
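After creating the environment, the sketch below offers a quick way to confirm that these packages are importable. Several import names differ from the package names: beautifulsoup4 imports as `bs4`, python-dotenv as `dotenv`, PyMuPDF as `fitz`, and OpenCV as `cv2`. The `missing_modules` helper is hypothetical.

```python
import importlib.util

# Import names for the packages listed in environment.yml.
MODULES = ["requests", "bs4", "google.generativeai", "dotenv", "fitz", "numpy", "cv2"]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # raised when a parent package (e.g. "google") is itself missing
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(MODULES)
    print("All dependencies found." if not absent else "Missing: " + ", ".join(absent))
```

Run it inside the activated arxivbot environment. `find_spec` locates modules without fully importing them, so the check stays fast even for heavy packages like OpenCV.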
Feel free to open issues or submit pull requests for improvements or bug fixes.
This project is licensed under the MIT License.