wzk1015/Arxiv-Assistant

Arxiv Assistant

Automatically fetch daily arxiv papers, filter with GPT, and send you an email.

The program checks for new papers every 6 hours, uses GPT to filter the papers related to your keywords, and emails the result to you. It also saves the papers as JSON files under papers/.


Quick Start

  1. Ensure that your network can reach the arXiv API and the OpenAI API

  2. Install packages

    pip install openai arxiv markdown2
  3. Save your OpenAI API key in openai_key.txt. If you don't want to use the GPT filter or don't have an OpenAI API key, set gpt_filter=False when initializing ArxivAssistant.

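  4. Set up SMTP in your email (Instructions) and save the related information in mail_info.json. The file is read with json.load, so it must be plain JSON without comments. mail_host is the SMTP host, mail_user is your email address, and mail_pass is the SMTP password (e.g. the authorization code issued by your mail provider). An example:

    {
        "mail_host": "smtp.qq.com",
        "mail_user": "xxx@qq.com",
        "mail_pass": "xxxxxxxx"
    }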
  5. Run the routine. Keep the program running continuously, e.g. in a tmux session on a server:

    import openai
    import json
    
    from assistant import ArxivAssistant
    
    with open("openai_key.txt") as f:
        openai.api_key = f.read().strip()  # drop the trailing newline, if any
        
    with open("mail_info.json") as f:
        mail_info = json.load(f)
    
    assistant = ArxivAssistant(
        mail_host=mail_info["mail_host"],
        mail_user=mail_info["mail_user"],
        mail_pass=mail_info["mail_pass"],
        
        categories=['cs.CV', 'cs.CL', 'cs.LG', 'cs.AI'], # arXiv categories you are interested in. See https://arxiv.org/category_taxonomy
        keywords=['large language model', 'LLM'], # keywords describing your research interests
        negative_keywords=['medical'] # (Optional) keywords describing papers you don't want to read
    )
    
    assistant.run_routine()
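
A malformed mail_info.json is an easy mistake in steps 4 and 5. The sketch below shows one way to load the file and fail early with a readable message. The helper name load_mail_info and the REQUIRED_KEYS tuple are hypothetical and not part of this repository; only the three key names come from the example above.

```python
import json

# Key names taken from the mail_info.json example above.
REQUIRED_KEYS = ("mail_host", "mail_user", "mail_pass")


def load_mail_info(path="mail_info.json"):
    """Hypothetical helper: load SMTP settings and report problems clearly."""
    with open(path, encoding="utf-8") as f:
        try:
            info = json.load(f)
        except json.JSONDecodeError as err:
            # Strict JSON has no comment syntax, so a stray // ends up here.
            raise ValueError(f"{path} is not valid JSON: {err}") from err
    if not isinstance(info, dict):
        raise ValueError(f"{path} must contain a JSON object")
    # Treat empty strings the same as missing keys.
    missing = [key for key in REQUIRED_KEYS if not info.get(key)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return info
```

The returned dictionary can be used in place of the plain json.load result in the script above.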

Customize

  1. Configure the number of papers:

    1. max_results_per_category: If a category has more papers than this on a given day, only the first max_results_per_category papers are kept. Defaults to 500.

    2. max_papers_per_query: To avoid exceeding the context length of GPT, the papers are divided into groups of at most this many papers. Defaults to 50.

    3. num_filtered_papers: The maximum number of papers GPT outputs for each group. Defaults to 10.

  2. Configure routine interval: Set routine_interval_hours. Defaults to 6.

Note: arXiv publishes new papers at 20:00 Eastern Time every Sunday to Thursday. When the interval is shorter than 24 hours, the routine succeeds only once a day. When the interval is longer than 24 hours, only the most recent publish date (yesterday or last Thursday) is considered.
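
To make the schedule in the note concrete, here is a minimal sketch that computes the most recent publish date from a given time. It is not the logic used in assistant.py: the function name last_publish_date is hypothetical, the input is assumed to be a naive datetime already expressed in US Eastern time, and holidays are ignored.

```python
from datetime import datetime, timedelta

ANNOUNCE_HOUR = 20          # 20:00, per the note above
SKIPPED_WEEKDAYS = {4, 5}   # Friday and Saturday (Monday is 0)


def last_publish_date(now_eastern):
    """Return the date of the latest announcement at or before now_eastern."""
    day = now_eastern.date()
    # Before 20:00 today's batch is not out yet, so start from yesterday.
    if now_eastern.hour < ANNOUNCE_HOUR:
        day -= timedelta(days=1)
    # Nothing is published on Friday or Saturday: walk back to Thursday.
    while day.weekday() in SKIPPED_WEEKDAYS:
        day -= timedelta(days=1)
    return day


# A Saturday afternoon maps back to the preceding Thursday.
print(last_publish_date(datetime(2023, 12, 23, 12, 0)))
```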

  3. Configure GPT:

    1. temperature: the sampling temperature of the GPT output. Defaults to 0.7.
    2. gpt_model: the GPT model to use. Check the model's context length and adjust max_papers_per_query accordingly. Defaults to gpt-3.5-turbo-16k.
  4. Change email receivers: mail_receivers is a list of the receivers' email addresses. Defaults to the sender's address.

  5. Customize the GPT prompt and the email content: Update the strings in prompts.py and single_paper_info in assistant.py. The email content follows Markdown format.
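
The interplay of max_papers_per_query and num_filtered_papers can be pictured with a short sketch. It only illustrates the grouping described above and is not the code in assistant.py; the function names split_into_groups and max_emailed_papers are hypothetical, and the defaults are the documented ones.

```python
def split_into_groups(papers, max_papers_per_query=50):
    """Split the paper list into consecutive groups of at most max_papers_per_query."""
    return [
        papers[i:i + max_papers_per_query]
        for i in range(0, len(papers), max_papers_per_query)
    ]


def max_emailed_papers(num_papers, max_papers_per_query=50, num_filtered_papers=10):
    """Upper bound on kept papers: one GPT query per group, each capped at num_filtered_papers."""
    num_groups = -(-num_papers // max_papers_per_query)  # ceiling division
    return num_groups * num_filtered_papers


# Placeholder titles stand in for real arXiv results.
papers = [f"paper {i}" for i in range(120)]
groups = split_into_groups(papers)
print([len(g) for g in groups])          # sizes of the three groups
print(max_emailed_papers(len(papers)))   # at most 3 * 10 papers survive the filter
```

With a model that has a shorter context window, lowering max_papers_per_query produces more, smaller groups, which also raises this upper bound unless num_filtered_papers is reduced as well.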

Acknowledgment

This repository is partially built on wbs2788/Arxiv-Daily.
