mach1el/openproject-crawler

openproject-crawler

OpenProject Go Python Selenium

This tool collects data from OpenProject. It uses the OpenProject API where the API is available and falls back to Selenium web scraping for data the API doesn't expose. Scraping runs asynchronously to make it faster and more stable.

Installation

To install the required dependencies, use:

pip install -r requirements.txt

Important variables

  • username: The value depends on whether you collect data from the API or from the web portal. For the API, the value should be apikey (see the OpenProject API authentication documentation for more information). For the portal, the value should be the username you use to log in to the web portal.
  • password: Like the username, this depends on the source. For the API, it must be the access token (see the same authentication note). For the portal, it is the password for your portal login.
  • api_url: The value should be https://myopenproject.example/api/v3 (it must end with /api/v3)
  • portal_url: The value should be https://myopenproject.example (no URI path needed)
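
As a minimal sketch of how these values fit together for the API case, the snippet below builds an HTTP Basic Authorization header from the literal username apikey and an access token. The helper name build_api_auth_header and the token value are hypothetical and are not part of this repository; the crawler may build its requests differently.

```python
import base64

# Placeholder values taken from the variable descriptions above.
API_URL = "https://myopenproject.example/api/v3"   # must end with /api/v3
PORTAL_URL = "https://myopenproject.example"       # no URI path


def build_api_auth_header(token: str, username: str = "apikey") -> dict:
    """Hypothetical helper: encode 'apikey:<token>' as an HTTP Basic auth header."""
    raw = f"{username}:{token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# "my-access-token" is a made-up token used only for illustration.
headers = build_api_auth_header("my-access-token")
```

For the web portal, the same two variables would instead carry the regular login name and password, and portal_url replaces api_url.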

Usage example (Python)

As an example of how to use this module, the script utils.py scrapes data from a specific project. Everything runs asynchronously except DataParser, which uses a ThreadPool instead. Hence, you need to call it in an asynchronous way with async/await syntax. The main pieces are:

  • Crawler class: initializes the crawler and gets data such as project IDs, a project's task IDs, and each task's activities
    • function get_projects_id -> Get all available projects and their IDs
    • function get_tasks_id -> Get all tasks that belong to the project "my_project", using filter parameters in the HTTP request
    • function get_tasks_activities_data -> Scrape data from work_packages/{id}
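
The flow above can be sketched as follows. This is only an illustration of the async/await pattern combined with a thread pool: StubCrawler and parse are hypothetical stand-ins that return canned data, and the argument lists of the three functions are assumptions, so check utils.py for the real signatures.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class StubCrawler:
    """Hypothetical stand-in for the Crawler class; returns canned data."""

    async def get_projects_id(self):
        await asyncio.sleep(0)  # placeholder for a real HTTP request
        return {"1": "mainproject", "2": "demoproject"}

    async def get_tasks_id(self, project_id):
        await asyncio.sleep(0)
        return [45, 278, 13, 225]

    async def get_tasks_activities_data(self, task_id):
        await asyncio.sleep(0)
        return {"Task name": f"task {task_id}", "Task activities": []}


def parse(raw):
    """Stand-in for the blocking DataParser work that runs in a thread pool."""
    return raw["Task name"]


async def main():
    crawler = StubCrawler()
    projects = await crawler.get_projects_id()
    # Look up the ID of the project we want by its name.
    project_id = next(pid for pid, name in projects.items() if name == "mainproject")
    task_ids = await crawler.get_tasks_id(project_id)
    # Fetch all task activities concurrently; gather keeps the input order.
    raw = await asyncio.gather(
        *(crawler.get_tasks_activities_data(t) for t in task_ids)
    )
    # Hand the synchronous parsing step to a thread pool.
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(
            *(loop.run_in_executor(pool, parse, r) for r in raw)
        )


result = asyncio.run(main())
```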

Setup Python venv to use the tool

  • Navigate to the project source:
cd /path/to/openproject-crawler/src/python
  • Create a virtual environment:
python -m venv venv
  • Activate the environment:

    • On Windows

      .\venv\Scripts\activate
      
    • On Unix or macOS

      source venv/bin/activate
  • Install the required dependencies:

pip install -e .

Usage example (Go)

Detailed usage is shown in main.go and follows the same process as the Python version. The flow is:

Get projects ID -> Get tasks ID of specific project -> Get tasks activities of specific project

go run main.go

Data structure

  • Projects ID:
{
  "1" : "mainproject",
  "2" : "demoproject"
}
  • Tasks ID:

    • Golang data:

      [45 278 13 225]
    • Python data:

      [45, 278, 13, 225]
  • Tasks activities:

{
  "Task name": "Scraping data from openproject",
  "Task info": {
    "Project": "Data collection",
    "ID": "2",
    "Type": "Task",
    "Priority": "Normal",
    "Create date": "2024-06-09 15:12:26",
    "End Date": "2024-06-19 16:44:31",
    "Duration": "10 days"
  },
  "Task activities": [
    {
      "Datetime": "2024-06-19 16:44:31",
      "Action": [
        "Status changed from In progress to Closed"
      ]
    }
  ]
}
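
As a small sketch of how a consumer might work with the task activities structure, the snippet below loads the sample record shown above, recomputes the elapsed time from the two date fields, and collects status-change actions. The timestamp format string is inferred from the sample values and is an assumption, not something the tool documents.

```python
import json
from datetime import datetime

# The sample record from the "Tasks activities" example above.
record = json.loads("""
{
  "Task name": "Scraping data from openproject",
  "Task info": {
    "Project": "Data collection",
    "ID": "2",
    "Type": "Task",
    "Priority": "Normal",
    "Create date": "2024-06-09 15:12:26",
    "End Date": "2024-06-19 16:44:31",
    "Duration": "10 days"
  },
  "Task activities": [
    {
      "Datetime": "2024-06-19 16:44:31",
      "Action": ["Status changed from In progress to Closed"]
    }
  ]
}
""")

# Assumed timestamp layout, inferred from the sample values.
FMT = "%Y-%m-%d %H:%M:%S"

info = record["Task info"]
created = datetime.strptime(info["Create date"], FMT)
ended = datetime.strptime(info["End Date"], FMT)
elapsed = ended - created  # a timedelta; elapsed.days is the whole-day part

# Flatten every activity entry and keep only the status-change actions.
status_changes = [
    action
    for entry in record["Task activities"]
    for action in entry["Action"]
    if action.startswith("Status changed")
]
```

Note that the keys contain spaces and mixed capitalization ("Create date", "End Date"), so they have to be matched exactly as they appear in the output.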

License

GitHub License
