This project automates updating the data charts on Wikipedia's page about the Covid-19 pandemic in Portugal.
- Python 3
- pandas
- Beautiful Soup
- pdfminer
- requests
- pytest
- Clone this repository and `cd` into it.
- Install the required packages with `pip3 install -r requirements.txt`.
- Run `tests.py` to check whether the script works with the format of the latest DGS report.
- If all tests pass, run the command `python3 get_data.py`.
- After the script ends, go to the folder `output/` and:
  - copy the content of the file `PortugalCovid-19-Statistics.txt` into the English Wikipedia page about Portugal's Covid-19 pandemic
  - copy the content of the file
- To contribute to the Portuguese page, go to the folder `output/portuguese` and copy the content of the files `GraphsCasesByAgeAndGender.txt` and `TimelineGraphs.txt` into the sections Casos por idade e sexo and Gráfico da evolução dos casos, respectively.
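
The copy step above can be made a little less manual. The following is a minimal sketch, not part of the repository: the helper `read_outputs` is hypothetical, it assumes the files are UTF-8 text, and it only uses the file names mentioned in the steps above. It collects whichever of the generated files exist so their content can be pasted into the corresponding Wikipedia sections.

```python
from pathlib import Path

# File names taken from the steps above; the helper itself is hypothetical.
OUTPUT_FILES = [
    "PortugalCovid-19-Statistics.txt",
    "portuguese/GraphsCasesByAgeAndGender.txt",
    "portuguese/TimelineGraphs.txt",
]


def read_outputs(output_dir="output"):
    """Return {relative file name: text} for each generated file that exists."""
    base = Path(output_dir)
    contents = {}
    for name in OUTPUT_FILES:
        path = base / name
        if path.is_file():
            # Assumption: the script writes its output as UTF-8 text.
            contents[name] = path.read_text(encoding="utf-8")
    return contents


if __name__ == "__main__":
    # Print every available file under a small banner, ready to copy.
    for name, text in read_outputs().items():
        print(f"===== {name} =====")
        print(text)
```

Files that are missing are skipped rather than raising an error, so the helper can be run even if only part of the output was generated.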
Thanks to hagnat for the inspiration (he did something similar for the Brazilian Wikipedia page).
The DGS report format changed from August 16th onwards, so a new method for parsing the PDF files and retrieving the data had to be implemented; it was completed on August 22nd. To gather data from reports that use the older format, the branch `old-format-16-08-20` was created.
The format has since changed again; the script that handled the previous format is preserved on `format-before-20-12-20`.
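
To make the choice between these versions concrete, here is a small sketch that maps a report date to the name that should be checked out. It is an illustration only: the function `branch_for_report` is hypothetical, the cut-off dates are inferred from the names above (read as day-month-year, with the year assumed to be 2020), and whether a report published exactly on a cut-off day uses the old or the new format is an assumption.

```python
from datetime import date

# Cut-off dates inferred from the names above (day-month-year, assumed 2020).
# A report dated strictly before a cut-off is assumed to use that older format.
FORMAT_BRANCHES = [
    (date(2020, 8, 16), "old-format-16-08-20"),
    (date(2020, 12, 20), "format-before-20-12-20"),
]


def branch_for_report(report_date):
    """Return the name assumed to handle a report from report_date,
    or None when the current code should apply."""
    for cutoff, branch in FORMAT_BRANCHES:
        if report_date < cutoff:
            return branch
    return None
```

For example, `branch_for_report(date(2020, 9, 1))` returns `"format-before-20-12-20"`, while a date after the last cut-off returns `None`, meaning the current script is expected to work.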
Blog post explaining the project a little bit further.
- Keep track of more information over time in the .csv file, such as the evolution of cases and deaths by gender, age, and location.
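
One possible way to approach this item is sketched below. It is not how the script currently writes its .csv file: the helper `append_daily_row` and any column names passed to it are hypothetical, and the standard `csv` module is used here only to keep the example dependency-free. The idea is to append one row per report, with one column per tracked breakdown, writing the header only when the file is new.

```python
import csv
from pathlib import Path


def append_daily_row(csv_path, row, fieldnames):
    """Append one row to the CSV, writing the header first if the file
    does not exist yet or is empty."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        # Columns missing from `row` are left empty; unknown keys raise ValueError.
        writer.writerow(row)
```

A call might look like `append_daily_row("history.csv", {"date": "2020-12-21", "cases_female_0_9": 10}, ["date", "cases_female_0_9", "cases_male_0_9"])`, where the file name and columns are placeholders chosen for illustration.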