Colombia issue#109
|
@eduardocorrearaujo The best solution is to use the The reason you are getting repetitions for To do that, you need to create the table in SQL before starting to insert data:
CREATE TABLE colombia.positive_cases_covid_d (
    id_ BIGSERIAL PRIMARY KEY,
    fecha_inicio_sintomas TIMESTAMP WITHOUT TIME ZONE,
    <other columns>
);
But this can be a bit cumbersome to do for a table with many columns.
ALTER TABLE colombia.positive_cases_covid_d ADD COLUMN id_ BIGSERIAL PRIMARY KEY;
Which is much simpler, but you need to be sure that you don't have any duplicates in the database at this point. After you do this, you can continue to append rows to your table, and PostgreSQL will increment id_ automatically for you. |
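The auto-increment behaviour described above can be sketched in Python. This is only an illustration under assumptions: it uses the standard-library sqlite3 module with an in-memory database instead of PostgreSQL, so `INTEGER PRIMARY KEY AUTOINCREMENT` stands in for `BIGSERIAL PRIMARY KEY`, and the `append_chunk` helper and the sample dates are hypothetical.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the PostgreSQL database;
# INTEGER PRIMARY KEY AUTOINCREMENT plays the role of BIGSERIAL PRIMARY KEY.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE positive_cases_covid_d (
        id_ INTEGER PRIMARY KEY AUTOINCREMENT,
        fecha_inicio_sintomas TEXT
    )
    """
)


def append_chunk(connection, dates):
    """Hypothetical helper: append one chunk of rows without supplying id_."""
    connection.executemany(
        "INSERT INTO positive_cases_covid_d (fecha_inicio_sintomas) VALUES (?)",
        [(d,) for d in dates],
    )
    connection.commit()


# Two chunks appended one after the other; id_ is never set by hand.
append_chunk(conn, ["2020-03-01", "2020-03-02"])
append_chunk(conn, ["2020-03-03", "2020-03-04"])

ids = [
    row[0]
    for row in conn.execute("SELECT id_ FROM positive_cases_covid_d ORDER BY id_")
]
print(ids)
```

Because the database assigns id_ itself, the values keep increasing across chunks instead of restarting with each chunk.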
OK, thank you for the great explanation. But I still have a question about it. Don't I need to define the |
|
@fccoelho, I made a commit adding the line: Because these typing errors were interfering with the Colombia dashboard, I would like to discuss whether I should replace the values using pandas (as I did), or whether it would be faster to write a SQL query that makes the changes after we upload the new data to the database. |
No, after you create the |
I think you can do this "replace" on every chunk, it shouldn't slow things down too much. |
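A minimal sketch of doing the "replace" on every chunk with pandas. Everything specific here is an assumption for illustration: the column name `estado`, the sample rows, the `TYPO_FIXES` mapping and the `clean_chunks` helper are hypothetical, since the actual typos and columns are not shown in this thread.

```python
import io

import pandas as pd

# Hypothetical sample data and typo map; the real column names and
# misspellings in the Colombia dataset are not shown in this thread.
CSV_TEXT = """id_de_caso,estado
1,Leve
2,leve
3,LEVE
4,Grave
5,grave
"""
TYPO_FIXES = {"leve": "Leve", "LEVE": "Leve", "grave": "Grave"}


def clean_chunks(source, chunksize=2):
    """Yield chunks of the CSV with known typos replaced in 'estado'."""
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # The replacement only touches the rows of the current chunk.
        chunk["estado"] = chunk["estado"].replace(TYPO_FIXES)
        yield chunk


# In the real script each cleaned chunk would be uploaded to the database;
# here the chunks are concatenated only to inspect the result.
cleaned = pd.concat(clean_chunks(io.StringIO(CSV_TEXT)), ignore_index=True)
print(sorted(set(cleaned["estado"])))
```

The mapping is applied to one chunk at a time, so the extra cost grows with the chunk size and not with the size of the whole file.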
|
@eduardocorrearaujo I am not super familiar with the code here, but in order to test it, I think it depends on #103 |
|
@eduardocorrearaujo let's try to close this PR |
|
@eduardocorrearaujo could you rebase your branch pls? the CI should work now 🤞 |
|
rebased 🤞 |
|
thanks @eduardocorrearaujo for working on that! |
This PR aims to solve the problem with the Colombia scripts reported in #103.
The problem was using the column id_ as the unique constraint. I did this following the foph.py script, but since we are reading the Colombia data in chunks, the id_ values repeat for each chunk, which was causing an error. So I changed it to the column id_de_caso, which must be unique. To use id_de_caso as the unique constraint I also needed to run this code in the SQL editor:
In the future, when we migrate the scripts to Apache Airflow, I think it would be great for us to write a good tutorial on how to create the scripts that upload data, to make it easier for other people to help us.
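As a rough illustration of why id_de_caso works as the unique constraint when uploading in chunks, here is a sketch under stated assumptions. It is not the SQL that was run for this PR (that code is not shown above): it uses an in-memory SQLite database from the Python standard library instead of PostgreSQL, and the index name, the `estado` column, the `upload_chunk` helper and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE positive_cases_covid_d (
        id_de_caso INTEGER,
        estado TEXT
    )
    """
)
# The unique index plays the role of the unique constraint on id_de_caso.
conn.execute(
    "CREATE UNIQUE INDEX idx_id_de_caso ON positive_cases_covid_d (id_de_caso)"
)


def upload_chunk(connection, rows):
    """Hypothetical helper: insert a chunk, skipping cases already stored."""
    connection.executemany(
        "INSERT OR IGNORE INTO positive_cases_covid_d (id_de_caso, estado) "
        "VALUES (?, ?)",
        rows,
    )
    connection.commit()


chunks = [
    [(1, "Leve"), (2, "Grave")],
    [(2, "Grave"), (3, "Leve")],  # case 2 appears again in the second chunk
]
for chunk in chunks:
    upload_chunk(conn, chunk)

case_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id_de_caso FROM positive_cases_covid_d ORDER BY id_de_caso"
    )
]
print(case_ids)
```

Unlike a per-chunk row counter, id_de_caso identifies the same case in every chunk, so a repeated case is skipped and distinct cases never collide.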