pg_dump does not work for single tables #112
I think you will find your answer in this comment of an old PR: #102 (comment). Let me know if that helps.
Yes, I know that way of dumping a table to CSV format. Unfortunately, I have a specific use case where I need the same file as the one generated with `pg_dump`. Of course I could generate the file from the CSV myself.
I am guessing you are also familiar with our `pg_dump` procedure for backup/restore of the entire db (as opposed to one hypertable): http://docs.timescale.com/api#backup. The reason that `pg_dump` doesn't just work for a single hypertable is that `pg_dump` does not dump the data from inherited tables when processing parent tables. So, if you just dump the hypertable (which is a parent table of a bunch of chunks), you won't get the data in the underlying chunks. We may create a wrapper utility for this in the future. In the meantime, maybe we can help you make this work with `COPY` instead of `INSERT INTO`? What issue are you having using `COPY`?
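For reference, a minimal sketch of what a `COPY`-based export of a hypertable could look like from psycopg2 — the connection credentials, the `my_table` name, and the output path are placeholders, not part of the project:

```python
import psycopg2

con = psycopg2.connect(...)  # fill in your connection credentials
cur = con.cursor()

# COPY (SELECT ...) runs a normal query, which scans the hypertable and
# all of its underlying chunks, unlike pg_dump -t on the parent table
with open("/my/path/my_table.csv", "w") as f:
    cur.copy_expert("COPY (SELECT * FROM my_table) TO STDOUT WITH CSV HEADER", f)

con.close()
```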
Ok, thank you. I will write a little tool to convert the CSV to `INSERT INTO` statements. I will try to summarize our problem: in the backend, we use Python and psycopg2. I hope I was clear :)
Oh ok, that's clearer. May I suggest an approach like this instead? https://www.postgresql.org/message-id/3a0028490809301807j59498370m1442d8f5867e9668@mail.gmail.com That should become even easier once we get #100 resolved (we're working on that in the current sprint).
Just to be clear, the approach I (and the message above) am suggesting is to use psycopg2 and `copy_from` to copy data from CSV into a temporary table, and then do the "conflict resolution" in SQL while copying the data from the temporary table into the hypertable. In the message I linked to, they did it through a series of SQL commands, but you can also do this in a PL/pgSQL function that copies data row by row and catches integrity errors inside the function. I think this approach using a temporary table will be much faster and easier than inserting row by row on the client side. Please let me know if any of this was unclear or if you have any other questions.
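To make that concrete, here is a minimal sketch of the staging-table approach; the hypertable `sensor_data`, its unique key on `(time, device_id)`, and the CSV path are hypothetical placeholders for your actual schema:

```python
import psycopg2

con = psycopg2.connect(...)  # fill in your connection credentials
cur = con.cursor()

# stage the CSV in a temporary table shaped like the hypertable
cur.execute("CREATE TEMP TABLE staging (LIKE sensor_data)")
with open("site_dump.csv") as f:
    cur.copy_from(f, "staging", sep=",")  # assumes simple, unquoted CSV

# conflict resolution in SQL: only move rows whose key is not already present
cur.execute("""
    INSERT INTO sensor_data
    SELECT * FROM staging s
    WHERE NOT EXISTS (
        SELECT 1 FROM sensor_data d
        WHERE d.time = s.time AND d.device_id = s.device_id
    )
""")
con.commit()
con.close()
```

Doing the deduplication in one set-based statement keeps the round trips down to a single `COPY` plus one `INSERT`, which is typically much faster than per-row error handling on the client.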
Also, this no longer seems to be the immediate issue, but the latest version of our docs (just published) provides more detail about dump/restore for single hypertables vs. the entire db.
So, let me summarize (maybe it will help others, because this is a problem that will often be encountered with IoT, I think).

**Problem**

There are data from multiple sites, each one with its own database. There is also a backend, where all data from all sites are stored. How can we synchronize data every X minutes from each site to the backend, knowing that some data may be updated?

**First solution**

Dump data to SQL (`INSERT INTO`) format on each site, then send it to the backend. On the backend, execute each line one by one. If there is an integrity error, it means we are trying to insert a key that is already present, so we do not need to insert the data but to update it. Indeed, the data may have been updated on site.

In practice, we use Python for our backend, with psycopg2 to communicate with our PostgreSQL, which is boosted for time-series data with TimescaleDB. We have a function that transforms an `INSERT` into an `UPDATE` (by @lucaslandry):

```python
import re

def insert_to_update(test_string, id_name):
    # parse the table name, column list and value list out of the INSERT
    regex = r"INSERT INTO (?P<table>\w+) \((?P<columns>.+)\) VALUES \((?P<values>.+)\)"
    match = re.search(regex, test_string)
    if match is not None:
        # naive comma split: assumes no embedded commas in the values
        columns = match.group('columns').replace(' ', '').split(',')
        values = match.group('values').replace(' ', '').split(',')
        update_command = "UPDATE {} SET ({}) = ({}) WHERE {} = {} ;".format(
            match.group('table'),
            match.group('columns'),
            match.group('values'),
            id_name,
            values[columns.index(id_name)])
        return update_command
```

Another one uses the psycopg2 error to get the name and value of the key that caused the error:

```python
import re

def extract_id(error_string):
    # psycopg2 reports the conflict as:
    #   DETAIL:  Key (<column>)=(<value>) already exists.
    regex = r"DETAIL:\s{2}Key\s\((.+)\)=\((.+)\)\salready\sexists"
    match = re.search(regex, error_string)
    id_name, id_value = match.group(1), match.group(2)
    return id_name, id_value
```

And then we use them this way:

```python
import os
import psycopg2

# add your connection credentials
con = psycopg2.connect(...)
cur = con.cursor()
print("Connection to Postgres Database established.")

# body is the path of the dump file received from a site
filename, extension = os.path.splitext(os.path.basename(body))
with open(body, "r") as my_file:
    for line in my_file:
        if line.startswith('INSERT INTO'):
            try:
                # we try to execute the insert
                cur.execute(line)
                con.commit()
            except psycopg2.IntegrityError as e:
                # if it does not work, we turn that INSERT INTO into an UPDATE;
                # first we roll back the failed transaction
                con.rollback()
                id_name, id_value = extract_id(e.pgerror)
                update_command = insert_to_update(line, id_name)
                cur.execute(update_command)
                con.commit()
con.close()
```

This approach works fine; however, there are several drawbacks:

Issue 1: Updating each row when there is a conflict can be quite inefficient when working on hypertables, because this data does not need to be updated (sensor data, in our case).

Issue 2: Performance could certainly be improved by using PostgreSQL's internal mechanisms.

NB: I also put this post on my website.
We just merged a PR into master with support for UPSERTs, i.e., `INSERT ... ON CONFLICT`. This should be part of the 0.3.0 release, which should go out early next week.
UPSERTs can now be found in the 0.3.0 release. More information is also in the docs: http://docs.timescale.com/api#upsert. Please let us know if this allows you to simplify your import.
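To illustrate, a minimal sketch of how the upsert could replace the client-side catch-and-retry loop above, reusing the same hypothetical `sensor_data` table and staging pattern as in the earlier sketch (the conflict target assumes a unique index on `(time, device_id)`):

```python
import psycopg2

con = psycopg2.connect(...)  # fill in your connection credentials
cur = con.cursor()

cur.execute("CREATE TEMP TABLE staging (LIKE sensor_data)")
with open("site_dump.csv") as f:
    cur.copy_from(f, "staging", sep=",")  # assumes simple, unquoted CSV

# let the database resolve conflicts: DO NOTHING skips rows whose key
# already exists; use DO UPDATE SET ... if the incoming row should win
cur.execute("""
    INSERT INTO sensor_data
    SELECT * FROM staging
    ON CONFLICT (time, device_id) DO NOTHING
""")
con.commit()
con.close()
```

Since the sensor data never needs updating once written, `DO NOTHING` also sidesteps Issue 1 from the summary above.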
@Romathonat Unless there's anything else, I'm going to close out this issue?
Yes, I will work on this part of the project in several weeks. I guess the upsert will help us, so you can close this ;)
I am facing an issue with the `pg_dump` command. I dump the data of my tables with the command:

`pg_dump -a -U user -d database -t table > /my/path/table.bak`

It works perfectly fine with standard tables, but not with hypertables. Is it possible to dump hypertables that way, or do we have to use the `COPY FROM` approach described in the doc? We have a specific use case where we need to use `INSERT INTO` statements, too complicated to explain in a few lines.