
Improve memory handling for remote COPY #2698

Merged
1 commit merged into master on Dec 2, 2020

Conversation

erimatnor (Contributor) commented Dec 1, 2020

This change improves memory usage in the COPY code used for
distributed hypertables. The following issues have been addressed:

  • PGresult objects were not cleared, leading to memory leaks.
  • The caching of chunk connections didn't work since the lookup
    compared ephemeral chunk pointers instead of chunk IDs. The effect
    was that cached chunk connection state was reallocated every time
    instead of being reused. This likely also caused worse performance.

To address these issues, the following changes are made:

  • All PGresult objects are now cleared with PQclear.
  • Lookup for chunk connections now compares chunk IDs instead of chunk
    pointers.
  • The per-tuple memory context is moved to the outer processing
    loop to ensure that everything in the loop is allocated on the
    per-tuple memory context, which is also reset at every iteration of
    the loop.
  • The use of memory contexts is also simplified to have only one
    memory context for state that should survive across resets of the
    per-tuple memory context.

Fixes #2677

@erimatnor erimatnor force-pushed the fix-remote-copy-memory-usage branch 2 times, most recently from 61aef7b to 37612a5, on December 1, 2020 17:35
codecov bot commented Dec 1, 2020

Codecov Report

Merging #2698 (b4cd8be) into master (47da879) will increase coverage by 0.18%.
The diff coverage is 90.88%.


@@            Coverage Diff             @@
##           master    #2698      +/-   ##
==========================================
+ Coverage   90.01%   90.19%   +0.18%     
==========================================
  Files         212      212              
  Lines       34431    34562     +131     
==========================================
+ Hits        30992    31174     +182     
+ Misses       3439     3388      -51     
Impacted Files Coverage Δ
src/bgw/job.c 91.44% <ø> (-0.89%) ⬇️
src/catalog.h 100.00% <ø> (ø)
src/cross_module_fn.c 68.54% <0.00%> (-0.56%) ⬇️
src/scanner.c 96.26% <ø> (ø)
tsl/test/src/remote/async.c 100.00% <ø> (ø)
tsl/src/remote/async.c 83.14% <52.94%> (-3.30%) ⬇️
tsl/src/remote/dist_copy.c 89.92% <88.23%> (-1.45%) ⬇️
src/version.c 86.88% <90.00%> (-0.62%) ⬇️
tsl/src/remote/connection.c 92.12% <92.15%> (+0.06%) ⬆️
src/hypertable.c 87.61% <95.45%> (+0.15%) ⬆️
... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7c76fd4...b4cd8be.

erimatnor (Contributor, Author) commented Dec 2, 2020

For context, I tested this on my local machine with two data nodes using the NYC taxi data set. Without the fixes, memory consumption never stops increasing and inserts are slow. In fact, I had to stop the data load prematurely because it was taking so long (40+ minutes) and memory usage was closing in on 1 GB.

This is what memory consumption and load on the CPU cores looked like without the fix:

[image: no-fix-dist-copy]

You can see that the data load is bottlenecked on the access node, which is doing lots of unnecessary work and continuously allocating memory without releasing it.

With the fix, the memory usage was stable throughout the test, and the CPU load was much more even across all nodes. The access node rarely reached 100%. The whole data load completed in 3 minutes and 55 seconds. This is what memory consumption and CPU load looked like:

[image: dist-copy-fix]

FWIW, on a regular (non-distributed) hypertable, it took just over 6 minutes to load the same data.

@erimatnor erimatnor marked this pull request as ready for review December 2, 2020 10:26
@erimatnor erimatnor requested a review from a team as a code owner December 2, 2020 10:26
@erimatnor erimatnor requested review from pmwkaa, k-rus, gayyappan, mkindahl, svenklemm and a team and removed request for a team and gayyappan December 2, 2020 10:26

return create_connection_list_for_chunk(state, chunk)->connections;
if (chunkconns->chunk_id == chunk->fd.id)
return chunkconns->connections;
erimatnor (Author):
This is definitely hit in my benchmark (I added a print statement here). But I guess our regression tests never get to the point of actually switching between cached chunks. I'll try to see what it would take to make that happen.

Comment on lines +243 to +244

return 0;
Contributor:
This just creates a false positive in codecov. If your compiler warns, it's better to mark error_no_default_fn_community as non-returning, e.g., for GCC:

void error_no_default_fn_community () __attribute__ ((noreturn));

erimatnor (Author):

Note that I didn't add this function, I only changed the signature. We currently don't use these attributes on any of our default functions and if we'd like to do that we can add for all functions in a separate PR.

Contributor:

I was referring to the added code, not the function. I don't think you need to add any code here at all.

Comment on lines +265 to +266
PG_TRY();
{
Contributor:

From what I understand, this is only for ensuring that the error message is not released prior to calling the error reporting function.

Using exception handling seems quite heavy-handed for ensuring that the error message can be passed up; I suggest just copying the error message and clearing the result before reporting the error. The code will be significantly easier to follow and codecov will not generate so many false positives.

erimatnor (Author):

No, this is to ensure that the PGresult is always released, even if we encounter an error or throw an error ourselves. The PGresult is malloced and will cause a permanent memory leak if we don't guarantee its release. And the only safe way to do that is a try-catch.

Contributor:

But if there is an error in PQexec nothing will be assigned to res (because the error function will do a longjmp to the exception handler directly) so then you do not need any cleanup code. The only place where res can have a value and an error can be thrown is from the ereport below, and the only reason you cannot call PQclear before calling ereport is because PQresultErrorMessage references the internals of res, so removing the entire try-catch and doing this instead is easier to understand:

if (PQresultStatus(res) != PGRES_COPY_IN)
{
	char *msg = pchomp(PQresultErrorMessage(res));
	PQclear(res);
	ereport(ERROR,
			(errcode(ERRCODE_CONNECTION_FAILURE),
			 errmsg("unable to start remote COPY on data node"),
			 errdetail("Remote command error: %s", msg)));
}

Note that you only need to copy the error message, because that's all you use.

erimatnor (Author):

I am not sure I understand what the improvement is apart from "just different".

FWIW, this is the standard pattern for handling PGresult, e.g., in postgres_fdw. And it is safe against future changes, e.g., if someone adds a function call that can throw an error between the PQexec and the error checking.

Contributor:

Easier to follow the logic, less duplicated code, that's all. This is a matter of taste. I have approved the PR, feel free to push.


foreach (lc, state->connections_in_use)
PG_TRY();
Contributor:

Unless I misunderstand something, it seems like the only reason to have the try-catch construction is to free the result if you generate an error on line 400. Wouldn't it be easier to just clear the result there instead and not have the exception handling wrapper?

erimatnor (Author):

No, this loop throws errors in multiple places, not just line 400. In addition, all the other functions called could, in theory, also throw errors. We need to ensure that all PGresults are released or we'd have a memory leak.

Note that we collect results in a list, so it is not only about one result.

pmwkaa (Contributor) left a comment:

Looks good overall

if (PQisnonblocking(pg_conn))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("distributed copy doesn't support non-blocking connections")));

if (!list_member_ptr(state->connections_in_use, connection))
{
PGresult *res = PQexec(pg_conn, state->outgoing_copy_cmd);
PGresult *volatile res = NULL;
Contributor:

Have you encountered any issues regarding it?

erimatnor (Author):

Not sure what the question is?

tsl/src/remote/dist_copy.c (comment resolved)
mkindahl (Contributor) left a comment:

Approving since the code is correct, but I think the use of the TRY-CATCH clauses is quite a heavy-handed solution to handling the deallocation of the result.


@erimatnor erimatnor merged commit 2ecb53e into timescale:master Dec 2, 2020
@erimatnor erimatnor deleted the fix-remote-copy-memory-usage branch December 2, 2020 16:40
@mkindahl mkindahl mentioned this pull request Dec 2, 2020
mkindahl added a commit to mkindahl/timescaledb that referenced this pull request Dec 2, 2020
**Minor Features**
* timescale#2662 Save compression metadata settings on access node
* timescale#2707 Introduce additional db for data node bootstrapping

**Bugfixes**
* timescale#2698 Improve memory handling for remote COPY
* timescale#2555 Set metadata for chunks compressed before 2.0
mkindahl added a commit to mkindahl/timescaledb that referenced this pull request Dec 3, 2020
This release candidate contains bugfixes since the previous release
candidate, as well as additional minor features. It improves
validation of configuration changes for background jobs, adds support
for gapfill on distributed tables, and contains improvements to
compression for distributed hypertables.

**Minor Features**
* timescale#2689 Check configuration in alter_job and add_job
* timescale#2696 Support gapfill on distributed hypertable
* timescale#2468 Show more information in get_git_commit
* timescale#2678 Include user actions into job stats view
* timescale#2664 Fix support for complex aggregate expression
* timescale#2672 Add hypertable to continuous aggregates view
* timescale#2662 Save compression metadata settings on access node
* timescale#2707 Introduce additional db for data node bootstrapping

**Bugfixes**
* timescale#2688 Fix crash for concurrent drop and compress chunk
* timescale#2666 Fix timeout handling in async library
* timescale#2683 Fix crash in add_job when given NULL interval
* timescale#2698 Improve memory handling for remote COPY
* timescale#2555 Set metadata for chunks compressed before 2.0

Successfully merging this pull request may close these issues.

COPY Command not working for large CSV/txt files of 1000000+ rows in Timescaledb rc1
4 participants