PostgreSQL #1888

fryorcraken · 2023-08-07T07:39:46Z

Planned start date:
Due date:

Summary

Implementation of the PostgreSQL database engine for Waku Store queries.

See waku-org/pm#4 for details.

Acceptance Criteria

PostgreSQL is available in nwaku
Options to easily deploy PostgreSQL with nwaku are available (e.g. docker compose)

Tasks

chore(archive): allow request concurrency towards PostgreSQL #1604
chore (postgres): Establish a testing environment and tooling to measure Store performance #1894
chore(postgres): Create a docker file to bring nwaku + PostgreSQL simultaneously. #1840
chore (postgres): Make nwaku to reconnect to the Postgres database if it restarts #1893
~~- [ ] Add retention policy with GB or MB limitation #1885~~
chore(postgres): Optimize the database. #1842
chore(postgres): keep the Postgres queries in separate raw files. #1841

RAID (Risks, Assumptions, Issues and Dependencies)

Ivansete-status · 2023-08-07T11:00:30Z

Weekly Update

achieved: Docker compose with nwaku + postgres + prometheus + grafana + postgres_exporter https://github.com/alrevuelta/nwaku-compose/pull/3
next: Carry on with stress testing

fryorcraken · 2023-08-10T04:30:44Z

I suggest we confirm that the PostgreSQL integration is done, as it actually is, especially since concurrent requests are already implemented. This should be good enough for 10k users.

We can then move this issue to scope PostgreSQL optimizations for the 1mil users target. Do you agree, @Ivansete-status?

Ivansete-status · 2023-08-10T06:23:29Z

> I suggest to confirm PostgreSQL integration done, as it is actually done, especially that concurrent requests are already implemented. This should be good enough for 10k users.
>
> We can then move this issue to scope PostgreSQL optimizations for 1mil users target. Do you agree @Ivansete-status ?

Morning @fryorcraken! The PostgreSQL integration is completed, but I have a doubt: for 10k users, how many requests per second should we support? And how many for 1mil? I think we first need to measure the current performance. That task is in progress: I am implementing a basic stress test :)

fryorcraken · 2023-08-10T06:38:35Z

@kaiserd wrote a hackmd with an estimation for relay scaling, which gave us a theoretical green light for 10k users per shard. Maybe we can use the same figures for Store. However, it is challenging to estimate the needs in any case.

An alternative would be to look at the usage of the store node in the Status prod fleet, which serves ~100 contributors.
Multiplying this by 10 (1000 users) would give us an idea of the needs for 10k users (10 shards, 1000 users per shard). Note that the 1mil users target is 100 shards, 10k users per shard.
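
The shard arithmetic in this comment can be checked with a few lines of Python. This is only a sketch of the multiplication described above; the function name `total_users` is made up for illustration.

```python
def total_users(shards: int, users_per_shard: int) -> int:
    """Total users when every shard carries the same number of users."""
    return shards * users_per_shard

# 10k-user target: 10 shards with 1000 users each.
print(total_users(10, 1_000))     # 10000
# 1mil-user target: 100 shards with 10k users each.
print(total_users(100, 10_000))   # 1000000
# Factor between the ~100-contributor fleet and one 1000-user shard.
print(1_000 // 100)               # 10
```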

Another alternative would be to stress test SQLite and PostgreSQL. If we demonstrate a 10x improvement, then we can reach the same conclusion: we can support 100 users with SQLite, PostgreSQL is 10x more performant, hence in theory we can support 1000 users with PostgreSQL.

SQLite and PostgreSQL are industry standards, surely there are available benchmarks that can tell us this performance difference?

Then, the next step is the needs of 10k users. I believe this is where optimization and DST simulation would be necessary.

> I think we need first to measure the current performance

Indeed, it may make sense to measure performance and confirm there are no issues in the nwaku code that bottleneck the usage of PostgreSQL.
Great to hear it's in progress. Where is it tracked?

> implementing a basic stress tests

Is that something to be done with Kurtosis, or something simpler? Where is it tracked? Is the intent to run it just once, or to make it part of regular processes (e.g. release candidates)?

Cc @jm-clius

Ivansete-status · 2023-08-10T06:48:08Z

> SQLite and PostgreSQL are industry standards, surely there are available benchmarks that can tell us this performance difference?

Sure, we can check that. And also compare our current implementations.

The stress test creation is tracked in the following issue. The idea, for now, is to make manual runs and measurements.
#1894

jm-clius · 2023-08-10T14:28:16Z

For now a manual measurement would be a good start IMO. In fact, we can look at the current query rate for the Status Community, assume it increases linearly with the number of users, and simply script that many queries to a postgresql instance. It is a very crude estimate with approximate results, but it will give us a good initial benchmark.
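
The linear extrapolation suggested here could be sketched as follows. The function names are hypothetical and the numbers are placeholders for illustration, not measurements from any fleet.

```python
def scaled_query_rate(observed_qps: float, observed_users: int, target_users: int) -> float:
    """Linearly extrapolate a queries-per-second rate to a larger user base."""
    return observed_qps * (target_users / observed_users)

def send_interval_seconds(qps: float) -> float:
    """Delay between consecutive queries needed to sustain `qps` from one script."""
    return 1.0 / qps

# Placeholder numbers only: a hypothetical 0.5 queries/s observed with 100 users.
observed_qps = 0.5
target_qps = scaled_query_rate(observed_qps, observed_users=100, target_users=1_000)
print(target_qps)                          # 5.0
print(send_interval_seconds(target_qps))   # 0.2
```

A script driving the database would then sleep for the computed interval between queries, or split the target rate across several workers.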

Ivansete-status · 2023-08-14T13:30:41Z

Weekly Update

achieved: Learned that the insertion rate is constrained by the relay protocol. i.e. the maximum insert rate is limited by relay so I couldn't push the "insert" operation to a limit from a Postgres point of view. For example, if 25 clients publish messages concurrently, and each client publishes 300 msgs, all the messages are correctly stored. If repeating the same operation but with 50 clients, then many messages are lost because the relay protocol doesn't process all of them.
next: Carry on with stress testing. Analyze the performance differences between Postgres and SQLite regarding the read operations.
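
The accounting behind this experiment (25 clients, 300 messages each, then comparing sent versus stored) can be sketched in Python. Everything here is an assumption for illustration: `publish` is a stand-in that appends to an in-memory list, whereas a real run would publish to a nwaku node and count rows in the archive database.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

stored = []                      # stand-in for the archive database
stored_lock = threading.Lock()

def publish(message: str) -> None:
    """Hypothetical publish call; here it just records the message."""
    with stored_lock:
        stored.append(message)

def run_client(client_id: int, msgs_per_client: int) -> int:
    """Publish msgs_per_client messages and return how many were sent."""
    for i in range(msgs_per_client):
        publish(f"client-{client_id}-msg-{i}")
    return msgs_per_client

def run_experiment(num_clients: int, msgs_per_client: int) -> tuple[int, int]:
    """Return (sent, stored); the number of lost messages is sent - stored."""
    stored.clear()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        sent = sum(pool.map(lambda c: run_client(c, msgs_per_client), range(num_clients)))
    return sent, len(stored)

sent, kept = run_experiment(num_clients=25, msgs_per_client=300)
print(sent, kept, sent - kept)   # 7500 7500 0 with this lossless stub
```

With a real relay in the path, the difference `sent - kept` is the number of messages dropped before reaching the database.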

Ivansete-status · 2023-08-28T06:37:49Z

Weekly Update

achieved: new docker compose in test-waku-query that makes it quick to compare insert and query performance between SQLite and Postgres.
next: Carry on with stress testing & follow-up of the Postgres addition to wakuv2.shards by the infra team.

Ivansete-status · 2023-09-04T06:35:58Z

Weekly Update

achieved: Downloaded jmeter and started configuring it to have a variable number of clients sending concurrent Store requests.
next: Carry on with stress testing & follow-up of the Postgres addition to wakuv2.shards by the infra team.

Ivansete-status · 2023-09-08T09:02:40Z

Weekly Update

achieved:
- Created a jmeter test plan to stress Store queries through REST Store. The conclusion is that the node with Store Postgres showed worse performance than the one with SQLite.
  Including jmeter to allow concurrent Store requests easily test-waku-query#5
- Added reconnection feature. If the connection with Postgres is lost, the nwaku node tries to reconnect again. chore(postgres): Adding healtcheck and reconnection mechanism to the postgres archive driver #1997
- The wakuv2.shards fleet has been de-prioritized in favor of the status.shards one.
  Static sharding & Postgres status-im/infra-nim-waku#74 (comment)
next: Optimize database so that the Store requests behave better with Postgres.
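
The reconnection behaviour mentioned in this update follows a common retry pattern. The sketch below is not the nwaku implementation (which lives in the Nim code base, see #1997); it only illustrates retrying with a growing delay, and `connect_with_retry` and `flaky_connect` are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def connect_with_retry(connect: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 0.01) -> T:
    """Call connect() until it succeeds, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise            # give up and surface the last error
            time.sleep(delay)
            delay *= 2
    raise AssertionError("loop always returns or raises")

# Stub that fails twice before succeeding, to exercise the loop.
attempts = {"n": 0}

def flaky_connect() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("database not reachable")
    return "connected"

print(connect_with_retry(flaky_connect))   # connected
```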

Ivansete-status · 2023-09-29T21:14:28Z

Weekly Update

achieved:
- Improved dburl parsing so that it accepts host names with dashes and dots.
- Properly set the compilation flag -d:postgres so that Docker images are compiled with Postgres support (with the libpq5 dependency).
- During the stress testing, I discovered that the max throughput does not seem to be directly related to Postgres. If I make the code ignore Postgres and immediately return a mocked response, the throughput is even lower.
next: Carry on with "select" performance analysis and analyze it directly from a Store client, rather than having REST <-> Store_Client <-> Store_Server. By ignoring the REST layer we will have a better insight into the actual Store protocol, as @jm-clius recommended to me some time ago.
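
To illustrate the dburl point above, here is how a database URL whose host contains dashes and dots splits into its parts. This uses Python's standard `urllib.parse` purely as an illustration and is not how nwaku parses the URL; the URL, credentials, and the `parse_dburl` helper are made up.

```python
from urllib.parse import urlsplit

def parse_dburl(dburl: str) -> dict:
    """Split a postgres://user:password@host:port/dbname URL into its parts."""
    parts = urlsplit(dburl)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "dbname": parts.path.lstrip("/"),
    }

# Hypothetical URL whose host contains both dashes and dots.
info = parse_dburl("postgres://waku:secret@db-01.example-host.net:5432/wakudb")
print(info["host"], info["port"], info["dbname"])   # db-01.example-host.net 5432 wakudb
```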

Ivansete-status · 2023-10-06T09:52:20Z

Weekly Update

achieved: Ran a performance comparison between SQLite and Postgres, this time making direct requests from a go-waku unit test that @richard-ramos had prepared.
After comparing the Store protocol directly, I noticed that the bottleneck is within the database itself, i.e. the SQLite database performs better than Postgres, given that we have a very simple schema and simple queries without joins. Adding indexes to the Postgres database didn't help much. For example, given the same query, SQLite takes 1ms whereas Postgres takes 6ms.
next:
- Wrap up the Store testing environment and install it on our sandbox machine, metal-01.he-eu-hel1.wakudev.misc.statusim.net, so that anyone can proceed from this point (two databases with the same dataset of ~2 million rows) in case someone is keen on analyzing performance or debugging in a more realistic testing scenario. This will include concurrent queries from multiple nodes, where PostgreSQL is expected to perform better.
- Start extracting the database creation and indexes creation to outside the code base.
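
One side of such a per-query timing comparison can be sketched with Python's built-in `sqlite3` module. The single-table schema, index, and data below are assumptions for illustration and do not reproduce the nwaku schema; timing the same query against Postgres would need a running server and a driver, which are omitted here.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical single-table schema; the real nwaku schema is not reproduced here.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, topic TEXT, ts INTEGER, payload BLOB)")
conn.executemany(
    "INSERT INTO messages (topic, ts, payload) VALUES (?, ?, ?)",
    ((f"topic-{i % 10}", i, b"x") for i in range(100_000)),
)
conn.execute("CREATE INDEX idx_topic_ts ON messages (topic, ts)")
conn.commit()

def timed_query(topic: str, start_ts: int, end_ts: int, limit: int = 20):
    """Run one simple, join-free query and return (rows, elapsed_ms)."""
    t0 = time.perf_counter()
    rows = conn.execute(
        "SELECT id, ts FROM messages WHERE topic = ? AND ts BETWEEN ? AND ? ORDER BY ts LIMIT ?",
        (topic, start_ts, end_ts, limit),
    ).fetchall()
    return rows, (time.perf_counter() - t0) * 1000.0

rows, elapsed_ms = timed_query("topic-3", 1_000, 50_000)
print(len(rows), f"{elapsed_ms:.3f} ms")
```

A single run is noisy; repeating the query many times and looking at the distribution gives a fairer picture than one number.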

Ivansete-status · 2023-10-16T09:41:31Z

Weekly Update

achieved:
- Testing environment prepared in metal-01.he-eu-hel1.wakudev.misc.statusim.net. There are two databases (Postgres and SQLite) with 5 million random messages.
- Enhanced the Grafana dashboard so that we can compare timing performance through a histogram.
next: Carry on with the investigation to enhance the Postgres performance.
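
As a rough illustration of how query timings end up in a histogram, the sketch below counts samples per latency bucket. The bucket bounds and the sample values are invented for the example and are not measurements; the actual dashboard relies on Grafana and its data source.

```python
from bisect import bisect_left

def latency_histogram(samples_ms, bounds_ms):
    """Count samples per bucket: bucket i holds values <= bounds_ms[i]
    (and above the previous bound); the last bucket holds everything larger."""
    counts = [0] * (len(bounds_ms) + 1)
    for s in samples_ms:
        counts[bisect_left(bounds_ms, s)] += 1
    return counts

bounds = [1, 5, 10, 50]                    # hypothetical bucket upper bounds in ms
samples = [0.8, 1.0, 4.2, 6.0, 14.9, 120]  # made-up timings, not measurements
print(latency_histogram(samples, bounds))  # [2, 1, 1, 1, 1]
```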

Ivansete-status · 2023-10-27T18:56:50Z

Weekly Update

achieved:
- Reduced the processing time of SELECT operations. There was an overhead caused by looping too many times over the returned rows in order to convert the row types. By applying a "rowCallback" approach we can reduce the time spent on the query under analysis by 30ms.
next:
- The queries used in the comparison analysis still perform much better in SQLite (< ~5ms) than in Postgres (< ~15ms). Therefore we need to push the investigation further to improve that.

  (Edited: notice that the timings indicated above are for tests using consecutive queries. If the queries are performed concurrently, the timings are worse. I will elaborate more in a report shortly.)
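
The idea behind the "rowCallback" change can be sketched in Python: instead of first materializing the raw rows and then looping over them again to convert types, each row is converted in the same loop that reads it. This is an illustration of the general pattern only, with hypothetical names and data; the actual change is in the nwaku Nim code.

```python
from typing import Callable, Iterable, NamedTuple

class Message(NamedTuple):
    topic: str
    timestamp: int

def convert_two_pass(raw_rows: Iterable[tuple]) -> list:
    """Materialize all raw rows first, then loop again to convert them."""
    rows = list(raw_rows)                              # first loop: copy rows
    return [Message(r[0], int(r[1])) for r in rows]    # second loop: convert

def for_each_row(raw_rows: Iterable[tuple], row_callback: Callable[[tuple], None]) -> None:
    """Hand each row to the callback as it arrives, in a single loop."""
    for r in raw_rows:
        row_callback(r)

raw = [("topic-a", "1"), ("topic-b", "2")]             # made-up rows with string columns
out = []
for_each_row(raw, lambda r: out.append(Message(r[0], int(r[1]))))
print(out == convert_two_pass(raw))                    # True
```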

Ivansete-status · 2023-11-03T22:27:06Z

Weekly Update

achieved: Optimized select/Store queries by adding prepared statements. PR
next: Wrap up the Postgres optimizations. Summarize the performance comparison in a report.
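
The benefit of prepared statements is that a query is parsed and planned once and then executed many times with different parameters; in PostgreSQL this is exposed through `PREPARE` and `EXECUTE`. The sketch below shows the same idea using SQLite from Python's standard library, because it runs without a database server: the `sqlite3` module compiles a parameterized SQL text once and reuses it from the connection's statement cache. The schema and data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:", cached_statements=16)
conn.execute("CREATE TABLE messages (topic TEXT, ts INTEGER)")   # hypothetical schema
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 3)])

# One SQL text with placeholders: it is compiled once and then reused
# from the connection's statement cache on every later call.
QUERY = "SELECT COUNT(*) FROM messages WHERE topic = ? AND ts >= ?"

def count_messages(topic: str, since: int) -> int:
    """Execute the shared parameterized query with new arguments."""
    return conn.execute(QUERY, (topic, since)).fetchone()[0]

print(count_messages("a", 1))   # 2
print(count_messages("a", 2))   # 1
print(count_messages("b", 0))   # 1
```

Building a new SQL string for every request, by contrast, forces the database to parse and plan each one from scratch.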

Ivansete-status · 2023-11-07T14:41:44Z

I'll disregard the point of "having the queries in a separate file" for now. The main reason is that we are using prepared statements intensively in order to enhance query performance, and that has made the query set more complex. Therefore, I don't see a benefit in doing that in the short term.

Ivansete-status · 2023-11-14T15:08:02Z

The report was created in https://github.com/waku-org/nwaku/wiki/Postgres-adoption

Ivansete-status · 2023-11-14T15:11:45Z

I crossed out "#1885" from the description list because the "size" retention policy isn't strictly related to Postgres only.

Therefore, this task can be considered complete.

fryorcraken added milestone Tracks a subteam milestone E:2023-10k-users labels Aug 7, 2023

fryorcraken assigned Ivansete-status Aug 7, 2023

fryorcraken mentioned this issue Aug 7, 2023

[Milestone] Waku Network Can Support 10K Users waku-org/pm#12

Closed

22 tasks

Ivansete-status mentioned this issue Aug 7, 2023

Waku Store Cache: PostgreSQL implementation waku-org/pm#4

Closed

fryorcraken changed the title ~~[Milestone] PostgreSQL~~ [Epic] PostgreSQL Aug 24, 2023

fryorcraken added Epic and removed milestone Tracks a subteam milestone labels Aug 24, 2023

fryorcraken added the E:2.1: Production testing of existing protocols See https://github.com/waku-org/pm/issues/49 for details label Aug 31, 2023

fryorcraken mentioned this issue Aug 31, 2023

Update issue templates epic<>milestone rename waku-org/pm#48

Merged

fryorcraken mentioned this issue Sep 7, 2023

[Epic] PostgreSQL in service node: Further optimisations waku-org/pm#84

Closed

8 tasks

fryorcraken added E:PostgreSQL See https://github.com/waku-org/pm/issues/84 for details and removed E:2023-10k-users Epic labels Sep 8, 2023

fryorcraken changed the title ~~[Epic] PostgreSQL~~ PostgreSQL Sep 8, 2023

fryorcraken mentioned this issue Sep 13, 2023

[Epic] 2.1: Production testing of existing protocols waku-org/pm#49

Closed

5 tasks

Ivansete-status closed this as completed Nov 14, 2023
