Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL #1888

Closed
8 tasks done
fryorcraken opened this issue Aug 7, 2023 · 18 comments
Closed
8 tasks done

PostgreSQL #1888

fryorcraken opened this issue Aug 7, 2023 · 18 comments
Assignees
Labels
E:PostgreSQL See https://github.com/waku-org/pm/issues/84 for details E:2.1: Production testing of existing protocols See https://github.com/waku-org/pm/issues/49 for details

Comments

@fryorcraken
Copy link
Collaborator

fryorcraken commented Aug 7, 2023

Planned start date:
Due date:

Summary

Implementation of PosgreSQL database engine for Waku Store queries.

See waku-org/pm#4 for details.

Acceptance Criteria

  • PostgreSQL is available in nwaku
  • Options to easily deploy PosgresSQL with nwaku available (e.g. docker compose)

Tasks

RAID (Risks, Assumptions, Issues and Dependencies)

@Ivansete-status
Copy link
Collaborator

Weekly Update

@fryorcraken
Copy link
Collaborator Author

I suggest to confirm PostgreSQL integration done, as it is actually done, especially that concurrent requests are already implemented. This should be good enough for 10k users.

We can then move this issue to scope PostgreSQL optimizations for 1mil users target. Do you agree @Ivansete-status ?

@Ivansete-status
Copy link
Collaborator

I suggest to confirm PostgreSQL integration done, as it is actually done, especially that concurrent requests are already implemented. This should be good enough for 10k users.

We can then move this issue to scope PostgreSQL optimizations for 1mil users target. Do you agree @Ivansete-status ?

Morning @fryorcraken ! The PostgreSQL integration is completed but I have a doubt: for 10k users, how many requests per second should we support ? And how many for 1mil? I think we need first to measure the current performance and this task is in progress, implementing a basic stress tests :)

@fryorcraken
Copy link
Collaborator Author

fryorcraken commented Aug 10, 2023

@kaiserd did a hackmd that did the estimation for relay scaling which gave us a theoretical green light for 10k users per shard. Maybe we can use the same figures for Store. However, it is challenging to estimate the needs in any case.

An alternative would be to look at the usage of store node in Status prod fleet. It's for ~100 contributors.
We can multiply this by 10x (1000 users) this would give us an idea of needs for 10k users (10 shards, 1000 users per shards). Not the 1mil users is 100 shards, 10k users per shards.

Another alternative would be to stress test sqlite and postgresql and if we demonstrated a 10x improvement then we can reach the same conclusion: we can support 100 users with sqlite, postgresql is 10x more performance, hence in theory we can support 1000 users with PostgreSQL.

SQLite and PostgreSQL are industry standards, surely there are available benchmarks that can tell us this performance difference?

Then, the next step is the needs of 10k users. I believe this is where optimization and DST simulation would be necessary.

I think we need first to measure the current performance

Indeed, it may make sense to measure performance and confirm there are no issues on nwaku code that bottlenecks the usage of PostgreSQL.
Great to hear it's in progress. where is it trcked?

implementing a basic stress tests

Is that something to be done with Kurtosis or more simple one? Where is it tracked? is the intent to just run it once or have it part of regular processes (e.g. release candidate).

Cc @jm-clius

@Ivansete-status
Copy link
Collaborator

SQLite and PostgreSQL are industry standards, surely there are available benchmarks that can tell us this performance difference?
Sure, we can check that. And also compare our current implementations.

The stress test creation is tracked in the next issue. The idea, for now, is to make manual runs and measurements.
#1894

@jm-clius
Copy link
Contributor

For now a manual measurement would be a good start IMO. In fact, we can look at current query rate for Status Community and assume a linear increase with increase in numbers and simply script that many queries to a postgresql instance. Very crude estimate with approximate results, but will give us a good initial benchmark.

@Ivansete-status
Copy link
Collaborator

Weekly Update

  • achieved: Learned that the insertion rate is constrained by the relay protocol. i.e. the maximum insert rate is limited by relay so I couldn't push the "insert" operation to a limit from a Postgres point of view. For example, if 25 clients publish messages concurrently, and each client publishes 300 msgs, all the messages are correctly stored. If repeating the same operation but with 50 clients, then many messages are lost because the relay protocol doesn't process all of them.
  • next: Carry on with stress testing. Analyze the performance differences between Postgres and SQLite regarding the read operations.

@fryorcraken fryorcraken changed the title [Milestone] PostgreSQL [Epic] PostgreSQL Aug 24, 2023
@fryorcraken fryorcraken added Epic and removed milestone Tracks a subteam milestone labels Aug 24, 2023
@Ivansete-status
Copy link
Collaborator

Ivansete-status commented Aug 28, 2023

Weekly Update

  • achieved: new docker compose in test-waku-query that allows to quickly compare insert and query performance between SQLite and Postgres.
  • next: Carry on with stress testing & follow-up of the Postgres addition to wakuv2.shards by the infra team.

@Ivansete-status
Copy link
Collaborator

Weekly Update

  • achieved: Download and start configuring jmeter to have a variable number of clients sending concurrent Store requests.
  • next: Carry on with stress testing & follow-up of the Postgres addition to wakuv2.shards by the infra team.

@fryorcraken fryorcraken added E:PostgreSQL See https://github.com/waku-org/pm/issues/84 for details and removed E:2023-10k-users Epic labels Sep 8, 2023
@fryorcraken fryorcraken changed the title [Epic] PostgreSQL PostgreSQL Sep 8, 2023
@Ivansete-status
Copy link
Collaborator

Ivansete-status commented Sep 8, 2023

Weekly Update

@Ivansete-status
Copy link
Collaborator

Ivansete-status commented Sep 29, 2023

Weekly Update

  • achieved:

    • Better dburl parse that accepts host names with dashes and dots.
    • Properly set the compilation flag -d:postgres so Docker images are compiled with support to Postgres (with libpq5 dependency.)
    • During the stress testing, I discovered that the max throughput seems not to be directly related to Postgres. If I make the code to ignore Postgres and return immediately a mocked response, then the throughput is even lower.
  • next: Carry on with "select" performance analysis and analyze it directly from a Store client, rather than having REST <-> Store_Client <-> Store_Server. By ignoring the REST layer we will have a better insight into the actual Store protocol, as @jm-clius recommended to me some time ago.

@Ivansete-status
Copy link
Collaborator

Ivansete-status commented Oct 6, 2023

Weekly Update

  • achieved: Applied performance comparison between SQLite and Postgres but in this case, making direct requests from a go-waku unittest that @richard-ramos had prepared.
    After directly comparing the Store protocol, noticed that the bottle neck is within the database itself. i.e. the SQLite database performs better than Postgres, given that we have a very simple schema and simple queries, without joins. Adding indexes to the Postgres database didn't help very much. For example, given the same query, SQLite takes 1ms whereas Postgres takes 6ms.

  • next:

    • Wrap up the Store testing environment and install it into our sandbox machine, metal-01.he-eu-hel1.wakudev.misc.statusim.net, so that anyone can proceed from this point (two databases with the same dataset of ~2 million rows .) in case someone is keen on analyzing performance or debug in a more realistic testing scenery. This will include concurrent queries from multiple nodes, where PostgreSQL is expected to perform better.
    • Start extracting the database creation and indexes creation to outside the code base.

@Ivansete-status
Copy link
Collaborator

Ivansete-status commented Oct 16, 2023

Weekly Update

  • achieved:

    • Testing environment prepared in metal-01.he-eu-hel1.wakudev.misc.statusim.net. There are two databases (Postgres and SQLite) with 5 million of random messages.
    • Enhanced Grafana dashboard so that we can compare timings performance throughout an histogram.
  • next: Carry on with the investigation to enhance the Postgres performance.

@Ivansete-status
Copy link
Collaborator

Ivansete-status commented Oct 27, 2023

Weekly Update

  • achieved:
    • Time processing enhancement when performing SELECT operations. There was an overhead caused by looping too many times over the returned rows, in order to convert the row types. By applying a "rowCallback" approach we can reduce by 30ms the time spent on the query under analysis.
  • next:
    • The queries used in the comparison analysis still perform much better in SQLite (< ~5ms) than in Postgres (< ~15ms.) Therefore we need to push the investigation further to enhance that.

      ( Edited: notice that the timings indicated above are for tests using consecutive queries. If the queries are performed concurrently, then the timings are worse. I will elaborate more in a report shortly .)

@Ivansete-status
Copy link
Collaborator

Weekly Update

  • achieved: Optimize select/Store queries by adding prepared statements. PR
  • next: Wrap up the Postgres optimizations. Summarize the performance comparison in a report.

@Ivansete-status
Copy link
Collaborator

I'll disregard for now the point of "having the queries in a separate file". The main reason is that we are using prepared statements intensively, in order to enhance query performance, and that induced a more complex query set. Therefore, I don't see a benefit of doing that in the short-term.

@Ivansete-status
Copy link
Collaborator

The report was created in https://github.com/waku-org/nwaku/wiki/Postgres-adoption

@Ivansete-status
Copy link
Collaborator

I crossed out "#1885" from the description list because the "size" retention policy isn't strictly related to Postgres only.

Therefore, this task can be considered complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E:PostgreSQL See https://github.com/waku-org/pm/issues/84 for details E:2.1: Production testing of existing protocols See https://github.com/waku-org/pm/issues/49 for details
Projects
Archived in project
Development

No branches or pull requests

3 participants