Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running simultaneous queries with relations stalls typeorm #4738

Closed
oskarious opened this issue Sep 12, 2019 · 21 comments · Fixed by #10630
Closed

Running simultaneous queries with relations stalls typeorm #4738

oskarious opened this issue Sep 12, 2019 · 21 comments · Fixed by #10630

Comments

@oskarious
Copy link

Issue type:

[ ] question
[x] bug report
[ ] feature request
[ ] documentation issue

Database system/driver:

[ ] cordova
[ ] mongodb
[ ] mssql
[ ] mysql / mariadb
[ ] oracle
[x] postgres
[ ] cockroachdb
[ ] sqlite
[ ] sqljs
[ ] react-native
[ ] expo

TypeORM version:

[ ] latest
[x] @next
[ ] 0.x.x (or put your version here)

I ran into an issue where typeorm would just stop responding occasionally, and when looking at the postgres stats I noticed that typeorm kept 10 (the size of the pool) connections open and in idle and didn't seem to ever close them afterwards.

The connections showed up as Idle | WaitEvent: Client: ClientRead.

Some more investigations seem to indicate that it the issue presents itself when trying to find entities with relations (with the getManager().find(Entity, options) syntax).

Increasing the pool size can help mitigate the problem, and using the query builder works as a workaround, but is obviously something we'd want to avoid as much as possible in favour of shorter, and clearer syntax.

I don't believe it's a config issue either

let entities = ['./dist/api/src/entities/**/*.js'];

module.exports = {
  type: 'postgres',
  synchronize, // True locally
  ssl,
  host,
  database,
  username,
  password,
  entities,
  timezone: 'Z',
  logging,
  logger,
  maxQueryExecutionTime: 1000,
  extra: {
    poolSize, // 10
    idleTimeoutMillis: 5000, // Drop connections that are stalled
    connectionTimeoutMillis: 10000, // Drop connections that are stalled
  },

  cli: {
    entitiesDir: 'api/src/entities/**',
    migrationsDir: 'api/src/migration',
    subscribersDir: 'api/src/subscriber',
  },
};

Steps to reproduce or a small repository showing the problem:

/* eslint-disable @typescript-eslint/no-use-before-define */
import { createConnection, Entity, getManager, ManyToOne, OneToMany, PrimaryGeneratedColumn } from 'typeorm';

@Entity()
export class Entity2 {
  @PrimaryGeneratedColumn()
  id!: number;

  @ManyToOne(() => Entity1, ent1 => ent1.ent2)
  ent1!: Entity1;
}

@Entity()
export class Entity1 {
  @PrimaryGeneratedColumn()
  id!: number;

  @OneToMany(() => Entity2, ent2 => ent2.ent1)
  ent2!: Entity2[];
}

const getWithRelations = () => {
  getManager()
    .find(Entity1, { relations: { ent2: true } })
    .then(e => {
      console.log(e);
    });
};

const getSimple = () => {
  getManager()
    .find(Entity1)
    .then(console.log);
};

const getWithQb = () => {
  getManager()
    .getRepository(Entity1)
    .createQueryBuilder('ent')
    .leftJoinAndSelect('ent.ent2', 'ent2')
    .getMany()
    .then(console.log);
};

(async () => {
  await createConnection();

  const parent = await getManager().save(new Entity1());

  const child1 = new Entity2();
  child1.ent1 = parent;
  await getManager().save(child1);

  const child2 = new Entity2();
  child2.ent1 = parent;
  await getManager().save(child2);

  for (let i = 0; i < 35; i++) {
    getWithRelations(); // Will stall
  }

  for (let i = 0; i < 35; i++) {
    getSimple(); // Runs as expected
  }

  for (let i = 0; i < 35; i++) {
    getWithQb(); // Runs as expected
  }
})();
@johnbjurulf
Copy link

Got the same issue :-(

@jwhitmarsh
Copy link

Is there any alternative solution to this problem, other than those suggested by OP?

@oskarious
Copy link
Author

@jwhitmarsh
For reference, I have started using the query builder instead. Haven't checked out the latest versions of typeorm, but last time I checked (a couple of months ago) it was still happening. Interested in hearing if anyone else has any alternatives as well!

@falahati
Copy link

falahati commented Jan 25, 2021

Same thing here; especially if you have a queue for operations; in our case, it is the RabbitMQ and requests from other microservices that can overwhelm the service when it is down for a while; like when stuck due to this issue; therefore pushing it to stay in this state forever.
Decreasing the database timeout can mitigate the problem, but you better have your operations in a transaction and being able to gracefully reject or redone it, otherwise, it's a risk to the integrity of the data stored.

Please consider taking a look into this issue as it almost makes it impossible to write a real world application with the Postgres driver.

@falahati
Copy link

falahati commented Jan 30, 2021

My problem was that I tried to use .find() from outside of a transaction using Repository while being in a transaction; since the transaction takes a connection, and the second nested call takes another connection; it would create a deadlock as a result of a race condition if 10 transactions ware started together and then each tried to use .find() and therefore opening another connection before any connection in the pool is free.
Solved simply by doing everything database related using the instance of entity manager provided by the transaction.

I also tried to reproduce the provided sample code here and it works as expected. So I don't know if this specific issue still exists.

@Stene3
Copy link

Stene3 commented Jul 1, 2022

Same happenned to me. Seems like issue with relations loading when relationLoadStrategy of type query is used. Single queries works, but more concurrently connections cause typeorm, and most probably all connection in pool, to be stalled. Any idea what could be the reason? I was trying to debug but no luck yet.

@mkeemon
Copy link

mkeemon commented Aug 11, 2022

@Stene3 I have been running into the same issue using relationLoadStrategy: 'query'. The connection pool seems to be at the heart of the issue. I increased the max pool size to 50 against my local Postgres instance, and no longer run into the app locking up. Not a solution by any means, but helpful in debugging.

@edeesis
Copy link
Contributor

edeesis commented Jan 30, 2023

We just ran into this issue as well, thought it might've run out of memory, but the only thing that makes sense is the connection pool issue. App just stopped responding to requests.

@adrien2p
Copy link

UP!

@scr4bble
Copy link
Contributor

scr4bble commented Aug 4, 2023

We just ran into this issue as well.
We are using MySQL though, not PostgeSQL.
TypeORM version "0.3.10"
Using parallel tasks from (nestjs/)bull. Each of the parallel tasks executes find() method to query the database while also fetching some of the entity relations.
It started happening when we switched relationLoadStrategy to 'query' with intention to improve the performance.

@joelybahh
Copy link

joelybahh commented Sep 21, 2023

Same issue with us, we rolled out an API performance update via this at the time "magical" setting, however, we get the same thing, basically is like our API is on a timer before it falls over. Its a function app on Azure, so once this issue arises, the ENTIRE function needs to be restarted to create a new instance of typeORM.

Our workaround that doesn't crash the API is a poolSize of 100, this only works because our DB connection errors out first, which thankfully, isn't completely breaking the typeORM manager, and throws the appropriate errors so it will start working again as they clear out.

@darius-00
Copy link

darius-00 commented Sep 28, 2023

Same issue here. Using NestJS with TypeORM, Postgresql. Running more concurrent queries than a number in poolSize specified causes full stall from any further database queries.
Query:
this.usersRepository.findOne({ where: { 'id': userId } relations: ['posts'] })

SELECT * FROM pg_stat_activity;
While typeorm works as expected connections in pg_stat_activity appears and disappears. But as soon as you do 100 concurrent queries all those connections in pg_stat_activity stays there in idle state and server becomes unresponsive.

@joelybahh
Copy link

joelybahh commented Sep 29, 2023

@darius-00 I've experienced exactly the same issue, then an entire server restart is required. I've found some "hidden" config options that are helping, but still unusual as no errors are thrown when the limit is reached, its like typeORM just silently dies with no way to reboot without entire restart.

My hunch is that as we are in a "serverless" api environment, maybe its initialising the datasource more than once so even with max set, its only max for that data source instance, but am unsure.

Here's some extra config that prevented everything falling over at the very least for us (So far):

const extra = {
    ssl: DB_DEV ? false : true,
    max: DB_MAX ? Number(DB_MAX) : DEFAULT_MAX,
    poolSize: DB_POOL_SIZE ? Number(DB_POOL_SIZE) : DEFAULT_POOL_SIZE,
    connectionTimeoutMillis: DB_CONNECTION_TIMEOUT_MILLIS
        ? Number(DB_CONNECTION_TIMEOUT_MILLIS)
        : DEFAULT_CONNECTION_TIMEOUT_MILLIS,
    query_timeout: DB_QUERY_TIMEOUT ? Number(DB_QUERY_TIMEOUT) : DEFAULT_QUERY_TIMEOUT,
    statement_timeout: DB_STATEMENT_TIMEOUT
        ? Number(DB_STATEMENT_TIMEOUT)
        : DEFAULT_STATEMENT_TIMEOUT,
};

export const AppDataSource = new DataSource({
    type: "postgres",
    host: DB_HOST,
    port: Number(DB_PORT),
    username: DB_USERNAME,
    password: DB_PASSWORD,
    database: DB_DATABASE,
    extra,
    logging: DB_LOG ? JSON.parse(DB_LOG) : true,
    synchronize: false,
    entities: entities,
    subscribers: [],
    migrations: migrations,
    ...rest,
});

@joelybahh
Copy link

When I set max to 5, for example, I observe the pg_stat_activity increments connections by 5 for EVERY request to the API, making me unsure if max means max connections per query, or max connections in general. Documentation around configuration options for postgres specifically seem light/non-existent. The datasource is only being initialised once from what we can tell so unless we're missing some other nuance of serverless, unsure best starting point to debug this.

@clintonb
Copy link
Contributor

max comes from node-postgres. You're creating a pool with the specified number of connections. The connections are opened when the DataSource initializes the connection. You should expect to see those connections idle, unless you are making simultaneous connections through that one DataSource.

If you have n instances of your service running, you will end up with n * max connections.

@darius-00
Copy link

poolSize and max config for me did the same thing. Just to make little better I increased poolSize to 100. It's much better than default 10, but still any user with js script
for(let i = 0; i < 200; i++){ fetch('/api/test') }
inside browsers dev console can make server inaccessible until it is restarted manually.
Strange thing is that sometimes I cannot recreate problem even with 1k http requests. After server is restarted, problem can occur with just 200 requests. After next restart it can be good again. After another restart it can happen again.

@KamalAman
Copy link

My problem was that I tried to use .find() from outside of a transaction using Repository while being in a transaction; since the transaction takes a connection, and the second nested call takes another connection; it would create a deadlock as a result of a race condition if 10 transactions ware started together and then each tried to use .find() and therefore opening another connection before any connection in the pool is free. Solved simply by doing everything database related using the instance of entity manager provided by the transaction.

I also tried to reproduce the provided sample code here and it works as expected. So I don't know if this specific issue still exists.

The root cause for our deadlock was caused by caused by the N+1 Problem in GraphQL creating dead-lock with query managers + repository queries e.g. 10 query managers were being created, but then we needed some additional data from the database and used a repository query, causing the dead lock with the maxPool size of 10. Query number 11 may not pass go and it creates a deadlock.

The solution was to ensure each top level service function only interacts with the database using the same query manager.

@dgonzalezcuyna
Copy link

+1

@ajubin
Copy link

ajubin commented Nov 28, 2023

it seems to be related to #10481

@scr4bble
Copy link
Contributor

it seems to be related to #10481

Yes, the new issue is well described. Hopefully it gets more attention. It's quite important feature.

@gauravl-tevaeralabs
Copy link

UP!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.