Proper reconnect to database after unexpected loss of connection #6350

Mondei1 · 2020-07-04T10:57:59Z

Issue type:

[x] question
[ ] bug report
[ ] feature request
[ ] documentation issue

Database system/driver:

[ ] cordova
[ ] mongodb
[ ] mssql
[x] mysql / mariadb
[ ] oracle
[ ] postgres
[ ] cockroachdb
[ ] sqlite
[ ] sqljs
[ ] react-native
[ ] expo

TypeORM version:

[ ] latest
[ ] @next
[x] 0.2.22

Steps to reproduce or a small repository showing the problem:

Establish a connection to your database. Then do some inserts and selects, whatever. And while TypeORM does all that, cut the connection by stopping the MariaDB server. That should throw an error (when logging is turned on) that says either PROTOCOL_CONNECTION_LOST and/or ER_SERVER_SHUTDOWN in the code property of the error.

Actual question

How can I catch that? How can I catch these errors to perform reconnect tries. I want to have access to the error object that looks like that in the console:

error: Error: ER_SERVER_SHUTDOWN: Server shutdown in progress
    [TRACE]
    --------------------
    [TRACE]
  code: 'ER_SERVER_SHUTDOWN',
  errno: 1053,
  sqlMessage: 'Server shutdown in progress',
  sqlState: '08S01',
  index: 0,
  sql: 'ALTER TABLE `TABLE_NAME` ADD CONSTRAINT `FK_62c53a5b672adbef1ccb69f5a4b` FOREIGN KEY (`TABLE_NAME`) REFERENCES `TABLE_NAME`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION'

error: Error: Connection lost: The server closed the connection.
    [STACK]
  fatal: true,
  code: 'PROTOCOL_CONNECTION_LOST'

My goal is to catch that event and the SQL query, so that when a new connection is established, I can execute these cached SQL queries, so no data is lost. Maybe TypeORM as an built-in way or event to detect such things.

Solving attempts

Through unhandled rejection handler

process.on('unhandledRejection', listener => {
    if (listener!.toString().match(/ECONNREFUSED/) ||
        listener!.toString().match(/Connection is not established/)) {
        logger(`Connection to ${bold(target_database.database)} has been lost. Retry to connect ...`, _PRIORITY.CRITICAL, null, _MODULE.MAIN);
        conn?.close()
        conn?.connect()
    }

})

The problem is that I don't get enough data. I can manually close the connection (since TypeORM still thinks that the connection is there) and then I can try a reconnect. Problem is just that I'm not able to cache queries that were made while the database wasn't available. And another problem is that this might trigger on an other event which isn't related to the database at all.

Directly using the underlying MySQL driver

// I also tried to use the 'enqueue' event, which doesn't trigger either.
// @ts-ignore: Since `pool` doesn't exist in the type definition
conn.driver.pool.on('error', (error) => {
    logger(`Connection to ${bold(target_database.database)} has been lost. Retry to connect ...`, _PRIORITY.CRITICAL, null, _MODULE.MAIN);
})

Here I get the driver property of my TypeORM connection and of that the pool object. I don't get an runtime error, so I assume that this object exists. The problem is just that this never triggers.

My config

{
    name: "development",
    type: "mysql",
    host: "localhost",
    port: 3306,
    username: "****-backend-dev",
    password: "*****",
    database: "****-backend-dev",
    synchronize: true,
    logging: true,
    entities: [
        "src/models/**/*.ts"
    ],
    migrations: [
        "src/migration/**/*.ts"
    ],
    subscribers: [
        "src/subscriber/**/*.ts"
    ],
    extra: {
        connectionTimeoutMillis: 1000,
        idleTimeoutMillis: 1500
    }
}

The text was updated successfully, but these errors were encountered:

Mondei1 · 2020-07-05T13:14:28Z

Update: I'm actually able to get queries after they got executed with a custom logger:

export class CustomTypeORMLogger implements Logger {

    logQueryError(error: string, query: string, parameters?: any[] | undefined, queryRunner?: QueryRunner | undefined) {
        if (this.isDown) {
            logger(`Query "${query}" ${chalk.grey(`-- [${parameters?.join(", ")}`)}] will be cached!`, _PRIORITY.NORMAL, null, _MODULE.DB)
        } else {
            this.isDown = true
            logger(`Query "${query}" ${chalk.grey(`-- [${parameters?.join(", ")}]`)} 
            failed because ${chalk.bold(error)}"
            
            === [DATABASE IS OFFLINE] ===
            
            `, _PRIORITY.ERROR, null, _MODULE.DB)
        }
    }
    logQuery(query: string, parameters?: any[] | undefined, queryRunner?: QueryRunner | undefined) {
        logger(`Query "${query}" ${chalk.grey(`-- [${parameters?.join(", ")}]`)} was ${chalk.bold("successful")}`, _PRIORITY.DEBUG, null, _MODULE.DB)
    }
    [...]

Which leads to that output:

[14:47:32,859] DEBG [DB   ]:    Query "SET FOREIGN_KEY_CHECKS = 1;" -- [undefined] was successful

So far, so good. But it's still weird. Every time TypeORM syncs the database (creating tables, sets up foreign keys) and it loses connection in that period, the logger receives a message that get's logged:

[14:53:39,319] # ERROR [DB   ]: Query "ALTER TABLE `password_reset` ADD CONSTRAINT `FK_e279a7b047bf32bc5dd17b43fdc` FOREIGN KEY (`user`) REFERENCES `users`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION" -- [undefined] 
            failed because Error: ER_SERVER_SHUTDOWN: Server shutdown in progress"
            
            === [DATABASE IS OFFLINE] ===
            
            
[14:53:39,320] DEBG [DB   ]:    Query "ROLLBACK" -- [undefined] was successful
[14:53:39,321] INFO [DB   ]:    Query "ROLLBACK" -- [undefined] will be cached!

Now I start the MariaDB server again and the backend and stop the database again after the backend finished syncing and proceeds with seeding the DB with dummy data (after the connection was successful) I get ECONNREFUSED errors:

Error: connect ECONNREFUSED 127.0.0.1:3306
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
    --------------------
    at Protocol._enqueue (.../node_modules/mysql/lib/protocol/Protocol.js:144:48)
    at Protocol.handshake (.../node_modules/mysql/lib/protocol/Protocol.js:51:23)
    at PoolConnection.connect (.../node_modules/mysql/lib/Connection.js:119:18)
    [...]
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 3306,
  fatal: true
}

But logQueryError(...) doesn't get triggered. I assume it's a driver issue and not TypeORM (I'm not sure). Maybe TypeORM doesn't even notice that, since the driver throws that error. The thing is, I don't want that to be triggered. I want it to continue normally and to cache the finish constructed SQL-queries, so I can execute them later, when the database is back online.

Another approach

I also tried using subscribers with the beforeInsert(...) event which works like expected but I haven't seen a way to get a SQL query out of that. I also tried to override the .query(...) function but that didn't helped really. I'm lost, really 😭

Maybe this isn't really possible and I have to use Redis which takes care of that? I really don't know.

dani3l · 2021-11-14T09:59:38Z

hi guys
any update on this one?

segovia-no · 2021-11-29T20:31:21Z

I also have the same problem, i'm trying to repurpose the default connection to connect to another server (after closing it).

Mondei1 · 2021-12-01T14:12:00Z

This seems to be a common problem if already 8 people showed interest.

As for myself: This issue is now more then a year old and I don't work on the project anymore where I required a solution to this problem. It would be great if someone could experiment on it's own to solve this issue but it would be much easier if the dev team could implement a exposed handler or internal mechanic to solve this. It's very important to not lose queries if the connection drops for e. g. 30secs.

VarunBatraIT · 2022-05-03T08:40:21Z

I am also facing the same problem:
So far I am thinking:

Should there be a ping functionality and auto-reconnect?
What about an event on the loss of connection as well with ping to move to a different server as mentioned by @segovia-no

alexey2baranov · 2022-06-09T11:14:02Z

in my current project we use TypeORM to interact with Aurora AWS. Sometimes Aurora shutdowns server and instantiates another one. Previous team used Sequelize internal retry feature to handle this situation. It performed autoreconnect during specified period and everything went well. So I believe it is possible to do in TypoORM. I have a few ideas which I will describe in next post.

alexey2baranov · 2022-06-09T11:25:56Z

TypeORM has internal custom logger feature which I guess may be utilized for this purpose. Currently we utilize it to collect SQL, start time, end time and parameters of each query to send to AWS XRay. It works perfect. And I believe we can utilize this logger for retrying as well. Perhaps all we need is to introduce a cache to store store all this data and perform broken query later after reconnect.

alexey2baranov · 2022-06-10T11:34:27Z

BTW: doesn't this canRetry do the job? https://typeorm.io/multiple-data-sources#replication

   /**
    * If true, PoolCluster will attempt to reconnect when connection fails. (Default: true)
    */
    canRetry: true,

krisanalfa · 2023-02-16T04:50:28Z

@alexey2baranov I've tried canRetry, it only retry the initial connection. Once your connection Pool established, then it's disconnected for any reason, TypeORM still won't be able to reconnect your broken connection - leaving all queries failed.

mustard-mh · 2023-03-14T07:34:20Z

Faced kind of the same problem with replication configured

Stop mysqld after ts server started
Restart mysqld again
It'll never reconnect. With error Pool does Not exists always 😨

MedFBoudis · 2023-06-08T13:06:31Z

Any updates on this ? kind of a deal breaker...

yatishbalaji · 2023-06-30T12:13:08Z

Any update on this. eagerly waiting since we get this issue in production

gabrielcipriano · 2023-08-10T14:42:59Z

Faced kind of the same problem with replication configured

Stop mysqld after ts server started

Restart mysqld again

It'll never reconnect. With error Pool does Not exists always 😨

I'm facing exactly the same problem

LuckyArdhika · 2023-08-17T00:04:01Z

i will migrate to other orm if nothing resolve here

theocerutti · 2023-09-05T15:14:07Z

Any solution ? I mean it's mandatory for production so...
Is typeorm only used for dev?

EDIT SOLUTION:
I am working with nestjs/typeorm:

// seems to be mandatory otherwise poolErrorHandler doesn't get called if database dies
process.on('unhandledRejection', () => {});
const databaseConfigModule = TypeOrmModule.forRootAsync({
  imports: [ConfigModule],
  inject: [ConfigService],
  useFactory: (config: ConfigService<Environment>) => ({
    type: 'postgres',
    host: config.get<DatabaseConfig>('database').host,
    port: config.get<DatabaseConfig>('database').port,
    username: config.get<DatabaseConfig>('database').user,
    password: config.get<DatabaseConfig>('database').password,
    database: config.get<DatabaseConfig>('database').database,
    entities: [],
    synchronize: true,
    autoLoadEntities: true,
    poolErrorHandler: async (err) => {
      Logger.error(err);
      const retryDelayMs = 1000;
      const reconnection = setInterval(async () => {
        Logger.log(
          `DB Connection lost. Retry every ${retryDelayMs}ms...`,
        );
        const datasource = new DataSource(databaseOptions);
        const db = await datasource.initialize();
        if (db && db.isInitialized) {
          Logger.log('Connection restored.');
          clearInterval(reconnection);
        }
      }, retryDelayMs);
    },
    },
  }),
});

Lacrimosica · 2023-09-13T16:44:26Z

@theocerutti thanks for your solution, Although I tried implementing it but couldn't really do it, Maybe you could clear things up for me, What troubles me is that the databaseOptions that you are going to pass to the new instance, firstly Correct me if I'm wrong is DataSource a method that is used is a factory method and the databaseOptions is a const that you pass to it?

since I'm not sure that there is a native class that would do it, moreover; the biggest problem I'm facing is the "recursive" manner in which you are doing this, since if the new instance of the DataSource you are creating is going to have the poolErrorHandler you would have to pass a databaseOptions containing that handler into it again and then in that, again and again, and again. this is my interpretation of course, If you could provide more details regarding my doubts that would be great.

Lacrimosica · 2023-09-13T16:44:41Z

On the other hand, what I managed to implement yesterday was to handle this error.
of course, this is limited to my use case.
what I did was to implement a GlobalExceptionHandler which actually lets you catch the ECONNREFUSED exception, whereas in its absence Nest would not let you catch it.

DISCLAIMER:
this is not related to typeorm so and is rather just a workaround so excuse me if I'm posting it here, I just supposed it might help someone since at first i was perplexed by the fact that i couldn't catch this error and handle it, but then i discovered that implementing a gloablExceptionHandler gives you control over it (which seems obvious i know but whatever)

@Catch()
export class GlobalExceptionFilter implements ExceptionFilter {

  MAX_RETRIES = 10;
  RETRY_DELAY = 10000;
  FAILED_REQUEST_DELAY = 5000;
  private failedRequests: FailedRequest[] = [];
  
  async catch(exception: any, host: ArgumentsHost) {
      const ctx = host.switchToHttp();
      let response = ctx.getResponse<Response>();
      const request = ctx.getRequest<Request>();
  
      /*
        When the database is down and the connection is lost, the exception is not an instance of HttpException
        rather it has the exception code ECONNREFUSED
        We identify it and implement a strategy to restore the connection
        and retry the failed requests
       */
       
      if (exception && exception.code && exception.code.includes('ECONNREFUSED')) {
        console.error('Detected database connection issue! Trying to reconnect...');
        const connection: Connection = getConnection();
        return this.handleDatabaseConnectionIssue(connection, ctx, request, response);
      }
  
      /* rest of the default/generic exception handling code */
  
      ...
  
      response.status(status).json(GlobalResponseError(status, message, code, request));
    },

inside the handleDatabaseConnectionIssue:

   async handleDatabaseConnectionIssue(connection: Connection, ctx: HttpArgumentsHost, request: any, response: any) {
    let retries = 0;
    let retrySuccessful = false;

    const failedRequest: FailedRequest = {
      method: request.method,
      endpoint: request.url,
      body: request.body,
      headers: request.headers,
    };

    this.failedRequests.push(failedRequest);

    /*
      I closed the connection because although the connection has been lost,
      the connection.isConnected() method returns true, and the connection.connect() method does not work.
    */

    await connection.close();
    while (retries < this.MAX_RETRIES && !retrySuccessful) {
      /* delay the execution of the loop */
      await new Promise(res => setTimeout(res, this.RETRY_DELAY));

      try {
        await connection.connect();
        console.log('Reconnected to the database successfully.');
        /* when the connection is restored, retry the failed requests */
        const result = await this.retryFailedRequests();
        /* if the request is retried successfully, send the response to the client */
        if (result) {
          response.status(result.status).send(result.data);
          response.headers = result.headers;
          retrySuccessful = true;
        }
        retries = 0;
      } catch (err) {
        console.log(`Failed to reconnect to the database on attempt ${retries + 1}.`, err.stack);
        retries++;
      }

      if (retries >= this.MAX_RETRIES) {
        console.log('Exceeded max retries. Giving up on database reconnection.');
      }
    }
  }

my use case was to redo the call to the database therefore i retried doing the http call that i had put inside the failedRequests array and then retry them using axios:

private async retryFailedRequests() {
  while (this.failedRequests.length) {
    const request = this.failedRequests.shift();
    await new Promise(res => setTimeout(res, this.FAILED_REQUEST_DELAY));
    const axiosRequest ={
      method: request.method,
      url: process.env.BASE_URL + request.endpoint,
      headers: request.headers,
      data: request.body
    }

    try {
      const response = await axios(axiosRequest);
      console.log('Request retried successfully:', response.data);
      return response;
    } catch (error) {
      console.error('Failed to retry the request:', error.message);
      this.failedRequests.push(request);
    }
  }
}

it goes without saying that there are other workarounds such as using a queuing framework that could redo the http calls instead of doing them internally in your application. for example, configuring it to redo the http calls to your application in case of a certain error code, using master/slave replication options which I tried but really didn't get into since it was a pain and wasn't really my use case nor had I the possibility.
passing retryAttempts and retryDelay only serves when the application is restarted, instead, this use case is more directed towards databases that might break under certain conditions and there is the need to just wait it out for a minute or so and you want the user to not loose connection, of course, client-side timeouts should be considered.

MickaelDjak · 2023-10-16T13:57:11Z

Have the same problem
I expect to see 2 options:
1 set up reconnection logic true/false
2 specifing errorHandler or something like that

but see nothing from that

There are any plan to add any solution to hanlde next errors?

Connection terminated unexpectedly
Connection terminated

americano212 · 2024-01-15T15:15:48Z

I have the same problem.

I used docker for running mysql database.
When I tried to connect immediately docker-up, it throw 'Connection lost' error.

imnotjames added driver: mariadb driver: mysql question labels Oct 1, 2020

imnotjames mentioned this issue Oct 9, 2020

how to reconnect after disconnect #6116

Closed

imnotjames added the comp: error handling label Jul 9, 2021

Mondei1 changed the title ~~Proper reconnect to database after unexpected lost of connection~~ Proper reconnect to database after unexpected loss of connection Mar 16, 2023

maksimkurb mentioned this issue Mar 21, 2024

[BUG] ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-jpeg-thumbnail) and (library/library-refresh-asset) immich-app/immich#4827

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proper reconnect to database after unexpected loss of connection #6350

Proper reconnect to database after unexpected loss of connection #6350

Mondei1 commented Jul 4, 2020 •

edited

Mondei1 commented Jul 5, 2020

dani3l commented Nov 14, 2021

segovia-no commented Nov 29, 2021

Mondei1 commented Dec 1, 2021

VarunBatraIT commented May 3, 2022

alexey2baranov commented Jun 9, 2022 •

edited

alexey2baranov commented Jun 9, 2022

alexey2baranov commented Jun 10, 2022

krisanalfa commented Feb 16, 2023

mustard-mh commented Mar 14, 2023 •

edited

MedFBoudis commented Jun 8, 2023

yatishbalaji commented Jun 30, 2023

gabrielcipriano commented Aug 10, 2023

LuckyArdhika commented Aug 17, 2023

theocerutti commented Sep 5, 2023 •

edited

Lacrimosica commented Sep 13, 2023

Lacrimosica commented Sep 13, 2023

MickaelDjak commented Oct 16, 2023 •

edited

americano212 commented Jan 15, 2024

Proper reconnect to database after unexpected loss of connection #6350

Proper reconnect to database after unexpected loss of connection #6350

Comments

Mondei1 commented Jul 4, 2020 • edited

Steps to reproduce or a small repository showing the problem:

Actual question

Solving attempts

Through unhandled rejection handler

Directly using the underlying MySQL driver

My config

Mondei1 commented Jul 5, 2020

Another approach

dani3l commented Nov 14, 2021

segovia-no commented Nov 29, 2021

Mondei1 commented Dec 1, 2021

VarunBatraIT commented May 3, 2022

alexey2baranov commented Jun 9, 2022 • edited

alexey2baranov commented Jun 9, 2022

alexey2baranov commented Jun 10, 2022

krisanalfa commented Feb 16, 2023

mustard-mh commented Mar 14, 2023 • edited

MedFBoudis commented Jun 8, 2023

yatishbalaji commented Jun 30, 2023

gabrielcipriano commented Aug 10, 2023

LuckyArdhika commented Aug 17, 2023

theocerutti commented Sep 5, 2023 • edited

Lacrimosica commented Sep 13, 2023

Lacrimosica commented Sep 13, 2023

MickaelDjak commented Oct 16, 2023 • edited

americano212 commented Jan 15, 2024

Mondei1 commented Jul 4, 2020 •

edited

alexey2baranov commented Jun 9, 2022 •

edited

mustard-mh commented Mar 14, 2023 •

edited

theocerutti commented Sep 5, 2023 •

edited

MickaelDjak commented Oct 16, 2023 •

edited