Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper reconnect to database after unexpected loss of connection #6350

Open
Mondei1 opened this issue Jul 4, 2020 · 19 comments
Open

Proper reconnect to database after unexpected loss of connection #6350

Mondei1 opened this issue Jul 4, 2020 · 19 comments

Comments

@Mondei1
Copy link

Mondei1 commented Jul 4, 2020

Issue type:

[x] question
[ ] bug report
[ ] feature request
[ ] documentation issue

Database system/driver:

[ ] cordova
[ ] mongodb
[ ] mssql
[x] mysql / mariadb
[ ] oracle
[ ] postgres
[ ] cockroachdb
[ ] sqlite
[ ] sqljs
[ ] react-native
[ ] expo

TypeORM version:

[ ] latest
[ ] @next
[x] 0.2.22

Steps to reproduce or a small repository showing the problem:

Establish a connection to your database. Then do some inserts and selects, whatever. And while TypeORM does all that, cut the connection by stopping the MariaDB server. That should throw an error (when logging is turned on) that says either PROTOCOL_CONNECTION_LOST and/or ER_SERVER_SHUTDOWN in the code property of the error.

Actual question

How can I catch that? How can I catch these errors to perform reconnect tries. I want to have access to the error object that looks like that in the console:

error: Error: ER_SERVER_SHUTDOWN: Server shutdown in progress
    [TRACE]
    --------------------
    [TRACE]
  code: 'ER_SERVER_SHUTDOWN',
  errno: 1053,
  sqlMessage: 'Server shutdown in progress',
  sqlState: '08S01',
  index: 0,
  sql: 'ALTER TABLE `TABLE_NAME` ADD CONSTRAINT `FK_62c53a5b672adbef1ccb69f5a4b` FOREIGN KEY (`TABLE_NAME`) REFERENCES `TABLE_NAME`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION'

error: Error: Connection lost: The server closed the connection.
    [STACK]
  fatal: true,
  code: 'PROTOCOL_CONNECTION_LOST'

My goal is to catch that event and the SQL query, so that when a new connection is established, I can execute these cached SQL queries, so no data is lost. Maybe TypeORM as an built-in way or event to detect such things.

Solving attempts

Through unhandled rejection handler

process.on('unhandledRejection', listener => {
    if (listener!.toString().match(/ECONNREFUSED/) ||
        listener!.toString().match(/Connection is not established/)) {
        logger(`Connection to ${bold(target_database.database)} has been lost. Retry to connect ...`, _PRIORITY.CRITICAL, null, _MODULE.MAIN);
        conn?.close()
        conn?.connect()
    }

})

The problem is that I don't get enough data. I can manually close the connection (since TypeORM still thinks that the connection is there) and then I can try a reconnect. Problem is just that I'm not able to cache queries that were made while the database wasn't available. And another problem is that this might trigger on an other event which isn't related to the database at all.

Directly using the underlying MySQL driver

// I also tried to use the 'enqueue' event, which doesn't trigger either.
// @ts-ignore: Since `pool` doesn't exist in the type definition
conn.driver.pool.on('error', (error) => {
    logger(`Connection to ${bold(target_database.database)} has been lost. Retry to connect ...`, _PRIORITY.CRITICAL, null, _MODULE.MAIN);
})

Here I get the driver property of my TypeORM connection and of that the pool object. I don't get an runtime error, so I assume that this object exists. The problem is just that this never triggers.

My config

{
    name: "development",
    type: "mysql",
    host: "localhost",
    port: 3306,
    username: "****-backend-dev",
    password: "*****",
    database: "****-backend-dev",
    synchronize: true,
    logging: true,
    entities: [
        "src/models/**/*.ts"
    ],
    migrations: [
        "src/migration/**/*.ts"
    ],
    subscribers: [
        "src/subscriber/**/*.ts"
    ],
    extra: {
        connectionTimeoutMillis: 1000,
        idleTimeoutMillis: 1500
    }
}
@Mondei1
Copy link
Author

Mondei1 commented Jul 5, 2020

Update: I'm actually able to get queries after they got executed with a custom logger:

export class CustomTypeORMLogger implements Logger {

    logQueryError(error: string, query: string, parameters?: any[] | undefined, queryRunner?: QueryRunner | undefined) {
        if (this.isDown) {
            logger(`Query "${query}" ${chalk.grey(`-- [${parameters?.join(", ")}`)}] will be cached!`, _PRIORITY.NORMAL, null, _MODULE.DB)
        } else {
            this.isDown = true
            logger(`Query "${query}" ${chalk.grey(`-- [${parameters?.join(", ")}]`)} 
            failed because ${chalk.bold(error)}"
            
            === [DATABASE IS OFFLINE] ===
            
            `, _PRIORITY.ERROR, null, _MODULE.DB)
        }
    }
    logQuery(query: string, parameters?: any[] | undefined, queryRunner?: QueryRunner | undefined) {
        logger(`Query "${query}" ${chalk.grey(`-- [${parameters?.join(", ")}]`)} was ${chalk.bold("successful")}`, _PRIORITY.DEBUG, null, _MODULE.DB)
    }
    [...]

Which leads to that output:

[14:47:32,859] DEBG [DB   ]:    Query "SET FOREIGN_KEY_CHECKS = 1;" -- [undefined] was successful

So far, so good. But it's still weird. Every time TypeORM syncs the database (creating tables, sets up foreign keys) and it loses connection in that period, the logger receives a message that get's logged:

[14:53:39,319] # ERROR [DB   ]: Query "ALTER TABLE `password_reset` ADD CONSTRAINT `FK_e279a7b047bf32bc5dd17b43fdc` FOREIGN KEY (`user`) REFERENCES `users`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION" -- [undefined] 
            failed because Error: ER_SERVER_SHUTDOWN: Server shutdown in progress"
            
            === [DATABASE IS OFFLINE] ===
            
            
[14:53:39,320] DEBG [DB   ]:    Query "ROLLBACK" -- [undefined] was successful
[14:53:39,321] INFO [DB   ]:    Query "ROLLBACK" -- [undefined] will be cached!

Now I start the MariaDB server again and the backend and stop the database again after the backend finished syncing and proceeds with seeding the DB with dummy data (after the connection was successful) I get ECONNREFUSED errors:

Error: connect ECONNREFUSED 127.0.0.1:3306
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
    --------------------
    at Protocol._enqueue (.../node_modules/mysql/lib/protocol/Protocol.js:144:48)
    at Protocol.handshake (.../node_modules/mysql/lib/protocol/Protocol.js:51:23)
    at PoolConnection.connect (.../node_modules/mysql/lib/Connection.js:119:18)
    [...]
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 3306,
  fatal: true
}

But logQueryError(...) doesn't get triggered. I assume it's a driver issue and not TypeORM (I'm not sure). Maybe TypeORM doesn't even notice that, since the driver throws that error. The thing is, I don't want that to be triggered. I want it to continue normally and to cache the finish constructed SQL-queries, so I can execute them later, when the database is back online.

Another approach

I also tried using subscribers with the beforeInsert(...) event which works like expected but I haven't seen a way to get a SQL query out of that. I also tried to override the .query(...) function but that didn't helped really. I'm lost, really 😭

Maybe this isn't really possible and I have to use Redis which takes care of that? I really don't know.

@dani3l
Copy link

dani3l commented Nov 14, 2021

hi guys
any update on this one?

@segovia-no
Copy link

I also have the same problem, i'm trying to repurpose the default connection to connect to another server (after closing it).

@Mondei1
Copy link
Author

Mondei1 commented Dec 1, 2021

This seems to be a common problem if already 8 people showed interest.

As for myself: This issue is now more then a year old and I don't work on the project anymore where I required a solution to this problem. It would be great if someone could experiment on it's own to solve this issue but it would be much easier if the dev team could implement a exposed handler or internal mechanic to solve this. It's very important to not lose queries if the connection drops for e. g. 30secs.

@VarunBatraIT
Copy link

I am also facing the same problem:
So far I am thinking:

  • Should there be a ping functionality and auto-reconnect?
  • What about an event on the loss of connection as well with ping to move to a different server as mentioned by @segovia-no

@alexey2baranov
Copy link
Contributor

alexey2baranov commented Jun 9, 2022

in my current project we use TypeORM to interact with Aurora AWS. Sometimes Aurora shutdowns server and instantiates another one. Previous team used Sequelize internal retry feature to handle this situation. It performed autoreconnect during specified period and everything went well. So I believe it is possible to do in TypoORM. I have a few ideas which I will describe in next post.

@alexey2baranov
Copy link
Contributor

TypeORM has internal custom logger feature which I guess may be utilized for this purpose. Currently we utilize it to collect SQL, start time, end time and parameters of each query to send to AWS XRay. It works perfect. And I believe we can utilize this logger for retrying as well. Perhaps all we need is to introduce a cache to store store all this data and perform broken query later after reconnect.

@alexey2baranov
Copy link
Contributor

BTW: doesn't this canRetry do the job? https://typeorm.io/multiple-data-sources#replication

   /**
    * If true, PoolCluster will attempt to reconnect when connection fails. (Default: true)
    */
    canRetry: true,

@krisanalfa
Copy link

@alexey2baranov I've tried canRetry, it only retry the initial connection. Once your connection Pool established, then it's disconnected for any reason, TypeORM still won't be able to reconnect your broken connection - leaving all queries failed.

@mustard-mh
Copy link

mustard-mh commented Mar 14, 2023

Faced kind of the same problem with replication configured

  • Stop mysqld after ts server started
  • Restart mysqld again
  • It'll never reconnect. With error Pool does Not exists always 😨

@Mondei1 Mondei1 changed the title Proper reconnect to database after unexpected lost of connection Proper reconnect to database after unexpected loss of connection Mar 16, 2023
@MedFBoudis
Copy link

Any updates on this ? kind of a deal breaker...

@yatishbalaji
Copy link

Any update on this. eagerly waiting since we get this issue in production

@gabrielcipriano
Copy link

Faced kind of the same problem with replication configured

  • Stop mysqld after ts server started
  • Restart mysqld again
  • It'll never reconnect. With error Pool does Not exists always 😨

I'm facing exactly the same problem

@LuckyArdhika
Copy link

i will migrate to other orm if nothing resolve here

@theocerutti
Copy link

theocerutti commented Sep 5, 2023

Any solution ? I mean it's mandatory for production so...
Is typeorm only used for dev?

EDIT SOLUTION:
I am working with nestjs/typeorm:

// seems to be mandatory otherwise poolErrorHandler doesn't get called if database dies
process.on('unhandledRejection', () => {});
const databaseConfigModule = TypeOrmModule.forRootAsync({
  imports: [ConfigModule],
  inject: [ConfigService],
  useFactory: (config: ConfigService<Environment>) => ({
    type: 'postgres',
    host: config.get<DatabaseConfig>('database').host,
    port: config.get<DatabaseConfig>('database').port,
    username: config.get<DatabaseConfig>('database').user,
    password: config.get<DatabaseConfig>('database').password,
    database: config.get<DatabaseConfig>('database').database,
    entities: [],
    synchronize: true,
    autoLoadEntities: true,
    poolErrorHandler: async (err) => {
      Logger.error(err);
      const retryDelayMs = 1000;
      const reconnection = setInterval(async () => {
        Logger.log(
          `DB Connection lost. Retry every ${retryDelayMs}ms...`,
        );
        const datasource = new DataSource(databaseOptions);
        const db = await datasource.initialize();
        if (db && db.isInitialized) {
          Logger.log('Connection restored.');
          clearInterval(reconnection);
        }
      }, retryDelayMs);
    },
    },
  }),
});

@Lacrimosica
Copy link

@theocerutti thanks for your solution, Although I tried implementing it but couldn't really do it, Maybe you could clear things up for me, What troubles me is that the databaseOptions that you are going to pass to the new instance, firstly Correct me if I'm wrong is DataSource a method that is used is a factory method and the databaseOptions is a const that you pass to it?

since I'm not sure that there is a native class that would do it, moreover; the biggest problem I'm facing is the "recursive" manner in which you are doing this, since if the new instance of the DataSource you are creating is going to have the poolErrorHandler you would have to pass a databaseOptions containing that handler into it again and then in that, again and again, and again. this is my interpretation of course, If you could provide more details regarding my doubts that would be great.

@Lacrimosica
Copy link

On the other hand, what I managed to implement yesterday was to handle this error.
of course, this is limited to my use case.
what I did was to implement a GlobalExceptionHandler which actually lets you catch the ECONNREFUSED exception, whereas in its absence Nest would not let you catch it.

DISCLAIMER:
this is not related to typeorm so and is rather just a workaround so excuse me if I'm posting it here, I just supposed it might help someone since at first i was perplexed by the fact that i couldn't catch this error and handle it, but then i discovered that implementing a gloablExceptionHandler gives you control over it (which seems obvious i know but whatever)

@Catch()
export class GlobalExceptionFilter implements ExceptionFilter {

  MAX_RETRIES = 10;
  RETRY_DELAY = 10000;
  FAILED_REQUEST_DELAY = 5000;
  private failedRequests: FailedRequest[] = [];
  
  async catch(exception: any, host: ArgumentsHost) {
      const ctx = host.switchToHttp();
      let response = ctx.getResponse<Response>();
      const request = ctx.getRequest<Request>();
  
      /*
        When the database is down and the connection is lost, the exception is not an instance of HttpException
        rather it has the exception code ECONNREFUSED
        We identify it and implement a strategy to restore the connection
        and retry the failed requests
       */
       
      if (exception && exception.code && exception.code.includes('ECONNREFUSED')) {
        console.error('Detected database connection issue! Trying to reconnect...');
        const connection: Connection = getConnection();
        return this.handleDatabaseConnectionIssue(connection, ctx, request, response);
      }
  
      /* rest of the default/generic exception handling code */
  
      ...
  
      response.status(status).json(GlobalResponseError(status, message, code, request));
    },

inside the handleDatabaseConnectionIssue:

   async handleDatabaseConnectionIssue(connection: Connection, ctx: HttpArgumentsHost, request: any, response: any) {
    let retries = 0;
    let retrySuccessful = false;

    const failedRequest: FailedRequest = {
      method: request.method,
      endpoint: request.url,
      body: request.body,
      headers: request.headers,
    };

    this.failedRequests.push(failedRequest);

    /*
      I closed the connection because although the connection has been lost,
      the connection.isConnected() method returns true, and the connection.connect() method does not work.
    */

    await connection.close();
    while (retries < this.MAX_RETRIES && !retrySuccessful) {
      /* delay the execution of the loop */
      await new Promise(res => setTimeout(res, this.RETRY_DELAY));

      try {
        await connection.connect();
        console.log('Reconnected to the database successfully.');
        /* when the connection is restored, retry the failed requests */
        const result = await this.retryFailedRequests();
        /* if the request is retried successfully, send the response to the client */
        if (result) {
          response.status(result.status).send(result.data);
          response.headers = result.headers;
          retrySuccessful = true;
        }
        retries = 0;
      } catch (err) {
        console.log(`Failed to reconnect to the database on attempt ${retries + 1}.`, err.stack);
        retries++;
      }

      if (retries >= this.MAX_RETRIES) {
        console.log('Exceeded max retries. Giving up on database reconnection.');
      }
    }
  }

my use case was to redo the call to the database therefore i retried doing the http call that i had put inside the failedRequests array and then retry them using axios:

private async retryFailedRequests() {
  while (this.failedRequests.length) {
    const request = this.failedRequests.shift();
    await new Promise(res => setTimeout(res, this.FAILED_REQUEST_DELAY));
    const axiosRequest ={
      method: request.method,
      url: process.env.BASE_URL + request.endpoint,
      headers: request.headers,
      data: request.body
    }

    try {
      const response = await axios(axiosRequest);
      console.log('Request retried successfully:', response.data);
      return response;
    } catch (error) {
      console.error('Failed to retry the request:', error.message);
      this.failedRequests.push(request);
    }
  }
}

it goes without saying that there are other workarounds such as using a queuing framework that could redo the http calls instead of doing them internally in your application. for example, configuring it to redo the http calls to your application in case of a certain error code, using master/slave replication options which I tried but really didn't get into since it was a pain and wasn't really my use case nor had I the possibility.
passing retryAttempts and retryDelay only serves when the application is restarted, instead, this use case is more directed towards databases that might break under certain conditions and there is the need to just wait it out for a minute or so and you want the user to not loose connection, of course, client-side timeouts should be considered.

@MickaelDjak
Copy link

MickaelDjak commented Oct 16, 2023

Have the same problem
I expect to see 2 options:
1 set up reconnection logic true/false
2 specifing errorHandler or something like that

but see nothing from that

There are any plan to add any solution to hanlde next errors?

  • Connection terminated unexpectedly
  • Connection terminated

@americano212
Copy link

I have the same problem.

I used docker for running mysql database.
When I tried to connect immediately docker-up, it throw 'Connection lost' error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests