Description
Is your feature request related to a problem? Please describe.
In ElizaOS, core system migrations (e.g., creation of tables like message_servers, central_messages, etc.) are triggered automatically during elizaos start, before any agent code runs. This design works well in single-instance deployments but introduces challenges when ElizaOS is deployed in horizontally scaled environments, such as Kubernetes or server clusters.
If multiple instances start simultaneously, they may attempt to run migrations at the same time. Although Drizzle ORM provides a migration tracking system, there is currently no coordination or locking, which means:
- Simultaneous attempts to apply the same schema migrations
- Race conditions around table/constraint creation
- PostgreSQL errors such as relation already exists
- Partial or inconsistent schema state if one migration fails mid-way
Additionally, there is no built-in mechanism to delay readiness — the server will begin accepting requests even if migrations are still in progress.
This behavior also applies to plugin-specific migrations triggered during agent startup. It may not seem like a serious problem at first glance, and it's not something people are likely to encounter when they first deploy ElizaOS (migrations are quick when creating new tables in an empty database), but when the application is deployed at scale and holds a lot of data, this could become a real headache.
Describe the solution you'd like
To improve reliability and readiness in distributed or orchestrated deployments, I’d propose:
- Introduce advisory locking (e.g., PostgreSQL pg_advisory_lock) to ensure that only one instance runs migrations at a time. Other instances should wait for the lock to be released.
- Defer readiness until migrations are complete. The /health endpoint should reflect this, returning a non-200 response or ready: false until the instance has finished applying all pending migrations.
- Continue using the existing Drizzle migration tracking mechanism to prevent reapplying completed migrations; no changes are needed here.
This would allow ElizaOS to safely support:
- Multi-instance startup
- Auto-scaling
- Rolling updates
- Zero-downtime deployments
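As a rough sketch, the readiness gate described above could look like the following. This is illustrative only: healthHandler and startWithReadinessGate are hypothetical names, not actual ElizaOS or Drizzle APIs.

```typescript
// Hypothetical readiness gate: the health endpoint reports 503 / ready: false
// until migrations have finished, so orchestrators (e.g., Kubernetes
// readiness probes) hold traffic back from this instance.
let migrationsComplete = false;

function healthHandler(): { status: number; body: { ready: boolean } } {
  return migrationsComplete
    ? { status: 200, body: { ready: true } }
    : { status: 503, body: { ready: false } };
}

async function startWithReadinessGate(
  runMigrations: () => Promise<void>
): Promise<void> {
  // e.g., await migrationService.runAllPluginMigrations();
  await runMigrations();
  // Only flip the flag once every pending migration has been applied.
  migrationsComplete = true;
}
```

Whether this lives in the HTTP layer or the CLI startup path is an implementation detail; the key point is that the flag flips only after the migration promise resolves.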
Describe alternatives you've considered
- Running migrations in a separate job before deployment, for example as part of CI/CD. This is workable, but difficult to coordinate with plugin-level migrations triggered at runtime. It also does not solve the problem of ElizaOS itself running server-level database migrations as part of startup; it only covers agent migrations.
- Controlling migration timing via project code. This is not currently feasible, as migrations run before the project's logic is executed; short of bypassing the CLI entirely and constructing an AgentRuntime manually, there is no way to get runtime control over the server startup process.
Additional context
The relevant core migration logic runs here:

```typescript
// packages/cli/src/server/index.ts
await migrationService.runAllPluginMigrations(); // Happens before agent startup
```

Agent plugin migrations run here:

```typescript
// packages/cli/src/commands/start/actions/agent-start.ts
await migrationService.runAllPluginMigrations();
```
Example: Simple Advisory Lock to Serialize Migrations (PoC)
```typescript
const MIGRATION_LOCK_ID = 123456;

const client = await db.getClient();
try {
  const result = await client.query(
    'SELECT pg_try_advisory_lock($1) AS acquired',
    [MIGRATION_LOCK_ID]
  );
  if (!result.rows[0].acquired) {
    console.log('Waiting for migration lock...');
    // Blocks until the lock is acquired
    await client.query('SELECT pg_advisory_lock($1)', [MIGRATION_LOCK_ID]);
  }
  // Safe to run migrations here
} finally {
  await client.query('SELECT pg_advisory_unlock($1)', [MIGRATION_LOCK_ID]);
  client.release();
}
```
Thanks again for the work on ElizaOS — this would be a valuable enhancement to ensure safe deployment in production environments.