Blog
Database Migration Strategies for Banks: Zero-Downtime Schema Changes at Scale
Database migration strategies for banking systems. Zero-downtime schema changes, backfill patterns, and rollback strategies for large financial databases.
Migrating a banking database is one of the highest-risk operations in financial technology. A failed migration can lock customers out of their accounts, corrupt transaction data, or trigger regulatory violations. The database is the system of record — everything else is derived from it.
We have migrated banking databases with over 100 million rows and 99.99% uptime requirements. The key is not the migration tool — it is the migration strategy. Every schema change must be backward-compatible, reversible, and testable in production without impacting customers.
Who Is This Guide For?
This guide is for database architects, platform engineers, and CTOs at banks and fintechs planning or executing database migrations. If you need to change schema on a production banking database without downtime, this is for you.
By the End of This, You’ll Know…
- Why zero-downtime migration is a regulatory requirement, not a luxury
- The expand-and-contract pattern that makes schema changes backward-compatible
- How to backfill data without locking tables or impacting production traffic
- The rollback strategies that prevent migration failures from becoming outages
Why Zero-Downtime Migration Matters
Banking databases cannot go down. Period. The reasons are both business and regulatory:
- Business: Customers expect 24/7 access to their accounts. Downtime during business hours directly impacts revenue and customer trust.
- Regulatory: PCI DSS and SOX require continuous availability of financial systems. Planned downtime must be documented and approved.
- Operational: Banking databases are the source of truth. Other systems (payment processing, risk calculation, reporting) depend on the database. Downtime cascades.
A schema change that requires downtime is a schema change that cannot be deployed. Design for zero downtime from the start.
The Expand-and-Contract Pattern
The expand-and-contract pattern makes schema changes backward-compatible by breaking them into three phases:
Phase 1: Expand
Add the new schema elements without removing the old ones. Both old and new schema coexist:
- Add new columns with default values
- Add new indexes
- Add new tables
- Do not remove old columns or indexes
Example: Adding a new column to the transactions table
| |
Phase 2: Migrate
Write data to both old and new schema elements. Both are kept in sync:
- Update application code to write to both old and new columns
- Backfill historical data in the new column
- Validate that both columns contain consistent data
Example: Backfilling the new column
| |
Phase 3: Contract
Remove the old schema elements after all consumers have migrated:
- Update application code to read only from the new column
- Remove the old column
- Drop the old index
Example: Removing the old column (contract)
| |
Backfill Patterns
Backfilling historical data is the most dangerous part of a migration. A naive backfill locks the table, blocks all production traffic, and can cause timeouts.
Online Backfill
Backfill data in small batches without locking the table:
| |
CDC-Based Backfill
Use Change Data Capture to backfill without impacting production:
- Enable CDC on the source table
- Process changes in near-real-time
- Update the target column as changes arrive
- Backfill historical data during low-traffic periods
Background Job Backfill
Use a background job queue (Redis, SQS) to process backfill work:
- Enqueue all rows needing backfill
- Process queue entries in parallel workers
- Each worker updates a small batch of rows
- Monitor progress and handle failures
Rollback Strategies
Every migration must be reversible. If something goes wrong, you must be able to roll back without data loss.
Forward Rollback
Roll back by applying a reverse migration:
- Expand phase: Roll back by removing the new column (safe if no data was written)
- Migrate phase: Roll back by stopping the dual-write and reverting to the old column
- Contract phase: Roll back by re-adding the old column (requires data restoration from backup)
Snapshot-Based Rollback
Take a database snapshot before the migration. If the migration fails, restore from the snapshot:
- Pros: Simple, guaranteed restore point
- Cons: May lose recent transactions, restore takes time
Point-in-Time Recovery
Use the database’s point-in-time recovery capability:
- PostgreSQL: WAL archiving enables recovery to any point in time
- MySQL: Binary log archiving enables similar recovery
- AWS RDS: Automated backups with point-in-time restore
Testing Migrations in Production
Shadow Traffic
Route a copy of production traffic to the new schema and compare results:
- Use a traffic mirror or shadow to send reads to both old and new schema
- Compare results to ensure consistency
- Do not route writes to the new schema until validated
Feature Flags
Use feature flags to control which users or traffic segments use the new schema:
- Gradually increase the percentage of traffic using the new schema
- Monitor for errors and performance degradation
- Roll back by disabling the feature flag
Canary Deployments
Deploy the migration to a small percentage of production traffic first:
- Start with 1% of traffic
- Monitor for errors and performance degradation
- Gradually increase to 100%
What You Can Actually Use Today
- gh-ost: GitHub’s online schema change tool for MySQL. Zero-downtime schema changes with built-in rollback.
- pt-online-schema-change: Percona’s tool for online schema changes on MySQL.
- Flyway: Database migration tool with version control and rollback support.
- Liquibase: Database migration tool with changelog-based migrations.
FAQ
How long does a typical database migration take?
A simple schema change (add column, add index) takes hours. A complex migration (add table, restructure data) takes days to weeks. The expand-and-contract pattern extends the timeline but ensures zero downtime.
What if the migration fails mid-way?
Every migration must have a rollback plan. If the migration fails during the expand phase, remove the new column. If it fails during the migrate phase, stop the dual-write and revert to the old column. If it fails during the contract phase, restore from backup.
Can we test migrations in staging?
Staging tests are necessary but insufficient. Staging databases do not have production data volumes or production traffic patterns. Always test migrations in production using feature flags or canary deployments.
We help banks and fintechs execute database migrations that satisfy zero-downtime requirements and regulatory constraints. If you are planning a database migration, get in touch.