Blog

6 min read

Database Migration Strategies for Banks: Zero-Downtime Schema Changes at Scale

Database migration strategies for banking systems. Zero-downtime schema changes, backfill patterns, and rollback strategies for large financial databases.

Migrating a banking database is one of the highest-risk operations in financial technology. A failed migration can lock customers out of their accounts, corrupt transaction data, or trigger regulatory violations. The database is the system of record — everything else is derived from it.

We have migrated banking databases with over 100 million rows and 99.99% uptime requirements. The key is not the migration tool — it is the migration strategy. Every schema change must be backward-compatible, reversible, and testable in production without impacting customers.

Who Is This Guide For?

This guide is for database architects, platform engineers, and CTOs at banks and fintechs planning or executing database migrations. If you need to change schema on a production banking database without downtime, this is for you.

By the End of This, You’ll Know…

  • Why zero-downtime migration is a regulatory requirement, not a luxury
  • The expand-and-contract pattern that makes schema changes backward-compatible
  • How to backfill data without locking tables or impacting production traffic
  • The rollback strategies that prevent migration failures from becoming outages

Why Zero-Downtime Migration Matters

Banking databases cannot go down. Period. The reasons are both business and regulatory:

  • Business: Customers expect 24/7 access to their accounts. Downtime during business hours directly impacts revenue and customer trust.
  • Regulatory: PCI DSS and SOX require continuous availability of financial systems. Planned downtime must be documented and approved.
  • Operational: Banking databases are the source of truth. Other systems (payment processing, risk calculation, reporting) depend on the database. Downtime cascades.

A schema change that requires downtime is a schema change that cannot be deployed. Design for zero downtime from the start.


The Expand-and-Contract Pattern

The expand-and-contract pattern makes schema changes backward-compatible by breaking them into three phases:

Phase 1: Expand

Add the new schema elements without removing the old ones. Both old and new schema coexist:

  • Add new columns with default values
  • Add new indexes
  • Add new tables
  • Do not remove old columns or indexes

Example: Adding a new column to the transactions table

1
2
-- Phase 1: Add new column (expand)
ALTER TABLE transactions ADD COLUMN merchant_category VARCHAR(50) DEFAULT 'unknown';

Phase 2: Migrate

Write data to both old and new schema elements. Both are kept in sync:

  • Update application code to write to both old and new columns
  • Backfill historical data in the new column
  • Validate that both columns contain consistent data

Example: Backfilling the new column

1
2
3
4
5
-- Phase 2: Backfill historical data (migrate)
UPDATE transactions 
SET merchant_category = categorize(description) 
WHERE merchant_category = 'unknown' 
AND created_at > '2020-01-01';

Phase 3: Contract

Remove the old schema elements after all consumers have migrated:

  • Update application code to read only from the new column
  • Remove the old column
  • Drop the old index

Example: Removing the old column (contract)

1
2
-- Phase 3: Remove old column (contract)
ALTER TABLE transactions DROP COLUMN old_description_field;

Backfill Patterns

Backfilling historical data is the most dangerous part of a migration. A naive backfill locks the table, blocks all production traffic, and can cause timeouts.

Online Backfill

Backfill data in small batches without locking the table:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- Backfill in batches of 1000
WHILE EXISTS (SELECT 1 FROM transactions WHERE merchant_category = 'unknown' LIMIT 1)
BEGIN
    UPDATE TOP (1000) transactions
    SET merchant_category = categorize(description)
    WHERE merchant_category = 'unknown';
    
    -- Wait between batches to avoid overwhelming the database
    WAITFOR DELAY '00:00:01';
END

CDC-Based Backfill

Use Change Data Capture to backfill without impacting production:

  1. Enable CDC on the source table
  2. Process changes in near-real-time
  3. Update the target column as changes arrive
  4. Backfill historical data during low-traffic periods

Background Job Backfill

Use a background job queue (Redis, SQS) to process backfill work:

  1. Enqueue all rows needing backfill
  2. Process queue entries in parallel workers
  3. Each worker updates a small batch of rows
  4. Monitor progress and handle failures

Rollback Strategies

Every migration must be reversible. If something goes wrong, you must be able to roll back without data loss.

Forward Rollback

Roll back by applying a reverse migration:

  • Expand phase: Roll back by removing the new column (safe if no data was written)
  • Migrate phase: Roll back by stopping the dual-write and reverting to the old column
  • Contract phase: Roll back by re-adding the old column (requires data restoration from backup)

Snapshot-Based Rollback

Take a database snapshot before the migration. If the migration fails, restore from the snapshot:

  • Pros: Simple, guaranteed restore point
  • Cons: May lose recent transactions, restore takes time

Point-in-Time Recovery

Use the database’s point-in-time recovery capability:

  • PostgreSQL: WAL archiving enables recovery to any point in time
  • MySQL: Binary log archiving enables similar recovery
  • AWS RDS: Automated backups with point-in-time restore

Testing Migrations in Production

Shadow Traffic

Route a copy of production traffic to the new schema and compare results:

  • Use a traffic mirror or shadow to send reads to both old and new schema
  • Compare results to ensure consistency
  • Do not route writes to the new schema until validated

Feature Flags

Use feature flags to control which users or traffic segments use the new schema:

  • Gradually increase the percentage of traffic using the new schema
  • Monitor for errors and performance degradation
  • Roll back by disabling the feature flag

Canary Deployments

Deploy the migration to a small percentage of production traffic first:

  • Start with 1% of traffic
  • Monitor for errors and performance degradation
  • Gradually increase to 100%

What You Can Actually Use Today

  • gh-ost: GitHub’s online schema change tool for MySQL. Zero-downtime schema changes with built-in rollback.
  • pt-online-schema-change: Percona’s tool for online schema changes on MySQL.
  • Flyway: Database migration tool with version control and rollback support.
  • Liquibase: Database migration tool with changelog-based migrations.

FAQ

How long does a typical database migration take?

A simple schema change (add column, add index) takes hours. A complex migration (add table, restructure data) takes days to weeks. The expand-and-contract pattern extends the timeline but ensures zero downtime.

What if the migration fails mid-way?

Every migration must have a rollback plan. If the migration fails during the expand phase, remove the new column. If it fails during the migrate phase, stop the dual-write and revert to the old column. If it fails during the contract phase, restore from backup.

Can we test migrations in staging?

Staging tests are necessary but insufficient. Staging databases do not have production data volumes or production traffic patterns. Always test migrations in production using feature flags or canary deployments.


We help banks and fintechs execute database migrations that satisfy zero-downtime requirements and regulatory constraints. If you are planning a database migration, get in touch.