
Multi-Region Kafka for Global Financial Services

Architecting Apache Kafka across financial data centres. Geo-replication, compliance boundaries, and disaster recovery for global trading and risk systems.

A global investment bank running trading operations across London, New York, Singapore, and Tokyo needs a messaging infrastructure that treats each region as both an independent operational domain and a participant in a global data mesh. Kafka geo-replication across financial data centres requires solving challenges that most Kafka documentation does not address.

We have deployed Kafka across multi-region architectures for tier-one banks. Here is what we learned about keeping trades flowing between London and Singapore while satisfying data residency requirements in each jurisdiction.

Who Is This Guide For?

This guide is for data platform engineers and architects at global financial institutions designing Kafka infrastructure across multiple regions. If you need streaming data to cross regulatory boundaries while maintaining operational reliability, this is for you.

By the End of This, You’ll Know…

  • The difference between active-passive and active-active Kafka architectures, and when each fits
  • How to implement geo-replication that satisfies data residency requirements
  • How to handle disaster recovery across regions without message loss

Kafka Architecture Patterns

Active-Passive (Most Common in Regulated Environments)

One region (e.g., London) is the primary producer and consumer of critical trading topics. Other regions consume replicated copies for risk aggregation, reporting, and disaster recovery.

[London]    Primary Kafka → MirrorMaker 2
                                  ↓
[Singapore] Replicated Kafka → Risk aggregation, DR

The trade-off: London is a single point of failure for trade submission, but data residency compliance is straightforward. Trades are only ever written in one region, and the only data that leaves it is whatever the replication flow is explicitly configured to copy.

Active-Active (Higher Complexity, Lower Latency)

Each region independently produces and consumes local trades. Replicated topics carry a global view for risk aggregation and compliance reporting.

[London] → Local trades → MM2 → Global risk topics ← MM2 ← [Singapore]
[New York] → Local trades → MM2 → Global risk topics ← MM2 ← [Singapore]

The challenge: avoiding conflicts for trades that span regions. We use globally unique trade IDs built from the exchange or venue, a timestamp, and the originating region, so that no two regions can ever produce the same trade ID.
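A minimal Python sketch of such an ID scheme, assuming a hypothetical format of region, producer instance, venue, millisecond timestamp and a per-generator sequence number. The region prefix is what stops two regions from minting the same ID; the instance field does the same job for several producers inside one region. The to_uuid helper and the example namespace are illustrative assumptions, not part of any Kafka API.

```python
import itertools
import threading
import time
import uuid
from typing import Callable

# Hypothetical namespace for deriving UUIDs from trade IDs (illustrative only).
TRADE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "trades.example.com")


class TradeIdGenerator:
    """Mints composite trade IDs: region:instance:venue:timestamp_ms:sequence.

    Uniqueness assumes (region, instance) is unique across the estate and
    that the sequence never repeats within one generator's lifetime.
    """

    def __init__(self, region: str, instance: str,
                 clock: Callable[[], int] = time.time_ns) -> None:
        self.region = region
        self.instance = instance
        self._clock = clock              # nanosecond clock, injectable for tests
        self._seq = itertools.count()    # monotonically increasing per generator
        self._lock = threading.Lock()    # keeps sequence allocation thread-safe

    def next_id(self, venue: str) -> str:
        with self._lock:
            ts_ms = self._clock() // 1_000_000
            seq = next(self._seq)
        return f"{self.region}:{self.instance}:{venue}:{ts_ms}:{seq:08d}"


def to_uuid(trade_id: str) -> uuid.UUID:
    """Deterministic name-based (version 5) UUID for systems that need UUID keys."""
    return uuid.uuid5(TRADE_NAMESPACE, trade_id)
```

A London generator and a Singapore generator reading the same clock value still produce different IDs, because the region prefix differs.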


MirrorMaker 2 Configuration

MirrorMaker 2 is the standard tool for Kafka geo-replication. Key configuration parameters for financial workloads:

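# Replication policy: DefaultReplicationPolicy prefixes each replicated topic
# with its source cluster alias plus the separator (e.g. london-trades),
# which keeps replicas distinct from local topics and prevents replication loops
replication.policy.separator=-
replication.policy.class=org.apache.kafka.connect.mirror.DefaultReplicationPolicy

# Mirror topic configurations and ACLs to the target cluster
sync.topic.configs.enabled=true
sync.topic.acls.enabled=true

# Translate committed consumer group offsets into the target cluster (Kafka 2.7+)
sync.group.offsets.enabled=true
sync.group.offsets.interval.seconds=60

# Heartbeat emission between clusters
emit.heartbeats.interval.seconds=5

# Checkpoint replication for consumer group offset sync
checkpoints.topic.replication.factor=3
emit.checkpoints.interval.seconds=60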
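To make the naming behaviour concrete, here is a small Python model of how a prefix-style replication policy names and parses remote topics. It follows what we understand DefaultReplicationPolicy to do (source alias, separator, topic name; the source is read back from the text before the first separator), but it is a sketch, not the MM2 source. It also shows a pitfall of a hyphen separator: local topic names that already contain a hyphen look like replicas to any tooling that parses names back. The cluster aliases and topic names are hypothetical.

```python
from typing import Optional, Set

SEPARATOR = "-"  # same value as replication.policy.separator above


def format_remote_topic(source_alias: str, topic: str, sep: str = SEPARATOR) -> str:
    """Name of the replica of `topic` on the target cluster."""
    return f"{source_alias}{sep}{topic}"


def topic_source(topic: str, sep: str = SEPARATOR) -> Optional[str]:
    """Naive parse: everything before the first separator is the source alias."""
    parts = topic.split(sep)
    return parts[0] if len(parts) >= 2 else None


def is_replica(topic: str, known_aliases: Set[str], sep: str = SEPARATOR) -> bool:
    """Safer check: only treat a topic as replicated if its prefix is a real cluster alias."""
    return topic_source(topic, sep) in known_aliases


aliases = {"london", "singapore"}
print(format_remote_topic("london", "trades"))   # london-trades
print(topic_source("risk-positions"))            # risk  (a local topic, misread)
print(is_replica("risk-positions", aliases))     # False
print(is_replica("london-trades", aliases))      # True
```

The general rule: pick a separator that never appears in your own topic names, and validate parsed prefixes against the known cluster aliases.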

Data Residency Controls

Regulatory compliance requires that certain data never leaves its jurisdiction. Implement a rules engine at the MirrorMaker level:

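  • Permit list: Topics that can be replicated (risk, research, backtesting), expressed as regular expressions in MM2's topics setting
  • Deny list: Topics that must never be replicated (client trades, settlement data, PII), expressed in MM2's topics.exclude setting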
  • Transform: Topics where PII must be anonymised before replication
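A minimal Python sketch of that rules engine, assuming topics are matched by regular expressions, the deny list always wins, and any topic on no list is blocked by default. The anonymise helper replaces PII fields with a keyed HMAC-SHA256 digest so records remain joinable across regions; strictly, that is pseudonymisation rather than anonymisation. Field names, topic names and key handling here are hypothetical.

```python
import hashlib
import hmac
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Action(Enum):
    REPLICATE = "replicate"   # copy as-is
    TRANSFORM = "transform"   # copy only after anonymising PII fields
    BLOCK = "block"           # never leaves the jurisdiction


@dataclass
class ResidencyPolicy:
    permit: List[str]
    deny: List[str]
    transform: List[str]
    pii_fields: Set[str] = field(default_factory=set)

    @staticmethod
    def _matches(patterns: List[str], topic: str) -> bool:
        return any(re.fullmatch(p, topic) for p in patterns)

    def classify(self, topic: str) -> Action:
        # Deny is checked first so it always wins over permit and transform.
        if self._matches(self.deny, topic):
            return Action.BLOCK
        if self._matches(self.transform, topic):
            return Action.TRANSFORM
        if self._matches(self.permit, topic):
            return Action.REPLICATE
        return Action.BLOCK  # default-deny: unlisted topics stay local


def anonymise(record: Dict[str, object], pii_fields: Set[str], key: bytes) -> Dict[str, object]:
    """Return a copy of record with PII fields replaced by keyed HMAC-SHA256 digests."""
    out = dict(record)
    for name in out.keys() & pii_fields:
        out[name] = hmac.new(key, str(out[name]).encode("utf-8"), hashlib.sha256).hexdigest()
    return out


policy = ResidencyPolicy(
    permit=[r"risk\..*", r"research\..*", r"backtesting\..*"],
    deny=[r"client-trades\..*", r"settlement\..*"],
    transform=[r"risk\.client-exposure"],
    pii_fields={"client_name"},
)
print(policy.classify("risk.var"))               # Action.REPLICATE
print(policy.classify("risk.client-exposure"))   # Action.TRANSFORM
print(policy.classify("settlement.eod"))         # Action.BLOCK
```

Default-deny is the design choice that matters most here: a newly created topic cannot leak across a border simply because nobody added it to a list.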

Disaster Recovery Patterns

Regional Cluster Failure

When the primary region’s Kafka cluster fails:

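  1. Promote the secondary region’s cluster to primary: redirect producers to it and start consumer groups from the offsets MM2 has translated via its checkpoints
  2. Risk consumers fail over to the secondary cluster via DNS-based routing
  3. On recovery, run MM2 in the reverse direction (secondary to primary) so the original primary catches up on writes made during the outage, then fail back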
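Promotion depends on knowing where each consumer group should resume on the secondary cluster. MM2 records offset-sync and checkpoint data for this; the Python sketch below is a simplified model of that translation, not MM2's exact algorithm. Given (upstream, downstream) offset pairs for one partition, it resumes from the latest sync point at or before the group's committed offset, which may re-deliver some records but never skips one, so consumers should be idempotent.

```python
import bisect
from typing import List, Optional, Tuple


def translate_offset(committed: int, syncs: List[Tuple[int, int]]) -> Optional[int]:
    """Map a committed offset on the primary to a resume offset on the secondary.

    syncs holds (upstream_offset, downstream_offset) pairs for one partition,
    sorted by upstream offset. Returns None when no sync point precedes the
    committed offset, in which case the caller needs a fallback policy.
    """
    upstream = [u for u, _ in syncs]
    i = bisect.bisect_right(upstream, committed) - 1
    if i < 0:
        return None
    # Resume at the downstream position of the latest sync at or before the
    # committed offset: records between that sync and the commit are re-read,
    # never skipped, because replication preserves order within a partition.
    return syncs[i][1]


syncs = [(0, 0), (100, 40), (250, 190)]
print(translate_offset(180, syncs))  # 40
```

The trade-off is deliberate: after a failover, a few duplicate risk updates are far cheaper to absorb than a silently missing trade.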

Multi-Region Data Loss

In the worst case, where a region’s Kafka data is lost outright, recovery relies on:

  • Topic-level replication: MM2 maintains copies in the secondary cluster
  • S3/object store: Cold storage backup, written by a Kafka Connect S3 sink connector, with configurable retention (typically 90 days)
  • Replay: Restore from the object store using Confluent’s S3 Source Connector for Kafka Connect
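Replaying from the object store starts with picking the right files. This Python sketch assumes the key layout that, as we understand it, Confluent's S3 sink connector produces with its default partitioner (topics/<topic>/partition=<p>/<topic>+<p>+<startOffset>.<ext>); if your sink is configured differently, the regular expression needs to change. It returns, in offset order, every object that could contain records at or after a given offset.

```python
import re
from typing import Iterable, List

# Assumed layout (S3 sink connector, default partitioner):
#   topics/<topic>/partition=<p>/<topic>+<p>+<startOffset>.<ext>
KEY_PATTERN = re.compile(
    r"(?:.*/)?(?P<topic>[^/+]+)\+(?P<partition>\d+)\+(?P<start>\d+)\.[^./]+"
)


def objects_to_replay(keys: Iterable[str], topic: str, partition: int,
                      from_offset: int) -> List[str]:
    """Object keys that may hold records at or after from_offset, in offset order."""
    files = []
    for key in keys:
        m = KEY_PATTERN.fullmatch(key)
        if m and m["topic"] == topic and int(m["partition"]) == partition:
            files.append((int(m["start"]), key))
    files.sort()
    # The last file starting at or before from_offset may contain it, and every
    # later file is needed too. If none starts that early, replay everything left.
    first = 0
    for i, (start, _) in enumerate(files):
        if start <= from_offset:
            first = i
    return [key for _, key in files[first:]]
```

Each partition is replayed independently, which matches how Kafka orders records: only within a partition.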

What You Can Actually Use Today

Tool                        Purpose                     Licence / provider
MirrorMaker 2               Kafka geo-replication       Apache 2.0 (part of Apache Kafka)
Confluent Cluster Linking   Managed geo-replication     Confluent (commercial)
Kafka Connect               Data integration            Apache 2.0 (part of Apache Kafka)

FAQ

How much latency does geo-replication add? MirrorMaker 2 adds 100-500 ms of end-to-end latency between regions, depending on physical distance and bandwidth. For risk aggregation at minute-level granularity, this is acceptable. For real-time order routing across regions, producing directly to the destination region's cluster, without MM2 in the path, is preferred.
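Physical distance sets a floor that no tuning removes. As a back-of-envelope check, assuming light in optical fibre travels at roughly two-thirds of its vacuum speed and a London to Singapore path of about 10,900 km (the approximate great-circle distance; real cable routes are longer):

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBRE_VELOCITY_FACTOR = 0.67  # assumption: roughly two-thirds of c in glass


def one_way_floor_ms(path_km: float) -> float:
    """Minimum one-way propagation delay over fibre, ignoring all processing."""
    return path_km / (SPEED_OF_LIGHT_KM_PER_S * FIBRE_VELOCITY_FACTOR) * 1000.0


print(f"{one_way_floor_ms(10_900):.0f} ms one way")
```

That is roughly 54 ms one way before MM2's own consumer polling, batching and producer acknowledgement overheads are added on top.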

Can I use a single Kafka cluster across multiple regions? You can, but you should not. A single cluster across an ocean introduces latency for every produce and consume operation. The cluster’s controller election becomes unreliable across geographic distances. Multi-cluster with MM2 is the standard pattern for financial services.

How do I handle schema evolution across regions? Use a central Schema Registry in a primary region. Secondary regions access the registry via read-only replicas. Schema changes are approved through a governance process and deployed first to the primary region, then propagated to downstream consumers.
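A governance gate can reject incompatible changes before they reach the primary registry. The Python sketch below is a deliberately simplified, conservative version of an Avro-style BACKWARD check (the default compatibility level in Confluent Schema Registry): consumers on the new schema must be able to read data written with the old one, so added fields need defaults, removed fields are tolerated, and any type change is flagged, even ones Avro would allow as a promotion. Modelling schemas as plain lists of field dictionaries is an assumption for brevity.

```python
from typing import Dict, List


def backward_problems(old_fields: List[Dict], new_fields: List[Dict]) -> List[str]:
    """Conservative BACKWARD check: can a reader using new_fields read data
    written with old_fields? Returns a list of problems; empty means compatible."""
    old = {f["name"]: f for f in old_fields}
    problems = []
    for f in new_fields:
        prev = old.get(f["name"])
        if prev is None and "default" not in f:
            # The reader expects a field that old data never wrote.
            problems.append(f"added field '{f['name']}' has no default")
        elif prev is not None and prev["type"] != f["type"]:
            problems.append(f"field '{f['name']}' changed type {prev['type']} -> {f['type']}")
    return problems


v1 = [{"name": "trade_id", "type": "string"}, {"name": "notional", "type": "double"}]
v2 = v1 + [{"name": "book", "type": "string", "default": ""}]
v3 = v1 + [{"name": "book", "type": "string"}]
print(backward_problems(v1, v2))  # []
print(backward_problems(v1, v3))  # ["added field 'book' has no default"]
```

Running a check like this in the approval pipeline means the primary region never publishes a schema that secondary-region consumers cannot read.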


Further Reading

For a deeper discussion of data platform architecture, see our Financial Data Platforms service.