How does Geo-Replication Work in Apache Pulsar?

Published May 2023

Apache Pulsar offers geo-replication out of the box. This sets Pulsar apart from messaging systems that require external tools for geo-replication.

Pulsar Geo-Replication

Geo-replication is the replication of messages across multiple clusters of a Pulsar instance. 

For clarity: a Pulsar instance is made up of one or more Pulsar clusters, and each cluster consists of Pulsar brokers, BookKeeper bookies, and ZooKeeper.

Other distributed messaging systems, such as Apache Kafka, support geo-replication, but only with the help of an external tool such as MirrorMaker.

Geo-replication is built into Pulsar, made possible in part by its use of BookKeeper as the storage layer.

Both synchronous and asynchronous geo-replication are available. Synchronous geo-replication happens at the BookKeeper level, while asynchronous geo-replication is configured at the Pulsar broker level.
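To make the difference concrete, here is a toy Python model of where the producer's acknowledgment happens in each mode. It is not Pulsar's implementation: the `Region` class and the `publish_sync`, `publish_async`, and `drain` functions are hypothetical, and real brokers and bookies involve quorums, ledgers, and network failures that this sketch ignores.

```python
"""Toy model (not Pulsar's implementation) contrasting where a write is
acknowledged under synchronous vs asynchronous geo-replication."""
from collections import deque


class Region:
    """A hypothetical region holding a local copy of the message log."""

    def __init__(self, name):
        self.name = name
        self.log = []


def publish_sync(msg, regions):
    # Synchronous: storage spans regions, so the write lands in every
    # region before the producer is acknowledged. (Real BookKeeper uses
    # configurable quorums and placement policies; this is simplified.)
    for region in regions:
        region.log.append(msg)
    return "ack"


def publish_async(msg, local, remotes, backlog):
    # Asynchronous: persist locally and acknowledge right away;
    # remote copies are queued for a replicator to send later.
    local.log.append(msg)
    for remote in remotes:
        backlog.append((remote, msg))
    return "ack"


def drain(backlog):
    # Stand-in for the broker-side replicator catching up.
    while backlog:
        remote, msg = backlog.popleft()
        remote.log.append(msg)


if __name__ == "__main__":
    east, west = Region("us-east"), Region("us-west")
    publish_sync("m1", [east, west])
    backlog = deque()
    publish_async("m2", east, [west], backlog)
    print(west.log)  # ['m1']: m2 was acknowledged before reaching us-west
    drain(backlog)
    print(west.log)  # ['m1', 'm2']
```

The trade-off the model hints at: synchronous replication pays cross-region latency on every write, while asynchronous replication acknowledges quickly but leaves a window in which remote clusters lag behind.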

Asynchronous geo-replication requires two settings. First, geo-replication needs to be enabled on both namespaces.

If you aren’t familiar with namespaces, here is a brief review: a namespace is a grouping mechanism for related topics. By default, there is no limit to the number of topics in a namespace.

Second, the namespace must be configured to replicate across two or more clusters. With both settings in place, every message published to any topic in the namespace is automatically replicated to every cluster in the configured set.
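As a rough illustration of those two steps, the Python sketch below builds (but does not run) the `pulsar-admin` commands we would expect to use: granting a tenant access to each cluster with `tenants create --allowed-clusters`, then setting the namespace's replication clusters with `namespaces set-clusters --clusters`. The helper `geo_replication_commands` and the tenant, namespace, and cluster names are hypothetical, and mapping the two settings above onto these commands is our assumption; check the flags against the documentation for your Pulsar version.

```python
"""Sketch: build (but do not run) pulsar-admin commands for asynchronous
geo-replication of one namespace. All names here are hypothetical."""
import shlex


def geo_replication_commands(tenant, namespace, clusters):
    # Replication only makes sense across two or more clusters.
    if len(clusters) < 2:
        raise ValueError("replication needs at least two clusters")
    cluster_list = ",".join(clusters)
    ns = f"{tenant}/{namespace}"
    return [
        # Allow the tenant to use every cluster in the replication set.
        ["pulsar-admin", "tenants", "create", tenant,
         "--allowed-clusters", cluster_list],
        ["pulsar-admin", "namespaces", "create", ns],
        # Replicate every topic in the namespace across these clusters.
        ["pulsar-admin", "namespaces", "set-clusters", ns,
         "--clusters", cluster_list],
    ]


if __name__ == "__main__":
    for cmd in geo_replication_commands(
        "my-tenant", "my-namespace", ["us-west", "us-east"]
    ):
        print(shlex.join(cmd))
```

Run as a script, this prints one command per line, starting with `pulsar-admin tenants create my-tenant --allowed-clusters us-west,us-east`. An existing tenant would need to be updated rather than created.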

For additional information on Pulsar geo-replication, visit the Pulsar documentation.

Apache Pulsar Support Services

If you are interested in 24/7 support, consulting, and/or fully managed Pulsar services, you can find more information on our Apache Pulsar services page.

Schedule a call with a Pulsar solution architect.

Published by

Dattell - Kafka & Elasticsearch Support

Benefit from the experience of our Kafka, Pulsar, Elasticsearch, and OpenSearch expert services to help your team deploy and maintain high-performance platforms that scale. We support Kafka, Elasticsearch, and OpenSearch both on-prem and in the cloud, whether on standalone clusters or running within Kubernetes. We’ve saved our clients $100M+ over the past six years. Without our guidance, companies tend to overspend on hardware or purchase unnecessary licenses. We typically save clients several times what our fees cost, in addition to building, optimizing, and supporting fault-tolerant, highly available architectures.