Published November 2022
Apache Pulsar has two layers, a serving layer and a storage layer. The Pulsar brokers carry out the data serving. The BookKeeper bookies provide the storage.
In this article we will describe how Pulsar’s two-layer system works and review its benefits and drawbacks.
Serving Layer: Role of Pulsar Brokers
Data is sent to Pulsar topics by producers, and read from Pulsar topics by consumers.
Multiple consumers can read from the same topic. For instance, a retail company might have a machine learning model and an alerting system both consuming messages from the same inventory topic.
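To make the fan-out idea concrete, here is a minimal toy model in plain Python. It is not the Pulsar client API: the `ToyTopic` class, the subscription names, and the message shape are all hypothetical, chosen only to show that two independent readers each see the same message.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Append-only message log with one read cursor per subscription (toy model)."""
    messages: list = field(default_factory=list)
    cursors: dict = field(default_factory=dict)

    def publish(self, payload):
        # Producers only ever append to the end of the log.
        self.messages.append(payload)

    def subscribe(self, subscription):
        # In this toy model a new subscription starts at the beginning of the log.
        self.cursors.setdefault(subscription, 0)

    def receive(self, subscription):
        """Return the next unread message for this subscription, or None."""
        position = self.cursors[subscription]
        if position >= len(self.messages):
            return None
        self.cursors[subscription] = position + 1
        return self.messages[position]


inventory = ToyTopic()
inventory.subscribe("ml-model")   # hypothetical subscription names
inventory.subscribe("alerting")
inventory.publish({"sku": "A-100", "stock": 3})

ml_view = inventory.receive("ml-model")
alert_view = inventory.receive("alerting")
```

Each subscription keeps its own cursor, so one consumer reading a message does not remove it for the other.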
Each single-partition topic is owned by exactly one Pulsar broker. The partitions of a multi-partition topic can be spread across many Pulsar brokers, with each partition still owned by a single broker.
This single Pulsar broker serves all of the reads and writes to the partition topic.
In other words, all writes to the topic from the one or more producers and all reads from the one or more consumers go through that specific Pulsar broker.
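The ownership rule can be sketched as a simple lookup table. This is a simplification under stated assumptions: the round-robin assignment and the broker names are hypothetical, and Pulsar's actual load balancing is more involved. The `-partition-<n>` suffix follows Pulsar's naming convention for the partitions of a partitioned topic.

```python
def partition_names(topic, partitions):
    """Build one internal topic name per partition, '<topic>-partition-<n>'."""
    return [f"{topic}-partition-{n}" for n in range(partitions)]


def assign_owners(partition_topics, brokers):
    # Hypothetical round-robin assignment, used here only for illustration.
    return {p: brokers[i % len(brokers)] for i, p in enumerate(partition_topics)}


def route(owners, partition_topic):
    """Every producer and consumer of a partition goes through its one owner."""
    return owners[partition_topic]


brokers = ["broker-1", "broker-2", "broker-3"]  # hypothetical broker names
parts = partition_names("persistent://public/default/inventory", 4)
owners = assign_owners(parts, brokers)
```

With four partitions and three brokers, every partition has exactly one owner, while the topic as a whole is spread across all three brokers.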
Storage Layer: Role of BookKeeper Bookies
A bookie is the name for a single BookKeeper node. Typically each topic is replicated across multiple bookies.
While the Pulsar broker is the entry point for reads and writes to a single partition topic, a read needs only one of the three or more bookies that replicate storage for that partition topic.
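The following toy model illustrates why one replica is enough for a read. The `ToyBookie` class and helper functions are hypothetical and do not reflect the BookKeeper client API. The sketch writes every entry to all three bookies, whereas real deployments configure their own quorum sizes.

```python
class ToyBookie:
    """Stand-in for a single storage node (toy model)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}
        self.up = True

    def write(self, entry_id, payload):
        self.entries[entry_id] = payload

    def read(self, entry_id):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.entries[entry_id]


def replicated_write(bookies, entry_id, payload):
    # Store a copy of the entry on every bookie in the replica set.
    for bookie in bookies:
        bookie.write(entry_id, payload)


def read_from_any(bookies, entry_id):
    """Return (payload, bookie name) from the first replica that can serve it."""
    for bookie in bookies:
        try:
            return bookie.read(entry_id), bookie.name
        except (ConnectionError, KeyError):
            continue
    raise LookupError(f"entry {entry_id} is not readable from any bookie")


bookies = [ToyBookie("bookie-1"), ToyBookie("bookie-2"), ToyBookie("bookie-3")]
replicated_write(bookies, 0, b"stock=3")
bookies[0].up = False  # simulate one bookie going down
payload, served_by = read_from_any(bookies, 0)
```

Because each replica holds the full entry, the read succeeds from the next bookie even though the first one is down.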
Tail reads, however, do not require a bookie at all. They are served directly from the Pulsar broker's in-memory cache. This cache should not be confused with Topic Compaction, which is a separate Pulsar feature that retains the most recent value for each key in a topic.
Benefits of Pulsar’s Two-Layer System
This structure, in which a single broker owns all messages for a topic, has two primary benefits.
#1 The Pulsar broker can keep the most recent messages in memory. These most recent messages are referred to as the log tail. The cache can serve tailing messages to consumers more quickly than if the data had to be retrieved from storage on a bookie.
If a consumer tries to read a message that isn't in the Pulsar broker cache, the broker must fetch the data from a bookie on the consumer's behalf. This takes longer than reading from cache, which means that catch-up readers see poorer serving performance than tail readers.
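The difference between a tail read and a catch-up read can be sketched with a small bounded cache. This is a toy model only: `ToyBroker`, its `cache_size` parameter, and the oldest-first eviction are assumptions made for illustration, not a description of how the Pulsar broker actually sizes or evicts its cache.

```python
from collections import OrderedDict


class ToyBroker:
    """Broker that keeps the newest entries in memory (toy model)."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.cache = OrderedDict()  # entry_id -> payload, newest last
        self.bookie_store = {}      # stand-in for durable bookie storage
        self.bookie_reads = 0

    def publish(self, entry_id, payload):
        # Every entry is stored durably and also placed in the tail cache.
        self.bookie_store[entry_id] = payload
        self.cache[entry_id] = payload
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the oldest cached entry

    def read(self, entry_id):
        """Return (payload, source), where source is 'cache' or 'bookie'."""
        if entry_id in self.cache:
            return self.cache[entry_id], "cache"
        self.bookie_reads += 1
        return self.bookie_store[entry_id], "bookie"


broker = ToyBroker(cache_size=2)
for i in range(5):
    broker.publish(i, f"msg-{i}")

tail = broker.read(4)      # recent entry, still in the cache
catch_up = broker.read(0)  # old entry, evicted from the cache
```

The tail reader never touches storage, while the catch-up reader costs one extra round trip to the storage stand-in.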
#2 The second benefit of having a single Pulsar broker own a partition topic is that the broker always knows the ID of the last confirmed entry. This is useful for recovery if one of the producers or consumers fails.
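One way to picture the last confirmed entry is as a marker that only advances over a contiguous run of acknowledged writes. The sketch below is a toy model under that assumption. `ToyLedger` and its methods are hypothetical names, not the BookKeeper API.

```python
class ToyLedger:
    """Tracks which entries are confirmed and exposes only those (toy model)."""

    def __init__(self):
        self.entries = {}
        self.confirmed = set()
        self.last_confirmed = -1  # no entry confirmed yet

    def add(self, payload):
        # Assign the next sequential entry ID; the entry is not yet confirmed.
        entry_id = len(self.entries)
        self.entries[entry_id] = payload
        return entry_id

    def confirm(self, entry_id):
        self.confirmed.add(entry_id)
        # Advance the marker only while there is no gap in confirmed IDs.
        while self.last_confirmed + 1 in self.confirmed:
            self.last_confirmed += 1

    def readable(self):
        """Entries that readers may see: everything up to the marker."""
        return [self.entries[i] for i in range(self.last_confirmed + 1)]


ledger = ToyLedger()
first = ledger.add("m0")
second = ledger.add("m1")
third = ledger.add("m2")
ledger.confirm(first)
ledger.confirm(third)  # "m1" is still unconfirmed, so the marker stays at 0
visible = ledger.readable()
```

Because one component tracks this marker, there is a single, unambiguous answer to "what was safely written?" when a client has to resume after a failure.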
Drawbacks of Pulsar’s Two-Layer System
The drawback of Pulsar's two-layer system (brokers and bookies) is that it needs more configuration and maintenance than a one-layer system, such as the one used by Apache Kafka.
Apache Pulsar Support Services
If you are interested in 24/7 support, consulting, and/or fully managed Pulsar services, you can find more information on our Apache Pulsar services page.