Published August 2022
Pulsar and Kafka achieve the same result: both deliver messages to their intended destination(s). Yet there are important differences between the two message queues, and those differences can make one technology a better fit than the other, depending on your use case.
In this post we cover 8 ways in which Apache Kafka and Apache Pulsar compare, some similar, some divergent. There is no clear “better” technology. Rather, these are two strong technologies that each excel in their respective ways.
8 Ways to Compare Kafka & Pulsar
1- Philosophical Direction.
The biggest difference between Pulsar and Kafka is their underlying philosophies.
Kafka strives for simplicity. The Kafka project is making installation and management simpler by removing its dependency on ZooKeeper (KRaft mode).
Pulsar strives for modularity. Its modular architecture allows the serving layer (Pulsar brokers) and the storage layer (BookKeeper) to scale independently. The drawback of a modular architecture is increased complexity. Running Pulsar means installing and operating three components: Pulsar brokers, BookKeeper, and ZooKeeper.
One area where Kafka adds complexity is geo-replication. Pulsar has built-in data center replication, whereas Kafka relies on MirrorMaker, a separate tool that ships with Kafka and must be deployed and run alongside the clusters. Both approaches achieve the same result: topics are copied from one data center to another.
Replicating data across data centers is important for two reasons. First, it improves performance for users in different geographic locations. Second, it provides important protection in the event of a cloud failure. See our post on preparing for a cloud outage for more information.
Both Kafka and Pulsar advertise extensive functionality. Kafka can perform data processing using Kafka Streams, and Pulsar uses Pulsar Functions.
The issue here is that message queues should be reliable. It’s their simplicity of purpose that drives their reliability. Your message queue should be the most dependable component of your data infrastructure. With each new function you ask it to perform, you are adding complexity. And that complexity can contribute to failure.
For instance, if we’re running functions on the Pulsar brokers, those functions compete with message serving for broker CPU, so CPU usage becomes a greater liability.
So when it comes to extra functionality, we say it doesn’t matter how Pulsar and Kafka compare. We recommend processing messages before or after your message queue.
4- Kafka has better throughput.
Kafka tends to perform better at high throughput. For instance, a benchmark published by Confluent reported a peak throughput of 605 MB/s for Kafka and 305 MB/s for Pulsar.
Be careful with benchmarking tests, though. An article on DZone discusses how test results can be misleading.
Both Kafka and Pulsar will likely give you the same ballpark results. We recommend running your own proof-of-concept in your environment, on your data. That will be the most reliable comparison.
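As a starting point for such a proof-of-concept, here is a minimal Python sketch of a timing harness. The names `measure_throughput`, `fake_send`, and the parameters are assumptions made for illustration; they are not part of the Kafka or Pulsar client libraries. Swap `fake_send` for your real producer's send call, and pass a `flush` callable if your producer sends asynchronously.

```python
import time
from typing import Callable, Dict, List, Optional


def measure_throughput(
    send: Callable[[bytes], None],
    message_size: int = 1024,
    message_count: int = 10_000,
    flush: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Send message_count payloads of message_size bytes and report the rate."""
    payload = b"x" * message_size
    start = time.perf_counter()
    for _ in range(message_count):
        send(payload)
    if flush is not None:
        # Asynchronous producers buffer messages; wait for them before stopping the clock.
        flush()
    elapsed = time.perf_counter() - start
    total_bytes = message_size * message_count
    return {
        "messages": message_count,
        "bytes": total_bytes,
        "seconds": elapsed,
        # MB/s using 1 MB = 1,000,000 bytes; guard against a zero-length interval.
        "mb_per_s": total_bytes / 1_000_000 / elapsed if elapsed > 0 else float("inf"),
    }


# Hypothetical stand-in for a real producer: it only records payload sizes.
sent: List[int] = []


def fake_send(payload: bytes) -> None:
    sent.append(len(payload))


result = measure_throughput(fake_send, message_size=512, message_count=1000)
print(f"{result['mb_per_s']:.1f} MB/s over {result['seconds']:.4f} s")
```

Run the same harness against both systems with your own message sizes and volumes, on the same hardware, so the numbers are comparable.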
5- Pulsar has lower latency with lower throughput.
6- Batching data is important.
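7- Exactly-once semantics.
Kafka supports exactly-once message processing through idempotent producers and transactions. The guarantee applies when those features are enabled and the processing reads from and writes to Kafka, as Kafka Streams does. This is a crucial guarantee for many use cases.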
Pulsar's message deduplication ensures that no duplicate messages are stored in Pulsar. However, deduplication on its own does not ensure that consumers never read the same message more than once.
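When a consumer may see the same message twice, a common mitigation is to make the consumer idempotent. The following is a minimal Python sketch of that idea, not a feature of either system. The `(message_id, payload)` tuple shape, the function name `consume_idempotently`, and the in-memory `seen_ids` set are all assumptions for illustration; a real deployment would keep the seen IDs in durable storage and update them together with the side effect.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical message shape: (message_id, payload).
Message = Tuple[str, str]


def consume_idempotently(
    messages: Iterable[Message],
    handler: Callable[[str], None],
    seen_ids: Set[str],
) -> int:
    """Call handler once per unique message ID and return how many were handled."""
    handled = 0
    for message_id, payload in messages:
        if message_id in seen_ids:
            # Redelivered message: skip it so the side effect is not repeated.
            continue
        handler(payload)
        seen_ids.add(message_id)
        handled += 1
    return handled


# Simulated delivery stream in which message "m1" arrives twice.
deliveries: List[Message] = [
    ("m1", "order created"),
    ("m2", "order paid"),
    ("m1", "order created"),
]
processed: List[str] = []
seen: Set[str] = set()

handled_count = consume_idempotently(deliveries, processed.append, seen)
print(handled_count, processed)  # prints: 2 ['order created', 'order paid']
```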
8- Breadth of support and ease of hiring.
Final thoughts on Pulsar vs Kafka
Looking for support?
Dattell provides 24×7 support and managed services for Kafka and Pulsar on our clients’ environments.