There are six key components to securing Kafka. These best practices will help you optimize Kafka and protect your data from avoidable exposure.
By default, data is plaintext in Kafka, which leaves it vulnerable to a man-in-the-middle attack as data is routed over your network. Transport layer security (TLS) and/or a secure sockets layer (SSL) encrypts data between clients and/or brokers, from clients → brokers, and tools → brokers.
Using a communications security layer, like TLS or SSL, will chip away at throughput and performance because encrypting and decrypting data packets requires processing power. However, the performance cost is typically negligible for an optimized Kafka implementation, especially with the more recent versions of Kafka. See our article on Kafka optimization for general use cases for more details on optimizing Kafka.
It is important to remember that encryption only protects the data as it moves to and from Kafka. Other measures outlined below must be taken to secure data that is sitting un-encrypted in Kafka.
Brokers should be located in a private network. Port-based and web-access firewalls are important for isolating both Kafka and ZooKeeper. Port-based firewalls limit access to a specific port number. Web-access firewalls limit access to a specific, limited group of possible requests. For more information on firewalls, see our posts about preventing data breaches.
Use SSL/SASL (simple authentication and security layer) for authentication of clients → brokers, between brokers, and brokers → tools. SSL authentication uses two ways authentication and is the most common approach for out-of-the-box managed services.
SASL authentication is more involved and tends to be the better approach for big data implementations. SASL comes in different forms such as SASL Plaintext, SASL SCRAM, Kerberos, and a few others. Of these, Kerberos is our go-to for security reasons.
#4 Access Control
When granting access, it is important to set parameters for what information or sets of information a client has access to within Kafka. Employ access control lists (ACL) to limit which clients can read and/or write to a particular topic. This approach limits access and also sets a baseline for alerting off of abnormal behavior.
#5 Monitoring and alerting
Monitoring Kafka is an important component of securing Kafka. Our post on Kafka monitoring using Elasticsearch and Kibana outlines the key performance indicators for Kafka and how to observe them in real-time. For instance, in addition to monitoring access, an optimized Kafka monitoring setup will detect if unknown entities are accessing Kafka topics or known entities are acting irregularly.
With either machine learning based alerting or threshold based alerting, you can have the system notify you or your team in real-time if abnormal behavior is detected.
#6 Isolation for ZooKeeper
Isolating ZooKeeper is another crucial component to keeping the implementation secure. Zookeeper should not connect to the public internet, aside from rare use cases. ACLs can also be employed for ZooKeeper.
Optimized Kafka Security Configuration
An example configuration for security setup with SASL_SSL:
#Client Configuration (jaas file)
Kafka Optimization Resources
We have several other articles on similar Kafka optimization topics to help you with your Kafka implementation.
- Creating a Kafka Topic — Kafka is structured by its four primary components: topics, producers, consumers, and brokers. In this post, we discuss topics.
- Calculating Number of Kafka Partitions — A critical component of Kafka optimization is optimizing the number of partitions in the implementation. Use our calculation to determine the number of partitions needed.
- Kafka Optimization for General Use Cases — Issues with Apache Kafka performance are directly tied to system optimization and utilization. Here, we compiled the best practices for a high volume, clustered, general use case.
- Kafka Monitoring With Elasticsearch and Kibana — Monitoring Kafka cluster performance is crucial for diagnosing system issues and preventing future problems.
- Kafka vs. RabbitMQ — If you’re looking for a message broker to handle high throughput and provide access to stream history, Kafka is likely the better choice. If you have complex routing needs and want a built-in GUI to monitor the broker, then RabbitMQ might be best for your application.
Data consulting and implementation services from Dattell provide STRATEGY, ENGINEERING, and PERSPECTIVE to support your organization’s data projects. Our services include custom Data Architecture, Business Analytics, Operational Intelligence, Centralized Reporting, Automation, and Machine Learning. Dattell specializes in Apache Kafka and the Elastic Stack for reliable data collection, storage, and real-time display.