Updated February 2023
Kafka Connect is a free tool for efficiently moving data into and out of Apache Kafka. Kafka Connect simplifies streaming data while also improving scalability and reliability.
Features of Kafka Connect
Standardizes integrations with Kafka.
Kafka Connect provides a shared framework for all Kafka connectors, which improves efficiency for connector development and management.
Scale up or down.
Kafka Connect offers two modes: distributed and standalone. Distributed mode is intended for scaled deployments, such as enterprise deployments. It includes automatic work balancing, dynamic scaling, and fault tolerance. Fault tolerance covers active tasks as well as offset commits and configuration.
Standalone mode is meant for scaled-down applications, such as testing environments or smaller production deployments. It is simpler to set up than distributed mode, but it doesn’t include all of the features of Kafka Connect. For instance, standalone mode does not provide fault tolerance.
Kafka Connect builds on Kafka's existing group management protocol, so a Connect cluster can be scaled up by adding more workers.
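As a rough sketch of how distributed mode is configured, the worker properties below name a worker group and the Kafka topics where offsets, connector configurations, and status are stored. The property keys are standard Kafka Connect worker settings; the values (broker address, group name, topic names) and the file name my-connect-distributed.properties are illustrative assumptions.

```shell
# Write an illustrative distributed worker config (all values are assumptions).
cat > my-connect-distributed.properties <<'EOF'
# Kafka brokers the worker connects to.
bootstrap.servers=localhost:9092
# Workers that share the same group.id form one Connect cluster.
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Offsets, connector configs, and status are kept in Kafka topics,
# so they survive the loss of any single worker.
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
EOF
```

Starting a second worker with the same group.id and storage topics would add it to the same cluster.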
REST API.
Kafka Connect has a REST API for simple management of the connectors.
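A minimal sketch of common management calls is shown here, assuming a Connect worker is listening on the default REST port 8083 on localhost. The function names and the CONNECT_URL variable are hypothetical helpers for illustration; the endpoints themselves are part of the Kafka Connect REST API.

```shell
# Assumption: a Connect worker is reachable at this address (8083 is the default port).
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# List the names of all connectors on the cluster.
list_connectors() {
  curl -s "$CONNECT_URL/connectors"
}

# Show the state of a connector and its tasks.
connector_status() {
  curl -s "$CONNECT_URL/connectors/$1/status"
}

# Remove a connector by name.
delete_connector() {
  curl -s -X DELETE "$CONNECT_URL/connectors/$1"
}
```

With a worker running, calling connector_status with a connector name returns a JSON document describing the connector and each of its tasks.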
Automation.
Kafka Connect can automatically manage the offset commit process. Committing an offset is the process of recording that the data up to that offset has been processed. (Read more about Kafka offsets here.)
Integration.
Kafka Connect can also be used to bridge batch data systems and streaming systems.
Connector Types
When setting up a connector using Kafka Connect, you must choose between two connector types: source and sink.
Source connector.
Source connectors are used for ingesting data into Kafka topics. For example, a source connector might stream database updates to Kafka.
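As a simple illustration, the sketch below writes a configuration for the FileStreamSource example connector that ships with Apache Kafka, which reads lines from a file and publishes them to a topic. A database source would use a different connector class with its own settings. The file name file-source.properties, the input file, and the topic name are illustrative assumptions, and in recent Kafka releases the example connector may need to be added to the worker's plugin.path.

```shell
# Write an example source connector config (names and paths are illustrative).
cat > file-source.properties <<'EOF'
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
# File to read lines from.
file=test.txt
# Kafka topic the lines are written to.
topic=connect-test
EOF
```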
Sink connector.
Sink connectors are used for moving data from Kafka to secondary indexes or batch systems. Examples of secondary indexes and batch systems are Elasticsearch and Hadoop, respectively.
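A sink configuration looks similar but names the topics to read from. The sketch below uses the FileStreamSink example connector that ships with Apache Kafka, which writes each record to a local file. Connectors for systems such as Elasticsearch or Hadoop are separate plugins with their own settings, which are not shown here. The file name file-sink.properties, the output file, and the topic name are illustrative assumptions.

```shell
# Write an example sink connector config (names and paths are illustrative).
cat > file-sink.properties <<'EOF'
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
# File that consumed records are appended to.
file=test.sink.txt
# Comma-separated list of Kafka topics to consume (note the plural key).
topics=connect-test
EOF
```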
Kafka Connect Quick Start Guide
Standalone mode.
Standalone mode can be started with the following command, which takes the worker configuration followed by one or more connector configurations:
> bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties ...]
Distributed mode.
Distributed mode can be started with the following command, which takes only the worker configuration; connectors are then created through the REST API:
> bin/connect-distributed.sh config/connect-distributed.properties
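Once a distributed worker is running, a connector can be submitted to it as JSON over the REST API. The sketch below assumes a worker on the default port 8083 and uses the FileStreamSource example connector; the create_connector helper, the CONNECT_URL variable, and the file and topic names are hypothetical and shown only for illustration.

```shell
# Assumption: a Connect worker is reachable at this address (8083 is the default port).
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Build the JSON body: a connector name plus its configuration map.
cat > file-source.json <<'EOF'
{
  "name": "local-file-source",
  "config": {
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "test.txt",
    "topic": "connect-test"
  }
}
EOF

# Submit the connector definition in the given file to the cluster.
create_connector() {
  curl -s -X POST -H "Content-Type: application/json" \
       --data @"$1" "$CONNECT_URL/connectors"
}

# Example (requires a running worker): create_connector file-source.json
```

The request can be sent to any worker in the cluster.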
For full quick start instructions and configurations, visit the Kafka documentation page.