What is Kafka Connect?

Updated February 2023

Kafka Connect is a free, open-source framework, included with Apache Kafka, for efficiently moving data into and out of Kafka. It simplifies building streaming data pipelines while also improving scalability and reliability.

Features of Kafka Connect

Standardizes integrations with Kafka. 
Kafka Connect provides a shared framework for all Kafka connectors, which improves efficiency for connector development and management.

Scale up or down.
Kafka Connect offers two modes: distributed and standalone. Distributed mode is intended for scaled deployments, such as enterprise deployments. It provides automatic work balancing, dynamic scaling, and fault tolerance. Fault tolerance covers active tasks as well as offset commit data and configuration.

Standalone mode is meant for scaled-down applications, such as testing environments or smaller production deployments. It is simpler to set up than distributed mode because everything runs in a single process, but it doesn’t include all of the features of Kafka Connect. For instance, fault tolerance isn’t included in standalone mode.

Kafka Connect builds on Kafka's existing group management protocol, so a Connect cluster can be scaled up by adding more workers.
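In practice, the two modes differ mainly in a handful of worker settings. The sketch below lists those settings as Python dictionaries and renders them as properties text. The property keys are standard Kafka Connect worker settings; the broker address, group id, topic names, and file path are placeholder values, not recommendations.

```python
# Sketch of the worker settings that differ between the two modes.
# Broker address, group id, topic names, and file path are placeholders.

COMMON = {
    "bootstrap.servers": "localhost:9092",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
}

# Standalone: a single process that keeps source offsets in a local file.
STANDALONE = {
    **COMMON,
    "offset.storage.file.filename": "/tmp/connect.offsets",
}

# Distributed: workers that share a group.id form one Connect cluster and
# keep offsets, connector configurations, and status in Kafka topics.
DISTRIBUTED = {
    **COMMON,
    "group.id": "connect-cluster",
    "offset.storage.topic": "connect-offsets",
    "config.storage.topic": "connect-configs",
    "status.storage.topic": "connect-status",
}


def to_properties(settings: dict) -> str:
    """Render a dict as simple key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(to_properties(DISTRIBUTED))
```

Because distributed workers store their state in Kafka topics rather than a local file, another worker in the same group can pick up the work if one fails.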

REST API.
Kafka Connect exposes a REST API for creating, configuring, and managing connectors.
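As a rough illustration, the following sketch builds requests for a few common REST endpoints using only the Python standard library. It assumes a Connect worker listening on localhost at port 8083 (the default REST port); the helper function names are hypothetical and chosen for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Connect worker is reachable here; 8083 is the default REST port.
BASE_URL = "http://localhost:8083"


def _connector_url(name: str, base_url: str) -> str:
    # Percent-encode the connector name so it is safe inside a URL path.
    return f"{base_url}/connectors/{urllib.parse.quote(name, safe='')}"


def list_connectors_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /connectors returns the names of the active connectors."""
    return urllib.request.Request(f"{base_url}/connectors", method="GET")


def create_connector_request(name: str, config: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /connectors creates a connector from a name and a config map."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /connectors/{name}/status reports connector and task state."""
    return urllib.request.Request(_connector_url(name, base_url) + "/status", method="GET")


def delete_connector_request(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """DELETE /connectors/{name} removes the connector."""
    return urllib.request.Request(_connector_url(name, base_url), method="DELETE")


def send(request: urllib.request.Request):
    """Send a request to a running worker and decode the JSON reply, if any."""
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if payload else None
```

With a worker running, `send(list_connectors_request())` would return the list of connector names. Building the request separately from sending it keeps the example inspectable without a live cluster.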

Automation.
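Kafka Connect can automatically manage the offset commit process, so connector developers do not have to implement it themselves. Committing an offset, or offset commit, is the process of recording that data up to a given offset has been processed, so that work can resume from that point after a restart or failure. (Read more about Kafka offsets here.)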
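To make the idea of an offset commit concrete, here is a toy, in-memory sketch. It is not how Kafka Connect implements offset storage; `OffsetStore` and `process` are hypothetical names used only to show why a committed offset lets processing resume without redoing finished work. Following the Kafka consumer convention, the committed value is the position of the next record to read.

```python
class OffsetStore:
    """Toy stand-in for the durable place where committed offsets are kept."""

    def __init__(self):
        self._committed = {}

    def commit(self, partition: int, next_offset: int) -> None:
        # Record the position of the next record to read for this partition.
        self._committed[partition] = next_offset

    def committed(self, partition: int):
        # Returns None when nothing has been committed yet.
        return self._committed.get(partition)


def process(records: list, store: OffsetStore, partition: int, handle) -> None:
    """Handle records starting at the committed position, committing as we go."""
    start = store.committed(partition) or 0
    for offset in range(start, len(records)):
        handle(records[offset])
        store.commit(partition, offset + 1)


if __name__ == "__main__":
    store, seen = OffsetStore(), []
    process(["a", "b"], store, 0, seen.append)        # first run handles a, b
    process(["a", "b", "c"], store, 0, seen.append)   # a restart handles only c
    print(seen, store.committed(0))
```

The second call skips the records that were already committed, which is the behavior Kafka Connect provides for connectors without extra code.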

Integration.
Kafka Connect can also be used to bridge batch data systems and streaming systems.

Connector Types

When setting up a connector using Kafka Connect, you must choose between two connector types: source and sink.

Source connector.
Source connectors are used for ingesting data into Kafka topics.  For example, a source connector might stream database updates to Kafka.

Sink connector. 
Sink connectors are used for moving data from Kafka to secondary indexes or batch systems.  Examples of secondary indexes and batch systems are Elasticsearch and Hadoop, respectively.
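The difference between the two types shows up in their configuration. The sketch below is modeled on the example file connectors distributed with Kafka; the file names and topic name are placeholders, and `feeds` is a hypothetical helper for this example. Note that `topic` is a setting of the file source connector specifically, while `topics` is the general sink setting.

```python
# Connector configs modeled on the example file connectors distributed with Kafka.
# File names and the topic name are placeholder values.

SOURCE = {
    "name": "local-file-source",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "test.txt",        # external system to read from
    "topic": "connect-test",   # Kafka topic the source writes to
}

SINK = {
    "name": "local-file-sink",
    "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
    "tasks.max": "1",
    "file": "test.sink.txt",   # external system to write to
    "topics": "connect-test",  # comma-separated Kafka topics the sink reads from
}


def feeds(source: dict, sink: dict) -> bool:
    """Return True if the sink reads from the topic the source writes to."""
    sink_topics = [topic.strip() for topic in sink["topics"].split(",")]
    return source["topic"] in sink_topics


if __name__ == "__main__":
    print(feeds(SOURCE, SINK))
```

Together the two connectors form a pipeline in which the Kafka topic is the handoff point: the source copies lines from one file into the topic, and the sink copies them from the topic into another file.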

Kafka Connect Quick Start Guide

Standalone mode.
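Standalone mode can be started with the following command. The first argument is the worker configuration file, and each remaining argument is a connector configuration file. The bracketed arguments are optional.

    bin/connect-standalone.sh config/connect-standalone.properties connector1.properties [connector2.properties ...]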

Distributed mode.
Distributed mode can be started with the following command, which takes only the worker configuration file:

    bin/connect-distributed.sh config/connect-distributed.properties

For full quick start instructions and configurations, visit the Kafka documentation page.
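In distributed mode, connector configurations are not passed on the command line. They are submitted to a running worker through the REST API instead. As a sketch, the snippet below converts a standalone-style connector properties file into the JSON body for a `POST /connectors` request. The parser is a simplification that handles only plain `key=value` lines, and the function names are hypothetical.

```python
import json


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments.

    This covers only a small subset of the Java properties format.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def to_rest_payload(properties_text: str) -> str:
    """Build the JSON body for POST /connectors from connector properties."""
    config = parse_properties(properties_text)
    # Standalone files carry the connector name as a property; the REST
    # request carries it as a separate top-level field.
    name = config.pop("name")
    return json.dumps({"name": name, "config": config}, indent=2)


if __name__ == "__main__":
    example = """
    # Placeholder connector settings
    name=local-file-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    file=test.txt
    topic=connect-test
    """
    print(to_rest_payload(example))
```

The resulting JSON can be sent to the worker's `/connectors` endpoint with any HTTP client.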

Published by

Dattell - Kafka & Elasticsearch Support

