Apache Kafka is a distributed streaming platform. Let's explain that in more detail. A streaming platform has three key capabilities: it lets applications publish and subscribe to streams of records, similar to a message queue or enterprise messaging system; it stores streams of records in a durable, fault-tolerant way; and it processes streams of records as they occur. Apache Kafka provides a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and pass messages from producers to consumers.
Apache Kafka Advantages
Apache Kafka provides many benefits. The most important ones are listed below.
- `Reliability` is provided because Kafka is distributed, partitioned, replicated, and fault tolerant.
- `Scalability` is provided because Kafka scales out without downtime.
- `Durability` is provided because Kafka persists messages to disk as fast as possible.
- `Performance` is provided because Kafka sustains a high volume of published and subscribed messages. Its performance stays stable even with terabytes of stored messages.
Download and Install Kafka For Linux and Windows
Apache Kafka is Java-based software, so we can install it on Linux distributions such as Ubuntu, Mint, Debian, Fedora, and CentOS as well as on Windows. If Java is installed on the operating system, we can run Kafka easily.
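Because Kafka depends on Java, it can help to confirm that a Java runtime is on the `PATH` before going further. The following is a minimal sketch; the function name `check_java` is a hypothetical helper introduced here for illustration and is not part of Kafka.

```shell
# Hypothetical helper: report whether a Java runtime is available on the PATH.
check_java() {
    if command -v java >/dev/null 2>&1; then
        # Print the location of the java executable that would be used.
        echo "java found: $(command -v java)"
    else
        echo "java not found; install a JRE or JDK first" >&2
        return 1
    fi
}
```

Running `check_java` prints the path of the `java` executable, or returns a non-zero status when none is found.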
Download Apache Kafka
We will download Apache Kafka from the official download page, which offers the nearest mirror.
In this case, we will download on Ubuntu with the `wget` command.
$ wget http://kozyatagi.mirror.guzel.net.tr/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz
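The archive name encodes two versions: `2.12` is the Scala version Kafka was built with and `2.2.0` is the Kafka version. The sketch below builds the file name from those two values. The variable names are hypothetical, and the `archive.apache.org` URL is an assumption about where older releases can be fetched if the mirror no longer serves them.

```shell
# Hypothetical variables: the two versions encoded in the archive name.
SCALA_VERSION=2.12
KAFKA_VERSION=2.2.0

# File name follows the pattern kafka_<scala version>-<kafka version>.tgz
KAFKA_TGZ="kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"

# Assumed location of archived releases; adjust if you use a mirror.
KAFKA_URL="https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/${KAFKA_TGZ}"

echo "$KAFKA_URL"
# wget "$KAFKA_URL"   # uncomment to actually download
```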
Extract Downloaded File
We will extract the downloaded file with the `tar` command like below.
$ tar xvf kafka_2.12-2.2.0.tgz
Then we will enter the extracted directory.
$ cd kafka_2.12-2.2.0/
Start ZooKeeper Server
Kafka uses ZooKeeper for cluster coordination, so we must start a ZooKeeper server before we start Kafka. We will use the `zookeeper-server-start.sh` script with the default configuration file provided in the archive.
$ ./bin/zookeeper-server-start.sh config/zookeeper.properties
Start Kafka Server
Now we can start the Kafka server by using `kafka-server-start.sh` with the configuration file named `server.properties`.
$ ./bin/kafka-server-start.sh ./config/server.properties
When the broker starts, it prints its configuration values in the console messages.
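Both start commands above run in the foreground and keep the terminal busy. As a sketch of an alternative, the start scripts accept a `-daemon` flag as their first argument, and the `bin` directory also ships matching stop scripts. The function names and the `ZK_WAIT_SECONDS` variable below are hypothetical, and the fixed sleep is only a crude way to give ZooKeeper time to come up.

```shell
# Hypothetical helper: start ZooKeeper, then the Kafka broker, in the background.
# Run it from inside the extracted kafka_2.12-2.2.0/ directory.
start_kafka_stack() {
    ./bin/zookeeper-server-start.sh -daemon config/zookeeper.properties || return 1
    # Crude wait so ZooKeeper is listening before the broker connects.
    sleep "${ZK_WAIT_SECONDS:-5}"
    ./bin/kafka-server-start.sh -daemon config/server.properties
}

# Hypothetical helper: stop the broker first, then ZooKeeper.
stop_kafka_stack() {
    ./bin/kafka-server-stop.sh
    ./bin/zookeeper-server-stop.sh
}
```

Calling `start_kafka_stack` brings both services up, and `stop_kafka_stack` shuts them down in the reverse order.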
Create A Topic
We can use `kafka-topics.sh` to create a topic named `poftut`.
$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic poftut
Then we can list existing topics with the `--list` parameter like below.
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092
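The two commands above can be combined so that a topic is only created when it is not already listed. This is a minimal sketch that reuses exactly the flags shown above; the function name `ensure_topic` is a hypothetical helper, and it assumes `--list` prints one topic name per line.

```shell
# Hypothetical helper: create the topic only if --list does not already show it.
# Run it from inside the extracted Kafka directory with the broker running.
ensure_topic() {
    topic="$1"
    # -x matches the whole line, -F treats the topic name as a literal string.
    if bin/kafka-topics.sh --list --bootstrap-server localhost:9092 | grep -qxF "$topic"; then
        echo "topic $topic already exists"
    else
        bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
            --replication-factor 1 --partitions 1 --topic "$topic"
    fi
}
```

For example, `ensure_topic poftut` creates the topic on the first call and only reports that it exists on later calls.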
Send Some Message
We can send some messages to the created topic with `kafka-console-producer.sh` like below. Each line we type is sent as a separate message.
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic poftut
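The console producer reads its messages from standard input, so messages can also be piped in instead of typed. The sketch below sends each argument as one message; `send_lines` is a hypothetical helper name, and the command and flags are the same ones used above.

```shell
# Hypothetical helper: send each remaining argument as one message to a topic.
# Usage: send_lines <topic> <message>...
send_lines() {
    topic="$1"
    shift
    # printf emits one line per argument; the producer sends one message per line.
    printf '%s\n' "$@" |
        bin/kafka-console-producer.sh --broker-list localhost:9092 --topic "$topic"
}
```

For example, `send_lines poftut "hello kafka" "second message"` publishes two messages to the `poftut` topic.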
Start A Consumer
We can consume the messages in the provided topic with `kafka-console-consumer.sh` like below.
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic poftut --from-beginning
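The consumer above keeps running and waits for new messages until it is interrupted. As a sketch, the console consumer's `--max-messages` option makes it exit after a given number of messages, which is convenient in scripts. The function name `read_first_messages` is a hypothetical helper.

```shell
# Hypothetical helper: print the first <count> messages of a topic, then exit.
# Usage: read_first_messages <topic> <count>
read_first_messages() {
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic "$1" --from-beginning --max-messages "$2"
}
```

For example, `read_first_messages poftut 5` reads five messages from the start of the `poftut` topic and returns.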
Apache Kafka Use Cases
Apache Kafka can be used in different cases. In this part, we will list the most common and popular of them.
Messaging
Kafka can be used as a message broker. Kafka is designed for robust, stable, high-performance message delivery. In comparison to most messaging systems, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for message processing applications from small scale to large scale.
Website Activity Tracking
The original use case for Kafka was to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. Activity tracking is often high volume, as many activity messages are generated for each user page view.
Metrics
Kafka can be used for operational monitoring data. This involves collecting metrics from different producers such as web, mobile, and desktop applications and distributing them to the consumers that monitor them.
Log Aggregation
Log aggregation is a hard job to accomplish. Kafka can collect logs centrally from different producers and senders and provide them to other log components such as a SIEM or a log archiver.
Stream Processing
Apache Kafka can process stream data easily. Stream processing can be done in multiple stages, where raw input data is aggregated, enriched, or transformed and written to new topics for further consumption.
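A single processing stage can be illustrated with the console tools used earlier: read from one topic, transform each record, and write the result to a new topic. This is only a toy sketch under stated assumptions and not how production stream processing is built, which would normally use the Kafka Streams API. The function name `uppercase_stream` and any output topic name are hypothetical.

```shell
# Hypothetical helper: a toy one-stage stream processor built from the console tools.
# Usage: uppercase_stream <input topic> <output topic> <message count>
uppercase_stream() {
    # Read <count> messages, uppercase them, and publish them to the output topic.
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
        --topic "$1" --from-beginning --max-messages "$3" |
        tr '[:lower:]' '[:upper:]' |
        bin/kafka-console-producer.sh --broker-list localhost:9092 --topic "$2"
}
```

For example, `uppercase_stream poftut poftut-upper 10` would copy the first ten messages of `poftut` in uppercase to a topic named `poftut-upper`, which other consumers could then read.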
Commit Log
Kafka can be used as an external commit log for a distributed system. The log can be used to replicate data between nodes and acts as a re-sync mechanism for nodes that need to restore their data.
Apache Kafka Architecture
Apache Kafka has a very simple architecture from the user's point of view. Different actors create, connect, process, and consume data with the Kafka system.
Producers create data and send it to the Kafka system in different ways, such as APIs. Producers can also use SDKs and libraries for different programming languages to push the data they create.
Connectors are used to create scalable and reliable data streaming between Apache Kafka and other systems. These systems can be databases, file systems, and so on.
In some cases, input data should be processed or transformed. Stream processors read input data from topics, process or transform it, and write the output to other topics or pass it to consumers.
Consumers are the entities that read and use the data Kafka provides. Consumers can read data from different topics, whether or not that data has been processed.