Kafka and Kafka as a Service
Apache Kafka is a fast, scalable publish/subscribe messaging platform. Producers write records to topics, and each record can be read by one or more consumers, one per consumer group. Kafka has become a common backbone for big data and microservices applications, and many companies use it for real-time stream processing. AWS also supports Apache Kafka through Amazon MSK (Amazon Managed Streaming for Apache Kafka), a fully managed service.
A broker is a Kafka server running as part of a Kafka cluster. A cluster consists of many brokers spread across several servers; the term "broker" is sometimes also used for the logical system, or for Kafka as a whole.
Kafka uses ZooKeeper to manage the cluster: ZooKeeper coordinates the broker/cluster topology and runs the leader elections for topic partition leaders.
Kafka's architecture is built around four main APIs:
- Producer API: allows an application to publish a stream of records to one or more Kafka topics.
- Consumer API: allows an application to subscribe to one or more topics and process the stream of records published to them (a minimal sketch of the Producer and Consumer APIs follows this list).
- Streams API: allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, thereby transforming input streams into output streams.
- Connector API: allows building reusable producers and consumers that connect Kafka topics to existing applications and data systems.
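To make the Producer and Consumer APIs concrete, here is a minimal sketch using Kafka's Java client. The broker address (localhost:9092), topic name (demo), and group id (demo-group) are assumptions for illustration, not values taken from this article's setup.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerConsumerDemo {
    public static void main(String[] args) {
        // Producer API: publish a record to the (assumed) "demo" topic.
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("demo", "key-1", "hello kafka"));
        }

        // Consumer API: subscribe to the topic and poll for records.
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "demo-group");
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList("demo"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
            }
        }
    }
}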
Kafka Cluster Architecture
Kafka's architecture can also be described in terms of the components that make up a cluster.
Kafka Broker
A Kafka cluster often consists of many brokers, and a single broker can handle thousands of reads and writes per second. Brokers keep no cluster-wide state themselves; they rely on ZooKeeper to maintain the cluster state.
Kafka ZooKeeper
The cluster uses ZooKeeper to manage and coordinate the Kafka brokers. ZooKeeper notifies producers and consumers when a new broker joins the cluster or when an existing broker fails. On being informed of a failure, producers and consumers decide how to proceed and begin coordinating with the remaining active brokers.
Kafka Producers
This component pushes data to the brokers. The producer sends messages as fast as the broker can handle them and, by default, does not block waiting for per-message acknowledgments; the acks setting controls how many broker acknowledgments a write requires. Producers also discover new brokers automatically and start sending messages to them as soon as they appear.
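To illustrate this acknowledgment behavior, here is a hedged sketch using the Java client's asynchronous send() with a delivery callback. The broker address and topic name are assumed for illustration.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AckDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // "acks" controls durability: "0" = fire and forget,
        // "1" = leader only, "all" = leader plus in-sync replicas.
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous: it returns immediately, and the callback
            // fires once the broker has (or has failed to) acknowledge the write.
            producer.send(new ProducerRecord<>("demo", "hello"), (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();   // delivery failed
                } else {
                    System.out.println("acked at offset " + metadata.offset());
                }
            });
        } // close() flushes any outstanding records before returning
    }
}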
Kafka Consumers
Since brokers do not track consumption state, Kafka consumers keep track of how many messages they have already consumed using the partition offset. A consumer remembers the offset of each message it has processed, which guarantees that it has consumed every message before that offset.
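As an illustration of offset tracking, the sketch below disables auto-commit and commits offsets manually after processing each batch; the broker address, topic name, and group id are again assumptions for illustration.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetTrackingDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "offset-demo");
        props.put("enable.auto.commit", "false");   // commit offsets manually
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    process(r);                     // application logic
                }
                // Record progress: everything up to the polled offsets is now consumed.
                consumer.commitSync();
            }
        }
    }
    static void process(ConsumerRecord<String, String> r) { /* ... */ }
}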
Kafka Cluster Setup via Docker
The docker-compose.yml below starts one ZooKeeper node and three Kafka brokers using the wurstmeister images. Note that the advertised hostnames (kafka1, kafka2, kafka3) must resolve on client machines, for example via /etc/hosts entries pointing at the Docker host.
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka-1:
    image: wurstmeister/kafka
    ports:
      - "9095:9092"               # broker reachable from the host on 9095
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka1
      KAFKA_ADVERTISED_PORT: 9095
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LOG_DIRS: /kafka/logs
      KAFKA_BROKER_ID: 500        # each broker needs a unique id
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka_data/500:/kafka   # per-broker data directory on the host
  kafka-2:
    image: wurstmeister/kafka
    ports:
      - "9096:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka2
      KAFKA_ADVERTISED_PORT: 9096
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LOG_DIRS: /kafka/logs
      KAFKA_BROKER_ID: 501
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka_data/501:/kafka
  kafka-3:
    image: wurstmeister/kafka
    ports:
      - "9097:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka3
      KAFKA_ADVERTISED_PORT: 9097
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LOG_DIRS: /kafka/logs
      KAFKA_BROKER_ID: 502
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka_data/502:/kafka
Start The Cluster
Simply start the cluster using the docker-compose command from the current directory:
$ docker-compose up -d
We can quickly check which brokers have joined the cluster by running a command against ZooKeeper:
$ docker-compose exec zookeeper ./bin/zkCli.sh ls /brokers/ids
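To go a step further, we can create a replicated topic and inspect its partition leaders. These commands assume the wurstmeister image, which puts Kafka's shell scripts on the PATH; the topic name test is a placeholder:
$ docker-compose exec kafka-1 kafka-topics.sh --create --zookeeper zookeeper:2181 --replication-factor 3 --partitions 3 --topic test
$ docker-compose exec kafka-1 kafka-topics.sh --describe --zookeeper zookeeper:2181 --topic test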
And that's it: we now have a three-broker Kafka cluster up and running. We can also test failover scenarios and other settings by bringing one Kafka node down and watching how the clients react.
Managed Kafka Services
Instead of running our own cluster, we can use a managed Kafka service from a cloud provider. For example, AWS offers Amazon MSK (Amazon Managed Streaming for Apache Kafka), a fully managed and secure Apache Kafka service.