Sunday, June 21, 2020

An Introduction To Kafka Architecture and Kafka as a Service

Kafka and Kafka as a Service

Apache Kafka is a fast, scalable publish/subscribe messaging platform. It enables communication between producers and consumers using message-based topics: producers write records into Kafka, and each record can be read by one or more consumer groups, with a single consumer within each group processing it. Kafka has become a popular solution for big data and microservices applications and is used by many companies for real-time processing. AWS also supports Apache Kafka through Amazon MSK (Amazon Managed Streaming for Apache Kafka), its fully managed Kafka service.

A broker is a Kafka server that runs as part of a Kafka cluster; the cluster consists of many brokers spread across several servers. The term "broker" is also sometimes used to refer to the logical system, or to Kafka as a whole.

Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the broker/cluster topology and performs the leader elections for topic partition leaders.

The Kafka architecture is built around four main APIs.
Producer API

This API allows an application to publish a stream of records to one or more Kafka topics.
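
As an illustration, here is a minimal Java producer sketch. The broker address localhost:9092 and the topic demo-topic are assumptions for the example, not part of the cluster setup shown later.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address; replace with your cluster's bootstrap servers.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a single record to the (hypothetical) topic "demo-topic".
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"));
        }
    }
}

The try-with-resources block flushes and closes the producer on exit, so the record is delivered before the program terminates.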

Consumer API

It allows an application to subscribe to one or more topics and process the stream of records published to them.
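
A minimal Java consumer sketch, assuming the same hypothetical broker and topic as above and a made-up group id demo-group:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "demo-group"); // consumers in the same group share the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // Fetch whatever records are available, waiting up to 500 ms.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}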

Streams API

The Streams API allows an application to act as a stream processor: it consumes an input stream from one or more topics and produces an output stream to one or more output topics, thereby transforming input streams into output streams.
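
As a sketch of such a processor, the following Java program (which assumes the kafka-streams library on the classpath, plus hypothetical topics input-topic and output-topic) upper-cases every record value it sees:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume from the input topic, transform each value, produce to the output topic.
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase()).to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}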

Connector API

The Connector API allows building and running reusable producers and consumers that connect Kafka topics to existing applications and data systems.
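
Connectors are typically configured rather than hand-coded. As a sketch, Kafka ships with a simple file source connector; a standalone Connect worker could stream lines of a (hypothetical) file into a (hypothetical) topic with a properties file like this:

name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/tmp/input.txt
topic=connect-demo

This would be passed to the connect-standalone.sh script that ships with Kafka, alongside a worker configuration file.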

Kafka Cluster Architecture


Kafka's architecture can also be described in terms of the components that make up a cluster.

Kafka Broker

A Kafka cluster often consists of many brokers, each of which can handle thousands of reads and writes per second. Because brokers are stateless, they rely on ZooKeeper to maintain the cluster state.

Kafka ZooKeeper

Kafka uses ZooKeeper to manage and coordinate the brokers in the cluster. ZooKeeper notifies producers and consumers when a new broker joins the Kafka cluster or when a broker fails. Once informed of a failure, producers and consumers decide how to proceed and begin coordinating with the remaining active brokers.

Kafka Producers

This component of the Kafka cluster architecture pushes data to the brokers. The send call is asynchronous, so the producer does not have to block waiting for acknowledgments from the broker; the acks setting controls how many broker acknowledgments a write requires before it is considered complete. Producers also discover new brokers and begin sending messages to them as soon as they start.
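
A short sketch contrasting a fire-and-forget send with one that blocks for a full acknowledgment (the broker address and topic name are, again, assumptions):

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AckModes {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: the leader waits for all in-sync replicas before acknowledging.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Fire-and-forget: send asynchronously and don't inspect the result.
            producer.send(new ProducerRecord<>("demo-topic", "k1", "no waiting"));

            // Synchronous send: block on the returned future until the broker acknowledges.
            RecordMetadata meta =
                producer.send(new ProducerRecord<>("demo-topic", "k2", "waited")).get();
            System.out.printf("acked at partition=%d offset=%d%n", meta.partition(), meta.offset());
        }
    }
}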

Kafka Consumers

Since brokers are stateless, Kafka consumers keep track of how many messages they have already consumed, which is achieved using the partition offset. A consumer remembers the offset of each message it has processed, which guarantees that it has consumed every message before that offset.
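
To make this offset bookkeeping explicit, a consumer can disable auto-commit and commit offsets itself after processing each batch. A sketch, with the group id and topic name as assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "offset-demo");
        props.put("enable.auto.commit", "false"); // we commit offsets ourselves
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Process the record, then its offset is safe to commit.
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                // Commit the offsets of everything returned by the last poll.
                consumer.commitSync();
            }
        }
    }
}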



Kafka cluster setup via Docker

The following docker-compose.yml uses the wurstmeister/zookeeper and wurstmeister/kafka images to start a single ZooKeeper node and three Kafka brokers (ids 500, 501 and 502), each mapped to its own host port:

version: '2'

services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"

  kafka-1:
    image: wurstmeister/kafka
    ports:
      - "9095:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka1
      KAFKA_ADVERTISED_PORT: 9095
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LOG_DIRS: /kafka/logs
      KAFKA_BROKER_ID: 500
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka_data/500:/kafka

  kafka-2:
    image: wurstmeister/kafka
    ports:
      - "9096:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka2
      KAFKA_ADVERTISED_PORT: 9096
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LOG_DIRS: /kafka/logs
      KAFKA_BROKER_ID: 501
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka_data/501:/kafka

  kafka-3:
    image: wurstmeister/kafka
    ports:
      - "9097:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: kafka3
      KAFKA_ADVERTISED_PORT: 9097
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LOG_DIRS: /kafka/logs
      KAFKA_BROKER_ID: 502
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./kafka_data/502:/kafka

Start The Cluster

Simply start the cluster using the docker-compose command from the current directory:
$ docker-compose up -d

We can quickly check which nodes are part of the cluster by running a command against ZooKeeper:
$ docker-compose exec zookeeper ./bin/zkCli.sh ls /brokers/ids

This should list the broker ids we configured: 500, 501 and 502.

And that’s it: we now have a Kafka cluster up and running. We can also test failover cases or other settings by bringing one Kafka node down and seeing how the clients react.
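
For example, assuming the Kafka CLI tools are on the container's PATH (as in the wurstmeister image) and a CLI version that still accepts the --zookeeper flag, we could create a replicated topic (failover-test is a made-up name), stop one broker, and check that leadership moves to the remaining brokers:

$ docker-compose exec kafka-1 kafka-topics.sh --create --zookeeper zookeeper:2181 \
    --topic failover-test --partitions 3 --replication-factor 3
$ docker-compose stop kafka-2
$ docker-compose exec kafka-1 kafka-topics.sh --describe --zookeeper zookeeper:2181 \
    --topic failover-test

The describe output should show leaders re-elected among the remaining brokers and a shrunken in-sync replica set.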

Managed Kafka Services

Instead of running the cluster ourselves, we can also use a cloud-based managed Kafka service from one of the cloud providers. For example, AWS provides Amazon MSK (Amazon Managed Streaming for Apache Kafka), a fully managed and secure Apache Kafka service.
