Message Queues And Kafka Explained in Plain English

Are you trying to hire Kafka experts? Want to understand why Kafka is gaining so much traction and why hiring managers are looking for people with this skill?

Well, I wrote this article to help illuminate Kafka a little bit, but first I explain message queues in general to lay the foundation for Kafka.

What is a message queue?

The purpose of a message queue is to help reliably deliver communications (or messages). If you google "what is a message queue" you will get an answer like:

Message queues provide an asynchronous communications protocol, meaning that the sender and receiver of the message do not need to interact with the message queue at the same time.

But I think you can explain this even more simply. You can think of a message queue like a PO box. The mailman can deliver messages there at any time and can keep adding more until you go collect them. Messages can pile up or be retrieved immediately.

In this example the mailman is the "Publisher" or "Producer" and when you go get mail you are the "Subscriber" or "Consumer".

Here is a diagram of this system:

simple message queue diagram
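If it helps to see the PO box idea as code, here is a minimal sketch in Python. The POBox class and its method names are made up for this article and are not part of any real message queue product.

```python
from collections import deque


class POBox:
    """A toy in-memory message queue (hypothetical, for illustration only)."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        # The producer (the mailman) drops a message off and leaves.
        self._messages.append(message)

    def collect(self):
        # The consumer picks up everything that is waiting, oldest first.
        collected = list(self._messages)
        self._messages.clear()
        return collected


box = POBox()
box.publish("electric bill")   # delivered while nobody is there
box.publish("postcard")        # messages build up
mail = box.collect()           # the consumer shows up later
leftover = box.collect()       # nothing is left after pickup
```

Notice that the producer and the consumer never have to be at the box at the same time, which is the "asynchronous" part of the definition above.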

Why use a message queue?

Well, there are a lot of reasons, but basically the answer is that real-world systems are a lot larger and more complicated than a simple single-consumer, single-producer postal system.

You might have many consumers of the same message, you might have to deal with fault tolerance (what if something breaks while processing a message?) or you might want to be able to track when messages are being delivered.

Here is a diagram of what things might actually look like inside a social media application that allows you to upload photos. A user uploads a photo, and it is saved to a database. At the same time, a message about the upload is published to a message queue. Multiple consumers read this message so that they know a user has uploaded a new photo, and ultimately the upload shows up in the "notifications" of your friends.

example photo message queue diagram
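Here is a rough sketch of the "many consumers of the same message" idea from the photo example. The PhotoEvents class and the consumer names ("notifications", "thumbnails") are assumptions I invented for illustration; a real system would use an actual queue product rather than Python lists.

```python
class PhotoEvents:
    """Toy fan-out queue: every subscriber gets its own copy of each message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, consumer_name):
        # Each consumer gets a private inbox.
        self._inboxes[consumer_name] = []

    def publish(self, event):
        # One upload event is copied into every consumer's inbox.
        for inbox in self._inboxes.values():
            inbox.append(event)

    def inbox(self, consumer_name):
        return list(self._inboxes[consumer_name])


events = PhotoEvents()
events.subscribe("notifications")   # tells your friends about the photo
events.subscribe("thumbnails")      # hypothetical second consumer

events.publish({"user": "alice", "photo_id": 42})

notification_inbox = events.inbox("notifications")
thumbnail_inbox = events.inbox("thumbnails")
```

The uploader only publishes once. It does not need to know how many consumers exist, which makes it easy to add new ones later.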

What is Kafka?

So with the basics of the "what" and "why" of message queues out of the way, now what is Kafka?

The chief difference with Kafka is storage: it saves data in a commit log, an append-only record of every message. Kafka stores the messages that you send to it in "Topics". Because reading a message does not remove it, consumers can "replay" these messages if they wish. In most other message queues, messages are removed after subscribers have confirmed their receipt.
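To make the commit log idea concrete, here is a small sketch. The CommitLog class is a hypothetical stand-in written for this article, not Kafka's actual storage code. The position of a message in the log is called its offset.

```python
class CommitLog:
    """Toy append-only log: messages are added to the end and never removed by reads."""

    def __init__(self):
        self._records = []

    def append(self, message):
        # New messages always go on the end; the return value is the offset.
        self._records.append(message)
        return len(self._records) - 1

    def read_from(self, offset):
        # Reading returns a copy and leaves the log untouched.
        return self._records[offset:]


log = CommitLog()
log.append("user signed up")
log.append("user uploaded photo")
last_offset = log.append("user logged out")

first_read = log.read_from(0)     # read everything
replay = log.read_from(0)         # "replay" it again later
recent = log.read_from(2)         # or start partway through
```

Compare this with the PO box, where collecting the mail empties the box. Here the same messages can be read as many times as you like.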

Another thing different about Kafka is ordering: messages are kept in the order they were written. Strictly speaking, this guarantee applies within each partition (described below), not across an entire topic. Not all message queues guarantee ordering.

Individual Kafka servers that store messages are called "Brokers". Brokers are typically used in a cluster, which means many servers are linked together to handle lots of data and traffic. Topics may be further broken down into "Partitions", which are distributed across brokers.

Kafka easily lets you divide up the work of publishing and consuming messages across a cluster of brokers. This is what it looks like:

simple kafka cluster
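Here is a simplified sketch of how messages might be spread across partitions. It assumes each message has a key (such as a user name) and that the key is hashed to pick a partition. The function names are hypothetical, and I use a CRC32 hash only to keep the example short; Kafka's own clients use a different hash function.

```python
import zlib

NUM_PARTITIONS = 3

# Each partition is its own ordered list of (key, value) messages.
partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # The same key always hashes to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def send(key, value):
    index = partition_for(key)
    partitions[index].append((key, value))
    return index


alice_first = send("alice", "photo 1")
send("bob", "photo 1")
alice_second = send("alice", "photo 2")

# Alice's messages, in the order they sit inside her partition.
alice_messages = [value for key, value in partitions[alice_first] if key == "alice"]
```

Because all of Alice's messages land in the same partition, they stay in order relative to each other, even though the topic as a whole is split across several brokers.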

Why is Kafka so big now?

Kafka was originally developed at LinkedIn to handle large quantities of traffic and provide a platform for handling real-time data feeds.

Kafka is designed to store data much like a transaction log. Groups of consumers keep track of where they are while reading a topic, so multiple consumers can read large amounts of data from the same topic while splitting the work between them. If you wish, you can read any existing topic from the beginning to get all of the messages that were sent and are still retained.
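The idea of a group of consumers keeping their place and sharing the work can be sketched like this. The ConsumerGroup class is a toy I made up for this article: it hands out partitions round-robin and remembers one offset per partition. Real Kafka does this with far more machinery.

```python
class ConsumerGroup:
    """Toy consumer group: splits partitions among members and tracks offsets."""

    def __init__(self, partitions, members):
        self.partitions = partitions
        # One saved position (offset) per partition, shared by the group.
        self.offsets = [0] * len(partitions)
        # Hand out partitions round-robin so each has exactly one owner.
        self.assignment = {member: [] for member in members}
        for index in range(len(partitions)):
            owner = members[index % len(members)]
            self.assignment[owner].append(index)

    def poll(self, member):
        # Return only messages this member has not seen yet.
        batch = []
        for index in self.assignment[member]:
            new_records = self.partitions[index][self.offsets[index]:]
            batch.extend(new_records)
            self.offsets[index] += len(new_records)
        return batch

    def seek_to_beginning(self):
        # Rewind every offset to re-read the topic from the start.
        self.offsets = [0] * len(self.partitions)


topic = [["a1", "a2"], ["b1"]]             # a topic with two partitions
group = ConsumerGroup(topic, ["worker-1", "worker-2"])

first = group.poll("worker-1")             # worker-1 owns partition 0
second = group.poll("worker-2")            # worker-2 owns partition 1
again = group.poll("worker-1")             # nothing new since last poll
group.seek_to_beginning()
replayed = group.poll("worker-1")          # same messages, read again
```

Each worker only sees its own share of the topic, and the group never re-reads a message unless it deliberately rewinds.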

Things to talk about when it comes to Kafka

When evaluating somebody's experience with Kafka, there are a few things you could ask about.

  • How many transactions per second did your system handle?
  • How did you decide how to size your cluster?
  • Why did you decide to go with Kafka over something simpler?
  • How many topics, partitions, consumer groups, etc. did your system use?
  • What challenges did you encounter implementing Kafka in your system?

Generally, asking "why" questions is a great way to really understand whether a person had decision-making power, or whether they really understood why they were doing something. But asking more specific questions about scale and challenges faced in implementation can be useful as well.

Summary

Message queues are a common architecture that might be encountered on the backend of many types of applications. Almost every significant application or company building software will have a message queue somewhere in their infrastructure once they get to a certain size.

Kafka is being adopted in many large organizations because of its ability to retain messages for as long as you configure it to (even indefinitely) and to deal with high volumes of traffic. There are a lot of other features of Kafka that I didn't touch on (such as the stream processing system), but these are the basics and should help you understand a little bit about why people are using it and what it is for.


I try to write articles like this every week to help recruiters better understand the domain knowledge behind software engineering positions.

I am building a course that will cover information like this for tech recruiters. If you are interested in keeping up with what I am doing, please add your email to the mailing list below!