Message Queues And Kafka Explained in Plain English

Post Author: Aaron Decker
Date published: November 02, 2019

Are you trying to hire Kafka experts? Want to understand why Kafka is gaining so much traction and why hiring managers are looking for people with this skill?

I wrote this article to shed some light on Kafka. First, though, I explain message queues in general, since they are the foundation you need to understand Kafka.

What is a message queue?

The purpose of a message queue is to help reliably deliver communications (or messages). If you google "what is a message queue" you will get an answer like:

Message queues provide an asynchronous communications protocol, meaning that the sender and receiver of the message do not need to interact with the message queue at the same time.

But I think you can explain this even more simply. You can think of a message queue like a PO box. The mailman might deliver mail there at any time, and can keep adding more until you go collect it. Mail can pile up or be retrieved immediately.

In this example the mailman is the "Publisher" or "Producer" and when you go get mail you are the "Subscriber" or "Consumer".
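To make the PO box analogy concrete, here is a minimal Python sketch that uses the standard library's queue.Queue as a stand-in for a real message queue. The function names mailman and collect_mail are made up for this illustration.

```python
import queue

# The PO box: holds messages until someone collects them.
po_box = queue.Queue()

def mailman(box, letters):
    """Producer: drops letters in the box without waiting for the reader."""
    for letter in letters:
        box.put(letter)

def collect_mail(box):
    """Consumer: empties the box whenever it is convenient."""
    collected = []
    while not box.empty():
        collected.append(box.get())
    return collected

mailman(po_box, ["bill", "postcard"])
mailman(po_box, ["magazine"])      # mail can pile up between pickups
print(collect_mail(po_box))        # ['bill', 'postcard', 'magazine']
print(collect_mail(po_box))        # [] because nothing new has arrived
```

Notice that the producer and the consumer never call each other directly. They only share the box, which is what "asynchronous" means here.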

Here is a diagram of this system:

Why use a message queue?

There are a lot of reasons, but the basic answer is that real-world systems are much larger and more complicated than a simple postal system with a single producer and a single consumer.

You might have many consumers of the same message, you might have to deal with fault tolerance (what if something breaks while processing a message?) or you might want to be able to track when messages are being delivered.

Here is a diagram of what things might actually look like inside a social media application that lets you upload photos. A user uploads a photo; it is saved to a database, and a message announcing the upload is published to a message queue. Multiple consumers read this message so that they know a user has uploaded a new photo, and ultimately it shows up in your friends' "notifications".
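The "many consumers of the same message" idea can be sketched in a few lines of Python. This is a toy, in-memory illustration and not how any particular product works; the class name PhotoEvents and the "thumbnailer" consumer are hypothetical.

```python
class PhotoEvents:
    """Toy broker: every subscriber gets its own copy of each message."""

    def __init__(self):
        self.inboxes = {}

    def subscribe(self, name):
        # Give the new subscriber an empty inbox (keep it if it exists).
        self.inboxes.setdefault(name, [])

    def publish(self, message):
        # Deliver one copy of the message to every subscriber.
        for inbox in self.inboxes.values():
            inbox.append(message)

events = PhotoEvents()
events.subscribe("notifications")
events.subscribe("thumbnailer")    # hypothetical second consumer

# The upload is announced once, and both consumers hear about it.
events.publish({"user": "alice", "photo": "beach.jpg"})
print(events.inboxes["notifications"])
print(events.inboxes["thumbnailer"])
```

The uploader only publishes once and does not need to know who is listening, so new consumers can be added later without changing the upload code.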

What is Kafka?

So with the basics of the "what" and "why" of message queues out of the way, now what is Kafka?

The chief difference with Kafka is storage: it saves data in a commit log. Kafka stores the messages that you send to it in Topics, and consumers can "replay" these messages if they wish. In a traditional message queue, by contrast, messages are normally removed once subscribers have confirmed receipt.
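The difference between a commit log and the PO box is easier to see in code. Below is a minimal sketch of an append-only log in Python; the Topic class and its methods are invented for illustration and are not Kafka's actual API.

```python
class Topic:
    """Toy append-only log: reading never removes anything."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1      # the position ("offset") of the message

    def read_from(self, offset):
        # Return a copy of everything from the given position onward.
        return self._log[offset:]

topic = Topic()
for event in ["signup", "login", "upload"]:
    topic.append(event)

print(topic.read_from(0))   # ['signup', 'login', 'upload']
print(topic.read_from(0))   # the same list again: this is a "replay"
print(topic.read_from(2))   # ['upload']
```

Because reading does not delete anything, a consumer can go back to any earlier position and read the same messages again.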

Another difference is that Kafka keeps messages in order: within each partition of a topic (partitions are described below), messages are stored in the order they were added. Not all message queues guarantee this.

Individual Kafka servers that store messages are called "Brokers". Brokers are typically used in a cluster, which means many servers are linked together to handle lots of data and traffic. Topics may be further broken down into "Partitions", which are divided across brokers.
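One common way to split a topic into partitions is to hash a key attached to each message, so that messages with the same key always land in the same partition. The sketch below assumes three partitions and uses zlib.crc32 purely as a stand-in hash (Kafka's own partitioner uses a different hash function); partition_for is a hypothetical helper name.

```python
import zlib

NUM_PARTITIONS = 3   # assumed for this example

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Pick a partition from the message key (stand-in hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Each partition is its own small ordered log.
partitions = [[] for _ in range(NUM_PARTITIONS)]

messages = [("alice", "a1"), ("bob", "b1"), ("alice", "a2")]
for key, value in messages:
    partitions[partition_for(key)].append((key, value))

# All of alice's messages sit in one partition, in the order they were sent.
alice_partition = partitions[partition_for("alice")]
print([value for key, value in alice_partition if key == "alice"])   # ['a1', 'a2']
```

In a real cluster each partition could live on a different broker, which is how the load gets spread out while order is still kept for any one key.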

Kafka easily lets you divide up the work of publishing and consuming messages across a cluster of brokers. This is what it looks like:

Why is Kafka so big now?

Kafka was originally developed at LinkedIn to handle large quantities of traffic and provide a platform for handling real-time data feeds.

Kafka stores data much like a transaction log: an ordered record of everything that was sent. Groups of consumers keep track of where they are while reading a topic, so multiple consumers can read large amounts of data from the same topic while dividing the work between them. If you wish, you can read any existing topic from the beginning to get all of the messages that were sent (as long as they are still within the retention period configured for that topic).
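Here is a rough Python sketch of how a group can "keep track of where it is". It is deliberately simplified: a real Kafka consumer group tracks a position per partition and assigns partitions to its members, whereas this toy ConsumerGroup class (a made-up name) keeps a single position for the whole topic.

```python
class ConsumerGroup:
    """Toy consumer group: remembers how far it has read in a shared log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0   # position of the next unread message

    def poll(self, max_messages=2):
        # Hand out the next batch and move the bookmark forward.
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch

    def rewind(self):
        # Start again from the very first message.
        self.offset = 0

topic = ["m0", "m1", "m2", "m3"]
emailers = ConsumerGroup(topic)
analytics = ConsumerGroup(topic)

worker_a = emailers.poll()     # ['m0', 'm1']
worker_b = emailers.poll()     # ['m2', 'm3'], the work is split inside the group
first_look = analytics.poll()  # ['m0', 'm1'], other groups have their own bookmark
analytics.rewind()
replayed = analytics.poll(max_messages=4)   # all four messages, from the beginning
print(worker_a, worker_b, first_look, replayed)
```

Two workers in the same group never receive the same message, while a different group reads the whole topic independently and can rewind whenever it likes.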

Things to talk about when it comes to Kafka

When evaluating somebody's experience with Kafka, there are a few things you could ask about.

How many transactions per second did your system handle?
How did you decide how to size your cluster?
Why did you decide to go with Kafka over something simpler?
How many topics, partitions, and consumer groups did your system have?
What challenges did you encounter implementing Kafka in your system?

Generally, asking "why" questions is a great way to find out whether a person had decision-making power and whether they really understood why they were doing something. But asking more specific questions about scale and the challenges faced during implementation can be useful as well.

Summary

Message queues are a common architecture that might be encountered on the backend of many types of applications. Almost every significant application or company building software will have a message queue somewhere in their infrastructure once they get to a certain size.

Kafka is being adopted in many large organizations because of its ability to store messages for as long as you need (even indefinitely, if it is configured that way) and to handle high volumes of traffic. There are a lot of other features of Kafka that I didn't touch on (such as the stream processing system), but these are the basics and should help you understand a little bit about why people are using it and what it is for.


Written By Aaron Decker

I'm currently a co-founder and head of engineering at a venture-backed startup called Bounty. I tend to think of myself as a backend engineer who can work up and down the stack in TypeScript. Previously, I have worked as a Tech Lead and hired teams, and as a Senior Software Engineer at multiple Fortune 500 companies building large products. I also did a brief stint teaching programming courses as an Adjunct Instructor at a local community college, which taught me a lot about breaking down complex things into understandable chunks.