This is part one of a series that will cover our discovery, research, prototyping, and official onboarding to Apache Kafka.

Introduction

At Justworks, the main application driving our core business is built on Ruby on Rails, and we leverage the Sidekiq framework to run a huge amount of computation as background processes. Sidekiq is an amazing framework and has served us well. But as our business grows, the number of integrations with internal as well as external systems is growing exponentially. We realized that, beyond a point, managing such integrations via Rails or Sidekiq is not feasible. We also identified that the majority of these integrations could be decoupled (asynchronous) and need not happen over HTTP (e.g. a REST API). This inspired some of us to set out on a journey to discover a versatile Pub/Sub platform.
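
For context, a typical Sidekiq job in a Rails app looks something like the sketch below. The worker class and its argument are hypothetical, for illustration only; they are not from our codebase.

    # A minimal Sidekiq job sketch. SyncVendorJob and its argument are
    # hypothetical, for illustration only.
    class SyncVendorJob
      include Sidekiq::Worker

      def perform(employee_id)
        # Call out to an internal or external system for this employee.
        # Each of our integrations roughly maps to jobs like this one.
      end
    end

    # Enqueue the job; Sidekiq picks it up from Redis and runs it
    # asynchronously in a worker process.
    SyncVendorJob.perform_async(42)

Jobs like this work well for one-off background work, but every new integration adds another worker, another queue, and another point-to-point coupling, which is exactly the pressure that led us toward Pub/Sub.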

Discovery Phase

Back in December 2020, while everyone was enjoying the Christmas and New Year break, one of our engineering managers was busy working on a prototype using Confluent's Kafka cloud offering. In January 2021, he started sharing his findings with fellow engineers and managers. Everyone felt that onboarding to such a platform would be amazing but wasn't sure how to get started. After a few more rounds of brainstorming, leadership agreed to form a focused group of engineers to do the discovery, research, and prototype work.

This focused group of engineers started reading about various Pub/Sub and messaging platforms. The key criteria for this evaluation were:

  • Language agnostic
  • Durability
  • Support for retries/redeliveries
  • Ease of fanout
  • Instrumentation and alerting
  • Ordering guarantees
  • Scalability
  • Traceability
  • Latency and throughput
  • Retention (time and volume)
  • Time and materials (T&M) cost of onboarding
  • Infrastructure needs
  • Maintenance and support cost
  • Spin-up and tear-down effort

After a lot of due diligence, we arrived at the consensus that Apache Kafka ticks the majority of the boxes we were looking for.

Research and Deep-Dive Learning Phase

After the discovery phase, the next goal was deep research and learning about Apache Kafka. In this phase, engineers read the official Apache Kafka documentation, guides from Confluent, and various tech blogs. Engineers then took three different Kafka courses on Udemy to practice the concepts. During this phase we also got hands-on with the open-source Apache Kafka distribution, Confluent's platform distribution, Confluent Cloud, and AWS MSK. After a lot of discussion with our DevOps, Security, and leadership teams, we agreed that, at this point, onboarding to AWS MSK was the most feasible solution, since we run all our applications on AWS infrastructure. Running the AWS MSK cluster within our own VPC alleviates our concerns around data security (in transit as well as at rest) and operational costs.

In this research phase we also identified the Ruby Kafka libraries and frameworks we would use to build our solutions, as sketched below. Well, we all love Ruby here ;)
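
We will cover the specific libraries and frameworks in later posts, but as a taste, here is a minimal producer/consumer sketch using the ruby-kafka gem, one of several Ruby clients for Kafka. The broker address, client id, and topic name below are placeholders, not our actual configuration.

    # Minimal sketch with the ruby-kafka gem; the broker address,
    # client id, and topic name are placeholders.
    require "kafka"

    kafka = Kafka.new(["broker1:9092"], client_id: "example-app")

    # Produce: deliver a single message synchronously to a topic.
    kafka.deliver_message("employee.onboarded", topic: "events")

    # Consume: iterate over messages in the topic and print their values.
    kafka.each_message(topic: "events") do |message|
      puts message.value
    end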

Spinning Up the First Official Cluster and Prototype Phase

In April 2021, we selected the first problem we would solve using Kafka and worked on its high-level design. After that, we bootstrapped our first official AWS MSK cluster, and development work started…

To be continued…

In subsequent posts in this series, we will share details of the problem we are solving and its architecture. We will also share information about the Kafka libraries and frameworks we are working with. Stay tuned!!!