What is an event-driven architecture (EDA), and how does Kafka fit into that paradigm? Let's understand the basics:
Event
An event in an EDA is an update or a state change that has occurred, for example a shipping ticket order was created or a truck became available.
Message
The data that gets published because of an event is called a message. A message usually conforms to a data schema and carries all the fields that the actors interested in the event need in order to act on it.
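To make the idea of a schema-conforming message concrete, here is a minimal sketch in Python. The `ShippingOrderCreated` type and all of its field names are hypothetical, chosen only to match the shipping example above; a real system might use Avro, Protobuf or JSON Schema instead of a dataclass.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ShippingOrderCreated:
    """Hypothetical message published when a shipping order is created."""
    order_id: str
    origin: str
    destination: str
    weight_kg: float
    created_at: str  # ISO 8601 timestamp, kept as text for portability


def to_bytes(message: ShippingOrderCreated) -> bytes:
    """Serialize the message to JSON bytes, ready to publish."""
    return json.dumps(asdict(message), sort_keys=True).encode("utf-8")


def from_bytes(raw: bytes) -> ShippingOrderCreated:
    """Rebuild the message; raises TypeError if a field is missing or unknown."""
    return ShippingOrderCreated(**json.loads(raw.decode("utf-8")))


message = ShippingOrderCreated(
    order_id="ORD-1001",
    origin="Dallas",
    destination="Chicago",
    weight_kg=1250.0,
    created_at="2024-01-15T09:30:00Z",
)
round_tripped = from_bytes(to_bytes(message))
```

Because every field lives in the message itself, a consumer can rebuild the full object from the bytes alone, without calling back to the producer.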
Three components of an EDA
Producers
Producers react to events and create messages containing all the data that any actor further down the processing chain would need, then publish those messages to a channel or router. Kafka is one such event channel; producers can also publish to other event routers such as MSMQ or RabbitMQ. Producers do not interact directly with consumers, and they do not know whether a consumer exists for the message being produced.
Consumers (Sinks)
Consumers read messages from the event channel, such as Kafka, and perform whatever actions they need to. A single message can be consumed by many different consumers, each performing a different action on the same message.
Channels (Routers)
Kafka is an event channel. Event channels store each message for a certain period of time and make it available for any consumer to read and process. Kafka can be configured to retain messages for 24 hours or for much longer. Messages that every consumer has already processed only take up storage, so a shorter retention window is usually preferable, as long as it still leaves consumers enough time to catch up. For details on Kafka, see the other blog post: https://aqibtech.net/apache-kafka-vs-confluent-kafka/
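The three components can be sketched with a toy in-memory channel. This is not a Kafka client: `InMemoryChannel` and its methods are hypothetical names, and timestamps are passed in by hand to keep the example deterministic. It only illustrates that consumers read independently of each other and that retention is based on age.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryChannel:
    """Toy append-only log; a stand-in for a topic, not a Kafka client."""
    retention_seconds: float
    _log: list = field(default_factory=list)      # (offset, timestamp, message)
    _next_offset: int = 0
    _offsets: dict = field(default_factory=dict)  # consumer name -> next offset

    def publish(self, message: str, now: float) -> int:
        """Append a message; the producer never learns who will read it."""
        offset = self._next_offset
        self._log.append((offset, now, message))
        self._next_offset += 1
        return offset

    def poll(self, consumer: str) -> list:
        """Return messages this consumer has not seen yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        unread = [msg for off, _, msg in self._log if off >= start]
        self._offsets[consumer] = self._next_offset
        return unread

    def expire(self, now: float) -> None:
        """Drop messages older than the retention window, read or not."""
        cutoff = now - self.retention_seconds
        self._log = [entry for entry in self._log if entry[1] >= cutoff]


channel = InMemoryChannel(retention_seconds=24 * 3600)
channel.publish("order created", now=0)
channel.publish("truck available", now=10)

billing_first = channel.poll("billing")    # both messages
dispatch_first = channel.poll("dispatch")  # the same two messages, independently
billing_second = channel.poll("billing")   # nothing new since the last poll

channel.expire(now=24 * 3600 + 5)          # the first message is now too old
late_consumer = channel.poll("audit")      # only the second message remains
```

Each consumer keeps its own position in the log, so one consumer reading a message does not remove it for the others. Only the retention window does that.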
Loose coupling
Components of a system are considered loosely coupled if each component can act independently, without knowledge of the other parts of the system. Each component can be serviced, brought down, replaced or modified without any need to modify or service the other parts. In this scenario, the Kafka consumers, the producers and the Kafka storage itself all act independently of each other.
Sync vs Async
Synchronous communication is when the caller sends a request and waits for the response before continuing. Asynchronous communication is when the components of a system interact without any requirement or expectation that another component responds immediately. A web service that makes a DB call to store data and waits for the response is communicating synchronously. A Kafka producer that sends a message to Kafka and waits for Kafka to confirm that the message was successfully written is also communicating synchronously.
In the above example, however, the Kafka producer, New Dock Processor, acts on CDC (Change Data Capture) from New DB and produces a message to the Kafka topic ME2Dispatch_New. The Legacy processor consumes this message at its own pace, writes to the Legacy DB, and then writes a resulting message to another Kafka topic called ME2Dispatch_Legacy. New Dock Processor then consumes the message from the legacy topic, which is in effect the response to the message it produced earlier to the “New” topic. The entire cycle can complete in a few milliseconds, depending on how efficient the business logic is. Because no component blocks while waiting for another, components spend less time waiting on each other and many more messages can be processed per unit of time.
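The request and response flow over two topics can be sketched with Python's `asyncio`. This is only an illustration under stated assumptions: the two `asyncio.Queue` objects stand in for the topics ME2Dispatch_New and ME2Dispatch_Legacy, a plain list stands in for the Legacy DB, and the function names, the `None` sentinel and the shape of the change records are all hypothetical.

```python
import asyncio


async def new_dock_producer(new_topic: asyncio.Queue, changes: list) -> None:
    """Publish each captured change and move on without waiting for a reply."""
    for change in changes:
        await new_topic.put(change)
    await new_topic.put(None)  # sentinel: no more changes in this demo


async def legacy_processor(new_topic, legacy_topic, legacy_db: list) -> None:
    """Consume at its own pace, write to the legacy store, publish the result."""
    while True:
        change = await new_topic.get()
        if change is None:
            await legacy_topic.put(None)  # pass the end-of-demo marker along
            break
        legacy_db.append(change)
        await legacy_topic.put({"id": change["id"], "status": "applied"})


async def new_dock_reply_consumer(legacy_topic: asyncio.Queue) -> list:
    """Collect the replies that arrive on the legacy topic."""
    replies = []
    while True:
        reply = await legacy_topic.get()
        if reply is None:
            break
        replies.append(reply)
    return replies


async def run_demo() -> tuple:
    me2dispatch_new = asyncio.Queue()     # stands in for topic ME2Dispatch_New
    me2dispatch_legacy = asyncio.Queue()  # stands in for topic ME2Dispatch_Legacy
    legacy_db = []                        # stands in for the Legacy DB
    changes = [{"id": 1, "dock": "A"}, {"id": 2, "dock": "B"}]
    _, _, replies = await asyncio.gather(
        new_dock_producer(me2dispatch_new, changes),
        legacy_processor(me2dispatch_new, me2dispatch_legacy, legacy_db),
        new_dock_reply_consumer(me2dispatch_legacy),
    )
    return legacy_db, replies


legacy_db, replies = asyncio.run(run_demo())
```

The producer never awaits a reply to a specific message. Replies simply show up later on the second queue, which is what lets each side run at its own pace.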
What does this setup lead to?
Loose Coupling
As described above, no component of the Kafka ecosystem, or of any event-driven architecture, should or can make assumptions about its counterparts, and so each is loosely coupled with the others. A producer can keep producing data to Kafka even if no consumers are ready to consume it. The data stays in Kafka (subject to the retention window), and once a consumer becomes available, every event that was created will eventually get processed.
Atomicity of events
Each event should have no dependency on any event generated before or after it; each event should be self-contained.
Business Process Unit Abstraction
An event should not include internal business logic details and should not be tied to a specific implementation. The leakier the abstraction, the more resistant the messages and events will be to changes and updates.
Specific events (non generic)
If events and messages are very generic, then everything that happens ends up represented by the same event type, and the individual events can only be told apart by some textual data inside the payload. This leads to text parsing and poor performance, because each consumer has to deserialize the data just to find out whether it is interested in the event.
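One way to picture the benefit of specific events is a consumer that filters on an event type carried outside the payload, much as a topic name or a message header would carry it. The envelope layout, the event type names and the `consume` function below are assumptions made for illustration only.

```python
import json

# Hypothetical envelope: the event type travels outside the payload (as a topic
# name or a message header would), so it can be read without parsing the body.
events = [
    ("ShippingOrderCreated", b'{"order_id": "ORD-1001"}'),
    ("TruckBecameAvailable", b'{"truck_id": "TRK-7"}'),
    ("ShippingOrderCreated", b'{"order_id": "ORD-1002"}'),
]


def consume(events, wanted_type: str) -> tuple:
    """Deserialize only the payloads of the event type this consumer handles."""
    handled, parsed_count = [], 0
    for event_type, payload in events:
        if event_type != wanted_type:
            continue  # skipped without touching the payload
        handled.append(json.loads(payload))
        parsed_count += 1
    return handled, parsed_count


orders, parsed = consume(events, "ShippingOrderCreated")
```

With a single generic event type, the same consumer would have to call `json.loads` on all three payloads before it could discard the one it does not care about.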
Asynchronous
Events can be, and sometimes will be, generated out of order. If a sequence must be preserved, the partitioning key should be the identifier that the related events share (for example the order ID), because Kafka preserves order only within a partition.
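Key-based partitioning can be sketched as follows, assuming that events belonging to the same order share the order ID as their key. The functions `partition_for`, `route` and `history` are hypothetical, and CRC32 is only a stand-in: Kafka's own partitioner uses a different hash, but the principle that equal keys land in the same partition is the same.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: list, num_partitions: int) -> dict:
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


events = [
    ("ORD-1001", "created"),
    ("ORD-1002", "created"),
    ("ORD-1001", "assigned"),
    ("ORD-1002", "cancelled"),
    ("ORD-1001", "delivered"),
]
partitions = route(events, num_partitions=3)


def history(key: str) -> list:
    """Read one order's events back from the single partition that holds them."""
    return [p for k, p in partitions[partition_for(key, 3)] if k == key]
```

Since every event for a given order goes to one partition and each partition is append-only, `history("ORD-1001")` returns that order's events in the order they were produced, even though events for other orders are interleaved with them.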
Self Contained
Each event should carry all the data that any downstream processor (or consumer) could need. An event that depends on the events before or after it is flawed and will lead to issues.
Hope this was helpful to you. For questions / concerns / comments / help feel free to reach out to the Data Experts at Aqib Technologies!