A Kafka audit system
Chaperone is a Kafka audit system that monitors the completeness and latency of data stream. Audit metrics are persisted in database for Kafka users to quantify the loss of their topics (if any). Basically, Chaperone cuts timeline into 10min buckets and assigns message to the corresponding bucket according to its event time. Bucket stats, like the total message count, are updated accordingly. Periodically, stats are sent out to a dedicated Kafka topic, say
ChaperoneCollector consumes those stats from this topic, and persists them into database.
Chaperone is made of several components:
ChaperoneClientis a library that can be put in like Kafka Producer or Consumer to audit messages as they flow through. The audit stats are sent to a dedicated Kafka topic, i.e.
ChaperoneCollectorconsumes audit stats from 'chaperone-audit' and persists them into database.
ChaperoneServiceaudits messages kept in Kafka. Since it is built upon uReplicator, it consists of two subsystems:
ChaperoneServiceControllerto auto-detect topics in Kafka and assign the topic-partitions to workers to audit;
ChaperoneServiceWorkerto audit messages from assigned topic-partitions. In particular,
ChaperoneCollectortogether ensure each message is audited exactly once.