Thursday, June 13, 2013

The Erlang client for Apache Kafka 0.8

Apache Kafka 0.8 (http://kafka.apache.org/) represents a significant evolution in distributed messaging system that is of specific interest to me (more about that later).

The Erlang client for Apache Kafka 0.8 has been written to interface erlang to kafka. This client has the verbs and nouns to talk directly to kafka at the wire protocol  level. Since kafka 0.8 introduced a couple neat features, I though that I would talk about that here.


Since kafka has the ability of multiplexing different requests through the co-relation-id, we thought that it would be useful to bake this into the implementation. This also means that the responsibility of managing the relationship is pushed up the chain.

All of the formation of the request and the parsing of the request occurs in kafka_protocol. The protocol needs a little help from the kafka_client to managing the tcp connection. Essentially kafka_client sends the request formatted through kafka_protocol  to kafka; but does NOT wait for the response at the time of the request.

At the backend, as the response for the request becomes available (through the corelation id), it then is sent to kafka_client through parsing by kafka_protocol. The range of co-relation-id decides for the kafka protocol what the actual response was for out of the six requests. In that way, the co-relation id is opaque to an end client of kafka_client.


A typical invocation looks like:
       
        Pid = kafka_client:locate_kafka_client().

%% produce for Topic1, Partition 0 with two messages

        Cid = kafka_client:request_produce(Pid, [{<<"Topic1">>, [ {0, [{<<"Key1">>, <<"Msg1">>}, {<<"Key1">>, <<"Msg1">>}]}]]).

         Rid = kafka_client:response_produce (Pid, Cid)

The syntax of the produce (as with other requests) is ensure compatibility with the philosophy of kafka to be able to generate produce (and other requests) for multiple topics and multiple partitions within a topic through the same request.


erlkafka.hrl has all of the defined constants.


The whole thing is available on github @ https://github.com/milindparikh/mps. The kafka client is embedded into this project as a part of the larger goal of providing an eventing framework on top of kafka. (more about that soon).

UPDATE 06/14/2013: WE ARE AWARE ABOUT A BUG IN THE kafka_protocol.erl THAT DOES NOT HANDLE BROKEN TCP DATA CORRECTLY. THIS BUG WILL MANIFEST ITSELF INTERMINENTLY UNTIL IT IS FIXED BY THROWING AN EXCEPTION TO WHICH THE ONLY REMEDY IS TO CURRENTLY RESTART THE ERLANG NODE.

UPDATE 06/16/2013 This bug has been fixed in 0.4.6 and also master

No comments:

Post a Comment