Zookeeper - Season 2
Welcome Back!
This is a follow-up to the blog post Zookeeper - All you need to know. If you have not read that post, I highly recommend reading it first.
Modes in Zookeeper
Zookeeper can run in two modes:
- Standalone: Only one server (can be used in development environment)
- Quorum Mode/Ensemble Mode: `2F + 1` servers, where `F` is the number of server failures one can tolerate
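To make the `2F + 1` rule concrete, here is a minimal sketch. The function names `ensemble_size` and `quorum_size` are made up for this post, and the sketch assumes the default majority quorum (no weights or groups).

```python
def ensemble_size(tolerated_failures: int) -> int:
    """Servers needed to survive `tolerated_failures` crashes: 2F + 1."""
    if tolerated_failures < 0:
        raise ValueError("tolerated_failures must be non-negative")
    return 2 * tolerated_failures + 1


def quorum_size(servers: int) -> int:
    """Smallest strict majority of `servers` (assumes a plain majority quorum)."""
    if servers < 1:
        raise ValueError("an ensemble needs at least one server")
    return servers // 2 + 1


for f in range(3):
    n = ensemble_size(f)
    print(f"F={f}: {n} servers, quorum of {quorum_size(n)}")
```

Note that an even-sized ensemble buys nothing extra under this rule: 4 servers need a quorum of 3, so they still tolerate only one failure, same as 3 servers.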
How to configure a Zookeeper Ensemble
- Each Server has a configuration file associated with itself
- Each configuration file lists all the servers that make up the ensemble
- Each server is identified by an id called `sid` or Server Identifier
- `myid` file: tells the server what its id is supposed to be
- `myid` file is kept inside the `dataDir` for each server
- `clientPort` is where clients connect. Mention this in the `connectionString`
- Information related to the ensemble is provided in `server.n` entries
- Each `server.n` entry specifies the address and port numbers used by the servers
  - Example - `server.5=127.0.0.1:2222:2223`
  - This example indicates that the server with `myid=5` (S5) will use 2222 for communication and 2223 for leader election. (Shown in the logs below)
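To see how a `server.n` entry breaks down, here is a small sketch that splits one into its parts. `ServerEntry` and `parse_server_entry` are names invented for illustration, not part of Zookeeper, and the parser only handles the plain `server.n=host:port:port` form shown above (no IPv6 addresses, no extra suffixes).

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    sid: int            # the n in server.n, must match the server's myid
    host: str
    quorum_port: int    # used for communication between servers
    election_port: int  # used for leader election


def parse_server_entry(line: str) -> ServerEntry:
    """Parse a line such as 'server.5=127.0.0.1:2222:2223'."""
    key, _, value = line.strip().partition("=")
    prefix, _, sid = key.partition(".")
    if prefix != "server" or not sid.isdigit():
        raise ValueError(f"not a server.n entry: {line!r}")
    parts = value.split(":")
    if len(parts) < 3:
        raise ValueError(f"expected host:port:port in {line!r}")
    host, quorum_port, election_port = parts[:3]
    return ServerEntry(int(sid), host, int(quorum_port), int(election_port))


entry = parse_server_entry("server.5=127.0.0.1:2222:2223")
print(entry)
```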
What happens when a Zookeeper server is started?
Standalone Mode
Keep calm and let clients connect
Ensemble Mode
- When a server is started, it looks frantically for other servers mentioned in its config file
- As long as it's the only server up and running, it will keep throwing `java.net.ConnectException`
- As soon as another server joins, it proposes an election to take place
- The following logs will be printed to the console:
    My election bind port: /127.0.0.1:2223
    LOOKING
    New election. My id = 5, proposed zxid=0xf00000002
- Each server notifies other servers of its proposed leader using a 'Notification Message'
Notification Message
Notifications are messages that let other peers know that a given peer has changed its vote, either because it has joined leader election or because it learned of another peer with higher zxid or same zxid and higher server id
Examples -
Notification: 1 (message format version), 1 (n.leader), 0x900000003 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x9 (n.peerEpoch) LEADING (my state)
Notification: 1 (message format version), 3 (n.leader), 0xa00000001 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEpoch) LEADING (my state)
Notification: 1 (message format version), 5 (n.leader), 0xf00000002 (n.zxid), 0x1 (n.round), LOOKING (n.state), 5 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
Notification: 1 (message format version), 2 (n.leader), 0xd00000000 (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 1 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
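The rule in the definition above (switch to a vote with a higher zxid, or with the same zxid and a higher server id) can be sketched as a simple comparison. `Vote` and `is_better_vote` are hypothetical names for this post. As far as I know, the real election code also compares the peer epoch before the zxid; that is left out here to stay close to the rule as stated.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    leader: int  # proposed leader's server id (n.leader)
    zxid: int    # proposed leader's last zxid (n.zxid)


def is_better_vote(new: Vote, current: Vote) -> bool:
    """True if `new` should replace `current`:
    higher zxid wins, ties are broken by higher server id."""
    return (new.zxid, new.leader) > (current.zxid, current.leader)


# Values taken from the example notifications above.
s1 = Vote(leader=1, zxid=0x900000003)
s5 = Vote(leader=5, zxid=0xF00000002)

print(is_better_vote(s5, s1))  # True: S5 has the higher zxid
print(is_better_vote(Vote(3, s5.zxid), s5))  # False: same zxid, lower id
```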
Reading the Notification Message / Notification Log
- `n.sid` is the server which sent the notification
- `n.leader` is the proposed leader; this can be the actual leader in case one exists
- `n.state` is the state of the sender (sid)
- `n.zxid` is the timestamp of the last change of the proposed leader
So, this is how you should read a notification message, taking the 3rd and 4th examples from above:
- Server 5 (S5) is currently in `LOOKING` state and proposes itself as the leader
- Server 1 (S1) is currently in `FOLLOWING` state and proposes Server 2 (S2) as the leader (maybe S2 is already the `LEADER` since S1 is in `FOLLOWING` state)
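If you have a lot of these log lines to go through, the same reading can be automated. This is a rough sketch that relies only on the `value (label)` pattern visible in the examples above. `parse_notification` and `describe` are helper names made up for this post, and the regular expression is an assumption about the log format, not something Zookeeper provides.

```python
import re
from typing import Dict

# Each field in the log line looks like: <value> (<label>)
FIELD = re.compile(r"(\S+) \(([^)]+)\)")


def parse_notification(line: str) -> Dict[str, str]:
    """Map each label (e.g. 'n.sid') to its value as a string."""
    return {label: value for value, label in FIELD.findall(line)}


def describe(line: str) -> str:
    n = parse_notification(line)
    return f"S{n['n.sid']} is {n['n.state']} and proposes S{n['n.leader']} as leader"


third = ("Notification: 1 (message format version), 5 (n.leader), "
         "0xf00000002 (n.zxid), 0x1 (n.round), LOOKING (n.state), "
         "5 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)")
fourth = ("Notification: 1 (message format version), 2 (n.leader), "
          "0xd00000000 (n.zxid), 0x1 (n.round), FOLLOWING (n.state), "
          "1 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)")

print(describe(third))   # S5 is LOOKING and proposes S5 as leader
print(describe(fourth))  # S1 is FOLLOWING and proposes S2 as leader
```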
Types of Servers in Zookeeper
- Leader
- Follower
- Observer: Observers do not participate in an election. They are used to scale out read capacity without adding voters to the quorum
Leader, Follower are also called Participants since they participate in an election
Follower, Observer are also called Learners
More about messages
The `LEADER` sends the following types of messages:
- `SNAP` entire snapshot transfer
- `DIFF` contains most recent missing transactions
- `INFORM` is information for Observer server
- `PROPOSAL` proposes state changes to Followers
- `COMMIT` asks the Followers to commit the changes conveyed in `PROPOSAL`
- `HEARTBEAT`
Let us understand these messages



Zookeeper in 14 steps
Since you have been so patient in understanding and reading about all these concepts, this is where you are going to reward yourself by making sense of it all.
1. A server connects in `LOOKING` state
2. Election takes place
3. `LEADER` is elected
4. `FOLLOWER` syncs with `LEADER`
   - Based on how much a `FOLLOWER` lags behind, the `LEADER` sends a `DIFF` (most recent missing transactions) or a `SNAP` (entire snapshot transfer)
   - Send `INFORM` message to `OBSERVER`
5. Client connects to `FOLLOWER`/`OBSERVER` (also called `LEARNER`)
6. `LEARNER` forwards session information of the client to the `LEADER`
7. `LEARNER` responds to read requests locally
8. Client sends write request to the `LEARNER`
9. `LEARNER` forwards write requests to the `LEADER`
10. `LEADER` transforms the request into a transaction
11. `LEADER` sends a `PROPOSAL` to the `FOLLOWER`s about the transaction
12. `FOLLOWER` responds with ACK
13. If a majority of the ensemble replies with ACK, `LEADER` sends a `COMMIT` to all the `FOLLOWER`s
14. `LEADER` sends `INFORM` message to `OBSERVER`
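The write path (proposal, ACK, commit) comes down to counting ACKs against a majority. Here is a toy sketch of that decision. `leader_decision` is a made-up name, observers are left out because they do not vote, and the sketch assumes the leader's own ACK counts towards the majority.

```python
from typing import Sequence


def leader_decision(follower_acks: Sequence[bool]) -> str:
    """Decide what the LEADER does after sending a PROPOSAL.

    `follower_acks` has one entry per FOLLOWER: True if it replied with ACK.
    Assumption: the leader counts its own ACK as well.
    """
    ensemble = len(follower_acks) + 1  # followers plus the leader
    acks = 1 + sum(follower_acks)      # leader's own ACK plus followers' ACKs
    if acks > ensemble // 2:
        return "COMMIT"
    return "WAIT"


print(leader_decision([True, False]))                # 3 servers, 2 ACKs
print(leader_decision([True, False, False, False]))  # 5 servers, 2 ACKs
```

With 3 servers, one follower ACK is enough for a commit. With 5 servers, a single follower ACK is not, so the leader keeps waiting.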
Caveat to point 12
Before acknowledging the PROPOSAL, the FOLLOWER needs to perform these checks:
- `PROPOSAL` is from the `LEADER` it is currently following
- ACK and `COMMIT` transactions in the same order as broadcast by the `LEADER`
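The two checks above can be pictured with a toy follower. This is only a sketch under simplifying assumptions: `ToyFollower` and its methods are invented for this post, "same order" is modelled as strictly increasing zxids, and commits must arrive in the order the proposals were acknowledged.

```python
from typing import List


class ToyFollower:
    def __init__(self, leader_sid: int) -> None:
        self.leader_sid = leader_sid
        self.last_acked_zxid = -1
        self.pending: List[int] = []  # acked but not yet committed, oldest first

    def on_proposal(self, sender_sid: int, zxid: int) -> bool:
        """Return True (ACK) only if both checks pass."""
        if sender_sid != self.leader_sid:
            return False  # not from the leader we are following
        if zxid <= self.last_acked_zxid:
            return False  # out of order
        self.last_acked_zxid = zxid
        self.pending.append(zxid)
        return True

    def on_commit(self, zxid: int) -> bool:
        """Commit only the oldest pending transaction."""
        if not self.pending or self.pending[0] != zxid:
            return False
        self.pending.pop(0)
        return True


f = ToyFollower(leader_sid=2)
print(f.on_proposal(2, 1))  # True: from the current leader, in order
print(f.on_proposal(3, 2))  # False: S3 is not the leader being followed
```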
Miscellaneous points
- `tick` is the basic unit of measurement for time used by Zookeeper
- `LEADER` PINGs every half-tick
- Availability depends on Quorum
- Observers do not take part in Quorum
- `epoch` number increases after each election
- Client can not connect to a server that has not seen an update that the client might have seen, i.e. `zxid` of client should be less than or equal to `zxid` of server.
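The `epoch` and `zxid` points are related. As I understand it, a zxid is a 64-bit number with the epoch in the high 32 bits and a counter in the low 32 bits, which fits the logs above (`zxid=0xf00000002` alongside `n.peerEpoch` of `0xf`). A small sketch, with `split_zxid` and `can_connect` as made-up helper names:

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a zxid into (epoch, counter), assuming a 32/32-bit layout."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def can_connect(client_zxid: int, server_zxid: int) -> bool:
    """A client may only connect to a server that is at least as up to date."""
    return client_zxid <= server_zxid


epoch, counter = split_zxid(0xF00000002)
print(epoch, counter)  # 15 2

print(can_connect(client_zxid=0xF00000002, server_zxid=0xF00000001))  # False
```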
On a single machine, if a process fails, other processes can detect the failure through the OS. In a distributed system, however, the processes that are still running are responsible for detecting the failure of other processes.
Directories to watch out for
A typical Zookeeper installation on Ubuntu has the following components.
- /usr/share/zookeeper/
- /etc/zookeeper
- /var/lib/zookeeper
- /etc/alternatives/zookeeper-conf
logs
tail -f /var/log/zookeeper/zookeeper.log
start zookeeper service
sudo /usr/share/zookeeper/bin/zkServer.sh start
What next?
I have run several experiments with Zookeeper: different configurations, weighted votes, and other scenarios. Sharing all of them is not feasible here, but depending on the response to this two-post Zookeeper series, I may write another follow-up about one or two of them. Till then, adios amigos!