Zookeeper - Season 2
Welcome Back!
This is a follow-up to the blog post Zookeeper - All you need to know. If you have not read that post, I would highly recommend checking it out first.
Modes in Zookeeper
Zookeeper can run in two modes:
- Standalone: Only one server (can be used in development environment)
- Quorum Mode/Ensemble Mode: 2F + 1 servers, where F is the number of server failures one can tolerate
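The 2F + 1 rule can be checked with a little arithmetic. The following is a minimal sketch; the function names are made up for illustration and are not part of Zookeeper.

```python
def ensemble_size(failures_to_tolerate: int) -> int:
    """Servers needed to tolerate F failures: 2F + 1."""
    return 2 * failures_to_tolerate + 1


def quorum_size(servers: int) -> int:
    """Smallest strict majority of the ensemble."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """Failures an ensemble of this size survives while keeping a majority."""
    return (servers - 1) // 2


for f in (1, 2, 3):
    n = ensemble_size(f)
    print(f"F={f}: {n} servers, quorum of {quorum_size(n)}")
```

Note that an even-sized ensemble buys nothing extra: 6 servers tolerate the same 2 failures as 5 servers.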
How to configure a Zookeeper Ensemble
- Each server has its own configuration file
- Each configuration file lists all the servers that make up the ensemble
- Each server is identified by an id called sid or Server Identifier
  - myid file: tells the server what its id is supposed to be
  - myid file is kept inside the dataDir for each server
  - clientPort is where clients connect. Mention this in the connectionString
- Information related to ensemble is provided in server.n entries
- Each server.n entry specifies the address and port numbers used by the servers
  - Example - server.5=127.0.0.1:2222:2223
  - This example indicates that server with myid=5 (S5) will use 2222 for communication and 2223 for leader election. (Shown in the logs below)
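To make the server.n format concrete, here is a small sketch that splits such entries into their parts. The helper names (ServerEntry, parse_server_entry, parse_ensemble) are hypothetical, and the sample config is an assumption: only the server.5 line comes from this post. The sketch assumes plain host:port:port values and does not handle IPv6 addresses.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    sid: int            # the n in server.n, matching that server's myid
    host: str
    quorum_port: int    # server-to-server communication (2222 in the example)
    election_port: int  # leader election (2223 in the example)


def parse_server_entry(line: str) -> ServerEntry:
    key, value = line.strip().split("=", 1)
    prefix, _, sid = key.partition(".")
    if prefix != "server" or not sid.isdigit():
        raise ValueError(f"not a server.n entry: {line!r}")
    # Ignore anything after the two ports (this sketch only needs the first three parts).
    host, quorum_port, election_port = value.split(":")[:3]
    return ServerEntry(int(sid), host, int(quorum_port), int(election_port))


def parse_ensemble(config_text: str) -> dict:
    """Collect every server.n line of a config file, keyed by sid."""
    entries = {}
    for line in config_text.splitlines():
        if line.strip().startswith("server."):
            entry = parse_server_entry(line)
            entries[entry.sid] = entry
    return entries


# Hypothetical three-server config; only the server.5 line comes from the post.
SAMPLE_CONFIG = """
dataDir=/var/lib/zookeeper
clientPort=2181
server.4=127.0.0.1:2220:2221
server.5=127.0.0.1:2222:2223
server.6=127.0.0.1:2224:2225
"""

ensemble = parse_ensemble(SAMPLE_CONFIG)
print(ensemble[5])
```

Every server in the ensemble would carry the same server.n lines; what differs per server is the number stored in its myid file.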
What happens when a Zookeeper server is started?
Standalone Mode
Keep calm and let clients connect
Ensemble Mode
- When a server is started, it looks frantically for other servers mentioned in its config file
- As long as it is the only server up and running, it will keep throwing java.net.ConnectException
- As soon as another server joins, it proposes an election to take place
- The following logs will be printed to the console:
  My election bind port: /127.0.0.1:2223
  LOOKING
  New election. My id = 5, proposed zxid=0xf00000002
- Each server notifies other servers of its proposed leader using a 'Notification Message'
Notification Message
Notifications are messages that let other peers know that a given peer has changed its vote, either because it has joined leader election or because it learned of another peer with a higher zxid, or the same zxid and a higher server id.
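The rule for changing a vote can be sketched as a tuple comparison: zxid first, server id as the tie-breaker. This is a simplified illustration of the rule as stated above; Vote and better_vote are hypothetical names, and the actual implementation may take more fields (such as the epoch) into account.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    zxid: int  # compared first
    sid: int   # tie-breaker when zxids are equal


def better_vote(current: Vote, incoming: Vote) -> Vote:
    """Keep the current vote unless the incoming one wins on (zxid, sid)."""
    # NamedTuple instances compare field by field, like plain tuples.
    return incoming if incoming > current else current


mine = Vote(zxid=0xF00000002, sid=5)
print(better_vote(mine, Vote(zxid=0xF00000002, sid=1)))  # same zxid, lower sid
print(better_vote(mine, Vote(zxid=0xF00000003, sid=1)))  # higher zxid
```

In the first call the peer keeps its own vote; in the second it switches, because a higher zxid beats a higher server id.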
Examples -
Notification: 1 (message format version), 1 (n.leader), 0x900000003 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x9 (n.peerEpoch) LEADING (my state)
Notification: 1 (message format version), 3 (n.leader), 0xa00000001 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0xa (n.peerEpoch) LEADING (my state)
Notification: 1 (message format version), 5 (n.leader), 0xf00000002 (n.zxid), 0x1 (n.round), LOOKING (n.state), 5 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
Notification: 1 (message format version), 2 (n.leader), 0xd00000000 (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 1 (n.sid), 0xf (n.peerEpoch) LOOKING (my state)
Reading the Notification Message / Notification Log
- n.sid is the server which sent the notification
- n.leader is the proposed leader; this can be the actual leader in case one exists
- n.state is the state of the sender (sid)
- n.zxid is the zxid (transaction id, which acts as a logical timestamp) of the last change of the proposed leader
So, this is how you should read a notification message, taking the 3rd and 4th examples from above:
- Server 5 (S5) is currently in LOOKING state and proposes itself as the leader
- Server 1 (S1) is currently in FOLLOWING state and proposes Server 2 (S2) as the leader (maybe S2 is already the LEADER since S1 is in FOLLOWING state)
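The same reading can be done mechanically. Below is a minimal sketch that pulls the "value (name)" pairs out of a notification line with a regular expression; parse_notification and describe are hypothetical helpers, and the pattern is only known to fit the log lines quoted in this post.

```python
import re

# Each field in the log line looks like: <value> (<name>)
FIELD = re.compile(r"(\S+) \(([^)]+)\)")


def parse_notification(line: str) -> dict:
    """Map field names such as 'n.leader' to their raw string values."""
    return {name: value for value, name in FIELD.findall(line)}


def describe(line: str) -> str:
    n = parse_notification(line)
    return f"S{n['n.sid']} is {n['n.state']} and proposes S{n['n.leader']} as leader"


third = ("Notification: 1 (message format version), 5 (n.leader), "
         "0xf00000002 (n.zxid), 0x1 (n.round), LOOKING (n.state), 5 (n.sid), "
         "0xf (n.peerEpoch) LOOKING (my state)")
fourth = ("Notification: 1 (message format version), 2 (n.leader), "
          "0xd00000000 (n.zxid), 0x1 (n.round), FOLLOWING (n.state), 1 (n.sid), "
          "0xf (n.peerEpoch) LOOKING (my state)")

print(describe(third))
print(describe(fourth))
```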
Types of Servers in Zookeeper
- Leader
- Follower
- Observer: Observers do not participate in an election. They let the ensemble serve more clients without adding more voters
Leader, Follower are also called Participants since they participate in an election
Follower, Observer are also called Learners
More about messages
LEADER sends following types of messages:
- SNAP entire snapshot transfer
- DIFF contains most recent missing transactions
- INFORM is information for Observer server
- PROPOSAL proposes state changes to Followers
- COMMIT asks the Followers to commit the changes conveyed in PROPOSAL
- HEARTBEAT
Let us understand these messages
Zookeeper in 14 steps
Since you have been so patient in understanding and reading about all these concepts, this is where you are going to reward yourself by making sense of it all.
1. A server connects in LOOKING state
2. Election takes place
3. LEADER is elected
4. FOLLOWER syncs with LEADER
   - Based on how much a FOLLOWER lags behind, the LEADER sends a DIFF (most recent missing transactions) or a SNAP (entire snapshot transfer)
   - Send INFORM message to OBSERVER
5. Client connects to FOLLOWER/OBSERVER (also called LEARNER)
6. LEARNER forwards session information of the client to the LEADER
7. LEARNER responds to read requests locally
8. Client sends write request to the LEARNER
9. LEARNER forwards write requests to the LEADER
10. LEADER transforms the request into a transaction
11. LEADER sends a PROPOSAL to the FOLLOWERs about the transaction
12. FOLLOWER responds with ACK
13. If a majority of the ensemble replies with ACK, LEADER sends a COMMIT to all the FOLLOWERs
14. LEADER sends INFORM message to OBSERVER
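The write path in the later steps (transaction, PROPOSAL, ACK, COMMIT) can be sketched as a tiny in-memory simulation. This is an illustrative toy, not how Zookeeper is implemented: the Leader and Follower classes are hypothetical, there is no networking, Observers are left out, and counting the leader's own ACK towards the majority is an assumption of this sketch.

```python
class Follower:
    def __init__(self, sid: int, alive: bool = True):
        self.sid = sid
        self.alive = alive
        self.pending = {}    # zxid -> transaction, proposed but not committed
        self.committed = []

    def on_proposal(self, zxid: int, txn) -> bool:
        """Record the proposal and ACK it; a dead follower never answers."""
        if not self.alive:
            return False
        self.pending[zxid] = txn
        return True

    def on_commit(self, zxid: int) -> None:
        if self.alive and zxid in self.pending:
            self.committed.append(self.pending.pop(zxid))


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.next_zxid = 1
        self.committed = []

    def write(self, request) -> bool:
        # Turn the request into a transaction with its own zxid.
        zxid = self.next_zxid
        self.next_zxid += 1
        txn = (zxid, request)
        # Propose to every follower and count ACKs (plus the leader itself).
        acks = 1 + sum(f.on_proposal(zxid, txn) for f in self.followers)
        ensemble = len(self.followers) + 1
        if acks <= ensemble // 2:
            return False  # no majority, nothing is committed
        # Majority reached: commit locally and tell the followers.
        self.committed.append(txn)
        for f in self.followers:
            f.on_commit(zxid)
        return True


# Ensemble of 5: one leader, four followers, two of them down.
followers = [Follower(1), Follower(2), Follower(3, alive=False), Follower(4, alive=False)]
leader = Leader(followers)
print(leader.write("create /demo"))  # 3 of 5 ACKs is a majority

followers[1].alive = False           # a third server goes down
print(leader.write("delete /demo"))  # 2 of 5 ACKs is not
```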
Caveat to point 12
Before acknowledging the PROPOSAL, the FOLLOWER needs to perform these checks:
- PROPOSAL is from the LEADER it is currently following
- ACK and COMMIT transactions in the same order as broadcast by the LEADER
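One way to picture these two checks is a follower that remembers which leader it follows and the order in which it acknowledged proposals. The OrderedFollower class below is a hypothetical sketch under those assumptions, not Zookeeper's actual code; it assumes zxids arrive in strictly increasing order.

```python
class OrderedFollower:
    def __init__(self, leader_sid: int):
        self.leader_sid = leader_sid
        self.acked = []      # zxids in the order they were acknowledged
        self.committed = []  # zxids in the order they were committed

    def on_proposal(self, from_sid: int, zxid: int) -> bool:
        """ACK only proposals from the followed leader, in increasing zxid order."""
        if from_sid != self.leader_sid:
            return False
        if self.acked and zxid <= self.acked[-1]:
            return False
        self.acked.append(zxid)
        return True

    def on_commit(self, zxid: int) -> bool:
        """Commit only the oldest acknowledged transaction not yet committed."""
        position = len(self.committed)
        if position >= len(self.acked) or self.acked[position] != zxid:
            return False
        self.committed.append(zxid)
        return True


follower = OrderedFollower(leader_sid=2)
print(follower.on_proposal(from_sid=2, zxid=1))  # from the followed leader
print(follower.on_proposal(from_sid=3, zxid=2))  # from some other server
print(follower.on_proposal(from_sid=2, zxid=2))
print(follower.on_commit(2))                     # out of order, zxid 1 comes first
print(follower.on_commit(1))
```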
Miscellaneous points
- tick is the basic unit of measurement for time used by Zookeeper
- LEADER PINGs every half-tick
- Availability depends on Quorum
- Observers do not take part in Quorum
- epoch number increases after each election
- Client can not connect to a server that has not seen an update that the client might have seen, i.e. zxid of client should be less than or equal to zxid of server.
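The epoch and the zxid are related: a zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a counter, which matches the logs above (zxid 0xf00000002 next to peerEpoch 0xf). Here is a small sketch of that split and of the client/server zxid rule; split_zxid and can_serve are hypothetical helper names.

```python
def split_zxid(zxid: int) -> tuple:
    """Return (epoch, counter): high 32 bits and low 32 bits of the zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def can_serve(client_zxid: int, server_zxid: int) -> bool:
    """A server may accept a client only if it is not behind what the client saw."""
    return client_zxid <= server_zxid


for zxid in (0x900000003, 0xD00000000, 0xF00000002):
    epoch, counter = split_zxid(zxid)
    print(f"{zxid:#x}: epoch={epoch:#x}, counter={counter}")

print(can_serve(client_zxid=0xF00000002, server_zxid=0xF00000002))
print(can_serve(client_zxid=0xF00000002, server_zxid=0xD00000000))
```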
On a single machine, if a process fails, other processes can detect the failure through the OS. In a distributed system, however, the processes that are still running are responsible for detecting the failure of other processes.
Directories to watch out for
A typical Zookeeper installation on Ubuntu has the following components.
/usr/share/zookeeper/
/etc/zookeeper
/var/lib/zookeeper
/etc/alternatives/zookeeper-conf
View the logs
tail -f /var/log/zookeeper/zookeeper.log
Start the Zookeeper service
sudo /usr/share/zookeeper/bin/zkServer.sh start
What next?
I have run several experiments with Zookeeper: different configurations, weighted votes, and more scenarios. Sharing all of them is not feasible here, but depending on the response to this two-post Zookeeper series, I may write a follow-up post about one or two of them. Till then, adios amigos!