org.apache.catalina.tribes.group.interceptors
public class NonBlockingCoordinator extends ChannelInterceptorBase
Title: Auto merging leader election algorithm
Description: Implementation of a simple coordinator algorithm that not only selects a coordinator, it also merges groups automatically when members are discovered that weren't part of the group.
This algorithm is non-blocking, meaning it allows transactions to proceed while the coordination phase is going on.
This implementation is based on a home-brewed algorithm that uses the AbsoluteOrder of a membership
to pass a token around a ring formed by the current membership.
This is not the same as just using AbsoluteOrder! Consider the following scenario:
Nodes A, B, C, D, E are on a network, in that priority order. AbsoluteOrder will only work if all
nodes are receiving pings from all the other nodes,
meaning that node{i} receives pings from node{all}-node{i},
but the following could happen if a multicast problem occurs:
A has members {B,C,D}
B has members {A,C}
C has members {D,E}
D has members {A,B,C,E}
E has members {A,C,D}
The default Tribes membership implementation relies on multicast packets arriving correctly at all nodes,
but there is nothing guaranteeing that they will.
To best explain how this algorithm works, let's take the above example.
For simplicity we assume that a send operation is O(1) for all nodes; the algorithm still works
when messages overlap, since they all depend on absolute order.
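In the eval phase of Scenario 1 below, each node first picks a leader candidate from the members it can currently see. Here is a minimal, self-contained sketch of that step under the priority order above; the class and method names are hypothetical, this is not the Tribes code:

```java
import java.util.*;

public class EvalPhaseSketch {
    // Absolute order expressed as a fixed priority list from the example: A > B > C > D > E.
    static final List<String> ABSOLUTE_ORDER = List.of("A", "B", "C", "D", "E");

    // A node's leader candidate is the highest-priority member it can see, itself included.
    static String pickLeaderCandidate(String self, Set<String> visibleMembers) {
        Set<String> view = new HashSet<>(visibleMembers);
        view.add(self);
        return ABSOLUTE_ORDER.stream()
                .filter(view::contains)
                .findFirst()
                .orElse(self);
    }

    public static void main(String[] args) {
        // The divergent views from the multicast example above reproduce the eval-phase result.
        System.out.println(pickLeaderCandidate("A", Set.of("B", "C", "D")));      // A nominates A
        System.out.println(pickLeaderCandidate("B", Set.of("A", "C")));           // B nominates A
        System.out.println(pickLeaderCandidate("C", Set.of("D", "E")));           // C nominates C
        System.out.println(pickLeaderCandidate("D", Set.of("A", "B", "C", "E"))); // D nominates A
        System.out.println(pickLeaderCandidate("E", Set.of("A", "C", "D")));      // E nominates A
    }
}
```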
Scenario 1: A,B,C,D,E all come online at the same time
Eval phase: A thinks of itself as leader, B thinks of A as leader,
C thinks of itself as leader, and D and E think of A as leader
Token phase:
(1) A sends out a message X{A-ldr, A-src, mbrs-A,B,C,D} to B, where X is the id for the message (and the view)
(1) C sends out a message Y{C-ldr, C-src, mbrs-C,D,E} to D, where Y is the id for the message (and the view)
(2) B receives X{A-ldr, A-src, mbrs-A,B,C,D}, sends X{A-ldr, A-src, mbrs-A,B,C,D} to C
(2) D receives Y{C-ldr, C-src, mbrs-C,D,E}; D is aware of A,B and sends Y{A-ldr, C-src, mbrs-A,B,C,D,E} to E
(3) C receives X{A-ldr, A-src, mbrs-A,B,C,D} and sends X{A-ldr, A-src, mbrs-A,B,C,D,E} to D
(3) E receives Y{A-ldr, C-src, mbrs-A,B,C,D,E} and sends Y{A-ldr, C-src, mbrs-A,B,C,D,E} to A
(4) D receives X{A-ldr, A-src, mbrs-A,B,C,D,E} and sends X{A-ldr, A-src, mbrs-A,B,C,D,E} to A
(4) A receives Y{A-ldr, C-src, mbrs-A,B,C,D,E}, holds the message, and adds E to its list of members
(5) A receives X{A-ldr, A-src, mbrs-A,B,C,D,E}
At this point, the state looks like
A - {A-ldr, mbrs-A,B,C,D,E, id=X}
B - {A-ldr, mbrs-A,B,C,D, id=X}
C - {A-ldr, mbrs-A,B,C,D,E, id=X}
D - {A-ldr, mbrs-A,B,C,D,E, id=X}
E - {A-ldr, mbrs-A,B,C,D,E, id=Y}
A message doesn't stop until it reaches its original sender, unless it is dropped by a higher leader.
As you can see, E still thinks the viewId=Y, which is not correct. But at this point we have
arrived at the same membership and all nodes are informed of each other.
To synchronize the rest we simply perform the following check at A when A receives X:
Original X{A-ldr, A-src, mbrs-A,B,C,D} == Arrived X{A-ldr, A-src, mbrs-A,B,C,D,E}
Since the condition is false, A will resend the token: A sends X{A-ldr, A-src, mbrs-A,B,C,D,E} to B
When A receives X again, the token is complete.
Optionally, A can send a message X{A-ldr, A-src, mbrs-A,B,C,D,E confirmed} to A,B,C,D,E who then
install and accept the view.
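Here is a rough sketch of the token handling in Scenario 1, under the assumption that a token carries {leader, source, members, id} as in X{A-ldr, A-src, mbrs-...}. The Token class and the method names are hypothetical simplifications, not the CoordinationMessage class from this package:

```java
import java.util.*;

public class TokenRingSketch {
    // Fixed priority list taken from the example: A > B > C > D > E.
    static final List<String> ABSOLUTE_ORDER = List.of("A", "B", "C", "D", "E");

    // Hypothetical token, loosely modeled on X{A-ldr, A-src, mbrs-A,B,C,D}.
    static class Token {
        String id;                 // message/view id, e.g. "X"
        String leader;             // leader candidate carried by the token
        String source;             // node that started this token
        SortedSet<String> members; // membership accumulated so far, kept in absolute order

        Token(String id, String leader, String source, Collection<String> members) {
            this.id = id;
            this.leader = leader;
            this.source = source;
            this.members = new TreeSet<>(Comparator.comparingInt(ABSOLUTE_ORDER::indexOf));
            this.members.addAll(members);
        }
    }

    // Called when a node receives the token: merge local knowledge, fix up the leader,
    // and return the next member in absolute order (wrapping around) to forward to.
    static String onToken(String self, Set<String> locallyKnown, Token token) {
        token.members.add(self);
        token.members.addAll(locallyKnown);
        token.leader = token.members.first();            // highest-priority member wins
        List<String> ring = new ArrayList<>(token.members);
        int next = (ring.indexOf(self) + 1) % ring.size();
        return ring.get(next);                           // forward target
    }

    // The 'Original X == Arrived X' check at the source: if the membership changed while the
    // token circulated, it must go around once more before the view can be confirmed.
    static boolean tokenComplete(Set<String> membersWhenSent, Token arrived) {
        return membersWhenSent.equals(arrived.members);
    }

    public static void main(String[] args) {
        Token x = new Token("X", "A", "A", List.of("A", "B", "C", "D")); // step (1) for token X
        Set<String> original = new TreeSet<>(x.members);
        onToken("C", Set.of("D", "E"), x);               // step (3): C knows about E and adds it
        System.out.println(x.members);                   // [A, B, C, D, E]
        System.out.println(tokenComplete(original, x));  // false, so A resends the token
    }
}
```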
Let's assume that C1 arrives; C1 has lower priority than C, but higher priority than D.
Let's also assume that C1 sees the following view: {B,D,E}.
C1 waits for a token to arrive. When the token arrives, the same scenario as above will happen.
In the scenario where C1 sees {D,E} and A,B,C cannot see C1, no token will ever arrive.
In this case, C1 sends a Z{C1-ldr, C1-src, mbrs-C1,D,E} to D
D receives Z{C1-ldr, C1-src, mbrs-C1,D,E} and sends Z{A-ldr, C1-src, mbrs-A,B,C,C1,D,E} to E
E receives Z{A-ldr, C1-src, mbrs-A,B,C,C1,D,E} and sends it to A
A sends Z{A-ldr, A-src, mbrs-A,B,C,C1,D,E} to B and the chain continues until A receives the token again.
At that time A optionally sends out Z{A-ldr, A-src, mbrs-A,B,C,C1,D,E, confirmed} to A,B,C,C1,D,E
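What makes C1 start its own token in that last case is a timeout: it waited for a token from the existing group and nothing arrived. A hedged sketch of that wait-then-start trigger, reusing the coordMsgReceived, electionMutex and waitForCoordMsgTimeout field names from the summary below; the timeout value and the method names are illustrative, not the real defaults:

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

public class ElectionStartSketch {
    protected final AtomicBoolean coordMsgReceived = new AtomicBoolean(false);
    protected final Object electionMutex = new Object();
    protected long waitForCoordMsgTimeout = 15000; // ms to wait for an existing token (illustrative)

    // A node that joins (like C1 above) first waits for a token from the existing group;
    // if nothing arrives before the timeout, it starts its own election round.
    void waitForTokenOrStartElection(String self, Set<String> visibleMembers) throws InterruptedException {
        long deadline = System.currentTimeMillis() + waitForCoordMsgTimeout;
        synchronized (electionMutex) {
            while (!coordMsgReceived.get() && System.currentTimeMillis() < deadline) {
                electionMutex.wait(Math.max(1, deadline - System.currentTimeMillis()));
            }
        }
        if (!coordMsgReceived.get()) {
            // No token arrived: behave like C1 and send our own token,
            // e.g. Z{C1-ldr, C1-src, mbrs-C1,D,E}, to the next visible member in absolute order.
            startOwnToken(self, visibleMembers);
        }
    }

    // Called by the message handler when any coordination token arrives.
    void onTokenReceived() {
        coordMsgReceived.set(true);
        synchronized (electionMutex) {
            electionMutex.notifyAll();
        }
    }

    void startOwnToken(String self, Set<String> visibleMembers) {
        // Hypothetical: build and forward a token as in the sketch after Scenario 1.
    }
}
```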
To ensure that the view gets installed at all nodes at the same time, A will send out a VIEW_CONF message; this is the 'confirmed' message that is optional above.
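This suggests a two-step install, which the suggestedView/view fields in the summary below also hint at: hold the circulating result as a suggestion, and only make it the current view when the confirmation arrives. A minimal sketch, with String and Set<String> standing in for UniqueId and Membership:

```java
import java.util.Set;

public class ViewInstallSketch {
    // Mirrors the view / viewId / suggestedView / suggestedviewId fields listed below.
    protected Set<String> view;
    protected String viewId;
    protected Set<String> suggestedView;
    protected String suggestedviewId;

    // An ordinary token only records a suggested view while the election is still running.
    void onToken(String id, Set<String> members) {
        suggestedviewId = id;
        suggestedView = members;
    }

    // The VIEW_CONF ('confirmed') message makes every node install the same view.
    void onViewConfirmation(String id, Set<String> members) {
        viewId = id;
        view = members;
        suggestedView = null;
        suggestedviewId = null;
    }
}
```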
Ideally, the interceptor below this one would be the TcpFailureDetector to ensure correct memberships
The example above can, of course, be simplified with a finite state machine,
but I suck at writing state machines, my head gets all confused. One day I will document this algorithm properly.
Maybe I'll do a state diagram :)
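For what it's worth, the phases described above do map onto a handful of states. One possible, purely illustrative reading; the state names are mine and do not come from the implementation:

```java
public class CoordinationStateSketch {
    enum State {
        IDLE,              // no election running, the current view is installed
        EVALUATING,        // picking a local leader candidate from the visible members
        TOKEN_CIRCULATING, // a token with the accumulated membership is travelling the ring
        AWAITING_CONFIRM   // the token came back unchanged, waiting for VIEW_CONF
    }

    State state = State.IDLE;

    // Transitions triggered by the events described in the walkthrough.
    void onMembershipChange()       { state = State.EVALUATING; }
    void onTokenSentOrReceived()    { state = State.TOKEN_CIRCULATING; }
    void onTokenReturnedUnchanged() { state = State.AWAITING_CONFIRM; }
    void onViewConfirmation()       { state = State.IDLE; } // new view installed, election done
}
```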
Version: 1.0
Nested Class Summary

Modifier and Type | Class and Description
---|---
static class | NonBlockingCoordinator.CoordinationEvent
static class | NonBlockingCoordinator.CoordinationMessage
Field Summary

Modifier and Type | Field | Description
---|---|---
protected AtomicBoolean | coordMsgReceived |
protected static byte[] | COORD_ALIVE | Alive message
protected static byte[] | COORD_CONF | Coordination confirmation, for blocking installations
protected static byte[] | COORD_HEADER | Header for a coordination message
protected static byte[] | COORD_REQUEST | Coordination request
protected Object | electionMutex |
protected Membership | membership | Our non-blocking membership
protected boolean | started |
protected int | startsvc |
protected UniqueId | suggestedviewId | Indicates that we are running an election, and this is the one we are running
protected Membership | suggestedView |
protected Membership | view | Our current view
protected UniqueId | viewId | Our current viewId
protected long | waitForCoordMsgTimeout | Time to wait for a coordination message before timing out
Constructor Summary

Constructor | Description
---|---
NonBlockingCoordinator() |