# Rejoin Mechanism
The sections below explain the joining and rejoining process for the different types of nodes.
Some terms used are:
- Launch script: This refers to either of the scripts supplied on the Mainnet joining page (i.e., `launch_docker.sh` and `launch.sh`), or `start.sh` for guard nodes
- Upper seed: A seed node that the node can query for blockchain data. One or more upper seeds are normally listed in `constants.xml`
The joining or rejoining process relies on the `m_syncType` setting, which can be any of these values:
| SyncType | Purpose |
|---|---|
| (0) `NO_SYNC` | Indicates that a node is fully synced |
| (1) `NEW_SYNC` | New node (possibly sharded) joining or rejoining |
| (2) `NORMAL_SYNC` | New node (unsharded) joining |
| (3) `DS_SYNC` | DS node rejoining |
| (4) `LOOKUP_SYNC` | Lookup node rejoining |
| (5) `RECOVERY_ALL_SYNC` | Launching entire network from existing blockchain |
| (6) `NEW_LOOKUP_SYNC` | New lookup node joining |
| (7) `GUARD_DS_SYNC` | DS guard node rejoining |
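For reference, the table above maps directly onto an enumeration. The minimal sketch below shows how such an enum might look; the actual declaration in the Zilliqa codebase may differ in naming, ordering, and additional members.

```cpp
// Minimal sketch of a sync-type enumeration mirroring the table above.
// The real declaration in the Zilliqa codebase may differ in detail.
enum SyncType : int {
  NO_SYNC = 0,            // node is fully synced
  NEW_SYNC = 1,           // new node (possibly sharded) joining or rejoining
  NORMAL_SYNC = 2,        // new node (unsharded) joining
  DS_SYNC = 3,            // DS node rejoining
  LOOKUP_SYNC = 4,        // lookup node rejoining
  RECOVERY_ALL_SYNC = 5,  // launching entire network from existing blockchain
  NEW_LOOKUP_SYNC = 6,    // new lookup node joining
  GUARD_DS_SYNC = 7       // DS guard node rejoining
};
```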
> **Note:** Guard-specific sequences have been omitted to simplify the sections below.
## New Node Joining

> **Note:** This also applies to existing shard nodes that attempt to rejoin using the launch script.
- The launch script downloads the latest persistence from the AWS S3 incremental DB using `download_incr_db.py`
- The launch script starts the node (i.e., the `zilliqa` process) with `m_syncType = NEW_SYNC`
- The node reads out the local `persistence`, which was updated by the launch script
- The node recreates the current state using the base state and the state deltas fetched from the incremental DB
- Since `m_syncType` is not `NO_SYNC`, the node blocks some messages that would normally be processed by a synced node (see the sketch after this list)
- The node starts synchronization using `Node::StartSynchronization()`
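The message-blocking step can be pictured as a gate on incoming message types. The sketch below is illustrative only: `MessageGate`, `MsgKind`, and `ShouldProcess` are hypothetical names, not the actual Zilliqa dispatch code.

```cpp
#include <atomic>

// Illustrative sketch only: while m_syncType is anything other than NO_SYNC,
// the node drops (or defers) message types that only a fully synced node
// should handle. Names here are hypothetical, not the actual Zilliqa API.
enum class MsgKind { MICROBLOCK_CONSENSUS, FINAL_BLOCK, TX_PACKET, SYNC_RESPONSE };

class MessageGate {
  std::atomic<int> m_syncType;  // 0 == NO_SYNC
 public:
  explicit MessageGate(int syncType) : m_syncType(syncType) {}
  void SetSyncType(int s) { m_syncType = s; }

  bool ShouldProcess(MsgKind kind) const {
    if (m_syncType == 0) return true;    // fully synced: accept everything
    // Still syncing: only accept messages needed for catching up.
    return kind == MsgKind::SYNC_RESPONSE;
  }
};
```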
### Node::StartSynchronization()

- Send a request to all upper seeds to remove the node's IP address from their relaxed blacklist
- Fetch the more recent DS blocks and Tx blocks from a random upper seed
- Separate threads process Tx blocks upon receipt, fetching the corresponding state deltas and calculating the current state for each one
- If the latest Tx block is for a non-vacuous epoch:
  - Fetch the latest sharding structure from a random upper seed and check whether this node is already part of a shard
  - If it is not part of any shard, it is considered a new miner; in that case, continue to fetch recent blocks as in the earlier step
  - If it is already part of a shard:
    - Set the shard parameters (members and ID)
    - Change `m_syncType` to `NO_SYNC` and stop blocking messages
    - Send a request to shard peers for removal from the relaxed blacklist
    - Start the next Tx epoch by initializing node variables (e.g., `m_consensusID`, `m_consensusLeaderID`), checking the current role (i.e., shard leader or backup), initializing the Rumor Manager, and proceeding with microblock consensus
    - At this point the node has successfully rejoined the network as an existing shard node
- If the latest Tx block is for a vacuous epoch:
  - Move state updates to disk after calculating the state
  - Fetch the latest DS committee information, and send a request to a random upper seed to let this node know when to start PoW mining
  - Start mining upon receiving the notification from the upper seed
  - If the next DS block includes this node in the sharding information:
    - Change `m_syncType` to `NO_SYNC` and stop blocking messages
    - At this point the node has successfully joined the network as a new shard node
  - If the node fails to receive the next DS block in time:
    - Fetch the latest DS block from a random upper seed
    - If a new DS block was in fact created, it means this node lost PoW; continue syncing until the next vacuous epoch as described above
    - If the node fails to get a new DS block from the upper seed, set `syncType = NORMAL_SYNC` and trigger `Node::RejoinAsNormal()`

The node maintains a while loop within `Node::StartSynchronization()` while all the steps above are performed (except the relaxed blacklist removal request). It exits the while loop when `m_syncType` becomes `NO_SYNC`.
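The shape of that loop can be sketched roughly as follows. All helpers here are trivial stand-ins (the real work of fetching blocks, checking the sharding structure, and starting the Tx epoch is far more involved), so treat this as a sketch of the control flow, not the actual implementation.

```cpp
#include <chrono>
#include <iostream>
#include <thread>

// Self-contained sketch of the outer synchronization loop: keep pulling recent
// blocks from an upper seed until the latest sharding structure lists this node
// in a shard, then clear m_syncType so normal message processing resumes.
namespace sketch {

enum SyncType { NO_SYNC = 0, NEW_SYNC = 1 };

struct Node {
  SyncType m_syncType = NEW_SYNC;
  int fetchRounds = 0;

  // Stand-in for fetching recent DS/Tx blocks from a random upper seed.
  void FetchRecentBlocksFromUpperSeed() { ++fetchRounds; }

  // Stand-in for "the latest sharding structure lists this node in a shard".
  bool IsPartOfShard() const { return fetchRounds >= 3; }

  void StartSynchronization() {
    while (m_syncType != NO_SYNC) {
      FetchRecentBlocksFromUpperSeed();
      if (IsPartOfShard()) {
        // Set shard parameters, then stop blocking messages by clearing the sync type.
        m_syncType = NO_SYNC;
        continue;  // the loop condition now fails and the node is considered synced
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(10));  // back off before retrying
    }
  }
};

}  // namespace sketch

int main() {
  sketch::Node node;
  node.StartSynchronization();
  std::cout << "synced after " << node.fetchRounds << " fetch rounds\n";
  return 0;
}
```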
### Node::RejoinAsNormal()

- Set `SyncType = NORMAL_SYNC`
- Download the latest persistence from the AWS S3 incremental DB
- Retrieve the downloaded persistent storage
- Recreate the current state using the base state and the state deltas fetched from the incremental DB
- Since `m_syncType` is not `NO_SYNC`, block some messages that would normally be processed by a synced node
- Start synchronization using `Node::StartSynchronization()`
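The sequence above condenses into a short sketch. The helper names below (`DownloadIncrementalDB`, `RebuildStateFromBaseAndDeltas`, etc.) are hypothetical stand-ins for the launch-script step and the real state-recreation code.

```cpp
// Condensed sketch of the rejoin-as-normal sequence above. All helpers are
// illustrative stubs, not the actual Zilliqa functions.
enum SyncType { NO_SYNC = 0, NORMAL_SYNC = 2 };

struct NodeSketch {
  SyncType m_syncType = NO_SYNC;

  void DownloadIncrementalDB()         { /* pull latest persistence from AWS S3 (download_incr_db.py step) */ }
  void LoadPersistence()               { /* read the downloaded persistent storage */ }
  void RebuildStateFromBaseAndDeltas() { /* base state + state deltas -> current state */ }
  void StartSynchronization()          { /* Node::StartSynchronization() loop; clears the sync type on success */ }

  void RejoinAsNormal() {
    m_syncType = NORMAL_SYNC;  // messages stay blocked while m_syncType != NO_SYNC
    DownloadIncrementalDB();
    LoadPersistence();
    RebuildStateFromBaseAndDeltas();
    StartSynchronization();
  }
};
```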
## DS Node Joining

> **Note:** This also applies to existing DS nodes that attempt to rejoin using the launch script.
This procedure mirrors that of new node joining, with some differences:
- After recreating the current state, check whether the node is part of the current DS committee (see the sketch after this list). If yes:
  - Recreate the coinbase for all Tx blocks and microblocks from the start of the latest DS epoch
  - Fetch missing cosignatures (needed for coinbase recreation) from a random upper seed
  - Send a request to all upper seeds for removal from the relaxed blacklist
  - Trigger `DirectoryService::StartSynchronization()`
- If the node is not part of the current DS committee, trigger `Node::RejoinAsNormal()`
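The committee-membership branch can be sketched as below. `PubKey` and the committee container are simplified stand-ins for the real Zilliqa types, and the two outcome functions are empty placeholders.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified stand-ins: in the real code the committee holds key/peer pairs.
using PubKey = std::string;

void StartDSSynchronization() { /* coinbase recreation, cosignature fetch, then DS sync */ }
void RejoinAsNormal()         { /* fall back to rejoining as an ordinary shard node */ }

bool IsInDSCommittee(const std::vector<PubKey>& dsCommittee, const PubKey& self) {
  return std::find(dsCommittee.begin(), dsCommittee.end(), self) != dsCommittee.end();
}

void DecideDSRejoinPath(const std::vector<PubKey>& dsCommittee, const PubKey& self) {
  if (IsInDSCommittee(dsCommittee, self)) {
    StartDSSynchronization();  // node keeps its DS seat
  } else {
    RejoinAsNormal();          // node is no longer a DS member
  }
}
```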
### DirectoryService::RejoinAsDS()

This procedure mirrors `Node::RejoinAsNormal()`, with some differences:

- Set `SyncType = DS_SYNC`
- Start synchronization using `DirectoryService::StartSynchronization()`
### DirectoryService::StartSynchronization()

This procedure mirrors `Node::StartSynchronization()`, with some differences:

- The node does not need to check for shard membership. However, after recreating the current state, if the node is no longer part of the DS committee, trigger `Node::RejoinAsNormal()`
- After recreating the current state, if a new DS epoch has started, fetch the updated sharding structure again
- Start the next Tx epoch by initializing node variables (e.g., `m_consensusID`, `m_consensusLeaderID`), checking the current role (i.e., DS leader or backup), initializing the Rumor Manager, and proceeding with microblock consensus
  - If the latest Tx block is for a non-vacuous epoch, set the state to `MICROBLOCK_SUBMISSION`
  - If the latest Tx block is for a vacuous epoch, set the state to `POW_SUBMISSION`
- At this point the node has successfully rejoined the network as an existing DS node
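The state selection in the two sub-bullets above amounts to a branch on whether the latest Tx block closed a vacuous epoch. The sketch below assumes a simplified `DirState` enum; the real DS state machine has many more states.

```cpp
// Simplified DS state selection after resync. DirState here lists only the two
// states mentioned above; the real state machine is larger.
enum class DirState { MICROBLOCK_SUBMISSION, POW_SUBMISSION };

DirState StateAfterDSResync(bool latestTxBlockIsVacuous) {
  return latestTxBlockIsVacuous
             ? DirState::POW_SUBMISSION          // vacuous epoch: wait for PoW submissions
             : DirState::MICROBLOCK_SUBMISSION;  // non-vacuous epoch: resume microblock flow
}
```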
### Other Conditions That Trigger DS Node Rejoining

- When a view change occurs, DS nodes initially perform a pre-check. One reason the pre-check can fail is that a new DS block or Tx block was mined during the pre-check and this particular node failed to participate in the consensus for that block. This causes the node to invoke `DirectoryService::RejoinAsDS()`
- If `Node::Install()` fails for whatever reason, the DS node checks whether it is still part of the DS committee. If it is, it triggers `RejoinAsDS()`; if not, it triggers `RejoinAsNormal()`
- If the node is started with a `SyncType` of `GUARD_DS_SYNC`, it triggers `RejoinAsDS()`
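The `Node::Install()` fallback in the second bullet follows the same membership test shown earlier. A minimal sketch, with illustrative members rather than the real Zilliqa fields:

```cpp
// Minimal sketch of the Node::Install() fallback described above.
// Members and helpers are illustrative, not the actual Zilliqa API.
struct DSNodeSketch {
  bool installSucceeded = false;
  bool stillInDSCommittee = true;

  void RejoinAsDS()     { /* DS_SYNC path: DirectoryService::RejoinAsDS() */ }
  void RejoinAsNormal() { /* NORMAL_SYNC path: Node::RejoinAsNormal() */ }

  void HandleInstallResult() {
    if (installSucceeded) {
      return;            // nothing to do
    }
    if (stillInDSCommittee) {
      RejoinAsDS();      // still a DS member: resync as DS
    } else {
      RejoinAsNormal();  // dropped from the committee: resync as a shard node
    }
  }
};
```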
## Seed Node Joining
This procedure mirrors that of new node joining, with some differences:
- The launch script starts the node with `m_syncType = NEW_LOOKUP_SYNC`
- The node starts synchronization using `Lookup::InitSync()`

### Lookup::InitSync()
- Fetch the more recent DS blocks and Tx blocks from a random upper seed
- Separate threads process Tx blocks upon receipt, fetching the corresponding state deltas and calculating the current state for each one
- If the latest Tx block is for a vacuous epoch:
  - Move state updates to disk after calculating the state
- Fetch any microblocks from a random upper seed for the newly received Tx blocks as well as for the last `N` Tx blocks read out from persistence
- Fetch the latest sharding structure from a random upper seed
- Set `syncType = NO_SYNC`
- At this point the node has successfully rejoined the network as a seed node

The node maintains a while loop within `Lookup::InitSync()` while all the steps above are performed. It exits the while loop when `m_syncType` becomes `NO_SYNC`.
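The microblock fetch covers both the Tx blocks received during sync and the last `N` Tx blocks already in persistence. The sketch below only computes that block range; the actual request to the upper seed, and the configured value of `N`, are not shown, and the function and parameter names are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Sketch: collect the Tx block numbers whose microblocks should be requested
// from an upper seed (last N persisted blocks plus the blocks received during sync).
std::vector<uint64_t> TxBlocksNeedingMicroblocks(uint64_t lastPersistedTxBlock,
                                                 uint64_t latestTxBlock,
                                                 uint64_t N) {
  std::vector<uint64_t> wanted;

  // Last N Tx blocks read out from persistence.
  const uint64_t start = (lastPersistedTxBlock + 1 >= N) ? lastPersistedTxBlock + 1 - N : 0;
  for (uint64_t b = start; b <= lastPersistedTxBlock; ++b) {
    wanted.push_back(b);
  }

  // Tx blocks newly received from the upper seed during sync.
  for (uint64_t b = lastPersistedTxBlock + 1; b <= latestTxBlock; ++b) {
    wanted.push_back(b);
  }
  return wanted;
}
```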
### Lookup::RejoinAsNewlookup()

A seed node can potentially miss receiving a Tx block or DS block, in which case it goes out of sync and triggers `RejoinAsNewlookup` to rejoin.

- Set `syncType = NEW_LOOKUP_SYNC`
- If the number of missing Tx blocks exceeds `NUM_FINAL_BLOCK_PER_POW`:
  - Download the latest persistence from the AWS S3 incremental DB
  - Retrieve the downloaded persistent storage
  - Start synchronization using `Lookup::InitSync()`
- If the number of missing Tx blocks does not exceed `NUM_FINAL_BLOCK_PER_POW`:
  - Start synchronization using `Lookup::StartSynchronization()`
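The branch on `NUM_FINAL_BLOCK_PER_POW` can be sketched as follows. The constant's value here is arbitrary and both helpers are illustrative stand-ins, not the actual Zilliqa functions.

```cpp
#include <cstdint>

// Sketch of the rejoin decision above: a seed node that has fallen behind by
// more than NUM_FINAL_BLOCK_PER_POW Tx blocks re-downloads persistence and runs
// Lookup::InitSync(); otherwise it replays the missing blocks via
// Lookup::StartSynchronization().
struct LookupSketch {
  uint64_t NUM_FINAL_BLOCK_PER_POW = 100;  // illustrative value only

  void DownloadPersistenceAndInitSync() { /* S3 incremental DB + Lookup::InitSync() */ }
  void StartSynchronization()           { /* Lookup::StartSynchronization() */ }

  void RejoinAsNewlookup(uint64_t localTxBlockNum, uint64_t networkTxBlockNum) {
    const uint64_t missing =
        (networkTxBlockNum > localTxBlockNum) ? networkTxBlockNum - localTxBlockNum : 0;
    if (missing > NUM_FINAL_BLOCK_PER_POW) {
      DownloadPersistenceAndInitSync();
    } else {
      StartSynchronization();
    }
  }
};
```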
### Lookup::StartSynchronization()

- Fetch the more recent DS blocks and Tx blocks from a random upper seed
- Separate threads process Tx blocks upon receipt, fetching the corresponding state deltas and calculating the current state for each one
- If the latest Tx block is for a vacuous epoch:
  - Move state updates to disk after calculating the state
- Fetch any microblocks from a random upper seed for the newly received Tx blocks as well as for the last `N` Tx blocks read out from persistence
- Fetch the latest DS committee information from a random upper seed
- Set `syncType = NO_SYNC`
- At this point the node has successfully rejoined the network as a seed node
## Lookup Node Rejoining

A lookup node can potentially miss receiving a Tx block or DS block, in which case it goes out of sync and triggers `RejoinAsLookup` to rejoin.

### Lookup::RejoinAsLookup()

- Set `syncType = LOOKUP_SYNC`
- Start synchronization using `Lookup::StartSynchronization()`