Clustering Guide

Maree-DB clustering is an Enterprise tier feature. Clusters provide automatic failover, Byzantine fault tolerance, and horizontal scalability.

Requirement: Enterprise tier licence. A minimum of 3 nodes is recommended, which tolerates the failure of 1 node. Byzantine fault tolerance requires 5 or more nodes; with n = 5 the cluster tolerates f = 1 compromised node (f < n/3).

Step 1: Prepare Each Node

Install Maree-DB on each node. Ensure all nodes can communicate on the cluster port (default 7001).

# On each node: add cluster section to maree-db.toml
[cluster]
node_id    = "node-1"        # Unique ID for this node
cluster_id = "prod-cluster-1" # Same on all nodes
port       = 7001
seeds      = [
  "node-1.internal:7001",
  "node-2.internal:7001",
  "node-3.internal:7001"
]
byzantine_fault_tolerant = true  # Takes effect only with 5+ nodes (f=1)
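
Only node_id differs between nodes in the section above, so the per-node files can be generated from one template. The following is a minimal sketch under that assumption. The function name render_cluster_config and its argument order are hypothetical and not part of Maree-DB. It prints the [cluster] keys up to seeds and leaves byzantine_fault_tolerant to be added by hand.

```shell
# Hypothetical helper (not a Maree-DB tool): print a [cluster] section.
# Usage: render_cluster_config NODE_ID CLUSTER_ID PORT SEED_HOST...
render_cluster_config() {
  local node_id="$1"
  local cluster_id="$2"
  local port="$3"
  shift 3

  printf '[cluster]\n'
  printf 'node_id    = "%s"\n' "$node_id"
  printf 'cluster_id = "%s"\n' "$cluster_id"
  printf 'port       = %s\n' "$port"
  printf 'seeds      = [\n'

  local total=$#
  local i=0
  local host
  for host in "$@"; do
    i=$((i + 1))
    if [ "$i" -lt "$total" ]; then
      printf '  "%s:%s",\n' "$host" "$port"   # every entry but the last ends with a comma
    else
      printf '  "%s:%s"\n' "$host" "$port"
    fi
  done
  printf ']\n'
}
```

For example, render_cluster_config node-2 prod-cluster-1 7001 node-1.internal node-2.internal node-3.internal prints the section for node-2 with the same three seeds as above. Review the output before writing it into maree-db.toml.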

Step 2: Start the Bootstrap Node

# On node-1 (bootstrap node)
maree-db-server start --bootstrap-cluster

Step 3: Join Remaining Nodes

# On node-2 and node-3
maree-db-server start --join-cluster node-1.internal:7001

Step 4: Verify Cluster

# Check cluster status
maree-db-cli cluster status

# Output example:
# Cluster ID:    prod-cluster-1
# Status:        HEALTHY
# Leader:        node-1
# Nodes:         3/3 healthy
# BFT Mode:      OFF (need 5+ nodes)

# Via SQL:
SELECT * FROM _system.cluster_nodes;
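
For scripted checks, the status text can be tested instead of read by eye. This is a rough sketch that assumes the real output has the same "Status:" and "Nodes:" fields as the example above; the function name cluster_is_healthy is hypothetical, and the actual output format should be confirmed before relying on it.

```shell
# Hypothetical helper: read status text on stdin and succeed only if it
# reports "Status: HEALTHY" and an N/N node count (all nodes healthy).
# A leading "# " on each line, as in the example output, is tolerated.
cluster_is_healthy() {
  awk '
    { sub(/^#[ \t]*/, "") }                  # drop an optional "# " prefix
    $1 == "Status:" && $2 == "HEALTHY" { status_ok = 1 }
    $1 == "Nodes:" {
      split($2, count, "/")                  # "3/3" -> count[1]=3, count[2]=3
      if (count[1] != "" && count[1] == count[2]) nodes_ok = 1
    }
    END { exit !(status_ok && nodes_ok) }    # exit 0 only when both checks passed
  '
}

# Intended use against a live cluster (not run here):
#   maree-db-cli cluster status | cluster_is_healthy && echo "cluster ready"
```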

Adding a Node

# 1. Install Maree-DB on the new node with cluster config
# 2. Add "node-4.internal:7001" to seeds in maree-db.toml on every node
# 3. Start the new node
maree-db-server start --join-cluster node-1.internal:7001

# 4. Confirm it joined
maree-db-cli cluster status

Removing a Node

# Graceful removal (node transfers data first)
maree-db-cli cluster remove-node node-3

# Force removal (for unresponsive nodes)
maree-db-cli cluster remove-node node-3 --force

Rolling Upgrades (Zero Downtime)

# Upgrade nodes one at a time
# Step 1: Drain node-1 (redirects traffic to other nodes)
maree-db-cli cluster drain node-1

# Step 2: Upgrade the node-1 binary and restart the server (run these on node-1)
curl -sSL https://dist.mareedb.com/install.sh | bash
maree-db-server restart

# Step 3: Confirm node-1 is healthy, then undrain
maree-db-cli cluster status
maree-db-cli cluster undrain node-1

# Repeat for node-2, node-3, etc.
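
The ordering matters more than the individual commands: each node is drained, upgraded, checked, and undrained before the next one is touched. As an illustration, the sketch below prints that sequence for a list of nodes as a checklist. It executes nothing, and the function name print_rolling_upgrade_plan is hypothetical.

```shell
# Hypothetical helper: print the rolling-upgrade commands for each node, in order.
# Nothing is executed. The upgrade and restart lines are printed as comments
# because they have to be run on the node itself.
print_rolling_upgrade_plan() {
  local node
  for node in "$@"; do
    printf '# === %s ===\n' "$node"
    printf 'maree-db-cli cluster drain %s\n' "$node"
    printf '# on %s: curl -sSL https://dist.mareedb.com/install.sh | bash\n' "$node"
    printf '# on %s: maree-db-server restart\n' "$node"
    printf 'maree-db-cli cluster status\n'
    printf 'maree-db-cli cluster undrain %s\n' "$node"
  done
}

print_rolling_upgrade_plan node-1 node-2 node-3
```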

Byzantine Fault Tolerance

In Byzantine fault tolerant mode (5+ nodes), Maree-DB maintains correctness even if up to f nodes are compromised or behave maliciously, where n is the number of nodes and f is the largest whole number satisfying f < n/3.

5 nodes: tolerates 1 compromised node
7 nodes: tolerates 2 compromised nodes
10 nodes: tolerates 3 compromised nodes
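
The figures above follow directly from f < n/3: the largest whole f is floor((n - 1) / 3). A small sketch of that arithmetic, using a hypothetical function name:

```shell
# Largest f with f < n/3, i.e. floor((n - 1) / 3), using integer arithmetic.
max_byzantine_faults() {
  echo $(( ($1 - 1) / 3 ))
}

for n in 5 7 10; do
  echo "$n nodes: tolerates $(max_byzantine_faults "$n") compromised node(s)"
done
# Prints:
# 5 nodes: tolerates 1 compromised node(s)
# 7 nodes: tolerates 2 compromised node(s)
# 10 nodes: tolerates 3 compromised node(s)
```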

# Enable BFT mode (requires 5+ nodes)
[cluster]
byzantine_fault_tolerant = true

# Monitor consensus health
SELECT * FROM _system.consensus_status;

Cluster Monitoring

# Real-time cluster dashboard
maree-db-cli cluster monitor

# Query replication lag
SELECT node_id, lag_ms, status
FROM _system.replication_status;

# Check split-brain protection status
SELECT * FROM _system.quorum_status;