How to Set Up a High Availability Pacemaker Cluster on CentOS

A High Availability (HA) cluster keeps your services online when a server fails by restarting them on a surviving node. This guide walks you through a two-node setup using Pacemaker, Corosync, and shared NFS storage, and points out where a test configuration differs from what production requires.

1. Prerequisites & Environment

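You will need two nodes (servers) running CentOS Stream 9, plus an existing NFS server that exports the shared web content. A third machine is optional and only needed for the quorum device in step 4.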

  • node1: 192.168.1.10
  • node2: 192.168.1.11
  • Virtual IP (VIP): 192.168.1.50 (This is the IP clients will connect to)

On both nodes, add the following to /etc/hosts:

Bash

192.168.1.10 node1
192.168.1.11 node2
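
A quick check of the entries before moving on can save debugging time later. The following is a minimal sketch: check_hosts_entry is a hypothetical helper name, not part of pcs or Pacemaker, and it only inspects the file contents rather than testing connectivity.

```bash
#!/usr/bin/env bash
# check_hosts_entry is a hypothetical helper, not part of pcs or Pacemaker.
# It succeeds when <hosts-file> has a line mapping <ip> to <name>.
check_hosts_entry() {
    local file=$1 ip=$2 name=$3
    awk -v ip="$ip" -v name="$name" '
        $1 == ip {
            # Scan the names that follow the address on this line.
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

On each node, running check_hosts_entry /etc/hosts 192.168.1.10 node1 && check_hosts_entry /etc/hosts 192.168.1.11 node2 && echo "hosts OK" prints the message only when both mappings are present.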

2. Installation and Initial Setup

Run these commands on both nodes to prepare the environment.

Enable HA Repositories and Install Software

Bash

# Enable the HA repository
sudo dnf config-manager --set-enabled highavailability

# Install the cluster stack and web server
sudo dnf install -y pacemaker pcs corosync resource-agents fence-agents-all nfs-utils nginx

Firewall Configuration

Allow cluster communication through the firewall:

Bash

sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --reload

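Set Cluster Password and Start pcsd

The hacluster user is created automatically when pcs is installed. Set the same password for it on both nodes, then start the pcsd daemon that pcs uses to communicate between nodes: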

Bash

sudo passwd hacluster
sudo systemctl enable --now pcsd

3. Creating the Cluster

Run these commands on node1 only to link the servers together. The first command prompts for the hacluster password you set in the previous step.

Bash

# Authenticate the nodes
sudo pcs host auth node1 node2 -u hacluster

# Create and start the cluster
sudo pcs cluster setup my_cluster node1 node2
sudo pcs cluster start --all
sudo pcs cluster enable --all
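
After pcs cluster start --all, the cluster services can take a few seconds to come up, so later steps in a script may need to wait. The retry loop below is a sketch of one way to do that. wait_for is a hypothetical helper, and the usage shown after it assumes that pcs status exits non-zero while the cluster is not yet running on the local node, which is worth confirming on your pcs version.

```bash
#!/usr/bin/env bash
# wait_for is a hypothetical helper, not part of pcs.
# Usage: wait_for <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
wait_for() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        # Discard the command's output; only its exit status matters here.
        "$@" >/dev/null 2>&1 && return 0
        sleep "$delay"
    done
    return 1
}
```

For example, wait_for 30 2 sudo pcs status || echo "cluster did not come up" allows roughly a minute before giving up.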

4. Configuring Fencing (STONITH)

Fencing is mandatory for data integrity when using shared storage. It ensures a “rogue” node is powered off before the healthy node takes over.

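Set Up a Quorum Device (Optional Tie-breaker)

With only two nodes, neither side can tell a failed peer from a broken network link by vote count alone. A quorum device on a third machine supplies the tie-breaking vote. It complements fencing and does not replace it.

On a third machine (not node1 or node2), enable the highavailability repository as in step 2 and open the firewall with the same two firewall-cmd commands, then run:

Bash

sudo dnf install -y pcs corosync-qnetd
sudo passwd hacluster
sudo systemctl enable --now pcsd
sudo pcs qdevice setup model net --enable --start

On both cluster nodes, install the quorum device client with sudo dnf install -y corosync-qdevice.

Then, on node1, authenticate to the third machine and add it to the cluster:

Bash

sudo pcs host auth <QNETD_IP> -u hacluster
sudo pcs quorum device add model net host=<QNETD_IP> algorithm=ffsplit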

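Configure STONITH

If your servers have IPMI, or your cloud provider has a fence agent, configure a fence device now. With STONITH enabled (the default) and no fence device defined, Pacemaker will not start any resources. For testing purposes only, you can disable STONITH (never do this in production with shared storage):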

Bash

sudo pcs property set stonith-enabled=false
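
For production, the alternative to disabling STONITH is one fence device per node. The sketch below prints, without running, a pcs stonith create command for each node using the fence_ipmilan agent. Everything specific in it is an assumption: ipmi_fence_cmd is a hypothetical helper, the BMC addresses 192.168.1.110 and 192.168.1.111 and the admin/changeme credentials are placeholders, and the parameter names (ip, username, password, lanplus) should be checked against pcs stonith describe fence_ipmilan on your system.

```bash
#!/usr/bin/env bash
# ipmi_fence_cmd is a hypothetical helper. It prints (does not run) the
# command that would create an IPMI fence device for one node.
# BMC_USER and BMC_PASS are placeholder credentials, not values from this guide.
ipmi_fence_cmd() {
    local node=$1 bmc_ip=$2
    printf 'pcs stonith create fence_%s fence_ipmilan pcmk_host_list=%s ip=%s username=%s password=%s lanplus=1 op monitor interval=60s\n' \
        "$node" "$node" "$bmc_ip" "${BMC_USER:-admin}" "${BMC_PASS:-changeme}"
}

# Placeholder BMC addresses; replace them with the real management IPs.
ipmi_fence_cmd node1 192.168.1.110
ipmi_fence_cmd node2 192.168.1.111
```

After reviewing the printed commands and running them with sudo, set stonith-enabled=true again if you disabled it. A location constraint such as sudo pcs constraint location fence_node1 avoids node1 keeps each fence device off the node it is meant to fence, and sudo pcs stonith fence node2 lets you confirm that fencing actually powers a node off.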

5. Shared Storage & Application Resources

We will group the Virtual IP, NFS Mount, and Nginx into a single “stack” so they always move together.

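Prepare the Web Server (Both Nodes)

If SELinux is enforcing, allow Nginx to read NFS content with sudo setsebool -P httpd_use_nfs 1. Pacemaker starts and stops Nginx itself, so disable the systemd unit: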

Bash

sudo systemctl disable --now nginx

Create the Stack (Node 1 only)

Bash

# 1. Create the Virtual IP
sudo pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.1.50 cidr_netmask=24 op monitor interval=30s

# 2. Create the NFS Mount (Ensure NFS Server is ready)
sudo pcs resource create web_fs ocf:heartbeat:Filesystem device="<NFS_IP>:/var/www/shared" directory="/usr/share/nginx/html" fstype="nfs" op monitor interval=20s

# 3. Create the Nginx Resource
sudo pcs resource create web_server ocf:heartbeat:nginx configfile="/etc/nginx/nginx.conf" op monitor interval=30s

# 4. Group them for automatic failover order
sudo pcs resource group add web-stack virtual_ip web_fs web_server
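
A group is shorthand for ordering plus colocation: members start in the listed order, stop in reverse, and always run on the same node. To make that concrete, the sketch below prints the explicit constraints that roughly correspond to the web-stack group. group_as_constraints is a hypothetical helper, and the commands are printed for illustration only; they should not be applied on top of the group.

```bash
#!/usr/bin/env bash
# group_as_constraints is a hypothetical helper. For an ordered list of
# resources it prints (does not run) the pairwise ordering and colocation
# constraints that approximate what a resource group provides.
group_as_constraints() {
    local prev=$1 next
    shift
    for next in "$@"; do
        echo "pcs constraint order $prev then $next"
        echo "pcs constraint colocation add $next with $prev"
        prev=$next
    done
}

group_as_constraints virtual_ip web_fs web_server
```

The output shows why the order of names in the group command matters: the NFS mount must be in place before Nginx starts serving from /usr/share/nginx/html.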

6. Troubleshooting & Management

Use these commands to maintain your cluster:

Command                     Action
pcs status                  Check the health of all nodes and resources.
pcs resource cleanup        Clear error messages after fixing a problem.
pcs node standby node1      Manually move all resources to the other node.
pcs node unstandby node1    Bring a node back into the cluster rotation.

Common Error: “Failed Actions”

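If pcs status lists entries under “Failed Resource Actions”, a resource failed to start, stop, or pass a monitor check. Find the cause in /var/log/messages or /var/log/pacemaker/pacemaker.log, fix the configuration, and then clear the failure history so Pacemaker retries: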

Bash

sudo pcs resource cleanup
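
The Pacemaker log is verbose, so it can help to filter it down to the lines that matter before reading. The following is a sketch under one assumption: that log lines carry a severity tag such as error: or warning:, which is how Pacemaker normally writes them but is worth checking against your own log. log_problems is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# log_problems is a hypothetical helper. It reads log text on stdin and
# prints the last <count> lines (default 20) tagged "error:" or "warning:".
log_problems() {
    local count=${1:-20}
    grep -E '(error|warning):' | tail -n "$count"
}
```

For example, sudo cat /var/log/pacemaker/pacemaker.log | log_problems 10 shows the ten most recent problem lines.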

Summary

Building a cluster on CentOS Stream 9 provides a robust foundation for critical applications. By grouping your IP, storage, and services, you ensure that when a node crashes, the whole stack restarts on the surviving node and clients reconnect through the same virtual IP after a brief interruption.
