diff --git a/README.md b/README.md
index b06fc11..d8778b0 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,7 @@ There is a [dedicated chatroom](https://matrix.to/#/#forgejo-ci:matrix.org). A m
 ## Table of content
 
 - Setting up a new [K8S/DRBD/NFS k8s node](k8s.md)
+- Maintenance and disaster recovery of a [K8S/DRBD/NFS k8s node](k8s-maintenance.md)
 - Setting up a new [LXC/DRBD Host](lxc.md)
 - Managing services with a [LXC/DRBD/nginx stack](drbd-nginx-lxc.md)
 - Installing a [Forgejo instance in the K8S cluster](k8s-forgejo.md)
diff --git a/k8s-maintenance.md b/k8s-maintenance.md
new file mode 100644
index 0000000..e902bb6
--- /dev/null
+++ b/k8s-maintenance.md
@@ -0,0 +1,23 @@
+# Disaster recovery and maintenance
+
+## When a machine or disk is scheduled for replacement.
+
+* `kubectl drain hetzner05` # evacuate all the pods out of the node to be shutdown
+* `kubectl taint nodes hetzner05 key1=value1:NoSchedule` # prevent any pod from being created there (metallb speaker won't be drained, for instance)
+* `kubectl delete node hetzner05` # let the cluster know it no longer exists so a new one by the same name can replace it
+
+## Routing the failover IP
+
+When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or to be shutdown, to the [Hetzner server panel](https://robot.hetzner.com/server), to the IPs tab and change the route of the failover IP to another node. All nodes are configured with the failover IP, there is nothing else to do.
+
+## Manual boot operations
+
+### On the machine that runs the NFS server
+
+* `sudo drbdadm primary r1` # Switch the DRBD to primary
+* `sudo mount /precious` # DRBD volume shared via NFS
+* `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add NFS server IP
+
+### On the other machines
+
+* `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove NFS server IP
diff --git a/k8s.md b/k8s.md
index bb4933e..888017a 100644
--- a/k8s.md
+++ b/k8s.md
@@ -105,27 +105,3 @@ Define the 20GB `forgejo-data` pvc owned by user id 1000.
 ```sh
 ./setup.sh setup_k8s_pvc forgejo-data 20Gi 1000
 ```
-
-# Disaster recovery and maintenance
-
-## When a machine or disk is scheduled for replacement.
-
-* `kubectl drain hetzner05` # evacuate all the pods out of the node to be shutdown
-* `kubectl taint nodes hetzner05 key1=value1:NoSchedule` # prevent any pod from being created there (metallb speaker won't be drained, for instance)
-* `kubectl delete node hetzner05` # let the cluster know it no longer exists so a new one by the same name can replace it
-
-## Routing the failover IP
-
-When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or to be shutdown, to the [Hetzner server panel](https://robot.hetzner.com/server), to the IPs tab and change the route of the failover IP to another node. All nodes are configured with the failover IP, there is nothing else to do.
-
-## Manual boot operations
-
-### On the machine that runs the NFS server
-
-* `sudo drbdadm primary r1` # Switch the DRBD to primary
-* `sudo mount /precious` # DRBD volume shared via NFS
-* `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add NFS server IP
-
-### On the other machines
-
-* `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove NFS server IP
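
Review note: the procedures moved into `k8s-maintenance.md` could be wrapped in small shell helpers. The sketch below is not part of this change. It only strings together the commands the document already lists, in the same order. The function names (`run`, `retire_node`, `nfs_primary`, `nfs_secondary`) and the `DRY_RUN` variable are hypothetical. With `DRY_RUN` unset or set to `1`, each command is printed instead of executed, so the sequence can be checked before it touches a cluster.

```sh
#!/bin/sh
# Hypothetical helpers wrapping the commands listed in k8s-maintenance.md.
# DRY_RUN is an assumption of this sketch: "1" (the default) prints each
# command, anything else executes it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Retire a node that is scheduled for replacement.
retire_node() {
    node="$1"
    [ -n "$node" ] || { echo "usage: retire_node NODE" >&2; return 1; }
    # Evacuate the pods, block new scheduling, then forget the node.
    # The && chain stops at the first command that fails.
    run kubectl drain "$node" &&
    run kubectl taint nodes "$node" key1=value1:NoSchedule &&
    run kubectl delete node "$node"
}

# Manual boot steps on the machine that runs the NFS server.
nfs_primary() {
    run sudo drbdadm primary r1 &&
    run sudo mount /precious &&
    run sudo ip addr add 10.53.101.100/24 dev enp5s0.4001
}

# Manual boot step on the other machines.
nfs_secondary() {
    run sudo ip addr del 10.53.101.100/24 dev enp5s0.4001
}
```

Sourcing the file and calling `retire_node hetzner05` prints the three `kubectl` commands. `DRY_RUN=0 retire_node hetzner05` would run them for real. Note that `kubectl drain` without flags may refuse to proceed when DaemonSet-managed pods are present. The document runs it without flags, so the sketch does too.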