mirror of https://code.forgejo.org/infrastructure/documentation synced 2024-12-18 11:23:53 +00:00

Merge pull request 'add disaster recovery instructions' (#31) from earl-warren/documentation:wip-disaster into main

Reviewed-on: https://code.forgejo.org/infrastructure/documentation/pulls/31
earl-warren 2024-10-19 10:31:27 +00:00
commit c1bef01310


@ -847,7 +847,7 @@ iface enp5s0.4002 inet static
```sh ```sh
sudo apt-get install curl sudo apt-get install curl
master_node_ip=10.88.1.5,fd10::5 master_node_ip=10.88.1.5,fd10::5
curl -fL https://get.k3s.io | sh -s - server --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$master_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 curl -fL https://get.k3s.io | sh -s - server --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$master_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash - curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash -
``` ```
@ -860,8 +860,9 @@ sudo apt-get install curl
token=??? token=???
master_ip=10.88.1.5 master_ip=10.88.1.5
second_node_ip=10.88.1.6,fd10::6 second_node_ip=10.88.1.6,fd10::6
curl -fL https://get.k3s.io | sh -s - server --token $token --server https://$master_ip:6443 --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$second_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 curl -fL https://get.k3s.io | sh -s - server --token $token --server https://$master_ip:6443 --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$second_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash - curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash -
kubectl taint nodes $(hostname) key1=value1:NoSchedule
``` ```
### k8s dedicated etcd node ### k8s dedicated etcd node
@ -873,13 +874,8 @@ The token is found on one of the master nodes in the `/var/lib/rancher/k3s/serve
```sh ```sh
master_ip=10.88.1.5 master_ip=10.88.1.5
etcd_node_ip=10.88.1.3,fd10::3 etcd_node_ip=10.88.1.3,fd10::3
curl -sfL https://get.k3s.io | sh -s - server --token "$token" --server https://$master_ip:6443 --cluster-init --disable-apiserver --disable-controller-manager --disable-scheduler --write-kubeconfig-mode=644 --node-ip=$etcd_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 curl -sfL https://get.k3s.io | sh -s - server --token "$token" --server https://$master_ip:6443 --cluster-init --disable=servicelb --disable-apiserver --disable-controller-manager --disable-scheduler --write-kubeconfig-mode=644 --node-ip=$etcd_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
``` kubectl taint nodes $(hostname) key1=value1:NoSchedule
It should not be allowed to schedule pods but for some reason it is. Working around this with:
```sh
kubectl taint nodes hetzner03 key1=value1:NoSchedule
``` ```
### k8s networking ### k8s networking
@ -913,8 +909,7 @@ kubectl apply --server-side=true -f clusterissuer.yml
[metallb](https://metallb.universe.tf). [metallb](https://metallb.universe.tf).
``` ```
helm install metallb metallb/metallb helm install metallb --set installCRDs=true metallb/metallb
# wait a few seconds
cat > metallb.yml <<EOF cat > metallb.yml <<EOF
apiVersion: metallb.io/v1beta1 apiVersion: metallb.io/v1beta1
kind: IPAddressPool kind: IPAddressPool
@ -925,7 +920,7 @@ spec:
- 188.40.16.47/32 - 188.40.16.47/32
- 2a01:4f8:fff2:48::0/64 - 2a01:4f8:fff2:48::0/64
EOF EOF
kubectl apply --server-side=true -f metallb.yml sleep 120 ; kubectl apply --server-side=true -f metallb.yml
``` ```
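The `sleep 120` before `kubectl apply` presumably gives metallb time to start before the `IPAddressPool` is submitted. A fixed delay can be too short or needlessly long. The following is a minimal sketch of a retry loop that could stand in for it; the `retry` helper and the attempt/delay values are hypothetical and not part of the original instructions.

```sh
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds,
# sleeping DELAY seconds between tries and giving up after ATTEMPTS tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage (assumed values: at most 12 tries, 10 seconds apart):
# retry 12 10 kubectl apply --server-side=true -f metallb.yml
```

The loop stops as soon as the apply succeeds, so the wait is only as long as metallb actually needs.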
[traefik](https://traefik.io/) requests specific IPs from `metallb` with [annotations](https://github.com/traefik/traefik-helm-chart/blob/7a13fc8a61a6ad30fcec32eec497dab9d8aea686/traefik/values.yaml#L736). [traefik](https://traefik.io/) requests specific IPs from `metallb` with [annotations](https://github.com/traefik/traefik-helm-chart/blob/7a13fc8a61a6ad30fcec32eec497dab9d8aea686/traefik/values.yaml#L736).
@ -945,6 +940,8 @@ spec:
redirectTo: redirectTo:
port: websecure port: websecure
priority: 1 priority: 1
deployment:
replicas: 2
service: service:
annotations: annotations:
metallb.universe.tf/allow-shared-ip: "key-to-share-188-40-16-47" metallb.universe.tf/allow-shared-ip: "key-to-share-188-40-16-47"
@ -1069,6 +1066,28 @@ persistence:
claimName: forgejo-data claimName: forgejo-data
``` ```
## Disaster recovery and maintenance
### When a machine or disk is scheduled for replacement
* `kubectl drain hetzner05` # evacuate all the pods from the node that is about to be shut down
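A drain is usually paired with an uncordon once the machine is back. The following is a minimal sketch of that round trip. The `run`, `drain_node` and `restore_node` helpers are hypothetical, and the `--ignore-daemonsets` and `--delete-emptydir-data` flags are an assumption about what this cluster needs; a plain `kubectl drain` may be enough.

```sh
# DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Evacuate the pods and mark the node unschedulable.
# The two flags are assumptions: they let the drain proceed when the node
# runs DaemonSet pods or pods using emptyDir volumes.
drain_node() {
    run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Allow pods to be scheduled on the node again after maintenance.
restore_node() {
    run kubectl uncordon "$1"
}

# Usage:
# drain_node hetzner05
# ... replace the disk or machine, reboot ...
# restore_node hetzner05
```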
### Routing the failover IP
When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or about to be shut down, go to the [Hetzner server panel](https://robot.hetzner.com/server), open the IPs tab, and change the route of the failover IP to another node. All nodes are already configured with the failover IP, so there is nothing else to do.
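Before changing the route, it may be worth confirming that the target node really has the failover IP configured. The following is a minimal sketch; the `has_address` helper is hypothetical, and `failover_ip` is a placeholder the operator must set, since the address is not stated in this section.

```sh
# has_address ADDRESS: read `ip -o addr show` output on stdin and succeed
# if ADDRESS appears there followed by a prefix length (e.g. "inet ADDRESS/32").
has_address() {
    grep -qF " $1/"
}

# Usage on the node that should receive the route
# (failover_ip is a placeholder to fill in):
# ip -o addr show | has_address "$failover_ip" || echo "failover IP missing on $(hostname)"
```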
### Manual boot operations
#### On the machine that runs the NFS server
* `sudo drbdadm primary r1` # Switch the DRBD to primary
* `sudo mount /precious` # DRBD volume shared via NFS
* `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add NFS server IP
#### On the other machines
* `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove NFS server IP
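The manual boot operations above can be collected into two small functions, one per role. This is a minimal sketch, not a tested procedure: the `run`, `become_nfs_server` and `release_nfs_ip` names are hypothetical, and the order in which the machines are handled is an assumption because the text does not state it.

```sh
nfs_ip=10.53.101.100/24
nfs_dev=enp5s0.4001

# DRY_RUN=1 prints each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the machine that runs the NFS server; stops at the first failing step.
become_nfs_server() {
    run sudo drbdadm primary r1 &&                   # switch the DRBD to primary
        run sudo mount /precious &&                  # DRBD volume shared via NFS
        run sudo ip addr add "$nfs_ip" dev "$nfs_dev" # add NFS server IP
}

# On every other machine.
release_nfs_ip() {
    run sudo ip addr del "$nfs_ip" dev "$nfs_dev"     # remove NFS server IP
}

# Usage: preview the commands without running them.
# DRY_RUN=1 become_nfs_server
```

Chaining the steps with `&&` keeps the NFS server IP from being added when the DRBD volume could not be promoted or mounted.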
## Uberspace ## Uberspace
The website https://forgejo.org is hosted at The website https://forgejo.org is hosted at