Mirror of https://code.forgejo.org/infrastructure/documentation, synced 2024-12-18 11:23:53 +00:00

Merge pull request 'add disaster recovery instructions' (#31) from earl-warren/documentation:wip-disaster into main

Reviewed-on: https://code.forgejo.org/infrastructure/documentation/pulls/31

commit c1bef01310

README.md | 43
````diff
@@ -847,7 +847,7 @@ iface enp5s0.4002 inet static
 ```sh
 sudo apt-get install curl
 master_node_ip=10.88.1.5,fd10::5
-curl -fL https://get.k3s.io | sh -s - server --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$master_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112
+curl -fL https://get.k3s.io | sh -s - server --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$master_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
 curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash -
 ```
 
````
````diff
@@ -860,8 +860,9 @@ sudo apt-get install curl
 token=???
 master_ip=10.88.1.5
 second_node_ip=10.88.1.6,fd10::6
-curl -fL https://get.k3s.io | sh -s - server --token $token --server https://$master_ip:6443 --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$second_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112
+curl -fL https://get.k3s.io | sh -s - server --token $token --server https://$master_ip:6443 --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$second_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
 curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash -
+kubectl taint nodes $(hostname) key1=value1:NoSchedule
 ```
 
 ### k8s dedicated etcd node
````
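The taint line added to the server install block can be made safe to re-run. A minimal sketch, assuming a POSIX shell with `kubectl` configured for the cluster; the function name `taint_no_schedule` is hypothetical, and `key1=value1` is kept exactly as in the README:

```sh
#!/bin/sh
# Sketch, not part of the README: a re-runnable version of the taint step.
# The function name is hypothetical; key1=value1 is copied from the README.
taint_no_schedule() {
  node=${1:-$(hostname)}
  # --overwrite keeps the command from failing when the node already has
  # this taint, e.g. when the install block is run a second time.
  kubectl taint nodes "$node" key1=value1:NoSchedule --overwrite || return 1
  # NoSchedule only keeps new pods away: list the pods already running on
  # the node, which the taint does not evict.
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node"
}
```

Because `NoSchedule` is only evaluated when a pod is scheduled, pods placed on the node before the taint was applied keep running there; the listing makes them visible so they can be deleted or the node drained.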
````diff
@@ -873,13 +874,8 @@ The token is found on one of the master nodes in the `/var/lib/rancher/k3s/serve
 ```sh
 master_ip=10.88.1.5
 etcd_node_ip=10.88.1.3,fd10::3
-curl -sfL https://get.k3s.io | sh -s - server --token "$token" --server https://$master_ip:6443 --cluster-init --disable-apiserver --disable-controller-manager --disable-scheduler --write-kubeconfig-mode=644 --node-ip=$etcd_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112
+curl -sfL https://get.k3s.io | sh -s - server --token "$token" --server https://$master_ip:6443 --cluster-init --disable=servicelb --disable-apiserver --disable-controller-manager --disable-scheduler --write-kubeconfig-mode=644 --node-ip=$etcd_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
-```
-
-It should not be allowed to schedule pods but for some reason it is. Working around this with:
-
-```sh
-kubectl taint nodes hetzner03 key1=value1:NoSchedule
+kubectl taint nodes $(hostname) key1=value1:NoSchedule
 ```
 
 ### k8s networking
````
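This commit appends `--flannel-ipv6-masq` to three separate `curl ... | sh -s - server` lines. A sketch of one way to keep the shared flags in a single place, assuming a POSIX shell; the function `k3s_server_args` and its role names (`master`, `server`, `etcd`) are hypothetical, and every flag value is copied from the install commands above:

```sh
#!/bin/sh
# Sketch, not part of the README: print the `k3s server` arguments for one
# node, one per line. Function and role names are hypothetical.
#   $1  role: master (first node), server (additional node) or etcd
#   $2  value for --node-ip, e.g. 10.88.1.5,fd10::5
#   $3  cluster token (server and etcd only)
#   $4  IP of the first master (server and etcd only)
k3s_server_args() {
  case $1 in
    master|server|etcd) ;;
    *) echo "k3s_server_args: unknown role '$1'" >&2; return 1 ;;
  esac
  # Flags shared by every node: a change made here reaches all three
  # install commands at once.
  printf '%s\n' \
    --cluster-init \
    --disable=servicelb \
    --write-kubeconfig-mode=644 \
    --cluster-cidr=10.42.0.0/16,fd01::/48 \
    --service-cidr=10.43.0.0/16,fd02::/112 \
    --flannel-ipv6-masq \
    "--node-ip=$2"
  # Nodes joining an existing cluster need the token and the first master.
  if [ "$1" != master ]; then
    printf '%s\n' --token "$3" --server "https://$4:6443"
  fi
  # Dedicated etcd node: no API server, controller manager or scheduler.
  if [ "$1" = etcd ]; then
    printf '%s\n' --disable-apiserver --disable-controller-manager --disable-scheduler
  fi
}
```

On the first node it could be called as `curl -fL https://get.k3s.io | sh -s - server $(k3s_server_args master "$master_node_ip")`, and on the etcd node as `... server $(k3s_server_args etcd "$etcd_node_ip" "$token" "$master_ip")`. The unquoted `$(...)` relies on word splitting, which only works because none of the values contain spaces or glob characters.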
````diff
@@ -913,8 +909,7 @@ kubectl apply --server-side=true -f clusterissuer.yml
 [metallb](https://metallb.universe.tf).
 
 ```
-helm install metallb metallb/metallb
-# wait a few seconds
+helm install metallb --set installCRDs=true metallb/metallb
 cat > metallb.yaml <<EOF
 apiVersion: metallb.io/v1beta1
 kind: IPAddressPool
````
````diff
@@ -925,7 +920,7 @@ spec:
 - 188.40.16.47/32
 - 2a01:4f8:fff2:48::0/64
 EOF
-kubectl apply --server-side=true -f metallb.yml
+sleep 120 ; kubectl apply --server-side=true -f metallb.yml
 ```
 
 [traefik](https://traefik.io/) requests with [annotations](https://github.com/traefik/traefik-helm-chart/blob/7a13fc8a61a6ad30fcec32eec497dab9d8aea686/traefik/values.yaml#L736) specific IPs from `metalldb`.
````
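The change replaces `# wait a few seconds` with a fixed `sleep 120` before the `IPAddressPool` is applied, presumably because MetalLB is not yet ready to accept it right after `helm install`. Polling until the apply succeeds avoids guessing the delay. A minimal sketch, assuming a POSIX shell; the `retry` helper is hypothetical:

```sh
#!/bin/sh
# Sketch, not part of the README: run a command until it succeeds instead
# of sleeping for a fixed time. The function name is hypothetical.
#   $1  maximum number of attempts
#   $2  seconds to sleep between attempts
#   $3 and later: the command and its arguments
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "retry: giving up after $retry_n attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}
```

For example `retry 24 5 kubectl apply --server-side=true -f metallb.yaml` keeps roughly the same two-minute budget but continues as soon as the apply is accepted. It uses `metallb.yaml`, the name the heredoc writes; the apply line in the hunk reads `metallb.yml`.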
````diff
@@ -945,6 +940,8 @@ spec:
 redirectTo:
 port: websecure
 priority: 1
+deployment:
+replicas: 2
 service:
 annotations:
 metallb.universe.tf/allow-shared-ip: "key-to-share-188-40-16-47"
````
````diff
@@ -1069,6 +1066,28 @@ persistence:
 claimName: forgejo-data
 ```
 
+## Disaster recovery and maintenance
+
+### When a machine or disk is scheduled for replacement.
+
+* `kubectl drain hetzner05` # evacuate all the pods out of the node to be shutdown
+
+### Routing the failover IP
+
+When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or to be shutdown, to the [Hetzner server panel](https://robot.hetzner.com/server), to the IPs tab and change the route of the failover IP to another node. All nodes are configured with the failover IP, there is nothing else to do.
+
+### Manual boot operations
+
+#### On the machine that runs the NFS server
+
+* `sudo drbdadm primary r1` # Switch the DRBD to primary
+* `sudo mount /precious` # DRBD volume shared via NFS
+* `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add NFS server IP
+
+#### On the other machines
+
+* `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove NFS server IP
+
 ## Uberspace
 
 The website https://forgejo.org is hosted at
````
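The manual steps added under "Disaster recovery and maintenance" can be collected into shell functions so they are run the same way every time. A sketch, assuming a POSIX shell with `kubectl` and `sudo`; all function and variable names are hypothetical, `--ignore-daemonsets` and the `kubectl uncordon` counterpart are additions not found in the README, and the DRBD resource, mount point, IP and interface are the README's values:

```sh
#!/bin/sh
# Sketch, not part of the README. Function and variable names are
# hypothetical; defaults are the README's values and can be overridden.
NFS_DRBD_RESOURCE=${NFS_DRBD_RESOURCE:-r1}
NFS_MOUNT_POINT=${NFS_MOUNT_POINT:-/precious}      # DRBD volume shared via NFS
NFS_SERVER_IP=${NFS_SERVER_IP:-10.53.101.100/24}   # NFS server IP
NFS_IFACE=${NFS_IFACE:-enp5s0.4001}

# Before a machine or disk is replaced: evacuate all pods from the node.
# kubectl drain refuses to proceed when DaemonSet pods are present unless
# --ignore-daemonsets is given.
node_evacuate() {
  kubectl drain "$1" --ignore-daemonsets
}

# After the node is back: allow pods to be scheduled on it again.
node_restore() {
  kubectl uncordon "$1"
}

# At boot, on the machine that runs the NFS server. Each step runs only if
# the previous one succeeded, so the IP is not claimed without the volume.
nfs_server_start() {
  sudo drbdadm primary "$NFS_DRBD_RESOURCE" &&
    sudo mount "$NFS_MOUNT_POINT" &&
    sudo ip addr add "$NFS_SERVER_IP" dev "$NFS_IFACE"
}

# At boot, on the other machines: remove the NFS server IP.
nfs_server_ip_release() {
  sudo ip addr del "$NFS_SERVER_IP" dev "$NFS_IFACE"
}
```

`node_evacuate hetzner05` would stand in for the bare `kubectl drain hetzner05`; `nfs_server_start` runs on the NFS server machine and `nfs_server_ip_release` on the others, as in the README.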