mirror of https://code.forgejo.org/infrastructure/documentation, synced 2024-12-18 11:23:53 +00:00
Merge pull request 'add disaster recovery instructions' (#31) from earl-warren/documentation:wip-disaster into main
Reviewed-on: https://code.forgejo.org/infrastructure/documentation/pulls/31
commit c1bef01310
43 README.md
@@ -847,7 +847,7 @@ iface enp5s0.4002 inet static
```sh
sudo apt-get install curl
master_node_ip=10.88.1.5,fd10::5
curl -fL https://get.k3s.io | sh -s - server --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$master_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112
curl -fL https://get.k3s.io | sh -s - server --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$master_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash -
```

@@ -860,8 +860,9 @@ sudo apt-get install curl
token=???
master_ip=10.88.1.5
second_node_ip=10.88.1.6,fd10::6
curl -fL https://get.k3s.io | sh -s - server --token $token --server https://$master_ip:6443 --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$second_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112
curl -fL https://get.k3s.io | sh -s - server --token $token --server https://$master_ip:6443 --cluster-init --disable=servicelb --write-kubeconfig-mode=644 --node-ip=$second_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash -
kubectl taint nodes $(hostname) key1=value1:NoSchedule
```

### k8s dedicated etcd node

@@ -873,13 +874,8 @@ The token is found on one of the master nodes in the `/var/lib/rancher/k3s/serve
```sh
master_ip=10.88.1.5
etcd_node_ip=10.88.1.3,fd10::3
curl -sfL https://get.k3s.io | sh -s - server --token "$token" --server https://$master_ip:6443 --cluster-init --disable-apiserver --disable-controller-manager --disable-scheduler --write-kubeconfig-mode=644 --node-ip=$etcd_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112
```

It should not be allowed to schedule pods but for some reason it is. Working around this with:

```sh
kubectl taint nodes hetzner03 key1=value1:NoSchedule
curl -sfL https://get.k3s.io | sh -s - server --token "$token" --server https://$master_ip:6443 --cluster-init --disable=servicelb --disable-apiserver --disable-controller-manager --disable-scheduler --write-kubeconfig-mode=644 --node-ip=$etcd_node_ip --cluster-cidr=10.42.0.0/16,fd01::/48 --service-cidr=10.43.0.0/16,fd02::/112 --flannel-ipv6-masq
kubectl taint nodes $(hostname) key1=value1:NoSchedule
```

### k8s networking

@@ -913,8 +909,7 @@ kubectl apply --server-side=true -f clusterissuer.yml
[metallb](https://metallb.universe.tf).

```
helm install metallb metallb/metallb
# wait a few seconds
helm install metallb --set installCRDs=true metallb/metallb
cat > metallb.yml <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
@@ -925,7 +920,7 @@ spec:
  - 188.40.16.47/32
  - 2a01:4f8:fff2:48::0/64
EOF
kubectl apply --server-side=true -f metallb.yml
sleep 120 ; kubectl apply --server-side=true -f metallb.yml
```

[traefik](https://traefik.io/) uses [annotations](https://github.com/traefik/traefik-helm-chart/blob/7a13fc8a61a6ad30fcec32eec497dab9d8aea686/traefik/values.yaml#L736) to request specific IPs from `metallb`.

@@ -945,6 +940,8 @@ spec:
    redirectTo:
      port: websecure
      priority: 1
deployment:
  replicas: 2
service:
  annotations:
    metallb.universe.tf/allow-shared-ip: "key-to-share-188-40-16-47"
@@ -1069,6 +1066,28 @@ persistence:
  claimName: forgejo-data
```

## Disaster recovery and maintenance

### When a machine or disk is scheduled for replacement

* `kubectl drain hetzner05` # evacuate all the pods from the node to be shut down
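
Draining also marks the node unschedulable, so if the same node comes back after the replacement it has to be uncordoned before pods can return to it. A minimal sketch, assuming the node name `hetzner05` from the instruction above. The `run` helper, the `DRY_RUN` variable and both function names are hypothetical, and the `--ignore-daemonsets` and `--delete-emptydir-data` options as well as the `uncordon` step are not part of the original instructions and may not be wanted on this cluster. By default the commands are only printed.

```sh
# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Evacuate all pods from node $1 before it is shut down.
# The two extra options are assumptions, not part of the original instructions.
drain_node() {
    run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Allow pods to be scheduled on node $1 again after the maintenance.
undrain_node() {
    run kubectl uncordon "$1"
}
```

Calling `drain_node hetzner05` prints the `kubectl drain` command for review, and `DRY_RUN=0 drain_node hetzner05` would actually execute it.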

### Routing the failover IP

When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or about to be shut down, go to the [Hetzner server panel](https://robot.hetzner.com/server), open the IPs tab, and change the route of the failover IP to another node. All nodes are configured with the failover IP, so there is nothing else to do.
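
The same change can probably be scripted instead of done in the panel. This is a sketch only: it assumes the Hetzner Robot webservice is enabled for the account and that it accepts `POST /failover/<failover-ip>` with an `active_server_ip` parameter, which should be checked against the Robot webservice documentation before use. `ROBOT_USER`, `DRY_RUN`, `run` and `route_failover_ip` are hypothetical names, and the addresses in the usage note are documentation placeholders, not the Forgejo addresses. By default the command is only printed.

```sh
# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Ask the Robot webservice to route failover IP $1 to the node whose main IP is $2.
# Only the user name is passed, so curl prompts for the webservice password.
route_failover_ip() {
    run curl -fsS -u "$ROBOT_USER" \
        -d "active_server_ip=$2" \
        "https://robot-ws.your-server.de/failover/$1"
}
```

With `ROBOT_USER` set, `route_failover_ip 192.0.2.10 192.0.2.20` prints the request that would move the placeholder failover IP `192.0.2.10` to the node with main IP `192.0.2.20`.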

### Manual boot operations

#### On the machine that runs the NFS server

* `sudo drbdadm primary r1` # Switch the DRBD to primary
* `sudo mount /precious` # DRBD volume shared via NFS
* `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add NFS server IP

#### On the other machines

* `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove NFS server IP
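
The two procedures above can be wrapped in one function so that the commands run in the listed order and stop at the first failure. A minimal sketch using exactly the commands listed. `nfs_boot`, `run` and `DRY_RUN` are hypothetical names, and by default the commands are only printed.

```sh
# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Run the manual boot operations for the given role: "server" or "other".
nfs_boot() {
    case "$1" in
        server)
            # Promote DRBD, mount the shared volume, then add the NFS server IP.
            run sudo drbdadm primary r1 &&
                run sudo mount /precious &&
                run sudo ip addr add 10.53.101.100/24 dev enp5s0.4001
            ;;
        other)
            # Make sure the NFS server IP is not configured on this machine.
            run sudo ip addr del 10.53.101.100/24 dev enp5s0.4001
            ;;
        *)
            echo "usage: nfs_boot server|other" >&2
            return 1
            ;;
    esac
}
```

Running `nfs_boot server` on the NFS machine and `nfs_boot other` elsewhere prints the commands for review, and prefixing either call with `DRY_RUN=0` would execute them.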

## Uberspace

The website https://forgejo.org is hosted at