mirror of
https://code.forgejo.org/infrastructure/documentation
synced 2024-11-21 19:11:11 +00:00
177 lines
5.8 KiB
Markdown
#### Imaging

Using installimage from the rescue instance.

- `wipefs -fa /dev/nvme*n1`
- `installimage -r no -n hetzner0?`
- Debian bookworm
- `PART / ext4 100G`
- `PART /srv ext4 all`
- ESC 0 + yes
- reboot

Partitioning.

- First disk
  - OS
  - non precious data such as the LXC containers with runners.
- Second disk
  - a partition configured with DRBD

Debian user.

- `ssh root@hetzner0?.forgejo.org`
- `useradd --shell /bin/bash --create-home --groups sudo debian`
- `mkdir -p /home/debian/.ssh ; cp -a .ssh/authorized_keys /home/debian/.ssh ; chown -R debian /home/debian/.ssh`
- in `/etc/sudoers` edit `%sudo ALL=(ALL:ALL) NOPASSWD:ALL`

#### Install helpers

Each node is identified by the last digit of the hostname.

```sh
sudo apt-get install git etckeeper
git clone https://code.forgejo.org/infrastructure/documentation
cd documentation/k3s-host
cp variables.sh.example variables.sh
cp secrets.sh.example secrets.sh
```

Variables that must be set depending on the role of the node.

- first server node
  - secrets.sh: node_drbd_shared_secret
- other server node
  - secrets.sh: node_drbd_shared_secret
  - secrets.sh: node_k8s_token: content of /var/lib/rancher/k3s/server/token on the first node
  - variables.sh: node_k8s_existing: identifier of the first node (e.g. 5)
- etcd node
  - secrets.sh: node_k8s_token: content of /var/lib/rancher/k3s/server/token on the first node
  - variables.sh: node_k8s_existing: identifier of the first node (e.g. 5)
  - variables.sh: node_k8s_etcd: identifier of the node whose role is just etcd (e.g. 3)

The other variables depend on the setup.

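The role-specific requirements above lend themselves to a quick check before running `./setup.sh`. The following is a minimal sketch, not part of the repository: the role labels (`first`, `server`, `etcd`) and the function names are made up for illustration, and it assumes `variables.sh` and `secrets.sh` are plain shell assignments that have already been sourced.

```sh
# Hypothetical helper: list the variables a role needs, as documented above.
# The role labels (first, server, etcd) are invented for this example.
required_variables() {
    case "$1" in
        first)  echo "node_drbd_shared_secret" ;;
        server) echo "node_drbd_shared_secret node_k8s_token node_k8s_existing" ;;
        etcd)   echo "node_k8s_token node_k8s_existing node_k8s_etcd" ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the required variables that are unset or empty.
missing_variables() {
    names=$(required_variables "$1") || return 1
    missing=""
    for name in $names; do
        # Indirect lookup of the variable called $name (names come from the fixed list above).
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    echo "${missing# }"
}
```

For example, after `. ./variables.sh; . ./secrets.sh`, running `missing_variables etcd` prints the names that still need a value, or an empty line when all of them are set.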
#### Firewall

`./setup.sh setup_ufw`

#### DRBD

DRBD is [configured](https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#p-work) with:

`./setup.sh setup_drbd`

Once two nodes have DRBD set up for the first time, the resource can be initialized by [pretending all is in sync](https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-skip-initial-resync). This skips the initial bitmap sync, which is pointless since there is no data yet.

```sh
sudo drbdadm primary r1
sudo drbdadm new-current-uuid --clear-bitmap r1/0
sudo mount /precious
```

#### NFS

`./setup.sh setup_nfs`

On the node that has the DRBD volume `/precious` mounted, set the IP of the NFS server to be used by k8s:

```sh
sudo ip addr add 10.53.101.100/24 dev enp5s0.4001
```

#### K8S

For the first node, run `./setup.sh setup_k8s`. For nodes joining the cluster, run `./setup.sh setup_k8s 6`, where `hetzner06` is an existing node.

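Since each node is identified by the last digit of its hostname, the argument passed to `setup_k8s` can be derived from the name of the existing node. A minimal sketch, where `node_id` and `join_command` are hypothetical helper names that are not part of `setup.sh`; the command is only printed, not run:

```sh
# Hypothetical helper: extract the node identifier (last digit of the hostname).
node_id() {
    host=${1%%.*}            # drop any domain part: hetzner06.forgejo.org -> hetzner06
    id=${host#"${host%?}"}   # keep only the last character
    case "$id" in
        [0-9]) echo "$id" ;;
        *) echo "no trailing digit in '$1'" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the command a joining node would run,
# given the hostname of a node that is already in the cluster.
join_command() {
    id=$(node_id "$1") || return 1
    echo "./setup.sh setup_k8s $id"
}

join_command hetzner06
```

The last line prints `./setup.sh setup_k8s 6`, matching the example above.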
- [metallb](https://metallb.universe.tf) replaces the default load balancer, which does not allow for a public IP different from the `k8s` node IP.
  `./setup.sh setup_k8s_metallb`
- [traefik](https://traefik.io/) requests specific IPs from `metallb` with [annotations](https://github.com/traefik/traefik-helm-chart/blob/7a13fc8a61a6ad30fcec32eec497dab9d8aea686/traefik/values.yaml#L736).
  `./setup.sh setup_k8s_traefik`
- [cert-manager](https://cert-manager.io/).
  `./setup.sh setup_k8s_certmanager`
- NFS storage class
  `./setup.sh setup_k8s_nfs`

#### Forgejo

The [forgejo](https://code.forgejo.org/forgejo-helm/forgejo-helm) chart is configured in [ingress](https://code.forgejo.org/forgejo-helm/forgejo-helm#ingress) for the reverse proxy (`traefik`) to route the domain and for the ACME issuer (`cert-manager`) to obtain a certificate, and in [service](https://code.forgejo.org/forgejo-helm/forgejo-helm#service) for the `ssh` port to be bound to the desired IPs of the load balancer (`metallb`).

```yaml
ingress:
  enabled: true
  annotations:
    # https://cert-manager.io/docs/usage/ingress/#supported-annotations
    # https://github.com/cert-manager/cert-manager/issues/2239
    cert-manager.io/cluster-issuer: letsencrypt-http
    cert-manager.io/private-key-algorithm: ECDSA
    cert-manager.io/private-key-size: 384
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
  tls:
    - hosts:
        - t1.forgejo.org
      secretName: tls-forgejo-t1-ingress-http
  hosts:
    - host: t1.forgejo.org
      paths:
        - path: /
          pathType: Prefix

service:
  http:
    type: ClusterIP
    ipFamilyPolicy: PreferDualStack
    port: 3000
  ssh:
    type: LoadBalancer
    annotations:
      metallb.universe.tf/loadBalancerIPs: 188.40.16.47,2a01:4f8:fff2:48::2
      metallb.universe.tf/allow-shared-ip: "key-to-share-failover"
    ipFamilyPolicy: PreferDualStack
    port: 2222
```

### K8S NFS storage creation

Define the 20Gi `forgejo-data` pvc owned by user id 1000.

```sh
./setup.sh setup_k8s_pvc forgejo-data 20Gi 1000
```

[Instruct the forgejo pod](https://code.forgejo.org/forgejo-helm/forgejo-helm#persistence) to use the `forgejo-data` pvc.

```yaml
persistence:
  enabled: true
  create: false
  claimName: forgejo-data
```

## Disaster recovery and maintenance

### When a machine or disk is scheduled for replacement

* `kubectl drain hetzner05` # evacuate all the pods out of the node to be shut down
* `kubectl taint nodes hetzner05 key1=value1:NoSchedule` # prevent any pod from being created there (metallb speaker won't be drained, for instance)
* `kubectl delete node hetzner05` # let the cluster know it no longer exists so a new one by the same name can replace it

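The three steps above can be wrapped in a small helper. This is a sketch only: `run` and `retire_node` are hypothetical names, `DRY_RUN` is an assumed convention for printing the commands instead of executing them, and the taint `key1=value1:NoSchedule` is copied as-is from the list above.

```sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: drain, taint and delete a node, stopping at the first failure.
retire_node() {
    node=$1
    [ -n "$node" ] || { echo "usage: retire_node <node>" >&2; return 1; }
    run kubectl drain "$node" &&
    run kubectl taint nodes "$node" key1=value1:NoSchedule &&
    run kubectl delete node "$node"
}
```

Calling `retire_node hetzner05` with `DRY_RUN=1` set prints the three `kubectl` commands in order, which makes it possible to review them before running them for real.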
### Routing the failover IP

When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or about to be shut down, go to the [Hetzner server panel](https://robot.hetzner.com/server), open the IPs tab and change the route of the failover IP to another node. All nodes are configured with the failover IP, so there is nothing else to do.

### Manual boot operations

#### On the machine that runs the NFS server

* `sudo drbdadm primary r1` # Switch the DRBD to primary
* `sudo mount /precious` # DRBD volume shared via NFS
* `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add NFS server IP

#### On the other machines

* `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove NFS server IP

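The manual boot steps differ only by whether the machine runs the NFS server. As a memory aid, the sketch below prints the commands for each case. The function name `boot_commands` and the role labels `nfs-server` and `other` are invented for this example, and nothing is executed.

```sh
# Values taken from the steps above.
NFS_IP=10.53.101.100/24
NFS_DEV=enp5s0.4001

# Hypothetical helper: print (not run) the manual boot commands for a role.
boot_commands() {
    case "$1" in
        nfs-server)
            echo "sudo drbdadm primary r1"
            echo "sudo mount /precious"
            echo "sudo ip addr add $NFS_IP dev $NFS_DEV"
            ;;
        other)
            echo "sudo ip addr del $NFS_IP dev $NFS_DEV"
            ;;
        *)
            echo "usage: boot_commands nfs-server|other" >&2
            return 1
            ;;
    esac
}
```

Running `boot_commands nfs-server` lists the three commands in the order they are meant to be applied, and `boot_commands other` lists the single command for the remaining machines.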