# K8S node
Installing a K8S node using [scripts from the k3s-host](k3s-host) directory.
## Imaging
Using `installimage` from the rescue instance.
- `wipefs -fa /dev/nvme*n1`
- `installimage -r no -n hetzner0?`
- Debian bookworm
- `PART / ext4 100G`
- `PART /srv ext4 all`
- ESC 0 + yes
- reboot
Partitioning.
- First disk:
  - the OS
  - non-precious data such as the LXC containers with runners
- Second disk:
  - a partition configured with DRBD
Debian user.
- `ssh root@hetzner0?.forgejo.org`
- `useradd --shell /bin/bash --create-home --groups sudo debian`
- `mkdir -p /home/debian/.ssh ; cp -a .ssh/authorized_keys /home/debian/.ssh ; chown -R debian /home/debian/.ssh`
- in `/etc/sudoers`, change the `%sudo` line to `%sudo ALL=(ALL:ALL) NOPASSWD:ALL`
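A quick sanity check (not part of the scripts) that the `debian` user has passwordless sudo:
```sh
# should print "root" without asking for a password
ssh debian@hetzner0?.forgejo.org sudo -n whoami
```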
## Install helpers
Each node is identified by the last digit of its hostname.
```sh
sudo apt-get install git etckeeper
git clone https://code.forgejo.org/infrastructure/documentation
cd documentation/k3s-host
cp variables.sh.example variables.sh
cp secrets.sh.example secrets.sh
```
Variables that must be set depending on the role of the node.
- first server node
- secrets.sh: node_drbd_shared_secret
- other server node
- secrets.sh: node_drbd_shared_secret
- secrets.sh: node_k8s_token: content of /var/lib/rancher/k3s/server/token on the first node
- variables.sh: node_k8s_existing: identifier of the first node (e.g. 5)
- etcd node
- secrets.sh: node_k8s_token: content of /var/lib/rancher/k3s/server/token on the first node
- variables.sh: node_k8s_existing: identifier of the first node (e.g. 5)
- variables.sh: node_k8s_etcd: identifier of the node whose role is just etcd (e.g. 3)
The other variables depend on the setup.
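For instance, a server node joining an existing cluster might end up with the following (hypothetical values; assuming plain shell assignments as in the `.example` files):
```sh
# secrets.sh
node_drbd_shared_secret='long-random-string-shared-by-all-drbd-nodes'
node_k8s_token='contents of /var/lib/rancher/k3s/server/token on the first node'
# variables.sh
node_k8s_existing=5   # join the cluster via the existing node hetzner05
```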
## Firewall
`./setup.sh setup_ufw`
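To confirm the rules are active (a sanity check, assuming `ufw` is the frontend used by the script):
```sh
# list the active ruleset
sudo ufw status verbose
```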
## DRBD
DRBD is [configured](https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#p-work) with:
`./setup.sh setup_drbd`
Once two nodes have DRBD set up for the first time, it can be initialized by [pretending all is in sync](https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-skip-initial-resync) to skip the initial bitmap sync, since there is no data yet.
```sh
sudo drbdadm primary r1
sudo drbdadm new-current-uuid --clear-bitmap r1/0
sudo mount /precious
```
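A sanity check that the resource is healthy on both nodes (assuming the resource is named `r1` as above):
```sh
# both nodes should report the disk as UpToDate
sudo drbdadm status r1
```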
## NFS
`./setup.sh setup_nfs`
On the node that has the DRBD volume `/precious` mounted, set the IP of the NFS server to be used by k8s:
```sh
sudo ip addr add 10.53.101.100/24 dev enp5s0.4001
```
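From any node, the export can be verified (assuming `showmount` from the `nfs-common` package is available):
```sh
# list the exports offered by the NFS server IP
showmount -e 10.53.101.100
```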
## K8S
For the first node, run `./setup.sh setup_k8s`. For nodes joining the cluster, run `./setup.sh setup_k8s 6`, where `6` is the identifier of an existing node (`hetzner06`).
- [metallb](https://metallb.universe.tf) instead of the default load balancer, which does not allow a public IP different from the `k8s` node IP.
`./setup.sh setup_k8s_metallb`
- [traefik](https://traefik.io/) requests specific IPs from `metallb` via [annotations](https://github.com/traefik/traefik-helm-chart/blob/7a13fc8a61a6ad30fcec32eec497dab9d8aea686/traefik/values.yaml#L736).
`./setup.sh setup_k8s_traefik`
- [cert-manager](https://cert-manager.io/).
`./setup.sh setup_k8s_certmanager`
- NFS storage class
`./setup.sh setup_k8s_nfs`
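Once these components are installed, a quick sanity check (assuming `kubectl` is configured on the node):
```sh
# all nodes should be Ready and all pods Running or Completed
kubectl get nodes -o wide
kubectl get pods --all-namespaces
```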
## Forgejo
The [forgejo](https://code.forgejo.org/forgejo-helm/forgejo-helm) chart is configured in [ingress](https://code.forgejo.org/forgejo-helm/forgejo-helm#ingress) for the reverse proxy (`traefik`) to route the domain and for the ACME issuer (`cert-manager`) to obtain a certificate, and in [service](https://code.forgejo.org/forgejo-helm/forgejo-helm#service) for the `ssh` port to be bound to the desired IPs of the load balancer (`metallb`).
```yaml
ingress:
  enabled: true
  annotations:
    # https://cert-manager.io/docs/usage/ingress/#supported-annotations
    # https://github.com/cert-manager/cert-manager/issues/2239
    cert-manager.io/cluster-issuer: letsencrypt-http
    cert-manager.io/private-key-algorithm: ECDSA
    cert-manager.io/private-key-size: "384"
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
  tls:
    - hosts:
        - t1.forgejo.org
      secretName: tls-forgejo-t1-ingress-http
  hosts:
    - host: t1.forgejo.org
      paths:
        - path: /
          pathType: Prefix
service:
  http:
    type: ClusterIP
    ipFamilyPolicy: PreferDualStack
    port: 3000
  ssh:
    type: LoadBalancer
    annotations:
      metallb.universe.tf/loadBalancerIPs: 188.40.16.47,2a01:4f8:fff2:48::2
      metallb.universe.tf/allow-shared-ip: "key-to-share-failover"
    ipFamilyPolicy: PreferDualStack
    port: 2222
```
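After deploying the chart, the certificate and the load balancer IPs can be verified (assuming the release is in the current namespace):
```sh
# the certificate should reach READY=True once the ACME challenge completes
kubectl get certificate
# the ssh service should expose the metallb IPs under EXTERNAL-IP
kubectl get svc
```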
# K8S NFS storage creation
Define the 20GiB `forgejo-data` PVC owned by user id 1000.
```sh
./setup.sh setup_k8s_pvc forgejo-data 20Gi 1000
```
[Instruct the forgejo pod](https://code.forgejo.org/forgejo-helm/forgejo-helm#persistence) to use the `forgejo-data` PVC.
```yaml
persistence:
enabled: true
create: false
claimName: forgejo-data
```
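The claim should be `Bound` before the pod starts:
```sh
# STATUS must be Bound
kubectl get pvc forgejo-data
```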
# Disaster recovery and maintenance
## When a machine or disk is scheduled for replacement
* `kubectl drain hetzner05` # evacuate all the pods out of the node to be shutdown
* `kubectl taint nodes hetzner05 key1=value1:NoSchedule` # prevent any pod from being created there (metallb speaker won't be drained, for instance)
* `kubectl delete node hetzner05` # let the cluster know it no longer exists so a new one by the same name can replace it
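Note that `kubectl drain` refuses to evict DaemonSet pods and pods using emptyDir volumes by default, so in practice extra flags are usually needed (an assumption, adjust to the workloads on the node):
```sh
# evacuate the node even when DaemonSet pods and emptyDir volumes are present
kubectl drain hetzner05 --ignore-daemonsets --delete-emptydir-data
```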
## Routing the failover IP
When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or about to be shut down, go to the [Hetzner server panel](https://robot.hetzner.com/server), open the IPs tab and change the route of the failover IP to another node. All nodes are configured with the failover IP, so there is nothing else to do.
## Manual boot operations
### On the machine that runs the NFS server
* `sudo drbdadm primary r1` # Switch the DRBD to primary
* `sudo mount /precious` # DRBD volume shared via NFS
* `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add NFS server IP
### On the other machines
* `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove NFS server IP