Imaging
Using installimage from the rescue instance.
wipefs -fa /dev/nvme*n1
installimage -r no -n hetzner0?
- Debian bookworm
PART / ext4 100G
PART /srv ext4 all
- ESC 0 + yes
- reboot
Partitioning.
- First disk
- OS
- non-precious data, such as the LXC containers with runners
- Second disk
- a partition configured with DRBD
Debian user.
ssh root@hetzner0?.forgejo.org
useradd --shell /bin/bash --create-home --groups sudo debian
mkdir -p /home/debian/.ssh ; cp -a .ssh/authorized_keys /home/debian/.ssh ; chown -R debian /home/debian/.ssh
- in /etc/sudoers, edit the %sudo line to read
%sudo ALL=(ALL:ALL) NOPASSWD:ALL
Install helpers
Each node is identified by the last digit of its hostname.
sudo apt-get install git etckeeper
git clone https://code.forgejo.org/infrastructure/documentation
cd documentation/k3s-host
cp variables.sh.example variables.sh
cp secrets.sh.example secrets.sh
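Since each node is identified by the last digit of its hostname, that identifier can be derived rather than typed by hand. The following is a minimal sketch; node_id_from_hostname is a hypothetical helper and is not known to exist in setup.sh, which may compute the identifier differently.

```sh
# Hypothetical helper, not part of setup.sh: print the node identifier,
# i.e. the last character of the short hostname.
node_id_from_hostname() {
    short=${1%%.*}                          # hetzner05.forgejo.org -> hetzner05
    printf '%s\n' "${short#"${short%?}"}"   # hetzner05 -> 5
}
```

On a node it could be called as node_id_from_hostname "$(hostname)".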
Variables that must be set depending on the role of the node.
- first server node
- secrets.sh: node_drbd_shared_secret
- other server node
- secrets.sh: node_drbd_shared_secret
- secrets.sh: node_k8s_token: content of /var/lib/rancher/k3s/server/token on the first node
- variables.sh: node_k8s_existing: identifier of the first node (e.g. 5)
- etcd node
- secrets.sh: node_k8s_token: content of /var/lib/rancher/k3s/server/token on the first node
- variables.sh: node_k8s_existing: identifier of the first node (e.g. 5)
- variables.sh: node_k8s_etcd: identifier of the node whose role is just etcd (e.g. 3)
The other variables depend on the setup.
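The per-role requirements listed above can be checked before running any setup step. This is only a sketch: the role names (first-server, other-server, etcd) and both function names are assumptions made for illustration, while the variable names are the ones listed above. It assumes variables.sh and secrets.sh have already been sourced into the current shell.

```sh
# Hypothetical helper: list the variables a given role requires.
required_vars_for_role() {
    case "$1" in
        first-server) echo "node_drbd_shared_secret" ;;
        other-server) echo "node_drbd_shared_secret node_k8s_token node_k8s_existing" ;;
        etcd)         echo "node_k8s_token node_k8s_existing node_k8s_etcd" ;;
        *)            return 1 ;;
    esac
}

# Hypothetical helper: return 0 if every required variable is set and
# non-empty, 1 if at least one is missing, 2 if the role is unknown.
check_role_vars() {
    vars=$(required_vars_for_role "$1") || { echo "unknown role: $1" >&2; return 2; }
    missing=0
    for name in $vars; do
        eval "value=\${$name-}"    # indirect lookup of the variable called $name
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, after sourcing both files, check_role_vars other-server reports each variable that still needs a value.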
Firewall
./setup.sh setup_ufw
DRBD
DRBD is configured with:
./setup.sh setup_drbd
Once two nodes have DRBD set up for the first time, the resource can be initialized by pretending that everything is already in sync. This skips the initial bitmap sync, which is unnecessary since there is no data yet.
sudo drbdadm primary r1
sudo drbdadm new-current-uuid --clear-bitmap r1/0
sudo mount /precious
NFS
./setup.sh setup_nfs
On the node that has the DRBD volume /precious mounted, set the IP of the NFS server to be used by k8s:
sudo ip addr add 10.53.101.100/24 dev enp5s0.4001
K8S
For the first node:
./setup.sh setup_k8s
For nodes joining the cluster, where hetzner06 is an existing node:
./setup.sh setup_k8s 6
- metallb instead of the default load balancer, because the default does not allow for a public IP different from the k8s node IP.
./setup.sh setup_k8s_metallb
- traefik requests specific IPs from metallb with annotations.
./setup.sh setup_k8s_traefik
- cert-manager.
./setup.sh setup_k8s_certmanager
- NFS storage class
./setup.sh setup_k8s_nfs
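The four steps above can be chained so that a failure stops the sequence. This is a sketch under assumptions: the order simply follows the list above (metallb first, since traefik requests its IPs from it), setup_cluster_addons and the SETUP variable are hypothetical names, and whether these steps are meant to run once per cluster or on every node is not stated here.

```sh
# Path to the setup script; overridable, defaults to the one used above.
SETUP=${SETUP:-./setup.sh}

# Hypothetical wrapper: run each step in order and stop at the first failure.
setup_cluster_addons() {
    for step in setup_k8s_metallb setup_k8s_traefik setup_k8s_certmanager setup_k8s_nfs; do
        echo "running $step"
        "$SETUP" "$step" || { echo "$step failed" >&2; return 1; }
    done
}
```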
Forgejo
The forgejo configuration is set in ingress, for the reverse proxy (traefik) to route the domain and for the ACME issuer (cert-manager) to obtain a certificate, and in service, for the ssh port to be bound to the desired IPs of the load balancer (metallb).
ingress:
  enabled: true
  annotations:
    # https://cert-manager.io/docs/usage/ingress/#supported-annotations
    # https://github.com/cert-manager/cert-manager/issues/2239
    cert-manager.io/cluster-issuer: letsencrypt-http
    cert-manager.io/private-key-algorithm: ECDSA
    cert-manager.io/private-key-size: "384"
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
  tls:
    - hosts:
        - t1.forgejo.org
      secretName: tls-forgejo-t1-ingress-http
  hosts:
    - host: t1.forgejo.org
      paths:
        - path: /
          pathType: Prefix
service:
  http:
    type: ClusterIP
    ipFamilyPolicy: PreferDualStack
    port: 3000
  ssh:
    type: LoadBalancer
    annotations:
      metallb.universe.tf/loadBalancerIPs: 188.40.16.47,2a01:4f8:fff2:48::2
      metallb.universe.tf/allow-shared-ip: "key-to-share-failover"
    ipFamilyPolicy: PreferDualStack
    port: 2222
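The loadBalancerIPs annotation above carries one IPv4 and one IPv6 address for the dual-stack ssh service. A rough sanity check of such a value before deploying could look like the sketch below. Both function names are hypothetical, and the test is purely textual (an address containing a colon is treated as IPv6); it does not validate the addresses themselves.

```sh
# Print "ipv6" for an address containing a colon, "ipv4" otherwise.
# Rough textual check only, not a full address validation.
ip_family() {
    case "$1" in
        *:*) echo ipv6 ;;
        *)   echo ipv4 ;;
    esac
}

# Succeed only if the comma-separated list holds exactly one IPv4
# and one IPv6 address.
is_dual_stack_pair() {
    v4=0
    v6=0
    old_ifs=$IFS
    IFS=,
    for addr in $1; do
        case "$(ip_family "$addr")" in
            ipv4) v4=$((v4 + 1)) ;;
            ipv6) v6=$((v6 + 1)) ;;
        esac
    done
    IFS=$old_ifs
    [ "$v4" -eq 1 ] && [ "$v6" -eq 1 ]
}
```

For example, is_dual_stack_pair "188.40.16.47,2a01:4f8:fff2:48::2" succeeds, while a value holding only the IPv4 address does not.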
K8S NFS storage creation
Define the 20Gi forgejo-data pvc owned by user id 1000.
./setup.sh setup_k8s_pvc forgejo-data 20Gi 1000
Instruct the forgejo pod to use the forgejo-data pvc.
persistence:
  enabled: true
  create: false
  claimName: forgejo-data
Disaster recovery and maintenance
When a machine or disk is scheduled for replacement.
kubectl drain hetzner05 # evacuate all the pods out of the node to be shut down
kubectl taint nodes hetzner05 key1=value1:NoSchedule # prevent any pod from being created there (metallb speaker won't be drained, for instance)
kubectl delete node hetzner05 # let the cluster know it no longer exists so a new one by the same name can replace it
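To apply the same three steps to another node without retyping them, the commands can be generated for a given node name and reviewed before being run. This is a sketch; retire_node_plan is a hypothetical helper that only prints the commands shown above and does not execute anything.

```sh
# Hypothetical helper: print the removal commands for the node given as $1.
retire_node_plan() {
    node=${1-}
    [ -n "$node" ] || { echo "usage: retire_node_plan NODE" >&2; return 2; }
    cat <<EOF
kubectl drain $node
kubectl taint nodes $node key1=value1:NoSchedule
kubectl delete node $node
EOF
}
```

For example, retire_node_plan hetzner06 > plan.sh writes the plan to a file that can be read and then run with sh -e plan.sh. Depending on the workloads, kubectl drain may refuse to proceed until --ignore-daemonsets is passed; that flag is not part of the procedure above and is mentioned only as a possibility.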
Routing the failover IP
When the machine to which the failover IP (failover.forgejo.org) is routed is unavailable or about to be shut down, go to the Hetzner server panel, open the IPs tab and change the route of the failover IP to another node. All nodes are configured with the failover IP, so there is nothing else to do.
Manual boot operations
On the machine that runs the NFS server
sudo drbdadm primary r1 # Switch the DRBD to primary
sudo mount /precious # DRBD volume shared via NFS
sudo ip addr add 10.53.101.100/24 dev enp5s0.4001 # add NFS server IP
On the other machines
sudo ip addr del 10.53.101.100/24 dev enp5s0.4001 # remove NFS server IP
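The two sets of boot operations can be kept in one script that is run with the role of the machine. The sketch below makes assumptions: nfs_boot, the role names server and other, and the DRY_RUN switch are hypothetical, while the commands, the address and the interface are the ones shown above. With DRY_RUN set, the commands are printed instead of executed.

```sh
NFS_IP=10.53.101.100/24
NFS_DEV=enp5s0.4001

# Execute a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical entry point: "server" for the machine that runs the NFS
# server, "other" for every other machine.
nfs_boot() {
    case "${1-}" in
        server)
            run sudo drbdadm primary r1 &&                  # switch the DRBD to primary
            run sudo mount /precious &&                     # DRBD volume shared via NFS
            run sudo ip addr add "$NFS_IP" dev "$NFS_DEV"   # add NFS server IP
            ;;
        other)
            run sudo ip addr del "$NFS_IP" dev "$NFS_DEV"   # remove NFS server IP
            ;;
        *)
            echo "usage: nfs_boot server|other" >&2
            return 2
            ;;
    esac
}
```

Setting DRY_RUN=1 before calling nfs_boot server shows the three commands without touching DRBD or the network.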