## Imaging

Using `installimage` from the rescue instance:

- `wipefs -fa /dev/nvme*n1`
- `installimage -r no -n hetzner0?`
- Debian bookworm
- `PART / ext4 100G`
- `PART /srv ext4 all`
- ESC 0 + yes
- reboot

Partitioning:

- First disk
  - OS
  - non-precious data, such as the LXC containers hosting the runners
- Second disk
  - a partition configured with DRBD

Creating the `debian` user:

- `ssh root@hetzner0?.forgejo.org`
- `useradd --shell /bin/bash --create-home --groups sudo debian`
- `mkdir -p /home/debian/.ssh ; cp -a .ssh/authorized_keys /home/debian/.ssh ; chown -R debian /home/debian/.ssh`
- in `/etc/sudoers`, change the `%sudo` line to `%sudo ALL=(ALL:ALL) NOPASSWD:ALL`

## Install helpers

Each node is identified by the last digit of its hostname.

```shell
sudo apt-get install git etckeeper
git clone https://code.forgejo.org/infrastructure/documentation
cd documentation/k3s-host
cp variables.sh.example variables.sh
cp secrets.sh.example secrets.sh
```
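
The last-digit naming convention can be captured in a small shell helper (the hostname below is just an example value, not taken from the setup scripts):

```shell
# Derive the node identifier from the hostname, e.g. hetzner05 -> 5.
hostname=hetzner05                       # example; normally $(hostname)
node_id="${hostname#"${hostname%?}"}"    # keep only the last character
echo "$node_id"                          # prints 5
```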

Variables that must be set depending on the role of the node:

- first server node
  - `secrets.sh`: `node_drbd_shared_secret`
- other server node
  - `secrets.sh`: `node_drbd_shared_secret`
  - `secrets.sh`: `node_k8s_token`: content of `/var/lib/rancher/k3s/server/token` on the first node
  - `variables.sh`: `node_k8s_existing`: identifier of the first node (e.g. 5)
- etcd node
  - `secrets.sh`: `node_k8s_token`: content of `/var/lib/rancher/k3s/server/token` on the first node
  - `variables.sh`: `node_k8s_existing`: identifier of the first node (e.g. 5)
  - `variables.sh`: `node_k8s_etcd`: identifier of the node whose role is just etcd (e.g. 3)
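
As an illustration, a server node joining an existing cluster might be configured like this (every value shown is a hypothetical placeholder, not a real secret or node):

```shell
# secrets.sh -- hypothetical placeholder values; never commit real secrets
node_drbd_shared_secret="example-drbd-secret"
# copied from /var/lib/rancher/k3s/server/token on the first node
node_k8s_token="placeholder-token"

# variables.sh -- join the cluster whose first node is number 5 (hetzner05)
node_k8s_existing=5
```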

The other variables depend on the setup.

## Firewall

```shell
./setup.sh setup_ufw
```

## DRBD

DRBD is configured with:

```shell
./setup.sh setup_drbd
```
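
For reference, a DRBD resource definition of the kind `setup_drbd` would generate looks roughly like the sketch below. The hostnames, device paths, and addresses are illustrative assumptions, not the script's actual output:

```
# /etc/drbd.d/r1.res -- illustrative sketch only
resource r1 {
  net {
    # matches node_drbd_shared_secret in secrets.sh
    shared-secret "example-drbd-secret";
  }
  on hetzner05 {
    device    /dev/drbd0;
    disk      /dev/nvme1n1;        # assumed: the second disk
    address   10.53.101.105:7789;  # hypothetical private address
    meta-disk internal;
  }
  on hetzner06 {
    device    /dev/drbd0;
    disk      /dev/nvme1n1;
    address   10.53.101.106:7789;
    meta-disk internal;
  }
}
```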

Once DRBD is set up on two nodes for the first time, the resource can be initialized by pretending both sides are already in sync. This skips the initial bitmap synchronization, which is safe because there is no data yet:

```shell
sudo drbdadm primary r1
sudo drbdadm new-current-uuid --clear-bitmap r1/0
sudo mount /precious
```

## NFS

```shell
./setup.sh setup_nfs
```

On the node that has the DRBD volume `/precious` mounted, set the IP of the NFS server to be used by k8s:

```shell
sudo ip addr add 10.53.101.100/24 dev enp5s0.4001
```
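
Presumably `setup_nfs` exports the DRBD-backed volume to the private network; an `/etc/exports` entry for such a setup would look like the following (the export options are assumptions, check the script for the real ones):

```
# /etc/exports -- illustrative sketch; the real entry is managed by setup_nfs
/precious 10.53.101.0/24(rw,sync,no_subtree_check,no_root_squash)
```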

## K8S

For the first node, run `./setup.sh setup_k8s`. For nodes joining the cluster, run `./setup.sh setup_k8s 6`, where 6 identifies the existing node hetzner06.

- metallb is used instead of the default load balancer, because the default does not allow a public IP different from the k8s node IP: `./setup.sh setup_k8s_metallb`
- traefik requests specific IPs from metallb via annotations: `./setup.sh setup_k8s_traefik`
- cert-manager: `./setup.sh setup_k8s_certmanager`
- NFS storage class: `./setup.sh setup_k8s_nfs`
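
The `metallb.universe.tf/loadBalancerIPs` annotations used in the Forgejo service configuration imply an address pool containing the public IPs; a minimal `IPAddressPool` of that kind could look like this (the resource name is a hypothetical choice, `setup_k8s_metallb` manages the real resources):

```yaml
# illustrative sketch only
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public              # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 188.40.16.47/32
    - 2a01:4f8:fff2:48::2/128
```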

## Forgejo

The forgejo configuration needs an `ingress` section so the reverse proxy (traefik) routes the domain and the ACME issuer (cert-manager) obtains a certificate, and a `service` section so the ssh port is bound to the desired IPs of the load balancer (metallb).

```yaml
ingress:
  enabled: true
  annotations:
    # https://cert-manager.io/docs/usage/ingress/#supported-annotations
    # https://github.com/cert-manager/cert-manager/issues/2239
    cert-manager.io/cluster-issuer: letsencrypt-http
    cert-manager.io/private-key-algorithm: ECDSA
    cert-manager.io/private-key-size: "384"  # annotation values must be strings
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
  tls:
    - hosts:
        - t1.forgejo.org
      secretName: tls-forgejo-t1-ingress-http
  hosts:
    - host: t1.forgejo.org
      paths:
        - path: /
          pathType: Prefix
```

```yaml
service:
  http:
    type: ClusterIP
    ipFamilyPolicy: PreferDualStack
    port: 3000
  ssh:
    type: LoadBalancer
    annotations:
      metallb.universe.tf/loadBalancerIPs: 188.40.16.47,2a01:4f8:fff2:48::2
      metallb.universe.tf/allow-shared-ip: "key-to-share-failover"
    ipFamilyPolicy: PreferDualStack
    port: 2222
```

## K8S NFS storage creation

Define the 20Gi `forgejo-data` pvc owned by user id 1000:

```shell
./setup.sh setup_k8s_pvc forgejo-data 20Gi 1000
```
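
Under the hood this presumably creates a PersistentVelaim against the NFS storage class; a sketch of such a PersistentVolumeClaim follows (the storage class name is an assumption, and the user-id ownership is something the script must handle separately, since a PVC cannot express it):

```yaml
# illustrative sketch; setup_k8s_pvc generates the real manifest
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: forgejo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs   # hypothetical name of the NFS storage class
  resources:
    requests:
      storage: 20Gi
```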

Instruct the forgejo pod to use the `forgejo-data` pvc:

```yaml
persistence:
  enabled: true
  create: false
  claimName: forgejo-data
```

## Disaster recovery and maintenance

When a machine or disk is scheduled for replacement:

- `kubectl drain hetzner05` # evacuate all the pods out of the node to be shut down
- `kubectl taint nodes hetzner05 key1=value1:NoSchedule` # prevent any pod from being created there (the metallb speaker won't be drained, for instance)
- `kubectl delete node hetzner05` # let the cluster know it no longer exists, so a new node by the same name can replace it

## Routing the failover IP

When the machine to which the failover IP (failover.forgejo.org) is routed becomes unavailable or is about to be shut down, go to the Hetzner server panel, open the IPs tab, and change the route of the failover IP to another node. All nodes are already configured with the failover IP, so there is nothing else to do.

## Manual boot operations

On the machine that runs the NFS server:

- `sudo drbdadm primary r1` # switch the DRBD volume to primary
- `sudo mount /precious` # the DRBD volume shared via NFS
- `sudo ip addr add 10.53.101.100/24 dev enp5s0.4001` # add the NFS server IP

On the other machines:

- `sudo ip addr del 10.53.101.100/24 dev enp5s0.4001` # remove the NFS server IP