The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/716
This patch makes us use more fully-qualified container image names
(either prefixed with docker.io/ or with localhost/).
The latter happens when self-building is enabled.
We've recently had issues where if an image was removed manually
and the service was restarted (making `docker run` fetch it from Docker Hub, etc.),
we'd end up with a pulled image, even though we're aiming for a self-built one.
Re-running the playbook would then not do a rebuild, because:
- the image with that name already exists (even though it's something
else)
- we sometimes had conditional logic where we'd build only if the git
repo changed
By explicitly changing the name of the images (prefixing with localhost/),
we avoid such confusion and the possibility that we'd automatically pul something
which is not what we expect.
Also, I've removed that condition where building would happen on git
changes only. We now always build (unless an image with that name
already exists). We just force-build when the git repo changes.
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
This keeps the roles cleaner and more independent of matrix-base,
which may be important for people building their own playbook
out of the individual roles and not using the matrix-base role.
We do use some `:latest` images by default for the following services:
- matrix-dimension
- Goofys (in the matrix-synapse role)
- matrix-bridge-appservice-irc
- matrix-bridge-appservice-discord
- matrix-bridge-mautrix-facebook
- matrix-bridge-mautrix-whatsapp
It's terribly unfortunate that those software projects don't release
anything other than `:latest`, but that's how it is for now.
Updating that software requires that users manually do `docker pull`
on the server. The playbook didn't force-repull images that it already
had.
With this patch, it starts doing so. Any image tagged `:latest` will be
force re-pulled by the playbook every time it's executed.
It should be noted that even though we ask the `docker_image` module to
force-pull, it only reports "changed" when it actually pulls something
new. This is nice, because it lets people know exactly when something
gets updated, as opposed to giving the indication that it's always
updating the images (even though it isn't).
Reasoning is the same as for matrix-org/synapse#5023.
For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
In most cases, there's not really a need to touch the system
firewall, as Docker manages iptables by itself
(see https://docs.docker.com/network/iptables/).
All ports exposed by Docker containers are automatically whitelisted
in iptables and wired to the correct container.
This made installing firewalld and whitelisting ports pointless,
as far as this playbook's services are concerned.
People that wish to install firewalld (for other reasons), can do so
manually from now on.
This is inspired by and fixes#97 (Github Issue).
Most (all?) of our Matrix services are running in the `matrix` network,
so they were safe -- not accessible from Coturn to begin with.
Isolating Coturn into its own network is a security improvement
for people who were starting other services in the default
Docker network. Those services were potentially reachable over the
private Docker network from Coturn.
Discussed in #120 (Github Pull Request)
This is more explicit than hiding it in the role defaults.
People who reuse the roles in their own playbook (and not only) may
incorrectly define `ansible_host` to be a hostname or some local address.
Making it more explicit is more likely to prevent such mistakes.
Seems like we unintentionally removed the mounting of certificates
(the `/matrix-config` mount) as part of splitting the playbook into
roles in 51312b8250.
It appears that those certificates weren't necessary for coturn to
funciton though, so we might just get rid of the configuration as well.
We run containers as a non-root user (no effective capabilities).
Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"
We'd rather prevent such a potential escalation by dropping ALL
capabilities.
The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
With this change, the following roles are now only dependent
on the minimal `matrix-base` role:
- `matrix-corporal`
- `matrix-coturn`
- `matrix-mailer`
- `matrix-mxisd`
- `matrix-postgres`
- `matrix-riot-web`
- `matrix-synapse`
The `matrix-nginx-proxy` role still does too much and remains
dependent on the others.
Wiring up the various (now-independent) roles happens
via a glue variables file (`group_vars/matrix-servers`).
It's triggered for all hosts in the `matrix-servers` group.
According to Ansible's rules of priority, we have the following
chain of inclusion/overriding now:
- role defaults (mostly empty or good for independent usage)
- playbook glue variables (`group_vars/matrix-servers`)
- inventory host variables (`inventory/host_vars/matrix.<your-domain>`)
All roles default to enabling their main component
(e.g. `matrix_mxisd_enabled: true`, `matrix_riot_web_enabled: true`).
Reasoning: if a role is included in a playbook (especially separately,
in another playbook), it should "work" by default.
Our playbook disables some of those if they are not generally useful
(e.g. `matrix_corporal_enabled: false`).
As suggested in #63 (Github issue), splitting the
playbook's logic into multiple roles will be beneficial for
maintainability.
This patch realizes this split. Still, some components
affect others, so the roles are not really independent of one
another. For example:
- disabling mxisd (`matrix_mxisd_enabled: false`), causes Synapse
and riot-web to reconfigure themselves with other (public)
Identity servers.
- enabling matrix-corporal (`matrix_corporal_enabled: true`) affects
how reverse-proxying (by `matrix-nginx-proxy`) is done, in order to
put matrix-corporal's gateway server in front of Synapse
We may be able to move away from such dependencies in the future,
at the expense of a more complicated manual configuration, but
it's probably not worth sacrificing the convenience we have now.
As part of this work, the way we do "start components" has been
redone now to use a loop, as suggested in #65 (Github issue).
This should make restarting faster and more reliable.