Fixes a regression caused by a5ee39266c.
If the user id and group id were different than 991:991
(which used to be a hardcoded default for us long ago),
there was a mismatch between what Synapse was trying to use (991:991)
and what it was actually started with (in `--user=..`). It was then
trying to change ownership, which was failing.
This was mostly affecting newer installations which were not using the
991:991 defaults we had long ago (since a1c5a197a9).
This give us the possibility to run multiple instances of
workers that that don't expose a port.
Right now, we don't support that, but in the future we could
run multiple `federation_sender` or `pusher` workers, without
them fighting over naming (previously, they'd all be named
something like `matrix-synapse-worker-pusher-0`, because
they'd all define `port` as `0`).
This leads to much easier management and potential safety
features (validation). In the future, we could try to avoid port
conflicts as well, but it didn't seem worth the effort to do it now.
Our port ranges seem large enough.
This can also pave the way for a "presets" feature
(similar to `matrix_nginx_proxy_ssl_presets`) which makes it even easier
for people to configure worker counts.
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:
> Error response from daemon: Cannot kill container: something: No such container: something
and:
> Error: No such container: something
People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.
In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)
.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
Not specifying bind addresses for the worker resulted in this warning:
> synapse.app - 47 - WARNING - None - Failed to listen on 0.0.0.0, continuing because listening on [::]
Additionally, metrics listening only on 127.0.0.1 seems like a no-op.
Only having it accessible from within the container is likely not what
we intend. Changed that to all interfaces as well.
Whether it actually gets exposed or not depends on the systemd service
and `matrix_synapse_workers_container_host_bind_address`.
This switches the `docker exec` method of spawning
Synapse workers inside the `matrix-synapse` container with
dedicated containers for each worker.
We also have dedicated systemd services for each worker,
so this are now:
- more consistent with everything else (we don't use systemd
instantiated services anywhere)
- we don't need the "parse systemd instance name into worker name +
port" part
- we don't need to keep track of PIDs manually
- we don't need jq (less depenendencies)
- workers dying would be restarted by systemd correctly, like any other
service
- `docker ps` shows each worker separately and we can observe resource
usage
There was a `matrix_nginx_proxy_enabled|default(False)` check, but:
- it didn't seem to work reliably for some reason (hmm)
- referring to a `matrix_nginx_proxy_*` variable from within the
`matrix-synapse` role is not ideal
- exposing always happened on `127.0.0.1`, which may not be good enough
for some rarer setups (where the own webserver is external to the host)
I guess it didn't hurt to do it until now, but it's not great serving
federation APIs on the client-server API port, etc.
matrix-corporal doesn't work yet (still something to be solved in the
future), but its firewalling operations will also be sabotaged
by Client-Server APIs being served on the federation port (it's a way to get around its firewalling).
If we load it at runtime, during matrix-synapse role execution,
it's good enough for matrix-synapse and all roles after that,
but.. it breaks when someone uses `--tags=setup-nginx-proxy` alone.
The downside of including this vars file like this in `setup.yml`
is that the variables contained in it cannot be overriden by the user
(in their inventory's `vars.yml`).
... but it's not like overriding these variables was possible anyway
when including them at runtime.
In short, this makes Synapse a 2nd class citizen,
preparing for a future where it's just one-of-many homeserver software
options.
We also no longer have a default Postgres superuser password,
which improves security.
The changelog explains more as to why this was done
and how to proceed from here.
Since we'll likely have generic SQLite database importing
via [pgloader](https://pgloader.io/) for migrating bridge
databases from SQLite to Postgres, we'd rather avoid
calling the "import Synapse SQLite database" command
as just `--tags=import-sqlite-db`.
Similarly, for the media store, we'd like to mention that it's
related to Synapse as well.
We'd like to be more explicit, so as to be less confusing,
especially in light of other homeserver implementations
coming in the future.
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
ma1sd requires the openid endpoints for certain functionality.
Example: 90b2b5301c/src/main/java/io/kamax/mxisd/auth/AccountManager.java (L67-L99)
If federation is disabled, we still need to expose these openid APIs on the
federation port.
Previously, we were doing similar magic for Dimension.
As per its documentation, when running unfederated, one is to enable
the openid listener as well. As per their recommendation, people
are advised to do enable it on the Client-Server API port
and use the `federationUrl` variable to override where the federation
port is (making federation requests go to the Client-Server API).
Because ma1sd always uses the federation port (unless you do some
DNS overwriting magic using its configuration -- which we'd rather not
do), it's better if we just default to putting the `openid` listener
where it belongs - on the federation port.
With this commit, we retain the "automatically enable openid APIs" thing
we've been doing for Dimension, but move it to the federation port instead.
We also now do the same thing when ma1sd is enabled.
Many people probably didn't even know this - that ansible can be
quite a bit picky about what it will be willing to work with remotely.
Thanks @maxklenk !
`-v` magically creates the source destination as a directory,
if it doesn't exist already. We'd like to avoid this magic
and the potential breakage that it might cause.
We'd rather fail while Docker tries to find things to `--mount`
than have it automatically create directories and fail anyway,
while having contaminated the filesystem.
There's a lot more `-v` instances remaining to be fixed later on.
This is just some start.
Things like `matrix_synapse_container_additional_volumes` and
`matrix_nginx_proxy_container_additional_volumes` were not changed to
use `--mount`, as options for each one are passed differently
(`ro` is `ro`, but `rw` doesn't exist and `slave` is `bind-propagation=slave`).
To avoid breaking people's custom volume mounts, we keep it as it is for now.
A deficiency with `--mount` is that it lacks the `z` option (SELinux
ownership changes), and some of our `-v` instances use that. I'm not
sure how supported SELinux is for us right now, but it might be,
and breaking that would not be a good idea.
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/716
This patch makes us use more fully-qualified container image names
(either prefixed with docker.io/ or with localhost/).
The latter happens when self-building is enabled.
We've recently had issues where if an image was removed manually
and the service was restarted (making `docker run` fetch it from Docker Hub, etc.),
we'd end up with a pulled image, even though we're aiming for a self-built one.
Re-running the playbook would then not do a rebuild, because:
- the image with that name already exists (even though it's something
else)
- we sometimes had conditional logic where we'd build only if the git
repo changed
By explicitly changing the name of the images (prefixing with localhost/),
we avoid such confusion and the possibility that we'd automatically pul something
which is not what we expect.
Also, I've removed that condition where building would happen on git
changes only. We now always build (unless an image with that name
already exists). We just force-build when the git repo changes.
also, worker.yaml.j2:
- hone worker_name
- remove worker_pid_file entry (would only be used if worker_daemonize
set to true; also, synapse only knows about the container namespace
and thus can not provide the required host-view PID)