Commit graph

52 commits

Author SHA1 Message Date
Slavi Pantaleev 226c550ffa Add support for stream writer Synapse workers
As stream writer workers are also powered by the `generic_worker`
Synapse app, this necessitated that we provide means for distinguishing
between them and regular `generic_workers`.

I've also taken the time to optimize nginx configuration generation
(more Jinja2 macro usage, less duplication).

Worker names have also changed.
Workers are now named sequentially like this:
- `matrix-synapse-worker-0-generic`
- `matrix-synapse-worker-1-stream-writer-typing`
- `matrix-synapse-worker-2-pusher`

instead of `matrix-synapse-worker_generic_worker-18111` (indexed with a
port number).

People who modify `matrix_synapse_workers_enabled_list` directly will
need to adjust their configuration.
2022-09-15 08:10:04 +03:00
Shaleen Jain f674afe5e8
appservice: add and use homeserver_container_* vars (#2045)
* appservice: add and use matrix_homeserver_* vars

* appservice: use the new vars

* Apply suggestions from code review

Co-authored-by: Slavi Pantaleev <slavi@devture.com>

Co-authored-by: Slavi Pantaleev <slavi@devture.com>
2022-08-24 08:38:12 +03:00
Slavi Pantaleev 0364c6c634 Suppress old container cleanup (kill/rm) failures
People often report and ask about these "failures".
More-so previously, when the `docker kill/rm` output was collected,
but it still happens now when people do `systemctl status
matrix-something` and notice that it says "FAILURE".

Suppressing to avoid further time being wasted on saying "this is
expected".
2022-04-11 09:05:33 +03:00
László Várady ebfa511515 synapse: do not expose plain federation port when it's disabled
matrix_synapse_federation_port_enabled can be disabled by users, for
example, when one wants to use the same port for client and federation
requests (docs/configuring-playbook-federation.md).
2022-03-14 03:45:46 +01:00
Slavi Pantaleev 86c36523df Replace ExecStopPost with ExecStop
Reverts b1b4ba501f, 90c9801c56, a3c84f78ca, ..

I haven't really traced it (yet), but on some servers, I'm observing
`ansible-playbook ... --tags=start` completing very slowly, waiting
to stop services. I can't reproduce this on all Matrix servers I manage.
I suspect that either the systemd version is to blame or that some
specific service is not responding well to some `docker kill/rm` command.

`ExecStop` seems to work great in all cases and it's what we've been
using for a very long time, so I'm reverting to that.
2022-02-05 12:13:36 +02:00
Slavi Pantaleev b1b4ba501f Replace ExecStop with ExecStopPost
ExecStopPost should allow us to clean up (docker kill + docker rm)
even if the ExecStart (docker run ..) command failed, and not just after
a graceful service stop was initiated.

Source: https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStopPost=
2022-01-04 17:27:25 +02:00
Slavi Pantaleev e1a6d1e4b2 Upgrade Synapse (1.46.0 -> 1.47.0)
We had to remove UID/GID environment variables that we used to pass
to the Synapse container, because it was causing a problem after
https://github.com/matrix-org/synapse/pull/11209

We were using both `--user` and UID/GID environment variables until now.
2021-11-17 17:21:15 +02:00
boris runakov 1ec67f49b0 replaced 8008 where possible 2021-11-15 22:43:05 +02:00
Slavi Pantaleev c1bc7b9f93 Rename variables to prevent confusion
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/1397
and https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/1399
2021-11-15 14:56:11 +02:00
Slavi Pantaleev dc4452ac21
Merge branch 'master' into matrix-federation-api-port 2021-11-15 14:49:03 +02:00
boris runakov 8c3e25de1b renamed var to matrix_synapse_container_federation_api_port 2021-11-15 13:01:22 +02:00
b 07496069c8 rellocating variables for consistency 2021-11-15 12:07:54 +02:00
b afccc2b11f make 8448 configurable instead of hard coded 2021-11-14 23:32:25 +02:00
b 7756cc4c8e replace port 8048 with matrix_synapse_container_default_federation_port 2021-11-14 20:30:13 +02:00
Slavi Pantaleev f99dcd611f Pass proper UID/GID to Synapse
Fixes a regression caused by a5ee39266c.

If the user id and group id were different than 991:991
(which used to be a hardcoded default for us long ago),
there was a mismatch between what Synapse was trying to use (991:991)
and what it was actually started with (in `--user=..`). It was then
trying to change ownership, which was failing.

This was mostly affecting newer installations which were not using the
991:991 defaults we had long ago (since a1c5a197a9).
2021-03-19 16:44:10 +02:00
Slavi Pantaleev a5ee39266c Go through start.py when launching Synapse
This allows us to benefit from helpful things it does for us,
like enabling jemalloc: https://github.com/matrix-org/synapse/pull/8553

We weren't going through `start.py` before, because it was causing some
conflict with our `docker run --user=...` stuff, but it doesn't seem
to be a problem anymore.

Having done this, we won't need to do things like
https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/941
anymore.
2021-03-19 08:16:59 +02:00
Slavi Pantaleev d98a1ceadd Merge branch 'master' into synapse-workers 2021-01-27 10:27:17 +02:00
Slavi Pantaleev 512f42aa76 Do not report docker kill/rm attempts as errors
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:

> Error response from daemon: Cannot kill container: something: No such container: something

and:

> Error: No such container: something

People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.

In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)

.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
2021-01-27 10:22:46 +02:00
Slavi Pantaleev 70796703d3 Run Synapse workers in their own containers
This switches the `docker exec` method of spawning
Synapse workers inside the `matrix-synapse` container with
dedicated containers for each worker.

We also have dedicated systemd services for each worker,
so this are now:
- more consistent with everything else (we don't use systemd
instantiated services anywhere)
- we don't need the "parse systemd instance name into worker name +
port" part
- we don't need to keep track of PIDs manually
- we don't need jq (less depenendencies)
- workers dying would be restarted by systemd correctly, like any other
service
- `docker ps` shows each worker separately and we can observe resource
usage
2021-01-25 12:14:46 +02:00
Slavi Pantaleev 63301b0ef1 Improvements around Synapse worker/metrics ports exposure
There was a `matrix_nginx_proxy_enabled|default(False)` check, but:
- it didn't seem to work reliably for some reason (hmm)
- referring to a `matrix_nginx_proxy_*` variable from within the
  `matrix-synapse` role is not ideal
- exposing always happened on `127.0.0.1`, which may not be good enough
  for some rarer setups (where the own webserver is external to the host)
2021-01-25 08:25:43 +02:00
Marcel Partap edc21f15e5 Restrict publishing worker (metrics) ports to localhost 2021-01-24 08:53:09 +01:00
Marcel Partap 183adec3d8 Merge remote-tracking branch 'origin/master' into synapse-workers 2021-01-23 15:04:11 +01:00
Slavi Pantaleev 1692a28fe4 Work around annoying Docker warning about undefined $HOME
> WARNING: Error loading config file: .dockercfg: $HOME is not defined

.. which appeared in Docker 20.10.
2021-01-15 00:23:01 +02:00
Marcel Partap cd8100544b Merge remote-tracking branch 'origin/master' into synapse-workers
Sync with upstream
2021-01-08 20:58:50 +01:00
Slavi Pantaleev d08b27784f Fix systemd services autostart problem with Docker 20.10
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:

```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```

The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:

> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start

A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
2020-12-10 11:43:20 +02:00
Marcel Partap f201bca519 synapse workers: define and expose METRICS port for each worker
As seen on TV:
https://github.com/matrix-org/synapse/blob/master/docs/metrics-howto.md#monitoring-workers
2020-12-01 22:49:15 +01:00
Marcel Partap b73ac965ac Merge remote-tracking branch 'origin/master' into synapse-workers 2020-12-01 21:24:26 +01:00
Slavi Pantaleev 1fca917ad1 Replace some -v instances with --mount
`-v` magically creates the source destination as a directory,
if it doesn't exist already. We'd like to avoid this magic
and the potential breakage that it might cause.

We'd rather fail while Docker tries to find things to `--mount`
than have it automatically create directories and fail anyway,
while having contaminated the filesystem.

There's a lot more `-v` instances remaining to be fixed later on.
This is just some start.

Things like `matrix_synapse_container_additional_volumes` and
`matrix_nginx_proxy_container_additional_volumes` were not changed to
use `--mount`, as options for each one are passed differently
(`ro` is `ro`, but `rw` doesn't exist and `slave` is `bind-propagation=slave`).
To avoid breaking people's custom volume mounts, we keep it as it is for now.

A deficiency with `--mount` is that it lacks the `z` option (SELinux
ownership changes), and some of our `-v` instances use that. I'm not
sure how supported SELinux is for us right now, but it might be,
and breaking that would not be a good idea.
2020-11-24 10:26:05 +02:00
Marcel Partap 2d1b9f2dbf synapse workers: reworkings + get endpoints from upstream docs via awk
(yes, a bit awkward and brittle… xD)
2020-10-28 07:13:19 +01:00
Max Klenk 567d0318b0
Merge branch 'synapse-workers' into feature/add-worker-support 2020-08-27 15:22:12 +02:00
Slavi Pantaleev f56a9a0f5f
Merge pull request #524 from cnvandijk/fix-executable-path
Remove hardcoded paths to commands on the host machine
2020-05-28 15:39:25 +03:00
Slavi Pantaleev 8bae39050e Update settings for Synapse v1.14.0 2020-05-28 15:23:05 +03:00
Chris van Dijk 6334f6c1ea Remove hardcoded command paths in systemd unit files
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.

Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
2020-05-27 23:14:54 +02:00
Slavi Pantaleev 8fb3ce6f6d Upgrade Synapse (v1.12.4 -> v1.13.0) 2020-05-19 21:35:32 +03:00
Marcel Partap 66a4073512 Publish synapse worker ports, need to be accessible to nginx 2020-04-19 19:05:03 +02:00
Aaron Raimist 79d1576648
Allow Synapse manhole to be enabled
Can you double check that the way I have this set only exposes it locally? It is important that the manhole is not available to the outside world since it is quite powerful and the password is hard coded.
2019-12-05 00:07:15 -06:00
recklesscoder 5d3b765241
Actually use matrix_synapse_storage_path
matrix_synapse_storage_path is already defined in matrix-synapse/defaults/main.yml (with a default of "{{ matrix_synapse_base_path }}/storage"), but was not being used for its presumed purpose in matrix-synapse.service.j2. As a result, if matrix_synapse_storage_path was overridden (in a vars.yml), the synapse service failed to start.
2019-11-02 13:46:02 +01:00
Slavi Pantaleev d6d6c152a3 Delay bridge startup to ensure Synapse is up
Bridges start matrix-synapse.service as a dependency, but
Synapse is sometimes slow to start, while bridges are quick to
hit it and die (if unavailable).

They'll auto-restart later, but .. this still breaks `--tags=start`,
which doesn't wait long enough for such a restart to happen.

This attempts to slow down bridge startup enough to ensure Synapse
is up and no failures happen at all.
2019-06-07 12:15:37 +03:00
Slavi Pantaleev ab59cc50bd Add support for more flexible container port exposing
Fixes #171 (Github Issue).
2019-05-25 07:41:08 +09:00
Slavi Pantaleev ae7c8d1524 Use SyslogIdentifier to improve logging
Reasoning is the same as for matrix-org/synapse#5023.

For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
2019-05-16 09:43:46 +09:00
Hugues De Keyzer c451025134 Fix indentation in templates
Use Jinja2 lstrip_blocks option in templates to ensure consistent
indentation in generated files.
2019-05-07 21:23:35 +02:00
Sylvia van Os 75b1528d13 Add the possibility to pass extra flags to the docker container 2019-04-30 16:35:18 +02:00
Slavi Pantaleev 892abdc700 Do not refer to Synapse as "Matrix Synapse" 2019-04-23 10:20:56 +03:00
Slavi Pantaleev 91a757c581 Add support for reloading Synapse 2019-02-06 09:25:13 +02:00
Slavi Pantaleev f6ebd4ce62 Initial work on Synapse 0.99/1.0 preparation 2019-02-05 12:09:46 +02:00
dhose 87e3deebfd Enable exposure of Prometheus metrics. 2019-02-01 20:02:11 +01:00
Slavi Pantaleev 0be7b25c64 Make (most) containers run with a read-only filesystem 2019-01-29 18:52:02 +02:00
Slavi Pantaleev 316d653d3e Drop capabilities in containers
We run containers as a non-root user (no effective capabilities).

Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"

We'd rather prevent such a potential escalation by dropping ALL
capabilities.

The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
2019-01-28 11:22:54 +02:00
Slavi Pantaleev 299a8c4c7c Make (most) containers start as non-root
This makes all containers (except mautrix-telegram and
mautrix-whatsapp), start as a non-root user.

We do this, because we don't trust some of the images.
In any case, we'd rather not trust ALL images and avoid giving
`root` access at all. We can't be sure they would drop privileges
or what they might do before they do it.

Because Postfix doesn't support running as non-root,
it had to be replaced by an Exim mail server.

The matrix-nginx-proxy nginx container image is patched up
(by replacing its main configuration) so that it can work as non-root.
It seems like there's no other good image that we can use and that is up-to-date
(https://hub.docker.com/r/nginxinc/nginx-unprivileged is outdated).

Likewise for riot-web (https://hub.docker.com/r/bubuntux/riot-web/),
we patch it up ourselves when starting (replacing the main nginx
configuration).
Ideally, it would be fixed upstream so we can simplify.
2019-01-27 20:25:13 +02:00
Slavi Pantaleev 56d501679d Be explicit about the UID/GID we start Synapse with
We do match the defaults anyway (by default that is),
but people can customize `matrix_user_uid` and `matrix_user_uid`
and it wouldn't be correct then.

In any case, it's better to be explicit about such an important thing.
2019-01-26 20:21:18 +02:00