This switches the `docker exec` method of spawning
Synapse workers inside the `matrix-synapse` container with
dedicated containers for each worker.
We also have dedicated systemd services for each worker,
so this are now:
- more consistent with everything else (we don't use systemd
instantiated services anywhere)
- we don't need the "parse systemd instance name into worker name +
port" part
- we don't need to keep track of PIDs manually
- we don't need jq (less depenendencies)
- workers dying would be restarted by systemd correctly, like any other
service
- `docker ps` shows each worker separately and we can observe resource
usage
There was a `matrix_nginx_proxy_enabled|default(False)` check, but:
- it didn't seem to work reliably for some reason (hmm)
- referring to a `matrix_nginx_proxy_*` variable from within the
`matrix-synapse` role is not ideal
- exposing always happened on `127.0.0.1`, which may not be good enough
for some rarer setups (where the own webserver is external to the host)
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
ma1sd requires the openid endpoints for certain functionality.
Example: 90b2b5301c/src/main/java/io/kamax/mxisd/auth/AccountManager.java (L67-L99)
If federation is disabled, we still need to expose these openid APIs on the
federation port.
Previously, we were doing similar magic for Dimension.
As per its documentation, when running unfederated, one is to enable
the openid listener as well. As per their recommendation, people
are advised to do enable it on the Client-Server API port
and use the `federationUrl` variable to override where the federation
port is (making federation requests go to the Client-Server API).
Because ma1sd always uses the federation port (unless you do some
DNS overwriting magic using its configuration -- which we'd rather not
do), it's better if we just default to putting the `openid` listener
where it belongs - on the federation port.
With this commit, we retain the "automatically enable openid APIs" thing
we've been doing for Dimension, but move it to the federation port instead.
We also now do the same thing when ma1sd is enabled.
`-v` magically creates the source destination as a directory,
if it doesn't exist already. We'd like to avoid this magic
and the potential breakage that it might cause.
We'd rather fail while Docker tries to find things to `--mount`
than have it automatically create directories and fail anyway,
while having contaminated the filesystem.
There's a lot more `-v` instances remaining to be fixed later on.
This is just some start.
Things like `matrix_synapse_container_additional_volumes` and
`matrix_nginx_proxy_container_additional_volumes` were not changed to
use `--mount`, as options for each one are passed differently
(`ro` is `ro`, but `rw` doesn't exist and `slave` is `bind-propagation=slave`).
To avoid breaking people's custom volume mounts, we keep it as it is for now.
A deficiency with `--mount` is that it lacks the `z` option (SELinux
ownership changes), and some of our `-v` instances use that. I'm not
sure how supported SELinux is for us right now, but it might be,
and breaking that would not be a good idea.
also, worker.yaml.j2:
- hone worker_name
- remove worker_pid_file entry (would only be used if worker_daemonize
set to true; also, synapse only knows about the container namespace
and thus can not provide the required host-view PID)
- cherry-pick "Ensure worker config exists in systemd service (#7528)"
from synapse d74cdc1a42e8b487d74c214b1d0ca575429d546a:
"check that the worker config file exists instead of silently failing."
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
Well, actually 8cd9cde won't work, unless we put the
`|to_nice_yaml` thing on a new line.
We can, but that takes more lines and makes things look uglier.
Using `|to_json` seems good enough.
The whole file is parsed as YAML later on and merged with the
`_extension` variable before being dumped as YAML again in the end.
Can you double check that the way I have this set only exposes it locally? It is important that the manhole is not available to the outside world since it is quite powerful and the password is hard coded.
Prompted by: https://matrix.org/blog/2019/11/09/avoiding-unwelcome-visitors-on-private-matrix-servers
This is a bit controversial, because.. the Synapse default remains open,
while the general advice (as per the blog post) is to make it more private.
I'm not sure exactly what kind of server people set up and whether they
want to make the room directory public. Our general goal is to favor
privacy and security when running personal (family & friends) and corporate
homeservers, both of which likely benefit from having a more secure default.
matrix_synapse_storage_path is already defined in matrix-synapse/defaults/main.yml (with a default of "{{ matrix_synapse_base_path }}/storage"), but was not being used for its presumed purpose in matrix-synapse.service.j2. As a result, if matrix_synapse_storage_path was overridden (in a vars.yml), the synapse service failed to start.
ef5e4ad061 intentionally makes us conform to
the logging format suggested by the official Docker image.
Reverting this part, because it's uglier.
This likely should be fixed upstream as well though.
Somewhat related to #213 (Github Pull Request).
We've been moving in the opposite direction for quite a long time.
All services should just leave logging to systemd's journald.
Using `|to_json` on a string is expected to correctly wrap it in quotes (e.g. `"4"`).
Wrapping it explicitly in double-quotes results in undesirable double-quoting (`""4""`).
Bridges start matrix-synapse.service as a dependency, but
Synapse is sometimes slow to start, while bridges are quick to
hit it and die (if unavailable).
They'll auto-restart later, but .. this still breaks `--tags=start`,
which doesn't wait long enough for such a restart to happen.
This attempts to slow down bridge startup enough to ensure Synapse
is up and no failures happen at all.
Attempt to fix#192 (Github Issue), potential regression since
70487061f4.
Serializing as JSON/YAML explicitly is much better than relying on
magic (well, Python serialization being valid YAML..).
It seems like Python may prefix strings with `u` sometimes (Python 3?),
which causes Python serialization to not be compatible with YAML.
Reasoning is the same as for matrix-org/synapse#5023.
For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
The goal is to move each bridge into its own separate role.
This commit starts off the work on this with 2 bridges:
- mautrix-telegram
- mautrix-whatsapp
Each bridge's role (including these 2) is meant to:
- depend only on the matrix-base role
- integrate nicely with the matrix-synapse role (if available)
- integrate nicely with the matrix-nginx-proxy role (if available and if
required). mautrix-telegram bridge benefits from integrating with
it.
- not break if matrix-synapse or matrix-nginx-proxy are not used at all
This has been provoked by #174 (Github Issue).
- matrix_enable_room_list_search - Controls whether searching the public room list is enabled.
- matrix_alias_creation_rules - Controls who's allowed to create aliases on this server.
- matrix_room_list_publication_rules - Controls who can publish and which rooms can be published in the public room list.
`{% matrix_s3_media_store_custom_endpoint_enabled %}` should have
been `{% if matrix_s3_media_store_custom_endpoint_enabled %}` instead.
Related to #132 (Github Pull Request).
This allows overriding the default value for `include_content`. Setting
this to false allows homeserver admins to ensure that message content
isn't sent in the clear through third party servers.
`matrix_synapse_no_tls` is now implicit, so we've gotten rid of it.
The `homeserver.yaml.j2` template has been synchronized with the
configuration generated by Synapse v0.99.1 (some new options
are present, etc.)
For consistency with all our other listeners,
we make this one bind on the `::` address too
(both IPv4 and IPv6).
Additional details are in #91 (Github Pull Request).
We run containers as a non-root user (no effective capabilities).
Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"
We'd rather prevent such a potential escalation by dropping ALL
capabilities.
The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
This makes all containers (except mautrix-telegram and
mautrix-whatsapp), start as a non-root user.
We do this, because we don't trust some of the images.
In any case, we'd rather not trust ALL images and avoid giving
`root` access at all. We can't be sure they would drop privileges
or what they might do before they do it.
Because Postfix doesn't support running as non-root,
it had to be replaced by an Exim mail server.
The matrix-nginx-proxy nginx container image is patched up
(by replacing its main configuration) so that it can work as non-root.
It seems like there's no other good image that we can use and that is up-to-date
(https://hub.docker.com/r/nginxinc/nginx-unprivileged is outdated).
Likewise for riot-web (https://hub.docker.com/r/bubuntux/riot-web/),
we patch it up ourselves when starting (replacing the main nginx
configuration).
Ideally, it would be fixed upstream so we can simplify.
We do match the defaults anyway (by default that is),
but people can customize `matrix_user_uid` and `matrix_user_uid`
and it wouldn't be correct then.
In any case, it's better to be explicit about such an important thing.
With this change, the following roles are now only dependent
on the minimal `matrix-base` role:
- `matrix-corporal`
- `matrix-coturn`
- `matrix-mailer`
- `matrix-mxisd`
- `matrix-postgres`
- `matrix-riot-web`
- `matrix-synapse`
The `matrix-nginx-proxy` role still does too much and remains
dependent on the others.
Wiring up the various (now-independent) roles happens
via a glue variables file (`group_vars/matrix-servers`).
It's triggered for all hosts in the `matrix-servers` group.
According to Ansible's rules of priority, we have the following
chain of inclusion/overriding now:
- role defaults (mostly empty or good for independent usage)
- playbook glue variables (`group_vars/matrix-servers`)
- inventory host variables (`inventory/host_vars/matrix.<your-domain>`)
All roles default to enabling their main component
(e.g. `matrix_mxisd_enabled: true`, `matrix_riot_web_enabled: true`).
Reasoning: if a role is included in a playbook (especially separately,
in another playbook), it should "work" by default.
Our playbook disables some of those if they are not generally useful
(e.g. `matrix_corporal_enabled: false`).
We've previously changed a bunch of lists in `homeserver.yaml.j2`
to be serialized using `|to_nice_yaml`, as that generates a more
readable list in YAML.
`matrix_synapse_federation_domain_whitelist`, however, couldn't have
been changed to that, as it can potentially be an empty list.
We may be able to differentiate between empty and non-empty now
and serialize it accordingly (favoring `|to_nice_yaml` if non-empty),
but it's not important enough to be justified. Thus, always
serializing with `|to_json`.
Fixes#78 (Github issue)
As suggested in #63 (Github issue), splitting the
playbook's logic into multiple roles will be beneficial for
maintainability.
This patch realizes this split. Still, some components
affect others, so the roles are not really independent of one
another. For example:
- disabling mxisd (`matrix_mxisd_enabled: false`), causes Synapse
and riot-web to reconfigure themselves with other (public)
Identity servers.
- enabling matrix-corporal (`matrix_corporal_enabled: true`) affects
how reverse-proxying (by `matrix-nginx-proxy`) is done, in order to
put matrix-corporal's gateway server in front of Synapse
We may be able to move away from such dependencies in the future,
at the expense of a more complicated manual configuration, but
it's probably not worth sacrificing the convenience we have now.
As part of this work, the way we do "start components" has been
redone now to use a loop, as suggested in #65 (Github issue).
This should make restarting faster and more reliable.