**HSTS Preloading:**
In its strongest and recommended form, the [HSTS policy](https://www.chromium.org/hsts) includes all subdomains, and indicates a willingness to be “preloaded” into browsers:
`Strict-Transport-Security: max-age=31536000; includeSubDomains; preload`
**X-Xss-Protection:**
`1; mode=block` which tells the browser to block the response if it detects an attack rather than sanitising the script.
This variable was previously undefined in the role and was only getting
defined via `group_vars/matrix_servers`.
We now properly initialize it (and its good default value) in the role
itself.
Self-checks against the .well-known URIs look for the HTTP header
"Access-Control-Allow-Origin" indicating that the remode endpoint
supports CORS. But the remote server is not required to include
said header in the response if the HTTP request does not include
the "Origin" header. This is in accordance with the specification
[1] stating: 'A CORS request is an HTTP request that includes an
"Origin" header.'
This is in fact true for Gitlab pages hosting and that's why the
issue was identified.
Let's specify "Origin" header in the respective uri tasks performing
the HTTP request and ensure a CORS request.
[1] https://fetch.spec.whatwg.org/#http-requests
We have a flow like this:
1. matrix.DOMAIN vhost (matrix-domain.conf)
2. matrix-synapse vhost (matrix-synapse.conf); or matrix-corporal container, if enabled
3. (optional) matrix-synapse vhost (matrix-synapse.conf), if matrix-corporal enabled
4. matrix-synapse container
We are setting `X-Forwarded-For` correctly in step #1, but were
overwriting it in step #2 with something inaccurate.
Not doing anything in step #2 is better than doing the wrong thing.
It's probably best if we append another reverse-proxy address there
though, although what we're doing now (with this patch) seems to yield
the correct result (when matrix-corporal is not enabled).
When matrix-corporal is enabled, we still seem to do the wrong thing for
some reason. It's something to be fixed later on.
People who were disabling matrix-nginx-proxy (in favor of their own
nginx webserver) and also overriding `matrix_federation_public_port`,
found that the generated nginx configuration still hardcoded `8448`,
which forced their nginx server to use that, regardless of the fact
that `matrix_federation_public_port` was pointing elsewhere.
We now allow for the in-container federation port to be configurable,
and also automatically wire things properly.
We're talking about a webserver running on the same machine, which
imports the configuration files generated by the `matrix-nginx-proxy`
in the `/matrix/nginx-proxy/conf.d` directory.
Users who run an nginx webserver on some other machine will need to do
something different.
This give us the possibility to run multiple instances of
workers that that don't expose a port.
Right now, we don't support that, but in the future we could
run multiple `federation_sender` or `pusher` workers, without
them fighting over naming (previously, they'd all be named
something like `matrix-synapse-worker-pusher-0`, because
they'd all define `port` as `0`).
I felt that adding another variable was probably going to be the easiest way to do this. I may end up adding another variable to enable this feature, for consistency with some of the other things.
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:
> Error response from daemon: Cannot kill container: something: No such container: something
and:
> Error: No such container: something
People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.
In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)
.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
This switches the `docker exec` method of spawning
Synapse workers inside the `matrix-synapse` container with
dedicated containers for each worker.
We also have dedicated systemd services for each worker,
so this are now:
- more consistent with everything else (we don't use systemd
instantiated services anywhere)
- we don't need the "parse systemd instance name into worker name +
port" part
- we don't need to keep track of PIDs manually
- we don't need jq (less depenendencies)
- workers dying would be restarted by systemd correctly, like any other
service
- `docker ps` shows each worker separately and we can observe resource
usage
We do this by creating one more layer of indirection.
First we reach some generic vhost handling matrix.DOMAIN.
A bunch of override rules are added there (capturing traffic to send to
ma1sd, etc). nginx-status and similar generic things also live there.
We then proxy to the homeserver on some other vhost (only Synapse being
available right now, but repointing this to Dendrite or other will be
possible in the future).
Then that homeserver-specific vhost does its thing to proxy to the
homeserver. It may or may not use workers, etc.
Without matrix-corporal, the flow is now:
1. matrix.DOMAIN (matrix-nginx-proxy/matrix-domain.conf)
2. matrix-nginx-proxy/matrix-synapse.conf
3. matrix-synapse
With matrix-corporal enabled, it becomes:
1. matrix.DOMAIN (matrix-nginx-proxy/matrix-domain.conf)
2. matrix-corporal
3. matrix-nginx-proxy/matrix-synapse.conf
4. matrix-synapse
(matrix-corporal gets injected at step 2).
I guess it didn't hurt to do it until now, but it's not great serving
federation APIs on the client-server API port, etc.
matrix-corporal doesn't work yet (still something to be solved in the
future), but its firewalling operations will also be sabotaged
by Client-Server APIs being served on the federation port (it's a way to get around its firewalling).
While administering we will occasionally invoke this script interactively with the "non-interactive" switch still there, yet still sit at the desk waiting for 300 seconds for this timer to run out.
The systemd-timer already uses a 3h randomized delay for automatic renewals, which serves this purpose well.
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/756
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/737
I feel like timers are somewhat more complicated and dirty (compared to
cronjobs), but they come with these benefits:
- log output goes to journald
- on newer systemd distros, you can see when the timer fired, when it
will fire, etc.
- we don't need to rely on cron (reducing our dependencies to just
systemd + Docker)
Cronjobs work well, but it's one more dependency that needs to be
installed. We were even asking people to install it manually
(in `docs/prerequisites.md`), which could have gone unnoticed.
Once in a while someone says "my SSL certificates didn't renew"
and it's likely because they forgot to install a cron daemon.
Switching to systemd timers means that installation is simpler
and more unified.
This variable has been useless since 2019-01-08.
We probably don't need to check for its usage anymore,
given how much time has passed since then, but ..
Not sure if it breaks with them or not, but no other directive
uses quotes and the nginx docs show examples without quotes,
so we're being consistent with all of that.
The different configurations are now all lower case, for consistent
naming.
`matrix_nginx_proxy_ssl_config` is now called
`matrix_nginx_proxy_ssl_preset`. The different options for "modern",
"intermediate" and "old" are stored in the main.yml file, instead of
being hardcoded in the configuration files. This will improve the
maintainability of the code.
The "custom" preset was removed. Now if one of the variables is set, it
will use it instead of the preset. This will allow to mix and match more
easily, for example using all the intermediate options but only
supporting TLSv1.2. This will also provide better backward
compatibility.
A new variable called `matrix_nginx_proxy_ssl_config` is created for
configuring how the nginx proxy configures SSL. Also a new configuration
validation option and other auxiliary variables are created.
A new variable configuration called `matrix_nginx_proxy_ssl_config` is
created. This allow to set the SSL configuration easily using the
default options proposed by Mozilla. The default configuration is set to
"Intermediate", removing the weak ciphers used in the old
configurations.
The new variable can also be set to "Custom" for a more granular control.
This allows to set another three variables called:
- `matrix_nginx_proxy_ssl_protocols`,
- `matrix_nginx_proxy_ssl_prefer_server_ciphers`
- `matrix_nginx_proxy_ssl_ciphers`
Also a new task is added to validate the SSL configuration variable.
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
Since the switch from `-v` to `--mount` (in 1fca917ad1),
we've regressed when `matrix_ssl_retrieval_method == 'none'`.
In such a case, we don't create `/matrix/ssl` directories at all
and shouldn't be trying to mount them into the `matrix-nginx-proxy`
container.
Previously, with `-v`, Docker would auto-create them, effectively hiding
our mistake. Now that `--mount` doesn't do such auto-creation magic,
the `matrix-nginx-proxy` container was failing to start.
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/734
`-v` magically creates the source destination as a directory,
if it doesn't exist already. We'd like to avoid this magic
and the potential breakage that it might cause.
We'd rather fail while Docker tries to find things to `--mount`
than have it automatically create directories and fail anyway,
while having contaminated the filesystem.
There's a lot more `-v` instances remaining to be fixed later on.
This is just some start.
Things like `matrix_synapse_container_additional_volumes` and
`matrix_nginx_proxy_container_additional_volumes` were not changed to
use `--mount`, as options for each one are passed differently
(`ro` is `ro`, but `rw` doesn't exist and `slave` is `bind-propagation=slave`).
To avoid breaking people's custom volume mounts, we keep it as it is for now.
A deficiency with `--mount` is that it lacks the `z` option (SELinux
ownership changes), and some of our `-v` instances use that. I'm not
sure how supported SELinux is for us right now, but it might be,
and breaking that would not be a good idea.
We'd like the roles to be self-contained (as much as possible).
Thus, the `matrix-nginx-proxy` shouldn't reference any variables from
other roles. Instead, we rely on injection via
`group_vars/matrix_servers`.
Related to #681 (Github Pull Request)
This broke in 63a49bb2dc.
Proxying the OpenID Connect endpoints is now possible,
but needs to be enabled explicitly now.
Supersedes #702 (Github Pull Request).
This patch builds up on the idea from that Pull Request,
but does things in a cleaner way.
We do this to match Synapse's new default "max_upload_size" (50MB).
This `matrix_nginx_proxy_proxy_matrix_client_api_client_max_body_size_mb`
default value only affects standalone usage of the `matrix-nginx-proxy`
role. When the role is used in the context of the playbook,
the value is dynamically assigned from `group_vars/matrix_servers`.
Somewhat related to #692 (Github Issue).
The regex introduced in 63a49bb2dc seems to take precedence
over the bare location blocks, causing a regression.
> It is important to understand that, by default, Nginx will serve regular expression matches in preference to prefix matches.
> However, it evaluates prefix locations first, allowing for the administer to override this tendency by specifying locations using the = and ^~ modifiers.
Source: https://www.digitalocean.com/community/tutorials/understanding-nginx-server-and-location-block-selection-algorithms
In #628 I proposed a CORS change that turns out not to be the root of the issue. Caffeine-addled diagnosis leads to sloppy thinking, and this change should be reverted. In fact, if left it will cause problems for new installations.
Even with the v2 updates listed in #503 and partially addressed in #614, this is still needed to enable identity services to function with Element Desktop/Web. Testing on multiple clients with a clean config has confirmed this, at least for my installation.
Fix regression since 2a50b8b6bb (#597).
Dimension is intended to be embedded in various clients,
be it the Element service that we host (at element.DOMAIN),
some other Element (element-desktop running locally), etc.
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
· 😅 How to keep this in sync with the matrix-synapse documentation?
· regex location matching is expensive
· nginx syntax limit: one location only per block / statement
· thus, lots of duplicate statements in this file
Continuation of #234 (Github Pull Request).
I had unintentionally updated the documentation for the feature,
saying the page is available at `https://matrix.DOMAIN/nginx_status`.
Looks like it wasn't the case, going against my expectations.
I'm correcting this with this patch.
The status page is being made available on both HTTP and HTTPS.
Serving over HTTP is likely necessary for services like
Longview
(https://www.linode.com/docs/platform/longview/longview-app-for-nginx/)
We recently had someone in the support room who set it to `false`
and the playbook ran without any issues.
This currently seems to yield the same result as 'none', but it's
better to avoid such behavior.