Continuation of #234 (Github Pull Request).
I had unintentionally updated the documentation for the feature,
saying the page is available at `https://matrix.DOMAIN/nginx_status`.
Looks like it wasn't the case, going against my expectations.
I'm correcting this with this patch.
The status page is being made available on both HTTP and HTTPS.
Serving over HTTP is likely necessary for services like
Longview
(https://www.linode.com/docs/platform/longview/longview-app-for-nginx/)
# Auth server config
auth:
# Publicly accessible base URL for the login endpoints.
# The prefix below is not implicitly added. This URL and all subpaths should be proxied
# or otherwise pointed to the appservice's webserver to the path specified below (prefix).
# This path should usually include a trailing slash.
public: http://example.com/login/
# Internal prefix in the appservice web server for the login endpoints.
prefix: /login
Also discussed previously in #213 (Github Pull Request).
shared-secret-auth and rest-auth logging is still at `INFO`
intentionally, as user login events seem more important to keep.
Those modules typically don't spam as much.
We recently had someone in the support room who set it to `false`
and the playbook ran without any issues.
This currently seems to yield the same result as 'none', but it's
better to avoid such behavior.
It adds support for a new `DISABLE_SENDER_VERIFICATION` environment
variable that can be used to disable verification of sender addresses.
It doesn't matter for us, but we upgrade to keep up with latest.
Looks like these client ids are actually integers,
but unless we pass them as a string, the bridge would complain with
an error like:
{"field":"data.auth.clientID","message":"is the wrong type","value":123456789012345678,"type":"string","schemaPath":["properties","auth","properties","clientID"]}
Explicitly-casting to a string should fix the problem.
The Discord bridge should probably be improved to handle both ints and
strings though.
Regression since 174a6fcd1b, #204 (Github Pull Request),
which only affects new servers.
Old servers which had their passkey.pem file relocated were okay.
ef5e4ad061 intentionally makes us conform to
the logging format suggested by the official Docker image.
Reverting this part, because it's uglier.
This likely should be fixed upstream as well though.
Somewhat related to #213 (Github Pull Request).
We've been moving in the opposite direction for quite a long time.
All services should just leave logging to systemd's journald.
Fixes a regression introduced during the upgrade to
Synapse v1.1.0 (in 2b3865ceea).
Since Synapse v1.1.0 upgraded to Python 3.7
(https://github.com/matrix-org/synapse/pull/5546),
we need to use a different modules directory when mounting
password provider modules.
Well, `config.yaml` has been playbook-managed for a long time.
It's now extended to match the default sample config of the Discord
bridge.
With this patch, we also make `registration.yaml` playbook-managed,
which leads us to consistency with all other bridges.
Along with that, we introduce `./config` and `./data` separation,
like we do for the other bridges.
I've been thinking of doing before, but haven't.
Now that the Whatsapp bridge does it (since 4797469383),
it makes sense to do it for all other bridges as well.
(Except for the IRC bridge - that one manages most of registration.yaml by itself)
appservice-irc doesn't have permission to create files in its project
directory and the intention is to log to the console, anyway. By
commenting out the file names, appservice-irc won't attempt to open the
files.
This means we need to explicitly specify a `media_url` now,
because without it, `url` would be used for building public URLs to
files/images. That doesn't work when `url` is not a public URL.
Until now, if `--tags=setup-synapse` was used, bridge tasks would not
run and bridges would fail to register with the `matrix-synapse` role.
This means that Synapse's configuration would be generated with an empty
list of appservices (`app_service_config_files: []`).
.. and then bridges would fail, because Synapse would not be aware of
there being any bridges.
From now on, bridges always run their init tasks and always register
with Synapse.
For the Telegram bridge, the same applies to registering with
matrix-nginx-proxy. Previously, running `--tags=setup-nginx-proxy` would
get rid of the Telegram endpoint configuration for the same reason.
Not anymore.
With most people on Synapse v0.99+ and Synapse v1.0 now available,
we should no longer try to be backward compatible with Synapse 0.34,
because this just complicates the instructions for no good reason.
Using `|to_json` on a string is expected to correctly wrap it in quotes (e.g. `"4"`).
Wrapping it explicitly in double-quotes results in undesirable double-quoting (`""4""`).
We do use some `:latest` images by default for the following services:
- matrix-dimension
- Goofys (in the matrix-synapse role)
- matrix-bridge-appservice-irc
- matrix-bridge-appservice-discord
- matrix-bridge-mautrix-facebook
- matrix-bridge-mautrix-whatsapp
It's terribly unfortunate that those software projects don't release
anything other than `:latest`, but that's how it is for now.
Updating that software requires that users manually do `docker pull`
on the server. The playbook didn't force-repull images that it already
had.
With this patch, it starts doing so. Any image tagged `:latest` will be
force re-pulled by the playbook every time it's executed.
It should be noted that even though we ask the `docker_image` module to
force-pull, it only reports "changed" when it actually pulls something
new. This is nice, because it lets people know exactly when something
gets updated, as opposed to giving the indication that it's always
updating the images (even though it isn't).
Previously, it only mentioned exposing for psql-usage purposes.
Realistically, it can be used for much more. Especially given that
psql can be easily accessed via our matrix-postgres-cli script,
without exposing the container port.
We log to journald anyway. There's no need for double-logging.
It should not that matrix-synapse logs to journald and to files,
but that's likely to change in the future as well.
Because Synapse's logs are insanely verbose right now (and may get
dropped by journald), it's more reliable to have file-logging too.
As Synapse matures and gets more stable, logging should hopefully
get less, we should be able to only use journald and stop writing to
files for it as well.
Using a separate directory allows easier backups
(only need to back up the Ansible playbook configuration and the
bridge's `./data` directory).
The playbook takes care of migrating an existing database file
from the base directory into the `./data` directory.
In the future, we can also mount the configuration read-only,
to ensure the bridge won't touch it.
For now, mautrix-facebook is keen on rebuilding the `config.yaml`
file on startup though, so this will have to wait.
Related to #193, but for the Facebook bridge.
(other bridges can be changed to do the same later).
This patch makes the bridge configuration entirely managed by the
Ansible playbook. The bridge's `config.yaml` and `registration.yaml`
configuration files are regenerated every time the playbook runs.
This allows us to apply updates to those files and to avoid
people having to manage the configuration files manually on the server.
-------------------------------------------------------------
A deficiency of the current approach to dumping YAML configuration in
`config.yaml` is that we strip all comments from it.
Later on, when the bridge actually starts, it will load and redump
(this time with comments), which will make the `config.yaml` file
change.
Subsequent playbook runs will report "changed" for the
"Ensure mautrix-facebook config.yaml installed" task, which is a little
strange.
We might wish to improve this in the future, if possible.
Still, it's better to have a (usually) somewhat meaningless "changed"
task than to what we had -- never rebuilding the configuration.
Bridges start matrix-synapse.service as a dependency, but
Synapse is sometimes slow to start, while bridges are quick to
hit it and die (if unavailable).
They'll auto-restart later, but .. this still breaks `--tags=start`,
which doesn't wait long enough for such a restart to happen.
This attempts to slow down bridge startup enough to ensure Synapse
is up and no failures happen at all.
Attempt to fix#192 (Github Issue), potential regression since
70487061f4.
Serializing as JSON/YAML explicitly is much better than relying on
magic (well, Python serialization being valid YAML..).
It seems like Python may prefix strings with `u` sometimes (Python 3?),
which causes Python serialization to not be compatible with YAML.
This doesn't replace all usage of `-v`, but it's a start.
People sometimes troubleshoot by deleting files (especially bridge
config files). Restarting Synapse with a missing registration.yaml file
for a given bridge, causes the `-v
/something/registration.yaml:/something/registration.yaml:ro` option
to force-create `/something/registration.yaml` as a directory.
When a path that's provided to the `-v` option is missing, Docker
auto-creates that path as a directory.
This causes more breakage and confusion later on.
We'd rather fail, instead of magically creating directories.
Using `--mount`, instead of `-v` is the solution to this.
From Docker's documentation:
> When you use --mount with type=bind, the host-path must refer to an existing path on the host.
> The path will not be created for you and the service will fail with an error if the path does not exist.
Related to #189 (Github Issue).
People had proxying problems if:
- they used the whole playbook (including the `matrix-nginx-proxy` role)
- and they were disabling the proxy (`matrix_nginx_proxy_enabled: false`)
- and they were proxying with their own nginx server
For them,
`matrix_nginx_proxy_proxy_matrix_additional_server_configuration_blocks`
would not be modified to inject the necessary proxying configuration.
While using certbot means we'll have both files retrieved,
it's actually the fullchain.pem file that we use in nginx configuration.
Using that one for the check makes more sense.
Reasoning is the same as for matrix-org/synapse#5023.
For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
The goal is to move each bridge into its own separate role.
This commit starts off the work on this with 2 bridges:
- mautrix-telegram
- mautrix-whatsapp
Each bridge's role (including these 2) is meant to:
- depend only on the matrix-base role
- integrate nicely with the matrix-synapse role (if available)
- integrate nicely with the matrix-nginx-proxy role (if available and if
required). mautrix-telegram bridge benefits from integrating with
it.
- not break if matrix-synapse or matrix-nginx-proxy are not used at all
This has been provoked by #174 (Github Issue).
As discussed in #151 (Github Pull Request), it's
a good idea to not selectively apply casting, but to do it in all
cases involving arithmetic operations.
Previously, we'd show an error like this:
{"changed": false, "item": null, "msg": "Detected an undefined required variable"}
.. which didn't mention the variable name
(`matrix_ssl_lets_encrypt_support_email`).
Looks like we may not have to do this,
since 1.4.2 fixes edge cases for people who used the broken
1.4.0 release.
We jumped straight to 1.4.1, so maybe we're okay.
Still, upgrading anyway, just in case.
Use an int conversion in the computation of the value of
matrix_nginx_proxy_tmp_directory_size_mb, to have the integer value
multiplied by 50 instead of having the string repeated 50 times.
It doesn't hurt to attempt renewal more frequently, as it only does
real work if it's actually necessary.
Reloading, we postpone some more, because certbot adds some random delay
(between 1 and 8 * 60 seconds) when renewing. We want to ensure
we reload at least 8 minutes later, which wasn't the case.
To make it even safer (in case future certbot versions use a longer
delay), we reload a whole hour later. We're in no rush to start using
the new certificates anyway, especially given that we attempt renewal
often.
Somewhat fixes#146 (Github Issue)
The code used to check for a `homeserver.yaml` file and generate
a configuration (+ key) only if such a configuration file didn't exist.
Certain rare cases (setting up with one server name and then
changing to another) lead to `homeserver.yaml` being there,
but a `matrix.DOMAIN.signing.key` file missing (because the domain
changed).
A new signing key file would never get generated, because `homeserver.yaml`'s
existence used to be (incorrectly) satisfactory for us.
From now on, we don't mix things up like that.
We don't care about `homeserver.yaml` anymore, but rather
about the actual signing key.
The rest of the configuration (`homeserver.yaml` and
`matrix.DOMAIN.log.config`) is rebuilt by us in any case, so whether
it exists or not is irrelevant and doesn't need checking.
- matrix_enable_room_list_search - Controls whether searching the public room list is enabled.
- matrix_alias_creation_rules - Controls who's allowed to create aliases on this server.
- matrix_room_list_publication_rules - Controls who can publish and which rooms can be published in the public room list.
`{% matrix_s3_media_store_custom_endpoint_enabled %}` should have
been `{% if matrix_s3_media_store_custom_endpoint_enabled %}` instead.
Related to #132 (Github Pull Request).
In most cases, there's not really a need to touch the system
firewall, as Docker manages iptables by itself
(see https://docs.docker.com/network/iptables/).
All ports exposed by Docker containers are automatically whitelisted
in iptables and wired to the correct container.
This made installing firewalld and whitelisting ports pointless,
as far as this playbook's services are concerned.
People that wish to install firewalld (for other reasons), can do so
manually from now on.
This is inspired by and fixes#97 (Github Issue).
Fixes#129 (Github Issue).
Unfortunately, we rely on `service_facts`, which is only available
in Ansible >= 2.5.
There's little reason to stick to an old version such as Ansible 2.4:
- some time has passed since we've raised version requirements - it's
time to move into the future (a little bit)
- we've recently (in 82b4640072) improved the way one can run
Ansible in a Docker container
From now on, Ansible >= 2.5 is required.
By default, `--tags=self-check` no longer validates certificates
when `matrix_ssl_retrieval_method` is set to `self-signed`.
Besides this default, people can also enable/disable validation using the
individual role variables manually.
Fixes#124 (Github Issue)
Most (all?) of our Matrix services are running in the `matrix` network,
so they were safe -- not accessible from Coturn to begin with.
Isolating Coturn into its own network is a security improvement
for people who were starting other services in the default
Docker network. Those services were potentially reachable over the
private Docker network from Coturn.
Discussed in #120 (Github Pull Request)
This is more explicit than hiding it in the role defaults.
People who reuse the roles in their own playbook (and not only) may
incorrectly define `ansible_host` to be a hostname or some local address.
Making it more explicit is more likely to prevent such mistakes.
Currently the nginx reload cron fails on Debian 9 because the path to
systemctl is /bin/systemctl rather than /usr/bin/systemctl.
CentOS 7 places systemctl in both /bin and /usr/bin, so we can just use
/bin/systemctl as the full path.
This allows overriding the default value for `include_content`. Setting
this to false allows homeserver admins to ensure that message content
isn't sent in the clear through third party servers.
`matrix_nginx_proxy_data_path` has always served as a base path,
so we're renaming it to reflect that.
Along with this, we're also introducing a new "data path" variable
(`matrix_nginx_proxy_data_path`), which is really a data path this time.
It's used for storing additional, non-configuration, files related to
matrix-nginx-proxy.
It's been reported that YAML parsing errors
would occur on certain Ansible/Python combinations for some reason.
It appears that a bare `{{ matrix_dimension_admins }}` would sometimes
yield things like `[u'@user:domain.com', ..]` (note the `u` string prefix).
To prevent such problems, we now explicitly serialize with `|to_json`.
The Server spec says that redirects should be followed for
`/.well-known/matrix/server`. So we follow them.
The Client-Server specs doesn't mention redirects, so we don't
follow redirects there.
Using `docker_container` with a `cap_drop` argument requires
Ansible >=2.7.
We want to support older versions too (2.4), so we either need to
stop invoking it with `cap_drop` (insecure), or just stop using
the module altogether.
Since it was suffering from other bugs too (not deleting containers
on failure), we've decided to remove `docker_container` usage completely.
Some resources shouldn't be cached right now,
as per https://github.com/vector-im/riot-web/pull/8702
(note all of the suggestions from that pull request were applied,
because some of them do not seem relevant - no such files)
Fixes#98 (Github Issue)
`matrix_synapse_no_tls` is now implicit, so we've gotten rid of it.
The `homeserver.yaml.j2` template has been synchronized with the
configuration generated by Synapse v0.99.1 (some new options
are present, etc.)
For consistency with all our other listeners,
we make this one bind on the `::` address too
(both IPv4 and IPv6).
Additional details are in #91 (Github Pull Request).
People who wish to rely on SRV records can prevent
the `/.well-known/matrix/server` file from being generated
(and thus, served.. which causes trouble).
If someone decides to not use `/.well-known/matrix/server` and only
relies on SRV records, then they would need to serve tcp/8448 using
a certificate for the base domain (not for the matrix) domain.
Until now, they could do that by giving the certificate to Synapse
and setting it terminate TLS. That makes swapping certificates
more annoying (Synapse requires a restart to re-read certificates),
so it's better if we can support it via matrix-nginx-proxy.
Mounting certificates (or any other file) into the matrix-nginx-proxy container
can be done with `matrix_nginx_proxy_container_additional_volumes`,
introduced in 96afbbb5a.
Certain use-cases may require that people mount additional files
into the matrix-nginx-proxy container. Similarly to how we do it
for Synapse, we are introducing a new variable that makes this
possible (`matrix_nginx_proxy_container_additional_volumes`).
This makes the htpasswd file for Synapse Metrics (introduced in #86,
Github Pull Request) to also perform mounting using this new mechanism.
Hopefully, for such an "extension", keeping htpasswd file-creation and
volume definition in the same place (the tasks file) is better.
All other major volumes' mounting mechanism remains the same (explicit
mounting).
Continuation of 1f0cc92b33.
As an explanation for the problem:
when saying `localhost` on the host, it sometimes gets resolved to `::1`
and sometimes to `127.0.0.1`. On the unfortunate occassions that
it gets resolved to `::1`, the container won't be able to serve the
request, because Docker containers don't have IPv6 enabled by default.
To avoid this problem, we simply prevent any lookups from happening
and explicitly use `127.0.0.1`.
This reverts commit 0dac5ea508.
Relying on pyOpenSSL is the Ansible way of doing things, but is
impractical and annoying for users.
`openssl` is easily available on most servers, even by default.
We'd better use that.
Seems like we unintentionally removed the mounting of certificates
(the `/matrix-config` mount) as part of splitting the playbook into
roles in 51312b8250.
It appears that those certificates weren't necessary for coturn to
funciton though, so we might just get rid of the configuration as well.
We run containers as a non-root user (no effective capabilities).
Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"
We'd rather prevent such a potential escalation by dropping ALL
capabilities.
The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
This is a known/intentional regression since f92c4d5a27.
The new stance on this is that most people would not have
dnspython, but may have the `dig` tool. There's no good
reason for not increasing our chances of success by trying both
methods (Ansible dig lookup and using the `dig` CLI tool).
Fixes#85 (Github issue).
This makes all containers (except mautrix-telegram and
mautrix-whatsapp), start as a non-root user.
We do this, because we don't trust some of the images.
In any case, we'd rather not trust ALL images and avoid giving
`root` access at all. We can't be sure they would drop privileges
or what they might do before they do it.
Because Postfix doesn't support running as non-root,
it had to be replaced by an Exim mail server.
The matrix-nginx-proxy nginx container image is patched up
(by replacing its main configuration) so that it can work as non-root.
It seems like there's no other good image that we can use and that is up-to-date
(https://hub.docker.com/r/nginxinc/nginx-unprivileged is outdated).
Likewise for riot-web (https://hub.docker.com/r/bubuntux/riot-web/),
we patch it up ourselves when starting (replacing the main nginx
configuration).
Ideally, it would be fixed upstream so we can simplify.
We do match the defaults anyway (by default that is),
but people can customize `matrix_user_uid` and `matrix_user_uid`
and it wouldn't be correct then.
In any case, it's better to be explicit about such an important thing.
If this is a brand new server and Postgres had never started,
detecting it before we even start it is not possible.
This moves the logic, so that it happens later on, when Postgres
would have had the chance to start and possibly initialize
a new empty database.
Fixes#82 (Github issue)