As discussed in #151 (Github Pull Request), it's
a good idea to not selectively apply casting, but to do it in all
cases involving arithmetic operations.
Previously, we'd show an error like this:
{"changed": false, "item": null, "msg": "Detected an undefined required variable"}
.. which didn't mention the variable name
(`matrix_ssl_lets_encrypt_support_email`).
Looks like we may not have to do this,
since 1.4.2 fixes edge cases for people who used the broken
1.4.0 release.
We jumped straight to 1.4.1, so maybe we're okay.
Still, upgrading anyway, just in case.
Use an int conversion in the computation of the value of
matrix_nginx_proxy_tmp_directory_size_mb, to have the integer value
multiplied by 50 instead of having the string repeated 50 times.
It doesn't hurt to attempt renewal more frequently, as it only does
real work if it's actually necessary.
Reloading, we postpone some more, because certbot adds some random delay
(between 1 and 8 * 60 seconds) when renewing. We want to ensure
we reload at least 8 minutes later, which wasn't the case.
To make it even safer (in case future certbot versions use a longer
delay), we reload a whole hour later. We're in no rush to start using
the new certificates anyway, especially given that we attempt renewal
often.
Somewhat fixes#146 (Github Issue)
The code used to check for a `homeserver.yaml` file and generate
a configuration (+ key) only if such a configuration file didn't exist.
Certain rare cases (setting up with one server name and then
changing to another) lead to `homeserver.yaml` being there,
but a `matrix.DOMAIN.signing.key` file missing (because the domain
changed).
A new signing key file would never get generated, because `homeserver.yaml`'s
existence used to be (incorrectly) satisfactory for us.
From now on, we don't mix things up like that.
We don't care about `homeserver.yaml` anymore, but rather
about the actual signing key.
The rest of the configuration (`homeserver.yaml` and
`matrix.DOMAIN.log.config`) is rebuilt by us in any case, so whether
it exists or not is irrelevant and doesn't need checking.
- matrix_enable_room_list_search - Controls whether searching the public room list is enabled.
- matrix_alias_creation_rules - Controls who's allowed to create aliases on this server.
- matrix_room_list_publication_rules - Controls who can publish and which rooms can be published in the public room list.
`{% matrix_s3_media_store_custom_endpoint_enabled %}` should have
been `{% if matrix_s3_media_store_custom_endpoint_enabled %}` instead.
Related to #132 (Github Pull Request).
In most cases, there's not really a need to touch the system
firewall, as Docker manages iptables by itself
(see https://docs.docker.com/network/iptables/).
All ports exposed by Docker containers are automatically whitelisted
in iptables and wired to the correct container.
This made installing firewalld and whitelisting ports pointless,
as far as this playbook's services are concerned.
People that wish to install firewalld (for other reasons), can do so
manually from now on.
This is inspired by and fixes#97 (Github Issue).
Fixes#129 (Github Issue).
Unfortunately, we rely on `service_facts`, which is only available
in Ansible >= 2.5.
There's little reason to stick to an old version such as Ansible 2.4:
- some time has passed since we've raised version requirements - it's
time to move into the future (a little bit)
- we've recently (in 82b4640072) improved the way one can run
Ansible in a Docker container
From now on, Ansible >= 2.5 is required.
By default, `--tags=self-check` no longer validates certificates
when `matrix_ssl_retrieval_method` is set to `self-signed`.
Besides this default, people can also enable/disable validation using the
individual role variables manually.
Fixes#124 (Github Issue)
Most (all?) of our Matrix services are running in the `matrix` network,
so they were safe -- not accessible from Coturn to begin with.
Isolating Coturn into its own network is a security improvement
for people who were starting other services in the default
Docker network. Those services were potentially reachable over the
private Docker network from Coturn.
Discussed in #120 (Github Pull Request)
This is more explicit than hiding it in the role defaults.
People who reuse the roles in their own playbook (and not only) may
incorrectly define `ansible_host` to be a hostname or some local address.
Making it more explicit is more likely to prevent such mistakes.
Currently the nginx reload cron fails on Debian 9 because the path to
systemctl is /bin/systemctl rather than /usr/bin/systemctl.
CentOS 7 places systemctl in both /bin and /usr/bin, so we can just use
/bin/systemctl as the full path.
This allows overriding the default value for `include_content`. Setting
this to false allows homeserver admins to ensure that message content
isn't sent in the clear through third party servers.
`matrix_nginx_proxy_data_path` has always served as a base path,
so we're renaming it to reflect that.
Along with this, we're also introducing a new "data path" variable
(`matrix_nginx_proxy_data_path`), which is really a data path this time.
It's used for storing additional, non-configuration, files related to
matrix-nginx-proxy.
It's been reported that YAML parsing errors
would occur on certain Ansible/Python combinations for some reason.
It appears that a bare `{{ matrix_dimension_admins }}` would sometimes
yield things like `[u'@user:domain.com', ..]` (note the `u` string prefix).
To prevent such problems, we now explicitly serialize with `|to_json`.
The Server spec says that redirects should be followed for
`/.well-known/matrix/server`. So we follow them.
The Client-Server specs doesn't mention redirects, so we don't
follow redirects there.
Using `docker_container` with a `cap_drop` argument requires
Ansible >=2.7.
We want to support older versions too (2.4), so we either need to
stop invoking it with `cap_drop` (insecure), or just stop using
the module altogether.
Since it was suffering from other bugs too (not deleting containers
on failure), we've decided to remove `docker_container` usage completely.
Some resources shouldn't be cached right now,
as per https://github.com/vector-im/riot-web/pull/8702
(note all of the suggestions from that pull request were applied,
because some of them do not seem relevant - no such files)
Fixes#98 (Github Issue)
`matrix_synapse_no_tls` is now implicit, so we've gotten rid of it.
The `homeserver.yaml.j2` template has been synchronized with the
configuration generated by Synapse v0.99.1 (some new options
are present, etc.)
For consistency with all our other listeners,
we make this one bind on the `::` address too
(both IPv4 and IPv6).
Additional details are in #91 (Github Pull Request).
People who wish to rely on SRV records can prevent
the `/.well-known/matrix/server` file from being generated
(and thus, served.. which causes trouble).
If someone decides to not use `/.well-known/matrix/server` and only
relies on SRV records, then they would need to serve tcp/8448 using
a certificate for the base domain (not for the matrix) domain.
Until now, they could do that by giving the certificate to Synapse
and setting it terminate TLS. That makes swapping certificates
more annoying (Synapse requires a restart to re-read certificates),
so it's better if we can support it via matrix-nginx-proxy.
Mounting certificates (or any other file) into the matrix-nginx-proxy container
can be done with `matrix_nginx_proxy_container_additional_volumes`,
introduced in 96afbbb5a.
Certain use-cases may require that people mount additional files
into the matrix-nginx-proxy container. Similarly to how we do it
for Synapse, we are introducing a new variable that makes this
possible (`matrix_nginx_proxy_container_additional_volumes`).
This makes the htpasswd file for Synapse Metrics (introduced in #86,
Github Pull Request) to also perform mounting using this new mechanism.
Hopefully, for such an "extension", keeping htpasswd file-creation and
volume definition in the same place (the tasks file) is better.
All other major volumes' mounting mechanism remains the same (explicit
mounting).
Continuation of 1f0cc92b33.
As an explanation for the problem:
when saying `localhost` on the host, it sometimes gets resolved to `::1`
and sometimes to `127.0.0.1`. On the unfortunate occassions that
it gets resolved to `::1`, the container won't be able to serve the
request, because Docker containers don't have IPv6 enabled by default.
To avoid this problem, we simply prevent any lookups from happening
and explicitly use `127.0.0.1`.
This reverts commit 0dac5ea508.
Relying on pyOpenSSL is the Ansible way of doing things, but is
impractical and annoying for users.
`openssl` is easily available on most servers, even by default.
We'd better use that.
Seems like we unintentionally removed the mounting of certificates
(the `/matrix-config` mount) as part of splitting the playbook into
roles in 51312b8250.
It appears that those certificates weren't necessary for coturn to
funciton though, so we might just get rid of the configuration as well.
We run containers as a non-root user (no effective capabilities).
Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"
We'd rather prevent such a potential escalation by dropping ALL
capabilities.
The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
This is a known/intentional regression since f92c4d5a27.
The new stance on this is that most people would not have
dnspython, but may have the `dig` tool. There's no good
reason for not increasing our chances of success by trying both
methods (Ansible dig lookup and using the `dig` CLI tool).
Fixes#85 (Github issue).
This makes all containers (except mautrix-telegram and
mautrix-whatsapp), start as a non-root user.
We do this, because we don't trust some of the images.
In any case, we'd rather not trust ALL images and avoid giving
`root` access at all. We can't be sure they would drop privileges
or what they might do before they do it.
Because Postfix doesn't support running as non-root,
it had to be replaced by an Exim mail server.
The matrix-nginx-proxy nginx container image is patched up
(by replacing its main configuration) so that it can work as non-root.
It seems like there's no other good image that we can use and that is up-to-date
(https://hub.docker.com/r/nginxinc/nginx-unprivileged is outdated).
Likewise for riot-web (https://hub.docker.com/r/bubuntux/riot-web/),
we patch it up ourselves when starting (replacing the main nginx
configuration).
Ideally, it would be fixed upstream so we can simplify.
We do match the defaults anyway (by default that is),
but people can customize `matrix_user_uid` and `matrix_user_uid`
and it wouldn't be correct then.
In any case, it's better to be explicit about such an important thing.
If this is a brand new server and Postgres had never started,
detecting it before we even start it is not possible.
This moves the logic, so that it happens later on, when Postgres
would have had the chance to start and possibly initialize
a new empty database.
Fixes#82 (Github issue)
The matrix-nginx-proxy role can now be used independently.
This makes it consistent with all other roles, with
the `matrix-base` role remaining as their only dependency.
Separating matrix-nginx-proxy was relatively straightforward, with
the exception of the Mautrix Telegram reverse-proxying configuration.
Mautrix Telegram, being an extension/bridge, does not feel important enough
to justify its own special handling in matrix-nginx-proxy.
Thus, we've introduced the concept of "additional configuration blocks"
(`matrix_nginx_proxy_proxy_matrix_additional_server_configuration_blocks`),
where any module can register its own custom nginx server blocks.
For such dynamic registration to work, the order of role execution
becomes important. To make it possible for each module participating
in dynamic registration to verify that the order of execution is
correct, we've also introduced a `matrix_nginx_proxy_role_executed`
variable.
It should be noted that this doesn't make the matrix-synapse role
dependent on matrix-nginx-proxy. It's optional runtime detection
and registration, and it only happens in the matrix-synapse role
when `matrix_mautrix_telegram_enabled: true`.
With this change, the following roles are now only dependent
on the minimal `matrix-base` role:
- `matrix-corporal`
- `matrix-coturn`
- `matrix-mailer`
- `matrix-mxisd`
- `matrix-postgres`
- `matrix-riot-web`
- `matrix-synapse`
The `matrix-nginx-proxy` role still does too much and remains
dependent on the others.
Wiring up the various (now-independent) roles happens
via a glue variables file (`group_vars/matrix-servers`).
It's triggered for all hosts in the `matrix-servers` group.
According to Ansible's rules of priority, we have the following
chain of inclusion/overriding now:
- role defaults (mostly empty or good for independent usage)
- playbook glue variables (`group_vars/matrix-servers`)
- inventory host variables (`inventory/host_vars/matrix.<your-domain>`)
All roles default to enabling their main component
(e.g. `matrix_mxisd_enabled: true`, `matrix_riot_web_enabled: true`).
Reasoning: if a role is included in a playbook (especially separately,
in another playbook), it should "work" by default.
Our playbook disables some of those if they are not generally useful
(e.g. `matrix_corporal_enabled: false`).
We've previously changed a bunch of lists in `homeserver.yaml.j2`
to be serialized using `|to_nice_yaml`, as that generates a more
readable list in YAML.
`matrix_synapse_federation_domain_whitelist`, however, couldn't have
been changed to that, as it can potentially be an empty list.
We may be able to differentiate between empty and non-empty now
and serialize it accordingly (favoring `|to_nice_yaml` if non-empty),
but it's not important enough to be justified. Thus, always
serializing with `|to_json`.
Fixes#78 (Github issue)
Riot-web parses integrations_widgets_urls as a list, thus causing it to incorrectly think Scalar widgets are non-Scalar and not passing the scalar token
As suggested in #63 (Github issue), splitting the
playbook's logic into multiple roles will be beneficial for
maintainability.
This patch realizes this split. Still, some components
affect others, so the roles are not really independent of one
another. For example:
- disabling mxisd (`matrix_mxisd_enabled: false`), causes Synapse
and riot-web to reconfigure themselves with other (public)
Identity servers.
- enabling matrix-corporal (`matrix_corporal_enabled: true`) affects
how reverse-proxying (by `matrix-nginx-proxy`) is done, in order to
put matrix-corporal's gateway server in front of Synapse
We may be able to move away from such dependencies in the future,
at the expense of a more complicated manual configuration, but
it's probably not worth sacrificing the convenience we have now.
As part of this work, the way we do "start components" has been
redone now to use a loop, as suggested in #65 (Github issue).
This should make restarting faster and more reliable.
This change is provoked by a few different things:
- #54 (Github Pull Request), which rightfully says that we need a
way to support ALL mxisd configuration options easily
- the upcoming mxisd 1.3.0 release, which drops support for
property-style configuration (dot-notation), forcing us to
redo the way we generate the configuration file
With this, mxisd is much more easily configurable now
and much more easily maintaneable by us in the future
(no need to introduce additional playbook variables and logic).
As suggested in #65 (Github issue), this patch switches
cronjob management from using templates to using Ansible's `cron` module.
It also moves the management of the nginx-reload cronjob to `setup_ssl_lets_encrypt.yml`,
which is a more fitting place for it (given that this cronjob is only required when
Let's Encrypt is used).
Pros:
- using a module is more Ansible-ish than templating our own files in
special directories
- more reliable: will fail early (during playbook execution) if `/usr/bin/crontab`
is not available, which is more of a guarantee that cron is working fine
(idea: we should probably install some cron package using the playbook)
Cons:
- invocation schedule is no longer configurable, unless we define individual
variables for everything or do something smart (splitting on ' ', etc.).
Likely not necessary, however.
- requires us to deprecate and clean-up after the old way of managing cronjobs,
because it's not compatible (using the same file as before means appending
additional jobs to it)
This means we no longer have a dependency on the `dig` program,
but we do have a dependency on `dnspython`.
Improves things as suggested in #65 (Github issue).
After having multiple people report issues with retrieving
SSL certificates, we've finally discovered the culprit to be
Ansible 2.5.1 (default and latest version on Ubuntu 18.04 LTS).
As silly as it is, certain distributions ("LTS" even) are 13 bugfix
versions of Ansible behind.
From now on, we try to auto-detect buggy Ansible versions and tell the
user. We also provide some tips for how to upgrade Ansible or
run it from inside a Docker container.
My testing shows that Ansible 2.4.0 and 2.4.6 are OK.
All other intermediate 2.4.x versions haven't been tested, but we
trust they're OK too.
From the 2.5.x releases, only 2.5.0 and 2.5.1 seem to be affected.
Ansible 2.5.2 corrects the problem with `include_tasks` + `with_items`.
This is a simplification and a way to make it consistent with
how we do Postgres imports (see 6d89319822), using
files coming from the server, not from the local machine.
By encouraging people NOT to use local files,
we potentially avoid problems such as #34 (Github issue),
where people would download `media_store` to their Mac's filesystem
and case-sensitivity issues will actually corrupt it.
By not encouraging local files usage, it's less likely that
people would copy (huge) directories to their local machine like that.
This is a simplification and a way to make it consistent with
how we do Postgres imports (see 6d89319822), using
files coming from the server, not from the local machine.
Until now, if the .sql file contained invalid data, psql would
choke on it, but still return an exit code of 0.
This is very misleading.
We need to pass `-v ON_ERROR_STOP=1` to make it exit
with a proper error exit code when failures happen.
We've had that logic in 2 places so far, leading to duplication
and a maintenance burden.
In the future, we'll also have an import-postgres feature,
which will also need Postgres version detection,
leading to more benefit from that logic being reusable.
Fixes#18 (Github issue).
It would probably be better if we serve our own page,
as the Matrix one says:
"To use this server you'll need a Matrix client", which
is true, but we install Riot by default and it'd be better if we mention
that instead.
If uppercase is used, certain tools (like certbot) would cause trouble.
They would retrieve a certificate for the lowercased domain name,
but we'd try to use it from an uppercase-named directory, which will
fail.
Besides certbot, we may experience other trouble too.
(it hasn't been investigated how far the breakage goes).
To fix it all, we lowercase `host_specific_hostname_identity` by default,
which takes care of the general use-case (people only setting that
and relying on us to build the other domain names - `hostname_matrix`
and `hostname_riot`).
For others, who decide to override these other variables directly
(and who may work around us and introduce uppercase there directly),
we also have the sanity-check tool warn if uppercase is detected
in any of the final domains.
Adds support for managing certificates manually and for
having the playbook generate self-signed certificates for you.
With this, Let's Encrypt usage is no longer required.
Fixes Github issue #50.
This is in line with what the recommendation is for matrix-corporal.
A value higher than 30 seconds is required to satisfy Riot
(and other clients') default long-polling behavior.
It looks like SELinux can be left running without any (so far) negative
effects on our Matrix services.
There's no need to use `:z` or `:Z` options when mounting volumes either.
This means that files we create are labeled with a default context
(which may not be ideal if we only want them used from containers),
but it's compatible and doesn't cause issues.
Relabelling files is probably something we wish to stay away from,
especially for things like the media store, which contains lots of
files and is possibly on a fuse-mounted (S3/goofys) filesystem.
The new image is built in a much better way (2-stage build)
and is 10x smaller.
In terms of Goofys version recency, it's about the same..
Both images (and others alike) seem to not use version tags,
but rather some `:latest` (master), with ewoutp/goofys being a bit
more recent than clodproto/goofys.
Not using version tags is good (in this case),
because the last Goofys release seems to be from about a year ago
and there had been a bunch of bugfixes afterwards.
This is described in Github issue #58.
Until now, we had the variable, but if you redefined it, you'd run
into multiple problems:
- we actually always mounted some "storage" directory to the Synapse
container. So if your media store is not there, you're out of luck
- homeserver.yaml always hardcoded the path to the media store,
as a directory called "media-store" inside the storage directory.
Relocating to outside the storage directory was out of the question.
Moreover, even if you had simply renamed the media store directory
(e.g. "media-store" -> "media_store"), it would have also caused trouble.
With this patch, we mount the media store's parent to the Synapse container.
This way, we don't care where the media store is (inside storage or
not). We also don't assume (anymore) that the final part of the path
is called "media-store" -- anything can be used.
The "storage" directory and variable (`matrix_synapse_storage_path`)
still remain for compatibility purposes. People who were previously
overriding `matrix_synapse_storage_path` can continue doing so
and their media store will be at the same place.
The playbook no longer explicitly creates the `matrix_synapse_storage_path` directory
though. It's not necessary. If the media store is specified to be within it, it will
get created when the media store directory is created by the playbook.
Previously, it was more necessary to have it
(because we had a dependency between matrix-synapse and matrix-nginx-proxy)..
But nowadays, it can be removed without negative side effects.
Restarting matrix-nginx-proxy is especially bad when the proxy is not installed at all.
mxisd supports several identity stores. Add support to configure two of them:
* synapseSql (storing identities directly in Synapse's database)
* LDAP
This removed the need to copy `mxisd.yaml.j2` to the inventory in case one wants
to use LDAP as identity store. Note that the previous solution (copying
`mxisd.yaml.j2` was poor because of two reasons:
* The copy remains outdated in case the original is updated in future versions
of this repo.
* The role's configuration should be in one place (configured only through role
variables) instead of in multiple.
Configuring more identity stores through role variables can be supported in the
future.
This is provoked by Github issue #46.
No client had made use of the well-known mechanism
so far, so the set up performed by this playbook was not tested
and turned out to be a little deficient.
Even though /.well-known/matrix/client is usually requested with a
simple request (no preflight), it's still considered cross-origin
and [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS)
applies. Thus, the file always needs to be served with the appropriate
`Access-Control-Allow-Origin` header.
Github issue #46 attempts to fix it at the "reverse-proxying" layer,
which may work, but would need to be done for every server.
It's better if it's done "upstream", so that all reverse-proxy
configurations can benefit.
Trying to:
- stay closer to naming in Synapse (autojoin -> auto_join)
- not create new variable namespaces (`matrix_homeserver_`),
when existing ones (`matrix_synapse_`) are more suitable
- allow `null` (`~`) values for `matrix_riot_web_welcome_user_id`
- render things like `auto_join_rooms` in `homeserver.yaml` more prettily
- fix breakage in `config.json` where `matrix_riot_web_roomdir_servers`
was rendered as YAML and not as JSON
- simplify code (especially in riot-web's `config.json`), which used
`if` statements that could have been omitted
- avoid changing comments in `homeserver.yaml` which are not ours,
so that we can keep closer to the configuration file generated by upstream
Pretty much all variables live in their own `matrix_<whatever>`
prefix now and are grouped closer together in the default
variables file (`roles/matrix-server/defaults/main.yml`).
It should be `/bin/mkdir` and `/bin/chown` on Ubuntu 18.04 for example.
Still, it doesn't seem like we need to create and chown these
directories at all, since the playbook takes care of creating them
and setting appropriate permission by itself.
If a network like `matrix-whatever` already exists for some reason,
the `docker_network` module would not create our `matrix` network.
Working around it by avoiding `docker_network` and doing it manually.
Fixes Github issue #12