As suggested in #65 (Github issue), this patch switches
cronjob management from using templates to using Ansible's `cron` module.
It also moves the management of the nginx-reload cronjob to `setup_ssl_lets_encrypt.yml`,
which is a more fitting place for it (given that this cronjob is only required when
Let's Encrypt is used).
Pros:
- using a module is more Ansible-ish than templating our own files in
special directories
- more reliable: will fail early (during playbook execution) if `/usr/bin/crontab`
is not available, which is more of a guarantee that cron is working fine
(idea: we should probably install some cron package using the playbook)
Cons:
- invocation schedule is no longer configurable, unless we define individual
variables for everything or do something smart (splitting on ' ', etc.).
Likely not necessary, however.
- requires us to deprecate and clean-up after the old way of managing cronjobs,
because it's not compatible (using the same file as before means appending
additional jobs to it)
This means we no longer have a dependency on the `dig` program,
but we do have a dependency on `dnspython`.
Improves things as suggested in #65 (Github issue).
After having multiple people report issues with retrieving
SSL certificates, we've finally discovered the culprit to be
Ansible 2.5.1 (default and latest version on Ubuntu 18.04 LTS).
As silly as it is, certain distributions ("LTS" even) are 13 bugfix
versions of Ansible behind.
From now on, we try to auto-detect buggy Ansible versions and tell the
user. We also provide some tips for how to upgrade Ansible or
run it from inside a Docker container.
My testing shows that Ansible 2.4.0 and 2.4.6 are OK.
All other intermediate 2.4.x versions haven't been tested, but we
trust they're OK too.
From the 2.5.x releases, only 2.5.0 and 2.5.1 seem to be affected.
Ansible 2.5.2 corrects the problem with `include_tasks` + `with_items`.
This is a simplification and a way to make it consistent with
how we do Postgres imports (see 6d89319822), using
files coming from the server, not from the local machine.
By encouraging people NOT to use local files,
we potentially avoid problems such as #34 (Github issue),
where people would download `media_store` to their Mac's filesystem
and case-sensitivity issues will actually corrupt it.
By not encouraging local files usage, it's less likely that
people would copy (huge) directories to their local machine like that.
This is a simplification and a way to make it consistent with
how we do Postgres imports (see 6d89319822), using
files coming from the server, not from the local machine.
Until now, if the .sql file contained invalid data, psql would
choke on it, but still return an exit code of 0.
This is very misleading.
We need to pass `-v ON_ERROR_STOP=1` to make it exit
with a proper error exit code when failures happen.
We've had that logic in 2 places so far, leading to duplication
and a maintenance burden.
In the future, we'll also have an import-postgres feature,
which will also need Postgres version detection,
leading to more benefit from that logic being reusable.
Fixes#18 (Github issue).
It would probably be better if we serve our own page,
as the Matrix one says:
"To use this server you'll need a Matrix client", which
is true, but we install Riot by default and it'd be better if we mention
that instead.
If uppercase is used, certain tools (like certbot) would cause trouble.
They would retrieve a certificate for the lowercased domain name,
but we'd try to use it from an uppercase-named directory, which will
fail.
Besides certbot, we may experience other trouble too.
(it hasn't been investigated how far the breakage goes).
To fix it all, we lowercase `host_specific_hostname_identity` by default,
which takes care of the general use-case (people only setting that
and relying on us to build the other domain names - `hostname_matrix`
and `hostname_riot`).
For others, who decide to override these other variables directly
(and who may work around us and introduce uppercase there directly),
we also have the sanity-check tool warn if uppercase is detected
in any of the final domains.
Adds support for managing certificates manually and for
having the playbook generate self-signed certificates for you.
With this, Let's Encrypt usage is no longer required.
Fixes Github issue #50.
This is in line with what the recommendation is for matrix-corporal.
A value higher than 30 seconds is required to satisfy Riot
(and other clients') default long-polling behavior.
It looks like SELinux can be left running without any (so far) negative
effects on our Matrix services.
There's no need to use `:z` or `:Z` options when mounting volumes either.
This means that files we create are labeled with a default context
(which may not be ideal if we only want them used from containers),
but it's compatible and doesn't cause issues.
Relabelling files is probably something we wish to stay away from,
especially for things like the media store, which contains lots of
files and is possibly on a fuse-mounted (S3/goofys) filesystem.
The new image is built in a much better way (2-stage build)
and is 10x smaller.
In terms of Goofys version recency, it's about the same..
Both images (and others alike) seem to not use version tags,
but rather some `:latest` (master), with ewoutp/goofys being a bit
more recent than clodproto/goofys.
Not using version tags is good (in this case),
because the last Goofys release seems to be from about a year ago
and there had been a bunch of bugfixes afterwards.
This is described in Github issue #58.
Until now, we had the variable, but if you redefined it, you'd run
into multiple problems:
- we actually always mounted some "storage" directory to the Synapse
container. So if your media store is not there, you're out of luck
- homeserver.yaml always hardcoded the path to the media store,
as a directory called "media-store" inside the storage directory.
Relocating to outside the storage directory was out of the question.
Moreover, even if you had simply renamed the media store directory
(e.g. "media-store" -> "media_store"), it would have also caused trouble.
With this patch, we mount the media store's parent to the Synapse container.
This way, we don't care where the media store is (inside storage or
not). We also don't assume (anymore) that the final part of the path
is called "media-store" -- anything can be used.
The "storage" directory and variable (`matrix_synapse_storage_path`)
still remain for compatibility purposes. People who were previously
overriding `matrix_synapse_storage_path` can continue doing so
and their media store will be at the same place.
The playbook no longer explicitly creates the `matrix_synapse_storage_path` directory
though. It's not necessary. If the media store is specified to be within it, it will
get created when the media store directory is created by the playbook.
Previously, it was more necessary to have it
(because we had a dependency between matrix-synapse and matrix-nginx-proxy)..
But nowadays, it can be removed without negative side effects.
Restarting matrix-nginx-proxy is especially bad when the proxy is not installed at all.
mxisd supports several identity stores. Add support to configure two of them:
* synapseSql (storing identities directly in Synapse's database)
* LDAP
This removed the need to copy `mxisd.yaml.j2` to the inventory in case one wants
to use LDAP as identity store. Note that the previous solution (copying
`mxisd.yaml.j2` was poor because of two reasons:
* The copy remains outdated in case the original is updated in future versions
of this repo.
* The role's configuration should be in one place (configured only through role
variables) instead of in multiple.
Configuring more identity stores through role variables can be supported in the
future.
This is provoked by Github issue #46.
No client had made use of the well-known mechanism
so far, so the set up performed by this playbook was not tested
and turned out to be a little deficient.
Even though /.well-known/matrix/client is usually requested with a
simple request (no preflight), it's still considered cross-origin
and [CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS)
applies. Thus, the file always needs to be served with the appropriate
`Access-Control-Allow-Origin` header.
Github issue #46 attempts to fix it at the "reverse-proxying" layer,
which may work, but would need to be done for every server.
It's better if it's done "upstream", so that all reverse-proxy
configurations can benefit.
Trying to:
- stay closer to naming in Synapse (autojoin -> auto_join)
- not create new variable namespaces (`matrix_homeserver_`),
when existing ones (`matrix_synapse_`) are more suitable
- allow `null` (`~`) values for `matrix_riot_web_welcome_user_id`
- render things like `auto_join_rooms` in `homeserver.yaml` more prettily
- fix breakage in `config.json` where `matrix_riot_web_roomdir_servers`
was rendered as YAML and not as JSON
- simplify code (especially in riot-web's `config.json`), which used
`if` statements that could have been omitted
- avoid changing comments in `homeserver.yaml` which are not ours,
so that we can keep closer to the configuration file generated by upstream
Pretty much all variables live in their own `matrix_<whatever>`
prefix now and are grouped closer together in the default
variables file (`roles/matrix-server/defaults/main.yml`).
It should be `/bin/mkdir` and `/bin/chown` on Ubuntu 18.04 for example.
Still, it doesn't seem like we need to create and chown these
directories at all, since the playbook takes care of creating them
and setting appropriate permission by itself.
If a network like `matrix-whatever` already exists for some reason,
the `docker_network` module would not create our `matrix` network.
Working around it by avoiding `docker_network` and doing it manually.
Fixes Github issue #12
`--log-driver=none` is used for all Docker containers now.
All these containers are started through systemd anyway and get logged in journald,
so there's no need for Docker to be logging the same thing using the default `json-file` driver.
Doing that was growing `/var/lib/docker/containers/..` infinitely until service/container restart.
As a result of this, things like `docker logs matrix-synapse` won't work anymore.
`journalctl -u matrix-synapse` is how one can see the logs.
If the playbook were to run with `--tags=setup-nginx-proxy`,
it wouldn't go into `setup_corporal.yml`, which meant it wouldn't
perform a bunch of `set_fact` calls which override important
nginx proxy configuration.
We run these variable overrides on each call now (tagged with `always`)
to avoid such problems in the future.
This disables federation on the 80 port, as it's
not necessary. We also disable the old Angular webclient.
For the federation port (8448), we disable the client APIs
as those are not necessary. Those can even cause trouble
if one doesn't know about them and thinks that guarding the client
APIs at the 80 port is enough.
Moving away from using the default bridge network to using our own.
This isolates our services from other Docker containers running
on the default network on the same host.
The benefits are that:
- isolation is a little better - we no longer share a default
bridge network with any other containers that might be running on the host
- there are no longer hard dependencies - we do service discovery
by DNS name, and not via explicit `--link` usage during container start,
so containers can start out of order and fail without bringing down others
with them
(`matrix-nginx-proxy` can continue running, even if one of the other services dies)
In the future, when other services get introduced,
the increased resilience and simplicity will help as well.
Until now, we were starting from a fresh configuration, as generated
by Synapse and manipulating it with regex and line replacements,
until we made it work.
This is more fragile and less predictable, so we're moving to a static
configuration file generated from a Jinja template.
The upside is that configuration will be stable and predictable.
The downside of this new approach is that any manual configuration changes
after the playbook is done, will be thrown away on future playbook
invocations.
There are 2 ways to work around the need for manual configuration
changes though:
- making them part of this playbook and its default template
configuration files (which benefits everyone)
- going your own way for a given host and overriding the template files
that gets used (that is, the
`matrix_synapse_template_synapse_homeserver` or
`matrix_synapse_template_synapse_log` variables)
This playbook does not set up guest access in Synapse anyway,
so until the need comes (or someone asks for it), guest access
is removed from riot-web's UI too.
As for supporting custom URLs, this is also not something
that seems like it'd be useful to most deployments.
Since cbee084ac1, this playbook supports Postgres 10.x,
but keeps existing Postgres-9.x installs on 9.x.
This playbook can now also be ran with `--tags=upgrade-postgres`
to make it upgrade from Postgres 9.x to 10.x (or other versions
in the future).
This playbook just tries to avoid trying to setup a Postgres 10
database with existing 9.x files, as that makes Postgres complain.
Due to this, existing installs (still on 9.x) are detected
and left on Postgres 9.x.
They need to be upgraded to Postgres 10.x manually.
Switching from from avhost/docker-matrix (silviof/docker-matrix)
to matrixdotorg/synapse.
The avhost/docker-matrix (silviof/docker-matrix) image used to bundle
in the coturn STUN/TURN server, so as part of the move,
we're separating this to a separately-ran service
(matrix-coturn.service, powered by instrumentisto/coturn-docker-image)
A `.log.config` file may be generated with a different
level of indentation depending on which (Docker image, etc.)
generates it.
With this patch, we tolerate different levels of indentation
(2 spaces, 4 spaces, etc.) and don't break the configuration.
When using matrix-nginx-proxy, the file permissions are organized
in a way that matrix-nginx-proxy could read the challenge files
produced by acmetool.
However, when another own/external webserver was used (like nginx
with our generated sample configuration), this could not work.
From on we're proxying the HTTP requests to port :402 in such a case,
which fixes the problem.
The matrix-nginx-proxy was reloaded on the 3rd day of the month (`15 4 3 * *`),
which makes no sense - it's too infrequently.
It's in line with the renewal time now (+5 minutes).
The non-working script is supposed to be fixed
by https://github.com/matrix-org/synapse/pull/2375
To have it work, we'd need an updated Docker image
of `silviof/matrix-riot-docker:latest`, which is not yet available
at the time of this commit.
Still, the previous patched synapse_port_db didn't work well either,
so it's not like we're regressing much by getting rid of it.
As described here (
https://github.com/matrix-org/synapse/issues/2438#issuecomment-327424711
), using own SSL certificates for the federation port is more fragile,
as renewing them could cause federation outages.
The recommended setup is to use the self-signed certificates generated
by Synapse.
On the 443 port (matrix-nginx-proxy) side, we still use the Let's Encrypt
certificates, which ensures API consumers work without having to trust
"our own CA".
Having done this, we also don't need to ever restart Synapse anymore,
as no new SSL certificates need to be applied there.
It's just matrix-nginx-proxy that needs to be restarted, and it doesn't
even need a full restart as an "nginx reload" does the job of swithing
to the new SSL certificates.
Moving keeps everything in the /matrix directory, so that we
wouldn't contaminate anything else on the system or risk
clashing with something else.
Also retrieving certificates separately for the Riot and Matrix domains,
which should help in multiple ways:
- allows them to be very different (completely separate base domain..)
- allows for Riot to be disabled for the playbook some time later
and still have the code not break
Let's let the admin set them as they wish.
We don't care what they are anyway.
If other things run on the same server,
it's also better not to hijack these for our
own purposes, especially when we don't need to.
The timedatectl call also seems to fail on Ubuntu 17.04
for some reason (missing timezones information file?).