This switches the `docker exec` method of spawning
Synapse workers inside the `matrix-synapse` container with
dedicated containers for each worker.
We also have dedicated systemd services for each worker,
so this are now:
- more consistent with everything else (we don't use systemd
instantiated services anywhere)
- we don't need the "parse systemd instance name into worker name +
port" part
- we don't need to keep track of PIDs manually
- we don't need jq (less depenendencies)
- workers dying would be restarted by systemd correctly, like any other
service
- `docker ps` shows each worker separately and we can observe resource
usage
We do this by creating one more layer of indirection.
First we reach some generic vhost handling matrix.DOMAIN.
A bunch of override rules are added there (capturing traffic to send to
ma1sd, etc). nginx-status and similar generic things also live there.
We then proxy to the homeserver on some other vhost (only Synapse being
available right now, but repointing this to Dendrite or other will be
possible in the future).
Then that homeserver-specific vhost does its thing to proxy to the
homeserver. It may or may not use workers, etc.
Without matrix-corporal, the flow is now:
1. matrix.DOMAIN (matrix-nginx-proxy/matrix-domain.conf)
2. matrix-nginx-proxy/matrix-synapse.conf
3. matrix-synapse
With matrix-corporal enabled, it becomes:
1. matrix.DOMAIN (matrix-nginx-proxy/matrix-domain.conf)
2. matrix-corporal
3. matrix-nginx-proxy/matrix-synapse.conf
4. matrix-synapse
(matrix-corporal gets injected at step 2).
This removes some `multi-target.wants` symlinks as well, etc.
But despite systemd saying:
> Removed symlink /etc/systemd/system/matrix-synapse.service.wants/matrix-synapse-worker@appservice:0.service
.. I still see such symlinks tehre for me for some reason, so keeping the
code (below) to find & delete them still seems like a good idea.
There was a `matrix_nginx_proxy_enabled|default(False)` check, but:
- it didn't seem to work reliably for some reason (hmm)
- referring to a `matrix_nginx_proxy_*` variable from within the
`matrix-synapse` role is not ideal
- exposing always happened on `127.0.0.1`, which may not be good enough
for some rarer setups (where the own webserver is external to the host)
I guess it didn't hurt to do it until now, but it's not great serving
federation APIs on the client-server API port, etc.
matrix-corporal doesn't work yet (still something to be solved in the
future), but its firewalling operations will also be sabotaged
by Client-Server APIs being served on the federation port (it's a way to get around its firewalling).
If we load it at runtime, during matrix-synapse role execution,
it's good enough for matrix-synapse and all roles after that,
but.. it breaks when someone uses `--tags=setup-nginx-proxy` alone.
The downside of including this vars file like this in `setup.yml`
is that the variables contained in it cannot be overriden by the user
(in their inventory's `vars.yml`).
... but it's not like overriding these variables was possible anyway
when including them at runtime.
Some people run Coturn or Jitsi, etc., by themselves and disable it
in the playbook.
Because the playbook is trying to be nice and clean up after itself,
it was deleting these Docker images.
However, people wish to pull and use them separately and would rather
they don't get deleted.
We could make this configurable for the sake of this special case, but
it's simpler to just avoid deleting these images.
It's not like this "cleaning things up" thing works anyway.
As time goes on, the playbook gets updated with newer image tags
and we leave so many images behind. If one doesn't run
`docker system prune -a` manually once in a while, they'd get swamped
with images anyway. Whether we leave a few images behind due to the lack
of this cleanup now is pretty much irrelevant.
We log everything in systemd/journald for every service already,
so there's no need for double-logging, bridges rotating log files
manually and other such nonsense.
In short, this makes Synapse a 2nd class citizen,
preparing for a future where it's just one-of-many homeserver software
options.
We also no longer have a default Postgres superuser password,
which improves security.
The changelog explains more as to why this was done
and how to proceed from here.
I had intentionally held it back in 39ea3496a4
until:
- it received more testing (there were a few bugs during the
migration, but now it seems OK)
- this migration guide was written
While administering we will occasionally invoke this script interactively with the "non-interactive" switch still there, yet still sit at the desk waiting for 300 seconds for this timer to run out.
The systemd-timer already uses a 3h randomized delay for automatic renewals, which serves this purpose well.
The `mobile` branch got merged to `master`, which ends up becoming
`:latest`. It's a "rewrite" of the bridge's backend and only
supports a Postgres database.
We'd like to go back (well, forward) to `:latest`, but that will take
a little longer, because:
- we need to handle and document things for people still on SQLite
(especially those with external Postgres, who are likely on SQLite for
bridges)
- I'd rather test the new builds (and migration) a bit before
releasing it to others and possibly breaking their bridge
Brave ones who are already using the bridge with Postgres
can jump on `:latest` and report their experience.
Fixes https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/756
Related to https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/737
I feel like timers are somewhat more complicated and dirty (compared to
cronjobs), but they come with these benefits:
- log output goes to journald
- on newer systemd distros, you can see when the timer fired, when it
will fire, etc.
- we don't need to rely on cron (reducing our dependencies to just
systemd + Docker)
Cronjobs work well, but it's one more dependency that needs to be
installed. We were even asking people to install it manually
(in `docs/prerequisites.md`), which could have gone unnoticed.
Once in a while someone says "my SSL certificates didn't renew"
and it's likely because they forgot to install a cron daemon.
Switching to systemd timers means that installation is simpler
and more unified.
This reverts commit 2a25b63bb6.
Looking at other roles, we trigger building regardless of this.
It's better to always trigger it, because it's less fragile.
If the build fails and we only trigger it on "git changes"
then we won't trigger it for a while. That's not good.
Triggering it each and every time may seem like a waste,
but it supposedly runs quickly due to Docker caching.
This variable has been useless since 2019-01-08.
We probably don't need to check for its usage anymore,
given how much time has passed since then, but ..
Before we potentially clone to that path, we'd better make sure it exists.
We also simplify `when` statements a bit.
Given that we're in `setup_install.yml`, we know that the bridge is enabled,
so there's no need to check for that.
Not sure if it breaks with them or not, but no other directive
uses quotes and the nginx docs show examples without quotes,
so we're being consistent with all of that.
The different configurations are now all lower case, for consistent
naming.
`matrix_nginx_proxy_ssl_config` is now called
`matrix_nginx_proxy_ssl_preset`. The different options for "modern",
"intermediate" and "old" are stored in the main.yml file, instead of
being hardcoded in the configuration files. This will improve the
maintainability of the code.
The "custom" preset was removed. Now if one of the variables is set, it
will use it instead of the preset. This will allow to mix and match more
easily, for example using all the intermediate options but only
supporting TLSv1.2. This will also provide better backward
compatibility.
While it's kind of nice having it, it's also somewhat raw
and unnecessary.
Having a good default and not even mentioning it seems better
for most users.
People who need a more exposed bridge (rare) can use
override the default configuration using
`matrix_mautrix_signal_configuration_extension_yaml`.
The answer to these is: it's good to have them in both places.
The role defines the obvious things it depends on (not knowing
what setup it will find itself into), and then
`group_vars/matrix_servers` "extends" it based on everything else it
knows (the homeserver being Synapse, whether or not the internal
Postgres server is being used, etc.)
We need to suppress systemd service-stopping requests in certain rare
cases like https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/771
That issue seems to describe a case, where a migration from mxisd to
ma1sd was happening (DB files had just been moved), and then we were
attemping to stop `matrix-ma1sd.service` so we could import that database into
Postgres. However, there's neither `matrix-mxisd.service`, nor
`matrix-ma1sd.service` after `migrate_mxisd.yml` had just run, so
stopping `matrix-ma1sd.service` was failing.
Fixes a problem like this:
> File "/usr/lib/python3.8/site-packages/mautrix/bridge/e2ee.py", line 79, in __init__
> raise RuntimeError("Unsupported database scheme")
mautrix-python's e2ee.py module expects to find `postgres://` instead of
`postgresql://`.
Our old (base-path -> data-path) SQLite migration can't work otherwise.
It's probably not necessary to keep it anymore, but since we still do,
at least we should take care to ensure it works.
Raspbian doesn't seem to support arm64, so this is somewhat pointless
right now.
However, they might in the future. Doing this should also unify us
some more with `setup_debian.yml` with the ultimate goal of
eliminating `setup_raspbian.yml`.
Until now, we've only supported non-amd64 on Raspbian.
Seems like there are now people running Debian/Ubuntu on ARM,
so we were forcing them into amd64 Docker packages.
I've gotten a report that this change fixes support
for Ubuntu Server 20.04 on RPi 4B.
A new variable called `matrix_nginx_proxy_ssl_config` is created for
configuring how the nginx proxy configures SSL. Also a new configuration
validation option and other auxiliary variables are created.
A new variable configuration called `matrix_nginx_proxy_ssl_config` is
created. This allow to set the SSL configuration easily using the
default options proposed by Mozilla. The default configuration is set to
"Intermediate", removing the weak ciphers used in the old
configurations.
The new variable can also be set to "Custom" for a more granular control.
This allows to set another three variables called:
- `matrix_nginx_proxy_ssl_protocols`,
- `matrix_nginx_proxy_ssl_prefer_server_ciphers`
- `matrix_nginx_proxy_ssl_ciphers`
Also a new task is added to validate the SSL configuration variable.
Revert "Correct inabillity for appservice-discord to connect"
This reverts commit 673e19f830.
While certain things do work even with such a local URL, sending
messages leads to an error like this:
> [DiscordBot] verbose: DiscordAPIError: Invalid Form Body
> avatar_url: Not a well formed URL.
Fixes https://github.com/Half-Shot/matrix-appservice-discord/issues/649
The sample configuration file for appservice-discord
c29cfc72f5/config/config.sample.yaml (L8)
explicitly says that we need a public URL.
Now that 0.7.2 is out, the Docker image supports Postgres
and we can do the (SQLite -> Postgres) migration.
I've also found out that we needed to fix up the `tokens.ex_date` column
data type a bit to prevent matrix-registration from raising exceptions
when comparing `datetime.now()` with `ex_date` coming from the database.
Example:
> File "/usr/local/lib/python3.8/site-packages/matrix_registration/tokens.py", line 58, in valid
> expired = self.ex_date < datetime.now()
> TypeError: can't compare offset-naive and offset-aware datetimes
In cases where pgloader is not enough and we need to do some additional
migration work after it, we can now use
`additional_psql_statements_list` and
`additional_psql_statements_db_name`.
This is to be used when migrating `matrix-registration`'s data at the
very least.
This switches us to a container image maintained by the
matrix-registration developer.
0.7.2 also supports a `base_url` configuration option we can use to
make it easier to reverse-proxy at a different base URL.
We still keep some workarounds, because of this issue:
https://github.com/ZerataX/matrix-registration/issues/47
We were running into conflicts, because having initialized
the roles (users) and databases, trying to import leads to
errors (role XXX already exists, etc.).
We were previously ignoring the Synapse database (`homeserver`)
when upgrading/importing, because that one gets created by default
whenever the container starts.
For our additional databases, it's a similar situation now.
It's not created by default as soon as Postgres starts with an empty
database, but rather we create it as part of running the playbook.
So we either need to skip those role/database creation statements
while upgrading/importing, or to avoid creating the additional database
and rely on the import for that. I've gone for the former, because
it's already similar to what we were doing and it's simpler
(it lets `setup_postgres.yml` be the same in all scenarios).
Auto-migration and everything seems to work. It's just that
matrix-registration cannot load the Python modules required
for talking to a Postgres database.
Tracked here: https://github.com/ZerataX/matrix-registration/issues/44
Until this gets fixed, we'll continue default to 'sqlite'.
I was thinking that it makes sense to be more specific,
and using `_postgres_` also separated these variables
from the `_database_` variables that ended up in bridge configuration.
However, @jdreichmann makes a good point
(https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/740#discussion_r542281102)
that we don't need to be so specific and can allow for other engines (like MySQL) to use these variables.
Regression since 2d99ade72f and 9bf8ce878e, respectively.
When SQLite is to be used, these bridges expect an `sqlite://`
connection string, and not a plain file name (path), like Appservice
Discord and mautrix-whatsapp do.
Instead of passing the connection string, we can now pass a name of a
variable, which contains a connection string.
Both are supported for having extra flexibility.
Since we'll likely have generic SQLite database importing
via [pgloader](https://pgloader.io/) for migrating bridge
databases from SQLite to Postgres, we'd rather avoid
calling the "import Synapse SQLite database" command
as just `--tags=import-sqlite-db`.
Similarly, for the media store, we'd like to mention that it's
related to Synapse as well.
We'd like to be more explicit, so as to be less confusing,
especially in light of other homeserver implementations
coming in the future.
People can toggle between them now. The playbook also defaults
to using SQLite if an external Postgres server is used.
Ideally, we'd be able to create databases/users in external Postgres
servers as well, but our initialization logic (and `docker run` command,
etc.) hardcode too many things right now.
While these modules are really nice and helpful, we can't use them
for at least 2 reasons:
- for us, Postgres runs in a container on a private Docker network
(`--network=matrix`) without usually being exposed to the host.
These modules execute on the host so they won't be able to reach it.
- these modules require `psycopg2`, so we need to install it before
using it. This might or might not be its own can of worms.
The tasks in `create_additional_databases.yml` will likely
ensure `matrix-postgres.service` is started, etc.
If no additional databases are defined, we'd rather not execute that
file and all these tasks that it may do in the future.
> Invalid data passed to 'loop', it requires a list, got this instead: matrix_postgres_additional_databases. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup.
Well, or working around it, as I've done in this commit (which seems
more sane than `wantlist=True` stuff).
To avoid needing to have `jq` installed on the machine, we could:
- try to run jq in a Docker container using some small image providing
that
- better yet, avoid `jq` altogether
Moving it above the "uninstalling" set of tasks is better.
Extracting it out to another file at the same time, for readability,
especially given that it will probably have to become more complex in
the future (potentially installing `jq`, etc.)
v0.7.0 is broken right now, because it calls
`/_matrix/client/r0/admin/register`, which is now at
`/_synapse/admin/v1/register`.
This has been fixed here: 6b26255fea
.. but it's not part of any release.
Switching to `master` (`docker.io/devture/zeratax-matrix-registration:latest`) until it gets resolved.
Reported upstream here: https://github.com/ZerataX/matrix-registration/issues/43
Starting with Docker 20.10, `--hostname` seems to have the side-effect
of making Docker's internal DNS server resolve said hostname to the IP
address of the container.
Because we were giving the mailer service a hostname of `matrix.DOMAIN`,
all requests destined for `matrix.DOMAIN` originating from other
services on the container network were resolving to `matrix-mailer`.
This is obviously wrong.
Initially reported here: https://github.com/spantaleev/matrix-docker-ansible-deploy/pull/748
We normally try to not use the public hostname (and IP address) on the
container network and try to make services talk to one another locally,
but it sometimes could happen.
With this, we use a `matrix-mailer` hostname for the matrix-mailer
container. My testing shows that it doesn't cause any trouble with
email deliverability.
If a service is enabled, a database for it is created in postgres with a uniqque password. The service can then use this database for data storage instead of relying on sqlite.
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
After recently updating my matrix-docker-ansible-deploy installation, matrix-appservice-discord would refuse to start, logging ECONNREFUSED to https://matrix.[mydomain]:443, which was resolving to 172.18.0.2 due to the `--hostname` in mailer grabbing that hostname.
Curious why the IRC bridge didn't have this issue, I looked into it, and it was connecting to `http://matrix-synapse:8008`. Correcting this one to that URL resolved the issue.