It's mildly annoying when trying to execute these scripts while logged
in as a regular user, as the missing execute permissions will hinder
autocompletion even when trying to use with sudo.
These shell scripts don't contain secrets, but may fail when ran by a
regular user. The failure is due to the lack of access to the /matrix
directory, and does not result in any damage.
This passes any arguments given to 'matrix-postgres-cli' to the 'psql' command.
Examples:
$ # start an interactive shell connected to a given db
$ sudo matrix-postgres-cli -d synapse
$ # run a query, non-interactively
$ sudo matrix-postgres-cli -d synapse -c 'SELECT group_id FROM groups;'
These are just defensive cleanup tasks that we run.
In the good case, there's nothing to kill or remove, so they trigger an
error like this:
> Error response from daemon: Cannot kill container: something: No such container: something
and:
> Error: No such container: something
People often ask us if this is a problem, so instead of always having to
answer with "no, this is to be expected", we'd rather eliminate it now
and make logs cleaner.
In the event that:
- a container is really stuck and needs cleanup using kill/rm
- and cleanup fails, and we fail to report it because of error
suppression (`2>/dev/null`)
.. we'd still get an error when launching ("container name already in use .."),
so it shouldn't be too hard to investigate.
In short, this makes Synapse a 2nd class citizen,
preparing for a future where it's just one-of-many homeserver software
options.
We also no longer have a default Postgres superuser password,
which improves security.
The changelog explains more as to why this was done
and how to proceed from here.
We need to suppress systemd service-stopping requests in certain rare
cases like https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/771
That issue seems to describe a case, where a migration from mxisd to
ma1sd was happening (DB files had just been moved), and then we were
attemping to stop `matrix-ma1sd.service` so we could import that database into
Postgres. However, there's neither `matrix-mxisd.service`, nor
`matrix-ma1sd.service` after `migrate_mxisd.yml` had just run, so
stopping `matrix-ma1sd.service` was failing.
In cases where pgloader is not enough and we need to do some additional
migration work after it, we can now use
`additional_psql_statements_list` and
`additional_psql_statements_db_name`.
This is to be used when migrating `matrix-registration`'s data at the
very least.
We were running into conflicts, because having initialized
the roles (users) and databases, trying to import leads to
errors (role XXX already exists, etc.).
We were previously ignoring the Synapse database (`homeserver`)
when upgrading/importing, because that one gets created by default
whenever the container starts.
For our additional databases, it's a similar situation now.
It's not created by default as soon as Postgres starts with an empty
database, but rather we create it as part of running the playbook.
So we either need to skip those role/database creation statements
while upgrading/importing, or to avoid creating the additional database
and rely on the import for that. I've gone for the former, because
it's already similar to what we were doing and it's simpler
(it lets `setup_postgres.yml` be the same in all scenarios).
Instead of passing the connection string, we can now pass a name of a
variable, which contains a connection string.
Both are supported for having extra flexibility.
Since we'll likely have generic SQLite database importing
via [pgloader](https://pgloader.io/) for migrating bridge
databases from SQLite to Postgres, we'd rather avoid
calling the "import Synapse SQLite database" command
as just `--tags=import-sqlite-db`.
Similarly, for the media store, we'd like to mention that it's
related to Synapse as well.
We'd like to be more explicit, so as to be less confusing,
especially in light of other homeserver implementations
coming in the future.
While these modules are really nice and helpful, we can't use them
for at least 2 reasons:
- for us, Postgres runs in a container on a private Docker network
(`--network=matrix`) without usually being exposed to the host.
These modules execute on the host so they won't be able to reach it.
- these modules require `psycopg2`, so we need to install it before
using it. This might or might not be its own can of worms.
The tasks in `create_additional_databases.yml` will likely
ensure `matrix-postgres.service` is started, etc.
If no additional databases are defined, we'd rather not execute that
file and all these tasks that it may do in the future.
> Invalid data passed to 'loop', it requires a list, got this instead: matrix_postgres_additional_databases. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup.
Well, or working around it, as I've done in this commit (which seems
more sane than `wantlist=True` stuff).
To avoid needing to have `jq` installed on the machine, we could:
- try to run jq in a Docker container using some small image providing
that
- better yet, avoid `jq` altogether
Moving it above the "uninstalling" set of tasks is better.
Extracting it out to another file at the same time, for readability,
especially given that it will probably have to become more complex in
the future (potentially installing `jq`, etc.)
If a service is enabled, a database for it is created in postgres with a uniqque password. The service can then use this database for data storage instead of relying on sqlite.
The Docker 19.04 -> 20.10 upgrade contains the following change
in `/usr/lib/systemd/system/docker.service`:
```
-BindsTo=containerd.service
-After=network-online.target firewalld.service containerd.service
+After=network-online.target firewalld.service containerd.service multi-user.target
-Requires=docker.socket
+Requires=docker.socket containerd.service
Wants=network-online.target
```
The `multi-user.target` requirement in `After` seems to be in conflict
with our `WantedBy=multi-user.target` and `After=docker.service` /
`Requires=docker.service` definitions, causing the following error on
startup for all of our systemd services:
> Job matrix-synapse.service/start deleted to break ordering cycle starting with multi-user.target/start
A workaround which appears to work is to add `DefaultDependencies=no`
to all of our services.
`-v` magically creates the source destination as a directory,
if it doesn't exist already. We'd like to avoid this magic
and the potential breakage that it might cause.
We'd rather fail while Docker tries to find things to `--mount`
than have it automatically create directories and fail anyway,
while having contaminated the filesystem.
There's a lot more `-v` instances remaining to be fixed later on.
This is just some start.
Things like `matrix_synapse_container_additional_volumes` and
`matrix_nginx_proxy_container_additional_volumes` were not changed to
use `--mount`, as options for each one are passed differently
(`ro` is `ro`, but `rw` doesn't exist and `slave` is `bind-propagation=slave`).
To avoid breaking people's custom volume mounts, we keep it as it is for now.
A deficiency with `--mount` is that it lacks the `z` option (SELinux
ownership changes), and some of our `-v` instances use that. I'm not
sure how supported SELinux is for us right now, but it might be,
and breaking that would not be a good idea.
If the SQLite database was from an older version of Synapse, it appears
that Synapse would try to run migrations on it first, before importing.
This was failing, because the file wasn't writable.
Hopefully, this fixes the problem.
We recently had a report of the Postgres backup container's log file
growing the size of /var/lib/docker until it ran out of disk space.
Trying to prevent similar problems in the future.
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
the current version fails the import, because the volume for the media is missing. It still fails if you have the optional shared secret password provider is enabled, so that might need another mount. Commenting out the password provider in the hoimeserver.yaml during the run works as well.