Reverts b1b4ba501f, 90c9801c56, a3c84f78ca, ..
I haven't really traced it (yet), but on some servers, I'm observing
`ansible-playbook ... --tags=start` completing very slowly, waiting
to stop services. I can't reproduce this on all Matrix servers I manage.
I suspect that either the systemd version is to blame or that some
specific service is not responding well to some `docker kill/rm` command.
`ExecStop` seems to work great in all cases and it's what we've been
using for a very long time, so I'm reverting to that.
We had to remove UID/GID environment variables that we used to pass
to the Synapse container, because it was causing a problem after
https://github.com/matrix-org/synapse/pull/11209
We were using both `--user` and UID/GID environment variables until now.
Fixes a regression caused by a5ee39266c.
If the user id and group id were different than 991:991
(which used to be a hardcoded default for us long ago),
there was a mismatch between what Synapse was trying to use (991:991)
and what it was actually started with (in `--user=..`). It was then
trying to change ownership, which was failing.
This was mostly affecting newer installations which were not using the
991:991 defaults we had long ago (since a1c5a197a9).
This switches the `docker exec` method of spawning
Synapse workers inside the `matrix-synapse` container with
dedicated containers for each worker.
We also have dedicated systemd services for each worker,
so this are now:
- more consistent with everything else (we don't use systemd
instantiated services anywhere)
- we don't need the "parse systemd instance name into worker name +
port" part
- we don't need to keep track of PIDs manually
- we don't need jq (less depenendencies)
- workers dying would be restarted by systemd correctly, like any other
service
- `docker ps` shows each worker separately and we can observe resource
usage