obs-portal/docs/architecture.md

197 lines
9.1 KiB
Markdown
Raw Normal View History

2021-11-28 21:51:47 +00:00
# Portal Architecture
Here I try to describe how the portal works in general, and which parts are
needed and developed in this repository. There is some variation possible for a
full setup, for example the data flow of the rendered data tiles can be
different. This article describes the standard production setup.
## General overview of the components
* **api**: A python process using sanic to provide a HTTP interface. Everything
revolves around this.
* **postgresql**: A database instance.
* **frontend**: A React based web application.
* **worker**: Optional, a dedicated process for processing of tracks
* **keycloak**: An installation of [Keycloak](https://www.keycloak.org/) which
stores user credentials and provides a secure login, registration, password
recovery, and more.
* **tools**: Scripts to run as an operator of the application for various setup
and maintenance task.
![Architecture Overview](./architecture-portal.png)
## PostgreSQL
This is a database instance running the modified postgresql docker image
`openmaptiles/postgis:6.0`. This includes the extensions `postgis` and
`hstore`, among others, used for geospatial data processing.
You can try to use an external postgresql installation instead of the docker
image, however, a lot of prequisites have to be installed into that database.
You can check out how the docker image is generated in [its
repository](https://github.com/openmaptiles/openmaptiles-tools/tree/master/docker/postgis)
and try to replicate that setup. However, this is generally not supported by
the developers of the OpenBikeSensor portal.
## API
The API is written in Python 3 with [sanic](https://sanicframework.org/) for
HTTP handling. It supports Python 3.6+ and comes with a list of dependencies
that is required. One of those is `openmaptiles-tools`, which is installed from
git (see `api/requirements.txt`). The API also depends on the `obs.face`
package, which is included as a submodule and developed [in its own
repository](https://github.com/openbikesensor/OpenBikeSensor-Scripts).
The API has the following tasks:
* Handle user authentication through keycloak
* Receive track uploads and serve track data and statistics via a RESTful API
* Process received tracks (unless using a dedicated worker, see below)
* Publish vector tiles directly from the database (if installed and configured)
### Authentication
The frontend can redirect to `$API_URL/login` to trigger a login. The API
negotiates a session and redirects the user agent to the keycloak instance.
Upon successful authentication, it receives user data and generates a user
object (or discovers the existing one) for the authenticated keycloak user.
A session is instanciated and kept alive through a session cookie. The API
currently stores session data in memory, so scaling the API process to more
replicas is not yet unsupported.
### RESTful API
There is not a lot to talk about here. The routes are pretty self explanatory,
please refer to the code for the current API. Consider it unstable as of now.
There are routes for general info (version number), track and recording
statistics (by user and time range), user management and track management.
### Track processing
If a dedicated worker is not used, the API runs the same logic as the worker
(see below), in an asyncio "background" task. It is however *not* threaded, so
it may block API request while processing tracks. This is the reason why a
dedicated worker is recommended, though for a simple or low traffic setup, it
is definitely not required. Configure whether you're using a dedicated worker
through the `DEDICATED_WORKER` api config flag.
### Publish vector tiles
Thanks to the [OpenMapTiles](https://openmaptiles.org/) project, we're able to
generate vector tiles from live data, directly in the PostGIS database. The
general workflow is as follows:
* We have defined a schema compatible with the `openmaptiles-tools` collection
that defines how to collect geospatial data from the postgresql database.
This depends on its `postgis` extension for computing geospatial information
(e.g. intersecting with a bounding box). This schema consists of a number of
layers, which contain SQL code that is used to produce the layer's geometries
and their attached properties.
* The `tools/prepare_sql_tiles.py` tool calls the respective scripts from
`openmaptiles-tools`, to compile all required SQL code into functions,
generate the "destination" function `getmvt` for generating a vector tile,
and store these [User-Defined
Functions](https://www.postgresql.org/docs/current/xfunc.html) in the
database.
* When a tile is requested from the Map Renderer through
`/tiles/{z}/{x}/{y}.pbf`, the API calls `getmvt` to have postgresql generate
the tile's content on the fly, and serves the result through HTTP.
For all of this to work, the `openmaptiles-tools` must be installed, and the
database has to prepared with the functions once, by use of the
`api/tools/prepare_sql_tiles.py` script. That script should be rerun every time the
schema changes, but doesn't need to be used if the data in the database was
edited, e.g. by uploading and processing a new track.
## Frontend
The frontend is written in React, using Semantic UI
([semantic-ui-react](https://react.semantic-ui.com/) and
[semantic-ui-less](https://www.npmjs.com/package/semantic-ui-less)), compiled
with Webpack. In a simple production setup, the frontend is compiled statically
and served by the API.
The `openbikesensor-portal` image (`Dockerfile` in repo root) performs the
build step and stores the compiled bundle and assets in
`/opt/obs/frontend/build`. The API process can simply serve the files from there.
This is done with a catchall route in `obs.api.routes.frontend`, which
determines whether to serve the `index.html` or an asset file. This ensures
that deep URLs in the frontend receive the index file, as frontend routing is
done in the JavaScript code by `react-router`.
In a development setup the frontend is served by a hot reloading development
server (`webpack-dev-server`), compiling into memory and updating as files
change. The frontend is then configured to communicate with the API on a
different URL (usually a different port on localhost), which the API has to
allow with CORS. It is configured to do so with the `FRONTEND_URL` and
`ADDITIONAL_CORS_ORIGINS` config options.
### Maps in the Frontend
The map data is visualized using
[maplibre-gl](https://github.com/MapLibre/maplibre-gl-js), a JavaScript library
for rendering (vector) maps in the browser.
The frontend combines a basemap (for example
[Positron](https://github.com/openmaptiles/positron-gl-style) with vector tiles
from Mapbox or from a custom OpenMapTiles schema vector tile source) with the
overlay data and styles. The overlay data is generated by the API
## Worker
The Worker's job is to import the uploaded track files. The track files are
stored as-is in the filesystem, and will usually follow the [OpenBikeSensor CSV
Format](https://github.com/openbikesensor/OpenBikeSensorFirmware/blob/master/docs/software/firmware/csv_format.md),
as they are generated by the measuring device.
The worker imports and uses the
[`obs.face`](https://github.com/openbikesensor/OpenBikeSensor-Scripts) scripts
to transform the data and extract the relevant events. Those are written into
the PostgreSQL database, such that it is easy to do statistics on them and
generate vector tiles with SQL code (see "Publish vector tiles" above).
The worker determines in a loop which track to process by looking for the oldes
unprocessed track in the database, ie. an entry in the `track` table with
column `processing_status` set to `"queued"`. After proessing the track, the
loop restarts after a short delay. If the worker has not found any track to
process, the delay is longer (typically 10s), to generate less load on the
database and CPU.
This means that uploading a track, within 0-10s the processing is started.
Bulk-reprocessing is possibly by just altering the `processing_status` of all
tracks you want to reprocess in the database directly, e.g. using the `psql`
command line client, for example:
```postgresql
UPDATE track SET processing_status = "queued" WHERE author_id = 100;
```
The worker script is
[`api/tools/process_track.py`](../api/tools/process_track.py). It has its own
command line parser with `--help` option, and uses the `config.py` from the API
for determining the connection to the PostgreSQL database.
## Keycloak
The use of keycloak as an authentication provider simplifies the code of the
portal immensely and lets us focus on actual features instead of authentication
and its security.
The portal might be compatible with other OpenID Connect providers, but only
the use of Keycloak is tested and documented. You can try to integrate with a
different provider -- if changes to the code are needed for this, please let us
know and/or create a Pull Request to share make the software better!
The keycloak configuration is rather straightforward, and it is described
shortly for a testing setup in [README.md](../README.md).
For the full, secure setup, make sure to reference the Keycloak documentation
at <https://www.keycloak.org/documentation>.