This commit is contained in:
Paul Bienkowski 2021-11-28 22:51:47 +01:00
parent 4f382819fd
commit 8c6579b9bf
4 changed files with 274 additions and 0 deletions

File diff suppressed because one or more lines are too long

Binary file not shown.

After

Width:  |  Height:  |  Size: 234 KiB

196
docs/architecture.md Normal file
View file

@ -0,0 +1,196 @@
# Portal Architecture
Here I try to describe how the portal works in general, and which parts are
needed and developed in this repository. There is some variation possible for a
full setup, for example the data flow of the rendered data tiles can be
different. This article describes the standard production setup.
## General overview of the components
* **api**: A python process using sanic to provide a HTTP interface. Everything
revolves around this.
* **postgresql**: A database instance.
* **frontend**: A React based web application.
* **worker**: Optional, a dedicated process for processing of tracks
* **keycloak**: An installation of [Keycloak](https://www.keycloak.org/) which
stores user credentials and provides a secure login, registration, password
recovery, and more.
* **tools**: Scripts to run as an operator of the application for various setup
and maintenance task.
![Architecture Overview](./architecture-portal.png)
## PostgreSQL
This is a database instance running the modified postgresql docker image
`openmaptiles/postgis:6.0`. This includes the extensions `postgis` and
`hstore`, among others, used for geospatial data processing.
You can try to use an external postgresql installation instead of the docker
image, however, a lot of prequisites have to be installed into that database.
You can check out how the docker image is generated in [its
repository](https://github.com/openmaptiles/openmaptiles-tools/tree/master/docker/postgis)
and try to replicate that setup. However, this is generally not supported by
the developers of the OpenBikeSensor portal.
## API
The API is written in Python 3 with [sanic](https://sanicframework.org/) for
HTTP handling. It supports Python 3.6+ and comes with a list of dependencies
that is required. One of those is `openmaptiles-tools`, which is installed from
git (see `api/requirements.txt`). The API also depends on the `obs.face`
package, which is included as a submodule and developed [in its own
repository](https://github.com/openbikesensor/OpenBikeSensor-Scripts).
The API has the following tasks:
* Handle user authentication through keycloak
* Receive track uploads and serve track data and statistics via a RESTful API
* Process received tracks (unless using a dedicated worker, see below)
* Publish vector tiles directly from the database (if installed and configured)
### Authentication
The frontend can redirect to `$API_URL/login` to trigger a login. The API
negotiates a session and redirects the user agent to the keycloak instance.
Upon successful authentication, it receives user data and generates a user
object (or discovers the existing one) for the authenticated keycloak user.
A session is instanciated and kept alive through a session cookie. The API
currently stores session data in memory, so scaling the API process to more
replicas is not yet unsupported.
### RESTful API
There is not a lot to talk about here. The routes are pretty self explanatory,
please refer to the code for the current API. Consider it unstable as of now.
There are routes for general info (version number), track and recording
statistics (by user and time range), user management and track management.
### Track processing
If a dedicated worker is not used, the API runs the same logic as the worker
(see below), in an asyncio "background" task. It is however *not* threaded, so
it may block API request while processing tracks. This is the reason why a
dedicated worker is recommended, though for a simple or low traffic setup, it
is definitely not required. Configure whether you're using a dedicated worker
through the `DEDICATED_WORKER` api config flag.
### Publish vector tiles
Thanks to the [OpenMapTiles](https://openmaptiles.org/) project, we're able to
generate vector tiles from live data, directly in the PostGIS database. The
general workflow is as follows:
* We have defined a schema compatible with the `openmaptiles-tools` collection
that defines how to collect geospatial data from the postgresql database.
This depends on its `postgis` extension for computing geospatial information
(e.g. intersecting with a bounding box). This schema consists of a number of
layers, which contain SQL code that is used to produce the layer's geometries
and their attached properties.
* The `tools/prepare_sql_tiles.py` tool calls the respective scripts from
`openmaptiles-tools`, to compile all required SQL code into functions,
generate the "destination" function `getmvt` for generating a vector tile,
and store these [User-Defined
Functions](https://www.postgresql.org/docs/current/xfunc.html) in the
database.
* When a tile is requested from the Map Renderer through
`/tiles/{z}/{x}/{y}.pbf`, the API calls `getmvt` to have postgresql generate
the tile's content on the fly, and serves the result through HTTP.
For all of this to work, the `openmaptiles-tools` must be installed, and the
database has to prepared with the functions once, by use of the
`api/tools/prepare_sql_tiles.py` script. That script should be rerun every time the
schema changes, but doesn't need to be used if the data in the database was
edited, e.g. by uploading and processing a new track.
## Frontend
The frontend is written in React, using Semantic UI
([semantic-ui-react](https://react.semantic-ui.com/) and
[semantic-ui-less](https://www.npmjs.com/package/semantic-ui-less)), compiled
with Webpack. In a simple production setup, the frontend is compiled statically
and served by the API.
The `openbikesensor-portal` image (`Dockerfile` in repo root) performs the
build step and stores the compiled bundle and assets in
`/opt/obs/frontend/build`. The API process can simply serve the files from there.
This is done with a catchall route in `obs.api.routes.frontend`, which
determines whether to serve the `index.html` or an asset file. This ensures
that deep URLs in the frontend receive the index file, as frontend routing is
done in the JavaScript code by `react-router`.
In a development setup the frontend is served by a hot reloading development
server (`webpack-dev-server`), compiling into memory and updating as files
change. The frontend is then configured to communicate with the API on a
different URL (usually a different port on localhost), which the API has to
allow with CORS. It is configured to do so with the `FRONTEND_URL` and
`ADDITIONAL_CORS_ORIGINS` config options.
### Maps in the Frontend
The map data is visualized using
[maplibre-gl](https://github.com/MapLibre/maplibre-gl-js), a JavaScript library
for rendering (vector) maps in the browser.
The frontend combines a basemap (for example
[Positron](https://github.com/openmaptiles/positron-gl-style) with vector tiles
from Mapbox or from a custom OpenMapTiles schema vector tile source) with the
overlay data and styles. The overlay data is generated by the API
## Worker
The Worker's job is to import the uploaded track files. The track files are
stored as-is in the filesystem, and will usually follow the [OpenBikeSensor CSV
Format](https://github.com/openbikesensor/OpenBikeSensorFirmware/blob/master/docs/software/firmware/csv_format.md),
as they are generated by the measuring device.
The worker imports and uses the
[`obs.face`](https://github.com/openbikesensor/OpenBikeSensor-Scripts) scripts
to transform the data and extract the relevant events. Those are written into
the PostgreSQL database, such that it is easy to do statistics on them and
generate vector tiles with SQL code (see "Publish vector tiles" above).
The worker determines in a loop which track to process by looking for the oldes
unprocessed track in the database, ie. an entry in the `track` table with
column `processing_status` set to `"queued"`. After proessing the track, the
loop restarts after a short delay. If the worker has not found any track to
process, the delay is longer (typically 10s), to generate less load on the
database and CPU.
This means that uploading a track, within 0-10s the processing is started.
Bulk-reprocessing is possibly by just altering the `processing_status` of all
tracks you want to reprocess in the database directly, e.g. using the `psql`
command line client, for example:
```postgresql
UPDATE track SET processing_status = "queued" WHERE author_id = 100;
```
The worker script is
[`api/tools/process_track.py`](../api/tools/process_track.py). It has its own
command line parser with `--help` option, and uses the `config.py` from the API
for determining the connection to the PostgreSQL database.
## Keycloak
The use of keycloak as an authentication provider simplifies the code of the
portal immensely and lets us focus on actual features instead of authentication
and its security.
The portal might be compatible with other OpenID Connect providers, but only
the use of Keycloak is tested and documented. You can try to integrate with a
different provider -- if changes to the code are needed for this, please let us
know and/or create a Pull Request to share make the software better!
The keycloak configuration is rather straightforward, and it is described
shortly for a testing setup in [README.md](../README.md).
For the full, secure setup, make sure to reference the Keycloak documentation
at <https://www.keycloak.org/documentation>.

77
docs/licenses.md Normal file
View file

@ -0,0 +1,77 @@
# Licenses
The world of software licenses is a confusing one, and you should be aware of a
few licenses that apply to you if you use this software and its dependencies,
or the data generated with it.
**Disclaimer:** This document is just an overview of things to look out for,
and no legal advice. It may even be incorrect or incomplete. Do your own
research or consult a professional.
#### Glossary
* OSM: [OpenStreetMap](https://openstreetmap.org)
* OBS: [OpenBikeSensor](https://openbikesensor.org)
* OMT: [OpenMapTiles](https://openmaptiles.org)
## OpenBikeSensor Portal
Our own software (the python packages `obs.face` and `obs.api`, the react
frontend, the accompanying documentation, etc.) are licensed as LGPL-3.0. You
will find the respective license files in the repositories.
## OpenStreetMap data
The moment you start importing OpenStreetMap (OSM) data into your system, its
license applies to everything you derive from it. This usually means you'll
have to visibly credit the OpenStreetMap contributors. Details can be found
here: <https://www.openstreetmap.org/copyright>.
## OpenMapTiles project
The OpenMapTiles (OMT) project has a confusing multi-license setup, with different
licenses covering the code, the schema, and the derived or generated data.
In any case, we're using all of that, so make sure to credit the project as
required by their license when using the data in a different context. The
frontend is set up to show the appropriate attributions, you should do the same
when building a different application with the same data.
## Vector tiles from the API
These are generated using OSM data *and* OMT tools. Both have to be credited.
Thanks to OSM being distributed under [ODbL
1.0](https://opendatacommons.org/licenses/odbl/), you will have to distribute
this combined data under the same license.
## OBS-only data exports
As of the writing of this document, there is no implemented export of OBS data
(not including OSM data) from the API. We will at some point generate datasets
of OBS data for use with other tools or to merge into a pool database, and
these datasets will not contain OSM-licensed data, so they will have their own
license. The portal software will not dictate the license, so as a portal
operator, you will have to decide this for your platform, and obtain consent
from your users to publish their uploaded data or derivatives thereof under
your chosen license.
The OBS community will at some point decide on an open license for the data
generated by the community-run portal instance(s) and pool servers, and will
recommend the same license to all portal operators for interoperability
reasons.
## Your basemap provider
You have to figure out who you have to credit for the basemap you are using,
regardless of whether you use a provider or host them yourself. In many cases,
you will need to credit at least OpenStreetMap (for the input data) and
OpenMapTiles (for the data scheme and tools used). If using a commercial
provider (such as Mapbox or MapTiler), read their TOS.
## Code dependencies
Please check the licenses of our code dependencies to figure out whether they
restrict what you can do with them. See `frontend/package.json` and
`api/requirements.txt` for which dependencies are required for installation and
operation.