Docs for new pipeline

parent ac90d50239 · commit c4cc4a9078

README.md (42 lines changed)
@@ -164,7 +164,7 @@ You will need to re-run this command after updates, to migrate the database and
 (re-)create the functions in the SQL database that are used when generating
 vector tiles.
 
-You should also import OpenStreetMap data now, see below for instructions.
+You should also [import OpenStreetMap data](docs/osm-import.md) now.
 
 ### Boot the application
 
@@ -190,46 +190,6 @@ docker-compose run --rm api alembic upgrade head
 ```
 
 
-## Import OpenStreetMap data
-
-You need to import road information from OpenStreetMap for the portal to work.
-This information is stored in your PostgreSQL database and used when processing
-tracks (instead of querying the Overpass API), as well as for vector tile
-generation. The process applies to both development and production setups. For
-development, you should choose a small area for testing, such as your local
-county or city, to keep the amount of data small. For production use you have
-to import the whole region you are serving.
-
-* Install `osm2pgsql`.
-* Download the area(s) you would like to import from [GeoFabrik](https://download.geofabrik.de).
-* Import each file like this:
-
-```bash
-osm2pgsql --create --hstore --style roads_import.lua -O flex \
-  -H localhost -d obs -U obs -W \
-  path/to/downloaded/myarea-latest.osm.pbf
-```
-
-You might need to adjust the host, database and username (`-H`, `-d`, `-U`) to
-your setup, and also provide the correct password when queried. For the
-development setup the password is `obs`. For production, you might need to
-expose the containers port and/or create a TCP tunnel, for example with SSH,
-such that you can run the import from your local host and write to the remote
-database.
-
-The import process should take a few seconds to minutes, depending on the area
-size. A whole country might even take one or more hours. You should probably
-not try to import `planet.osm.pbf`.
-
-You can run the process multiple times, with the same or different area files,
-to import or update the data. However, for this to work, the actual [command
-line arguments](https://osm2pgsql.org/doc/manual.html#running-osm2pgsql) are a
-bit different each time, including when first importing, and the disk space
-required is much higher.
-
-Refer to the documentation of `osm2pgsql` for assistance. We are using "flex
-mode", the provided script `roads_import.lua` describes the transformations
-and extractions to perform on the original data.
 
 ## Troubleshooting
 
@@ -36,6 +36,8 @@ services:
       - ./tile-generator/data/:/tiles
       - ./api/migrations:/opt/obs/api/migrations
      - ./api/alembic.ini:/opt/obs/api/alembic.ini
+      - ./local/pbf:/pbf
+      - ./local/obsdata:/obsdata
     depends_on:
       - postgres
       - keycloak
docs/osm-import.md (new file, 103 lines)
# Importing OpenStreetMap data

The application requires a lot of data from OpenStreetMap to work.

The required information is stored in the PostgreSQL database and used when
processing tracks, as well as for vector tile generation. The process applies
to both development and production setups. For development, you should choose a
small area for testing, such as your local county or city, to keep the amount
of data small. For production use you have to import the whole region you are
serving.

## General pipeline overview

1. Download OpenStreetMap data as one or more `.osm.pbf` files.
2. Transform this data to generate geometry data for all roads and regions, so
   we don't need to look up nodes separately. This step requires a lot of CPU
   and memory, so it can be done "offline" on a high-power machine.
3. Import the transformed data into the PostgreSQL/PostGIS database.

## Community hosted transformed data

Since the first two steps are the same for everybody, the community will soon
provide a service where relatively up-to-date transformed data can be
downloaded for direct import. Stay tuned.

## Download data

[GeoFabrik](https://download.geofabrik.de) kindly hosts extracts of the
OpenStreetMap planet by region. Download all regions you're interested in from
there in `.osm.pbf` format, with the tool of your choice, e.g.:

```bash
wget -P local/pbf/ https://download.geofabrik.de/europe/germany/baden-wuerttemberg-latest.osm.pbf
```

## Transform data

To transform downloaded data, you can either use the Docker image from a
development or production environment, or locally install the API into your
Python environment. Then run the `api/tools/transform_osm.py` script on the data.

```bash
api/tools/transform_osm.py baden-wuerttemberg-latest.osm.pbf baden-wuerttemberg-latest.msgpack
```

In dockerized setups, make sure to mount your data somewhere in the container
and also mount a directory where the result can be written. The development
setup takes care of this, so you can use:

```bash
docker-compose run --rm api tools/transform_osm.py \
  /pbf/baden-wuerttemberg-latest.osm.pbf /obsdata/baden-wuerttemberg-latest.msgpack
```

Repeat this command for every file you want to transform.
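If you downloaded several regions, the transform step can be scripted as a
simple loop. This is only an illustrative sketch, assuming the development
setup's `local/pbf` and `/obsdata` mounts described above; the loop and the
derived output names are not part of the tools themselves:

```shell
# Transform every downloaded extract; the output name is derived by
# replacing the .osm.pbf suffix with .msgpack.
for pbf in local/pbf/*.osm.pbf; do
  [ -e "$pbf" ] || continue  # glob matched nothing, skip
  name=$(basename "$pbf" .osm.pbf)
  docker-compose run --rm api tools/transform_osm.py \
    "/pbf/$name.osm.pbf" "/obsdata/$name.msgpack"
done
```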
## Import transformed data

The command for importing looks like this:

```bash
api/tools/import_osm.py baden-wuerttemberg-latest.msgpack
```

This tool reads your application config from `config.py`, so set that up first
as if you were setting up your application.

In dockerized setups, make sure to mount your data somewhere in the container.
Again, the development setup takes care of this, so you can use:

```bash
docker-compose run --rm api tools/import_osm.py \
  /obsdata/baden-wuerttemberg-latest.msgpack
```

The import process should take a few seconds to minutes, depending on the area
size. You can run the process multiple times, with the same or different area
files, to import or update the data. You can update only one region and leave
the others as they are, or add more filenames to the command line to
bulk-import data.

## How this works

* The transformation is done with a Python script that uses
  [pyosmium](https://osmcode.org/pyosmium/) to read the `.osm.pbf` file. This
  script then filters the data for only the required objects (such as road
  segments and administrative areas), and extracts the interesting information
  from those objects.
* The node geolocations are looked up to generate a geometry for each object.
  This requires a lot of memory to run efficiently.
* The geometry is projected to [Web Mercator](https://epsg.io/3857) in this
  step to avoid continuous transformation when tiles are generated later. Most
  operations will work fine in this projection. Projection is done with the
  [pyproj](https://pypi.org/project/pyproj/) library.
* The output is written to a binary file in a very simple format using
  [msgpack](https://github.com/msgpack/msgpack-python), which is way more
  efficient than (Geo-)JSON, for example. This format is streamable, so the
  generated file is never fully written or read into memory.
* The import script reads the msgpack file and sends it to the database using
  [psycopg](https://www.psycopg.org/). This is done because it supports
  PostgreSQL's `COPY FROM` statement, which enables much faster writes to the
  database than a traditional `INSERT ... VALUES`. The file is streamed
  directly to the database, so it is never read into memory.
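As an aside, the spherical Web Mercator projection mentioned above can be
sketched in plain Python. The actual pipeline delegates this to pyproj;
`web_mercator` below is only an illustrative helper, not part of the tools:

```python
import math

# Spherical Web Mercator (EPSG:3857): project longitude/latitude in
# degrees onto metres in the projected plane.
EARTH_RADIUS = 6378137.0  # WGS84 semi-major axis in metres

def web_mercator(lon: float, lat: float) -> tuple[float, float]:
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The origin maps to (0, 0); the antimeridian to about ±20037508.34 m.
```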