Docs for new pipeline

parent ac90d50239 · commit c4cc4a9078
README.md (42 lines changed)

````diff
@@ -164,7 +164,7 @@ You will need to re-run this command after updates, to migrate the database and
 (re-)create the functions in the SQL database that are used when generating
 vector tiles.
 
-You should also import OpenStreetMap data now, see below for instructions.
+You should also [import OpenStreetMap data](docs/osm-import.md) now.
 
 ### Boot the application
 
@@ -190,46 +190,6 @@ docker-compose run --rm api alembic upgrade head
 ```
 
-## Import OpenStreetMap data
-
-You need to import road information from OpenStreetMap for the portal to work.
-This information is stored in your PostgreSQL database and used when processing
-tracks (instead of querying the Overpass API), as well as for vector tile
-generation. The process applies to both development and production setups. For
-development, you should choose a small area for testing, such as your local
-county or city, to keep the amount of data small. For production use you have
-to import the whole region you are serving.
-
-* Install `osm2pgsql`.
-* Download the area(s) you would like to import from [GeoFabrik](https://download.geofabrik.de).
-* Import each file like this:
-
-```bash
-osm2pgsql --create --hstore --style roads_import.lua -O flex \
-  -H localhost -d obs -U obs -W \
-  path/to/downloaded/myarea-latest.osm.pbf
-```
-
-You might need to adjust the host, database and username (`-H`, `-d`, `-U`) to
-your setup, and also provide the correct password when queried. For the
-development setup the password is `obs`. For production, you might need to
-expose the containers port and/or create a TCP tunnel, for example with SSH,
-such that you can run the import from your local host and write to the remote
-database.
-
-The import process should take a few seconds to minutes, depending on the area
-size. A whole country might even take one or more hours. You should probably
-not try to import `planet.osm.pbf`.
-
-You can run the process multiple times, with the same or different area files,
-to import or update the data. However, for this to work, the actual [command
-line arguments](https://osm2pgsql.org/doc/manual.html#running-osm2pgsql) are a
-bit different each time, including when first importing, and the disk space
-required is much higher.
-
-Refer to the documentation of `osm2pgsql` for assistance. We are using "flex
-mode", the provided script `roads_import.lua` describes the transformations
-and extractions to perform on the original data.
-
 ## Troubleshooting
 
````
````diff
@@ -36,6 +36,8 @@ services:
       - ./tile-generator/data/:/tiles
       - ./api/migrations:/opt/obs/api/migrations
       - ./api/alembic.ini:/opt/obs/api/alembic.ini
+      - ./local/pbf:/pbf
+      - ./local/obsdata:/obsdata
     depends_on:
       - postgres
       - keycloak
````
docs/osm-import.md (new file, 103 lines)

# Importing OpenStreetMap data

The application requires a lot of data from OpenStreetMap to work.

The required information is stored in the PostgreSQL database and used when
processing tracks, as well as for vector tile generation. The process applies
to both development and production setups. For development, you should choose a
small area for testing, such as your local county or city, to keep the amount
of data small. For production use you have to import the whole region you are
serving.

## General pipeline overview

1. Download OpenStreetMap data as one or more `.osm.pbf` files.
2. Transform this data to generate geometry data for all roads and regions, so
   we don't need to look up nodes separately. This step requires a lot of CPU
   and memory, so it can be done "offline" on a high-power machine.
3. Import the transformed data into the PostgreSQL/PostGIS database.

## Community hosted transformed data

Since the first two steps are the same for everybody, the community will soon
provide a service where relatively up-to-date transformed data can be
downloaded for direct import. Stay tuned.

## Download data

[GeoFabrik](https://download.geofabrik.de) kindly hosts extracts of the
OpenStreetMap planet by region. Download all regions you're interested in from
there in `.osm.pbf` format, with the tool of your choice, e.g.:

```bash
wget -P local/pbf/ https://download.geofabrik.de/europe/germany/baden-wuerttemberg-latest.osm.pbf
```

## Transform data

To transform downloaded data, you can either use the Docker image from a
development or production environment, or locally install the API into your
Python environment. Then run the `api/tools/transform_osm.py` script on the
data:

```bash
api/tools/transform_osm.py baden-wuerttemberg-latest.osm.pbf baden-wuerttemberg-latest.msgpack
```

In dockerized setups, make sure to mount your data somewhere in the container
and also mount a directory where the result can be written. The development
setup takes care of this, so you can use:

```bash
docker-compose run --rm api tools/transform_osm.py \
  /pbf/baden-wuerttemberg-latest.osm.pbf /obsdata/baden-wuerttemberg-latest.msgpack
```

Repeat this command for every file you want to transform.
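
With several downloaded files, the repetition can be wrapped in a small shell loop. This is just a sketch, assuming the development setup's `/pbf` and `/obsdata` mounts described above:

```bash
# Transform every downloaded extract in local/pbf/ (mounted as /pbf)
# into a matching .msgpack file in local/obsdata/ (mounted as /obsdata).
for pbf in local/pbf/*.osm.pbf; do
  name="$(basename "$pbf" .osm.pbf)"
  docker-compose run --rm api tools/transform_osm.py \
    "/pbf/$name.osm.pbf" "/obsdata/$name.msgpack"
done
```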

## Import transformed data

The command for importing looks like this:

```bash
api/tools/import_osm.py baden-wuerttemberg-latest.msgpack
```

This tool reads your application config from `config.py`, so set that up first
as if you were setting up your application.
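
The exact contents of `config.py` depend on your deployment; as a purely illustrative sketch (the key name and credentials here are hypothetical, not necessarily the portal's actual settings), the relevant part is the database connection:

```python
# config.py -- hypothetical fragment; use the key names and credentials
# your actual setup requires. The import tool needs to reach the same
# PostgreSQL database the application uses.
POSTGRES_URL = "postgresql://obs:obs@localhost/obs"
```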

In dockerized setups, make sure to mount your data somewhere in the container.
Again, the development setup takes care of this, so you can use:

```bash
docker-compose run --rm api tools/import_osm.py \
  /obsdata/baden-wuerttemberg-latest.msgpack
```

The import process should take a few seconds to minutes, depending on the area
size. You can run the process multiple times, with the same or different area
files, to import or update the data. You can update only one region and leave
the others as they are, or add more filenames to the command line to
bulk-import data.
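
For example, a bulk import of two transformed regions in the development setup could look like this (the second filename is only an illustration, assuming you also transformed that extract):

```bash
docker-compose run --rm api tools/import_osm.py \
  /obsdata/baden-wuerttemberg-latest.msgpack \
  /obsdata/bayern-latest.msgpack
```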

## How this works

* The transformation is done with a Python script that uses
  [pyosmium](https://osmcode.org/pyosmium/) to read the `.osm.pbf` file. This
  script then filters the data for only the required objects (such as road
  segments and administrative areas), and extracts the interesting information
  from those objects.
* The node geolocations are looked up to generate a geometry for each object.
  This requires a lot of memory to run efficiently.
* The geometry is projected to [Web Mercator](https://epsg.io/3857) in this
  step to avoid continuous transformation when tiles are generated later. Most
  operations will work fine in this projection. Projection is done with the
  [pyproj](https://pypi.org/project/pyproj/) library.
* The output is written to a binary file in a very simple format using
  [msgpack](https://github.com/msgpack/msgpack-python), which is much more
  efficient than (Geo-)JSON, for example. This format is streamable, so the
  generated file is never fully written or read into memory.
* The import script reads the msgpack file and sends it to the database using
  [psycopg](https://www.psycopg.org/). This is done because it supports
  PostgreSQL's `COPY FROM` statement, which enables much faster writes to the
  database than a traditional `INSERT`. The file is streamed directly to the
  database, so it is never read into memory.
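
The projection step is simple enough to sketch in plain Python. The real scripts use pyproj, but the spherical Web Mercator formula it applies for EPSG:3857 looks like this (function name is illustrative):

```python
import math

EARTH_RADIUS = 6378137.0  # Web Mercator sphere radius in meters

def project_to_web_mercator(lon, lat):
    """Project WGS84 degrees (lon, lat) to EPSG:3857 meters."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The antimeridian maps to the edge of the square Web Mercator world,
# at roughly +/- 20037508.34 m; the equator/prime meridian maps to (0, 0).
edge_x, _ = project_to_web_mercator(180.0, 0.0)
```

Doing this once at transform time means the tile generator never has to reproject geometries on the fly.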