Importing OpenStreetMap data

The application requires a lot of data from OpenStreetMap to work.

The required information is stored in the PostgreSQL database and used when processing tracks, as well as for vector tile generation. The process applies to both development and production setups. For development, you should choose a small area for testing, such as your local county or city, to keep the amount of data small. For production use, you have to import the whole region you are serving.

General pipeline overview

  1. Download OpenStreetMap data as one or more .osm.pbf files.
  2. Transform this data to generate geometry data for all roads and regions, so we don't need to look up nodes separately. This step requires a lot of CPU and memory, so it can be done "offline" on a high-power machine.
  3. Import the transformed data into the PostgreSQL/PostGIS database.
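Put together, the three steps above look roughly like this for a single region (each command is explained in the sections below; paths and file names are just examples):

wget -P local/pbf/ https://download.geofabrik.de/europe/germany/baden-wuerttemberg-latest.osm.pbf
api/tools/transform_osm.py local/pbf/baden-wuerttemberg-latest.osm.pbf baden-wuerttemberg-latest.msgpack
api/tools/import_osm.py baden-wuerttemberg-latest.msgpack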

Community hosted transformed data

Since the first two steps are the same for everybody, the community will soon provide a service where relatively up-to-date transformed data can be downloaded for direct import. Stay tuned.

Download data

GeoFabrik kindly hosts extracts of the OpenStreetMap planet by region. Download all regions you're interested in from there in .osm.pbf format, with the tool of your choice, e.g.:

wget -P local/pbf/ https://download.geofabrik.de/europe/germany/baden-wuerttemberg-latest.osm.pbf 

Transform data

To transform the downloaded data, you can either use the Docker image from a development or production environment, or install the API locally into your Python environment. Then run the api/tools/transform_osm.py script on the data.

api/tools/transform_osm.py baden-wuerttemberg-latest.osm.pbf baden-wuerttemberg-latest.msgpack

In dockerized setups, make sure to mount your data somewhere in the container and also mount a directory where the result can be written. The development setup takes care of this, so you can use:

docker-compose run --rm api tools/transform_osm.py \
  /pbf/baden-wuerttemberg-latest.osm.pbf /obsdata/baden-wuerttemberg-latest.msgpack

Repeat this command for every file you want to transform.

Import transformed data

The command for importing looks like this:

api/tools/import_osm.py baden-wuerttemberg-latest.msgpack

This tool reads your application config from config.py, so set that up first as if you were setting up your application.
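A minimal sketch of such a config.py, assuming the database connection is configured through a POSTGRES_URL setting (the variable name, URL format and credentials here are assumptions; copy the example config shipped with your checkout if one is available and adapt it):

# config.py (sketch -- POSTGRES_URL and its value are assumptions, adapt to your setup)
POSTGRES_URL = "postgresql+asyncpg://obs:obs@localhost/obs"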

In dockerized setups, make sure to mount your data somewhere in the container. Again, the development setup takes care of this, so you can use:

docker-compose run --rm api tools/import_osm.py \
  /obsdata/baden-wuerttemberg-latest.msgpack

The import process should take a few seconds to minutes, depending on the area size. You can run the process multiple times, with the same or different area files, to import or update the data. You can update only one region and leave the others as they are, or add more filenames to the command line to bulk-import data.
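For example, a bulk import of two regions in one run could look like this (the second file name is hypothetical):

api/tools/import_osm.py baden-wuerttemberg-latest.msgpack hessen-latest.msgpack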

How this works

  • The transformation is done with a Python script that uses pyosmium to read the .osm.pbf file. This script then filters the data for only the required objects (such as road segments and administrative areas) and extracts the interesting information from those objects (see the sketch after this list).
  • The node geolocations are looked up to generate a geometry for each object. This requires a lot of memory to run efficiently.
  • The geometry is projected to Web Mercator in this step to avoid continuous reprojection when tiles are generated later. Most operations will work fine in this projection. Projection is done with the pyproj library.
  • The output is written to a binary file in a very simple format using msgpack, which is much more efficient than (Geo-)JSON, for example. This format is streamable, so the generated file never has to be held in memory in full, neither while writing nor while reading.
  • The import script reads the msgpack file and sends the data to the database using psycopg. This library is used because it supports PostgreSQL's COPY FROM statement, which enables much faster writes to the database than a traditional INSERT ... VALUES. The file is streamed directly to the database, so it is never fully read into memory (also sketched below).
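To make the transform side more concrete, here is a heavily simplified sketch, assuming a hypothetical filter (only highway-tagged ways) and hypothetical output fields; the real api/tools/transform_osm.py handles more object types and attributes:

# transform sketch -- illustration only, not the real tool
import msgpack
import osmium
from pyproj import Transformer

# Project WGS84 (lon/lat) to Web Mercator once, at transform time.
project = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)

class RoadHandler(osmium.SimpleHandler):
    def __init__(self, out):
        super().__init__()
        self.out = out
        self.packer = msgpack.Packer()

    def way(self, w):
        if "highway" not in w.tags:  # simplified filter: road segments only
            return
        # Node locations are available because apply_file() runs with locations=True.
        geometry = [project.transform(n.location.lon, n.location.lat) for n in w.nodes]
        # Pack and write each object immediately -- the output is streamed.
        self.out.write(self.packer.pack({
            "id": w.id,
            "name": w.tags.get("name"),
            "geometry": geometry,
        }))

with open("region.msgpack", "wb") as out:
    # locations=True makes pyosmium cache node coordinates for the geometry
    # lookup; this cache is what needs the large amount of memory.
    RoadHandler(out).apply_file("region.osm.pbf", locations=True)

The import side, sketched in the same spirit (table and column names are made up; the real api/tools/import_osm.py takes them and the connection from the application config, and also writes the geometry, which is omitted here):

# import sketch -- illustration only, not the real tool
import msgpack
import psycopg

with open("region.msgpack", "rb") as f, psycopg.connect("dbname=obs") as conn:
    with conn.cursor() as cur:
        # COPY ... FROM STDIN writes rows far faster than individual INSERTs.
        with cur.copy("COPY road (way_id, name) FROM STDIN") as copy:
            # The Unpacker consumes the file incrementally, object by object,
            # so neither the file nor the rows are held in memory in full.
            for obj in msgpack.Unpacker(f, raw=False):
                copy.write_row((obj["id"], obj["name"]))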