30.11.2010 | Frederik Ramm
We are sometimes asked how we produce the files on our download server. Read this if you are one of those asking.
It is 22:30 in Central Europe, and the lights come on in the Geofabrik office. (The HDD LEDs, that is.) One of the servers, named bonne, begins downloading the collected works of mappers around the world from the last 24 hours:
osmosis --rri --simc --write-xml-change 2010-11-16-22:30.osc
(duration 00:01:30)
and then applies them to a locally held, full copy of the OSM database (the “planet file”):
osmosis --read-xml-change 2010-11-16-22:30.osc --read-bin current-planet.osm.pbf --apply-change --write-bin new-planet.osm.pbf compress=none
(00:28:09)
The newly created planet file is transferred to another server, named hammer, where a script converts it to a simple CSV-like format (the “tbf” format) and then analyzes and converts it in several steps, creating statistics and a number of shape files. The shape files are later copied to our off-site tools server where they are used by the OSM Inspector.
But bonne, of course, doesn’t sit idle.
Back on bonne, the planet file is now split into continents (and, in the case of North America, a few large regions):
osmosis --read-pbf current-planet.osm.pbf --bp file=central-america.poly --write-pbf central-america.osm.pbf.part --bp file=asia.poly --write-pbf asia.osm.pbf.part --bp file=south-america.poly --write-pbf south-america.osm.pbf.part --bp file=us-west.poly --write-pbf north-america/us-west.osm.pbf.part --bp file=us-northeast.poly --write-pbf north-america/us-northeast.osm.pbf.part --bp file=us-south.poly --write-pbf north-america/us-south.osm.pbf.part --bp file=us-midwest.poly --write-pbf north-america/us-midwest.osm.pbf.part --bp file=us-pacific.poly --write-pbf north-america/us-pacific.osm.pbf.part --bp file=north-america/canada.poly --write-pbf north-america/canada.osm.pbf.part
(02:45:00)
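Adding up the durations quoted so far gives the clock time at which the continent split finishes. A quick check as a small Python sketch; the start date is taken from the example file names, and the step labels and the function name are only illustrative:

```python
from datetime import datetime, timedelta

# Start time and step durations as quoted in the text above.
start = datetime(2010, 11, 16, 22, 30)
steps = [
    ("download changes", timedelta(minutes=1, seconds=30)),
    ("apply changes to planet", timedelta(minutes=28, seconds=9)),
    ("split into continents", timedelta(hours=2, minutes=45)),
]

def timeline(start, steps):
    """Return (label, finish time) pairs for steps run one after another."""
    finished = []
    now = start
    for label, duration in steps:
        now += duration
        finished.append((label, now))
    return finished

for label, done in timeline(start, steps):
    print(f"{done:%H:%M:%S}  {label}")
```

The last line printed is 01:44:39, shortly after midnight on the following day.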
It is now about 1:45, and this is where the network pipes start to warm up, as the *.pbf files just generated get uploaded in the background to our off-site download server. The Geofabrik office has an upload bandwidth of 10 MBit/s, so uploading these files will take about an hour and a half; once they are on the download server, they will automatically be rsynced to the openstreetmap.de dev server where the incoming files trigger various other jobs, like the building of Garmin maps.
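As a rough cross-check of those figures: an hour and a half at 10 MBit/s corresponds to a little under 7 GB of data. The sketch below assumes that the link is fully used the whole time, that units are decimal (1 MBit = 10^6 bit), and that there is no protocol overhead; none of this is stated above, and the function names are made up for illustration.

```python
def transfer_volume_gb(mbit_per_s: float, hours: float) -> float:
    """Data volume in gigabytes (10**9 bytes) moved at a constant rate."""
    bits = mbit_per_s * 1_000_000 * hours * 3600
    return bits / 8 / 1_000_000_000

def transfer_time_hours(size_gb: float, mbit_per_s: float) -> float:
    """Hours needed to move `size_gb` gigabytes at a constant rate."""
    bits = size_gb * 1_000_000_000 * 8
    return bits / (mbit_per_s * 1_000_000) / 3600

# Volume implied by "10 MBit/s for about an hour and a half".
print(transfer_volume_gb(10, 1.5))
```

The second function answers the reverse question, for example how long a batch of a given size would occupy the same line.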
With the network bandwidth maxed out, bonne will now start reading the individual continent files, and split them into smaller entities (usually countries). The time taken by this depends on the ever-expanding list of supported countries; bonne currently uses about three hours for Europe:
osmosis --read-pbf europe.osm.pbf --tee 5 --bb top=47.10269 bottom=34.04251 left=6.601696 right=44.83484 --tee 14 --bp file=montenegro.poly --write-pbf montenegro.osm.pbf --bp file=italy.poly --write-pbf italy.osm.pbf --bp file=croatia.poly --write-pbf croatia.osm.pbf --bp file=slovenia.poly --write-pbf slovenia.osm.pbf --bp file=cyprus.poly --write-pbf cyprus.osm.pbf --bp file=serbia.poly --write-pbf serbia.osm.pbf --bp file=kosovo.poly --write-pbf kosovo.osm.pbf --bp file=turkey.poly --write-pbf turkey.osm.pbf --bp file=bulgaria.poly --write-pbf bulgaria.osm.pbf --bp file=greece.poly --write-pbf greece.osm.pbf --bp file=malta.poly --write-pbf malta.osm.pbf --bp file=bosnia-herzegovina.poly --write-pbf bosnia-herzegovina.osm.pbf --bp file=albania.poly --write-pbf albania.osm.pbf --bp file=macedonia.poly --write-pbf macedonia.osm.pbf --bb top=55.14877 bottom=45.25782 left=5.57078 right=24.16622 --tee 10 --bp file=luxembourg.poly --write-pbf luxembourg.osm.pbf --bp file=austria.poly --write-pbf austria.osm.pbf --bp file=liechtenstein.poly --write-pbf liechtenstein.osm.pbf --bp file=germany.poly --write-pbf germany.osm.pbf --bp file=slovakia.poly --write-pbf slovakia.osm.pbf --bp file=poland.poly --write-pbf poland.osm.pbf --bp file=alps.poly --write-pbf alps.osm.pbf --bp file=switzerland.poly --write-pbf switzerland.osm.pbf --bp file=czech_republic.poly --write-pbf czech_republic.osm.pbf --bp file=hungary.poly --write-pbf hungary.osm.pbf --bb top=61.13664 bottom=32.03026 left=-31.57393 right=9.801 --tee 9 --bp file=andorra.poly --write-pbf andorra.osm.pbf --bp file=netherlands.poly --write-pbf netherlands.osm.pbf --bp file=portugal.poly --write-pbf portugal.osm.pbf --bp file=azores.poly --write-pbf azores.osm.pbf --bp file=france.poly --write-pbf france.osm.pbf --bp file=spain.poly --write-pbf spain.osm.pbf --bp file=british_isles.poly --write-pbf british_isles.osm.pbf --bp file=belgium.poly --write-pbf belgium.osm.pbf --bp file=monaco.poly --write-pbf monaco.osm.pbf --bb top=71.65257 bottom=54.37261 left=-25.74185 right=31.60189 --tee 8 --bp file=latvia.poly --write-pbf latvia.osm.pbf --bp file=faroe_islands.poly --write-pbf faroe_islands.osm.pbf --bp file=sweden.poly --write-pbf sweden.osm.pbf --bp file=norway.poly --write-pbf norway.osm.pbf --bp file=denmark.poly --write-pbf denmark.osm.pbf --bp file=iceland.poly --write-pbf iceland.osm.pbf --bp file=finland.poly --write-pbf finland.osm.pbf --bp file=estonia.poly --write-pbf estonia.osm.pbf --bb top=56.4479 bottom=43.58624 left=20.26895 right=40.2184 --tee 5 --bp file=romania.poly --write-pbf romania.osm.pbf --bp file=ukraine.poly --write-pbf ukraine.osm.pbf --bp file=moldova.poly --write-pbf moldova.osm.pbf --bp file=belarus.poly --write-pbf belarus.osm.pbf --bp file=lithuania.poly --write-pbf lithuania.osm.pbf
(03:15:00)
As you may have seen from the --bb tasks, this command line splits Europe into five rectangular areas before splitting out the remaining polygons; this saves about 40 minutes of runtime because it reduces the number of point-in-polygon checks that have to be made. Of course these Osmosis command lines are generated by a script and not by hand.
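A minimal sketch of what such a generator script might look like, in Python. This is not the script that runs on bonne: the function name and the data layout are made up for illustration, and only Osmosis options that appear in the command above are used. What it shows is the bookkeeping involved, namely that every --tee has to be given the number of consumers that follow it.

```python
def build_split_command(source, groups):
    """Build an Osmosis argument list that splits `source` in two stages.

    `groups` is a list of (bbox, names) pairs. bbox is a dict with the keys
    top, bottom, left and right (kept as strings so they are passed through
    unchanged); names are the polygon base names to cut out of that rectangle.
    """
    # First stage: one branch per bounding box.
    args = ["osmosis", "--read-pbf", source, "--tee", str(len(groups))]
    for bbox, names in groups:
        args.append("--bb")
        args += [f"{key}={bbox[key]}" for key in ("top", "bottom", "left", "right")]
        # Second stage: one branch per polygon inside this rectangle.
        args += ["--tee", str(len(names))]
        for name in names:
            args += ["--bp", f"file={name}.poly", "--write-pbf", f"{name}.osm.pbf"]
    return args

# Two of the five rectangles from the Europe command, as example input.
groups = [
    ({"top": "56.4479", "bottom": "43.58624", "left": "20.26895", "right": "40.2184"},
     ["romania", "ukraine", "moldova", "belarus", "lithuania"]),
    ({"top": "71.65257", "bottom": "54.37261", "left": "-25.74185", "right": "31.60189"},
     ["latvia", "faroe_islands", "sweden", "norway", "denmark", "iceland", "finland", "estonia"]),
]

print(" ".join(build_split_command("europe.osm.pbf", groups)))
```

Returning a list rather than a string means the result can be handed to subprocess.run directly, without any shell quoting.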
The other continents require less elaborate command lines and are usually done within a few minutes. All the time, the background job will take any finished files and transfer them to the download server.
Some countries are split further; bonne will spend another hour splitting France into regions, and half an hour further subdividing Germany and England.
Up to this point, the process has been mainly disk-bound; even though bonne has a fast disk array, there’s a lot of reading and writing to do. But when all the extracts are ready in pbf form, bonne’s 16 cores get to work. Most of the generated files are now converted to XML and compressed with bzip2 using the 7zip program. For many extracts the machine will also create shape files from the XML, which are then zipped, and again the results are uploaded as soon as bandwidth is available.
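The fan-out across cores can be sketched as follows. Note what is swapped in: the text describes converting to XML and compressing with the 7zip program, whereas this sketch uses Python's built-in bz2 module and a thread pool so that it runs on its own. The function names, the default worker count and the output file naming are assumptions for illustration.

```python
import bz2
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def compress_bz2(path: Path) -> Path:
    """Write a bzip2-compressed copy of `path` next to it and return its path."""
    target = path.with_name(path.name + ".bz2")
    with open(path, "rb") as src, bz2.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target

def compress_all(paths, workers=16):
    """Compress many files concurrently; results come back in input order."""
    # Assumption: threads are enough because the compression itself runs in
    # C code; a process pool would be the alternative if that did not hold.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_bz2, paths))
```

Calling compress_all on a list of Path objects returns the paths of the compressed copies, which a separate upload job could then pick up.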
It is around 9:00 in the morning when the last extracts are done. After that, bonne proceeds to load the Europe dataset into a PostGIS instance and runs a number of analyses that cannot be done by the world-wide job on hammer. The machine will use the better part of the remaining day for this, uploading its results to the OSM Inspector on the tools server in the late evening – just before the whole process starts once again.
(Path names and some fine-tuning parameters, most notably --buffer, have been dropped from the Osmosis examples, but details are available on request.)