23.01.2017 | Frederik Ramm
We’ve just rolled out a small enhancement to the OSM extracts available on our download server, concerning the way we deal with objects that cross an extract boundary. For the last couple of years we’ve used a program called osm-history-splitter to create the extracts. This program is based on an older version of the Osmium library and can ensure referential integrity on the way/node level only.
The ways shown as a and b in the sketch above would have been fully contained in the extract cut out along the dotted boundary. A polygon formed by a relation c in which at least one way lies fully outside the boundary, however, could not be constructed from the extract because those ways would be missing.
We’ve now switched to the newest version of osmium, the command line companion to the libosmium library, for producing the extracts and deriving the change files. This allows us to offer referential integrity for boundary-crossing multipolygon relations, while other (non-multipolygon) relations are still handled the same as before. Keeping the old behaviour for those other relations is important because otherwise a large boundary or route relation crossing a small extract would blow up the size of that extract too much.
With the new complete multipolygon relations (called the “smart” strategy in osmium), the extracts we offer have seen a size increase of just 0.5% on average. Some very small extracts with lots of border-crossing multipolygons have become much larger – the Andorra extract, for example, is now three times as big as before. But we believe it is worth it! If you process nightly updates you’ll likely see a small spike today, with today’s update bringing in all those extra ways needed to complete polygons.
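To make the difference between the old and the new behaviour concrete, here is a toy sketch in Python. It is not how osmium implements its strategies; the data model, the function names and the sample data are all made up for illustration, and the extract boundary is simplified to a bounding box. It only shows the rule described above: ways touching the extract are kept whole, and multipolygon relations (but not other relations) additionally pull in their member ways that lie fully outside.

```python
# Toy model only: NOT osmium's implementation.
# All names and sample data below are hypothetical.

def inside(point, bbox):
    """True if an (x, y) point lies in bbox = (min_x, min_y, max_x, max_y)."""
    x, y = point
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def extract(nodes, ways, relations, bbox, complete_multipolygons):
    """Return (way ids, node ids) that end up in the extract.

    nodes:     {node_id: (x, y)}
    ways:      {way_id: [node_id, ...]}
    relations: {rel_id: {"type": str, "ways": [way_id, ...]}}
    """
    # Way/node-level integrity: a way with at least one node inside
    # the boundary is kept with all of its nodes.
    touching = {
        way_id
        for way_id, node_ids in ways.items()
        if any(inside(nodes[n], bbox) for n in node_ids)
    }
    kept_ways = set(touching)

    if complete_multipolygons:
        # Multipolygon relations that touch the extract bring in all
        # their member ways; other relation types are left alone.
        for rel in relations.values():
            members = set(rel["ways"])
            if rel["type"] == "multipolygon" and touching & members:
                kept_ways |= members

    kept_nodes = {n for way_id in kept_ways for n in ways[way_id]}
    return kept_ways, kept_nodes


# Sample data: the extract boundary is the box (0, 0) to (5, 5).
bbox = (0, 0, 5, 5)
nodes = {1: (1, 1), 2: (4, 1), 3: (9, 1), 4: (9, 4), 5: (20, 20), 6: (30, 30)}
ways = {
    "a": [1, 2],    # fully inside
    "b": [2, 3],    # crosses the boundary
    "c1": [1, 3],   # part of polygon c, crosses the boundary
    "c2": [3, 4],   # part of polygon c, fully outside
    "c3": [4, 1],   # part of polygon c, crosses the boundary
    "far": [5, 6],  # part of a route, fully outside
}
relations = {
    "c": {"type": "multipolygon", "ways": ["c1", "c2", "c3"]},
    "r": {"type": "route", "ways": ["b", "far"]},
}

old_ways, old_nodes = extract(nodes, ways, relations, bbox, False)
new_ways, new_nodes = extract(nodes, ways, relations, bbox, True)

print(sorted(old_ways))  # ['a', 'b', 'c1', 'c3']
print(sorted(new_ways))  # ['a', 'b', 'c1', 'c2', 'c3']
```

In this toy data set the old behaviour leaves out way c2, so polygon c cannot be closed; the new behaviour adds it, while the route's far-away way stays out in both cases. With the real tool, the strategy is selected through the `--strategy` option of the `osmium extract` command.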
Using the new software has also brought down the overall processing time from around 10 hours to under 4 hours, a reduction of about 60%. Kudos to Jochen Topf and Mapbox for their tireless improvements to Osmium! This is nice proof that even in times of the ever-scalable cloud, solid engineering still has its place.