{"id":397,"date":"2017-01-23T01:01:09","date_gmt":"2017-01-23T01:01:09","guid":{"rendered":"http:\/\/blog.geofabrik.de\/?p=397"},"modified":"2017-01-23T01:01:09","modified_gmt":"2017-01-23T01:01:09","slug":"improving-geofabrik-osm-extracts","status":"publish","type":"post","link":"https:\/\/blog.geofabrik.de\/index.php\/2017\/01\/23\/improving-geofabrik-osm-extracts\/","title":{"rendered":"Improving Geofabrik OSM Extracts"},"content":{"rendered":"<p>We&#8217;ve just rolled out a small enhancement to the OSM extracts available on our <a href=\"http:\/\/download.geofabrik.de\/\">download server,<\/a> concerning the way we deal with objects that cross an extract boundary. In the last couple of years we&#8217;ve used a program called <i>osm-history-splitter<\/i> to create the extracts. This program is based on an older version of the Osmium library and is able to ensure referential integrity on the way\/node level only.<\/p>\n<div id=\"attachment_398\" style=\"width: 310px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/blog.geofabrik.de\/wp-content\/uploads\/2017\/01\/referential_integrity-1.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-398\" src=\"https:\/\/blog.geofabrik.de\/wp-content\/uploads\/2017\/01\/referential_integrity-1-300x280.png\" alt=\"Referential Integrity when making extracts\" width=\"300\" height=\"280\" class=\"size-medium wp-image-398\" \/><\/a><p id=\"caption-attachment-398\" class=\"wp-caption-text\">Referential Integrity when making extracts. Cases <i>a<\/i> and <i>b<\/i> covered until now, case <i>c<\/i> additionally covered from now on.<\/p><\/div>\n<p>The ways shown as <i>a<\/i> and <i>b<\/i> in the sketch above would have been fully contained in the extract cut out along the dotted boundary. 
A polygon formed by a relation <i>c<\/i> in which at least one way lies fully outside the boundary, however, would not be constructable from the extract because those ways would be missing.<\/p>\n<p>We&#8217;ve now switched to the <a href=\"https:\/\/github.com\/osmcode\/osmium-tool\/releases\/tag\/v1.5.0\">newest version of osmium,<\/a> the command line companion to the libosmium library, for producing the extracts and deriving the change files. This allows us to offer referential integrity for boundary-crossing multipolygon relations, while other (non-multipolygon) relations are still handled the same as before. This is important because otherwise a large boundary or route relation crossing a small extract would blow up the size of that extract too much.<\/p>\n<p>With the new complete multipolygon relations (called the &#8220;smart&#8221; strategy in osmium), the extracts we offer have seen a size increase of just 0.5% on average. Some very small extracts with lots of border-crossing multipolygons have become much larger &#8211; the Andorra extract, for example, is now three times as big as before. But we believe it is worth it! If you process nightly updates you&#8217;ll likely see a small spike today, with today&#8217;s update bringing in all those extra ways needed to complete polygons.<\/p>\n<p>Using the new software has also brought down the overall processing time from around 10 hours to under 4 hours, a reduction of roughly 60%. Kudos to Jochen Topf and Mapbox for their tireless improvements to Osmium! This is nice proof that even in times of the ever-scalable cloud, solid engineering still has its place.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ve just rolled out a small enhancement to the OSM extracts available on our download server, concerning the way we deal with objects that cross an extract boundary. In the last couple of years we&#8217;ve used a program called osm-history-splitter to create the extracts. 
This program is based on an older version of the Osmium [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/posts\/397"}],"collection":[{"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/comments?post=397"}],"version-history":[{"count":0,"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/posts\/397\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/media?parent=397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/categories?post=397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.geofabrik.de\/index.php\/wp-json\/wp\/v2\/tags?post=397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}