At Terramonitor we aim to provide a wide range of integration options, which is why we offer geospatial data downloads in Shapefile, GeoJSON and GeoPackage formats. Your choice of file format is probably based on the tools you have and the formats you are used to. But what if you could choose freely? Is one file format technically superior to the others? Let's test!
For testing, we took the vector data for the VESA Index in North Karelia, Finland, and encoded it in each of the three file formats. The data set contains a total of 637,545 polygons with three properties: the actual VESA Index, the development class of the forest stand, and shrub probability. The polygon geometries are fairly simple, typically containing 10-20 points.
Shapefile is the most widely known format for distributing geospatial data. It is a standard first developed by ESRI almost 30 years ago, which is considered ancient in software development.
A Shapefile in fact consists of several files: in addition to one file with the actual geometry data, another file is needed to define the coordinate reference system, as well as a file for the attributes and a file to index the geometries. This makes working with Shapefiles slightly clunky and confusing. However, Shapefile has been around for so long that virtually every GIS application can handle it.
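The multi-file layout can be checked with a few lines of Python. This is a minimal sketch that assumes the conventional extensions for the four roles described above (`.shp`, `.shx`, `.dbf`, `.prj`); the function name `missing_sidecars` is made up for illustration.

```python
from pathlib import Path

# Conventional extensions for the four roles described above.
SIDECAR_ROLES = {
    ".shp": "geometries",
    ".shx": "geometry index",
    ".dbf": "attribute table",
    ".prj": "coordinate reference system",
}


def missing_sidecars(shp_path):
    """Return {extension: role} for every expected companion file that is absent."""
    base = Path(shp_path)
    return {
        ext: role
        for ext, role in SIDECAR_ROLES.items()
        if not base.with_suffix(ext).exists()
    }
```

Running a check like this before handing a Shapefile to someone else helps catch the common mistake of sending only the `.shp` file.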
Internally, Shapefile stores geometries in its own compact binary encoding and attributes in a dBASE table. The format is based on tabular thinking, i.e. the row and column position of a value is significant. A minor nuisance is that attribute field names are limited to 10 characters and Unicode support is poor, so some abbreviating and forcing names to ASCII may be needed.
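To illustrate the field name limitation, here is a rough sketch of how attribute names could be squeezed into 10 ASCII characters before export. The helper `to_dbf_field_names` and its collision strategy (replacing the tail with a running number) are assumptions made for this example; GIS tools apply their own renaming rules.

```python
import unicodedata


def to_dbf_field_names(names, max_len=10):
    """Map attribute names to unique ASCII names of at most max_len characters."""
    result = {}
    used = set()
    for name in names:
        # Decompose accented characters, then drop everything outside ASCII.
        ascii_name = (
            unicodedata.normalize("NFKD", name)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        candidate = ascii_name[:max_len]
        suffix = 1
        # On a collision, replace the tail with a running number.
        while candidate in used:
            tail = str(suffix)
            candidate = ascii_name[: max_len - len(tail)] + tail
            suffix += 1
        used.add(candidate)
        result[name] = candidate
    return result
```

For example, `shrub_probability` would become `shrub_prob`, and a second field starting with the same ten characters would become `shrub_pro1`.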
Outputting the VESA Index data resulted in 5 files and a total of 139 megabytes of data, and 29 megabytes compressed.
GeoJSON is built on the immensely popular JSON, so parsing support is on a different level than with Shapefile. In addition to support in most GIS software, any web developer can write a custom GeoJSON parser, which opens new possibilities for integrating the data.
Because GeoJSON is designed as one blob of data rather than a small "text database", it is simpler to handle, but it is essentially meant to be loaded into memory in full. In QGIS 3.8, opening and handling the data in GeoJSON format was many times slower than with Shapefile, while memory usage was on a similar scale.
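As a rough illustration of how little is needed for a custom parser, the sketch below reads a GeoJSON feature collection with nothing but Python's standard `json` module. The property name `vesa_index` and the sample coordinates are invented for the example and are not the actual field names of the data set.

```python
import json

# A tiny hand-written feature collection; the values are illustrative only.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"vesa_index": 0.4},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
      }
    }
  ]
}
"""


def summarize_polygons(geojson_text):
    """Yield (properties, vertex_count) for each Polygon feature."""
    collection = json.loads(geojson_text)
    for feature in collection["features"]:
        geometry = feature.get("geometry")
        if geometry is None or geometry["type"] != "Polygon":
            continue
        # The first ring is the exterior; its last point repeats the first.
        exterior = geometry["coordinates"][0]
        yield feature["properties"], len(exterior) - 1


for properties, vertices in summarize_polygons(SAMPLE):
    print(properties, vertices)
```

Note that `json.loads` parses the whole document at once, so this approach inherits the load-everything-into-memory behaviour of the format.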
Outputting the VESA Index data resulted in 314 megabytes of data, and 26 megabytes compressed.
GeoPackage was first published by the Open Geospatial Consortium (OGC) 5 years ago, making it the official alternative to Shapefile. It is built on SQLite, a lightweight SQL implementation designed for stand-alone databases. As with GeoJSON, this makes GeoPackage highly compatible by design, and accessible from non-GIS software as well.
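Because a GeoPackage is an SQLite database, it can be inspected without any GIS library. The sketch below uses Python's built-in `sqlite3` module to list the vector layers recorded in the `gpkg_contents` table, which the GeoPackage specification uses as its table of contents. The function name is made up for this example.

```python
import sqlite3


def list_feature_layers(gpkg_path):
    """Return (table_name, srs_id) for each vector layer in a GeoPackage."""
    conn = sqlite3.connect(gpkg_path)
    try:
        rows = conn.execute(
            "SELECT table_name, srs_id FROM gpkg_contents "
            "WHERE data_type = 'features' ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

The same query works from the `sqlite3` command line tool or any other SQLite client, which is what makes the format approachable for non-GIS developers.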
The minor nuisances of Shapefile, such as non-standardized format for CRS and legacy limitations on attribute fields, have been fixed in GeoPackage.
Internally, GeoPackage stores geometries as Well-known binary (WKB) behind a short GeoPackage-specific header. Interestingly, GeoPackage also supports storing raster data. QGIS handling performance with the test data set was on par with Shapefile.
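To give a feel for the binary encoding, here is a simplified sketch that decodes a GeoPackage geometry blob: a short header starting with the bytes `GP`, followed by standard WKB. To keep it short it handles only a 2D point rather than the polygons in our data set, and the function name is an assumption made for this example.

```python
import struct

# Envelope size in bytes for each envelope indicator code in the header flags.
ENVELOPE_BYTES = {0: 0, 1: 32, 2: 48, 3: 48, 4: 64}


def decode_gpkg_point(blob):
    """Decode a GeoPackage geometry blob holding a 2D WKB point."""
    if blob[:2] != b"GP":
        raise ValueError("not a GeoPackage geometry blob")
    flags = blob[3]
    # Bit 0 of the flags gives the byte order of the header values.
    header_order = "<" if flags & 0x01 else ">"
    (srs_id,) = struct.unpack_from(header_order + "i", blob, 4)
    # Bits 1-3 tell how large the optional bounding envelope is.
    envelope_code = (flags >> 1) & 0x07
    wkb_start = 8 + ENVELOPE_BYTES[envelope_code]
    # WKB: one byte-order byte, a 4-byte geometry type, then the coordinates.
    wkb_order = "<" if blob[wkb_start] == 1 else ">"
    (geom_type,) = struct.unpack_from(wkb_order + "I", blob, wkb_start + 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D points")
    x, y = struct.unpack_from(wkb_order + "2d", blob, wkb_start + 5)
    return srs_id, (x, y)
```

In practice a GIS library does this decoding for you; the point of the sketch is that the payload is a compact, fixed-layout binary record rather than text.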
Outputting the VESA Index data resulted in 181 megabytes of data, and 40 megabytes compressed.
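The raw and compressed sizes reported for the three formats can be put side by side with a bit of arithmetic. The snippet below is only a convenience for comparing the figures quoted in this article; the helper name `compressed_share` is made up.

```python
# (raw size, compressed size) in megabytes, as reported in this article.
SIZES_MB = {
    "Shapefile": (139, 29),
    "GeoJSON": (314, 26),
    "GeoPackage": (181, 40),
}


def compressed_share(raw_mb, compressed_mb):
    """Return the compressed size as a percentage of the raw size."""
    return 100.0 * compressed_mb / raw_mb


for name, (raw, packed) in SIZES_MB.items():
    share = compressed_share(raw, packed)
    print(f"{name}: {raw} MB raw, {packed} MB compressed ({share:.1f}% of raw)")
```

GeoJSON is by far the largest on disk, but as a text format it compresses to under a tenth of its raw size, which makes it the smallest of the three once compressed.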
Here is a summary of the results:
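| | Shapefile | GeoJSON | GeoPackage |
|---|---|---|---|
| Size | 139 MB | 314 MB | 181 MB |
| Compressed size | 29 MB | 26 MB | 40 MB |
| Compatibility | GIS | GIS, any text editor | GIS, SQL |
| Use case | Old standard | Web, small data sets | New standard |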
This quick comparison has hopefully demystified the differences between the three file formats for storing vector data. If you are interested in seeing the results for your own area of interest, log in to Terramonitor and see for yourself!