\subsection{Geodata formats on the web}
\label{ch:dataformats}
This section explains the data formats that are used throughout this thesis.
\paragraph{The JavaScript Object Notation (JSON) Data Interchange Format} was derived from the ECMAScript Programming Language Standard \parencite{bray2014javascript}. It is a text format for the serialization of structured data. As a text format, it is well suited for data exchange between server and client, and it can be consumed directly by JavaScript. These characteristics make it a good fit for web-based applications. It does, however, support only a limited number of data types: four primitive ones (string, number, boolean and null) and two structured ones (object and array). Objects are unordered collections of name-value pairs, while arrays are ordered lists of values. Through nesting, complex data structures can be created. JSON is often used in place of XML because it is easier for humans to read.
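As a brief illustration of the six data types named above, the following sketch parses a JSON text with JavaScript's built-in \texttt{JSON.parse}. The content of the text and the names \texttt{text} and \texttt{record} are made up for this example.

```javascript
// Hypothetical JSON text that uses all six JSON data types.
const text = `{
  "label": "example",
  "count": 3,
  "visible": true,
  "comment": null,
  "values": [1, 2, 3],
  "nested": {"key": "value"}
}`;

// JSON.parse is part of the JavaScript language; no library is required.
const record = JSON.parse(text);

console.log(typeof record.label);          // string
console.log(typeof record.count);          // number
console.log(typeof record.visible);        // boolean
console.log(record.comment === null);      // true
console.log(Array.isArray(record.values)); // true
console.log(record.nested.key);            // value
```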
\paragraph{The GeoJSON Format} is a geospatial data interchange format \parencite{butler2016geojson}. As the name suggests, it is based on JSON and deals with data representing geographic features. It defines several geometry types that are compatible with the types in the OpenGIS Simple Features Implementation Specification for SQL \parencite{open1999opengis}. These are Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon and the heterogeneous GeometryCollection. Listing~\ref{lst:geojson-example} shows a simple example of a GeoJSON object with one point feature. A more complete example can be found in the file \path{thesis/code/example-7946.geojson}.
\lstinputlisting[
float=!htb,
language=javascript,
caption=An example for a GeoJSON object,
label=lst:geojson-example
]{./code/example-simple.geojson}
The geometry types differ in the format of their coordinates member. A position is an array of at least two elements representing longitude and latitude, in that order. An optional third element can be added to specify altitude. While the coordinates member of a Point is a single position, a LineString describes its geometry through an array of at least two positions. More interesting is the specification for Polygons. It introduces the concept of the linear ring as a closed LineString with at least four positions where the first and last positions are equivalent. The Polygon's coordinates member is an array of linear rings, with the first one representing the exterior ring and all others interior rings. The exterior ring bounds the surface, and the interior rings bound the holes within it. Finally, the coordinates members of MultiPoint, MultiLineString and MultiPolygon are arrays of the coordinates of their single-part counterparts.
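The rules for linear rings and Polygons described above can be sketched as a small validity check. This is only an illustration, not code used elsewhere in this thesis. The function names are hypothetical, and position equality is simplified to an element-wise comparison.

```javascript
// Two positions are treated as equal here if they have the same
// number of elements and identical values (a simplifying assumption).
function positionsEqual(a, b) {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// A linear ring is a closed LineString: at least four positions,
// with the first and the last position being equivalent.
function isLinearRing(ring) {
  return Array.isArray(ring) &&
    ring.length >= 4 &&
    positionsEqual(ring[0], ring[ring.length - 1]);
}

// A Polygon's coordinates member is an array of linear rings.
// This sketch additionally requires the exterior ring to be present.
function isPolygonCoordinates(coordinates) {
  return Array.isArray(coordinates) &&
    coordinates.length >= 1 &&
    coordinates.every((ring) => isLinearRing(ring));
}

// Exterior ring followed by one interior ring (hole).
const exterior = [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]];
const hole = [[100.8, 0.8], [100.8, 0.2], [100.2, 0.2], [100.2, 0.8], [100.8, 0.8]];

console.log(isPolygonCoordinates([exterior, hole]));  // true
console.log(isLinearRing([[0, 0], [1, 0], [1, 1]]));  // false
```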
GeoJSON is mainly used for web-based mapping. Since it is based on JSON, it inherits its strengths. One of them is improved readability, owing to less markup overhead than XML-based formats such as GML. Interoperability with web applications comes for free, since the parsing of JSON is built into JavaScript. Unlike the Esri Shapefile\footnote{\url{https://doc.arcgis.com/en/arcgis-online/reference/shapefiles.htm}} format, a single file is sufficient to store and transmit all relevant data, including feature properties.
Among its downsides is that a text-based format cannot store geometries as efficiently as a binary format could. Also, only vector-based data types can be represented. Another disadvantage can be the strictly non-topological approach. Every feature is completely described by one entry. When features share common components, such as the boundary between neighboring polygons, these data points are therefore encoded twice in the GeoJSON object. This further increases the data size. It also makes topological analysis of the data set more difficult. Fortunately, there is a related data structure that addresses this problem.
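The duplication of shared boundary points can be made visible with a short sketch. The two polygons below are hypothetical unit squares that touch along one edge, and the helper names are made up for this illustration.

```javascript
// Coordinates of two hypothetical Polygons that share the edge
// from [1, 0] to [1, 1]. Each polygon stores that edge itself.
const left = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]];
const right = [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]];

// Collects the distinct positions of all rings of a polygon as
// strings, so that they can be compared by value.
function positionKeys(polygonCoordinates) {
  return new Set(polygonCoordinates.flat().map((position) => position.join(",")));
}

// Returns the positions that are stored in both polygons.
function sharedPositions(a, b) {
  const keysOfB = positionKeys(b);
  return [...positionKeys(a)].filter((key) => keysOfB.has(key));
}

console.log(sharedPositions(left, right)); // [ '1,0', '1,1' ]
```

Both end points of the common edge appear in each polygon, so they are serialized twice.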
\paragraph{TopoJSON} is an extension of GeoJSON that encodes geometries in a shared topology \parencite{bostock2017topojson}. It supports the same geometry types as GeoJSON. It differs in some additional properties and in its object types, such as ``Topology'' and ``GeometryCollection''. Its main feature is that LineStrings, Polygons and their multi-part equivalents must define their line segments in a common property called ``arcs''. The geometries themselves then reference the arcs of which they are made up. This reduces the redundancy of data points. Another feature is the quantization of positions. To use it, one defines a ``transform'' object that specifies a scale and a translation used to encode all coordinates. Together with delta-encoding of position arrays, this yields integer values that are better suited for efficient serialization and reduce the file size.
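To illustrate quantization and delta-encoding, the following sketch decodes a single arc back into absolute positions. The topology object and the function name \texttt{decodeArc} are hypothetical. The values are chosen so that the result matches the polyline in Listing~\ref{lst:coordinates-array}.

```javascript
// Decodes one quantized, delta-encoded arc into absolute positions.
function decodeArc(arc, transform) {
  const [scaleX, scaleY] = transform.scale;
  const [translateX, translateY] = transform.translate;
  let x = 0;
  let y = 0;
  return arc.map(([deltaX, deltaY]) => {
    // Undo the delta-encoding: every entry is an offset from the
    // previous position, the first one an offset from the origin.
    x += deltaX;
    y += deltaY;
    // Undo the quantization: scale and translate the integer values.
    return [x * scaleX + translateX, y * scaleY + translateY];
  });
}

// Hypothetical topology containing one LineString made of one arc.
const topology = {
  type: "Topology",
  transform: { scale: [0.5, 0.5], translate: [100, 0] },
  objects: {
    example: { type: "LineString", arcs: [0] }
  },
  arcs: [[[4, 0], [2, 2], [2, -2], [2, 2]]]
};

console.log(decodeArc(topology.arcs[0], topology.transform));
// [ [ 102, 0 ], [ 103, 1 ], [ 104, 0 ], [ 105, 1 ] ]
```

The sketch only decodes an arc. Resolving the arc references of a geometry, including reversed arcs, is left out for brevity.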
Besides reducing data duplication, topological formats have the benefit of enabling topological analysis and editing. When modifying adjacent polygons, for example by simplification, one would prefer TopoJSON over GeoJSON. Figure~\ref{fig:topological-editing} shows what this means. When modifying the boundary of one polygon, one can create gaps or overlaps in non-topological representations. With a topological data structure, however, the topology is preserved \parencite{theobald2001understanding}.
\begin{figure}
\centering
\includegraphics[width=.3\linewidth]{./images/topological-editing.png}
\caption{Topological editing (top) vs. Non-topological editing (bottom) \parencite{theobald2001understanding}}
\label{fig:topological-editing}
\end{figure}
\paragraph{Coordinate representation} Both GeoJSON and TopoJSON represent positions as an array of numbers. The elements represent longitude, latitude and optionally altitude, in that order. For simplicity, this thesis deals with two-dimensional positions only. A polyline is described by an array of these positions, as seen in Listing~\ref{lst:coordinates-array}.
\begin{lstlisting}[
float=htb,
label=lst:coordinates-array,
caption=Polyline coordinates in nested-array form
]
[[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]]
\end{lstlisting}
One library used in this thesis, however, expects coordinates in a different format. Listing~\ref{lst:coordinates-object} shows a polyline as this library expects it. Here, each location is represented by an object with x and y properties.
\begin{lstlisting}[
float=htb,
label=lst:coordinates-object,
caption=Polyline in array-of-objects form
]
[{x: 102.0, y: 0.0}, {x: 103.0, y: 1.0}, {x: 104.0, y: 0.0}, {x: 105.0, y: 1.0}]
\end{lstlisting}
To distinguish these formats in later references, the first format will be called the nested-array format, while the second will be called the array-of-objects format.
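Converting between the two formats is straightforward. The following is a minimal sketch with hypothetical function names. It assumes two-dimensional positions and maps the first element to x and the second to y, as in the listings above.

```javascript
// Converts a polyline from nested-array format to array-of-objects format.
function toArrayOfObjects(coordinates) {
  return coordinates.map(([x, y]) => ({ x, y }));
}

// Converts a polyline from array-of-objects format to nested-array format.
function toNestedArray(points) {
  return points.map(({ x, y }) => [x, y]);
}

const nested = [[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]];
const objects = toArrayOfObjects(nested);

console.log(objects[1]);             // { x: 103, y: 1 }
console.log(toNestedArray(objects)); // same values as nested
```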