mt-polygon-simplification/thesis/chapters/02.02-Dataformats.tex
2019-07-28 12:03:15 +02:00

53 lines
5.3 KiB
TeX

\subsection{Geodata formats on the Web}
Here the data formats that are used through this theses will be explained.
\paragraph{The JavaScript Object Notation (JSON) Data Interchange Format} was derived from the ECMAScript Programming Language Standard\footnote{\path{https://tools.ietf.org/html/rfc8259}}. It is a text format for the serialization of structured data. As a text format is suites well for the data exchange between server and client. Also it can easily be consumed by JavaScript. These characteristics are ideal for web based applications. It does however only support a limited number of data types. Four primitive ones (string, number, boolean and null) and two structured ones (objects and array). Objects are an unordered collection of name-value pairs, while arrays are simply ordered lists of values. JSON was meant as a replacement for XML as it provides a more human readable format. Through nesting complex data structures can be created.
\paragraph{The GeoJSON Format} is a geospatial data interchange format\footnote{\path{https://tools.ietf.org/html/rfc7946}}. As the name suggests it is based on JSON and deals with data representing geographic features. There are several geometry types defined to be compatible with the types in the OpenGIS Simple Features Implementation Specification for SQL\footnote{\path{https://portal.opengeospatial.org/files/?artifact_id=829}}. These are Point, MultiPoint, LineString, MultiLineString, Polygon, Multipolygon and the heterogeneous GeometryCollection. Listing \ref{lst:geojson-example} shows a simple example of a GeoJSON object with one point feature. A more complete example can be viewed in the file \path{./data/example-7946.geojson}.
\lstinputlisting[
float=!htb,
language=javascript,
caption=An example for a GeoJSON object,
label=lst:geojson-example
]{../data/example-simple.geojson}
\todo[inline]{Explain the nested array coordinates of polygons}
GeoJSON is mainly used for web-based mapping. Since it is based on JSON it inherits its strength. There is for one the enhanced readability through reduced markup overhead compared to XML-based data types like GML. Interoperability with web applications comes for free since the parsing of JSON-objects is integrated in JavaScript. Unlike the Esri Shapefile format a single file is sufficient to store and transmit all relevant data, including feature properties.
To its downsides count that a text based cannot store the geometries as efficiently as it would be possible with a binary format. Also only vector-based data types can be represented. Another disadvantage can be the strictly non-topologic approach. Every feature is completely described by one entry. However when there are features that share common components, like boundaries in neighboring polygons, these data points will be encoded twice in the GeoJSON object. On the one hand this further poses concerns about data size. On the other hand it is more difficult to execute topological analysis on the data set. Luckily there is a related data structure to tackle this problem.
\todo[inline]{Extract more info about topology: https://www.esri.com/news/arcuser/0401/topo.html}
\paragraph{TopoJSON} is an extension of GeoJSON and aims to encode datastructures into a shared topology\footnote{\path{https://github.com/topojson/topojson-specification}}. It supports the same geometry types as GeoJSON. It differs in some additional properties to use and new object types like "Topology" and "GeometryCollection". Its main feature is that LineStrings, Polygons and their \todo{find better word}multiplicary equivalents must define line segments in a common property called "arcs". The geometries themselves then reference the arcs from with they are made up. This reduces redundancy of data points. Another feature is the quantization of positions. To use it one can define a "transform" object which specifies a scale and translate point to encode all coordinates. Together with delta-encoding of position arrays one obtains integer values better suited for efficient serialization and reduced file size.
\todo[inline]{Explain why topology-preserving shape simplification is important}
\paragraph{Coordinate representation} Both GeoJSON and TopoJSON represent positions as an array of numbers. The elements depict longitude, latitude and optionally altitude in that order. For simplicity this thesis will deal with two-dimensional positions only. A polyline is described by creating an array of these positions as seen in listing \ref{lst:coordinates-array}.
\begin{lstlisting}[
float=htb,
label=lst:coordinates-array,
caption=Polyline coordinates in nested-array form
]
[[102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]]
\end{lstlisting}
There will be however one library in this thesis which expects coordinates in a different format. Listing \ref{lst:coordinates-object} shows a polyline in the sense of this library. Here one location is represented by an object with x and y properties.
\begin{lstlisting}[
float=htb,
label=lst:coordinates-object,
caption=Polyline in array-of-objects form
]
[{x: 102.0, y: 0.0}, {x: 103.0, y: 1.0}, {x: 104.0, y: 0.0}, {x: 105.0, y: 1.0}]
\end{lstlisting}
To distinguish these formats in future references the first first format will be called nested-array format, while the latter will be called array-of-objects format.