mt-polygon-simplification/thesis/A-integration.tex

\section[Practical application]{Compiling an existing C++ library for use on the web}

\todo[inline]{maybe remove whole chapter :'(}

In this chapter I will explain how an existing C++ library was utilized compare different simplification algorithms in a web browser. The library is named \textsl{psimpl} and was written in 2011 from Elmar de Koning. It implements various Algorithms used for polyline simplification. This library will be compiled to WebAssembly using the Emscripten compiler. Furthermore a Web-Application will be created for interactively exploring the Algorithms. The main case of application is simplifying polygons, but also polylines will be supported. The data format used to read in the data will be GeoJSON. To maintain topological correctness a intermediate conversion to TopoJSON will be applied if requested.


\paragraph{Integrating an existing C++ library}

An existing implementation of several simplification algorithms has been found in the C++ ecosystem. \textsf{psimpl} implements 8 algorithms distributed as a single header file. It also provides a function for measuring positional errors making it ideal for use in a quality analysis tool for those algorithms.


\subsection{State of the art: psimpl}\
\label{ch:psimpl}

\textsl{psimpl} is a generic C++ library for various polyline simplification algorithms. It consists of a single header file \texttt{psimpl.h}. The algorithms implemented are \textsl{Nth point}, \textsl{distance between points}, \textsl{perpendicular distance}, \textsl{Reumann-Witkam}, \textsl{Opheim}, \textsl{Lang}, \textsl{Douglas-Peucker} and \textsl{Douglas-Peucker variation}. It has to be noted, that the \textsl{Douglas-Peucker} implementation uses the \textsl{distance between points} routine, also named the \textsl{radial distance} routine, as preprocessing step just like Simplify.js (Section \ref{sec:simplify.js}). All these algorithms have a similar templated interface. The goal now is to prepare the library for a compiler.

\todo[inline]{Describe the error statistics function of psimpl}

\subsection{Compiling to WebAssembly}

As in the previous chapter the compiler created by the Emscripten project will be used. This time the code is not directly meant to be consumed by a web application. It is a generic library. There are no entry points defined that Emscripten can export in WebAssembly. So the entry points will be defined in a new package named psimpl-js. It will contain a C++ file that uses the library, the compiled code and the JavaScript files needed for consumption in a JavaScript project. \textsl{psimpl} makes heavy use of C++ template functions which cannot be handled by JavaScript. So there will be entry points written for each exported algorithm. These entry points are the point of intersection between JavaScript and the library. Listing \ref{lst:psimpl-js-entrypoint} shows one example. They all follow the same procedure. First the pointer given by JavaScript is interpreted as a double-pointer in line 2. This is the beginning of the coordinates array. \textsl{psimpl} expects the first and last point of an iterator so the pointer to the last point is calculated (line 3). The appropriate function template from psimpl is instantiated and called with the other given parameters (line 5). The result is stored in an intermediate vector.

\lstinputlisting[
float=htb,
language=c++,
firstline=56, lastline=62,
caption=One entrypoint to the C++ code,
label=lst:psimpl-js-entrypoint
]{../lib/psimpl-js/psimpl.cpp}

Since this is C++ the the capabilities of Emscripten's Embind can be utilized. Embind is realized in the libraries \texttt{bind.h}\footnote{\path{https://emscripten.org/docs/api_reference/bind.h.html#bind-h}} and \texttt{val.h}\footnote{\path{https://emscripten.org/docs/api_reference/val.h.html#val-h}}. \texttt{val.h} is used for transliterating JavaScript to C++. In this case it is used for the type conversion of C++ Vectors to JavaScript's Typed Arrays as seen at the end of listing \ref{lst:psimpl-js-entrypoint}. On the other hand \texttt{bind.h} is used for for binding C++ functions, classes, or enumerations to from JavaScript callable names. Aside from providing a better developer experience this also prevents name mangling in cases where functions are overloaded. Instead of listing the exported functions in the compiler command or annotating it  with \texttt{EMSCRIPTEN\_KEEPALIVE} the developer gives a pointer to the object to bind. Listing \ref{lst:psimpl-js-bindings} shows each entry point bound to a readable name and at last the registered vector datatype. The parameter \texttt{my\_module} is merely for marking a group of related bindings to avoid name conflicts in bigger projects.

\lstinputlisting[
float=htb,
language=c++,
firstline=72, lastline=82,
caption=Emscripten bindings,
label=lst:psimpl-js-bindings
]{../lib/psimpl-js/psimpl.cpp}

\todo[inline]{Compiler call (--bind)}

The library code on JavaScript side is similar to the one in chapter \ref{sec:benchmark-webassembly}. This time a function is exported per routine.

\todo[inline]{More about javascript glue code with listing callSimplification.}


\subsection{The implementation}

The implementation is just as in the last chapter a web page and thus JavaScript is used for the interaction. The source code is bundled with Webpack. React is the UI Component library and babel is used to transform JSX to JavaScript. MobX\footnote{\path{https://mobx.js.org/}} is introduced as a state management library. It applies functional reactive programming by giving the utility to declare observable variables and triggering the update of derived state and other observers intelligently. To do that MobX observes the usage of observable variables so that only dependent observers react on updates. In contrast to other state libraries MobX does not require the state to be serializable. Many existing data structures can be observed like objects, arrays and class instances. It also does not constrain the state to a single centralized store like Redux\footnote{\path{https://redux.js.org/}} does. The final state diagram can be seen in listing \ref{fig:integration-state}. It represents the application state in an object model. Since this has drawbacks in showing the information flow the observable variables are marked in red, and computed ones in blue.

\begin{figure}[htb]
	\centering
	\fbox{\includegraphics[width=.8\linewidth]{images/integration-state.jpg}}
	\caption{The state model of the application}
	\label{fig:integration-state}
\end{figure}

On the bottom the three main state objects can be seen. They are implemented as singletons as they represent global application state. Each of them will now be explained.

\paragraph{MapState} holds state relevant for the map display. An array of TileLayers defines all possible background layers to choose from. The selected one is stored in \texttt{selectedTileLayerId}. The other two variables toggle the display of the vector layers to show.

\paragraph{AlgorithmState} stores all the information about the simplification algorithms to choose from. The class \texttt{Algorithm} acts as a generalization interface. Each algorithm defines which fields are used to interact with its parameters. These fields hold their current value, so the algorithm can compute its parameters array at any time. The fields also define additional restrictions in their \texttt{props} attribute like the number range from which to choose from. An integer field for example, like the n value in the \textsl{Nth point} algorithm, would instantiate a range field with a step value of one. The \texttt{ToleranceRange} however, which is modeled as its own subclass due to its frequent usage, allows for smaller steps to represent decimal numbers.

\paragraph{FeatureState} encapsulates the state of the vector features. Each layer is represented in text form and object format of the GeoJSON standard. The text form is needed as a serializable form for detecting whether the map display needs to update on an action. As the original features come from file or the server, the text representation is the source of truth and the object format derives from it. The simplified features are asynchronously calculated. This process is outsourced to a debounced reaction that updates the state upon finish.


\subsection{The user interface}

After explaining the state model the User Interface (UI) shall be explained. The interface is implemented in components which are modeled in a shallow hierarchy. They represent and update the application state. In listing \ref{fig:integration-ui} the resulting web page is shown. The labeled regions correspond to the components. Their behavior will be explained in the following.

\todo{Insert final picture.}
\todo{Red boxes around regions}
\todo{Make ui fit description}
\begin{figure}[htb]
	\centering
	\fbox{\includegraphics[width=\linewidth]{images/integration-ui.jpg}}
	\caption{The user interface for the algorithm comparison. (not final)}
	\label{fig:integration-ui}
\end{figure}

\paragraph{Leaflet Map}

The big region on the left marks the Leaflet map. Its main use is the visualization of Features. The layers to show are one background tile layer, the original and the simplified features. Original marks the user specified input features for simplification. These are marked in blue with a thin border. The simplified features are laid on top in a red styling. Aside from the default control for zooming on the top left the map contains a info widget showing the length of the currently specified tolerance on the top right.

\paragraph{Background Layers Control}

The first component in the Options panel is a simple radio button group for choosing the background layer of the map or none at all. They are provided by the OpenStreetMap (OSM) foundation\footnote{\path{https://wiki.osmfoundation.org/wiki/Main_Page}}. By experience the layer "OpenStreetMap DE" provides better loading times in Germany. "OpenStreetMap Mapnik" is considered the standard OSM tile layer\footnote{\path{https://wiki.openstreetmap.org/wiki/Featured_tile_layers}}.

\paragraph{Data Selection}

Here the input layer can be specified. Either by choosing one of the prepared data sets or by selecting a locally stored GeoJSON file. The prepared data will be loaded from the server upon selection by an Ajax call. Ajax stands for asynchronous JavaScript and XML and describes the method of dispatching an HTTP request from the web application without reloading the page. This way not all of the data has to be loaded on initial page load. On the other hand the user can select a file with an HTML input or via drag \& drop.  For the latter the external package "file-drop-element" is used\footnote{\path{https://github.com/GoogleChromeLabs/file-drop#readme}}. It is a custom element based on the rather recent Custom Elements specification\footnote{\path{https://w3c.github.io/webcomponents/spec/custom/}}. It allows the creation of new HTML elements. In this case it is an element called "file-drop" that encapsulates the drag \& drop logic and provides a simple interface using attributes and events. Listing \ref{lst:compare-algorithms-file-drop} shows the use of the element. The mime type is restricted by the \texttt{accept} attribute to GeoJSON files.

\begin{lstlisting}[
language=html,
caption=The file-drop element in use,
label=lst:compare-algorithms-file-drop
]
<file-drop accept="application/geo+json">Drop area</file-drop>
\end{lstlisting}

\paragraph{Layer Control}

This element serves the purpose of toggling the display of the vector layers. The original and the simplified features can be independently displayed or be hidden. If features have been loaded, the filename will be shown here.

\paragraph{Simplification Control}

The last element in this section is the control for the simplification parameters. At first the user can choose if a conversion to TopoJSON should be performed before simplification. Then the algorithm itself can be selected. The parameters change to fit the requirements of the algorithm. The update of one of the parameters trigger live changes in the application state so the user can get direct feedback how the changes affect the geometries.