
\section[Methodology]{Implementation of a performance benchmark}

% Performance benchmark

This chapter explains the approach taken to improve the performance of a simplification algorithm in a web browser via WebAssembly. The go-to library for this kind of operation is Simplify.js, a JavaScript implementation of the Douglas-Peucker algorithm with optional radial distance preprocessing. The library will be rebuilt in the C programming language and compiled to WebAssembly with Emscripten. A web page is built to benchmark the two approaches and compare their performance.

\subsection{State of the art: Simplify.js}
\label{ch:simplify.js}
% Simplify.JS + turf

Simplify.js calls itself a ``tiny high-performance JavaScript polyline simplification library''\footnote{\url{https://mourner.github.io/simplify-js/}}. It was extracted from Leaflet, the ``leading open-source JavaScript library for mobile-friendly interactive maps''\footnote{\url{https://leafletjs.com/}}. Due to its usage in Leaflet and Turf.js, a geospatial analysis library, it is the most commonly used library for polyline simplification. The library itself currently has 20,066 weekly downloads on the npm platform, while the Turf.js derivative @turf/simplify has 30,389. Turf.js maintains an unmodified fork of the library in its own repository. The mapping library Leaflet is downloaded 189,228 times a week.

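The Douglas-Peucker algorithm is implemented with an optional radial distance preprocessing routine. This preprocessing trades quality for performance: it reduces the number of points handed to the Douglas-Peucker routine, at the cost of a potentially less accurate result. The mode that disables this routine is therefore called highest quality.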
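
The idea behind the radial distance routine can be sketched in a few lines. The following is a minimal illustration and not the source code of the library; the function name \texttt{radialDistance} and the choice to always keep the last point are assumptions made for this sketch.

\begin{lstlisting}[language=javascript]
// Sketch: drop every point that lies within `tolerance` of the
// last point that was kept (radial distance filter).
function radialDistance(points, tolerance) {
  if (points.length === 0) return [];
  const sqTolerance = tolerance * tolerance;
  let prev = points[0];
  const kept = [prev];
  for (let i = 1; i < points.length; i++) {
    const dx = points[i].x - prev.x;
    const dy = points[i].y - prev.y;
    // Compare squared distances to avoid the square root.
    if (dx * dx + dy * dy > sqTolerance) {
      prev = points[i];
      kept.push(prev);
    }
  }
  // Assumption of this sketch: the end point is always kept.
  const last = points[points.length - 1];
  if (kept[kept.length - 1] !== last) kept.push(last);
  return kept;
}
\end{lstlisting}

Since every point is visited exactly once, this filter runs in linear time, which is why it can pay off as a preprocessing step.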

Interestingly, the library expects coordinates to be a list of objects with x and y properties. GeoJSON and TopoJSON, however, store coordinates in nested array form (see chapter \ref{ch:dataformats}). Since the library is small and written in JavaScript, any skilled web developer can easily fork and modify the code for their own purposes. This is even pointed out in the library's source code. The fact that Turf.js, which can be seen as a convenience wrapper for processing GeoJSON data, decided to keep the library as is might indicate some benefit to this format. Listing \ref{lst:turf-transformation} shows how Turf.js calls Simplify.js. Instead of altering the source code, the data is transformed back and forth between the formats on each call. It is questionable whether this practice is advisable.

\lstinputlisting[
	float=htbp,
	language=javascript,
	firstline=116, lastline=122,
	caption=Turf.js usage of Simplify.js,
	label=lst:turf-transformation
]{../lib/turf-simplify/index.js}
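
To make the cost of this practice tangible, the round trip between the two coordinate formats can be sketched as follows. The helper names \texttt{toObjects}, \texttt{toArrays} and \texttt{simplifyNested} are hypothetical and do not appear in Turf.js; \texttt{simplifyFn} stands for any routine that operates on objects with x and y properties.

\begin{lstlisting}[language=javascript]
// Hypothetical helpers, not taken from the Turf.js source.
const toObjects = (coords) => coords.map(([x, y]) => ({ x, y }));
const toArrays = (points) => points.map(({ x, y }) => [x, y]);

// Wraps a routine that works on {x, y} objects so that it can
// be called with nested arrays as used in GeoJSON.
function simplifyNested(coords, simplifyFn) {
  return toArrays(simplifyFn(toObjects(coords)));
}
\end{lstlisting}

Both conversions allocate one new object or array per position, so the overhead grows linearly with the input and the output size.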

Since it is not clear which case is faster, and given how simple the required changes are, two versions of Simplify.js will be tested: the original version, which expects the coordinates in array-of-objects format, and the altered version, which operates on nested arrays. Listing \ref{lst:diff-simplify.js} shows an extract of the changes performed on the library. Instead of using properties, the coordinate values are accessed by index. Except for the removal of the licensing header, the alterations are restricted to this kind of change. The full list of changes can be viewed in \path{lib/simplify-js-alternative/simplify.diff}.


\lstinputlisting[
	float=htbp,
	language=diff,
	firstline=11, lastline=16,
	caption=Snippet of the difference between the original Simplify.js and alternative,
	label=lst:diff-simplify.js
]{../lib/simplify-js-alternative/simplify.diff}
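
The nature of the change can also be illustrated with a self-contained example. The two functions below compute the squared distance between two points, once with property access and once with index access. They are written for illustration only; the names \texttt{getSqDistObjects} and \texttt{getSqDistArrays} are assumptions and not part of either library version.

\begin{lstlisting}[language=javascript]
// Original style: points are objects with x and y properties.
function getSqDistObjects(p1, p2) {
  const dx = p1.x - p2.x;
  const dy = p1.y - p2.y;
  return dx * dx + dy * dy;
}

// Altered style: points are [x, y] arrays accessed by index.
function getSqDistArrays(p1, p2) {
  const dx = p1[0] - p2[0];
  const dy = p1[1] - p2[1];
  return dx * dx + dy * dy;
}
\end{lstlisting}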

\subsection{The WebAssembly solution}
\label{sec:benchmark-webassembly}

In the scope of this thesis, a library will be created that implements the same procedure as Simplify.js in C code. It will be made available on the web platform through WebAssembly. Following the naming of the model library, it will be called Simplify.wasm. The compiler to be used is Emscripten, as it is the standard for porting C code to WebAssembly.

As mentioned, the first step is to port Simplify.js to the C programming language. The file \path{lib/simplify-wasm/simplify.c} contains the result. It is kept as close to the JavaScript library as possible. This may result in a coding style that is atypical for C, but it prevents skewed results from unexpected optimizations to the procedure itself. The entry point is not the \texttt{main} function but a function called \texttt{simplify}. This is specified to the compiler, as can be seen in listing \ref{lst:simplify-wasm-compiler-call}.

\lstinputlisting[
float=htpb,
language=bash,
% firstline=2, lastline=3,
label=lst:simplify-wasm-compiler-call,
caption={The call to compile the C source code to WebAssembly in a Makefile}
]{../lib/simplify-wasm/Makefile}

Furthermore, the functions \texttt{malloc} and \texttt{free} from the standard library are made available to the host environment. Another option specifies the optimization level. With \texttt{O3} the highest level is chosen. The Closure Compiler minifies the JavaScript glue code. Compiling the code through Emscripten produces a binary file in wasm format and the glue code as JavaScript. These files are called \texttt{simplify.wasm} and \texttt{simplify.js} respectively.

An example usage can be seen in \path{lib/simplify-wasm/example.html}. Even though the memory access is abstracted in this example, the process is still cumbersome and far from a drop-in replacement for Simplify.js. Thus a further abstraction over the code emitted by Emscripten was written in \path{lib/simplify-wasm/index.js}. The exported function \texttt{simplifyWasm} handles module instantiation, memory access and the correct call to the exported wasm function. Finding the correct path to the wasm binary is not always straightforward when the code is imported from another location. The proposed solution is to leave the resolving of the path to an asset bundler that processes the file in a preprocessing step.

\lstinputlisting[
float=htpb,
language=javascript,
firstline=22, lastline=33,
label=lst:simplify-wasm,
caption={The top level function to invoke the WebAssembly simplification.}
]{../lib/simplify-wasm/index.js}

Listing \ref{lst:simplify-wasm} shows the function \texttt{simplifyWasm}. The abstractions \texttt{getModule}, \texttt{storeCoords} and \texttt{loadResultAndFreeMemory} are explained in the following paragraphs.

\paragraph {Module instantiation} will be done on the first call only but requires the function to be asynchronous. For a neater experience in handling Emscripten modules, a utility function named \texttt{initEmscripten}\footnote{\path{/lib/wasm-util/initEmscripten.js}} was written to turn the module factory into a JavaScript \texttt{Promise} that resolves when compilation has finished. The usage of this function can be seen in listing \ref{lst:simplify-wasm-emscripten-module}. The resulting WebAssembly module is cached in the variable \texttt{emscriptenModule}.

\lstinputlisting[
float=htbp,
language=javascript,
firstline=35, lastline=40,
caption=Caching the instantiated Emscripten module,
label=lst:simplify-wasm-emscripten-module
]{../lib/simplify-wasm/index.js}
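
The general pattern of instantiating once and reusing the result can be sketched independently of Emscripten. In the following sketch, \texttt{createModule} is a placeholder for any function that returns a \texttt{Promise} of the instantiated module; it is not an Emscripten API. In contrast to the listing above, the sketch caches the \texttt{Promise} itself, which is one possible variation and not necessarily what the thesis code does.

\begin{lstlisting}[language=javascript]
// `createModule` is a placeholder for whatever returns a
// Promise of the instantiated module.
let modulePromise = null;

function getModule(createModule) {
  if (modulePromise === null) {
    // First call: start the instantiation and remember the Promise.
    modulePromise = createModule();
  }
  // Later calls receive the same Promise, even if the
  // instantiation has not finished yet.
  return modulePromise;
}
\end{lstlisting}

Caching the \texttt{Promise} instead of the resolved module has the effect that concurrent first calls do not trigger a second instantiation.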

\paragraph {Storing coordinates} into the module memory is done in the function \texttt{storeCoords}. Emscripten offers multiple views on the module memory. These correspond to the available WebAssembly data types (e.g. HEAP8, HEAPU8, HEAPF32, HEAPF64, ...)\footnote{\url{https://emscripten.org/docs/api_reference/preamble.js.html\#type-accessors-for-the-memory-model}}. As JavaScript numbers are always represented as double-precision 64-bit binary format values\footnote{\url{https://www.ecma-international.org/ecma-262/6.0/\#sec-4.3.20}} (IEEE 754-2008), the HEAPF64 view is the right choice to avoid losing precision. Accordingly, the data type \texttt{double} is used in C to work with the data. Listing \ref{lst:wasm-util-store-coords} shows the transfer of coordinates into the module memory. In line 3 the memory is allocated using the exported \texttt{malloc} function. A JavaScript TypedArray is used for accessing the buffer such that the loop for storing the values (lines 5--8) is trivial.

\lstinputlisting[
float=tbph,
language=javascript,
firstline=12, lastline=21,
caption=The storeCoords function,
label=lst:wasm-util-store-coords
]{../lib/wasm-util/coordinates.js}
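
For readers without access to the repository, the mechanism can be reproduced with a stand-in for the Emscripten module. The following sketch assumes nested-array input and flattens it into consecutive doubles. The helper \texttt{createFakeModule} with its naive allocator is purely hypothetical and only mimics the \texttt{HEAPF64} and \texttt{\_malloc} members that a real Emscripten module provides; the body of \texttt{storeCoords} may differ from the listing above.

\begin{lstlisting}[language=javascript]
// Stand-in for an Emscripten module: a heap plus a naive bump
// allocator. A real module provides HEAPF64 and _malloc itself.
function createFakeModule(bytes) {
  const buffer = new ArrayBuffer(bytes);
  let next = 8; // keep address 0 unused, stay 8-byte aligned
  return {
    HEAPF64: new Float64Array(buffer),
    _malloc(size) {
      const ptr = next;
      next += Math.ceil(size / 8) * 8;
      return ptr;
    },
  };
}

// Sketch of storing [x, y] pairs as a flat sequence of doubles.
function storeCoords(module, coords) {
  const flatSize = coords.length * 2;
  const ptr = module._malloc(flatSize * Float64Array.BYTES_PER_ELEMENT);
  // View exactly the allocated region as 64-bit floats.
  const view = new Float64Array(module.HEAPF64.buffer, ptr, flatSize);
  coords.forEach(([x, y], i) => {
    view[2 * i] = x;
    view[2 * i + 1] = y;
  });
  return ptr;
}
\end{lstlisting}

The returned value is a byte offset into the module memory, which is what the C code receives as a pointer.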

\paragraph{To read the result} back from memory, one has to look at how the simplification is returned in the C code. Listing \ref{lst:simplify-wasm-entrypoint} shows the entry point for the C code. This is the function which gets called from JavaScript. As expected, arrays are represented as pointers with a corresponding length. The first block of code (lines 2--6) only declares the needed variables. Lines 8 to 12 perform the radial distance preprocessing. The result of this simplification is stored in an auxiliary array named \texttt{resultRdDistance}. In this case, \texttt{points} has to point to the new array and the length is adjusted. Finally, the Douglas-Peucker procedure is invoked after reserving enough memory. The auxiliary array can be freed afterwards. The remaining problem is to return both the result pointer and the array length to the calling code. Since pointers in Emscripten are 32 bits wide, both values fit into a fixed-size array of two integers, which is returned. We can now look back at how the JavaScript code reads the result.

\lstinputlisting[
float=tbph,
language=c,
firstline=104, lastline=124,
caption=Entry point in the C file,
label=lst:simplify-wasm-entrypoint
]{../lib/simplify-wasm/simplify.c}


Listing \ref{lst:wasm-util-load-result} shows the code to read the values back from module memory. The result pointer and its length are acquired by dereferencing the \texttt{resultInfo} array. The buffer to use is the heap for unsigned 32-bit integers. This information can then be used to align the Float64Array view on the 64-bit heap. The construction of the appropriate coordinate representation by reversing the flattening can be looked up in the same file; it is realized in the \texttt{unflattenCoords} function. Finally, it is important to free the memory reserved for both the result and the result information, using the exported \texttt{free} function.

\lstinputlisting[
float=!tbph,
language=javascript,
firstline=29, lastline=43,
caption=Loading coordinates back from module memory,
label=lst:wasm-util-load-result
]{../lib/wasm-util/coordinates.js}
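
The steps described above can be sketched against a stand-in module as well. The sketch makes two assumptions that are not confirmed by the text: the second entry of the result information is taken to be the number of doubles (not the number of points), and the object \texttt{fakeModule} merely imitates the \texttt{HEAPU32}, \texttt{HEAPF64} and \texttt{\_free} members of a real Emscripten module.

\begin{lstlisting}[language=javascript]
function loadResult(module, resultInfoPtr) {
  // resultInfo holds two 32-bit values:
  // [pointer to result, length in doubles (assumed)]
  const info = new Uint32Array(module.HEAPU32.buffer, resultInfoPtr, 2);
  const resultPtr = info[0];
  const length = info[1];
  const flat = new Float64Array(module.HEAPF64.buffer, resultPtr, length);
  // Copy into plain arrays before the memory is released.
  const coords = [];
  for (let i = 0; i < length; i += 2) {
    coords.push([flat[i], flat[i + 1]]);
  }
  module._free(resultPtr);
  module._free(resultInfoPtr);
  return coords;
}

// Demonstration with a fake module that records freed pointers.
const buffer = new ArrayBuffer(64);
const freed = [];
const fakeModule = {
  HEAPU32: new Uint32Array(buffer),
  HEAPF64: new Float64Array(buffer),
  _free: (ptr) => freed.push(ptr),
};
fakeModule.HEAPF64.set([1, 2, 3, 4], 2); // result at byte offset 16
fakeModule.HEAPU32.set([16, 4], 0);      // resultInfo at byte offset 0
const coords = loadResult(fakeModule, 0);
\end{lstlisting}

Copying the values out before calling \texttt{\_free} matters, because the typed array view points directly into module memory that may be reused afterwards.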

\subsection{File sizes}
\label{ch:file-sizes}

For web applications, an important measure is the size of libraries. It defines the cost of including the functionality in terms of how much the application size will grow. When it gets too large, users with low bandwidth are at a particular disadvantage, as it might be impossible to load the app in a reasonable time. Even with a fast connection, loading times are relevant, as users expect a short time to first interaction. Users with limited data plans also benefit when developers keep their bundle size to a minimum.

The file sizes in this chapter will be given as the gzipped size. gzip is a file format for compressed files based on the DEFLATE algorithm. It is natively supported by all browsers and the most common web server software. It is therefore the format in which files are typically transmitted in production applications.
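
One way to obtain such a number is the \texttt{zlib} module that ships with Node.js. The following sketch is not necessarily how the sizes in this chapter were measured; the function name \texttt{gzippedSize} is chosen for illustration, and the default compression level is used.

\begin{lstlisting}[language=javascript]
const zlib = require('zlib');

// Returns the number of bytes after gzip compression.
// `content` may be a string or a Buffer.
function gzippedSize(content) {
  return zlib.gzipSync(Buffer.from(content)).length;
}
\end{lstlisting}

In Node.js the function could be applied to a build artifact with \texttt{gzippedSize(fs.readFileSync(file))}, where \texttt{file} is a placeholder for the path of the file to measure.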

For JavaScript applications there is also the possibility of reducing the file size by code minification. This is the process of reformatting the source code without changing its functionality. Savings are achieved, for example, by removing unnecessary parts like spaces and comments or by reducing variable names to single letters. Minification is often done in asset bundlers that process the JavaScript source files and produce the bundled application code.

The WebAssembly solution requires two files: the \texttt{.wasm} bytecode and the JavaScript glue code. The glue code is already minified by the Emscripten compiler. The binary has a size of 3.8KB while the JavaScript code has a total of 3.1KB. Simplify.js, on the other hand, merely has a size of 1.1KB. With minification the size shrinks to 638 bytes.

File size was not the main priority when producing the WebAssembly solution, and there are ways to shrink the bytecode further. As of now it contains the logic of the library but also the required functionality from the C standard library, which Emscripten adds automatically. The bloat comes from using the memory management functions \texttt{malloc} and \texttt{free}. If the goal was to reduce the file size, one would have to get along without dynamic memory management altogether. This would be possible in this case, as the simplification is a self-contained process and the module has no other use. The input size is known beforehand, so instead of allocating memory one could write the result to the location directly after the input feature. The function would merely need to return the result size. After the call has finished and the result has been read by JavaScript, the memory is not needed anymore. A test build was made that dispensed with memory management. The size of the wasm bytecode shrank to 507 bytes and the glue code to 2.8KB. By using the plain WebAssembly JavaScript API one could even drop the glue code altogether \parencite{surma2019replacing}.
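
How little code is needed to use a WebAssembly module without any glue code can be shown with a toy example. The module below is hand-assembled for this illustration and only exports a function \texttt{add} for two 32-bit integers; it is unrelated to the test build mentioned above.

\begin{lstlisting}[language=javascript]
// Hand-assembled toy module exporting add(a, b) on 32-bit integers.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic, version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section
  0x03, 0x02, 0x01, 0x00, // function section
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section, one body
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add, end
]);

// Synchronous compilation is acceptable for a module this small.
const wasmModule = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(wasmModule);
const sum = instance.exports.add(2, 3);
\end{lstlisting}

In a web page one would rather load the binary over the network, for example with \texttt{WebAssembly.instantiateStreaming}, instead of embedding the bytes in the source code.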

For simplicity the memory management was left in, as the optimizations would require more careful engineering to ensure correct functionality. The example above shows, however, that there is considerable potential to cut the size. Even file sizes below that of the JavaScript original are possible.


\subsection{The implementation of a web framework}
\label{ch:benchmark-app}

The performance comparison of the two methods will be realized in a web page. It will be built as a frontend web application that allows the user to specify the input parameters of the benchmark. These parameters are the polyline to simplify, a range of tolerances to use for simplification and whether the so-called high quality mode is used. This application makes it possible to test a variety of use cases on multiple devices. The behavior of the algorithms can also be studied under different preconditions. In the scope of this thesis a few cases will be investigated. The application structure is introduced in the following sections.

\subsubsection{External libraries}

The dynamic aspects of the web page will be built in JavaScript. Webpack\footnote{\url{https://webpack.js.org/}} will be used to bundle the application code and to run compilers like Babel\footnote{\url{https://babeljs.io/}} on the source code. As mentioned in section \ref{sec:benchmark-webassembly}, the bundler is also useful for handling references to the WebAssembly binary, as it resolves the filename to the correct download path. The JavaScript code will intentionally not be transpiled to older versions of the ECMAScript standard. Transpiling is often done to increase compatibility with older browsers. This is not a requirement in this case, and by refraining from this practice there will also be no unintentional impact on the application performance. The libraries in use are Benchmark.js\footnote{\url{https://benchmarkjs.com/}} for statistically significant benchmarking results, React\footnote{\url{https://reactjs.org/}} for building the user interface and Chart.js\footnote{\url{https://www.chartjs.org/}} for drawing graphs.

\subsubsection{The application logic}
The web page consists of static and dynamic content. The static parts are the header and footer with explanations of the project. Those are written directly into the root HTML document. The dynamic parts are injected by JavaScript. Those will be discussed further in this chapter as they contain the main application logic.

\begin{figure}[htb]
	\centering
	\fbox{\includegraphics[width=\linewidth]{images/benchmark-uml.jpg}}
	\caption{UML diagram of the benchmarking application}
	\label{fig:benchmarking-uml}
\end{figure}

The web app is built to test a variety of cases with multiple datapoints. As mentioned, Benchmark.js will be used for statistically significant results. It is however rather slow as it needs about 5 to 6 seconds per datapoint. This is why multiple types of benchmarking methods are implemented. Figure \ref{fig:benchmarking-uml} shows the corresponding UML diagram of the application. One can see the UI components in the top-left corner. The root component is \texttt{App}. It gathers all the internal state of its children and passes state down where it is needed.

\subsubsection{Benchmark cases and chart types}
\label{ch:benchmark-cases}
In the upper right corner the different benchmark cases are listed. These cases implement a function \texttt{fn} to benchmark. Additional methods for setting up the function and cleaning up afterwards can be implemented as defined by the parent class \texttt{BenchmarkCase}. Concrete cases are created by instantiating one of the benchmark cases with a defined set of parameters. There are three charts that will be rendered using a subset of these cases. These are:

\begin{itemize}
	\item \textbf{Simplify.js vs Simplify.wasm} - This chart shows the performance of the simplification by Simplify.js, the altered version of Simplify.js and the newly developed Simplify.wasm.
	\item \textbf{Simplify.wasm runtime analysis} - To gain further insights into WebAssembly performance, this stacked bar chart shows the runtime of a call to Simplify.wasm. It is partitioned into the time spent preparing the data (\texttt{storeCoords}), the algorithm itself and the time it took to restore the coordinates from memory (\texttt{loadResult}).
	\item \textbf{Turf.js method runtime analysis} - The last chart uses a similar structure. This time it analyses the performance impact of the back and forth transformation of data used in Turf.js.
\end{itemize}

\subsubsection{The different benchmark types}
At the bottom, the different benchmark types can be seen. They all implement the abstract \texttt{measure} function, which returns the mean time to run the function specified in the given \texttt{BenchmarkCase}. The \texttt{IterationsBenchmark} runs the function a specified number of times, while the \texttt{OpsPerTimeBenchmark} runs for a fixed number of milliseconds and executes as many iterations as possible. Both methods have their benefits and drawbacks. Using the iterations approach, one cannot determine beforehand how long the benchmark will run. With fast devices and a small number of iterations, the measured duration can even fall below the resolution of the timer used, which renders the results unusable. It is, however, a very fast way of determining the speed of a function, and it is valuable for getting a first approximation of how the algorithms perform over the span of datapoints. The second type, the operations per time benchmark, seems to overcome this problem. It is, however, prone to distortion by garbage collection, engine optimizations and other background processes \parencite{bynens2010bulletproof}.
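
The difference between the two types can be condensed into two small functions. They are a simplified sketch and not the classes of the application; the function names and the injectable clock parameter \texttt{now} are assumptions made so that the example is self-contained.

\begin{lstlisting}[language=javascript]
// Both functions return the mean time per call in milliseconds.
// `now` defaults to performance.now(), which is available in
// browsers and in current Node.js versions.
function measureIterations(fn, iterations, now = () => performance.now()) {
  const start = now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  return (now() - start) / iterations;
}

function measureOpsPerTime(fn, durationMs, now = () => performance.now()) {
  const start = now();
  let runs = 0;
  let elapsed = 0;
  // Run at least once, then repeat until the time budget is used up.
  do {
    fn();
    runs++;
    elapsed = now() - start;
  } while (elapsed < durationMs);
  return elapsed / runs;
}
\end{lstlisting}

In the first function the total runtime depends on the speed of \texttt{fn}, in the second one the number of runs does. This mirrors the trade-off described above.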

Benchmark.js combines these approaches. In a first step it approximates the runtime in a few cycles. From this value it calculates the number of iterations needed to reach an uncertainty of at most 1\%. Then the samples are gathered \parencite{hossain2012benchmark}.

\subsubsection{The benchmark suite}
For running multiple benchmarks the class \texttt{BenchmarkSuite} was created. It takes a list of \texttt{BenchmarkCases} and runs them through a \texttt{BenchmarkType}. The suite manages starting, pausing and stopping the run through the list. It updates the gathered statistics on each cycle. By injecting an \texttt{onCycle} method, the \texttt{Runner} component can give live feedback about the progress.

\begin{figure}[htb]
	\centering
	\fbox{\includegraphics[width=.8\linewidth]{images/benchmark-statemachine.jpg}}
	\caption{The state machine for the benchmark suite}
	\label{fig:benchmarking-statemachine}
\end{figure}

Figure \ref{fig:benchmarking-statemachine} shows the state machine of the suite. Based on this diagram the user interface component shows action buttons so the user can interact with the state. While running, the suite checks if a state change was requested and acts accordingly by pausing the benchmarks or resetting all statistics gathered when stopping.
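
A state machine of this kind can be expressed as a simple transition table. The state names \texttt{idle}, \texttt{running} and \texttt{paused} as well as the exact transitions are assumptions made for this sketch; the figure remains the authoritative description, and the automatic return to the initial state after the last benchmark is left out.

\begin{lstlisting}[language=javascript]
// Assumed states and user actions; the figure is authoritative.
const transitions = {
  idle: { start: 'running' },
  running: { pause: 'paused', stop: 'idle' },
  paused: { resume: 'running', stop: 'idle' },
};

// Returns the state that follows `state` when `action` is requested.
function nextState(state, action) {
  const target = transitions[state][action];
  if (target === undefined) {
    throw new Error('Action ' + action + ' is not allowed in state ' + state);
  }
  return target;
}

// Actions for which the user interface would show a button.
function availableActions(state) {
  return Object.keys(transitions[state]);
}
\end{lstlisting}

Deriving the buttons from the same table that drives the transitions keeps the user interface and the suite from drifting apart.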

\subsubsection{The user interface}

The user interface has three sections: one for configuring the input parameters, one for controlling the benchmark process and one showing a diagram of the results. Figure \ref{fig:benchmark-ui} shows the user interface.

\begin{figure}[!htb]
	\centering
	\fbox{\includegraphics[width=.9\linewidth]{images/benchmark-ui.png}}
	\caption{The user interface of the benchmarking application.}
	\label{fig:benchmark-ui}
\end{figure}

\paragraph{Settings} First, the input parameters of the algorithm have to be specified. For that, some prepared polylines are available to choose from. They are introduced in chapter \ref{ch:benchmark-data}. Instead of testing a single tolerance value, the user can specify a range. This way the behavior of the algorithms can be observed in one chart. The high quality mode got its name from Simplify.js. If it is enabled, there will be no radial distance preprocessing step before the Douglas-Peucker routine is applied. The next option determines which benchmarks will be run. The options are described in chapter \ref{ch:benchmark-cases}. One of the three implemented benchmark methods can be selected. Depending on the method chosen, additional options appear to further specify the benchmark parameters. The last option deals with chart rendering. Debouncing limits the rate at which functions are called. In this case the chart will delay rendering when datapoints come in at a fast rate.
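
Debouncing itself is a generic technique that fits into a few lines. The following is a common textbook formulation and not the code used in the application; the name \texttt{debounce} and its parameters are chosen for illustration.

\begin{lstlisting}[language=javascript]
// Delays calls to `fn` until `waitMs` milliseconds have passed
// without a new call. Only the last call of a burst is executed.
function debounce(fn, waitMs) {
  let timer = null;
  return function debounced(...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
\end{lstlisting}

Wrapped this way, a render function that is triggered by every incoming datapoint only runs once the stream of datapoints pauses for the given time.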

\paragraph{Run Benchmark} This is the control that displays the status of the benchmark suite. Here benchmarks can be started, stopped, paused and resumed. It also shows the progress of the benchmarks completed in percentage and absolute numbers.

\paragraph{Chart} The chart shows a live diagram of the results. The title represents the selected chart. The legend gives information on which benchmark cases will run. The algorithm parameters (dataset and high quality mode) and a description of the current platform can also be found here. The tolerance range is mapped onto the x-axis. On the y-axis two scales can be seen. The left-hand scale shows the unit in which the performance is displayed. This scale corresponds to the colored lines. Every chart shows the number of positions in the result as a grey line. Its scale is displayed on the right. This information is important for selecting a proper tolerance range, as it shows whether an appropriate order of magnitude has been chosen. Below the chart, additional control elements are placed to adjust the visualization. The first selection lets the user choose between a linear and a logarithmic y-axis. The second one changes the unit of measure for performance. The two options are the mean time in milliseconds per operation (ms) and the number of operations that can be run in one second (hz). These options are only available for the chart ``Simplify.js vs Simplify.wasm'', as the other two charts are stacked bar charts for which changing the default options would not make sense. Finally, the result can be saved via a download button. A separate page can be fed with this file to display the diagram only.
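
The two units of measure are reciprocal to each other. With $\bar{t}$ denoting the mean time per operation in milliseconds and $f$ the number of operations per second, the conversion is
\[
	f = \frac{1000}{\bar{t}}, \qquad \bar{t} = \frac{1000}{f}.
\]
A mean time of 4\,ms per operation, for example, corresponds to 250 operations per second.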


\subsection{The test data}
\label{ch:benchmark-data}

This section introduces the test data. Two data sets were chosen to operate on. The first is a testing sample used by Simplify.js; the second one is a boundary generated from OpenStreetMap (OSM) data.

\paragraph{Simplify.js example}

This is the polyline used by Simplify.js to demonstrate its capabilities. Figure \ref{fig:dataset-simplify} shows the widget on its homepage. The user can modify the parameters with the interactive elements and view the live result. The data comes from a 10,700 mile car route from Lisboa, Portugal to Singapore and is based on OpenStreetMap data. The line is defined by 73,752 positions. Even with low tolerances this number reduces drastically. This example illustrates why it is important to generalize polylines before rendering them.

\begin{figure}[htb]
	\centering
	\includegraphics[width=.9\linewidth]{images/dataset-simplify.png}
	\caption{The Simplify.js test data visualized}
	\label{fig:dataset-simplify}
\end{figure}

\paragraph{Bavaria outline}

The second polyline used for benchmarking contains 116,829 positions. It represents the outline of the German federal state of Bavaria. It was extracted from the OSM dataset by selecting administrative boundaries. In contrast to the former polyline, this one is a closed line, as often used in polygons to represent a surface. The plotted line can be seen in figure \ref{fig:dataset-bavaria}.

\begin{figure}[htb]
	\centering
	\includegraphics[width=.7\linewidth]{images/dataset-bavaria.png}
	\caption{The Bavaria test data visualized}
	\label{fig:dataset-bavaria}
\end{figure}

\paragraph{Simple line}

There is a third line to choose from in the application. It is, however, not used for benchmarking since it contains only 8 points. It is merely a placeholder that prevents the client application from loading the larger data sets from the server on page load. This way the transmitted data size is reduced. The larger lines are only requested when they are actually needed.