% Performance benchmark
In this chapter I will explain the approach taken to improve the performance of a simplification algorithm in a web browser via WebAssembly. The go-to library for this kind of operation is Simplify.js, a JavaScript implementation of the Douglas-Peucker algorithm with optional radial distance preprocessing. The library will be rebuilt in the C programming language and compiled to WebAssembly with Emscripten. A web page is built to produce benchmark results that compare the performance of the two approaches.
\subsection{State of the art: Simplify.js}
% simplifyJS + turf
Simplify.js calls itself a ``tiny high-performance JavaScript polyline simplification library''. It was extracted from Leaflet, the ``leading open-source JavaScript library for mobile-friendly interactive maps''. Due to its usage in Leaflet and in Turf.js, a geospatial analysis library, it is the most commonly used library for polyline simplification. The library itself currently has 20,066 weekly downloads on npm, while the Turf.js derivative @turf/simplify has 30,389. Turf.js maintains an unmodified fork of the library in its own repository.
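The Douglas-Peucker algorithm is implemented with an optional radial distance preprocessing routine. This preprocessing trades quality for performance: it cheaply discards points that lie closer than the tolerance to the previously kept point, so that the Douglas-Peucker pass has fewer points to process. Accordingly, the mode that disables this routine is called ``highest quality''.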
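
To make the two stages concrete, the following sketch condenses the procedure into a few functions. It is modelled on the structure of Simplify.js (points as objects with \texttt{x} and \texttt{y} properties, squared distances to avoid square roots, a default tolerance of 1) but is a simplified re-statement written for illustration, not the library's source code.

\begin{lstlisting}[language=javascript, caption=Sketch of radial distance preprocessing and Douglas-Peucker]
// Sketch modelled on the structure of Simplify.js; points are {x, y} objects.

// Squared distance between two points.
function getSqDist(p1, p2) {
  const dx = p1.x - p2.x;
  const dy = p1.y - p2.y;
  return dx * dx + dy * dy;
}

// Squared distance from point p to the segment p1-p2.
function getSqSegDist(p, p1, p2) {
  let x = p1.x;
  let y = p1.y;
  let dx = p2.x - x;
  let dy = p2.y - y;
  if (dx !== 0 || dy !== 0) {
    // Project p onto the segment and clamp the parameter t to [0, 1].
    const t = ((p.x - x) * dx + (p.y - y) * dy) / (dx * dx + dy * dy);
    if (t > 1) {
      x = p2.x;
      y = p2.y;
    } else if (t > 0) {
      x += dx * t;
      y += dy * t;
    }
  }
  dx = p.x - x;
  dy = p.y - y;
  return dx * dx + dy * dy;
}

// Radial distance preprocessing: drop points closer than the tolerance
// to the previously kept point; the last point is always kept.
function simplifyRadialDist(points, sqTolerance) {
  let prev = points[0];
  const kept = [prev];
  for (let i = 1; i < points.length; i++) {
    if (getSqDist(points[i], prev) > sqTolerance) {
      prev = points[i];
      kept.push(prev);
    }
  }
  const last = points[points.length - 1];
  if (prev !== last) kept.push(last);
  return kept;
}

// Recursive Douglas-Peucker step on the index range [first, last].
function simplifyDPStep(points, first, last, sqTolerance, out) {
  let maxSqDist = sqTolerance;
  let index = -1;
  for (let i = first + 1; i < last; i++) {
    const sqDist = getSqSegDist(points[i], points[first], points[last]);
    if (sqDist > maxSqDist) {
      index = i;
      maxSqDist = sqDist;
    }
  }
  if (index !== -1) {
    if (index - first > 1) simplifyDPStep(points, first, index, sqTolerance, out);
    out.push(points[index]);
    if (last - index > 1) simplifyDPStep(points, index, last, sqTolerance, out);
  }
}

function simplify(points, tolerance, highestQuality) {
  if (points.length <= 2) return points;
  const sqTolerance = tolerance !== undefined ? tolerance * tolerance : 1;
  const input = highestQuality ? points : simplifyRadialDist(points, sqTolerance);
  const last = input.length - 1;
  const out = [input[0]];
  simplifyDPStep(input, 0, last, sqTolerance, out);
  out.push(input[last]);
  return out;
}
\end{lstlisting}

With \texttt{highestQuality} set, the radial pass is skipped and only the Douglas-Peucker step runs, which corresponds to the ``highest quality'' mode.
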
Interestingly, the library expects coordinates to be a list of objects with x and y properties. \todo{reference object vs array form} GeoJSON and TopoJSON, however, store polylines in nested array form. Since the library is small and written in JavaScript, a skilled web developer can easily fork it and adapt the code to their own needs. This is even pointed out in the source code. The fact that Turf.js, which can be seen as a convenience wrapper for processing GeoJSON data, decided to keep the library as is might indicate a performance benefit of the object format. Listing \ref{lst:turf-transformation} shows how Turf.js calls Simplify.js: instead of altering the library's source code, the data is transformed back and forth between the two formats on every call. It is questionable whether this practice is advisable at all.
\lstinputlisting[
float=htbp,
language=javascript,
firstline=116, lastline=122,
caption=Turf.js usage of simplify.js,
label=lst:turf-transformation
]{../lib/turf-simplify/index.js}
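
The kind of round trip performed by Turf.js can be sketched as follows. The helpers \texttt{toObjects}, \texttt{toArrays} and \texttt{simplifyNested} are hypothetical and do not reproduce Turf.js code; they only illustrate that every call allocates one new object per input coordinate and one new array per output coordinate in addition to the simplification itself.

\begin{lstlisting}[language=javascript, caption=Sketch of converting between nested arrays and point objects]
// Hypothetical helpers: convert GeoJSON-style [x, y] positions to the
// {x, y} objects expected by Simplify.js and back. Only the first two
// components are kept; a third (altitude) value would be dropped.
function toObjects(coords) {
  return coords.map(([x, y]) => ({ x, y }));
}

function toArrays(points) {
  return points.map(({ x, y }) => [x, y]);
}

// Wraps an object-based simplify function so that it accepts nested arrays.
function simplifyNested(coords, tolerance, highestQuality, simplifyObjects) {
  const points = toObjects(coords);
  const simplified = simplifyObjects(points, tolerance, highestQuality);
  return toArrays(simplified);
}
\end{lstlisting}
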
Since it is not clear which variant is faster, and given how simple the required changes are, two versions of Simplify.js will be tested: the original version, which expects the coordinates in array-of-objects form, and an altered version, which operates on nested arrays. Listing \ref{lst:diff-simplify.js} shows an excerpt of the changes made to the library. Instead of using properties, the coordinate values are accessed by index. Apart from the removal of the license header, all alterations are of this kind. The full list of changes can be viewed in \path{lib/simplify-js-alternative/simplify.diff}.
\lstinputlisting[
float=htbp,
language=diff,
firstline=11, lastline=16,
caption=Excerpt of the diff between the original Simplify.js and the altered version,
label=lst:diff-simplify.js
]{../lib/simplify-js-alternative/simplify.diff}
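
As an illustration of this kind of change (not an excerpt of the actual diff), the squared distance between two points can be written once with property access and once with index access:

\begin{lstlisting}[language=javascript, caption=Property access versus index access]
// Original form: each point is an object {x, y}.
function getSqDistObject(p1, p2) {
  const dx = p1.x - p2.x;
  const dy = p1.y - p2.y;
  return dx * dx + dy * dy;
}

// Altered form: each point is an array [x, y], accessed by index.
function getSqDistArray(p1, p2) {
  const dx = p1[0] - p2[0];
  const dy = p1[1] - p2[1];
  return dx * dx + dy * dy;
}

console.log(getSqDistObject({ x: 0, y: 0 }, { x: 3, y: 4 })); // 25
console.log(getSqDistArray([0, 0], [3, 4]));                  // 25
\end{lstlisting}
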
\subsection{The WebAssembly solution}
Within the scope of this thesis, a library is created that implements the same procedure as Simplify.js in C. It is made available on the web platform through WebAssembly. In the style of the model library, it is called simplify.WASM. Emscripten is used as the compiler, as it is the standard tool for porting C code to WebAssembly.
As mentioned, the first step is to port Simplify.js to the C programming language. The port can be found in \path{lib/simplify-wasm/simplify.c}. It is kept as close to the JavaScript library as possible. This may result in a coding style that is untypical for C, but it prevents skewed results caused by unexpected optimizations of the procedure itself. The entry point is not the \texttt{main} function but a function called \texttt{simplify}. This is specified to the compiler, as can be seen in \path{lib/simplify-wasm/Makefile}. Furthermore, the standard library functions \texttt{malloc} and \texttt{free} are exported to the host environment. Compiling the code with Emscripten produces a wasm file and JavaScript glue code. These files are called \path{simplify.wasm} and \path{simplify.js} respectively. An example usage can be seen in \path{lib/simplify-wasm/example.html}. Even though memory access is abstracted in this example, the process is still unwieldy and far from a drop-in replacement for Simplify.js. Therefore, a further abstraction over the code emitted by Emscripten was realised in \path{lib/simplify-wasm/index.js}. The exported function \texttt{simplifyWasm} handles module instantiation, memory access and the correct call to the exported wasm code. However, when the code is imported from another location, the correct path to the wasm binary is not always obvious. The proposed solution is to leave the resolution of this path to an asset bundler that processes the file in a preprocessing step.
\lstinputlisting[
float=htpb,
language=javascript,
firstline=22, lastline=33,
caption=The exported wrapper function of simplify.WASM,
label=lst:simplify-wasm
]{../lib/simplify-wasm/index.js}
Listing \ref{lst:simplify-wasm} shows the function \texttt{simplifyWASM}. The functions \texttt{getModule}, \texttt{storeCoords} and \texttt{loadResultAndFreeMemory} used therein are explained in the following.
The module is instantiated on the first call only, which requires the function to be asynchronous. To make handling Emscripten modules more convenient, a utility function named \texttt{initEmscripten}\footnote{\path{lib/wasm-util/initEmscripten.js}} was written that turns the module factory into a JavaScript Promise which resolves once compilation has finished. The result of this promise can be cached in a global variable. The usage of this function can be seen in Listing \ref{lst:simplify-wasm-emscripten-module}.
\lstinputlisting[
float=htbp,
language=javascript,
firstline=35, lastline=40,
caption=Usage of initEmscripten,
label=lst:simplify-wasm-emscripten-module
]{../lib/simplify-wasm/index.js}
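
The idea behind such a wrapper can be sketched as follows. This is not the code of \path{lib/wasm-util/initEmscripten.js}; it assumes a factory that accepts an object of module overrides, returns the module object synchronously and calls its \texttt{onRuntimeInitialized} hook once compilation has finished. The module is resolved wrapped in an object because some older Emscripten versions attach a \texttt{then} method to the module, which would otherwise interfere with promise resolution. The name \texttt{getModule} is reused here for illustration only.

\begin{lstlisting}[language=javascript, caption=Sketch of a promise-based module initialisation with caching]
// Sketch, assuming an Emscripten-style factory that takes an object of
// module overrides, returns the module object synchronously and calls
// onRuntimeInitialized once compilation has finished.
function initEmscripten(moduleFactory) {
  return new Promise((resolve, reject) => {
    const overrides = {
      onRuntimeInitialized() {
        // Defer, so that `instance` is assigned even if the factory
        // invokes this hook before it returns. The module is wrapped in
        // an object so that a `then` method on it cannot take over
        // promise resolution.
        Promise.resolve().then(() => resolve({ module: instance }));
      },
      onAbort: reject,
    };
    const instance = moduleFactory(overrides);
  });
}

// Cache the promise rather than the module, so that concurrent first
// calls share one instantiation. (A failed instantiation stays cached.)
let modulePromise = null;

function getModule(moduleFactory) {
  if (modulePromise === null) {
    modulePromise = initEmscripten(moduleFactory);
  }
  return modulePromise; // resolves to { module }
}
\end{lstlisting}

A caller would then obtain the module with \texttt{const \{ module \} = await getModule(factory);}. Caching the promise instead of the resolved module avoids a second instantiation when two calls arrive before the first one has finished.
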
Next, it is explained how the coordinates are passed to the module and how the result is returned. Emscripten offers multiple views on the module memory, which are typed arrays of different element types (e.g. HEAP8, HEAPU8, HEAPF32, HEAPF64, ...)\footnotemark. As JavaScript numbers are always represented as double-precision 64-bit binary values\footnotemark{} (IEEE 754-2008), the HEAPF64 view has to be used to avoid losing precision. Accordingly, the data type \texttt{double} is used in C to work with the data.
\footnotetext{\path{https://emscripten.org/docs/api_reference/preamble.js.html#type-accessors-for-the-memory-model}}
\footnotetext{\path{https://www.ecma-international.org/ecma-262/6.0/#sec-4.3.20}}
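
The effect of the element type on precision can be checked directly with typed arrays: writing a coordinate into a Float64Array preserves the JavaScript number exactly, whereas a Float32Array rounds it to single precision. The helper \texttt{roundTrip} and the coordinate value are arbitrary examples.

\begin{lstlisting}[language=javascript, caption=Precision of 64-bit and 32-bit float views]
// Writes a value into a one-element typed array and reads it back.
function roundTrip(TypedArrayCtor, value) {
  const view = new TypedArrayCtor(1);
  view[0] = value;
  return view[0];
}

const longitude = 13.404954; // arbitrary example coordinate value
console.log(roundTrip(Float64Array, longitude) === longitude); // true
console.log(roundTrip(Float32Array, longitude) === longitude); // false
\end{lstlisting}
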
Listing \ref{lst:wasm-util-store-coords} shows the transfer of coordinates into the module memory. In line 3, the memory is allocated using the exported \texttt{malloc} function. A JavaScript TypedArray is used to access the buffer, so the loop that stores the values (lines 5--8) is trivial.
\lstinputlisting[
float=tbph,
language=javascript,
firstline=12, lastline=21,
caption=The storeCoords function,
label=lst:wasm-util-store-coords
]{../lib/wasm-util/coordinates.js}
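
For experimenting without a compiled module, the transfer can be sketched against a stand-in object. Here \texttt{createMockModule} is hypothetical: it mimics only the parts of an Emscripten module used in this step (\texttt{HEAPF64} and the exported \texttt{\_malloc}) with a bump allocator on a plain ArrayBuffer. The \texttt{storeCoords} function follows the steps described for the listing (allocate, place a typed array view at the returned address, copy the values in a loop) and assumes nested \texttt{[x, y]} input, but it is not the code from \path{lib/wasm-util/coordinates.js}.

\begin{lstlisting}[language=javascript, caption=Sketch of storing coordinates in module memory]
// Hypothetical stand-in for an Emscripten module: a fixed ArrayBuffer,
// a HEAPF64 view on it and a bump allocator in place of malloc.
function createMockModule(bytes) {
  const buffer = new ArrayBuffer(bytes);
  let next = 8; // keep address 0 unused so it can serve as a null pointer
  return {
    HEAPF64: new Float64Array(buffer),
    _malloc(size) {
      const ptr = next;
      const end = ptr + Math.ceil(size / 8) * 8; // keep 8-byte alignment
      if (end > bytes) throw new RangeError("mock heap exhausted");
      next = end;
      return ptr;
    },
    _free() {}, // the bump allocator never reuses memory
  };
}

// Copies [[x0, y0], [x1, y1], ...] into module memory as a flat
// sequence of doubles x0, y0, x1, y1, ... and returns its address.
function storeCoords(module, coords) {
  const length = coords.length * 2;
  const ptr = module._malloc(length * Float64Array.BYTES_PER_ELEMENT);
  // View of exactly the allocated region; ptr must be 8-byte aligned.
  const view = new Float64Array(module.HEAPF64.buffer, ptr, length);
  for (let i = 0; i < coords.length; i++) {
    view[2 * i] = coords[i][0];
    view[2 * i + 1] = coords[i][1];
  }
  return ptr;
}
\end{lstlisting}

The view is created from \texttt{module.HEAPF64.buffer} on every call rather than cached: when Emscripten is built with memory growth enabled, growing the memory replaces the underlying buffer and the \texttt{HEAP} views.
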
\todo{Check for coords length < 2}
2019-07-15 09:57:59 +02:00
We now turn to the C side. Listing \ref{lst:simplify-wasm-entrypoint} shows the entry point of the C code, i.e. the function that is called from JavaScript. As expected, arrays are represented as pointers with a corresponding length. The first block of code (lines 2--6) only declares the required variables. Lines 8 to 12 perform the radial distance preprocessing. The result of this simplification is stored in an auxiliary array named \texttt{resultRdDistance}. In this case, the points pointer is redirected to the new array and the length is adjusted accordingly. Finally, the Douglas-Peucker procedure is invoked after reserving enough memory. The auxiliary array can be freed afterwards. The remaining problem is to return both the result pointer and the array length to the calling code. \todo{Fact check. evtl unsigned}Since pointers in Emscripten-compiled code are represented as integers, this is solved by returning (a pointer to) a fixed-size array of two integers that holds both values. This is a hacky solution, but it works. We can now look back at how the JavaScript code reads the result.
\lstinputlisting[
float=tbph,
language=c,
firstline=104, lastline=124,
caption=Entrypoint in the C-file,
label=lst:simplify-wasm-entrypoint
]{../lib/simplify-wasm/simplify.c}
Listing \ref{lst:wasm-util-load-result} shows the code that reads the values back from module memory. The result pointer and its length are obtained by dereferencing the \texttt{resultInfo} array through the heap view for unsigned 32-bit integers (HEAPU32). This information is then used to position a Float64Array view on the 64-bit heap. Reversing the flattening to construct the appropriate coordinate representation is realised in the \texttt{unflattenCoords} function in the same file. Finally, it is important to actually free the memory reserved for both the result and the result information, for which the exported \texttt{free} function is used.
\lstinputlisting[
float=tbph,
language=javascript,
firstline=29, lastline=43,
caption=Loading coordinates back from module memory,
label=lst:wasm-util-load-result
]{../lib/wasm-util/coordinates.js}
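
The reading side can be sketched in the same spirit. The sketch assumes that the result-information array holds, as unsigned 32-bit integers, the result pointer followed by the number of doubles (not points) in the result; \texttt{loadResultAndFreeMemory} and \texttt{unflattenCoords} here are illustrative re-implementations of the described steps that merely share their names with the functions in \path{lib/wasm-util/coordinates.js}.

\begin{lstlisting}[language=javascript, caption=Sketch of reading the result back from module memory]
// Turns [x0, y0, x1, y1, ...] back into [[x0, y0], [x1, y1], ...].
function unflattenCoords(flat) {
  const coords = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    coords.push([flat[i], flat[i + 1]]);
  }
  return coords;
}

// Assumes resultInfoPtr points to two uint32 values: the address of the
// result and the number of doubles it contains.
function loadResultAndFreeMemory(module, resultInfoPtr) {
  const infoIndex = resultInfoPtr / Uint32Array.BYTES_PER_ELEMENT;
  const resultPtr = module.HEAPU32[infoIndex];
  const resultLength = module.HEAPU32[infoIndex + 1];
  const start = resultPtr / Float64Array.BYTES_PER_ELEMENT;
  // slice() copies the values, so they stay valid after free().
  const flat = module.HEAPF64.slice(start, start + resultLength);
  module._free(resultPtr);
  module._free(resultInfoPtr);
  return unflattenCoords(flat);
}
\end{lstlisting}

Copying the values before calling \texttt{\_free} matters, because the allocator may hand the released bytes to the next allocation.
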
\subsection{The implementation}