update thesis

This commit is contained in:
Alfred Melch 2019-07-15 09:57:59 +02:00
parent 65e74582ce
commit 9f2ce8823e
8 changed files with 141 additions and 75 deletions

View File

@ -5,27 +5,25 @@ In this chapter i will explain the approach to improve the performance of a simp
\subsection{State of the art: simplifyJS}
% simplifyJS + turf
Simplify.JS calls itself a ``tiny high-performance JavaScript polyline simplification library''. It was extracted from Leaflet, the ``leading open-source JavaScript library for mobile-friendly interactive maps''. Due to its usage in Leaflet and Turf.js, a geospatial analysis library, it is the most commonly used library for polyline simplification. The library itself currently has 20,066 weekly downloads while the Turf.js derivative @turf/simplify has 30,389. Turf.js maintains an unmodified fork of the library in its own repository.
The Douglas-Peucker algorithm is implemented with an optional radial distance preprocessing routine. This preprocessing trades quality for performance. Thus the mode that disables this routine is called ``highest quality''.
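The radial distance idea can be illustrated with a minimal sketch. It is not the code of Simplify.JS; the function names are chosen for illustration only, and the points are assumed to be in the object form with x and y properties. A point is kept only if it lies farther than the tolerance from the previously kept point, and the first and last points are always preserved.

```javascript
// Squared Euclidean distance between two points in {x, y} object form.
function squaredDistance(a, b) {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  return dx * dx + dy * dy;
}

// Illustrative radial distance filter (names are assumptions, not library API).
// Keeps a point only if it is farther than `tolerance` from the last kept point.
function radialDistanceFilter(points, tolerance) {
  if (points.length <= 2) return points.slice();
  const sqTolerance = tolerance * tolerance;
  let previous = points[0];
  const kept = [previous];
  for (let i = 1; i < points.length; i++) {
    const point = points[i];
    if (squaredDistance(point, previous) > sqTolerance) {
      kept.push(point);
      previous = point;
    }
  }
  // Always preserve the end point of the polyline.
  const last = points[points.length - 1];
  if (kept[kept.length - 1] !== last) kept.push(last);
  return kept;
}
```

Because the filter needs only one pass over the points, it reduces the input for the more expensive Douglas-Peucker step, at the cost of possibly discarding points that step would have kept.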
Interestingly the library expects coordinates to be a list of objects with x and y properties. \todo{reference object vs array form} GeoJSON and TopoJSON however store polylines in nested array form. Luckily, since the library is small and written in JavaScript, any skilled web developer can easily fork and modify the code for their own purposes. This is even pointed out in the source code. The fact that Turf.js, which can be seen as a convenience wrapper for processing GeoJSON data, decided to keep the library as is might indicate a performance benefit to this format. Listing \ref{lst:turf-transformation} shows how Turf.js calls Simplify.js: instead of altering the source code, the data is transformed back and forth between the formats on each call. It is questionable whether this practice is advisable at all.
\lstinputlisting[
float, floatplacement=H,
float=htbp,
language=javascript,
firstline=116, lastline=122,
caption=Turf.js usage of simplify.js,
label=lst:turf-transformation
]{../lib/turf-simplify/index.js}
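The cost of this practice becomes clearer when the two conversions are written out. The following is a minimal sketch and not the Turf.js code itself; the helper names \texttt{toObjectForm}, \texttt{toArrayForm} and \texttt{simplifyNested} are hypothetical, and \texttt{simplifyFn} stands for any function with the call signature of Simplify.js.

```javascript
// Hypothetical helpers for illustration; not taken from Turf.js or Simplify.js.

// Nested array form [[x, y], ...] -> array-of-objects form [{x, y}, ...].
function toObjectForm(coords) {
  return coords.map(([x, y]) => ({ x, y }));
}

// Array-of-objects form [{x, y}, ...] -> nested array form [[x, y], ...].
function toArrayForm(points) {
  return points.map((p) => [p.x, p.y]);
}

// Wraps a simplification function that expects object form so that it can be
// called with nested arrays. Both conversions run on every call.
function simplifyNested(coords, simplifyFn, tolerance, highQuality) {
  const simplified = simplifyFn(toObjectForm(coords), tolerance, highQuality);
  return toArrayForm(simplified);
}
```

Each call allocates one new object per input point and one new array per output point, which is the overhead the array-based variant of the library avoids.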
Since it is not clear which case is faster, and given how simple the required changes are, two versions of Simplify.js will be tested: the original version, which expects the coordinates to be in array-of-objects form, and the altered version, which operates on nested arrays. Listing \ref{lst:diff-simplify.js} shows an extract of the changes performed on the library. Instead of using properties, the coordinate values are accessed by index. Except for the removal of the licensing header, the alterations are restricted to this kind of change. The full list of changes can be viewed in \path{lib/simplify-js-alternative/simplify.diff}.
\lstinputlisting[
float, floatplacement=H,
float=htbp,
language=diff,
firstline=11, lastline=16,
caption=Snippet of the difference between the original Simplify.js and alternative,
@ -34,6 +32,49 @@ Since it is not clear which case is faster, and given how simple the required ch
\subsection{The webassembly solution}
Just like the Simplify.js library, the WebAssembly solution requires the data to be transformed for processing, namely by storing bytes into and loading them from the module heap. These transformations however are expensive and not as easy to avoid. In a larger project the data may already be managed in a WebAssembly module, so the raw execution time might be relevant as well. To allow conclusions about the real-world usage of WebAssembly in this case, there will be separate measurements for storing the data, loading the data and the execution.
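How such separate measurements could be taken is sketched below. This is an illustration under assumptions, not the benchmark code of this thesis: the callbacks \texttt{store}, \texttt{execute} and \texttt{load} are hypothetical placeholders for the three phases.

```javascript
// Use the high-resolution timer where available, otherwise fall back to Date.
const now = typeof performance !== 'undefined'
  ? () => performance.now()
  : () => Date.now();

// Runs the three phases in order and records the duration of each one in
// milliseconds. `store`, `execute` and `load` are hypothetical callbacks.
function measurePhases({ store, execute, load }, input) {
  const t0 = now();
  const stored = store(input);
  const t1 = now();
  const executed = execute(stored);
  const t2 = now();
  const result = load(executed);
  const t3 = now();
  return {
    result,
    timings: { store: t1 - t0, execute: t2 - t1, load: t3 - t2 }
  };
}
```

Keeping the three durations apart makes it possible to report the raw execution time separately from the cost of moving data into and out of the module.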
Within the scope of this thesis a library will be created that implements the same procedure as simplify.JS in C code. It will be made available on the web platform through WebAssembly. In the style of the model library it will be called simplify.WASM. The compiler used will be Emscripten, as it is the de facto standard for porting C code to WebAssembly.
As mentioned, the first step is to port simplify.JS to the C programming language. The file \path{/lib/simplify-wasm/simplify.c} shows the attempt. It is kept as close to the JavaScript library as possible. This may result in a coding style that is atypical for C but prevents skewed results from unexpected optimizations to the procedure itself. The entry point is not the \texttt{main} function but a function called \texttt{simplify}. This is specified to the compiler as can be seen in \path{/lib/simplify-wasm/Makefile}. Furthermore the functions \texttt{malloc} and \texttt{free} from the standard library are made available to the host environment. Compiling the code through Emscripten produces a wasm file and the glue code in JavaScript format. These files are called simplify.wasm and simplify.js respectively. An example usage can be seen in \path{/lib/simplify-wasm/example.html}. Even though the memory access is abstracted in this example, the process is still cumbersome and far from a drop-in replacement of simplify.JS. Thus in \path{/lib/simplify-wasm/index.js} a further abstraction of the code emitted by Emscripten was realised. The exported function \texttt{simplifyWasm} handles module instantiation, memory access and the correct call to the exported wasm code. Finding the correct path to the wasm binary is not always straightforward, however, when the code is imported from another location. The proposed solution is to leave the resolving of the code path to an asset bundler that processes the file in a preprocessing step.
\lstinputlisting[
float=htpb,
language=javascript,
firstline=22, lastline=33,
label=lst:simplify-wasm
]{../lib/simplify-wasm/index.js}
Listing \ref{lst:simplify-wasm} shows the function \texttt{simplifyWASM}. Further explanation will follow regarding the functions \texttt{getModule}, \texttt{storeCoords} and \texttt{loadResultAndFreeMemory}.
Module instantiation will be done on the first call only but requires the function to be asynchronous. For a neater experience in handling Emscripten modules a utility function named \texttt{initEmscripten}\footnote{/lib/wasm-util/initEmscripten.js} was written to turn the module factory into a JavaScript Promise that resolves when compilation has finished. The result of this promise can be cached in a global variable. The usage of this function can be seen in listing \ref{lst:simplify-wasm-emscripten-module}.
\lstinputlisting[
float=htbp,
language=javascript,
firstline=35, lastline=40,
caption=My Caption,
label=lst:simplify-wasm-emscripten-module
]{../lib/simplify-wasm/index.js}
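The caching pattern described above can be sketched independently of Emscripten. The following is an assumption-laden illustration, not the content of \path{index.js}: \texttt{getModuleOnce} is a hypothetical name, and \texttt{instantiate} stands for any function that starts the asynchronous instantiation and returns a Promise.

```javascript
// Cached across calls; null until the first call starts the instantiation.
let modulePromise = null;

// Hypothetical sketch: `instantiate` stands in for the asynchronous module
// setup. It is invoked at most once; later calls reuse the same Promise.
function getModuleOnce(instantiate) {
  if (modulePromise === null) {
    modulePromise = Promise.resolve(instantiate());
  }
  return modulePromise;
}
```

Caching the Promise rather than the resolved module means that calls arriving while the instantiation is still running also share the same instance instead of triggering a second compilation.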
Next, it is clarified how coordinates are passed to this module and how the result is returned. Emscripten offers multiple views on the module memory. These correspond to the available WebAssembly datatypes (e.g. HEAP8, HEAPU8, HEAPF32, HEAPF64, ...)\footnotemark. As JavaScript numbers are always represented as double-precision 64-bit binary format values\footnotemark (IEEE 754-2008), the HEAPF64 view is the one to use in order not to lose precision. Accordingly the datatype \texttt{double} is used in C to work with the data.
\footnotetext{\path{https://emscripten.org/docs/api_reference/preamble.js.html#type-accessors-for-the-memory-model}}
\footnotetext{\path{https://www.ecma-international.org/ecma-262/6.0/#sec-4.3.20}}
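The loss of precision that a 32-bit view would cause can be demonstrated with plain typed arrays, which are the same kind of objects the Emscripten heap views are built on. The value 0.1 is an arbitrary example chosen for this sketch.

```javascript
// Arbitrary example value; like most decimal fractions it has no exact
// binary representation, so the number of stored bits matters.
const value = 0.1;

// Writing the value through a 32-bit float view rounds it to single precision.
const asFloat32 = new Float32Array([value])[0];

// Writing it through a 64-bit float view preserves the JavaScript number.
const asFloat64 = new Float64Array([value])[0];

const float32IsExact = asFloat32 === value; // false
const float64IsExact = asFloat64 === value; // true
```

The same rounding would happen to every coordinate stored through HEAPF32, which is why the 64-bit view is the appropriate choice here.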
Listing \ref{lst:wasm-util-store-coords} shows the transfer of coordinates into the module memory. In line 3 the memory is allocated using the exported \texttt{malloc} function. A JavaScript TypedArray is used for accessing the buffer such that the loop for storing the values (lines 5--8) is trivial.
\lstinputlisting[
float=tbph,
language=javascript,
firstline=12, lastline=21,
caption=The storeCoords function,
label=lst:wasm-util-store-coords
]{../lib/wasm-util/coordinates.js}
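The store step can also be tried out without a compiled module. The sketch below is not the content of \path{coordinates.js}; it assumes a stand-in object that mimics only two members of an Emscripten module, the \texttt{HEAPF64} view and an exported \texttt{\_malloc}, and the names \texttt{createMockModule} and \texttt{storeCoordsSketch} are hypothetical.

```javascript
// Stand-in for an Emscripten module (an assumption for illustration only):
// a bump allocator over an ArrayBuffer plus a Float64Array view on it.
function createMockModule(byteLength) {
  const buffer = new ArrayBuffer(byteLength);
  let next = 8; // address 0 stays unused so it can act as a null pointer
  return {
    HEAPF64: new Float64Array(buffer),
    _malloc(size) {
      const pointer = next;
      next += Math.ceil(size / 8) * 8; // keep allocations 8-byte aligned
      if (next > byteLength) throw new Error('mock heap exhausted');
      return pointer;
    }
  };
}

// Flattens nested coordinates [[x, y], ...] into module memory as doubles
// and returns the byte offset (pointer) of the first value.
function storeCoordsSketch(module, coords) {
  const flatLength = coords.length * 2;
  const pointer = module._malloc(flatLength * Float64Array.BYTES_PER_ELEMENT);
  // A typed array over the heap buffer, starting at the allocated offset.
  const view = new Float64Array(module.HEAPF64.buffer, pointer, flatLength);
  for (let i = 0; i < coords.length; i++) {
    view[2 * i] = coords[i][0];
    view[2 * i + 1] = coords[i][1];
  }
  return pointer;
}
```

The returned pointer is a byte offset, so the corresponding index into \texttt{HEAPF64} is the pointer divided by eight. The caller remains responsible for freeing the memory once the result has been read.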
\todo[inline]{C code: int* simplify}
\todo[inline]{loadResult}
\subsection{The implementation}

View File

@ -60,24 +60,33 @@
\@writefile{toc}{\contentsline {section}{\numberline {5}Benchmark}{7}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {5.1}State of the art: simplifyJS}{7}\protected@file@percent }
\@writefile{tdo}{\contentsline {todo}{reference object vs array form}{7}\protected@file@percent }
\pgfsyspdfmark {pgfid13}{9505910}{24351831}
\pgfsyspdfmark {pgfid16}{36067891}{24366576}
\pgfsyspdfmark {pgfid17}{37916186}{24097879}
\pgfsyspdfmark {pgfid13}{9505910}{23172546}
\pgfsyspdfmark {pgfid16}{36067891}{23187291}
\pgfsyspdfmark {pgfid17}{37916186}{22918594}
\newlabel{lst:turf-transformation}{{1}{8}}
\@writefile{lol}{\contentsline {lstlisting}{\numberline {1}Turf.js usage of simplify.js}{8}\protected@file@percent }
\newlabel{lst:diff-simplify.js}{{2}{8}}
\@writefile{lol}{\contentsline {lstlisting}{\numberline {2}Snippet of the difference between the original Simplify.js and alternative}{8}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {5.2}The webassembly solution}{8}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}The implementation}{8}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {6}Compiling an existing C++ library for use on the web}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.1}State of the art: psimpl}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Compiling to webassembly}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {subsubsection}{\numberline {6.2.1}Introduction to emscripten}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.3}Preserving topology GeoJSON vs TopoJSON}{9}\protected@file@percent }
\@writefile{tdo}{\contentsline {todo}{object form vs array form}{9}\protected@file@percent }
\pgfsyspdfmark {pgfid18}{19562753}{38119504}
\@writefile{toc}{\contentsline {subsection}{\numberline {6.4}The implementation}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {7}Results}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.1}Benchmark results}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Comparing the results of different algorithms}{9}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {8}Conclusion}{10}\protected@file@percent }
\newlabel{lst:simplify-wasm}{{5.2}{9}}
\newlabel{lst:simplify-wasm-emscripten-module}{{3}{9}}
\@writefile{lol}{\contentsline {lstlisting}{\numberline {3}My Caption}{9}\protected@file@percent }
\newlabel{lst:wasm-util-store-coords}{{4}{10}}
\@writefile{lol}{\contentsline {lstlisting}{\numberline {4}The storeCoords function}{10}\protected@file@percent }
\@writefile{tdo}{\contentsline {todo}{C code: int* simplify}{10}\protected@file@percent }
\pgfsyspdfmark {pgfid18}{19562753}{26245426}
\@writefile{tdo}{\contentsline {todo}{loadResult}{10}\protected@file@percent }
\pgfsyspdfmark {pgfid19}{19562753}{24967837}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}The implementation}{10}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {6}Compiling an existing C++ library for use on the web}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.1}State of the art: psimpl}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.2}Compiling to webassembly}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {subsubsection}{\numberline {6.2.1}Introduction to emscripten}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {6.3}Preserving topology GeoJSON vs TopoJSON}{11}\protected@file@percent }
\@writefile{tdo}{\contentsline {todo}{object form vs array form}{11}\protected@file@percent }
\pgfsyspdfmark {pgfid20}{19562753}{38119504}
\@writefile{toc}{\contentsline {subsection}{\numberline {6.4}The implementation}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {7}Results}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.1}Benchmark results}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {subsection}{\numberline {7.2}Comparing the results of different algorithms}{11}\protected@file@percent }
\@writefile{toc}{\contentsline {section}{\numberline {8}Conclusion}{12}\protected@file@percent }

View File

@ -1,4 +1,4 @@
This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded format=pdflatex 2019.7.11) 14 JUL 2019 13:49
This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2019/dev/Debian) (preloaded format=pdflatex 2019.7.11) 15 JUL 2019 09:56
entering extended mode
restricted \write18 enabled.
%&-line parsing enabled.
@ -411,6 +411,10 @@ LaTeX Info: Redefining \addtolength on input line 81.
)
\c@@todonotes@numberoftodonotes=\count115
)
(/usr/share/texlive/texmf-dist/tex/latex/url/url.sty
\Urlmuskip=\muskip10
Package: url 2013/09/16 ver 3.4 Verb mode for urls, etc.
)
(/usr/share/texlive/texmf-dist/tex/latex/fancyhdr/fancyhdr.sty
Package: fancyhdr 2019/01/31 v3.10 Extensive control of page headers and footer
s
@ -473,18 +477,18 @@ Package: listings 2018/09/02 1.7 (Carsten Heinz)
(./custom-listing.tex) (./main.aux)
\openout1 = `main.aux'.
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 33.
LaTeX Font Info: ... okay on input line 33.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 33.
LaTeX Font Info: ... okay on input line 33.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 33.
LaTeX Font Info: ... okay on input line 33.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 33.
LaTeX Font Info: ... okay on input line 33.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 33.
LaTeX Font Info: ... okay on input line 33.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 33.
LaTeX Font Info: ... okay on input line 33.
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 34.
LaTeX Font Info: ... okay on input line 34.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 34.
LaTeX Font Info: ... okay on input line 34.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 34.
LaTeX Font Info: ... okay on input line 34.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 34.
LaTeX Font Info: ... okay on input line 34.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 34.
LaTeX Font Info: ... okay on input line 34.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 34.
LaTeX Font Info: ... okay on input line 34.
*geometry* driver: auto-detecting
*geometry* detected driver: pdftex
@ -581,12 +585,12 @@ File: epstopdf-sys.cfg 2010/07/13 v1.3 Configuration of (r)epstopdf for TeX Liv
e
))
ABD: EveryShipout initializing macros ABD: EverySelectfont initializing macros
LaTeX Info: Redefining \selectfont on input line 33.
LaTeX Info: Redefining \selectfont on input line 34.
\c@lstlisting=\count130
<images/uni-augsburg.jpeg, id=29, 465.23813pt x 238.64156pt>
File: images/uni-augsburg.jpeg Graphic file (type jpg)
<use images/uni-augsburg.jpeg>
Package pdftex.def Info: images/uni-augsburg.jpeg used on input line 56.
Package pdftex.def Info: images/uni-augsburg.jpeg used on input line 57.
(pdftex.def) Requested size: 170.71393pt x 87.56407pt.
[1
@ -603,43 +607,52 @@ LaTeX Font Info: External font `cmex10' loaded for size
\tf@toc=\write4
\openout4 = `main.toc'.
[3] (./chapters/content.tex
(./chapters/introduction.tex [1]) [2] (./chapters/chapter01.tex) [3]
(./chapters/chapter03.tex) [4] (./chapters/chapter04.tex [5]) [6]
(./chapters/chapter05.tex (../lib/turf-simplify/index.js
[3] (./chapters/introduction.tex [1]) [2]
(./chapters/chapter01.tex) [3] (./chapters/chapter03.tex) [4]
(./chapters/chapter04.tex [5]) [6] (./chapters/chapter05.tex
(../lib/turf-simplify/index.js
LaTeX Font Info: Font shape `OT1/cmtt/bx/n' in size <10> not available
(Font) Font shape `OT1/cmtt/m/n' tried instead on input line 116.
)
Overfull \hbox (28.87335pt too wide) in paragraph at lines 24--25
\OT1/cmr/m/n/12 The full list of changes can be viewed in \OT1/cmtt/m/n/12 lib/
simplify-js-alternative/simplify.diff\OT1/cmr/m/n/12 .
) (../lib/simplify-js-alternative/simplify.diff)
[7] [8] (../lib/simplify-wasm/index.js)
LaTeX Font Info: External font `cmex10' loaded for size
(Font) <7> on input line 49.
LaTeX Font Info: External font `cmex10' loaded for size
(Font) <5> on input line 49.
(../lib/simplify-wasm/index.js)
[9]
Underfull \hbox (badness 10000) in paragraph at lines 61--61
[][]$\OT1/cmtt/m/n/10 https : / / emscripten . org / docs / api _ reference / p
reamble . js . html #
[]
(../lib/simplify-js-alternative/simplify.diff) [7]) [8]
(./chapters/chapter06.tex) (./chapters/results.tex) [9]
(./chapters/conclusion.tex) [10]) (./main.lol)
(../lib/wasm-util/coordinates.js)) [10] (./chapters/chapter06.tex)
(./chapters/results.tex) [11] (./chapters/conclusion.tex) [12] (./main.lol)
\tf@lol=\write5
\openout5 = `main.lol'.
[11] (./main.aux) )
[13] (./main.aux) )
Here is how much of TeX's memory you used:
14824 strings out of 492615
287477 string characters out of 6131390
419968 words of memory out of 5000000
18408 multiletter control sequences out of 15000+600000
14939 strings out of 492615
289125 string characters out of 6131390
421122 words of memory out of 5000000
18513 multiletter control sequences out of 15000+600000
8583 words of font info for 31 fonts, out of 8000000 for 9000
1141 hyphenation exceptions out of 8191
62i,12n,81p,1332b,1290s stack positions out of 5000i,500n,10000p,200000b,80000s
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/
cm/cmbx12.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr
10.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb>
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmsl12.pfb></usr/
share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmtt10.pfb></usr/share/
texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmtt12.pfb>
Output written on main.pdf (15 pages, 134931 bytes).
62i,12n,81p,1556b,1286s stack positions out of 5000i,500n,10000p,200000b,80000s
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/c
m/cmbx12.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr1
0.pfb></usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb><
/usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr7.pfb></usr/sha
re/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr8.pfb></usr/share/texli
ve/texmf-dist/fonts/type1/public/amsfonts/cm/cmsl12.pfb></usr/share/texlive/tex
mf-dist/fonts/type1/public/amsfonts/cm/cmtt10.pfb></usr/share/texlive/texmf-dis
t/fonts/type1/public/amsfonts/cm/cmtt12.pfb>
Output written on main.pdf (17 pages, 170327 bytes).
PDF statistics:
105 PDF objects out of 1000 (max. 8388607)
64 compressed objects within 1 object stream
119 PDF objects out of 1000 (max. 8388607)
74 compressed objects within 1 object stream
0 named destinations out of 1000 (max. 500000)
114 words of extra memory for PDF output out of 10000 (max. 10000000)

View File

@ -1,2 +1,4 @@
\contentsline {lstlisting}{\numberline {1}Turf.js usage of simplify.js}{8}%
\contentsline {lstlisting}{\numberline {2}Snippet of the difference between the original Simplify.js and alternative}{8}%
\contentsline {lstlisting}{\numberline {3}My Caption}{9}%
\contentsline {lstlisting}{\numberline {4}The storeCoords function}{10}%

Binary file not shown.

Binary file not shown.

View File

@ -13,6 +13,7 @@
\usepackage{graphicx} % for figures
\usepackage{todonotes} % for todo notes
\usepackage{url}
% configure headers
\usepackage{fancyhdr} % for headers

View File

@ -35,14 +35,14 @@
\contentsline {section}{\numberline {5}Benchmark}{7}%
\contentsline {subsection}{\numberline {5.1}State of the art: simplifyJS}{7}%
\contentsline {subsection}{\numberline {5.2}The webassembly solution}{8}%
\contentsline {subsection}{\numberline {5.3}The implementation}{8}%
\contentsline {section}{\numberline {6}Compiling an existing C++ library for use on the web}{9}%
\contentsline {subsection}{\numberline {6.1}State of the art: psimpl}{9}%
\contentsline {subsection}{\numberline {6.2}Compiling to webassembly}{9}%
\contentsline {subsubsection}{\numberline {6.2.1}Introduction to emscripten}{9}%
\contentsline {subsection}{\numberline {6.3}Preserving topology GeoJSON vs TopoJSON}{9}%
\contentsline {subsection}{\numberline {6.4}The implementation}{9}%
\contentsline {section}{\numberline {7}Results}{9}%
\contentsline {subsection}{\numberline {7.1}Benchmark results}{9}%
\contentsline {subsection}{\numberline {7.2}Comparing the results of different algorithms}{9}%
\contentsline {section}{\numberline {8}Conclusion}{10}%
\contentsline {subsection}{\numberline {5.3}The implementation}{10}%
\contentsline {section}{\numberline {6}Compiling an existing C++ library for use on the web}{11}%
\contentsline {subsection}{\numberline {6.1}State of the art: psimpl}{11}%
\contentsline {subsection}{\numberline {6.2}Compiling to webassembly}{11}%
\contentsline {subsubsection}{\numberline {6.2.1}Introduction to emscripten}{11}%
\contentsline {subsection}{\numberline {6.3}Preserving topology GeoJSON vs TopoJSON}{11}%
\contentsline {subsection}{\numberline {6.4}The implementation}{11}%
\contentsline {section}{\numberline {7}Results}{11}%
\contentsline {subsection}{\numberline {7.1}Benchmark results}{11}%
\contentsline {subsection}{\numberline {7.2}Comparing the results of different algorithms}{11}%
\contentsline {section}{\numberline {8}Conclusion}{12}%