This commit is contained in:
Alfred Melch 2019-08-05 11:46:35 +02:00
parent 53256578b6
commit c039da9d05
11 changed files with 54 additions and 54 deletions

View File

@ -9,3 +9,4 @@
\input{chapters/03.02-integration.tex}
\input{chapters/04-results.tex}
\input{chapters/05-conclusion.tex}
\input{chapters/06-conclusion.tex}

View File

@ -9,29 +9,29 @@
% Why important, who participants, trends,
Compression of polygonal data structures is the task of simplifying geometries while preserving topological characteristics. The simplification often takes the form of removing points that make up the geometry. There are several solutions that tackle the problem in different ways. This thesis aims to compare and classify these solutions by various heuristics. Performance and compression rate are the quantitative heuristics used. Positional, length and area errors will also be measured to quantify simplification errors. Qualitative heuristics will be determined by a user study. With the rising trend of moving desktop applications to the web platform, geographic information systems (GIS) have also experienced the shift towards web browsers [example ESRI Web Gis]. Performance is critical in these applications. Since simplification is an important factor for performance, the solutions will be tested by constructing a web application using a technology called WebAssembly.
Simplification of polygonal data structures is the task of reducing data points while preserving topological characteristics. The simplification often takes the form of removing points that make up the geometry. There are several solutions that tackle the problem in different ways. This thesis aims to compare and classify these solutions by various heuristics. Performance and compression rate are the quantitative heuristics used. Positional, length and area errors will also be measured to quantify simplification errors. With the rising trend of moving desktop applications to the web platform, geographic information systems (GIS) have also experienced the shift towards web browsers\footnote{\path{https://www.esri.com/about/newsroom/arcnews/implementing-web-gis/}}. Performance is critical in these applications. Since simplification is an important factor for performance, the solutions will be tested by constructing a web application using a technology called WebAssembly.
\subsection{Binary instruction sets on the web platform}
The recent development of WebAssembly allows code written in various programming languages to be run natively in web browsers, a privilege thus far only granted to the JavaScript programming language. The goals of WebAssembly are to define a binary instruction format as a compilation target to execute code at native speed and to take advantage of common hardware capabilities [web-source wasm]. The integration into the web platform brings portability to a wide range of platforms like mobile and internet of things (IoT). The usage of this technology promises performance gains that will be tested by this thesis. The results can give conclusions as to whether WebAssembly is worth considering for web applications with geographic computational aspects. WebGIS is an example technology that would benefit greatly from such an advancement. Thus far WebAssembly has been shipped to the stable version of the four most used browser engines [source]. The mainly targeted high-level languages for compilation are C and C++ [wasm-specs]. A compiler for Rust has also been developed [rust-wasm working group]. It will be explored whether existing implementations can easily be adapted using a compiler.
The recent development of WebAssembly allows code written in various programming languages to be run natively in web browsers. So far JavaScript has been the only native programming language on the web. The goals of WebAssembly are to define a binary instruction format as a compilation target to execute code at native speed and to take advantage of common hardware capabilities\footnote{\path{https://webassembly.org/}}. The integration into the web platform brings portability to a wide range of platforms like mobile and internet of things (IoT). The usage of this technology promises performance gains that will be tested by this thesis. The results can give conclusions as to whether WebAssembly is worth considering for web applications with geographic computational aspects. Web GIS is an example technology that would benefit greatly from such an advancement. Thus far WebAssembly has been shipped to the stable versions of the four most used browser engines\footnote{\path{https://lists.w3.org/Archives/Public/public-webassembly/2017Feb/0002.html}}. The mainly targeted high-level languages for compilation are C and C++. Compilers for Rust and a TypeScript subset have also been developed. It will be explored whether existing implementations can easily be adapted using a compiler.
\subsection{Performance as important factor for web applications}
Performance is one of the factors users complain the most about in websites. [Some study] shows that insufficient UI performance is the main reason for negative user experience. [Another study] states that users will immediately leave websites after only 2 seconds of unresponsiveness. There has been a rapid growth of complex applications running in web browsers [source]. These so-called progressive web apps (PWA) combine the fast reachability of web pages with the feature richness of locally installed applications. Even though these applications can grow quite complex, the requirement for fast page loads and short time to user interaction still remains. One way to cope with this need is the use of compression algorithms to reduce the amount of data transmitted and processed. Compression can be lossless. This is often used for the purpose of data transmission. Web servers use lossless compression algorithms like gzip to deflate data. Browsers that implement these algorithms can then fully restore the requested resources, resulting in lower bandwidth usage. Another form of compression removes information from the data in a way that cannot be restored. This is called lossy compression. The most common usage on the web is the compression of image data.
There has been a rapid growth of complex applications running in web browsers. These so-called progressive web apps (PWA) combine the fast reachability of web pages with the feature richness of locally installed applications. Even though these applications can grow quite complex, the requirement for fast page loads and instant user interaction still remains. One way to cope with this need is the use of compression algorithms to reduce the amount of data transmitted and processed. In a way, simplification is a form of data compression. Web servers use lossless compression algorithms like gzip to deflate data before transmission. Browsers that implement these algorithms can then fully restore the requested resources, resulting in lower bandwidth usage. The algorithms presented here, however, remove information from the data in a way that cannot be restored. This is called lossy compression. The most common usage on the web is the compression of image data.
\subsection{Topology simplification for rendering performance}
While compression is often used to minimize bandwidth usage, the compression of geospatial data can particularly influence rendering performance. The bottleneck for rendering is often the SVG transformation used to display topology on the web [source]. Implementing simplification algorithms for use on the web platform can lead to a smoother user experience when working with large geodata sets.
While compression is often used to minimize bandwidth usage, the compression of geospatial data can particularly influence rendering performance. The bottleneck for rendering is often the SVG transformation used to display topology on the web. Implementing simplification algorithms for use on the web platform can lead to a smoother user experience when working with large geodata sets.
\subsection{Related work}
\todo[inline]{Related Work}
\subsection{Structure of this thesis}
This thesis is structured into a theoretical and a practical component. First the theoretical principles will be reviewed. The topology of polygonal data will be explained, as well as how to store geodata. The fundamentals of LineString simplification will also be covered.
This thesis is structured into a theoretical and a practical component. First the theoretical principles will be reviewed. The topology of polygonal data will be explained, as well as how to describe geodata on the web. A number of algorithms will be introduced in this section. Each algorithm will be dissected by complexity, characteristics and its possible influence on the heuristics mentioned above. An introduction to WebAssembly will be given here.
Then a number of algorithms will be introduced. In this section each algorithm will be dissected by complexity, characteristics and its possible influence on the heuristics mentioned above.
In the next chapter the practical implementation will be presented. This section is divided into two parts since two web applications are produced in this thesis. The first one is a benchmark comparison of an algorithm implemented in JavaScript and in WebAssembly. It will be used to investigate whether the performance of established implementations can be improved by a new technology. The second part is about several algorithms brought to the web by compiling an existing C++ library. This application can be used for qualitative analysis of the algorithms. It will show live results to see the characteristics and influence of single parameters.
In the fourth chapter the practical implementation will be presented. This section will dig deeper into several topics important to web development, such as single-page applications, WebAssembly and how web workers will be used for asynchronous execution. The developed application will aim to implement modern best practices in web development, such as fast time to first user interaction and deferred loading of modules.
The results of the above methods will be shown in chapter 4. After a discussion of the results a conclusion will finish the thesis.
The fifth chapter explains how performance will be measured in the web application. After presenting the
results the conclusion chapter will finish the thesis.

View File

@ -1,6 +1,7 @@
\subsection{Generalization in cartography}
\subsubsection{Goals of reducing data}

View File

@ -6,11 +6,21 @@ JavaScript has been the only native programming language of web browsers for a l
\subsubsection{Introduction to Webassembly}
\todo[inline]{Present WebAssembly}
WebAssembly is designed by engineers from the four major browser vendors (Mozilla, Google, Apple, Microsoft). It is a portable low-level bytecode designed as a target for the compilation of high-level languages. By being an abstraction over modern hardware it is language-, hardware-, and platform-independent. It is intended to be run in a stack-based virtual machine. This way it is not restrained to the Web platform or a JavaScript environment. Some key concepts are the structuring into modules with exported and imported definitions and the linear memory model. Memory is represented as a large array of bytes that can be dynamically grown. Security is ensured by the linear memory being disjoint from the code space, the execution stack and the engine's data structures. Another feature of WebAssembly is the possibility of streaming compilation and the parallelization of compilation processes. \footnote{\path{https://people.mpi-sws.org/~rossberg/papers/Haas,\%20Rossberg,
\%20Schuff,\%20Titzer,\%20Gohman,\%20Wagner,\%20Zakai,\%20Bastien,\%20Holman\%20-\%20Bringing\%20the\%20Web\%20up\%20to\%20Speed\%20with\%20WebAssembly.pdf}}
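The linear memory model described above can be illustrated from the JavaScript side using the standard WebAssembly JS API (a minimal sketch independent of any particular module):

```javascript
// WebAssembly linear memory: a resizable array of bytes shared between
// the host (JavaScript) and the module, disjoint from code and stack.
const memory = new WebAssembly.Memory({ initial: 1 }); // 1 page = 64 KiB
const bytes = new Uint8Array(memory.buffer);
bytes[0] = 42;        // the host writes directly into linear memory

memory.grow(1);       // dynamically grow by one page (detaches old buffer)
const grown = new Uint8Array(memory.buffer); // a fresh view is required
console.log(grown.length); // 131072 — two 64 KiB pages
console.log(grown[0]);     // 42 — contents are preserved across grow()
```

Note that growing the memory invalidates previously created typed-array views, which is why glue code around compiled modules must re-create its views after any allocation that may grow memory.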
\paragraph{Benefits of WebAssembly}
The goals of WebAssembly have been well defined. Its semantics are intended to be safe and fast to execute and to bring portability through language-, hardware- and platform-independence. Furthermore it should be deterministic and have simple interoperability with the web platform. For its representation the following goals are declared: it shall be compact and easy to decode, validate and compile. Parallelization and streamable compilation are also mentioned.
These goals are not specific to WebAssembly. They can be seen as properties that a low-level compilation target for the web should have. In fact there have been previous attempts to run low-level code on the web. Examples are Microsoft's ActiveX, Native Client (NaCl) and Emscripten, each having issues complying with the goals. Java and Flash are examples of managed runtime plugins. Their usage is declining, however, not least due to falling short of the goals mentioned above.
It is often stated that WebAssembly can bring performance benefits. It makes sense that statically typed machine code beats scripting languages performance-wise. It has to be observed, however, whether the overhead of switching contexts will negate this performance gain. JavaScript has made a lot of performance improvements over the past years. Not least Google's development of the V8 engine has brought JavaScript to an acceptable speed for extensive calculations. The engine observes the execution of running JavaScript code and will perform optimizations comparable to those of compilers.
The JavaScript ecosystem has rapidly evolved over the past years. Thanks to package managers like bower, npm and yarn it is simple to pull code from external sources into one's codebase. Initially intended for server-side JavaScript execution, the ecosystem has found its way into front-end development via module bundlers like browserify, webpack and rollup. In the course of this growth many algorithms and implementations have been ported to JavaScript for use on the web. With WebAssembly this ecosystem can be broadened even further. By lifting the language barrier, the existing work of many more programmers can be reused on the web. Whole libraries exclusive to native development could be imported with a few simple tweaks. Codecs not supported by browsers can be made available for use in any browser supporting WebAssembly. One example could be the promising AV1 video codec. In this thesis the C++ library psimpl will be utilized to bring polyline simplification to the web. This library already implements various algorithms for this task. It will be further introduced in chapter \ref{ch:psimpl}.
\paragraph{Existing compilers}
\todo[inline]{Languages from which to compile}
\todo[inline]{emscripten}
\todo[inline]{assemblyscript}
\todo[inline]{rust}
@ -19,34 +29,3 @@ JavaScript has been the only native programming language of web browsers for a l
\todo[inline]{Managing memory}
\todo[inline]{passing arrays}
\paragraph{Benefits of WebAssembly}
Why are people going through the hassle of bringing machine code to a platform with a working scripting engine? Is JavaScript really that awful? It is often stated that WebAssembly can bring performance benefits. It makes sense that statically typed machine code beats scripting languages performance-wise. It has to be observed, however, whether the overhead of switching contexts will negate this performance gain. JavaScript has made a lot of performance improvements over the past years. Not least Google's development of the V8 engine has brought JavaScript to an acceptable speed for extensive calculations. The engine observes the execution of running JavaScript code and will perform optimizations comparable to those of compilers.
\todo[inline]{Get chart and source of js performance}
\todo[inline]{Source for V8 performance observation}
The JavaScript ecosystem has rapidly evolved over the past years. Thanks to package managers like bower, npm and yarn it is simple to pull code from external sources into one's codebase. In the course of this growth many algorithms and implementations have been ported to JavaScript for use on the web. After all, it is however no more than that: a port splits communities and contradicts the DRY principle. With WebAssembly the existing work of many programmers can be reused as is on the web. This is the second benefit proposed by the technology. Whole libraries exclusive to native development could be imported with a few simple tweaks. Codecs not supported by browsers can be made available for use in any browser supporting WebAssembly. One example could be the promising AV1 codec.
\todo[inline]{more about av1}
To summarize, the two main benefits expected from WebAssembly are performance and integration. In this thesis these two benefits will be tested.
\paragraph{Two test cases - performance and integration}
The benefits that WebAssembly promises shall be tested in two separate web pages: one for the performance measurements and one to test the integration of existing libraries.
\paragraph{Performance}
As it is the most widely applied algorithm, the Douglas-Peucker algorithm will be used for measuring performance. A JavaScript implementation is quickly found: SimplifyJS. It is the package used by Turf, the most used library for geospatial calculations. To produce comparable results the implementation will be based on this package. Since WebAssembly defines a compilation target, several languages can be used for this test.
\todo[inline]{source for simplify JS}
\todo[inline]{source for turf}
\paragraph{Integration}
An existing implementation of several simplification algorithms has been found in the C++ ecosystem. psimpl implements x algorithms distributed as a single header file. It also provides a function for measuring positional errors, making it ideal for use in a quality analysis tool for those algorithms.

View File

@ -1 +1,13 @@
\section{Methodology}
The benefits that WebAssembly promises shall be tested in two separate web pages: one for the performance measurements and one to test the integration of existing libraries.
\paragraph{Performance}
As it is the most widely applied algorithm, the Douglas-Peucker algorithm will be used for measuring performance. A JavaScript implementation is quickly found: SimplifyJS. It is used by Turf, a geospatial analysis library. To produce comparable results the implementation will be based on this package. A separate library, called SimplifyWASM, written in C will be created that mimics the JavaScript original.
\paragraph{Integrating an existing C++ library}
An existing implementation of several simplification algorithms has been found in the C++ ecosystem. \textsf{psimpl} implements 8 algorithms distributed as a single header file. It also provides a function for measuring positional errors, making it ideal for use in a quality analysis tool for those algorithms.
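The positional error mentioned here is, in essence, the distance from each original point to the nearest segment of the simplified line. A minimal sketch of such a measure (an independent illustration with made-up names, not psimpl's actual code):

```javascript
// Perpendicular distance from a point to a line segment, clamped to the
// segment's endpoints. Points are [x, y] pairs.
function pointSegmentDistance([px, py], [ax, ay], [bx, by]) {
  const dx = bx - ax, dy = by - ay;
  const lenSq = dx * dx + dy * dy;
  let t = lenSq === 0 ? 0 : ((px - ax) * dx + (py - ay) * dy) / lenSq;
  t = Math.max(0, Math.min(1, t)); // clamp projection onto the segment
  return Math.hypot(px - (ax + t * dx), py - (ay + t * dy));
}

// Maximum positional error: for every original point, the distance to the
// closest segment of the simplified polyline; report the worst case.
function maxPositionalError(original, simplified) {
  let max = 0;
  for (const p of original) {
    let best = Infinity;
    for (let i = 0; i < simplified.length - 1; i++) {
      best = Math.min(best, pointSegmentDistance(p, simplified[i], simplified[i + 1]));
    }
    max = Math.max(max, best);
  }
  return max;
}

console.log(maxPositionalError([[0, 0], [1, 1], [2, 0]], [[0, 0], [2, 0]])); // 1
```

A measure of this kind makes the error behaviour of the different algorithms directly comparable in the analysis tool.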

View File

@ -83,7 +83,7 @@ caption=The storeCoords function,
label=lst:wasm-util-store-coords
]{../lib/wasm-util/coordinates.js}
\todo{Check for coords length < 2}
\todo{programming: Check for coords length $<$ 2}
\paragraph{To read the result} back from memory we have to look at how the simplification will be returned in the C code. Listing \ref{lst:simplify-wasm-entrypoint} shows the entry point for the C code. This is the function that gets called from JavaScript. As expected, arrays are represented as pointers with a corresponding length. The first block of code (lines 2 - 6) is only meant for declaring needed variables. Lines 8 to 12 mark the radial distance preprocessing. The result of this simplification is stored in an auxiliary array named \texttt{resultRdDistance}. In this case \texttt{points} will have to point to the new array and the length is adjusted. Finally the Douglas-Peucker procedure is invoked after reserving enough memory. The auxiliary array can be freed afterwards. The problem now is to return the result pointer and the array length back to the calling code. \todo{Fact check: possibly unsigned}The fact that pointers in Emscripten are represented by an integer will be exploited to return a fixed-size array of two containing the values. A hacky solution, but it works. We can now look back at how the JavaScript code reads the result.
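The decoding of such a pointer/length pair on the JavaScript side can be sketched as follows (all names are hypothetical stand-ins for the Emscripten-generated exports; the "module memory" is simulated here so the sketch runs without a compiled module):

```javascript
// Decode a result returned as a pointer to two consecutive uint32 values:
// [0] = pointer to the coordinate array, [1] = number of coordinates.
function readResult(memory, resultPtr) {
  const pair = new Uint32Array(memory.buffer, resultPtr, 2);
  const coordPtr = pair[0];
  const length = pair[1];
  // Coordinates are stored as 64-bit floats in linear memory.
  return new Float64Array(memory.buffer, coordPtr, length);
}

// Simulate what the C side would have written into linear memory:
const memory = new WebAssembly.Memory({ initial: 1 });
const u32 = new Uint32Array(memory.buffer);
const f64 = new Float64Array(memory.buffer);
u32[0] = 16; // "pointer" to the coordinates (byte offset, 8-byte aligned)
u32[1] = 4;  // number of coordinate values
f64[2] = 1.5; f64[3] = 2.5; f64[4] = 3.5; f64[5] = 4.5; // data at offset 16

console.log(Array.from(readResult(memory, 0))); // [1.5, 2.5, 3.5, 4.5]
```

The caller still has to free both the result pair and the coordinate array afterwards, since nothing in linear memory is garbage-collected.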
@ -120,7 +120,7 @@ The dynamic aspects of the web page will be built in JavaScript to make it run i
The web page consists of static and dynamic content. The static parts refer to the header and footer with explanations about the project. Those are written directly into the root HTML document. The dynamic parts are injected by JavaScript. Those will be further discussed in this chapter as they are the main application logic.
\begin{figure}[htbp]
\begin{figure}[htb]
\centering
\label{fig:benchmarking-uml}
\fbox{\includegraphics[width=\linewidth]{images/benchmark-uml.jpg}}
@ -143,4 +143,6 @@ Benchmark.js combines these approaches. In a first step it approximates the runt
For running multiple benchmarks the class \texttt{BenchmarkSuite} was created. It takes a list of BenchmarkCases and runs them through a BenchmarkType. The suite manages starting, pausing and stopping of going through the list. It updates the statistics gathered on each cycle. By injecting an onCycle method, the \texttt{App} component can give live feedback about the progress.
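The control flow of this pattern can be sketched as follows (a minimal illustration only: the class and callback names follow the text, but the timing logic is a naive stand-in for what Benchmark.js does in the real application):

```javascript
// Sketch of the BenchmarkSuite pattern: run a list of cases, update
// statistics after each case and report progress through onCycle.
class BenchmarkSuite {
  constructor(cases, { onCycle } = {}) {
    this.cases = cases;     // [{ name, fn }]
    this.onCycle = onCycle; // injected progress callback
    this.stats = [];
  }

  run(iterations = 100) {
    for (const benchmarkCase of this.cases) {
      const start = Date.now();
      for (let i = 0; i < iterations; i++) benchmarkCase.fn();
      const meanMs = (Date.now() - start) / iterations;
      this.stats.push({ name: benchmarkCase.name, meanMs });
      if (this.onCycle) this.onCycle(this.stats); // live feedback per cycle
    }
    return this.stats;
  }
}

const suite = new BenchmarkSuite(
  [{ name: "noop", fn: () => {} }, { name: "sqrt", fn: () => Math.sqrt(2) }],
  { onCycle: stats => console.log(`finished ${stats.length} case(s)`) }
);
suite.run(50);
```

Injecting the callback keeps the suite decoupled from the UI component that renders the progress.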
\subsubsection{The user interface}
\todo[inline]{Explain user interface}

View File

@ -3,7 +3,8 @@
In this chapter I will explain how an existing C++ library was utilized to compare different simplification algorithms in a web browser. The library is named \textsl{psimpl} and was written in 2011 by Elmar de Koning. It implements various algorithms used for polyline simplification. This library will be compiled to WebAssembly using the Emscripten compiler. Furthermore a web application will be created for interactively exploring the algorithms. The main use case is simplifying polygons, but polylines will also be supported. The data format used to read in the data will be GeoJSON. To maintain topological correctness an intermediate conversion to TopoJSON will be applied if requested.
\subsubsection{State of the art: psimpl}
\subsubsection{State of the art: psimpl}
\label{ch:psimpl}
\textsl{psimpl} is a generic C++ library for various polyline simplification algorithms. It consists of a single header file \texttt{psimpl.h}. The algorithms implemented are \textsl{Nth point}, \textsl{distance between points}, \textsl{perpendicular distance}, \textsl{Reumann-Witkam}, \textsl{Opheim}, \textsl{Lang}, \textsl{Douglas-Peucker} and \textsl{Douglas-Peucker variation}. It has to be noted that the \textsl{Douglas-Peucker} implementation uses the \textsl{distance between points} routine, also named the \textsl{radial distance} routine, as a preprocessing step, just like Simplify.js (Section \ref{sec:simplify.js}). All these algorithms have a similar templated interface. The goal now is to prepare the library for a compiler.
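The radial distance preprocessing shared by both libraries can be sketched as follows (illustrative names and layout; this is neither psimpl's nor Simplify.js' actual code): every point closer than the tolerance to the previously kept point is dropped before the more expensive Douglas-Peucker pass runs.

```javascript
// Radial distance preprocessing: keep a point only if it is farther than
// `tolerance` from the last kept point. Points are [x, y] pairs.
function radialDistance(points, tolerance) {
  const sqTol = tolerance * tolerance; // compare squared distances, no sqrt
  const result = [points[0]];
  let prev = points[0];
  for (let i = 1; i < points.length; i++) {
    const dx = points[i][0] - prev[0];
    const dy = points[i][1] - prev[1];
    if (dx * dx + dy * dy > sqTol) {
      result.push(points[i]);
      prev = points[i];
    }
  }
  // Always keep the last point so the line's endpoints survive.
  if (prev !== points[points.length - 1]) result.push(points[points.length - 1]);
  return result;
}

const line = [[0, 0], [0.1, 0], [0.2, 0], [5, 0], [5.05, 0], [10, 0]];
console.log(radialDistance(line, 1).length); // 3
```

Because this pass is linear in the number of points, it cheaply thins dense clusters before the recursive Douglas-Peucker step.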
@ -68,7 +69,7 @@ After explaining the state model the User Interface (UI) shall be explained. The
\begin{figure}[htb]
\centering
\fbox{\includegraphics[width=\linewidth]{images/integration-ui.jpg}}
\caption{The user interface for the algorithm comparison.}
\caption{The user interface for the algorithm comparison. (Placeholder!)}
\label{fig:integration-ui}
\end{figure}

View File

@ -1,3 +1 @@
\section{Conclusion}
Enhancement: Line Smoothing as preprocessing step
\section{Discussion}

View File

@ -0,0 +1,6 @@
\section{Conclusion}
\subsection{Enhancements}
Enhancement: Line Smoothing as preprocessing step
\subsection{Future Work}

View File

@ -1,3 +1,3 @@
\contentsline {figure}{\numberline {1}{\ignorespaces UML diagram of the benchmarking application}}{15}{figure.1}%
\contentsline {figure}{\numberline {2}{\ignorespaces The state model of the application}}{20}{figure.2}%
\contentsline {figure}{\numberline {3}{\ignorespaces The user interface for the algorithm comparison.}}{21}{figure.3}%
\contentsline {figure}{\numberline {1}{\ignorespaces UML diagram of the benchmarking application}}{17}{figure.1}%
\contentsline {figure}{\numberline {2}{\ignorespaces The state model of the application}}{22}{figure.2}%
\contentsline {figure}{\numberline {3}{\ignorespaces The user interface for the algorithm comparison. (Placeholder!)}}{23}{figure.3}%

Binary file not shown.