GIS data compression and its need for LBS

Naoki Ueda
Founder and CEO
Locazing Inc., Japan

Imagine if your MP3 player could store only 20 songs, or if you were unable to send or receive an email because it had a large file attached. You would need data compression. Today, data compression technologies are a popular, common, and fundamental part of information technology and of our daily lives.

In 1948, Dr. Claude E. Shannon formulated the theory of data compression. Since then, a variety of data compression theories and methods have been developed. Today, they are an integral part of application software and are rarely mentioned explicitly, but they contribute a great deal of convenience to our daily life. There are various 'ready-to-use' compression tools available for many different types of usage. You may have heard of some of them, such as the MP3, AAC, and WMA formats, which were designed for audio data compression; JPEG, for picture compression; and ZIP, for general file compression. Data compression technologies always work behind the scenes to improve the usability, comfort, and convenience of hardware and related services.

Today, Location-Based Services (LBS) are focused on expanding their markets and becoming as popular as other IT services. In the mobile phone market in advanced countries especially, almost every mobile phone is equipped with GPS and is LBS-ready. Yet people will not use these services unless they are as easy and comfortable to use as MP3 players and music download services. New data compression technologies designed specifically for LBS/GIS data are now in high demand for the following reasons:

  • Existing data compression technologies are insufficient to meet the demands of LBS/GIS
  • GIS data compression improves customers' experience of LBS
  • GIS data compression boosts LBS

There are a lot of data compression algorithms and methods available in the world. Each of them is designed and optimised for specialised purposes. Most compression tools are combinations of multiple compression algorithms and try to maximise the compression efficiency of target object data. Compression theory always uses the characteristics and inherent redundancy of target data.

Compression of audio data exploits the limits of human perception and cuts out sound that can hardly be heard.

In this case, the data compression is not 100% reversible, but it preserves enough quality for pleasant listening. This is called 'Lossy' data compression. MP3 audio (MPEG Audio Layer 3) is based on this principle and combines many algorithms to balance compression efficiency against quality loss. For file data, by contrast, the compression must be 100% reversible. This is called 'Lossless' data compression. If the file is a text file, such as an English article or computer program code, it is likely to contain repeating patterns of words and phrases. Where patterns repeat, an algorithm called Universal Coding can be used for lossless compression. LZW coding, which is used in the GIF image format, is one example of Universal Coding. For visual image data, the fact that a pixel and its neighbours tend to have similar colours is exploited as part of the compression mechanism.
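To make the idea of exploiting repeated patterns concrete, here is a minimal sketch of textbook LZW in Python. It is an illustration only: it emits a plain list of integer codes and omits the variable-width bit packing that real formats such as GIF add on top. The function names are hypothetical.

```python
def lzw_compress(text: str) -> list[int]:
    """Encode text as a list of dictionary codes (textbook LZW over bytes)."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for value in text.encode("utf-8"):
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending a known pattern
        else:
            codes.append(table[current])   # emit the longest known pattern
            table[candidate] = len(table)  # learn the new, longer pattern
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the original text by re-learning the same dictionary."""
    if not codes:
        return ""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]  # pattern defined by this very step
        else:
            raise ValueError("invalid LZW code")
        output += entry
        table[len(table)] = previous + entry[:1]
        previous = entry
    return output.decode("utf-8")


sample = "the route to the park and the route to the lake " * 4
codes = lzw_compress(sample)
print(len(sample), "characters ->", len(codes), "codes")
```

The round trip is exact, which is what 'Lossless' means here; the saving comes entirely from phrases that occur more than once.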

When data has some relation to neighbouring data, Data Relativity Coding may be used. DPCM (Differential Pulse Code Modulation) is one example of Data Relativity Coding. Because neighbouring values are similar, the difference between one data point and the next is a small number, which needs fewer bits to store than the original value. When some values appear far more often than others, the method called 'Entropy Coding' is suitable. The famous Huffman coding is one such method, and the JPEG image format uses it.
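The two ideas combine naturally: differencing produces a stream dominated by a few small values, and an entropy coder then gives those frequent values short codes. The sketch below is a simplified illustration, not a production codec. It computes Huffman code lengths rather than actual bit strings, and the sample values and function names are made up.

```python
import heapq
from collections import Counter


def dpcm_encode(samples: list[int]) -> list[int]:
    """Keep the first sample, then store each sample as a difference from its predecessor."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def dpcm_decode(deltas: list[int]) -> list[int]:
    """Rebuild the samples by accumulating the differences."""
    samples = []
    for d in deltas:
        samples.append(d if not samples else samples[-1] + d)
    return samples


def huffman_code_lengths(symbols: list[int]) -> dict[int, int]:
    """Return the Huffman code length in bits for each distinct symbol."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap items are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker ensures the dictionaries are never compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_bits(symbols: list[int]) -> int:
    """Total payload size in bits if each symbol is sent with its Huffman code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


samples = [100, 101, 103, 104, 104, 105, 107, 108, 108, 109, 110, 112]
deltas = dpcm_encode(samples)
print("raw bits:", huffman_bits(samples), "delta bits:", huffman_bits(deltas[1:]))
```

The comparison ignores the cost of transmitting the code table and the first sample, so it shows the trend rather than an exact saving.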

First, GIS data has a layer structure. It often represents lines, areas, or both: for example, navigation routes (lines) or the outer limits of a park (an area). Such GIS data consist of multiple geographic points or locations. Each point consists of latitude, longitude, and optionally altitude, time, or other parameters. Second, if the data represent routes or areas, any geographic point and the next point are probably 'close' to each other and stay within a local region. However, the coordinate scale (latitude and longitude) covers the entire globe. Thus, every point repeats high-order digits that barely change, which creates large redundancy that can be compressed easily. Even a single differential-based compression method may work well.
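A differential scheme for coordinates can be sketched in a few lines. The layout below is an assumption made for illustration and is not a format described in this article: coordinates are scaled to integers at 1e-5 degrees, the first point is stored in full as two 32-bit integers, and every later point is stored as two 16-bit differences.

```python
import struct

SCALE = 100_000  # assumed precision: 1e-5 degrees, roughly one metre


def encode_track(points):
    """Pack (lat, lon) pairs: first point absolute (2 x int32), the rest as int16 deltas."""
    if not points:
        return b""
    fixed = [(round(lat * SCALE), round(lon * SCALE)) for lat, lon in points]
    out = bytearray(struct.pack("<ii", *fixed[0]))
    for (prev_lat, prev_lon), (lat, lon) in zip(fixed, fixed[1:]):
        # struct.error is raised if a step exceeds the int16 range (about 0.33 degrees)
        out += struct.pack("<hh", lat - prev_lat, lon - prev_lon)
    return bytes(out)


def decode_track(blob):
    """Inverse of encode_track: accumulate the deltas back into coordinates."""
    if not blob:
        return []
    lat, lon = struct.unpack_from("<ii", blob, 0)
    points = [(lat / SCALE, lon / SCALE)]
    for offset in range(8, len(blob), 4):
        dlat, dlon = struct.unpack_from("<hh", blob, offset)
        lat, lon = lat + dlat, lon + dlon
        points.append((lat / SCALE, lon / SCALE))
    return points


track = [(35.68123, 139.76712), (35.68131, 139.76725),
         (35.68140, 139.76741), (35.68152, 139.76760)]
blob = encode_track(track)
print(len(track) * 16, "bytes as float64 pairs ->", len(blob), "bytes as deltas")
```

Each additional point costs 4 bytes instead of 16, at the price of a fixed precision and a limit on how far apart consecutive points may be.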

Fig. 1 shows an example of differential-based compression. Third, GIS data require either 'Lossless' or 'Lossy' compression, depending on the situation and purpose. 'Lossless' compression is needed when you must pinpoint an exact location, route, or shape of a path or area. 'Lossy' compression may fit when the 'route' as a whole is more important than each of the points it consists of. In addition, unlike other types of data, GIS 'route' data can be recovered by map-matching technologies, which correct position errors by snapping points onto the road shape on the map. For lossy compression, the following options may be considered:

  • Thinning out the points it consists of
  • Leaving only the intersection points remaining
  • Decreasing the precision of these points
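Two of these options are simple enough to sketch directly. The helpers below are hypothetical and deliberately naive: thinning keeps every n-th point without looking at the route's shape, and precision is reduced by plain rounding. Keeping only intersection points needs road-network data and is not shown.

```python
def thin_points(points, keep_every):
    """Keep every keep_every-th point, always preserving the first and last point."""
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    points = list(points)
    if len(points) <= 2:
        return points
    return points[:-1:keep_every] + [points[-1]]


def reduce_precision(points, decimals):
    """Round every coordinate to the given number of decimal places."""
    return [(round(lat, decimals), round(lon, decimals)) for lat, lon in points]


# Hypothetical route of 10 points heading north-east.
route = [(35.0 + i * 0.0001234, 139.0 + i * 0.0002345) for i in range(10)]
thinned = thin_points(route, 3)
coarse = reduce_precision(thinned, 3)
print(len(route), "points ->", len(thinned), "points:", coarse)
```

Both steps discard information for good, so they suit cases where the overall path matters more than each recorded position.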

LBS may be used with handheld computers or mobile phones. Generally speaking, data compression algorithms are 'heavy' processes for computers. PC-based LBS have the power to handle this, but mobile-device-based LBS must handle the same compute-intensive processes with far fewer resources. 'Heavy' processing slows down systems and drains batteries quickly. Thus, GIS data compression should be a 'light' process for such devices.

In addition, it is preferable that compressed data has 'later adding capability', that is, that new points can be appended without unpacking what is already stored. In real-time locating, data arrive periodically and need to be added to the previous data. If GIS data compression required decompressing, adding, and compressing again every time, it would consume a lot of hardware resources.
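One way to get this property, sketched here under assumed design choices, is to remember only the last point and append each new point as a pair of variable-length deltas. The class and function names are hypothetical, and the byte layout (zigzag plus 7-bit varints) is an illustration, not the format of any product named in this article.

```python
class AppendableTrack:
    """Delta-encoded track that grows by appending bytes; earlier bytes are never rewritten."""

    SCALE = 100_000  # assumed fixed-point precision: 1e-5 degrees

    def __init__(self):
        self.data = bytearray()
        self._last = (0, 0)  # the first point is stored as a delta from (0, 0)

    @staticmethod
    def _varint(delta: int) -> bytes:
        # Zigzag maps small negative and positive deltas to small unsigned numbers.
        value = 2 * delta if delta >= 0 else -2 * delta - 1
        out = bytearray()
        while value >= 0x80:
            out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
            value >>= 7
        out.append(value)
        return bytes(out)

    def append(self, lat: float, lon: float) -> None:
        """Add one point; the cost depends only on the step size, not on track length."""
        fixed = (round(lat * self.SCALE), round(lon * self.SCALE))
        for current, previous in zip(fixed, self._last):
            self.data += self._varint(current - previous)
        self._last = fixed


def decode_track_stream(data: bytes, scale: int = AppendableTrack.SCALE):
    """Read the varint deltas back and accumulate them into (lat, lon) pairs."""
    values, acc, shift = [], 0, 0
    for byte in data:
        acc |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append((acc >> 1) ^ -(acc & 1))  # undo zigzag
            acc, shift = 0, 0
    points, lat, lon = [], 0, 0
    for dlat, dlon in zip(values[0::2], values[1::2]):
        lat, lon = lat + dlat, lon + dlon
        points.append((lat / scale, lon / scale))
    return points


track = AppendableTrack()
for fix in [(35.68123, 139.76712), (35.68131, 139.76725), (35.68140, 139.76741)]:
    track.append(*fix)
    print(len(track.data), "bytes stored")
```

Because appending never touches earlier bytes, the device only needs to keep the last point in memory, which keeps the process 'light'.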

As explained above, GIS data has unique characteristics, so existing data compression methods that are optimised for other types of data do not cover the demands of GIS data compression.

Usability, or the human interface, is the key for any product or service if your business targets consumers. In other words, people only use what is easy and comfortable to use. Just as MP3 makes your music more accessible, GIS data compression could improve your customers' experience.

First, GIS-compressed data saves storage capacity. Users may use LBS in handheld devices or mobile phones with limited storage capacity. Saving data storage space immediately increases available memory.

Fig. 2 shows an example of GIS data compression. Second, it improves communication speed, an important factor in user comfort, especially for network-based LBS. Many IT services, whether LBS or non-LBS, have started to consider reducing packet volume to enhance performance, because the slowest element determines overall performance in network-based services.

Thus, many Web services have started offering the JSON data format alongside XML to improve performance. JSON is much 'lighter' to process and typically yields a smaller data size than XML. In addition, if customers use LBS on mobile devices, GIS data compression may save them money, assuming that the cost of data communication depends on data traffic.
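The size difference is easy to check for a small route. The element and key names below (`route`, `point`, `lat`, `lon`) are made up for this comparison, and the exact ratio depends heavily on the schema chosen, so treat the result as indicative only.

```python
import json
import xml.etree.ElementTree as ET


def route_to_json(points):
    """Serialise a route as compact JSON: {"route":[[lat,lon],...]}."""
    return json.dumps({"route": [[lat, lon] for lat, lon in points]},
                      separators=(",", ":"))


def route_to_xml(points):
    """Serialise the same route as XML with one <point> element per position."""
    root = ET.Element("route")
    for lat, lon in points:
        ET.SubElement(root, "point", lat=str(lat), lon=str(lon))
    return ET.tostring(root, encoding="unicode")


route = [(35.68123, 139.76712), (35.68131, 139.76725), (35.68140, 139.76741)]
json_text = route_to_json(route)
xml_text = route_to_xml(route)
print(len(json_text), "characters of JSON vs", len(xml_text), "characters of XML")
```

Even in JSON, each coordinate is still spelled out as decimal text, which is the redundancy that GIS-specific compression targets.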

LBS will evolve in two directions. One is "M-LBS" and the other is "C-LBS".

Mash-up LBS
What I call 'M-LBS' is 'Mash-up LBS'. Generally speaking, IT services are moving from standalone offerings towards services that can be 'mashed up' with one another. The era of mash-up services is just arriving. Commercial companies are shifting from being content providers to being platform providers. The best example of a mash-up in LBS is the Google 'Mapplets' gadget. By connecting Google Maps™ to other information providers, you can show any location-based information on Google Maps™.

Platform vs Content
Google does not provide the service content itself. Instead, Google provides a platform, i.e., materials for building services. Thousands of amateur developers, called "mash-uppers", and many commercial companies create services by combining parts called APIs (Application Programming Interfaces). Some are offered by Google, some are from other Web-service companies, and some are the original creations of individuals. As a result, thousands of LBS contents became available on Google Maps™ in a short period. Not every commercial company that makes LBS content can provide content in this manner. In addition, because 'Mapplets' are used by many customers, the contents that people use most rise to the top of the list.

In short, Google only provides a platform for mash-uppers, and as a result many excellent LBS contents are added to Google Maps™. Thanks to the Google Maps™ API, most of the early mash-up services that people created were Google Maps-based LBS. So, perhaps surprisingly, LBS play a prominent part in the field called 'Web 2.0'.

Complex GIS Data interaction LBS
Second, what I call 'C-LBS' is 'Complex-data LBS'. In the era of the mash-up, Web services need to exchange data. If both services are LBS, then the GIS data should be exchanged or shared. Today, only a few services have started exporting and importing GIS data to and from other services, and even in these cases they only share GIS data of a single point, probably a 'destination', or two points, probably 'start and goal'.

If LBS platform services start exchanging complex GIS data, such as routes, directions, or areas, service interaction will become much richer than interaction based on a single point. Figure 3 shows the concept of M-LBS and C-LBS. For example:

  • A navigation system exports your 'route', and then a city guide service shows only those restaurants that are along your route.
  • A real estate search service exports a 'list of the houses you chose', and then search services give you a list of sports gyms close to each of the houses.
  • Travel planning services export 'your travel schedule', and then an ad service gives you coupons of shops that you may visit later.
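The first scenario can be sketched as a simple filter on the receiving side. Everything here is hypothetical: the place names, the coordinates, and the shortcut of measuring distance to the nearest route vertex rather than to the nearest route segment.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def places_along_route(route, places, max_distance_m):
    """Return names of places within max_distance_m of any route vertex."""
    return [name for name, position in places.items()
            if min(haversine_m(position, vertex) for vertex in route) <= max_distance_m]


# Route exported by a (hypothetical) navigation service.
route = [(35.6800, 139.7600), (35.6810, 139.7620), (35.6820, 139.7640)]
# Listings held by a (hypothetical) city guide service.
restaurants = {"Soba House": (35.6811, 139.7621), "Far Cafe": (35.7000, 139.8000)}

print(places_along_route(route, restaurants, max_distance_m=200))
```

The guide service can only do this if it receives the whole route, which is exactly the kind of complex GIS data exchange that C-LBS assumes.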

There must be many more creative combinations of LBS services. The key is the portability of GIS data. In an independent GIS or LBS, GIS data stay within a system database. In an inter-LBS mash-up, GIS data must be available outside the system, and must be sharable and exchangeable with other mash-up services. In other words, GIS data need to be "portable".

There are already some file formats that enable data exchange among LBS. KML, KMZ, GML, and GeoJSON are commonly used to share data among GIS systems. KMZ is a ZIP-compressed KML file. However, KMZ is not efficient. Our experiments found that replacing the GIS data inside KML with a GIS-compressed format, such as the Google Maps™ API Encoded Polyline Algorithm or the LocaPort™ GIS data compression algorithm, is much more effective than KMZ. This is simply because the ZIP method is not designed for GIS data, as explained earlier.
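For readers who want to try such a comparison themselves, the following sketch implements the publicly documented Encoded Polyline Algorithm and measures it against DEFLATE-compressed coordinate text. It does not reproduce the experiments mentioned above: `zlib` stands in for the ZIP container that KMZ uses, the coordinate text is only the `lon,lat` list rather than a full KML file, and the test route is a made-up random walk.

```python
import random
import zlib


def _encode_value(value: int) -> str:
    """Encode one signed integer: fold the sign into the low bit, then emit 5-bit chunks."""
    value = ~(value << 1) if value < 0 else (value << 1)
    chars = []
    while value >= 0x20:
        chars.append(chr((0x20 | (value & 0x1F)) + 63))  # 0x20 marks "more chunks follow"
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)


def encode_polyline(points):
    """Encode (lat, lon) pairs at 1e-5 degree precision as a printable ASCII string."""
    out = []
    prev_lat = prev_lon = 0
    for lat, lon in points:
        ilat, ilon = round(lat * 1e5), round(lon * 1e5)
        out.append(_encode_value(ilat - prev_lat))  # store differences, not absolutes
        out.append(_encode_value(ilon - prev_lon))
        prev_lat, prev_lon = ilat, ilon
    return "".join(out)


def compare_sizes(points):
    """Return sizes in bytes: (plain coordinate text, DEFLATE-compressed text, encoded polyline)."""
    text = " ".join(f"{lon:.5f},{lat:.5f}" for lat, lon in points).encode("ascii")
    return len(text), len(zlib.compress(text, 9)), len(encode_polyline(points))


# Hypothetical test route: a random walk with steps of up to 0.0005 degrees.
rng = random.Random(0)
route = [(35.68, 139.76)]
for _ in range(99):
    lat, lon = route[-1]
    route.append((lat + rng.uniform(-0.0005, 0.0005),
                  lon + rng.uniform(-0.0005, 0.0005)))

print("text, deflated text, encoded polyline:", compare_sizes(route))
```

The encoded string is plain ASCII, so it can be placed inside a KML or JSON document without a binary container, which is part of what makes it 'portable'.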

Once practical GIS data compression becomes popular and available, LBS will rise to a new mash-up-based stage. GIS systems – the foundation of LBS – should support GIS data compression and decompression for the services that run on it. With thousands of mashed-up LBS contents, there will doubtlessly be very creative ones that nobody could even imagine today. LBS and its market will then expand at an increasing rate. In the near future, GIS data compression technologies will definitely play an important role behind the scenes.