Institute of Information Science, Academia Sinica
A frequently repeated factoid is that 80% of all digital data generated today includes a geospatial reference. Although digital maps and other cartographic products are directly geo-referenced with geographic coordinates, such data are usually used only by experts or trained users. However, a large volume of the digital data that people are familiar with does not use coordinates. Instead, it is geo-referenced with place names and other plain descriptors of geographic objects and features, such as addresses and postal codes. Place names are people's instinctive geospatial concepts, and the places they denote are complicated, diverse, ambiguous, and multi-scaled geospatial objects. There is therefore a need to map place names to canonical and interchangeable geospatial knowledge. An ontology is a shared, formal conceptualization of a domain, and a geospatial ontology can be regarded as a formal model of the geospatial world as it is experienced and conceptualized by non-experts. In this talk, the gazetteer of Taiwan obtained from the Computer Center of Academia Sinica is encoded in GML (Geography Markup Language). The study builds two ontologies of place names: one based on the hyponymy of place-name feature classes, and the other based on spatial relationships. The ontologies are represented in RDF (Resource Description Framework) and used for information retrieval and reasoning. Using the GML documents and the RDF of place names, a query engine based on the Jena API can provide users with suitable geospatial information through reasoning. The results show why place-name ontologies are important for the next generation of Web/Internet GIS.
It is a commonly quoted estimate that up to 80% of all digital data generated today includes a geospatial reference. Some of these data, such as digital maps and other cartographic products, are directly geo-referenced with geographic coordinates. A large volume of the available data, however, does not use coordinates but is indirectly geo-referenced with place names and other plain-text descriptors of geographic objects and features (Vögele and Schlieder, 2002; Hill, 2000). For example, Flickr is a popular online photo management and sharing application; among its most popular tags, 31.7% are place names and 12.4% are related to space. Place names are clearly important in everyday practice. Although people instinctively wish to use place names in their queries, the imprecision and vagueness of place names pose a major challenge for data access: there is often only an imprecise match between the query name and the names associated with candidate sources of information (Arampatzis et al., 2006; Jones et al., 2001). There is therefore a need to give place names an explicit semantic representation for data access.
The geospatial domain is characterized by vagueness, especially in the semantic disambiguation of its concepts, which makes defining a universally accepted geo-ontology an onerous task (Agarwal, 2005). There are several reasons why it is difficult to extract a geospatial ontology. First, geographic objects are typically complex, and they always have parts. An ontology of geographic objects must therefore contain a theory of part and whole, or mereology (Smith and Mark, 1998). The same geographic object may also be defined differently in different events, scales, and situations, so the geographic domain raises specific ontological issues, primarily because it is unstructured. Second, a standard terminology is not prevalent within the geographic domain; terminology depends on the context of use and on the user. This causes confusion when specifying the universally accepted entities, concepts, rules, relations, and semantics that form the basis of a consensual ontology (Agarwal, 2005). As ontologies have been promoted as a means to improve access to and sharing of existing geographical information resources (Smith and Mark, 1998; Fonseca et al., 2000; Kuhn, 2001), building a geospatial ontology is a significant research task for geographic information retrieval.
Digital gazetteers play many new roles in the information architecture for geo-locational access to information and data. For example, a place name in everyday discourse can be used to identify a location, to get driving directions (from computer systems), or to find information about an area (Hill, 2000). Gazetteers can be seen as specialized GISes tailored to a number of specific tasks: (1) indirect geo-referencing; (2) vertical data integration; and (3) handling large data sets (Schlieder et al., 2001). Moreover, digital gazetteers can be defined as geospatial dictionaries of geographic names with three core components: (1) a name (possibly with variant names); (2) a location (coordinates representing a point, line, or areal location); and (3) a type (selected from a type scheme of categories for places/features). By analogy with the thesauri used to define thematic concepts, a controlled and structured vocabulary of place names is needed as a basis for spatial information retrieval. A gazetteer links the names of spatial objects to thematic concepts and to their spatial representations, the “geographic footprints” (Schlieder et al., 2001). A place can have multiple footprint representations (Hill, 2000): (1) of different types, for example a point, a bounding box, and a polygon; (2) from different sources; (3) for different time periods (for example, the extents of cities change over time); and (4) suitable for varying resolutions (e.g., more detailed vs. more generalized). For accurate information retrieval, a polygon-based footprint representation is probably the best solution. With an exact polygon footprint, full-scale GIS functionality can be applied to select all geographic objects within a given region, to determine neighboring polygons, or to perform other complex spatial queries (Schlieder et al., 2001). In this study, an ontology of place names is derived from the geographic feature types of the place names.
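The three core gazetteer components, and the difference between a bounding-box footprint and an exact polygon footprint, can be illustrated with a minimal sketch. It assumes planar coordinates, a hypothetical `GazetteerEntry` record, and the standard even-odd ray-casting point-in-polygon test (a common technique, not necessarily what any particular gazetteer uses). The L-shaped region and the object coordinates are invented for illustration.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]  # planar (x, y); a simplifying assumption


@dataclass
class GazetteerEntry:
    """Hypothetical record for the core components: name, location (footprint), type."""
    name: str
    feature_type: str
    footprint: list[Point]  # polygon vertices in order, first vertex not repeated
    variant_names: list[str] = field(default_factory=list)

    def bbox(self) -> tuple[float, float, float, float]:
        """Bounding-box footprint derived from the polygon: (minx, miny, maxx, maxy)."""
        xs = [x for x, _ in self.footprint]
        ys = [y for _, y in self.footprint]
        return min(xs), min(ys), max(xs), max(ys)


def in_bbox(p: Point, box: tuple[float, float, float, float]) -> bool:
    minx, miny, maxx, maxy = box
    return minx <= p[0] <= maxx and miny <= p[1] <= maxy


def in_polygon(p: Point, poly: list[Point]) -> bool:
    """Even-odd rule: toggle on each polygon edge crossed by a ray from p toward +x."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def objects_within(region: GazetteerEntry, objects: dict[str, Point], exact: bool = True) -> list[str]:
    """Select named objects inside the region's polygon (exact) or only its bounding box."""
    if exact:
        return sorted(n for n, p in objects.items() if in_polygon(p, region.footprint))
    box = region.bbox()
    return sorted(n for n, p in objects.items() if in_bbox(p, box))


if __name__ == "__main__":
    # Invented L-shaped region: its bounding box covers far more area than the polygon.
    town = GazetteerEntry("ExampleTown", "populated place",
                          [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)])
    objs = {"school": (0.5, 3), "station": (3, 3), "market": (3, 0.5), "harbor": (5, 5)}
    print(objects_within(town, objs))               # ['market', 'school']
    print(objects_within(town, objs, exact=False))  # ['market', 'school', 'station']
```

The bounding-box query also returns `station`, which lies in the empty corner of the L shape; false positives of this kind are what make polygon footprints attractive for accurate retrieval.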
We implemented the place-name ontology in RDF using the Jena API for RDF. With the RDF and the GML documents of place names, SQL-like queries can be executed on the Web.
2. Related Work
To facilitate linking place names to related information in a geo-referenced digital library, Smith and Crane (2001) developed a toponym-disambiguation method that identifies place names automatically. With the popularization of geographical information on the Web, place names have become important geographical identifiers for tagging web resources. Volz et al. (2007) established an ontology model of place names using WordNet and the geographic features of the place names; the ontology was then used to disambiguate place names in text. Fu et al. (2005), noting that ontologies play a key role in Semantic Web research, reported how to develop ontologies of place names to support the retrieval of documents that are spatially relevant to users' queries. Vögele et al. (2003) presented an approach to the intuitive and user-friendly creation and application of spatial metadata used for spatial relevance reasoning.
3. Ontology of Place Name
3.1 Geographic Ontology
Ontology, a philosophical tradition, studies the nature of being and existence, which can be described via taxonomies for hierarchical classification (Guarino and Giaretta, 1995; Agarwal, 2005). From a domain-specific and user-dependent view, an ontology is a formal specification of a shared conceptualization of a domain of interest (Gruber, 1993). Here, “conceptualization” means an abstract model of some phenomenon in the world, built by identifying the relevant concepts of that phenomenon; “formal” refers to the fact that the ontology should be machine-readable; and “shared” refers to the notion that an ontology captures consensual knowledge. Because geographic objects are often complicated, hierarchical, and diverse, geographic information science (GIScience) has to use ontologies to specify geographic objects and phenomena as canonical descriptions of geographic knowledge domains. Moreover, a geographic ontology should be a formal model of the geospatial world as it is experienced and conceptualized by non-experts. There are two distinct approaches to applying ontology in GIScience (Kuhn, 2001):
- The philosophical approach for identification of top-level categories from a formal ontology perspective
- The domain-specific and task-oriented approach, focused on explicating the actions, terms, and relations of particular specifications, ranging from natural language to rigorously formal specifications.
3.2 Place name and GML
Geography Markup Language (GML) is an XML-based encoding standard for geographic information that is already used to store, exchange, and model geospatial data. GML is designed for the Internet and directly embraces the ideas of interconnected and distributed information elements (Raper, 1999; Lake, 1996, 2001, 2005). GML has three main roles with respect to geographic information: first, as an encoding for transporting geographic information from one system to another; second, as a modeling language for describing geographic information types; and third, as a storage format for geographic information (Lake, 2005).
GML is therefore designed for the Geo-Web, which can display and transport geographic information and mash it up with non-geographic information. Furthermore, Kolas et al. (2005) considered the goal of a common geospatial ontology to be similar to that of GML. In this study, place names are encoded in GML. In the GML application schema, a place name is modeled as a Feature Collection with four properties: “boundedBy”, recording the bounding box of the place name; “footprint”, a geometry property; “featureType”, describing the category of the place name; and “description”, as shown in Figure 1. Figure 2 shows a GML instance of a place name in this study.
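A GML instance carrying the four properties named above might look like the following sketch, parsed here with Python's standard `xml.etree.ElementTree`. The element layout is an assumption, since Figures 1 and 2 are not reproduced here. `gml:` is the OGC GML namespace, while the `pn:` namespace URI, the `pn:PlaceName` root element, the use of `gml:name` for the name, and all coordinates are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gml": "http://www.opengis.net/gml",   # OGC GML namespace
    "pn": "http://example.org/placename",  # hypothetical application-schema namespace
}

SAMPLE_GML = """
<pn:PlaceName xmlns:pn="http://example.org/placename"
              xmlns:gml="http://www.opengis.net/gml">
  <gml:name>ExampleTown</gml:name>
  <gml:boundedBy>
    <gml:Envelope>
      <gml:lowerCorner>0 0</gml:lowerCorner>
      <gml:upperCorner>4 4</gml:upperCorner>
    </gml:Envelope>
  </gml:boundedBy>
  <pn:footprint>
    <gml:Polygon>
      <gml:exterior>
        <gml:LinearRing>
          <gml:posList>0 0 4 0 4 1 1 1 1 4 0 4 0 0</gml:posList>
        </gml:LinearRing>
      </gml:exterior>
    </gml:Polygon>
  </pn:footprint>
  <pn:featureType>populated place</pn:featureType>
  <pn:description>Invented sample entry</pn:description>
</pn:PlaceName>
"""


def parse_place(xml_text: str) -> dict:
    """Extract the four place-name properties (plus the name) from one GML feature."""
    root = ET.fromstring(xml_text.strip())

    def numbers(path: str) -> list[float]:
        # Space-separated coordinate text -> list of floats
        return [float(v) for v in root.findtext(path, namespaces=NS).split()]

    lower = numbers("gml:boundedBy/gml:Envelope/gml:lowerCorner")
    upper = numbers("gml:boundedBy/gml:Envelope/gml:upperCorner")
    coords = numbers("pn:footprint/gml:Polygon/gml:exterior/gml:LinearRing/gml:posList")
    return {
        "name": root.findtext("gml:name", namespaces=NS),
        "boundedBy": (*lower, *upper),
        "footprint": list(zip(coords[0::2], coords[1::2])),  # closed ring of (x, y) pairs
        "featureType": root.findtext("pn:featureType", namespaces=NS),
        "description": root.findtext("pn:description", namespaces=NS),
    }


if __name__ == "__main__":
    place = parse_place(SAMPLE_GML)
    print(place["name"], place["featureType"], place["boundedBy"])
    # ExampleTown populated place (0.0, 0.0, 4.0, 4.0)
```

Keeping `boundedBy` alongside the polygon lets a client run a cheap bounding-box filter first and consult the exact footprint only when needed.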
3.3 The ontology of Place name
The XML representation of GML is a standard means of communication between geospatial applications, but a base geospatial ontology extends its power with the significantly greater expressiveness of OWL and the ability to link these data to knowledge outside the geospatial realm. This increases the overall usefulness of the geospatial data while enriching them with complementary information. In this study, RDF (Resource Description Framework) is used to represent place-name information on the Web with machine-understandable syntax and semantics. The Jena API for RDF, a popular toolkit developed by Brian McBride of HP, can parse, create, and search RDF models. The ontology is built from the feature type (category) of each place name.
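How a feature-type hierarchy expressed in RDF supports reasoning can be sketched without Jena, using plain Python sets of triples. The sketch applies two RDFS entailment rules until nothing new is derived: rdfs11 (`rdfs:subClassOf` is transitive) and rdfs9 (an instance of a class is also an instance of its superclasses). The `rdf:` and `rdfs:` URIs are the W3C ones; the `pn:` namespace, the class names, and the hierarchy are hypothetical stand-ins for the study's feature-type scheme.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
PN = "http://example.org/placename#"  # hypothetical namespace

Triple = tuple[str, str, str]

# Hypothetical feature-type hierarchy (hyponymy of place-name categories).
SCHEMA: set[Triple] = {
    (PN + "City", SUBCLASS, PN + "PopulatedPlace"),
    (PN + "City", SUBCLASS, PN + "AdministrativeArea"),
    (PN + "PopulatedPlace", SUBCLASS, PN + "Place"),
    (PN + "AdministrativeArea", SUBCLASS, PN + "Place"),
}
# One place-name resource, typed only by its most specific feature type.
DATA: set[Triple] = {
    (PN + "place1", PN + "name", "ExampleTown"),
    (PN + "place1", RDF_TYPE, PN + "City"),
}


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply rdfs11 and rdfs9 repeatedly until no new triple is derived."""
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        # rdfs9: x type A, A subClassOf B  =>  x type B
        new |= {(x, RDF_TYPE, b) for x, p, a in closed if p == RDF_TYPE
                for a2, b in sub if a == a2}
        if new <= closed:
            return closed
        closed |= new


def types_of(resource: str, triples: set[Triple]) -> list[str]:
    """Sorted local names of all rdf:type values of a resource."""
    return sorted(o.rsplit("#", 1)[-1] for s, p, o in triples
                  if s == resource and p == RDF_TYPE)


if __name__ == "__main__":
    graph = rdfs_closure(SCHEMA | DATA)
    print(types_of(PN + "place1", graph))
    # ['AdministrativeArea', 'City', 'Place', 'PopulatedPlace']
```

Only the most specific type is asserted; the broader categories are inferred. Jena ships its own RDFS inference support, so this loop only mirrors the two rules needed for the illustration.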
4. The Ontology of Place name for information retrieval
The Jena API for RDF not only helps users create RDF but can also query existing RDF with RDQL (RDF Data Query Language). The following example is an RDQL query over this study's place-name RDF. The query process is shown in Figure 4. When we queried the fourth-level city name Houng-Tou, the result showed that Houng-Tou is a populated place as well as an administrative area.
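The idea behind an RDQL query (matching a conjunction of triple patterns and returning variable bindings) can be sketched with a small backtracking matcher. This is an illustrative stand-in, not Jena's query engine, and RDQL itself was later superseded by SPARQL. The prefixed names, the `pn:p1` resource, and the assumption that Houng-Tou is typed as a `pn:City` whose direct superclasses are populated place and administrative area are hypothetical, chosen to mirror the result described above.

```python
from collections.abc import Iterator
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical graph; prefixed names are kept as plain strings for readability.
GRAPH: set[Triple] = {
    ("pn:p1", "pn:name", "Houng-Tou"),
    ("pn:p1", "rdf:type", "pn:City"),
    ("pn:City", "rdfs:subClassOf", "pn:PopulatedPlace"),
    ("pn:City", "rdfs:subClassOf", "pn:AdministrativeArea"),
    ("pn:p2", "pn:name", "ExampleTown"),
    ("pn:p2", "rdf:type", "pn:Village"),
    ("pn:Village", "rdfs:subClassOf", "pn:PopulatedPlace"),
}

# Roughly: SELECT ?type WHERE (?place, pn:name, "Houng-Tou"),
#          (?place, rdf:type, ?cls), (?cls, rdfs:subClassOf, ?type)
QUERY: list[Triple] = [
    ("?place", "pn:name", "Houng-Tou"),
    ("?place", "rdf:type", "?cls"),
    ("?cls", "rdfs:subClassOf", "?type"),
]


def unify(pattern: Triple, triple: Triple, binding: dict[str, str]) -> Optional[dict[str, str]]:
    """Return the binding extended so that pattern matches triple, or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check it is bound consistently
            if result.setdefault(term, value) != value:
                return None
        elif term != value:       # constant: must match exactly
            return None
    return result


def match(patterns: list[Triple], triples: set[Triple],
          binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every binding under which all patterns match some triple."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(patterns[1:], triples, extended)


def select(var: str, patterns: list[Triple], triples: set[Triple]) -> list[str]:
    """Distinct, sorted values of one variable over all solutions."""
    return sorted({b[var] for b in match(patterns, triples)})


if __name__ == "__main__":
    print(select("?type", QUERY, GRAPH))  # ['pn:AdministrativeArea', 'pn:PopulatedPlace']
```

Because each pattern only narrows the bindings, a name shared by several places would yield one solution per matching resource, each with its own types.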
In this study, we reported our experience of creating a place-name ontology that serves as a specification of domain knowledge, and of using that ontology for information retrieval. The results show that a geographic ontology can help remove ambiguity from geospatial data. It is common for one place name to refer to different places and for one place to have different names, and a place-name ontology may be a useful way to provide exact results in Web applications. However, although an ontology built from feature types may solve the terminology problem of place names, it does not capture their spatial nature. In future work, we will extend the place-name ontology to include spatial relationships.
- Agarwal, P. Ontological considerations in GIScience, International Journal of Geographical Information Science, 19(5): 501-536, 2005.
- Arampatzis, A., van Kreveld, M., Reinbacher, I., Jones, C. B., Vaid, S., Clough, P., Joho, H., and Sanderson, M., Web-based delineation of imprecise regions, Computers, Environment and Urban Systems, 30(4): 436-459, 2006.
- Fu, G., C.B. Jones, and A.I. Abdelmoty, Ontology-Based Spatial Query Expansion in Information Retrieval. ODBASE: OTM Confederated International Conferences, 4, November, 2005.
- Gruber, T., A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2): 199-220, 1993.
- Guarino, N. and P. Giaretta, Ontologies and knowledge bases: towards a terminological clarification. In N. Mars (Ed.), Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing, pp. 25–32, 1995.
- Hill, L. Core elements of digital gazetteers: placenames, categories, and footprints. In: Borbinha, J. and Baker, T. eds. Research and Advanced Technology for Digital Libraries: Proceedings of the 4th European Conference, ECDL 2000 Lisbon, Portugal, pp.280-290, 2000.
- Jones, C.B., Alani, H., Tudhope, D. Geographical Information Retrieval with Ontologies of Place. In: Montello, D.R. eds. Proceedings of the Conference on Spatial Information Theory (COSIT), Lecture Notes in Computer Science 2205. Springer, pp. 322-335, 2001.
- Kolas, D., J. Hebeler, and M. Dean, Geospatial Semantic Web: Architecture of Ontologies, Geospatial Semantics 2005, LNCS 3799, pp. 183-194, Eds. M.A. Rodriguez et al., Mexico City, Mexico, Nov. 29-30, 2005.
- Kuhn, W., Ontologies in support of activities in geographical space, International Journal of Geographical Information Science, 15(7):613-631, 2001.
- Lake, R., Information communities support information sharing. GIS World 9(2): 72-73, 1996.
- Lake, R., Enabling the geo-spatial web. GeoSpatial Solutions, July 2001, Available from URL https://www.geospatial-online.com/geospatialsoluations/article/articleDetail.jsp?id=8156
- Lake, R., The application of geography markup language (GML) to the geological sciences. Computers & Geosciences, 31: 1081-1094, 2005.
- Raper, J., GIS without border. GEOEurope 3(8), 22-23, 1999.
- Schlieder, C., Vögele, T., Visser, U. Qualitative Spatial Reasoning for Information Retrieval by Gazetteers. In Montello, D.R. eds. Proceedings of the Conference on Spatial Information Theory (COSIT’01), Lecture Notes In Computer Science, 2205. Springer, pp. 336-351, 2001.
- Smith, D. A. and Crane, G. Disambiguating Geographic Names in a Historical Digital Library. In Proceedings of the 5th European Conference on Research and Advanced Technology For Digital Libraries. P. Constantopoulos and I. Sølvberg, Eds. Lecture Notes In Computer Science, vol. 2163. Springer, pp. 127-136, 2001.
- Stefanakis, E. NET-DBSCAN: Clustering the Nodes of a Dynamic Linear Network, International Journal of Geographical Information Science, 21(4): 427 – 442, 2007.
- Tezuka, T., Yokota, Y., Iwaihara, M., Tanaka, K. Extraction of Cognitively-Significant Place Names and Regions from Web-Based Physical Proximity Co-occurrences. In: Zhou, X. et al. eds. Web Information Systems – WISE 2004. Lecture Notes in Computer Science, 3306. Springer, pp. 113-124, 2004.
- Vögele, T., C. Schlieder, U. Visser, Intuitive Modelling of Place Name Regions for Spatial Information Retrieval. In Conference on Spatial Information Theory – COSIT’03, Lecture Notes in Computer Science, 2825, Springer, pp. 239-252, 2003.
- Volz, R., Kleb, J., and Mueller, W. Towards Ontology-based Disambiguation of Geographical Identifiers, The 16th International World Wide Web Conference, May 8-12, 2007, Banff, Alberta, Canada.