Col (Dr.) R Siva Kumar
Member Secretary, NGDI Task Force, Government of India
In the context of the National Spatial Data Infrastructure (NSDI), a number of people, including fellow professionals such as surveyors, cartographers and geospatial information scientists, often take me aside and ask, “What do you mean by metadata?” This question has prompted me to share my thoughts about metadata.
Mapmakers have used the concept of metadata for a long time, at least for about 100 years, since 1905 when the era of modern surveys started in India. When we closely scrutinise a map, a large amount of information is found outside its border. The concept of metadata is also familiar to most people who deal with spatial issues. A map legend is one representation of metadata, containing information about the publisher of the map, the publication year, year of survey, the type of map, a description of the map, spatial references, the map’s scale and its accuracy, among other things.
When we migrated to the digital domain, an information file was appended to the digital data file to carry the metadata. Metadata provide a common set of terms and definitions to use when documenting and using geospatial data. Most digital geospatial files now have some associated metadata. In the area of geospatial information, or information with a geographic component, this normally means the What, Who, Where, Why, When and How of the data. The only major difference from the many other metadata sets being collected for libraries, academia, professions and elsewhere is therefore the emphasis on the spatial component, or the ‘where’ element.
The word metadata shares the same Greek root as the word metamorphosis. In metamorphosis, ‘meta’ conveys change, and metadata, or ‘data about data’, describe the origins of data and track the changes to them. Metadata is the term used to describe the summary information or characteristics of a set of data. This very general definition includes an almost limitless spectrum of possibilities.
The term metadata has become widely used over the past 15 years, and has become particularly common with the popularity of the World Wide Web. But the underlying concepts have been in use for as long as collections of information have been organised. Library catalogues represent an established variety of metadata that has served for decades as collection management and resource discovery tools.
The Benefits of Metadata
Metadata helps people who use geospatial data find the data they need and determine how best to use it. Metadata also benefits the data-producing organisations. When personnel change in an organisation, undocumented data may lose their value. Later workers may have little understanding of the contents and uses of a digital database and may find they cannot trust results generated from these data. Lack of knowledge about other organisations’ data can lead to duplication of effort. It may seem burdensome to add the cost of generating metadata to the cost of data collection, but in the long run the value of the data depends on its documentation.
Levels of Metadata
Metadata may be used at different levels:
Discovery metadata: What data sets hold the sort of data I am interested in? This enables organisations to know and publicise what data holdings they have.
Exploration metadata: Do the identified data sets contain sufficient information to enable a sensible analysis to be made for my purposes? This is documentation to be provided with the data to ensure that others use the data correctly and wisely.
Exploitation metadata: What is the process of obtaining and using the data that are required? This helps end users and data providing organisations to effectively store, reuse, maintain and archive their data holdings.
Each of these purposes, while complementary, requires a different level of information. Organisations should therefore look at their overall needs and requirements before developing their metadata systems. The important point is for agencies to establish their business requirements first, the content specifications second and the technology and implementation methods third. This is not to say that these levels of metadata are mutually exclusive. There is a high degree of reuse of metadata across the levels, and an organisation will design its metadata scheme and implementation around its business needs to accommodate all three requirements.
Discovery Metadata is the minimum amount of information that needs to be provided to convey to the inquirer the nature and content of the data resource. This falls into broad categories to answer the ‘what’, ‘why’, ‘when’, ‘who’, ‘where’ and ‘how’ questions about geospatial data.
What – title and description of the data set.
Why – abstract detailing reasons for the data collection and its uses.
When – when the data set was created and the update cycles if any.
Who – originator, data supplier, and possibly intended audience.
Where – the geographical extent based on latitude / longitude, co-ordinates, geographical names or administrative areas.
How – how it was built and how to access the data.
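As a rough illustration, the six questions above could be held in a small record structure. This is only a sketch: the class name `DiscoveryRecord`, its field names, the helper `blank_fields` and all sample values are assumptions made for this example and are not taken from any NSDI or GSDI specification.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class DiscoveryRecord:
    """Hypothetical discovery-level record; all field names are illustrative."""
    title: str                # What
    description: str          # What
    abstract: str             # Why: reasons for collection and intended uses
    created: str              # When: creation date, kept as plain text here
    originator: str           # Who
    # Where: (west, south, east, north) in decimal degrees
    bounding_box: Tuple[float, float, float, float]
    lineage: str              # How: how the data set was built
    access: str               # How: how to obtain the data
    update_cycle: Optional[str] = None  # When: update cycle, if any


def blank_fields(record: DiscoveryRecord) -> List[str]:
    """Return the names of text fields that were left empty."""
    blanks = []
    for f in fields(record):
        value = getattr(record, f.name)
        if isinstance(value, str) and not value.strip():
            blanks.append(f.name)
    return blanks


# Invented sample values, for illustration only.
sample = DiscoveryRecord(
    title="Example topographic sheet",
    description="Hypothetical map sheet used to illustrate the record",
    abstract="",  # deliberately left blank
    created="2001",
    originator="Example mapping agency",
    bounding_box=(77.0, 12.0, 78.0, 13.0),
    lineage="Digitised from a printed sheet",
    access="On request from the originator",
)

print(blank_fields(sample))
```

Here the check reports the empty abstract, which is the kind of gap a catalogue might want to catch before a record is published.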
The broad categories are kept few in number to reduce the effort required to collect the information, while still conveying to the inquirer the nature and content of the data resource. Online systems for handling metadata need to rely on their (metadata is plural, like data) being predictable in both form and content. The level of metadata detail that will be documented depends on the type of data held and the methods by which they are accessed and used. Different types of data (e.g. vector, raster, textual, imagery, thematic, boundary, polygon, attribute, point, etc.) will require different levels and forms of metadata to be collected. However, there is still a high degree of compatibility between most of the metadata elements required.
Not only can metadata content vary according to purpose; it can also vary according to the scope of the data being defined. Discovery metadata usually relates to collections of data resources or data set series that have similar characteristics but relate to different geographic extents or times. A map series is the commonest example, but the same can equally be applied to statistical surveys. More detailed metadata may be applied to a collection or series but may also apply to an individual data set (e.g. one map tile).
Exploration metadata provides sufficient information to enable an inquirer to ascertain that data fit for a given purpose exists, and a linkage for more information. Thus, after discovery, more detail is needed about individual data sets, and more comprehensive and more specific metadata is required. If the data are transferred as a single data set, then quite specific and detailed metadata is needed, possibly down to the feature, object or record level. Exploration metadata include those properties required to allow the prospective end user to know whether the data will meet the general requirements of a given problem.
Exploitation metadata include those properties required to access, transfer, load, interpret, and apply the data in the end application where it is exploited. This class of metadata often includes the details of a data dictionary, the data organisation or schema, projection and geometric characteristics, and other parameters that are useful to both humans and machines in the proper use of the geospatial data. These roles form a continuum in which a user cascades through a pyramid of choices to determine what data are available, to evaluate the fitness of the data for use, to access the data, and to transfer and process the data. The exact order in which data elements are evaluated, and the relative importance of data elements, will not be the same for all users.
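The continuum described above, from finding what is available, to judging fitness for use, to obtaining what is needed to load the data, can be sketched as three small steps over a catalogue. Everything here is assumed for illustration: the catalogue entries, the key names, the scale threshold and the function names `discover`, `explore` and `exploit` are invented and do not come from any standard.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)

# Hypothetical catalogue; every value is made up for illustration.
CATALOGUE: List[Dict] = [
    {"title": "Sheet A", "bbox": (77.0, 12.0, 78.0, 13.0),
     "scale_denominator": 50000, "format": "vector", "projection": "geographic"},
    {"title": "Sheet B", "bbox": (80.0, 20.0, 81.0, 21.0),
     "scale_denominator": 250000, "format": "raster", "projection": "geographic"},
    {"title": "Sheet C", "bbox": (77.5, 12.5, 78.5, 13.5),
     "scale_denominator": 250000, "format": "raster", "projection": "geographic"},
]


def overlaps(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(area: BBox) -> List[Dict]:
    """Discovery: which data sets cover the area of interest?"""
    return [entry for entry in CATALOGUE if overlaps(entry["bbox"], area)]


def explore(candidates: List[Dict], max_scale_denominator: int) -> List[Dict]:
    """Exploration: which candidates are detailed enough for the purpose?"""
    return [c for c in candidates
            if c["scale_denominator"] <= max_scale_denominator]


def exploit(entry: Dict) -> Dict:
    """Exploitation: the details needed to load and interpret the data."""
    return {"format": entry["format"], "projection": entry["projection"]}


area_of_interest: BBox = (77.6, 12.6, 77.8, 12.8)
found = discover(area_of_interest)      # data sets covering the area
fit = explore(found, 100000)            # those at 1:100,000 or larger scale
details = [exploit(entry) for entry in fit]
print([entry["title"] for entry in found], details)
```

A different user might apply the steps in another order or weigh other elements, such as date or originator, more heavily, which is consistent with the point that the order of evaluation is not the same for all users.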
Metadata for NSDI
The NSDI Task Force recognised the importance of metadata and constituted a working group with Mr Mukund Rao (ISRO) as Convenor, drawing professionals from various organisations including the Department of Space, the Department of Science & Technology, the Survey of India and the Geological Survey of India. The working group has brought out a draft metadata standard, which is being discussed and finalised. The final draft will be posted on the web for peer review before acceptance.
GSDI Cookbook, Version 1.1
Sum & Substance
- The concept of metadata is familiar to most people who deal with spatial issues.
- When we migrated to the digital domain, an information file was appended to the digital data file to carry the metadata.
- Metadata helps people who use geospatial data find the data they need and determine how best to use it.
- The word metadata shares the same Greek root as the word metamorphosis.
- There are different levels that metadata may be used for: discovery metadata, exploration metadata and exploitation metadata.
- Discovery Metadata is the minimum amount of information that needs to be provided to convey to the inquirer the nature and content of the data resource.
- Exploration metadata provides sufficient information to enable an inquirer to ascertain that data fit for a given purpose exists, and a linkage for more information.
- Exploitation metadata include those properties required to access, transfer, load, interpret, and apply the data in the end application where it is exploited.
- The NSDI Task Force recognised the importance of metadata and constituted a working group.