In one of my discussions with Ed Parsons of Google on making SDIs work, data discoverability and accessibility came up as key areas. So why can't we make searching for and discovering digital maps and spatial datasets as easy as finding anything else on a search engine like Google? This challenge is now being addressed by Google's Search Science Datasets schema, which is based on Schema.org.
Why is spatial data not discoverable by search engines?
This has been one of the key challenges so far. Spatial data is generally published in one of two ways:
- first, on a website where a zip file is uploaded (Shapefiles of India – Download the Zip File here)
- second, via Spatial Data Infrastructures (SDIs), where a catalog of spatial data is available along with metadata describing the data
There are two clear challenges with these approaches.
- Google cannot tell the user, for example, when the data was last updated or what thematic information it contains.
- Google cannot really index an SDI catalog and expose the data to a global audience (correct me if I am wrong!)
In both cases, there are limits to how available and accessible the data can be made.
How will Google Search Science Datasets Schema help?
Google's aim is to make data from scientific domains that is managed and maintained by governments readily findable based on the characteristics of the data.
Ex: A search for temperature data for India should not return a generic list of web pages; it should return only those pages that actually publish such data and point the user to the datasets.
What is Google's logic behind the scientific schema?
According to Google, “When webmasters provide structured markup, they enable search engines to ‘understand’ this metadata, which in turn improves data discovery, leading scientists to the information they need for their work.”
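To make "structured markup" a little more concrete: one common way to publish schema.org metadata is as a JSON-LD block inside a `<script type="application/ld+json">` element on the dataset's landing page. The sketch below is only an illustration of that idea; the helper name `to_jsonld_script` and the example record are hypothetical, not something taken from Google's documentation.

```python
import json


def to_jsonld_script(metadata: dict) -> str:
    """Wrap a metadata dictionary in a JSON-LD <script> element."""
    body = json.dumps(metadata, indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" inside a string value
    # cannot end the element early; "\/" is a valid JSON escape.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical dataset record, used only for illustration.
example = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Monthly mean temperature for India (illustrative)",
    "description": "Placeholder record showing the shape of the markup.",
}

snippet = to_jsonld_script(example)
print(snippet)
```

The printed snippet would be pasted into the HTML of the page that describes the dataset, so that a crawler reading the page finds the metadata alongside the human-readable text.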
What kinds of datasets does Google Search Science Datasets Schema support?
Currently, Google seems to support the following types of data:
- A table or a CSV file (Example here)
- Any file in a proprietary format that contains data (it is not clear whether this includes Shapefiles, AutoCAD files etc.)
- Object-based data types
- Images capturing the data (it is not clear whether this also covers satellite imagery)
What kind of metadata does Google support?
As of this writing (1 Oct 2016), Google's scientific dataset markup supports the following properties derived from schema.org:
- Basic dataset properties – Name, URL, version, Keywords
- Data Catalogs –
- Download Information – URL, File Format info
- Temporal Coverage – Date, Time
- Spatial Coverage – Point, area, named locations (cities)
- Licence Information
- Citation Information
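As a rough sketch of how the property groups above might map onto a schema.org `Dataset` record, here is a small Python example that builds one and serialises it as JSON-LD. Every name, URL and value in it is a hypothetical placeholder, the bounding box for India is approximate, and the property names follow schema.org as I understand it; the exact set Google reads may differ, so check Google's current documentation before relying on it.

```python
import json

# All names, URLs and values below are hypothetical placeholders.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    # Basic dataset properties
    "name": "Monthly mean temperature for India (illustrative)",
    "url": "https://data.example.org/datasets/india-temperature",
    "version": "1.0",
    "keywords": ["temperature", "climate", "India"],
    # Data catalog the dataset belongs to
    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "name": "Example Open Data Catalog",
    },
    # Download information: where the file is and what format it is in
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://data.example.org/files/india-temperature.csv",
        }
    ],
    # Temporal coverage as an ISO 8601 start/end interval
    "temporalCoverage": "2000-01-01/2015-12-31",
    # Spatial coverage: a named place plus an approximate bounding box,
    # written as "south west north east" (latitude longitude pairs)
    "spatialCoverage": {
        "@type": "Place",
        "name": "India",
        "geo": {"@type": "GeoShape", "box": "6.5 68.1 35.7 97.4"},
    },
    # Licence and citation information
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "citation": "Example Agency (2016). Monthly mean temperature for India.",
}

jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

For spatial data publishers the interesting part is `spatialCoverage`: a point, a bounding box or a named location can each be expressed there, which is what lets a search engine reason about where a dataset applies.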
How to join the Google Scientific Data Schema Program?
The program is currently experimental and is most suitable for government agencies, international development agencies and data providers who want to make their scientific and geospatial datasets easily accessible.
You can join the program by expressing interest and submitting the URL of an entry point into your dataset repository, along with a sample dataset URL. Click this link to go to the form suggested by Google to enable testing your data.