S. D. Sharma, Randhir Singh and Anil Rai
Indian Agricultural Statistics Research Institute, New Delhi
National agricultural production, on a sustainable basis, depends on the judicious use of natural resources such as soil, water, animal resources and crop/plant genetic resources, with acceptable technology management under the prevailing socio-economic infrastructure. In order to achieve an economically sound society, environmentally benign development and judicious utilization of natural resources, a comprehensive information system must be developed to provide systematic and periodic information to planners, decision-makers and developmental agencies. The animal and plant genetic resources of India are of great global importance. These resources need proper evaluation, which can be done through interactive interpretation in a relational database system. Also, digitizing the socio-economic database along with biophysical factors is important for objectively monitoring and evaluating current and future agricultural growth and development.
Many agencies have developed information systems that include databases on different resources. The Space Applications Centre (SAC) has developed the Natural Resources Information System (NRIS). Its database covers information on the various soil types and water bodies of the entire country and includes both spatial and non-spatial data. The Department of Science and Technology (DST) has developed the Natural Resources Data Management System (NRDMS) with the aim of developing and demonstrating the use of spatial decision support for integrated planning and management of resources at the micro level. Under NRDMS, DST has also developed a user-friendly Geographic Information System (GIS) package, GeoReferenced Area Management (GRAM), for entry, storage, manipulation, analysis and display of spatial data on a low-cost computer configuration.
Thus, there is a need to develop a data warehouse on soil, water, climate, animal, fisheries, crops and cropping systems, along with socio-economic and geographical features, on a single platform, and to evolve methodologies to interpret the inter-linked data through the Central Data Warehouse (CDW) for planning and development purposes. Therefore, a nationwide project named Integrated National Agricultural Resources Information System (INARIS) has been initiated at the Indian Agricultural Statistics Research Institute (IASRI), Indian Council of Agricultural Research (ICAR), funded through the National Agricultural Technology Project (NATP). In this project, the existing databases developed or being developed at various centers will be integrated, databases will be designed to fill critical data gaps for the important parameters in each field, and an operational and flexible warehouse for agricultural resources will be built so that it can be expanded in future as requirements arise.
Data Warehouse: A Strategic Decision-Making Tool
The data warehouse is an architectural construct that addresses the growing need for enterprise-wide access to information. It functions as a core for decision support processing at the strategic/managerial level, separate from day-to-day operational data. It is not software specific and can be used in any computing environment. Ralph Kimball defines a data warehouse as “a copy of transaction data specifically structured for query and analysis”.
The data in a data warehouse can be seen as a set of materialized views derived from source data, where the source data can be relational data in the operational databases or non-traditional data such as data files, legacy systems and document data. Since the raw data normally changes over time, the materialized views in a data warehouse have to be updated to remain consistent with the source data. Data warehousing is more than a database: it involves the entire information delivery process, from access to and transformation of data from different operational sources, through the process that makes it available for decision making, to the exploitation of the retrieved data via a range of decision support tools.
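The idea of a materialized view that must be refreshed when its source changes can be illustrated with a minimal sketch. It assumes a single source table and emulates the materialized view with an ordinary summary table in SQLite, which has no built-in materialized views; the table names, column names and figures are hypothetical and are not part of the INARIS design.

```python
import sqlite3


def refresh_crop_summary(conn):
    """Rebuild the summary table (the emulated materialized view) from the source."""
    with conn:  # commit both statements together, or roll both back on error
        conn.execute("DELETE FROM crop_summary")
        conn.execute(
            "INSERT INTO crop_summary (crop, total_production) "
            "SELECT crop, SUM(production) FROM crop_source GROUP BY crop"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crop_source (crop TEXT, district TEXT, production REAL)")
conn.execute("CREATE TABLE crop_summary (crop TEXT PRIMARY KEY, total_production REAL)")
conn.executemany(
    "INSERT INTO crop_source VALUES (?, ?, ?)",
    [("rice", "D1", 10.0), ("rice", "D2", 5.0), ("wheat", "D1", 7.0)],
)
refresh_crop_summary(conn)

# The source changes, so the stored summary is stale until it is refreshed.
conn.execute("INSERT INTO crop_source VALUES ('wheat', 'D2', 3.0)")
stale = dict(conn.execute("SELECT crop, total_production FROM crop_summary"))
refresh_crop_summary(conn)
fresh = dict(conn.execute("SELECT crop, total_production FROM crop_summary"))
```

A full rebuild is the simplest refresh strategy; for large volumes an incremental refresh that applies only the changed source rows would normally be preferred.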
The need for data warehousing technology has grown in recent years because of its importance in supporting decision support processing and analysis. A specific property of the data warehouse that affects how efficiently its applications can be processed is that most of them are decision-support applications that need to summarize huge amounts of data. The growing trend in data warehouse architecture is to store the data both in the data warehouse and in several data marts, where each data mart contains the data pertaining to a particular domain of the organization’s operations.
Databases to be included in the Data Warehouse
In all, thirteen institutions/centers of ICAR are associated with this project. They will cover databases on crops, cropping systems, plantation crops, horticultural crops, agro-forestry, agricultural farm mechanization, animal genetic resources, plant genetic resources, fish genetic resources, soil, water, spices and climatic parameters, together with socio-economic databases relevant to agricultural research and education. The entire information system will include several databases, which can be broadly divided into the following major categories.
This database will cover information about all the research projects carried out in the entire National Agricultural Research System (NARS), which includes all the ICAR institutes related to research in crop sciences, animal sciences, fisheries, etc., the State Agricultural Universities (SAUs), regional stations, Project Directorates, National Research Centres (NRCs), National Bureaux, All India Coordinated Research Projects (AICRP) and the network of Krishi Vigyan Kendras (KVKs). This database will include both completed and ongoing projects. The fields covered in this database will be the title of the project, location, objectives, year of start and end, salient findings and funding agency.
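As an illustration only, the fields listed above could be represented by a record type such as the following sketch. The field names, the use of a missing end year to mark an ongoing project, and the two consistency checks are assumptions made for this example; the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResearchProject:
    """One row of the research project database (field names are assumed)."""
    title: str
    location: str
    objectives: str
    start_year: int
    end_year: Optional[int]  # assumption: None marks an ongoing project
    salient_findings: str
    funding_agency: str

    @property
    def is_ongoing(self) -> bool:
        return self.end_year is None

    def problems(self) -> List[str]:
        """Return basic consistency problems; an empty list means none found."""
        found = []
        if not self.title.strip():
            found.append("title is empty")
        if self.end_year is not None and self.end_year < self.start_year:
            found.append("end year precedes start year")
        return found


# Placeholder values only, not taken from any real project.
ongoing = ResearchProject("Example project A", "Example station", "Example objective",
                          2001, None, "", "Example agency")
bad = ResearchProject("", "Example station", "Example objective",
                      2003, 1999, "Example finding", "Example agency")
```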
All the major technologies and management practices developed for growing various crops, managing water and soil resources, and rearing livestock, fisheries, etc. will be included in this database. The database will also include the preventive measures that should be adopted in case of pest or disease attack on crops, information on the soil and climate suitable for various crops, their nutritional requirements, etc.
This database will provide agricultural statistics such as the area, yield and production of various crops, plantation crops like cashew, coconut, tea and coffee, important fruit crops and spices. It will also include the production of commodities like milk, meat, eggs, wool and fish, as well as data on the import, export, consumption and prices of important agricultural commodities.
Major Data Sources
The major sources of data for developing the various databases are records from the population census, agricultural census and livestock census, administrative records, financial records, institutional libraries, computer networks, reports of sample surveys conducted at the national/state/district level, remote sensing satellites and other publications. Spatial data in the form of digitized maps of administrative units such as state, district and tehsil/block/mandal, together with other biophysical maps, will be used for the GIS database.
The information system will consist of several integrated sub-systems for input, storage, retrieval, analysis and output, based on a strong database design with its essential functions. It will also include other functions such as manipulation and dissemination of information to various users. The information system, composed of a set of files for use in an RDBMS and a GIS, will be capable of delivering accurate, useful and timely information to various applications. The design of the spatial and non-spatial databases will specify the different data fields, their logical arrangement and their inter-relationships with the subsystem databases.
Process Flow within the CDW
Four major processes take place within the CDW: (1) extraction and loading of data; (2) cleaning and transforming the data into a form that can cope with large data volumes and provide good query performance; (3) backup and archiving of data; and (4) query management.
Data extraction takes data from the source systems and makes it available to the CDW, while data load takes the extracted data and loads it into the CDW. Data in an operational system is held in a form suitable for that system, and over the years its original information content will have been modified and extended to support the data and performance requirements of the operational system. Before the data is loaded into the CDW, this information content will be reconstructed. The data from different sources needs consistency checks, which will be executed wherever required.
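The extraction, consistency-check and load steps can be sketched as follows. This is a minimal illustration, assuming two hypothetical source systems ("census" and "survey") whose field names differ; the field mappings, the checks and the sample rows are invented for the example and do not describe the actual INARIS sources.

```python
# Hypothetical mappings from each source's field names to a common layout.
FIELD_MAPS = {
    "census": {"dist": "district", "crop_nm": "crop", "area_ha": "area_ha"},
    "survey": {"District": "district", "Crop": "crop", "Area": "area_ha"},
}


def extract(source_name, rows):
    """Rename source-specific fields to the common layout, dropping unmapped ones."""
    mapping = FIELD_MAPS[source_name]
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]


def check(row):
    """Return a list of consistency problems; an empty list means the row passes."""
    problems = []
    if not row.get("district"):
        problems.append("missing district")
    if not row.get("crop"):
        problems.append("missing crop")
    area = row.get("area_ha")
    if not isinstance(area, (int, float)) or area < 0:
        problems.append("invalid area")
    return problems


def load(rows, warehouse, rejects):
    """Append rows that pass the checks to the warehouse; keep the rest for review."""
    for row in rows:
        problems = check(row)
        if problems:
            rejects.append((row, problems))
        else:
            warehouse.append(row)


warehouse, rejects = [], []
load(extract("census", [{"dist": "D1", "crop_nm": "rice", "area_ha": 120.5}]),
     warehouse, rejects)
load(extract("survey", [{"District": "", "Crop": "wheat", "Area": -4}]),
     warehouse, rejects)
```

Rejected rows are kept together with the reasons for rejection rather than silently dropped, so that they can be corrected at the source and reloaded.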
The cleaning and transformation process converts the loaded data into a structure that supports efficient querying. Once the data has been cleaned, the source data held in the temporary data store is converted and stored in a structure that is designed to balance query performance and operational cost. The data is then partitioned in order to speed up queries, which also optimizes hardware performance and simplifies the management of the CDW. Further, aggregations are created to speed up common queries.
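Partitioning and aggregation can be illustrated with a small in-memory sketch. It assumes that the data is partitioned by year and that the common queries ask for total production by year and crop; both choices, like the sample rows, are assumptions made for the example.

```python
from collections import defaultdict


def partition_by_year(rows):
    """Group rows by year so that a query for one year scans only that partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[row["year"]].append(row)
    return dict(parts)


def aggregate_production(rows):
    """Pre-compute total production per (year, crop) to answer common queries."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["year"], row["crop"])] += row["production"]
    return dict(totals)


rows = [
    {"year": 2000, "crop": "rice", "district": "D1", "production": 10.0},
    {"year": 2000, "crop": "rice", "district": "D2", "production": 5.0},
    {"year": 2001, "crop": "rice", "district": "D1", "production": 12.0},
]
partitions = partition_by_year(rows)
aggregates = aggregate_production(rows)

# A query restricted to 2001 touches one partition instead of the full data set.
rows_2001 = partitions[2001]
```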
In the backup and archive process, the data within the CDW will be backed up regularly so that it can always be recovered after data loss, software failure or hardware failure. In archiving, older data will be removed from the system and stored in a format that allows it to be quickly restored if and when required.
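A minimal sketch of backup, archiving and restoration is given below, using SQLite's online backup facility as exposed by Python's sqlite3 module (Python 3.7 or later). The single fact table, the year-based cutoff and the use of separate in-memory databases for the live store, the archive and the backup copy are assumptions made for illustration.

```python
import sqlite3

SCHEMA = "CREATE TABLE facts (year INTEGER, crop TEXT, production REAL)"


def archive_before(live, archive, cutoff_year):
    """Move rows older than cutoff_year from the live store to the archive."""
    rows = live.execute(
        "SELECT year, crop, production FROM facts WHERE year < ?", (cutoff_year,)
    ).fetchall()
    with archive:  # write to the archive first, so a failure cannot lose data
        archive.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    with live:
        live.execute("DELETE FROM facts WHERE year < ?", (cutoff_year,))
    return len(rows)


def restore_year(live, archive, year):
    """Copy one archived year back into the live store."""
    rows = archive.execute(
        "SELECT year, crop, production FROM facts WHERE year = ?", (year,)
    ).fetchall()
    with live:
        live.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return len(rows)


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]


live = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
backup_copy = sqlite3.connect(":memory:")
live.execute(SCHEMA)
archive.execute(SCHEMA)
with live:
    live.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [(1995, "rice", 8.0), (2001, "rice", 12.0)],
    )

live.backup(backup_copy)  # full copy of the live store, schema included
backed_up = count(backup_copy)
moved = archive_before(live, archive, cutoff_year=2000)
after_archive = count(live)
restored = restore_year(live, archive, 1995)
after_restore = count(live)
```

In practice the backup and the archive would be written to separate storage rather than to in-memory databases.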
The query management process is a system process that manages queries and speeds them up by directing each query to the most effective data source. This process will ensure that all the system resources are used in the most effective way, usually by scheduling the execution of queries. The query management process will also be required to monitor the actual query profiles. This information will be used by the CDW to determine which aggregations to generate. A CDW that contains summary data will provide a number of distinct data sources that can respond to a specific query: the detailed information itself, and any number of aggregations that satisfy the information needs of the query.
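The routing of queries to the most effective data source, and the use of the recorded query profile to decide which aggregation to generate, can be sketched as follows. The catalogue of sources, their row counts and the rule "smallest source that covers the requested columns" are assumptions made for this example, not the actual CDW design.

```python
from collections import Counter

# Hypothetical catalogue: the columns each source can answer on and its size.
SOURCES = [
    {"name": "detail", "columns": {"year", "crop", "district"}, "rows": 1_000_000},
    {"name": "agg_year_crop", "columns": {"year", "crop"}, "rows": 2_000},
    {"name": "agg_year", "columns": {"year"}, "rows": 50},
]

query_profile = Counter()  # how often each set of columns has been requested


def best_source(columns):
    """Return the smallest source that contains every requested column."""
    candidates = [s for s in SOURCES if columns <= s["columns"]]
    if not candidates:
        raise ValueError(f"no source covers columns {sorted(columns)}")
    return min(candidates, key=lambda s: s["rows"])


def route(query_columns):
    """Pick the data source for a query and record the query in the profile."""
    needed = frozenset(query_columns)
    source = best_source(needed)
    query_profile[needed] += 1
    return source["name"]


def suggest_aggregation():
    """Most frequent column set that still has to be answered from detail data."""
    for columns, _count in query_profile.most_common():
        if best_source(columns)["name"] == "detail":
            return set(columns)
    return None


first = route(["year"])
second = route(["year", "crop"])
third = route(["crop", "district"])
fourth = route(["crop", "district"])
suggestion = suggest_aggregation()
```

Here the two queries on crop and district fall through to the detailed data, so the profile suggests building an aggregation on those two columns.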
Thus, the development of the CDW under the Integrated National Agricultural Resources Information System (INARIS) project will improve the quality of research and planning, reduce the duplication of research efforts, encourage the dissemination of research findings and facilitate qualitative research supported by agricultural databases. It will also help in the development of Decision Support Systems (DSSs), which in turn can be used as effective tools for planning agricultural research and education. Further, it will help in developing effective linkages with other national and international organizations working on sustainable development.