Stratified one-stage cluster sampling using GIS for surveys

Rural sample surveys in the dairy sector are a vital input for formulating business plans and strategies. Generating statistically robust estimates of the critical parameters that identify potential pockets and geographies, through scientifically planned sample surveys, is therefore crucial. A sampling technique that uses GIS technology to form quadrants at the sub-district level not only ensures proper geographic spread of the sample but also gives robust estimates of the parameters.

Backdrop

Rural surveys play an important role in the dairy sector in preparing business plans to initiate or expand operations spread over large geographical areas. The critical indicators needed to identify potential pockets and geographies when devising plans and strategies are not available at the micro level through secondary sources, so estimating them through scientifically planned surveys is vital.

Generally, the administrative boundaries of the geography are taken as the base for such surveys, and the required indicators are estimated within them. Conventionally, a multi-stage stratified random sampling technique is used for this estimation. However, the limitation of this technique is that it does not take into account the nature and shape of the geography, so proper spread of the sample is left to chance.

GIS technology is generally used at the time of sampling to overcome this limitation. One of the common sampling methods is area sampling, or geographical cluster sampling, in which the area to be sampled is sub-divided into smaller blocks for further sampling. In area sampling, different methods such as (i) systematic point sampling, (ii) systematic line sampling and (iii) systematic area sampling are followed. The same principle has been used for the present exercise, but it has been further fine-tuned with different GIS techniques to suit the specific needs and requirements.

Methodology

The sampling process involving GIS has been used to estimate the incidence of milch animal ownership among rural households and milk production at the sub-district (i.e., tehsil) level, neither of which is otherwise available through secondary sources. The shape of a tehsil is not regular, and the distribution of villages within it is uneven. A traditional sampling procedure would therefore not take into consideration the shape of the tehsil or the concentration of villages.

In order to address these constraints, the tehsil has been cut into four parts of almost equal area, following the area sampling methodology. While doing so, the shape of the tehsil, the concentration of villages and village size (i.e., households per village) have been taken into consideration to ensure that each of the four parts contains almost the same number of villages.
The following process has been adopted to cut the tehsil into four parts (i.e., quadrants).

Esri’s Spatial Statistics utility called “Directional Distribution (Standard Deviational Ellipse)” has been used to find the directional distribution of villages within the tehsil. A snapshot of the utility is given below.

Figure 1: Snapshot of the process for directional distribution utility

Upon selecting the method for measuring geographic distribution, the following inputs are required to perform the operation.

Table 1: Input parameters for directional distribution

The output of the process draws the directional ellipse for the tehsil by measuring the directional trend of the features within a specified boundary. The output also gives resultant parametric values as given below, which are subsequently used as inputs for the process of cutting tehsil into four parts.

The graphical representation of the attribute parameters along with the illustrative ellipse is presented in Graph 1.

Graph 1: Illustrative directional ellipse

From the above resultant parameters, the vertices and co-vertices of the ellipse are calculated on the basis of the following trigonometric principle.

Step 1
The co-ordinates for vertices (A & A’) and co-vertices (B & B’) are calculated as below.
If ø < 90 and Y >X i.e., Y is the major axis
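The trigonometry behind this step can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the exact formulas used in the exercise: the function name and parameter names are hypothetical, the caller is assumed to pass the larger of the two standard distances as the semi-major length, and the rotation angle is assumed to describe the major axis measured clockwise from north.

```python
import math

def ellipse_axis_endpoints(center_x, center_y, semi_major, semi_minor, rotation_deg):
    """Return ((A, A'), (B, B')): vertices and co-vertices of a rotated ellipse.

    Assumption: rotation_deg is the angle of the major axis measured
    clockwise from north (the positive Y direction).
    """
    theta = math.radians(rotation_deg)
    # Unit vector along the major axis for a clockwise-from-north angle.
    ux, uy = math.sin(theta), math.cos(theta)
    # Unit vector along the minor axis, perpendicular to the major axis.
    vx, vy = math.cos(theta), -math.sin(theta)

    a = (center_x + semi_major * ux, center_y + semi_major * uy)
    a_prime = (center_x - semi_major * ux, center_y - semi_major * uy)
    b = (center_x + semi_minor * vx, center_y + semi_minor * vy)
    b_prime = (center_x - semi_minor * vx, center_y - semi_minor * vy)
    return (a, a_prime), (b, b_prime)
```

If the rotation were instead measured counter-clockwise from the X axis, the sine and cosine terms would swap roles, so the angle convention of the ellipse output should be checked before the coordinates are used.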

Step 2
The major and minor axes are drawn by joining the vertices (A & A’) and co-vertices (B & B’) using the functionality ‘Add XY Line Data From Table’ of Hawth’s Tool provided under ‘Table Tools’. The snapshot of the same is given in Figure 2.

Figure 2: Use of Hawth’s tool

Step 3
Taking the lines drawn in Step 2 as reference, each polygon was cut manually using Esri’s “Cut Polygon features” task under the ‘Modify Tasks’ of the Editor toolbar of ArcGIS desktop.
The process flow of the entire exercise is given below (Figure 3).

Figure 3: Process flow
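The cut itself is done manually in ArcGIS. Where only village point locations are needed, the same partition could be approximated by checking on which side of the major and minor axes each village falls. The sketch below is an alternative illustration, not the procedure used in the exercise: the function name, the quadrant numbering and the clockwise-from-north angle convention are all assumptions, and the sketch does not by itself guarantee equal village counts per quadrant.

```python
import math

def assign_quadrant(x, y, center_x, center_y, rotation_deg):
    """Label a point 1-4 by its side of the ellipse's major and minor axes.

    Assumption: rotation_deg is the major-axis angle measured clockwise
    from north; the quadrant numbering is arbitrary and illustrative.
    """
    theta = math.radians(rotation_deg)
    dx, dy = x - center_x, y - center_y
    # Signed position of the point along the major and minor axis directions.
    along_major = dx * math.sin(theta) + dy * math.cos(theta)
    along_minor = dx * math.cos(theta) - dy * math.sin(theta)
    if along_major >= 0:
        return 1 if along_minor >= 0 else 2
    return 4 if along_minor >= 0 else 3
```

Points lying exactly on an axis are assigned to the non-negative side here; a real workflow would need an explicit rule for such boundary villages.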

Sample selection

Having cut the tehsil into four parts, the villages belonging to each part have been categorized into two groups based on average village size (i.e., average households per village): a) villages larger than the average village size of the respective quadrant and b) villages smaller than the average village size of the respective quadrant.

Within each category of village size, three sampling schemes can be applied, viz., random sampling, stratified sampling and systematic sampling. However, the drawback of these sampling techniques is that they do not take into account the spatial nature of the data. Therefore, in each of the village size categories, the villages are sorted on the basis of village census codes, which the Census of India has assigned keeping in mind the spatial distribution of villages. Subsequently, two villages from each category have been randomly selected. Thus, four villages from each quadrant, and 16 villages in total from the tehsil, are randomly selected to arrive at the estimates at the tehsil level. Within the sample villages, all households are surveyed irrespective of their animal ownership status.
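The selection logic described above can be sketched as follows. This is a minimal illustration under assumptions rather than the exact procedure used: the function name and the dictionary keys `code`, `quadrant` and `households` are hypothetical, villages exactly at the quadrant average are placed in the smaller group, and the draw within each category is a simple random draw without replacement.

```python
import random
from collections import defaultdict
from statistics import mean

def select_sample_villages(villages, per_category=2, seed=None):
    """Pick villages per quadrant and size category.

    villages: list of dicts with hypothetical keys
              'code' (census code), 'quadrant', 'households'.
    """
    rng = random.Random(seed)
    by_quadrant = defaultdict(list)
    for village in villages:
        by_quadrant[village["quadrant"]].append(village)

    selected = []
    for quadrant in sorted(by_quadrant):
        members = by_quadrant[quadrant]
        avg_size = mean(v["households"] for v in members)
        # Two size categories relative to the quadrant's own average.
        large = [v for v in members if v["households"] > avg_size]
        small = [v for v in members if v["households"] <= avg_size]
        for category in (large, small):
            # Sort by census code, as described in the text.
            category.sort(key=lambda v: v["code"])
            count = min(per_category, len(category))
            selected.extend(rng.sample(category, count))
    return selected
```

With a simple random draw, the sort by census code does not change the selection probabilities; it would matter only if the draw were positional, for example a systematic selection along the sorted list.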

Result

It is found that sample surveys in which the sample selection has been undertaken using the GIS technique provide statistically robust estimates at the tehsil level, with the percentage standard error of the estimates in the range of ±10-20 percent.

Conclusion

Rural sample surveys in the dairy sector are a vital input for formulating business plans and strategies. Generating statistically robust estimates of the critical parameters that identify potential pockets and geographies, through scientifically planned sample surveys, is therefore crucial.

Conventionally, a multi-stage stratified random sampling technique is used for this estimation. However, it does not take into account the size, shape and distribution of villages during sample selection. A sampling technique that uses GIS technology to form quadrants of the tehsil not only ensures proper geographic spread of the sample but also gives robust estimates of the parameters.