How to calculate the Root Mean Square Error (RMSE) of an interpolated pH raster?


The root mean square error (RMSE) is a standard statistical metric for measuring model performance in many natural sciences. It measures the typical size of the residuals, that is, how far the data points lie from the regression or modelled line (when the residuals have zero mean, the RMSE equals their standard deviation). The following figure shows the residuals as green arrows located between the data points and the regression line.


To calculate the RMSE, the following equation is used:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(f_i - o_i\right)^2}$$



  • n: number of samples
  • f: forecasts
  • o: observed values
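As a quick illustration of the equation, here is a minimal Python sketch that computes the RMSE from paired forecasts f and observations o. The function name `rmse` and the pH values in the usage line are hypothetical, chosen only for illustration.

```python
import math

def rmse(forecasts, observed):
    """Root mean square error between paired forecasts f_i and observations o_i."""
    if not forecasts or len(forecasts) != len(observed):
        raise ValueError("forecasts and observed must be non-empty and the same length")
    n = len(forecasts)
    # Sum of the squared residuals (f_i - o_i)^2
    sse = sum((f - o) ** 2 for f, o in zip(forecasts, observed))
    # Square root of the mean squared residual
    return math.sqrt(sse / n)

# Hypothetical interpolated (forecast) and measured (observed) pH values
print(rmse([6.8, 7.2, 5.9], [7.0, 7.0, 6.5]))  # ≈ 0.383
```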

The RMSE is a useful indicator for evaluating the performance of an interpolation. This tutorial shows how to interpolate pH values in QGIS and how to evaluate the interpolation using the RMSE. In this case, the forecasts are the interpolated values and the observed values are the measured samples.

For this exercise, we will split the point data into two sets: 80% for the interpolation and 20% as Ground Control Points (GCPs) for validation. To do so, open the Processing Toolbox and look for the Random selection tool. Set the input layer to the pH layer, the method to Percentage of selected features, and the percentage to 20. Click on Run.
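The random 80/20 split (select 20% at random, then invert the selection) can be sketched in Python as below. This is only an illustration of the idea, not the QGIS Random selection implementation; the function name `split_points`, the `(x, y, pH)` tuple layout and the rounding of the GCP count are assumptions.

```python
import random

def split_points(points, gcp_fraction=0.2, seed=None):
    """Randomly split points into (interpolation_set, gcp_set).

    Assumption: the number of GCPs is len(points) * gcp_fraction, rounded.
    The GCPs are a random selection; the interpolation set is the
    inverted selection (all remaining points).
    """
    rng = random.Random(seed)
    n_gcp = round(len(points) * gcp_fraction)
    gcp_idx = set(rng.sample(range(len(points)), n_gcp))
    gcp = [p for i, p in enumerate(points) if i in gcp_idx]
    interp = [p for i, p in enumerate(points) if i not in gcp_idx]
    return interp, gcp

# Hypothetical (x, y, pH) sample points
pts = [(float(i), float(i), 7.0) for i in range(50)]
data_for_interpolation, gcp20 = split_points(pts, 0.2, seed=1)
print(len(data_for_interpolation), len(gcp20))  # 40 10
```

Passing a fixed `seed` makes the split reproducible, which helps when comparing RMSE values between runs.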



Right-click on the pH layer and choose Save As, making sure to check Save only selected features, and save it as GCP20.shp.



Open the attribute table of the pH layer and click on Invert selection.



Right-click on the pH layer again and save it as Data_for_interpolation.shp, again checking Save only selected features. This layer contains the remaining 80% of the data and will be used for the interpolation.



To interpolate the pH values, open the Processing Toolbox in QGIS and look for the IDW interpolation tool.



Select Data_for_interpolation as the layer that contains the pH information, and choose the attribute that contains the pH values.



Click on the three dots (...) button next to Extent and choose Select extent on canvas.



Draw a rectangle around the study area.



Click on Run in background.
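Conceptually, inverse distance weighting (IDW) estimates a value at a location as a weighted average of the sample values, with weights w_i = 1 / d_i^p that shrink with distance d_i. The Python sketch below evaluates this at a single location; the QGIS tool conceptually repeats it for every cell of the output raster within the chosen extent. The function name `idw` and the power p = 2 are assumptions for illustration (2 is a common choice), not the exact settings of the QGIS run.

```python
import math

def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: iterable of (xi, yi, zi) tuples, e.g. the interpolation points.
    Weight for each sample: w_i = 1 / d_i**power.
    """
    num = 0.0
    den = 0.0
    for xi, yi, zi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # the location coincides with a sample: return its value
        w = 1.0 / d ** power
        num += w * zi
        den += w
    if den == 0.0:
        raise ValueError("at least one sample point is required")
    return num / den

# Hypothetical points: pH 6.0 at distance 1 and pH 8.0 at distance 2
print(idw(0.0, 0.0, [(1.0, 0.0, 6.0), (2.0, 0.0, 8.0)]))
```

With p = 2, the closer sample gets four times the weight of the farther one, so the estimate (6.4) lies closer to 6.0 than to 8.0.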



Next, install the Point Sampling Tool plugin: go to Plugins/Install plugins, look for Point Sampling Tool and click on Install plugin.



Go to Plugins/Analyses/Point sampling tool.



Select GCP20 as the layer containing the sampling points. As the layers with fields to get values from, select GCP20: pH (source point) and Interpolated: Band 1 (raster).



Click on Browse in the Output point vector layer section and save the file as RMSE.shp.
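The sampling step reads, for each GCP, the raster value at the point's location. A minimal Python sketch of that lookup follows, assuming a north-up raster stored as a list of rows with a known top-left origin and square pixels, and assuming the value of the cell containing the point is returned (no sub-pixel interpolation). The function name `sample_raster`, its parameters and the grid values are hypothetical.

```python
import math

def sample_raster(grid, origin_x, origin_y, pixel_size, x, y):
    """Return the value of the cell containing (x, y) in a north-up raster.

    grid[row][col]: row 0 is the top edge; (origin_x, origin_y) is the
    top-left corner. Returns None if the point falls outside the raster.
    """
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

# Hypothetical 2 x 2 interpolated pH raster with 1-unit pixels, top-left at (0, 2)
interpolated = [[6.0, 6.5],
                [7.0, 7.5]]
print(sample_raster(interpolated, 0.0, 2.0, 1.0, 0.5, 1.5))  # 6.0
```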



Open the attribute table of the RMSE layer and click on Open field calculator.



Now we can start calculating the RMSE. First, we calculate the squared residual (squared error) of each ground control point. The GCPs are the observed values and the interpolated values are the forecasts, so we subtract the observed values from the forecasts and square the result (the order of the subtraction does not matter once the difference is squared). In the field calculator, check Create a new field, set the output field name to SE and the output field type to Decimal number (real), and write the following in the expression box: ("Interpolat" - "pH")^2
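The field calculator step can be mirrored in Python as a loop over the sampled records. In this sketch each record is a dictionary keyed by the field names used above ("pH" and "Interpolat"); the function name `add_squared_error` and the record values are hypothetical. A point with no sampled value (None) gets a None SE, which we assume is analogous to a NULL result in the field calculator.

```python
def add_squared_error(records):
    """Add an 'SE' entry, (Interpolat - pH)**2, to each record in place."""
    for rec in records:
        if rec["Interpolat"] is None or rec["pH"] is None:
            rec["SE"] = None  # missing value: no squared error
        else:
            rec["SE"] = (rec["Interpolat"] - rec["pH"]) ** 2
    return records

# Hypothetical records from the sampled RMSE layer
rows = add_squared_error([{"pH": 7.0, "Interpolat": 6.5},
                          {"pH": 6.2, "Interpolat": None}])
print(rows[0]["SE"])  # 0.25
```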



Click on Toggle editing mode to stop editing and save the changes.



Look for the Basic statistics tool in the Processing Toolbox and open it.



The input layer will be RMSE.shp and the Field to calculate statistics will be SE. Save the file as pH_SE_stats.



Open pH_SE_stats and look for the mean value.



The mean of SE for these GCPs is 0.3047; this is the mean squared error. The RMSE is its square root: √0.3047 ≈ 0.552. Therefore, the RMSE of the interpolated pH layer is 0.552 pH units.
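The last two steps (mean of SE, then square root) can be checked with a few lines of Python. The helper name `rmse_from_squared_errors` is hypothetical, and skipping None values is an assumption about how missing SE values should be treated; the final line reproduces the arithmetic with the mean reported above.

```python
import math
import statistics

def rmse_from_squared_errors(se_values):
    """RMSE = sqrt(mean(SE)), skipping missing (None) values."""
    valid = [v for v in se_values if v is not None]
    return math.sqrt(statistics.mean(valid))

# Square root of the mean SE reported by Basic statistics
print(round(math.sqrt(0.3047), 3))  # 0.552
```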




Input files

You can download the input files for this tutorial here.

Saul Montoya

Saul Montoya is a Civil Engineer who graduated from the Pontificia Universidad Católica del Perú in Lima, with postgraduate studies in Water Resources Engineering and Management (WAREM Program) at the University of Stuttgart, specializing in Groundwater Engineering and Hydroinformatics.
