# Massive Operations on Rasters with QGIS3 and Python - Tutorial

There are new available tools and resources to understand climate change, land use dynamics, water cycle and other parts of our physical environment. Many spatial data come on raster format and are available on web servers, those servers have a image register every year, every month, every day, hour, half hour or minute. If we want to assess a physical phenomena we have to be able to analyze large set of data.

Traditional GIS approach and tools were focussed in performing a spatial algorithm on a single layer or a small group of layers, however, we need to develop and use tools to increase our data manipulation capabilities to undefined number of layers; afortunately this can be done on the same QGIS software with Python console.

If we deal with raster data, it is important to remember that all rasters are in fact matrices with a spatial georeference. When the amount of required operation to rasters is huge, it is preferable to convert the rasters to matrixes (or 2D Numpy arrays), to perform operations with Numpy tools and then to geotransform them to matrixes. Concerning the spatial suitability of our matrix algebra, all raster files need to have the same raster resolution, projection and dimesion.

## Python code

Python code capable to do multiple operation on rasters is not pretended to be long with scripts in the order of 40 to 60 lines upon the level of the required analysis. Most of the Python efficiency for raster analysis come from the loops, conditionals, matrix operations in Numpy and tools to open and write files. There are two versions of the Python code, one for HDF5 files and the other for Tiff files, that works almost on the same direction. Here is the HDF5 version:

```import gdal, os, osr
import numpy as np
from qgis.core import *
import qgis.utils

os.chdir('C:\\Users\\USER\\Documents\\Infohataris\\MassiveOperationwithRasterswithPyQGIS\\HDF5')
print(os.listdir(os.getcwd()))

sum_array = np.zeros([1800,3600])

band_ds = gdal.Open(hdf_ds.GetSubDatasets()[5][0], gdal.GA_ReadOnly) #choose the 5th dataset that corresponds to precipitationCal
band_array[band_array<0] = 0 #filter all NaN values that appear as negative values, specially for the tiff representation
sum_array = band_array.T[::-1]*0.5 + sum_array

geotransform = ([-180,0.1,0,90,0,-0.1])
raster = gdal.GetDriverByName('GTiff').Create("../Sum/SumImagesHdf.tif",3600,1800,1,gdal.GDT_Float32)
raster.SetGeoTransform(geotransform)
srs=osr.SpatialReference()
srs.ImportFromEPSG(4326)
raster.SetProjection(srs.ExportToWkt())
raster.GetRasterBand(1).WriteArray(sum_array)
raster = None

```

## Tutorial

This tutorial deal with the sum of 48 raster files with some matrix operations as transpose and row inverse order. Dataset come from the IMERG precipitation images every half hour for a one day (January 1 2017).

## Input data

### Saul Montoya

Saul Montoya es Ingeniero Civil graduado de la Pontificia Universidad Católica del Perú en Lima con estudios de postgrado en Manejo e Ingeniería de Recursos Hídricos (Programa WAREM) de la Universidad de Stuttgart con mención en Ingeniería de Aguas Subterráneas y Hidroinformática.

Subscribe to our free e-newsletter for tutorials, articles, webminars, courses and more.