Massive Operations on Rasters with QGIS3 and Python - Tutorial


There are new available tools and resources to understand climate change, land use dynamics, water cycle and other parts of our physical environment. Many spatial data come on raster format and are available on web servers, those servers have a image register every year, every month, every day, hour, half hour or minute. If we want to assess a physical phenomena we have to be able to analyze large set of data.

Traditional GIS approach and tools were focussed in performing a spatial algorithm on a single layer or a small group of layers, however, we need to develop and use tools to increase our data manipulation capabilities to undefined number of layers; afortunately this can be done on the same QGIS software with Python console.

If we deal with raster data, it is important to remember that all rasters are in fact matrices with a spatial georeference. When the amount of required operation to rasters is huge, it is preferable to convert the rasters to matrixes (or 2D Numpy arrays), to perform operations with Numpy tools and then to geotransform them to matrixes. Concerning the spatial suitability of our matrix algebra, all raster files need to have the same raster resolution, projection and dimesion.


Python code

Python code capable to do multiple operation on rasters is not pretended to be long with scripts in the order of 40 to 60 lines upon the level of the required analysis. Most of the Python efficiency for raster analysis come from the loops, conditionals, matrix operations in Numpy and tools to open and write files. There are two versions of the Python code, one for HDF5 files and the other for Tiff files, that works almost on the same direction. Here is the HDF5 version:

import gdal, os, osr
import numpy as np
from qgis.core import *
import qgis.utils

totallinks = os.listdir(os.getcwd())

hdflinks = []
for link in totallinks:
    if link[-4:] == 'HDF5':

sum_array = np.zeros([1800,3600])

for link in hdflinks:
    hdf_ds = gdal.Open(link, gdal.GA_ReadOnly)
    band_ds = gdal.Open(hdf_ds.GetSubDatasets()[5][0], gdal.GA_ReadOnly) #choose the 5th dataset that corresponds to precipitationCal
    band_array = band_ds.ReadAsArray()
    band_array[band_array<0] = 0 #filter all NaN values that appear as negative values, specially for the tiff representation     
    sum_array = band_array.T[::-1]*0.5 + sum_array

geotransform = ([-180,0.1,0,90,0,-0.1])
raster = gdal.GetDriverByName('GTiff').Create("../Sum/SumImagesHdf.tif",3600,1800,1,gdal.GDT_Float32)
raster = None




This tutorial deal with the sum of 48 raster files with some matrix operations as transpose and row inverse order. Dataset come from the IMERG precipitation images every half hour for a one day (January 1 2017).


Input data

Download the input data on this link.

Saul Montoya

Saul Montoya es Ingeniero Civil graduado de la Pontificia Universidad Católica del Perú en Lima con estudios de postgrado en Manejo e Ingeniería de Recursos Hídricos (Programa WAREM) de la Universidad de Stuttgart con mención en Ingeniería de Aguas Subterráneas y Hidroinformática.

Smiley face

Subscribe to our free e-newsletter for tutorials, articles, webminars, courses and more.