# Chapter 4 Spatial Analyses

Spatial analyses is the process of examining the locations, attributes, and relationships of features in spatial data through overlay and other analytical techniques in order to address a question or gain useful knowledge. Spatial analysis extracts or creates new information from spatial data.

## 4.1 Basic Geoprocessing

Geoprocessing is any GIS operation used to manipulate data. A typical geoprocessing operation takes an input dataset, performs an operation on that dataset, and returns the result of the operation as an output dataset, also referred to as derived data.

Common geoprocessing operations are geographic feature overlay, feature selection and analysis, topology processing, and data conversion. Geoprocessing allows you to define, manage, and analyze geographic information used to make decisions.

In other words, any alteration or information extraction you want to perform on your data involves geoprocessing.

### 4.1.1 Filtering data

Protected areas (PAs - national parks, wildlife sanctuaries, and reserve forests) are planned and managed with the prime objective of biodiversity conservation. Almost all PAs have human habitations abutting their boundaries, which intricately interact with wildlife and habitats within. Therefore, population growth is a major issue for the managament. It will trigger the change in landcover and land use, thus increasing the pressure to PAs.

Although the effect of PAs on their human neighbors is a widely debated issue, the importance of managing such interplays, positive and negative, is vital in sustaining the ecological setting of the PAs (Mohd. Zeeshan 2017). Due to this reason, mapping the distribution of the human settlement is considered as one of the first steps in managing protected areas.

In this training, human settlements represented as village points in OpenStreetMap will be extracted by query or filtering data. The steps are shown as follows:

2. Right-click at the layer and select Open Attribute Table.

1. Click Select features by expression button
1. Select place under Field and Values option. Click All unique button to show available content in the Place column. Type the expression “place” = ‘village’ and click Select features button at the bottom to filter the data.
1. Selected records will be highlighted.
1. Right-click at the osm_points layer and choose “Save as”

2. Fill in the options as shown in Figure 4.6. Don’t forget to choose Save only selected features.

To change the Coordinate Reference System (CRS), click the world icon on CRS option and type 32648 in the Filter box.

### 4.1.2 Dissolve

One or more attributes can be specified to dissolve (merge) only geometries belonging to the same class (having the same value for the specified attributes), alternatively all geometries can be dissolved.

All output geometries will be converted to multi geometries. In case the input is a polygon layer, common boundaries of adjacent polygons being dissolved will get erased.

Adminstrative boundary downloaded from GADM contains three levels of boundary, (1) country, (2) province and (3) district. For this practice, we will merge the district boundaries to provincial level.

Follow along the steps explained below to merge the shape based on attribute:

1. Select Vector -> GeoProcessing Tools -> Dissolve
1. Click … on Unique ID Field, select NAME1 and click OK.
1. In the Dissolve menu, click Run in the background to start the process. The result will be shown in the Map Display.

### 4.1.3 Polygon from layer extent

The following lesson sometimes failed in QGIS 3. You can try the process in QGIS 3 first and if it is failed, try it in QGIS 2.18

The tools is useful when we want to clip raster or vector by using a bounding box of certain feature.

Follow along the steps below to extract extent from polygon:

1. Select Vector -> Research Tools -> Extract layer extent

2. You can set the output as temporary layer or save it as a file.

1. The result will be a rectangle polygon based on top-left and bottom-right coordinate of the feature used in the process.

### 4.1.4 Reclass

A very useful technique to work with raster data is changing their values or grouping them into categories.

We will classify the elevation in three groups:

- Lower than 1,000m

- Between 1,000 and 2,000m

- Higher than 2,000m

To do this, follow these steps:

1. Open srtm_58_09.tif

2. Open the r.reclass from the Processing Toolbox menu. Fill in the options as shown in the figure and click Run.

1. This will create the reclassified layer

Values for each cell are compared with the range limits in the lookup table, considering the specified comparison criteria. Whenever a value falls into a given range, the class value specified for this range will be used in the output layer.

## 4.2 Terrain analyses

Certain types of rasters allow you to gain more insight into the terrain that they represent. Digital Elevation Models (DEMs) are particularly useful in this regard. In this lesson you will use terrain analysis tools to find out more about the study area for the proposed residential development from earlier.

### 4.2.1 Preparation

1. Open srtm_58_09.tif. This layer contains a DEM in the EPSG:4326 CRS and elevation data in feet. These characteristics are unsuitable to run most terrain analysis algorithms, so we will modify this layer to get a suitable one.

2. Reproject the layer to the EPSG:32648 CRS, using the Save as… entry in the context menu that appears by right-clicking on the layer name.

3. Open the resulting reprojected layer.

There are other things that must be taken into account when running a terrain analysis algorithm to ensure that results are correct.

One common problem is dealing with different cell sizes. An assumption that is made by most terrain analysis algorithms (and also most of the ones not related to terrain analysis) is that cells are square. That is, their horizontal and vertical values are the same. This is the case in our input layer (you can verify this by checking the layer properties), but it may not be true for other layers.

In this case, you should export the layer and define the sizes of the cells of the exported layer to have the same value. Right-click on the layer name and select Save as…. In the save dialog that will appear, enter the new sizes of the cells in the lower part of the dialog:

### 4.2.2 Slope

Slope is one of the most basic parameters that can be derived from a DEM. It corresponds to the first derivative of the DEM, and it represents the rate of change of the elevation. It is computed by analyzing the elevation of each cell and comparing this with the elevation of the surrounding ones. This recipe shows you how to compute slope in QGIS.

1. In the Processing Toolbox option, find the Slope algorithm and double-click on it to open it.
1. Select the DEM in the Input layer field.

2. Click on Run to run the algorithm.

A hillshade layer is commonly used to enhance the appearance of a map and display topography in an intuitive way, by simulating a light source and the shadows it casts. This can be computed from a DEM by using this recipe.

1. In the Processing Toolbox option, find the Hillshade algorithm and double-click on it to open it.
1. Select the DEM in the Input layer field. Leave the rest of the parameters with their default values.

2. Click on Run to run the algorithm.

## 4.3 Density Analyses

We often need to work with large and dense datasets in which there is a lot of overplotting and features experience significant overlap. Such datasets may be very slow at rendering because they contain thousands, or even millions, of features, and very difficult to interpret because overlaps make it difficult to detect any clusters or distribution patterns.

In this section, you will learn the techniques that allow you to visualize such datasets in a more readable and faster way by displaying feature density instead of the features themselves. After practicing the lessons in this section, you will be able to perform density analysis of your data and extract information from density maps.

In this section, we will go through the following topics:

• Density analysis and heat maps

• Creating heat maps with the Heatmap plugin

### 4.3.1 Density Analyses and Heatmaps

Density maps allow visual estimation of object or event concentration over the study area. Such maps are very useful for assessment of the distribution patterns of the features over the study area. When we simply add locations of the features or events (for example, as points) to the map, we cannot see the changes in their concentration in different areas. Density analysis gives us such functionality by using uniform area characteristics, such as feature count per acre or square kilometer.

A density map gives us the ability to estimate the concentration of some features within an area. This helps us find areas where an urgent reaction is required or which match your criteria. Heatmaps also help control conditions and their changes.

Density maps are also extremely useful when mapped regions (for example, districts) have different sizes. For example, if we want to know how many people live in each district, we just need an ordinal map with the population data. According to this map, a large district may have a higher population than a smaller district. But if we want to identify the districts with a higher concentration of population, then we need a density map to see the number of people per square kilometer. And a density map will show us that, in fact, small regions with a high population density may have more people per square kilometer than larger districts.

Generally, we can show on a map the density distributions of the features themselves (for example, schools), as well as distributions of some numerical characteristics of these features (for example, the number of pupils in schools). The results will be completely different in these cases. A density map of schools can help an education department find areas where more schools are needed, while a density map created from information about the number of pupils in each school may help a transportation company to plan bus routes and to decide where to place bus stops.

The most common use case is the creation of density maps to display the density of point features. Such maps are often called heat maps. What is a heat map? It is a raster layer. Each cell of it contains a representation of the density of features in its vicinity (for example, the number of people per square kilometer), which depends on the number of features within some area.

To create a heat map, in the simplest case, GIS looks at the features around a cell center, using a given search radius. Then the number of features that fall within the given radius is calculated and divided by the area of the region. This value will be a cell value. Then next cell will be analyzed, and so on. As a result, we will get a combination of values, which creates a smooth surface. To understand this better, refer to the following diagram:

This diagram shows the general principle of creating heat maps. The green dots depict the features used for density map generation, the blue square is a current raster cell, and the red dotted circle marks the search radius, for example, 1 km. In this case, the area covered will be about 3.14 sq. km. As we can see in the diagram to the left, four features are within the search radius. So, the raster cell will get a value of 4/3.14 = 1.27. On the right side, we notice that the next cell will get a value of 1.59 because now there are five features inside the search radius.

This is the simplest approach. In real-world applications, more complex algorithms are used, where each point has some impact on the values of the neighboring cells, depending on its distance from those cells.

### 4.3.2 Creating heat maps with the Heatmap plugin

With the help of the QGIS core plugin called Heatmap, we can easily create heat maps from vector point data and use it for further analysis. First, we need to activate this plugin, if it has not yet been activated. After activation, it creates a submenu under the Raster menu and places its button on the Raster toolbar.

Let’s create a density map for the noise layer, which contains information about complaints regarding high noise levels. This layer contains 44,397 features, and it is difficult to know which places are noisy.

Information about such places may be useful for a police department or other agencies to plan some activities for noise reduction, or for those who are looking for apartments and don’t want neighbors who like playing loud music.

1. Start the plugin by clicking on the Heatmap button on the Raster panel, or by navigating to Raster | Heatmap | Heatmap….
1. Select the noise layer from the Input point layer combobox.

2. Using the … button on the right side of the Output raster field, specify the location in which the resulting heat map needs to be saved. Note that there is no need to specify the file extension; it will be picked up automatically, based on the output file format.

3. Use the Output format combobox to select the desired format for the heat map. The most common choice here is GeoTIFF, but for very large maps, it is better to use something different, for example, Erdas Imagine.

4. The last thing we need to specify is Radius. This value defines the distance from each cell up to which QGIS will look for neighbor features and take their presence into account. Generally, a bigger search radius gives a more generalized result, as the number of features found will be divided by a bigger area. A smaller radius gives more precise results, but if this value is too small, we may not find any distribution patterns. The search radius can be defined in meters or map units.

To determine the search radius from a known area, we can use a very simple formula derived from formula for the area of a circle:

$\begin{equation*} r = \sqrt{\frac{S}{\pi}} \end{equation*}$

For example, if we need to calculate density per square kilometer, then the search radius will be as follows:

$\begin{equation*} r = \sqrt{\frac{1 km^2}{\pi}} = \sqrt{\frac{1000000 m^2}{3.1415926}} \approx 564.2 m \end{equation*}$

For more fine-grained control over the result, we can check the Advanced box and define some additional parameters:

• Rows and Columns:

These allow us to define dimensions of the output raster. Larger dimensions will result in a bigger output file size, while smaller dimensions will result in a rough and pixelated output. Input fields are linked to each other, so changing the value in the Rows field (for example, halving it) will also cause the corresponding change to the value in the Columns field, and vice versa. Furthermore, these values have a direct influence on the raster cell size (see the next point). It is worth mentioning that the extent of the raster preserved when changing raster dimensions.

• Cell size X and Cell size Y:

The raster cell size determines how coarse or detailed the display of the distribution patterns will be. A smaller cell size will give smoother results, but the processing time and memory required for the analysis will increase. Large cells will be processed faster, but the resulting raster will be pixelated. If the cells are really big, some patterns will become invisible, so you may need to run the analysis several times, trying different cell sizes to get results that satisfy your requirements.

The cell size depends on and is linked to the raster dimensions. Increasing it will decrease the number of rows and columns, and vice versa.

• Kernel shape:

This controls how the point influence changes with changes in distance from this point. The QGIS Heatmap plugin currently supports the following kernels:

– quartic (also known as biweight)

– triangular

– uniform

– triweight

– Epanechnikov

Depending on the kernel shape, we will get a smoother heat map, or more clearly exposed hotspots. For example, the triweight kernel will give clearer, sharper hotspots than the Epanechnikov kernel, because the Epanechnikov kernel has lower influence near the hotspot center. Also, in different scientific fields, different kernels are preferred; for example, in crime analysis, the quartic kernel is typically used.

It is also possible to use a variable search radius for each point by selecting the Use radius from field checkbox and selecting the attribute field with radius value from the combobox. If you need to weight points (in other words, increase or decrease their influence) by some numeric attribute, activate the Use weight from field checkbox and select the corresponding field. In our example, we will not use this functionality, but you can try it yourself.

As we said before, cell size has a direct influence on the quality of the resulting heat map, so it is important to select it carefully. In most cases, the cell size is chosen in such a way that we get 10 to 100 cells per unit area (which in turn is defined by the search radius). To calculate the cell size, we need to align area units with distance units; for example, if we calculate the density using square kilometers and define the search radius in meters, then it is necessary to convert the square kilometers to square meters. The next step is to divide the area by the desired number of cells. Finally, as the cell size is defined by its width or height (because raster cells usually have a square shape), we need to extract the square root of this value.

In our example, we will create a heat map with a search radius of 1000 m, so the lookup area will be approximately 3.14 square kilometers. When expressed in meters, this will be as follows:

$\begin{equation*} 3.14 km^2 = 3.14 . 1000 m . 1000 m = 3140000m^2 \end{equation*}$

As we want a smooth heat map, we will use a relatively large number of cells per unit area; let’s say 100 cells per 3.14 square kilometers. So, we divide the area in square meters by the desired cell count:

$\begin{equation*} \frac{3140000m^2}{100 cells} = 3140000m^2 per cell \end{equation*}$

Finally, we calculate the square root of this value to get the cell size that allows us to have 100 cells per 3.14 square kilometers:

$\begin{equation*} \sqrt{31400m^2} \approx 177.2 m \end{equation*}$

Of course this is not a strict rule but just a recommendation. You can safely use another cell size, depending on your data and the desired results. Just remember that smaller values lead to smoother heat maps, but at the same time increase the analysis time and output raster size.

When all the inputs and parameters are set, press the OK button to start the process of the heat map generation. The progress of the heat map formation will be displayed in a small progress dialog. If this process is taking too long time to complete, you can interrupt it by pressing the Abort button. Note that after aborting heat map generation, you still get the output, but it will be incomplete and not useful for further analysis.

When the process completes, the generated heat map will be added to QGIS as a grayscale raster, where lighter regions correspond to higher density values and darker regions correspond to lower density values, like this:

To improve readability and make it look like real heat map, we need to change its style. To do this, follow the next steps.

1. Right-click on the heatmap layer in the QGIS layer tree. In the context menu, select Properties.

2. Go to the Style tab and select Singleband pseudocolor as Render type.

3. In the Load min/max values group, activate the Min/max options. Set Extent to Full and Accuracy to Actual (slower). Press the Load button to get raster statistics. This will be used for further classification.

4. Select a suitable color ramp in the Generate new color map group, for example, YlOrBr (which changes colors from yellow to orange and then brown), or __Red__s (which uses different shades of red). If necessary, change the number of classes and click on the Classify button.

5. Click on OK to apply the changes and close the properties dialog.

Now we can easily locate the hottest points (displayed in colors closer to red if the Reds color map is used), and even recognize some distribution patterns that were not visible when we looked at the original point layer:

Now we can easily locate the hottest points (displayed in colors closer to red if the Reds color map is used), and even recognize some distribution patterns that were not visible when we looked at the original point layer. Also, our heatmap layer showed up much faster than the vector which is used to create this heat map.

Detecting the “hottest” regions

Sometimes, you don’t need the heat map itself, but just want to find the hotspots—areas with the highest density—and use them in further analysis. It is pretty easy to find such regions in QGIS and extract them in the vector form.

First, we should define threshold value, which will be used to recognize the hotspots. As a starting value, we can use the maximum pixel value in our heat map and then adjust it as per our needs.

The simplest way to find the maximum pixel value is to use the Identify Features tool. Select a layer in the QGIS layer tree, activate the Identify Features tool, click on the most visually “hottest” regions, and look at the reported value. With our heat map, this will be 540.32.

If we will use this value as is, we cannot find all the important clusters, so this value should be reduced first. The smaller the selected value (in comparison with the maximum value), the larger the number of clusters found. The area of separate clusters will also grow. For our example, we choose a value of 200.

Now, open Raster Calculator from the Raster menu, specify a path where the output file should be saved in the Output layer field, and enter the “heatmap@1”>=200 formula in the Raster calculator expression field, like this:

This formula is used to create a so-called mask. If the pixel value of the input layer is greater or equal to our threshold value of 200, then the output pixel value will be 1. Otherwise, it will be 0. So, our output raster will be a binary raster, with only two pixel values—0 and 1—which is very easy to convert to a vector.

Leave all other values unchanged so that the resulting raster will have exactly the same dimensions and cell size as the input one. Press the OK button to start the calculation. When it is done, a new black-and-white raster layer will be added to the QGIS canvas, as shown here:

To convert the mask raster into vector format, we need to create polygons from all connected pixels with the same value. This is where the Polygonize tool comes to help. In the Processing toolbox, you can find the Polygonize algorithm by typing its name in the filter field at the top of the toolbox. Double-click on the algorithm name to open its dialog, and you will see something like this:

Select the mask layer that was previously created as Input layer, specify the path where the result will be saved using the Output layer field, and click on the Run button to start the algorithm. When it is done, a new vector layer will be added to QGIS. This layer has an attribute called DN (if you did not change it) that indicates the pixel value of each polygon in the layer. So, all we need to do is remove all the features that have the attribute value equal to zero. The remaining features will be the hotspots.

To delete unnecessary features from the hotspots layer, select it in the QGIS layer tree, right-click to open the context menu, and select Open Attribute Table. Click on the Select features using an expression button. In the Select by expression dialog, enter “DN” = 0 (if necessary, replace DN with your field name), click on the Select button, and close the dialog. Start editing by clicking on the Toggle editing mode button, or press Ctrl + E. To remove the selected features, press the Delete key or click on Delete selected features. Finally, turn editing mode off by pressing Ctrl + E or clicking on Toggle editing mode again.

Now, the hotspots layer contains only hotspot polygons, which can be used for further analysis. For example, we can combine this cluster with information about the nearest buildings and noise types to find dependencies and develop some suggestions for reducing the noise levels there.

Looking for distribution patterns with contour lines

Apart from detecting hotspots, heat maps can also be used to detect intensity changes or visualize the direction of value changes. The most common way to do both of these tasks is contour lines generation.

Fortunately, QGIS has all the necessary tools for this. We will use processing again, but contour lines generation is also available in the GDALTools plugin (which can be found in the Raster menu). In the Processing toolbox, you can find the Contour algorithm by typing its name in the filter field at the top of the toolbox. Double-click on the algorithm name to open its dialog, which looks like this:

Select the heatmap raster layer as Input layer. In the Output file field, specify the path where the results will be saved. Also, it is necessary to define Interval between contour lines. There are no strict principles about determining this interval. The general rule is to select such an interval that detects patterns in the areas with smooth density changes. We will select an interval of 10.

When all of the necessary information has been defined, click on Run to start the contour line generation. After some time, a new polygonal vector layer will be added to QGIS, and we can start analyzing it. First, if necessary, move the contours layer to the top of the heat map in the QGIS layer tree. Also, it is better to adjust contours’ symbology to make them more recognizable against the background of the heat map.

More dense contours correspond to more intense density changes. Also, we can identify the direction of changes in noise. For example, in the preceding screenshot, we can see that in some places, the noise distribution around the center is not equal; the intensity reduces more quickly in the southeast than in the northwest. So, we may assume that there are some obstacles to noise there.

## 4.4 Suitability Analyses

We live in a world full of various relationships that can be analyzed in functional, temporal, or spatial contexts. Spatial relationships are of particular interest when it comes to GIS, because here objects in space are represented in a way that facilitates explanations and analyses of their geographic relations. Suitability analysis is a fundamental part of GIS analysis that answers the question, “Where is the best place for something to be located?” In this chapter, you will be exposed to the fundamentals of suitability analysis through the search for the best place to live in. You will learn how to:

- interpret spatial relationships between objects

- express these relationships through spatial data

- analyze spatial data according to a set of predefined criteria

- overlap layers and interpret the results

Basic of Suitability Analyses

Suitability analysis is recognized as a multi-criteria decision support approach. In other words, its main aim is to divide the area of interest into two categories based on a set of predefined criteria: appropriate for some kind of use (living, building, conservation, and so on) and inappropriate. A general approach that is used for suitability assessment is multiple layer overlays that support multi-criteria decisions. Depending on the data that represents suitability criteria and overlaid, there are two basic approaches available:

1. Suitability analysis with vector data.

This primarily utilizes operations such as buffering and their sequential combination using vector overlay operations, such as clipping, intersection, and union.

1. Suitability analysis with raster data.

This heavily relies on raster algebra, which is used to reclassify initial raster coverages and then combine them to produce binary or ranked suitability rasters. This approach is more flexible, as it makes it possible to produce multiple suitability classes and change the raster weight according to the importance of the factor it represents. As a result, the user is able to produce a compound result, but the workflow requires more effort connected to data preprocessing and consolidation decisions.

Vector data Raster data
Main operations
Buffering Vector data rasterization
Clip overlay Proximity raster creation
Intersection overlay Raster reclassification
Raster algebra multiplication
Raster algebra substraction
Workflow quickness Simple data reclassification
Workflow simplicity Good representation of continuous features
Good representation of man-made features Crisp and fuzzy classes are possible
Different weighting according to their importance
Various assessments are possible
Limitations
Provide only crisp classes Reclassification and ranking subjectivity
Usually provide binary assessment only Workflow complexity

No matter what approach you will follow, the general suitability analysis workflow involves several common steps. We will now take a closer look at them to ensure better understanding of the suitability analysis system:

1. Define the goal and objectives of your analysis

The question to be studied is formulated in general, and its applied significance is determined, which will later become the set of suitability criteria.

Some popular suitability applications include the following:

• Agriculture: The appropriateness of the area for cultivation of certain crops is assessed.

• Retail: The area is assessed from a marketing prospective—whether it will attract new customers or buyers, or not. This type of analysis is in great demand when selecting preferable shopping locations.

• Renewable energy: Assessing land suitability for locations of wind power or solar power stations is a remarkable trend in the field of geospatial planning for sustainability.

• Nature conservation: Conservancy needs are prioritized using habitat suitability modeling, and the area is divided into locations that are more or less valuable for certain species survival and reproduction.

Generally speaking, the primary application of suitability analysis is in the field of land use planning, aimed at reasonable prioritization of various types of human activities within limited space and natural resources.

1. Analyze the available data and define its relevance to the goals and objectives

Data relevance to the goal and objectives is defined. Current data availability and future data requirements are analyzed, especially the knowledge of whether the current data derivatives can be used for analysis or not. For example, if we have to analyze suitability for agricultural needs, a DEM can be a great source. It provides not only basic information about the relief, but also some useful derivatives, such as slope and aspect. The main outcome of this stage is a list of primary data sources and their potential derivatives.

1. Define the criteria of analysis

This is the most important stage, where the objectives of analysis are described as clear numerical criteria based on relevant data. Descriptive objectives are translated into the language of GIS analysis. At this stage, various types of spatial relationships between objects are analyzed, and some of the most popular relationships include the following:

• point-to-point relationships:

– “is within”: All schools that are within a distance of 1 km from a living place

– “is nearest to”: The primary school that is closest to a living place

• point-to-line relationships:

– “is nearest to”: The street that is closest to a subway station entrance

• point-to-polygon relationships:

– “is contained in”: All public schools within a certain community boundary

• line-to-line relationships:

– “crosses”: Whether a particular trail crosses a road

– “is within”: Find all trails that are within a distance of 1 km from a particular river

– “is connected to”: Find all streets connected to a particular highway

• line-to-polygon relationships:

– “intersects”: Find all districts crossed by a cycleway

– “contains”: Find all streets that are completely within a certain district

• polygon-to-polygon relationships:

– “completely within”: Find all districts that are completely within a hazard zone

– “is nearest to”: Find the building that is nearest to a park

1. Primarily analyze and prepare the data

The data is analyzed according to the set of criteria defined in the previous stages. Common analysis operations involve selection by location, buffering, rasterization, proximity (raster distance), and so on. After all the necessary layers are ready, they should be prepared for overlay, which involves reprojection into a common coordinate reference system (if necessary), setting ranges for various ranks, and reclassification in the common ranking system. All the layers should contain values in uniform units, otherwise their overlay will be meaningless and hard to interpret.

1. Overlay the data and interpret the results

Previously prepared layers are combined into a single coverage based on a set of user-defined rules. Depending on the data available and the rules applied, the following suitability assessments are possible:

• Binary suitability assessment

All of the area is divided into appropriate and inappropriate categories. This is the simplest type of assessment that can be obtained from vector data overlay.

• Ranked suitability assessment

Places are ranked from least appropriate to most appropriate based on the entire range of predefined criteria. This type of assessment can be derived from both vector and raster data. It lets you avoid simple yes/no assessments, which are not inherent to the real word. This advantage is counterbalanced by the subjectivity of data rankings and equal importance of various factors. Nevertheless, in the real world, their contribution to overall assessment can vary.

• Weighted suitability assessment

This is similar to the previous type of assessment and has only one significant difference: various factors can be weighted differently according to their importance for a certain type of activity. This type of assessments relies on the raster algebra approach and is thought to be all-inclusive, but not without some subjectivity, especially when it comes to factor weighting and interpreting the final result.

### 4.4.1 Step 1: Define the goal and objectives

Throughout this tutorial, we will suppose that we are working on behalf of a young couple with a little child. They are looking for a perfect place to live in within a certain region of their interest. Our goal is to use the power of GIS-based suitability analysis methods and provide an objective and reliable answer to their question.

The goal of this analysis is to find areas that would be a good match for a young family with a child, given certain considerations. Many of their requirements are similar to those of a traditional housing estate development company. For example, proximity of subway stations, green zones, and public safety must be all taken into consideration. There are also some family-specific requirements that should be considered. As already mentioned, the family has a little baby, which means that we should take into account the existence of early childhood or elementary schools nearby. Also, they are keen on sports, and it would be great if the area they are going to live in has well-developed and active rest infrastructure. After this kind of review, we can formulate some more specific requirements and objectives:

• Safety: The area should not be exposed to various natural and crime hazards

• Connectivity: It should be well-connected to the city’s transport network

• Greenness and openness: It should be close to parks or other green areas

• Educational potential: Early childhood or elementary schools should be located in the neighborhood

• Active rest opportunities: These include a cycling network and athletic facilities

• Cultural life: Art galleries and museums can be recognized as general signs of cultural pulse beating

Now that the main objectives and requirements have been clarified, we can proceed to the next step and study all of the available data to assess its relevance to our example.

### 4.4.2 Step 2: Analyze the available data and define its relevance

As soon as we have defined the basic requirements, we need to explore the data layers that are potentially useful and relevant for our analysis. The training dataset contains a large number of datasets, and the most relevant among them are listed in the following list :

#### 4.4.2.1 Safety

• Layer hurricane_evacuation_zones :

Hurricane evacuation zones are the areas of the city that may need to be evacuated due to life- and safety-related threats from a hurricane storm surge.

• Layer hurricane_inundation_zones :

Hurricane inundation zones are the areas of worst-case storm surge inundation.

• Layer noise_heatmap :

The raster created in the Creating heat maps with the Heatmap plugin section of Chapter 5, Answering Questions with Density Analysis, that shows the spatial density of registered noise complaints might be useful for potential assessment for public safety.

#### 4.4.2.2 Connectivity

• Layer subway_entrances :

Locations of subway entrances.

#### 4.4.2.3 Greenness and openness

• Layer parks :

A layer containing open space features, such as courts, tracks, parks, and so on.

• Layer tree_density :

This is a raster layer created from the tree census data.

#### 4.4.2.4 Educational potential

• Layer elementary_schools :

These are the point locations of schools based on the official addresses. This layer includes some basic information about the school, such as the name, address, type, and principal’s contact information.

#### 4.4.2.5 Active rest opportunities

• Layer bike_routes :

Locations of bike lanes and routes throughout the city.

• Layer athletic_facilities

This layer contains athletic facilities and some basic information about them, including primary sport type, surface, dimensions, and so on.

#### 4.4.2.6 Cultural life

• Layer musemart :

Locations of museums and art galleries.

### 4.4.3 Step 3: Define the criteria of analyses

According to the description of hurricane_evacuation_zones, there are six zones, ranked in the attribute field zone by risk of storm surge impact, with zone 1 being the region most likely to be flooded. In the event of a hurricane or tropical storm, residents in these zones may be ordered to evacuate. Areas with a zone value of X are not in any evacuation zone. Areas with a zone value of 0 are any of the following: water, small piers, or uninhabited islands. For the purpose of analysis, this layer should be rasterized and ranked according to the risk of hurricane impact, with rank values descending from areas that are not in any evacuation zone to those most likely to be flooded.

The hurricane_inundation_zones polygon layer contains information about the risk of storm surge inundation in the category attribute field, in which the value is the surge height in feet. Areas that are most likely to be inundated are assigned a value of 1, and areas that are excluded from inundation modeling are assigned a value of 5. This layer should be rasterized and ranked with the highest potential suitability values for excluded areas, and the lowest for the areas of the 1 category.

The noise_heatmap raster layer is a raster that should be ranked using several categories, with the lowest suitability values for the noisiest places and vice versa. A good thing here is that we don’t need to rasterize this layer, as we did with the previous layers. At the same time, establishing the amount and ranges for the rankings brings subjectivity into our assessment. The tree_density layer, which is also a density raster, should be analyzed similarly.

The other selected layers should be analyzed first for their proximity. For this purpose, we will first rasterize them, then create continuous proximity rasters, and finally rank them under several categories according to the proximity values (the closer an object, the higher the suitability value). Again, in the case of user rankings, we will not be able to avoid some subjectivity in our assessments. Also, the final proximity rasters can be weighted according to their importance in the overall suitability assessment.

### 4.4.4 Step 4: Analyze and prepare the data

There are three main approaches to primary data analysis. These depend on the initial data type and available attributes:

• Rasterizing and ranking categorized vector layers: These are the layers that already contain all the necessary values, and at the preparation stage, all of them should be rasterized to the similar extent and resolution. Also, their categories should be ranked properly, with the highest values for the most suitable areas and vice versa. Examples of these layers are hurricane_evacuation_zones, hurricane_inundation_zones, and so on.

• Ranking density rasters: These are raster heat maps that should be converted from continuous coverages to categorized values where the highest value symbolizes the most appropriate area, and the lowest is related to the least suitable area. Examples of these layers are noise_heatmap and tree_density.

• Generating and ranking proximity rasters: This is the most tedious workflow. Vector layers should be rasterized first, and then proximity rasters should be created and ranked properly. This category encompasses the following vector layers: subway_entrances, parks, public_schools, bike_routes, athletic_facilities, and museumart.

Note that for the final output, we will always have ranked unique value rasters, with the highest values denoting the most suitable areas. Also, it is important that all output rasters share a common extent, which is necessary for their proper overlay with the raster calculator and overall suitability assessment.

In the upcoming sections, we will go through the previously mentioned workflows for the example of raster layers. As soon as you grasp the principle, you will be able to prepare other layers independently.

#### 4.4.4.1 Rasterizing and ranking categorized vector layers

In this example, we will work on the hurricane_evacuation_zones layer. The attribute we are particularly interested in is zone. This is because by this attribute, areas are prioritized for evacuation. Areas with the lowest value, 1, are the most likely to be evacuated, and vice versa. In this case, we can use these values directly to rank the raster.

One thing that prevents us from doing rasterization directly is that the attribute field that is used for rasterization should be numeric. If you check out the attribute field in the Fields section under Properties, you will see that the field zone has Type QString, and its values that contain numbers 1-6 and letters (X) are interpreted not as numbers but as sequences of symbols or strings. That’s why this field is unavailable for rasterization and should first be converted to numbers. This can be done easily with Field calculator:

1. Open the attribute table of the layer using the right-click Open Attribute Table shortcut, or hit the relative button from the Attributes toolbar. In the attribute table toolbar, either click on the Open field calculator button or use the Ctrl + I keyboard shortcut.

2. First of all, we should get rid of the X value that cannot be interpreted as a number and cannot be converted to it. As we have only one line that contains the X value, we simply toggle editing mode by clicking on the button in the attribute table toolbar. Double-click on the cell and manually enter the new value of 7. If you have multiple values to change, you can use the following expression in Field calculator to change some zone field values: CASE WHEN “zone” = ‘X’ THEN ‘7’ ELSE “zone” END.

3. Click on the button to open the Field calculator dialog window as shown in the following screenshot, and make the following adjustments:

• Make sure that the Create a new field toggle is on.

• Type the Output field name manually, for example, rank.

• Select Whole number (integer) from the Output field type list, as we are going to use short integers for ranking.

• Reduce Output field width to 1. This is because the rank values don’t exceed 10 and we don’t want to create excess data by producing fields of length that exceed actual data values.

1. In the Expression field, we need to type a function that will be used to create the values of a new field. In the Functions list, expand the Conversions item and double-click on the toint function. According to its description, it converts a string to an integer number. Nothing changes if a value cannot be converted to integer (for example, 123asd is invalid). After double-clicking, the function will be added to the expression with an open bracket, after which you should type (or double-click to add an item from Fields and Values) the field name to be converted in double quotation marks, and close the brackets. In our case, the resulting expression will be toint( “zone” ).

2. After you’ve clicked on the OK button, the new field will appear at the end of the table. Deactivate editing mode, confirm saving of the edits, and exit the attribute table window.

The layer is ready for rasterization. Open a dialog window by going to Raster | Conversion | Rasterize (Vector to Raster). In this dialog window as shown in the following screenshot, adjust the following settings:

1. From the Input file (shapefile) drop-down list, select hurricane_evacuation_zones.

2. From the Attribute drop-down list, select rank.

3. In Output file for rasterized vectors (raster), click on the Select button. Navigate to your working directory, and type hurricane_evacuation_zones.tif as the new layer name. This message will be shown: The output file doesn’t exist. You must set up the output size or resolution to create it. Click on OK and proceed to the next stage.

4. In the previous steps, you defined some major options. For convenience, we will create rasters in this tutorial with the same extent and resolution using $$lidar_dem.tif$$ as a template. Click on the button to make the gdal_rasterize command-line parameters editable. After modification, the line should look similar to the following example:

gdal_rasterize -a rank -l hurricane_evacuation_zones –a_nodata 0 -te 982199.3000000000465661 188224.6749999999883585 991709.3000000000465661 196484.6749999999883585 -tr 10 10 -ot UInt16 fullpath/hurricane_evacuation_zones.shp fullpath/hurricane_evacuation_zones.tif

This means that the output raster will contain rasterized values from the rank attribute field. Areas outside the layer polygons will be assigned a value of 0, which will be interpreted as nodata. The output data type is a 16-bit unsigned integer.

1. Click on the OK button. The result of rasterization will appear in the Layers panel.

The hurricane_inundation_zones layer should be preprocessed in a similar way. The layer contains some zone numbers. These can be interpreted as the severity of inundation risk; that is, the higher the number, the lower the risk. This means that we can use these values directly for ranking. In the case of this layer, the gdal_rasterize command-line parameters will look as follows:

gdal_rasterize -a category -l hurricane_inundation_zones -a_nodata 0 -te 982199.3000000000465661 188224.6749999999883585 991709.3000000000465661 196484.6749999999883585 -tr 10 10 -ot UInt16 fullpath/hurricane_inundation_zones.shp fullpath/hurricane_inundation_zones.tif

When combined, these layers can give a cumulative assessment of suitability based on risk’s severity from natural hazards.

#### 4.4.4.2 Ranking density rasters

In the case of density rasters, we can use the ready noise_heatamp.tif and create the tree_density.tif layer ourselves. First, we will prepare a noise_heatmap.tif that is originally much larger by extent than the area of interest that we are working on in this chapter. To clip the raster layer go to Raster | Extraction | Clipper:

1. From the Input file (raster) drop-down list select noise-heatmap, which is the raster to be clipped.

2. In Output file, click on the Select button to set a path and name for the output file, for example, noise_heatmap_clip.tif.

3. In the Clipping mode section, there are two possible modes:

• Extent is where you can enter the bounding box’s coordinates manually or by dragging the map canvas. Use this approach to set the extent for your output file. Just remember that the extent should exceed the boundaries of the area of interest set by the zipcode_bound shapefile.

• If Mask layer is active, you can select a polygonal shapefile, and it will be used as a clipping boundary.

After you have clicked on the OK button, the layer will be loaded into the map canvas.

The following steps should be done to assess its value range, divide it into categories, and rank them:

1. Double-click on the noise_heatmap_clip layer to open the Layer Properties window. In this window, go to the Metadata section. Explore the Properties window in the lower part of the dialog until you find the Band 1 highlighted line with basic raster statistics, as shown here:

We will deal with the density raster, and its values are interpreted as the number of noise complaints per pixel area. This continuous surface should be divided into categories and they should be ranked according to the number of complaints. For this, a certain critical number of complaints should be set. There are no officially recognized limits and requirements, but we can interpret the mean dataset value as some critical number (it is slightly greater than 56, but we will use 50 for convenience). Divide the values into categories and assign them the following ranks (the less the number of complaints, the better):

Category Value range Suitability rank
1 Less than 50 5
2 50 to 100 4
3 100 to 150 3
4 150 to 200 2
5 Greater than 200 1

For the rankings, go to Raster | Raster Calculator and adjust the following options in its dialog window:

• Enter the Output layer path and name, for example, noise_ranked

• Select the __noise_heatmap_clip (???) raster band and click on the Current layer extent button to make sure that the output layer will have the same resolution and extent

• In the Raster Calculator expression window, enter the following expression:

("noise_heatmap_clip@1" <= 50) *5 + ("noise_heatmap_clip@1" > 50 AND"noise_heatmap_clip@1" <= 100) *4 + ("noise_heatmap_clip@1" > 100 AND "noise_heatmap_clip@1" <= 150) *3 + ("noise_heatmap_clip@1" > 150 AND "noise_heatmap_clip@1" <= 200) *2+ ("noise_heatmap_clip@1" > 200)*1

This expression means that every pixel that falls under a certain range given in brackets first assigned 1 value, and then it is ranked by a certain multiplier value, which is outside the brackets. After you click on the OK button, the reclassified raster will be loaded into the Layers panel.

Pay attention to this: due to ranking, the heat map was inverted; that is, the noisiest hotspots get the lowest ranks and the quietest places get the highest.

In a similar way, we can create and rank heat maps for other point objects that are of interest in our suitability analysis, namely trees and museumart.

#### 4.4.4.3 Generating and ranking proximity rasters

The workflow will be explained in the example of the subway_entrances point vector layer:

1. First, the layer should be rasterized in a common way. The gdal_rasterize line will contain the following parameters:
gdal_rasterize -l subway_entrances -burn 1 -a_nodata 0 -te 982199.3000000000465661 188224.6749999999883585 991709.3000000000465661 196484.6749999999883585 -tr 10 10 -ot Byte fullpath/subway_entrances.shp fullpath/subway_entances.tif

In the output layer, existing point locations will be marked by pixel values of 1, while all other areas will be assigned the nodata values of 0.

1. Go to _Raster | Analysis | Proximity (Raster Distance). This dialog generates a raster proximity map that indicates the distance from the center of each pixel to the center of the nearest pixel identified as a target pixel. Target pixels are those pixels in the source raster for which the raster pixel value is in the set of target pixel values. If not specified, all nonzero pixels will be considered target pixels. In the dialog window, adjust the following parameters:
• From the Input file drop-down list, select the subway_entrances layer.

• In the Output file, click on the Select button and specify a path and a name for the output layer, for example, subway_entances_proximity.tif.

• Make sure that the Dist units parameter is activated and set to GEO. In this case, the distances generated will be in georeferenced coordinates (feet).

The resulting parameters line will look like this:

gdal_proximity.bat fullpath/subway_entances.tif fullpath/subway_entances_proximity.tif -distunits GEO -of GTiff
1. The proximity raster should be divided into discrete categories and they should be ranked. For this operation, we will use Raster Calculator. The major problem here is to decide on the amount and properly select the proximity categories for ranking. The optimal decision depends on the so-called walking radius, that is, the distance that people are comfortable to walk up to. Generally, transport planners have observed that the walking distance that most people seem to cover comfortably— beyond which ridership falls drastically—is about 400 m (approximately 1300 feet). We will apply this 400 m rule to categorize the layer values (min as zero and max as 4988.71) into four categories, with the following ranks:
Category Proximity values range (feet) Suitability rank
1 Less than 1,300 4
2 1,300 to 2,600 3
3 2,600 to 3,900 2
4 Greater than 3,900 1

Go to Raster | Raster Calculator and adjust the main options. Enter the following expression to generate a new subway_entrances_proximity_ranks raster:

("subway_entances_proximity@1" <= 1300) *4 + ("subway_entances_proximity@1" > 1300 AND "subway_entances_proximity@1" <= 2600) *3 + ("subway_entances_proximity@1" > 2600 AND "subway_entances_proximity@1" <= 3900) *2 + ("subway_entances_proximity@1" > 3900)*1

In the following screenshot, you can see what continuous (on the left) and ranked (on the right) rasters may look like:

In a similar way, you can preprocess other vector layers that should be analyzed from the proximity position, namely parks, bike_routes, athletic_facilities, and elementary schools. As a result, you will have a set of raster layers ranked with several categories according to the proximity of selected objects. The higher the rank, the closer the objects. As a general rule of thumb, apply the 400 m (or 1300 feet) value to rank the rasters properly.

### 4.4.5 Step 5: Overlay the data and interpret the results

Now we have everything ready for overlaying the rasters and generating a cumulative suitability assessment. In the following figure, you can see the full list of layers, weighted by their importance for general suitability. Weights are simple coefficients within a range of 0 to 1, and they are used to modify the rank properly.

The expression to be used for our assessment of suitability can be constructed by several steps:

1. All the available factors that are represented by raster layers should be multiplied by their weight coefficients and summarized, like this:

(factor_1*weight + factor_2*weight + factor_3*weight + … factor_n*weight)

This kind of assessment gives gross suitability, which is difficult for interpretation because the obtained values are not calculated relatively to the possible minimum and maximum.

1. For simplicity of interpretation, a gross suitability assessment can be divided into the sum of the maximum possible value. The extended formula will look like this:
(factor_1*weight + factor_2*weight + factor_3*weight + … factor_n*weight) / (factor_1_max*weight + factor_2_max*weight + factor_3_max*weight + … factor_n_max*weight)

As a result, the output value range will be from 0 to 1, where the maximum suitability values are close to 1.

1. Optionally, the result can be multiplied by 100. Then the output values will be a percentage of suitability.

Go to Raster | Raster Calculator to perform an assessment:

• Set the path and name for Output layer

• Select a layer from the Raster bands list to set up Current layer extent, for example, hurricane_inundataion

• In the Expression window, enter the following formula (if you are not sure about the values, check out the preceding table):

( ( "athletic_facilities_proximity_ranks@1" * 0.04 + "bike_routes_proximity_ranks@1" * 0.06+"hurricane_evacuation_zones@1" * 0.16+"hurricane_inundation_zones@1" * 0.16+"museumart_ranked@1" * 0.05+"noise_ranked@1" * 0.13+"parks_proximity@1" * 0.12+"schools_proximity@1" * 0.15+"subway_entrances_proximity_ranks@1" * 0.06+"tree_ranked@1"*0.07) / 4.63 ) *1000

After you have clicked on the OK button, the resulting layer will be added to the map canvas. The suitability raster value’s range varies from 42 to 96 percent. Thus, it can easily be classified and interpreted. Navigate to Layer properties | Style and adjust the rendering properties to those shown in the following screenshot:

After applying these settings, the layer will look as follows:

We can use this raster as a basis for a primary visual suitability assessment of the area of our interest, or go further and combine it with other layers to identify the exact blocks that match best the entire range of suitability criteria.

In that case, we need to perform the inverse sequence of steps: select the most suitable areas, vectorize, and overlay the polygons of maximum suitability with residential living areas to identify the blocks and buildings that would be particularly suitable:

1. The initial suitability raster should be categorized into only two classes by a limit of 90 percent suitability. Go to Raster | Raster Calculator and define the path and name for the output raster (for example max_suitability). In the Expression window, enter “suitability, %(???)” >= 90. On the next page, you can see that the resulting layer contains only two classes: suitable (the value is 1; assigned to areas with suitability greater than or equal to 90 percent), and unsuitable (the value is 0).

2. Now we need to vectorize these areas to be able to query vector layers with them. Open the Polygonize (Raster to vector) dialog window by going to Raster | Conversion and adjust the following parameters:

• Select max_suitability as the raster to be polygonized from Input file (raster).

• Provide the path and name of the output shapefile in Output file for polygons (shapefile), for example, max_suitability_polygons.

• Activate the Field name toggle and accept the default DN value. This option is responsible for creating and populating field with class values from the initial raster.

1. After you click on the OK button, the resulting shapefile will be added to the Layers panel. Originally, the layer contains polygons of all classes, whereas we are interested only in class 1. For the purpose of selecting and deleting unnecessary polygons, we will use the Select features using an expression option:
• Open the max_suitability attribute table by clicking on the button in the Attribute panel, or from the right-click Open Attribute Table layer shortcut.

• In the attribute table toolbar panel, click on the Select features using an expression button, , and enter the following expression: “DN” = 0. After clicking on OK, all the polygons that satisfy the condition will be highlighted in the attribute table and on the map canvas.

• As we don’t need these polygons for further analysis, we should delete them. In the attribute table toolbar panel, click on the button to toggle editing mode, or use the Ctrl + E keyboard shortcut. Now we can execute Delete selected features using the button, or just press Del from the keyboard.

• After deleting the unnecessary data, don’t forget to save your edits (using or Ctrl + S) and deactivate editing mode. In the following screenshot, you can see that the layer contains only those polygons that encompass the maximum suitability values, which are set to 90 percent:

1. Now this layer can be used to overlay with other vector layers and analyze spatial relationships between objects, as was described in the Basics of suitability analysis section. For example, we can identify the zoning areas of primary residential living area that are of potential interest to us according to the suitability criteria:
• Open a dialog window by going to Vector | Research tools | Select by location. In this dialog, you should first select the layer from which objects are to be selected—residential_zoning from the Select features in: drop-down list—like this:
• the that intersect features in: drop-down list, select max_suitability_polygons, which will be used as a selector.

• There are several overlay selection options. Activate Include input features that intersect the selection features. Only those polygons that are within the query mask boundary or intersect it will be added to the selection. Click on the OK button. You will see the following result on the map canvas:

Similarly, you can try another type of query and identify the exact buildings that satisfy a suitability condition. In that case, the dialog window will look as follows:

After clicking on the OK button, you will see the following results on the map canvas:

Notice that this time, we have selected buildings that were completely within the maximum suitability areas. So, we are ready to provide a solid and specific answer to the question, “Which is the best place to live?”