0.1 Imports

setwd('C:/Users/jourd/OneDrive/Documents/UCL/STDM_2/')

# Spatial data handling
library(sp)
library(sf)
library(rgdal)
library(maptools)
library(spdep)
library(gstat)
library(tmap)

# Data manipulation and plotting
library(foreign)
library(reshape)
library(dplyr)
library(tidyverse)
library(glue)
library(forcats)
library(ggplot2)
library(gridExtra)
library(cowplot)
library(lattice)
library(scatterplot3d)
library(plot3D)
library(rgl)
library(knitr)
library(car)
library(robustbase)

# Time series helpers
library(timetk)
library(tidyquant)
library(tibbletime)

# Machine learning
library(devtools)
library(keras)
library(tfruns)
library(nnet)
library(mlbench)
library(caret)
library(neuralnet)
library(recipes)
library(rsample)
library(yardstick)

1 Introduction

Wildfires have been a persistent and devastating problem in California, causing extensive damage to public health and infrastructure. Due to its hot and dry climate, California is prone to wildfires; however, the frequency and size of wildfires have increased considerably in recent years (Westerling and Bryant, 2008). The cause of wildfires is attributed to a combination of climate, landscape, and anthropogenic factors, including high temperatures, low precipitation rates and vegetation density, alongside increased human activity and a lack of forest management. These factors have been exacerbated by climate change, as abnormal weather patterns are more frequent and harder to predict (Westerling and Bryant, 2008). Machine learning can be utilised to predict the location and spread of potential wildfires, and this has been done in several areas around the world using both supervised and unsupervised techniques, for example Banerjee (2021), who studied the spatial and temporal patterns of wildfires in California using unsupervised k-means clustering.

1.1 Experiment Outline

The aim of this experiment is to use supervised machine learning methods to predict the size of a wildfire based upon climate, anthropogenic and landscape data. Spatio-temporal variables were combined into a single dataset to construct the predictor variables. Two supervised machine learning methods were chosen: an Artificial Neural Network (ANN) and an LSTM. ANNs were inspired by the structure and function of the human brain; they consist of nodes (neurons) interconnected in a multilayer system (Vatsavai, 2008). The simplest ANNs have three layers (input, hidden and output), with consecutive layers connected by weights. Weights are numerical values that are assigned and then optimised during training of the network; in basic terms, they determine the strength of the connection between layers. ANNs were chosen for this experiment because of their ability to model complex behaviour in large datasets. Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) whose architecture allows it to selectively forget or remember information when making predictions (Vatsavai, 2008), making it particularly useful for time-series forecasting in deep learning. LSTMs are constructed with four layers: the input layer, LSTM layer, hidden layer and output layer (Liang et al., 2019). The LSTM layer is where the architecture differs from a more traditional feed-forward network such as the ANN: it contains memory cells, which give the LSTM the ability to store and retrieve information, and several 'gates' that allow the model to choose what information is stored or forgotten (Vatsavai, 2008). This mechanism allows the LSTM to learn patterns in complex relationships. An LSTM model was selected for this experiment because we expect the relationship between the predictor variables and the response to be non-linear, and we expect the input data to contain a degree of noise; the ability of the LSTM to selectively remember information can help mitigate the effect of this noise.

Although both ANNs and LSTMs are capable of regression and classification analysis, LSTMs have the benefit of being able to handle long-term dependencies. As our dataset spans many years of wildfire data, a model that can 'remember' previous information may produce more accurate results. It may seem counterintuitive to use two methods that ultimately make the same prediction; however, comparing the results allows each method to be benchmarked against the other.

1.2 Data acquisition and preprocessing

Dataset: Elevation
Source: Extracted from California DEM (https://tinyurl.com/tvxb97y3)
Methods Applied: Converted to polygons and spatially joined to the California grid.

Dataset: Population
Source: California 2010 Census data (https://tinyurl.com/3ex9ddsk)
Methods Applied: Joined to county shapefile of California, then spatially joined to the California grid.

Dataset: Distance to Road
Source: OpenStreetMap (distance to the nearest road)
Methods Applied: Converted to polygons and spatially joined to the California grid.

Dataset: Climate
Source: Mean summer climate data extracted via the CDS API
Methods Applied: Spatially joined to the California grid.

Dataset: Historic Wildfire Polygon
Source: Year of wildfire (https://scholar.colorado.edu/concern/datasets/8336h304x)
Methods Applied: Spatially joined to the California grid.

1.3 Data Description

The data used in this report was gathered from several sources (as seen in the table above). The final form of the data is a collection of point shapefiles with longitude and latitude values and predictor variables. Each shapefile contains 8801 points and represents one year of data (2001 to 2019, with no shapefile available for 2012), and thus one year of wildfire polygons, as seen in Figure 1. The raw data contains 75 attribute columns, which were reduced to the final 13 predictor variables. The Albers projection for California was used, as this allows distances to be expressed in metres. Each attribute is numerical, with land cover type being the only categorical attribute; land cover codes range from 1 to 12, each categorising the Earth's surface into a land cover class.
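The projection step itself was performed upstream in ArcGIS Pro; as a minimal sketch, the equivalent operation in R would look as follows (EPSG:3310, NAD83 / California Albers, is an assumption about the exact CRS used).

# Hedged sketch: reproject one year's layer to California Albers (EPSG:3310)
# so coordinates are expressed in metres. The CRS code is an assumption;
# the actual reprojection was carried out in ArcGIS Pro.
library(rgdal)
shp_2001 <- readOGR("Shapefiles_small/2001.shp")
shp_2001_albers <- spTransform(shp_2001, CRS("+init=epsg:3310"))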

2 Importing the data

# set the directory where the shapefiles are located
dir_path <- "Shapefiles_small/"

# list all the shapefiles in the directory
shp_files <- list.files(dir_path, pattern = "\\.shp$")

# create an empty list to store the shapefiles
shp_list <- list()

# loop through the shapefiles and read them into the list
for (i in seq_along(shp_files)) {
  shp_list[[i]] <- readOGR(dsn = paste0(dir_path, shp_files[i]))
}
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\jourd\OneDrive\Documents\UCL\STDM_2\Shapefiles_small\2001.shp", layer: "2001"
## with 8801 features
## It has 75 fields
## Integer64 fields read as strings:  OBJECTID dist_roads Elevation OBJECTID_1 Join_Count TARGET_FID test OBJECTID_2 Join_Cou_1 TARGET_F_1 OBJECTID_3 Join_Cou_2 TARGET_F_2 ORIG_FID 
## (the same output is printed for each remaining shapefile, 2002-2011 and 2013-2019; there is no 2012 shapefile)


3 Exploratory spatio-temporal data analysis

3.1 Variable description

Below is a table outlining the attributes of each point in the dataframe.

Variable Description
pop Population density at the point
Elevation Elevation at the point
dist_roads Distance to the nearest road
ig_year Year of the wildfire
ig_month Month of the wildfire
tot_pix Total area of the wildfire
lc_code Land cover type at the point
coords.x1 Longitude of the point
coords.x2 Latitude of the point
tp Total precipitation
d2m Average dewpoint temperature
t2m Average surface temperature
si10 10 m wind speed
swvl1 Volumetric soil water layer
pev Potential evaporation
tot_ar_km2 Total burned area (km2)

3.2 Data pre-processing

Most of the data was pre-processed in ArcGIS Pro due to its ability to handle large amounts of netCDF data. A 20 km x 20 km grid was generated covering the state. The grid was then used to store relevant data such as population density, distance to roads, elevation, and mean summer climate data. Climate data was extracted using the CDS API in Python. Subsequently, shapefiles were created for each year from 2001 to 2019, representing point clouds where each point corresponds to the centroid of a grid cell. These shapefiles contain the attributes of the wildfires that occurred during each year, enabling analysis of their patterns and trends. Finally, rows with missing or null data were removed, as sketched below.
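A minimal sketch of that final cleaning step for one year's layer (shp is a placeholder name for one year's SpatialPointsDataFrame):

# Drop any rows whose attribute table contains missing/null values
shp <- shp[complete.cases(shp@data), ]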

# Selecting one year to plot (shp_list[[4]] corresponds to 2004)
test_plot <- shp_list[[4]]
# Set the year as a factor
test_plot$ig_year <- as.factor(test_plot$ig_year)
# plot the shapefile
tm_shape(test_plot) + tm_dots("ig_year", palette = c("lightgrey", "orange"), size = 0.06, legend.show = FALSE) +
tm_scale_bar(position = c("LEFT")) +
tm_compass() +
tm_layout(main.title = "Location of Wildfires in 2004", title.position = c("center", "top"))

3.3 Visualizing climatic variables

Climate Data Analysis: To analyse the temporal patterns in the climate data we plot a histogram and a heatmap showing the frequency of wildfires each month over the years, from which the months with the largest number of wildfires can be identified. On average, June to October had the most wildfires, so only the averages of these months were used for the climate data. We can then plot the climate variables on a shapefile of California to understand their spatial distribution. The main observation from the climate data is the divide between the north and south of California: in general, precipitation and volumetric soil water values are higher in the north, while surface temperature, wind speed and evaporation values are lower there. This indicates that the north of California experiences higher precipitation rates but overall cooler temperatures, which may influence the occurrence of wildfires.

# Select 1 year shapefile
climate_shp <- shp_list[[3]]

# Create a list of the climate variables
atr_list = c("tp", "d2m", "si10",
            "t2m", "swvl1", "pev")
# Descriptions for each climate variable, in the same order as atr_list
desc_list = c("Total\nPrecipitation", "Dewpoint\nTemperature",
              "10m Wind Speed", "Surface\nTemperature",
              "Volumetric\nSoil Layer", "Evaporation")

# Empty list to store the plots
plots_list <- list()

# Loop through the list of climate variables and construct a plot for each
for (i in 1:length(atr_list)) {
  plot <- tm_shape(climate_shp) + tm_dots(atr_list[i], title = desc_list[i]) + tm_layout(legend.width = 0.4,
                             legend.frame = TRUE,legend.text.size = 0.8,
                             legend.title.size = 0.9,
                             legend.title.fontface = "bold")
  plots_list[[i]] <- (plot) # Add the plot to the plots_list
}

# Arrange plots in a grid
grid_arrange <- tmap_arrange(plots_list, ncol = 2, nrow = 3)

# Display final grid with title
grid_arrange

3.4 Plotting frequency of fires

In order to examine the temporal frequency of fires, we can plot the number of fires in each month for one selected year.

# List to store extracted attributes
attribute_list <- list()

# Loop through shapefiles and extract attribute
for (i in 1:length(shp_list)) {
  # Extract the month values from the shapefile
  attribute_value <- shp_list[[i]]$ig_month
  # Remove all 0 values, ie no fire data
  attribute_value <- Filter(function(x) x != 0, attribute_value)
  # Add the month values to the list
  attribute_list[[i]] <- attribute_value
}




# Create a histogram of the fire frequency per month for one year 
ggplot()+aes(attribute_list[[1]]) +
  geom_histogram(binwidth = 1, fill = 'orange', alpha = 0.7) + 
    scale_x_continuous(breaks = 1:12, labels = month.name) +
    ggtitle("Histogram of Fire Frequency over Months")+
      labs(y = 'Frequency', x = 'Months') +
  theme(axis.text.x = element_text(angle = 90))

3.5 Heatmap

# Create a list to store the dataframes to
attribute_list = list()

# Loop through shapefiles and extract attributes
for (i in 1:length(shp_list)) {
  # Extract the month values
  attribute_value <- shp_list[[i]]$ig_month
  # Filter out all of the fire data, to only include that in the list
  attribute_value <- Filter(function(x) x != 0, attribute_value)
  # add the month numbers to a list
  attribute_list[[i]] <- attribute_value
}

# Years with a shapefile (note: 2012 is missing)
years <- c(2001:2011, 2013:2019)

# Create a dataframe with months 1 to 12 in the first column
month_df <- data.frame(month = 1:12)

# Loop through each year's month list and tabulate fire frequency
for (i in 1:length(attribute_list)) {
  months <- attribute_list[[i]]
  f_table <- aggregate(list(values_list = months), by = list(month = months), FUN = length)
  f_table <- setNames(f_table, c("month", years[i]))
  month_df <- merge(month_df, f_table, by = "month", all.x = TRUE)
}
month_df[is.na(month_df)] <- 0

month_melt <- melt(month_df, id.vars = "month")

ggplot(month_melt, aes(x = variable, y = month, fill = value)) +
  geom_tile() +
  scale_y_continuous(breaks = 1:12, labels = month.name) +
  scale_fill_gradient(low = "white", high = "orange") +
  labs(title = "Fire Frequency Heatmap",
       x = "Year",
       y = "Month",
       fill = "Frequency")

3.6 Multicollinearity

Testing for collinearity is important before data is used to build an ANN. If there is a high degree of correlation between the input variables, they may provide redundant information, leading to an overfitted model that performs well on the training data but poorly on test data. Collinearity was tested on the climate variables, as these are the most likely to be highly correlated. This was done using the car package and extracting the VIF (variance inflation factor) values.

# Select one year to test 
mc_shp <- shp_list[[2]]
# Select the variables to be tested
mc_shp <- subset(mc_shp, select= c("pop_est", "dist_roads", "Elevation", "tp", "t2m",
    "d2m", "si10", "swvl1", "tot_ar_km2"))
# Run the lm function from the car library
model <- lm(tot_ar_km2 ~ tp+t2m+d2m+si10+swvl1, data=mc_shp)

# Create a table to display the VIF results
r <- as.data.frame(vif(model))
colnames(r)[1] <- "VIF"
knitr::kable(r)
      VIF
tp    2.928386
t2m   6.996844
d2m   6.767041
si10  1.929448
swvl1 6.256030

All VIF values fall below the commonly used threshold of 10, although t2m, d2m and swvl1 show moderate collinearity (VIF > 5).

3.7 Spatial Autocorrelation

3.7.1 Global Moran's I

Global Moran's I: Moran's I is a measure of spatial autocorrelation; it describes how clustered or dispersed a variable is across an area. It is important to test for autocorrelation as it provides insight into the distribution of the data. The Moran's I coefficient ranges from -1 to 1; values close to 1 indicate positive spatial autocorrelation, while values close to -1 indicate negative spatial autocorrelation.
Using the moran.mc() function, the global Moran's I was calculated. This uses Monte Carlo simulation: the observed statistic is compared against the distribution of statistics obtained by randomly permuting the values across the study area. The Moran's I statistic and p-value are displayed in figure 4. The coefficients produced were positive and relatively close to 1, meaning land cover, burned area and wildfire year are positively spatially autocorrelated. The p-value for each test also lies below the 0.05 threshold, meaning the observed clustering is statistically significant at the 5% level.

# Reading in shapefile of combined data, to speed up compilation time
combined_shp <- readOGR(dsn = "Combined/master2.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\jourd\OneDrive\Documents\UCL\STDM_2\Combined\master2.shp", layer: "master2"
## with 15013 features
## It has 81 fields
## Integer64 fields read as strings:  Join_Count TARGET_FID JOIN_FID OBJECTID Join_Cou_1 TARGET_F_1 OBJECTID_1 Join_Cou_2 TARGET_F_2 dist_roads OBJECTID_2 Join_Cou_3 TARGET_F_3 Elevation OBJECTID_3 Join_Cou_4 TARGET_F_4 X Y
#Create a spatial weights matrix:
set.ZeroPolicyOption(TRUE)
## [1] FALSE
fire.W <- nb2listw(poly2nb(combined_shp),zero.policy = TRUE)


# Moran I test land cover
#mt_lc <- moran.test(combined_shp$lc_code, fire.W)
mc_lc <- moran.mc(combined_shp$lc_code, listw=fire.W, nsim=9999)

# Moran I test burned area
#mt_area <- moran.test(combined_shp$tot_pix, fire.W)
mc_area <- moran.mc(combined_shp$tot_ar_km2, listw=fire.W, nsim=9999)

# Moran I test year
#mt_year <- moran.test(combined_shp$ig_year, fire.W)
mc_year <- moran.mc(combined_shp$ig_year, listw=fire.W, nsim=9999)

# Extracting the moran statistic and P value
statistic <- as.double(c(mc_lc$statistic, mc_area$statistic, mc_year$statistic))
p_value <- as.double(c(mc_lc$p.value, mc_area$p.value, mc_year$p.value))

# Creating a dataframe of the results to display in a graph
df <- data.frame(Statistic = c(statistic), P_Value = c(p_value))
rownames(df) <- c("Land Cover", "Burned Area", "Wildfire Year")

knitr::kable(df, format = 'pipe')
|              | Statistic| P_Value|
|:-------------|---------:|-------:|
|Land Cover    | 0.6523640|   1e-04|
|Burned Area   | 0.4492389|   1e-04|
|Wildfire Year | 0.6436802|   1e-04|

3.7.2 Local Moran's I

Local Moran's I decomposes the global statistic into a value for each location, allowing significant high-high (HH) and low-low (LL) clusters, and high-low/low-high (HL/LH) outliers, to be mapped.

# Below code has been adapted from the geocomputation tutorial notebook
moranCluster <- function(shape, W, var, alpha=0.05, p.adjust.method="bonferroni")
{
  # Code adapted from https://rpubs.com/Hailstone/346625
  Ii <- localmoran(shape[[var]], W)
  shape$Ii <- Ii[,"Ii"]
  shape$Iip <- Ii[,"Pr(z != E(Ii))"]
  shape$sig <- shape$Iip<alpha
  # Scale the data to obtain low and high values
  shape$scaled <- scale(shape[[var]]) # high low values at location i
  shape$lag_scaled <- lag.listw(W, shape$scaled) # high low values at neighbours j
  shape$lag_cat <- factor(ifelse(shape$scaled>0 & shape$lag_scaled>0, "HH",
                                 ifelse(shape$scaled>0 & shape$lag_scaled<0, "HL",
                                        ifelse(shape$scaled<0 & shape$lag_scaled<0, "LL",
                                               ifelse(shape$scaled<0 & shape$lag_scaled>0, "LH", "Equivalent")))))
  shape$sig_cluster <- as.character(shape$lag_cat)
  shape$sig_cluster[!shape$sig] <- "Non-sig"
  shape$sig_cluster <- as.factor(shape$sig_cluster)
  results <- data.frame(Ii=shape$Ii, pvalue=shape$Iip, type=shape$lag_cat, sig=shape$sig_cluster)

  return(list(results=results))
}

clusters <- moranCluster(combined_shp, W = fire.W, var="ig_year")$results
combined_shp$Ii_cluster <- clusters$sig

tm_shape(combined_shp) + tm_polygons(col="Ii_cluster")

4 Methodology

5 ANN

5.0.1 Predictor variables

Variable Description
pop Population density at the point
Elevation Elevation at the point
dist_roads Distance to the nearest road
ig_month Month of the wildfire
lc_code Land cover type at the point
tp Total precipitation
d2m Average dewpoint temperature
t2m Average surface temperature
si10 10 m wind speed
swvl1 Volumetric soil water layer
pev Potential evaporation

5.0.2 Method Description

The first supervised machine learning technique utilised was an Artificial Neural Network (ANN). We used a simple model with one hidden layer, a sigmoid activation function and a linear output. From this model we hope to be able to predict the size of a wildfire based upon the predictor values seen in figure 3. The structure of the network can be seen in Figure X, where I1:13 represent the predictor variables, H1:9 are the 9 neurons in the hidden layer and O1 is the output layer. B1 and B2 represent the biases introduced into the network: a bias is a number added to the weighted sum of a neuron's inputs before the activation function is applied, so each neuron computes h = f(sum(w_i * x_i) + b). It is effectively like the intercept in a linear regression model.

5.0.3 Way data was divided

To undertake regression analysis on the size of wildfires, only the grid cells that contained a historic wildfire were extracted, along with the climate, anthropogenic and landscape data. These records were then combined into one database (containing the attributes in figure 3). From this database the training and validation subsets were taken; we chose to train on 60% of the data. The data then had to be normalised: for this experiment each variable was standardised to a mean of 0 and variance of 1 using the built-in R function scale(). Normalisation helps improve the convergence rate of the model, because predictor variables measured on very different scales can lead to suboptimal convergence.
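A minimal sketch of this split-and-scale step, assuming the combined database is a data frame fire_df (an assumed name); training_data and testing_data are the objects used by the model code below:

# Hedged sketch: standardise each column to mean 0 / variance 1, then take
# a 60% training split. fire_df is an assumed name for the combined database.
set.seed(123)
scaled_df <- as.data.frame(scale(fire_df))
train_idx <- sample(nrow(scaled_df), size = round(0.6 * nrow(scaled_df)))
training_data <- scaled_df[train_idx, ]
testing_data  <- scaled_df[-train_idx, ]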

5.0.4 Parameter selection

To select the optimal parameters for the model we used a hyperparameter grid, specifying a range of values for the number of neurons in the hidden layer (size) and the decay rate, which controls the amount of regularisation applied to the weights. The model is run on each combination of parameters in the grid and the RMSE of each output is calculated. We used k-fold cross-validation with 10 folds and 3 repeats; this is important because it estimates how well the model will perform on unseen data. A sigmoid activation function was applied in the hidden layer; this maps input values to an output between 0 and 1 and adds non-linearity, allowing the model to learn more complex patterns. linout = TRUE was set to change the output from logistic to linear, as required for regression. Finally, each combination of parameters was tested and a mean RMSE score returned; the optimal parameters were those with the lowest RMSE. We also tested a network with 2 hidden layers, but this was more prone to overfitting. A sketch of this tuning setup is given below.
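Since the tuning code itself is not shown, the following is a minimal sketch of the setup described above using caret, assuming the standardised frame training_data from section 5.0.3; the candidate size/decay values are illustrative assumptions:

# Hedged sketch of the hyperparameter grid search described above
nnet_grid <- expand.grid(size = c(3, 5, 7, 9), decay = c(0.001, 0.01, 0.1))

# 10-fold cross-validation, repeated 3 times
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

set.seed(42)
fire.nnet.cv <- train(tot_ar_km2 ~ ., data = training_data,
                      method = "nnet",   # single hidden layer, sigmoid units
                      linout = TRUE,     # linear output for regression
                      trace = FALSE,
                      tuneGrid = nnet_grid,
                      trControl = ctrl,
                      metric = "RMSE")

fire.nnet.cv$bestTune  # combination with the lowest cross-validated RMSE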

5.0.5 Visualising the network

source_url('https://gist.githubusercontent.com/fawda123/7471137/raw/466c1474d0a505ff044412703516c34f1a4684a5/nnet_plot_update.r')
plot.nnet(fire.nnet.cv$finalModel)

5.0.6 Performance of the model

The implementation of the ANN was relatively straightforward, because this experiment used a simple ANN with just one hidden layer. The predicted values were plotted against the observed values for the test subset. This plot shows an approximate y = x relationship, particularly around the origin where the majority of the points are located. This is potentially due to the varying wildfire sizes: our dataset contained many more small-scale fires than large ones, so a concentration of points near the origin is expected. Furthermore, horizontal artifacts can be seen within the distribution of points, particularly for the larger fire sizes. The accuracy of the model was tested using RMSE and R2 values, which can be seen in figure 15. An RMSE close to 0 would indicate that the model is making good predictions, meaning we can predict wildfire size from the predictor variables. The optimal parameters gave an RMSE of 0.6, meaning on average the predictions differed from the observed values by 0.6 (standardised) units. The range of our dependent variable was -0.4 to 4.7, so this RMSE is reasonably good. The R2 of the model was 0.6, indicating that 60% of the variance in the size of a wildfire can be explained by the climate, landscape and anthropogenic predictor variables. After denormalising the RMSE value we can contextualise the error in the model; this was on average: X km2. The overall performance of the model was reasonable, but it would not be reliable enough for government work.

# Modified from the Spatial Temporal Notebook
fire.test <- testing_data[, -15]  # drop the response column before predicting

pred<-predict(fire.nnet.cv$finalModel, fire.test)

y <- testing_data$tot_ar_km2
max(testing_data$tot_ar_km2)
## [1] 5.220514
ggplot() + aes(x = pred, y = y) +
  geom_point(size = 1) +
  xlab("Predicted") +
  ylab("Observed") +
  theme(aspect.ratio = 1) +
  geom_abline(slope = 1, intercept = 0)

testing_data$difference <- abs(testing_data$tot_ar_km2-pred)

ggplot(testing_data, aes(x = coords.x1, y = coords.x2, color = difference)) + geom_point() + scale_color_gradient(trans="log10")
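The error metrics quoted above can be produced with caret's postResample; a hedged sketch (the de-normalisation uses the standard deviation of the unscaled tot_ar_km2 column, with fire_df the assumed unscaled database from section 5.0.3):

# RMSE, R-squared and MAE on the held-out set
postResample(pred = pred, obs = testing_data$tot_ar_km2)

# De-normalising: the scaled RMSE times the original standard deviation of
# tot_ar_km2 expresses the average error in km2 (fire_df is an assumed name)
0.6 * sd(fire_df$tot_ar_km2)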

6 LSTM

6.0.1 Method description

The second supervised machine learning technique utilised was an LSTM. This is a more advanced model used in deep learning. We started by building a simple multivariate LSTM from the predictor variables and built upon this to try to constrain the best parameters for the model. This architecture was chosen because LSTMs can avoid the vanishing gradient problem (Nunes et al., 2019).

6.0.2 Way the data was divided

For this experiment we used the LSTM for time-series forecasting, so the first step of the analysis was to plot wildfire size against date. This yielded a timeseries showing the variation in wildfire size throughout the years, from which we can formulate the experimental problem: predicting the wildfire size of the following month. As the data does not have a regular frequency of collection, i.e. there is not a defined amount of time between observations, we must define the timestep manually. Our data has a low temporal resolution, so using a timestep of single days may yield poor results; a timestep of 10 should be suitable, giving us 3 timesteps per month of data. We must first pre-process the data by creating a matrix from the dataframe and normalising it with the R scale() function before feeding it into the model, as sketched below.
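A minimal sketch of that pre-processing step, assuming fire_ts is the time-ordered dataframe of records (an assumed name); the resulting matrix data is the object consumed by the generators below:

# Convert the time-ordered dataframe to a numeric matrix and standardise
# each column; `data` is used by the generator functions below.
data <- scale(data.matrix(fire_ts))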

6.0.3 Parameter selection

We must determine a few model parameters: lookback, the number of past timesteps the input data covers, and delay, the number of timesteps into the future the target lies. These vary depending on the experimental problem; for our model we selected a lookback of 20 and a delay of 10 (the values set in the generator code below). The model implementation was a recurrent neural network whose first layer is an LSTM layer with 32 nodes, fed into a dropout layer with 30% dropout. Next, the output is passed into a dense layer with 32 nodes and a 'relu' activation function. Finally, the last layer is a dense layer with one node and a 'sigmoid' activation function, representing the output layer. These layers were selected by experimenting with different combinations to find the optimal architecture; a sketch of the architecture as described is given below, while the bidirectional variant ultimately fitted appears in section 6.2.


# The code for creating and running the LSTM was modified from (François Chollet, 2017)
# https://blogs.rstudio.com/ai/posts/2017-12-20-time-series-forecasting-with-recurrent-neural-networks/

generator <- function(data, lookback, delay, min_index, max_index,
                      shuffle = FALSE, batch_size, step) {
  if (is.null(max_index))
    max_index <- nrow(data) - delay - 1
  i <- min_index + lookback
  function() {
    if (shuffle) {
      rows <- sample(c((min_index+lookback):max_index), size = batch_size)
    } else {
      if (i + batch_size >= max_index)
        i <<- min_index + lookback
      rows <- c(i:min(i+batch_size-1, max_index))
      i <<- i + length(rows)
    }

    samples <- array(0, dim = c(length(rows),
                                lookback / step,
                                dim(data)[[-1]]))
    targets <- array(0, dim = c(length(rows)))
                      
    for (j in 1:length(rows)) {
      indices <- seq(rows[[j]] - lookback, rows[[j]]-1,
                     length.out = dim(samples)[[2]])
      samples[j,,] <- data[indices,]
      targets[[j]] <- data[rows[[j]] + delay,2]
    }           
    list(samples, targets)
  }
}


lookback <- 20
step <- 2
delay <- 10
batch_size <- 128

train_max_ind = 4000
val_max_ind = 5500

train_gen <- generator(
  data,
  lookback = lookback,
  delay = delay,
  min_index = 1,
  max_index = train_max_ind,
  shuffle = TRUE,
  step = step, 
  batch_size = batch_size
)

val_gen = generator(
  data,
  lookback = lookback,
  delay = delay,
  min_index = train_max_ind+1,
  max_index = val_max_ind,
  step = step,
  batch_size = batch_size
)

test_gen <- generator(
  data,
  lookback = lookback,
  delay = delay,
  min_index = val_max_ind+1,
  max_index = NULL,
  step = step,
  batch_size = batch_size
)
# How many steps to show the whole validation dataset
val_steps <- (val_max_ind+1 - train_max_ind - lookback) / batch_size

# How many steps to show whole data of test dataset
test_steps <- (nrow(data) - val_max_ind - lookback) / batch_size

6.1 MLP

As a baseline before the LSTM, a simple fully connected (MLP) model was fitted on the same generators, following the approach of the Chollet (2017) tutorial cited above.

model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(lookback / step, dim(data)[-1])) %>%
  layer_dense(units = 32, activation = "relu",  kernel_regularizer = regularizer_l2(0.01)) %>%
  layer_dense(units = 1)

model %>% compile(
  optimizer = optimizer_rmsprop(lr = 0.07),
  loss = "mae"
)



history <- model %>% fit_generator(
  train_gen,
  steps_per_epoch = 200,
  epochs = 10,
  validation_data = val_gen,
  validation_steps = val_steps
)
plot(history)

6.2 LSTM model

model <- keras_model_sequential() %>%
  bidirectional(
    layer_lstm(units = 6), input_shape = list(NULL, dim(data)[[-1]])
  ) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu",  kernel_regularizer = regularizer_l2(0.01)) %>%
  layer_dense(units = 1)

model %>% compile(
  optimizer = optimizer_rmsprop(lr = 0.1),
  loss = "mae"
)

history <- model %>% fit_generator(
  train_gen,
  steps_per_epoch = 100,
  epochs = 10,
  validation_data = val_gen,
  validation_steps = val_steps
)

plot(history)

6.2.1 Performance of the model

The model performance was analysed by plotting the validation and training loss throughout training, so we can visualise how the loss improves or degrades during the training process. A good model will show both the validation and training loss decreasing along similar trajectories. Our model showed the validation and training loss lying within similar ranges, indicating the model is not overfitting and makes a reasonable attempt at prediction. However, the MAE values are quite large, 0.9 on average, meaning the model does not fully predict the output from the predictor variables. The decrease in both training and validation loss during training suggests the model is learning to predict the size of wildfires. The scale of the burned area ranged from -0.3 to 7, skewed towards the higher values; so the model predicts wildfire size with an error of 0.9 units on average, which is overall not an accurate model. Note that RMSE cannot be computed from MAE alone: the two are related only by the bounds MAE <= RMSE <= sqrt(n) * MAE, so with a mean MAE of 0.9 over the roughly 1500 validation observations the RMSE is constrained only to lie between 0.9 and about 35 standardised units (see the check below); the exact RMSE would have to be computed from the squared errors directly.
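As a quick check of that bound:

# MAE only bounds the RMSE: MAE <= RMSE <= sqrt(n) * MAE
mae <- 0.9
n <- 1500
c(lower = mae, upper = sqrt(n) * mae)  # roughly 0.9 to 34.9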

7 Discussion

From our analysis, the LSTM model provided a better prediction of wildfire size, based on the RMSE value produced from the optimal parameters. The structure of the LSTM makes it a suitable architecture for analysing multivariate timeseries, especially because the wildfire data has a long history of observations. However, neither model produced an accurate enough prediction for use in government work. This is likely because the climate data represented only a mean summer value rather than daily climate values; if each wildfire had climate data for the day it occurred, the models might be more accurate. Furthermore, both models were run on the entire state of California, and we showed in figure 8 that there is a clear divide in climate between the north and south of the state. Splitting the data into north and south and running two independent models might yield better results. Temporal analysis was also limited, as the tests for temporal autocorrelation did not show any meaningful results.

The implementation of the ANN made it easier to interpret the results: in just a couple of lines of code, the observed vs predicted graph can be plotted and the RMSE, R2 and MAE values shown. It is a little harder to produce the same error checking with the LSTM, although it also yields an RMSE and a loss graph. The ANN was very easy to implement; little pre-processing of the data was required to produce a satisfactory result, hyperparameters could be adjusted with simple modifications to the code, and accuracy checking was straightforward. The LSTM was harder to implement, as it requires more pre-processing of the data and requires the user to manually select important values such as the timestep and lookback, which could introduce experimental errors. The running time of the LSTM was slightly shorter than the ANN, but this could be due to the differing number of epochs and steps per epoch between the two models. We also had to rerun the ANN on the hyperparameter grid each time the data pre-processing was adjusted, which increased computation time. Another benefit of the ANN model is that we can analyse how it performed across the study area: by taking the difference between the observed and predicted values, the accuracy can be plotted spatially. Figure 12 shows that it performed reasonably well across the study area, though some areas, especially along the north coast, performed worse than inland California. This may be due to the record of climate data.

How could the models be improved? As the size of a wildfire varies over a finite range, we could improve the predictive power of the models by classifying fires into several groups, for example 5 classes ranging from small to very large fires, and asking the models to predict the class instead. Furthermore, we could split the wildfires into small and large fires and run the model on both sets of data separately, as the mechanisms that control the spread of a large fire may differ from those that control a smaller one.