Detecting and counting objects with neural networks - Part I

A couple of months ago, I volunteered to help on a project that consists of creating a script that could count how many rootlets a photograph contains and measure their length. What was I thinking?

Object recognition and object counting are definitely tasks for a neural network.

Although I have dug quite a bit into machine learning, I had seldom used neural networks. With Keras having recently been ported to R thanks to the RStudio team, it was time to try neural networks in R and see what we could do. The initial task being too big to start with, I thought I would break it down into manageable learning chunks.

First, I decided to see if I could train a computer to recognize where a rectangle sits on a blank canvas and what its size is. Neural networks are notorious for needing big training data sets. I did not have a big set of pictures of rectangles on canvases on my laptop, so let's first create our training set.

Create a dataset of images

Using the power of functional programming (which R is good at), we can generate thousands of random images on 8 x 8 plots, then save them to a folder. Inspiration for making ggplot2 and purrr work together can be found here

library(tidyverse)

num_plot_train <- 5000

We only evaluate this part once.

# Create a plot which is a black rectangle on a white canvas  
create_basic_rectangle <- function(xmini, xmaxi, ymini, ymaxi){
  ggplot() + 
  geom_rect(mapping = aes(xmin = xmini, xmax = xmaxi, ymin = ymini, ymax = ymaxi)) + 
  coord_cartesian(xlim = c(0, 8), ylim = c(0,8)) + 
  theme(panel.background = element_blank(), 
        axis.text = element_blank(), 
        axis.ticks = element_blank())
}

# Create a df with random x_min and y_min, then derive x_max, y_max, width and height
df_rectangle <- tibble(x_min = sample(0:7, num_plot_train, replace=TRUE), 
             y_min = sample(0:7, num_plot_train, replace=TRUE), 
             id = 1:num_plot_train) %>% 
  mutate(x_max = map_int(x_min, function(.x) if_else(.x == 7, as.integer(8), sample((.x + 1):8, 1))), 
         y_max = map_int(y_min, function(.x) if_else(.x == 7, as.integer(8), sample((.x + 1):8, 1))), 
         width = x_max - x_min, 
         height = y_max - y_min)

all_plots <- pmap(list(df_rectangle$x_min, df_rectangle$x_max, df_rectangle$y_min, df_rectangle$y_max), create_basic_rectangle)

yo <- map2(paste0("~/Documents/Keras/basic_rectangles/basic_plot", 1001:(1000+num_plot_train), ".png"), 
           all_plots, ~ ggsave(.x, .y, width = 4, height = 4, dpi = 4, device='png'))


rm(yo)
rm(all_plots)

Et voila … We now have 5000 small plots ready in our folder for our neural network to train on. There are several considerations here, though. The bigger your pictures, the more training pictures you'll need. If you save a 4 x 4 inch picture at 8 dpi, that makes a 32 by 32 pixel matrix, which translates into a 1024-long vector for each picture. So you would probably need a lot more training pictures. In that case, I suggest no longer restricting the coordinates to integer values, as this creates duplicates.
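To see how quickly the input size grows, the arithmetic above can be sketched as a small helper (the function name is mine, just for illustration):

```r
# Each picture is saved as a square of `inches` x `inches` at `dpi` dots
# per inch, so the flattened input vector has (inches * dpi)^2 entries.
input_vector_length <- function(inches, dpi) (inches * dpi)^2

input_vector_length(4, 4)  # 256 values, the setup used in this post
input_vector_length(4, 8)  # 1024 values, four times more inputs to learn
```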

Because this previous step is rather time-consuming (depending on the specs of your machine), let's save our df_rectangle data frame to a .csv so we can re-use it later without having to re-create all our rectangle plots.

write_csv(df_rectangle, "df_rectangle.csv")

Transform our images into matrix

Now that we have all our images neatly in our folder, let's write a function to transform them into a format our NN can read: a normalized vector of values for each picture. We then create training and testing sets.

library(keras)

df_rectangle <- read_csv("df_rectangle.csv")

# This function loads an image and divides each pixel value by 255,
# the maximum on the gray scale (values range from 0 to 255).  
load_basic_plot <- function(file_path){
  yo  <- image_load(file_path, grayscale = TRUE) %>% 
    image_to_array(data_format = "channels_last")
  yo <- yo / 255
  yo
}

# Loading all the plot files
df_img <- tibble(id = 1:num_plot_train, 
                   file_path = list.files("~/Documents/Keras/basic_rectangles", pattern = "\\.png", 
                                          full.names = TRUE), 
                   im_data = map(file_path, function(.x) load_basic_plot(.x)))

# Transforming each picture into a single vector.  
# Each canvas was saved as 4 x 4 inches at 4 dpi, giving a 16 x 16 pixel image, i.e. 256 values.  
x <- df_img$im_data %>% array_reshape(., dim = c(num_plot_train, 256))

# We normalise the data
x_norm <- (x - mean(x)) / sd(x)

# Using caret to create our training / testing partition
param_split <- caret::createDataPartition(df_img$id, p = 0.95, list = FALSE)

# We create the training and testing sets.  
x_train <- x_norm[param_split, ]
y_train <- df_rectangle[param_split, ] %>% select(x_min, y_min, width, height) %>% as.matrix()

x_test <- x_norm[-param_split,  ]
y_test <- df_rectangle[-param_split, ] %>% select(x_min, y_min, width, height) %>% as.matrix()

# This is to keep the rectangle position and size values between 0 and 1. 
y_train <- y_train / 8
y_test <- y_test / 8

Create a basic neural network

We will first use a very simple feed-forward neural network, as described in this blog

# Starting with a blank model
single_rectangle_model <- keras_model_sequential()

# Build the model
single_rectangle_model %>% 
  layer_dense(units = 256, activation = "relu", input_shape = c(256)) %>% 
  layer_dropout(rate = 0.2) %>% 
  # The output is a vector of 4 units
  layer_dense(units = 4)

# Adding the optimizer and loss function
single_rectangle_model %>% 
  compile(
    loss = "mean_squared_error", 
    optimizer = optimizer_adadelta()
  )



summary(single_rectangle_model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_1 (Dense)                  (None, 256)                   65792       
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 256)                   0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 4)                     1028        
## ===========================================================================
## Total params: 66,820
## Trainable params: 66,820
## Non-trainable params: 0
## ___________________________________________________________________________
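As a sanity check on the summary above, the parameter counts can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit (dense_params is a hypothetical helper of mine, not a Keras function):

```r
# Parameters of a dense layer: weights (inputs * units) plus biases (units)
dense_params <- function(inputs, units) inputs * units + units

dense_params(256, 256)  # 65792, the first hidden layer
dense_params(256, 4)    # 1028, the output layer (dropout adds no parameters)
dense_params(256, 256) + dense_params(256, 4)  # 66820 total
```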

Now we can fit the model.

# Fitting the model
history <- single_rectangle_model %>% 
  fit(
    x_train, y_train,  
    epochs = 50, 
    validation_split = 0.2)
    

plot(history)

Let's use the model to make predictions. These are out-of-training-set predictions.

y_actual <- df_rectangle[-param_split, ]

y_pred <- single_rectangle_model %>% predict(x_test)

# We need to put the output values back between 0 and 8
y_pred <- (y_pred * 8) %>% as_tibble()
colnames(y_pred) <- c("pred_x", "pred_y", "pred_width", "pred_height")

Visualizing results and IOU

A common way to measure the accuracy of such a model is to calculate the IOU (a.k.a. Intersection over Union). We take the area of the intersection of the two rectangles and divide it by the area of the union of the two rectangles. The IOU is therefore always between 0 (no overlap) and 1 (perfect overlap). The higher the IOU, the better.
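Before coding it up, here is a worked example with two made-up rectangles: one at (1, 1) with width and height 3, the other at (2, 2) with width and height 3.

```r
# Intersection: the overlap of [1,4] x [1,4] and [2,5] x [2,5]
inter_width  <- min(1 + 3, 2 + 3) - max(1, 2)  # 2
inter_height <- min(1 + 3, 2 + 3) - max(1, 2)  # 2
intersection <- inter_width * inter_height     # 4

# Union: the sum of both areas minus the intersection (counted twice otherwise)
union_area <- 3 * 3 + 3 * 3 - intersection     # 14

intersection / union_area                      # IOU of about 0.29
```

If either intersection dimension comes out negative or zero, the rectangles do not overlap and the IOU is 0, which is what the function below checks for.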

# Function to calculate the IOU
iou <- function(i){
  intersection_width = min((y_pred$pred_x[i] + y_pred$pred_width[i]), (y_actual$x_min[i] + y_actual$width[i])) - 
    max(y_pred$pred_x[i], y_actual$x_min[i])
  intersection_height = min((y_pred$pred_y[i] + y_pred$pred_height[i]), (y_actual$y_min[i] + y_actual$height[i])) - 
    max(y_pred$pred_y[i], y_actual$y_min[i])
  
  intersection <- intersection_width * intersection_height
  # union is already taken by dplyr, so we'll go with onion instead ;-)
  onion <- (y_pred$pred_width[i] * y_pred$pred_height[i]) + (y_actual$width[i] * y_actual$height[i]) - intersection
  
  IOU <- if_else((intersection_width <= 0 | intersection_height <= 0), 0, intersection / onion)
  
  return(IOU)
}
# Pick a random test picture to visualize (note: nrow, not length, since y_actual is a data frame)
i <- sample(1:nrow(y_actual), 1)

ggplot() + 
  geom_rect(mapping = aes(xmin = y_pred$pred_x[i], ymin = y_pred$pred_y[i], 
                          xmax = y_pred$pred_x[i] + y_pred$pred_width[i], ymax = y_pred$pred_y[i] + y_pred$pred_height[i]), 
            fill = alpha("red", 0.35)) + 
  geom_rect(mapping = aes(xmin = y_actual$x_min[i], ymin = y_actual$y_min[i], 
                          xmax = y_actual$x_max[i], ymax = y_actual$y_max[i]), fill = alpha("grey", 0.8)) + 
  annotate("text", x = y_actual$x_min[i] + 0.3, y = y_actual$y_max[i] + 0.2, label = round(iou(i), 2)) + 
  annotate("text", x = 0.8, y = c(0, 0.5), label = c("red rect = prediction", "grey rect = actual    "), color = c("red", "grey30")) + 
  coord_cartesian(xlim = c(0, 8), ylim = c(0,8))

To check the accuracy of our model, we can calculate the mean IOU of our predictions over the whole testing set.

yo <- tibble(id = 1:nrow(y_actual)) %>% 
  mutate(iou_value = map_dbl(id, function(.x) iou(.x)))
mean(yo$iou_value)
## [1] 0.7814121

Improvements

How could we improve these results? What results did you get? I would love it if you could share them with me.

By making our hidden layer a bit wider and training for more epochs, we already get a slight improvement in our mean IOU.

single_rectangle_model <- keras_model_sequential()

single_rectangle_model %>% 
  layer_dense(units = 500, activation = "relu", input_shape = c(256)) %>% 
  layer_dropout(rate = 0.2) %>% 
  layer_dense(units = 4)

single_rectangle_model %>% 
  compile(
    loss = "mean_squared_error", 
    optimizer = optimizer_adadelta()
  )

history <- single_rectangle_model %>% 
  fit(
    x_train, y_train,  
    epochs = 80, 
    validation_split = 0.2)

y_pred <- single_rectangle_model %>% predict(x_test)
y_pred <- (y_pred * 8) %>% as_tibble()
colnames(y_pred) <- c("pred_x", "pred_y", "pred_width", "pred_height")

yo <- tibble(id = 1:nrow(y_actual)) %>% 
  mutate(iou_value = map_dbl(id, function(.x) iou(.x)))

paste0("The mean IOU for this model is ", mean(yo$iou_value))
## [1] "The mean IOU for this model is 0.82630163070312"

My next post will be about different shapes and different colours, and about counting them. I am also especially interested in using the polygon geom in ggplot2 to start counting and measuring them. Any advice there would also be appreciated.