Writeup: Behavioral Cloning

Posted on Mar 31, 2017

This is a one-to-one copy of the writeup I submitted to Udacity. It follows the format of the provided writeup template, which hopefully makes it more thorough and easier to grade; the format is pretty strict, but the writeup still contains a lot of interesting content.


Behavioral Cloning Project

The goals / steps of this project are the following:

  • Use the simulator to collect data of good driving behavior
  • Build a convolutional neural network in Keras that predicts steering angles from images
  • Train and validate the model with a training and validation set
  • Test that the model successfully drives around track one without leaving the road
  • Summarize the results with a written report

Rubric Points

Here I will consider the rubric points individually and describe how I addressed each point in my implementation.

Files Submitted & Code Quality

1. Submission includes all required files and can be used to run the simulator in autonomous mode

My project includes the following files:

  • model.py containing the script to create and train the model
  • drive.py for driving the car in autonomous mode
  • model.h5 containing a trained convolutional neural network
  • writeup_report.md or writeup_report.pdf summarizing the results
  • video.mp4 showing the car driving

2. Submission includes functional code

Make sure you have similar versions of the following packages:

  • Keras 2.0.1
  • Python 3.5.2
  • TensorFlow 0.12.1
  • OpenCV 3.1.0
  • Scikit Learn 0.18.1
  • Numpy 1.12.0

Using the Udacity provided simulator and my drive.py file, the car can be driven autonomously around the track by executing:

python3 drive.py ./model.h5

3. Submission code is usable and readable

The training script model.py defines the model and runs it against the recorded data. After training, it saves the model into a file named “./model.h5”. It also saves checkpoints after every epoch as well as TensorBoard data.
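The snippet below is a minimal, self-contained sketch of how such per-epoch checkpoints and TensorBoard logging can be wired up in Keras; the file and directory names here are placeholders, not necessarily the ones used in model.py.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, TensorBoard

# Tiny dummy model, just to show the callback wiring.
model = Sequential([Dense(1, input_shape=(10,))])
model.compile(optimizer='adam', loss='mse')

callbacks = [
    ModelCheckpoint('./checkpoint-{epoch:02d}.h5'),  # one checkpoint file per epoch
    TensorBoard(log_dir='./logs'),                   # loss curves viewable in TensorBoard
]

model.fit(np.zeros((32, 10)), np.zeros((32, 1)),
          epochs=2, callbacks=callbacks, verbose=0)
model.save('./model.h5')  # final trained model
```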

Model Architecture and Training Strategy

1. An appropriate model architecture has been employed

The model architecture consists of a modification of LeNet with deeper convolutional layers and a few applications of dropout.

The layer architecture is as follows:

| Layer | Layer Kind     | Size   | Stride | Kernel Count | Padding |
|-------|----------------|--------|--------|--------------|---------|
| 1     | Convolution2D  | (5, 5) | (1, 1) | 20           | Same    |
| 2     | ReLU           |        |        |              |         |
| 3     | MaxPooling2D   | (2, 2) | (2, 2) |              | Valid   |
| 4     | Convolution2D  | (5, 5) | (1, 1) | 50           | Same    |
| 5     | ReLU           |        |        |              |         |
| 6     | MaxPooling2D   | (2, 2) | (2, 2) |              | Valid   |
| 6     | Convolution2D  | (3, 3) | (1, 1) | 70           | Same    |
| 7     | ReLU           |        |        |              |         |
| 8     | MaxPooling2D   | (4, 4) | (1, 1) |              | Valid   |
| 9     | Flatten        |        |        |              |         |
| 10    | Dense          | 120    |        |              |         |
| 11    | ReLU           |        |        |              |         |
| 12    | Dropout        | 0.5    |        |              |         |
| 13    | Dense          | 84     |        |              |         |
| 14    | ReLU           |        |        |              |         |
| 15    | Dropout        | 0.5    |        |              |         |
| 16    | Dense          | 1      |        |              |         |
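As an illustration, a Keras sketch of this architecture might look like the following (assuming the 64x64x3 input described in the preprocessing section below, and folding each ReLU into the preceding layer's activation). This is a sketch, not a verbatim copy of model.py.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(20, (5, 5), padding='same', activation='relu',
                 input_shape=(64, 64, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(50, (5, 5), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(70, (3, 3), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(4, 4), strides=(1, 1)))
model.add(Flatten())
model.add(Dense(120, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(84, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1))  # single regression output: the steering angle

# Mean squared error regression with the Adam optimizer.
model.compile(optimizer='adam', loss='mse')
```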

2. Attempts to reduce overfitting in the model

The model contains dropout layers in order to reduce overfitting (Layers 12 and 15, lines 158 and 161 in the source code). In addition, the data is shuffled after every epoch and augmented in several ways.

3. Model parameter tuning

The only hyperparameter I tuned by hand was the steering-angle offset applied when using the left or right camera images. After experimenting, I settled on 0.25.

The learning rate was adapted automatically by the Adam optimizer.

4. Appropriate training data

I used the Udacity-provided training data for Track 1 and my own recordings for Track 2. I augmented the training data by using the left and right camera images (with a steering offset) to teach recovery behavior, and by flipping images horizontally.
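Here is a minimal sketch of that augmentation, assuming the driving_log.csv column order (center, left, right, steering, …); the helper name load_sample and its exact structure are mine, not necessarily what model.py does.

```python
import random
import cv2

STEERING_OFFSET = 0.25  # offset applied to the steering angle for left/right images

def load_sample(row):
    """Pick the center, left, or right camera image at random and adjust the angle."""
    camera = random.choice(['center', 'left', 'right'])
    angle = float(row[3])
    if camera == 'left':
        path, angle = row[1].strip(), angle + STEERING_OFFSET
    elif camera == 'right':
        path, angle = row[2].strip(), angle - STEERING_OFFSET
    else:
        path = row[0].strip()
    image = cv2.imread(path)  # loaded as BGR
    # On a coin flip, mirror the image and negate the steering angle.
    if random.random() < 0.5:
        image = cv2.flip(image, 1)
        angle = -angle
    return image, angle
```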

Model Architecture and Training Strategy

1. Solution Design Approach

The model architecture was designed iteratively, but progress was held back considerably by issues with the “plumbing” of loading images into the model during training and driving. After a rewrite to a simpler, LeNet-like approach, I achieved smaller losses and better performance on the road.

The network had to be convolutional, given how well convolutional networks detect image features such as the shape of lane markings or the curvature of the road. I used three convolutional layers to let the model build up a higher-level understanding of what it “sees”, followed by a few fully connected layers for the regression output.

The model’s image data was split into a training and validation set using Scikit Learn and run through a batch generator when actually training.
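A rough sketch of that split and generator is shown below, assuming samples is the list of rows read from driving_log.csv and reusing the hypothetical load_sample and preprocess helpers sketched elsewhere in this writeup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

# 80/20 split of the recorded samples into training and validation sets.
train_samples, valid_samples = train_test_split(samples, test_size=0.2)

def batch_generator(samples, batch_size=32):
    """Yield batches of preprocessed images and steering angles indefinitely."""
    while True:
        samples = shuffle(samples)  # reshuffle at the start of every epoch
        for offset in range(0, len(samples), batch_size):
            images, angles = [], []
            for row in samples[offset:offset + batch_size]:
                image, angle = load_sample(row)    # camera selection + flip (see above)
                images.append(preprocess(image))   # crop/resize/normalize (see below)
                angles.append(angle)
            yield np.array(images), np.array(angles)
```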

In my first few approaches, the model was training well (often reaching up to 50% accuracy) but validating badly and driving horribly. This indicated overfitting as well as issues with the pipeline.

After a rewrite, the model was simple enough to run on my laptop and train in a reasonable time. The first iteration after the rewrite got past the curve and fell off the road right after, which pointed to issues with data selection (it appeared to use only the first ~1000 frames for training).

After shuffling the data properly and improving the data pipeline, the training accuracy improved but the model was still doing badly in curves and on the bridge.

I added another convolutional layer to improve higher-level feature detection, and after that the model’s performance improved considerably.

The vehicle’s driving behavior with the new model was far better, making it around the track after training on just 1000 frames. I increased the size of the dataset and trained again, which improved the behavior even more.

As an experiment, I also recorded data on Track 2 and trained a separate model on those frames alone. Its behavior on Track 2 was impressive: it made it around the track on the first try, although it did not always stay in the right lane (often cutting corners in the opposite lane).

At the end of the process, the vehicle is able to drive autonomously around Track 1 without leaving the road. On Track 2, performance is still somewhat spotty, but it drives far better than I expected.

Here are some stats about the trained model.

Model loss over epochs:

Model validation loss over epochs:

Model validation accuracy over epochs:

2. Final Model Architecture

The final model architecture is described in the table above as well as in the code (model.py lines 144-163).

Here’s a TensorBoard visualization of the model. (It didn’t want to let me download a full-size PNG, so it is a screenshot)

3. Creation of the Training Set & Training Process

The training data for Track 1 came from Udacity itself; the Track 2 data I recorded myself.

I recorded two laps of Track 2 data, trying to drive as smoothly as possible but occasionally failing and recovering due to the complexity of the track. That may actually be beneficial, since it gives the network more examples of recovery driving.

During runtime, the dataset was normalized and preprocessed like this:

  1. Load an image by randomly taking either the left, right or center image and offsetting the steering angle as necessary.
  2. Crop the image by removing the top 50 and bottom 25 pixels: image = image[50:image.shape[0] - 25, :]
  3. Transform the image’s color space from BGR to RGB. (For fun, here’s an HSV image where the plotter assumes it’s RGB.)
  4. Resize the image to 64x64x3 using inter-area interpolation.
  5. Normalize the image data using image = image / 255.0 - 0.5
  6. On a coin flip, decide to horizontally flip the image or pass it as-is to the model.

The source code for it can be found in model.py lines 63-104.
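For reference, here is a minimal sketch of those steps (the function name preprocess is mine; the horizontal flip from step 6 is handled in the augmentation sketch earlier).

```python
import cv2

def preprocess(image):
    """Crop, convert to RGB, resize, and normalize a single camera image."""
    image = image[50:image.shape[0] - 25, :]                           # drop top 50 / bottom 25 rows
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)                     # cv2 loads images as BGR
    image = cv2.resize(image, (64, 64), interpolation=cv2.INTER_AREA)  # downscale to 64x64x3
    return image / 255.0 - 0.5                                         # normalize to [-0.5, 0.5]
```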

Short Data Analysis

The model data consists of 8036 entries and a total of 24108 images.

Here are some basic parameters of the data:

The steering angles are distributed as follows. Note how strongly the Udacity data is biased toward zero steering compared to any other angle; it was likely gathered while driving with the keyboard. However, this didn’t noticeably affect my model, so I haven’t taken steps to reduce its prevalence in the training set.

The data consists of 160x320 images that look like this:

After preprocessing, the images look like this:

Steering angles over time look like this: