Neural Network from scratch: Part 4; control a drone with hand signals

Python project.

This article shows how to build a dataset for a specific task and train a neural network with the homemade Deep Learning framework implemented in the previous articles. This neural network will then be used to control a drone with hand signals.

We will create a dataset of hand signals, train several neural networks on it and convert their outputs to commands. The full pipeline will be connected to an Anafi drone to control it through a USB camera.

GitHub link: https://github.com/Apiquet/DeepLearningFrameworkFromScratch

First, I will describe the project; then I will share the code used to build the database. The next sections will be dedicated to managing the database and training the neural network with the homemade framework implemented in the previous articles: https://apiquet.com/2020/07/18/deep-learning-framework-from-scratch-part-3/. Finally, we will see how to pilot an Anafi drone with Python and send commands from the sign recognition network.

Table of contents

  1. Project
  2. Database creation
  3. Training
    1. Data pre-processing
    2. Declare and train a FCN
    3. Declare and train a CNN
  4. Control an Anafi drone with Python
    1. Connect and run commands on the drone
    2. Model’s outputs to drone commands
  5. Conclusion

1) Project

As hand sign recognition can be useful in many different projects, I decided to build a neural network for this task to illustrate the use of my homemade Deep Learning framework. The network's output will then be used to control a drone. I use a C270 USB camera pointed at the ceiling to get a background that is easy to remove. I then built a simple dataset with my hands crossing different sides of the image to obtain 6 different commands:

These signs will be used to:

  1. move forward,
  2. go down,
  3. do a 10-degree rotation on the z axis,
  4. go up,
  5. take off,
  6. land.

The network should also be trained on images without hands to learn the Idle (do nothing) command.

We will decide which sign belongs to which command based on the confusion matrix (for instance, if sign 3 is sometimes recognized as sign 4, we won't assign a critical command like move forward to sign 4). We will also need to add negative images without any hand for the “do nothing” command. The next section will show how to build this dataset.

2) Database creation

To build the database we need a script that records a camera stream and saves pictures in an appropriate format. As the neural networks created in the previous article performed well on 28x28x1 digit recognition, we do not need a more complex image type. Our script will binarize the images and resize them to 28×28, which will also speed up training and inference. The script should have the following options:

  • Number of classes: 7 for 6 signs and the negatives (-n)
  • Output path to store the images (-o)
  • Camera ID if several cameras are available (-m)
  • Bunch of options to modify the images before saving:
    • Crop (-c)
    • Resize (-r)
    • Grayscale (-g)
    • Binarize (-b)
    • Erode (-e)
    • Dilate (-d)

The script should do the following actions in sequence:

  • Show the input stream
  • Show the input stream with images options applied (crop, binarize, etc.)
  • Wait for a user command to start saving the images of the first class
  • Once the user sends the command, the images of the first class should be saved every N milliseconds
  • The user can pause at any time
  • The user can start saving the images of the next class with a command
  • All the images should have a filename with the following pattern: img_ImageNumber_ClassNumber.png

This script has no complex parts to explain and is available here: https://github.com/Apiquet/DeepLearningFrameworkFromScratch/blob/master/DB_creation/db_creation.py. The command I used to create the database is:

python db_creation.py -n 7 -o DB_path\ -r 28,28 -b 100,255 -g -e 3,2 -d 3,2 -m 0

The arguments are described above. This script converts the previous images to the following ones:

As we got good results in the previous articles with 10 digit classes and 3,000 images, I created a first database version with 200 images for each class, that is 1,400 images in total.
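
To give a rough idea of what these options do, here is a minimal sketch of the preprocessing and saving steps using OpenCV. This is an illustration only: it assumes -b takes a threshold and a maximum value, and -e/-d a kernel size and an iteration count; the actual script linked above may differ.

import cv2
import numpy as np

def preprocess(frame):
    """Toy version of the image options used in the command above."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)                 # -g
    _, binary = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)   # -b 100,255
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.erode(binary, kernel, iterations=2)               # -e 3,2
    binary = cv2.dilate(binary, kernel, iterations=2)              # -d 3,2
    return cv2.resize(binary, (28, 28))                            # -r 28,28

# Saving loop: one image every ~100 ms, named img_ImageNumber_ClassNumber.png
cap = cv2.VideoCapture(0)                                          # -m 0
class_number = 0                                                   # class currently being recorded
for img_number in range(200):
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imwrite(f"DB_path/img_{img_number}_{class_number}.png", preprocess(frame))
    cv2.waitKey(100)
cap.release()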

3) Training

3-1) Data pre-processing

To train a neural network, we need data in an appropriate format. In the Deep Learning framework implemented in the previous articles, the convolution layers need images with shape (B, C, W, H) and the train function needs labels with shape (B): B is the batch size, C the number of channels, W the width and H the height of the images. Indeed, the train function takes care of the one-hot encoding. Thanks to that, the data pre-processing is simple:

import os
from glob import glob

import numpy as np
from PIL import Image


def load_data(db_path, train_ratio=0.7, img_is_gray=True):
    """
        Method to load the database
        N: number of images
        C: number of image channels
        Args:
            - (str) database path
            - (float) train ratio
            - (bool) specify if data has only 1 channel
        Return:
            - (numpy array) images (N, C, H, W)
            - (numpy array) classes (N)
    """
    imgs_path = glob(db_path + "/*")
    number_of_imgs = len(imgs_path)
    train_number = int(number_of_imgs * train_ratio)
    random_idx = np.arange(number_of_imgs)
    np.random.shuffle(random_idx)

    train_imgs, train_labels = [], []
    test_imgs, test_labels = [], []

    for i, idx in enumerate(random_idx):
        img_path = imgs_path[idx]
        image = np.asarray(Image.open(img_path))
        if img_is_gray:
            # add the channel dimension: (H, W) -> (1, H, W)
            image = np.expand_dims(image, 0)
        # the class number is the last field of img_ImageNumber_ClassNumber.png
        img_basename = os.path.splitext(os.path.basename(img_path))[0]
        if i < train_number:
            train_imgs.append(image)
            train_labels.append(int(img_basename.split('_')[2]))
        else:
            test_imgs.append(image)
            test_labels.append(int(img_basename.split('_')[2]))
    return np.array(train_imgs)/255., np.array(train_labels),\
        np.array(test_imgs)/255., np.array(test_labels)

This code also splits the dataset into train and test sets. To load the data, we now need to run:

train_imgs, train_labels, test_imgs, test_labels = DBManager.load_data("DB_path/")

To verify our loaded data, we can display some images with their labels thanks to matplotlib:
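
For instance, a quick check could look like the following (a minimal matplotlib sketch, not necessarily the code used to produce the article's figure):

import matplotlib.pyplot as plt

# Display the first 8 training images with their labels
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for ax, img, label in zip(axes, train_imgs, train_labels):
    ax.imshow(img[0], cmap="gray")   # img has shape (C, H, W) with C=1
    ax.set_title(f"label: {label}")
    ax.axis("off")
plt.show()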

Everything looks good, so the training can start.

3-2) Declare and train a FCN

To declare a FCN with our Deep Learning framework, we can take as an example the following notebook: https://github.com/Apiquet/DeepLearningFrameworkFromScratch/blob/master/fcn_example.ipynb

There we can see how to flatten our images to get a shape appropriate for a fully-connected layer:

train_imgs_flatten = train_imgs.reshape([train_imgs.shape[0], np.prod(train_imgs.shape[1:])])
test_imgs_flatten = test_imgs.reshape([test_imgs.shape[0], np.prod(test_imgs.shape[1:])])

Then, we can declare a FCN as follows, with 3 linear layers, LeakyReLU activation functions and batch normalization:

# Build the model
fcn_model = NN.Sequential([NN.Linear(input_size, hidden_size),
                           NN.LeakyReLU(), NN.BatchNorm(),
                           NN.Linear(hidden_size, hidden_size),
                           NN.LeakyReLU(), NN.BatchNorm(),
                           NN.Linear(hidden_size, num_class),
                           NN.Softmax()], NN.LossMSE())
# Set the learning rate
fcn_model.set_Lr(learning_rate)
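
The hyperparameters are not shown in this snippet; as an illustration, they could be set as follows (assumed values, see the linked notebook for the ones actually used):

input_size = 28 * 28   # flattened 28x28 binarized image
hidden_size = 128      # width of the hidden layers (illustrative value)
num_class = 7          # 6 hand signs + the "idle" class
learning_rate = 1e-3   # illustrative value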

Please note that a Cross Entropy loss is also available and more appropriate for this kind of task.

We can also print the model description:

Finally, we can start the training with the train function:

We get close train and test errors, which is good, but we still have 4.5% error on the test set. We can plot the confusion matrix to see which kinds of mistakes the FCN makes:

We can see the number 6, which means that the FCN confuses label 4 with label 2.

We can also see that labels 3, 4 and 5 are always well recognized, so we can assign the critical commands to these labels if we use this FCN model.
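
The framework provides its own confusion matrix helper (see the list in the conclusion). For reference, the same matrix can be computed with plain NumPy as follows, assuming predictions is a hypothetical array holding the predicted labels on the test set:

import numpy as np

def confusion_matrix(labels, predictions, num_class=7):
    """matrix[i, j] counts test images of true class i predicted as class j."""
    matrix = np.zeros((num_class, num_class), dtype=int)
    for true_label, predicted_label in zip(labels, predictions):
        matrix[true_label, predicted_label] += 1
    return matrix

print(confusion_matrix(test_labels, predictions))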

3-3) Declare and train a CNN

The main difference from the FCN is the absence of the flattening step at the input: a CNN can directly take the image in 3D (channels, width, height). A CNN declaration example is available here: https://github.com/Apiquet/DeepLearningFrameworkFromScratch/blob/master/cnn_example.ipynb

We can declare a CNN with 2 convolution layers, LeakyReLU activation functions, max pooling, batch normalization and linear layers as follows:

# Build the model
cnn_model = NN.Sequential([NN.Convolution(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size),
                           NN.LeakyReLU(), NN.MaxPooling2D(2),
                           NN.Convolution(in_channels=out_channels, out_channels=out_channels, kernel_size=kernel_size),
                           NN.LeakyReLU(), NN.Flatten(), NN.BatchNorm(),
                           NN.Linear((out_first_conv**2)*out_channels, hidden_size), NN.LeakyReLU(), NN.BatchNorm(),
                           NN.Linear(hidden_size, num_class), NN.Softmax()], NN.LossMSE())
# Set the learning rate
cnn_model.set_Lr(learning_rate)
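
As for the FCN, the hyperparameters are not shown here. In particular, out_first_conv is the spatial size of the feature maps reaching the Flatten layer. With a 28×28 input, a kernel size of 3 and a 2×2 max pooling (assumed, illustrative values), it could be computed as follows:

in_channels = 1        # binarized images have a single channel
out_channels = 8       # illustrative value
kernel_size = 3        # illustrative value
hidden_size = 128      # illustrative value
num_class = 7          # 6 hand signs + the "idle" class
learning_rate = 1e-3   # illustrative value

# 28x28 input -> convolution (no padding): 26x26 -> 2x2 max pooling: 13x13 -> convolution: 11x11
out_first_conv = ((28 - kernel_size + 1) // 2) - kernel_size + 1   # = 11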

Please note that a Cross Entropy loss is also available and more appropriate for this kind of task.

We can then train it and get the confusion matrix:

With the CNN we get only 2.14% error on the test set. The maximum value in the confusion matrix is 3, which means that the network confuses label 0 with label 1 in rare cases. We can use labels 1, 2, 4 and 5 for the critical commands.

4) Control an Anafi drone with Python

A GitHub repository has all the code needed to send commands to the drone: amymcgovern/pyparrot. Many thanks to amymcgovern for this, it saved me a lot of time. The complete documentation is available here: https://pyparrot.readthedocs.io/en/latest/.
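
pyparrot is typically installed with pip (see the documentation above for the drone-specific dependencies):

pip install pyparrot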

Once the pyparrot module is downloaded and installed, we will need to record the live stream from the USB camera, run the neural network on it, and finally send commands to the drone based on the neural network's outputs. As the network does not work perfectly (2.14% test error), we should not send a command to the drone for every processed image. We will create a buffer of images of fixed length, for instance the last 5 images, and run the model on it. Then, if the neural network outputs the same command for the 5 images, we print the command to the console and finally send it to the drone. As a last verification step, we could add a command-line option to make the script wait 2 s so the user can cancel the command if wanted (by pressing the 'q' key).
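
A minimal sketch of this buffering logic could look like the following. This is an illustration only: cap and preprocess are the capture and preprocessing pieces sketched in section 2, predict is a hypothetical helper returning a class id for one image, anafi is the connected drone object from section 4-1, and COMMANDS and send_command are defined in section 4-2.

from collections import deque

BUFFER_SIZE = 5                      # number of consecutive frames that must agree
last_predictions = deque(maxlen=BUFFER_SIZE)

while True:
    ret, frame = cap.read()          # USB camera stream
    if not ret:
        break
    image = preprocess(frame)        # same preprocessing as the database images
    last_predictions.append(predict(image))

    # Only act when the whole buffer agrees on the same command
    if len(last_predictions) == BUFFER_SIZE and len(set(last_predictions)) == 1:
        print("Command detected:", COMMANDS[last_predictions[0]])
        send_command(anafi, last_predictions[0])
        last_predictions.clear()     # avoid re-sending the same command immediately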

4-1) Connect and run commands on the drone

The pyparrot module allows us to connect and send commands to the drone with very few lines. For instance, here is the code to connect, take off, move to coordinates (1, 0), rotate by 45° and land:

from pyparrot.Anafi import Anafi
anafi = Anafi(drone_type="Anafi", ip_address="192.168.42.1")

print("Connecting...")
success = anafi.connect(10)
print(success)
print("Sleeping for 5s...")
anafi.smart_sleep(5)

print("Take off")
anafi.safe_takeoff(5)
anafi.smart_sleep(1)

print("Move to (1, 0)")
anafi.move_relative(dx=1,dy=0,dz=0,dradians=0)

print("Move to (1, 0)")
anafi.move_relative(dx=0,dy=0,dz=0,dradians=0.785)

print("Landing...")
anafi.safe_land(5)
print("DONE - disconnecting")
anafi.disconnect()

Thanks to this code, we can now link each of the neural network's outputs to the desired command.

4-2) Model’s outputs to drone commands

To convert the neural network's outputs to commands, I chose to declare a dictionary with (key, value) = (output, command). Thanks to this variable, we can easily change the correspondence between hand signals and actions. The following code implements the function to call once the neural network has output the same command N times (this check lowers the probability of sending a wrong command):

COMMANDS = {0: "move_forward", 1: "go_down", 2: "rot_10_deg",
            3: "go_up", 4: "take_off", 5: "land", 6: "idle"}


def send_command(anafi, command_id):
    """
    Function to send a command to an Anafi drone based on the command id
    """
    if command_id not in COMMANDS:
        raise ValueError(f"Command id not in COMMANDS choices: {command_id}")
    if COMMANDS[command_id] == "idle":
        return
    print("The following command will be sent: ", COMMANDS[command_id])

    if COMMANDS[command_id] == "move_forward":
        anafi.move_relative(dx=1, dy=0, dz=0, dradians=0)
    elif COMMANDS[command_id] == "go_down":
        anafi.move_relative(dx=0, dy=0, dz=-0.5, dradians=0)
    elif COMMANDS[command_id] == "rot_10_deg":
        anafi.move_relative(dx=0, dy=0, dz=0, dradians=0.785)
    elif COMMANDS[command_id] == "go_up":
        anafi.move_relative(dx=0, dy=0, dz=0.5, dradians=0)
    elif COMMANDS[command_id] == "take_off":
        anafi.safe_takeoff(5)
    elif COMMANDS[command_id] == "land":
        anafi.safe_land(5)
    return

We can then create the script that runs the model on the live stream from the camera and converts the model's outputs to commands thanks to the above function. As this script is very simple, I won't explain it here; it is available on my GitHub repository under control_drone/run_model_on_cam.py. I have also committed the TensorFlow model weights. Using a USB camera pointed at the ceiling, you should be able to run the following demo:

cd DeepLearningFrameworkFromScratch
python control_drone/run_model_on_cam.py -n 7 -r 28,28 -b 100,255 -g -e 3,2 -d 3,2 -m 1 -p control_drone/tf_model/ -t -a path/to/pyparrot/

Running the above command should give the following behavior: we keep a buffer of 3 images, infer the network on them and display the results in the terminal. If the 3 results are equal, we send the command to the drone (this avoids sending a wrong command when the network fails to classify a signal).

We can see that the correct command is sent for each hand sign. In this scenario, the drone takes off, goes up 2 m (4 times 0.5 m), moves forward 3 m (3 times 1 m), rotates 90° (2 times 45°), goes down 1 m (2 times 0.5 m), and lands.

5) Conclusion

This article concludes the series “Neural Network From Scratch”. We learned in the previous articles:

  • gradient descent in 2D and 3D cases,
  • how to build a Fully-Connected neural network using only NumPy:
    • linear layer, activation functions, loss, sequential module and training
  • how to create a non-linear dataset to test the models
  • how to build a full Deep Learning Framework:
    • Convolution, Flatten, Max and Mean Pooling layers,
    • Useful functions: save and load a model to deploy it somewhere, get its number of parameters, draw learning curves, print model’s description and calculate confusion matrix

This final article applied the Deep Learning framework to build a model for a specific task. We learned how to create a dataset of hand signals, how to train a neural network on it and how to convert its outputs to commands. We then connected the full pipeline to an Anafi drone to control it through a USB camera.


Here you can find my project:

https://github.com/Apiquet/DeepLearningFrameworkFromScratch