Keypoints detection and tracking

Python project.

The goal of this project is to implement the FAST (Features from Accelerated Segment Test) keypoint detector and to track keypoints with the Kanade–Lucas–Tomasi feature tracker.

GitHub link: https://github.com/Apiquet/Keypoints_Detection_And_Tracking

The implementation is based on the original paper introducing the FAST algorithm.

Table of Contents

  1. Keypoints detection
    1. Features from Accelerated Segment Test (FAST)
    2. Improvements of the algorithm
      • Adaptive threshold
      • Grid for homogeneous distribution
      • Non-max suppression
  2. Optical flow
    1. Use the reimplemented FAST instead of Good Features to Track
    2. The Lucas-Kanade method
      1. Principle of the Lucas-Kanade method
      2. Dense optical flow
  3. Conclusion

1) Keypoints detection

The purpose of a keypoint detector is to find distinctive features in an image. These features can then be used in many computer vision tasks, such as pose estimation or optical flow. In the second section of this article, the Kanade–Lucas–Tomasi feature tracker will be explained and used to track keypoints.

1-1) Features from Accelerated Segment Test (FAST)

The FAST algorithm is one of the best traditional feature extractors able to run in real time. Its keypoints are pixels whose neighboring pixels differ from them enough that the pixel under consideration is highly distinguishable.

To decide whether a given pixel p is a keypoint, FAST examines the 16 pixels on a circle around it and checks whether at least 12 of them are above or below the value of p by a threshold:

To minimize computation, the algorithm first checks pixels 1, 5, 9 and 13 and requires at least 3 of them to meet the criterion before checking the others.

The steps needed to implement a first version (V0) of this algorithm are:

  1. Loop on each pixel P with intensity Pi
    1. Check whether pixels 1, 5, 9, 13 are above/below Pi +/- threshold
    2. If fewer than 3 of them meet the criterion: move on to the next pixel
    3. Otherwise, check the other pixels
    4. If at least 12 of the 16 pixels meet the criterion, P is a keypoint
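
The implementation below relies on two helpers from the repository: PIXELS_OF_INTEREST, a lookup table of the 16 circle offsets, and get_pixel_value, a small pixel accessor. As a minimal sketch, assuming the standard radius-3 Bresenham circle of the FAST paper and [row, col] positions (the repository may define them differently):

import numpy as np

# Offsets (in [row, col]) of the 16 circle pixels, keyed by their index in
# the FAST paper. Pixels 1, 5, 9 and 13 are listed first so that the
# early-exit test iterates over them before the remaining twelve.
PIXELS_OF_INTEREST = {
    1: np.array([-3, 0]), 5: np.array([0, 3]),
    9: np.array([3, 0]), 13: np.array([0, -3]),
    2: np.array([-3, 1]), 3: np.array([-2, 2]), 4: np.array([-1, 3]),
    6: np.array([1, 3]), 7: np.array([2, 2]), 8: np.array([3, 1]),
    10: np.array([3, -1]), 11: np.array([2, -2]), 12: np.array([1, -3]),
    14: np.array([-1, -3]), 15: np.array([-2, -2]), 16: np.array([-3, -1]),
}


def get_pixel_value(img, position):
    # cast to int so the threshold arithmetic cannot overflow uint8 pixels
    return int(img[position[0], position[1]])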

Implementation:

def detect(img, threshold=15, N=12, step=3):
    """
    Function to detect keypoints on an image

    Args:
        - (np.array) input image
        - (int) threshold to use to validate a pixel
        - (int) min number of neighbors to validate a pixel
        - (int) step between tested pixels
            (if step=1: computation done on every pixel)
    Return:
        - (np.array) vector of detected keypoints,
            shape [number of keypoints, 2] with rows [x, y]
    """
    final_keypoints = []

    # loop on pixels with a step
    for y in range(3, img.shape[1]-3, step):
        for x in range(3, img.shape[0]-3, step):
            neighbors_validated = 0
            pixel_position = np.array([x, y])
            
            # get pixel value
            pixel_value = get_pixel_value(img, pixel_position)
            
            # calculate bounds to validate a neighboring pixel
            lower_bound = pixel_value - threshold
            higher_bound = pixel_value + threshold
            
            for i, offset in enumerate(PIXELS_OF_INTEREST.values()):
                # get a neighboring pixel value
                neighbor_pixel_value = get_pixel_value(
                    img, pixel_position+offset)
                # verify criterion
                if neighbor_pixel_value <= lower_bound or\
                        neighbor_pixel_value >= higher_bound:
                    neighbors_validated += 1
                # the first 4 pixels checked are 1, 5, 9, 13:
                # if fewer than 3 of them are validated,
                # invalidate the current pixel as a keypoint
                if i == 3 and neighbors_validated < 3:
                    break
            
            if neighbors_validated >= N:
                final_keypoints.append(pixel_position)
    return np.asarray(final_keypoints)

With this first simple version, the keypoints are detected and only one threshold has to be defined by the user (this causes problems that will be addressed in the next section).

As FAST returns a point only if its neighboring pixels are different enough, it should not return keypoints on smooth walls (where all pixels are similar). It should mainly return points on edges with strong color changes (clothes against the background, color changes on the dog, leaves, grass, etc.):

As expected, the keypoints are not on smooth walls (examples with red shapes) but mainly in places where color changes are strong (examples with green circles).
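
For illustration, a possible way to call this detector and draw its output (a hedged sketch: the image paths are placeholders, and OpenCV is only used here for loading and drawing):

import cv2

# hypothetical usage: "frame.png" is a placeholder path
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
keypoints = detect(img, threshold=15, N=12, step=3)

vis = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
for kp in keypoints:
    # the detector stores positions as [row, col];
    # cv2.circle expects (col, row)
    cv2.circle(vis, (int(kp[1]), int(kp[0])), 2, (0, 255, 0), -1)
cv2.imwrite("keypoints.png", vis)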

1-2) Improvements of the algorithm

1-2-a) Adaptive threshold

With this first simple version, the keypoints are detected and only one threshold has to be defined by the user. However, this is not good practice because this threshold will not be suitable for all types of images and forces the user to test several thresholds to find a good one for each sequence. In addition, a sequence of images may require different thresholds depending on where the camera moves. A better approach is to ask the user for a number of desired keypoints and then dynamically adapt the threshold throughout the sequence.

Another disadvantage of a static threshold is that the number of keypoints found can vary widely along a sequence. Since many algorithms that consume keypoints are directly affected by the number of input keypoints, this leads to unstable performance (FPS), which is certainly not desired.

  • Example with a static threshold: many more keypoints are found in the grass close to the camera
  • Example with an adaptive threshold: a similar number of keypoints is found no matter where the camera looks

Implementation:

import math


def detect_with_adaptive_threshold(
        img, nb_keypoints, N=12, step=3, epsilon=50,
        percentage=0.1, init_threshold=None):
    """
    Function to detect keypoints with adaptive threshold

    Args:
        - (np.array) input image
        - (int) number of wanted keypoints
        - (int) min number of neighbors to validate a pixel
        - (int) step between tested pixels
            (if step=1: computation done on every pixel)
        - (int) tolerance on the number of keypoints
        - (float) percentage to change the threshold per iteration
        - (int) value to initialize the threshold (default is 15)
    Return:
        - (np.array) vector of detected keypoints,
            shape [number of keypoints, 2] with rows [x, y]
    """
    # create the threshold on first call, or reset it when a value is given
    if not hasattr(detect_with_adaptive_threshold, "threshold") or\
            init_threshold is not None:
        detect_with_adaptive_threshold.threshold = \
            init_threshold if init_threshold is not None else 15

    # use detect function to get the keypoints
    keypoints = detect(img, detect_with_adaptive_threshold.threshold,
                       N=N, step=step)
    
    # adapt the threshold as a function of the number of keypoints
    change = max(1, math.floor(
        detect_with_adaptive_threshold.threshold*percentage))
    if keypoints.shape[0] > nb_keypoints + epsilon:
        detect_with_adaptive_threshold.threshold += change
    elif keypoints.shape[0] < nb_keypoints - epsilon:
        detect_with_adaptive_threshold.threshold -= change

    return keypoints

The threshold adaptation requires a few frames to be optimal. In the following example, the threshold was initialized to 28, resulting in 91 keypoints instead of the desired 500. The threshold was decreased until 500 (+/-50) keypoints were found:
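
For illustration, a possible driver loop over a sequence (a hedged sketch: "sequence.mp4" is a placeholder, and the starting threshold of 28 matches the example above):

import cv2

cap = cv2.VideoCapture("sequence.mp4")
first_frame = True
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # reset the threshold to 28 on the first frame only,
    # then let the adaptation converge towards 500 keypoints
    keypoints = detect_with_adaptive_threshold(
        gray, nb_keypoints=500, epsilon=50,
        init_threshold=28 if first_frame else None)
    first_frame = False
    print(keypoints.shape[0], detect_with_adaptive_threshold.threshold)
cap.release()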

1-2-b) Grid for homogeneous distribution

Another limitation visible in the above illustration is the non-homogeneous distribution of keypoints. This is due to the fact that a single threshold is adapted for the whole image and cannot fit every region. To solve this problem, a grid can be implemented with an adaptive threshold for each cell: after one run, the algorithm increases the threshold for cells with too many keypoints and decreases it for cells with too few.
This produces a homogeneous distribution of keypoints.

  • Example with a grid of 4 columns and 3 rows before the adaptation of the thresholds:

The distribution of the keypoints is not optimal:

  1. patch 4 has 28% of the total keypoints
  2. patches 1, 3, 7, 9 and 12 have 8-10%
  3. patches 5, 6 and 10 have 6%
  4. patches 2 and 11 have 4%
  5. patch 8 has 2%
  • Example of adaptation of the thresholds after a few iterations:

With the adaptation of thresholds per cell, the distribution of keypoints is now homogeneous (8 to 9% of the keypoints per cell).
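
The implementation below also relies on an extract_patches helper from the repository. A minimal sketch consistent with how it is called (cells returned in row-major order; this version assumes the image dimensions are close to multiples of the cell size and drops leftover border pixels):

def extract_patches(img, nb_rows_per_cell, nb_cols_per_cell):
    # split the image into cells of the given size, in row-major order
    patches = []
    for r in range(0, img.shape[0] - nb_rows_per_cell + 1, nb_rows_per_cell):
        for c in range(0, img.shape[1] - nb_cols_per_cell + 1,
                       nb_cols_per_cell):
            patches.append(img[r:r + nb_rows_per_cell,
                               c:c + nb_cols_per_cell])
    return patches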

Implementation:

def detect_with_adaptive_threshold_and_grid(
        img, nb_keypoints, N=12, step=5, epsilon=50,
        percentage=0.1, init_thresholds=None, cols=4, rows=3):
    """
    Function to detect keypoints with adaptive threshold on multiple cells

    Args:
        - (np.array) input image
        - (int) number of wanted keypoints
        - (int) min number of neighbors to validate a pixel
        - (int) step between tested pixels
            (if step=1: computation done on every pixel)
        - (int) tolerance on the number of keypoints
        - (float) percentage to change a threshold per iteration
        - (list) values to initialize the thresholds,
            one per cell (default is 15 for each cell)
        - (int) number of columns
        - (int) number of rows
    Return:
        - (np.array) vector of detected keypoints,
            shape [number of keypoints, 2] with rows [x, y]
    """
    nb_cells = cols*rows
    epsilon = epsilon/nb_cells
    # create the per-cell thresholds on first call
    if not hasattr(detect_with_adaptive_threshold_and_grid, "thresholds"):
        detect_with_adaptive_threshold_and_grid.thresholds = [
            15 for _ in range(nb_cells)]
    if init_thresholds is not None:
        if len(init_thresholds) != cols*rows:
            print("Init thresholds should be a list of size cols*rows")
            return
        detect_with_adaptive_threshold_and_grid.thresholds = init_thresholds

    # calculate number of rows and cols per cell
    nb_cols_per_cell = img.shape[1] // cols
    nb_rows_per_cell = img.shape[0] // rows
    # calculate number of keypoints wanted per cell
    nb_keypoints_per_cell = nb_keypoints // nb_cells
    # divide image by cols*rows cells
    detect_with_adaptive_threshold_and_grid.patches = extract_patches(
        img, nb_rows_per_cell, nb_cols_per_cell)
    
    detect_with_adaptive_threshold_and_grid.nb_keypoints_per_cell = []
    detect_with_adaptive_threshold_and_grid.keypoints_per_cell = []
    # local accumulator for keypoints converted to image coordinates
    all_keypoints = None

    for i, patch in enumerate(detect_with_adaptive_threshold_and_grid.patches):
        # use detect function to get the keypoints
        keypoints = detect(
            patch, detect_with_adaptive_threshold_and_grid.thresholds[i],
            N=N, step=step)

        nb_keypoints_in_cell = keypoints.shape[0]
        detect_with_adaptive_threshold_and_grid.nb_keypoints_per_cell.append(
            nb_keypoints_in_cell)
        detect_with_adaptive_threshold_and_grid.keypoints_per_cell.append(
            keypoints)
        # adapt the threshold as a function of the number of keypoints
        change = max(1, math.floor(
            detect_with_adaptive_threshold_and_grid.thresholds[i]*percentage))
        if nb_keypoints_in_cell > nb_keypoints_per_cell + epsilon:
            detect_with_adaptive_threshold_and_grid.thresholds[i] += change
        elif nb_keypoints_in_cell < nb_keypoints_per_cell - epsilon:
            detect_with_adaptive_threshold_and_grid.thresholds[i] -= change

        # convert the patch index into the patch's position in the grid
        patch_x_pos = i//cols
        patch_y_pos = i - patch_x_pos*cols
        # offset to convert patch coordinates into image coordinates
        offset_pos = np.array(
            [patch_x_pos*nb_rows_per_cell, patch_y_pos*nb_cols_per_cell])

        # skip empty cells so that the concatenation stays valid
        if keypoints.shape[0] == 0:
            continue
        keypoints = keypoints + offset_pos

        if all_keypoints is None:
            all_keypoints = keypoints
        else:
            all_keypoints = np.concatenate([all_keypoints, keypoints])

    return all_keypoints

In the following example, the thresholds were initialized to 15; each threshold then increases/decreases if its dedicated cell has too many/too few keypoints.

Comparison between a fixed global threshold and a grid with adaptive thresholds: with a fixed global threshold, the number of keypoints and their distribution vary greatly, unlike the example on the right with a grid and adaptive thresholds.

1-2-c) Non-max suppression

Instead of applying the algorithm to all pixels (or with a fixed step), a better solution may be to skip pixels that are very close to an already found keypoint. Indeed, for most computer vision applications, having several keypoints very close to each other is not efficient.

In some computer vision tasks, this algorithm is applied to the same image several times at different resolutions. In this case, a non-max suppression (NMS) logic is mandatory to avoid detecting the same keypoint several times.
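
As an illustration of the idea (a minimal sketch, not the repository's code), a greedy NMS can keep only keypoints that lie far enough from the ones already kept; a more complete version would first rank keypoints by a corner score:

import numpy as np


def non_max_suppression(keypoints, min_distance=5):
    # keep a keypoint only if it is at least min_distance pixels away
    # from every keypoint already kept (greedy, order-dependent)
    kept = []
    for kp in keypoints:
        if all(np.linalg.norm(kp - other) >= min_distance for other in kept):
            kept.append(kp)
    return np.asarray(kept)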

2) Optical flow

In the following Wikipedia article, the history and methods of optical flow are described. This section will focus on the Lucas–Kanade method.

An example of optical flow is given by OpenCV on the following page. It uses the Good Features to Track (GFTT) algorithm for keypoint detection, and the Lucas-Kanade method for tracking. In the code below (which can be found on the OpenCV page), GFTT can be replaced by the FAST algorithm implemented previously (the detector call is marked in the code).

import cv2
import numpy as np

cap = cv2.VideoCapture('slow.flv')

# params for ShiTomasi corner detection
feature_params = dict( maxCorners = 100, qualityLevel = 0.3, minDistance = 7, blockSize = 7 )

# Parameters for Lucas-Kanade optical flow
lk_params = dict( winSize  = (15,15), maxLevel = 2, criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Create some random colors to draw the tracks
color = np.random.randint(0, 255, (100, 3))

# Take first frame and find corners in it
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

### Key points detector to replace ###
######################################
p0 = cv2.goodFeaturesToTrack(old_gray, mask = None, **feature_params)

# Create a mask image for drawing purposes
mask = np.zeros_like(old_frame)

while(1):
    ret, frame = cap.read()
    if not ret:
        break
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    ### Optical flow calculation ###
    ###########################################
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

    # Select good points
    good_new = p1[st==1]
    good_old = p0[st==1]

    # draw the tracks
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        # cv2 drawing functions require integer coordinates
        a, b = new.ravel().astype(int)
        c, d = old.ravel().astype(int)
        mask = cv2.line(mask, (a, b), (c, d), color[i].tolist(), 2)
        frame = cv2.circle(frame, (a, b), 5, color[i].tolist(), -1)
    img = cv2.add(frame,mask)

    cv2.imshow('frame',img)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

    # Now update the previous frame and previous points
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

cv2.destroyAllWindows()
cap.release()

2-1) Use the reimplemented FAST instead of Good Features to Track

The first step is to replace the call to Good Features to Track (marked in the OpenCV code above) by the FAST algorithm implemented in the previous section. As the code is almost the same, it will not be displayed here, but it can be found on my GitHub page.
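
For illustration, the substitution boils down to a couple of lines, assuming the detect() function from the first section (it returns [row, col] pairs, while cv2.calcOpticalFlowPyrLK expects float32 (x, y) points of shape (N, 1, 2)):

# replace the cv2.goodFeaturesToTrack call with the reimplemented FAST
keypoints = detect(old_gray, threshold=15, N=12, step=3)
# swap [row, col] to (x, y) and reshape for calcOpticalFlowPyrLK
p0 = keypoints[:, ::-1].astype(np.float32).reshape(-1, 1, 2)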

Once replaced, the same behavior is obtained with both keypoint detectors:

2-2) The Lucas-Kanade method

2-2-a) Principle of the Lucas-Kanade method

Complete descriptions of the method are available online, for instance on Wikipedia. The main points to remember are the following:

  • the displacement between the two frames should be small
  • the displacement should be approximately constant within a neighborhood
  • the objective is to find the velocity vector $v = (V_x, V_y)$ of each pixel. This vector must satisfy:

$$I_x(q_1)V_x + I_y(q_1)V_y = -I_t(q_1)$$
$$\vdots$$
$$I_x(q_n)V_x + I_y(q_n)V_y = -I_t(q_n)$$

where $q_1, \dots, q_n$ are the pixels inside the window, and $I_x(q_i)$, $I_y(q_i)$, $I_t(q_i)$ are the partial derivatives of the image $I$ with respect to position $x$, $y$ and time $t$, evaluated at the point $q_i$ at the current time.

These equations can be written in matrix form $Av = b$, where:

$$A = \begin{bmatrix} I_x(q_1) & I_y(q_1) \\ \vdots & \vdots \\ I_x(q_n) & I_y(q_n) \end{bmatrix}, \quad v = \begin{bmatrix} V_x \\ V_y \end{bmatrix}, \quad b = \begin{bmatrix} -I_t(q_1) \\ \vdots \\ -I_t(q_n) \end{bmatrix}$$

This system has more equations than unknowns and is thus usually over-determined. The Lucas–Kanade method obtains a compromise solution by the least-squares principle, namely by solving the normal equations:

$$A^T A \, v = A^T b \quad\Rightarrow\quad v = (A^T A)^{-1} A^T b$$
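
As a small illustration (a sketch, not part of the OpenCV code above), this least-squares step for one window can be written with NumPy, assuming Ix, Iy and It hold the derivatives at the n window pixels:

import numpy as np


def lk_velocity(Ix, Iy, It):
    A = np.stack([Ix, Iy], axis=1)  # shape (n, 2)
    b = -It                         # shape (n,)
    # solves the normal equations A^T A v = A^T b (via lstsq for stability)
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (Vx, Vy)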

2-2-b) Dense optical flow

The velocity field can be calculated by OpenCV with the following function:

cv2.calcOpticalFlowFarneback(old_gray, frame_gray, *params)

After some post-processing, it can be visualized on the image itself, where red represents high velocity and blue low velocity.
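
One possible post-processing (a hedged sketch with illustrative parameter values, reusing old_gray, frame_gray and frame from the tracking loop above) maps the flow magnitude to a jet colormap, red for high velocity and blue for low, then blends it with the frame:

flow = cv2.calcOpticalFlowFarneback(old_gray, frame_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
mag = cv2.magnitude(flow[..., 0], flow[..., 1])
mag_u8 = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
heatmap = cv2.applyColorMap(mag_u8, cv2.COLORMAP_JET)
vis = cv2.addWeighted(frame, 0.6, heatmap, 0.4, 0)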

As expected, fast motions are detected in red, and the keypoints are also tracked along the sequence.

3) Conclusion

Keypoint detection and optical flow are useful tools in many cases. Two examples: combined with an object detection algorithm, they could help perform multi-object tracking; with a depth map and the camera pose, keypoints could be converted to real-world positions to obtain object velocities.


You can find the project at the following link:

https://github.com/Apiquet/Keypoints_Detection_And_Tracking