MIT Master's Thesis

Autonomous Navigation and Tracking of Dynamic Surface Targets On-board a Computationally Impoverished Aerial Vehicle

 

Daniela Rus – Thesis Supervisor
John Leonard – Mechanical Engineering Faculty Reader
Peter Corke – Computer Vision/Control Collaborator

In this thesis, we describe the development of an independent, on-board visual servoing system that allows a computationally impoverished aerial vehicle to autonomously identify and track a moving surface target. Our image segmentation and target identification algorithms were developed for the specific task of monitoring whales at sea but can easily be adapted for other targets. Observing whales is important for many marine biology tasks and is currently done manually from the shore or from boats. We also present extensive hardware experiments demonstrating that our algorithms enable a flying vehicle to track a moving target. We first demonstrate tracking capabilities using a motion capture system for state feedback. We then extend this work to autonomous tracking using an on-board camera and on-board processing. The system relies on an Extended Kalman Filter for estimated state feedback.


Observing Southern Right Whales

Península Valdés, Argentina
August 20-25, 2009

Observing whales is important for many marine biology tasks, including taking census, determining family lineage, and general behavior observation. Currently, whales are observed manually using binoculars and cameras from the shore or from boats, and notes are made using pencil and paper. The process is error prone, non-quantitative, and very labor intensive. Human-operated planes and helicopters are also used, but the data gathered this way is limited. Planes fly at high altitude, cannot hover, and the data is limited in duration and precision. Helicopters can hover and fly closer to the sea surface, but they are noisy and affect the behavior of the whales.

We used a small hovering unmanned aerial vehicle, the Ascending Technologies Falcon 8 robot, to assist in the data collection of whales. The robot is quiet enough to fly close above the water’s surface without disturbing the whales. We had several successful missions of approximately fifteen minutes each, during which the robot was piloted over groups of whales and video was recorded.

http://www.youtube.com/watch?v=C6MGDQuZWx0

Motivated by this data, our goal is to create an autonomous flying robot that relies on vision-based position estimates and only on-board processing to navigate. The robot is a lightweight, computationally impoverished platform such as an Ascending Technologies Hummingbird or Pelican. Our goal system differs from existing systems that use additional sensors such as GPS or laser scanners or that perform off-board processing of the video stream. We describe an autonomous aerial robot system that uses a suite of fast computer vision algorithms to identify and track targets in natural environments. The algorithms are designed for targets that move against a fairly uniform background, such as a whale moving on the surface of the sea or an animal moving across a meadow.


Target Identification and Tracking Using Hue-Saturation Histograms
Wil Selby
Peter Corke
Daniela Rus

MIT Distributed Robotics Lab
Fall 2009

Our object recognition algorithm characterizes the target in the Hue (H) and Saturation (S) plane and then identifies the target in subsequent frames based on this characterization. We require a solution that is robust but also computationally efficient enough to run on our small on-board computer at a high frequency. Techniques such as graph cuts and MSER are too computationally expensive for this purpose.

Object Representation
The model used to represent the target is a two-dimensional histogram. First, an initial frame containing the target is captured and converted from RGB color space to HSV color space. A two-dimensional histogram M is computed which describes the probability distribution of the H-S pair values within the image. Using the threshold values selected by the user, the histogram M is modified so that the values of bins outside the threshold values are set to zero. The resulting histogram, Mt, represents the target that the user has identified as a discrete probability distribution.
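
The following is a minimal sketch of this representation step using OpenCV's Python bindings. The bin counts, threshold handling, and function name are illustrative assumptions, not the exact thesis implementation.

```python
import cv2
import numpy as np

H_BINS, S_BINS = 30, 32  # assumed bin counts for the Hue-Saturation histogram

def build_target_histogram(frame_bgr, h_range, s_range):
    """Compute the thresholded 2D Hue-Saturation histogram Mt from an initial frame."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # 2D histogram over the H (0-179) and S (0-255) channels.
    hist = cv2.calcHist([hsv], [0, 1], None, [H_BINS, S_BINS], [0, 180, 0, 256])
    # Zero out bins outside the user-selected H/S threshold values.
    h_lo = int(h_range[0] * H_BINS / 180); h_hi = int(h_range[1] * H_BINS / 180)
    s_lo = int(s_range[0] * S_BINS / 256); s_hi = int(s_range[1] * S_BINS / 256)
    mask = np.zeros_like(hist)
    mask[h_lo:h_hi + 1, s_lo:s_hi + 1] = 1.0
    hist *= mask
    # Scale to [0, 255] so back-projection later yields an 8-bit probability image.
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist
```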

Object Identification
After the user defines the threshold values for target identification, the model histogram Mt is used to find all areas in future frames which have a high probability of being the target. For each consecutive frame, the image is converted to HSV color space. The H and S values of each pixel are back-projected through the histogram to form a whale probability image. A greyscale morphological opening operation is used to remove small false positive regions in the back-projection image. Next, a greyscale morphological closing operation is used to join together positive regions which are close in proximity and to fill in small gaps.
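
A short sketch of this identification step is shown below, again assuming OpenCV in Python and building on the hypothetical build_target_histogram() above; the structuring-element size is an assumed parameter.

```python
import cv2

def backproject_and_filter(frame_bgr, target_hist, kernel_size=5):
    """Back-project a frame through the target histogram and clean the result."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Probability image: each pixel's (H, S) pair is looked up in the histogram.
    backproj = cv2.calcBackProject([hsv], [0, 1], target_hist, [0, 180, 0, 256], 1)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    # Opening removes small false-positive regions.
    opened = cv2.morphologyEx(backproj, cv2.MORPH_OPEN, kernel)
    # Closing joins nearby positive regions and fills small gaps.
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
    return closed
```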

After the back-projection image has been filtered, contiguous groups of non-zero pixels are identified as contours. We assume the intended target is large in the image, so we eliminate contours with small dimensions based on a user-defined minimum perimeter value. Processing continues until there are no contours remaining or until a user-defined maximum number of contours has been reached. The center of mass and an axis-aligned bounding box around each contour are plotted to identify the target to the user.
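
A sketch of this contour-filtering step is below; the perimeter threshold and contour cap are placeholder values, not the thesis settings.

```python
import cv2

def find_target_contours(filtered, min_perimeter=80.0, max_contours=5):
    """Return (centroid, bounding box) pairs for large contours in the filtered image."""
    result = cv2.findContours(filtered, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = result[-2]  # works across OpenCV 3.x/4.x return conventions
    detections = []
    for c in contours:
        if cv2.arcLength(c, True) < min_perimeter:
            continue  # discard small contours unlikely to be the target
        m = cv2.moments(c)
        if m["m00"] == 0:
            continue
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # center of mass
        detections.append(((cx, cy), cv2.boundingRect(c)))
        if len(detections) >= max_contours:
            break
    return detections
```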

Results
The resulting whale tracking algorithm performs object recognition using a pixel-level classifier and domain knowledge. We find that image hue and saturation values are suitable invariants to segment whales from other elements in the scene and can be used to estimate the centroid of the moving target. We describe the algorithm and present extensive results over 7,300 frames representing Southern Right whales and Blue whales. The data was collected from a variety of angles under varying lighting conditions. Our results show 98.99% recall for the Southern Right whale footage we collected in Argentina.


Autonomous Dynamic Object Tracking With Motion Capture State Feedback
MIT Distributed Robotics Lab
Spring 2011

In this video we present an autonomous on-board visual navigation and tracking system for an Ascending Technologies Hummingbird quadrotor vehicle to support the whale tracking application. Due to the limited payload of the robot, we are restricted to a computationally impoverished single-board computer (SBC) such as a Fit-PC2. The vision system was run on the vehicle using a 2.0 GHz Intel Atom processor (Fit-PC2) with a Point Grey Firefly MV USB camera. The camera had a resolution of 640×480 pixels, which was downsampled to 320×240 pixels to reduce computational cost. The full system combined for a total payload of 535 g, well above the recommended maximum payload of 200 g for this platform, but our experiments show that the system remains maneuverable.

The target for the robot tracking experiments was a 0.21×0.28 m blue clipboard mounted onto an iRobot iCreate. The iCreate was programmed to follow a specific trajectory at a constant 0.025 m/s and was also tracked by the motion capture system. The quadrotor flew at a desired altitude of 1.35 m for each trial.

This experiment utilized the motion capture system to sense the quadrotor’s pose and the vision algorithm output to determine the translational error to the target.

The controller module utilized four independent PID controllers to compute the roll, pitch, yaw and thrust commands. The on-board Autopilot software computed the individual motor commands. The quadrotor control module received a global pose estimate from the motion capture system as well as the attitude compensated estimate of the target’s position from the vision system. The vision system output was used to create a desired trajectory from the latest quadrotor position to the estimated target location. Once a new vision estimate was received, the process was repeated and the desired trajectory extended.
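
The sketch below illustrates this control structure: four independent PID loops mapping tracking errors to roll, pitch, yaw, and thrust commands. The gains, limits, and axis conventions are illustrative assumptions, not the thesis values.

```python
class PID:
    def __init__(self, kp, ki, kd, limit):
        self.kp, self.ki, self.kd, self.limit = kp, ki, kd, limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.limit, min(self.limit, out))  # saturate the command

# One controller per commanded axis (gains are placeholders).
roll_pid = PID(0.4, 0.02, 0.1, limit=1.0)    # lateral error -> roll
pitch_pid = PID(0.4, 0.02, 0.1, limit=1.0)   # forward/backward error -> pitch
yaw_pid = PID(0.8, 0.0, 0.05, limit=1.0)     # heading error -> yaw rate
thrust_pid = PID(0.6, 0.05, 0.2, limit=1.0)  # altitude error -> thrust offset

def compute_commands(err_x, err_y, err_yaw, err_z, dt):
    """Return (roll, pitch, yaw, thrust) commands from the current tracking errors."""
    return (roll_pid.update(err_y, dt), pitch_pid.update(err_x, dt),
            yaw_pid.update(err_yaw, dt), thrust_pid.update(err_z, dt))
```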

The computer vision system output error estimates at 10 Hz, the motion capture system provided pose estimates at 110 Hz, and the controller computed commands at 40 Hz.

The data was computed over ten consecutive successful trials. The average RMSE was approximately 0.067 m in the x axis and 0.042 m in the y axis.


Autonomous Dynamic Object Tracking Without External Localization
MIT Distributed Robotics Lab
Spring 2011

In this video we present an autonomous on-board visual navigation and tracking system for an Ascending Technologies Hummingbird quadrotor vehicle to support the whale tracking application independent of external localization. Due to the limited payload of the robot, we are restricted to a computationally impoverished SBC such as a Fit-PC2. The vision system was run on the vehicle using a 2.0 GHz Intel Atom processor (Fit-PC2) with a Point Grey Firefly MV USB camera. The camera had a resolution of 640×480 pixels, which was downsampled to 320×240 pixels to reduce computational cost. The full system combined for a total payload of 535 g, well above the recommended maximum payload of 200 g for this platform, but our experiments show that the system remains maneuverable.

The target for the robot tracking experiments was a 0.21×0.28 m blue clipboard mounted onto an iRobot iCreate. The iCreate was programmed to follow a specific trajectory at a constant 0.025 m/s and was also tracked by the motion capture system. The quadrotor flew at a desired altitude of 1.35 m for each trial.

This second experiment removed external localization and relied entirely on visual feedback. It utilized an Extended Kalman Filter (EKF) to estimate the pose of the quadrotor. This estimated pose was sent to the control module, which computed commands to maneuver the quadrotor to the center of the target. The EKF was adapted extensively from work done by Abraham Bachrach and implemented using the KFilter library. This filter combined position estimates from the vision system algorithms as well as attitude and acceleration information from the IMU. While the IMU readings were calculated at a frequency of 30 Hz, the vision system module operated at 10 Hz. The filter had to handle these asynchronous measurements and their inherent latencies. The filter output position and attitude estimates at 110 Hz.
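
To illustrate the asynchronous fusion idea (not the thesis/KFilter implementation), the sketch below shows a per-axis filter that predicts with IMU acceleration at the IMU rate and corrects with lower-rate vision position fixes whenever they arrive. The noise values and class name are assumptions.

```python
import numpy as np

class AxisEKF:
    def __init__(self, accel_noise=0.5, vision_noise=0.05):
        self.x = np.zeros(2)       # state: [position, velocity]
        self.P = np.eye(2)         # state covariance
        self.q = accel_noise ** 2  # process noise driven by IMU acceleration
        self.r = vision_noise ** 2 # vision position measurement noise

    def predict(self, accel, dt):
        """Propagate the state with an IMU acceleration sample (runs at ~30 Hz)."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        B = np.array([0.5 * dt ** 2, dt])
        self.x = F @ self.x + B * accel
        self.P = F @ self.P @ F.T + np.outer(B, B) * self.q

    def update_position(self, z):
        """Correct with a lower-rate (~10 Hz) vision position fix when available."""
        H = np.array([[1.0, 0.0]])
        S = H @ self.P @ H.T + self.r
        K = (self.P @ H.T) / S
        self.x = self.x + (K * (z - H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```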


For a sample trial, the EKF estimates had a position RMSE of 0.107 m, a velocity RMSE of 0.037 m/s, and an acceleration RMSE of 0.121 m/s² compared to the ground truth captured by the motion capture system.

The data was computed over ten consecutive successful trials. The average RMSE was approximately 0.068 m in the x axis and 0.095 m in the y axis.

While the performance is slightly more varied and less accurate than tracking with motion capture state feedback, it is still acceptable. The filter also introduces an inherent delay, which was around 0.06 seconds for our system. Additionally, the Pelican platform was used to track targets moving at speeds of up to 0.25 m/s. At this speed, the experiments resulted in an RMSE of 0.11 m in the x axis and 0.09 m in the y axis. This error was slightly larger than in the Hummingbird experiments, but the increased speed demonstrated the stability of the control system.


Autonomous Dynamic Object Tracking Using GPS Feedback
MIT Distributed Robotics Lab
Summer 2011

In this video we present an autonomous on-board visual navigation and tracking system for an Ascending Technologies Pelican quadrotor vehicle to support the whale tracking application utilizing the on-board GPS receiver. Due to the limited payload of the robot, we are restricted to a computationally impoverished SBC such as a Fit-PC2. The vision system was run on the vehicle using a 2.0 GHz Intel Atom processor (Fit-PC2) with a Point Grey Firefly MV USB camera. The camera had a resolution of 640×480 pixels, which was downsampled to 320×240 pixels to reduce computational cost. The full system combined for a total payload of 535 g, near the recommended maximum payload of 500 g for this platform, but our experiments show that the system remains maneuverable.

These experiments took place on the MIT campus in a grassy area. The target for these experiments was a red wagon that was initially static, but was then manually maneuvered by a human. The desired height for these experiments was around 7.5 m.

These experiments integrated our on-board object identification and tracking algorithms with the GPS navigation software provided on the Ascending Technologies Pelican platform. The on-board Autopilot software received GPS position estimates and fused them with on-board IMU measurements and height estimates from a pressure sensor. While the Autopilot software allows direct GPS coordinates as inputs, this technique was not used. Instead, the error vector output from the vision system was used to change the desired relative GPS set point for the on-board GPS navigation system.
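
The sketch below illustrates one way this set-point adjustment could work: rotating the body-frame error vector from the vision system into local north/east components and converting it to a small latitude/longitude offset on the current GPS set point. The function name, frame conventions, and conversion are assumptions for illustration, not the AscTec interface.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius

def shift_gps_setpoint(lat_deg, lon_deg, err_forward_m, err_right_m, yaw_rad):
    """Return a new (lat, lon) set point displaced by the vision error vector."""
    # Rotate the body-frame error into local north/east components (yaw from north).
    d_north = err_forward_m * math.cos(yaw_rad) - err_right_m * math.sin(yaw_rad)
    d_east = err_forward_m * math.sin(yaw_rad) + err_right_m * math.cos(yaw_rad)
    # Small-offset conversion from metres to degrees of latitude/longitude.
    d_lat = math.degrees(d_north / EARTH_RADIUS_M)
    d_lon = math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg))))
    return lat_deg + d_lat, lon_deg + d_lon
```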

While no data was collected for these initial experiments, the video shows the output of the vision system as the wagon is moved. From this video, one can see the quadrotor tracking the moving target. This tracking is heavily dependent on the quality of the GPS signal and it is evident in the video that the quadrotor operates within a fairly large GPS tolerance.

Collaborations

Autonomous Modular Optical Underwater Robot (AMOUR)

Marek Doniec
Wil Selby
Iuliu Vasilescu
Carrick Detweiler

MIT Distributed Robotics Lab
September 2010

This research uses a hue- and saturation-based image segmentation algorithm to identify and track a colored pallet on the pool bottom. The location of the pallet is used as feedback to position our underwater robot, the Autonomous Modular Optical Underwater Robot (AMOUR).


Cookie Baking PR2

Mario Bollini
Wil Selby

MIT Distributed Robotics Lab
May 2011

Mario and I implemented the target identification and tracking algorithm on the PR2 made by Willow Garage. The code utilized the OpenCV computer vision library and was implemented using ROS. Mario’s research focused on enabling the PR2 to bake cookies autonomously. This involved grasping bowls of ingredients, manipulating the bowls, stirring the ingredients, pouring the dough into a cooking tray, and inserting and removing the tray from the oven. The computer vision algorithm was used to identify ingredients based on the color of the bowl. While the final version of this project used the on-board laser scanner to identify ingredients by bowl size, the computer vision algorithm was implemented and available for use.