Opencv Hand Tracking Github For Mac
In this post, we will learn about a Deep Learning based object tracking algorithm called GOTURN. The original implementation of GOTURN is in Caffe, but it has been ported to the OpenCV Tracking API, and we will use this API to demonstrate GOTURN in C++ and Python.
What is Object Tracking?
The goal of object tracking is to keep track of an object in a video sequence.
A tracking algorithm is initialized with a frame of a video sequence and a bounding box to indicate the location of the object we are interested in tracking. The tracking algorithm outputs a bounding box for all subsequent frames.
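The initialize-then-update contract described above can be sketched with a toy tracker. This is not GOTURN and not the OpenCV API: `TemplateTracker` and `make_frame` are hypothetical names used only for illustration, and the matching is a brute-force pixel comparison on tiny synthetic frames.

```python
from typing import List, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, width, height)
Frame = List[List[int]]           # grayscale image as rows of pixel values


class TemplateTracker:
    """Toy tracker: remembers the object's pixels and searches for them in each new frame."""

    def init(self, frame: Frame, bbox: BBox) -> None:
        # Store the pixels inside the initial bounding box as the template.
        x, y, w, h = bbox
        self.size = (w, h)
        self.template = [row[x:x + w] for row in frame[y:y + h]]

    def update(self, frame: Frame) -> BBox:
        # Try every position and keep the one with the smallest absolute difference.
        w, h = self.size
        best_cost, best_xy = None, (0, 0)
        for y in range(len(frame) - h + 1):
            for x in range(len(frame[0]) - w + 1):
                cost = sum(
                    abs(frame[y + dy][x + dx] - self.template[dy][dx])
                    for dy in range(h) for dx in range(w)
                )
                if best_cost is None or cost < best_cost:
                    best_cost, best_xy = cost, (x, y)
        return (best_xy[0], best_xy[1], w, h)


def make_frame(obj_x: int, obj_y: int, size: int = 8) -> Frame:
    """Hypothetical helper: black frame with a bright 2x2 object at (obj_x, obj_y)."""
    frame = [[0] * size for _ in range(size)]
    for dy in range(2):
        for dx in range(2):
            frame[obj_y + dy][obj_x + dx] = 255
    return frame


video = [make_frame(1, 1), make_frame(2, 1), make_frame(3, 2)]
tracker = TemplateTracker()
tracker.init(video[0], (1, 1, 2, 2))                     # first frame + bounding box
boxes = [tracker.update(frame) for frame in video[1:]]   # one box per later frame
print(boxes)  # [(2, 1, 2, 2), (3, 2, 2, 2)]
```

The tracker sees the bounding box only once, in `init`. Every later frame goes through `update`, which returns a new box.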
For more details on object tracking, check out our post on the.
What is GOTURN?
GOTURN, short for Generic Object Tracking Using Regression Networks, is a Deep Learning based tracking algorithm. The video below explains GOTURN and shows a few results. Most tracking algorithms are trained in an online manner. In other words, the tracking algorithm learns the appearance of the object it is tracking at runtime. Therefore, many real-time trackers rely on online learning algorithms that are typically much faster than a Deep Learning based solution. GOTURN changed the way we apply Deep Learning to the problem of tracking by learning the motion of an object in an offline manner.
The GOTURN model is trained on thousands of video sequences and does not need to perform any learning at runtime.
How does GOTURN work?
GOTURN was introduced by David Held, Sebastian Thrun, and Silvio Savarese in their paper titled.
Figure 1: GOTURN takes two cropped frames as input and outputs the bounding box around the object in the second frame.
As shown in Figure 1, GOTURN is trained using pairs of cropped frames from thousands of videos.
In the first frame (also referred to as the previous frame), the location of the object is known, and the frame is cropped to two times the size of the bounding box around the object. The object in the first cropped frame is always centered. The location of the object in the second frame (also referred to as the current frame) needs to be predicted. The bounding box used to crop the first frame is also used to crop the second frame. Because the object might have moved, the object is not centered in the second frame. A Convolutional Neural Network (CNN) is trained to predict the location of the bounding box in the second frame.
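The cropping rule above can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not the authors' code: the function names and the example coordinates are made up, boxes are assumed to be stored as corner coordinates, and what happens when the crop extends past the image border is not handled.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def search_region(prev_box: Box, scale: float = 2.0) -> Box:
    """Region centered on prev_box and `scale` times its width and height."""
    x1, y1, x2, y2 = prev_box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = scale * (x2 - x1) / 2, scale * (y2 - y1) / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def to_crop_coords(box: Box, region: Box) -> Box:
    """Express a frame-coordinate box relative to the crop's top-left corner."""
    rx, ry = region[0], region[1]
    return (box[0] - rx, box[1] - ry, box[2] - rx, box[3] - ry)


prev_box = (100.0, 80.0, 140.0, 120.0)   # object in the previous frame
curr_box = (110.0, 85.0, 150.0, 125.0)   # same object after it has moved

region = search_region(prev_box)          # the same region crops both frames
print(region)                             # (80.0, 60.0, 160.0, 140.0)
print(to_crop_coords(prev_box, region))   # (20.0, 20.0, 60.0, 60.0), centered in the 80x80 crop
print(to_crop_coords(curr_box, region))   # (30.0, 25.0, 70.0, 65.0), no longer centered
```

Because both crops use the region computed from the previous frame, any motion of the object shows up as an offset from the center of the second crop. That offset is what the network has to predict.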
Note for Beginners
If you are an absolute beginner, think of the CNN as a black box with many knobs that can be set to different values. When the settings on the knobs are right, the CNN produces the right bounding box. Initially, the settings of the knobs are random. At the time of training, we show the neural network pairs of frames for which we know the location of the object (i.e. the bounding boxes). If the CNN makes a mistake, the knobs are changed in a principled way using an algorithm called back propagation so that it gradually stops making as many mistakes. When changing the knob settings no longer improves the results, we say the model is trained.
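To make the knob analogy a little more tangible, here is a deliberately tiny sketch with a single knob, trained by plain gradient descent. It is not GOTURN's training code. The data, the learning rate, and the number of steps are made up, and a real network has millions of knobs whose gradients are computed by back propagation.

```python
import random

random.seed(0)

# Training pairs (input, target). The hidden rule is target = 3 * input.
samples = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

knob = random.uniform(-1.0, 1.0)   # the initial setting is random
learning_rate = 0.05


def mean_squared_error(k: float) -> float:
    """Average squared mistake made when the knob is set to k."""
    return sum((k * x - t) ** 2 for x, t in samples) / len(samples)


initial_error = mean_squared_error(knob)
for step in range(200):
    # Gradient of the mean squared error with respect to the knob.
    gradient = sum(2 * (knob * x - t) * x for x, t in samples) / len(samples)
    knob -= learning_rate * gradient   # nudge the knob in the direction that reduces the error

final_error = mean_squared_error(knob)
print(f"knob={knob:.4f} error: {initial_error:.4f} -> {final_error:.2e}")
```

After enough steps the knob settles very close to 3, the value that makes the mistakes vanish. Further updates barely change it, which is the point at which we would call this one-knob model trained.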
GOTURN Architecture
In the previous section, we just showed the CNN as a black box. Now, let’s see what is inside the box.
Figure 2: GOTURN Architecture
Figure 2 shows the architecture of GOTURN. As mentioned before, it takes two cropped frames as input. Notice that the previous frame, shown at the bottom, is centered, and our goal is to find the bounding box for the current frame shown on the top. Both frames pass through a bank of convolutional layers. The layers are simply the first five convolutional layers of the CaffeNet architecture.
The outputs of these convolutional layers (i.e. the pool5 features) are concatenated into a single vector. This vector is input to 3 fully connected layers, each with 4096 nodes. The last fully connected layer is finally connected to the output layer containing 4 nodes representing the top-left and bottom-right corners of the bounding box.
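The four output numbers describe a box inside the crop, so they have to be mapped back to the full frame before they are useful. The sketch below assumes, purely for illustration, that the outputs are corner coordinates expressed as fractions of the crop's width and height. The scaling actually used by the released model is not described here, and `outputs_to_frame_box` is a hypothetical helper.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def outputs_to_frame_box(outputs: Sequence[float], region: Box) -> Box:
    """Map 4 network outputs (assumed to be fractions of the crop) to frame pixels."""
    if len(outputs) != 4:
        raise ValueError("expected 4 outputs: x1, y1, x2, y2")
    rx1, ry1, rx2, ry2 = region
    w, h = rx2 - rx1, ry2 - ry1
    ox1, oy1, ox2, oy2 = outputs
    return (rx1 + ox1 * w, ry1 + oy1 * h, rx1 + ox2 * w, ry1 + oy2 * h)


region = (80.0, 60.0, 160.0, 140.0)           # crop taken from the current frame
outputs = [0.375, 0.3125, 0.875, 0.8125]      # made-up network outputs
print(outputs_to_frame_box(outputs, region))  # (110.0, 85.0, 150.0, 125.0)
```

The resulting box is then used as the known location when the next pair of frames is cropped.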
Note for Beginners
Whenever you see a bank of convolutional layers and are confused about what it means, think of them as filters that change the original image such that information important for solving the problem at hand is retained and unimportant information in the image is thrown away. The multi-dimensional image (tensor) obtained at the end of the convolutional filters is converted to a long vector of numbers by simply unrolling the tensor. This vector serves as input to a few fully connected layers and finally the output layer. The fully connected layers can be thought of as the learning algorithm that uses the useful information extracted from the images by the convolutional layers to solve the classification or regression problem at hand.
How to use GOTURN in OpenCV
The authors have released a. You can try it using Caffe, but in this tutorial, we will use OpenCV’s tracking API. Here are the steps you need to follow.
1. Download GOTURN model files: You can download the GOTURN caffemodel and prototxt files located at. The model file is split into 4 files, which need to be merged before unzipping (see step 2). Alternatively, you can use this to download the model. Keep in mind that the download may take a long time because the file is about 370 MB. If you use this method, skip step 2.
2. Merge zip files: The GOTURN model file shared via GitHub is split into 4 different files because the model file is large. These files need to be combined before unzipping. OSX and Linux users can do so using the following commands.

    # Mac and Linux users
    cat goturn.caffemodel.zip.* > goturn.caffemodel.zip
    unzip goturn.caffemodel.zip

On Windows, the files can be merged using.
3. Move model files to current directory: The GOTURN implementation in OpenCV expects the model files to be present in the directory from which the executable is run. So, move them to the current directory.
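If you would rather not depend on shell tools, the merge, unzip, and "are the files where OpenCV expects them" checks can be sketched in Python using only the standard library. The assumptions are that the split files share the prefix goturn.caffemodel.zip followed by a sortable suffix, and that the extracted model files are named goturn.prototxt and goturn.caffemodel. The function names are hypothetical, so adjust the names and patterns to match what you actually downloaded.

```python
import glob
import os
import shutil
import zipfile


def merge_parts(pattern: str, merged_zip: str) -> list:
    """Concatenate the split files matching `pattern` (in sorted order) into one zip."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(merged_zip, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def extract_here(merged_zip: str, target_dir: str = ".") -> list:
    """Unzip the merged archive and return the names of the extracted files."""
    with zipfile.ZipFile(merged_zip) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


def missing_model_files(directory: str = ".",
                        names=("goturn.prototxt", "goturn.caffemodel")) -> list:
    """Return the model files (assumed names) that are not present in `directory`."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


if __name__ == "__main__":
    # Only merge when split parts are actually present in the current directory.
    if glob.glob("goturn.caffemodel.zip.*"):
        merge_parts("goturn.caffemodel.zip.*", "goturn.caffemodel.zip")
        print("extracted:", extract_here("goturn.caffemodel.zip"))
    print("missing:", missing_model_files() or "nothing")
```

Run it from the directory where you will launch your tracking program. If the last line reports anything as missing, move that file into the directory first.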