Getting up and running with the Kinect in Ubuntu 12.04

I’m currently experimenting a bit with the Kinect depth camera, which I intend to use for a prototype. I’ve played around with the device before on Windows using the Microsoft Kinect SDK. Setting up libfreenect on Windows is actually quite a pain, which was the reason why I started with the Microsoft SDK. As I wanted to code in Python, I’ve had a look at PyKinect and the Python Tools for Visual Studio (see also this PyCon US 2012 talk about building a Kinect game with Python). Unfortunately, the PyKinect samples seem to be outdated, and don’t work with the latest version of the Kinect SDK (version 1.0, released in February).

So I decided to see how far I would get with the Kinect and libfreenect on Ubuntu (12.04). Step 1: install the libfreenect demos to test if it works (this will also install libfreenect and its dependencies).

$ sudo apt-get install libfreenect-demos

So far so good. Unfortunately, the demo application wouldn’t start:

$ freenect-glview 
Kinect camera test
Number of devices found: 1
Could not claim interface on camera: -6
Could not open device

Strange.. I did another check to see if my user was in the plugdev group (which
was the case):

$ groups jo
jo : jo adm cdrom sudo dip plugdev lpadmin sambashare

Then I noticed that the Kinect’s LED kept blinking continously, which usually means there’s some sort of connection problem. It was correctly recognized, though:

$ lsusb
...
Bus 002 Device 007: ID 045e:02b0 Microsoft Corp. Xbox NUI Motor
Bus 002 Device 008: ID 045e:02ad Microsoft Corp. Xbox NUI Audio
Bus 002 Device 009: ID 045e:02ae Microsoft Corp. Xbox NUI Camera

After a quick web search for the specific libfreenect-glview error message, I learned that recent versions of the Linux kernel prevent libfreenect from claiming the Kinect as they now include a driver to support the device as a regular webcam (see kinect.c), which is actually quite cool. This also means there should be a specific video device for the Kinect (/dev/video1 in my case).

The kernel modules are indeed loaded:

$ lsmod | grep -i gspca
gspca_kinect           12936  0
gspca_main             28366  1 gspca_kinect
videodev               98259  2 gspca_main,uvcvideo 

Let’s try playing the camera stream using GStreamer and Video4Linux:

$ gst-launch-0.10 v4l2src device=/dev/video1 ! video/x-raw-yuv ! ffmpegcolorspace ! xvimagesink

That seems to work!

The Kinect actually has two image sensors: an RGB camera and a depth sensor, which consists of an infrared laser projector and a monochrome CMOS sensor. A depth map is created by projecting structured infrared light and capturing the resulting image from the monochrome sensor. As I wasn’t sure how to grab an image from the monochrome sensor, I contacted the author of the kernel module (Antonio Ospite). He told me it’s possible to get a monochrome image by specifying an image size of of 640×488 pixels (instead of the usual 640×480). Note that the kernel module currently only supports video streams from the monochrome sensor as unprocessed output.

If we pass that specific width and height to GStreamer, we get this:

It’s also possible to get the full-size (1280×1024) monochrome image. However, most webcam apps will just show the video stream from the RGB sensor for that resolution as you can’t specify that you want the specific Y10B format. To do that, you can use a separate program like qv4l2, which can be installed as follows:

$ sudo apt-get install qv4l2

To get the full-size monochrome image, run qv4l2, open the /dev/video1 device in raw mode (File > Open raw device), select 1280×1024 for the frame size, and “Y10B – Y10B” for the capture format, and click the red capture button:

OK, back to running the libfreenect demo. Someone over at superuser.com suggested to remove the modules (so that the user-mode libfreenect driver could
take over) and run libfreenect-glview again.

And indeed, after removing both gspca modules, the freenect-glview demo did work:

$ sudo modprobe -r gspca_kinect 
$ sudo modprobe -r gspca_main
$ freenect-glview 
Kinect camera test
Number of devices found: 1
GL thread
Write Reg 0x0006 <= 0x00
Write Reg 0x0012 <= 0x03
Write Reg 0x0013 <= 0x01
Write Reg 0x0014 <= 0x1e
Write Reg 0x0006 <= 0x02
Write Reg 0x0005 <= 0x00
Write Reg 0x000c <= 0x00
Write Reg 0x000d <= 0x01
Write Reg 0x000e <= 0x1e
Write Reg 0x0005 <= 0x01
...

The left side of the window shows a colored depth image, while the right side shows the RGB image from the camera. Of course, it would be quite cumbersome to remove the modules again every time you want to use libfreenect. One option is to blacklist the gspca module as follows:

$ echo "blacklist gspca_kinect" >> /etc/modprobe.d/blacklist.conf

Now that we've got that sorted out, I'll try out the libfreenect Python wrapper (or perhaps SimpleCV).

Update: Antonio Ospite pointed out in the comments that recent versions of libfreenect (0.1.2) can automatically detach the kernel driver. There's a PPA available by Florian Echtler with updated libfreenect packages for Ubuntu 12.04.

Python introduction movie

I just wanted to share this movie about the Python programming language. Although it’s a bit dated, I feel the movie still provides a nice overview of the benefits of Python, and explains why the language is good for educational purposes. It can be found at python.org.

You’ll see interviews with Guido van Rossum (creator of the Python language), Eric S. Raymond (author of “The Cathedral and the Bazaar” and the excellent article “Why Python?”), and Tim Peters (known a.o. from The Zen of Python).


The film was originally called “A Python Love Story”, and was was made in 2001 by a class of high school students in Yorktown High School (Arlington, Virginia) together with their Computer Science teacher Jeff Elkner. Elkner is known for translating Allen Downey‘s “How to Think Like a Computer Scientist” into Python. The original (and longer) version of the video is available from the Python Bibliotheca.

Facade update: code available and information about related student projects

My previous posts on face detection with Python and on changing the user’s presence when he/she is detected by the webcam are two of the most popular articles on my blog. There were quite a few comments that asked about the code for this project.

Although I already created a Facade project page a while ago and made the code available through Bazaar, I never announced it here. To make it easier to try the script out, I also uploaded an archive of the code. This version includes a simple Python script that works on Windows (simple_winclient.py) that performs face detection.

I noticed today that Jinwei Gu, a PhD student at the University of Columbia and teaching assistant for the course COMS W4737 (E6737) Biometrics, included my blog post in a list of links to resources for doing face detection.

Last year, two students improved the presence detection in Facade during their Bachelor’s theses.

Bram Bonné worked on the network aspect, to allow users to have their presence detected on multiple computers. When users would move from one computer to another (or to a mobile device), their messages would automatically be routed to their current machine.

Here are a few screenshots from Bram’s thesis:

Multi-device presence

An everywhere messaging service that takes into account the user&#039;s presence.

Bram also created a demo video of his system:

You need to a flashplayer enabled browser to view this YouTube video

Kristof Bamps worked on supporting a more diverse set of sensors (e.g. presence of a Bluetooth phone, keyboard and mouse input). He used a decision tree to model all these sensors, and determine the user’s correct presence based on the collected information.

Two screenshots from Kristof’s thesis:

Showing the user&#039;s presence state

A history of presence changes

Kristof’s demo video:

You need to a flashplayer enabled browser to view this YouTube video

I have unfortunately not yet found the time yet to merge Kristof’s and Bram’s code into the main Facade branch, but this should not be too big of a problem.

Small update on face detection post

Just a quick update on my previous post. I hooked my code up to a D-Bus daemon, and used it to detect if I was sitting behind my desk or not.

I had to position the camera a bit lower, because it would sometimes not recognize my face when I was looking at the bottom of my screen. Having a dual-monitor setup doesn’t help either, since you should ideally position the webcam in the middle of the two displays to have good coverage. I positioned the webcam in the middle of the two displays at the bottom and tilted it upwards, which gives good results:

I guesstimated that if the system couldn’t detect a face for a period of 10 seconds, I would be away from my desk (or at least not paying attention to the screen). I played around a bit with pygtk and notify-python to create a status icon and show notification bubbles whenever my status changed. Here’s what happens when it detects that I’m away from my desk (notice the bubble in the upper right corner):

When I return, the system will change its status icon accordingly and notify me again:

OpenCV is fun! :-)

Fun with Python, OpenCV and face detection

I had some fun with Gary Bishop’s OpenCV Python wrapper this morning. I wanted to try out OpenCV for detecting faces using a web cam. This could be used for instance to see if someone is sitting behind his desk or not. I used Gary’s Python wrapper since I didn’t want to code in C++.

I didn’t know where to start, so I searched for existing OpenCV face detection examples. I found a blog post by Nirav Patel explaining how to use OpenCV’s official Python bindings to perform face detection. Nirav will be working on a webcam module for Pygame for the Google Summer of Code.

I managed to rewrite Nirav’s example to get it working with CVtypes:

Here’s the code. Although it’s just a quick and dirty hack, it might be useful to others. It requires CVtypes and OpenCV, and was tested on Ubuntu Hardy with a Logitech QuickCam Communicate Deluxe webcam. You will need Nirav’s Haar cascade file as well.

import sys
from CVtypes import cv
 
def detect(image):
    image_size = cv.GetSize(image)
 
    # create grayscale version
    grayscale = cv.CreateImage(image_size, 8, 1)
    cv.CvtColor(image, grayscale, cv.BGR2GRAY)
 
    # create storage
    storage = cv.CreateMemStorage(0)
    cv.ClearMemStorage(storage)
 
    # equalize histogram
    cv.EqualizeHist(grayscale, grayscale)
 
    # detect objects
    cascade = cv.LoadHaarClassifierCascade('haarcascade_frontalface_alt.xml', cv.Size(1,1))
    faces = cv.HaarDetectObjects(grayscale, cascade, storage, 1.2, 2, cv.HAAR_DO_CANNY_PRUNING, cv.Size(50, 50))
 
    if faces:
        print 'face detected!'
        for i in faces:
            cv.Rectangle(image, cv.Point( int(i.x), int(i.y)),
                         cv.Point(int(i.x + i.width), int(i.y + i.height)),
                         cv.RGB(0, 255, 0), 3, 8, 0)
 
if __name__ == "__main__":
    print "OpenCV version: %s (%d, %d, %d)" % (cv.VERSION,
                                               cv.MAJOR_VERSION,
                                               cv.MINOR_VERSION,
                                               cv.SUBMINOR_VERSION)
 
    print "Press ESC to exit ..."
 
    # create windows
    cv.NamedWindow('Camera', cv.WINDOW_AUTOSIZE)
 
    # create capture device
    device = 0 # assume we want first device
    capture = cv.CreateCameraCapture(0)
    cv.SetCaptureProperty(capture, cv.CAP_PROP_FRAME_WIDTH, 640)
    cv.SetCaptureProperty(capture, cv.CAP_PROP_FRAME_HEIGHT, 480)    
 
    # check if capture device is OK
    if not capture:
        print "Error opening capture device"
        sys.exit(1)
 
    while 1:
        # do forever
 
        # capture the current frame
        frame = cv.QueryFrame(capture)
        if frame is None:
            break
 
        # mirror
        cv.Flip(frame, None, 1)
 
        # face detection
        detect(frame)
 
        # display webcam image
        cv.ShowImage('Camera', frame)
 
        # handle events
        k = cv.WaitKey(10)
 
        if k == 0x1b: # ESC
            print 'ESC pressed. Exiting ...'
            break

A known problem is that pressing the escape key doesn’t quit the program. Might be something wrong in my use of the cv.WaitKey function. Meanwhile you can just use Ctrl+C. All in all, the face detection works pretty well. It doesn’t recognize multiple faces yet, but that might be due to the training data. It would be interesting to experiment with OpenCV’s support for eye tracking in the future.

Update: the script does recognize multiple faces in a frame. Yesterday when Alex stood at my desk, it recognized his face as well. I think it didn’t work before because I used cv.Size(100, 100) for the last parameter of cv.HaarDetectObjects instead of cv.Size(50, 50). This parameter indicates the minimum face size (in pixels). When people were standing around my desk, they were usually farther away from the camera. Their face was then probably smaller than 100×100 pixels.

Just a quick note on ctypes. I remember when I created PydgetRFID that I tried to use libphidgets’ SWIG-generated Python bindings, but couldn’t get them to work properly. I had read about ctypes, and decided to use it for creating my own wrapper around libphidgets. Within a few hours I had a working prototype. When you’re struggling with SWIG-generated Python bindings, or have some C library without bindings that you would like to use, give ctypes a try. Gary Bishop wrote about a couple of interesting ctypes tricks to make the process easier.