Just a quick update on my previous post. I hooked my code up to a D-Bus daemon, and used it to detect if I was sitting behind my desk or not.
I had to position the camera a bit lower, because it would sometimes not recognize my face when I was looking at the bottom of my screen. Having a dual-monitor setup doesn’t help either, since you should ideally position the webcam in the middle of the two displays to have good coverage. I positioned the webcam in the middle of the two displays at the bottom and tilted it upwards, which gives good results:
I guesstimated that if the system couldn’t detect a face for a period of 10 seconds, I would be away from my desk (or at least not paying attention to the screen). I played around a bit with pygtk and notify-python to create a status icon and show notification bubbles whenever my status changed. Here’s what happens when it detects that I’m away from my desk (notice the bubble in the upper right corner):
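The 10-second rule can be sketched as a tiny state machine. This is a minimal illustration rather than the code from my script: the class name `PresenceTracker` and its `update` method are made up for this example, and the timestamps are passed in explicitly to keep it easy to test.

```python
class PresenceTracker:
    """Tracks 'present'/'away' from per-frame face detection results."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout      # seconds without a face before going away
        self.last_seen = None       # timestamp of the last frame with a face
        self.present = False

    def update(self, face_found, now):
        """Feed one frame's result; return the new status if it changed, else None."""
        if face_found:
            self.last_seen = now
            if not self.present:
                self.present = True
                return "present"
        elif self.present and now - self.last_seen >= self.timeout:
            self.present = False
            return "away"
        return None
```

A status icon or notification bubble would only be updated when `update` returns something other than `None`, so a few missed frames do not cause flicker.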
When I return, the system will change its status icon accordingly and notify me again:
I had some fun with Gary Bishop’s OpenCV Python wrapper this morning. I wanted to try out OpenCV for detecting faces using a web cam. This could be used for instance to see if someone is sitting behind his desk or not. I used Gary’s Python wrapper since I didn’t want to code in C++.
I managed to rewrite Nirav’s example to get it working with CVtypes:
Here’s the code. Although it’s just a quick and dirty hack, it might be useful to others. It requires CVtypes and OpenCV, and was tested on Ubuntu Hardy with a Logitech QuickCam Communicate Deluxe webcam. You will need Nirav’s Haar cascade file as well.
A known problem is that pressing the escape key doesn’t quit the program. Might be something wrong in my use of the cv.WaitKey function. Meanwhile you can just use Ctrl+C. All in all, the face detection works pretty well. It doesn’t recognize multiple faces yet, but that might be due to the training data. It would be interesting to experiment with OpenCV’s support for eye tracking in the future.
Update: the script does recognize multiple faces in a frame. Yesterday when Alex stood at my desk, it recognized his face as well. I think it didn’t work before because I used cv.Size(100, 100) for the last parameter of cv.HaarDetectObjects instead of cv.Size(50, 50). This parameter indicates the minimum face size (in pixels). When people were standing around my desk, they were usually farther away from the camera. Their face was then probably smaller than 100×100 pixels.
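To make the effect of the minimum size concrete, here is a small sketch in plain Python. It does not call OpenCV and is not how the detector works internally (the detector simply never searches windows smaller than the minimum); the helper `filter_by_min_size` and the bounding boxes are hypothetical.

```python
def filter_by_min_size(faces, min_size):
    """Keep only (x, y, width, height) boxes of at least min_size = (w, h) pixels."""
    min_w, min_h = min_size
    return [f for f in faces if f[2] >= min_w and f[3] >= min_h]

# Hypothetical detections: me close to the camera, a visitor farther away.
faces = [(200, 120, 140, 140), (420, 90, 64, 64)]

near_only = filter_by_min_size(faces, (100, 100))   # the visitor's face is dropped
everyone = filter_by_min_size(faces, (50, 50))      # both faces are kept
```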
Just a quick note on ctypes. I remember when I created PydgetRFID that I tried to use libphidgets’ SWIG-generated Python bindings, but couldn’t get them to work properly. I had read about ctypes, and decided to use it for creating my own wrapper around libphidgets. Within a few hours I had a working prototype. When you’re struggling with SWIG-generated Python bindings, or have some C library without bindings that you would like to use, give ctypes a try. Gary Bishop wrote about a couple of interesting ctypes tricks to make the process easier.
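For those who have never seen ctypes, here is a minimal sketch that wraps a single C function. It uses `strlen` from the C standard library instead of libphidgets, and assumes a POSIX system such as Linux; the wrapper name `c_strlen` is just for illustration.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (assumes a POSIX system such as Linux).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data):
    """Return the length of a bytes string as computed by C's strlen."""
    return libc.strlen(data)
```

Declaring `argtypes` and `restype` up front is optional for simple cases, but it catches wrong argument types early instead of crashing inside the C library.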
Just a quick update to my previous post. I can imagine that my discussion of the advantages of Smalltalk might be a bit abstract for people who never used it.
So here’s a short demo video of a Solitaire game running in a Smalltalk environment (via David Buck). It clearly illustrates features such as full introspection (e.g. by using the object browser) and live “fix-and-continue” debugging:
I found another video showing live code updates in Smalltalk while invoking native libraries in the background (more specifically, OpenGL):
I spent some time last weekend looking into Smalltalk again. The first time I did this was somewhere around 2004, when I played around with Ruby and discovered that it was strongly influenced by Smalltalk. Back then I watched an old video by Dan Ingalls on object-oriented programming which finally made me fully understand the essence of OOP: it’s all about messaging.
In my personal opinion, this video (or at least the message that Dan tries to communicate) should be better integrated in OOP courses at universities. Another invaluable resource for grasping these ideas is Design Principles Behind Smalltalk, again by Dan Ingalls. Of course, it’s difficult to understand what OOP is about if you have to learn it through a weak implementation. We learned the basics of OOP in C++ for example, which would be blasphemy to Alan Kay. He once said: “Actually I made up the term object-oriented, and I can tell you I did not have C++ in mind.” Here’s his definition of OOP:
OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things. It can be done in Smalltalk and in LISP. There are possibly other systems in which this is possible, but I’m not aware of them.
Before I looked into Smalltalk, to my understanding objects just contained a bunch of methods or functions that had access to the object’s context. I did not really grasp the idea that objects just respond to messages (or method calls in my definition). The real difference is that in Smalltalk messages are dynamically dispatched at runtime. A method is the function or subroutine that is invoked in response to the sending of a message, and it is matched to the message name (or selector) at runtime. In contrast, C++, Java and C# resolve at compile-time which method signature a call refers to (only virtual methods are then dispatched on the receiver’s class at runtime). There is thus a distinction between the semantics (or message) and implementation strategy (or method) in Smalltalk. Decoupling these allows for more flexibility, such as objects that cache all incoming messages until their database connection is fully set up, after which they replay these messages, or objects that forward messages to other objects (which might have even been passed in at runtime). This is one of the aspects of extreme late binding in Alan Kay’s definition of OOP.
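The caching-and-replaying object mentioned above can be approximated in Python, which also looks up methods by name at runtime. This is only a sketch of the idea, not Smalltalk: `BufferingProxy` and `Database` are hypothetical classes, and Python's `__getattr__` hook stands in for Smalltalk's handling of messages an object does not understand.

```python
class BufferingProxy:
    """Queues messages until a target is attached, then replays them in order."""

    def __init__(self):
        self._target = None
        self._pending = []   # list of (selector, args, kwargs)

    def __getattr__(self, selector):
        # Only called for names that are not found normally, i.e. any "message".
        def send(*args, **kwargs):
            if self._target is None:
                self._pending.append((selector, args, kwargs))
                return None
            return getattr(self._target, selector)(*args, **kwargs)
        return send

    def attach(self, target):
        """Connect the real receiver and replay everything that was queued."""
        self._target = target
        pending, self._pending = self._pending, []
        for selector, args, kwargs in pending:
            getattr(target, selector)(*args, **kwargs)


class Database:
    """Hypothetical receiver that just records what it was asked to store."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)
        return len(self.rows)
```

The proxy never declares an `insert` method; it accepts any selector and decides at runtime whether to queue or forward it, which is exactly the decoupling of message and method.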
It’s exactly this run-time lookup of methods that enables effortless polymorphism. As explained in the video, at some point the intermediate factorial result will become an instance of LargeInteger, while in previous iterations it was an instance of SmallInteger. The multiplication message (*) is sent to this object, after which the correct method in the class LargeInteger is looked up for handling the message, allowing the existing code to continue to work. Java, C# and C++ have all inherited this feature (although C++ requires explicitly declaring methods as virtual for this to work, due to efficiency reasons). Smalltalk can even realize polymorphism without inheritance (also known as duck typing), although this is not shown in this video. Smalltalk has implicit interfaces: an object’s interface is the messages it responds to. If two objects both respond to a certain message, they are interchangeable (even at runtime). Traditional languages such as Java or C++ only support inheritance-based polymorphism (although something similar to duck typing can be achieved with C++ templates). Here’s the explanation by Dan Ingalls:
Polymorphism: A program should specify only the behavior of objects, not their representation.
A conventional statement of this principle is that a program should never declare that a given object is a SmallInteger or a LargeInteger, but only that it responds to integer protocol. Such generic description is crucial to models of the real world. Consider an automobile traffic simulation. Many procedures in such a system will refer to the various vehicles involved. Suppose one wished to add, say, a street sweeper. Substantial amounts of computation (in the form of recompiling) and possible errors would be involved in making this simple extension if the code depended on the objects it manipulates. The message interface establishes an ideal framework for such an extension. Provided that street sweepers support the same protocol as all other vehicles, no changes are needed to include them in the simulation:
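The street sweeper example translates almost literally to any language with duck typing. Here is a minimal Python sketch; the class names, the `advance` message and the speeds are made up for illustration.

```python
class Car:
    def advance(self, seconds):
        return 15.0 * seconds      # metres travelled, made-up speed


class StreetSweeper:
    # No shared base class with Car; it simply answers the same message.
    def advance(self, seconds):
        return 2.0 * seconds


def simulate(vehicles, seconds):
    """Send 'advance' to every vehicle; the code never checks their classes."""
    return [v.advance(seconds) for v in vehicles]
```

Adding `StreetSweeper` required no change to `simulate`, because the simulation only depends on the protocol (the messages understood), not on the representation.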
More details on the differences between Smalltalk and current OOP languages are explained in Smalltalk: Getting The Message. I believe that understanding the original philosophy behind OOP helps you be a better object-oriented programmer in any language. Ramon Leon discusses the common mistake of magic objects which is an interesting read.
But let’s get to the point of why I started looking into Smalltalk again. At the moment, I mostly program in C# (and sometimes in Java), but I often feel frustrated with both languages. After being exposed to Ruby and Python, I feel like static typing requires me to write too much code and helps the compiler more than it helps me. Furthermore, Java seems to be overly engineered with all the factories, managers, readers and writers, while C# is often inconsistent or lacking in its implementation (e.g. anonymous methods are not really closures). Both languages are becoming increasingly complex with the addition of more and more features. Generics, for example, are just not necessary in a dynamically typed language. The problem with scripting languages such as Ruby and Python however, is that they are often interpreted and slow. I experimented a bit with JRuby (a Ruby implementation in Java with full access to Java’s class library) but that didn’t satisfy my needs either. After trying to code a simple Hello World Swing application in JRuby, I was stunned that it still required me to wrap code inside an ActionListener like Java does, while I really just wanted to pass in a Ruby block.
Other people have also been struggling with languages such as Java or C# (e.g. Jamie Zawinski, Mark Miller and Steve Yegge) or are looking for alternatives (e.g. Martin Fowler and Tim Bray). I think the popularity of Ruby might motivate more people to have a look at Smalltalk. Furthermore, if you know Ruby, it’s easier to get acquainted with Smalltalk. Besides lots of similarities in the class library (the Kernel class, the times message on numbers, etc.), Ruby already introduces the notion that everything is an object, objects in Ruby communicate through messages and Ruby has blocks. However, Ruby is not really equivalent to Smalltalk yet. Ruby introduced extra syntax to be more familiar to people that were used to C-style programming languages, thereby losing part of Smalltalk’s flexibility. In fact, the beauty of Smalltalk is that its entire syntax easily fits on a postcard. If you look closely at this example, even a conditional test in Smalltalk is implemented using messaging on objects. You just send the message ifFalse to an instance of the class Boolean, and pass in a code block you want to have executed when the value is false. It’s turtles all the way down.
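To show what "a conditional is just a message" means, here is a rough Python imitation of Smalltalk's `ifTrue:` and `ifFalse:`. The classes `TrueObject` and `FalseObject` and the helper `less_than` are hypothetical, lambdas stand in for blocks, and the comparison itself still bottoms out in Python's own conditional expression, which Smalltalk would not need.

```python
class TrueObject:
    def if_true(self, block):
        return block()          # the receiver is true, so run the block

    def if_false(self, block):
        return None             # ignore the block


class FalseObject:
    def if_true(self, block):
        return None

    def if_false(self, block):
        return block()


def less_than(a, b):
    """Answer a boolean *object* that can be sent if_true / if_false."""
    return TrueObject() if a < b else FalseObject()
```

Which branch runs is decided purely by which class receives the message, so no special `if` syntax is needed at the call site: `less_than(5, 4).if_false(lambda: "not smaller")`.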
Another problem I came across when developing in Java or C# (or in any other OOP language I used) was the difficulty of changing class hierarchies. Very often, due to time constraints, a design is just left in its original state, and the new requirements are supported by performing a quick hack. I suspect this problem is especially prevalent in so-called “research code”. It gets even worse when programming in teams. Although this problem is generally known in software engineering and several strategies have been proposed to deal with it, I wondered why the promise of OOP failed here. Wasn’t OOP supposed to improve the situation and make spaghetti code obsolete?
Jeffrey Massung asked himself a similar question: What if the philosophy (OOP) wasn’t the problem, but the implementation (language) was?, and decided to write a 2D DirectX game in Smalltalk. It seems Smalltalk did indeed allow for easier design changes. Self (a language derived from Smalltalk) tries to alleviate the aforementioned problem by specializing through cloning of existing objects instead of through class hierarchies. It’s funny to note that the problem wasn’t that bad in Smalltalk, since you could still easily change the hierarchy, unlike in languages such as Java or C++.
The real power of Smalltalk is not its syntax, but the entire environment. I believe this is also key to understanding OOP. The current languages and tools (e.g. IDEs) we use for doing object-oriented programming are just weak implementations of the original Smalltalk environment. When working in Smalltalk, you are working in a world of running objects: there are no files or applications, everything is an object. For example, version control systems in Smalltalk are actually aware of the semantics of your code; they are not just text-based. When merging code they can show you what methods have been changed, added or removed, what classes were changed, allow you to decide which changes you want to keep, etc. Although I think Bazaar is great, it doesn’t come close to this way of working. Smalltalk allows live debugging and code changes, which is tremendously useful. Ever wished that you could fix a problem while you’re debugging and immediately check if your solution works without having to recompile your application and start the entire process again? In Smalltalk (and Lisp) that’s possible. If you want to find out more about why Smalltalk is way ahead of current mainstream OOP languages, have a look at Ramon Leon’s Why Smalltalk.
Update: Scott Lewis commented that I should have emphasized that Smalltalk is mostly written in Smalltalk: So when you subclass any object, you can go back up the chain of inherited objects and see how everything works. Likewise when you hit an error/bug, the debugger lets you delve about as deeply as you could possibly want into what is going wrong, and why it is an error. This is indeed a powerful aspect of Smalltalk, and an example of how it was influenced by Lisp.
Besides reading about Smalltalk, I have also been experimenting a bit with Squeak. Squeak is an open source implementation of the Smalltalk programming language and environment, created by its original designers. Squeak runs bit-identical on many platforms (including Windows CE/PocketPC). I will leave my Squeak experiments for another blog post though.
To conclude, it seems that we are very good at ignoring the past. We just take our current systems for granted, and use them as a reference frame for future innovations. Marshall McLuhan once phrased it like this: We drive into the future using only our rearview mirror. I believe this is true in HCI research as well, as people like Dan Olsen have pointed out. He argued that our existing system models are barriers to the inclusion of many of the interactive techniques that have been developed. He gave the example of the recent surge in vision-based systems and multi-touch input devices, which get forced in a standard mouse point model because that is all that our systems support:
Multiple input points and multiple users are all discarded when compressing everything into the mouse/keyboard input model. Lots of good research into input techniques will never be deployed until better systems models are created to unify these techniques for application developers.
Research on toolkits is a lot less popular these days. We try to map everything into existing models, and always feel like we have to support legacy applications, which hampers significant progress. Bill Buxton has also studied innovation in HCI, and questioned the progress we made in the last 20 years.
I think the reason why so much great work was done by the early researchers in our field (e.g. Ivan Sutherland, Douglas Engelbart and Alan Kay) is, besides that they were very creative and intelligent people, that there was not that much previous work: they just had to start from scratch. Alan Kay once asked Ivan Sutherland how it was possible that he had invented computer graphics, done the first object oriented software system and the first real time constraint solver all by himself in one year, after which Sutherland responded: “I didn’t know it was hard.”
I spent some time the last weeks to support type decoding plugins in Uiml.net. This is mainly useful when you want to interact with applications or web services that have their own types that need to be converted to something the widget set understands. Suppose for example a web service returns a set of Persons, which need to be represented in a list view. The renderer does not know how to transform a Person into an item of a list view, so you need to define a custom component that sits between the renderer and the web service, and can provide this conversion. However, since you don’t know which widget set is used, you have to do this for every possible widget set (e.g. System.Windows.Forms, Gtk#, System.Windows.Forms on the Compact Framework, etc.). Furthermore, it would be better to let the renderer manage this code.
So I created a type decoder plugin system and while I was at it, also cleaned up the code. This resulted in only one general TypeDecoder instance being created in the renderer, while we previously had one instance per backend. Now we have a container class in each backend to host widget set-specific type decoders. This container class gets registered with the TypeDecoder, and is in fact also a plugin.
Instead of going into the implementation details, let’s have a look at an excerpt from the System.Windows.Forms container class (SWFTypeDecoders.cs):
using Uiml.Rendering;
using Uiml.Rendering.TypeDecoding;

public class SWFTypeDecoders
{
    [TypeDecoderMethod]
    public static System.Drawing.Point DecodePoint(string val)
    {
        // Expects "x,y", e.g. "10,20"
        string[] coords = val.Split(new Char[] { ',' });
        return new System.Drawing.Point(Int32.Parse(coords[0]), Int32.Parse(coords[1]));
    }
}
The only thing we have to do to define a type decoder method is add the [TypeDecoderMethod] attribute and support as a parameter the type we want to convert from. The return type is what we will convert to. In the above listing, the DecodePoint method converts a string to a System.Drawing.Point. The [TypeDecoderMethod] attribute is used to declare that the corresponding method is a type decoder. This way other auxiliary methods will not be registered and won’t pollute the type decoder registry.
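The same opt-in registration idea can be sketched in Python with a decorator. This is an analogy, not the Uiml.net implementation: Uiml.net reads the source and target types from the method signature, whereas in this sketch they are passed explicitly, and all names (`DECODERS`, `type_decoder`, `decode`) are made up.

```python
DECODERS = {}   # maps (source type, target type) -> decoder function


def type_decoder(source, target):
    """Mark a function as a decoder from `source` to `target` and register it."""
    def register(func):
        DECODERS[(source, target)] = func
        return func
    return register


@type_decoder(str, tuple)
def decode_point(val):
    x, y = val.split(",")
    return (int(x), int(y))


def split_helper(val):
    # Not decorated, so it never ends up in the registry.
    return val.split(",")


def decode(value, target):
    """Convert `value` to `target` using the registered decoder."""
    return DECODERS[(type(value), target)](value)
```

Only `decode_point` is registered; the undecorated helper stays out of the registry, which mirrors the role of the attribute.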
To test the implementation, I created a simple class that connects to del.icio.us and gets all my tags. I use this class to show them in a Gtk# UIML GUI. To be able to convert between the XML document that is returned by del.icio.us and the user interface, I wrote a custom type decoder, and connected it to the renderer. I also have a short screencast showing its workings.
I have extended the Uiml.net type decoder to combine existing type decoding methods if direct conversion is not supported. In this example I created a type decoder to convert from a System.Xml.XmlDocument to a Uiml.Constant. But the Gtk.TreeView widget requires a Gtk.TreeModel. The renderer will therefore look for a conversion from a Uiml.Constant to a Gtk.TreeModel and apply the type decoders in sequence. Although we could have converted directly to this data type, this is not as flexible since it is widget set-specific. Although the interface will remain the same, I will probably change the underlying implementation to a graph with the types as vertices, and type decoders as edges to better support these indirect conversions.
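The graph idea could look roughly like the following Python sketch: types are vertices, decoders are edges, and a breadth-first search finds the shortest chain of decoders. Everything here is hypothetical, including the string type names and the toy decoders; it only illustrates the planned approach, not the current Uiml.net code.

```python
from collections import deque


def find_chain(decoders, source, target):
    """Breadth-first search for the shortest list of decoders from source to target.

    `decoders` maps (from_type, to_type) -> function. Returns None if no path exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current == target:
            return chain
        for (src, dst), func in decoders.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [func]))
    return None


def convert(decoders, value, source, target):
    """Apply the decoders along the found chain in sequence."""
    chain = find_chain(decoders, source, target)
    if chain is None:
        raise TypeError("no conversion from %s to %s" % (source, target))
    for func in chain:
        value = func(value)
    return value


# Stand-in type names (strings) and toy decoders, for illustration only.
decoders = {
    ("XmlDocument", "Constant"): lambda xml: xml.strip("<>").split("|"),
    ("Constant", "TreeModel"): lambda items: [[item] for item in items],
}
```

With these two edges, `convert(decoders, "<a|b>", "XmlDocument", "TreeModel")` goes through the intermediate "Constant" step without any decoder knowing about the full path.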
I am currently moving Uiml.net towards MSBuild files (a.k.a. Visual Studio project files) and experienced a few problems such as a restriction with Compact .NET’s OpenFileDialog.
After a bit of Googling, I finally know why I can’t seem to get the OpenFileDialog to look for files in \Program Files: it was just designed that way (do a quick search for OpenFileDialog on that page). Apparently they wanted to help users organize their files, by restricting them to the \My Documents folder. Too bad they didn’t think of what developers might want to do … I want to allow people to try UIML examples when they click the “Select UIML file” button. Of course these files reside in the Uiml.net application directory itself, so I need to allow them to pick a file outside of the My Documents folder. If that wasn’t bad enough, it further restricts you to one level of subfolders within the My Documents folder!
I also noticed MSBuild has no nice way of copying a file to another path than the one it is being referenced from. I originally wanted to copy the front-end files and vocabularies in the root directory where the Uiml.net executable is placed. Since I didn’t find a solution, I made dedicated directories for these files, and modified the code to look for them in those directories instead.
If I could tell MSBuild to copy a file to another path, I could also solve the OpenFileDialog problem just by copying the Uiml.net examples to the My Documents folder, while the rest of the application would stay in the Program Files folder.
So my (hackish) solution is now to copy all examples first into a direct subfolder of the user’s My Documents folder. Not very elegant, but it works.
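The copy step itself is simple. Here is a sketch of the logic in Python rather than in the C# that Uiml.net is written in; the function name `copy_examples`, the folder name and the `.uiml` extension filter are assumptions for illustration.

```python
import os
import shutil


def copy_examples(app_dir, documents_dir, folder_name="Uiml.net Examples"):
    """Copy every .uiml file from app_dir into a direct subfolder of documents_dir."""
    # Only one level below My Documents, since the dialog does not look deeper.
    target = os.path.join(documents_dir, folder_name)
    os.makedirs(target, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(app_dir)):
        if name.lower().endswith(".uiml"):
            shutil.copy(os.path.join(app_dir, name), target)
            copied.append(name)
    return target, copied
```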
I spent some time yesterday to update the Uiml.net homepage:
We set up a wiki somewhere last year to provide more information about our UIML research. However, since Uiml.net is the main focus of our research efforts, I reorganized the wiki to be more of a homepage for Uiml.net, while still listing our publications.
I am hoping to migrate all the content of the old website (still located at Kris’ homepage) to the wiki soon.
Lately I have been enjoying Launchpad for hosting my projects and Bazaar branches. It has integrated specifications (blueprints), bug tracking and some kind of forum (answers). I have moved Uiml.net over to Launchpad as well. Launchpad’s integration of Bazaar branches will be useful for managing multiple experimental contributions from myself, Kris or our Bachelor’s or Master’s students.
Finally I’m hoping to put out a more or less stable release of Uiml.net by September. More information can be found at Launchpad.
The software is now more polished and additionally provides a D-Bus service that allows other applications (written in any language with D-Bus bindings) to use the hardware. Currently this means you can connect to the hardware from Python, Ruby, .NET, C, C++, Perl and Pascal!
This D-Bus service lets clients start and stop reading, and emits a signal whenever a different tag (including the nil value) is read. I modified the original PyGTK GUI to use this daemon for communicating with the hardware. Furthermore, I improved the HAL support so that plugging the device in and out is detected. Unfortunately, the daemon cannot handle this yet. That’s for a next release.
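The "only signal when the tag changes" behaviour boils down to a small piece of logic. The sketch below leaves out D-Bus entirely and uses a plain callback instead of a signal; `TagWatcher` and its method names are hypothetical and not the daemon's actual interface.

```python
class TagWatcher:
    """Calls `on_change(tag)` only when the tag differs from the previous read."""

    _UNSET = object()   # sentinel so that the very first read always counts

    def __init__(self, on_change):
        self.on_change = on_change
        self.current = self._UNSET
        self.reading = False

    def start(self):
        self.reading = True

    def stop(self):
        self.reading = False

    def poll(self, tag):
        """Feed one raw read (None when no tag is in range)."""
        if self.reading and tag != self.current:
            self.current = tag
            self.on_change(tag)
```

Treating `None` (the nil value) like any other tag means listeners are also told when a tag leaves the reader's range.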
Here is the GUI (which you probably remember from the last post). Nil values are now colored red:
And this is a screenshot showing the communication between the daemon and the GUI logged with dbus-monitor:
I had some fun writing a Python wrapper around libphidgets for an RFID reader we had lying around here. To do so, I used ctypes (apparently the Python bindings for libphidgets were broken: http://ailab.ch/pipermail/libphidgets-discuss/2006-February/000442.html). To check for a connected RFID reader, I interfaced with HAL through D-Bus. Afterwards I created a simple GUI for the device with PyGTK.
Here is the result:
I experienced some weird permissions problems though. The device could only be opened with root privileges. Takis helped me step through libphidgets to see if there was a bug in it. In the end, we solved it by changing the /etc/udev/rules.d/permissions.rules file (in my case it was called 40-permissions.rules). I’m not sure if there are any security problems with this though.
I changed these lines:
# USB devices (usbfs replacement)
SUBSYSTEM=="usb_device", MODE="0664"
to:
# USB devices (usbfs replacement)
SUBSYSTEM=="usb_device", GROUP="plugdev", MODE="0664"
I will probably put the code online when I have some spare time (and after I cleaned it up a bit).
I managed to destroy my Windows XP partition a few weeks ago while trying to resize it. Fortunately I made a backup. I decided to try to run Windows inside a virtual machine. After removing the broken NTFS partition, I reused the free space to create a new /home partition.
I created a disk image with QEMU, and ran a Windows XP installation on it through VMware Player. And I must say, it works perfectly. I am even able to connect to USB devices such as a PocketPC.
In the picture below I am debugging a Compact .NET Hello World application in Visual Studio in Windows in VMware. The application is running on the PDA that is attached to the PC.
Here is a screenshot of Visual Studio running the PocketPC emulator:
Speed is not an issue: the image runs almost at native speed thanks to the kernel modules. Whenever I need Windows (I rarely do), I just fire up VMware Player, and I am able to dig right in because it saved the previous state. And when I’m finished I just suspend the virtual machine.
I don’t think I’ll go back to a dual-boot system soon.