

                    Projector-Camera System
              for Flexible Interactive Projections

           Andreas Ammer

                    Department of Numerical Analysis and Computer Science
                    Royal Institute of Technology (KTH)
                    SE-100 44 Stockholm, Sweden



                 Master’s Thesis in Computer Science (20 credits)
   within the First Degree Programme in Mathematics and Computer Science,
                           Stockholm University 2005
                      Supervisor at Nada was Lars Bretzner
                         Examiner was Yngve Sundblad
Projector-Camera System for Flexible Interactive Projections

This report describes a Master’s thesis project to create a projector-camera
system. A projector-camera system is a fairly new interface for human-computer
interaction where the user can interact with projections. The report starts off
by examining some previous work and then goes on to study different solutions
and approaches to the problems faced when constructing a projector-camera
system. After that I give a more detailed description of how the system was
built and what important decisions I made. Finally, the report goes through
some simple prototypes and lists the experiences gained when testing the
system on ourselves as well as on some outside test users.

Projektor-kamerasystem för flexibla interaktiva projektioner (Swedish abstract, translated)

This report describes a Master’s thesis project in computer science whose goal
was to create a projector-camera system. A projector-camera system is a fairly
new interface in human-computer interaction where a user can interact with
projections. The report begins by examining previous work in the area and then
goes on to study some different solutions and approaches to the problems one
faces when creating a projector-camera system. Thereafter, a more detailed
description is given of how the system was built and which important decisions
were made. Finally, a couple of simple prototypes are reviewed, together with
the experiences observed when the system was tested on experienced as well as
outside users.
1   INTRODUCTION
    1.1   MOTIVATION AND GOALS
          1.1.1 GOALS
2   RELATED WORK
    2.1   PROJECTOR-CAMERA SYSTEMS
    2.2   RELATED SYSTEMS
    2.3   FUTURE APPLICATIONS
3   OVERALL SYSTEM DESIGN
    3.1   SYSTEM OVERVIEW
    3.2   CAMERA AND PROJECTOR SETUP
    3.3   TASKS AND COMMUNICATION
          3.3.1  COMMUNICATION PROTOCOL
          3.3.2  CAMERA IMAGE PROCESSING
          3.3.3  PROJECTION CONTROLLER
    3.4   ACTIVE AREAS
    3.5   FEEDBACK
    3.6   EXAMPLE
4   CAMERA IMAGE PROCESSING
    4.1   FINGERTIP FINDING ALGORITHMS
          4.1.3  COLOR MATCHING
          4.1.4  TEMPLATE MATCHING
          4.1.5  CONCLUSION
5   DETAILED IMPLEMENTATION INFORMATION
    5.1   SYSTEM PIPELINE
    5.2   ACTIVE AREAS
    5.3   DIFFERENCE IMAGE ALGORITHM
    5.4   TEMPLATE MATCHING ALGORITHM
6   PROJECTION CONTROLLING
    6.1   OVERVIEW
    6.2   INTERACTION DELAY AND VISUAL FEEDBACK
    6.3   COMMUNICATION
    6.4   SLIDES
          6.4.1  UNIVERSALSLIDE
          6.4.2  CUSTOMSLIDE
7   PROTOTYPE EXAMPLES AND OBSERVATIONS
    7.1   PROTOTYPES
          7.1.1  SIMPLE SLIDESHOW
          7.1.2  EXAMPLE: FROGSLIDE
          7.1.3  EXAMPLE: TICTACTOESLIDE
    7.2   EXAMPLE SYSTEM RUN
    7.3   OBSERVATIONS
8   CONCLUSIONS AND FUTURE WORK
9   SUMMARY
REFERENCES
APPENDIX: SLIDESHOW.XML
1   Introduction
    As computers get more advanced and the average user gets more used to
    interacting with them, the research on creating better and more user-friendly
    interfaces is growing, along with the research on making old interfaces more
    user friendly. This Master’s thesis studies a fairly new type of interface,
    analyzes different approaches to it and implements a simple prototype. The
    interface in question is called a projector-camera system. As the name
    implies, the interface consists of two important hardware components: a
    digital video camera and a projector. The projector is used to display
    information relevant to the user on a suitable surface. The camera then
    watches the user for any reactions or interactions (see figure 1).

                                Figure 1.
                                A projector camera system.

        Depending on what the user does, the projected display changes according
    to what the system interprets that the user wants. What the system actually
    looks for can differ. One example is to watch how close the user is to the
    projected display and then update the display as the user gets closer.
    Another example is to watch for user interactions within the display, as the
    user uses hands and fingers to interact with different active elements in it.
    Since this is such a new invention there is no standard or well-known way to
    implement it.
        The report is organized as follows. First we summarize previous work in
    this area of research and also look at some similar systems (Chapter 2).
    After that we examine the hardware and software setup for this project in a
    brief overview (Chapter 3), and then study each part in detail, going through
    every implementation-related decision carefully (Chapters 4–6). Finally we
    end the report with a summary of what was done and what important lessons we
    learned.

1.1     Motivation and Goals
        Why is a projector-camera system needed? Other user interfaces such as
        touch screen monitors or interactive computer terminals already exist and
        provide a solid base to choose from. The advantage of a projector is that
        the image can be displayed in a number of different locations and sizes. A
        projection can be several meters wide or just a few decimeters, whereas
        the size of a monitor is fixed and not adjustable. The routing of power
        cables and the like is simpler with a projector, as a projector is often
        placed where it cannot be seen. Another big advantage is that when the
        projector is turned off the projection area is empty and free to use for
        other things, whereas a monitor that is turned off is just a waste of
        space.
            A projector-camera system could for instance be used on a restaurant
        table to show the menu; once the customer has decided what to eat and
        drink, the projection would terminate and the table would be free to eat
        off of. Another kind of application is an information kiosk where a user
        gathers useful information. In this case the information can be displayed
        nearly anywhere on a vacant surface. Information can include directions to
        different parts of a large shopping mall or information about certain
        articles, such as available sizes and how many are in stock.

1.1.1   Goals
        The goal of this project is to first of all get a good understanding of the
        problems and characteristics of a Projector-Camera System. We want to create
        a fully functional prototype with one or two example implementations, such as
        an information kiosk and a simple game. Both the projector and the camera
        will be of standard configuration, no specific hardware components will be
        used. An important goal we want to fulfill is that once the system is finished,
        the process of creating new applications such as an information kiosk should be
        simple enough for anyone to implement it. The goal is not to create one single
        prototype, but create a system that can handle lots of different prototypes
        specified by a user. We will not include a usability study in this paper but the
        possibility to do one in the future is open since the system will be flexible
        enough to create new prototypes to test.
            Many problems will come up in the process of creating this system.
        Some of the big problems to be solved are:
        • We will need to find an appropriate application to implement, as a way
          to motivate the construction of this system.
        • Depending on the application, we need an efficient way to identify user
          interaction.
        • We need to make sure the interface is simple enough for an
          inexperienced user to handle.

2     Related Work
      Most of the previous work in this area is theoretical research done in
      connection with evaluations or discussions of other types of human-computer
      interfaces. In this chapter we will study some of the existing practical
      implementations of Projector-camera systems.

2.1   Projector-Camera Systems
      Scientists at Siemens, such as C. Maggioni, have written several articles in
      this area, the early ones being strictly theoretical since the hardware that
      existed at the time could not support a real-time system. One of the first
      articles written in this area [Maggioni, Cristoph. 1995] discusses the use
      of hands and gestures as a big part of human-to-human interaction, and how
      they might also be employed in human-computer interaction. As computers grew
      more powerful, the theoretical work could increasingly be implemented as
      practically working applications.

              Figure 2.
              Left: the Everywhere Display projector system of IBM.
              Right: the round table with three items highlighted.

          A man who has been leading in inventing and creating implementations of
      projector-camera systems is the Frenchman Francois Bérard. His work in this
      area of research includes “The Magic Table” [Bérard. 2003], where users
      interact with a projection using colored magnets. In another project by
      Bérard, described in the article “Bare-Hand Human-Computer Interaction”
      [Bérard, Hardenberg. 2001], they successfully displayed several software
      applications, such as an internet browser where the user could click on
      links and scroll down the page. In this article Bérard describes an advanced
      algorithm to locate the user’s fingers as well as the user’s hands, and
      includes a wide range of different commands. For instance, scrolling was
      applied by spreading all fingers and moving the hand up and down, while
      clicking a link was done using only one stretched finger.

          Another team of researchers, working for IBM, Claudio Pinhanez et al.,
      have implemented a number of working prototypes. Research has been done
      over a wide range of ideas; basically they all work to give information to
      the user in different situations. In one article they describe four working
      implementations, including the Retail Store Application (RSA) [Pinhanez,
      Kjeldsen. 2003a]. The
      RSA displays information on a projection area next to a stack of clothes in a
      store and changes display depending on how close the user is and which type
      of clothes the user currently is looking at. Information in this case was the
      different sizes available and how many articles were in stock. A similar idea
      was also tried on a round table where the displays changed along the side of the
      table as the customer walked around it showing detailed information about the
      item closest to the customer (see figure 2 right). The round table is described
      more closely in an article [Pinhanez, Kjeldsen. 2003b] that also discussed the
      hardware setup on a more detailed level. The most striking feature of the
      systems developed by the IBM team is that the camera and the projector are
      mounted upon a steerable motorized arm (see figure 2 left). This means they
      can control where the projection shall be displayed, and with software they are
      able to correct image distortion in real time. The system can project images on
      nearly any surface available close enough to the projector. The IBM team has
      also created a system, more like Bérard’s Bare-Hand, where the user interacts
      with a display by pointing and clicking in it [Pinhanez, Kjeldsen. 2002]. The
      system was sensitive to movement whereas Bérard had a system that was able
      to find hands and fingers that were not moving. This meant that to register a
      ‘click’ from the user some sort of movement was required, and the solution
      was a forward motion followed by a retracting motion.
          Siemens’ researchers have not only theoretical results to show. In chapter 2
      in the book [Cipolla, Pentland. 1998], Maggioni and Kämmerer discuss several
      different implementations and ideas. They created an Information Station
      where a user could gather useful information about various products by
      interacting with his hands in the projection.

2.2   Related Systems
      Many applications similar to standard Projector-Camera systems exist but with
      the demand of special hardware or with a different setup. A team of Swiss and
      American scientists has designed a Projector-Camera system [Starner, Leibe.
      2003] with one significant difference in the hardware setup: the camera
      looks at the infrared (IR) spectrum; in effect it looks for heat signatures,
      where a regular camera just looks for any moving object. This solution has
      the advantage that you know almost for sure that it is a person interacting
      with the system and not a pen or other finger-shaped object. During their
      tests, items like coffee cups and heated food appeared but were easily
      filtered away, since these objects look nothing like a hand or finger. The
      big disadvantage is that the hardware is very specific, whereas the regular
      projector-camera system in theory could use any camera and projector
      available.
          The PDS (Portable Display System) created by Stanislav Borkowski et al.
      [Borkowski, Riff, Crowley. 2003] is a Projector-Camera system where the
      projector and camera are mounted on a motorized rack and the user can direct
      the projection using a small rectangular piece of cardboard (see figure 3). The
      camera finds the cardboard and follows it, telling the projector to do the same.
      It is similar to the system IBM used, but where IBM used theirs to give
information, this one only changes the projection area without shutting down
and restarting or recalibrating the system. The PDS also lacks any interactive
projections, which is what we will implement.

        Figure 3.
        The PDS system. To the left is the hand held cardboard and to the right a
        projection has successfully been projected upon it.

    A project that is very relevant to this Master’s thesis, even though it is
not a projector-camera system, is work by Zhengyou Zhang at Microsoft Research
[Zhang. 2003]. In the paper Zhang goes through a number of applications that
are controlled with finger and hand gestures captured by a camera (see figure
4). This project does not, however, use a projector.

    Figure 4.
    One of the interactive implementations of Zhengyou Zhang. The user can interact
    with the displayed image by using her hands.

    In the area of recognizing fingers and hands, Alexandra L.N. Wong et al.
have constructed a hand scanner that can identify a unique user by looking at
the hand [Wong. 2002]. They have managed to create a system that can very
accurately identify all parts of a user’s hand. They do this, however, by using
a specially designed high-resolution hand scanner. Although the technique is
interesting, it requires input images of extremely high resolution, nothing
like what a standard digital video camera of today can produce. And since we
are working with live video, the technique is also too slow on today’s
computers.
    Mark Ashdown and Peter Robinson are two researchers at the University of
Cambridge. They have created a personal workspace they call Personal

Projected Display [Ashdown, Robinson. 2003]. A regular desk is used to
project a large desktop workspace where the user can interact with the display
using an electronic pen (see figure 5). Two projectors are used, one to display
the major part of the workspace and another with higher resolution to view
documents and other text in more detail. The electronic pen was a tool that
proved to be an intuitive and easy to learn interface that even the most
inexperienced user could learn to master reasonably fast.

    Figure 5.
    The Personal Projected Display. In practical use to the left and the hardware
    setup sketch to the right.

    For more accurate location of the fingertips, two scientists at Brigham
Young University created a system with two cameras [Fails, Olsen. 2002],
giving the system the ability to locate the user’s hand in a 3D environment.
This project did not include a projector, but the use of two cameras for a
better ability to locate the hand is certainly an interesting implementation
feature.

2.3   Future Applications
      The research in this area is ongoing and develops more as new hardware and
      faster and smaller computers can be used by the scientists. Cameras are now
      small enough to be mounted into personal cell phones for a small amount of
      money. Scientists predict that soon projectors too will be small and cheap
      enough that installing them into cell phones becomes possible. This means
      every cell phone could be a portable Projector-Camera system and the
      applications they could support are limited only by the imagination of the
      scientists. One research team is working on a virtual keyboard [Virtual Devices
      Inc. 2004], where the phone projects a keyboard on any available surface and
      the camera looks for fingertips in the area (see figure 6).

            Figure 6.
            A virtual keyboard built into a PDA makes typing easier as long as there is
            a large enough planar area around.

3     Overall System Design
      In this chapter we give a brief overview of the system design and of the
      first decisions made regarding software development.

3.1   System Overview
      The projector-camera system consists of two parts: one part controls the
      projector and what images to show; we call this the Projection Controller
      or PC. The other part controls the camera, taking care of the images the
      camera captures; this part is called Camera Image Processing or CIP (see
      figure 7). In theory the system could be constructed as one big application
      that handles everything, but as the two parts are significantly different
      in functionality and implementation, splitting them up makes each
      implementation easier. It also makes it easier to increase the efficiency
      of the system, as each component does only what it is good at doing.
          All user interaction will be based on fingers; algorithms for finding
      hands, faces or other body parts could be implemented instead, but we chose
      fingers, more specifically fingertips, because the fingertip is the most
      intuitive pointing tool humans have. Which fingertip finding algorithm to
      use will be explained later on, in chapter 4.

                      Figure 7.
                      Simple layout sketch of the system. PC stands for Projection
                      Controller and CIP for Camera Image Processing

3.2     Camera and Projector Setup
        When positioning the camera and projector there are several things to
        take into consideration. In this Master’s thesis we will use one regular
        camera and one regular projector. Previous works have used many different
        hardware setups, for instance two cameras for better depth perception, an
        infrared camera, or multiple projectors for better projections. The goal
        of this project is to use components as generic as possible, thereby not
        limiting the usability to specific hardware. Our setup requires that the
        camera and projector are aligned, so that the camera observes the exact
        image the projector is showing. Software to correct the camera view or
        the projection exists and has been used in several similar projects, for
        instance the PDS mentioned earlier [Borkowski, Riff, Crowley. 2003]. We
        decided not to use any such software, partly because research in that
        area is already well covered, partly because we want to limit the work to
        be done in this project.

3.3     Tasks and Communication
        The communication between the Projection Controller and the Camera Image
        Processing will take place over sockets. We will use a simple
        client-server model.

3.3.1   Communication Protocol
        We decided to make the CIP the server side. This has advantages such as
        allowing several clients to connect from different places for access to
        information about the state and current settings of the camera. The
        projection controller side only contains information about the different
        images that can be projected and is more suitable to act as the client.
            Most of the data that will be sent between the two parts will be numbers
        which makes it simple to implement a protocol. We want the system to be able
        to handle more advanced commands for possible future modifications. We
        decided a text based communication protocol would be more generic and open
        for future expansions.
            Data sent from the camera to the projection controller will only
        consist of coordinates where a finger was found, while data sent from the
        projection controller to the camera will contain a number of commands.
        The projection controller will also send coordinates whenever the active
        areas in the projection have been updated. In effect, the projection
        controller acts as a controller of the camera application, using commands
        in the form of strings.
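        The exact message grammar is not specified at this point in the thesis;
        a minimal line-based text protocol in this spirit could look as follows.
        The command names FINGER and AREAS and the field layout are illustrative
        assumptions, not the actual grammar used in the system.

```python
# Minimal sketch of a line-based text protocol between the Projection
# Controller (client) and the Camera Image Processing server.
# Command names and field layout are illustrative, not the thesis's grammar.

def encode_finger(x, y):
    """CIP -> PC: coordinates of the best fingertip match."""
    return f"FINGER {x} {y}\n"

def encode_areas(areas):
    """PC -> CIP: the current slide's active areas as (x, y, w, h) tuples."""
    body = " ".join(f"{x} {y} {w} {h}" for (x, y, w, h) in areas)
    return f"AREAS {len(areas)} {body}\n"

def decode(line):
    """Parse one received line into (command, list of integer arguments)."""
    parts = line.split()
    return parts[0], [int(p) for p in parts[1:]]
```

        Because every message is a single line of whitespace-separated tokens,
        new commands can be added later without changing the parser, which
        matches the wish to keep the protocol open for future expansion.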

3.3.2   Camera Image Processing
        The CIP takes every image captured by the camera and searches it for
        fingertips. Searching for patterns, such as fingertips, in images is
        computationally very demanding. To reduce the work, the areas in which
        the algorithm searches are limited to the specified active areas; this
        means we can disregard parts of the image as unimportant and need not
        waste computing power on them. The result of the algorithm is a position
        representing the best match for a fingertip found in the image. The
        coordinates of this position are sent to the projection controller, where
        they are examined and acted upon.

3.3.3   Projection Controller
        The task of the projection controller is to store all information about
        the images shown by the projector. This includes data such as where the
        active areas are within each image, what happens when a specific active
        area is activated and, if an active area is linked to another image, that
        link must be stored as well. All this information can be stored in a file
        for easy access and the ability to save a specific system setup.
        Coordinates of identified fingertips will continuously be received from
        the camera control. The projection controller will examine whether these
        coordinates are within an active area, and if they lead to activation of
        an area the appropriate action will be taken. Each time a new image is
        projected, the coordinates of its active areas need to be sent to the
        camera control for more efficient image processing.
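        This dispatch logic can be sketched as follows: active areas are
        rectangles with an attached action, and incoming fingertip coordinates
        are tested against them. The class and function names are illustrative,
        not taken from the thesis code.

```python
# Sketch of the projection controller's handling of incoming fingertip
# coordinates: test them against the current slide's active areas and
# return the action of the area that was hit. Names are illustrative.

class ActiveArea:
    def __init__(self, x, y, w, h, action):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.action = action  # callable to run when the area is activated

    def contains(self, px, py):
        """True when point (px, py) lies inside this rectangle."""
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

def handle_fingertip(areas, px, py):
    """Return the action of the first area containing (px, py), else None."""
    for area in areas:
        if area.contains(px, py):
            return area.action
    return None
```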

3.4     Active Areas
        The image processing will go through each image read from the camera feed
        looking for user inputs, in our case fingertips. To simplify detection and reduce
        computational load we decided to define special active areas in the projection
        where the user interaction takes place. This means we can eliminate parts of the
        image when looking for fingertips which will increase the speed of the system.
        It also means we can have moving elements displayed, such as movies or
        animations without it interfering with the fingertip finding algorithm by placing
        the moving elements outside the active areas. Where these active areas will be
        stored is another decision we need to make. Initially all information about
        where the active areas are will be on the projection controller side of the
        application since that is where the projection is defined. Either the active areas
        can be sent to the camera control during runtime as they change or they can all
        be sent in the initial phase after they have been read and stored. A
        problem with sending all of them initially arises if the projection
        contains dynamic elements, i.e. images that are generated during runtime
        with new active areas. To have a system that can handle both static and
        dynamic images, we will implement continuous reporting of active areas.
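        The restriction of the fingertip search to the active areas can be
        sketched as follows; `search_fn` stands in for whatever fingertip-scoring
        routine is used, and its (score, (row, col)) interface is an assumption.

```python
import numpy as np

# Sketch of searching only inside the active areas: each (x, y, w, h)
# rectangle is cut out of the camera image and handed to a scoring
# routine, so pixels outside the areas are never processed.

def search_active_areas(image, areas, search_fn):
    """Return the best (score, (x, y)) over all areas, in image coordinates."""
    best_score, best_pos = float("-inf"), None
    for (x, y, w, h) in areas:
        sub = image[y:y + h, x:x + w]          # a view, no pixel copying
        score, (row, col) = search_fn(sub)     # best match inside this area
        if score > best_score:
            best_score, best_pos = score, (x + col, y + row)
    return best_score, best_pos
```

        Because NumPy slices are views rather than copies, limiting the search
        this way costs nothing extra in memory while skipping all pixels
        outside the active areas.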

3.5     Feedback
        One of the big problems with some existing Projector-Camera systems is the
        lack of appropriate feedback to the user. The researchers at IBM could
        conclude this after testing their Projector-Camera systems on inexperienced
        users [Kjeldsen, Pinhanez. 2003c]. The only thing a user can see and get
        information from in a Projector-Camera system is the projection, and a
        projection has no tactile or physical feedback. Lacking a physical form,
        we still have four senses to work with: sight, hearing, smell and taste.
        We quickly rule out smell and taste, since they are practically
        impossible to implement with today’s hardware, even though implementing
        feedback for these two would be both interesting and a challenge. This
        leaves us with hearing and sight.
            The way we wanted the active areas to work was with a certain time
        delay: if a user holds his/her finger over an active area for a number of
        seconds, that area becomes active. Adding audio feedback to such a
        function would mean playing sounds for several seconds at a time. Since
        every button has a time delay, we could use a sound that intensifies
        until it finally registers the “click”. The sound could instead be played
        only when the area is activated, but then there would be no feedback
        during the actual “click”, so the feedback would still be poor. Playing
        sounds for several seconds every time the user interacts with the system
        would probably be more annoying than helpful in the long run, depending
        on the scenario and the application. Another problem with audio feedback
        is where to place the speakers; if the sound comes from behind the user,
        for instance, it would probably be more confusing than helpful. The best
        way to implement audio feedback would be in combination with visual
        feedback.
            With only visual feedback left, we need to find some intuitive way
        of implementing it. Visual indication of user interaction can be done in
        a number of ways; all we need to do is find a good one. One option is
        some kind of indicator that fills up whenever an active area is being
        activated and correspondingly empties if the activation stops. The design
        and the position of this indicator are another issue. If there are lots
        of active areas, there could either be one indicator used for all of them
        or every active area could have its own. The important thing is to make
        sure the indicator does not interfere with any active area, since changes
        in the display will show up as potential fingertips in the fingertip
        finding algorithm. Consistency is important when creating a good user
        interface.

        Figure 8.
        Left, the problem with IBM’s system was the lack of feedback.
        Right, one of our prototypes. The circle filling up with red indicates the user is
        successful in his actions.

    If we create a prototype using just one indicator per image, no matter how
many active areas the image has, it would be best that all images in the system
were built the same way. However, if we created another prototype we could use
another indicator placed somewhere else, and if we created a projector-camera
system for experienced users only we could mix different indicators. This shows
that, depending on what kind of prototype we create and which users it aims at,
the design of the images and indicators may differ a lot. We decided to
implement some different indicators and different positionings of them, to show
how it can be done rather than how it should be done. We want a system that is
usable even for the most inexperienced user, and we adjust the design
accordingly.
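    The dwell-based activation with a filling indicator described above can be
sketched as follows; the timing values and names are illustrative, not the
thesis's implementation.

```python
# Sketch of dwell-to-click: an indicator fills while the fingertip stays
# over an active area and drains when it leaves, registering a "click"
# once full. Timing values are illustrative.

class DwellIndicator:
    def __init__(self, dwell_time=2.0, drain_rate=2.0):
        self.dwell_time = dwell_time   # seconds of hovering needed for a click
        self.drain_rate = drain_rate   # how fast the indicator empties
        self.level = 0.0               # 0.0 (empty) .. 1.0 (full = click)

    def update(self, hovering, dt):
        """Advance by dt seconds; return True when a click registers."""
        if hovering:
            self.level = min(1.0, self.level + dt / self.dwell_time)
        else:
            self.level = max(0.0, self.level - self.drain_rate * dt / self.dwell_time)
        if self.level >= 1.0:
            self.level = 0.0           # reset so the area can be re-armed
            return True
        return False
```

    Called once per camera frame with the elapsed time `dt`, `level` can be
drawn directly as the fill fraction of the on-screen indicator.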

3.6   Example
      The example we will go through (see figures 9–11) is named “The Frog-slide”.
      It consists of a single frame with two buttons and one small image of a
      frog, all in all three active areas. The two buttons are fixed and do not
      move when activated; one is named “Reset”, the other “Exit”. Activating
      “Reset” will make the frog go to the center of the display, while
      activating “Exit” will make the system shut down. Pointing at the frog will
      make it jump to a random position somewhere in the display.

        Figure 9.
        Left. The frog slide with the frog in the initial position.
        Right. The user points at the frog.

        Figure 10.
        Left. The frog has now changed position.
        Right. The user is activating the Reset button.

        Figure 11.
        Left. The frog has been returned to its initial position.
        Right. The user is activating the Exit button.

4       Camera Image Processing
        We have chosen to use fingertips as the users’ way to interact with the
        system and will concentrate on algorithms that suit this kind of problem.
        To give a good overview of existing algorithms, this chapter goes through
        the relevant methods, ending with a conclusion and analysis.

4.1     Fingertip Finding Algorithms
        An intuitive way for a human to interact with any computerized user interface
        is with his/her hands. The function of a fingertip finding algorithm is to locate
        and identify any interaction from the user. The most significant features of our
        hands are the fingers, which is what the algorithms listed below try to locate.
        Most such algorithms consist of four basic steps:
        1. Fetch the last image read by the camera.
        2. Motion detection: calculate a difference image to locate interference
            or movement in the projection area.
        3. Shape and/or color matching to see if the interference in step 2
            actually was a finger.
        4. Store the coordinates of any detected fingertips.
            We will now look at three different algorithms for motion detection and
        then discuss some different approaches to template matching.

4.1.1   Motion Detection with a Static Reference Image
        The first image difference algorithm we will look at uses a static reference
        image (see figures 12–14). When the reference image algorithm starts up,
        it captures and stores the first image displayed by the projector – the
        reference image. If the number of possible projections is known from the
        start, every one of them can be captured and stored during initialization.
        When the system is up and running, the latest image captured
        by the camera is compared with the reference image. This is done pixel by
        pixel in all three different base channels (red, green and blue) of the images as
        described in equation (1). The resulting difference image is a binary (black and
        white color only) image where any interference strong enough to pass the given
        threshold is marked black while white indicates no significant difference was
        found. For a more detailed description of the algorithm the reader is referred to
        the chapter “Detailed Implementation Information”.

          I_diff = |I_cur(R) − I_ref(R)| + |I_cur(G) − I_ref(G)| + |I_cur(B) − I_ref(B)|   (1)

            The static reference image algorithm is best suited for static
        projections such as menu systems or information displays where the
        background does not change while the system is running. A big advantage of
        the static reference image is that it can register fingers that are held
        still, which makes it easier to recognize user input. On the other hand, a
        big drawback is that it cannot handle large changes in environment and
        lighting, or situations where the images displayed by the projector are
        created while the system is running, making it impossible to store the
        reference images beforehand. For example, in a tic-tac-toe game where the
        next possible image depends on where the user chooses to set his/her mark,
        it is impossible to store every possible reference image. It is important
        to be able to store the images before the system is up and running, because
        if reference images were taken later on, a user might have her hands within
        the projection area, and the reference image would then contain a hand or a
        finger. If this is the case, the difference calculation will always find a
        fingertip where the accidental imprint occurred.

 Figure 12. Live feed from the camera.
 Figure 13. Reference image taken in the initialization phase.

Figure 14.
Difference image resulting from the difference algorithm.
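The per-pixel, channel-wise difference of equation (1) can be sketched as follows; the pixel structure and the function name are illustrative assumptions rather than the thesis code:

```cpp
#include <cstdlib>
#include <vector>

// Sketch of equation (1): the per-pixel RGB difference against a static
// reference image, thresholded into a binary image. A "true" entry plays
// the role of a black pixel (significant difference), "false" of white.
struct RGB { int r, g, b; };

std::vector<bool> staticReferenceDiff(const std::vector<RGB>& cur,
                                      const std::vector<RGB>& ref,
                                      int threshold) {
    std::vector<bool> mask(cur.size(), false);
    for (std::size_t i = 0; i < cur.size(); ++i) {
        // Sum the absolute differences over the three base channels.
        int d = std::abs(cur[i].r - ref[i].r)
              + std::abs(cur[i].g - ref[i].g)
              + std::abs(cur[i].b - ref[i].b);
        mask[i] = d > threshold;
    }
    return mask;
}
```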

4.1.2   Motion Detection with Frequently Updated Reference Images
        Where the static reference image algorithm was more suited for static
        projections in stable environments, movement detection works under almost
        any conditions. Instead of taking the reference image in the initial state
        of the system, this algorithm treats every image as a reference image for
        the following image. That way, slow changes in the display area are
        disregarded.
            Difference is calculated just as in the static reference image algorithm, pixel
        by pixel in all three channels (RGB) of the image; with a binary (black and
        white) image as a result, see equation (2). But where the static reference
        algorithm uses a fixed image as reference this algorithm uses one that is n
        frames old I(t-n). Black indicates a strong enough difference to pass the
        threshold and white indicates that no significant difference was found.
            The big advantage with this algorithm is that because it always updates the
        reference image, the background and environment as well as the projection can
        change, but the algorithm will still be able to find the difference in two
        consecutive images.

         I_diff = |I_t0(R) − I_(t−n)(R)| + |I_t0(G) − I_(t−n)(G)| + |I_t0(B) − I_(t−n)(B)|   (2)
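Keeping the n-frames-old reference I(t−n) amounts to maintaining a short frame history; a minimal sketch (illustrative names, not the actual implementation) could look like this:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Sketch of keeping an n-frames-old reference for the frequently updated
// reference algorithm: each new frame is pushed in, and the frame from n
// steps back serves as I(t-n) in equation (2).
using Frame = std::vector<int>;

class FrameHistory {
public:
    explicit FrameHistory(std::size_t n) : n_(n) {}

    // Push the newest frame; returns the frame n steps old once enough
    // frames have accumulated, otherwise the oldest one available.
    const Frame& push(const Frame& f) {
        frames_.push_back(f);
        if (frames_.size() > n_ + 1)
            frames_.pop_front();
        return frames_.front();       // this is I(t-n)
    }

private:
    std::size_t n_;
    std::deque<Frame> frames_;
};
```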

             Where this method solves the problem of changes in the display it also
        creates a problem in finding a good way of identifying user interaction. In the
        case with static reference image the user can just hold the hand or finger over a
        specific area and after a certain time some action can be taken. This is not the
        case with this movement detection because as soon as the user holds the hand
        still it becomes invisible for the algorithm; in effect the calculated difference
        image of a user holding the hand still will be completely white. Any action
        from the user must now instead be based upon movement, for instance waving
        or “rubbing” a specific spot with a finger. The researchers at IBM [Pinhanez,
        Kjeldsen. 2003c] used such a movement-based algorithm and registered a click
        as a movement forward followed by a movement backwards. This is not the
        most intuitive way for a user to issue commands, since a forward motion
        followed by a backward one could just as well mean that the user realized
        the action she was going for was the wrong one.

4.1.3   Color Matching
        Color matching is a method to locate objects of a specific color in an image. It
        can be the color of the skin of the hand, which only works if the projecting area
        is separated from the area for user interaction, as in the previous master thesis
        [Bodda. 2003] by Gabriele Bodda, or the color of some specific object like the
        magnets used by Bérard in The Magic Table [Bérard. 2003]. The resulting
        difference image however is still most often represented as a binary image. The
        fact that the projector’s display distorts the colors of all objects in the
        projection area makes it hard to implement this algorithm in a regular
        Projector-Camera system, but it is possible under some circumstances.
        Restricting the colors being projected or using very clear colors in the projected
        images could be a solution.

4.1.4   Template Matching
        In most fingertip finding algorithms, shape matching of some sort is a must.
        Template matching is a matching algorithm that uses a specific pattern, a
        template, which it searches for in the difference image (see figure 16). After
        the difference between the reference image and the live feed image has been
        calculated a binary black and white image has been generated and in that image
        all potential fingers and hands need to be located. Depending on what kind
        of algorithm was used to create the difference image and what precision is
        needed, the choice of template can vary greatly. If a static reference
        image algorithm was used prior to the template pixel match, a bigger and
        more fingerlike template is appropriate, whereas if a movement detection
        algorithm was used, a smaller and thinner template works better, since the
        difference between two consecutive images has different characteristics.
        All pixel matching algorithms basically work in the same way: the smaller
        template image is compared to all sub-regions of the larger difference
        image. The number of sub-regions is given by equation (3); for instance, a
        10*10 pixel difference image with a 5*5 template image requires 36
        matchings.

        # regions = (Width_diff − Width_template + 1) * (Height_diff − Height_template + 1)   (3)

                 Figure 15.
                 The fingertip template of Bérard.
                 Where d1 is the diameter of the little finger and d2 the diameter of the thumb.
                 [Hardenberg, Bérard. 2001]

            The easiest way to do a template match is to go through both pictures
        pixel by pixel and sum the absolute differences (see equation 4). For every
        position [row, col] the algorithm produces an error describing the
        difference between that part of the image and the template. To normalize
        the algorithm we divide by the number of template pixels N; that way the
        overall scale of the template and image part has no effect on the error.

    error[row, col] = (1/N) * Σ_(u,v) |Image[row − u, col − v] − Template[u, v]|   (4)
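A sketch of the normalized pixel match of equation (4), scanning every sub-region position counted by equation (3); the grayscale image struct and the function names are assumptions for illustration:

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Normalized sum-of-absolute-differences template match, as in
// equation (4), over a row-major grayscale image.
struct Gray {
    std::vector<int> px;
    int w, h;
    int at(int x, int y) const { return px[y * w + x]; }
};

// Error for the sub-region whose upper-left corner is (x, y),
// normalized by the number of template pixels.
double matchError(const Gray& img, const Gray& tmpl, int x, int y) {
    double sum = 0.0;
    for (int v = 0; v < tmpl.h; ++v)
        for (int u = 0; u < tmpl.w; ++u)
            sum += std::abs(img.at(x + u, y + v) - tmpl.at(u, v));
    return sum / (tmpl.w * tmpl.h);
}

// Scan all sub-regions (equation (3) counts them) and return the
// position with the lowest error.
std::pair<int, int> bestMatch(const Gray& img, const Gray& tmpl) {
    std::pair<int, int> best{0, 0};
    double bestErr = 1e9;
    for (int y = 0; y + tmpl.h <= img.h; ++y)
        for (int x = 0; x + tmpl.w <= img.w; ++x) {
            double e = matchError(img, tmpl, x, y);
            if (e < bestErr) { bestErr = e; best = {x, y}; }
        }
    return best;
}
```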

    One type of template is described by F. Bérard in one of his articles
[Hardenberg, Bérard. 2001] (see figure 15). Another type of template is the
one used by Gabriele Bodda in the previous master thesis from this institution
[Bodda. 2003]. He uses a template that looks more like a finger than a
fingertip, which is otherwise the case with most templates. This makes the
algorithm better at finding actual fingertips, but at the cost of having to
rotate the template at each sub-region. Combining both the rotation and the
pixel test in one algorithm could cause problems with the speed at which the
system can handle incoming images.

            Figure 16.
            Three different templates.
            (a) A round template for static reference image algorithms.
            (b) Another template for static reference image algorithm, this one
                needs to be rotated.
            (c) A template for movement based algorithms.

    When determining which type of template and pixel matching algorithm to
use it is important to take into consideration what kind of hardware the
application will run on. Using a simple round template together with a simple
pixel match algorithm means that the system does not have to be state of the
art, since the computational load will be lighter. But it also means that
images can be misclassified, as differences that do not look like fingertips
at all may still score highly in the pixel match algorithm.
    Finally, one last important factor to take into consideration when choosing
a template is its size. For instance, if the distance between the camera and
the area where interaction takes place is great, fingers will appear smaller
and the template needs to be scaled down; similarly, it needs to be scaled up
if the distance is very short. Other size factors to consider are whether the
system should work well for children as well as for grownups, and whether the
system can change projection surface while running, like
the Everywhere Display of the IBM researchers [Pinhanez, Kjeldsen. 2003]. In

        these cases it can be a good idea to be able to change template as the system is
        running or to have a template of regular size and a pixel matching algorithm
        that is not too precise.

4.1.5   Conclusion
        When choosing a difference image algorithm for this master thesis we
        weighed the positive and negative sides of static reference images and
        motion detection against each other and decided on the static reference
        image. We want a system where the user interaction is intuitive and where
        there are good possibilities to give useful feedback, and we felt that
        those two qualities were the hardest to implement well using motion
        detection. A
        static reference image is also suitable when the content of the images being
        projected and the projection area are known from start, which is what we had
        in mind in our implementation.
            We will implement two prototypes of the Projector-Camera system, one
        with static menu elements which a user can go through for information or
        control over some appliance and one with a more dynamic behavior where the
        slides are altered during runtime. Here a motion detection algorithm might
        be preferred, but we felt that consistency was important: if a new
        algorithm were introduced, the way the user interacts with the system
        would have to change too. Instead we make the dynamic imagery work with
        the static
        reference image algorithm by disregarding certain areas when running the
        finger finding algorithm.
            Choosing a template is very specific to how the system should behave and
        what kind of hardware is available. Since the hardware in this case is sufficient
        we instead took into consideration how the system should act under certain
        circumstances. The system should be easy to use for everyone, children as
        well as adults; we therefore chose a round template of regular size. To go
        with it we implemented a simple pixel-matching algorithm that awards
        points for each matching pixel, and set a low threshold value so that
        fingers of all sizes, from small to large, will get the system to respond.
        This may give a number of misclassified images, but hopefully not so many
        that they interfere with the stability and functionality of the system,
        something that will show in a later chapter.
5     Detailed Implementation Information
      In this chapter we will describe on a more detailed level how we implemented
      the different parts of the Camera Image Processing. We have chosen to write
      the image processing in the C++ programming language, since it is
      traditionally considered more efficient for computationally expensive
      algorithms. The image processing library we chose to use – Halcon – also
      has full support for C++.

5.1   System Pipeline
      The system works in a 6-step pipeline (see figure 17).
      1. Initializing phase that is only done once.
      2. If needed, new and updated active areas will be sent.
      3. The system reads a new image from the camera.
      4. A difference image is calculated using a difference image algorithm.
      5. Template matching generates coordinates.
      6. Information is sent through the socket. Back to step 2.
          Steps 3–5 are executed on the Camera Image Processing side of the
      application. What we need to establish is what data to send in step 6. If
      the template matching was unable to find any coordinates, no data needs to
      be sent at all; sending is only necessary when user input has been
      registered. Most of the time the system will be idle and no information
      will be sent, as no fingertips will be found.

                             Figure 17.
                             System pipeline overview.

5.2   Active Areas
      The active areas are designed to make the fingertip-finding algorithm go faster
      and reduce possible mismatches. By eliminating parts of the projection, and
      thereby parts of the image read by the camera, from the fingertip search,
      two major advantages arise. First, the fingertip-finding algorithm is the
      most CPU-demanding process in the application, and by greatly reducing the
      amount of data to search through, the program will run a lot faster.
      Second, within the areas that have been eliminated, moving objects and
      animations can now be shown without interfering with the reference-image-
      based difference algorithm.
          Active areas are defined as Rectangles: a coordinate for the upper left
      corner, plus a width and a height (see figure 18). These Rectangles are stored in the
      projection controller for each image in the system. Every time an active area
      gets activated, from user interaction, the projection controller will send new
      active areas, and while it sends them it will also change the projected image.

                        Figure 18.
                        Active areas marked in the image.
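The rectangle representation above implies a simple containment test for deciding whether a reported fingertip coordinate falls inside an active area; this sketch uses illustrative names and is not the thesis code:

```cpp
#include <vector>

// Sketch of an active area as an upper-left corner plus width and
// height, with the point-in-rectangle test used for hit testing.
struct ActiveArea {
    int x, y, w, h;    // upper-left corner, width, height
    int id;            // which slide element this area belongs to

    bool contains(int px, int py) const {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
};

// Return the id of the first active area containing the point, or -1
// when the point lies outside every area.
int hitTest(const std::vector<ActiveArea>& areas, int px, int py) {
    for (const auto& a : areas)
        if (a.contains(px, py))
            return a.id;
    return -1;
}
```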

5.3   Difference Image Algorithm
      We chose a difference image algorithm based on a static reference image,
      with the added ability to handle multiple reference images, all taken in
      the initial state of the application. For each new image that is captured,
      the difference is calculated between it and the corresponding reference
      image. This is done by first splitting the images into the three color
      sub-channels, red, green and blue, and then doing a pixel-by-pixel
      subtraction in each channel. Using a specified threshold, three binary
      difference images are generated where a black pixel marks a difference strong
      enough to pass the threshold value. The three images are merged into one
      picture, again using a pixel-by-pixel match where only matching black pixels
      are saved.

                   For each image
                     For all color channels
                       For each pixel
                         If |pixel_img − pixel_ref| > T
                           Pixel_diff = black
                         Else
                           Pixel_diff = white
                     Diff = diff_r AND diff_g AND diff_b
                     Remove single pixels and small clusters of black pixels from Diff
                     Return Diff

                    Pseudo code.
                    Simple code for the difference image algorithm.

          To eliminate insignificant differences we use connected components, a
      method that only keeps large collections of black pixels, while smaller
      collections of one to a few pixels are discarded. The elimination uses
      8-neighbourhood: for each black pixel the algorithm looks at all eight
      surrounding pixels, and if none of them is black the pixel is removed.
      After eliminating single pixels, another pass removes larger groups of
      pixels that are still too small to be matches: it counts the number of
      connected pixels and, if they are fewer than a specified amount, eliminates
      the whole group. The required group size must be adapted to the size of the
      template image currently in use. The resulting picture is then passed on to
      the fingertip-finding algorithm.
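The clean-up step can be sketched as a flood fill over 8-connected black pixels that erases components below a minimum size; a minimal illustrative version (binary image as an int vector, hypothetical names):

```cpp
#include <vector>

// Sketch of the connected-components clean-up: flood-fill each group of
// black pixels (value 1) using 8-neighbourhood and erase groups smaller
// than minSize.
void removeSmallClusters(std::vector<int>& img, int w, int h, int minSize) {
    std::vector<int> visited(img.size(), 0);
    for (int start = 0; start < w * h; ++start) {
        if (img[start] != 1 || visited[start]) continue;
        // Collect one 8-connected component with an explicit stack.
        std::vector<int> component, stack{start};
        visited[start] = 1;
        while (!stack.empty()) {
            int p = stack.back(); stack.pop_back();
            component.push_back(p);
            int x = p % w, y = p / w;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    int q = ny * w + nx;
                    if (img[q] == 1 && !visited[q]) {
                        visited[q] = 1;
                        stack.push_back(q);
                    }
                }
        }
        if (static_cast<int>(component.size()) < minSize)
            for (int p : component) img[p] = 0;   // too small: erase
    }
}
```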

5.4   Template Matching Algorithm

                  Figure 19.
                  The best match marked with a grey rectangle.

    To get an overall efficient fingertip-finding algorithm we chose the round
template image. This will mean some misclassified images, where the algorithm
finds fingertips that are not really there. In every difference image the
algorithm goes through the active areas doing a template match and reports
the best match with its corresponding coordinate. The best match function
generates an error for each pixel within an active area, describing how good
the template match starting from that pixel was (see equation 4). After the
algorithm is run, the collected errors describe the obtained error for each
coordinate within the active area, and the coordinate with the lowest error
is the one returned by the algorithm (see figure 19). If the error is within
some pre-defined limits of the error a fingertip would give, the coordinate
is sent to the Projection Controller.
    When running the fingertip-finding algorithm, the distance between the
camera and the projected image is very important. The closer the camera is to
the projection, the larger the fingers will appear in the captured image. If
the camera and projector should be movable between different distances while
the system is up and running, a number of different template images will be
needed; alternatively, one original template image can be scaled to fit the
current distance, or the system could generate the templates by itself during
runtime. Finding a size that fits a certain distance is not very hard once
one template has been identified for one specific distance (see figure 20).
Mapping templates to specific distances can be done in advance, before the
system is used. The user then only needs to specify the distance and the
system will know which template to use, from earlier testing, via a simple
formula with a constant k and the distance (see equation 5). The constant
needs to be identified by testing since it can differ between cameras.

               I_templ = f(d) = k / d   (5)

       Figure 20.
       Three different template images for different distances between camera and
       projection surface. Which template to use is calculated from the distance
       between camera and projection area.
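Equation (5) can be turned into a small helper that picks a template diameter from the distance; the constant k, the units and the clamping to a minimum size are illustrative assumptions:

```cpp
#include <algorithm>
#include <cmath>

// Sketch of equation (5): the template diameter shrinks inversely with
// the camera-to-surface distance d, scaled by a camera-specific constant
// k found by testing. Clamped to at least one pixel.
int templateDiameter(double k, double distance) {
    return std::max(1, static_cast<int>(std::lround(k / distance)));
}
```

The constant would be calibrated once per camera, e.g. by measuring which diameter matches real fingertips at one known distance.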

6     Projection Controlling
      Controlling what the projector should show and when to show it is an
      important part of the system functionality. In this chapter we will look
      at the structure of the projection controller and explain some of the
      implementation details.
          When creating a graphical interface there are lots of programming
      languages and libraries to choose from. We decided to work with Java since
      it is a well-known language with two good graphical libraries, AWT and
      Swing, of which Swing is what we chose to work with. We wanted to make the
      program generic and modularized to make future expansions or modifications
      as easy as possible. We also wanted it to be possible to easily define
      images and menu systems without programming skills: to create a new slide
      show, the user only needs to specify certain parameters in an XML file.
      Both of these requirements are fairly easy to fulfill using Java.

6.1   Overview
      For a graphical overview of the classes, see the class diagram on the next
      page. The main class that controls the dispatching of data to the correct
      slide, and makes sure the correct slide is being shown, is SlideShowViewer.
      A SlideShowViewer has
      several components, an interface for communicating with the camera
      <CamProj>, a collection of slides <Slide> and an interface for reading XML
      files <XMLInput>. It also implements the interface CameraListener, through
      which the camera communicates with the SlideShowViewer. Since we decided to
      use Swing, SlideShowViewer would normally extend the JFrame class, but to
      add the functionality of a window controller we added the class SlideWindow
      and let SlideShowViewer extend it instead of extending JFrame directly.
          The Slide is a JPanel, following the Swing pattern. Slide is also
      abstract; its purpose is to define the layout of how a standard slide
      works. When a user creates slides, she can use the built-in script for
      creating slide shows using XML, or, if a more complex slide with dynamic
      components is to be created, the user can implement the <CustomSlide>. The
      CustomSlide is an empty shell for creating slides with dynamic behavior,
      with built-in support in XMLInput, which means the self-made slide can be
      linked right into the slide show with all the static and other dynamic
      slides.
          To keep track of all the text, images and active areas, we created a
      collection of small classes. TextItem, ImageItem and ActiveItem all inherit
      the same properties from the superclass Item. An item holds its own
      position and properties and has the ability to draw itself in the JPanel
      that is the current slide.

Class diagram giving an overview of the class dependencies.

6.2   Interaction Delay and Visual Feedback
      Active areas are parts of the projection that can be activated and hold a
      function; they are the buttons of the system. To activate an area the user holds a
      finger over the selected area until the system has registered enough successful
      coordinates for the area to be active. This delay is important to avoid incorrect
      activation of areas as the user is moving the hand over several different areas.
      A successful activation requires user interaction over a certain amount of time.
      It is important to make this time delay just long enough to make the system
      easy to use. Too long delays are annoying for the user. Too short delays are not
      acceptable either as this will lead to misclassified activations.
           It is important to give the user feedback when interacting with the system.
      From earlier discussions we reached the conclusion to use visual feedback, as
      this is mainly a visual interface. To visualize the feedback in the
      projection, we use figures such as circles and rectangles that gradually
      fill up with some color as the user successfully activates an area. Every
      active area has a time limit for how long the user must interact with it
      before it gets activated; this time is equal to the system delay time. The
      placement of the feedback figure
      can vary; either every active area has its own figure or one figure can be used
      for all areas in the image. The important thing when placing a feedback
      figure is not to place it too close to, or inside, an active area, since
      the gradually filling figures will show up as differences in the fingertip
      finding algorithm and could lead to misclassified fingertips. The idea of
      bars and circles
      gradually filling up was something we worked out during this project. We
      knew the feedback was extremely important as it is the basic feature this kind
      of system normally lacks. We could have used symbols like an hourglass or
      clock but felt that gradually filling figures was a more intuitive way of
      reporting progress to the user, as the user sees the exact progress at each
      moment. An additional aspect of gradually filling figures is that when the
      interaction stops, the figure starts to empty out again, alerting the user
      that the interaction did stop, which is important. With good and intuitive
      feedback the user will feel in control of the system, not that the system
      is controlling the user.
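The dwell-and-feedback behavior described above, with progress filling on hits, draining when interaction stops, and activation at the delay limit, can be sketched as follows (the per-frame step sizes and names are assumptions, not the thesis code):

```cpp
// Sketch of dwell-based activation with gradually filling feedback:
// each frame with a fingertip hit adds progress, each frame without one
// drains it, and the area activates once progress reaches the limit.
class DwellProgress {
public:
    explicit DwellProgress(int framesToActivate)
        : limit_(framesToActivate) {}

    // Returns true while the accumulated progress has reached the limit.
    bool update(bool fingertipInArea) {
        if (fingertipInArea)
            ++progress_;
        else if (progress_ > 0)
            --progress_;               // figure empties out again
        return progress_ >= limit_;
    }

    // Fraction used to draw the filling circle or bar, 0.0 .. 1.0.
    double fillFraction() const {
        return static_cast<double>(progress_) / limit_;
    }

private:
    int limit_;
    int progress_ = 0;
};
```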

6.3   Communication
      We established that we want a text based protocol for communicating with the
      Camera Image Processor (CIP). To handle this communication we have
      implemented a package on the Java side. The package consists of two
      classes, CameraListener and CamProj. CamProj is a public class that handles
      all
      outgoing information while CameraListener is an interface that reports a
      CameraEvent every time data comes from CIP.
           In the initialization of the system it is important to synchronize the
      communication between the Projection Controller (PC) and the CIP; this is
      when all the reference images are taken. The PC has to make sure that the
      correct image is displayed by the projector and then tell the CIP to
      capture and store the image; if there is more than one reference image to
      be taken, it displays the next image and tells the CIP to capture and store
      that one too. After all reference images are taken and reported as stored
      by the CIP, the system can start up. It does this by sending the active
      areas of the first image and then displays it on the
        projector, then goes into idle mode just waiting for CIP to start reporting hits
        from user input.
             When the system is in running mode, the only information being sent is
        the hits reported by the CIP. If an active area gets activated, the PC
        will update the display and then send the new array of active area
        coordinates. For systems that need to change projection area during
        run-time, we built in support for changing the template image, along with
        some other commands that can be useful when creating a new system, such
        as taking new reference images and resetting the system.
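The report does not specify the exact wire format, so the following is only a hypothetical illustration of what one line of such a text-based protocol might look like when the CIP reports a hit:

```cpp
#include <sstream>
#include <string>

// Purely hypothetical message layout: a message type keyword followed
// by the fingertip coordinate, one message per line.
std::string formatHit(int x, int y) {
    std::ostringstream out;
    out << "HIT " << x << " " << y;
    return out.str();
}
```

A line-oriented text format like this keeps the protocol easy to debug by hand, which fits the synchronization steps described above.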

6.4     Slides
        The images and all their attributes are stored as Slides. Slide is an
        abstract class containing all data belonging to a slide, such as its
        active areas and images. Since we chose Swing for this project, Slide
        extends JPanel, making it easy to paint where and when we want to.
        Defining a slide show in our application is done in XML. The XML parser
        understands two types of slides, UniversalSlide and CustomSlide. The
        first is a standard static slide used to create menu systems where slides
        refer to each other, while CustomSlide is a way for users to implement
        slides with different behavior.
            A slide has two attributes that can be specified: name and background
        color. The name is used to refer to a slide when linking it with other
        slides in a slide show.

6.4.1   UniversalSlide
        The UniversalSlide is a static slide that has the same active areas all through
        the runtime of the system. A UniversalSlide consists of three kinds of
        elements: images, text and active areas. The active areas are the
        clickable areas with a reference to another slide, while text and image
        elements consist of just that: raw text and the address of an image.
        Universal slides are linked
        together by using active areas. Every active area has a specified slide that the
        slide show will change to once that area has been activated.

6.4.2   CustomSlide
        A CustomSlide is a slide that the user constructs by herself, when some
        specific behavior is needed that cannot be solved using UniversalSlide.
        For instance, a simple tic-tac-toe game could be created by linking
        several static UniversalSlides together, but a far better solution is a
        slide that changes its appearance, and thereby its active areas, during
        runtime. For this project we created two simple custom slides to
        illustrate what
        a dynamic layout can do. A custom slide has no way of specifying active areas
        unless they are manually coded into the system. Therefore a custom slide has
        an extra attribute – the name of the slide to go to when the exit function of the
        custom slide is activated.

7       Prototype Examples and Observations
        In this chapter we describe a few example implementations that illustrate
        how to use our system.

7.1     Prototypes
        To test the functionality of the system we implemented a number of simple
        prototypes. We will look at two simple dynamic slides and one very simple
        static example.

7.1.1   Simple SlideShow
        A slide show is a number of slides specified in an XML file. A slide show
        consists of UniversalSlides and CustomSlides. To define a UniversalSlide,
        only a name is required to keep track of it; after that the user simply
        adds images, text, and active areas wherever desired. A CustomSlide has
        three mandatory fields: the class name, needed to create an instance of
        the class; the name of the slide; and the name of the slide it refers to.
        An example of an XML file specifying a slide show can be found in
        Appendix 1.
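
        As a rough illustration of how such a file could be read, the sketch
        below uses the standard Java DOM API to collect the slide names from a
        slide-show document. Only the tag names follow the appendix; the class
        and method names are assumptions, not the thesis parser.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: parse a slide-show XML string and return the
// content of every <Name> element in document order.
class SlideShowReader {
    static List<String> slideNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList names = doc.getElementsByTagName("Name");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.getLength(); i++)
            result.add(names.item(i).getTextContent());
        return result;
    }
}
```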

7.1.2   Example: FrogSlide
        When creating the first simple prototypes of the system we wanted a simple
        dynamic slide that could change appearance during runtime. The FrogSlide
        (see figure 21) is a simple game in which a picture of a small frog is
        displayed; when the user points at the frog it jumps in a random direction
        across the image. When the frog jumps to a new position, the active areas
        of the image need to be updated and sent to the CIP so that the
        fingertip-finding algorithm does not search in the wrong areas.
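
        The jump logic can be sketched as follows. This is a minimal sketch under
        assumed names (FrogSlide, jump, frogContains) and an assumed frog size;
        the real system would additionally send the new active area to the CIP.

```java
import java.util.Random;

// Illustrative sketch of the FrogSlide: pointing at the frog moves it to a
// random position, which becomes the new active area to search.
class FrogSlide {
    static final int FROG_W = 64, FROG_H = 64;  // assumed frog image size
    final int displayWidth, displayHeight;
    final Random random = new Random();
    int frogX, frogY;

    FrogSlide(int displayWidth, int displayHeight) {
        this.displayWidth = displayWidth;
        this.displayHeight = displayHeight;
        frogX = (displayWidth - FROG_W) / 2;   // start in the center
        frogY = (displayHeight - FROG_H) / 2;
    }

    // Move the frog to a random position that keeps it fully on screen.
    // In the real system the updated active area would now be reported to
    // the CIP so fingertip search is restricted to the new position.
    void jump() {
        frogX = random.nextInt(displayWidth - FROG_W);
        frogY = random.nextInt(displayHeight - FROG_H);
    }

    boolean frogContains(int px, int py) {
        return px >= frogX && px < frogX + FROG_W
            && py >= frogY && py < frogY + FROG_H;
    }
}
```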

                          Figure 21.
                          A simple game where the user makes a small frog jump
                          across the projection.

7.1.3   Example: TicTacToeSlide
        The TicTacToeSlide is a simple tic-tac-toe game that shows the strength of
        the system when it comes to defining active areas (see figure 22). Once a
        marker has been set, that field in the slide is no longer an active area,
        so a marker that might easily be mistaken for a fingertip will no longer
        be examined by the fingertip-finding algorithm. Activating the reset
        button makes all active areas active again and removes all markers. Once a
        marker is placed it is stuck there, as we have no mechanism for moving
        markers around. Another possible implementation could be a 4-in-a-row
        game, where markers by default cannot move once placed.
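
        The deactivation idea can be sketched like this. A minimal sketch under
        assumed names (TicTacToeBoard, place, isActive, reset); the real slide
        would additionally redraw markers and report the remaining active areas
        to the CIP.

```java
// Illustrative sketch: once a cell is marked it stops being active, so the
// marker cannot be mistaken for a fingertip by the detection algorithm.
class TicTacToeBoard {
    // 0 = empty (active area), 1 = X, 2 = O
    final int[][] cells = new int[3][3];
    int currentPlayer = 1;

    // Place a marker; returns false if the cell is no longer active.
    boolean place(int row, int col) {
        if (cells[row][col] != 0) return false;
        cells[row][col] = currentPlayer;
        currentPlayer = 3 - currentPlayer;  // alternate 1 <-> 2
        return true;
    }

    boolean isActive(int row, int col) { return cells[row][col] == 0; }

    // The reset button: remove all markers, making every cell active again.
    void reset() {
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                cells[r][c] = 0;
        currentPlayer = 1;
    }
}
```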

                          Figure 22.
                          A game of tic-tac-toe; the four fields marked with
                          circles are no longer active, to prevent mistaken
                          fingertip matches.

7.2     Example System Run
        For a graphical overview, see the flowchart on the next page. We will walk
        through the implementation when running the example from chapter 3 to see
        more clearly how the system works in practice. The example is named “The
        Frog-slide” and consists of a single frame with two clickable buttons and
        a small image of a frog: three active areas in all. The two buttons are
        fixed and do not move when activated; one is named “Reset”, the other
        “Exit”. Activating “Reset” makes the frog go to the center of the display,
        while activating “Exit” shuts the system down. Clicking the frog, however,
        makes it jump to a random position somewhere in the display, although it
        cannot jump to where the two buttons are. This example would run as
        follows.

7.3   Observations
      As we tested these prototypes we could make some observations on the
      performance of the system. We did not have time to do an extensive
      usability evaluation, but a few inexperienced testers got to try the system
      without any prior knowledge.
          A problem that showed up during these tests was that user interaction
      over certain colors and dark areas would not show up in the
      difference-image algorithm. This needs to be considered when designing a
      slide: if a part of the slide is very dark, no light falls on the user's
      finger, making it invisible to the system. One way to avoid this problem is
      to make sure light sources other than the projector exist, so that the user
      is more visible to the camera. Another is to make sure all active areas lie
      within bright areas of the slide.
          On the positive side, the number of misclassified images (images where
      a finger was found even though there was no actual user interaction) was
      very low. Had we received many misclassified images, the feedback system
      would not have worked as planned. As it was, the feedback system worked
      just as we wanted it to, and with a time delay of about 2 seconds everybody
      who tested the system found it easy to understand.
          The frame rate we managed to get is around 10–15 frames per second,
      which means the projection controller receives 10–15 reported coordinates
      every second. This count is also what controls the feedback delay: an
      active area requires a number of correct coordinates rather than time-based
      interaction. No optimization has been done at this point. If we manage to
      make the algorithm work faster and reach a frame rate of 20–25, the number
      of coordinates required to activate an active area must be increased for
      the time delay to stay the same. This has been taken into consideration:
      every active area has a time factor parameter when specified in the XML
      file. The most important parameter affecting the frame rate was the
      resolution of the camera image; the larger the image, the slower the
      algorithm. Other parameters that affect the frame rate are the size of the
      template image, the size and number of active areas, and the precision of
      the template match.
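
      The relationship between frame rate and activation delay can be written
      down directly: since activation counts reported coordinates rather than
      elapsed time, the required number of hits is the frame rate multiplied by
      the desired delay. The helper name below is an assumption made for
      illustration.

```java
// Illustrative sketch: how many consecutive correct coordinates an active
// area must receive to yield a given perceived delay. At 10 fps a 2-second
// delay needs ~20 hits; at 25 fps the same delay needs ~50.
class ActivationTiming {
    static int requiredHits(int framesPerSecond, double delaySeconds) {
        return (int) Math.round(framesPerSecond * delaySeconds);
    }
}
```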

8   Conclusions and Future Work
    While working on this project we came across problems both expected and
    unexpected. In this chapter we discuss what could have been done to avoid
    them, or why they cannot be avoided.
        When starting this project we tried to identify some of the problems we
    could run into. The problems we identified then were all related to what kind
    of application we would create and how users would interact with it. As it
    turned out, we created a system with static reference images, making
    interaction a little easier for the user than if we had gone with a dynamic
    system with motion detection. The activation of functions could then be
    implemented by setting a time limit on an area and letting an indicator show
    how long the user had to hold the finger in position for the function to
    become activated.
        The problem of finding an appropriate application to implement has been
    left to the user. We have created a system where a user can easily create
    slide shows and, with some programming knowledge, even create their own
    unique slides.
        Another factor to take into consideration when it comes to lighting is
    not to place the projection in an environment with changing light. Once the
    reference image is taken, the system works best under the same lighting as
    when that image was taken. If the change in lighting becomes too big, the
    difference algorithm interprets the whole image as a difference and the
    system can no longer function. One way to solve this problem, as well as the
    problems of dynamic slides, is to implement a safe way to take new reference
    images. The difficulty with taking new reference images during runtime is
    that we have no control over whether objects such as hands and fingers are
    interfering.
        Something we have not taken into consideration during this project is
    what happens if the projection and the view of the camera are not perfectly
    aligned. This is partly because software solving similar problems already
    exists and we felt it was not an important feature to add. For future
    implementations this is something that could be added to make the system
    more flexible; even more projection surfaces could be made accessible with
    this technique.

9   Summary
    When creating our projector-camera system we made several choices and
    observations. We will now briefly go through them to sum things up.
        Our goal when starting this project was that the hardware should not be
    too specific: any projector with a video input and any camera with a video
    output should work. After testing the system on different hardware and in
    different environments (differently colored backgrounds and lighting) we can
    conclude that this has been achieved.
        We decided to divide the application into two parts, one controlling the
    projections and one handling the image processing. We called these two parts
    the Projection Controller (PC) and the Camera Image Processing (CIP). The
    CIP was written in C++ and acted as the server in the socket communication
    protocol we set up. The PC, written in Java, would connect to the CIP and
    initiate contact.
        The big question when implementing a way of identifying user interaction
    was which method to use. Before we could make that decision we needed to
    decide what kind of interaction to support, and for that we chose simple
    pointing gestures. We therefore needed an algorithm that could easily find
    fingertips. We chose a method with a static reference image; the advantage of
    this method is that user interaction is always easy to spot, even if there is
    little movement. The downside is that it becomes a problem if the projected
    image changes during runtime. We solved this by introducing active areas: the
    system looks for user interaction only within certain areas of the image,
    and as long as the projection remains the same in such an area, interaction
    detection will work. Fitting a proper template image to a specific distance
    is something we had to determine through testing. Now, when the system
    starts, all it needs to know is how far the camera is from the projection,
    and the correct template loads automatically.
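
        The detection idea recapped above, comparing the current camera frame
    against the static reference image only inside an active area, can be
    sketched as follows. The class name, grey-level representation, and
    threshold are assumptions for illustration, not the CIP implementation
    (which was written in C++).

```java
// Illustrative sketch: count the pixels inside one active area whose
// grey-level difference from the reference image exceeds a threshold.
// A high count suggests something (e.g. a hand) has entered the area.
class DifferenceDetector {
    static int changedPixels(int[][] reference, int[][] frame,
                             int x, int y, int w, int h, int threshold) {
        int count = 0;
        for (int r = y; r < y + h; r++)
            for (int c = x; c < x + w; c++)
                if (Math.abs(frame[r][c] - reference[r][c]) > threshold)
                    count++;
        return count;
    }
}
```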
        An important aspect when creating a projector-camera system is the lack
    of physical feedback. Since the only part of the system visible to the user
    was the projection, we needed a visual feedback system. Making every
    clickable area sensitive to interaction and setting a time limit made it
    possible to use figures that gradually fill up, indicating the progress of a
    click. These figures also empty if the interaction stops or fails. The timing
    of this feedback was the most important aspect of the feedback system, and
    after some testing we found that a delay of about 2 seconds was the optimal
    solution.
        Finally, we wanted the system to be easy to use for people who want to
    create their own slide shows. To achieve this we implemented an XML parser
    that creates slide shows from simple tags that the user specifies. Users with
    some XML experience should have no problem creating slide shows of their own.
    We created several slide shows using XML while testing the system, and a
    simple example prototype is also included in this Master's thesis (see
    Appendix).
        After trying out the system on different users and in different conditions we
    can conclude that it works well as long as the lighting does not change too
    much during runtime.


Appendix SlideShow.xml
<!-- Fields for SlideShow:
  <Width>: required - specifies the width of the projection in pixels.
  <Height>: required - specifies the height of the projection in pixels.
  <StartinSlide>: required - specifies what slide to start the slide show with.
  <Template>: required - 1-4. 1 is for long distances, 4 is for when the
camera is closer. -->

<!-- Fields for Slide:
  <Background>: optional - background color for the slide, default is white.
  <Name>: required - the name of the slide.
  <TextItem>: optional - adds a TextItem to the slide.
  <ImageItem>: optional - adds an ImageItem to the slide.
  <ActiveItem>: optional - adds an ActiveItem to the slide. -->

<!-- TextItem has three fields:
    <X>: required - the x-coordinate of the text.
    <Y>: required - the y-coordinate of the text.
    <Text>: required - the actual text to be printed. -->
      <Text>Hello Welcome, this is the first slide.</Text>

<!-- ImageItem has three fields:
    <X>: required - the x-coordinate of the image.
    <Y>: required - the y-coordinate of the image.
    <Url>: required - the address of the image. -->

<!-- ActiveItem has nine fields:
    <X>: required - the x-coordinate of the active area.
    <Y>: required - the y-coordinate of the active area.
    <Width>: required - the width of the active area.
    <Height>: required - the height of the active area.
    <IndicatorType>: optional - 1-5, the type of indicator used. Default 1.
    <IndicatorColor>: optional - the color of the indicator. Default blue.
    <Visual>: optional - 1 or 0, whether the active item is visible. Default 1.
    <Next>: required - specifies what slide to go to when this area is activated.
    <TimeFactor>: optional - 1-100, the time to activate the item. Default 20. -->

    <Text>You made it to the second slide.</Text>



<!-- Fields for CustomSlide:
  <ClassName>: required - the name of the class to instantiate.
  <Name>: required - the name of the slide.
  <Next>: required - specifies what slide to go to when the exit function of
the custom slide is activated. -->


