README txt by Flavio58


This is a C++ implementation of weakly-supervised learning and
matching for visual object recognition, as described in:

[1] D. Crandall, D. Huttenlocher. "Weakly-supervised Learning of
    Part-based Spatial Models for Visual Object Recognition" in
    Proc. European Conference on Computer Vision (ECCV), 2006.

[2] D. Crandall, P. Felzenszwalb, D. Huttenlocher. "Spatial Priors for
    Part-based Recognition using Statistical Models" in Proc. Conference
    on Computer Vision and Pattern Recognition (CVPR), 2005.

Disclaimer: this is research code with probably many bugs.

Please feel free to contact me if you have any questions or problems:

David Crandall


1. OVERVIEW
-----------

The code in this archive implements weakly-supervised learning and
object detection/localization using $k$-fan spatial models. By "weakly
supervised" we mean that the only information required by the learning
algorithm is a set of positive (or foreground) images that contain the
object of interest, and a set of negative (or background) images that
do not contain the object.

Please see the papers listed above for details about $k$-fan models,
the $k$-fan matching algorithm, and how our weakly-supervised algorithm
works.

The learning algorithm is relatively compute intensive, depending on
the size and number of training images and other parameters. While it
is possible to run the learning procedure on a single workstation, the
code is designed to be run on a cluster of nodes communicating via MPI.
For the experiments in our paper [1], we used a small cluster of about
15 Pentium III nodes, and training times were about 24-48 hours.

2. COMPILING THE CODE
---------------------

This code requires MPICH, a multi-platform implementation of the MPI
message-passing standard for parallel applications.

- Before proceeding, install MPICH1. MPICH only needs to be installed
  on the machines that you compile and begin jobs on (it does *not*
  need to be installed on the other nodes of the cluster).

- The code also requires the GNU Scientific Library, the corona image
  I/O library, and the DLib image processing library. For convenience,
  these are included in the archive. To compile the libraries, do:

      cd gsl-1.0.5; ./configure; make; cd ..
      cd corona; ./configure; make; cd ..
      cd DLib; make; cd ..

- Next, edit Makefile, and change the setting for the _TMP_DIRECTORY_
  variable to the path of a shared temporary directory that is visible
  under the same path on *all* compute nodes you plan on using. The
  temporary directory is used by the parallel processes to exchange
  data during learning. The directory should have a few gigabytes of
  free space.

  You may also need to customize the Makefile for your compute
  environment. The Makefile assumes Pentium III architecture; modify
  the compiler flags if you are using another architecture (e.g. AMD).

- Compile the code:

      make

- Set up your environment to find the dynamic libraries. An easy way
  of doing this is:

      mkdir ~/kfan-libs
      cp gsl-1.5/.libs/*.so* ~/kfan-libs
      cp gsl-1.5/cblas/.libs/*.so* ~/kfan-libs
      cp corona-1.0.2/src/.libs/*.so* ~/kfan-libs
      setenv LD_LIBRARY_PATH ~/kfan-libs

3. SETUP MPICH
--------------

If you already have MPICH installed and set up, skip this section.

Once MPICH is installed, create a text file called 'machines' with the
host names of the compute nodes, one per line.

If you have not done so already, set up the compute nodes so that you
can ssh to them without supplying a password (using the RSA key
protocol instead). Otherwise you will have to type your login password
multiple times every time you run an MPICH job. To do this on Linux:

(from 'man ssh')

    The user creates his/her RSA key pair by running ssh-keygen(1).
    This stores the private key in $HOME/.ssh/identity and the public
    key in $HOME/.ssh/ in the user's home directory. The user should
    then copy the public key to $HOME/.ssh/authorized_keys in his/her
    home directory on the remote machine (the authorized_keys file
    corresponds to the conventional $HOME/.rhosts file, and has one
    key per line, though the lines can be very long). After this, the
    user can log in without giving the password. RSA authentication
    is much more secure than rhosts authentication.
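The 'machines' file described above is just a list of host names, one
per line. As a minimal sketch (not part of the archive), it could be
generated like this. The function name write_machines_file and the
host names node01 and node02 are hypothetical placeholders.

```python
from pathlib import Path


def write_machines_file(hosts, path="machines"):
    """Write one host name per line, skipping blanks and duplicates.

    Returns the list of host names actually written, in order.
    """
    written = []
    for host in hosts:
        host = host.strip()
        if host and host not in written:
            written.append(host)
    # One host name per line, as the README asks for.
    Path(path).write_text("".join(name + "\n" for name in written))
    return written


if __name__ == "__main__":
    # Hypothetical host names; replace with your own compute nodes.
    write_machines_file(["node01", "node02"])
```

Duplicates are dropped here only to keep the file tidy; that choice is
an assumption of this sketch, not a requirement stated by MPICH.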

4. WEAKLY-SUPERVISED LEARNING
-----------------------------

Note: This code only supports 0- and 1-fan learning at this time.

4.1 Assemble training datasets
------------------------------

The first step is to assemble a set of training images. You need a set
of positive images (images that contain the object of interest) and a
separate set of negative images (images that do not contain the
object). It is generally best if the positive and negative images are
otherwise similar; e.g. if many of the positive images are indoor
photos, there should be indoor photos in the negative set as well.
(Otherwise the algorithm may learn biases in the data that are not
related to the object of interest.) It is also best to have as large
and diverse a training set as possible. We used 400-800 training images
in the results reported in [1], although it is possible to get good
results with many fewer (it depends on the amount of variability in the
object).

Put the training images in a directory, according to the following
guidelines:

- Images may be in any file format that Corona supports, including
  PNG, JPEG, GIF, BMP, etc.

- Images must be named as 6-digit integers (e.g. 150000.jpg,
  330632.png, etc.).

- Negative (background) images must begin with a 7 (e.g. 700303.jpg),
  while positive images may begin with any other digit.
  [Please don't ask why. It's an embarrassing hack. :) ]
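Because the naming rules above are easy to get wrong, it can help to
check a directory listing before starting a long training run. The
following is a minimal sketch of such a check, not part of the archive.
The function names are hypothetical, and the sketch does not verify
that the file extension is one Corona can read.

```python
import re
from pathlib import Path


def classify_image(filename):
    """Return 'negative', 'positive', or None if the name breaks the rules.

    A valid name is exactly six ASCII digits before the extension.
    Names starting with 7 are negative (background) images.
    """
    stem = Path(filename).stem
    if re.fullmatch(r"[0-9]{6}", stem) is None:
        return None
    return "negative" if stem.startswith("7") else "positive"


def split_training_set(filenames):
    """Group file names into positives, negatives, and invalid names."""
    groups = {"positive": [], "negative": [], "invalid": []}
    for name in filenames:
        label = classify_image(name)
        groups[label if label is not None else "invalid"].append(name)
    return groups


if __name__ == "__main__":
    names = ["150000.jpg", "700303.jpg", "cat.png", "12345.jpg"]
    for label, members in split_training_set(names).items():
        print(label, members)
```

Any name that ends up in the "invalid" group should be renamed before
running the learning procedure.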

4.2 Running weakly-supervised learning (the easy version)
---------------------------------------------------------

The easiest way to run the learning procedure is with the script. This
script hides the details of learning and the many user-customizable
learning parameters. If you are interested in the details, see section
4.3 instead.

To use the script:

- Review the environment variables at the top of the script. You can
  customize the training process by editing the values of the
  variables. In particular, make sure PROC_COUNT (the number of
  parallel processes started by MPI) is set to something reasonable
  for your environment. Generally one plus the number of processors in
  the cluster is a good choice. If running on a single processor, set
  it to 2.

- Then run,

      ./ image_directory temporary_directory

  where image_directory is the path to where you put the training
  images, and temporary_directory is a path that will receive
  temporary and output files. There must be a *lot* of free space in
  temporary_directory; 20GB should be enough in most cases.

When complete, temporary_directory/em_iterations contains models after
each iteration of E-M. Type 'cat temporary_directory/em_iterations/*.analysis'
to see equal ROC performance on the training images with each iteration
of the model. You should choose an iteration with a high equal ROC
performance, but before the performance asymptotes, in order to avoid
choosing an overtrained model.

The result of training is a .appear file (containing appearance model
parameters) and a .?fan file (containing the spatial model parameters).

If temporary_directory is not empty, the script attempts to restart at
the point that the last run exited. Delete the contents of the
directory to re-run the entire process.

4.3 Running weakly-supervised learning (the details)
----------------------------------------------------

The unsupervised learning process is divided into six phases, each of
which takes some input and produces some output.
The idea here is that since unsupervised learning is compute intensive,
it's good to save the intermediate files so that you can restart
learning midway through the process instead of always starting from the
beginning.

Learning is carried out by the ./learn_p3 program. The basic syntax for
this program is './learn_p3 -n', where n is the phase number, and then
parameters and options based on the phase number. You can type
'./learn_p3' without parameters for fairly detailed usage information.

Here's a quick high-level overview of what the phases are:

- Phase 1, Patch sampling: randomly sample the training data to
  generate lots of candidate patches.
    Input: training image files
    Output: a "template_spec_file" that contains info about all of
            the sampled patches.
    Approx time: < 5 minutes

- Phase 2, template correlation: correlate all of the patches from
  phase 1 to the training data
    Input: template_spec_file, training image files
    Output: a directory that contains correlation results for each
            of the training images.
    Approx time depends immensely on # of patches, # of training
            images, patch sizes, load on the cluster, etc. A very
            rough estimate is about 300 patches per node per hour.

- Phase 3, process single patches: analyzes the results of phase 2,
  considering individual patches
    Input: files produced by phase 2
    Output: a patch_maxlike_coords file that contains the MAP
            estimate of the location of each patch in each training
            image, among other things
    Approx time: 1-2 minutes

- Phase 4, process patch pairs: builds little spatial models between
  all pairs of patches.
    Input: patch_maxlike_coords file from phase 3
    Output: patch_pairs_file
    Approx time: depends on # of patches and minimum likelihood ratio
            parameter (-r on the command line). Typically 1-8 hours.

- Phase 5, build an initial k-fan model.
    Input: patch_maxlike_coords file from phase 3 and
           patch_pairs_file from phase 4.
    Output: a trained k-fan model, suitable for use with ./match_p3.
    Approx time: depends on # of patches, typically 5-10 minutes

- Phase 6, EM iteration.
    Input: initial model from phase 5
    Output: new model produced by specified # of EM iterations
    Approx time: depends on # of iterations, etc. Typically 10-30
            minutes.

The script described in section 4.2 calls learn_p3 to do these phases in sequence; you can modify the script to customize parameters. Type './learn_p3' to see the many command line options. The most interesting parameters are probably the number of patches and the number of scales to sample during phase 1. The more scales and more patches you use, the better the learned models, but the higher the computation requirements during training. You can get reasonable results with as few as 1,000 patches for some categories, and typically 10,000-30,000 is high enough to produce quite good models.
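For planning a run, the two rules of thumb given in section 4 (the
PROC_COUNT setting and the "300 patches per node per hour" figure for
phase 2) can be turned into quick arithmetic. The sketch below is only
that, a back-of-the-envelope calculator with hypothetical function
names. The phase 2 rate is the README's own very rough estimate, and
real times depend on image count, patch sizes, and cluster load.

```python
# The README's "very rough estimate" for phase 2 throughput.
PATCHES_PER_NODE_HOUR = 300


def learning_proc_count(processors):
    """PROC_COUNT rule of thumb: one plus the number of processors.

    A single processor therefore gives 2, matching the README.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return processors + 1


def phase2_hours(num_patches, num_nodes, rate=PATCHES_PER_NODE_HOUR):
    """Rough phase 2 (template correlation) time in hours."""
    if num_patches < 0 or num_nodes < 1 or rate <= 0:
        raise ValueError("invalid patch count, node count, or rate")
    return num_patches / (rate * num_nodes)


if __name__ == "__main__":
    nodes = 15
    print("PROC_COUNT:", learning_proc_count(nodes))
    for patches in (1000, 10000, 30000):
        print(patches, "patches:", round(phase2_hours(patches, nodes), 1), "hours")
```

With 10,000 patches on 15 nodes the estimate comes to roughly 2.2 hours
for phase 2 alone; phase 4 can add another 1-8 hours on top of that.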

5. K-FAN MATCHING (detection and localization)
----------------------------------------------

Note: This code only supports 0- and 1-fan matching at this time.

The matching program, match_p3, is designed as a parallel MPI program
in order to run large matching experiments efficiently. However it is
possible (and quite feasible) to run the program on a single node.

To run matching, do:

    mpirun -machinefile machines -np proc_count ./match_p3
        -a appear_model_file -s spatial_model_file -o output_file
        test_image_directory/*

where:

- proc_count is the number of parallel processes to use (a good
  setting is the number of nodes in your cluster, or 2 if running on a
  single node)

- appear_model_file and spatial_model_file are the model files
  produced by ./learn_p3 from supervised or unsupervised learning.

- output_file will receive the results of matching (maximum likelihood
  object configuration) for each image.

- test_image_directory is the location of the test imagery. The
  specifications on the image file names are the same as with learning
  (see section 4.1).

The program defaults to fixed-scale matching. For scale-invariant
matching, include the -S option to match_p3. There are other command
line parameters for customizing the matching experiments; type
./match_p3 without arguments to see them all.
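When matching is launched from a script, the command line above can be
assembled programmatically. The following sketch builds the argument
list using only the options named in this section. The function names,
the model file names car.appear and car.1fan, and the output name
results.txt are hypothetical, and placing -S before the image list is
an assumption of the sketch.

```python
def matching_proc_count(num_nodes):
    """Number of nodes in the cluster, or 2 when running on a single node."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return max(2, num_nodes)


def build_match_command(proc_count, appear_model, spatial_model, output_file,
                        test_images, machinefile="machines",
                        scale_invariant=False):
    """Return the mpirun argument list for a match_p3 run."""
    cmd = ["mpirun", "-machinefile", machinefile, "-np", str(proc_count),
           "./match_p3", "-a", appear_model, "-s", spatial_model,
           "-o", output_file]
    if scale_invariant:
        # -S requests scale-invariant matching; fixed-scale is the default.
        cmd.append("-S")
    cmd.extend(test_images)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_match_command(matching_proc_count(1), "car.appear", "car.1fan",
                              "results.txt", ["test/100001.jpg"])
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run from the directory
that contains match_p3 and the machines file.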
