Saturday 26 January 2013

Automatic Clustering of Geo-tagged Images. Part 2: Other Feature Metrics

In the last post I took a look at a simple method for trying to automatically cluster a set of images (of Horseshoe Bend, Glen Canyon, Arizona). Some of those pictures were of the river (from different perspectives) and some were of other things nearby.

With the goal of separating the two classes, the results were reasonably satisfactory using image histograms as the feature metric, plus a little user input. However, it'd be nice to have a more automated/objective way to separate the images into two discrete clusters. Perhaps the feature detection method requires a little more scrutiny?

In this post I look at 5 methods using the functionality of the excellent Mahotas toolbox for Python, which can be downloaded here:

1) Image moments (I compute the first 5)
2) Haralick texture descriptors which are based on the co-occurrence matrix of the image
3) Zernike Moments
4) Threshold adjacency statistics (TAS)
5) Parameter-free threshold adjacency statistics (PFTAS)

The code is identical to the last post except for the feature extraction loop (a sketch of which is given below). What I'm looking for is as many images of the river as possible grouped at either low distances (extreme left in the dendrogram) or very high distances (extreme right in the dendrogram). These clusters have been highlighted below. There are 18 images of the river bend.
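For reference, a minimal sketch of what the feature extraction loop might look like. The mahotas calls are the standard ones, but the loading/greyscale conversion and the choice of moment orders are my assumptions, not necessarily what was used in the post:

import numpy as np
import mahotas
from PIL import Image

def extract_features(imlist, method='pftas'):
    """Return a feature matrix (one row per image) for the chosen method."""
    features = []
    for f in imlist:
        # load as an 8-bit greyscale array
        img = np.asarray(Image.open(f).convert('L'))
        if method == 'moments':
            # the first 5 raw image moments (my choice of orders)
            feat = [mahotas.moments(img, p, 0) for p in range(5)]
        elif method == 'haralick':
            # 13 Haralick descriptors, averaged over the 4 directions
            feat = mahotas.features.haralick(img).mean(axis=0)
        elif method == 'zernike':
            # radius of 50 pixels, degree 8
            feat = mahotas.features.zernike_moments(img, 50, degree=8)
        elif method == 'tas':
            # threshold adjacency statistics
            feat = mahotas.features.tas(img)
        else:
            # parameter-free threshold adjacency statistics
            feat = mahotas.features.pftas(img)
        features.append(np.asarray(feat).ravel())
    return np.vstack(features)

Swapping the method argument then produces the five dendrograms compared below.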

In order of success (low to high):

1) Zernike (using a radius of 50 pixels and degree 8, though other combinations were tried):




 This does a poor job, with no clusters at either end of the dendrogram.


2) Haralick:



There is 1 cluster of 4 images at the start.

3) Moments:




There is 1 cluster of 6 images at the start, and 1 cluster of 2 at the end.

4) TAS:



There is 1 cluster of 7 images at the start and 1 cluster of 5 near the end, and both clusters are on the same stem.


5) PFTAS:




The cluster at the end contains 16 of the 18 images of the river bend. Plus there are no user-defined parameters. My kind of algorithm. The winner by far!

So there you have it, progress so far! No method tested so far is perfect. There may or may not be a 'part 3', depending on my progress, or lack thereof!


Tuesday 22 January 2013

Automatic Clustering of Geo-tagged Images. Part 1: using multi-dimensional histograms

There are a lot of geo-tagged images on the web. Sometimes the image coordinate is slightly wrong, or the scene isn't quite what you expect. It's therefore useful to have a way to automatically download images of a given place (specified by a coordinate) from the web, and automatically classify them according to content/scene.

In this post I describe a pythonic way to:
1) automatically download images based on input coordinate (lat, long)
2) extract a set of features from each image
3) classify each image into groups
4) display the results as a dendrogram

In this first example, 2) is achieved using a very simple means, namely the histogram of image values. This doesn't take into account any texture similarities or connected components, etc. Nonetheless, it does a reasonably good job of classifying the images into a number of connected groups, as we shall see.

In subsequent posts I'm going to play with different ways to cluster images based on similarity, so watch this space!

User inputs: give a lat and long of a location, a path where you want the geo-tagged photos, and the number of images to download
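For example (the variable names and values below are mine, purely for illustration):

# location of interest (approximate coordinates of Horseshoe Bend)
lat, lon = 36.88, -111.51
# where to save the downloaded geo-tagged photos
outpath = '/home/me/horseshoe_bend/'
# how many images to download
num_images = 50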




First, import the libraries you'll need, then interrogate the website and download the images.
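The download code itself isn't reproduced here; in outline it might look something like the sketch below. The search endpoint and the JSON field names are placeholders, not the actual photo service queried in the post, and it uses Python 2 style urllib2, as was current at the time:

import os
import json
import urllib2

# hypothetical photo-search endpoint returning JSON with image URLs for a
# small bounding box around (lat, lon); lat, lon, outpath and num_images
# are the user inputs defined above
search_url = ('http://api.example.com/photos?minx=%f&miny=%f&maxx=%f&maxy=%f&limit=%d'
              % (lon - 0.01, lat - 0.01, lon + 0.01, lat + 0.01, num_images))
result = json.load(urllib2.urlopen(search_url))

if not os.path.exists(outpath):
    os.makedirs(outpath)

# save each image to the output path, numbered sequentially
for i, photo in enumerate(result['photos']):
    imgdata = urllib2.urlopen(photo['url']).read()
    with open(os.path.join(outpath, 'image%03d.jpg' % i), 'wb') as f:
        f.write(imgdata)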



As you can see, the resulting images are a mixed bag. There are images of the river bend, the road, the desert, Lake Powell and other random stuff. My task is to automatically classify these images so it's easier to pull out just the images of the river bend.




The following compiles a list of these images. The last step is a sort, which is not strictly necessary, but doing so converts the list into a numpy array, which is. Clustering of the images is then achieved using code from Jan Erik Solem's rather wonderful book 'Programming Computer Vision with Python', the examples from which can be downloaded here. Download this one, then this bit of code does the clustering:
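In outline, it might look like this. The 8 x 8 x 8 bin RGB histogram follows the hierarchical clustering example in the book, and hcluster is the module downloaded above; the exact feature used in the post may differ slightly:

import os
from glob import glob
import numpy as np
from PIL import Image
import hcluster  # the module from the book's example code

# compile the list of downloaded images; numpy's sort returns an array
imlist = np.sort(glob(os.path.join(outpath, '*.jpg')))

# feature vector: a multi-dimensional (8 x 8 x 8 bin) RGB histogram per image
features = np.zeros((len(imlist), 512))
for i, f in enumerate(imlist):
    im = np.array(Image.open(f).convert('RGB'))
    # use density=True instead of normed=True with newer numpy
    h, edges = np.histogramdd(im.reshape(-1, 3), 8, normed=True,
                              range=[(0, 255), (0, 255), (0, 255)])
    features[i] = h.flatten()

# hierarchical clustering on the histogram features
tree = hcluster.hcluster(features)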


The approach taken is hierarchical clustering with a simple Euclidean distance function. This bit of code does the dendrogram plot of the images:
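If the book's hcluster module is used, this is presumably little more than its draw_dendrogram helper (the output filename is my choice):

# draw the dendrogram, with a thumbnail of each image at its leaf
hcluster.draw_dendrogram(tree, imlist, filename='dendrogram.png')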



which in my case looks like this (click on the image to see the full extent):
It's a little small, but you can just about see (if you scroll to the right) that it does a good job of clustering similar-looking images of the river bend. The single-clustered, or really un-clustered, images to the left are those of the rim, road, side walls, etc., which don't look anything like the river bend.

Next, reorder the image list in terms of similarity distance (which increases in both the x and y directions of the dendrogram above).
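One way to do this, assuming the ClusterNode interface from the book's hcluster module, where get_cluster_elements() returns the leaf indices in dendrogram order; the montage at the end is just for display, and the number of images shown is arbitrary:

import matplotlib.pyplot as plt
from PIL import Image

# leaf indices in the left-to-right (increasing distance) order of the dendrogram
order = tree.get_cluster_elements()
reordered = [imlist[i] for i in order]

# display the images from the far (high-distance) end of the dendrogram
fig, axes = plt.subplots(2, 8, figsize=(16, 4))
for ax, f in zip(axes.ravel(), reordered[-16:]):
    ax.imshow(Image.open(f))
    ax.axis('off')
plt.show()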


Which gives me:
As you can see, they are all images of the river bend, except the 2nd from the left on the top row, which is a picture of a shrub. Interestingly, the pattern of a circular shrub surrounded by a strip of sand is visually similar to the horseshoe bend!!

However, we don't want to include it with images of the river, which is why a more sophisticated method than the image histogram is required to classify and group similar images ... the subject of a later post.




Sunday 20 January 2013

Alpha Shapes in Python

Alpha shapes include convex and concave hulls. Convex hull algorithms are ten a penny, so what we're really interested in here is the concave hull of an irregularly or otherwise non-convex shaped 2d point cloud, which by all accounts is more difficult.

The function is here:



The above is essentially the same wrapper as posted here, except with a different way of reading in the data, and the option to specify a probe radius. It uses Ken Clarkson's C code; instructions on how to compile it are here.
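The general shape of such a wrapper is: write the points to a plain-text file, call Clarkson's hull binary on it, and read back the index pairs defining the alpha-shape boundary. A rough sketch follows; the command-line options and the output filename are assumptions carried over from the wrapper linked above, so check the hull documentation for the exact flags your build accepts:

import subprocess
import numpy as np

def alpha_shape(x, y, probe_radius=None, hull_cmd='./hull'):
    """Compute the alpha shape of 2d points with Ken Clarkson's 'hull' binary,
    returning the boundary edges as pairs of indices into (x, y).
    NOTE: the option strings below are assumptions; consult the hull docs."""
    # write the points to a plain-text file, one "x y" pair per line
    np.savetxt('points.txt', np.column_stack((x, y)), fmt='%.7f')

    # '-A' asks for the alpha shape; the probe radius flag is an assumption
    opts = '-A -oN'
    if probe_radius is not None:
        opts += ' -aa%f' % probe_radius
    subprocess.call('%s %s < points.txt' % (hull_cmd, opts), shell=True)

    # hull writes the alpha-shape edges (index pairs) to 'hout-alf',
    # with a one-line header
    edges = np.loadtxt('hout-alf', skiprows=1, dtype=int)
    return edges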

An example implementation:



The above data generation is translated from the example in a matlab alpha shape function. Then:
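Something along these lines, where x and y are the arrays generated above, alpha_shape is the wrapper sketched earlier, and the probe radius value is purely illustrative:

import matplotlib.pyplot as plt

# boundary edges of the alpha shape for the generated points
edges = alpha_shape(x, y, probe_radius=0.25)

# points as blue dots
plt.plot(x, y, 'b.')

# each edge of the alpha shape as a red line segment
for i, j in edges:
    plt.plot([x[i], x[j]], [y[i], y[j]], 'r-')

plt.axis('equal')
plt.show()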



Which produces (points are blue dots, alpha shape is red line):
and on a larger data set (with radius=1, this is essentially the convex hull):
and a true concave hull:





Patchwork quilt in Python

Here's a fun python function to make a plot of [x,y] coordinate data as a patchwork quilt with user-defined 'complexity' (a sketch of such a function is given below, after the inputs). It initially segments the data into 'numsegs' clusters using a k-means algorithm. It then takes each segment and creates 'numclass' sub-clusters based on the euclidean distance from the centroid of the cluster. Finally, it plots each subcluster in a different colour and prints the result as a png file.

inputs:
'data' is a Nx2 numpy array of [x,y] points
'numsegs' and 'numclass' are integer scalars. The greater these numbers, the greater the 'complexity' of the output and the longer the processing time.
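A sketch of how such a function might be put together, reconstructed from the description above rather than taken from the original code; scipy's kmeans2 does the initial segmentation, and the colour map and output settings are my choices:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2

def patchwork(data, numsegs=10, numclass=5, outfile='quilt.png'):
    """Plot [x, y] points as a 'patchwork quilt': k-means into numsegs segments,
    then split each segment into numclass sub-clusters by distance from the
    segment centroid, plotting each sub-cluster in a different colour."""
    # initial segmentation by k-means
    centroids, labels = kmeans2(data.astype(float), numsegs, minit='points')

    plt.figure(figsize=(8, 8))
    colours = plt.cm.jet(np.linspace(0, 1, numclass))

    for k in range(numsegs):
        seg = data[labels == k]
        if len(seg) == 0:
            continue
        # euclidean distance of each point from the segment centroid
        d = np.sqrt(np.sum((seg - centroids[k]) ** 2, axis=1))
        # split the segment into numclass sub-clusters by distance
        bins = np.linspace(d.min(), d.max() + 1e-9, numclass + 1)
        sub = np.digitize(d, bins) - 1
        for c in range(numclass):
            pts = seg[sub == c]
            if len(pts):
                plt.plot(pts[:, 0], pts[:, 1], '.', color=colours[c], ms=2)

    plt.axis('off')
    plt.savefig(outfile, dpi=300, bbox_inches='tight')
    plt.close()

Called as, for example, patchwork(data, numsegs=15, numclass=15, outfile='quilt_15_15.png').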



Some example outputs in increasing complexity:

numsegs=10, numclass=5

numsegs=15, numclass=15

numsegs=20, numclass=50



Wednesday 16 January 2013

Compiling Clarkson's Hull on Fedora


A note on compiling Ken Clarkson's hull program for efficiently computing convex and concave hulls (alpha shapes) of point clouds, on newer versions of Fedora.

First, follow the fix described here, which fixes the pasting errors given when using the gcc compiler.

Then, go into hullmain.c and replace the line which says:



at the end of the function declarations (before the first while loop)

Finally, rewrite the makefile so it reads as below. It differs significantly from the one provided.



Notice how the CFLAGS have been removed. Compile as root (sudo make).