COMPSCI 180 Project 4: Stitching Photo Mosaics
Kaitlyn Chen
Introduction:
In this project, we stitch images together, forming mosaics through warping. Warping also allows us to rectify images so we can view them from different perspectives. The second half of the project automates the stitching by detecting and matching corner features without manual input, following “Multi-Image Matching using Multi-Scale Oriented Patches” by Brown et al.
The coolest thing I learned is how corners can be automatically detected using a simple ANMS technique, and how feature points can be matched “automatically”.
Shooting the Pictures:
I shot these two images of my apartment on my iPhone. I overlapped the images by roughly 40% to 70% to make registration easier, and I tried to keep the center of projection fixed, only rotating the camera between the two shots.
Recovering Homographies:
We first need to recover the parameters of the homography transformation between each pair of images. H is a 3x3 matrix with 8 degrees of freedom; the ninth entry is an arbitrary scale factor, which we fix to 1. Writing out two linear equations per point correspondence gives the system solved by my computeH function, which takes image 1’s points and image 2’s points as parameters. With more than four correspondences the system is overdetermined, and solving it with least squares reduces sensitivity to noise and instability. This returns a vector h of the 8 unknown parameters, which we reshape (together with the fixed 1) into the 3x3 matrix H.
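A minimal sketch of such a computeH in Python/NumPy, assuming (N, 2) arrays of (x, y) correspondences (variable names are mine, not necessarily the project code’s):

```python
import numpy as np

def computeH(im1_pts, im2_pts):
    """Least-squares homography mapping im1_pts to im2_pts (N >= 4)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(im1_pts, im2_pts):
        # Two linear equations per correspondence, with h33 fixed to 1:
        # x' = h11*x + h12*y + h13 - h31*x*x' - h32*y*x'
        # y' = h21*x + h22*y + h23 - h31*x*y' - h32*y*y'
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Append the fixed scale entry and reshape into the 3x3 matrix H.
    return np.append(h, 1.0).reshape(3, 3)
```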
Warp the Images:
Now that we have the homography matrix H, we can warp an image toward a target image. In this section we write the warpImage function, which takes the input image and the homography matrix H. I chose inverse warping over forward warping. I first created a bounding box for the output image by warping the four corners of the input by H, normalizing by the homogeneous coordinate, and taking the minimum and maximum x and y values. Next, I map each pixel in the output back to the input image by applying the inverse homography to the output pixel coordinates. Lastly, for each color channel, the input image is interpolated at those back-projected coordinates using the map_coordinates function. Within this function I also chose to output an alpha mask indicating which output pixels map back inside the input image.
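A sketch of this inverse-warping procedure (not the exact project code; the offset return value, my addition here, records where the bounding box sits in the original coordinate frame):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warpImage(im, H):
    """Inverse-warp color image im by homography H.
    Returns the warped image, an alpha mask, and the bounding-box offset."""
    h, w = im.shape[:2]
    # Warp the four input corners to find the output bounding box.
    corners = np.array([[0, 0, 1], [w, 0, 1], [w, h, 1], [0, h, 1]], float).T
    warped = H @ corners
    warped = warped[:2] / warped[2]
    xmin, ymin = np.floor(warped.min(axis=1)).astype(int)
    xmax, ymax = np.ceil(warped.max(axis=1)).astype(int)

    # Grid of output pixel coordinates, mapped back through H^-1.
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]

    # Interpolate each channel at the back-projected (row, col) coordinates.
    coords = np.stack([src[1].reshape(xs.shape), src[0].reshape(xs.shape)])
    out = np.zeros((ymax - ymin, xmax - xmin, im.shape[2]))
    for c in range(im.shape[2]):
        out[..., c] = map_coordinates(im[..., c], coords, order=1, cval=0)

    # Alpha mask: 1 where the back-projected point lands inside the input.
    alpha = ((src[0] >= 0) & (src[0] <= w - 1) &
             (src[1] >= 0) & (src[1] <= h - 1)).reshape(xs.shape).astype(float)
    return out, alpha, (xmin, ymin)
```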
Rectifying Images:
As a sanity check on warping, we rectify images of known rectangular objects to make them rectangular. I took images of a book and a pizza box from an angle, chose correspondence points on the corners of the objects (red points), and defined the target rectangular points (blue points). I computed a homography matrix H mapping the red points to the blue points, then called my warpImage function on the image. The rectified pizza box came out quite small in the bottom-left corner, I believe because the homography translates a large region of the image by a significant amount.
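Rectification then reduces to a single computeH call with hand-picked points; a usage sketch reusing the functions above (the filename and click coordinates are illustrative, not my actual values):

```python
import numpy as np
import skimage.io as skio

book = skio.imread("book.jpg") / 255.0  # hypothetical filename

# Hand-picked corners of the book cover (red points) and the
# axis-aligned rectangle they should map to (blue points).
book_pts = np.array([[241, 118], [530, 160], [512, 470], [215, 420]])
target_pts = np.array([[0, 0], [300, 0], [300, 400], [0, 400]])

H = computeH(book_pts, target_pts)
rectified, alpha, offset = warpImage(book, H)
```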
Blending Images into a Mosaic:
For the mosaics I computed homographies that map an image 2 to an image 1, and warped image 2 so that it aligns with image 1’s “perspective”. To create smooth transitions in the overlapping portions of the mosaic, I created alpha masks that manually set the transparency of those pixels. For smoother blending I could have applied Gaussian smoothing or Laplacian stacks. To blend the images I compute a weighted average of the images based on their alpha masks.
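A minimal sketch of this weighted-average blend, assuming both warped images have already been placed on a shared canvas of the same size along with their alpha masks:

```python
import numpy as np

def blend(im1, a1, im2, a2):
    """Weighted average of two aligned images using their alpha masks."""
    a1 = a1[..., None]  # broadcast masks over the color channels
    a2 = a2[..., None]
    total = a1 + a2
    total[total == 0] = 1  # avoid dividing by zero outside both images
    return (im1 * a1 + im2 * a2) / total
```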
Detecting Corner Features with Harris Interest Point Detector:
I used a single-scale Harris detector to find corner features in the images. The Harris detector implementation was provided by the CS180 staff. To the right are the Harris corners overlaid on the images.
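I used the staff implementation directly, but a roughly equivalent detector can be sketched with scikit-image (shown only to illustrate the idea; this is not the staff code):

```python
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im_gray):
    """Harris corner strengths and their local maxima as (row, col) points."""
    h = corner_harris(im_gray, sigma=1)         # corner strength map
    coords = peak_local_max(h, min_distance=1)  # local maxima of the map
    return h, coords
```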
Adaptive Non-Maximal Suppression (ANMS):
As we can see above, the Harris Interest Point Detector vastly over-produces corners. This creates unnecessary redundancy and is computationally expensive to process. The ANMS algorithm resolves this, letting us choose how many interest points to keep (a maximum of 200 for my images) while selecting points that are spatially well-distributed.
For each interest point x_i, the algorithm computes a suppression radius r_i: the distance to the nearest interest point that is sufficiently stronger, i.e. r_i = min_j ‖x_i − x_j‖ subject to f(x_i) < c_robust · f(x_j). Points with small radii sit close to a stronger corner and are suppressed, while the points with the largest radii are retained. I used the value c_robust = 0.9.
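A sketch of this ANMS step, assuming coords is an (N, 2) array of corner locations and strengths their Harris responses:

```python
import numpy as np
from scipy.spatial.distance import cdist

def anms(coords, strengths, n_points=200, c_robust=0.9):
    """Keep the n_points corners with the largest suppression radii."""
    dists = cdist(coords, coords)  # pairwise distances between corners
    # stronger[i, j] is True when point j is sufficiently stronger than i.
    stronger = strengths[:, None] < c_robust * strengths[None, :]
    dists[~stronger] = np.inf      # only dominating points can suppress
    radii = dists.min(axis=1)      # suppression radius r_i for each point
    keep = np.argsort(-radii)[:n_points]  # largest radii first
    return coords[keep]
```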
Feature Descriptor Extraction:
Now that we have our reduced set of corner points, we extract a feature descriptor for each one. To do so we take axis-aligned 8x8 patches, sampled from 40x40 windows around each point. A low-pass (Gaussian) filter is first applied to the 40x40 window to avoid aliasing before downsampling to 8x8. After sampling, the descriptors are also normalized.
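A sketch of this extraction step (the blur sigma and the zero-mean, unit-variance normalization are my illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(im_gray, coords, patch=40, out=8):
    """8x8 descriptors sampled from blurred 40x40 windows, then normalized."""
    step = patch // out  # sample every 5th pixel after blurring
    descs = []
    for r, c in coords:
        window = im_gray[r - patch // 2: r + patch // 2,
                         c - patch // 2: c + patch // 2]
        if window.shape != (patch, patch):
            continue  # skip points too close to the image border
        window = gaussian_filter(window, sigma=2)  # low-pass to avoid aliasing
        d = window[::step, ::step].ravel()         # subsample down to 8x8
        descs.append((d - d.mean()) / (d.std() + 1e-8))
    return np.array(descs)
```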
Feature Matching:
We now need to match feature points between the two images. For each feature descriptor in the first image we compute the Euclidean distance to every feature descriptor in image 2. The two nearest neighbors are found, and if the ratio between their distances is below some threshold (Lowe’s trick), the match is accepted. Lowe’s trick leverages the idea that a true match’s nearest neighbor should be much better than its second-nearest. I used a threshold of 0.8 for my images.
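A sketch of this matching step with the 0.8 ratio threshold:

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio=0.8):
    """Lowe-ratio matching. Returns index pairs (i, j): desc1[i] <-> desc2[j]."""
    d = cdist(desc1, desc2)            # Euclidean distances between descriptors
    nn = np.argsort(d, axis=1)[:, :2]  # two nearest neighbors in image 2
    matches = []
    for i, (j1, j2) in enumerate(nn):
        # Accept only if the best match is clearly better than the second.
        if d[i, j1] < ratio * d[i, j2]:
            matches.append((i, j1))
    return matches
```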
Using 4-point RANSAC to Compute a Robust Homography Estimate:
Lowe’s trick eliminates many outliers, but not all. To make the least-squares homography robust to the remaining ones, we use the RANSAC algorithm. The algorithm iterates a specified number of times (2000 for my images). At each iteration it randomly selects 4 matched point pairs and uses them to compute a homography. The points in the first image are projected onto the second using this homography; if the Euclidean distance between a projected point and its matched point in image 2 is below some threshold, that match is counted as a valid inlier. The homography that produces the most inliers wins, and the final homography is recomputed by least squares over that set of inliers.
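A sketch of the 4-point RANSAC loop, reusing the computeH sketch above (the pixel threshold here is an illustrative value; the write-up only specifies “some threshold”):

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=2000, thresh=3.0):
    """4-point RANSAC over matched points pts1 -> pts2, both (N, 2) arrays."""
    best_inliers = np.zeros(len(pts1), dtype=bool)
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coords
    for _ in range(n_iters):
        # Fit a candidate homography to 4 random correspondences.
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = computeH(pts1[idx], pts2[idx])
        # Project all image-1 points into image 2 and measure the error.
        proj = H @ pts1_h.T
        proj = (proj[:2] / proj[2]).T
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit by least squares on the winning inlier set.
    return computeH(pts1[best_inliers], pts2[best_inliers]), best_inliers
```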
Warp and Blend:
I warped and blended the images as before, but using the homography computed with the RANSAC algorithm.
More Examples:
Example 1:
Example 2: