CS 194-26 Project 3 by David Hahn

Introduction

In this project, I first implemented a simple unsharp mask filter to create a sharpening effect on an input image. I then manipulated low and high frequency detail in conjunction with Gaussian blurring to create hybrid images where one image is visible when viewed from up close and another image is visible when viewed from afar.

Exploring the frequency space further, I implemented Gaussian and Laplacian stacks, which allow the viewing of images with certain frequency bands filtered out. Finally, using my newfound knowledge of Gaussian and Laplacian stacks, I created an image blender which, given two images and a mask, is able to blend the images together. An example of image blending is below:



Part 1: Unsharp Mask Filtering

In this part, I used the unsharp mask filter that we learned about in lecture to sharpen an image. The unsharp mask filter itself is created via the following steps:

  1. Apply a Gaussian filter to the image to blur it, exposing the low frequencies of the image.
  2. Subtract the blurred image from the original image to remove the low frequencies of the image, leaving only the high frequencies.
  3. Scale the resulting image by a value alpha that determines how "sharp" the image will be.
  4. Add the resulting image to the original image to accentuate the high frequencies of the original image.

[Images from left to right: Original Image, Blurred Image, High Frequency Image, Sharpened Image (alpha = 1)]

Note: I clipped the output images so that pixel values stayed within the valid range.
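For concreteness, here is a minimal sketch of these steps in Python, assuming a grayscale image with values in [0, 1] (an illustrative sketch, not my exact code):

    import numpy as np
    from scipy.signal import convolve2d

    def gaussian_kernel(size, sigma):
        # Normalized 2D Gaussian kernel of the given size and sigma.
        ax = np.arange(size) - (size - 1) / 2.0
        xx, yy = np.meshgrid(ax, ax)
        kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
        return kernel / kernel.sum()

    def unsharp_mask(im, sigma=10, alpha=1.0):
        # Step 1: blur the image to isolate the low frequencies.
        kernel = gaussian_kernel(int(3 * sigma), sigma)
        blurred = convolve2d(im, kernel, mode='same', boundary='symm')
        # Step 2: subtract the blur, keeping only the high frequencies.
        high_freq = im - blurred
        # Steps 3 and 4: scale the high frequencies by alpha, add them
        # back to the original, and clip so pixel values stay in bounds.
        return np.clip(im + alpha * high_freq, 0, 1)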

Below is a set of images of my dog, Pebble. I used a 30 x 30 window for my Gaussian filter with a sigma value of 10, following the general rule of thumb of making your window width and height 3 * sigma.


[Images: original, alpha = 0.5, alpha = 1, alpha = 5]
[Images: alpha = 10, alpha = 50, alpha = 100]

I ran the algorithm a few times with different alpha values, and you can see that after a certain point, alpha becomes too large and the images look unrealistic rather than sharp. The difference at alpha = 0.5 is barely noticeable, while alpha = 1 seems to sharpen the image well. However, an alpha value of 5 or higher results in Pebble's fur being too white and contrasted, producing an increasingly "chiseled" effect as alpha grows.

Interestingly enough, the high frequency image seems to have picked up on the minute shadow on the door behind Pebble: that texture also becomes accentuated as alpha increases.

Part 2: Hybrid Images

Next, I was tasked with creating hybrid images, that is, given two input images, creating an image that seemingly changes appearance as a function of viewing distance. We accomplish this by high pass filtering (HPF) one image, low pass filtering (LPF) the other image, and then adding the images together. I accomplished the HPF in the same manner as part 1, taking the difference between the original image and a Gaussian blurred version of it, and the LPF is accomplished just by applying a Gaussian to the original image.

Additionally, the images were always aligned, then grayscaled, then filtered, then combined. The alignment code (which was given to us) uses two user-defined points on the images to align them together. Naturally, I chose images with forward facing portraits and subjects with two visible eyes, and I used the eyes in every pair of photos as the alignment points.
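As a rough sketch, the filter-and-combine step might look like the following (assuming grayscale, pre-aligned inputs and reusing the gaussian_kernel helper from the part 1 sketch; the sigma values I actually settled on are given below):

    def hybrid_image(im_low, im_high, sigma_low, sigma_high):
        # LPF: just a Gaussian blur of the first image.
        k_low = gaussian_kernel(int(3 * sigma_low), sigma_low)
        low = convolve2d(im_low, k_low, mode='same', boundary='symm')
        # HPF: original minus its Gaussian blur.
        k_high = gaussian_kernel(int(3 * sigma_high), sigma_high)
        high = im_high - convolve2d(im_high, k_high, mode='same', boundary='symm')
        # Sum the two and clip to the valid range.
        return np.clip(low + high, 0, 1)

For the Ed Sheeran/husky pair below, the call would be along the lines of hybrid_image(husky, ed, 7, 15).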

I found that a sigma value of 7 on the LPF and 15 on the HPF worked the best. I followed the 3 * sigma rule of thumb and used a 21 x 21 window on the LPF and a 45 x 45 window on the HPF. Below is my favorite result:

Try walking away from your computer and viewing from a distance!
(Ed Sheeran should disappear and the husky should become more visible.)

Hybrid photo of Ed Sheeran and a husky

Here are some pictures detailing the process of creating the hybrid image:

Original Images and their Fourier Transforms (FT)
Original Ed
Ed's FT
Original Husky
Husky's FT


Filtered Images and their FTs
HPF'd Ed
Filtered Ed's FT
(As expected, has become brighter due to the HPF.)
LPF'd Husky
Filtered Husky's FT
(As expected, has become darker as the high frequencies are filtered out.)


Result
Resulting Hybrid Image
Hybrid FT

Here are a few more examples that I ran:

"The professor of my favorite 61-series class and the 16th President of the United States."
High: CS61C Professor Dan Garcia
Low: President Abraham Lincoln
Result: Dabe Garlincoln



"My favorite actor and my favorite friend."
High: Actor Jeff Goldblum
Low: My Dog, Pebble
Result: Jebble Goldblubble



"My boss' boss' boss' boss' boss over the summer and his former boss."
High: Apple Founder Steve Jobs
Low: Current Apple CEO Tim Cook
Result: Stim Jooks



Here's one that didn't turn out very well. It's hard to discern Gandhi's features at a distance, and I think this is primarily due to Obama's face being larger than Gandhi's. Additionally, Gandhi doesn't have pointy ears, a thick beard, or anything else that stands out when his image is blurred, so it's difficult to discern his figure.
High: Current President Barack Obama
Low: Civil Rights Leader Mahatma Gandhi
Result: Ghandbama

Part 2: Bells and Whistles

I attempted to see if color could enhance the effect of the hybrid images, that is, whether color in one or both of the images would make the figures more discernible in the resulting image. Here are some examples:

Uncolored Result
HPF Colored (Ed)
LPF Colored (Dog)
Both Colored



Uncolored Result
HPF Colored (Jeff)
LPF Colored (Dog)
Both Colored

It seems as though coloring the LPF image, or coloring both the LPF and HPF images, gives the best results. This can be explained by the fact that the HPF image is the difference between the original image and a Gaussian blurred version of it. Because the HPF essentially detects edges in the photo, adding color to this image simply colors the edges, as any continuous patches of color are removed by the subtraction.

Making the HPF image's edges more discernible may help when viewing the image close up, but it actually obstructs the view of the LPF image when moving away. On the other hand, coloring the LPF image makes it more discernible at long viewing distances without compromising the viewability of the HPF image at close viewing distances. Coloring both images gives similar results because coloring the blurred image has a much larger effect on the eye than coloring the HPF image, which only colors its edges.

This led me to try color on my failed example. I found that coloring the HPF image (Obama) had very little effect on discerning Gandhi at a distance (compared to not coloring either image). However, coloring both images, or coloring only Gandhi, did seem to show his figure better at a distance:

Uncolored Result
HPF Colored (Obama)
LPF Colored (Gandhi)
Both Colored

Part 3: Gaussian and Laplacian Stacks

The next objective of the project was to use Gaussian and Laplacian stacks in order to show images with various frequencies filtered out. The Gaussian stack involved repeatedly applying a Gaussian filter to an input image with increasing sigma value. I used a stack of depth 5 and an initial sigma value of 1. I then doubled the sigma at each level of the stack and adjusted my window accordingly following the 3 * sigma rule.

For the Laplacian stack, I also used a depth of 5 and an initial sigma value of 1. It was built by taking the difference between the Gaussian blurred image at the current level and the one at the previous level. Of course, this means the Gaussian stack must reach depth 2 (indexing from 1) before the first level of the Laplacian stack can be computed, so for this part of the project I added an extra level of Gaussian blurring to obtain Gaussian and Laplacian stacks of equal depth.
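A sketch of the stack construction, with the doubling sigma schedule and the extra Gaussian level described above (again reusing the earlier helpers; not my exact code):

    def gaussian_stack(im, depth=6, sigma=1):
        # Repeatedly blur the original image, doubling sigma at each level
        # and sizing the window by the 3 * sigma rule of thumb.
        stack = []
        for _ in range(depth):
            kernel = gaussian_kernel(int(3 * sigma), sigma)
            stack.append(convolve2d(im, kernel, mode='same', boundary='symm'))
            sigma *= 2
        return stack

    def laplacian_stack(im, depth=5, sigma=1):
        # Each Laplacian level is the difference between consecutive
        # Gaussian levels, so depth + 1 Gaussian levels are needed.
        g = gaussian_stack(im, depth + 1, sigma)
        return [g[i] - g[i + 1] for i in range(depth)]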

I additionally grayscaled the input image before building the stacks, and for the Laplacian stack I called matplotlib.pyplot's imshow function with a clim argument of (0, 0.1). This is because the Laplacian stack images can contain negative or very small values, so adjusting the color limits makes them more visible. Below are my results:


Original "Gala Contemplating the Mediterranean Sea at Twenty Meters which at Twenty Meters Becomes the Portrait of Abraham Lincoln"
Result of Gaussian and Laplacian stacks at sigma = [1, 2, 4, 8, 16]
(Last entry of Laplacian stack is Gaussian at level 6, sigma = 32)
Original Mona Lisa
Result of Gaussian and Laplacian stacks at sigma = [1, 2, 4, 8, 16]
(Last entry of Laplacian stack is Gaussian at level 6, sigma = 32)
Original Ed Sheeran with Husky Hybrid Image
Result of Gaussian and Laplacian stacks at sigma = [1, 2, 4, 8, 16]
(Last entry of Laplacian stack is Gaussian at level 6, sigma = 32)

The Gaussian stack shows a progression of removing high frequencies which is why Abraham Lincoln and the Husky become more visible as sigma increases. For the Mona Lisa, Mona's smile seems to be less prevalent when sigma = 1, 2 but more prevalent when sigma = 4. The Laplacian stack shows the image at frequency slices because each level is a difference between two Gaussian stack levels. Thus the behavior is a little more unpredictable.

For the Mona Lisa, the third Laplacian stack level has the most prevalent smile, showing that Mona's smile becomes most obvious in that frequency range. In the first and last cases, however, the HPF images (Gala and Ed Sheeran) begin as barely visible, become more visible in the middle of the stack, and then become indiscernible at the end of the stack.

This seems to imply that the optimal viewing distance for the HPF images is actually not as close as possible, but a more moderate distance from the image.

Part 4: Image Blending

The final objective for this project was image blending. Essentially, we apply a mask to a pair of photos and blend one photo into the other using the mask. We do this via the following steps:

  1. Using the code from part 3, we create Gaussian and Laplacian stacks for our two images as well as our mask (though we do not use the generated Laplacian for the mask).
  2. We then weight each entry of the Laplacian stacks by the corresponding entry in the mask's Gaussian stack and sum them, effectively blending the two images together:


    L_r(i) = G_m(i) * L_a(i) + (1 - G_m(i)) * L_b(i)

    L_r(i) is the resulting Laplacian stack
    G_m(i) is the Gaussian of the mask
    L_a(i) is the Laplacian of the first image
    L_b(i) is the Laplacian of the second image
    i is the entry number of the stacks

    This altogether creates a Laplacian stack for the output image.
  3. We then calculate the (n + 1)th level of the output image's Gaussian stack:


    G_r(n + 1) = G_m(n + 1) * G_a(n + 1) + (1 - G_m(n + 1)) * G_b(n + 1)

    G_r(n + 1) is the output image's Gaussian
    G_m(n + 1) is the mask's Gaussian
    G_a(n + 1) is the first image's Gaussian
    G_b(n + 1) is the second image's Gaussian
    n is the depth of our Laplacian stacks

  4. Finally, we sum the result of the previous step with all the entries in the output image's Laplacian stack which we calculated in step 2. This recreates the image using the Laplacians and final Gaussian.


The sum at the end of the procedure makes sense because the Laplacian stack levels are differences between Gaussian levels, so adding them back reconstructs the image band by band. We need the final Gaussian as a low frequency base: the stack is finite, so the base Gaussian stands in for all the Laplacian levels beyond the ones we computed.
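Putting the whole procedure together, a grayscale sketch using the stack helpers from part 3 might look like this (the mask is a float image in [0, 1]; mask_sigma is explained in the next paragraph):

    def blend(im_a, im_b, mask, depth=5, sigma=1, mask_sigma=3):
        # Laplacian stacks of the images, plus Gaussian stacks for the
        # bases and the mask. Index depth (0-indexed) is the (n + 1)th level.
        l_a = laplacian_stack(im_a, depth, sigma)
        l_b = laplacian_stack(im_b, depth, sigma)
        g_a = gaussian_stack(im_a, depth + 1, sigma)
        g_b = gaussian_stack(im_b, depth + 1, sigma)
        g_m = gaussian_stack(mask, depth + 1, mask_sigma)

        # Step 3: blend the deepest Gaussian levels to get the base.
        result = g_m[depth] * g_a[depth] + (1 - g_m[depth]) * g_b[depth]
        # Steps 2 and 4: add back each mask-weighted Laplacian level.
        for i in range(depth):
            result += g_m[i] * l_a[i] + (1 - g_m[i]) * l_b[i]
        return np.clip(result, 0, 1)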

Like the previous part, I used a stack depth of 5 and a starting sigma of 1, with sigma doubling at every level of the stack. I did run into a small issue where my blending wasn't performing very well, which was more visible on some images than others. I took the advice of our GSI, Tinghui, to start the sigma value of my mask's Gaussian stack higher. I chose 3, which gives blurrier masks and, subsequently, more blending in the resulting image. Below is my favorite result (not so much because it blended the best but more so because I found it the funniest of the images I blended):

The Original Images
My good friends: Tanay and Obama
(Photos were aligned in GIMP)
The Avengers Poster
Mask
(Also created in GIMP)



Gaussian stack of the images and mask:

Note: the input images have 6 images in each stack to support Laplacian stacks of depth 5 as well as to determine the output image's Gaussian at depth 6

Tanay and Obama
Avengers
Mask



Laplacian stacks of the images:

Note: the images were adjusted using the clim argument (set to (0, 0.1)) in matplotlib.pyplot's imshow function

Tanay and Obama
Avengers



Laplacian stacks of images weighted by the Gaussian of the mask:

Note: the last image in each row is the level-6 Gaussian of the image weighted by the level-6 Gaussian of the mask

Tanay and Obama
Avengers



Sum of the weighted Laplacian stacks level-wise:

Note: the last image is the sum of the weighted depth-6 Gaussians




And finally, after summing the entries in the previous photo, the result!
The Avengers with Nobama Fury and Captain Tamerica

Below are some additional photos that came out a lot better than my favorite (except the last one). I created all the masks in GIMP and aligned the images so that they would be in the proper positions prior to blending.

Bird Looking Up
Bird Looking Down
Mask
Output
(Due to the similarities in the backgrounds of the images, the birds blended quite well together.)



Flowers
Space
Mask
Output
(This one did well because the blend happened along the seam of the horizon, making imperfections of the blend harder to distinguish)



Statue of Liberty
Fireworks
Mask
Output
(Nice blend around the statue, but worse performance on the horizon, primarily because the horizon is already not very well contrasted against the night sky, so introducing and blending the bright fireworks looks awkward)



Frog
Flower
Mask
Output
(Decent blend around the edges of the flower but the stem area sticks out due to the color differences between the leaf and flower stem)

I learned very quickly that your choice of input images matters a lot with this form of blending. Laplacian stack blending blindly blends images without considering differences within each image or differences between the images at the mask border. This is a problem that is addressed by gradient blending, but for this part of the project it was best to choose images with matching colors and similar backgrounds, and/or masks that follow a seam inherent in the images. Images with very strong lighting and reflections from light sources also seem to confuse our brains if the lighting between the two images is not consistent.

This is why the birds worked so beautifully, having been shot with a large aperture, as the backgrounds were greatly blurred and the colors matched up relatively well. The night sky of the statue photo as well as the black background of the fireworks similarly worked well to hide differences between the photos when blending.

A Rock
Mt. Rushmore
Mask
Output
(Not very good. This is primarily due to the lighting differences between the rock's photo and Mt. Rushmore as well as the sharp edges in the Rushmore rock formations looking awkward when blended into the rock. This might be blended better using gradient blending in the next project!)

Part 4: Bells and Whistles

I added color to all my output pictures, which effectively tripled the computation time: applying the Gaussian filter now takes 3 convolve2d() calls (one for each of the R, G, and B channels) instead of 1 in grayscale. For certain output images, the blending looked worse. This variable success was expected, because adding color adds complexity to the image, giving the blend more chances to be imperfect, and any imperfection that was subtle in grayscale can become more apparent in RGB. Here are my favorite colored images:
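Concretely, the color version just runs the grayscale blend sketch from part 4 once per channel, along these lines:

    def blend_color(im_a, im_b, mask, **kwargs):
        # Blend each of the R, G, and B channels independently, then restack.
        channels = [blend(im_a[:, :, c], im_b[:, :, c], mask, **kwargs)
                    for c in range(3)]
        return np.dstack(channels)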

My favorite birds in color
(The photos were stunning to begin with so I take no credit for the color enhancing the overall effect)



Just a view of my backyard
(Again the photos had some amazing colors to begin with which really contrast nicely when combined)



These weren't supposed to all go off at the same time
(The "Everyone has a photo of the Statue of Liberty, let's add some pizzazz~" photo)



Life's better when you have a friend
(Frog looked lonely in the rain so I added a flower friend to keep it company)

Reflection

The projects for this class keep getting cooler and cooler in my opinion. While working on this project, I learned a lot about how the frequency domain works in images. Prior to this, I didn't really view photos in terms of frequencies but now I understand the importance of the frequency domain in manipulating photos.