Digital Matting for Image Processing and Composition

MASARYK UNIVERSITY FACULTY OF INFORMATICS

Digital Matting for Image Processing and Composition BACHELOR THESIS

Martin Dámek

Brno, 2011




Statement

I declare that this thesis is my original authorial work, which I have elaborated on my own. All resources, sources and literature that I used or drew upon are properly cited in the thesis, with a full reference to the source.

_______________________


Acknowledgements

I would like to express my thanks to all who helped and supported me during the writing of this thesis: to my supervisor doc. Ing. Jiří Sochor, CSc., for scholarly advice and a kind approach; to Mr. Sergey Bochkanov, for developing ALGLIB, the open source library of matrix algorithms that I used; and to my friends and my brother for reviewing and testing.


Abstract

There are several methods for extracting objects from images. The process is called matting and involves creating a matte (an opacity mask) and separating the image into two layers, foreground and background. The main problem is determining their color tones at the object's boundaries, where they are mixed together in various ratios. A successfully separated foreground layer can then be combined with another background using the matte; this reverse process is called image compositing. One of the matting methods is the Bayesian approach. It is based on probability distribution computations and approximates the most likely color and opacity values. The algorithm achieves good results even for objects with complex boundaries, such as fur or translucent materials.


Keywords

matting, digital image, foreground detection, color approximation, probability, likelihood, alpha channel, opacity, transparency, color quantization


Table of Contents

1 Introduction
  1.1 What is matting?
  1.2 Digital representation of images
  1.3 Problem definition
2 The Bayesian Approach
  2.1 History
  2.2 Theory of likelihood
  2.3 Problem formulation I
    2.3.1 Pixel and its neighborhood
    2.3.2 Color quantization and clustering
    2.3.3 Problem formulation II
  2.4 Alternating iterative approximation
  2.5 Algorithmization
3 Application
  3.1 Platform and portability
  3.2 Data retrieval and representation
  3.3 User interface
  3.4 Testing and results
4 Conclusion
References


1 Introduction

In recent decades we have witnessed astonishing advances in the field of information technology. Modern computers combined with complex software can often perform what a layperson would easily call magic. I am a fan of magic and I am fond of pictures. What matting methods can do with a picture may indeed seem like magic; in fact it is a mixture of math, namely statistics, and human ingenuity. As my supervisor, doc. Sochor, said: "Do not trust a digital image." This work shows why.

1.1 What is matting?

Matting is derived from "matte", a word used in photography and film-making. A matte used to be a special film frame, parts of which were black and the rest transparent. When placed over a picture frame, the matte would filter out the covered parts (the background), leaving only the foreground. Matte can also mean something glare-less or impermeable; in that sense, matting an image can also be understood as covering certain parts of it so that they are no longer visible.

Objects simply cut out of an image in such a way would look unnatural, though. Compositions made with this technique would be imperfect, and anyone would know at first glance that the image has been tampered with. Obviously, the boundaries of the object are the problem. If only we could make those objects fit the new background seamlessly. That is exactly what matting methods try to achieve.

The key to unlocking the secret of smooth edges lies in opacity. Monochromatic mattes do not recognize opacity: each of their points is either fully opaque or fully transparent. In nature, many objects have very complex or blurred edges. Ergo, the more degrees of opacity can be utilized, the more natural an impression the image makes. Also, when a camera takes a picture, the contours of an object meld slightly into the background, and its color tones are added to the background colors. To what extent this happens depends on the lighting conditions, the depth of field of the lens, the focus, and the resolution of the camera. What matters is that the colors are still there, only overlapping, occupying the same area.

Although this lost information can never be fully recovered, there are methods for estimating it with a certain probability. Most of these methods still require guidance from the user, who outlines the foreground and background areas so that the algorithm can collect its initial data. Generally it can be said that the more information an algorithm gets from the user, the more precise the result is. With enough care and a reasonably diverse picture, the foreground object can be extracted so well that the subsequent photomontage seems very real.

1.2 Digital representation of images

Computers do not comprehend what yellow or blue means; they only "think" in binary. After all, colours are merely human perception of various wavelengths of light. To represent colors digitally, we have to define a mapping from the set of colors to a set of numbers. The most common numeric representation of colors for digital imaging is the RGB model. The abbreviation RGB stands for Red – Green – Blue; by combining these three colors it is possible to produce a wide range of color tones. For the purposes of computing with colours, we define a color as a vector in the three-dimensional RGB space, whose axes are the intensities of the red, green and blue tones. For example, black has all of the R, G, B components zero and is thus coded as [0, 0, 0] in the RGB model. Analogously, white is [1, 1, 1] and the shades of grey are [x, x, x] where 0 < x < 1. Blue is coded [0, 0, 1] and yellow is coded [1, 1, 0]. This way each color is represented by three numbers, which can be processed by a computer. Depending on the precision with which these three numbers are stored, various bit depths of the color space are reached. The commonly used 24-bit RGB coding allows 8 bits per color channel, each of which can hold 2^8 = 256 integer values in the range [0, 255]. The maximum number of different colors stored is then 256^3 = 16 777 216. In order to store the additional information of opacity, we simply add another dimension to the RGB space. This additional channel is usually called the alpha channel. For α = 0 the color is fully transparent (invisible) and for α = 1 it is fully opaque (visible). A 75% opaque magenta is then coded [0.75, 1, 0, 1] in the ARGB model.
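The mapping between 8-bit channel values and the unit-range vectors used above can be sketched in a few lines. This is an illustrative snippet, not code from the thesis application (which is written in C#); the function names are made up.

```python
# Convert between 8-bit integer ARGB channels and the normalized
# [0, 1] vectors used in the text (illustrative only).

def to_unit(r8, g8, b8, a8=255):
    """Map 8-bit channels to an [alpha, r, g, b] unit-range vector."""
    return [a8 / 255.0, r8 / 255.0, g8 / 255.0, b8 / 255.0]

def to_8bit(argb):
    """Map a unit-range ARGB vector back to 8-bit integer channels."""
    return [round(max(0.0, min(1.0, c)) * 255) for c in argb]

# 75% opaque magenta, as in the text: [0.75, 1, 0, 1] in the ARGB model
print(to_8bit([0.75, 1, 0, 1]))  # [191, 255, 0, 255]
```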

1.3 Problem definition

The extraction of an object from the background is the reverse of composition. In composition, we additively mix the colors to obtain the resulting color shade. Composition is simple enough: provided we have the two separate images, including the alpha channel, the colors of the composite image can be computed according to this equation:

C = αF + (1 – α)B    (1)

where C is the resulting color of the composite, F is the foreground color, B is the background color and α is the opacity. The reverse process, extraction, is not that easy, as some information has been lost. At first glance, the only information available is the color C; the remaining variables F, B and α are unknown. Such an equation alone would be impossible to solve. Luckily, when dealing with real-world data, in this case photographs, we find that there are certain rules constraining the values of F, B and α and binding them together. One example is spatial conformity: nearby areas often have the same or very similar colors. These properties of real-world data can be used to our advantage. However, nothing can be taken for granted, and like most rules these have exceptions, hence the need for statistics. The goal of this thesis is to explore this probabilistic approach to digital matting, implement an algorithm that solves the matting problem, and develop an application that allows the algorithm to be tested, as well as used casually as a graphics tool.
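As a quick illustration of equation (1), compositing can be applied per pixel and per color channel. This is an illustrative Python sketch, not the thesis implementation.

```python
# Compositing equation (1): C = alpha * F + (1 - alpha) * B,
# applied channel by channel. All values lie in [0, 1].

def composite(F, B, alpha):
    """Mix foreground F over background B with opacity alpha."""
    return [alpha * f + (1.0 - alpha) * b for f, b in zip(F, B)]

# half-transparent red over white gives pink
print(composite([1.0, 0.0, 0.0], [1.0, 1.0, 1.0], 0.5))  # [1.0, 0.5, 0.5]
```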


2 The Bayesian Approach

As stated before, the Bayesian method works with probabilities and estimations. The problem of solving a single equation in three unknowns, which is impossible, is replaced by estimating the most likely values of those variables.

2.1 History

The method is named after the English mathematician Rev. Thomas Bayes (1702 – 1761), who contributed greatly to the fields of statistics and probability.

Figure 1 – Portrait of Thomas Bayes [Bayes1]

Thomas Bayes (see Figure 1) was born in London to the family of a Nonconformist minister, and received a private education. Like his father, he was later ordained a minister. Mathematics caught his interest, namely the field of probability and statistics; he devoted his work to it and wrote many papers on joint probability distribution and inference. His most famous paper, Essay Towards Solving a Problem in the Doctrine of Chances, was published three years after his death. [Bayes2]


What is relevant to this thesis is that he established a mathematical basis for probability inference – the way of calculating the probability of an event from observations of the event's occurrences.

2.2 Theory of likelihood

An intuitive grasp of probability goes the opposite way: knowing the probability of an event, we can calculate the average frequency with which it will occur, along with other secondary characteristics. Bayes formulated a relation between a conditional probability and its inverse, now known as Bayes' theorem.

Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)    (2)

where P(A|B) is the conditional probability of event A occurring given the occurrence of event B, and P(B|A) is its inverse, the conditional probability of B given A. The inverse probability is also called the likelihood. The likelihood quantity can be understood as the plausibility of an event having a certain probability, given observed occurrences of it. Formally, a likelihood L is a function of the second argument of the original conditional probability P, with its first argument held fixed. [Wiki1]

If b → P(A|B = b), then L(b|A) = P(A|B = b).

Likelihood is basically still a probability; its values lie in the range [0, 1]. Despite that, the two are not the same mathematical quantity. To better illustrate the difference, consider an example. Say we have an ordinary two-sided coin that, when tossed, lands on its "heads" side with probability P(H) or on its "tails" side with probability P(T) = 1 – P(H). In the ideal case, P(H) = P(T) = 0.5. Let's assume for the example's sake that we do not know this.


Now, if we toss the coin twice and observe tails both times (the outcome 'TT'), we know for certain only that P(T) > 0, since the coin could, after all, be biased. What we can say is that the likelihood of the coin being fair, i.e. of P(T) = 0.5, given this observation is 0.5^2 = 0.25. That is not the same, though, as saying that the event 'TT' occurs with a 25% probability. There is a likelihood function that describes this case:

Figure 2 – Likelihood function, plot A [Wiki1]

As the plot shows, after observing 'TT' there is still a chance that P(H) > 0, it is just not very likely. The most likely value of P(T) is now 1, and the likelihood of P(T) being 1 is L = 1. The least likely value of P(T) is 0, because after observing 'TT' it appears the coin favours its "tails" side. Now let's add one more toss of the coin and come up with "heads". The observed sequence is now 'TTH' and the likelihood function of P(T) changes as follows:

Figure 2 – Likelihood function plot B [Wiki1]


The most likely probability of the event "tails" is now P(T) = 2/3 and its likelihood is L ≈ 0.15. The least likely values of P(T) are 0 and 1, each with likelihood L = 0: after observing both "heads" and "tails", it is not merely unlikely but impossible for either P(H) or P(T) to be 0 or 1. The relation between likelihood and conditional probability proves useful when we need to solve a complex problem such as the estimation of the foreground, background and opacity values of the merged areas in an image, as described in chapter 1.
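The coin example can be reproduced with a few lines of illustrative code: the likelihood of a candidate probability p = P(T), given a sequence of independent tosses, is the product of the per-toss probabilities.

```python
# Likelihood of a candidate P(T) = p given observed tosses
# ('T' = tails, 'H' = heads). Illustrative sketch only.

def likelihood(p, observed):
    L = 1.0
    for toss in observed:
        L *= p if toss == 'T' else (1.0 - p)
    return L

print(likelihood(0.5, 'TT'))   # 0.25, as in the text
print(likelihood(2/3, 'TTH'))  # ~0.148, the maximum for 'TTH'
```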

2.3 Problem formulation I

The aim is to find, for each pixel of the image, values of the foreground color F, background color B and opacity α such that the probability of them merging into one fully opaque pixel of the given color is the highest possible. I follow the technique described in the paper A Bayesian Approach to Digital Matting [Matt]. To express the problem formally, we search for the values of F, B and α that maximize the probability given C:

max_{F, B, α} P(F, B, α | C)

Using Bayes' theorem to split this conditional probability, we get

max_{F, B, α} P(F, B, α | C) = max_{F, B, α} P(C | F, B, α) P(F) P(B) P(α) / P(C)

We can omit the term P(C) because it is a constant, and then transform the expression logarithmically. Since the logarithm is monotonic, the maximization is equivalent to maximizing a sum of log-likelihoods:

max_{F, B, α} L(C | F, B, α) + L(F) + L(B) + L(α)    (3)

where L is the natural logarithm of the likelihood function. It has the same characteristics, but it is easier to work with. In this instance, L(α) is assumed constant as well and is excluded from the maximization. The expression of a complex conditional probability is now simplified and we only need to determine the partial log likelihoods L(C | F, B, α), L(F) and L(B).


The term L(C | F, B, α) is the likelihood of C being what is observed, given the values F, B and α. It can be expressed as the difference between the original color C and αF+(1-α)B, which is the color predicted by the estimated values of F, B and α.

L(C | F, B, α) = – ||C – αF – (1 – α)B||² / σ_C²

where σ_C is the standard deviation of a Gaussian probability distribution with its center at the mean value of C's neighborhood. The terms L(F) and L(B) are addressed in the second part of this chapter.

2.3.1 Pixel and its neighborhood

Before we go any further, we need to define a few notions. It has already been explained how colors are stored in a computer as numbers. In order to store an image, which consists of many different colors fixed at certain positions, we need to organize those colors. In a computer, an image is represented by points of color called pixels. These pixels can be understood as little squares filled with color. Organized into a grid, together they form a bitmap, which is the digital form of an image. This is how most digital devices display pictures. (See Figure 3)

Figure 3 – Pixel, grid of pixels and a bitmap

Each pixel in the image is defined by its coordinates and its color (including α). In order to exploit the statistical properties of real-world images for the purposes of foreground extraction, a pixel's neighborhood is defined as the set of all pixels situated at most a certain distance from the original pixel in the bitmap. Simply put, it is the inside of a circle centered on the original pixel. (See Figure 4)
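A neighborhood in this sense can be sketched as follows. This is an illustrative snippet, not the thesis code; it simply gathers the coordinates within a Euclidean radius, clipped to the bitmap bounds.

```python
# Collect a pixel's neighborhood: all pixels within a given radius
# of (x0, y0) in a width x height bitmap. Illustrative only.

def neighborhood(width, height, x0, y0, radius):
    """Return coordinates of pixels at Euclidean distance <= radius."""
    result = []
    for y in range(max(0, y0 - radius), min(height, y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(width, x0 + radius + 1)):
            if (x - x0) ** 2 + (y - y0) ** 2 <= radius ** 2:
                result.append((x, y))
    return result

print(len(neighborhood(100, 100, 50, 50, 1)))  # 5: the pixel and its 4 neighbors
```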


Figure 4 – Foreground and background neighborhoods of a pixel [Matt]

Figure 5 – Clusters, their means and eigenvectors in RGB space [Matt]

Furthermore, a neighborhood contains a variety of pixels, some in more or less similar shades of colour, depending on the complexity of the actual image. In order to estimate the most likely values of F, B and α efficiently, it is convenient to divide the neighborhood into parts called clusters, each containing pixels of similar color. Each cluster has a mean value, which is the average color of all its pixels, and an eigenvector, which is a unit vector in the direction of the cluster's dominant deviation. (See Figure 5)

2.3.2 Color quantization and clustering

Quantization is the process of reducing a palette of colors so that there are fewer distinct shades, chosen to be as close as possible to as many of the original colors as possible. It has been used to create color palettes for display devices that could only show a limited number of colors simultaneously. It can also be used to cluster pixel neighborhoods and thus reduce the number of combinations when solving the likelihood equations. There are several methods of color quantization; one of them uses the binary tree data structure, and this and a few other methods are described in detail in [Quant]. In this thesis I use a modified version of binary tree clustering. The basic principle of quantization is as follows:


1. Let the colors form a set.
2. Calculate its statistical properties.
3. Split the set in a way that improves those properties.
4. Repeat steps 2–3 for the new sets until the desired statistical properties are reached.

After the final splitting, these sets constitute the clusters. The pixel's neighborhood is divided into groups of pixels with colors similar to a degree we can choose, by setting either a target maximum deviation from the mean or a desired maximum number of clusters. By taking the mean color value of each cluster as a representative, we can reduce the number of combinations of background and foreground colors without losing much accuracy. Let's take a closer look at the process of clustering and at the statistical properties that express the degree of variation in a set of colours. For the purposes of quantization, colours can be understood as vectors in the RGB space. For example, black is the zero vector [0, 0, 0], white is [1, 1, 1] and a dark blue-green colour looks something like [0, 0.25, 0.25]. When we consider a set of colours as a set of vectors in RGB space, we can calculate the arithmetic mean of the n-th cluster:

q_n = (1/N) · Σ_{i∈N} x_i

where N is the number of colours (vectors) in the cluster and x_i are the individual colours. Using this value we can calculate the error of each vector (its difference from the mean, in absolute value) and also the average error vector of the cluster, using the same arithmetic rule as for the mean:

e_n = (1/N) · Σ_{i∈N} |x_i – q_n|

This vector points in the direction of the greatest variation of the cluster in RGB space. When normalized (its length set to 1), it serves as the criterion for the splitting step of the clustering algorithm. It is called the eigenvector of the cluster, denoted e_n. The splitting itself is performed in the following manner: each colour vector x_i of the n-th cluster is sorted into one of two groups (new clusters) according to which of these two inequations it satisfies.

e_n · x_i ≤ e_n · q_n    (4)

e_n · x_i > e_n · q_n

The eigenvalue of a cluster serves as the criterion for when to stop splitting:

λ_n = Σ_{i∈N} [(x_i – q_n) · e_n]²

When the eigenvalue of each cluster is less than or equal to a desired target eigenvalue, the clustering is complete. The result is a set of clusters, each containing similar colours (similar in the sense that the end points of their colour vectors lie closer to each other in RGB space than to the colours of the other clusters). Eigenvectors and eigenvalues are more commonly known in the context of matrices; here they are used as statistical properties of clusters that tell us how to perform the splitting.

My modification to the clustering algorithm lies in two details: the way the eigenvectors and eigenvalues are calculated, and the way the actual splitting is done. In [Quant], once a cluster is formed, its covariance matrix is computed and an eigendecomposition is performed on it. Since it is a 3x3 matrix, three eigenvectors and three eigenvalues are determined, and the largest eigenvalue with its corresponding eigenvector is assigned to the cluster. In rare cases the decomposition may fail. My way of evaluating the eigenvector is fast, always works, and reliably determines the direction of the cluster's greatest variance as long as the cluster contains at least two different pixels. In the rare case that all pixels in a cluster share the exact same color, the method returns a zero vector, but then there is no longer any need to split such a cluster, no matter how the eigenvalue criterion is set. Note that the formula for calculating the eigenvector "mirrors" it into the first octant of the RGB space, due to the absolute value. That is remedied by counting the number of times the error vectors' coordinates are negative, separately for each axis, and then flipping the summed vector accordingly. The algorithm remains linear.

The eigenvalue is evaluated the same way as in [Quant], but I have found experimentally that better results are achieved if the volume of a cluster is taken into consideration. That can be done either by dividing the eigenvalue by N or simply by not normalizing the eigenvector. Since the latter means less computation (in particular, it avoids a square root), that is what I opted for. The second difference is the order in which the clusters are split. Instead of building an ever more disjoint binary tree, I reduce the first cluster by repeated splitting until the desired eigenvalue is reached, while putting all the other "halves" into the second cluster. The process is then repeated for each subsequent cluster. To visualize the difference: instead of cutting a pie in a checkered pattern, it is like cutting off the edges around the center down to a certain size. The center piece represents the cluster currently being reduced; incidentally, it often converges right at the densest "swarm" of colors.


By repeating the process until the last cluster meets the eigenvalue condition, I can cluster the neighborhood more effectively for the purposes of the matting algorithm. The distribution of color tones among the clusters is smoother, and on average there are fewer clusters altogether for a given eigenvalue ceiling, which is desirable. It is also worth mentioning that since I do not need the covariance matrices for clustering, only later for the actual pixel approximation, the number of covariance matrix computations is reduced significantly. I believe my adjustments produce better results because [Quant] presents a universal algorithm, originally intended to quantize color palettes for display devices with a limited number of simultaneously displayable colors, whereas in the matting setting it is possible to utilize spatial weighting of the pixels, providing extra information that helps to identify the optimal "splitting planes". (Technically, when split, a cluster is divided by a plane in RGB space that is perpendicular to the eigenvector and passes through the cluster mean.)
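To make the splitting step concrete, here is a rough illustrative sketch of one split according to inequations (4): vectors are partitioned by the plane through the cluster mean, perpendicular to the direction vector e. It is not the author's C# implementation, and for brevity it takes e as an input rather than computing the average error vector.

```python
# One binary split of an RGB cluster by the plane through its mean q,
# perpendicular to a supplied direction vector e (inequations (4)).

def mean(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split_cluster(cluster, e):
    """Partition RGB vectors: e·x <= e·q goes left, e·x > e·q goes right."""
    q = mean(cluster)
    t = dot(e, q)
    lower = [x for x in cluster if dot(e, x) <= t]
    upper = [x for x in cluster if dot(e, x) > t]
    return lower, upper

dark = [[0.1, 0.1, 0.1], [0.2, 0.2, 0.2]]
light = [[0.8, 0.8, 0.8], [0.9, 0.9, 0.9]]
lo, hi = split_cluster(dark + light, [1.0, 1.0, 1.0])
print(sorted(lo) == sorted(dark), sorted(hi) == sorted(light))  # True True
```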

2.3.3 Problem formulation II

Now that we have defined the statistical properties of neighbourhoods (and clusters), such as the mean, eigenvalue and eigenvector, we can express the remaining terms of equation (3). In order to determine a more accurate mean colour, pixels are weighted by their opacity and by their distance from the pixel currently being estimated. In the weighting function, the distance d figures as the variable of a Gaussian falloff centered at the estimated pixel. The Gaussian component of the weighting function therefore is:

g_i = (1 / (σ√(2π))) · e^(–d² / (2σ²))

where σ = 8 (the width parameter of the Gaussian). The complete weighting function then is:

w(x_i) = α_i² g_i    for the foreground, and

w(x_i) = (1 – α_i)² g_i    for the background.


With this weighting function we define the total weight of a cluster:

W = Σ_{i∈N} w_i

and the cluster weighted mean (shown here for a foreground cluster):

F̄ = (1/W) · Σ_{i∈N} w_i F_i

Next we define the cluster covariance matrix:

Σ_F = (1/W) · Σ_{i∈N} w_i (F_i – F̄)(F_i – F̄)^T

Finally, the last two terms of equation (3) are expressed as follows:

L(F) = – (F – F̄)^T Σ_F⁻¹ (F – F̄) / 2

(Analogously for the background likelihood L(B), with B̄ and Σ_B.) We have now defined all the terms needed for calculating the overall probability P(F, B, α | C).
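The cluster statistics above can be sketched in a few lines. This is an illustrative plain-Python version of the total weight, weighted mean and weighted covariance, not the thesis code.

```python
# Weighted cluster statistics: W = sum(w_i), mean = (1/W) sum(w_i c_i),
# cov[r][s] = (1/W) sum(w_i (c_i[r]-mean[r]) (c_i[s]-mean[s])).
# Illustrative only; colors are 3-component RGB vectors.

def weighted_stats(colors, weights):
    W = sum(weights)
    mean = [sum(w * c[k] for w, c in zip(weights, colors)) / W
            for k in range(3)]
    cov = [[sum(w * (c[r] - mean[r]) * (c[s] - mean[s])
                for w, c in zip(weights, colors)) / W
            for s in range(3)] for r in range(3)]
    return W, mean, cov

colors = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W, m, cov = weighted_stats(colors, [1.0, 3.0])
print(m)  # [0.25, 0.0, 0.0] -- the heavier black pixel pulls the mean down
```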


2.4 Alternating iterative approximation

The next step is to calculate the likelihood of each pair of colors being the foreground and background, and to choose the most likely one. In [Matt] the values of F, B and α are defined as the solutions of a set of two equations. For a fixed α, the optimal F and B are the solution of a 6x6 system of linear equations:

[ Σ_F⁻¹ + I·α²/σ_C²      I·α(1–α)/σ_C²         ] [F]   [ Σ_F⁻¹·F̄ + C·α/σ_C²     ]
[ I·α(1–α)/σ_C²          Σ_B⁻¹ + I·(1–α)²/σ_C² ] [B] = [ Σ_B⁻¹·B̄ + C·(1–α)/σ_C² ]    (5)

For fixed F and B, the optimal opacity is the projection of C onto the line between F and B:

α = (C – B) · (F – B) / ||F – B||²    (6)

To find the optimal solution, we subject these two equations to an alternating iterative approximation, a numerical optimization method performed by the computer. We first estimate α as the mean α value of the neighborhood and compute a first approximation of F and B from it using equation (5). With these values we then solve equation (6) to obtain a second approximation of α, and so on. The values of F, B and α converge toward the optimal solution. As with other numerical methods, the alternating iteration stops when the desired precision is reached, and the current values of F, B and α are taken as the result.
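The α-step of the alternating iteration, equation (6), can be sketched as follows. This is an illustrative snippet only; the F, B step, which solves the 6x6 linear system of equation (5), is omitted here.

```python
# Equation (6): given fixed F and B, the optimal opacity is the
# projection of C - B onto F - B, clamped to the valid range [0, 1].
# Illustrative sketch, not the thesis implementation.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve_alpha(C, F, B):
    d = [f - b for f, b in zip(F, B)]
    denom = dot(d, d)
    if denom == 0.0:
        return 0.0  # degenerate case: foreground equals background
    a = dot([c - b for c, b in zip(C, B)], d) / denom
    return max(0.0, min(1.0, a))

# a pixel exactly halfway between pure white foreground and black background
print(solve_alpha([0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))  # 0.5
```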


2.5 Algorithmization

For practical use, all of the above needs to be translated into a language computers understand, so that they can do the tremendous amount of actual computation for us. Here follows the algorithm for Bayesian matting in pseudocode. [Rubin]

input: image C and trimap M

  initialize the alpha matte α, foreground F and background B according to M
  for all unknown pixels do
    cluster the F and B neighbourhoods
    for all pairs of F and B clusters do
      solve the equations for F, B, α using alternating iteration
      calculate the likelihood L(C) + L(F) + L(B)
    assign the most likely values to F, B, α

output: F, B, α

The clustering algorithm in detail:

input: neighborhood with one cluster

  calculate the eigenvalue of the first cluster
  for each cluster in the list do
    while the cluster's eigenvalue > max eigenvalue
      add a new cluster at the end of the list
      split the current cluster according to the set of inequations (4)
      move the "outer half" of the pixels into the last cluster
      calculate the eigenvalue of the current cluster

output: neighborhood with n clusters, each with eigenvalue at most the maximum

As the algorithm progresses, the processed pixels are added to the initial data, so that the remaining pixels can be approximated more precisely. This means the result depends on the order in which the pixels are processed. Here follows the pathing algorithm that walks around the unknown area:




input: trimap

  while there are any unknown pixels in the trimap
    process all foreground-adjacent unknown pixels
    process all background-adjacent unknown pixels
    mark all the processed pixels as known in the trimap

The algorithm also dynamically adjusts the neighborhood radius as it goes, according to the minimum required number of pixels in a neighborhood and the size of the nearby unknown area, in order to achieve better precision. In rare cases a pixel produces a non-invertible covariance matrix, or the optimization fails to converge for some other reason. When that occurs, the pixel is left unknown, skipped in the current loop, and hopefully gets processed successfully in a later one. More frequently, this situation arises when a neighborhood contains fewer non-zero-weighted pixels than the required minimum; this condition ensures a safe amount of data for the color estimation.
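The sweep over the unknown region can be sketched as follows. This illustrative snippet processes, in each pass, every unknown pixel adjacent to a known one, growing the known area inward; the actual application alternates foreground-adjacent and background-adjacent passes and estimates F, B, α at each pixel, which is elided here.

```python
# Sweep the unknown trimap region from its borders inward.
# Constants and names are illustrative, not the application's code.

FG, BG, UNKNOWN, KNOWN = 0, 1, 2, 3

def sweep(trimap):
    h, w = len(trimap), len(trimap[0])

    def touches_known(x, y):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h and trimap[ny][nx] != UNKNOWN:
                return True
        return False

    order = []
    while any(UNKNOWN in row for row in trimap):
        batch = [(x, y) for y in range(h) for x in range(w)
                 if trimap[y][x] == UNKNOWN and touches_known(x, y)]
        if not batch:
            break  # isolated unknown region: nothing adjacent to process
        for x, y in batch:
            order.append((x, y))   # a real pass would estimate F, B, alpha here
        for x, y in batch:
            trimap[y][x] = KNOWN   # mark processed pixels as known
    return order

print(sweep([[FG, UNKNOWN, BG]]))  # [(1, 0)]: the lone unknown pixel
```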


3 Application

3.1 Platform and portability

I decided to implement the algorithm in the C# language, for its advantages and my previous experience with it. The application was developed in the Microsoft Visual Studio 2010 environment. The program runs on any machine with .NET Framework version 3.5 or later. The project uses the open source library ALGLIB, which is redistributable under a GPL-compatible license.

3.2 Data retrieval and representation

The source image data can be loaded from a file; supported file formats are BMP, GIF, JPEG and PNG. Regardless of the source format, the image data is extracted and stored in bitmap objects. The information about the foreground and background areas is provided by the user in the form of a trimap. Internally, the program converts the bitmaps into 2D arrays of vectors for faster processing. The trimap is a two-dimensional array of bytes, with the values for foreground, background and unknown defined as constants. It is the data structure by which the algorithm is guided, in the sense of deciding which pixels are to be estimated and in what order. The alpha matte is also represented by a 2D array of floating point values. All alpha, red, green and blue values are in the range [0, 1] internally, and are transformed from and to integral values in the range [0, 255] in image files. Neighbourhoods are implemented as lists of clusters, and each cluster as a list of pixels. This hierarchical structure makes it simple to call functions and access the data.


3.3 User interface

The application was designed to be minimalistic and easy to use. The purpose is to test the Bayesian matting algorithm on real data. It can of course be used for recreational purposes as well. Here is a quick overview of the user interface:

The common use case scenario is as follows:

1. Open an image file.
2. Load a prepared trimap from a file. Alternatively, you can create one*, but it is recommended to use a better graphic editor to do it more comfortably.
3. Generate the alpha matte. Please be advised that the process can take several minutes, depending on the image size and contour complexity.
4. Select a new background image or a solid color to make a composite.

* To create a trimap, first make sure that the "Trimap → Draw outline" option is selected. Outline the contours of the object you wish to extract with the mouse, directly in the picture box. What you draw is displayed immediately as a transparent overlay. The contour, drawn in gray, is the unknown area to be estimated. When you are done, select the "Trimap → Mark foreground" option and click on the inside of the object. As long as the contour is properly closed, the foreground and background areas should fill out with white and black after a while, and your custom trimap is ready to use.
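Step 4 applies the standard compositing equation C = αF + (1 − α)B per channel. A minimal per-pixel Python sketch (assuming channels are already normalized to [0, 1], as in the internal representation described in section 3.2):

```python
def composite(alpha, fg, bg):
    """Blend one pixel of the extracted foreground over a new background.

    fg and bg are (r, g, b) tuples in [0, 1]; alpha is the matte value
    in [0, 1] for this pixel.
    """
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(fg, bg))
```

In the partially-transparent border region, where 0 < α < 1, this mixes the estimated foreground color with the new background in the ratio the matte prescribes, which is what makes hair and translucent edges composite cleanly.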


The layers are displayed upon clicking the corresponding button. The icon beside the drop-down menu buttons indicates which layer is currently displayed, to help you navigate. If you wish to select another option without switching the displayed layer, use the right side of the buttons, marked by a drop-down triangle. You can save any layer to a file at any time by selecting the Save option from the corresponding drop-down menu. The trimap and the background must be the same size as the image, and the application does not support zooming or positioning of the layers (yet). I apologize for the inconvenience.

3.4 Testing and results

Please see the appendix for the results on the testing samples from [Matt] and [Rubin]. Their results are available for comparison on the websites cited in the references, where you can also find comparisons with other matting algorithms.


5 Conclusion

The algorithm performed well in all but the most complicated cases. Even non-corporeal objects like smoke are extracted with satisfactory precision (see the examples in the appendix).

Although the color approximation itself is clearly defined by (5) and (6), the final shape of the matte depends on several parameters, such as the size and shape of neighborhoods, the precision thresholds for clustering and likelihood convergence, the order of processing pixels, and the handling of the occasionally occurring degenerate cases. In the demonstration application I have tuned the parameters to a compromise between speed and precision, based on extensive live testing. It is possible to change any of them in the source code, should you want to tweak the performance, but unwisely chosen values may cause instability, freezing or extremely slow processing.

As one possible future improvement, I would like to add a user interface for these settings, safety checks, and more ways to dynamically adjust the parameters on the fly based on the data.


References

[Bayes1]

The Institute of Mathematical Statistics. The Reverend Thomas Bayes, F.R.S. 1701?-1761. The IMS Bulletin, Vol. 17 (1988); January/February 1988.

[Bayes2]

University of Minnesota Morris. Thomas Bayes [Internet]. The UMN Web page; 1998 [cited 2010 May 22]. Available from: http://www.morris.umn.edu/~sungurea/introstat/history/w98/Bayes.html.

[Kub]

Ondřej Kubíček. Metody pro detekci obrazového pozadí [Methods for Image Background Detection; Bachelor Thesis]. Faculty of Informatics, Masaryk University, Botanická 68a, Brno. Spring 2007.

[Matt]

Yung-Yu Chuang, Brian Curless, David H. Salesin, and Richard Szeliski. A Bayesian Approach to Digital Matting. In Proceedings of IEEE Computer Vision and Pattern Recognition (CVPR 2001), Vol. II, 264-271, December 2001.

[Quant]

Michael T. Orchard and Charles A. Bouman. Color Quantization of Images. In IEEE Transactions on Signal Processing, 39(12):2677–2690, December 1991.

[Rubin]

Michael Rubinstein. Bayesian Matting. Computational Photography '09, The Interdisciplinary Center Herzliya, 2009.

[Wiki1]

Wikipedia contributors. Likelihood function [Internet]. Wikipedia, The Free Encyclopedia; 2010 May 16, 14:51 UTC [cited 2010 May 22]. Available from: http://en.wikipedia.org/w/index.php?title=Likelihood_function&oldid=362437286.

[Wiki2]

Wikipedia contributors. Gaussian function [Internet]. Wikipedia, The Free Encyclopedia; 2010 Apr 27, 07:04 UTC [cited 2010 May 22]. Available from: http://en.wikipedia.org/w/index.php?title=Gaussian_function&oldid=358596702.


Appendix

Original

Alpha matte


Composite

Original

Alpha matte


Composite

Original

Composite and alpha detail
