Efficient similar images research

Efficient similar images research TER 2012

Anthony Biga, Iliasse Hassala, Amine Oueslati and Paraita Wohler UNSA UFR Sciences - Master IFI/MBDS

2012 June 04th

How to efficiently find similar images?

Presentation

Used in the industry:
- Picasa (Google)
- iPhoto/Aperture (Apple)

Problems:
- CPU intensive
- memory consuming
- doesn't seem to scale well

Presentation

Finding similar images can be separated into 2 stages:

Extraction of image features
For each image we extract interest points, which represent characteristic parts of the image. For every interest point we compute its features, which are vectors (SURF algorithm).

Similarity search
We compare every image's features with those of every other image, using the K-Nearest Neighbors algorithm to determine similarities.

Workflow

images set → features extraction → similarity search → every similarity pair

1. SURF and K-NN
   - SURF
   - K-NN
2. Our workflow
   - SURF: C++/OpenCL, Java
   - K-NN: Java thread pool, Hadoop map-reduce
   - Hadoop Map-Reduce
3. Benchmarks and results
4. Conclusion

Speeded Up Robust Features (SURF)

Used for:
- camera calibration
- 3D reconstruction
- object recognition
- discrete image correspondences

Basically 6 sequential steps:
1. compute integral image
2. calculate Hessian determinant
3. apply Gaussian filters
4. select the best interest points
5. compute orientation
6. normalize vectors
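The first SURF step, the integral image, can be illustrated with a minimal pure-Python sketch. This is not the clsurf or JOpenSURF code; the function names `integral_image` and `box_sum` are hypothetical and the image is assumed to be a 2D list of numbers.

```python
def integral_image(img):
    """Return the summed-area table of a 2D list of numbers.

    The table has one extra row and column of zeros, so box sums
    need no special handling at the image border.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # sum of everything above plus the current row so far
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Once the table is built, any rectangular sum costs four lookups regardless of the box size, which is why the box filters used by SURF are cheap to evaluate.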

Example

False positive :

Example

Good match :

K-Nearest Neighbor

The k-nn search is a problem found in many domains such as data compression, DNA sequencing, image retrieval, etc.

The problem:
- a set of n elements in a d-dimensional space E
- an element q ∈ E
- a similarity function ∆
- k smaller than n

k-nn algorithm
Apply ∆ to the n elements and return the k elements most similar to q.
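As an illustration of the definition above, here is a minimal brute-force sketch. It assumes points are tuples of numbers and uses the Euclidean distance as ∆; the function name `knn` is hypothetical.

```python
import heapq
import math


def knn(points, q, k, dist=math.dist):
    """Return the k points closest to q, nearest first (brute force).

    dist plays the role of the similarity function: a smaller value
    means the two points are more similar.
    """
    return heapq.nsmallest(k, points, key=lambda p: dist(p, q))
```

The distance is evaluated once for each of the n elements, so the cost grows linearly with n for a single query.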

K-Nearest Neighbor

Custom version of the k-nn for comparing 2 images img1 and img2:
- We choose a similarity function which returns true if the Euclidean distance between two points is ≤ ε. The two points are then considered similar.
- For each point of img1, we compare it with each point of img2 until the similarity function returns true.
- Finally, img1 is similar to img2 if we find k or more similar points.

In this case k doesn't represent the k most similar points but just k pairs of similar points.

Brute-force algorithm: compute the distance from every img1 descriptor to every img2 descriptor.
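The custom comparison described above can be sketched as follows. This is a minimal illustration, not the authors' Java code; descriptors are assumed to be tuples of numbers, and the names `count_similar_points` and `images_similar` are hypothetical.

```python
import math


def count_similar_points(desc1, desc2, eps):
    """Count the descriptors of img1 that have a match in img2."""
    count = 0
    for p in desc1:
        # any() stops at the first descriptor of img2 within eps of p
        if any(math.dist(p, q) <= eps for q in desc2):
            count += 1
    return count


def images_similar(desc1, desc2, eps, k):
    """img1 is similar to img2 if at least k of its points have a match."""
    return count_similar_points(desc1, desc2, eps) >= k
```

For example, with `eps=1` two of the three descriptors below find a match, so the images are similar for `k=2` but not for `k=3`:

```python
a = [(0, 0), (10, 10), (20, 20)]
b = [(0, 0.5), (10, 10.2), (50, 50)]
print(images_similar(a, b, eps=1, k=2), images_similar(a, b, eps=1, k=3))
```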

K-Nearest Neighbor problem

Find the optimal value for k. The number of descriptors depends on the image, and some images have only a small number of descriptors.

⇒ If k is greater than the number of descriptors of an image, that image will never be chosen.

clsurf

An OpenCL implementation of the SURF algorithm: http://code.google.com/p/clsurf/

- clsurf has been developed by the Northeastern University Computer Architecture Research Group.
- The application should run correctly on NVIDIA and ATI GPUs without any changes.
- It extracts as much parallelism as possible from the SURF algorithm:
  - compute integral image
  - calculate Hessian determinant
  - select the best interest points
  - normalize vectors
- It has a large number of tunable parameters to change the precision and the number of descriptors ⇒ impacts the performance.

Restriction
Compatibility: the device must support version 1.2 of CUDA or later.

JOpenSURF

A Java implementation of SURF: http://code.google.com/p/jopensurf/

What does JOpenSURF do?
- SURF
- matching points finding
- graphic representation

Matching points

K-nn: Java thread pool

Problem
K-nn needs a lot of memory.

Solution
Do not load the entire file in memory.
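One way to avoid loading the whole descriptors file is to stream it one image at a time. The sketch below assumes a hypothetical text format (one image per line, `name<TAB>v1,v2,...;v1,v2,...`, with `;` between descriptors); the actual file format used in the project is not given in the slides, and the name `iter_descriptors` is illustrative.

```python
def iter_descriptors(path):
    """Yield (image_name, descriptors) one image at a time.

    Assumed (hypothetical) format: one image per line,
    "name<TAB>v1,v2,...;v1,v2,..." with ';' between descriptors.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object reads lazily, line by line
            line = line.rstrip("\n")
            if not line:
                continue
            name, _, payload = line.partition("\t")
            descriptors = [
                tuple(float(v) for v in d.split(","))
                for d in payload.split(";")
                if d
            ]
            yield name, descriptors
```

Because this is a generator, only the descriptors of the image currently being processed are held in memory.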

K-nn optimisation

The descriptors file is accessed n² times. How to minimize the K-nn computing time?

Solution
- Do not compute the distances twice for the same pair of images.
- Use a thread pool.
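Both ideas can be sketched together: enumerate each unordered pair of images once, and hand the pairs to a pool of workers. This is an illustration in Python, not the authors' Java implementation; the names `similar` and `similar_pairs` are hypothetical, and the similarity test is assumed to be symmetric enough that checking (a, b) makes checking (b, a) unnecessary.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def similar(desc1, desc2, eps, k):
    """True if at least k descriptors of desc1 have a match within eps in desc2."""
    hits = sum(1 for p in desc1 if any(math.dist(p, q) <= eps for q in desc2))
    return hits >= k


def similar_pairs(images, eps, k, workers=4):
    """images maps an image name to its list of descriptors.

    Returns the similar (a, b) pairs, each unordered pair tested once.
    """
    # combinations() yields n*(n-1)/2 pairs instead of n*n
    pairs = list(combinations(sorted(images), 2))

    def check(pair):
        a, b = pair
        return pair if similar(images[a], images[b], eps, k) else None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(check, pairs)  # results keep the order of pairs
        return [p for p in results if p is not None]
```

In CPython, threads do not run pure-Python computation in parallel, so this sketch only shows the structure; a Java thread pool, as used in the slides, does execute the comparisons concurrently.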

Hadoop Map/Reduce

Overview
- Programming model for processing large data sets.
- Typically used to do distributed computing on clusters.
- Implemented in many programming languages. A popular free implementation is Apache Hadoop.
- The model is inspired by the map and reduce functions of functional programming.

"Map" step
- The master node takes the input.
- It divides it into smaller sub-problems.
- It distributes them to worker nodes.
- Each worker node processes its smaller problem and passes the answer back to its master node.

"Reduce" step
- The master node collects the answers.
- It combines them to form the output.
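The two steps can be mimicked in a single process to show the data flow. The sketch below is not the Hadoop API; `map_reduce`, `mapper` and `reducer` are hypothetical names, and the grouping by key stands in for the shuffle that a real cluster performs between the two steps.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a map step, group the emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # map: each record emits zero or more (key, value) pairs
        for key, value in mapper(record):
            groups[key].append(value)
    # reduce: combine all the values collected for one key
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(line):
    for word in line.split():
        yield word, 1


def count_reducer(word, counts):
    return sum(counts)
```

For example, `map_reduce(["a b a", "b c"], count_mapper, count_reducer)` counts how many times each word appears across the input lines.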

Hadoop Block Nested Loop Join (pairwise)
- Used to join two sets R and S.
- Partition R and S, each into n equal-sized disjoint blocks.
- Perform a block nested loop join (BNLJ) for each possible pair of blocks (Ri, Sj).
- For every record in R, get the k-nn result from its n local k-nn results.
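The block-wise join can be sketched in a single process as follows. This is an illustration under simplifying assumptions (records are tuples of numbers, the Euclidean distance is the similarity measure, R contains no duplicate records), not the Hadoop job itself; `split_blocks` and `block_knn_join` are hypothetical names.

```python
import heapq
import math
from itertools import product


def split_blocks(items, n):
    """Split items into at most n contiguous blocks of near-equal size."""
    size = max(1, math.ceil(len(items) / n))
    return [items[i:i + size] for i in range(0, len(items), size)]


def block_knn_join(R, S, k, n):
    """For every r in R, return its k nearest records of S, nearest first."""
    candidates = {}
    for r_block, s_block in product(split_blocks(R, n), split_blocks(S, n)):
        for r in r_block:
            # local k-nn of r inside one block of S
            local = heapq.nsmallest(k, ((math.dist(r, s), s) for s in s_block))
            candidates.setdefault(r, []).extend(local)
    # merge the local results: keep the k best candidates overall
    return {
        r: [s for _, s in heapq.nsmallest(k, cands)]
        for r, cands in candidates.items()
    }
```

Each (Ri, Sj) pair of blocks is independent of the others, which is what lets the work be spread over several workers before the final merge.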

1st Job
- Performs the K-nn search.
- The mapper takes < Imgi ; DescriptorsList > and produces all the possible pairs of images and descriptors.
- The reducer computes the local K-nn.
- The 1st job produces < Img1i ; Img2j >, which are the most similar images.

2nd Job
- Performs a filtering process.
- Eliminates duplicated entries and produces the most similar images.
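The slides do not detail the filtering. One plausible reading, shown here purely as an assumption, is that (a, b) and (b, a) count as the same entry and that an image paired with itself is dropped. The name `deduplicate_pairs` is hypothetical.

```python
def deduplicate_pairs(pairs):
    """Keep one entry per unordered pair of image names, in first-seen order.

    Assumptions (not stated in the slides): (a, b) and (b, a) are
    duplicates, and self-pairs (a, a) are discarded.
    """
    seen = set()
    unique = []
    for a, b in pairs:
        key = (a, b) if a <= b else (b, a)  # canonical order for the pair
        if a != b and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```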

Problem of scaling
- The HBNLJ algorithm doesn't scale well with multidimensional data.
- Lack of disk space (e.g. for 1600 pictures and a 1.4 GB input file, it needs more than 7 TB of intermediate space).

Solution!
- Modify the output of the Map phase of the first round: instead of < img1i ; img2j ; descriptorsList > we output < img1i ; img2j ; offsetbegin ; offsetend >.
- The reducer will compute the K-nn using the initial input file.
- We trade the disk space gluttony for CPU + I/O time.

Benchmarks

What we tested:
- integrity check
- SURF implementations
- different combinations of integrity check + SURF
- KNN implementations

Data set:
- heterogeneous set of images
- consists of 5 different directories based on size (100, 200, 400, 800, 1600)

Benchmarks

Platform for SHA-1 + SURF benchmarking:

CPU   Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
RAM   18480828 kB
GPU   NVidia Corporation GF108 (Quadro 600)
OS    Fedora 16 x86_64

Benchmark protocol:
- 3 iterations at night, to reduce side effects
- ant clean compile after every iteration
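A small harness in the spirit of this protocol could look like the sketch below: a SHA-1 digest computed in chunks, and a timer that repeats a task a fixed number of times. It is not the authors' benchmark code; `sha1_of_file` and `benchmark` are hypothetical names, and the default of 3 iterations simply mirrors the protocol above.

```python
import hashlib
import time


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def benchmark(task, iterations=3):
    """Run task() several times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return timings
```

Reporting every run rather than a single average makes it easier to spot an iteration disturbed by other activity on the machine.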

Integrity and SURF results

[Figure: two plots, "SHA-1 computation time" and "SURF computation time". x-axis: number of images (0 to 1600); y-axis: time (seconds). Legends: C++ / Java and OpenCL / Java.]

KNN Java implementation results

[Figure: "KNN behaviour". x-axis: number of images (0 to 1600); y-axis: time in seconds (0 to 250).]

Conclusion

Conclusion and thanks
- We implemented a complete workflow based on multiple technologies.
- Huge speed gain expected with GPGPU. A speedup of 3 was measured (in comparison with Java).
- We learned a lot from PhDs and researchers (EPW, CafeIn, RoundTable...).
