3D Modeling and Visualization of Archaeological Data

Technische Universität München Fakultät für Bauingenieur- und Vermessungswesen Lehrstuhl für Kartographie Prof. Dr.-Ing. Liqiu Meng

3D Modeling and Visualization of Archaeological Data

Pei Zhang
Master Thesis

Editing time: 1. 06. 2013 – 20. 11. 2013

Course of study: Cartography
Supervisor: Dipl.-Ing. Stefan Peters

2013

Declaration of Authorship

Last name:

First name:

I declare that the work presented here is, to the best of my knowledge and belief, original and the result of my own investigations, except as acknowledged, and has not been submitted, either in part or whole, for a degree at this or any other University. Formulations and ideas taken from other sources are cited as such. This work has not been published.

Location, Date

Signature


Abstract

In this work, Structure from Motion (SfM) tools were used for reconstructing an archaeological site located in Nemi, Italy. Two different SfM systems were applied to the study area: (1) the commercial program Agisoft PhotoScan and (2) open source tools (including VisualSFM and CMPMVS). Optimal strategies for photo acquisition, data processing, and model accuracy were discussed; these strategies and conclusions can be used for improving the efficiency of data processing in SfM systems. Methods of model repairing, model georeferencing, and model modification were introduced. Two datasets from the study area were tested: one consists of ground-based images taken with a Nikon D3 camera, the other of aerial photos taken by a UAV (Unmanned Aerial Vehicle). Point clouds and models derived from these two datasets were compared. In the surface reconstruction process, ground photos generate models with more distortions than aerial photos. This problem can be addressed in two ways: one is to use a dataset that combines ground photos and aerial photos; the other is to reconstruct small sub-areas separately, where the distortions can be controlled through the georeferencing process by using local control points. In addition, large scale models always contain less detailed information than small scale models; therefore, for large study areas, highly detailed mesh models can only be built by using sub-datasets which are sub-sampled from the main dataset. Accurately georeferenced mesh models show displacements of only a few centimeters. These models have a complete coordinate system for archaeological studies such as measuring distances, overviewing excavation sites, monitoring excavations and so on. A 3D GIS model with archaeological attributes was built. SfM systems can be used for improving details in GIS model reconstruction, and an archaeological reconstruction model can be built by using the mesh model and orthophotos as reference data. The work has confirmed that SfM systems and related tools can be used for archaeological studies as well as GIS studies.


TABLE OF CONTENTS

ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
   1.1 Objective of the Thesis
   1.2 Background
   1.3 Thesis overview
2. STATE OF THE ART
3. METHODOLOGY
   2.1 Overview of Methodology
   2.2 Structure from Motion (SfM) System
      2.2.1 Description
      2.2.2 Scale Invariant Feature Transform (SIFT)
         2.1.2.1 Keypoint detection
         2.1.2.2 Keypoint descriptor
         2.1.2.3 Summary
      2.2.3 Basic Theory of SfM
      2.2.4 Implementation of SfM
         2.2.4.1 Bundler & CMVS/PMVS2
         2.2.4.2 VisualSFM
         2.2.4.3 Photosynth
         2.2.4.4 Agisoft PhotoScan
         2.2.4.5 123D Catch
         2.2.4.6 ARC 3D
         2.2.4.7 Summary
   2.3 Optimal Strategies for SfM
      2.3.1 Description
      2.3.2 Types of Cameras
      2.3.3 Strategies for Photo Acquisition
         2.3.3.1 Shadow Effect
         2.3.3.2 Positions of Cameras
      2.3.4 Strategies for Data Processing
         2.3.4.1 Input Dataset
         2.3.4.2 Photo Matching
         2.3.4.3 Dense Reconstruction
         2.3.4.4 Multi-view Stereo (MVS) Reconstruction
      2.3.5 Summary
   2.4 Mesh Models Generation
      2.4.1 Integrated Software
      2.4.2 VisualSFM - MeshLab Workflow
      2.4.3 VisualSFM - CMPMVS Workflow
   2.5 Usages of Mesh Models
      2.5.1 Post Processing of Mesh Models
         2.5.1.1 Models Repairing
         2.5.1.2 Models Separation
         2.5.1.3 Models Combination
      2.5.2 GIS and Archaeological Usages of Mesh Models
         2.5.2.1 Archaeological Usages
         2.5.2.2 GIS Usages
3 CASE STUDIES: NEMI, ITALY
   3.1 Location and Background
   3.2 Structure of the Temple
   3.3 Data Acquisition
4 RESULT AND DISCUSSION
   4.1 Overview of Implementations
   4.2 Results from PhotoScan
   4.3 Results from VisualSFM Workflow
      4.3.1 Dense Reconstruction
      4.3.2 CMVS Reconstruction
      4.3.3 CMPMVS Surface Reconstruction
      4.3.4 Georeferenced Models and Accuracy
      4.3.5 Discussion
   4.4 GIS Models Reconstruction
   4.5 Archaeological Reconstruction
5 CONCLUSION AND OUTLOOK
REFERENCE
APPENDICES


ABBREVIATIONS

BSP: Binary Spatial Partition
CMVS: Clustering Views for Multi-view Stereo
CPU: Central Processing Unit
CUDA: Compute Unified Device Architecture
DEM: Digital Elevation Model
DoG: Difference of Gaussian
DSLR: Digital Single-Lens Reflex camera
EXIF: Exchangeable Image File Format
FME: Feature Manipulation Engine
GIS: Geographic Information System
GLSL: OpenGL Shading Language
GML: Geography Markup Language
GPU: Graphics Processing Unit
GUI: Graphical User Interface
LoG: Laplacian of Gaussian
PMVS2: Patch-based Multi-view Stereo Software version 2
SfM: Structure from Motion
SIFT: Scale-Invariant Feature Transform
UAV: Unmanned Aerial Vehicle


LIST OF TABLES

Table 1: A Comparison of SIFT, PCA-SIFT and SURF (Juan, 2009)
Table 2: Advantages of SfM web services and SfM desktop applications
Table 3: SfM applications and their characteristics (some results are concluded from Kersten et al. 2012a)
Table 4: Numbers of matched photos (different types of cameras)
Table 5: Numbers of matched photos (shadow effect)
Table 6: Tested dataset information
Table 7: Information about the data
Table 8: Displacements between random identical points on orthophoto and survey points (points 1, 2, 3, 4 are from the middle of the temple; points 5, 6, 7, 8 are from the north site; points 9, 10, 11, 12 are from the south site)
Table 9: Reconstruction time and generated vertices from different datasets
Table 10: Processing time for CMVS and generated vertices
Table 11: Processing time and vertices generated by using Nikon D3 sub-datasets
Table 12: Displacements between random identical points on orthophoto and survey points (meter)
Table 13: Statistics in this section


LIST OF FIGURES

Figure 1: Structure of the thesis
Figure 2: Overview of the Methodology
Figure 3: Workflow to implement SIFT
Figure 4: 2D Gaussian filter and its implementation
Figure 5: Gaussian image pyramid
Figure 6: Difference of Gaussian
Figure 7: How SIFT finds local maxima and minima pixels (from Lowe, 2004)
Figure 8: Original histogram
Figure 9: Demonstrations of the feature point descriptor and of photo matching by using SIFT (generated in Matlab by using the SIFT demo program)
Figure 10: SIFT family: Principal Component Analysis (PCA)–SIFT, CSIFT (Colored scale invariant feature transform), Speeded Up Robust Features (SURF), ASIFT (Affine-SIFT)
Figure 11: Computing the cameras' positions by using keypoints and the camera's focal length
Figure 12: Sample photos by using different cameras. (a) A sample photo from the Canon 5D (b) A sample photo from the Canon Powershot
Figure 13: Sample photos taken at different times. (a) A sample photo from noon (b) A sample photo from the morning
Figure 14: Two plans describing positions of the cameras. (a) Plan 1: the focus points were set on some special features (b) Plan 2: the focus points were always on the center of the object
Figure 15: (a) Results of the dense reconstruction by using Plan 1 (b) Results of the dense reconstruction by using Plan 2
Figure 16: Positions of photographer and camera
Figure 17: Taking photos for a large study area. Cameras with red dots are the positions from which more than two objects can be seen while cameras with blue dots are the positions from which only one object can be seen
Figure 18: An object which is difficult for identifying depths in SfM
Figure 19: Camera's positions in a narrow indoor area
Figure 20: Establishment of reference coordinate system in a narrow space
Figure 21: Four main processes in VisualSFM
Figure 22: Time comparison between data with EXIF or not
Figure 23: Time costing for matching photos in different resolutions
Figure 24: CUDA versus GLSL in the photo matching process
Figure 25: Processing time for different quantities of images
Figure 26: Vertices generated from different resolutions
Figure 27: Relation between generated vertices and processing time in photo matching process
Figure 28: Processing time comparison between data with EXIF tags or not
Figure 29: Three cases in reconstruction failure (a) Reconstruction failure: Point cloud split (b) Reconstruction failure: Single object split (c) Reconstruction failure: Noise object
Figure 30: Reconstruction stability test
Figure 31: CUDA versus GLSL in dense reconstruction process
Figure 32: Processing time for different quantities of photos in dense reconstruction process
Figure 33: Generated vertices from different quantities of photos in the dense reconstruction process
Figure 34: CMVS time costing for images with different resolutions
Figure 35: Processing time for different quantities of photos in CMVS
Figure 36: Generated vertices from different quantities of photos in CMVS
Figure 37: Factors which have influences on different processes in VisualSFM
Figure 38: The VisualSFM-MeshLab Workflow
Figure 39: Surface reconstruction in MeshLab. (a) CMVS Point cloud (b) Normals of the points (c) Recomputed normals (d) Ball Pivoting Reconstruction (e) VCG Reconstruction (f) Poisson Reconstruction
Figure 40: The VisualSFM-CMPMVS Workflow
Figure 41: Demonstration models generated from CMPMVS
Figure 42: Mesh repairing by using MeshFix
Figure 43: Mesh repairing by using Graphite
Figure 44: Steps for repairing big holes in MeshLab. (a) Original model (b) Poisson mesh model derived from the original model (c) Poisson mesh model with texture (d) Repaired model
Figure 45: Separating the mesh model into pieces
Figure 46: The process of separating a model into two pieces. (a) The original mesh model generated from CMPMVS (b) Clean the meshes and delete irrelevant parts (c) Fill all the holes and close the model (d) Create a cube, implement Boolean operations on the cube and the mesh model (e) The model has been separated into two pieces
Figure 47: A demonstration of using Boolean Operations to close a cube which lacks three faces. (a) A block with three faces (b) Boolean calculation "Difference" implementation (c) A derived cube
Figure 48: Merge a platform with a sculpture in Blender. (a) A repaired mesh model (b) Create a platform for the sculpture (c) Use Boolean Operation "Union" to merge two models
Figure 49: Points based merging process. (a) Matching points are selected on each object (b) Two objects are merged into one
Figure 50: Measurement of mesh models. (a) Measuring the length of a rock in a sample data (b) Measuring the length of the same rock by using 2D GIS data in ArcScene
Figure 51: From feature points to GIS model: (a) picked points from the point cloud generated from SfM (b) imported the points into 3D modeling software, SketchUp as an example (c) gave texture to the simple model
Figure 52: Workflow to inherit attributes from 2D data
Figure 53: Extruded model is replaced by a simple model which is built by using SfM point cloud
Figure 54: Overview of Nemi, Genzano and Nemi Lake (Peters et al. 2012)
Figure 55: Overview of the excavations at Nemi Lake (Ghini. 2011)
Figure 56: The temple was built in three different periods (Peters et al. 2012)
Figure 57: A 2D map describing different components of the temple (Peters et al. 2012)
Figure 58: Equipment for data acquisition. (a) The Canon Powershot SX220 (b) The Nikon D3 (c) The AscTec Falcon 8 equipped with Sony NEX-7
Figure 59: Ground control points for georeferencing. (a) Blue-red-yellow marker (b) Black and white marker on a rock (c) Measuring coordinate of a control point
Figure 60: Overview of the implementations
Figure 61: Exported georeferenced mesh models which demonstrate warped effect. (a) Generated model with fine geometry when viewing in PhotoScan, blue flags are control points (b) Exported model viewing in MeshLab
Figure 62: Cloud-mesh distances computed by using survey points and mesh model from PhotoScan
Figure 63: Orthophoto generated from the UAV dataset by using PhotoScan overlapping with 2D survey points
Figure 64: Derived models from the Nikon dataset by using PhotoScan (blue flags are referencing points). (a) Derived point cloud (b) Derived mesh models
Figure 65: Positions of the cameras. (a) Cameras' positions from the Nikon D3 dataset (b) Cameras' positions from the UAV dataset
Figure 66: Derived point clouds by using different datasets. (a) Dense reconstruction by using Canon Powershot dataset with 819 photos (b) Dense reconstruction by using Nikon D3 dataset with 494 photos (c) Dense reconstruction by using AscTec Falcon 8 dataset with 119 photos
Figure 67: Results from Multi-view reconstruction. (a) CMVS reconstruction from Canon Powershot dataset (b) CMVS reconstruction from Nikon D3 dataset (c) CMVS reconstruction from AscTec Falcon 8 dataset
Figure 68: Detail comparison between models generated from UAV photos and ground photos. (a) Model generated from Nikon D3 dataset (b) Model generated from UAV photos
Figure 69: Dataset of the temple was separated into ten parts
Figure 70: Processing time and generated vertices comparison by using sub-datasets
Figure 71: Results from CMPMVS reconstruction. (a) By using the AscTec Falcon 8 dataset (b) By using the Nikon D3 dataset
Figure 72: A georeferenced point cloud showing warped effect
Figure 73: Distance map showing displacements between UAV point cloud and mesh model generated from ground photos (Nikon D3)
Figure 74: Distance map showing detailed areas. (a) South-west corner of the temple (b) North-east of the temple
Figure 75: Cloud-mesh distances computed from the UAV model
Figure 76: Cloud-mesh distance computed from the Nikon model
Figure 77: Orthophoto generated from the UAV dataset overlapping with survey points (north-east corner of the temple)
Figure 78: Orthophoto generated from the Nikon dataset overlapping with survey points (north-east corner of the temple)
Figure 79: Orthophoto generated from the UAV dataset overlapping with survey points (south-west corner of the temple)
Figure 80: Orthophoto generated from the Nikon dataset overlapping with survey points (south-west corner of the temple)
Figure 81: A comparison between a model derived from ground photos and a model derived from a dataset which contains ground photos and aerial photos. (a) A generated model by using a dataset which only has ground photos from "Region A" (b) A generated model by using a mixed dataset which has photos from "Region A" and UAV photos
Figure 82: A model generated from a dataset which combines ground photos and aerial photos. (a) Target area has textures from ground photos (b) Level of detail in target area was dragged down to the same level with the other areas
Figure 83: Cloud-mesh distance map showing accuracy of the model generated from mixed dataset
Figure 84: Cloud-mesh distance map showing accuracy of the model generated from "Region A" dataset
Figure 85: Difference between PhotoScan and CMPMVS workflow
Figure 86: Cloud-mesh distance map showing displacements between the point clouds generated in PhotoScan and mesh model generated in CMPMVS
Figure 87: Details comparison between four models in north-east part of the temple. (a) A model derived from the UAV dataset showing north-east site of the temple (b) A model derived from the Nikon dataset (c) A model derived from the UAV and "Region A" dataset (d) A model derived from the "Region A" dataset
Figure 88: UML model for archeological excavations (Peters, et al. 2012)
Figure 89: GIS model in ArcScene 10
Figure 90: Demonstration model of the temple (Phase III)
Figure 91: Viewing the CityGML model in LandXplorer (Appendix)


1. INTRODUCTION

The Structure from Motion (SfM) technique has been under development for several years. It is based on feature point detection algorithms which come from computer vision. Many applications have come onto the market in the past few years; some are open source and some are already mature commercial applications, whether for desktops or for smartphones. Data derived from SfM systems can be used for 3D model reconstruction, but this is not the only possibility. How to use SfM systems in GIS (Geographic Information System) and its related fields is the starting question.

1.1 Objective of the Thesis

The goal of the work is to connect SfM with GIS applications for the purposes of archaeological studies. In order to obtain photos correctly, optimal strategies regarding cameras, shadows, angles and positions will be discussed. In order to improve the efficiency of data processing, the relations between processing time, generated vertices, GPU (Graphics Processing Unit) drivers and EXIF (Exchangeable Image File Format) tags will be illustrated quantitatively. Methods of mesh model reconstruction and model repairing will be demonstrated. The most important questions are how accurate the mesh models are and how they can be used in GIS studies; therefore the accuracy of the data will be verified and further GIS applications will be implemented. For large study areas, ground photos may generate models which are not as accurate as those generated from UAV (unmanned aerial vehicle) photos, and a UAV is not available in all faculties and universities. If only ground photos are available for reconstructing large study areas, how accurate are the derived models? And if these models are not accurate enough for GIS studies, what kinds of methods can be utilized to improve their accuracy?

1.2 Background

In the year 2012, a survey group from the Technical University of Munich visited Nemi, Italy for an archaeological project. The project was mainly about collecting survey data and using them to create 2D GIS data for archaeological studies. In 2013 the archaeological excavation work continued and the survey group visited Nemi again; on the one hand the 2D GIS data needed to be updated and improved, on the other hand new techniques could be applied to the study area, such as ground penetrating radar, infrared cameras and a UAV. Therefore the idea of using photos from the UAV as well as ground photos in SfM systems was pursued. The survey data can be used for validating the accuracy of the results derived from SfM systems.

1.3 Thesis overview

The thesis is divided into two parts, methodology and case studies, as shown in Figure 1. Methods and theories introduced in the methodology part are implemented in the case studies. In the methodology, the basic theory of SfM and its applications will be introduced; all other sections of the methodology part are based on experiments. In the section "Optimal Strategies for SfM", optimal strategies for data acquisition and data processing will be discussed, and it will be illustrated how mesh models can be created by using programs or workflows. After the model post-processing step, mesh models can be used for archaeological studies as well as GIS studies. In the case studies, the data from Nemi, Italy were used. The accuracy of the models and accuracy improvements are the main tasks. With the help of data derived from SfM systems, a 3D GIS model as well as an archaeological reconstruction model will be created as demonstrations.

Figure 1: Structure of the thesis

2. STATE OF THE ART

With the development of computer science, SfM applications are becoming more and more user friendly, and the number of available SfM applications is increasing. The differences between them have been investigated (Remondino et al. 2012) (Kersten et al. 2012a) (Neitzel & Klonowski. 2011). The usages of SfM systems are being extended as well, and using SfM systems in archaeological case studies is becoming popular. Reconstruction of excavation sites is the most common use of SfM systems. Some commercial SfM programs are easy to use even for archaeologists with little training. In addition, the derived models are accurate enough for archaeological studies as well as excavation recording (Doneus et al. 2011). By using a specific workflow including open source toolkits or programs, mesh models can also be reconstructed (Ducke et al. 2011). Lengths of archaeological objects can be measured if the derived point clouds or mesh models have an absolute coordinate system (Wulff et al. 2013) (Kersten et al. 2012b). With the help of UAVs or related equipment, aerial photos are available for large study areas; therefore extended outputs such as orthophotos derived from SfM programs can be used in archaeological studies (Green. 2012) (Neitzel & Klonowski. 2011) (Verhoeven et al. 2012).


3. METHODOLOGY

2.1 Overview of Methodology

This chapter is divided into three parts: (1) basic theory, (2) optimal strategies, and (3) model generation and usages, as shown in Figure 2. First, the basic theories of SIFT (Scale-invariant feature transform) and Bundler will be illustrated as examples, and popular SfM applications and their advantages will be discussed. The second part is based on experiments and statistics. The aim is to find the critical factors in data acquisition as well as data processing, including the effects of camera types, shadows and angles, and the relations between processing time, generated vertices, GPU drivers and EXIF tags. The third part is a workflow describing mesh model generation and the usages of mesh models. Multiple methods are available for mesh model reconstruction: integrated software and open source workflows. Methods of mesh model repairing, model separation and model combination will be demonstrated. In the end, the usages of mesh models in GIS as well as in archaeology will be discussed.

Figure 2: Overview of the Methodology

2.2 Structure from Motion (SfM) System

2.2.1 Description

In this section the basic theory of SfM systems will be introduced: how 3D point clouds can be reconstructed by using only 2D photographs. Many scientists have their own modified formulas for feature detection as well as for dense reconstruction. The basic theory of SIFT (Scale-invariant feature transform) and Bundler (Snavely et al. 2008) will be described as examples; other SfM programs have similar theoretical structures. The main SfM applications will then be introduced and their advantages and disadvantages will be discussed.

2.2.2 Scale Invariant Feature Transform (SIFT)

Image matching is the fundamental aspect of SfM. In order to match images, distinct features on the images have to be found. The Scale-invariant Feature Transform (SIFT) developed by David Lowe (Lowe, 2004) is an algorithm which can detect stable local feature points. Traditional feature detection methods extract edges and corners which lie in high contrast regions of images. These features are not stable for matching images because the objects in the images may be affected by rotation, scale, shift, illumination, viewpoint distortion, occlusion, noise and so on. Lowe's method largely solves this problem: the extracted features are invariant to scale and rotation, and partially invariant to affine distortion, noise and illumination (Lowe, 2004 p.91).

Figure 3: Workflow to implement SIFT

SIFT has two core functions (Figure 3): keypoint detection and the keypoint descriptor. SIFT uses a special method to find keypoints which are invariant to scale and rotation, and an unconventional descriptor is used for describing these keypoints. The extracted keypoints can be found on different images, so that matching processes can be implemented.

2.1.2.1 Keypoint detection

Objects far away from us appear vaguer and smaller. In image processing, this effect can be simulated by continuously changing the scale of an image; the derived images are called a scale space (Witkin, 1983). The Gaussian kernel is the only possible scale-space kernel (Lindberg, 1994), and $\sigma$ is the variable describing the scale. Therefore an input image $I(x, y)$ generates a scale space defined as $L(x, y, \sigma)$:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

where the Gaussian function is

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)$$

Scale space exists in the real world, and the Gaussian function is one way to express it. A Gaussian blur (Figure 4) is the result of blurring an image with the Gaussian function.

Figure 4: 2D Gaussian filter and its implementation

By reducing the image size through combined smoothing and subsampling, an image pyramid describing the scale space can be generated (Lindberg, 1994). As shown in Figure 5, one image generates several octaves, and each octave has several intervals.

Figure 5: Gaussian image pyramid

A Laplace operator can be applied to each interval in order to detect maxima/minima values in the image, but this process requires a huge amount of computation. SIFT therefore uses a close approximation to the LoG (Laplacian of Gaussian):

Since the Laplace operator is isotropic, the LoG operator can be written as $\mathrm{LoG}(x, y, \sigma) = \sigma^{2}\nabla^{2}G$. Furthermore,

$$\frac{\partial G}{\partial \sigma} = \sigma \nabla^{2} G \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$

and therefore

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^{2}\nabla^{2}G.$$

The difference of two Gaussian kernels is thus an approximation of the LoG operator. Therefore a new operator, the DoG (Difference of Gaussian), was introduced:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$

Compared to the LoG, the DoG is more efficient to compute. As illustrated in Figure 6, in each octave, computing the difference of two adjacent images in the Gaussian scale space yields a DoG image which is approximately equal to the LoG.
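To make this step concrete, the following minimal sketch (an illustration only, assuming NumPy and SciPy are installed; octave subsampling and keypoint filtering are omitted) blurs one octave with increasing σ and subtracts adjacent levels to obtain the DoG images described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, k=2 ** 0.5, levels=5):
    """Build one octave of the Gaussian scale space and its DoG images."""
    image = image.astype(np.float32)
    # L(x, y, sigma_i) = G(x, y, sigma_i) * I(x, y), with sigma_i = sigma * k^i
    gaussians = [gaussian_filter(image, sigma * k ** i) for i in range(levels)]
    # D(x, y, sigma_i) = L(x, y, k * sigma_i) - L(x, y, sigma_i)
    dogs = [gaussians[i + 1] - gaussians[i] for i in range(levels - 1)]
    return np.stack(dogs)  # shape: (levels - 1, height, width)
```

The next octave would be obtained by downsampling the most blurred image by a factor of two and repeating the procedure.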

Figure 6: Difference of Gaussian

So far, the target image space has been generated. Local maxima and minima pixels can now be extracted from the DoG images. Each pixel is compared with its eight neighbor pixels in the same scale and with the nine neighbor pixels in each of the scales above and below the current scale, as shown in Figure 7 (Lowe, 2004).
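A naive sketch of this 26-neighbor comparison is shown below (assuming a DoG stack such as the one produced above; the contrast threshold is an arbitrary example value, and the stability filtering mentioned next is not included).

```python
import numpy as np

def dog_extrema(dog, threshold=0.03):
    """Return (scale, row, col) positions that are extrema of their 26 neighbors."""
    extrema = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                value = dog[s, y, x]
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                # keep the pixel if it is the largest or smallest value in the 3x3x3 cube
                if abs(value) > threshold and (value >= cube.max() or value <= cube.min()):
                    extrema.append((s, y, x))
    return extrema
```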


Figure 7: How SIFT finds local maxima and minima pixels (from Lowe, 2004)

Keypoints come from these local maxima and minima pixels. The local maxima and minima pixels then need to be filtered and unstable points deleted; the purpose is to reinforce the stability of the feature points. The method of filtering pixels will not be introduced in this thesis.

2.1.2.2 Keypoint descriptor

The extracted keypoints need special descriptors so that points on different images can be matched as one point. SIFT uses a special method to describe keypoints. A keypoint is picked on an image and used as the center of a defined neighbor region. The gradients within this region are calculated and used to build an orientation histogram. Every 10 degrees is defined as one column, therefore the orientation histogram has 36 columns, as shown in Figure 8. The X axis represents degrees and the Y axis represents the summed gradient magnitudes. The direction with the biggest value is regarded as the direction of the keypoint (Lowe, 2004); keypoints on the images are therefore marked with arrows. A demonstration is shown in Figure 9. In conclusion, the SIFT keypoint descriptor describes the relations between a key pixel and its neighbor pixels. When the lighting condition changes, every pixel in the image becomes darker or brighter together, so the keypoint's direction remains the same. Similarly, rotation and scale do not change the keypoint's direction either. This is why SIFT keypoints are stable keypoints. Detailed descriptions of the keypoint descriptor can be found in Lowe's papers.
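A stripped-down sketch of the 36-bin orientation histogram could look as follows (assuming NumPy; the Gaussian weighting of the samples and the creation of additional keypoints for secondary peaks, both described by Lowe, are left out).

```python
import numpy as np

def dominant_orientation(patch):
    """Estimate a keypoint orientation from a 36-bin gradient histogram (10 degrees per bin)."""
    dy, dx = np.gradient(patch.astype(np.float32))
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    # histogram of gradient directions, each sample weighted by its gradient magnitude
    histogram, _ = np.histogram(angle, bins=36, range=(0.0, 360.0), weights=magnitude)
    return histogram.argmax() * 10.0  # direction of the strongest column, in degrees
```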


Figure 8: Original histogram


Figure 9: Demonstrations of the feature point descriptor and of photo matching by using SIFT (generated in Matlab by using the SIFT demo program)

2.1.2.3 Summary

The SIFT method is widely used in different research fields such as object identification, image mosaicking, robot orientation, 3D modeling and so on. SIFT has its own advantages in feature detection, but it also has drawbacks: it is time-consuming and not sensitive to images with blurred edges. Consequently scientists have never stopped optimizing and improving it. Through many years of development, the SIFT family has grown bigger and bigger, as shown in Figure 10. These modified or improved versions have their individual characteristics, shown in Table 1.
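As a practical illustration, the following sketch detects and matches SIFT keypoints with OpenCV (assuming a recent opencv-python release in which SIFT is included; the file names are placeholders) and keeps only distinctive matches using Lowe's ratio test.

```python
import cv2

img1 = cv2.imread("photo_a.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file names
img2 = cv2.imread("photo_b.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints1, descriptors1 = sift.detectAndCompute(img1, None)
keypoints2, descriptors2 = sift.detectAndCompute(img2, None)

# Brute-force matching with Lowe's ratio test to discard ambiguous matches
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(descriptors1, descriptors2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(keypoints1), len(keypoints2), len(good))
```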

Figure 10: SIFT family: Principal Component Analysis (PCA)–SIFT, CSIFT (Colored scale invariant feature transform), Speeded Up Robust Features (SURF), ASIFT (Affine-SIFT)

Table 1: A Comparison of SIFT, PCA-SIFT and SURF (Juan, 2009)

Method     Time     Scale    Rotation  Blur     Illumination  Affine
SIFT       common   best     best      best     common        good
PCA-SIFT   good     common   good      common   good          good
SURF       best     good     common    good     best          good

2.2.3 Basic Theory of SfM

The aim of Structure from Motion is to estimate three-dimensional structure from two-dimensional images. As mentioned before, image matching is the fundamental aspect of SfM. The matching process needs stable keypoints which are invariant to image transformations; the SIFT keypoint detector is one solution. Different SfM systems may use different keypoint detectors, but the basic theory is similar. Once the keypoints are known, the cameras' positions can be computed by combining them with the EXIF (Exchangeable Image File Format) tags which contain the focal length information. As shown in Figure 11, the keypoints have different distributions on each image, and a geometric relation exists between these points and the camera positions. Once the positions are known, more matched points in each image pair can be found. In the end a point cloud which represents the shape of the object is derived.
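For two views, this geometric step can be sketched with OpenCV's standard routines (a simplification of a full SfM pipeline with bundle adjustment; the matched point arrays pts1, pts2 and the camera matrix K, e.g. built from the EXIF focal length, are assumed to be given).

```python
import cv2
import numpy as np

def two_view_points(pts1, pts2, K):
    """Recover the relative camera pose from matched keypoints and triangulate a sparse cloud.

    pts1, pts2: (N, 2) float arrays of matched keypoint coordinates in two photos.
    K: 3x3 camera matrix, e.g. assembled from the focal length stored in the EXIF tags.
    """
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    # Projection matrices: the first camera sits at the origin, the second at (R, t)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    points4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (points4d[:3] / points4d[3]).T  # (N, 3) sparse point cloud
```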


Figure 11: Computing the cameras' positions by using keypoints and the camera's focal length

2.2.4 Implementation of SfM

2.2.4.1 Bundler & CMVS/PMVS2

Bundler (Snavely et al. 2008) is a structure-from-motion system for unstructured image collections, for example images from the internet. The earlier version of Bundler was developed for the project "Photo Tourism" which belongs to Microsoft (Snavely et al. 2006). The software uses the SIFT keypoint detector for finding keypoints and a modified version of the Sparse Bundle Adjustment package (M.I.A. Lourakis & A. A. Argyros) for reconstructing the scene. The only input data for Bundler are photos; the software automatically generates 3D point clouds which represent the shape of the scenes. Results generated from Bundler can be used as input data for CMVS/PMVS2 (Furukawa & Ponce 2010), which generates denser point clouds. Bundler was primarily used and tested on Linux; for Windows users, Cygwin (Appendix) and some programming packages are necessary. In general Bundler is not easy to use for people who have no experience with Linux. The latest version of Bundler is 0.4, which was released on April 10, 2010.

2.2.4.2 VisualSFM

VisualSFM is a GUI (Graphical User Interface) application developed by Changchang Wu (Wu, 2011). It is free for personal and non-profit use. VisualSFM uses SiftGPU for detecting DoG keypoints and a mixed CPU/GPU method for building the keypoint list. SiftGPU uses the GPU for the computation of the SIFT algorithm, which brings huge efficiency gains to VisualSFM (Wu, 2007). The GUI only contains SiftGPU, which supports keypoint detection and dense reconstruction. Users who want to use CMVS/PMVS reconstruction have to download the extra algorithms and put them in the GUI folder. In this thesis VisualSFM is one of the main tools for 3D reconstruction.

2.2.4.3 Photosynth

Photosynth (Appendix) was developed by Microsoft Live Labs and the University of Washington. It is based on the Photo Tourism project (Snavely et al. 2006) and has been successfully transformed into a web application. Users can upload their photos to Microsoft's server and the server returns point clouds. Photosynth also supports other functions, for example geo-tagging the models, creating panorama views and so on. The whole process is a black box and users have no access to the processing procedure. By using some toolkits the results can be exported: SynthExport (Appendix) is a free tool which supports exporting the point cloud in various formats (PLY, OBJ, VRML, X3D); PhotoSynthToolkit (Appendix) contains script-based tools for processing the point cloud generated by Photosynth: point cloud downloading and CMVS/PMVS2 implementation.

2.2.4.4 Agisoft PhotoScan

PhotoScan (Appendix) is a commercial SfM program sold in two versions: $179 for the standard version and $3499 for the professional version. The standard version contains most of the functions from point cloud generation to mesh model generation. The professional version adds specific functions for photogrammetry and GIS, including DEM generation, orthophoto generation, georeferencing and so on. In general, PhotoScan is user friendly and powerful. It combines SfM, georeferencing tools, 3D filtering and editing tools.

2.2.4.5 123D Catch

Autodesk 123D (Appendix) contains a series of software tools for 3D model generation and modification, including 123D Catch, 123D Design, 123D Creature and so on. 123D Catch is a free web service for generating 3D point clouds and mesh models from photos. Unlike Photosynth, 123D Catch is able to generate dense point clouds and 3D mesh models directly. A desktop version containing modification tools can be used for further post-processing, such as model repairing, 3D filtering, and tools for cutting and extruding. Compared to PhotoScan, 123D Catch is less accurate and less professional.

2.2.4.6 ARC 3D

ARC 3D (Appendix) is a free SfM web service as well. Users have to use a desktop interface for uploading images to the server. The server sends an email to notify the users when the process has finished. Three links are attached to the email: MeshLab depth maps, a full resolution model in OBJ format and a low resolution model for online viewing. The generated full resolution models have high quality, and users can use them for further processing, for example georeferencing and model editing.

2.2.4.7 Summary

The described SfM applications can be divided into two main categories: SfM web services and SfM desktop applications. Each has its own advantages, as shown in Table 2.

Table 2: Advantages of SfM web services and SfM desktop applications

SfM web services:
1. No special requirements for computer hardware
2. No usage of local memory and disk space
3. No need to install libraries for different programming languages
4. Much faster than desktop applications

SfM desktop applications:
1. Output quality is controllable (algorithm/software)
2. Time spent is controllable
3. Processes are optional

The described applications are the SfM programs which are commonly used nowadays; their detailed characteristics are shown in Table 3. Among the desktop applications, Bundler/PMVS2 is an earlier SfM application and is very slow; VisualSFM is similar to Bundler/PMVS2, but much faster; PhotoScan shows unique advantages in every aspect, but as commercial software its price is very high. Web services are not suitable for scientific research because the computing processes are not controllable; however, nearly all web services are very fast and easy to use. Therefore VisualSFM was chosen as the main reconstruction tool in this thesis. In practice, VisualSFM has shown its stability and efficiency.

Table 3: SfM applications and their characteristics (some results are concluded from Kersten et al. 2012a)

Application    Speed      Quality       Georeference  Mesh model  Filter
Bundler/CMVS   Slow       Good          No            No          No
VisualSFM      Very fast  Good          Yes           No          No
PhotoScan      Fast       Excellent     Yes           Yes         Yes
123D Catch     Very fast  Satisfactory  No            Yes         Yes
ARC 3D         Slow       Good          No            Yes         No
Photosynth     Very fast  Good          No            No          No

2.3 Optimal Strategies for SfM

2.3.1 Description

This section is divided into two principal parts: strategies for data acquisition and strategies for data processing. VisualSFM was used as the testing tool in the experiments. The types of cameras, the positions of the cameras, lighting conditions and their effects will be discussed, and the influences of photo resolution, photo quantity, EXIF tags and GPU drivers will be investigated.

2.3.2 Types of Cameras

Does the camera make a difference in VisualSFM? Two types of cameras were used for data acquisition: the Canon 5D and the Canon Powershot SX220 HS. The Canon 5D is a DSLR (Digital Single-Lens Reflex) camera equipped with a 50mm fixed lens; the Canon Powershot is a normal digital camera. The datasets were computed in VisualSFM in order to find out how many photos were correctly matched. The dataset "The Statue" (Figure 12) was used. The selected photos were taken from the same angles at around 3pm in the afternoon. Since the two cameras were equipped with different lenses, the distances from the camera to the object were adjusted so that the images have the same appearance. The result is shown in Table 4: when only 4 and 8 pictures were used, both datasets failed to match any photo. When 12 pictures were computed, the Powershot dataset derived a better result; however, the two datasets have similar performance in the rest of the comparisons. The result shows that DSLRs have no specific advantage in data acquisition, but a DSLR can be equipped with different lenses. Wide angle lenses give the camera a larger field of view, which is especially useful when dealing with a large study area.

Figure 12: Sample photos by using different cameras. (a) A sample photo from the Canon 5D (b) A sample photo from the Canon Powershot

Table 4: Numbers of matched photos (different types of cameras)

Camera           4 pics  8 pics  12 pics  18 pics  24 pics
Canon 5D         Fail    Fail    3        6        10
Canon Powershot  Fail    Fail    9        5        11

2.3.3 Strategies for Photo Acquisition

Without appropriate photos as the input dataset, no SfM system is able to accomplish the dense reconstruction process. Therefore experiments were implemented in order to find optimal strategies for photo acquisition.

2.3.3.1 Shadow Effect

In theory, SIFT keypoints are invariant to lighting conditions: when all pixels of an image turn darker or brighter together, the keypoints keep the same descriptors. In reality, when the lighting conditions change, some regions of an image may show stronger shadows than others, which can change the keypoints' descriptors. These shadows affect the results of the dense reconstruction process. The Canon Powershot camera was used for the data collection. One dataset was taken in the morning and the other was taken at noon. All photos were collected from the same angles and locations. The result (Table 5) clearly shows that photos which avoid strong shadows perform better in the photo matching process: the dataset collected in the morning always matched more photos than the dataset collected at noon.

Figure 13: Sample photos taken at different times. (a) A sample photo from noon (b) A sample photo from the morning

Table 5: Numbers of matched photos (shadow effect)

Dataset   4 pics  8 pics  12 pics  18 pics  24 pics
Noon      Fail    Fail    3        6        10
Morning   Fail    4       12       17       24

2.3.3.2 Positions of Cameras

The positions of the camera play an important role in data acquisition. Consequently two types of data acquisition plans (Figure 14) were tested. In Plan 1 (a) the focus points were set on special features such as edges and corners. In Plan 2 (b) the focus points were always set on the center of the object. Both plans use 48 images for the reconstruction.

Figure 14: Two plans describing positions of the cameras. (a) Plan 1: the focus points were set on some special features (b) Plan 2: the focus points were always on the center of the object

When using Plan 1, the derived result (Figure 15) shows four point clouds which represent different faces of the object, but they cannot be matched into one object. In comparison, Plan 2 generated only two point clouds and the shape of the model is very clear. This shows that VisualSFM needs a large percentage of image overlap in the photo matching process. The exact number is not clear, but in most cases it is better to have an overlap of more than 90%. The camera should always focus on the center part of the object, which brings more overlap to each photo pairing.


Figure 15: (a) Results of the dense reconstruction by using Plan 1 (b) Results of the dense reconstruction by using Plan 2

The demonstration (Figure 16) shows the positions of the photographer and the camera (blue points represent photographers). In Plan 1 the photographer stands at one point and takes photos of different aspects of the object. In Plan 2, the photographer moves while shooting photos and the camera always focuses on the center of the object. Plan 1 is not recommended for data collection because these images normally have insufficient overlap between image pairings. Datasets with inadequate overlap commonly cause reconstruction failure, which generates multiple point clouds representing different faces of an object.


Figure 16: Positions of photographer and camera

When the target is a single object of normal size, this method is easy to handle. But for a large study area covering multiple objects, taking photos on the ground for an SfM system is not easy. "Central objects" and "connection" are the key words for data acquisition in a large study area. Identifiable objects can be selected as central objects; when the camera takes photos around the central objects, the central objects as well as their backgrounds can be reconstructed. "Connection" means that extra photos are needed for connecting these central objects. As shown in Figure 17, the camera positions marked with red dots are the positions from which more than two objects can be seen. However, sometimes this is not enough for the connection; therefore more photos from further positions are necessary.


Figure 17: Taking photos for a large study area. Cameras with red dots are the positions from which more than two objects can be seen while cameras with blue dots are the positions from which only one object can be seen

The central object should be easily identifiable from every aspect of view. No special rules exist for selecting central objects, but in general round objects should be avoided because they look similar from different aspects. Objects with dark colors, especially black objects, are not recommended either because it is difficult to identify the depths of these objects. A sample object is shown in Figure 18, a statue located at TU Munich; 48 images were used for the reconstruction, but it still failed because of the statue's unique shape and black color.


Figure 18: An object which is difficult for identifying depths in SfM

Two strategies can be used for large area data acquisition: (1) Take sufficient photos for the reconstruction, plus extra photos as backup. If the reconstruction fails, check which parts show the multi-model effect and use the backup photos from that part to fix the problem. (2) Separate the area into small parts and reconstruct them separately; later these separate components can be combined after a georeferencing process. The first method only needs one georeferencing step; as long as the user knows how to collect photos correctly, this method is recommended. For users who are not familiar with SfM systems, the second method is recommended for the following reasons: (1) it is less time-consuming, especially when multiple computers run the process at the same time; (2) it offers the option of giving more detail to specific parts; (3) every part has its own local control points, so the georeferencing errors are controllable for each part; (4) when the performance of the computer is not good enough to deal with large mesh data, using separate models accelerates the display speed.

For a large indoor area, the strategy is similar: identifiable objects located inside the indoor environment can be selected as central objects, and the background behind these central objects can be reconstructed. For a small and narrow space, this strategy has to be changed a bit. When taking photos inside a narrow space, the moving range is limited; in addition, the camera may not capture all objects inside because the lens of the camera is not wide enough. In order to solve this problem the space can be divided into at least two parts. As shown in Figure 19, the objects and their background can be reconstructed, but the two derived models are in their own local coordinate systems. In order to connect these two models, a simple reference coordinate system can be established (Figure 20): the lengths and heights of the room and the objects can be measured, one edge can be defined as the origin (0, 0, 0), and all the other coordinates can be calculated from it. Consequently the two models can be georeferenced and connected, for example with a similarity transform estimated from corresponding control points, as sketched below.

Figure 19: Camera's positions in a narrow indoor area

Figure 20: Establishment of reference coordinate system in a narrow space

2.3.4 Strategies for Data Processing

The aim of this section is to find optimal strategies for data processing with SfM applications. It is impossible to test all SfM programs quantitatively; therefore VisualSFM was used as the test program because of its stability and efficiency. The underlying processes of most SfM programs are similar, so these results can also serve as reference strategies for other SfM desktop applications. The performance of different GPUs and CPUs may influence the test results; therefore all tests were run in one hardware environment:

OS: Windows 7 Professional 64 bit
Processor: Intel(R) Xeon(R) CPU E31220 @ 3.10 GHz
RAM (random-access memory): 8.00 Gigabyte
GPU: NVIDIA Quadro 2000

VisualSFM has four main processes (Figure 21). The first process reads the input dataset, i.e. the photos; then the photos are computed and matched. The third and the fourth processes generate point clouds. Connections exist between the following parameters: processing time, generated vertices, EXIF tags in the photos, GPU driver type, resolution of the photos and quantity of photos.

Figure 21: Four main processes in VisualSFM

2.3.4.1 Input Dataset

The first process, "Input Dataset", loads the photos. Its processing time obviously depends on the quantity and the size of the photos.

2.3.4.2 Photo Matching

This is the main process in SfM systems. VisualSFM automatically compares all photos and finds the connections between them. The matching process is time-consuming. EXIF tags contain the camera information; in theory, whether photos contain EXIF tags or not has no influence on the processing time. The following test was implemented to verify this. Three datasets with different numbers of photos (Table 6) were used: "Statue", "Lions" and "Rocks". Each was divided into 10 sub-datasets with different resolutions. As shown in the chart (Figure 22), only a slight difference exists between them: EXIF tags cannot accelerate the photo matching process. The result also leads to another conclusion: when the resolution decreases, the processing time remains stable until the resolution drops below 1200*900 (see also Figure 23). This means that in VisualSFM, reducing the photos' resolution does not accelerate the photo matching process.
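Whether a dataset still carries the focal length tag that VisualSFM reads can be checked before processing. The sketch below uses the Pillow library (assuming a recent version with the getexif interface); the folder name "dataset" is a placeholder, not a path used in this work.

    from pathlib import Path
    from PIL import Image

    EXIF_IFD = 0x8769       # pointer to the Exif sub-IFD inside the main EXIF block
    FOCAL_LENGTH = 0x920A   # FocalLength tag stored in that sub-IFD

    def focal_length(path):
        """Return the EXIF focal length in mm, or None if the tag is missing."""
        exif = Image.open(path).getexif()
        value = exif.get_ifd(EXIF_IFD).get(FOCAL_LENGTH)
        return float(value) if value is not None else None

    # "dataset" is an illustrative folder name
    for jpg in sorted(Path("dataset").glob("*.JPG")):
        fl = focal_length(jpg)
        print(f"{jpg.name}: {fl} mm" if fl else f"{jpg.name}: EXIF focal length missing")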

Table 6: Tested dataset information

                    Statue   Rocks   Lion
Number of images    71       58      61

Figure 22: Processing time comparison between data with and without EXIF tags

Figure 23: Time cost for matching photos at different resolutions

As mentioned before, the matching process relies heavily on GPU performance. In VisualSFM two modes can be used for matching photos: the GLSL (OpenGL Shading Language) mode and the CUDA (Compute Unified Device Architecture) mode. The CUDA mode is only usable on computers equipped with NVIDIA graphics cards. Four extra datasets with EXIF tags were computed in the test (Figure 24). The CUDA mode needs less time in the matching process, and when dealing with large numbers of photos the difference is very large. In most situations the CUDA mode reduces the time by around 50%.

Figure 24: CUDA versus GLSL in the photo matching process

In order to find the relation between processing time and quantity of photos, datasets with different numbers of images were used in the photo matching process. The result (Figure 25) clearly demonstrates that the processing time has a roughly quadratic relation to the number of photos. A dataset with 40 images needs only 5 minutes in the matching process, but when the quantity is doubled to 80 images, the processing time increases to 20 minutes, about four times as long. In this case the curve matches the quadratic equation y = 0.0031x² - 0.0256x + 1.0075 very well (R² = 0.9961).
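Such a trend can be reproduced by fitting a second-degree polynomial to measured (photo count, matching time) pairs, for example with NumPy. The numbers below are hypothetical values illustrating the shape of the fit, not measurements from this work.

    import numpy as np

    # Hypothetical (number of photos, matching time in minutes) measurements
    n_photos = np.array([20, 40, 60, 80, 100])
    t_match = np.array([1.5, 5.0, 11.0, 20.0, 31.0])

    # Fit t = a*n^2 + b*n + c and report the goodness of fit
    a, b, c = np.polyfit(n_photos, t_match, deg=2)
    t_hat = np.polyval([a, b, c], n_photos)
    r2 = 1 - np.sum((t_match - t_hat) ** 2) / np.sum((t_match - t_match.mean()) ** 2)
    print(f"t = {a:.4f}*n^2 + {b:.4f}*n + {c:.4f},  R^2 = {r2:.4f}")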


Figure 25: Processing time for different quantities of images

2.3.4.3 Dense Reconstruction

Once the camera positions are known, points on the object can be matched and a loose point cloud representing the shape of the object can be generated. This is the so-called dense reconstruction process. Normally this process is very fast, taking from several seconds to several minutes depending on the quantity of photos. The dense reconstruction process generates point clouds, which can be visualized directly in VisualSFM. A denser point cloud means a better result. The datasets "Statue", "Rocks" and "Lion" with EXIF tags were used in order to find the relation between photo resolution and derived vertices. As demonstrated in Figure 26, when the resolution is larger than 1200*900 the number of derived vertices remains stable; photos with higher resolution do not bring more vertices. When plotting vertices divided by processing time on the Y axis (Figure 27), another conclusion can be drawn: the more time the photo matching process consumes, the more vertices are generated.

Figure 26: Vertices generated from different resolutions

Figure 27: Relation between generated vertices and processing time in the photo matching process

What about EXIF tags in the reconstruction process? VisualSFM extracts the focal length information from EXIF tags automatically. Does this accelerate the reconstruction process? The dataset "Rocks" was run 20 times in this test (Figure 28). When the dataset with EXIF tags is used, the processing time is stable, in this case around 5 seconds. Without EXIF tags the process is more time-consuming and unstable. This supports the assumption that EXIF tags help VisualSFM locate the camera positions, which brings stability and efficiency to the reconstruction process.

Figure 28: Processing time comparison between data with and without EXIF tags

Not all datasets are able to finish the reconstruction process. When only a few photos match with each other or when the resolution is too low, the reconstruction procedure may fail. Three typical reconstruction failures are shown below. Point cloud split (Figure 29 (a)): this is the most common reconstruction failure. It happens especially when the dataset has insufficient overlap, particularly at the corners of an object. It is also caused by wrong photo acquisition methods, for example standing at a fixed point and taking photos without any movement. Single object split and noise objects (Figure 29 (b) & (c)): these commonly happen when the resolution of the dataset is not high enough. The shape of the point cloud is clear, which means the photos were matched without problems, but some detailed parts collapse because of the low pixel quality. Sometimes this appears as gaps in a single object and sometimes as noise objects. In general these two situations only happen when the resolution of the data has been reduced to an unstable level.

Figure 29: Three cases of reconstruction failure (a) Reconstruction failure: point cloud split (b) Reconstruction failure: single object split (c) Reconstruction failure: noise object

When the resolution has dropped to an unstable level, these kinds of reconstruction failures happen all the time. For the dataset "Rocks" with 58 photos, the resolution 800*600 is not stable: sometimes it generates point clouds correctly and sometimes not. Therefore a test (Figure 30) on the stability of the reconstruction was implemented: an unstable dataset was used in the reconstruction process twenty times. The result shows that the dataset with EXIF tags has a higher chance of completing the reconstruction. Of the twenty tests, the success rate is 100% for the dataset with EXIF tags, but only 50% for the dataset without EXIF tags. In addition, the number of vertices derived from the data without EXIF tags is lower than from the EXIF dataset, because fewer photos were correctly matched with each other.

Figure 30: Reconstruction stability test

As in the photo matching process, the GPU drivers (CUDA mode and GLSL mode) cause differences in VisualSFM as well. The result is shown in Figure 31. Small datasets with few photos leave many possibilities open in the reconstruction process, and VisualSFM needs more time to analyze these possibilities; therefore a faster GPU driver shows great advantages, and the CUDA mode is much faster than the GLSL mode. But when dealing with large numbers of photos, the structure of the object is correspondingly clearer and the time needed by the GPU driver is comparatively stable, so the gap between the CUDA driver and the GLSL driver is small. In general, using the CUDA GPU driver greatly increases the speed of the dense reconstruction process.

Figure 31: CUDA versus GLSL in the dense reconstruction process

Unlike the photo matching process, the processing time of the dense reconstruction has a linear relation with the quantity of photos (Figure 32). In this case the line is close to the linear equation y = 0.007x + 0.0068 (R² = 0.8876). In addition, the quantity of photos has a linear relation with the number of derived vertices (Figure 33).

Figure 32: Processing time for different quantities of photos in dense reconstruction process

Figure 33: Generated vertices from different quantities of photos in the dense reconstruction process

2.3.4.4 Multi-view Stereo (MVS) Reconstruction

Clustering Views for Multi-view Stereo (CMVS) was developed by Dr. Yasutaka Furukawa. This program uses the results from the Structure from Motion (SfM) system and outputs denser point clouds. Users can integrate this software with VisualSFM. CMVS is very time-consuming, especially for large datasets. Since the camera positions are already known from the previous process, EXIF tags have no influence on the CMVS reconstruction. In addition, the GPU driver is not involved in this process, so CUDA and GLSL make no difference here. CMVS has to analyze the photos one by one and match more points onto the existing point cloud; therefore the resolution of the photos plays an important role in its efficiency. The dataset "Rocks" with 58 images was used for testing the relation between time and resolution (Figure 34), and the result clearly supports the assumption: an exponential relation exists between the processing time and the resolution of the photos. When extremely high resolution images are used in the CMVS reconstruction, the time may increase dramatically to several days.

Figure 34: CMVS time cost for images with different resolutions

In theory, the quantity of photos has a linear relation with the processing time as well as with the number of generated vertices. In the tests (Figure 35 & Figure 36) the results are sometimes unstable. They also depend on what the images contain: an ideal dataset contains only the target objects, with everything else, including the background, in white. In reality, however, large and complicated backgrounds exist in the dataset, and such images need more time for the implementation.


Figure 35: Processing time for different quantities of photos in CMVS

Figure 36: Generated vertices from different quantities of photos in CMVS

2.3.5 Summary

Normal cameras have no specific disadvantages compared to DSLRs, but a DSLR equipped with a wide-angle lens is especially suitable for the reconstruction of large areas or indoor environments. For photo acquisition, the optimal strategies are: take photos around the target object; avoid weather conditions which make the target object too dark or too bright (cloudy weather is recommended); for multiple objects, "central objects" and "connection" are the key points; use a reference coordinate system when there are problems with the "connection"; and keep all EXIF information in the photos. In VisualSFM, the CUDA mode greatly increases the efficiency of the photo matching process but has little influence on the other processes. Combining the results from the matching process and the dense reconstruction process, the following conclusions can be drawn (Figure 37): the resolution of the photos has little influence on these two processes. The photo matching process is much more time-consuming than the dense reconstruction; as shown in Figure 32, even 100 images cost less than one minute in the reconstruction process, so this time can be ignored. The processing time of the photo matching has a roughly quadratic relation to the number of photos, whereas the generated vertices have a linear relation to the number of photos. Reducing the number of photos in a dataset can therefore greatly increase the efficiency; in these two processes, quantity is the key point. When dealing with a large dataset containing a huge number of photos, deleting unnecessary photos may save a lot of time, and when the target area is wide, separating the whole area into small parts containing fewer photos can improve the efficiency. Such separate point clouds also have advantages in data post-processing. In the CMVS reconstruction, resolution is the key point. When the processing time is extremely long, reducing the resolution by 50% may reduce the processing time by 75% while reducing the generated vertices by only 50%. More importantly, CMVS generates a huge number of points which are sometimes difficult to display; using high resolution images to generate an even denser point cloud that cannot be processed makes no sense. Therefore, for the CMVS reconstruction, images with a resolution of around 2000*1500 are recommended; image sizes of more than 4000*3000 can take an extremely long time to process.
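Because the CMVS time grows steeply with image resolution while the vertex count does not, batch-downsampling the photos before reconstruction is a practical optimization. Below is a minimal sketch using the Pillow library; the folder names and the 2000-pixel target width are illustrative assumptions, and the EXIF block is copied over so the focal length stays available to VisualSFM.

    from pathlib import Path
    from PIL import Image  # Pillow

    SRC = Path("photos_original")   # placeholder input folder
    DST = Path("photos_resized")    # placeholder output folder
    TARGET_WIDTH = 2000             # roughly the 2000*1500 recommendation above

    DST.mkdir(exist_ok=True)
    for jpg in SRC.glob("*.JPG"):
        img = Image.open(jpg)
        exif = img.info.get("exif")  # raw EXIF bytes, if present
        if img.width > TARGET_WIDTH:
            new_h = round(img.height * TARGET_WIDTH / img.width)
            img = img.resize((TARGET_WIDTH, new_h), Image.LANCZOS)
        if exif:
            img.save(DST / jpg.name, "JPEG", quality=95, exif=exif)
        else:
            img.save(DST / jpg.name, "JPEG", quality=95)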

Figure 37: Factors which influence the different processes in VisualSFM

2.4 Mesh Models Generation

2.4.1 Integrated Software

As mentioned before, some programs are able to generate mesh models directly; PhotoScan, 123D Catch and Arc3D, for example, are commonly used nowadays. 123D Catch is comparatively unsuitable for scientific research and is not recommended for building very accurate GIS models. As a web service, Arc3D derives mesh models of superior quality, but when dealing with large datasets users cannot estimate how much time the server will need. PhotoScan is one of the best SfM programs on the market. The software is simple and easy to use even for non-professional users, and the data quality is also among the best. The integrated functions for georeferencing, DEM generation and orthophoto generation are its highlights. The georeferencing tools in PhotoScan use the images for setting GCPs, which is very fast and accurate.

2.4.2 VisualSFM - MeshLab Workflow

Figure 38: The VisualSFM-MeshLab workflow

Users can also use a combination of multiple programs or tools for mesh model generation. The VisualSFM-MeshLab workflow (Figure 38) is a typical workflow for surface reconstruction nowadays. It contains alternative choices in some main procedures; for instance, in the dense reconstruction step other SfM programs are also usable. In addition, many tools are available for georeferencing the models, such as GRASS GIS (Appendix), PC-AffineTrans (Appendix) and Java Graticule 3D (Appendix). A new version of VisualSFM also embeds a function for georeferencing point clouds. Georeferencing point clouds causes a warp effect which will be discussed in Chapter 4. An alternative method is to first generate mesh models from the point clouds and then use CloudCompare/MeshLab (Appendix) to georeference the mesh models directly; this kind of georeferencing uses the three/four-point-based alignment ("glue") functions in these programs. The VisualSFM-MeshLab workflow needs a lot of manual work, including point cloud filtering and mesh surface filtering. Poisson Surface Reconstruction (Kazhdan et al. 2006) is a method for reconstructing 3D surfaces from point clouds. MeshLab (version 1.3.2) contains three surface reconstruction methods: Poisson Surface Reconstruction, Ball Pivoting Reconstruction and the VCG Reconstruction (from the VCG library).

Figure 39: Surface reconstruction in MeshLab. (a) CMVS point cloud (b) Normals of the points (c) Recomputed normals (d) Ball Pivoting Reconstruction (e) VCG Reconstruction (f) Poisson Reconstruction

Surface reconstruction algorithms need the normals of the points in their computation, and the points generated from VisualSFM always have wrong normals (Figure 39 (b)). With such points, the algorithm will generate a model with wrong geometry; therefore re-computing the normals is necessary. The Poisson Surface Reconstruction method is widely used because it generates surfaces without holes. This workflow relies on the Poisson reconstruction algorithm for constructing the surfaces and is therefore not suitable for reconstructing multiple, complicated targets. In the experiments, this workflow generated better results for a single object such as a sculpture or a façade.
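This normal re-estimation plus Poisson reconstruction step can also be scripted instead of done in the MeshLab GUI. The following is a minimal sketch using the open source Open3D library (not a tool used in this thesis); the file names and parameter values are illustrative assumptions.

    import numpy as np
    import open3d as o3d

    # Dense point cloud exported from VisualSFM/CMVS (placeholder file name)
    pcd = o3d.io.read_point_cloud("dense_points.ply")

    # Re-compute the normals, since the normals attached by the SfM step are unreliable,
    # and orient them consistently so the Poisson solver gets a usable normal field
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(30)

    # Poisson surface reconstruction (Kazhdan et al. 2006); depth controls the detail level
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=10)

    # Remove the low-density faces that Poisson extrapolates far away from the data
    dens = np.asarray(densities)
    mesh.remove_vertices_by_mask(dens < np.quantile(dens, 0.02))

    o3d.io.write_triangle_mesh("poisson_mesh.ply", mesh)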


2.4.3 VisualSFM - CMPMVS Workflow

Figure 40: The VisualSFM-CMPMVS workflow

CMPMVS (Jancosek et al. 2011) is a multi-view reconstruction program. It uses the output files from VisualSFM. This command-line based program runs automatically and generates multiple results: dense point clouds, textured models in different formats, videos (optional), simplified models in different formats and so on. The latest version of CMPMVS offers some optional functions, for example generating an orthophoto or generating videos. A CUDA GPU is compulsory for CMPMVS. Unlike the previous workflow, this workflow has no alternative choice for the main process because CMPMVS has to use the output file from VisualSFM. Most of the processes in this workflow are automatic. CMPMVS is very time-consuming, especially for large datasets, and its full output consumes enormous hard disk space. For instance, a tested dataset with only 6 images needed 63 minutes for the implementation and consumed 6 gigabytes of hard disk space; a dataset with 105 images took 649 minutes and even consumed 51.1 gigabytes.

Figure 41: Demonstration models generated from CMPMVS (dense point cloud, surface model and textured surface model)

CMPMVS derives mesh models of superior quality (Figure 41). However, sometimes the full-size models are too big to modify; that is why the software provides an option to generate simplified models. In addition, one model file in .qs format is generated. QSplat (Appendix) is a program for displaying large models, and users who have difficulties displaying models can use it as an alternative tool. The orthophoto generated from CMPMVS can also be georeferenced and used for GIS purposes. The accuracy of the results will be discussed in Chapter 4.

2.5 Usages of Mesh Models

Mesh models can be generated with the programs or workflows described above. These models can be used for various purposes, for example 3D printing, animations and product demonstrations, but nearly all models derived from SfM systems have some defects: irrelevant faces, noise faces, missing faces, missing textures and so on. Therefore model post-processing is needed to solve these problems.

2.5.1 Post Processing of Mesh Models

2.5.1.1 Models Repairing

In most cases the generated mesh models have holes. Models with holes cannot be used for some purposes such as 3D printing or animations. Various programs are available for mesh repairing, for example MeshLab, MeshFix and Graphite (Appendix). MeshFix is a command-line based program which can automatically close the model and delete irrelevant mesh components (Figure 42); it provides only a few functions. The repaired models generated by MeshFix lose their textures, but the coordinates do not change, so the textures can be re-sampled from the original models.


Figure 42: Mesh repairing using MeshFix (pairs of original and repaired mesh models)

Graphite is a free program for computer graphics research. Compared to MeshFix, Graphite provides more user-defined functions for model repairing and modification, but the output models from Graphite lose their textures as well; the textures can be re-sampled from the original models. Figure 43 shows an example.


Figure 43: Mesh repairing using Graphite (pairs of original and repaired mesh models)

MeshLab is an open source program for mesh model editing and processing. It has powerful functions for editing point clouds and mesh faces and provides core functions for model repairing such as close holes, remove isolated pieces, compact vertices, remove duplicated faces and fill holes. For big holes, a more accurate method can be used in MeshLab: first create a Poisson mesh model as material for filling the holes; then resample the texture from the original model and delete the duplicated components, so that only the components needed for filling the holes are kept; finally, glue these components to the original model (Figure 44). Further processing, for example smoothing and deformation, can be applied to the Poisson model.

Figure 44: Steps for repairing big holes in MeshLab. (a) Original model (b) Poisson mesh model derived from the original model (c) Poisson mesh model with texture (d) Repaired model

2.5.1.2 Models Separation

In reality, objects are composed of many small components, but in the models generated from SfM all components are stuck together as one model. Therefore in some cases repairing the overall model alone is not enough. For example, as shown in Figure 45, the rock contains small parts which should be separated. MeshLab can be used for separating the models: with the modification tools users can delete the parts which are not needed, which causes holes at the joints; the repairing methods mentioned above can then be used for closing these holes. As an alternative, 3D modeling software can be used as well; most of the modeling programs on the market can separate models. Blender (Appendix) is introduced here as an example.


Figure 45: Separating the mesh model into pieces

Blender supports the .ply and .wrl formats, which keep the textures of the models; other formats such as .obj and .dae lose the texture. The general steps are shown in Figure 46: first the model has to be cleaned and repaired; the repaired model can then be imported into Blender for the Boolean operation. Boolean operations are available in most 3D modeling software such as SketchUp (Appendix) and 3ds Max. The Boolean operations "Intersect" and "Difference" can be used for separating models. This method does not generate extra holes because the Boolean operations automatically fill the joint area with a plane surface. The separated models can be exported for further processing.
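The Boolean step itself can also be scripted in Blender rather than performed manually. The sketch below uses Blender's Python API (bpy); the object names "TempleBlock" and "Cutter" are placeholders chosen for illustration, not objects from this work.

    import bpy

    # "TempleBlock" is the imported SfM mesh, "Cutter" is a cube placed
    # over the part that should be split off (both names are illustrative)
    target = bpy.data.objects["TempleBlock"]
    cutter = bpy.data.objects["Cutter"]

    # Add a Boolean modifier on the target and point it at the cutter
    mod = target.modifiers.new(name="SplitBool", type='BOOLEAN')
    mod.operation = 'DIFFERENCE'   # 'INTERSECT' would keep the complementary piece
    mod.object = cutter

    # Apply the modifier so the result becomes real geometry
    bpy.context.view_layer.objects.active = target
    target.select_set(True)
    bpy.ops.object.modifier_apply(modifier=mod.name)

    # The cutter can now be hidden and the separated piece exported
    # (e.g. as .ply, to continue processing it in MeshLab)
    cutter.hide_set(True)

Running the same script twice, once with "DIFFERENCE" and once with "INTERSECT", yields the two complementary pieces of the original mesh.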

Figure 46: The process of separating a model into two pieces. (a) The original mesh model generated from CMPMVS (b) Clean the meshes and delete irrelevant parts (c) Fill all the holes and close the model (d) Create a cube and apply Boolean operations to the cube and the mesh model (e) The model has been separated into two pieces

In addition, Boolean operations can be used for filling huge holes or closing meshes as well. Figure 47 shows a demonstration. When the Boolean operation "Difference" is applied between a fragmentary object and an object which covers all faces of the fragmentary object, the fragmentary object will be closed. The reason behind this is not entirely clear, but the assumption is that most modeling software uses a BSP (Binary Space Partitioning) structure in 3D modeling: even if these models are not closed, they are regarded as solid entities in the Boolean operation.

Figure 47: A demonstration of using Boolean operations to close a cube which lacks three faces. (a) A block with three faces (b) Boolean operation "Difference" applied (c) The derived closed cube

2.5.1.3 Models Combination

In some cases the generated models are incomplete, for instance some parts are missing. Sometimes the original dataset is too large to process, so the original point cloud has been separated into small components for surface reconstruction, and these components then have to be merged. For these problems, MeshLab/CloudCompare provides a "Point Based Glueing" function for merging multiple mesh models. First, all model components should have a uniform scale. Matching pairs of points can then be selected between two models; the two models are merged into one, and further merging can be done between the merged model and the other components.

An alternative method is to use 3D modeling software. The Boolean operation "Union" merges all selected models together. This method is especially suitable for merging existing mesh models with new models created in the modeling software, for example when creating a platform for a model (Figure 48). But the Boolean operation is not stable; for some datasets the merging process causes texture errors. To solve this problem, users can export the created model to MeshLab and use "Point Based Glueing" for the merging process (Figure 49).

Figure 48: Merging a platform with a sculpture in Blender. (a) A repaired mesh model (b) Create a platform for the sculpture (c) Use the Boolean operation "Union" to merge the two models

Figure 49: Point-based merging process. (a) Matching points are selected on each object (b) The two objects are merged into one

2.5.2 GIS and Archaeological Usages of Mesh Models

In general, mesh models are not suitable for storing GIS information due to their massive size. Mesh models perform well for visualization purposes, but they contain a massive number of faces which consumes a huge amount of memory in a GIS system. Therefore, for the purpose of GIS 3D model reconstruction, SfM data has to be treated differently: a different workflow for GIS model reconstruction will be introduced, and since the constructed models carry no GIS attributes, a method to inherit attributes from the 2D GIS data will be introduced as well.

2.5.2.1 Archaeological Usages

Accurate mesh models can be used as reference data in archaeological studies and reconstruction. The derived mesh models contain superior detailed information, and this level of detail is extremely useful for archaeologists, who traditionally collect such data by drawing rocks and details during field work. Georeferenced mesh models have a complete coordinate system for archaeological studies. New applications for archaeologists can be carried out by using these kinds of data: measuring distances between objects on a personal computer, overviewing a large excavation site, excavation monitoring and updating, archaeological reconstruction analysis and so on. Figure 50 shows an example: a mesh model was georeferenced and measured, and compared to the survey data the displacement is only 0.001 meters.

Figure 50: Measurement of mesh models. (a) Measuring the length of a rock in a sample data (b) Measuring the length of the same rock by using 2D GIS data in ArcScene

2.5.2.2 GIS Usages

On the one hand archaeologists seek as much detail as possible, but on the other hand most GIS systems cannot handle such massive data with millions of points. SfM systems generate a huge number of vertices, especially after the CMVS/PMVS2 reconstruction, and most of these points are not necessary. In a GIS system, simple and structured models are preferred. For the purpose of reconstructing these simple models, distinct feature points are needed, and these points can be extracted from the dense point clouds.

The open source program CloudCompare can be used for extracting feature points and their coordinates. CloudCompare provides a function for picking points out of point clouds and exporting the coordinates to a text file (.csv, .xyz, .pts, etc.). The exported points can be imported into Google SketchUp by using a free plug-in called "Cloud v8" (Appendix). Once the points are imported, a textured model can be constructed (Figure 51). The LOD (level of detail) should be considered when the points are picked. The constructed simple textured models can be widely used in GIS.

Figure 51: From feature points to GIS model: (a) points picked from the point cloud generated by SfM (b) points imported into 3D modeling software, SketchUp as an example (c) texture applied to the simple model

In some cases 2D GIS data can be extruded into 3D models, but these models are not accurate. Mesh models or simple models can be used to replace the extruded models; by using this method, the data attributes and structures of the 2D shape files are kept. As shown in Figure 52, the workflow is carried out manually using ArcScene and 3D modeling software, connected through the COLLADA format, which most 3D modeling programs on the market support nowadays. In most cases the repaired mesh models from SfM systems are too large to handle in ArcScene, because even a small block has hundreds of faces; therefore the simple GIS models from the previous section are preferred. Figure 53 shows an example.


Figure 52: Workflow to inherit attributes from 2D data

Figure 53: Extruded model is replaced by a simple model which is built by using SfM point cloud


3 CASE STUDIES: NEMI, ITALY

3.1 Location and Background

The study area is located in the municipality of Nemi, about 30 km southeast of Rome. The archaeological sites lie on the northern shore of Lake Nemi and are the ruins of an ancient temple dedicated to Diana (Figure 54). Among these ruins, the ancient temple called "Tempio di Diana a Nemi" is the main target object. The excavation of this area is not finished yet; an overview of the excavation is shown in Figure 55. The Diana temple is located roughly in the middle. The temple is only a small part of the whole archaeological site, and many unknown historical remains are still underground.

Figure 54: Overview of Nemi, Genzano and Nemi Lake (Peters et al. 2012)

Figure 55: Overview of the excavations at Nemi Lake (Ghini 2011)

3.2 Structure of the Temple

The structure of the temple is complicated: it is a combination of three different temples (Figure 56). In different periods of time, the local people rebuilt or reinforced the temple; therefore different kinds of rocks were used as material in the walls, and the temple grew bigger and bigger through rebuilding and reinforcement. Figure 57 illustrates a 2D floor plan of the temple.


Figure 56: The temple was built in three different periods (Peters et al. 2012)


Figure 57: A 2D map describing different components of the temple (Peters et al. 2012)

3.3 Data Acquisition

Three different types of datasets were collected, depending on the equipment (Figure 58): the Nikon D3, the Canon Powershot SX220 and the UAV. The Canon Powershot SX220 HS is a low-cost solution for data collection; it costs around 200 Euros (2013) and its performance is satisfactory. The Nikon D3 is equipped with a 28 mm fixed lens, which gives it a significant advantage: with this wide-angle lens the camera can capture more objects in each photo, which makes it easier for SfM systems to connect objects. A UAV system called "AscTec Falcon 8" was used; it is equipped with a Sony NEX-7, a mirrorless interchangeable-lens camera with an APS-C (Advanced Photo System type-C) sensor.


Figure 58: Equipment for data acquisition. (a) The Canon Powershot SX220 (b) The Nikon D3 (c) The AscTec Falcon 8 equipped with the Sony NEX-7

Detailed information is shown in Table 7. The UAV obtains the data from the sky, which gives it the widest view: 119 images are sufficient for covering the whole area. The original dataset from the UAV also contains interior and exterior orientation information which can be used for photogrammetric rectification. The other two devices were used on the ground. The Nikon D3 with its 28 mm fixed lens has a wider angle than the Canon Powershot, which reduces the number of photos required for the Nikon D3.

Table 7: Information about the data

                       Canon Powershot SX220   Nikon D3     AscTec Falcon 8
Number of photos       819                     494          119
Original resolution    4000*3000               4256*2832    6000*4000

In order to georeference the 3D models or point clouds, identifiable control points on the ground are also needed (Figure 59). A total station was used to collect the coordinates of these control points. Three types of control points were used: (1) clear edges or corners of rocks, (2) designed blue-red-yellow markers and (3) black-and-white markers for the UAV. When dealing with a specific area for detailed reconstruction, sometimes not enough control points can be found nearby; in such conditions, clear edges or corners can be used as control points as well.

Figure 59: Ground control points for georeferencing. (a) Blue-red-yellow marker (b) Black and white marker on a rock (c) Measuring coordinate of a control point

4 RESULTS AND DISCUSSION

4.1 Overview of Implementations

The datasets collected in Nemi, Italy can be divided into two categories: aerial photos and ground photos. In this chapter, the datasets were processed with two principal methods: the commercial software PhotoScan and the open source workflow (including VisualSFM and CMPMVS). The structure of the implementation is shown in Figure 60. The results are evaluated on two factors: appearance and accuracy. Two methods were used in order to improve the accuracy of the derived models. In the end, a GIS model and an archaeological model were built as demonstrations.

Figure 60: Overview of the implementations

4.2 Results from PhotoScan

As commercial software, PhotoScan has its own visualization windows, and its user interface integrates all functions from photo reading to model georeferencing. In PhotoScan, results can be visualized and edited without any problems. But once the models are exported in a defined coordinate system, the exported models show a warped effect, no matter whether they are mesh models or point clouds. This phenomenon also occurs when other tools are used for the georeferencing process. The UAV data was used in PhotoScan and the result is shown in Figure 61: image (b) demonstrates the warped exported model. This phenomenon does not occur when the models are exported in their local coordinate systems. To avoid this effect, models can first be exported in their local coordinate system and then relocated and rescaled correctly with the three/four-point-based align function in CloudCompare. 3D survey points were used for checking the accuracy of the georeferenced model. In CloudCompare, the cloud-mesh distance function computes the distances between points and their nearest triangles (CloudCompare manual). For smooth models, if wrong control points were used in the georeferencing process, the georeferenced model will have huge geometric displacements; if the whole model has small displacements along the X, Y, Z axes, the average cloud-mesh distance increases considerably as well. Therefore cloud-mesh distances can be used for evaluating the accuracy of a mesh model. The distances between the survey points and the georeferenced model were computed, as shown in Figure 62: 56.22% of the values lie between +0.0069 and -0.185 meters, and the mean distance is 0.061 meters. In general, the models generated from PhotoScan have smooth surfaces and good structure, but detailed information is missing in some areas. A comparison between this model and the model derived from the open source workflow will be introduced in section 4.2.4.
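The same cloud-to-mesh check can be scripted outside CloudCompare, for instance with the open source trimesh library (not a tool used in this thesis). The file names are placeholders, the survey points and the mesh are assumed to be in the same coordinate system already, and unlike CloudCompare this sketch reports unsigned distances only.

    import numpy as np
    import trimesh

    # Georeferenced mesh and surveyed 3D control points (placeholder file names)
    mesh = trimesh.load("georeferenced_model.ply", force='mesh')
    survey_points = np.loadtxt("survey_points.xyz")   # one "x y z" row per point

    # Distance from every survey point to its nearest triangle on the mesh
    closest, distances, triangle_id = trimesh.proximity.closest_point(mesh, survey_points)

    print(f"mean cloud-mesh distance: {distances.mean():.3f} m")
    print(f"max cloud-mesh distance:  {distances.max():.3f} m")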


Figure 61: Exported georeferenced mesh models demonstrating the warped effect. (a) Generated model with fine geometry viewed in PhotoScan, blue flags are control points (b) Exported model viewed in MeshLab


Figure 62: Cloud-mesh distances computed using the survey points and the mesh model from PhotoScan

PhotoScan also provides convenient tools for exporting DEMs and orthophotos, which are based on the model and the camera positions. Therefore the orthophotos can also be used for evaluating the accuracy of the model. The orthophoto generated from the UAV dataset was overlaid with the survey points in ArcGIS for an overview, as shown in Figure 63. In general, the data is very precise. Random identifiable feature points were picked and their 2D displacements were measured, as shown in Table 8. Some points are difficult to identify on the orthophoto, therefore only points lying on clear edges were picked. The results match the previous results: the survey points have a displacement of around 0.05 meters on average.

Table 8: Displacements between random identical points on the orthophoto and survey points (points 1-4 are from the middle of the temple; points 5-8 from the north site; points 9-12 from the south site)

Point1   0.041 m    Point5   0.049 m    Point9    0.081 m
Point2   0.056 m    Point6   0.047 m    Point10   0.038 m
Point3   0.048 m    Point7   0.041 m    Point11   0.049 m
Point4   0.036 m    Point8   0.042 m    Point12   0.046 m


Figure 63: Orthophoto generated from the UAV dataset with PhotoScan, overlaid with the 2D survey points

The Nikon D3 dataset was used in PhotoScan in order to check whether ground photos are able to generate geometric structures as accurate as aerial photos. In PhotoScan the target quality was set to "high" and the processing time was about 60 hours. The derived model is not satisfactory even though it has 4699750 faces. As shown in Figure 64, detailed information is lost when only ground photos are used for reconstructing a large study area in PhotoScan. Most of the regions are blurred and distorted, so the model cannot be used for further post-processing. This is because in the surface reconstruction process the program needs a local coordinate system as a platform, and all meshes are built on this basic platform. For a large study area, even if detailed photos are provided for reconstructing small regions, the whole model will still be built at a large scale. This effect drags down the detailed information when photos describing a large area are used together. A similar result will be shown in section 4.2.4.

Figure 64: Derived models from the Nikon dataset using PhotoScan (blue flags are reference points). (a) Derived point cloud (b) Derived mesh model

4.3 Results from VisualSFM Workflow

The results from the VisualSFM workflow fall into three categories: (1) point clouds from the dense reconstruction, (2) point clouds from CMVS/PMVS2 and (3) mesh models from CMPMVS. Three datasets were used, and the comparisons between them will be discussed.

4.3.1 Dense Reconstruction

The dense reconstruction process derived point clouds; details are shown in Table 9. The dataset from the Canon Powershot has 819 photos, while the dataset from the AscTec Falcon 8 has only 119 images, but the matching time for the Powershot dataset is about forty times longer. This confirms the relation shown in Chapter 2 (Figure 25: Processing time for different quantities of images).

Table 9: Reconstruction time and generated vertices from different datasets

                                Canon Powershot   Nikon D3     AscTec Falcon 8
Number of photos                819               494          119
Time for matching photos        1019.783 min      407.25 min   24.7 min
Time for dense reconstruction   22.967 min        14.55 min    2.7 min
Vertices generated              429653            245961       62861

Three point clouds were generated (Figure 66). The UAV has the widest view; its point cloud has only 62861 points, but the points are distributed more evenly over the surface of the temple. By contrast, the other two point clouds have more detail concentrated on the edges of the walls, while some areas on the surface of the temple lack points. As shown in Figure 65, when images are taken on the ground for a large target area, the camera is focused on identifiable components of the target. For feature point detection, points on such components, for example edges and corners, can easily be detected as feature points, and therefore the different components can be connected. With this method fewer photos are needed for connecting components, but it also causes a problem: there are not enough images for identifying feature points on the flat surfaces. This does not mean that the flat surface areas are not covered by the images, only that the points there are comparatively sparse. After the CMVS reconstruction the situation changes.


Figure 65: Positions of the cameras. (a) Cameras’ positions from the Nikon D3 dataset (b) Cameras’ positions from the UAV dataset


Figure 66: Derived point clouds by using different datasets. (a) Dense reconstruction by using Canon Powershot dataset with 819 photos (b) Dense reconstruction by using Nikon D3 dataset with 494 photos (c) Dense reconstruction by using AscTec Falcon 8 dataset with 119 photos

4.3.2 CMVS Reconstruction

The quantity of photos has a nearly linear relation with the processing time in the CMVS reconstruction (shown in Figure 35: Processing time for different quantities of photos in CMVS). The Canon Powershot dataset did not finish this process: due to the large number of photos it took more than three days, and the process eventually stopped automatically. VisualSFM generates the models separately during the process, so although the process stopped, all finished point clouds can be found in the computation folder. The AscTec Falcon 8 collects photos with a resolution of 6000*4000. In the CMVS reconstruction an exponential relation exists between time and the resolution of the photos (shown in Figure 34: CMVS time cost for images with different resolutions); therefore, when the dataset was computed at its original resolution, the processing time was extremely long and VisualSFM stopped after 5404.083 minutes. When the resolution of the photos was reduced to 3000*2000, the processing time dropped to 24.7 minutes. After the CMVS reconstruction the number of points increases dramatically: for the Canon Powershot dataset more than 62 times, for the Nikon D3 dataset about 72 times, and for the UAV dataset around 36 times. As mentioned before, the point clouds generated from the Canon Powershot and the Nikon D3 are huge, and such massive point clouds may cause problems in data post-processing because of hardware limitations. In order to solve this problem, users can resample the derived point clouds in MeshLab, or separate the whole area into small regions and reconstruct them separately.

Table 10: Processing time for CMVS and generated vertices

                      Canon Powershot SX220   Nikon D3       AscTec Falcon 8
Number of photos      819                     494            119
Processing time       approx. 4556 min        2273.733 min   24.7 min
Vertices generated    26845726                17718341       2270769

Figure 67 shows the results from the CMVS reconstruction. The UAV dataset derived a point cloud with a clear structure of the whole temple. The point clouds derived from the ground photos show similar results: some regions lack details, and the background areas are distributed in an irregular form. In order to cover the whole temple, the Nikon D3 with its wide-angle lens needs only about half as many photos as the Canon Powershot, which shortens the processing time dramatically. The vertices generated from both datasets are massive: the point cloud derived from the Canon dataset is two times larger than the point cloud derived from the Nikon dataset, and the extra vertices generated from the Canon dataset already exceed what is needed for creating detailed models. However, these two point clouds contain more detailed information than the UAV point cloud; as shown in Figure 68, the model from the Nikon D3 dataset contains more detailed structures of the blocks.

Figure 67: Results from Multi-view reconstruction. (a) CMVS reconstruction from Canon Powershot dataset (b) CMVS reconstruction from Nikon D3 dataset (c) CMVS reconstruction from AscTec Falcon 8 dataset

Figure 68: Detail comparison between models generated from UAV photos and ground photos. (a) Model generated from Nikon D3 dataset (b) Model generated from UAV photos


The point clouds generated from the Nikon and the Canon datasets are huge. Nowadays a PC with a decent graphics card is able to display such point clouds, but it may still have difficulties with interactive tasks such as selection and modification. In addition, the processing time of the CMVS reconstruction is very long: the whole process needs approximately 4556 minutes, more than three days, for 819 images, and the derived point clouds are too large to process. For the purposes of detailed model reconstruction and simple GIS model reconstruction, processing all photos at the same time is not necessary, since most of the modeling work is done manually. Dealing with a huge number of photos at the same time heavily increases the processing time in the dense reconstruction process (shown in Figure 25: Processing time for different quantities of images). In addition, the size of the photos has little influence on the photo matching process but a huge influence on the CMVS reconstruction (shown in Figure 23: Time cost for matching photos at different resolutions, and Figure 34: CMVS time cost for images with different resolutions). Using photos at their original size generates a huge number of vertices which are not necessary. Therefore the size of the photos can be reduced and the whole temple can be separated into small sub-regions.

Figure 69: The dataset of the temple was separated into ten parts

The data from the Nikon D3 was divided into ten regions (Figure 69). Neighbouring regions have some overlap areas; for example, region B overlaps with regions A and C. These overlap areas need some photos in common, therefore the sum of the photos of all parts is larger than the number of photos in the original dataset: in this case the total number of photos from A to J is 522, while the original dataset contains 494. The resolution of the photos was reduced by 30% to 2979*1982, and each sub-dataset was computed separately. The results are shown in Table 11. Although the resolution decreased by 30%, the vertices generated from the dense reconstruction remain stable, while the vertices generated from the CMVS reconstruction are reduced by around 50%. The total processing time is reduced dramatically, as shown in Figure 70: by using the sub-datasets, the processing time decreased by 87.57% in the dense reconstruction and by 90.2% in the CMVS reconstruction.

Table 11: Processing time and vertices generated by using the Nikon D3 sub-datasets

                 Number of   Time for matching   Time for the CMVS      Vertices from the      Vertices from the
                 photos      photos (min)        reconstruction (min)   dense reconstruction   CMVS reconstruction
A                64          7                   19.6                   27756                  692649
B                53          4.55                22.45                  20858                  874590
C                26          1.22                6.95                   10209                  393871
D                77          9.63                35.9                   42734                  1363796
E                52          4.68                23.533                 32912                  917968
F                65          7.07                24.117                 45322                  736230
G                58          5.65                26.883                 40528                  1160060
H                50          4.15                24.317                 30613                  1010685
I                50          4.35                22.717                 33695                  999335
J                27          2.32                14.5                   19937                  649505
Total            522         50.62               220.967                304564                 8798689
Original data    494         407.25              2273.733               245961                 17718341


Figure 70: Processing time and generated vertices comparison by using sub-datasets

4.3.3 CMPMVS Surface Reconstruction

When the Canon Powershot dataset was used for the CMPMVS surface reconstruction, the program always stopped automatically after several days of processing because of the huge number of photos. With the Nikon D3 dataset, CMPMVS successfully generated a mesh model in the end but was unable to generate a video. The AscTec Falcon 8 dataset worked very well and took about 820 minutes. Figure 71 shows the generated mesh models and their comparison. The model from the UAV data has 2693004 faces, while the model from the Nikon data has 4615256 faces.


Figure 71: Results from the CMPMVS reconstruction. (a) Using the AscTec Falcon 8 dataset (b) Using the Nikon D3 dataset

The model derived from the UAV dataset shows the structure of the study area very well. The model is close to a DEM model and contains a few holes, but in general it lacks detailed information. In contrast, the Nikon D3 dataset generated a very detailed model; some surface areas are filled, but there is still a huge number of small holes which need to be repaired.

4.3.4 Georeferenced Models and Accuracy

Many tools can be used for georeferencing point clouds: GRASS GIS, PC-AffineTrans, Java Graticule 3D, VisualSFM and so on. In practice, GRASS GIS cannot handle large datasets and PC-AffineTrans only supports a few data formats. More importantly, the georeferenced point clouds are warped (Figure 72), and none of these programs is able to georeference mesh models. CloudCompare has a point-based alignment function which can be used for georeferencing mesh models directly.
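The point-based alignment behind such georeferencing is essentially a similarity transform (scale, rotation, translation) estimated by least squares from three or more matched control point pairs. Below is a minimal sketch of that estimation in the style of Umeyama's method; it is not the CloudCompare implementation, and the control point coordinates are placeholders.

    import numpy as np

    def similarity_transform(src, dst):
        """Estimate s, R, t so that dst ~ s * R @ src + t (least squares).
        src, dst: (N, 3) arrays of matched points, N >= 3 and not collinear."""
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        src0, dst0 = src - src_c, dst - dst_c
        # Optimal rotation from the SVD of the cross-covariance matrix
        U, S, Vt = np.linalg.svd(dst0.T @ src0)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
        R = U @ D @ Vt
        s = np.trace(np.diag(S) @ D) / np.sum(src0 ** 2)         # isotropic scale
        t = dst_c - s * R @ src_c
        return s, R, t

    # Placeholder control points: model coordinates vs. surveyed coordinates
    model_pts = np.array([[0.0, 0.0, 0.0], [1.2, 0.1, 0.0],
                          [0.3, 2.0, 0.4], [1.5, 1.8, 0.2]])
    survey_pts = np.array([[305012.1, 4621033.7, 512.3], [305013.3, 4621033.9, 512.2],
                           [305012.5, 4621035.6, 512.8], [305013.7, 4621035.5, 512.5]])

    s, R, t = similarity_transform(model_pts, survey_pts)
    georef = (s * (R @ model_pts.T)).T + t   # apply the same transform to all model vertices
    print("scale:", s, " RMS residual:", np.sqrt(np.mean((georef - survey_pts) ** 2)))

Applying one global transform of this kind relocates and rescales the model but cannot remove internal distortions, which is why the warp effect described above has to be handled separately.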

Figure 72: A georeferenced point cloud showing the warped effect

Two mesh models generated from CMPMVS were used for comparison: the mesh models from the UAV dataset and from the Nikon dataset were georeferenced in CloudCompare, and the distances between point clouds and mesh models were computed. As shown in Figure 73, the displacements between the UAV point cloud and the mesh model generated from the Nikon dataset were computed; on average the distance is around 0.01 m, and the northern part shows more displacement than the southern part. Two detailed distance maps are shown in Figure 74: the north-east corner of the temple has an average displacement of around 0.06 meters, while the south-west part of the temple shows only about ±0.02 meters; pavements in the north-east corner even have displacements of more than ±0.3 meters. The results show that after georeferencing, the models generated from ground photos have similar geometric structures to the models generated from aerial photos, but some areas show larger differences than others.


Figure 73: Distance map showing displacements between UAV point cloud and mesh model generated from ground photos (Nikon D3)


Figure 74: Distance map showing detailed areas. (a): South-west corner of the temple (b): North-east of the temple

The foregoing results only show that the models from the UAV dataset have displacements relative to the models from the ground photo dataset (the Nikon), which means that one model is less accurate than the other. In order to find out which model is less accurate, survey points were used for comparison. The north-east corner of the temple shows the larger displacement between the two models, therefore the north-east site was selected as the target area. Figure 75 shows the cloud-mesh distances computed for the UAV model: the survey points match the mesh model very well in general, and most of the values lie between +0.02 and -0.06 m. This result is adequate for accurate GIS model reconstruction. Figure 76 shows the cloud-mesh distances computed for the Nikon model: the model has a slight shift relative to the survey points, the values in the diagram are mainly concentrated in the middle, and most of them lie between -0.09 and -0.13 m.


Figure 75: Cloud-mesh distances computed from the UAV model


Figure 76: Cloud-mesh distances computed from the Nikon model

So far the conclusion still cannot prove that ground photos generate models with more distortions, because these geometric distortions may also be caused by the georeferencing process. CMPMVS also generated orthophotos from the two datasets (the UAV and the Nikon). They were georeferenced using 16 control points and imported into ArcGIS, and the 2D survey points were overlaid on the orthophotos. Figure 77 shows the UAV orthophoto overlaid with the survey points at the north-east corner of the temple: the photo matches the points well, and through measurements the displacement is less than 0.05 meters. The orthophoto derived from the Nikon dataset, however, has larger displacements, as shown in Figure 78: all points are shifted by about 0.2 meters. The distortion is not caused by manual mistakes in the georeferencing process, because enough GCPs were used. A sample area in the south-west of the temple is shown in Figure 79 and Figure 80; in the south-west corner of the temple the distortion of the Nikon dataset is smaller. Random identifiable feature points were picked from the north-east site and the south-west site and their 2D displacements were measured, as shown in Table 12. The results clearly show that the model derived from the Nikon dataset has larger displacements, and that the north-east site has larger displacements than the south-west site. These results match the conclusions from the previous section: models generated from ground photos are less accurate than models generated from aerial photos. With aerial photographs the generated models have better geometric structures and fewer distortions; ground photos generate models with more details, but large errors exist in some areas.

Table 12: Displacements between random identical points on the orthophotos and survey points (meters)

North-east corner    UAV dataset   Nikon dataset
Point1               0.0194        0.173
Point2               0.0314        0.2058
Point3               0.0555        0.0513
Point4               0.0276        0.1367
Point5               0.0305        0.1432
Point6               0.0416        0.1778
Point7               0.0174        0.080

South-west corner    UAV dataset   Nikon dataset
Point1               0.0567        0.0597
Point2               0.024         0.0462
Point3               0.0174        0.0834
Point4               0.0345        0.1048
Point5               0.0509        0.086
Point6               0.0338        0.0689
Point7               0.0451        0.0933


Figure 77: Orthophoto generated from the UAV dataset overlapping with survey points (north-east corner of the temple)


Figure 78: Orthophoto generated from the Nikon dataset overlapping with survey points (north-east corner of the temple)


Figure 79 : Orthophoto generated from the UAV dataset overlapping with survey points (south-west corner of the temple)


Figure 80: Orthophoto generated from the Nikon dataset overlapping with survey points (south-west corner of the temple)

For the purpose of large archaeological site reconstruction, the models generated from ground photos and aerial photos were compared. The aerial photos taken by the UAV provide an overview of the entire target, therefore fewer geometric distortions exist in the models. However, these models lack the detail needed for detailed GIS model reconstruction. When only photos taken from the ground are used, SfM systems have a bigger chance of generating models with larger geometric distortions, because ground photos lack a view of the whole target area. Two methods can be used to fix these distortions. One way is to combine aerial photos with ground photos. Photos from "Region A" (shown in Figure 69) were combined with 80 aerial photos from the UAV dataset, and CMPMVS was used for the implementation. The result shows that combining ground photos with aerial photos does not increase the detailed information dramatically. Figure 81 gives a comparison between a model from ground photos and a model generated from a combination of ground photos and aerial photos. The result is very clear: in the CMPMVS workflow, using a combination of ground photos and aerial photos does not increase the level of detail in specific regions. The reason was described in section 4.2: the CMPMVS surface reconstruction process is similar to the process in PhotoScan, in that the program needs a local coordinate system as a platform on which the meshes are created. For a large study area, even if detailed photos are provided for reconstructing small regions, the whole model will still be built at a large scale, and the surface reconstruction process does not distinguish sub-regions. When the two datasets are combined into one, the generated model has the level of detail of the photos with the larger scale. As shown in Figure 82, some textures in the model come from ground photos, but the target area "Region A" was smoothed to the same level of detail as the other areas. Therefore, for detailed area reconstruction, the only way is to use photos from small regions and georeference them separately; finally the models describing the small regions can be glued together.

Figure 81: A comparison between a model derived from ground photos and a model derived from a dataset containing both ground photos and aerial photos. (a) A model generated using only the ground photos from "Region A" (b) A model generated using a mixed dataset with photos from "Region A" and UAV photos

Figure 82: A model generated from a dataset which combines ground photos and aerial photos. (a) The target area has textures from the ground photos (b) The level of detail in the target area was dragged down to the same level as the other areas

Although this method cannot provide more detail in specific regions, it increases the accuracy of the models. The average displacement in this region is 0.093 meters when only ground photos are used; with the mixed dataset the average decreases to 0.0492 meters. Details are shown in Figure 83.


Figure 83: Cloud-mesh distance map showing the accuracy of the model generated from the mixed dataset

The other method is to reconstruct small sub-areas separately, so that the distortions can be controlled through the georeferencing process. Photos from "Region A" were selected as sample data, CMPMVS was run on them, and the blue-red-yellow markers (shown in Figure 59) were used as local control points for georeferencing the model. The result is shown in Figure 84: the accuracy of the model improves dramatically. When all photos are processed together, this area has an average displacement of 0.093 m, which is not sufficient for scientific studies; with separate reconstruction the value decreases to around 0.056 m, and most points are located at the correct positions.
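For readers who want to reproduce such a local georeferencing step outside the tools used here (PC-AffineTrans and Java Graticule 3D), the sketch below estimates a seven-parameter similarity (Helmert) transformation from a few local control points with a least-squares (Umeyama) solution. The control point coordinates are placeholders, not values from the Nemi survey.

import numpy as np

def similarity_transform(src, dst):
    # Least-squares fit of dst_i = s * R @ src_i + t (Umeyama, 1991).
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A / len(src))   # 3x3 cross-covariance
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                             # guard against a reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / ((A ** 2).sum() / len(src))
    t = mu_d - s * R @ mu_s
    return s, R, t

# Placeholder control points: local model coordinates vs. surveyed coordinates.
model_cp = np.array([[0.0, 0.0, 0.0], [4.1, 0.2, 0.1],
                     [4.0, 3.9, 0.0], [0.1, 4.0, 0.2]])
survey_cp = np.array([[1000.0, 2000.0, 50.0], [1004.1, 2000.3, 50.1],
                      [1003.9, 2003.8, 50.0], [1000.2, 2004.0, 50.2]])

s, R, t = similarity_transform(model_cp, survey_cp)
vertices = np.array([[2.0, 2.0, 0.5]])             # any model vertices
georeferenced = (s * (R @ vertices.T)).T + t       # transformed into the survey CRS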


Figure 84: Cloud-mesh distance map showing the accuracy of the model generated from the "Region A" dataset

The PhotoScan and CMPMVS workflows both generated models successfully from the UAV dataset; the differences between them are discussed here. Both original models have around 2,700,000 faces. As shown in Figure 85, the model derived from PhotoScan has less detail than the model derived from CMPMVS, and the depths of the two models differ considerably. Cloud-mesh distances were therefore computed, as shown in Figure 86. The mean displacement between them is 0.091 m, and most of the displacements occur in regions with altitude differences. In this case CMPMVS clearly generated the more accurate model.
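The cloud-mesh distances in this comparison were computed with CloudCompare (see Appendix). The simplified Python sketch below approximates the same idea by densely sampling the mesh surface and querying a KD-tree for the nearest surface sample of every cloud point; the file names and the sample count are assumptions.

import numpy as np
import trimesh
from scipy.spatial import cKDTree

cloud = trimesh.load("photoscan_points.ply")            # dense point cloud (assumed file)
mesh = trimesh.load("cmpmvs_model.ply", force="mesh")   # mesh model (assumed file)

# Sample the mesh surface densely and use the nearest-sample distance as an
# approximation of the true point-to-surface distance.
surface_pts, _ = trimesh.sample.sample_surface(mesh, 500000)
dists, _ = cKDTree(surface_pts).query(np.asarray(cloud.vertices))

print(f"mean displacement: {dists.mean():.3f} m, max: {dists.max():.3f} m")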

Figure 85: Differences between the PhotoScan and CMPMVS workflows


Figure 86: Cloud-mesh distance map showing displacements between the point cloud generated in PhotoScan and the mesh model generated in CMPMVS

4.3.5 Discussion

Experiments for different purposes have been carried out in this section. For a large archaeological site, the CMPMVS workflow is able to handle a huge number of ground photos for mesh model reconstruction. The models derived from the ground photos have lower accuracy than the model derived from the UAV photos; the statistical results are summarized in Table 13. The average displacement of the UAV model is 0.053 m, while for the Nikon model the value is 0.067 m. In specific regions located at the border of the site the distortions are larger: for example, at the north-east site of the temple the Nikon model has an average displacement of 0.093 meters. Two methods were used for improving the accuracy, and both successfully reduce the displacements.

For detailed mesh model reconstruction, the only way is to separate the large target area into small regions and reconstruct them separately. Large-scale models always contain less detail than small-scale models: the "Region A" dataset with 64 photos generates a model with 1,441,254 faces in the study area, whereas combining the "Region A" photos with the aerial photos reduces this number to 67,478 faces. Figure 87 illustrates the four models summarized in Table 13. Model (a) is derived from the UAV dataset; it is accurate but lacks detail. Model (b) is derived from the Nikon dataset; it is more detailed, but its mean displacement is about 0.093 meters. Model (c) is generated from a combined dataset containing the UAV photos and the "Region A" photos; it does not add much detail but improves the accuracy to 0.0492 m. Model (d) is generated from the "Region A" dataset alone and was georeferenced with local control points; its accuracy improves to 0.053 m and its number of generated faces is the highest.

Table 13: Statistics in this section

Model                                           Mean displacement (m)   Number of faces
UAV model (119 photos)                          0.053                   984899
Nikon model (494 photos)                        0.067                   3855529
UAV north site (Figure 87 (a))                  0.027                   46204
Nikon north site (Figure 87 (b))                0.093                   245813
UAV + "Region A" (144 photos, Figure 87 (c))    0.0492                  67478
"Region A" (64 photos, Figure 87 (d))           0.053                   1441254

Figure 87: Detail comparison between the four models in the north-east part of the temple. (a) A model derived from the UAV dataset showing the north-east site of the temple (b) A model derived from the Nikon dataset (c) A model derived from the UAV and "Region A" dataset (d) A model derived from the "Region A" dataset


4.4 GIS Models Reconstruction

2D polygons are created from the survey points. These polygons follow a UML (Unified Modeling Language) structure, shown in Figure 88, and by using the workflow in section 2.5.2 this structure can be kept. The 2D polygons can simply be extruded into 3D, but such a model is accurate neither in height nor in detail, so points from the SfM systems are mainly used for adjusting the 3D blocks. The demonstration model is shown in Figure 89; each rock has its own archaeological information stored inside. SfM point clouds thus improve the accuracy of 3D GIS model reconstruction. These GIS models can be converted into other formats such as KML (Keyhole Markup Language) and CityGML by using FME (Feature Manipulation Engine) or other tools.
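A minimal sketch of how the SfM points can drive the extrusion height of the 2D blocks is given below; the 95th-percentile rule, the function name and the data handling are assumptions made for illustration, not the exact procedure used here.

import numpy as np
from shapely.geometry import Point, Polygon

def extrusion_height(polygon, sfm_points, base_z, percentile=95.0):
    # polygon: shapely Polygon of one rock; sfm_points: (n, 3) array in the same CRS.
    heights = [p[2] for p in sfm_points
               if polygon.contains(Point(p[0], p[1]))]
    if not heights:
        return 0.0                      # no SfM evidence, keep the flat polygon
    return float(np.percentile(heights, percentile) - base_z)

# usage: height = extrusion_height(rock_polygon, cloud_xyz, ground_level)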

Figure 88: UML model for archeological excavations (Peters, et al. 2012)


Figure 89: GIS model in ArcScene 10

4.5 Archaeological Reconstruction

The previous workflows have generated mesh models as well as orthophotos, which can be used as reference data for archaeological reconstruction. In this case only the foundations of the temple are left and its structure is very complicated, so a full archaeological reconstruction would be premature. For other ruins and archaeological sites, however, mesh models can help considerably: derived mesh models provide an overview of the whole excavation site as well as detailed structures and textures, all of which can be considered when analyzing a reconstruction. Still, using the existing mesh models and orthophotos as reference data and combining them with written sources, a simple archaeological reconstruction model was built as a demonstration (Figure 90). Derived models can be imported into SketchUp and, with the help of a CityGML plug-in (Appendix), transformed into a CityGML model (Figure 91).


Figure 90: Demonstration model of the temple (Phase III)

Figure 91: Viewing the CityGML model in LandXplorer (Appendix)


5 CONCLUSION AND OUTLOOK

This thesis describes specific workflows based on SfM systems and their related tools: tools for dense reconstruction, for mesh model reconstruction, for mesh model editing, and for GIS model and simple model reconstruction. SfM systems use photos as the only input data, so optimal strategies for data acquisition are very important. The type of camera makes no particular difference, but DSLRs with wide-angle lenses have clear advantages when dealing with large study areas. For data acquisition the best strategies are: (1) take photos around the target; (2) avoid weather conditions that make the target object too dark or too bright; (3) for multiple objects, "central objects" and "connections" are the key points; (4) always keep the EXIF tags. Processing time and the number of derived vertices are the two main factors for evaluating the efficiency of the data processing steps in SfM: in the photo matching and dense reconstruction processes the quantity of photos dominates, whereas in the CMVS reconstruction process the resolution of the photos dominates.

SfM systems and their related tools were used for reconstructing an archaeological site located in Nemi, Italy. Two main SfM systems were applied to the study area: the commercial program "Agisoft PhotoScan" and an open source workflow consisting mainly of VisualSFM and CMPMVS. The experiments show that, although PhotoScan offers a multi-functional and user-friendly interface, the open source VisualSFM-CMPMVS workflow performs better both in model accuracy and in model restoration. An accurately georeferenced mesh model has displacements of only a few centimeters. Two datasets from the study area were tested: ground-based images taken with the Nikon D3 camera and aerial-based photos taken by a UAV. Models from these two datasets were processed with different workflows and the results were compared. Ground photos generate models with more distortions than aerial photos; this problem can be solved by two main methods. One is to use a dataset combining ground photos and aerial photos; the other is to reconstruct small sub-areas separately, so that the distortions can be controlled in the georeferencing process by using local control points. Both methods successfully improved the accuracy of the models, but the first method did not improve the level of detail even though ground photos were included, whereas the second method improved the level of detail dramatically. Large-scale models always contain less detail than small-scale models, because in the surface reconstruction process the program needs a local coordinate system as a platform on which all meshes are built; for a large study area, even if detailed photos are provided for small regions, the whole model is still built at the large scale. Therefore highly detailed mesh models can only be built from sub-datasets cut from the main dataset. A 3D GIS model with archaeological attributes was built. SfM systems can be used to improve detail and accuracy in GIS model reconstruction as well as in archaeological reconstruction.

The work confirmed that SfM systems and related tools can be used for archaeological studies as well as GIS studies. On the one hand archaeologists want as much detail as possible, but on the other hand most GIS systems cannot handle such massive data with millions of faces. The uses of mesh models should therefore be divided into "visualization" and "information". For visualization, as much detail as possible should be kept; in GIS, simple models with object attributes are more important. A connection between simple models and mesh models can therefore be built: for example, when a user clicks on a simple model in the GIS, the corresponding mesh model is shown in a dedicated visualization tool. Further studies could develop web applications or UI-based applications that connect simple GIS models and complex detailed models. Model simplification and different LODs can be considered in the GIS model reconstruction process. GIS systems and programs may have difficulties visualizing or storing mesh models today; however, with the development of computer hardware and new algorithms, these data may be used directly in the future.


REFERENCES

Bernardini, F., Mittleman, J., Rushmeier, H., Silva, C., & Taubin, G. (1999). The ball-pivoting algorithm for surface reconstruction. Visualization and Computer Graphics, IEEE Transactions on, 5(4), 349-359.

Ducke, B., Score, D., & Reeves, J. (2011). Multiview 3D reconstruction of the archaeological site at Weymouth from image series. Computers & Graphics, 35(2), 375-382.

Doneus, M., Verhoeven, G., Fera, M., Briese, C., Kucera, M., & Neubauer, W. (2011). From deposit to point cloud: a study of low-cost computer vision approaches for the straightforward documentation of archaeological excavations. In XXIIIrd International CIPA Symposium (Vol. 6, pp. 81-88).

Furukawa, Y., Curless, B., Seitz, S. M., & Szeliski, R. (2010, June). Towards internet-scale multi-view stereo. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 1434-1441). IEEE.

Furukawa, Y., & Ponce, J. (2010). Accurate, dense, and robust multiview stereopsis. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(8), 1362-1376.

Green, S. (2012). Structure from Motion as a Tool for Archaeology.

Ghini, G., & Diosono, F. (2011). Tempio di Diana a Nemi: una rilettura alla luce dei recenti scavi.

Jancosek, M., & Pajdla, T. (2011, June). Multi-view reconstruction preserving weakly-supported surfaces. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 3121-3128). IEEE.

Juan, L., & Gwun, O. (2009). A comparison of SIFT, PCA-SIFT and SURF. International Journal of Image Processing (IJIP), 3(4), 143-152.

Kersten, T. P., & Lindstaedt, M. (2012a). Generierung von 3D-Punktwolken durch kamera-basierte low-cost Systeme – Workflow und praktische Beispiele.

Kersten, T., Lindstaedt, M., Mechelke, K., & Zobel, K. (2012b). Automatische 3D-Objektrekonstruktion aus unstrukturierten digitalen Bilddaten für Anwendungen in Architektur, Denkmalpflege und Archäologie. Publikationen der Deutschen Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation e.V., 21, 137-148.

Kazhdan, M., Bolitho, M., & Hoppe, H. (2006, June). Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110.

Lourakis, M., & Argyros, A. (2004). The design and implementation of a generic sparse bundle adjustment software package based on the Levenberg-Marquardt algorithm. Technical Report 340, Institute of Computer Science-FORTH, Heraklion, Crete, Greece.

Lindeberg, T. (1994). Scale-space theory: A basic tool for analyzing structures at different scales. Journal of Applied Statistics, 21(1-2), 225-270.

Neitzel, F., & Klonowski, J. (2011). Mobile 3D mapping with a low cost UAV system. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVIII-1/C22.

Peters, S., et al. (2012). Archeological Cartographic Information System – Demonstrated for the excavation of the Diana sanctuary in Nemi.

Remondino, F., Del Pizzo, S., Kersten, T. P., & Troisi, S. (2012). Low-Cost and Open-Source Solutions for Automated Image Orientation – A Critical Overview. In Progress in Cultural Heritage Preservation (pp. 40-54). Springer Berlin Heidelberg.

Snavely, N., Seitz, S. M., & Szeliski, R. (2006, July). Photo tourism: exploring photo collections in 3D. In ACM Transactions on Graphics (TOG) (Vol. 25, No. 3, pp. 835-846). ACM.

Snavely, N., Seitz, S. M., & Szeliski, R. (2008). Modeling the world from internet photo collections. International Journal of Computer Vision, 80(2), 189-210.

Verhoeven, G., Taelman, D., & Vermeulen, F. (2012). Computer vision-based orthophoto mapping of complex archaeological sites: the ancient quarry of Pitaranha (Portugal–Spain). Archaeometry, 54(6), 1114-1129.

Witkin, A. P. (1983). Scale-space filtering. In International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, pp. 1019-1022.

Wu, C. (2007). SiftGPU: A GPU Implementation of Scale Invariant Feature Transform (SIFT). http://www.cs.unc.edu/~ccwu/siftgpu/


Wu, C. (2011). VisualSFM: A Visual Structure from Motion System. http://ccwu.me/vSfM/

Wulf, R., Sedlazeck, A., & Koch, R. (2013). 3D Reconstruction of Archaeological Trenches from Photographs. In Scientific Computing and Cultural Heritage (pp. 273-281). Springer Berlin Heidelberg.


APPENDICES

Programs and their sources:

SfM related programs:
SIFT - http://www.cs.ubc.ca/~lowe/keypoints/
Bundler - http://www.cs.cornell.edu/~snavely/bundler/
CMVS - http://www.di.ens.fr/cmvs/
PMVS2 - http://www.di.ens.fr/pmvs/
VisualSFM - http://ccwu.me/vSfM/
CMPMVS - http://ptak.felk.cvut.cz/SfMservice/?menu=cmpmvs
123D Catch - http://www.123dapp.com/catch
Photosynth - http://photosynth.net/
PhotoScan - http://www.agisoft.ru/products/PhotoScan
ARC3D - http://www.arc3d.be/

Georeferencing programs:
GRASS GIS - http://grass.osgeo.org/
PC-AffineTrans - http://uavmapping.com/index.php?p=1_6_PCAffineTrans
Java Graticule 3D - http://javagraticule3d.sourceforge.net/

Toolkits:
SynthExport - http://synthexport.codeplex.com/
PhotoSynthToolkit - http://www.visual-experiments.com/demos/photosynthtoolkit/
Cloud v8 - http://rhin.crai.archi.fr/rld/plugin_details.php?id=777
Cygwin - http://www.cygwin.com/
FME - http://www.safe.com/fme/fme-technology/
SU-CityGML - http://www.citygml.de/index.php/sketchup-citygml-plugin.html

Model visualization:
CloudCompare - http://www.danielgm.net/cc/
MeshLab - http://meshlab.sourceforge.net/
QSplat - http://graphics.stanford.edu/software/qsplat/
LandXplorer - http://usa.autodesk.com/adsk/servlet

Model repairing:
Graphite - http://graphite.wikidot.com/
MeshFix - http://sourceforge.net/projects/meshfix/

3D modeling:
Blender - http://www.blender.org/
SketchUp - http://www.sketchup.com/

