Sensemaking in Multimedia Analytics

UNIVERSITY OF AMSTERDAM

Sensemaking in Multimedia Analytics A case study on multimedia pivot tables analyzing a real image dataset of #iamsterdam

by Martin Altmann UvAnetID 10850260

A thesis submitted in fulfillment for the degree of Master of Science in Business Information Systems

September 2015

Declaration of Authorship

I, Martin Altmann, declare that this thesis, titled 'Gaining insight of an image dataset by utilizing multimedia analytics techniques', and the work presented in it are my own. I confirm that:



- This work was done wholly or mainly while in candidature for a research degree at this University.

- Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

- Where I have consulted the published work of others, this is always clearly attributed.

- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

- I have acknowledged all main sources of help.

- Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed:

Date:


“...how hard it would be to get people to work together to save the species instead of themselves. Or their children. You never would have come here unless you believed you were going to save them. Evolution has yet to transcend that simple barrier. We can care deeply - selflessly - about those we know, but that empathy rarely extends beyond our line of sight.” [Nolan, 2014]

Dr. Mann, played by Matt Damon, in Interstellar (2014)

UNIVERSITY OF AMSTERDAM

Abstract

Master of Science in Business Information Systems
by Martin Altmann

In the era of big data, the importance of Sensemaking is steadily growing. The question is not if big data can be beneficial but rather how these benefits are generated. To answer this question, Multimedia Analytics aims at generating insights by applying the Sensemaking process of Pirolli and Card. Nevertheless, this is non-trivial, and challenges reflecting gaps between human and computer capabilities regarding semantics and pragmatics need to be overcome. A pioneer system, namely Multimedia Pivot Tables, is utilized in a case study in order to demonstrate what Sensemaking in Multimedia Analytics looks like. The case study analyzing the iamsterdam marketing concept revealed that, on the one hand, insights can be generated which support the assumption that the iamsterdam marketing concept is successful. On the other hand, Multimedia Pivot Tables do not yet fully cover the Sensemaking process, resulting in manual effort especially in the initial stages. Simplifications like the proposed prototype and enhancements minimizing the gaps between human and computer capabilities need to be implemented in the future.

Acknowledgements

I would like to say thank you to my supervisor Marcel Worring. Thank you for providing the initial idea for this thesis and all the information necessary to dig deeper into Multimedia Analytics. I really appreciated the great independence offered while working on this thesis. Nevertheless, I never ended up in a hopeless situation because asking for advice was always possible to resolve issues, quite often also in face-to-face meetings. Thanks also go to Dennis C. Koelma, who supported Marcel in ensuring a flawless use of the Multimedia Pivot Table tool, which was quite challenging. Last but not least I want to thank my family and my girlfriend. They motivated me when needed and never doubted that I would successfully finish this thesis. They always had my back and proof-read everything.


Contents

Declaration of Authorship
Abstract
Acknowledgements
List of Figures
List of Tables

1 Introduction
  1.1 Background
  1.2 Motivation

2 Sensemaking in Multimedia Analytics
  2.1 Process
  2.2 Challenges
  2.3 Advances

3 Case Study
  3.1 Introduction
  3.2 Initiation
  3.3 Methods

4 Results

5 Discussion
  5.1 Case Study Results
  5.2 Multimedia Pivot Table Evaluation

6 Conclusion
  6.1 Limitations
  6.2 Future Work

Appendix

Bibliography

List of Figures

2.1 Sensemaking Process of Pirolli and Card
2.2 Multimedia Analytics process by Zahalka based on the Visual Analytics process by Keim et al.
2.3 Exploration-search axis by Zahalka with example tasks
2.4 Challenges in Multimedia Analytics - Example images
2.5 Evaluation of pioneer multimedia analytic tools by Worring and Zahalka
3.1 Representative images for the research subject concept of iamsterdam
3.2 Illustration of generic and specific concepts within the ImageNet hierarchy
3.3 Multimedia Pivot Table filtering and role assigning interface
4.1 iamsterdam histogram
4.2 PivotTable with five buckets as row variable (P1); geolocation, year and images as value variable (P2)
4.3 relevance bucket distribution for tags #sign, #letters, #red and #selfie
4.4 PivotTable with tag buckets as row variable (P1) and images and geolocations as value variables (P2); geolocations weighted with iamsterdam (P7)
4.5 PivotTable with geolocation buckets as row variable (P1) and geolocations and images as value variables (P2)
4.6 geolocation bucket distribution for tag pairs
4.7 PivotTable with season buckets as row variable (P1) and images as value variable (P2)
4.8 season bucket distribution of tags for event detection
5.1 Overview of the Multimedia Pivot Table evaluation
6.1 Illustration how semantic extraction needs to operate with non-exclusive classes
A.1 Visual Analytics process by Keim et al.
A.2 Sample Response of Instagram API
A.3 Sample Response of Flickr API Part 1
A.4 Sample Response of Flickr API Part 2
A.5 Visualization of the used API Endpoints
A.6 Extract of CSV Dataset
A.7 Rough overview of collected images
A.8 Example UI for the process of selecting relevant concepts for the Multimedia Pivot Table tool
A.9 MediaTable view with denoted semantic interactions from chapter 3.3
A.10 PivotTable view with denoted semantic interactions from chapter 3.3

List of Tables

2.1 bottom-up process and top-down process of the Sensemaking Process
2.2 System capabilities of handling the semantic gap and the pragmatic gap
3.1 Overview of iamsterdam letters
3.2 Roles of variables in multimedia pivot tables
3.3 Semantic Interactions of the MediaTable view
3.4 Semantic Interactions of the PivotTable view
4.1 iamsterdam relevance buckets

Chapter 1

Introduction

Big data is one of the most famous terms of the current decade, although it has never been nominated as a word of the year. Big data is unimaginably huge and the same applies to its growth. Whereas in 2000 only 25% of the world's data was digital, this had changed tremendously by 2007, when already 94% was digital and the volume had reached more than 300 exabytes. Expressed in more familiar quantities, this number is equivalent to 300,000,000 terabytes or 300,000,000,000,000 megabytes [Mayer-Schoenberger, 2013, p. 8]. According to [Jacobs, 2009] the truth about big data is on the one hand the easy collection of the data but on the other hand its nontrivial retrieval. In addition, retrieval alone does not automatically create insights or reveal patterns within the data. The question that needs to be answered is therefore not if big data is beneficial but rather how big data can be beneficial for our society and how we accomplish these benefits.

The field of Multimedia Analytics aims at tackling these questions with a focus on multimedia data such as images, videos, audio, animations and related text or captions. This field recently emerged out of two separate fields. The first is Multimedia Analysis, which focuses on the analysis of images, video and also audio. Research and progress in this field is most of the time limited to an individual media type. The field of Visual Analytics is the second component and aims to provide technology for human-centric analysis. Scale-independent analytics for any data type shall be enabled by dynamic visual interfaces [Chinchor et al., 2010a].


Combining these two fields into Multimedia Analytics means that different types of information, such as images and corresponding metadata gathered from different sources, are used to conduct large-scale analysis [Chinchor et al., 2010a]. This introduction provides a high-level overview of the field of Multimedia Analytics. Investigations take place in order to identify a gap that is worth closing by conducting research in a specific niche. After the motivation of this thesis is clear, the Literature Review chapter provides the underlying knowledge necessary to understand the research subject. It illustrates a process that explains how analytics is able to get from data to insight. This is called the Sensemaking process. Complying with this process for multimedia data is unfortunately nontrivial. Therefore the major challenges are explained, followed by the advances that have been made to overcome them. A case study that executes the mentioned process is conducted, in which a multimedia analytics tool is utilized in order to find out how well the process is supported. The actual results of the case study subject are then presented, but only play a secondary role. More important is the performance of the utilized tool with respect to the process and the succeeding evaluation.

1.1 Background

As already mentioned in the Introduction, Multimedia Analytics emerged recently out of the fields Multimedia Analysis and Visual Analytics [Chinchor et al., 2010a]. It is therefore not surprising that the main research fields within Multimedia Analytics can roughly be summarized as Audio Analysis, Image Analysis and Video Analysis combined with Visual Analytics. The following summaries of these fields are based on both publications of Chinchor et al. in the IEEE Computer Graphics and Applications journal [Chinchor et al., 2010a] [Chinchor et al., 2010b].

Audio Analysis
The field of Audio Analysis aims at extracting meaningful information from the signals of a piece of audio. To this end the signals and frequencies of an audio file are analyzed.


Image Analysis
Images are generally a difficult type of data to analyze because they are unstructured. Image Analysis is the field that deals with this. It aims at extracting meaningful information from images and can also be described as digital image processing. An image carries its information both in the picture itself and in the corresponding metadata. When analyzing an image it is necessary to extract the metadata and to utilize computer vision in order to analyze the picture itself [Martin & Tosunoglu, 2000].

Video Analysis
Video Analysis refers to Video Content Analysis (VCA). This term describes the processing and analysis of video streams by computers. VCA is a capability that analyzes video automatically to detect and determine temporal and spatial events, for example suspicious behaviour in surveillance streams. Algorithms of VCA aim at revealing the content video data conveys and making it available to the analyst [Hanjalic, 2004].

Visual Analytics
Visual Analytics is the component that is combined with each of the three analysis fields above in Multimedia Analytics. It aims at an effective understanding of large and complex data sets. In combination with the analysis fields above this means huge datasets of audio, images or video. Gaining insights, the goal of Visual Analytics, should be enabled by tools facilitating automated analysis techniques combined with interactive visualizations [Keim et al., 2008].

The authors of the used papers highlight that multimedia fusion (using more than one type of multimedia data) or the combination of several multimedia concepts almost never takes place. They further mention that modern multimedia information retrieval (MIR) does not classify text or document information as multimedia data [Chinchor et al., 2010a, p. 54]. This thesis nevertheless addresses the gap of multimedia fusion and tries to elaborate on the feasibility of closing it.

1.2 Motivation

Above, a gap was identified in which research until now has made only a few advances. Instead of analyzing different parts of the data separately, this thesis aims at combining different data sources as well as different data types. The analytic process also has to be interactive in order to include the user/researcher. Put this way, it sounds like a repetition of the definition of the multimedia analytics field. Nevertheless, the reality in this field reveals that the full potential of multimedia analytics is rarely applied all at once in a single study. This thesis tries to do so by utilizing a multimedia analytics tool in a case study that conducts image analysis combined with analytics of the corresponding image metadata. This is also reflected by the following research questions:

Main Research Question
How can tools of multimedia analytics support the process of gaining insight from an image dataset?

Sub Research Questions

1. What is the process of gaining insight?
2. What are the challenges applying it in multimedia analytics?
3. What advances improve the overcoming of these challenges?
4. What does an execution of the process look like?
5. How can multimedia analytics tools support the process of gaining insights?
6. Is multimedia analytics capable of making beneficial use of big data?

The first three questions will be answered within the Literature Review section. This serves as a preparation for the reader to become acquainted with the knowledge that is needed for the case study. The case study will demonstrate what the execution of the process looks like in real research and serves as the answer to question four. The case study results and an evaluation of the utilized multimedia analytics tool then provide the answers to question five. The sixth and last question will be answered in the conclusion and provides an outlook towards big data.

Chapter 2

Sensemaking in Multimedia Analytics

In this chapter the subject of Sensemaking in Multimedia Analytics is introduced. First of all, a process is identified that will serve as a basis for the later case study. After it has been described, an explanation of what insights are and how they can be characterized follows. The application of the Sensemaking process is nontrivial in reality. The existing challenges are explained, followed by a section on the advances within Multimedia Analytics made to overcome them, including applications of that field.

2.1 Process

The term Sensemaking as a human-computer interaction was introduced in 1993. It is defined as a process that aims at gaining knowledge out of data [Russell et al., 1993]. Pirolli and Card, authors of that 1993 paper, further developed the idea of Sensemaking. The result was published in 2005 and introduces a model of the Sensemaking process [Pirolli & Card, 2005]. It is shown in Figure 2.1 below.


Figure 2.1: Sensemaking Process of Pirolli and Card

An alternative process originates from the research field of Visual Analytics, one of the two components of Multimedia Analytics. The Visual Analytics process was introduced by [Keim et al., 2008] and also describes a process that aims at gaining knowledge from data. It is shown in Appendix Figure A.1. The process was further developed and adapted by [Zahalka & Worring, 2014] to better fit the research field of multimedia analytics. The resulting Multimedia Analytics process is shown in Figure 2.2 below.

Figure 2.2: Multimedia Analytics process by Zahalka based on the Visual Analytics process by Keim et al.


The Sensemaking process of Pirolli and Card will form the basis for the later case study. Its structure and the higher level of detail in artifacts and tasks have driven this decision, as it is beneficial for evaluating the performance of a multimedia analytics tool. It can also be argued that the Visual Analytics process by Keim, respectively the Multimedia Analytics process by Zahalka, is located at every artifact within the Sensemaking process of Pirolli and Card. This elucidates that Keim's and Zahalka's process models have their advantage in compactness, whereas the Sensemaking process of Pirolli and Card has its advantage in a more detailed visualization of basically the same process aiming at gaining insights from data. A detailed description of that process therefore follows in the next paragraph.

At first sight the figure of the Sensemaking process seems to overwhelm the observer. It becomes simpler with the following decomposition: the overall process can be split up into three loops and two processes. These are composed of the 10 tasks and 6 artifacts available in the process. In addition, the x-dimension of effort and the y-dimension of structure are added. The two major loops are the foraging loop and the sensemaking loop. The first one aims at creating a schema out of collected and analyzed data. This schema should represent an idea of the sense that is assumed in a dataset. The sensemaking loop then tries to conceptualize the schema that fits the evidence best. It results in a verifiable hypothesis. The third and last loop is the overall sensemaking loop of the analyst. It simply represents the combination of both loops within the process and suggests their mixed iterative execution. The two processes are the bottom-up process (from data to theory) and the top-down process (from theory to data). They are described in the table below.

The bottom-up process (from data to theory) consists of the following tasks:
- Search and filter is the use of an external data source functioning as a repository which is searched by the analyst. Relevant data is stored in a shoebox.
- Read and extract is the extraction of nuggets of evidence from the shoebox, which can be used to draw inferences that may trigger hypotheses.
- Schematize is the task of representing the collected evidence in a schematic way.
- Build case is the task of combining several pieces of evidence in order to create a theory or hypothesis that is verifiable.
- Tell story eventually presents the theory or hypothesis to an audience, either via a presentation or a publication.

The top-down process (from theory to data) consists of the following tasks:
- Re-evaluate is the task of updating a hypothesis towards new discoveries or inquiries originating from its presentation or publication.
- Search for support becomes necessary if a hypothesis is altered. This may require re-examination of the schemas.
- Search for evidence becomes necessary if a schema is altered. This may require re-examination of shoeboxes and evidence files.
- Search for relations aims at discovering new patterns and relations in evidence files that generate hypotheses.
- Search for information becomes necessary if new hypotheses are generated that require more data to achieve their verification.

Table 2.1: bottom-up process and top-down process of the Sensemaking Process

Both processes can also be compared to the components of the exploration-search axis introduced by [Zahalka & Worring, 2014] and visible in Figure 2.3 below. It visualizes the approaches Exploration and Search suggested by [Marchionini, 2006] in order to gain knowledge. Using the Exploration approach, the analyst explores and discovers a dataset without any intention to find a specific insight. Relevant insights are not defined beforehand but build up during the analysis. This approach is similar to the bottom-up process of the Sensemaking process, where one wants to get from data to theory. An analyst investigating a dataset in order to find a specific insight that was defined beforehand uses the Search approach, which can be compared with the top-down process of the Sensemaking process, where one wants to get from theory to data. Imagine the case where an analyst starts to explore a dataset without any intentions. The analyst uses the bottom-up process, respectively the Exploration approach. Once he has identified something interesting, the top-down process, respectively the Search approach, is initiated and driven


by curiosity. Sensemaking in Multimedia Analytics uses both approaches interactively, which is also reflected by the term exploratory search [Marchionini, 2006] that depicts their interchangeable usage.

Figure 2.3: Exploration-search axis by Zahalka with example tasks

The ultimate goal of the Sensemaking process is gaining insights and thereby generating knowledge. This also applies to the exploration-search axis, which especially aims at generating knowledge out of image collections. Worring and Zahalka explain that categorization plays an important role in the process of gaining insights [Worring, 2015] [Zahalka & Worring, 2014]. The analyst labels fragments of the dataset while undergoing the Sensemaking process. These labels represent intermediate results upon which the analyst continues his analysis until the combination of several labels, respectively categories, generates insight. Insights are therefore defined by several categories. Next to the label of the insight, Worring also includes the members belonging to it and the connotations of the analyst supporting the underlying sense of the insight. Furthermore, insights have some typical characteristics [North, 2006]. Insights are complex as they are generated from a remarkable amount of the provided data used in a synergic way. They can also depend on other insights, which can make the construct of insights deep. In addition, these constructs can reveal unexpected insights that have not been predicted. This may vary between researchers, as insights are qualitative and can therefore be uncertain and subjective. This is also reflected by the attribute that an insight can have multiple levels of resolution that vary in certainty. All in all, one can say that insights are deeply embedded in the dataset. The data therefore has to be connected and labeled with relevant insights. This may go beyond dry data analysis towards an impact on the analyzed domain.

Applying the Sensemaking process in reality and generating insights is, as already mentioned, non-trivial. In the next section the challenges of Sensemaking in Multimedia Analytics are described, together with the advances that try to overcome them.

2.2 Challenges

The challenge in Multimedia Analytics is to equip the computer with human-like capabilities when it comes to the perception of an image. Humans have a great capability of analyzing multimedia information, which computers still struggle with. Apart from that, humans do not have a great cognitive capacity, whereas computers have large memories and greater processing power. In the following paragraph the explicit challenges are explained in a simplified way on the basis of the two example images visible in Figure 2.4 below.

Figure 2.4: Challenges in Multimedia Analytics - Example images

On the left one can see an image of a mushroom in nature. Humans implicitly categorize or label this image as a mushroom picture without even thinking about it. On the right one can see an image of a mushroom-shaped cloud that results from a large explosion. Even if humans have never seen such a mushroom-shaped cloud before, they implicitly do not assign the same label or category to this image, whereas computers may assume similarity based on shapes. Due to the different visual content, humans do not choose the same words to describe it (except perhaps mushroom in this case). This phenomenon is studied in the field of pragmatics [Mey, 2001]. Computers, or respectively applications, struggle to support the human capability of pragmatically assigning labels or categories when conducting large-scale analysis. This gap is defined as the pragmatic gap and can be identified as the first challenge, being valid for single images, fragments of collections or whole image collections. [Zahalka & Worring, 2014] identify three key aspects that help close this gap: categories should be non-exclusive, dynamic and able to be created at any time. This


means that more than one category can be assigned or that categories can change their meaning during the analysis. The second challenge concerns the perception of the visual content by computers. Humans clearly perceive the mushroom in the left image above and identify it as the main object. In the background one can identify natural soil. The right image shows a mushroom-shaped cloud of dust as the main object. The background appears to be sky. Humans do all this implicitly just by looking at the picture, whereas computers have a completely different way of perceiving images. Computers translate images into mathematical representations called features, which are numeric values derived from every pixel in an image. This approach aims at closing the so-called semantic gap defined by [Smeulders et al., 2000], reflecting the gap between human and computer capabilities of semantic extraction. The motivation of multimedia analytics is clearly the minimization of both gaps. In the next section, advances within multimedia analytics are described that aim at closing or minimizing the pragmatic gap and the semantic gap.

2.3 Advances

Advances in Multimedia Analytics aim at closing the pragmatic gap and the semantic gap. Systems developed for conducting large-scale multimedia analysis can be assessed for their capability of handling both gaps. Worring and Zahalka defined limited, intermediate and advanced capabilities of handling each gap. This is explained in the table below, based on their publication [Zahalka & Worring, 2014]. The advanced capabilities do not exist yet and also illustrate the goal of multimedia analytics systems.

Capability   | Semantic Gap                                                            | Pragmatic Gap
Limited      | no semantics, but metadata and basic visual characteristics like color | non-adaptive classic classification
Intermediate | semantics via objective semantic concepts like person or car           | adaptive classification via interactions
Advanced     | high-level complex semantics like patient with cancer                  | classification fully adapts dynamically according to the analyst's interactions

Table 2.2: System capabilities of handling the semantic gap and the pragmatic gap


Equipping systems with semantic extraction capabilities similar to those of humans has two major approaches. The older one is explicit feature extraction followed by classification. This approach uses several techniques, for example scale-invariant feature transform (SIFT) by [Lowe, 1999] or speeded-up robust features (SURF) by [Bay et al., 2008] as local feature extraction techniques, or GIST by [Oliva & Torralba, 2001] as a global technique to extract features. Local techniques aim at extracting features about objects located in the image. The single features are combined into feature vectors and then grouped into feature representations by techniques such as the histogram of oriented gradients (HOG) by [Dalal & Triggs, 2005], bag of visual words (BoW) by [Sivic & Zisserman, 2003] or Fisher vectors by [Perronnin et al., 2010]. In contrast, global techniques try to extract semantics of the whole image, as GIST claims to extract scene characteristics. Once the explicit feature extraction is done, the classification follows. The most common technique used is the support vector machine (SVM) by [Cortes & Vapnik, 1995]. It is designed to solve two-group classification problems with supervised learning. In this application it is used to classify whether a feature is present in an image (class 1) or not (class 2) after a model has been trained with images containing the feature (class 1) and with images that do not contain the feature (class 2).

The more state-of-the-art approach for extracting semantics is deep learning. It uses convolutional neural networks (CNN) by [Bengio et al., 1995]. These networks are statistical learning models that aim at estimations that depend on a large number of input variables and are generally unknown. Krizhevsky et al. used this technique to train a large, deep convolutional neural network that classified ImageNet images [Krizhevsky et al., 2012]. ImageNet is a hierarchy of semantic concepts grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. It is based on the hierarchy of WordNet, which is a lexical database of English nouns, verbs, adjectives and adverbs. When the authors published the ImageNet paper in 2009, roughly 6,000 categories were available [Deng et al., 2009]. The ImageNet 2011 Fall Release already contains more than 32,000 categories, which also reflects the usefulness of this database. The more general categories at the top of the hierarchy contain on average around 1,000 pictures. This amount decreases if the category is located at a lower hierarchy level, as the concept becomes more detailed and specialized. Krizhevsky et al. trained a model that classified the ImageNet images into 1,000 categories and achieved top-scoring error rates that outperformed every previous submission to the so-called ImageNet Large Scale Visual Recognition Challenge. This annual event motivates and directs research towards the diminution of the semantic gap.
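To make the classification step concrete, the following is a minimal sketch of such a two-class SVM concept detector using scikit-learn; the feature vectors are synthetic stand-ins, since the actual features and training images of the thesis pipeline are not reproduced here.

```python
# Minimal sketch of two-class SVM concept detection on (synthetic) feature
# vectors; not the thesis pipeline itself, just an illustration of the idea.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(100, 512))   # images containing the concept
X_neg = rng.normal(loc=0.0, size=(100, 512))   # images without the concept
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 100 + [0] * 100)            # class 1 = concept present

clf = SVC(kernel="rbf", probability=True).fit(X, y)

# Score an unseen image: probability that the concept is present.
x_new = rng.normal(loc=0.8, size=(1, 512))
print(clf.predict_proba(x_new)[0, 1])
```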

The pragmatic gap concerns the classification and is mainly influenced by the tool an analyst is using. It depends on the features of the tool and the possible interactions of the analyst to adapt and modify the underlying classification model. Worring and Zahalka evaluated pioneer multimedia analytic tools and assessed their capability of handling both gaps. The result of this evaluation is visible in the figure below, adopted from their publication [Zahalka & Worring, 2014].

Figure 2.5: Evaluation of pioneer multimedia analytic tools by Worring and Zahalka

The aspiration is to utilize a tool in the case study that handles both gaps with at least intermediate capabilities. This shrinks down the possible pioneer systems to two candidates: the INA Browser and the MediaTable. The INA Browser is designed to interactively explore the French TV archives. It utilizes visual search by similarity, which makes it not applicable for an image dataset concerning a single specific topic like the one in the case study of this thesis. MediaTable, as the only remaining candidate, also handles the pragmatic gap in a better way according to Worring and Zahalka. It is therefore utilized in the following case study.

Chapter 3

Case Study

In this chapter the case study that serves for conducting this research is presented. Its aim is to evaluate how multimedia analytics undergoes the Sensemaking process. For this purpose, an image dataset from the real world will be analyzed. This chapter is divided into three sections. First of all, the case study is described and introduced. The Sensemaking process will be undergone by the case study in order to identify where the executing analyst is supported by the utilized tool. This includes a description of how the data has been collected and how it was handled to be suitable for this research. In addition, the utilization of the Multimedia Pivot Tables tool, which reflects the analytic part of the Sensemaking process, will be explained. This provides a description of how different functions of the tool work and what they reflect within the Sensemaking process.

3.1 Introduction

The real image dataset that has been determined for the purpose of this case study is based on the iamsterdam letters that are located at several locations within Amsterdam. The letters are part of the iamsterdam marketing concept that aims at promoting the city of Amsterdam. The letters aim at representing this concept and at being or becoming a tourist attraction. There are versions of these letters that have a static location. They can be found at the back of the Rijksmuseum on Museumplein and at Amsterdam Airport Schiphol. In addition, one version of the letters is travelling through the city


[iamsterdam.com, 2015]. The locations of these letters, including an example image, are shown in Table 3.1 below.

Location                                     | Example image
Rijksmuseum on Museumplein                   | (image)
Amsterdam Airport Schiphol                   | (image)
Travelling letters (currently in Westerpark) | (image)

Table 3.1: Overview of iamsterdam letters

Requirements towards the image dataset are not defined by the literature. The suitability of the image dataset can therefore not be proven upfront. The goal of the case study is to be able to make a statement about the iamsterdam marketing concept and whether the iamsterdam letters contribute to this marketing concept. If no statement can be made after having conducted this case study, an evaluation of whether the dataset was not suitable or Multimedia Analytics was not capable of answering the question needs to be undergone. In the next section the Sensemaking process is initiated.

3.2 Initiation

The subject for the case study is now defined and the Sensemaking process can be initiated. The process starts with 1. External Data Sources from which data is 2. Searched & Filtered. The Multimedia Pivot Table tool does not support a connection to any data sources from which it can search and filter data. The sourcing and the task of extracting the data therefore have to be fulfilled without the support of the tool. The requirements towards the sources of the images are easy accessibility and richness of relevant iamsterdam images. The first requirement can be met by all the major social media networks, as they all offer an Application Programming Interface (API). With the help of these APIs, huge amounts of publicly available data are accessible and collectable. Based on suitable parameters for the purpose of this case study, the data can be collected. The second requirement, richness of relevant images, is individually set for this case study: the target amount of images to be analyzed is set to 50,000. The social media network Instagram and the photo community Flickr have been identified as suitable to meet this target amount. The parameters determined in order to 2. Search & Filter relevant images on these platforms are simple. The collection on both platforms is based on the hashtag #iamsterdam. For the collection on the Instagram platform, the tags/tag-name/media/recent Instagram API endpoint was used, with the tag-name parameter set to iamsterdam. The necessary data is available in the result that is returned by the Instagram API. An example of this result is shown in Appendix Figure A.2.
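As an illustration, the collection could be scripted roughly as follows; this is a sketch against the (since deprecated) Instagram v1 endpoint named above, with a placeholder access token, and the pagination handling is assumed from the API's pagination block.

```python
# Sketch of paginating the tags/{tag-name}/media/recent endpoint;
# ACCESS_TOKEN is a placeholder and the v1 API has since been deprecated.
import requests

ACCESS_TOKEN = "..."  # placeholder, not a real token
url = "https://api.instagram.com/v1/tags/iamsterdam/media/recent"
params = {"access_token": ACCESS_TOKEN, "count": 33}

media = []
while url and len(media) < 50000:          # stop at the target amount
    resp = requests.get(url, params=params).json()
    media.extend(resp.get("data", []))
    # The response carries a pagination block with the next page's URL.
    url = resp.get("pagination", {}).get("next_url")
    params = {}  # next_url already contains the query parameters
print(len(media), "media objects collected")
```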

The collection on the Flickr platform is slightly different and requires the usage of multiple API endpoints in order to make the necessary data available. First of all, the flickr.photos.search API endpoint was used, with the text parameter set to iamsterdam. The result of this API call only returns an internal Flickr ID for each relevant image. Once these internal Flickr IDs have been collected, they have to be enriched with the remaining necessary data. For this enrichment, the flickr.photos.getInfo, flickr.photos.getSizes, flickr.photos.getExif and flickr.photos.geo.getLocation endpoints have been used. A collection of example results of the mentioned endpoints is shown in Appendix Figure A.3 and Figure A.4. The usage of the APIs of both platforms in order to collect the relevant data is also visualized in Appendix Figure A.5.
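A sketch of this two-step Flickr collection might look as follows; the API key is a placeholder, and error handling as well as rate limiting are omitted for brevity.

```python
# Sketch of the two-step Flickr collection: search for photo IDs, then
# enrich each ID via the getInfo/getSizes/getExif/geo.getLocation methods.
import requests

API_KEY = "..."  # placeholder key
REST = "https://api.flickr.com/services/rest/"

def flickr(method, **params):
    """Call a Flickr REST method and return the parsed JSON response."""
    base = {"method": method, "api_key": API_KEY,
            "format": "json", "nojsoncallback": 1}
    return requests.get(REST, params={**base, **params}).json()

search = flickr("flickr.photos.search", text="iamsterdam", per_page=250)
records = []
for photo in search["photos"]["photo"]:
    pid = photo["id"]
    records.append({
        "info": flickr("flickr.photos.getInfo", photo_id=pid),
        "sizes": flickr("flickr.photos.getSizes", photo_id=pid),
        "exif": flickr("flickr.photos.getExif", photo_id=pid),
        "geo": flickr("flickr.photos.geo.getLocation", photo_id=pid),
    })
```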

A rough overview of the collected images is shown in Appendix Figure A.7. This overview reveals that the hashtag #iamsterdam is not only used for images that are taken at the locations of the iamsterdam letters. This fact makes it necessary to create a research subject concept detector. It should be able to distinguish between relevant images that are actually taken at the locations of the iamsterdam letters and images that are not taken there. A semantic concept for this distinction will be trained to detect whether the iamsterdam letters are part of the image or not. This step seems useful to have an indicator of relevance for every image. If social media functions as a source, it is advisable to include this step, as tags are used subjectively by the users, which can lead to a lot of noise within the metadata. In order to create this concept, two sets of representative images that include the letters, respectively do not include them, need to be identified manually. An extract of these representative images is shown in Figure 3.1 below.

Figure 3.1: Representative images for the research subject concept of iamsterdam

Once the images and the corresponding metadata have been saved and the iamsterdam concept has been trained, semantic concept detection is executed on every image. The concepts, coming from ImageNet, cover almost everything imaginable, from generic concepts like dog to more specific concepts like hunting dog. This continues by going deeper within the ImageNet hierarchy to even more specialized concepts, like terrier being a hunting dog and yorkshire terrier being a terrier and thereby also a hunting dog. This is also illustrated in Figure 3.2 below.

Figure 3.2: Illustration of generic and specific concepts within the ImageNet hierarchy
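Since ImageNet follows the WordNet hierarchy, this generic-to-specific traversal can be illustrated with a short sketch using NLTK's WordNet interface; the synset names are examples, not the concept list used in the case study.

```python
# Sketch of walking the WordNet hierarchy that ImageNet is built on,
# from the generic concept "dog" down to more specialized hyponyms.
# Requires: pip install nltk; then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
for hyponym in dog.hyponyms():            # e.g. hunting_dog.n.01
    print(hyponym.name())
    for sub in hyponym.hyponyms()[:3]:    # e.g. terrier.n.01 under hunting dog
        print("   ", sub.name())
```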

This rich pool of concepts is not applicable in its entirety, which makes it necessary to filter it. No tools currently support this selection process, and Python seems like the natural candidate to conduct this processing task. It is advisable to support this selection process with a User Interface (UI) that reduces the chance of errors while working with plain data files. For this case study, a prototype has been developed by the executing researcher in order to simplify this selection process. It is shown in Appendix Figure A.8. It enables filtering based on the ImageNet concept hierarchy and visualizes the filtered concepts with an image and a checkbox to choose them. It was used to undergo the selection process of ImageNet concepts, resulting in a set of roughly 275 concepts that are expected to be applicable for the iamsterdam case study. Now that the images and the corresponding metadata are available and the concept selection has been done, a CSV file can be created that holds both data aspects. An extract of the final CSV file is shown in Appendix Figure A.6.
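A minimal sketch of how such a CSV could be assembled is shown below; the column names are illustrative assumptions, as the actual schema is only shown in Appendix Figure A.6.

```python
# Sketch of assembling the final CSV that feeds the MediaTable: one row per
# image, metadata columns plus one score column per selected concept.
# Field names are illustrative, not the thesis schema.
import pandas as pd

records = [
    {"image_id": "ig_0001", "platform": "instagram",
     "tags": "#iamsterdam #letters #sign",
     "lat": 52.358, "lon": 4.881, "taken": "2015-04-12",
     "concept_iamsterdam": 0.97, "concept_bicycle": 0.12},
    # ... one record per collected image, one column per selected concept
]
pd.DataFrame(records).to_csv("iamsterdam_dataset.csv", index=False)
```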

The case study is now ready to utilize the Multimedia Pivot Table tool and thereby reaches the process artifact 4. Shoebox. The Shoebox is represented by the MediaTable holding all the relevant data from which one can 5. Read & Extract data.

3.3 Methods

The dataset has been prepared for the utilization of the Multimedia Pivot Tables tool [Worring & Koelma, 2013]. It can now be displayed as the MediaTable, where every image is displayed as a row. The columns are the metadata and all probabilistic values of the semantic concept detection. The MediaTable serves as the basis for filtering and categorizing the dataset. The user can semi-interactively reduce the original dataset to a smaller active dataset or categorize several selections into buckets. Buckets serve as a categorization feature in order to group similarities and to structure the dataset. The great flexibility of these buckets allows the representation of the analyst's dynamic refinements towards the understanding of the dataset. This is the key to achieving insight generation based on categorization, as bucket members, bucket notes and bucket context are not fixed and can be refined at any time while utilizing the Multimedia Pivot Table tool. In Figure 3.3 below, the filtering interface is shown on the left side, where filters based on variables of the dataset, buckets or semantic concepts can be added. Filters on tags and nominal variables use regular expressions, and numeric variables are filtered via range selection. If semantic concepts are added as a filter, the filtering is based on a range of the probabilistic concept score. The Histogram Widget in the middle of the figure aids the analyst in semi-interactively assigning the range.
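This filtering logic can be emulated outside the tool with a short sketch; it assumes the illustrative CSV schema from the previous section and is not the MediaTable implementation itself.

```python
# Sketch of MediaTable-style filtering on the CSV dataset: regex filters
# on tags, range filters on probabilistic concept scores, and buckets as
# named, refinable selections of rows. Column names are illustrative.
import pandas as pd

df = pd.read_csv("iamsterdam_dataset.csv")

# Regex filter on the tag variable (tag/nominal filtering).
letters = df[df["tags"].str.contains(r"#letters\b", case=False, na=False)]

# Range filter on a probabilistic concept score (numeric filtering).
relevant = df[df["concept_iamsterdam"].between(0.90, 1.00)]

# A bucket as a named, refinable selection of rows (categorization).
buckets = {"high relevance": set(relevant.index),
           "letters tag": set(letters.index)}
```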

Once the filtering and categorizing have been finalized, the analyst can move on to the PivotTable feature. The PivotTable can be generated interactively by the analyst by assigning the roles of a pivot table to variables from the dataset. The interface for assigning these roles is shown on the right side of Figure 3.3 below. The user can assign the roles Row, Value or Sort/Weight. The result of assigning these roles and what it means for the PivotTable is depicted in Table 3.2 below, adopted from [Worring, 2015].

Figure 3.3: Multimedia Pivot Table filtering and role assigning interface

Variable Type | Row role           | Value role                      | Sort/Weight role
Image         | Individual images  | Sorted list of images           | n/a
Concept       | 7-point summary    | Distribution                    | Concept as weight
Tags          | Individual tags    | Tag cloud                       | BM25 / frequency weights
Nominal       | Individual labels  | Label cloud                     | n/a
Geolocation   | n/a                | Map                             | n/a
Numeric       | 7-point summary    | Sum, Max, Average, Distribution | Value as weight
Bucket        | Individual buckets | Weighted histogram              | n/a

Table 3.2: Roles of variables in multimedia pivot tables
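To illustrate the Row and Value roles, the following sketch emulates a simple PivotTable with pandas; the bucket assignment is a placeholder and the column names again follow the illustrative schema, not the tool's internals.

```python
# Sketch of Row/Value role assignment as a pandas pivot table: a bucket
# variable in the Row role, image count and mean concept score as Values.
import pandas as pd

df = pd.read_csv("iamsterdam_dataset.csv")
# Placeholder bucket assignment from the MediaTable categorization step.
df["bucket"] = df["concept_iamsterdam"].ge(0.95).map(
    {True: "very high relevance", False: "other"})

pivot = pd.pivot_table(
    df,
    index="bucket",                             # bucket in the Row role
    values=["image_id", "concept_iamsterdam"],  # variables in the Value role
    aggfunc={"image_id": "count",               # images per bucket
             "concept_iamsterdam": "mean"})     # mean concept score per bucket
print(pivot)
```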

These features can be defined as semantic interactions [Endert et al., 2012] and associated with the analytical reasoning of the analyst; at the same time, they are linked to the tasks present in the Sensemaking process. First the link to the process is described, and afterwards the semantic interactions of the MediaTable and the PivotTable are described in detail. Using the filtering feature of the MediaTable and generating PivotTables in an iterative manner covers several tasks and artifacts in the Sensemaking process. Reducing the initial dataset displayed in the MediaTable reflects the 5. Read & Extract task. The temporarily reduced dataset and the categorization feature of buckets reflect the artifact Evidence File. Generating PivotTables executes the task 8. Schematize and results in a pivot table reflecting the artifact 10. Schema. Especially the iterative use of the PivotTable feature aims at fulfilling the 11. Build Case task and is the essence of the 17. Foraging Loop. This also includes the top-down tasks such as 12. Search for Support, 9. Search for Evidence and 6. Search for Relations. Semantic interactions can be defined for the MediaTable view that is shown in Appendix Figure A.9. They are described in Table 3.3 below. Semantic interactions for the PivotTable view are separately described in Table 3.4 below; the PivotTable view is shown in Appendix Figure A.10. Both tables consist of three columns: the semantic interaction, the analytical reasoning of the analyst using it, and the impact this usage has within the Multimedia Pivot Table tool.


Semantic Interaction   | Analytic Reasoning                            | Impact
M1-Filter              | Increase relevance                            | Changes size of dataset used in PivotTable
M2-Sort                | Sort for relevance                            | MediaTable sorted
M3-Bucket-Meaning      | Add/Edit meaning of category (evidence file)  | Name and Description assigned to Bucket
M4-Inspect-Bucket-Grid | Skimming visually through the dataset         |
M5-Bucket-Adoption     | Dataset categorized in Buckets                | Images placed in Bucket or removed from it

Table 3.3: Semantic Interactions of the MediaTable view

Semantic Interaction               | Analytic Reasoning                                    | Impact
P1-Bucket-as-Row, P2-Data-as-Value | Compare data of Buckets                               | Generation of PivotTable
P3-Bucket-as-Value, P4-Data-as-Row | Analyze data distribution among Buckets               | Generation of PivotTable
P5-Concept-as-Row                  | Analyze concept score distribution                    | Generation of 7-point summary
P6-Concept-as-Value                | Analyze concept scores among data/buckets             | Maximum, Median and Minimum concept scores are displayed
P7-Concept-as-Sort/Weight          | Discover correlation                                  | Scatterplot/Lineplot as value variable
P8-Adjust-PivotTable-Parameters    | Adjust analysis parameters                            | Minimum entry requirements for row variables; ConceptRowAggregationMode; Tag Weight Mode
P9-Inspect-PivotRows               | Skimming through/Analyze visualizations of PivotRows  |

Table 3.4: Semantic Interactions of the PivotTable view

Chapter 4

Results

The Results chapter presents the results of the iamsterdam case study. Besides the actual insights that have been extracted from the image dataset, this chapter also emphasizes how the tool has been used, respectively which semantic interactions from Table 3.3 and Table 3.4 have been used, in order to generate these specific insights.

The first evaluation worth considering is to check the relevance of the image dataset with respect to the iamsterdam case study subject. A first impression is provided by the histogram that shows the distribution of the iamsterdam concept scores, visible in Figure 4.1 below.

Figure 4.1: iamsterdam histogram

In order to gain more details about the relevance, buckets have been set up by using the semantic interactions M3-Bucket-Meaning and M5-Bucket-Adoption. The characteristics of the buckets are shown in Table 4.1 below. The lower and upper borders represent the threshold values for the iamsterdam concept scores. The sum of images over all buckets does not reflect the total amount of images, as images with an iamsterdam concept score exactly equal to a lower or upper border are contained in both bordering buckets.

upper border

lower border

amount images

very high relevance

1.00

0.95

7.628

high relevance

0.95

0.90

2.584

medium relevance

0.90

0.75

3.721

low relevance

0.75

0.50

4.371

very low relevance

0.50

0.00

36.119

Table 4.1: iamsterdam relevance buckets
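A sketch of this bucketing with inclusive borders, which also explains why the bucket sizes sum to slightly more than the dataset size, could look as follows (illustrative CSV schema assumed):

```python
# Sketch of the relevance bucketing: between() is inclusive on both ends,
# so an image whose score equals a border lands in two adjacent buckets.
import pandas as pd

df = pd.read_csv("iamsterdam_dataset.csv")
borders = {
    "very high relevance": (0.95, 1.00),
    "high relevance":      (0.90, 0.95),
    "medium relevance":    (0.75, 0.90),
    "low relevance":       (0.50, 0.75),
    "very low relevance":  (0.00, 0.50),
}
for name, (lo, hi) in borders.items():
    members = df[df["concept_iamsterdam"].between(lo, hi)]
    print(name, len(members))
```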

The borders of the buckets were determined manually with the help of a PivotTable with the iamsterdam concept as row variable. This reflects semantic interaction P5-Concept-as-Row, results in a 7-point summary and provides a rough overview of the distribution of the concept scores. The buckets are then used to generate a PivotTable with them as row variable (P1-Bucket-as-Row), and in addition several data variables get the Value role assigned (P2-Data-as-Value). The result is displayed in Figure 4.2 below.

Figure 4.2: PivotTable with five buckets as row variable (P1); geolocation, year and images as value variable (P2)

The next PivotTable created has tags as row variable (P4-Data-as-Row) and the previously defined buckets as column variable (P3-Bucket-as-Value). The minimum amount of images that a tag has to be used in is set to 10 (P8-Adjust-PivotTable-Parameters). This PivotTable aids the process of finding tags that are noticeably distributed among the buckets. Interestingly distributed tags identified by using semantic interaction P9-Inspect-PivotRows are #sign, #letters, #red and #selfie. Their distribution among the buckets is displayed in Figure 4.3 below.

Figure 4.3: relevance bucket distribution for tags #sign, #letters, #red and #selfie

The previously defined relevance buckets are replaced by four buckets, each containing the images labelled with one of the tags identified above (M3-Bucket-Meaning, M5-Bucket-Adoption). A PivotTable is created with these buckets as row variable (P1-Bucket-as-Row) and images and geolocations as value variables (P2-Data-as-Value). The result is visible in Figure 4.4 below.


Figure 4.4: PivotTable with tag buckets as row variable (P1) and images and geolocations as value variables (P2); geolocations weighted with iamsterdam (P7)

In Figure 4.2 and in Figure 4.4, geolocations far from Amsterdam can be identified. Therefore three new buckets are created (M5-Bucket-Adoption), separating the image dataset into North and South America with longitudes below -35, longitudes between -35 and 35 summarized as EMEA (Europe, Middle East and Africa), and longitudes greater than 35 summarized as Asia (M3-Bucket-Meaning). A PivotTable with these buckets as row variable (P1-Bucket-as-Row) and the geolocations and the images as value variables (P2-Data-as-Value) is created. The result is shown in Figure 4.5 below.

Figure 4.5: PivotTable with geolocation buckets as row variable (P1) and geolocations and images as value variables (P2)
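The longitude split described above can be sketched as a simple bucketing function; the column names are assumed from the illustrative schema used earlier.

```python
# Sketch of the longitude-based region buckets: below -35 = Americas,
# -35 to 35 = EMEA, above 35 = Asia, following the split in the text.
import pandas as pd

def region_bucket(lon: float) -> str:
    if lon < -35:
        return "North and South America"
    if lon <= 35:
        return "EMEA"
    return "Asia"

df = pd.read_csv("iamsterdam_dataset.csv")
df["region"] = df["lon"].apply(region_bucket)
print(df["region"].value_counts())
```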


The same buckets are used for another analysis of tags in order to identify noticeable distributions among the geolocation buckets (P3-Bucket-as-Value, P4-Data-as-Row). Interestingly distributed tag pairs could be identified (P9-Inspect-PivotRows) and are displayed in Figure 4.6 below.

Figure 4.6: geolocation bucket distribution for tag pairs

A last set of buckets reflecting the seasons of the year is set up (M3-Bucket-Meaning, M5-Bucket-Adoption). A PivotTable with these buckets as row variable (P1-Bucket-as-Row) and the images as value variable (P2-Data-as-Value) is created and shown in Figure 4.7 below.


Figure 4.7: PivotTable with season buckets as row variable (P1) and images as value variable (P2)
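Assuming a meteorological month-to-season mapping, which the thesis does not spell out, the season buckets can be sketched as follows:

```python
# Sketch of the season buckets: map each image's capture month to a season.
# The "taken" column follows the earlier illustrative schema; the season
# boundaries are an assumed meteorological convention.
import pandas as pd

SEASONS = {12: "Winter", 1: "Winter", 2: "Winter",
           3: "Spring", 4: "Spring", 5: "Spring",
           6: "Summer", 7: "Summer", 8: "Summer",
           9: "Autumn", 10: "Autumn", 11: "Autumn"}

df = pd.read_csv("iamsterdam_dataset.csv", parse_dates=["taken"])
df["season"] = df["taken"].dt.month.map(SEASONS)
print(df["season"].value_counts())
```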

Analyzing the distribution among these buckets (P3-Bucket-as-Value) in a PivotTable with tags as row variable (P4-Data-as-Row) enables the detection of events and their association with a season of the year. Detectable events (P9-Inspect-PivotRows) are shown in Figure 4.8 below.

Figure 4.8: season bucket distribution of tags for event detection


Until now, buckets have not been created based on semantic concepts (except the individual iamsterdam concept), nor have semantic concepts been used as row or column variables in generated PivotTables. Their efficient usage is hampered by overall low scores for the concepts manually selected in chapter 3.2. This impacts the semantic extraction in general and furthermore limits the usage of semantic concepts as filter (M1-Filter) and sort (M2-Sort) variables within the MediaTable and as ranking/sort variable (P7-Concept-as-Sort/Weight) within PivotTables. This is evaluated in the Discussion chapter.

Chapter 5

Discussion

In this chapter the results of the iamsterdam case study are scrutinized first. Afterwards, an evaluation of how well the Multimedia Pivot Table tool supports the Sensemaking process is described and discussed.

5.1 Case Study Results

Considering the iamsterdam histogram in Figure 4.1 and the amount of images in each of the relevance buckets defined in Table 4.1, one can make a statement about the tag #iamsterdam. This tag has been used to gather the images from the Flickr and Instagram platforms. Nevertheless, the #iamsterdam tag does not directly correlate with high iamsterdam concept scores. It can therefore be inferred that this tag is not specifically used to label one's images when they have been taken near the iamsterdam letters. This is reflected by the majority of the images being in the very low relevance bucket. A PivotTable has been created in order to identify tags that occur more often in the very high relevance bucket. As visible in Figure 4.3, the tags #letters, #sign and #red seem to occur more often in the high relevance buckets, whereas the #selfie tag is not noticeably shifted towards high relevance buckets. Analyzing these tags by creating buckets for them and generating a PivotTable, as visible in Figure 4.4, reveals that the tags #letters and #sign have a high median value for the iamsterdam concept, whereas the tags #red and #selfie have a significantly lower median value. The majority of images for all tags except #red contain the letters. An assumption is that the #red tag is more

likely related to the red light district of Amsterdam, and that the quality of concept scores decreases for images containing a lot of red color (like the letters). Explicitly adding these images as negatives for the iamsterdam concept detection in a second training run could remove this fuzziness. The geolocations in Figure 4.2 and Figure 4.4 are surprisingly scattered over Europe, North and South America, Asia and Australia and are not restricted to the area of Amsterdam or even the Netherlands. Using the geolocations to create buckets of North and South America, EMEA (Europe, Middle East and Africa) and Asia and generating a PivotTable, as visible in Figure 4.5, reveals that these images were still taken at locations within Amsterdam. On the one hand, the quality and accuracy of the geolocations can be questioned, but on the other hand this also reveals that social media users do not always upload images to social media platforms at the locations where they have taken them. Sometimes this even seems to happen after the vacation when these users are home again, which makes it possible to make inferences about the origin of these users. It can also be used to determine the commonness of the usage of several tags, as visible in Figure 4.6. The figure reveals that tourists from North and South America more likely use the term eurotrip when visiting Europe, whereas Asian tourists use the more general term travel. Also, the usage of a tag that reflects that a user is posting an image from the past differs depending on the origin of the user. Users from North and South America more likely use the tag #TBT (Throwback Thursday), whereas Asian users do not use the abbreviated tag but use #throwback instead. Setting up season buckets, as visible in Figure 4.7, allows the detection of events in case a tag can be associated with an event. Besides the obvious events like Easter and Autumn, this led to the detection of the Gay Pride event that apparently takes place in the summer in Amsterdam. Investigations reveal that this annual event always takes place on the first weekend of August, which complies with the findings visible in Figure 4.8, where the tags #Gay and #Pride are located in the Summer bucket.

At last the semantic concept scores need to be evaluated. The overall low scores meant that an efficient semantic extraction was not possible in this case study. In chapter 3.2, a subset of the roughly 15,000 available concepts was selected. The scores for all concepts of each image always sum to 1 because the learning method used assumes mutual exclusiveness of all classes. This does not comply with the requirements defined by [Zahalka & Worring, 2014] in order to close the pragmatic gap: they state that closing the pragmatic gap needs non-exclusive categories. In this case study it was well perceptible that this statement is also valid for closing the semantic gap: utilizing semantic extraction likewise needs non-exclusive class detections.
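The difference can be illustrated with a small numeric sketch contrasting mutually exclusive softmax scoring with independent, non-exclusive sigmoid scoring; the logits are made-up values for three concepts.

```python
# Sketch contrasting exclusive scoring (softmax, scores forced to sum to 1)
# with non-exclusive scoring (independent sigmoids) for the same raw scores.
import math

logits = [2.0, 1.5, -1.0]  # illustrative raw scores for three concepts

# Softmax: mutually exclusive classes, probabilities sum to 1.
exp = [math.exp(z) for z in logits]
softmax = [e / sum(exp) for e in exp]

# Independent sigmoids: each concept scored on its own, no forced trade-off.
sigmoid = [1 / (1 + math.exp(-z)) for z in logits]

print("softmax:", [round(p, 3) for p in softmax], "sum =", round(sum(softmax), 3))
print("sigmoid:", [round(p, 3) for p in sigmoid])
```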

Discussion

31

well perceptible that this statement is also valid for closing the semantic gap. Utilizing semantic extraction also needs to use non-exclusive class detections. Although the semantic extraction was not performing as expected the case study in general illustrated the role of iamsterdam in social media. It was revealed that tourists from all over the world can identify with the slogan iamsterdam and that the letters play an important role in this process.
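For illustration, the bucket-and-pivot analysis described above can be approximated outside the tool in a few lines. The following is a minimal sketch, assuming the dataset is available as a flat table with hypothetical columns image_id, iamsterdam_score and tags; the bucket boundaries are illustrative and not the exact thresholds of Table 4.1.

```python
import pandas as pd

# Hypothetical flattened dataset: one row per image with its
# iamsterdam concept score and a list of user-assigned tags.
df = pd.DataFrame({
    "image_id": [1, 2, 3, 4],
    "iamsterdam_score": [0.05, 0.30, 0.75, 0.92],
    "tags": [["selfie"], ["red"], ["letters", "sign"], ["letters"]],
})

# Assumed bucket boundaries; the real thresholds of Table 4.1 may differ.
df["relevance"] = pd.cut(
    df["iamsterdam_score"],
    bins=[0.0, 0.25, 0.5, 0.75, 1.0],
    labels=["very low", "low", "high", "very high"],
)

# One row per (image, tag) pair, then count tag occurrences per bucket,
# mirroring the kind of PivotTable shown in Figure 4.3.
exploded = df.explode("tags")
pivot = exploded.pivot_table(
    index="tags", columns="relevance", values="image_id",
    aggfunc="count", fill_value=0, observed=False,
)
print(pivot)
```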

5.2 Multimedia Pivot Table Evaluation

The evaluation is split into two parts. First the components of the Sensemaking process are discussed. This includes the tasks, the artifacts, the bottom-up and the top-down process as well as the Foraging Loop and the Sensemaking Loop. Afterwards the handling of the pragmatic gap and the semantic gap is discussed. Conducting the iamsterdam case study revealed which parts of the Sensemaking process are well supported and which parts are less well supported by the Multimedia Pivot Table tool. A rough overview is shown in Figure 5.1 below, indicating with colors how well the several parts are supported. It splits the Sensemaking process into a preparatory part (red area - not supported), an analytical part (green area - well supported) and a result part (yellow area - partly supported).

Figure 5.1: Overview of the Multimedia Pivot Table evaluation


As described in Chapter 3, all preparatory tasks were accomplished without support of the Multimedia Pivot Table tool. The preparatory part is therefore not evaluated in detail. The evaluation starts with the Shoebox artifact of the Sensemaking process.

1. Shoebox: The shoebox contains relevant data. The initial dataset, respectively the MediaTable of the Multimedia Pivot Table tool, can therefore be seen as the Shoebox artifact.

2. Read & Extract: The Read & Extract task aims at extracting nuggets of evidence. The filtering feature of the MediaTable supports this task.

3. Search for Relations: The PivotTable reveals correlations between variables that help to uncover patterns and thereby supports the Search for Relations task.

4. Evidence File: Filtering the MediaTable and using the categorization feature of buckets reflects the creation of Evidence Files.

5. Schematize: As a PivotTable is always a structured way of displaying data, one can argue that the Schematize task is supported by the Multimedia Pivot Table tool.

6. Search for Evidence: The Search for Evidence task aims at altering an Evidence File. The PivotTable can be altered by the user by assigning roles to or removing roles from the variables of the dataset. Also the filters in the MediaTable can be altered at any time.

7. Schema: As a PivotTable is always a structured way of displaying data, one can argue that the Schema artifact is reflected by the PivotTable.

8. Build Case: The Build Case task aims at combining Evidence Files to create hypotheses. This task is not supported by a dedicated feature of the Multimedia Pivot Table tool. Nevertheless, the iterative generation of PivotTables aims at fulfilling this task.


9. Search for Support: Search for Support becomes necessary if a hypothesis is altered and a Schema has to be re-examined. As the PivotTable reflects the Schema and can be altered by the user at any time, this task can be seen as supported.

10. Hypotheses: The Hypotheses artifact is clearly not supported by the Multimedia Pivot Table tool. The tool nevertheless provides the underlying and essential analysis results to prove them.

11. Tell Story: The Tell Story task is not supported by the Multimedia Pivot Table tool, as no presentable result is produced.

12. Reevaluate: The Reevaluate task aims at updating hypotheses. As hypotheses are not supported by the Multimedia Pivot Table tool, updating them is not supported either.

13. Presentation: The Presentation artifact is clearly not supported by the Multimedia Pivot Table tool, as the Tell Story task is not supported.

14. Foraging Loop: The Foraging Loop aims at creating a schema out of collected and analyzed data. The PivotTable reflects these schemas and is a powerful tool to analyze the initial dataset. The Foraging Loop is supported.

15. Sensemaking Loop: The Sensemaking Loop aims at creating a verifiable hypothesis. The Multimedia Pivot Table tool only provides analysis results from which hypotheses can be drawn. The Sensemaking Loop is therefore not supported, or at least not fully covered.

The Sensemaking process contains six artifacts. The External Data Sources are not supported. The MediaTable reflects the Shoebox, and the PivotTable reflects the Evidence File and the Schema. The Hypotheses and Presentation artifacts are also not supported. This results in only 50% of the artifacts being supported. Nevertheless, the three artifacts that reflect the analysis part of the Sensemaking process are covered by the Multimedia Pivot Table tool. It is true that External Data Sources cannot be connected to the tool, but it can also be argued that this feature, being highly individual and dependent on the research topic, would be too complex to cover properly. The same thought can be applied to the Hypotheses and Presentation artifacts. It is therefore fair to state that:

The Multimedia Pivot Table tool aims at analyzing an image dataset interactively, but getting the tool to run properly, drawing hypotheses out of PivotTables and presenting them to an audience remain the responsibility of the analyst and are not directly supported by the tool, as it only provides the underlying analysis results.

The Sensemaking process also contains two approaches of research, reflected by the bottom-up process, where one wants to get from data to theory, and the top-down process, where one wants to get from theory to data. For the bottom-up process the analytic tasks such as Search & Filter, Read & Extract and Schematize are supported, with the limitation that the Search & Filter task can only be executed on the MediaTable but not on External Data Sources. The same applies to the top-down process, whose analytic tasks Search for Support, Search for Evidence and Search for Relations are supported. The case study used the bottom-up process in order to explore the iamsterdam image dataset. The exploration of an image dataset is interactively possible with the Multimedia Pivot Table tool. A more investigative approach like the top-down process was not applied. Nevertheless, the tool offers the possibility to do so interactively. The summarizing statement is therefore:

The Multimedia Pivot Table tool supports the analytic parts of the exploratory bottom-up process to get from an image dataset to insights. The analytic parts of the investigative top-down process to find evidence for a theory are also supported.

The Sensemaking process can be seen as one big loop that consists of the Foraging Loop and the Sensemaking Loop. The Foraging Loop combines the analytic tasks of the bottom-up process and the top-down process to create a schema out of data. The PivotTable is one way to schematize and structure data and can therefore be seen as the result of the Foraging Loop. The interactivity of re-generating and altering PivotTables also supports the idea of a loop. The Sensemaking Loop aims at creating verifiable hypotheses. The tool clearly does not support that, and the responsibility remains with the analyst. Nevertheless, the PivotTables that result from the Foraging Loop trigger the Sensemaking Loop, although the tool does not support its further tasks. This can be concluded by the statement:

The Multimedia Pivot Table tool and its MediaTable and PivotTable features support the Foraging Loop in order to schematize and structure image datasets. Although it does not support the Sensemaking Loop, the resulting PivotTables trigger the generation of hypotheses and therefore the Sensemaking Loop itself.

In order to effectively undergo the Sensemaking process, the Multimedia Pivot Table tool needs to handle the pragmatic gap as well as the semantic gap with at least intermediate capabilities as defined by [Zahalka & Worring, 2014]. Concerning the pragmatic gap, this requires categories to be non-exclusive, dynamic and able to be created at any time. The bucket feature reflecting these categorizations fulfills all three requirements. Images can be contained in more than one bucket. The tool also supports a dynamic way of adapting the buckets by adding and removing images interactively. The PivotTable feature allows the selection of fragments of the dataset in the MediaTable. Selected rows in the MediaTable can be assigned to a bucket at any time, and buckets are therefore also able to change their meaning, which is also reflected by being able to change bucket descriptions at any time. Furthermore, the semantic interactions defined in chapter 3.3 support the mentioned requirements. The intermediate capability of handling the pragmatic gap is therefore given.

Intermediate capabilities of handling the semantic gap are defined as being able to extract semantics via objective semantic concepts. The roughly 15,000 semantic concepts that have been computed on the image dataset aim to reflect this intermediate capability. Nevertheless, the potential of this semantic extraction could not be unlocked within the iamsterdam case study due to the exclusiveness of these concepts. The intermediate capability of handling the semantic gap was therefore not given. Summarized, that means:

The Multimedia Pivot Table tool has at least intermediate capabilities of handling the pragmatic gap. Intermediate capabilities of handling the semantic gap are reflected by the roughly 15,000 semantic concepts, but their potential could not be unlocked in the case study conducted.
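The three category requirements mentioned above can be illustrated with a minimal data structure. This is a sketch of the idea only, not the tool's actual implementation: each bucket is a named, mutable set of image ids, so an image may belong to several buckets, membership can change at any time, and a bucket can be relabeled.

```python
class Bucket:
    """A non-exclusive, dynamic category of images (illustrative only)."""

    def __init__(self, description):
        self.description = description   # can be changed at any time
        self.image_ids = set()           # mutable membership

    def add(self, image_id):
        self.image_ids.add(image_id)

    def remove(self, image_id):
        self.image_ids.discard(image_id)

    def rename(self, new_description):
        # A bucket may change its meaning over the course of the analysis.
        self.description = new_description


letters = Bucket("contains the iamsterdam letters")
selfies = Bucket("selfie in front of the letters")

# Non-exclusive: the same image can live in both buckets.
letters.add("img_42")
selfies.add("img_42")
```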

Chapter 6

Conclusion

The importance of Multimedia Analytics will increase in the future, just as big data will become bigger and bigger. The tools have to keep up with the amount of digital data in the world. As demonstrated with the iamsterdam case study, Multimedia Analytics tools like multimedia pivot tables can be utilized in order to analyze digital datasets on a much larger scale than humans alone would be able to. They are therefore a great opportunity to make beneficial use of big data. Undergoing the Sensemaking process in Multimedia Analytics is nevertheless non-trivial, and the challenges of closing the pragmatic and semantic gap remain; they can still not be handled by any tool with advanced capabilities. Taking the artifacts, the processes, the loops and the handling of the pragmatic and semantic gap into account, a statement about the performance of multimedia pivot tables undergoing the Sensemaking process can be made:

The Multimedia Pivot Table tool struggles to support the initiation phase (red area in Figure 5.1) of the Sensemaking process. In contrast, the analytical part (green area in Figure 5.1) of the process is well supported, with at least intermediate capabilities of handling the pragmatic gap. The semantic gap is also aimed to be handled with intermediate capabilities via objective semantic extraction, although this potential could not be unlocked in the case study conducted. The subsequent part (yellow area in Figure 5.1) of generating hypotheses and presenting these is not directly supported, but the tool aids the analyst in fulfilling these tasks with the gathered analysis results.

6.1 Limitations

The semantic extraction in the iamsterdam case study did not perform as expected and limited the analysis. Due to the usage of exclusive classes, as described in the Discussion chapter, the semantic concepts could not effectively be used for the generation of PivotTables. This led to a poor capability of the multimedia pivot tables to handle the semantic gap. How non-exclusive classes are imaginable is shown in Figure 6.1 below. Instead of assessing a ranking of the concepts detected within an image, the semantic extraction needs to focus on objects and assess scores for their presence independently.

Figure 6.1: Illustration how semantic extraction needs to operate with non-exclusive classes
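The difference between exclusive and non-exclusive classes as sketched in Figure 6.1 ultimately comes down to the output layer of the classifier. The following minimal illustration, assuming some raw per-concept logits, shows it: a softmax forces all concept scores of an image to sum to 1 and compete, as observed in the case study, whereas independent sigmoids score the presence of each concept separately.

```python
import math

# Hypothetical raw classifier outputs (logits) for one image.
logits = {"letters": 2.0, "people": 1.5, "bicycle": -0.5}

# Exclusive classes (softmax): scores compete and always sum to 1.
z = sum(math.exp(v) for v in logits.values())
softmax = {c: math.exp(v) / z for c, v in logits.items()}

# Non-exclusive classes (independent sigmoids): each concept is
# scored for presence on its own, so several can be high at once.
sigmoid = {c: 1 / (1 + math.exp(-v)) for c, v in logits.items()}

print(softmax)  # sums to 1.0
print(sigmoid)  # letters and people both score above 0.8 here
```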

Furthermore, the powerful feature of training semantic concepts with manually selected positive images and manually or randomly selected negatives was not available in the case study conducted due to hardware limitations. This feature might have been able to compensate for the low semantic concept scores.

6.2 Future Work

Future research that aims at utilizing multimedia pivot tables should be aware that none of the preparatory steps is supported by the tool itself or by any other application. A way in which the concept selection process could be simplified and supported with a user interface via a web application has been shown in chapter 3.2. This could also be integrated into the Multimedia Pivot Table tool via a wizard that guides the user through the process of selecting relevant concepts. Next to a manual selection of relevant concepts, an automated process of selecting top-scoring concepts based on a few input parameters, such as the number of concepts and a probabilistic score threshold, is imaginable, as sketched below.
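Such an automated selection could be as simple as the following sketch. The data format and function name are assumptions for illustration, not part of the tool.

```python
def select_concepts(concept_scores, num_concepts, min_score):
    """Pick the top-scoring concepts above a probability threshold.

    concept_scores: dict mapping concept name -> aggregated score
                    over the dataset (the format is an assumption).
    """
    # Keep only concepts above the threshold, then take the top k.
    candidates = [
        (name, score) for name, score in concept_scores.items()
        if score >= min_score
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:num_concepts]]


# Usage with made-up scores:
scores = {"letters": 0.81, "canal": 0.64, "museum": 0.12, "sign": 0.77}
print(select_concepts(scores, num_concepts=2, min_score=0.5))
# -> ['letters', 'sign']
```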


Simplifications like that need to be realized in order to also allow analysts without advanced technical capabilities to use the tool flawlessly.

Allowing external data sources to be connected to the Multimedia Pivot Table tool is also worth aiming for. This does not seem technically impossible, but the requirements for such a generic feature are hard to determine. An approach where the Multimedia Pivot Table tool provides the environment for individually created data connections, implemented as modules, is imaginable; a sketch follows below. A set of default modules could be created to motivate researchers to use the tool. This only partly bridges the complex requirements, as the analyst remains responsible for the data collection itself. Nevertheless, this would lead to a more integrated usage of the Multimedia Pivot Table tool for the initial tasks of the Sensemaking process. It would also contribute to the ease of use of the tool and might increase its usage in research.
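Such a module approach could look like the following sketch: the tool would define a small interface, and researchers would implement it per data source. All names here are hypothetical; the real API calls of platforms like Flickr are omitted.

```python
from abc import ABC, abstractmethod


class DataSourceModule(ABC):
    """Hypothetical plug-in interface for external data sources."""

    @abstractmethod
    def fetch(self, query):
        """Collect raw items (e.g. images plus metadata) for a query."""

    @abstractmethod
    def to_rows(self, items):
        """Convert raw items into MediaTable rows (list of dicts)."""


class FlickrModule(DataSourceModule):
    # Illustrative default module; a real implementation would call
    # the platform API here instead of returning a canned item.
    def fetch(self, query):
        return [{"id": "f1", "tags": ["iamsterdam"], "lat": 52.36, "lon": 4.88}]

    def to_rows(self, items):
        return [
            {"image_id": it["id"], "tags": it["tags"],
             "geolocation": (it["lat"], it["lon"])}
            for it in items
        ]
```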

Appendix

Figure A.1: Visual Analytics process by Keim et al.


Figure A.2: Sample Response of Instagram API


Figure A.3: Sample Response of Flickr API Part 1


Figure A.4: Sample Response of Flickr API Part 2


Figure A.5: Visualization of the used API Endpoints

Figure A.6: Extract of CSV Dataset


Figure A.7: Rough overview of collected images

Figure A.8: Example UI for the process of selecting relevant concepts for the Multimedia Pivot Table tool


Figure A.9: MediaTable view with denoted semantic interactions from chapter 3.3

Figure A.10: PivotTable view with denoted semantic interactions from chapter 3.3


Bibliography

[Bay et al., 2008] Bay, Herbert, Ess, Andreas, Tuytelaars, Tinne, & Van Gool, Luc. 2008. Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 110(3), 346–359.

[Bengio et al., 1995] Bengio, Yoshua, & LeCun, Yann. 1995. Convolutional Networks for Images, Speech, and Time-Series.

[Chinchor et al., 2010a] Chinchor, N. A., Thomas, J. J., Wong, P. C., Christel, M. G., & Ribarsky, W. 2010a. Multimedia Analysis + Visual Analytics = Multimedia Analytics. IEEE Computer Graphics and Applications, 30(5), 52–60.

[Chinchor et al., 2010b] Chinchor, Nancy A., Christel, Michael G., & Ribarsky, William. 2010b. Guest Editors' Introduction: Multimedia Analytics. IEEE Computer Graphics and Applications, 30(5), 18–19.

[Cortes & Vapnik, 1995] Cortes, Corinna, & Vapnik, Vladimir. 1995. Support-Vector Networks. Machine Learning, 20(3), 273–297.

[Dalal & Triggs, 2005] Dalal, N., & Triggs, B. 2005. Histograms of Oriented Gradients for Human Detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Institute of Electrical & Electronics Engineers (IEEE).

[Deng et al., 2009] Deng, Jia, Dong, Wei, Socher, R., Li, Li-Jia, Li, Kai, & Fei-Fei, Li. 2009. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Institute of Electrical & Electronics Engineers (IEEE).

[Endert et al., 2012] Endert, A., Fiaux, P., & North, C. 2012. Semantic Interaction for Sensemaking: Inferring Analytical Reasoning for Model Steering. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2879–2888.

[Hanjalic, 2004] Hanjalic, Alan. 2004. Content-Based Analysis of Digital Video. Springer.

[iamsterdam.com, 2015] iamsterdam.com. 2015. i amsterdam letters. http://www.iamsterdam.com/en/visiting/about-amsterdam/i-amsterdam-letters. [Online; accessed 27-April-2015].

[Jacobs, 2009] Jacobs, Adam. 2009. The pathologies of big data. Communications of the ACM, 52(8), 36.

[Keim et al., 2008] Keim, Daniel, Andrienko, Gennady, Fekete, Jean-Daniel, Görg, Carsten, Kohlhammer, Jörn, & Melançon, Guy. 2008. Visual Analytics: Definition, Process, and Challenges. Pages 154–175 of: Lecture Notes in Computer Science. Springer.

[Krizhevsky et al., 2012] Krizhevsky, Alex, Sutskever, Ilya, & Hinton, Geoffrey E. 2012. ImageNet Classification with Deep Convolutional Neural Networks. Pages 1106–1114 of: Bartlett, P., Pereira, F. C. N., Burges, C. J. C., Bottou, L., & Weinberger, K. Q. (eds), Advances in Neural Information Processing Systems 25.

[Lowe, 1999] Lowe, D. G. 1999. Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision. Institute of Electrical & Electronics Engineers (IEEE).

[Marchionini, 2006] Marchionini, Gary. 2006. Exploratory search. Communications of the ACM, 49(4), 41.

[Martin & Tosunoglu, 2000] Martin, Alberto, & Tosunoglu, Sabri. 2000. Image Processing Techniques for Machine Vision.

[Mayer-Schoenberger, 2013] Mayer-Schoenberger, Viktor, & Cukier, Kenneth. 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think. John Murray Publishers.

[Mey, 2001] Mey, Jacob L. 2001. Pragmatics: An Introduction. Wiley-Blackwell.

[Nolan, 2014] Nolan, Christopher. 2014. Interstellar. Paramount Pictures and Warner Bros. Pictures.

[North, 2006] North, C. 2006. Toward measuring visualization insight. IEEE Computer Graphics and Applications, 26(3), 6–9.

[Oliva & Torralba, 2001] Oliva, Aude, & Torralba, Antonio. 2001. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision, 42(3), 145–175.

[Perronnin et al., 2010] Perronnin, Florent, Liu, Yan, Sanchez, Jorge, & Poirier, Herve. 2010. Large-scale image retrieval with compressed Fisher vectors. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Institute of Electrical & Electronics Engineers (IEEE).

[Pirolli & Card, 2005] Pirolli, Peter, & Card, Stuart. 2005. The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. Pages 2–4 of: Proceedings of the International Conference on Intelligence Analysis.

[Russell et al., 1993] Russell, Daniel M., Stefik, Mark J., Pirolli, Peter, & Card, Stuart K. 1993. The cost structure of sensemaking. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI '93. Association for Computing Machinery (ACM).

[Sivic & Zisserman, 2003] Sivic, Josef, & Zisserman, Andrew. 2003. Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the Ninth IEEE International Conference on Computer Vision. Institute of Electrical & Electronics Engineers (IEEE).

[Smeulders et al., 2000] Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. 2000. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.

[Worring, 2015] Worring, Marcel. 2015. Insight in Image Collections by Multimedia Pivot Tables. In: International Conference on Multimedia Retrieval.

[Worring & Koelma, 2013] Worring, Marcel, & Koelma, Dennis C. 2013. Multimedia Pivot Tables. https://ivi.fnwi.uva.nl/isis/publications/2013/WorringVAST2013/WorringVAST2013.pdf. [Online; accessed 05-June-2015].

[Zahalka & Worring, 2014] Zahalka, Jan, & Worring, Marcel. 2014. Towards interactive, intelligent, and integrated multimedia analytics. In: 2014 IEEE Conference on Visual Analytics Science and Technology (VAST). Institute of Electrical & Electronics Engineers (IEEE).