IR Evaluation in Context
Jaap Kamps
University of Amsterdam
http://staff.science.uva.nl/~kamps/

Forum for Information Retrieval Evaluation (FIRE 2011) IIT Bombay, Mumbai, December 4, 2011

IR Evaluation in Context

FIRE 2011, December 4, 2011

Motivation: IR Evaluation in Context
• The Cranfield/TREC paradigm is great but also limited
  ◦ Studies general ranking problem/hypothesis
  ◦ Abstracts away from specific document genre, use-case and searcher stereotype
• First, what about a specific search engine/application/use-case?
  ◦ How do its content, tasks and searchers differ?
  ◦ Can we make the evaluation tailored to this?
• Second, what about the individuals involved in test collections?
  ◦ Can we capture the context of persons involved?
  ◦ What is the consequence of moving toward "anonymous" and uncontrolled judges when we resort to crowdsourcing?


Outline
• Motivation
• Modern IR Evaluation
  ◦ Pushing the Boundaries of Cranfield
  ◦ INEX: Focused Retrieval
• Evaluation in Context
  ◦ Contextual Evaluation
  ◦ Crowdsourcing
• Conclusions


Pushing the Boundaries of Cranfield
• The Cranfield/TREC paradigm is a powerful abstraction
  ◦ Focuses on generic ranking problem
  ◦ Abstracts away from specific document features
  ◦ Abstracts away from user/topic variation
• Great for the scientific study of one sort of hypothesis
  ◦ which system is best (in general)
• Less good for addressing specific use context or interaction
  ◦ specific use cases, collections (genre or structure), engines
  ◦ interaction is searcher dependent and hence not reusable


Default Ad hoc Retrieval Task
• Standard retrieval task in test collections based on library search:
  ◦ Large, practically static collection of documents
  ◦ Queries based on unpredictable information needs
  ◦ Focus on 1-step (non-interactive) search
  ◦ Focus on ranking / retrieval effectiveness
  ◦ Focus on topical relevance
  ◦ Focus on triage search: locate documents, not relevant information
• Real world searching is far more complex


Some Efforts to Push Further
• Just a selection of workshops:
  ◦ Focused Retrieval (SIGIR'07/'08) – Find information directly!
  ◦ Future of IR Evaluation (SIGIR'09) – Measures, crowdsourcing, interaction
  ◦ Simulation of Interaction (SIGIR'10, IIiX'10) – Cranfield style evaluation of interaction
  ◦ Exploiting Semantic Annotations (CIKM'10/'11) – Modern annotations/structure
  ◦ Complex needs with simple queries (SIGIR'11) – Contextual Suggestions is new TREC'12 track!




INEX: Focused Retrieval
[Screenshot: Google results page for the query "wikipedia amsterdam", dated 26/05/2009. The top result is "Amsterdam - Wikipedia, the free encyclopedia", with a snippet beginning "Settled as a small fishing village in the late 12th century, ..."]
• A common experience on the Web:
  ◦ Say you want to find out when Amsterdam was founded
  ◦ You type your short query in the search box
  ◦ The first result has the answer in the snippet!
  ◦ You click the result ...


Where is my Answer?
[Screenshot: the English Wikipedia article "Amsterdam", shown at full length (lead, infobox, and the History, Geography and climate, Cityscape and architecture, Government, and Economy sections) to illustrate how long the page is]
• This is < 25% of the page
  ◦ Where is the answer now?
  ◦ Do I have to read through the whole page until I find it?
  ◦ Or go back and "copy" part of the snippet, so I can "search" for it?
• Wouldn't it be great if you could find information
  ◦ So from document retrieval to information retrieval proper
  ◦ Can we use the document structure to give more focused answers?

Information on the Web • Before the web there were two basic types of data ? Structured data, such as tables of names, numbers, or other fields ? Unstructured data, such as free text in ordinary language • Current data on the web fits neither of the two ? Most data contains both content (= raw text) and structure ? Think of HTML, XML, or semantic web languages • Web documents have properties of both unstructured and structured data

INEX XML Retrieval • But then there was INEX ? the INitiative for the Evaluation of XML retrieval ? founded in 2002 ? to develop test collections for XML retrieval • Typical use case ? Document-centric XML ? Long full text documents ? Shallow structure • Note: this is the dominant type of XML on the Web

INEX Assessments

How does XML retrieval differ from standard IR? • The documents differ ? here we can return arbitrary XML elements ? ranking becomes harder ? presentation becomes an additional issue • Also the queries may differ ? Document structure gives additional handles for searching ? That is, a Content-And-Structure (CAS) query ? Query language: explicit support for querying XML documents (XPath like)

Wikipedia Enriched with YAGO • YAGO http://www.mpi-inf.mpg.de/yago-naga/ ? Semantic knowledge base, derived from Wikipedia and WordNet ? More than 2 million entities (e.g. persons, organizations, cities) ? and 20 million facts about these entities • All documents have been automatically classified with YAGO concepts ? Added as <tags> at the article level, as well as to all links ? Will these tags be more helpful?

Queen with YAGO markup Queen (band) 42010 ... Freddie Mercury

CAS Queries using YAGO • This leads to natural CAS queries: ? Groups with Freddie Mercury //group[about(., Freddie Mercury)]

? Groups with Freddie Mercury as singer //group[about(.//singer, Freddie Mercury)]

? Albums from the group Queen //album[about(//group, Queen)]

? Find persons who work in physics and were born in Germany //person[about(//work,physics) and about(//born,Germany)]
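To make the shape of such Content-And-Structure queries concrete, here is a minimal Python sketch of how a query of the form //target[about(.//support, terms)] could be approximated. It is an illustration under loud assumptions, not the INEX query engine: the about() function below is a crude term-overlap score standing in for a real relevance model, the helper names (about, cas_query) are hypothetical, and the toy document uses made-up tag names rather than the actual YAGO markup.

```python
import xml.etree.ElementTree as ET


def about(element, terms):
    """Crude stand-in for about(): fraction of query terms found in the element's text."""
    text = " ".join(element.itertext()).lower()
    words = terms.lower().split()
    return sum(word in text for word in words) / len(words)


def cas_query(root, target_tag, support_tag, terms):
    """Approximate //target_tag[about(.//support_tag, terms)].

    With support_tag=None this approximates //target_tag[about(., terms)].
    """
    scored = []
    for target in root.iter(target_tag):
        supports = [target] if support_tag is None else list(target.iter(support_tag))
        score = max((about(s, terms) for s in supports), default=0.0)
        if score > 0:
            scored.append((score, target))
    # Rank matching target elements by their (assumed) about() score.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [target for _, target in scored]


# Hypothetical toy document; the tag names are illustrative only.
doc = ET.fromstring(
    "<article>"
    "<group><name>Queen</name><singer>Freddie Mercury</singer></group>"
    "<group><name>The Beatles</name><singer>John Lennon</singer></group>"
    "</article>"
)
hits = cas_query(doc, "group", "singer", "Freddie Mercury")
hit_names = [group.findtext("name") for group in hits]
print(hit_names)
```

The structural constraint decides which elements may be returned, while about() only orders them; a real system would treat both parts as vague hints rather than strict filters.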

New Tasks in INEX 2011 • Books and Social Search uses a collection of Amazon and LibraryThing data and looks at the relative value of authoritative metadata versus social tags and descriptions • Data Centric Track uses an XML’ified version of IMDB and looks at Faceted Search – submissions being a ranked list of facets and facet-values that optimize access to relevant material • Snippet Retrieval evaluates the generation of snippets with enough information to allow the user to determine the relevance of each document, without needing to view the document itself • QA Track focuses on tweet contextualization in a mobile scenario, giving a summary of a linked page in a tweet/post – judged on relevance and readability

Wrap Up: Modern IR Evaluation • IR evaluation is a means, not a goal in itself ? Need to be clear on why we do it! ? Need to cherish “classic” Cranfield – a very powerful abstraction ? But not be afraid to tackle new problems that don’t fit it well • INEX has always been innovating and reaching out ? Pioneered peer assessments (2002–) and crowdsourcing (2008–) ? Organizing workshops at SIGIR/CIKM ? Stimulated collaboration between the evaluation fora ? INEX tracks “moved” to CLEF, NTCIR, TREC ? In fact INEX will “merge” with CLEF in 2012!

Outline • Motivation • Modern IR Evaluation ? Pushing the Boundaries of Cranfield ? INEX: Focused Retrieval • Evaluation in Context ? Contextual Evaluation ? Crowdsourcing • Conclusions

Contextual Evaluation • Context may mean anything... ? Focus here on the searcher’s context ? and in particular on the context of those involved in test collection building • Three questions: ? (1) Can we capture the context of humans involved in test collection building? ? (2) Can we tailor IR evaluation to a specific collection and user group? ? (3) What is the impact of less controlled conditions of crowdsourcing platforms?

Topic/Assessor context Questionnaires • Since 2007 INEX has collected extensive contextual data during topic creation and assessment ? Using dedicated questionnaires to elicit information ? issued directly after topic creation and assessment • Two questionnaires: ? on topic intent and expectations ? on assessor preferences for the topic and her/his judging style

Main idea • Record the “context” of the humans already in the loop—the topic authors/assessors—by designing targeted questionnaires. ? The questionnaire data becomes part of the evaluation test-suite as valuable data on the context of the search requests. ? This can help explain and control some of the user or topic variation in the test collection. ? Moreover, it allows us to break the set of topics down into meaningful categories, e.g. those that suit a particular task scenario, and zoom in on the relative performance for such a group of topics. • So opening up “Cranfield’s box” makes the test collection more useful!
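As a rough illustration of the "break down and zoom in" idea, the following Python sketch groups per-topic scores by the answer given to one questionnaire item and reports the mean score per system within each group. Everything here is hypothetical: the run names, the scores, the answers, and the function name breakdown are made up for illustration; only the item label B4 ("Are you looking for very specific information?") is taken from the candidate topic questionnaire.

```python
from collections import defaultdict
from statistics import mean


def breakdown(per_topic_scores, topic_answers, question):
    """Mean score per system within each answer category of one questionnaire item."""
    grouped = defaultdict(lambda: defaultdict(list))
    for system, scores in per_topic_scores.items():
        for topic, score in scores.items():
            category = topic_answers[topic][question]
            grouped[category][system].append(score)
    return {
        category: {system: mean(values) for system, values in systems.items()}
        for category, systems in grouped.items()
    }


# Hypothetical per-topic scores (e.g. average precision) for two runs.
per_topic_scores = {
    "run_A": {"t1": 0.50, "t2": 0.10, "t3": 0.30},
    "run_B": {"t1": 0.20, "t2": 0.40, "t3": 0.30},
}
# Hypothetical questionnaire answers per topic.
topic_answers = {
    "t1": {"B4": "yes"},
    "t2": {"B4": "no"},
    "t3": {"B4": "yes"},
}
table = breakdown(per_topic_scores, topic_answers, "B4")
for category, systems in table.items():
    print(category, systems)
```

In this toy data the two runs swap places between the "yes" and "no" groups, which is the kind of effect an overall average would hide.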

Candidate Topic Questionnaire

B1 How familiar are you with the subject matter of the topic?
B2 Would you search for this topic in real-life?
B3 Does your query differ from what you would type in a web search engine?
B4 Are you looking for very specific information?
B5 Are you interested in reading a lot of relevant information on the topic?
B6 Could the topic be satisfied by combining the information in different (parts of) documents?
B7 Is the topic based on a seen relevant (part of a) document?
B8 Can information of equal relevance to the topic be found in several documents?
B9 How many articles in the whole collection do you expect to contain relevant information?
B10 Approximately how many relevant document parts do you expect in the whole collection?
B11 Could a relevant result be (check all that apply): a single sentence; a single paragraph; a single (sub)section; a whole article
B12 Can the topic be completely satisfied by a single relevant result?
B13 Is there additional value in reading several relevant results?
B14 Is there additional value in knowing all relevant results?
B15 Would you prefer seeing: only the best results; all relevant results; don’t know
B16 Would you prefer seeing: isolated document parts; the article’s context; don’t know
B17 Do you assume perfect knowledge of the DTD?
B18 Do you assume that the structure of at least one relevant result is known?
B19 Do you assume that references to the document structure are vague and imprecise?

Post Assessment Questionnaire

C1 Did you submit this topic to INEX?
C2 How familiar were you with the subject matter of the topic?
C3 How hard was it to decide whether information was relevant?
C4 Is Wikipedia an obvious source to look for information on the topic?
C5 Can a highlighted passage be (check all that apply): a single sentence; a single paragraph; a single (sub)section; a whole article
C6 Is a single highlighted passage enough to answer the topic?
C7 Are highlighted passages still informative when presented out of context?
C8 How often does relevant information occur in an article about something else?
C9 How well does the total length of highlighted text correspond to the usefulness of an article?
C10 Which of the following two strategies is closer to your actual highlighting: (I) I located useful articles and highlighted the best passages and nothing more, (II) I highlighted all text relevant according to narrative, even if this meant highlighting an entire article.
C11 Can a best entry point be (check all that apply): the start of a highlighted passage; the sectioning structure containing the highlighted text; the start of the article
C12 Does the best entry point correspond to the best passage?
C13 Does the best entry point correspond to the first passage?
C14 Comments or suggestions on any of the above (optional)

Main Results • Only based on initial analysis ? Diversity of responses is large – larger than expected! – with some questions revealing radically different topic types and/or assessor strategies ? Impact of topic context on resulting evaluation isn’t very large ? Impact of assessor context is much more prominent ? MAP is robust – too robust? – so less change than expected ? Interesting swaps in top 10s (nDCG, P@10, ...) ? Data is not explored much, so there may be hidden treasures...

Search Log Based Approach to Evaluation • Anyone offering information is interested in assessing its performance ? How well does my system satisfy the users’ information needs ? Standard benchmarks are not representative for the unique content and user population of a domain-specific collection • Why not use readily available interaction data in search logs to evaluate the domain specific search directly? ? That is: we can create a domain specific test collection tailored to the case at hand ? And use and reuse it under the same experimental conditions

Log Based Test Collection

Query (Topic)               File        Session ID                             #
burgerlijke stand suriname  1.05.11.16  504d2bbe246d877bda09856ecc300612.5     28
burgerlijke stand suriname  1.05.11.16  212de7cab1c3709be3a95ac1a37a7873.1     6
burgerlijke stand suriname  3.223.06    22fe3a65b0c9223280f2dd576c57a012.35    1
burgerlijke stand suriname  1.05.11.16  2b844140ef7cfd438300da7ec6278de0.147   1
burgerlijke stand suriname  2.05.65.01  3784a93938e29a6aef8f50baa845a6f3.1     1
burgerlijke stand suriname  1.05.11.16  8b21ec51722f3a52cfaf35d320dfacb0.3     1
burgerlijke stand suriname  1.05.11.16  212de7cab1c3709be3a95ac1a37a7873.2     1
burgerlijke stand suriname  1.05.11.16  9235756a6dbdcffba9179d75108cd220.433   1
burgerlijke stand suriname  3.231.07    3c34072bef0d505467ca9394c392888d.2     1

• E.g., consider the query “civil registry of Suriname” ? 9 sessions from 8 IPs ? 6 sessions consult 1.05.11.16, with 38 clicks in total ? So collection 1.05.11.16 seems relevant for this query!

Two Archival Test Collections • We extracted a (naive) log based test collection ? 50K unique queries plus associated clicks ? Interpret a click as a pseudo relevance judgment ? Recorded how many clicks for graded measures • We built a traditional test collection ? We obtained reference email questions to the archive ? plus responses by the archivists ? Selected 73 “topics” with associated relevant archival collections • Validation: basically the same system rankings with both evaluations
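The naive log-based extraction can be sketched in a few lines of Python: every click on a file for a query counts as a pseudo relevance judgment, and the number of clicks is kept as a grade. This is a minimal sketch, not the actual extraction pipeline. The file identifiers and click counts are the ones shown for the query "burgerlijke stand suriname"; the session labels s1 to s9 are placeholders for the session IDs, and the function name qrels_from_log is hypothetical.

```python
def qrels_from_log(rows):
    """Sum clicks per (query, file) and use the total as a graded pseudo relevance judgment."""
    qrels = {}
    for query, file_id, _session, clicks in rows:
        per_query = qrels.setdefault(query, {})
        per_query[file_id] = per_query.get(file_id, 0) + clicks
    return qrels


QUERY = "burgerlijke stand suriname"
# (query, file, session placeholder, number of clicks)
log_rows = [
    (QUERY, "1.05.11.16", "s1", 28),
    (QUERY, "1.05.11.16", "s2", 6),
    (QUERY, "3.223.06", "s3", 1),
    (QUERY, "1.05.11.16", "s4", 1),
    (QUERY, "2.05.65.01", "s5", 1),
    (QUERY, "1.05.11.16", "s6", 1),
    (QUERY, "1.05.11.16", "s7", 1),
    (QUERY, "1.05.11.16", "s8", 1),
    (QUERY, "3.231.07", "s9", 1),
]
qrels = qrels_from_log(log_rows)
print(qrels[QUERY])
```

For binary measures any file with at least one click would count as relevant; graded measures can use the click totals directly, or a thresholded version of them.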

IR Evaluation in Context • Can we use interaction data in the log files to contextually evaluate the IR effectiveness, specifically for novice and expert users? • We distinguish two groups: ? “Experts”: Frequent visitors (10+ sessions) ? “Novices”: 1-Time visitors (1 session) • Reasonable approximation ? Frequent visitors will have some degree of experience ? 1-Time visitors predominantly casual visitors of the web site ? No particular claim on the exact expertise of each group
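The two groups can be derived from the log with a simple session count per visitor. The sketch below assumes, hypothetically, that the log can be reduced to (visitor, session) pairs; the thresholds (10 or more sessions, exactly 1 session) are the ones stated above, while the function name and toy data are illustrative only.

```python
from collections import defaultdict


def split_visitors(visits, expert_min=10):
    """visits: iterable of (visitor_id, session_id) pairs. Returns (experts, novices)."""
    sessions = defaultdict(set)
    for visitor, session in visits:
        sessions[visitor].add(session)
    experts = {v for v, seen in sessions.items() if len(seen) >= expert_min}
    novices = {v for v, seen in sessions.items() if len(seen) == 1}
    return experts, novices


# Hypothetical log: visitor "a" has 10 sessions, "b" has 1, "c" has 3.
visits = [("a", n) for n in range(10)] + [("b", 0)] + [("c", n) for n in range(3)]
experts, novices = split_visitors(visits)
print(experts, novices)
```

Visitors with 2 to 9 sessions fall into neither group, which keeps the two populations clearly separated.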

Comparing Test Collections • Two sets of topics/qrels (1 month, Jan 2009) ? “Novices”: 1,388 topics with 1,775 relevant archival descriptions ? “Experts”: 1,701 topics with 3,053 relevant archival descriptions • So we will vary the topics and qrels, and see if the system rankings will differ ? Again, on MAP, results are quite stable... ? Some interesting upsets: feedback helps novices but not experts • Ongoing work on evaluating interactive information access – based on simulated and observed navigation patterns
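One way to check whether system rankings differ between two sets of qrels is to score every system under both and correlate the resulting orderings. The sketch below uses MAP and Kendall's tau; the choice of tau is an assumption of this illustration, since the slide only states that the rankings are compared. All run and qrel data are made up.

```python
from itertools import combinations


def average_precision(ranking, relevant):
    """AP of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    """run: {topic: ranked ids}; qrels: {topic: set of relevant ids}."""
    return sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dicts over the same systems."""
    pairs = list(combinations(scores_a, 2))
    balance = 0
    for s, t in pairs:
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        balance += (product > 0) - (product < 0)  # +1 concordant, -1 discordant, 0 tie
    return balance / len(pairs)


# Hypothetical runs and two hypothetical qrel sets for the same topic.
runs = {
    "sys1": {"q1": ["d1", "d2", "d3"]},
    "sys2": {"q1": ["d3", "d1", "d2"]},
    "sys3": {"q1": ["d2", "d3", "d1"]},
}
novice_qrels = {"q1": {"d1"}}
expert_qrels = {"q1": {"d1", "d3"}}
novice_map = {name: mean_average_precision(run, novice_qrels) for name, run in runs.items()}
expert_map = {name: mean_average_precision(run, expert_qrels) for name, run in runs.items()}
tau = kendall_tau(novice_map, expert_map)
print(novice_map, expert_map, tau)
```

A tau close to 1 means both groups would pick the same winners; pairs of systems that swap order between the groups are the "upsets" worth inspecting.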

Outline • Motivation • Modern IR Evaluation ? Pushing the Boundaries of Cranfield ? INEX: Focused Retrieval • Evaluation in Context ? Contextual Evaluation ? Crowdsourcing • Conclusions

Crowdsourcing Relevance Assessments

• Evaluation is mainly “relevance assessments” ? A human computation task suitable for crowdsourcing platforms

Crowdsourcing: INEX Book Track (2008–) • Platforms (AMT, Crowdflower) offer ? Judgments at “any” scale ? Very fast: days/weeks, not a year ? Relatively low costs • Uncontrolled conditions ? Many workers ? Due to $: Malicious workers ? Relevance is subjective (70-80%) • Shown to work as well as editorial judgments

Impact on Evaluation (1)

[Diagram labels: Document, Document, Document; Assessor. Annotation: 1. Assessor/Topic variation]

• In traditional test collections the assessor/topic variation is large ? Larger than the system variation we want to measure! ? So, create “optimal” conditions to elicit relevance judgments

HIT Design: SD versus FD

[Figure: Accuracy (0.0 to 1.0) for the SD and FD designs.]

• Two designs: Full Design implementing quality control elements and Simple Design – with FD clearly superior to SD!

Impact on Evaluation (2)

[Diagram labels: Document, Document, Document; Continue? (yes/no); Assessor. Annotations: 1. Assessor/Topic variation; 2. Distribution of work]

• In crowdsourcing, work is divided over many assessors ? Not 1 topic per judge, but many judges per topic ? Highly skewed: most do the minimal amount, some do a lot! ? How does this impact the resulting test collection?
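With many judges per topic, the individual labels have to be combined into one judgment per document before they can serve as qrels. A common baseline, assumed here purely for illustration and not claimed to be the method used in the work presented, is a majority vote with the agreement ratio kept alongside. The worker, topic, and document identifiers are made up.

```python
from collections import Counter, defaultdict


def majority_labels(judgments):
    """judgments: iterable of (worker, topic, doc, label).

    Returns {(topic, doc): (majority label, fraction of votes agreeing with it)}.
    On a tie, the label that was seen first wins.
    """
    votes = defaultdict(list)
    for _worker, topic, doc, label in judgments:
        votes[(topic, doc)].append(label)
    consensus = {}
    for key, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        consensus[key] = (label, count / len(labels))
    return consensus


# Hypothetical judgments: three workers on one document, one worker on another.
judgments = [
    ("w1", "t1", "d1", 1),
    ("w2", "t1", "d1", 1),
    ("w3", "t1", "d1", 0),
    ("w1", "t1", "d2", 0),
]
consensus = majority_labels(judgments)
print(consensus)
```

Because the work distribution is so skewed, a few prolific workers contribute to many of these votes, so their individual accuracy matters more than that of the one-HIT majority.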

Impact of Assessor/Worker Distributions (cont’d)

[Figure: Density (Workers) against Number of HITs (axis ticks at 2, 5, 10, 20, 50), with series for All data, SD, and FD.]

• Work distribution is very skewed: 133 out of 263 workers do 1 HIT, 1 worker does 86...

Impact of Assessor/Worker Distributions (cont’d)

[Figure, two panels: Accuracy (0.0 to 1.0) against #HITs (0 to 80); Accuracy (0.0 to 1.0) against binned #HITs, with recovered bin labels (0,8], (8,16], (16,24], (32,40].]

• Impact of # of HITs isn’t very large (on average)

Impact on Evaluation (3)

[Diagram labels: Population?; Document, Document, Document; Continue? (yes/no); Assessor. Annotations: 1. Assessor/Topic variation; 2. Distribution of work; 3. Selection of workers]

• Different task conditions attract different types of assessors ? That is, our judges may differ in a fundamental way! ? This can have a massive impact on the resulting test collection...

Vol 466|1 July 2010

OPINION Most people are not WEIRD

Much research on human behaviour and psychology assumes that everyone shares most fundamental cognitive and affective processes, and that findings from one population apply across the board. A growing body of evidence suggests that this is not the case. Experimental findings from several disciplines indicate considerable variation among human populations in diverse domains, such as visual perception, analytic reasoning, fairness, cooperation, memory and the heritability of IQ1,2. This is in line with what anthropologists have long suggested: that people from Western, educated, industrialized, rich and democratic (WEIRD) societies — and particularly American undergraduates — are some of the most psychologically unusual people on Earth1. So the fact that the vast majority of studies use WEIRD participants presents a challenge to the understanding of human psychology and behaviour. A 2008 survey of the top psychology journals found that 96% of subjects were from Western industrialized countries — which house just 12% of the world’s population3. Strange, then, that research articles routinely assume that their results are broadly representative, rarely adding even a cautionary footnote on how far their findings can be generalized. The evidence that basic cognitive and motivational processes vary across populations has become increasingly difficult to ignore. For example, many studies have shown that Americans, Canadians and western Europeans rely on analytical reasoning strategies — which separate objects from their contexts and rely on rules to explain and predict behaviour — substantially more than non-Westerners. Research also indicates that Americans use analytical thinking more than, say, Europeans. By contrast, Asians tend to reason holistically, for example by considering people’s behaviour in terms of their situation1. Yet many long-standing theories of how humans perceive, categorize and remember emphasize the centrality of analytical thought.
It is a similar story with social behaviour related to fairness and equality. Here, researchers often use one-shot economic experiments such as the ultimatum game, in which a player decides how much of a fixed amount to offer a second player, who can then accept or reject this proposal. If the second player rejects it, neither player gets anything. Participants from industrialized societies tend to divide the money equally, and reject low offers. People from non-industrialized societies behave differently, especially in the smallest-scale nonmarket societies such as foragers in Africa and horticulturalists in South America, where people are neither inclined to make equal offers nor to punish those who make low offers4.

Recent developments in evolutionary biology, neuroscience and related fields suggest that these differences stem from the way in which populations have adapted to diverse culturally constructed environments. Amazonian groups, such as the Piraha, whose languages do not include numerals above three, are worse at distinguishing large quantities digitally than groups using extensive counting systems, but are similar in their ability to approximate quantities. This suggests the kind of counting system people grow up with influences how they think about integers1.

Costly generalizations

Using study participants from one unusual population could have important practical consequences. For example, economists have been developing theories of decision-making incorporating insights from psychology and social science — such as how to set wages — and examining how these might translate into policy5. Researchers and policy-makers should recognize that populations vary considerably in the extent to which they display certain biases, patterns and preferences in economic decisions, such as those related to optimism1. Such differences can, for example,

affect the way that experienced investors make decisions about the stock market6. We offer four suggestions to help put theories of human behaviour and psychology on a firmer empirical footing. First, editors and reviewers should push researchers to support any generalizations with evidence. Second, granting agencies, reviewers and editors should give researchers credit for comparing diverse and inconvenient subject pools. Third, granting agencies should prioritize cross-disciplinary, cross-cultural research. Fourth, researchers must strive to evaluate how their findings apply to other populations. There are several low-cost ways to approach this in the short term: one is to select a few judiciously chosen populations that provide a ‘tough test’ of universality in some domain, such as societies with limited counting systems for testing theories about numerical cognition1,2. A crucial longer-term goal is to establish a set of principles that researchers can use to distinguish variable from universal aspects of psychology. Establishing such principles will remain difficult until behavioural scientists develop interdisciplinary, international research networks for long-term studies on diverse populations using an array of methods, from experimental techniques and ethnography to brain-imaging and biomarkers. Recognizing the full extent of human diversity does not mean giving up on the quest to understand human nature. To the contrary, this recognition illuminates a journey into human nature that is more exciting, more complex, and ultimately more consequential than has previously been suspected ■ Joseph Henrich, Steven J. Heine and Ara Norenzayan are in the Department of Psychology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada. Joseph Henrich is also in the Department of Economics. e-mail: [email protected]

1. Henrich, J., Heine, S. J. & Norenzayan, A. Behav. Brain Sci. doi:10.1017/S0140525X0999152X (2010). 2. Henrich, J., Heine, S. J. & Norenzayan, A. Behav. Brain Sci. doi:10.1017/S0140525X10000725 (2010). 3. Arnett, J. Am. Psychol. 63, 602–614 (2008). 4. Henrich, J. et al. Science 327, 1480–1484 (2010). 5. Foote, C. L., Goette, L. & Meier, S. Policymaking Insights from Behavioral Economics (Federal Reserve Bank of Boston, 2009). 6. Ji, L. J., Zhang, Z. Y. & Guo, T. Y. J. Behav. Decis. Making 21, 399–413 (2008).

To understand human psychology, behavioural scientists must stop doing most of their experiments on Westerners, argue Joseph Henrich, Steven J. Heine and Ara Norenzayan.

Personality: Big Five Index

• Personality test: big five index (OCEAN): Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism

Personality (cont’d)

[Figure, two panels: scores (1 to 5) for Open., Cons., Extr., Agree., Neur.; Accuracy (0.0 to 1.0) against Conscientiousness (1 to 5).]

• Openness, Conscientiousness, Agreeableness ∝ Accuracy ? Extraversion, Neuroticism lead to lower Accuracy

Personality (cont’d)

[Figure, two panels (SD and FD): scores (1 to 5) for Open., Cons., Extr., Agree., Neur.]

• FD somewhat higher Openness, Conscientiousness, Agreeableness, lower Extraversion

Personality and Demographics (cont’d)

[Figure, two panels (India and US): scores (1 to 5) for Open., Cons., Extr., Agree., Neur.]

• Our US workers have personality traits that are favorable for the task, relative to those from India – will this affect their work?

Worker Stereotypes

           Spammer  Sloppy  Incompetent  Competent  Diligent
%Rel       Lo       Hi      Hi           Hi         Hi
AvgTime    –        Lo      Hi           Lo         Hi
Accuracy   –        Lo      Lo           Hi         Hi
Workers    51       38      34           59         81

[Figure: Accuracy (0.0 to 1.0) per stereotype: spammer, sloppy, incomp, comp, diligent.]

• Simple binning of workers based on the fraction of useful labels, time (relative to design/conditions), and accuracy
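The binning can be read as a small decision rule over three Lo/Hi indicators. The Python sketch below follows the Lo/Hi pattern shown for the five stereotypes; the numeric cut-offs (useful_min, time_min, accuracy_min) are placeholder assumptions, since the slide does not give the thresholds, and avg_time is assumed to be already normalised relative to the design/conditions.

```python
def stereotype(frac_useful, avg_time, accuracy, *,
               useful_min=0.5, time_min=1.0, accuracy_min=0.5):
    """Bin one worker into a stereotype from three Lo/Hi indicators.

    The thresholds are hypothetical placeholders, not values from the study.
    """
    if frac_useful < useful_min:
        return "spammer"  # low fraction of useful labels; time and accuracy ignored
    hi_time = avg_time >= time_min
    hi_accuracy = accuracy >= accuracy_min
    if hi_accuracy:
        return "diligent" if hi_time else "competent"
    return "incompetent" if hi_time else "sloppy"


# Hypothetical workers: (fraction useful, relative time, accuracy).
examples = {
    "w1": (0.2, 5.0, 0.9),
    "w2": (0.9, 0.5, 0.3),
    "w3": (0.9, 2.0, 0.3),
    "w4": (0.9, 0.5, 0.8),
    "w5": (0.9, 2.0, 0.8),
}
labels = {worker: stereotype(*features) for worker, features in examples.items()}
print(labels)
```

The rule separates workers who are fast and wrong (sloppy) from those who take their time and are still wrong (incompetent), which matters when deciding whether better instructions or stricter filtering is the right remedy.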

Worker Stereotypes (2)

[Figure, two panels over the stereotypes spammer, sloppy, incomp, comp, diligent, with SD and FD series: one with axis ticks from 0.00 to 0.35, one showing #HITs (axis ticks from 1 to 100).]

• Conditions select a different type of workers!

Self-selection of Workers (cont’d)

[Figure, two panels. Recoverable labels: regions America, Asia, Europe with SD and FD series; Accuracy for India.SD, US.SD, India.FD, US.FD.]

• In fact, really distinct population in terms of background!

Self-selection of Workers (cont’d)

[Plot left out.]

• In terms of the stereotypes, we see a striking difference between the US and Indian workers (for this task, and condition) ? Use location as a qualifying requirement ? Address Indian workers with tailored conditions

WrapUp: Crowdsourcing

             Df   Sum Sq  Mean Sq  F value     Pr(>F)
Design        1   1.3187  1.31873  17.2435  4.876e-05 ***
hits          1   0.0020  0.00204   0.0266     0.8705
regionusin    1   2.1615  2.16149  28.2634  2.830e-07 ***
Residuals   199  15.2189  0.07648
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

• Crowdsourcing conditions complicate relevance assessments ? Three interrelated aspects 1. Task conditions of the assessment tool 2. Work distributions amongst assessors 3. Self-selected population of assessors ? Need to control who does the work ? Complicates statistical analysis of the results...
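As a quick consistency check on the analysis-of-variance output on this slide, each F value should equal the term's mean square divided by the residual mean square. The short Python sketch below recomputes them from the printed numbers; it does not refit the model, and the p-values are not recomputed.

```python
# Values copied from the ANOVA output on this slide: term -> (Df, Mean Sq).
terms = {
    "Design": (1, 1.31873),
    "hits": (1, 0.00204),
    "regionusin": (1, 2.16149),
}
residual_df, residual_sum_sq = 199, 15.2189

# Residual mean square = residual sum of squares / residual degrees of freedom.
residual_mean_sq = residual_sum_sq / residual_df
f_values = {name: mean_sq / residual_mean_sq for name, (_df, mean_sq) in terms.items()}

for name, f_value in f_values.items():
    print(f"{name:<12} F = {f_value:.4f}")
```

The recomputed values agree with the printed F column up to rounding: the design and the region term account for far more variance in accuracy than the number of HITs a worker completed.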

Outline • Motivation • Modern IR Evaluation ? Pushing the Boundaries of Cranfield ? INEX: Focused Retrieval • Evaluation in Context ? Contextual Evaluation ? Crowdsourcing • Conclusions

What Have We Done Today? • IR Evaluation in Context ? Looked at the context of humans involved in test collection building – the “messy” part of Cranfield ? Capturing context using questionnaires useful for analysis and reuse ? Tailoring test collections to particular use and users is important – certainly for professional search ? Crowdsourcing is a powerful tool but must be used with extreme care ? MAP should be used, but not exclusively! • Many thanks to Gabriella Kazai and MSRC for the crowdsourcing work!

References

J. Henrich, S. J. Heine, and A. Norenzayan. Most people are not WEIRD. Nature, 466(7302):29, 2010.

J. Kamps, M. Lalmas, and B. Larsen. Evaluation in context. In M. Agosti, J. Borbinha, S. Kapidakis, C. Papatheodorou, and G. Tsakonas, editors, Proceedings of the 13th European Conferences on Digital Libraries (ECDL 2009), volume 5714 of LNCS, pages 339–351. Springer Verlag, Berlin, Heidelberg, 2009.

G. Kazai, J. Kamps, M. Koolen, and N. Milic-Frayling. Crowdsourcing for book search evaluation: Impact of hit design on comparative system ranking. In Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York NY, 2011a.

G. Kazai, J. Kamps, M. Koolen, and N. Milic-Frayling. Worker types and personality traits in crowdsourcing relevance labels. In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011). ACM Press, New York NY, 2011b.

J. Zhang and J. Kamps. A search log-based approach to evaluation. In M. Lalmas, J. Jose, A. Rauber, F. Sebastiani, and I. Frommholz, editors, Proceedings of the 14th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2010), volume 6273 of LNCS, pages 248–260. Springer Verlag, Berlin, Heidelberg, 2010a.

J. Zhang and J. Kamps. Search log analysis of user stereotypes, information seeking behavior, and contextual evaluation. In N. J. Belkin and D. Kelly, editors, Proceedings of the 3rd Symposium on Information Interaction in Context (IIiX 2010), pages 245–254. ACM Press, New York, 2010b.

Shameless Self-promotion... • INEX 2011 Workshop ? Shlomo Geva, Ralf Schenkel and myself ? December 12–14 in Saarbruecken – Still in time to participate this summer! ? https://inex.mmci.uni-saarland.de/

• Information Interaction in Context (IIiX) 2012 ? Wessel Kraaij and myself ? August 21–24 in Nijmegen – Papers due early April! ? http://iiix2012.org/

• INEX/CLEF 2012 ? Shlomo Geva, Ralf Schenkel and myself ? September 17–20 in Rome – New cycle start early next year! ? https://inex.mmci.uni-saarland.de/

Job Openings at the University of Amsterdam • Large project on Web Archive Retrieval Tools http://nwo.nl/catch/webart/ ? Fully funded Postdoc (3 yrs), PhD (4 yrs), Programmer (4 yrs) ? Collaborate with National Library and New Media researchers ? Programmer: Hadoop version of MonetDB/Pathfinder ? PhD: interactive complex query construction/result exploration ? Postdoc: Link to use-case of Web (Archive) research(ers)

• Project on Search with Structured Data (INEX++) ? Postdoc for 1.5 (or 2) years ? Work on connecting IR/DB/Web approaches to searching structured/linked data
