Human. Genetic Diversity Joint Report


S. J. Mack A. Sanchez-Mazas D. Meyer R. M. Single Y. Tsai H. A. Erlich

Authors’ addresses

Steven J. Mack1,2, Alicia Sanchez-Mazas3, Diogo Meyer4, Richard M. Single5, Yingssu Tsai3, Henry A. Erlich1,2

1 Children’s Hospital Oakland Research Institute, Oakland, CA

2 Department of Human Genetics, Roche Molecular Systems, Alameda, CA

3 Laboratory of Anthropology, Genetics and Peopling History, Department of Anthropology and Ecology, University of Geneva, Switzerland

4 Department of Integrative Biology, University of California, Berkeley, CA

5 Department of Medical Biostatistics, University of Vermont, Burlington, VT

Acknowledgements

This work was supported by NIH shared resource grant U24 AI49213 and by FNS (Switzerland) grant #3100-49771.96. We wish to thank the following IHWG AHGDC participants for their contribution of data to the IHWG: D. Adorno, S. Agrawal, T. Akesaka, A. Arnaiz-Villena, N. Bendukidze, C. Brautbar, T. L. Bugawan, J. Cervantes, M. Crawford, E. Donadi, S. Easteal, H. A. Erlich, M. Fernandez-Vina, X. Gao, E. Gazit, C. Gorodezky, M. Hammond, E. Ivaskova, A. Kastelan, V. I. Konenkov, M. H. S. Kraemer, D. Kumashiro, R. Lang, Z. Layrise, M. S. Leffel, M. Lin, M. L. Lokki, L. Louie, M. Luo, S. J. Mack, M. Martinetti, J. McCluskey, N. Mehra, D. Middleton, E. Naumova, Y. Paik, M. H. Park, M. L. Petzl-Erler, A. Sanchez-Mazas, G. Saruhan, M. L. Sartakova, M. Schroeder, U. Shankarkumar, S. Sonoda, J. Tang, E. Thorsby, J. M. Tiercy, K. Tokunaga, E. Trachtenberg, V. Trieu An, B. Vidan-Jeras, and Y. Zaretskaya. In addition, we thank D. Gjertson, J. Hollenbach, A. Sanchez-Mazas, S. Tonks, and E. Trachtenberg for providing access to 12th Workshop datasets.

HLA 2004: Immunobiology of the Human MHC. Proceedings of the 13th International Histocompatibility Workshop and Congress


13th International Histocompatibility Workshop Anthropology/Human Genetic Diversity Joint Report Chapter 2. Methods used in the generation and preparation of data for analysis in the 13th International Histocompatibility Workshop

Section 1. Description of datasets

From 1998 through 2002, 39 participating laboratories submitted 110 population datasets, representing 13,481 sampled individuals, to the International Histocompatibility Working Group (IHWG) Anthropology/Human Genetic Diversity Component (AHGDC) for analysis as part of the 13th International Histocompatibility Workshop (13W). For the most part, these datasets represent high-resolution to allele-level genotyping data at subsets of the HLA-A, C, B, DRB1, DQA1, DQB1, DPA1 and DPB1 loci. Analyses for 95 of these population samples (hereafter referred to as 13W datasets, described in Table 1 and provided in Appendix C), representing 12,225 individuals, are presented in subsequent chapters (4–7). The remaining 15 population samples were typed at serological to medium-level resolution, or were missing information, and were excluded from analysis. Table 2 describes a set of 48 supplementary datasets, originally genotyped as part of the 12th International Workshop (12W) Anthropology Component and representing 5,774 sampled individuals, that were included in these analyses (although the results of analyses including these datasets are not always reported here), as well as an additional 20 12W datasets that were included in the linguistics-related analyses presented in chapter 7. These 12W datasets were chosen to supplement global regions (see below and Figure 1) for which low numbers of 13W datasets were analyzed. The “map number” assigned to each population in the 12W Anthropology Report (1) is provided in Table 2 for clarification (12W#). The number of individuals typed per locus for each of the 48 12W and 95 13W datasets is described in Table 3. The 13W datasets ranged in size (n) from 12 to 1000 sampled individuals, with a median value of 98, while n for the 48 12W datasets ranged from 15 to 1012, with a median value of 82.
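The dataset counts above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the workshop's analysis pipeline: the constants are the counts quoted in the paragraph, the number of individuals in the excluded datasets is obtained only by subtraction (it is not reported in the text), and `example_sizes` is a hypothetical list of sample sizes used purely to illustrate how a range and median are summarized.

```python
from statistics import median

# Counts quoted in the paragraph above.
SUBMITTED_DATASETS = 110
SUBMITTED_INDIVIDUALS = 13_481
ANALYZED_DATASETS = 95
ANALYZED_INDIVIDUALS = 12_225


def excluded_totals():
    """Return (datasets, individuals) not carried into analysis, by subtraction."""
    return (
        SUBMITTED_DATASETS - ANALYZED_DATASETS,
        SUBMITTED_INDIVIDUALS - ANALYZED_INDIVIDUALS,
    )


def summarize_sizes(sizes):
    """Return (smallest, largest, median) for a list of per-dataset sample sizes."""
    return min(sizes), max(sizes), median(sizes)


# Hypothetical sample sizes, for illustration only (not taken from Table 3).
example_sizes = [12, 50, 98, 149, 1000]

if __name__ == "__main__":
    print(excluded_totals())
    print(summarize_sizes(example_sizes))
```

The subtraction reproduces the 15 excluded population samples stated in the text; applying `summarize_sizes` to the actual per-dataset totals would be one way to confirm the reported ranges and medians.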


Mack et al · C2. Methods used in the generation and preparation of data for analysis in the 13th International Histocompatibility Workshop

Table 1. Description of 13W population datasets (n = 95 populations)

Region

Population dataset name

Labcode

01.SS-Africa

Kenyan_142

CANLUO

12th IHWS submission1 No. samples Country/Region Kenya/different areas

01.SS-Africa

Mandenka

CHETIE

YES

01.SS-Africa 01.SS-Africa

01.SS-Africa

Shona Dogon Kenyan-Lowlander (Luo) Kenyan-Highlander (Nandi) Ugandan Zambian North American (Afr_descent)

USAMFV

01.SS-Africa

Rwandan

USATNG

01.SS-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa

Zulu Algerian_98 Moroccan_94 Moroccan_99 Chaouya Metalsa

ZAFHAM CHESAN CHESAN ESPARN ITAADO ITAADO

03.Europe

Bulgarian_Gipsy

BGRNAU

03.Europe 03.Europe 03.Europe

Czech Georgian Finn_89

CZEIVS CZEIVS FINLOK

03.Europe

Croatian

HRVKAS

03.Europe

Slovenian

SVNJER

03.Europe

UKIMID

03.Europe

Irish North American (Eur_descent) Cuban_(Eur_ descent)

04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia

01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa

-1.3

36.8 -12

1

25.9 -3.6

Niger-Congo – Central – Shona 2 Niger-Congo – Central – Dogon 1

-18 14.4

USAMFV

Kenya/Kanyawegi

-0.11

34.63 Nilo-Saharan – Nilotic

2

USAMFV USAMFV USAMFV

Kenya/Kipsamoite Uganda/Kampala Zambia/Lusaka USA (American Red Cross)

35.01 Nilo-Saharan – Nilotic 32.4 Mainly Niger-Congo 28.2 Niger-Congo – Central – South Indo-European – Germanic – English Niger-Congo – Central – South -1.6 30.5 – Kinyarwanda (= Rwanda) Niger-Congo – Central – South -30 31 – Zulu 35.5 -0.4 Afro-Asiatic – Berber 30.3 9.35 Afro-Asiatic – Berber 33.2 -8.5 Afro-Asiatic – Semitic – Arabic 33.04 -7.37 Afro-Asiatic – Semitic – Arabic 35.3 -4 Afro-Asiatic – Berber – Tarifit Indo-European – Slavic – 42.7 23.3 Bulgarian

2 3 3

79 235 501

Bulgaria/Sofia Czech Republic/ Praha Georgia/Tbilisi Finland

0.33 0.2 -15

14.3 44.9 25.1

Croatia/Zagreb

45.2

15.5

46

14

USAMFV

Slovenia/Ljubljana Northern Ireland/all regions USA/American Red Cross

UKIMID

Cuba/Havana

21.5

Kurdish Druze Israeli_Jew Turk

CZEIVS ISRBRA ISRGAZ TURSAR

41.7 32 32 41

44.9 34.5 34.5 28.6

04.SW-Asia

Omani

UKIMID

Georgia/Tbilisi Israel Israel Turkey/Marmara Oman/various regions

21

57

04.SW-Asia

New_Dehli

USAERL

28.6

77.2

04.SW-Asia

South_Indian

USAERL

17.5

78.5

04.SW-Asia

Tamil

ZAFHAM

05.SE-Asia

Ami

TWNLIN

05.SE-Asia

Atayal

TWNLIN

05.SE-Asia

Bunun

TWNLIN

05.SE-Asia 05.SE-Asia

Hakka Minnan

TWNLIN TWNLIN

India/New Delhi India/Andhra Pradesh, Golla South Africa/ Durban Taiwan/Hualien, Taitung Taiwan/Wulai, Chenshih, Wufen Taiwan/Hsin-I, Taitung Taiwan/Hsinchu, Pintung Taiwan/Taipei

05.SE-Asia

Paiwan_51

TWNLIN

05.SE-Asia

Pazeh

TWNLIN

05.SE-Asia 05.SE-Asia

Puyuma_49 Rukai

TWNLIN TWNLIN

05.SE-Asia

Saisiat

TWNLIN

05.SE-Asia

Siraya

TWNLIN

2

Complexity

Mainly Niger-Congo Niger-Congo – Mande – Mandenka

50 41.7 60.2

03.Europe

Linguistic family/Language

USALOU USAMFV

Rwanda/Kigali South Africa/ Durban Algeria/Oran Morocco, Souss Morocco/El Jadida Morocco/Settat Morocco/Nador

12.6

Long3

Senegal/Kédougou Zimbabwe/Mashonaland Mali/Bandiagara

YES YES YES

122

Lat2

YES

YES

21

89

54.7

-6.7

-80

Indo-European – Slavic – Czech South Caucasian – Georgian Uralic-Yukaghir – Uralic – Finnish Indo-European – Slavic – Serbo-Croatian Indo-European – Balto-Slavic – Slovene Indo-European – Germanic – English Indo-European – Germanic – English Indo-European – Italic – Spanish Indo-European – Indo-Iranian – Kurdish Afro-Asiatic – Semitic – Arabic Afro-Asiatic – Semitic – Hebrew Altaic – Turkic – Turkish

2

3 3 2 2 2 2 2 2 2 2 2 3 3 2 3 3 3 2 3 3 2

31

Afro-Asiatic – Semitic – Arabic Indo-European – Indo-Iranian – Indic Elamo-Dravidian – Dravidian – South Central – Telugu Elamo-Dravidian – Dravidian – South – Tamil

25.1

122

Austronesian – Paiwanic – Ami

1

24.9

122

Austronesian – Atayal

1

23.6

121

1

24.8 25.1

121 122

Taiwan/Lai-I Taiwan/Puli, Liyutan, Fengyuan

22.5

121

24

121

Taiwan/Peinan Taiwan/Wutai Taiwan/Wufen, Nanchuang Taiwan/Tanei, Tsochen

22.8 22.8

121 121

24.6

121

23.1

120

Austronesian – Paiwanic – Bunun Sino-Tibetan – Chinese southwest+central – Cantonese Sino-Tibetan – Chinese southeast Austronesian – Paiwanic – Paiwan Austronesian – Paiwanic Sinicized – Pazeh Austronesian – Paiwanic – Puyuma Austronesian – Tsouic – Rukai Austronesian – Paiwanic – Saisiyat Austronesian – Paiwanic Sinicized – Siraya


-30

3 3 2 3

3 3 1 3 1 1 1 3


Table 1. Continued

Region

Population dataset name

Labcode

05.SE-Asia

Tao (Yami)

TWNLIN

05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia

Thao Toroko Tsou Han_Chinese_149 Han_Chinese_572

05.SE-Asia

Malay Singapore_(Chin_ descent)

05.SE-Asia 05.SE-Asia

12th IHWS submission1 No. samples Country/Region

Lat2

Long3

Taiwan/Lan-Yu

22

122

TWNLIN TWNLIN TWNLIN UKIMID UKIMID

Taiwan/Yuchih Taiwan/Hsiulin Taiwan/Tapang Singapore Hong-Kong

23.9 23 23.5 1.28 22.2

121 120 121 104 114

USAERL

Malay Peninsula

3

103

USAERL

Singapore

1.28

104

12

100

23.6

113

21

106

Linguistic family/Language Austronesian – Western Malayo-Polynesian – Yami Austronesian – Paiwanic Sinicized – Thao Austronesian – Atayal Austronesian – Tsouic – Tsou Sino-Tibetan – Chinese Sino-Tibetan – Chinese Austronesian – Western Malayo-Polynesian – Sundic – Malay

Complexity 1 3 1 1 3 3 2

Sino-Tibetan – Chinese 3 Tai-Kadai – Tai-Sek – Southwestern Tai – Chiang Saeng 3 Indo-European – Germanic – English 3

05.SE-Asia

Thai USAERL North American (Asi_descent) USAMFV

05.SE-Asia

Chinese

USATRA

Thailand USA (American Red Cross) China/South, Canton region

05.SE-Asia

Kinh

VNMVTA

Viet Nam/Hanoi

05.SE-Asia

Muong

VNMVTA

105

06.Oceania

Ivatan

TWNLIN

06.Oceania

East_Timorese

USAERL

Viet Nam/Hoa Binh 20.8 Philippines/Batan Island, Baso 20.5 Indonesia/East Timor/Nusa Tenggara -9

06.Oceania 06.Oceania

Filipino Indonesian

USAERL USAERL

Philippines/Manila Indonesia

14.6 -3

121 120

06.Oceania

Moluccan

USAERL

-2

127

06.Oceania

PNG_Highlander

USAERL

-5

145

06.Oceania

PNG_Lowlander_ 48

USAERL

-10

147

Indo-Pacific

06.Oceania

PNG_Lowlander_ 95

USAERL

Indonesia/Moluccas Melanesia/New Guinea, Highlands Melanesia/New Guinea, Lowlands, many areas Melanesia/New Guinea, Lowlands, Wosera

Sino-Tibetan – Chinese Austroasiatic – Mon-Khmer – Vietnamese Austroasiatic – Mon-Khmer – Muong Austronesian – Extra-Formosan – Proto-Filipino – Ivatan Austronesian – Central Malayo-Polynesian – Flores-Lembata Austronesian – Western Malayo-Polynesian – Tagalog Austronesian – Malayo-Polynesian Austronesian – Central Malayo-Polynesian – Southwest Maluku Indo-Pacific – Trans-New Guinea – East New Guinea Highlands

-5

150

06.Oceania

Samoa

USARWL

Indo-Pacific – Sepik-Ramu – Sepik – Middle Sepik – Ndu 2 Austronesian – Eastern Malayo-Polynesian – Oceanic – Samoan 2

07.Australia

Australian_Cape_ York

USAGAO

Melanesia/Samoa Australia/Queensland/Cape York Australia/Northern Territory/Groote Eylandt Australia/Western Australia/Kimberley Australia/Northern Territory/Yuendumu

14.2

122 125

-171

-13

143

-14

137

-17

127

-24

132

07.Australia

Australian_Groote_ Eylandt USAGAO Australian_ Kimberley USAGAO Australian_ Yuendumu USAGAO

08.NE-Asia

Okinawan

USAPAK

Hawaii/Honolulu

26

128

08.NE-Asia 08.NE-Asia

Ryukuan Buriat

JPNTKN JPNTKN

Japan/Okinawa Mongolia/Angarsk

26.4 47.6

128 119

08.NE-Asia

Korean

KORPMH

37.6

127

08.NE-Asia 09.N-America

Tuva Lacandon

USAERL MEXGOR

Korea/Seoul Russia/NovosibirskKyzyl Mexico/Chiapas

09.N-America

Seri

MEXGOR

09.N-America

Canoncito

USAERL

09.N-America

Maya

USAERL

09.N-America 09.N-America

Pima_17 Pima_99

09.N-America

07.Australia 07.Australia

YES

6

50 16.7

95 -91

Mexico/Isla Tiburon USA/Arizona, Grand Canyon

29

-112

36.1

-112

20

-90

USAERL USAERL

Mexico/Yucatan USA/Arizona, Gila River USA/Arizona

33 33

-113 -112

Sioux

USAERL

USA/South Dakota

43.6

-97

09.N-America 09.N-America

Zuni Yupik

USAERL USALEF

35 60

-107 -160

09.N-America

Amerindian

USAMFV

USA/New Mexico Alaska/South USA/American Red Cross

Australian – Pama-Nyungan Australian – non-Pama-Nyungan – Anindhilyaguan Australian – non-Pama-Nyungan – Wororan and Nyulnyulan Australian – Pama-Nyungan – Ngargan Altaic – Korean-Japanese – Ryukyuan Altaic – Korean-Japanese – Ryukyuan – Amami-Okinawan Altaic – Mongol Altaic – Korean-Japanese – Korean

3 3 2 1 2 2 3 2 2 2

2 1 2 2 2 1 1 2

Altaic – Turkic – Tuvinian Amerind – Maya – Lacandon Amerind – North Amerind – Hokan – Seri Na-Dene – Athapascan – Navajo – Canoncito Amerind – North Amerind – Penutian – Mayan Amerind – Central Amerind – Uto-Aztecan Amerind – North Amerind Amerind – North Amerind – Almosan-Keresiouan – Dakota Amerind – North Amerind – Penutian – Zuni Eskimo-Aleut – Eskimo – Yupik

2 1

Amerind

3


1 1 2 1 1 2 1 2

3


Table 1. Continued

12th IHWS submission1 No. samples Country/Region Lat2 Brazil/Mato Grosso do Sul, Amambai, Limão Verde -23 Brazil/Mato Grosso do Sul, Porto Lindo, Amambai, -24

Region

Population dataset name

Labcode

10.S-America

Guarani-Kaiowa

BRAPTZ

10.S-America

Guarani-Nandeva

BRAPTZ

10.S-America

Ticuna

USAERL

Brazil/Tabatinga

10.S-America

Central American

USAERL

10.S-America

Bari Brazilian_(Afr-Eur_ descent) Mexican

VENLAY

Costa Rica/Panama 3 Venezuela/Saimadodyi & Campo Rosario 9.8 Brazil/Ribeirao Preto -23 Mexico/Mexico City 19.4 Brazil/Belo Horizonte -10

-55

21.5

-80

11.Other 11.Other 11.Other

BRADON MEXGOR

11.Other Brazilian UKIMID 11.Other Cuban_(Afr-Eur_descent) UKIMID Cuba/Havana 11.Other North American (His_descent) USAMFV USA (American Red Cross)

1 These same samples were also reported in the 12th IHWS (see reference #1).

2 LAT indicates latitude in degrees north or south.

3 LONG indicates longitude in degrees east or west.
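The latitude and longitude columns of Table 1 hold signed decimal degrees. A minimal sketch of how such values could be rendered with explicit hemisphere letters follows. It assumes that negative latitudes are south and negative longitudes are west; this convention is inferred from the table values (for example, the Durban entries) rather than stated in the footnotes, and `format_coordinate` is a hypothetical helper name.

```python
def format_coordinate(value, axis):
    """Render a signed decimal degree with a hemisphere letter.

    axis is 'lat' or 'long'. Assumption (not stated in the table
    footnotes): negative latitude = south, negative longitude = west.
    """
    if axis not in ("lat", "long"):
        raise ValueError("axis must be 'lat' or 'long'")
    positive, negative = ("N", "S") if axis == "lat" else ("E", "W")
    hemisphere = negative if value < 0 else positive
    # ':g' drops trailing zeros, so 30.0 prints as '30' and 9.35 as '9.35'.
    return f"{abs(value):g} {hemisphere}"


if __name__ == "__main__":
    # Values that appear to belong to the Zulu (South Africa/Durban) row.
    print(format_coordinate(-30, "lat"), format_coordinate(31, "long"))
```

Keeping the sign in the stored value and adding the hemisphere letter only at display time keeps the stored coordinates usable for distance calculations.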

-5

Complexity

Long3

Linguistic family/Language

-55

Amerind – Equatorial-Tucanoan – Tupi-Guarani – Guarani 1

-55 -70 -65 -73 -48 -99

Amerind – Equatorial-Tucanoan – Tupi-Guarani – Guarani 1 Amerind – Equatorial-Tucanoan – Ticuna 1 Amerind – Chibcan-Paezan – Chibchan 1 Amerind – Chibcan-Paezan – Chibchan – Bari (Motilon) Indo-European – Italic – Portuguese Indo-European – Italic – Spanish Indo-European – Italic – Portuguese

1 3 2 3

Indo-European – Italic – Spanish 3 Indo-European – Germanic – English 3

The mean number of loci genotyped in the 13W datasets was 3.9, while the mean for the 48 12W datasets was 3.6. Many of the population samples submitted as 13W datasets had been previously typed as part of the 11th International Workshop (11W) or 12W, primarily at class II loci. Submitting laboratories were encouraged to use IHWG methods (see section A, HLA Typing and Informatics) for new typing. In cases where participating laboratories were unable to carry out molecular-level typing, genotyping was accomplished by a second laboratory. All laboratories using IHWG reagents were required to type a subset of the IHWG Quality Control (QC) cell panel at 92% accuracy before new genotyping data could be accepted for analysis.

Data that had been generated before the start of the IHWG, or that had been generated using non-IHWG methods, were classed as non-qualified data (or “Available Data”) and were accepted in the form of four-digit (relatively unambiguous) genotype assignments. Data generated with IHWG reagents were submitted in the form of probe-reactivity patterns, formatted using either the RLS software or the IHWG Virtual DNA Analysis (VDA) component’s SCORE software (Section Joint R, Virtual DNA Analysis Report). In these cases, alleles and genotypes were inferred by the software. Approximately 53% (50/95) of the IHWG datasets had been typed subsequent to the 12W and were accepted as non-qualified data. Of the remaining 45 datasets, 41 were typed at class I loci using IHWG RLS reagents, 3 using IHWG SSOP reagents, and 1 by sequencing-based typing (SBT) (these typing methods are described in the Technology joint report, sections A.2, A.3, and A.5). In many instances, the results of these methods were verified by SBT. Class II typing in many of these datasets was carried out using 12W or local reagents. Datasets typed using local reagents were submitted in a format similar to non-qualified data (relatively unambiguous assignments).

Thanks to the efforts of the submitting laboratories, complete background information (especially geographic and linguistic information) is available for most population samples. Where well-defined geographic information was not available, the latitude and longitude of a nearby locality or of the capital city of the country was used. Linguistic assignments were based on information provided by each laboratory when available (see below); when no such information was available, either Ruhlen’s classification scheme for linguistic families (2) or the Ethnologue (3) was consulted. There were a few cases where the broad linguistic family could not be specified with certainty. For example, “Kenyans” (i.e., the Kenyan_142 sample) and “Ugandans” may include Afro-Asiatic and Nilo-Saharan speakers, in addition to Niger-Congo speakers. These populations were classified as “Mainly Niger-Congo” based on the proportion of Niger-Congo speakers in these nations when compared to speakers of Afro-Asiatic and Nilo-Saharan languages (as shown in Table 4 (3)). In other cases, only a broad linguistic characterization was possible (e.g., “Indo-Pacific”). A summary of these data is provided in Table 5, which describes the number of 13W populations that correspond to linguistic families in each geographic region


Table 2A. Description of 12W datasets supplementing all analyses (n = 48 populations)

Region

12W labcode

12W#

Country/Region

01.SS-Africa 01.SS-Africa 01.SS-Africa

Population Name Colombian (Afr_descent) Amhara Baganda

12trachtenberg ITAGDS USALOU

* 155 226

Colombia Ethiopia/Arssi Uganda/Kampala

01.SS-Africa 02.N-Africa

Mukongo Egyptian

BELDPT 12ferencik

*

02.N-Africa 02.N-Africa 03.Europe 03.Europe 03.Europe

Algerian_100 Bedouin Italian North_Italian Finn_143

FRAMER EGYELC 12ferrara ITAFER FINTII

55 171 * 45 29

03.Europe

Hvar_Island_Croatian

CROKAS

20

03.Europe 03.Europe

Krk_Island_Croatian Polish

CRORUD FRADDC

19 156

03.Europe

Pomaki

GRESTV

164

03.Europe 03.Europe 03.Europe 03.Europe 04.SW-Asia

Provincial_French Spanish_100 Spanish_133 Spanish_Basque Sri_Lankan

FRADDC SPAARN SPALAR SPABER 12hashemi

229 104 82 120 *

04.SW-Asia

Zoroastrian

12hashemi

*

04.SW-Asia 04.SW-Asia

North_Indian Ashkenazi_Jews

12ferencik ISRBRA

* 116

04.SW-Asia 04.SW-Asia 04.SW-Asia

Hunza-Burushaski Libyan_Jews Moroccan_Jews

PAKQAS ISRBRA ISRBRA

214 153 117

04.SW-Asia

Sindhi

PAKQAS

213

05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia

South_Han Taiwanese Ami_14 Paiwan_64

12johnlee 12johnlee JAPSEK JAPSEK

* * 243 241

05.SE-Asia 05.SE-Asia

Puyuma_15 Thai-Chinese

JAPSEK THACHI

08.NE-Asia

Japanese

12juji

08.NE-Asia

Japanese_Kobe

12araki

47

08.NE-Asia 08.NE-Asia

Halkh Han

JAPTSU JAPINK

184 137

08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia

Hoton Kazakh Korean Mongolian

JAPTSU JAPINK JAPJUJ JAPJUJ

112 135 240 16

08.NE-Asia 08.NE-Asia

Tuvinian Uygur

RUSKON JAPINK

09.N-America

Mixe

09.N-America 09.N-America 10.S-America 10.S-America

1

Long.

Linguistic family/Language

4.34 7.58 0

-74 39 32.5

-4.2 30.1

15.2 31.1

Indo-European – Italic – Spanish Afro-Asiatic – Semitic – Amharic Niger-Congo – Bantu – Luganda Niger-Congo – Bantu – Kikongo, Lingala, Tsheluba Afro-Asiatic – Semitic – Arabic

36.5 31.2 41.5 45.7 65

3 27.2 12.3 9.7 25.3

Croatia/Hvar

43.1

16.3

Croatia/Krk Poland Greece/Northern_ Xanthis France/Ile_de_ France Spain Spain Spain India/Sri Lanka

45 52.3

Zaire/Kinshasa Egypt Algeria/Algiers_ area Egypt/Siwa Italy Italy/Bergamo Finland/Oulu

Canada India/Uttar Pradesh, Lucknow Israel Pakistan/ North:Gilgit Israel Israel

Lat.

Complexity 2 2 2 2 3

14.5 16.5

Afro-Asiatic – Semitic – Arabic Afro-Asiatic – Semitic – Arabic Indo-European – Italic – Italian Indo-European – Italic – Italian Uralic-Yukaghir – Finnic – Finnish Indo-European – Slavic – Croatian Indo-European – Slavic – Croatian Indo-European – Slavic – Polish

3 3

41.1

24.6

Indo-European – Greek

2

48 40.3 40.3 43 6.7

0 -3.4 -3.4 -2 79.9

3 3 3 2 2

45.3

-73

Indo-European – Italic – French Indo-European – Italic – Spanish Indo-European – Italic – Spanish Basque Elamo-Dravidian – Dravidian Indo-European – Germanic – English Indo-European – Indo-Iranian – Indic Afro-Asiatic – Semitic – Hebrew

2 3 2

27.5 31.5

82 35.1

36.2 31.5 31.5

74.4 35.1 35.1

24.5

67

Burushaski Afro-Asiatic – Semitic – Hebrew Afro-Asiatic – Semitic – Hebrew Indo-European – Indo-Iranian – Indic – Sindhi

3 2 3 2 3 3

3 2 3

24.5 23 22.5 23.5

118 120 121 121

Sino-Tibetan – Chinese Sino-Tibetan – Chinese Austronesian – Paiwanic Austronesian – Paiwanic

3 3 1 1

245 85

Pakistan/Sindh China/Fujian, Xiamen China/Fujian Taiwan/East Taiwan/South Taiwan/South_ West Thailand/Centre

22 15

121 100

1 3

*

Japan

35.3

140

34

135

3

47.5 43.4

107 87.4

Altaic – Mongolian Sino-Tibetan – Sinitic – Chinese

2 3

48 48 46 47.5

92 68 127 107

Altaic Altaic Altaic Altaic

Mongolian Turkic – Kazakh Korean Mongolian

2 3 3 3

236 17

Japan/Kobe Mongolia/ Ulaanbaatar China/North Mongolia/Uvs_ Aimag Russia/Kazakhstan China/Heilongjiang Mongolia Russia/Tuvinian/ several_Regions China

Austronesian – Paiwanic Sino-Tibetan – Chinese Altaic – Korean-Japanese – Japanese Altaic – Korean-Japanese – Japanese

51 43.4

95 87.4

1 2

USAKLI

209

Mexico/Oaxaca

17.1

-97

Mixteca

USAKLI

211

Mexico/Oaxaca

17.1

-97

Zapotec Colombian Ecuadorian

USAKLI 12trachtenberg 12trachtenberg

210 * *

Mexico/Oaxaca Colombia Ecuador

17.1 4.34 -0.2

-97 -74 -78

Venezuela/Zulia

10

-73

Altaic – Turkic – Tuva Altaic – Turkic – Uygur Amerind – North Amerind – Penutian – Mixe Amerind – Central Amerind – Oto-Manguean – Mixtec Amerind – Central Amerind – Oto-Manguean – Zapotec Indo-European – Italic – Spanish Indo-European – Italic – Spanish Amerind – Ge-Pano-Carib – Yupan

10.S-America Yukpa VENLAY 185

* No 12W map number, data provided by corresponding laboratory.

– – – –


2

3

1 1 1 1 1 1



Table 2B. Description of 12W datasets supplementing linguistics-related analyses (n = 20 populations)

Region 01.SS-Africa 01.SS-Africa

Population Name Merina Oromo

12W# 220 154

Country/Region Madagascar/Central Highlands Ethiopia/Arssi

Lat. -18 7.58

Long. 47.2 39

01.SS-Africa 02.N-Africa 02.N-Africa 02.N-Africa 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 04.SW-Asia

Zairian Egyptian_Copts Egyptian_Delta Mzab Belgian Bulgarian French_North Greek_Attiki Italian_Pavia Portuguese_Coimbra Portuguese_South Sardinian Swiss Lebanese

234 221 109 3 158 119 61 165 42 195 65 46 196 145

Zaire/several regions Egypt (in USA) Egypt/Delta Algeria/South Sahara Belgium/Namur & Luxembourg Bulgaria France/Northern Greece/Attiki Italy/Pavia Portugal/Coimbra Portugal/South Italy/Sardinia Switzerland/Geneva Lebanon

-4.2 30 31 32.2 50.3 42.4 50.4 38 45.1 40.1 39 40 46.2 33.5

15.2 31 31.3 3.4 4.52 23.2 3.05 25.3 9.09 -8.3 -9 9 6.1 35.3

04.SW-Asia 06.Oceania 08.NE-Asia 10.S-America

Punjabi Trobriand Manchu Kaingang

38 111 15 9

India/Punjab Melanesia/Papua New Guinea/islands China/Heilongjiang Brazil/South: Parana

28.4 -8.3 45.2 -25

77.1 151 126 -52

(defined in section 2.II below and shown in Figure 1). In addition, detailed descriptions of each 13W population (summarizing history, sampling and genotyping methods, and

Figure 1. Boundaries for global regions.


Linguistic Family/Language Austronesian – Malagasy Afro-Asiatic – Cushitic – Oromo Niger-Congo – Bantu – Kikongo, Lingala, Tsheluba Afro-Asiatic – Semitic – Arabic Afro-Asiatic – Semitic – Arabic Afro-Asiatic – Berber – Mzab Indo-European – Italic – French Indo-European – Slavic – Bulgarian Indo-European – Italic – French Indo-European – Greek Indo-European – Italic – Italian Indo-European – Italic – Portuguese Indo-European – Italic – Portuguese Indo-European – Italic – Sardinian Indo-European – Italic – French Afro-Asiatic – Semitic – Arabic Indo-European – Indo-Iranian – Indic – Punjabi Austronesian – Kilivila Altaic – Tungus – Manchu Amerind – Ge-Pano-Carib – Kaingang

preliminary analyses) are included in the following chapter (Chapter 3, Short Population Reports). The report for each 13W population is referenced by the 13W# in the table of contents.
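The QC requirement described earlier in this section (typing a subset of the QC cell panel at 92% accuracy before data were accepted) can be illustrated with a short sketch. The text does not say how accuracy was scored, so the per-cell, exact-genotype comparison below is an assumption, and the function names, cell identifiers and allele names are hypothetical.

```python
QC_THRESHOLD = 0.92  # accuracy level quoted in the text


def typing_accuracy(submitted, reference):
    """Fraction of reference QC cells whose submitted genotype matches.

    Both arguments map a cell ID to a pair of allele names. Allele order
    within a genotype is ignored. A cell missing from `submitted` counts
    as incorrect. Scoring per cell is an assumption of this sketch.
    """
    if not reference:
        raise ValueError("reference panel is empty")
    correct = sum(
        1
        for cell, alleles in reference.items()
        if cell in submitted and sorted(submitted[cell]) == sorted(alleles)
    )
    return correct / len(reference)


def passes_qc(submitted, reference, threshold=QC_THRESHOLD):
    """True when the laboratory's accuracy meets or exceeds the threshold."""
    return typing_accuracy(submitted, reference) >= threshold


if __name__ == "__main__":
    # Hypothetical 25-cell panel typed at a single locus.
    reference = {f"cell{i}": ("A*0101", "A*0201") for i in range(25)}
    submitted = dict(reference)
    submitted["cell0"] = ("A*0101", "A*0301")  # one discordant call
    print(typing_accuracy(submitted, reference), passes_qc(submitted, reference))
```

Scoring per locus or per allele instead of per cell would give different accuracy values for the same data, so the unit of comparison would need to be fixed before applying any threshold.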


Table 3. Number of individuals typed at each locus in the 12W and 13W datasets (total of 163 populations)

13W# 1 2

Region 01.SS-Africa 01.SS-Africa

3 4 5 6 7 8 9

01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa

159 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57

01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe

134 58 59 60 61 62 63 64 65 66 67 68 69 70 71

03.Europe 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia

160 72

05.SE-Asia 05.SE-Asia

Population Name Dogon Kenyan_Lowlander (Luo) Kenyan_Highlander (Nandi) Kenyan_142 Mandenka Rwandan Shona Ugandan Zambian North American (Afr_descent) Zulu Amhara Baganda Merina Mukongo Oromo Zairian Algerian_98 Chaouya Metalsa Moroccan_94 Moroccan_99 Algerian_100 Bedouin Egyptian Egyptian_Copts Egyptian_Delta Libyan_Jews Moroccan_Jews Mzab Bulgarian_Gipsy Croatian Czech Finn_89 Georgian Irish Slovenian Belgian Bulgarian Finn_143 French_North Greek_Attiki Hvar_Island_Croatian Italian Italian_Pavia Krk_Island_Croatian North_Italian Polish Pomaki Portuguese_Coimbra Portuguese_South Provincial_French Ashkenazi_Jews Sardinian Spanish_100 Spanish_133 Spanish_Basque Swiss North American (Eur_descent) Druze Israeli_Jew Kurdish New_Dehli Omani South_Indian Tamil Turk Hunza-Burushaski Lebanese North_Indian Punjabi Sindhi Sri_Lankan North American (Asi_descent) Ami

Labcode USAMFV USAMFV

IHWC 13th 13th

USAMFV CANLUO CHETIE USATNG USALOU USAMFV USAMFV

13th 13th 13th 13th 13th 13th 13th

USAMFV ZAFHAM ITAGDS USALOU FRADAN BELDPT ITAGDS FRAKPL CHESAN ITAADO ITAADO CHESAN ESPARN FRAMER EGYELC 12ferencik USAYUN EGYELC ISRBRA ISRBRA FRATHO BRGNAU HRVKAS CZEIVS FINLOK CZEIVS UKIMID SVNJER BELOSS BULNAU FINTII FRADAN GRESTV CROKAS 12ferrara ITAMRA CRORUD 12ferrara FRADDC GRESTV PORLTC PORCHS FRADDC ISRBRA ITACON SPAARN SPALAR SPABER SUIJEA

13th 13th

13th 13th 13th 13th 13th

13th 13th 13th 13th 13th 13th 13th

12W# A C B DRB1 138 129 138 138 265 265 265

12th

12th 12th 12th 12th 12th 12th 12th

13th 13th 13th 13th 13th 13th 13th 13th 13th

501

12th 12th 12th 12th 12th 12th 12th 12th

55 171 221 109 153 117 3

12th

21

USAMFV TWNLIN

13th 13th

158 119 29 61 165 20 26 42 19 101 156 164 195 65 229 116 46 104 82 120 196

89 214 145 38 213

240 143 94

226 163 45

226 161 44

255 199

252 98

251 201

155 226 220 1 154 234 235

12th

12th 12th 12th 12th 12th 12th 12th

240 143 54

225 163 43

DQB1

DPA1

119 84 280 229

113

129

123

280 229

229

88

89 98

87 98

DPB1

85 228

87

26 160 30

20 82 106 99

99

98 95

98 96

101 79

98 78

40 40

40 40

12 11 11 150 150 139 105 106 106 104 90 90 90 90 105 107 108 1000 1000 1000 1000 100 131

132

67 72

12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th

USAMFV ISRBRA ISRGAZ CZEIVS USAERL UKIMID USAERL ZAFHAM TURSAR PAKQAS USABIA 12ferencik INDRAN PAKQAS 12hashemi

122

241 113 93

DQA1

68 68

99 99 98 100

40 40

39 39

40 40

79 40 103 40 40 107

79

105 35

106 30

102

100

100

100

96 104 50 92 102 101 99 100 219 110 244 40

96

100 38 42 234

97 101

40

101

40

101

40

106 97

106 97

104 101 98 100

104 101 99 100

224 40 80 100 57

224 40 80 100 125 165

100 126 158

100 108 39

68 88

297 100 117 30 66 121 88 50

292 100 94 29 56 121 104 48

287 100 109 29 66 109 49

46

46

46

245

245

120

120

126 120

118 51

39

39

39 57

411 98

401 98

396 98

98


Table 3. Continued

13W# 73 74 75 76 77 78 79 80 81 82 84 85 86 87 89

Region 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia

90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113

05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 07.Australia

114 115 116 83 88 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 135 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151

07.Australia 07.Australia 07.Australia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 10.S-America 10.S-America 10.S-America 10.S-America 10.S-America 10.S-America


Population Name Atayal Bunun Chinese Hakka Han_Chinese_149 Han_Chinese_572 Kinh Malay Minnan Muong Paiwan_51 Pazeh Puyuma_49 Rukai Saisiat Singapore_ (Chinese_descent) Siraya Thai Thao Toroko Tsou Yami (Tao) Ami_14 Paiwan_64 Puyuma_15 South_Han Taiwanese Thai-Chinese East_Timorese Filipino Indonesian Ivatan Moluccan PNG_Highlander PNG_Lowlander_48 PNG_Lowlander_95 Samoa Trobriand Australian_Cape_York Australian_Groote_ Eylandt Australian_Kimberley Australian_Yuendumu Okinawan Ryukuan Buriat Korean Tuva Halkh Han Hoton Japanese Japanese_Kobe Kazakh Korean Manchu Mongolian Tuvinian Uygur Amerindian Canoncito Lacandon Maya Mixe Mixteco Pima_17 Pima_99 Seri Sioux Yupik Zapotec Zuni Bari Brazilian Guarani-Kaiowa Guarani-Nandeva Ticuna Central American

Labcode TWNLIN TWNLIN USATRA TWNLIN UKIMID UKIMID VNHNAN USAERL TWNLIN VNHNAN TWNLIN TWNLIN TWNLIN TWNLIN TWNLIN

IHWC 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th

USAERL TWNLIN USAERL TWNLIN TWNLIN TWNLIN TWNLIN JAPSEK JAPSEK JAPSEK 12johnlee 12johnlee THACHI USAERL USAERL USAERL TWNLIN USAERL USAERL USAERL USAERL USARWL GERNAG USAGAO

13th 13th 13th 13th 13th 13th 13th

USAGAO USAGAO USAGAO USAPAK JPNTKN JPNTKN KORPMH USAERL JAPTSU JAPINK JAPTSU 12juji 12araki JAPINK JAPJUJ JAPJUJ JAPJUJ RUSKON JAPINK USAMFV USAERL MEXGOR USAERL Hollenbach Hollenbach USAERL USAERL MEXGOR USAERL USALEF Hollenbach USAERL VENLAY UKIMID BRAPTZ BRAPTZ USAERL USAERL

12W . A C B DRB1 106 106 106 106 101 101 101 101 282 281 282 55 55 55 55 149 149 572 572 102 124 107 101 54 102 102 102 102 83 51 51 51 51 55 55 55 55 50 50 50 50 50 50 50 50 51 51 51 51 86 51 98 30 55 51 50 12th 12th 12th 12th 12th 12th

243 241 245 162 1012 85

13th 13th 13th 13th 13th 13th 13th 13th 13th

86 51 92 30 55 51 50

51 99 30 55 51 50

162 162 162 199 1011 1012 42 57 94 94 94 50 50 49 50 50 50 25 92 77

DQA1

DQB1

DPA1

100 55

53

83

51 30 55 51 50 15 65 16 162 74 86 94

86

86 94

86 94

86 94

46 91 48

46 78 48

46 88 48

83

46 92 48 93

50 40 90

79 50

50

50

13th

103

89

100

99

99

99

96

13th 13th 13th 13th 13th 13th 13th 13th

75 36 191 105 142 140 191 189

73 28 192 105

75 38 193 104

41 190

41

41

38

200 174

200 180

12th

12th

12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th

111

6

184 137 112 608 47 135 240 15 16 236 17

13th 13th 13th 13th

79

608 32

608 32

32

73

67

199

61

34

203

52 52

13th 13th 13th 13th 13th 12th 13th 13th 13th 13th 13th 13th 13th


72

116

199 189 40 57 85

41 57 84

41 57 84

41 57 85

30 39

32 39

30 39

39

66

66

66

65

40 162 15 52 52

40 162 15 52 52

40 162 15 52 52 97

40

160 191 257

12th 12th

DPB1

248

235

52 51 86

52 52 89

52 52

33

33

33

252 76

149 71

252 74

92 97 144 53

86 95 144 53

82 144 53

33 96 252 76 50

33 96 58 76 50

25 96 252 72 50

144 53 49 55

144 53 49 55

49 55

15 95 83 50

49 55


Table 3. Continued

13W   Region         Population name                 Labcode
152   10.S-America   Colombian                       12trachtenberg
153   10.S-America   Ecuadorian                      12trachtenberg
154   10.S-America   Kaingang                        BRAPTZ
155   10.S-America   Yukpa                           VENLAY
156   11.Other       Brazilian_(Afr-Eur_descent)     BRADON
136   11.Other       Mexican                         MEXGOR
157   11.Other       Cuban_(Afr-Eur_descent)         UKIMID
158   11.Other       Cuban_(Eur_descent)             UKIMID
161   11.Other       North American (His_descent)    USAMFV
162   11.Other       Colombian (Afr_descent)         12trachtenberg
163   11.Other       Zoroastrian                     12hashemi

12W . A

IHWC

13th 13th

99 62

106 29

13th 13th

42 70

42 70

13th

247

246

C

B

DRB1

12th 12th 12th 12th

12th 12th

As shown in Table 5, Southeast Asia is the best-represented region (26 population samples) and is largely represented by Aboriginal populations from Taiwan. By contrast, very few populations have been typed in Northeast Asia (only 3 population samples). As a consequence of this geographic distribution, the Austronesian and Amerindian linguistic families are the best represented (20 and 14 population samples, respectively), followed by Indo-European languages (16 populations). The Indo-European family is the most widespread linguistic group in the 13W dataset, with speakers in the Europe, South-West Asia, North America and Other regions. Some linguistic groups are absent, as is the case for Afroasiatic and Khoisan in sub-Saharan Africa.

In summary, the 13W dataset represents a considerable contribution to the pool of populations tested for HLA polymorphism at class I and class II loci using high-resolution methods. A large number of populations (95) have been tested, approximately 42% of which are from Southeast Asia and North/Central America. Moreover, contextual and historical background information regarding each population sample accompanies each dataset in the form of a short report. This information is crucial for the proper interpretation of the resulting genetic analyses. At the same time, it must be noted that these 13W population samples have not all been typed for the same HLA loci, although similar numbers of samples have been tested for some class I and class II loci (as shown in Table 6). Even when the 68 12W datasets are included, analyses are limited to some 50–90 populations per locus (e.g., Chapter C.7). Future studies should encourage additional molecular-level typing of the same samples in these populations, with the goal of obtaining genotypes at all HLA loci in all sampled individuals. Finally, the population sample sizes are too low in many cases

9 185 69 40

DQA1

DQB1

DPA1

217

217

217

217

99 28 73

99 87 73

99 101 70

99

100 204

204

204

70 54

70 54

70

DPB1

240 70

(fewer than 40 individuals in some populations) to permit multi-locus analyses. As the number of distinct alleles increases with each report of the WHO Committee for Nomenclature for Factors of the HLA System, the problems presented by low sample sizes are compounded. As shown in Table 6, the average sample size at the class I and DRB1 loci is greater than 100 individuals, but this number is considerably lower than the number of currently distinguishable alleles. Because large numbers of HLA alleles are common in human populations, the frequencies of most alleles are low (<5%), and there is a good chance that low-frequency alleles will not be detected when sample sizes are small. In addition, statistical tests performed on poorly sampled populations have low power and are not reliable. This is true at the single-locus level, and may be even more dramatic at the multi-locus level, where the estimation of haplotype frequencies and tests of linkage disequilibrium depend on accurate sampling of allelic diversity. Finally, the method for reducing genotype ambiguity (described in section 2.I, below) is dependent on the observation of alleles in a number of different genotypes, the likelihood of which is proportional to the size of the population sample. In general, an effort must be made to sample at least 100 individuals per population (e.g., see Sanchez-Mazas 2002 (5)).
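The effect of sample size on the detection of low-frequency alleles can be illustrated with a short calculation. The sketch below is not part of the workshop methods; it assumes that the 2n chromosomes in a sample of n individuals are drawn independently from the population, and the function name is hypothetical.

```python
def prob_allele_missed(freq: float, n_individuals: int) -> float:
    """Probability that an allele with population frequency `freq` is absent
    from a sample of n diploid individuals, assuming the 2n sampled
    chromosomes are independent draws (an assumption of this sketch)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    return (1.0 - freq) ** (2 * n_individuals)


# Compare a small sample with the recommended minimum of 100 individuals.
for n in (40, 100):
    for f in (0.01, 0.05):
        print(f"n={n:3d}  f={f:.2f}  P(missed)={prob_allele_missed(f, n):.3f}")
```

Under this assumption, an allele at a frequency of 1% goes undetected in roughly 45% of samples of 40 individuals, but in only about 13% of samples of 100 individuals.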

Section 2. Pre-analytical dataset processing

Subsequent to submission, each 13W dataset was prepared and formatted for analysis in a multi-step process. First, all ambiguities associated with a given genotype were reduced to pairs of single alleles. Second, genotype data for each population sample were merged with basic demographic and typing information in a standardized format. Third, the existence of each allele at each locus was verified by comparison to the March 2002 allele list approved by the WHO Committee for Nomenclature for Factors of the HLA System, and alleles were truncated to peptide-level (4-character) designations. Fourth, “binning” rules, which reassign alleles identified using high-resolution typing methods in a subset of populations to variants detectable with lower-resolution methods, were applied to reduce these alleles to common-denominator categories. Each of these processes is described in detail in the following sections (I. Ambiguity Reduction, II. Datafile Formatting, III. Data Filtering, and IV. Binning). The overall extent of changes made to datasets as a result of these processes is summarized in section 3, below.

Table 4. Linguistic representation in Kenya and Uganda1

Nation    Niger-Congo     Nilo-Saharan   Afro-Asiatic
Kenya     17.9 million    7.5 million    715,000
Uganda    11.4 million    5.3 million    0

1 These values are taken from reference (3).

Table 5. Geographic and linguistic distribution of 13W datasets

Geographic Regions   Linguistic groups: AA NC NS IE UY SC ED ST AU TK AN IP AB AL AM EA ND   Total
01.SS-Africa   8 2 1 11
02.N-Africa    5 5
03.Europe      5 1 1 7
04.SW-Asia     3 2 2 1 8
05.SE-Asia     1 6 2 1 14 2 26
06.Oceania     6 3 9
07.Australia   4 4
08.NE-Asia     3 3
09.N-America   4 8 1 1 14
10.S-America   1 5 6
11.Other       2 2
Total          8 8 2 16 1 1 2 6 2 1 20 3 4 6 4 1 1 95

AA: Afro-Asiatic, NC: Niger-Congo, NS: Nilo-Saharan, IE: Indo-European, UY: Uralic-Yukaghir, SC: South-Caucasian, ED: Elamo-Dravidian, ST: Sino-Tibetan, AU: Austroasiatic, TK: Tai-Kadai, AN: Austronesian, IP: Indo-Pacific, AB: Australian, AL: Altaic, AM: Amerindian, EA: Eskimo-Aleut, ND: Na-Dene.

Table 6. 13W Population datasets genotyped at each locus

HLA locus   Number of populations tested   Average sample size   Standard deviation
A           78                             130.6                 136.3
C           59                             129.3                 141.8
B           73                             131.7                 139.5
DRB1        59                             114.8                 132.1
DQA1        21                             88.4                  55.1
DQB1        33                             103.7                 68.1
DPA1        8                              76.1                  34.3
DPB1        21                             72.7                  30.4

I. Ambiguity Reduction

Much of the 13W genotype data was not resolved to the allelic level (i.e., only two alleles per genotype) when submitted for analysis and contained ambiguous alleles and/or ambiguous genotypes (described below). The extent of allelic and genotypic ambiguity in the overall dataset is detailed in Table 7.

Table 7. Extent of ambiguous alleles and ambiguous genotypes in the 13W dataset

Columns: Type of ambiguity; HLA locus; Number of population datasets; Number of datasets with any ambiguity; Percent of datasets with any ambiguity; Average percent of ambiguity among datasets with any ambiguity. The last two columns are expressed as fractions.

Type        Locus   Datasets   With ambiguity   Percent with ambiguity   Average percent of ambiguity
Allelic     A       46         35               0.76087                  0.6194
Allelic     C       39         30               0.76923                  0.49542
Allelic     B       46         36               0.78261                  0.30536
Allelic     DRB1    29         4                0.13793                  0.06012
Allelic     DQA1    15         4                0.26667                  0.31798
Allelic     DQB1    21         5                0.2381                   0.10201
Allelic     DPA1    2          1                0.5                      0.33721
Allelic     DPB1    12         0                0                        –
Genotypic   A       46         30               0.65217                  0.40851
Genotypic   C       39         30               0.76923                  0.6293
Genotypic   B       46         30               0.65217                  0.30565
Genotypic   DRB1    29         1                0.03448                  0.13061
Genotypic   DQA1    15         1                0.06667                  0.13061
Genotypic   DQB1    21         1                0.04762                  0.13061
Genotypic   DPA1    2          0                0                        –
Genotypic   DPB1    12         0                0                        –

Because most of the class I data was submitted for analysis in the form of probe reactivities, and because many of the class II datasets were submitted as Available Data, with many ambiguities reduced to individual allele calls prior to submission, both allele and genotype ambiguities are more extensive at the class I loci, with ambiguities in 60–80% of population datasets, than at the class II loci, with ambiguities in 5–20% of population datasets. It was necessary to resolve these ambiguities to the allelic level before analysis could begin.

Ambiguous alleles are those that cannot be distinguished because the typing method cannot assess all pertinent polymorphisms. For example, ‘A*020101, 0209, 0230, 0231’ is an ambiguous allele set used to represent an ambiguous allele, the actual identity of which could be any one of the four constituent alleles. An unambiguously assigned allele is characterized by a single, DNA-level designation (e.g., A*020101). Because ambiguous alleles result from the limitation of a given typing system, population samples typed using multiple systems may have similar but different ambiguous allele sets (e.g., one system results in an ‘A*020101, 0209, 0230, 0231’ allele set, while a second system results in an ‘A*020101, 020102, 020104, 0230’ allele set).

Ambiguous genotypes cannot be distinguished due to an inability to establish the phase of the assessed polymorphisms in a given probe reactivity pattern. For example, ‘A*0101/*02011 or *0101/*0236 or *0106/*02011 or *0106/*0236’ is used to denote an ambiguous genotype set, the actual identity of which could be any one of the four constituent genotypes. Two sets of ambiguously assigned alleles are shown in this case; the alleles in the A*01 serogroup constitute one ambiguously assigned allele set, and the alleles in the A*02 serogroup constitute a second such set. An unambiguously assigned genotype is characterized by a single possible genotype for a given sample (e.g., A*0101/*0236). Because ambiguous genotypes are the result of particular combinations of allele-specific probe reactivity patterns, some alleles will only appear in a population as part of an ambiguous genotype set, while other alleles will appear in both ambiguous and unambiguous genotypes.

In some cases, one allele may be unambiguously assigned in an ambiguous genotype. For example, in the ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype, the A*0101 allele has been unambiguously assigned. The identity of the alleles in the A*02 allele set is obscured by the inability to set phase.

Both types of ambiguity can be observed for a given sample. For example, ‘A*0101/*02011, 0209, 0230, 0231 or *0101/*0236 or *0106/*02011, 0209, 0230, 0231 or *0106/*0236’ represents an ambiguous genotype with four possible constituent genotypes, two of which contain an ambiguous allele with four possible constituent alleles.

The method used to reduce ambiguities to the allele level attempts to resolve ambiguous genotypes separately from ambiguous alleles (stages 1 and 2 below), and assumes that the sampled individuals are part of a single population with relatively little admixture, and that they were typed with a single typing system. In general, it is assumed that these populations will have low numbers of alleles within a particular serogroup, and that alleles with the same pattern of polymorphic sequence motifs share the same DNA sequence when observed in the same population.

Stage 1. Reduction of Ambiguous Genotypes

This method proceeds in four steps, as outlined here:

Step 1. Eliminate genotypes with alleles never seen in unambiguous genotypes.
Step 2. Reduce ambiguously assigned allele sets to common denominators.
Step 3. Rely on Hardy-Weinberg proportions to establish homozygotes.
Step 4. Consider all remaining ambiguous allele sets as ambiguous alleles.

Detailed description:

Step 1. Compile a list of all alleles (both ambiguous and unambiguous alleles) observed in unambiguous genotypes as well as the unambiguously assigned alleles in ambiguous genotypes. These are the “observed alleles”. In each ambiguous genotype set, eliminate those genotypes lacking observed alleles, reducing each set to those genotypes with observed alleles. If there are no genotypes with two observed alleles in a given set, keep all genotypes with one observed allele, and eliminate those with no observed alleles. If all of the genotypes in a set lack observed alleles, do nothing to that set.

For example, a hypothetical population with only two samples presents an unambiguous ‘A*02011/*3303’ genotype and an ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype. In this case, the A*0101, *02011, and *3303 alleles have been unambiguously assigned, and the A*0236 allele is eliminated as it is never seen in an unambiguously assigned genotype. This step should be repeated until there is no change to the ambiguous genotype sets, but usually only requires one iteration.

This step assumes that a population will have a small number of alleles in a given serogroup. When multiple alleles from a given serogroup do exist in a population, it assumes that distinct patterns of ambiguity will result when these alleles are in a genotype with a given allele. This assumption may not be true for admixed populations.

Step 2. Comparing the ambiguous allele sets with at least one allele in common, eliminate those genotypes (involved in this comparison) containing alleles not found in all such ambiguous allele sets.

For example, in three ambiguous genotypes, ‘A*0101/*02011 or *0101/*0209’, ‘A*2402/*02011 or *2402/*0236’, and ‘A*3303/*02011 or *3303/*0231’, the A*02011 allele is found in all A*02 sets. The genotypes containing the other A*02 alleles are eliminated. Note that an ambiguous ‘A*0101/*02012 or *0101/*0235’ genotype would not influence this decision, as none of the A*02 alleles in this ambiguous allele set overlap with the other A*02 set. As in Step 1, this step assumes that the number of alleles in a given serogroup in a population will be low, and that distinct alleles will present distinct patterns of ambiguity when present in a genotype with a given allele.

Step 3. Where a genotype may be either heterozygous or homozygous, assign homozygote and heterozygote status based on Hardy-Weinberg expectations.

For example, the genotype ‘A*2402/*2402 or *2402/*2403’ could be either a homozygous ‘*2402/*2402’ genotype or a heterozygous ‘*2402/*2403’ genotype. Consider that two such ambiguous genotypes are observed in a population of 100 individuals, with the *2402 allele observed in 25 other genotypes, and the *2403 allele observed in two other genotypes. In this case the number of homozygous ‘*2402/*2402’ genotypes expected under Hardy-Weinberg equilibrium is calculated with the assumption that both ambiguous genotypes are homozygous (so that all four alleles in these genotypes are *2402 alleles), and the number of heterozygous ‘*2402/*2403’ genotypes expected under Hardy-Weinberg equilibrium is calculated with the assumption that both genotypes are heterozygous (so that two alleles are *2402 alleles and two are *2403 alleles). Under these circumstances, either 2.1 homozygous or 0.54 heterozygous genotypes are expected, and the two ambiguous genotypes are re-assigned as homozygous for the *2402 allele. Alternatively, if the number of *2402 alleles observed in other genotypes were lower than the number of *2403 alleles observed in other genotypes (e.g., 13 versus 25), primarily heterozygous genotypes might be expected (e.g., 0.72 expected homozygotes versus 2.0 expected heterozygotes) and the two genotypes would be re-assigned as heterozygotes.
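Steps 1 and 3 above can be sketched in a few lines of Python. This is an illustrative sketch rather than the workshop's actual software: the function names, the representation of a genotype set as a list of allele pairs, and the tie-breaking rule in `resolve` are assumptions, and each of the “other genotypes” in the Step 3 example is assumed to contribute a single copy of the allele.

```python
from typing import List, Set, Tuple

Genotype = Tuple[str, str]


def observed_alleles(genotype_sets: List[List[Genotype]]) -> Set[str]:
    """Alleles present in every alternative of a set, i.e. unambiguously assigned."""
    observed: Set[str] = set()
    for alternatives in genotype_sets:
        shared = set(alternatives[0])
        for genotype in alternatives[1:]:
            shared &= set(genotype)
        observed |= shared
    return observed


def eliminate_unobserved(genotype_sets: List[List[Genotype]]) -> List[List[Genotype]]:
    """Step 1: in each set, keep the alternatives with the most observed alleles."""
    sets = [list(alternatives) for alternatives in genotype_sets]
    while True:  # repeat until the sets stop changing
        observed = observed_alleles(sets)
        reduced = []
        for alternatives in sets:
            scores = [sum(a in observed for a in g) for g in alternatives]
            best = max(scores)
            if best == 0:
                reduced.append(alternatives)  # no observed alleles: leave the set alone
            else:
                reduced.append([g for g, s in zip(alternatives, scores) if s == best])
        if reduced == sets:
            return sets
        sets = reduced


def hardy_weinberg_expectations(n_individuals: int, copies_a: int,
                                copies_b: int, n_ambiguous: int) -> Tuple[float, float]:
    """Step 3: expected a/a and a/b counts for 'a/a or a/b' genotypes.

    copies_a and copies_b are the copies of each allele seen in other genotypes.
    """
    total = 2 * n_individuals
    # Assume every ambiguous genotype is homozygous: two extra copies of a each.
    p_hom = (copies_a + 2 * n_ambiguous) / total
    expected_hom = n_individuals * p_hom ** 2
    # Assume every ambiguous genotype is heterozygous: one extra copy of a and of b.
    p_het = (copies_a + n_ambiguous) / total
    q_het = (copies_b + n_ambiguous) / total
    expected_het = n_individuals * 2 * p_het * q_het
    return expected_hom, expected_het


def resolve(expected_hom: float, expected_het: float) -> str:
    """Pick the status with the larger expectation (ties go to homozygous here)."""
    return "homozygous" if expected_hom >= expected_het else "heterozygous"


sample = [[("A*02011", "A*3303")], [("A*0101", "A*02011"), ("A*0101", "A*0236")]]
print(eliminate_unobserved(sample))
print(resolve(*hardy_weinberg_expectations(100, 25, 2, 2)))
```

With the values from the Step 3 example (100 individuals, 25 and 2 other copies, two ambiguous genotypes), the expectations round to the 2.1 homozygotes and 0.54 heterozygotes quoted in the text; with 13 and 25 other copies they round to 0.72 and 2.0.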


This step assumes that the genotype proportions of the population in question are in Hardy-Weinberg equilibrium.

Step 4. For a given ambiguous genotype set, lump all alleles of a given serogroup to form an ambiguous allele. For example, the ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype would be changed to the genotype ‘A*0101/*02011, 0236’, in which ‘*02011, 0236’ is an ambiguous allele.

It should be noted that this process has the potential to bias the dataset in favor of high-frequency alleles, resulting in an underestimation of the allele diversity and the exclusion of low-frequency alleles at that locus. Because this method assumes a correspondence between serological specificity and the first two digits of the allele name, it cannot easily be used as described for the DPB1 locus. Overall, this method has been most effective on populations assumed to be relatively free of admixture (i.e., population complexity values of 1 and 2, as described in section II, below). SBT has confirmed that the ambiguity reduction method was correct for selected samples, as summarized in Table 8.

Overall, this method assumes that genotype diversity in a population will result in sufficient unambiguous assignment of alleles to permit the reduction of ambiguity in the other allele assignments. Ultimately, the adequacy of this assumption rests on the size of the population sample that is genotyped. Irrespective of the genotyping method used, the larger the population sample genotyped, the greater the chance that genotype combinations will result in unambiguous allele assignment, so that the success of SSOP genotyping of populations using this ambiguity reduction approach becomes a function of sample size.

Stage 2. Reduction of Ambiguous Alleles

The IHWG Biostatistics core has compiled a database of HLA allele frequency distributions published for populations from around the world (4). The populations in this database have been divided into seven regions (Africa, Europe, Middle East, Asia, Siberia, South Pacific Islands, and the Americas) for the purpose of investigating geographic structure. This database was used for the reduction of ambiguous alleles in a given population, by identifying which constituent alleles have been observed in the corresponding global region. This stage of the method assumes that the correlation between geography and genetic distance observed for many populations at other loci extends to MHC loci, and proceeds in three steps.

Step 1. Eliminate those constituent alleles that are not observed in the corresponding region.

Step 2. Of the remaining constituent alleles, keep the allele that has the highest frequency in that region, and eliminate the rest. When multiple candidate alleles have similar frequencies, keep the allele at the highest frequency in a population geographically closest to the population being analyzed.

Step 3. If none of the constituent alleles are identified in the database, reduce the ambiguous allele to the lowest numbered constituent allele.

It should be noted that the utility of this database is proportional to the number of populations typed at each locus, and that older published datasets will likely contain no data on recently identified alleles. As a result, novel and rare alleles are less likely to be detected, and a bias in favor of more widespread alleles will be introduced. However, this bias will be consistent between populations. In addition, genetic distances between populations within regions will be under-estimated.

Table 8. SBT confirmation of the reduction of ambiguous genotypes

For each sample, the submitted ambiguous genotypes are listed by column (Allele 1, Allele 2), with alternatives separated by semicolons, followed by the reduced genotype and the SBT genotype (Allele 1 / Allele 2).

Paiwan_51, sample PW34
  Submitted, Allele 1: 3501, 3507, 3511, 3523; 3520
  Submitted, Allele 2: 40011, 40012; 4007
  Reduced: 3501 / 4001
  SBT: 3501 / 40011, 40012

Thao, sample TH05
  Submitted, Allele 1: 1521; 1525
  Submitted, Allele 2: 39021, 39022; 39011, 39013, 3905
  Reduced: 1525 / 3901
  SBT: 1525 / 39011, 39013

Siraya, sample SL23
  Submitted, Allele 1: 3501, 3507, 3511, 3523; 3502, 3504, 35091, 35092; 3515
  Submitted, Allele 2: 4002; 4003; 4005
  Reduced: 3501 / 4002
  SBT: 3501 / 4002

Hakka, sample HA07
  Submitted, Allele 1: 15011, 1526N, 1533; 1512, 1519; 1532, 1535
  Submitted, Allele 2: 4601; 4601; 4601
  Reduced: 1512 / 4601
  SBT: 1512, 1519 / 4601

Sample HE08
  Submitted, Allele 1: 1521; 1521
  Submitted, Allele 2: 39011, 39013, 3905; 3910
  Reduced: 1521 / 3901
  SBT: 1521 / 39011, 39013

II. Datafile Format

Each population dataset consisted of a ‘header block’ that described the six-character IHWG labcode for the submitting laboratory, the typing method used, the ethnicity of the sampled population, the population’s region of origin, the site at which the population sample was collected, the latitude and longitude of the collection site, and the complexity of the population (described below), as well as a ‘data block’ that included the unique population name, sample ID, and genotype data for each sampled individual. These genotype data were organized by locus and presented in map order (HLA-A, C, B, DRA, DRB1, DQA1, DQB1, DPA1, and DPB1). The 95 13W population datasets are included in Appendix C, Table 1.

Header block fields (labcode, method, ethnic, contin, collect, latit, longit, and complex)

Typing methods (method field): The typing protocols used fell into five categories: (1) PCR-sequence-specific oligonucleotide probe (SSO, SSOP) methods (11W, 12W, IHWG and local SSOP systems), (2) Reverse hybridization format PCR-SSOP methods (IHWG Reverse Line Strip (RLS) and Innolipa PCR-SSO systems), (3) Sequence Specific Primer (SSP) methods (12W ARMS, Genovision SSP, and Dynal SSP systems), (4) Sequence Based Typing (SBT) methods (IHWG SBT and local SBT systems), and (5) PCR-single strand conformation polymorphism (SSCP) methods.

Ethnicity (ethnic field): A table of 268 linguistically and culturally defined ethnic codes and 10 admixture codes was provided for data submitting labs (see Appendix C, Table 3). In instances where an ethnic or admixture code was not available on this table, a new code was assigned for the new ethnic identifier. In as many cases as possible, the ethnic identity of each population sample is distinct from the unique population name and regional identification (see below).

Regional categories (contin field): Each population sample was assigned to one of eleven regional categories (Sub-Saharan Africa, North Africa, Europe, South-West Asia, South-East Asia, Oceania, Australia, North-East Asia, North America, South America, and Other), based on the geographic region of origin and the estimated degree of admixture of the sampled population. For non-indigenous populations, regional assignments were made based on the historical locale of ancestors of those populations 1000 years ago. Admixed populations were assigned to the Other category when members of these populations were estimated to be descended from parent populations from different regional categories. Using these criteria, populations of predominantly Sub-Saharan African descent living outside of Africa were assigned to the Sub-Saharan Africa region, and populations of predominantly European descent living outside of Europe were assigned to the Europe region, while populations of both Sub-Saharan African and European descent were assigned to the Other category. A map defining the boundaries of these regions is shown in Figure 1.

Latitude and Longitude (latit and longit fields): Latitudes and longitudes were recorded in a decimal format, with minutes and seconds indicated as fractions of each degree value. North latitudes and east longitudes were recorded as positive values, while south latitudes and west longitudes were recorded as negative values. For example, 35 degrees 20 minutes south latitude would be recorded as latit = -35.33, and 2 degrees 30 minutes east longitude would be recorded as longit = 2.5.

Complexity (complex field): Each population sample was assigned to one of three complexity categories (ranging in value from 1 to 3), in an attempt to estimate the degree of potential sub-structure and admixture in each population sample. A population sample collected from a single settlement or group of closely related settlements was assigned a complexity of 1. A population sample collected from a group of disparate but discrete settlements, or across a large region of territory, was assigned a complexity of 2. A population sample collected in a metropolitan area, across an entire nation, or from an extremely admixed population was assigned a complexity of 3. Such assignments were made conservatively, with the higher value assigned in equivocal cases. Given these designations, the ambiguity reduction process (section I, above) will function best on populations with low complexity values.
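The header block fields described above map naturally onto a small record type. The following is a minimal sketch, not the IHWG file format itself: the field names come from the text, but the Python types, the validation rule, the `to_decimal_degrees` helper, and the example values other than the two coordinates are assumptions made for illustration.

```python
from dataclasses import dataclass


def to_decimal_degrees(degrees: float, minutes: float = 0.0,
                       seconds: float = 0.0, hemisphere: str = "N") -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    North and east are positive, south and west are negative.
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    hemisphere = hemisphere.upper()
    if hemisphere in ("S", "W"):
        return -value
    if hemisphere in ("N", "E"):
        return value
    raise ValueError("hemisphere must be one of N, S, E, W")


@dataclass
class HeaderBlock:
    labcode: str   # six-character IHWG labcode
    method: str    # typing method
    ethnic: str    # ethnic or admixture code
    contin: str    # regional category
    collect: str   # collection site
    latit: float   # decimal degrees, south negative
    longit: float  # decimal degrees, west negative
    complex: int   # complexity category, 1 to 3

    def __post_init__(self) -> None:
        if self.complex not in (1, 2, 3):
            raise ValueError("complex must be 1, 2, or 3")


# Hypothetical header using the two coordinates from the example in the text.
header = HeaderBlock(
    labcode="USAERL", method="SSOP", ethnic="example", contin="Other",
    collect="example site",
    latit=round(to_decimal_degrees(35, 20, hemisphere="S"), 2),
    longit=to_decimal_degrees(2, 30, hemisphere="E"),
    complex=1,
)
print(header.latit, header.longit)
```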

Data block fields (populat, id, and locus names):

Population name (populat field): When possible, the population name supplied by the submitting laboratory was used. When multiple population samples were submitted with identical names, a unique population name was created by appending the sample size (n) to the end of the population name. For example, two population samples named ‘population’ with sample sizes of 140 and 200 would be noted as ‘population_140’ and ‘population_200’.

Sample Identifier (id field): Each sampled individual in a given population sample was assigned a unique identifier. These identifiers have been coded to protect the confidentiality of the individual, in accordance with the IHWG protocol for the use of human subjects in research. All samples have been obtained in accordance with applicable laws and regulations at the submitter’s institution, including any requirements regarding informed consent for prospective research use.

III. Data Filtering

In the next pre-analytical step of data processing, each allele name was inspected to ensure that it conformed to standard nomenclature formats. This data “filtration” step took the form of both data truncation and the reclassification of serologically designated alleles. Each of these processes is described below.

A. Data truncation

Because of the variety of genotyping methods used to generate the 13W datasets, alleles were reported at varying levels of specificity (e.g., A*24020101, *240201, *2402). Because these differences reflect synonymous nucleotide changes in most instances, allele names were truncated to a common, peptide-level (4-character) allele name (e.g., *24020101 was changed to *2402), and the existence of these truncated alleles was verified using the allele set in the IMGT/HLA database (approved by the WHO Committee for Nomenclature for Factors of the HLA System as of March 2002) as follows:

i. If a common 4-character substring was found between the truncated allele and an allele (or alleles) in the IMGT/HLA database (e.g., if *2402 was the reduced allele in question, this would match *24020101, *24020102L, *240202, *240203, and *240204, with ‘2402’ as the common substring), then this truncated allele was used in the data analysis.

ii. If no substring match was found between the truncated allele and alleles in the IMGT/HLA database, analysis was halted for data review.
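The truncation and verification rules for allele names (section III.A) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the reference list is a small stand-in for the March 2002 IMGT/HLA allele set, and the 4-character substring comparison is implemented here as a prefix match, which is an assumption.

```python
from typing import Iterable, List


def truncate_allele(name: str) -> str:
    """Truncate an allele name to its 4-character (peptide-level) designation,
    keeping any locus prefix, e.g. 'A*24020101' -> 'A*2402'."""
    prefix, star, digits = name.rpartition("*")
    return f"{prefix}{star}{digits[:4]}"


def verify_truncated(truncated: str, reference_alleles: Iterable[str]) -> List[str]:
    """Return the reference alleles that share the truncated name as a prefix.

    Raises LookupError when nothing matches, mirroring rule ii
    (analysis halted for data review).
    """
    matches = [ref for ref in reference_alleles if ref.startswith(truncated)]
    if not matches:
        raise LookupError(f"{truncated}: no match in reference list, review data")
    return matches


# Stand-in reference list built from the allele names quoted in the text.
reference = ["*24020101", "*24020102L", "*240202", "*240203", "*240204", "*2403"]

truncated = truncate_allele("*24020101")
print(truncated, verify_truncated(truncated, reference))
```

Raising an exception for an unmatched name is one way to represent the “halt for data review” outcome; a production pipeline might instead collect such names in a report.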

B. Serological reclassification

IHWG datasets in which more than 10% of the alleles were typed at serological to intermediate levels of resolution were excluded from the 13W dataset. In cases where serological-level allele designations were provided for less than 10% of the alleles at a given locus in a dataset (e.g., DQA1*03), those serological designations were coded as a 4-character allele name in the format XX00, where XX represents the serological designation for that allele (e.g., *03 is coded as *0300). These coded alleles were then assigned the name of an allele in the IMGT/HLA database using the following rules:

i. If other alleles with the same serological designation were observed in the population, the name of the coded allele was changed to the allele that had the highest frequency in that population at that locus (e.g., if the coded allele was *0300, and *0301 was observed in that population with a frequency of 0.2 while *0302 was observed with a frequency of 0.05, the name of the coded allele was changed to *0301).

ii. If no alleles with the same serological designation were observed at that locus in that population, then the coded allele was re-named to correspond to the lowest-numbered allele in the IMGT/HLA database with the same serological designation as the coded allele (e.g., if no other 03 alleles were observed then all *0300 alleles were renamed as *0301).

iii. If neither of the previous steps resulted in a name change to a coded allele, analysis was halted for data review.

Table 9. Binning reassignment of high-resolution HLA alleles

Locus   High resolution allele   Reassigned allele name
A       2409N                    2402
C       0706                     0701
B       0706                     0705
DRB1    1443                     1405
DRB1    1506                     1501
DQA1    0104                     0101
DQA1    0302                     0301
DQA1    0303                     0301
DQA1    0502                     0501
DQB1    0202                     0201
DQB1    0309                     0301
DQB1    0609                     0605
DQB1    0611                     0602
DPB1    2301                     0401
DPB1    3901                     0401
DPB1    4801                     0201
DPB1    4901                     0402
DPB1    5101                     0402
DPB1    6201                     4001
DPB1    7601                     1401

Table 10. Fraction of HLA allele assignments that remained unchanged during pre-analytical data processing of 47 population datasets

Locus      Number of unchanged allele assignments   Total number of allele assignments (2n)   Percent of allele assignments unchanged
A          4608                                     9344                                      49.3
B          5802                                     9328                                      62.2
C          3981                                     8334                                      47.8
DRB1       6353                                     8728                                      72.8
DQA1       1456                                     1825                                      79.8
DQB1       2980                                     3755                                      79.4
DPA1       287                                      410                                       70.0
DPB1       856                                      876                                       97.7
All Loci   26323                                    42600                                     61.8

This table reflects modifications made to individual population datasets including the American_Samoa, Ami_97, Arab_Druze, Atayal, Bari, Brazilian(Af_Eu), Bulgarian, Bunun, Canoncito, Central American, Chaouya, Chinese, Croatian, Filipino, Finn_90, Georgian, Guarani-Kaiowa, Guarani-Nandeva, Hakka, Irish, Israeli_Jews, Ivatan, Kinh, Korean_200, Maya, Metalsa, Minnan, Moroccan_99, Muong, Paiwan_51, Pazeh, Pima_17, Puyuma_49, Rukai, Rwandan, Saisiat, Siraya, Tamil, Thao, Ticuna, Toroko, Tsou, Turk, Yami, Yupik, Zulu, and Zuni.
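The three serological reclassification rules (i to iii) can be sketched as a single function. This is a hedged illustration, not the workshop's code: the function name, the use of a dictionary of population allele frequencies, and the representation of allele names as 4-character strings without the ‘*’ prefix are assumptions.

```python
from typing import Dict, Iterable


def reclassify_coded_allele(coded: str, population_freqs: Dict[str, float],
                            reference_alleles: Iterable[str]) -> str:
    """Rename a serologically coded allele such as '0300'.

    Rule i:   use the most frequent allele of the same serological group
              observed in the population at that locus.
    Rule ii:  otherwise use the lowest-numbered reference allele of that group.
    Rule iii: otherwise raise LookupError (analysis halted for data review).
    """
    group = coded[:2]
    in_population = {allele: freq for allele, freq in population_freqs.items()
                     if allele.startswith(group) and allele != coded}
    if in_population:
        return max(in_population, key=in_population.get)
    # Names are 4-character strings here, so string order equals numeric order.
    in_reference = sorted(a for a in reference_alleles if a.startswith(group))
    if in_reference:
        return in_reference[0]
    raise LookupError(f"{coded}: no allele available for renaming, review data")


# Frequencies from the example in rule i; the reference list is a stand-in.
print(reclassify_coded_allele("0300", {"0301": 0.2, "0302": 0.05}, ["0301", "0302"]))
```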

Overall, this data “filtering” step results in a reduction of the number of alleles (k) when alleles that are identical at the peptide level, but which differ at the nucleotide level, are reported in the same population. In addition, it is possible that k was reduced in datasets generated with multiple typing systems. In these instances, alleles typed at different levels of resolution (e.g., A*2402 versus A*240202) that might represent distinct nucleotide-level variants were treated as identical. It should be noted that all analytical results and inferences are valid only for peptide-level allele variation.

IV. Binning

In the final step of pre-analytical data processing, alleles that were only detectable in a subset of samples (due to the use of higher resolution genotyping methods for those samples) were reassigned to a level of resolution equivalent to that which could be detected using lower resolution genotyping methods. For example, HLA-B*0706 alleles were reassigned as B*0705 alleles. This process of reassignment is described as “binning”. These binning reassignments were made in order to facilitate useful comparisons across datasets that were genotyped using different methods, and are not reflected in the datasets available in Appendix C. Table 9 identifies the alleles that were binned (High resolution allele), and the allelic category to which they were reassigned (Reassigned allele name).
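Binning amounts to a per-locus lookup, as the following Python sketch shows. The names `BINNING_MAP`, `bin_allele`, and `bin_genotypes` are hypothetical. The mapping pairs the Table 9 entries in the order listed, which is an assumption about how the table reads, and only loci whose labels appear alongside those entries are included.

```python
from typing import Dict, List, Tuple

# High resolution allele -> reassigned allele name, keyed by locus.
# Assumption: Table 9 entries are paired positionally within each locus.
BINNING_MAP: Dict[str, Dict[str, str]] = {
    "C": {"0706": "0701"},
    "B": {"0706": "0705"},
    "DRB1": {"1443": "1405", "1506": "1501"},
    "DQA1": {"0104": "0101", "0302": "0301", "0303": "0301", "0502": "0501"},
    "DQB1": {"0202": "0201", "0309": "0301", "0609": "0605", "0611": "0602"},
    "DPB1": {
        "2301": "0401", "3901": "0401", "4801": "0201", "4901": "0402",
        "5101": "0402", "6201": "4001", "7601": "1401",
    },
}


def bin_allele(locus: str, allele: str) -> str:
    """Return the binned allele name, or the allele unchanged if it is not binned."""
    return BINNING_MAP.get(locus, {}).get(allele, allele)


def bin_genotypes(
    locus: str, genotypes: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Apply binning to both alleles of every genotype at one locus."""
    return [(bin_allele(locus, a1), bin_allele(locus, a2)) for a1, a2 in genotypes]
```

For example, `bin_allele("B", "0706")` returns `"0705"`, matching the reassignment described in the text, while an allele absent from the mapping is returned as submitted.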

Section 3. Overall modifications made to datasets

The extent of the modifications made to allele assignments before analysis is described in Table 10, which summarizes the fraction of allele assignments that were unchanged through the various steps of data processing in 47 datasets for which raw data (i.e., including allele and genotype ambiguity) were available. The data in the remaining 48 datasets were submitted as Available Data and required significantly less modification. Overall, 61.8% of the submitted allele assignments in these 47 datasets were analyzed as they were submitted (Table 10). For the purpose of this summary, reassignment includes the reduction of ambiguous allele sets to individual alleles; the truncation of nucleotide-level allele names to peptide-level allele names; the reassignment of serological allele designations to peptide-level allele names; the reclassification of improperly formatted allele names; and the binning of alleles genotyped at varying levels of resolution. Unambiguously assigned alleles that remained unchanged, but that were submitted in ambiguous genotype sets, were counted as fractions of alleles in proportion to the number of genotypes in that set. As expected given the greater extent of allele and genotype ambiguity observed in class I datasets, fewer class I allele assignments remained unchanged in comparison to class II allele assignments (50-60% versus 80%, respectively).
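The percentages in Table 10 can be recomputed from its counts, as in the following Python sketch. The names `TABLE_10`, `CLASS_I`, `CLASS_II`, and `percent_unchanged` are hypothetical. Pooling the counts across the loci of each class is a choice made here for illustration and is not necessarily how the class-level figures quoted in the text were obtained.

```python
from typing import Dict, Iterable, Tuple

# Locus -> (unchanged allele assignments, total allele assignments (2n)),
# copied from Table 10.
TABLE_10: Dict[str, Tuple[int, int]] = {
    "A": (4608, 9344),
    "B": (5802, 9328),
    "C": (3981, 8334),
    "DRB1": (6353, 8728),
    "DQA1": (1456, 1825),
    "DQB1": (2980, 3755),
    "DPA1": (287, 410),
    "DPB1": (856, 876),
}

CLASS_I = ("A", "B", "C")
CLASS_II = ("DRB1", "DQA1", "DQB1", "DPA1", "DPB1")


def percent_unchanged(loci: Iterable[str]) -> float:
    """Pooled percentage of unchanged allele assignments over the given loci."""
    loci = list(loci)
    unchanged = sum(TABLE_10[locus][0] for locus in loci)
    total = sum(TABLE_10[locus][1] for locus in loci)
    return round(100.0 * unchanged / total, 1)


if __name__ == "__main__":
    for locus in TABLE_10:
        print(locus, percent_unchanged([locus]))
    print("All Loci", percent_unchanged(TABLE_10))
    print("Class I", percent_unchanged(CLASS_I))
    print("Class II", percent_unchanged(CLASS_II))
```

Pooled in this way, the counts give 61.8% for all loci, 53.3% for class I, and 76.5% for class II, in line with the approximate ranges given in the text.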

References

1. Bodmer J, Cambon-Thomsen A, Hors J, Piazza A, Sanchez-Mazas A. Report of the Anthropology Component. In: Charron D (ed.) HLA: Proceedings of the Twelfth International Histocompatibility Workshop and Conference: Genetic Diversity of HLA: Functional and Medical Implication, Volume I. EDK, 1997, 269–74.

2. Ruhlen M. A Guide to the World’s Languages, Volume 1. Stanford University Press, Stanford, California, 1987.

3. Grimes BF, Grimes JE (eds.) Ethnologue: Languages of the World, 14th Edition. SIL Publications, 2002 (http://www.ethnologue.com).

4. Literature database for the 13th IHWC. http://allele5.biol.berkeley.edu/13ihwg/lit_data.html

5. Sanchez-Mazas A. HLA data analysis in anthropology: basic theory and practice. Teaching session 5: Biostatistics, 16th European Histocompatibility Conference, Strasbourg, 19–22 March 2002, p. 68–83 (available at http://anthro.unige.ch/~sanchez/pdf_files/).