S. J. Mack A. Sanchez-Mazas D. Meyer R. M. Single Y. Tsai H. A. Erlich
Authors’ addresses
Steven J. Mack¹,², Alicia Sanchez-Mazas³, Diogo Meyer⁴, Richard M. Single⁵, Yingssu Tsai³, Henry A. Erlich¹,²
¹ Children’s Hospital Oakland Research Institute, Oakland, CA; ² Department of Human Genetics, Roche Molecular Systems, Alameda, CA; ³ Laboratory of Anthropology, Genetics and Peopling History, Department of Anthropology and Ecology, University of Geneva, Switzerland; ⁴ Department of Integrative Biology, University of California, Berkeley, CA; and ⁵ Department of Medical Biostatistics, University of Vermont, Burlington, VT
Acknowledgements
This work was supported by NIH shared resource grant U24 AI49213 and by FNS (Switzerland) grant 3100-49771.96. We wish to thank the following IHWG AHGDC participants for their contribution of data to the IHWG: D. Adorno, S. Agrawal, T. Akesaka, A. Arnaiz-Villena, N. Bendukidze, C. Brautbar, T. L. Bugawan, J. Cervantes, M. Crawford, E. Donadi, S. Easteal, H. A. Erlich, M. Fernandez-Vina, X. Gao, E. Gazit, C. Gorodezky, M. Hammond, E. Ivaskova, A. Kastelan, V. I. Konenkov, M. H. S. Kraemer, D. Kumashiro, R. Lang, Z. Layrise, M. S. Leffel, M. Lin, M. L. Lokki, L. Louie, M. Luo, S. J. Mack, M. Martinetti, J. McCluskey, N. Mehra, D. Middleton, E. Naumova, Y. Paik, M. H. Park, M. L. Petzl-Erler, A. Sanchez-Mazas, G. Saruhan, M. L. Sartakova, M. Schroeder, U. Shankarkumar, S. Sonoda, J. Tang, E. Thorsby, J. M. Tiercy, K. Tokunaga, E. Trachtenberg, V. Trieu An, B. Vidan-Jeras, and Y. Zaretskaya. In addition, we thank D. Gjertson, J. Hollenbach, A. Sanchez-Mazas, S. Tonks, and E. Trachtenberg for providing access to 12th Workshop datasets.
HLA 2004: Immunobiology of the Human MHC. Proceedings of the 13th International Histocompatibility Workshop and Congress
13th International Histocompatibility Workshop Anthropology/Human Genetic Diversity Joint Report Chapter 2. Methods used in the generation and preparation of data for analysis in the 13th International Histocompatibility Workshop
Section 1. Description of datasets
From 1998 through 2002, 39 participating laboratories submitted 110 population datasets, representing 13,481 sampled individuals, to the International Histocompatibility Working Group (IHWG) Anthropology/Human Genetic Diversity Component (AHGDC). These datasets were submitted for analysis as part of the 13th International Histocompatibility Workshop (13W). For the most part, they consist of high-resolution to allele-level genotyping data for subsets of the HLA-A, -C, -B, -DRB1, -DQA1, -DQB1, -DPA1 and -DPB1 loci. Analyses of 95 of these population samples, representing 12,225 individuals, are presented in subsequent chapters (4–7). These samples are hereafter referred to as 13W datasets; they are described in Table 1 and provided in Appendix C. The remaining 15 population samples were excluded from analysis because they were typed at serological to medium-level resolution or lacked required information.

Table 2 describes 48 supplementary datasets, representing 5,774 sampled individuals, that were included in these analyses. These datasets were originally genotyped as part of the 12th International Workshop (12W) Anthropology Component, and the results of analyses that include them are not always reported here. Table 2 also describes 20 additional 12W datasets that were included in the linguistics-related analyses presented in Chapter 7. These 12W datasets were chosen to supplement global regions (see below and Figure 1) for which few 13W datasets were available. The “map number” assigned to each population in the 12W Anthropology Report (1) is given in Table 2 (column “12W.”) for cross-reference. Table 3 gives the number of individuals typed at each locus for each of the 48 12W and 95 13W datasets. Sample sizes (n) for the 95 13W datasets ranged from 12 to 1000 individuals (median 98). For the 48 12W datasets, n ranged from 15 to 1012 (median 82).
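The dataset bookkeeping described above can be sketched as a filter followed by a size summary. It covers the submitted datasets, the exclusion of samples typed at low resolution or missing information, and the sample-size range and median. This is a minimal illustration under stated assumptions. The `PopulationDataset` record, its resolution labels and the exclusion rule are hypothetical simplifications, and the toy records below are not the workshop data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PopulationDataset:
    name: str
    n: int               # number of sampled individuals
    resolution: str      # hypothetical labels: "serological", "medium", "high", "allele"
    complete_info: bool  # False if required background information is missing


# Assumption: resolutions too low for the 13W analyses, per the text.
EXCLUDED_RESOLUTIONS = {"serological", "medium"}


def select_for_analysis(datasets):
    """Split submitted datasets into (analyzed, excluded) lists."""
    analyzed, excluded = [], []
    for ds in datasets:
        if ds.resolution in EXCLUDED_RESOLUTIONS or not ds.complete_info:
            excluded.append(ds)
        else:
            analyzed.append(ds)
    return analyzed, excluded


def size_summary(datasets):
    """Return (total individuals, smallest n, largest n, median n)."""
    sizes = [ds.n for ds in datasets]
    return sum(sizes), min(sizes), max(sizes), median(sizes)


if __name__ == "__main__":
    # Toy records for illustration only; these are NOT the workshop datasets.
    submitted = [
        PopulationDataset("pop_A", 12, "allele", True),
        PopulationDataset("pop_B", 98, "high", True),
        PopulationDataset("pop_C", 1000, "allele", True),
        PopulationDataset("pop_D", 60, "medium", True),
        PopulationDataset("pop_E", 45, "high", False),
    ]
    analyzed, excluded = select_for_analysis(submitted)
    print(len(analyzed), len(excluded))  # 3 2
    print(size_summary(analyzed))        # (1110, 12, 1000, 98)
```

With the actual submissions, this split would correspond to the 95 analyzed and 15 excluded datasets reported above (95 + 15 = 110).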
Mack et al ¡ C2. Methods used in the generation and preparation of data for analysis in the 13th International Histocompatibility Workshop
Table 1. Description of 13W population datasets (n = 95 populations)
Region
Population dataset name
Labcode
01.SS-Africa
Kenyan_142
CANLUO
12th IHWS¹ Submission No. samples Country/Region Kenya/different areas
01.SS-Africa
Mandenka
CHETIE
YES
01.SS-Africa 01.SS-Africa
01.SS-Africa
Shona Dogon Kenyan-Lowlander (Luo) Kenyan-Highlander (Nandi) Ugandan Zambian North American (Afr_descent)
USAMFV
01.SS-Africa
Rwandan
USATNG
01.SS-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa
Zulu Algerian_98 Moroccan_94 Moroccan_99 Chaouya Metalsa
ZAFHAM CHESAN CHESAN ESPARN ITAADO ITAADO
03.Europe
Bulgarian_Gipsy
BGRNAU
03.Europe 03.Europe 03.Europe
Czech Georgian Finn_89
CZEIVS CZEIVS FINLOK
03.Europe
Croatian
HRVKAS
03.Europe
Slovenian
SVNJER
03.Europe
UKIMID
03.Europe
Irish North American (Eur_descent) Cuban_(Eur_descent)
04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia
01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa
-1.3
36.8 -12
1
25.9 -3.6
Niger-Congo – Central – Shona 2 Niger-Congo – Central – Dogon 1
-18 14.4
USAMFV
Kenya/Kanyawegi
-0.11
34.63 Nilo-Saharan – Nilotic
2
USAMFV USAMFV USAMFV
Kenya/Kipsamoite Uganda/Kampala Zambia/Lusaka USA (American Red Cross)
35.01 Nilo-Saharan – Nilotic 32.4 Mainly Niger-Congo 28.2 Niger-Congo – Central – South Indo-European – Germanic – English Niger-Congo – Central – South -1.6 30.5 – Kinyarwanda (= Rwanda) Niger-Congo – Central – South -30 31 – Zulu 35.5 -0.4 Afro-Asiatic – Berber 30.3 9.35 Afro-Asiatic – Berber 33.2 -8.5 Afro-Asiatic – Semitic – Arabic 33.04 -7.37 Afro-Asiatic – Semitic – Arabic 35.3 -4 Afro-Asiatic – Berber – Tarifit Indo-European – Slavic – 42.7 23.3 Bulgarian
2 3 3
79 235 501
Bulgaria/Sofia Czech Republic/Praha Georgia/Tbilisi Finland
0.33 0.2 -15
14.3 44.9 25.1
Croatia/Zagreb
45.2
15.5
46
14
USAMFV
Slovenia/Ljubljana Northern Ireland/all regions USA/American Red Cross
UKIMID
Cuba/Havana
21.5
Kurdish Druze Israeli_Jew Turk
CZEIVS ISRBRA ISRGAZ TURSAR
41.7 32 32 41
44.9 34.5 34.5 28.6
04.SW-Asia
Omani
UKIMID
Georgia/Tbilisi Israel Israel Turkey/Marmara Oman/various regions
21
57
04.SW-Asia
New_Dehli
USAERL
28.6
77.2
04.SW-Asia
South_Indian
USAERL
17.5
78.5
04.SW-Asia
Tamil
ZAFHAM
05.SE-Asia
Ami
TWNLIN
05.SE-Asia
Atayal
TWNLIN
05.SE-Asia
Bunun
TWNLIN
05.SE-Asia 05.SE-Asia
Hakka Minnan
TWNLIN TWNLIN
India/New Delhi India/Andhra Pradesh, Golla South Africa/Durban Taiwan/Hualien, Taitung Taiwan/Wulai, Chenshih, Wufen Taiwan/Hsin-I, Taitung Taiwan/Hsinchu, Pintung Taiwan/Taipei
05.SE-Asia
Paiwan_51
TWNLIN
05.SE-Asia
Pazeh
TWNLIN
05.SE-Asia 05.SE-Asia
Puyuma_49 Rukai
TWNLIN TWNLIN
05.SE-Asia
Saisiat
TWNLIN
05.SE-Asia
Siraya
TWNLIN
2
Complexity
Mainly Niger-Congo Niger-Congo – Mande – Mandenka
50 41.7 60.2
03.Europe
Linguistic family/Language
USALOU USAMFV
Rwanda/Kigali South Africa/Durban Algeria/Oran Morocco, Souss Morocco/El Jadida Morocco/Settat Morocco/Nador
12.6
Long³
Senegal/Kédougou Zimbabwe/Mashonaland Mali/Bandiagara
YES YES YES
122
Lat²
YES
YES
21
89
54.7
-6.7
-80
Indo-European – Slavic – Czech South Caucasian – Georgian Uralic-Yukaghir – Uralic – Finnish Indo-European – Slavic – Serbo-Croatian Indo-European – Balto-Slavic – Slovene Indo-European – Germanic – English Indo-European – Germanic – English Indo-European – Italic – Spanish Indo-European – Indo-Iranian – Kurdish Afro-Asiatic – Semitic – Arabic Afro-Asiatic – Semitic – Hebrew Altaic – Turkic – Turkish
2
3 3 2 2 2 2 2 2 2 2 2 3 3 2 3 3 3 2 3 3 2
31
Afro-Asiatic – Semitic – Arabic Indo-European – Indo-Iranian – Indic Elamo-Dravidian – Dravidian – South Central – Telugu Elamo-Dravidian – Dravidian – South – Tamil
25.1
122
Austronesian – Paiwanic – Ami
1
24.9
122
Austronesian – Atayal
1
23.6
121
1
24.8 25.1
121 122
Taiwan/Lai-I Taiwan/Puli, Liyutan, Fengyuan
22.5
121
24
121
Taiwan/Peinan Taiwan/Wutai Taiwan/Wufen, Nanchuang Taiwan/Tanei, Tsochen
22.8 22.8
121 121
24.6
121
23.1
120
Austronesian – Paiwanic – Bunun Sino-Tibetan – Chinese southwest+central – Cantonese Sino-Tibetan – Chinese southeast Austronesian – Paiwanic – Paiwan Austronesian – Paiwanic Sinicized – Pazeh Austronesian – Paiwanic – Puyuma Austronesian – Tsouic – Rukai Austronesian – Paiwanic – Saisiyat Austronesian – Paiwanic Sinicized – Siraya
-30
3 3 2 3
3 3 1 3 1 1 1 3
Table 1. Continued
Region
Population dataset name
Labcode
05.SE-Asia
Tao (Yami)
TWNLIN
05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia
Thao Toroko Tsou Han_Chinese_149 Han_Chinese_572
05.SE-Asia
Malay Singapore_(Chin_descent)
05.SE-Asia 05.SE-Asia
12th IHWS¹ Submission No. samples Country/Region
Lat²
Long³
Taiwan/Lan-Yu
22
122
TWNLIN TWNLIN TWNLIN UKIMID UKIMID
Taiwan/Yuchih Taiwan/Hsiulin Taiwan/Tapang Singapore Hong-Kong
23.9 23 23.5 1.28 22.2
121 120 121 104 114
USAERL
Malay Peninsula
3
103
USAERL
Singapore
1.28
104
12
100
23.6
113
21
106
Linguistic family/Language Austronesian – Western Malayo-Polynesian – Yami Austronesian – Paiwanic Sinicized – Thao Austronesian – Atayal Austronesian – Tsouic – Tsou Sino-Tibetan – Chinese Sino-Tibetan – Chinese Austronesian – Western Malayo-Polynesian – Sundic – Malay
Complexity 1 3 1 1 3 3 2
Sino-Tibetan – Chinese 3 Tai-Kadai – Tai-Sek – Southwestern Tai – Chiang Saeng 3 Indo-European – Germanic – English 3
05.SE-Asia
Thai USAERL North American (Asi_descent) USAMFV
05.SE-Asia
Chinese
USATRA
Thailand USA (American Red Cross) China/South, Canton region
05.SE-Asia
Kinh
VNMVTA
Viet Nam/Hanoi
05.SE-Asia
Muong
VNMVTA
105
06.Oceania
Ivatan
TWNLIN
06.Oceania
East_Timorese
USAERL
Viet Nam/Hoa Binh 20.8 Philippines/Batan Island, Basco 20.5 Indonesia/East Timor/Nusa Tenggara -9
06.Oceania 06.Oceania
Filipino Indonesian
USAERL USAERL
Philippines/Manila Indonesia
14.6 -3
121 120
06.Oceania
Moluccan
USAERL
-2
127
06.Oceania
PNG_Highlander
USAERL
-5
145
06.Oceania
PNG_Lowlander_48
USAERL
-10
147
Indo-Pacific
06.Oceania
PNG_Lowlander_95
USAERL
Indonesia/Moluccas Melanesia/New Guinea, Highlands Melanesia/New Guinea, Lowlands, many areas Melanesia/New Guinea, Lowlands, Wosera
Sino-Tibetan – Chinese Austroasiatic – Mon-Khmer – Vietnamese Austroasiatic – Mon-Khmer – Muong Austronesian – Extra-Formosan – Proto-Filipino – Ivatan Austronesian – Central Malayo-Polynesian – Flores-Lembata Austronesian – Western Malayo-Polynesian – Tagalog Austronesian – Malayo-Polynesian Austronesian – Central Malayo-Polynesian – Southwest Maluku Indo-Pacific – Trans-New Guinea – East New Guinea Highlands
-5
150
06.Oceania
Samoa
USARWL
Indo-Pacific – Sepik-Ramu – Sepik – Middle Sepik – Ndu 2 Austronesian – Eastern Malayo-Polynesian – Oceanic – Samoan 2
07.Australia
Australian_Cape_York
USAGAO
Melanesia/Samoa Australia/Queensland/Cape York Australia/Northern Territory/Groote Eylandt Australia/Western Australia/Kimberley Australia/Northern Territory/Yuendumu
14.2
122 125
-171
-13
143
-14
137
-17
127
-24
132
07.Australia
Australian_Groote_Eylandt USAGAO Australian_Kimberley USAGAO Australian_Yuendumu USAGAO
08.NE-Asia
Okinawan
USAPAK
Hawaii/Honolulu
26
128
08.NE-Asia 08.NE-Asia
Ryukuan Buriat
JPNTKN JPNTKN
Japan/Okinawa Mongolia/Angarsk
26.4 47.6
128 119
08.NE-Asia
Korean
KORPMH
37.6
127
08.NE-Asia 09.N-America
Tuva Lacandon
USAERL MEXGOR
Korea/Seoul Russia/Novosibirsk-Kyzyl Mexico/Chiapas
09.N-America
Seri
MEXGOR
09.N-America
Canoncito
USAERL
09.N-America
Maya
USAERL
09.N-America 09.N-America
Pima_17 Pima_99
09.N-America
07.Australia 07.Australia
YES
6
50 16.7
95 -91
Mexico/Isla Tiburon USA/Arizona, Grand Canyon
29
-112
36.1
-112
20
-90
USAERL USAERL
Mexico/Yucatan USA/Arizona, Gila River USA/Arizona
33 33
-113 -112
Sioux
USAERL
USA/South Dakota
43.6
-97
09.N-America 09.N-America
Zuni Yupik
USAERL USALEF
35 60
-107 -160
09.N-America
Amerindian
USAMFV
USA/New Mexico Alaska/South USA/American Red Cross
Australian – Pama-Nyungan Australian – non-Pama-Nyungan – Anindhilyaguan Australian – non-Pama-Nyungan – Wororan and Nyulnyulan Australian – Pama-Nyungan – Ngargan Altaic – Korean-Japanese – Ryukyuan Altaic – Korean-Japanese – Ryukyuan – Amami-Okinawan Altaic – Mongol Altaic – Korean-Japanese – Korean
3 3 2 1 2 2 3 2 2 2
2 1 2 2 2 1 1 2
Altaic – Turkic – Tuvinian Amerind – Maya – Lacandon Amerind – North Amerind – Hokan – Seri Na-Dene – Athapascan – Navajo – Canoncito Amerind – North Amerind – Penutian – Mayan Amerind – Central Amerind – Uto-Aztecan Amerind – North Amerind Amerind – North Amerind – Almosan-Keresiouan – Dakota Amerind – North Amerind – Penutian – Zuni Eskimo-Aleut – Eskimo – Yupik
2 1
Amerind
3
1 1 2 1 1 2 1 2
3
Table 1. Continued 12th IHWS¹ Submission No. samples Country/Region Lat² Brazil/Mato Grosso do Sul, Amambai, Limão Verde -23 Brazil/Mato Grosso do Sul, Porto Lindo, Amambai, -24
Region
Population dataset name
Labcode
10.S-America
Guarani-Kaiowa
BRAPTZ
10.S-America
Guarani-Nandeva
BRAPTZ
10.S-America
Ticuna
USAERL
Brazil/Tabatinga
10.S-America
Central American
USAERL
10.S-America
Bari Brazilian_(Afr-Eur_descent) Mexican
VENLAY
Costa Rica/Panama 3 Venezuela/Saimadodyi & Campo Rosario 9.8 Brazil/Ribeirao Preto -23 Mexico/Mexico City 19.4 Brazil/Belo Horizonte -10
-55
21.5
-80
11.Other 11.Other 11.Other
BRADON MEXGOR
Brazilian UKIMID Cuban_(Afr-Eur_descent) 11.Other UKIMID Cuba/Havana North American (His_descent) 11.Other USAMFV USA (American Red Cross)
¹ These same samples were also reported in the 12th IHWS (see reference 1).
² Lat indicates latitude in degrees north (positive values) or south (negative values).
³ Long indicates longitude in degrees east (positive values) or west (negative values).
-5
Complexity
Long³
Linguistic family/Language
-55
Amerind – Equatorial-Tucanoan – Tupi-Guarani – Guarani 1
-55 -70 -65 -73 -48 -99
Amerind – Equatorial-Tucanoan – Tupi-Guarani – Guarani 1 Amerind – Equatorial-Tucanoan – Ticuna 1 Amerind – Chibchan-Paezan – Chibchan 1 Amerind – Chibchan-Paezan – Chibchan – Bari (Motilon) Indo-European – Italic – Portuguese Indo-European – Italic – Spanish Indo-European – Italic – Portuguese
1 3 2 3
Indo-European – Italic – Spanish 3 Indo-European – Germanic – English 3

Table 2A. Description of 12W datasets supplementing all analyses (n = 48 populations)
Region
12W labcode
12W.
Country/Region
01.SS-Africa 01.SS-Africa 01.SS-Africa
Population Name Colombian (Afr_descent) Amhara Baganda
12trachtenberg ITAGDS USALOU
* 155 226
Colombia Ethiopia/Arssi Uganda/Kampala
01.SS-Africa 02.N-Africa
Mukongo Egyptian
BELDPT 12ferencik
*
02.N-Africa 02.N-Africa 03.Europe 03.Europe 03.Europe
Algerian_100 Bedouin Italian North_Italian Finn_143
FRAMER EGYELC 12ferrara ITAFER FINTII
55 171 * 45 29
03.Europe
Hvar_Island_Croatian
CROKAS
20
03.Europe 03.Europe
Krk_Island_Croatian Polish
CRORUD FRADDC
19 156
03.Europe
Pomaki
GRESTV
164
03.Europe 03.Europe 03.Europe 03.Europe 04.SW-Asia
Provincial_French Spanish_100 Spanish_133 Spanish_Basque Sri_Lankan
FRADDC SPAARN SPALAR SPABER 12hashemi
229 104 82 120 *
04.SW-Asia
Zoroastrian
12hashemi
*
04.SW-Asia 04.SW-Asia
North_Indian Ashkenazi_Jews
12ferencik ISRBRA
* 116
04.SW-Asia 04.SW-Asia 04.SW-Asia
Hunza-Burushaski Libyan_Jews Moroccan_Jews
PAKQAS ISRBRA ISRBRA
214 153 117
04.SW-Asia
Sindhi
PAKQAS
213
05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia
South_Han Taiwanese Ami_14 Paiwan_64
12johnlee 12johnlee JAPSEK JAPSEK
* * 243 241
05.SE-Asia 05.SE-Asia
Puyuma_15 Thai-Chinese
JAPSEK THACHI
08.NE-Asia
Japanese
12juji
08.NE-Asia
Japanese_Kobe
12araki
47
08.NE-Asia 08.NE-Asia
Halkh Han
JAPTSU JAPINK
184 137
08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia
Hoton Kazakh Korean Mongolian
JAPTSU JAPINK JAPJUJ JAPJUJ
112 135 240 16
08.NE-Asia 08.NE-Asia
Tuvinian Uygur
RUSKON JAPINK
09.N-America
Mixe
09.N-America 09.N-America 10.S-America 10.S-America
1
Long.
Linguistic family/Language
4.34 7.58 0
-74 39 32.5
-4.2 30.1
15.2 31.1
Indo-European – Italic – Spanish Afro-Asiatic – Semitic – Amharic Niger-Congo – Bantu – Luganda Niger-Congo – Bantu – Kikongo, Lingala, Tsheluba Afro-Asiatic – Semitic – Arabic
36.5 31.2 41.5 45.7 65
3 27.2 12.3 9.7 25.3
Croatia/Hvar
43.1
16.3
Croatia/Krk Poland Greece/Northern_Xanthis France/Ile_de_France Spain Spain Spain India/Sri Lanka
45 52.3
Zaire/Kinshasa Egypt Algeria/Algiers_area Egypt/Siwa Italy Italy/Bergamo Finland/Oulu
Canada India/Uttar Pradesh, Lucknow Israel Pakistan/North: Gilgit Israel Israel
Lat.
Complexity 2 2 2 2 3
14.5 16.5
Afro-Asiatic – Semitic – Arabic Afro-Asiatic – Semitic – Arabic Indo-European – Italic – Italian Indo-European – Italic – Italian Uralic-Yukaghir – Finnic – Finnish Indo-European – Slavic – Croatian Indo-European – Slavic – Croatian Indo-European – Slavic – Polish
3 3
41.1
24.6
Indo-European – Greek
2
48 40.3 40.3 43 6.7
0 -3.4 -3.4 -2 79.9
3 3 3 2 2
45.3
-73
Indo-European – Italic – French Indo-European – Italic – Spanish Indo-European – Italic – Spanish Basque Elamo-Dravidian – Dravidian Indo-European – Germanic – English Indo-European – Indo-Iranian – Indic Afro-Asiatic – Semitic – Hebrew
2 3 2
27.5 31.5
82 35.1
36.2 31.5 31.5
74.4 35.1 35.1
24.5
67
Burushaski Afro-Asiatic – Semitic – Hebrew Afro-Asiatic – Semitic – Hebrew Indo-European – Indo-Iranian – Indic – Sindhi
3 2 3 2 3 3
3 2 3
24.5 23 22.5 23.5
118 120 121 121
Sino-Tibetan – Chinese Sino-Tibetan – Chinese Austronesian – Paiwanic Austronesian – Paiwanic
3 3 1 1
245 85
Pakistan/Sindh China/Fujian, Xiamen China/Fujian Taiwan/East Taiwan/South Taiwan/South_West Thailand/Centre
22 15
121 100
1 3
*
Japan
35.3
140
34
135
3
47.5 43.4
107 87.4
Altaic – Mongolian Sino-Tibetan – Sinitic – Chinese
2 3
48 48 46 47.5
92 68 127 107
Altaic – Mongolian Altaic – Turkic – Kazakh Altaic – Korean Altaic – Mongolian
2 3 3 3
236 17
Japan/Kobe Mongolia/Ulaanbaatar China/North Mongolia/Uvs_Aimag Russia/Kazakhstan China/Heilongjiang Mongolia Russia/Tuvinian/several_Regions China
Austronesian – Paiwanic Sino-Tibetan – Chinese Altaic – Korean-Japanese – Japanese Altaic – Korean-Japanese – Japanese
51 43.4
95 87.4
1 2
USAKLI
209
Mexico/Oaxaca
17.1
-97
Mixteca
USAKLI
211
Mexico/Oaxaca
17.1
-97
Zapotec Colombian Ecuadorian
USAKLI 12trachtenberg 12trachtenberg
210 * *
Mexico/Oaxaca Colombia Ecuador
17.1 4.34 -0.2
-97 -74 -78
Venezuela/Zulia
10
-73
Altaic – Turkic – Tuva Altaic – Turkic – Uygur Amerind – North Amerind – Penutian – Mixe Amerind – Central Amerind – Oto-Manguean – Mixtec Amerind – Central Amerind – Oto-Manguean – Zapotec Indo-European – Italic – Spanish Indo-European – Italic – Spanish Amerind – Ge-Pano-Carib – Yupan
10.S-America Yukpa VENLAY 185
* No 12W map number; data provided by the corresponding laboratory.
2
3
1 1 1 1 1 1

Figure 1. Boundaries for global regions.

Table 2B. Description of 12W datasets supplementing linguistics-related analyses (n = 20 populations)
Region | Population name | 12W. | Country/Region | Lat. | Long. | Linguistic family/Language
01.SS-Africa | Merina | 220 | Madagascar/Central Highlands | -18 | 47.2 | Austronesian – Malagasy
01.SS-Africa | Oromo | 154 | Ethiopia/Arssi | 7.58 | 39 | Afro-Asiatic – Cushitic – Oromo
01.SS-Africa | Zairian | 234 | Zaire/several regions | -4.2 | 15.2 | Niger-Congo – Bantu – Kikongo, Lingala, Tsheluba
02.N-Africa | Egyptian_Copts | 221 | Egypt (in USA) | 30 | 31 | Afro-Asiatic – Semitic – Arabic
02.N-Africa | Egyptian_Delta | 109 | Egypt/Delta | 31 | 31.3 | Afro-Asiatic – Semitic – Arabic
02.N-Africa | Mzab | 3 | Algeria/South Sahara | 32.2 | 3.4 | Afro-Asiatic – Berber – Mzab
03.Europe | Belgian | 158 | Belgium/Namur & Luxembourg | 50.3 | 4.52 | Indo-European – Italic – French
03.Europe | Bulgarian | 119 | Bulgaria | 42.4 | 23.2 | Indo-European – Slavic – Bulgarian
03.Europe | French_North | 61 | France/Northern | 50.4 | 3.05 | Indo-European – Italic – French
03.Europe | Greek_Attiki | 165 | Greece/Attiki | 38 | 25.3 | Indo-European – Greek
03.Europe | Italian_Pavia | 42 | Italy/Pavia | 45.1 | 9.09 | Indo-European – Italic – Italian
03.Europe | Portuguese_Coimbra | 195 | Portugal/Coimbra | 40.1 | -8.3 | Indo-European – Italic – Portuguese
03.Europe | Portuguese_South | 65 | Portugal/South | 39 | -9 | Indo-European – Italic – Portuguese
03.Europe | Sardinian | 46 | Italy/Sardinia | 40 | 9 | Indo-European – Italic – Sardinian
03.Europe | Swiss | 196 | Switzerland/Geneva | 46.2 | 6.1 | Indo-European – Italic – French
04.SW-Asia | Lebanese | 145 | Lebanon | 33.5 | 35.3 | Afro-Asiatic – Semitic – Arabic
04.SW-Asia | Punjabi | 38 | India/Punjab | 28.4 | 77.1 | Indo-European – Indo-Iranian – Indic – Punjabi
06.Oceania | Trobriand | 111 | Melanesia/Papua New Guinea/islands | -8.3 | 151 | Austronesian – Kilivila
08.NE-Asia | Manchu | 15 | China/Heilongjiang | 45.2 | 126 | Altaic – Tungus – Manchu
10.S-America | Kaingang | 9 | Brazil/South: Parana | -25 | -52 | Amerind – Ge-Pano-Carib – Kaingang

The mean number of loci genotyped was 3.9 for the 13W datasets and 3.6 for the 48 12W datasets. Many of the population samples submitted as 13W datasets had previously been typed, primarily at class II loci, as part of the 11th International Workshop (11W) or the 12W. Submitting laboratories were encouraged to use IHWG methods for new typing (see Section A, HLA Typing and Informatics). Where participating laboratories were unable to carry out molecular-level typing, genotyping was performed by a second laboratory. All laboratories using IHWG reagents were required to type a subset of the cells of the IHWG Quality Control (QC) cell panel with 92% accuracy before new genotyping data could be accepted for analysis. Some data had been generated before the start of the IHWG or with non-IHWG methods. These data were classed as non-qualified data (or “Available Data”) and were accepted in the form of four-digit (relatively unambiguous) genotype assignments. Data generated with IHWG reagents were submitted as probe-reactivity patterns, formatted using either the RLS software or the SCORE software of the IHWG Virtual DNA Analysis (VDA) component (Section Joint R, Virtual DNA Analysis Report). In these cases, alleles and genotypes were inferred by the software. Approximately 53% (50/95) of the 13W datasets had been typed after the 12W and were accepted as non-qualified data. Of the remaining 45 datasets, 41 were typed at class I loci using IHWG RLS reagents, 3 using IHWG SSOP reagents, and 1 by sequence-based typing (SBT). These typing methods are described in the Technology Joint Report, Sections A.2, A.3 and A.5. In many instances, the results of these methods were verified by SBT. Class II typing in many of these datasets was carried out using 12W or local reagents. Datasets typed using local reagents were submitted in a format similar to that of non-qualified data (relatively unambiguous assignments).

Thanks to the efforts of the submitting laboratories, complete background information (especially geographic and linguistic information) is available for most population samples. Where well-defined geographic information was not available, the latitude and longitude of a nearby locality, or of the country’s capital city, were used. Linguistic assignments were based on information provided by each laboratory when available (see below). Otherwise, Ruhlen’s classification of linguistic families (2) or the Ethnologue (3) was consulted. In a few cases, the broad linguistic family could not be specified with certainty. For example, “Kenyans” (i.e., the Kenyan_142 sample) and “Ugandans” may include Afro-Asiatic and Nilo-Saharan speakers in addition to Niger-Congo speakers. These populations were classified as “Mainly Niger-Congo” because Niger-Congo speakers outnumber Afro-Asiatic and Nilo-Saharan speakers in these nations (as shown in Table 4 (3)). In other cases, only a broad linguistic characterization was possible (e.g., “Indo-Pacific”). Table 5 summarizes these data, giving the number of 13W populations belonging to each linguistic family in each geographic region. These regions are defined in Section 2.II below and shown in Figure 1. In addition, the following chapter (Chapter 3, Short Population Reports) gives a detailed description of each 13W population, summarizing its history, sampling and genotyping methods, and preliminary analyses. Each report is listed in the table of contents under the population’s 13W. number.
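The intake rules described above can be expressed as a small decision procedure. Laboratories using IHWG reagents had to meet a 92% accuracy threshold on a subset of the QC cell panel. Non-qualified data and IHWG-reagent data arrived in different submission formats. The sketch below is hypothetical. It assumes that accuracy is scored as the fraction of typed QC cells whose reported typing exactly matches a reference typing. The route labels are illustrative names only and are not part of the RLS, SCORE or any other IHWG software.

```python
from dataclasses import dataclass

# Threshold taken from the text; scoring by exact per-cell match is an assumption.
QC_ACCURACY_THRESHOLD = 0.92


def qc_accuracy(reported: dict[str, str], reference: dict[str, str]) -> float:
    """Fraction of the typed QC cells whose reported typing matches the reference.

    Both dicts map a QC cell ID to a typing string (hypothetical representation).
    Only the subset of cells actually typed (the keys of `reported`) is scored.
    """
    if not reported:
        raise ValueError("no QC cells were typed")
    correct = sum(1 for cell, typing in reported.items() if reference.get(cell) == typing)
    return correct / len(reported)


def lab_passes_qc(reported, reference, threshold=QC_ACCURACY_THRESHOLD) -> bool:
    """True if the laboratory meets the accuracy threshold on its QC subset."""
    return qc_accuracy(reported, reference) >= threshold


@dataclass
class Submission:
    used_ihwg_reagents: bool
    generated_before_ihwg: bool
    payload_kind: str  # "assignments" (four-digit calls) or "probe_patterns" (raw reactivity)


def intake_route(sub: Submission, lab_passed_qc: bool) -> str:
    """Hypothetical routing of one submission according to the rules in the text."""
    if sub.generated_before_ihwg or not sub.used_ihwg_reagents:
        # Non-qualified ("Available") or local-reagent data: relatively unambiguous assignments.
        if sub.payload_kind == "assignments":
            return "accept_as_non_qualified"
        return "unsupported_format"  # assumption: case not covered by the text
    if not lab_passed_qc:
        return "hold_until_qc_passed"
    if sub.payload_kind == "probe_patterns":
        # Alleles and genotypes are inferred by software (RLS or SCORE in the text).
        return "infer_alleles_with_software"
    return "unsupported_format"  # assumption: case not covered by the text


if __name__ == "__main__":
    ref = {f"cell{i}": "A*0201" for i in range(25)}
    rep = dict(ref)
    rep["cell0"] = "A*0101"  # one discordant typing out of 25
    print(lab_passes_qc(rep, ref))  # True (24/25 = 0.96)
```

Keeping the QC check separate from the routing function reflects the text's ordering. A laboratory qualifies once, and each of its later submissions is then routed by how and when the data were generated.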
Table 3. Number of individuals typed at each locus in the 12W and 13W datasets (total of 163 populations) 13W. 1 2
Region 01.SS-Africa 01.SS-Africa
3 4 5 6 7 8 9
01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa
159 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 01.SS-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 02.N-Africa 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe 03.Europe
134 58 59 60 61 62 63 64 65 66 67 68 69 70 71
03.Europe 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia 04.SW-Asia
160 72
05.SE-Asia 05.SE-Asia
Population Name Dogon Kenyan_Lowlander (Luo) Kenyan_Highlander (Nandi) Kenyan_142 Mandenka Rwandan Shona Ugandan Zambian North American (Afr_descent) Zulu Amhara Baganda Merina Mukongo Oromo Zairian Algerian_98 Chaouya Metalsa Moroccan_94 Moroccan_99 Algerian_100 Bedouin Egyptian Egyptian_Copts Egyptian_Delta Libyan_Jews Moroccan_Jews Mzab Bulgarian_Gipsy Croatian Czech Finn_89 Georgian Irish Slovenian Belgian Bulgarian Finn_143 French_North Greek_Attiki Hvar_Island_Croatian Italian Italian_Pavia Krk_Island_Croatian North_Italian Polish Pomaki Portuguese_Coimbra Portuguese_South Provincial_French Ashkenazi_Jews Sardinian Spanish_100 Spanish_133 Spanish_Basque Swiss North American (Eur_descent) Druze Israeli_Jew Kurdish New_Dehli Omani South_Indian Tamil Turk Hunza-Burushaski Lebanese North_Indian Punjabi Sindhi Sri_Lankan North American (Asi_descent) Ami
Labcode USAMFV USAMFV
IHWC 13th 13th
USAMFV CANLUO CHETIE USATNG USALOU USAMFV USAMFV
13th 13th 13th 13th 13th 13th 13th
USAMFV ZAFHAM ITAGDS USALOU FRADAN BELDPT ITAGDS FRAKPL CHESAN ITAADO ITAADO CHESAN ESPARN FRAMER EGYELC 12ferencik USAYUN EGYELC ISRBRA ISRBRA FRATHO BRGNAU HRVKAS CZEIVS FINLOK CZEIVS UKIMID SVNJER BELOSS BULNAU FINTII FRADAN GRESTV CROKAS 12ferrara ITAMRA CRORUD 12ferrara FRADDC GRESTV PORLTC PORCHS FRADDC ISRBRA ITACON SPAARN SPALAR SPABER SUIJEA
13th 13th
13th 13th 13th 13th 13th
13th 13th 13th 13th 13th 13th 13th
12W. A C B DRB1 138 129 138 138 265 265 265
12th
12th 12th 12th 12th 12th 12th 12th
13th 13th 13th 13th 13th 13th 13th 13th 13th
501
12th 12th 12th 12th 12th 12th 12th 12th
55 171 221 109 153 117 3
12th
21
USAMFV TWNLIN
13th 13th
158 119 29 61 165 20 26 42 19 101 156 164 195 65 229 116 46 104 82 120 196
89 214 145 38 213
240 143 94
226 163 45
226 161 44
255 199
252 98
251 201
155 226 220 1 154 234 235
12th
12th 12th 12th 12th 12th 12th 12th
240 143 54
225 163 43
DQB1
DPA1
119 84 280 229
113
129
123
280 229
229
88
89 98
87 98
DPB1
85 228
87
26 160 30
20 82 106 99
99
98 95
98 96
101 79
98 78
40 40
40 40
12 11 11 150 150 139 105 106 106 104 90 90 90 90 105 107 108 1000 1000 1000 1000 100 131
132
67 72
12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th
USAMFV ISRBRA ISRGAZ CZEIVS USAERL UKIMID USAERL ZAFHAM TURSAR PAKQAS USABIA 12ferencik INDRAN PAKQAS 12hashemi
122
241 113 93
DQA1
68 68
99 99 98 100
40 40
39 39
40 40
79 40 103 40 40 107
79
105 35
106 30
102
100
100
100
96 104 50 92 102 101 99 100 219 110 244 40
96
100 38 42 234
97 101
40
101
40
101
40
106 97
106 97
104 101 98 100
104 101 99 100
224 40 80 100 57
224 40 80 100 125 165
100 126 158
100 108 39
68 88
297 100 117 30 66 121 88 50
292 100 94 29 56 121 104 48
287 100 109 29 66 109 49
46
46
46
245
245
120
120
126 120
118 51
39
39
39 57
411 98
401 98
396 98
98
Table 3. Continued 13W. 73 74 75 76 77 78 79 80 81 82 84 85 86 87 89
Region 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia
90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113
05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 05.SE-Asia 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 06.Oceania 07.Australia
114 115 116 83 88 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 135 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
07.Australia 07.Australia 07.Australia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 08.NE-Asia 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 09.N-America 10.S-America 10.S-America 10.S-America 10.S-America 10.S-America 10.S-America
Population Name Atayal Bunun Chinese Hakka Han_Chinese_149 Han_Chinese_572 Kinh Malay Minnan Muong Paiwan_51 Pazeh Puyuma_49 Rukai Saisiat Singapore_(Chinese_descent) Siraya Thai Thao Toroko Tsou Yami (Tao) Ami_14 Paiwan_64 Puyuma_15 South_Han Taiwanese Thai-Chinese East_Timorese Filipino Indonesian Ivatan Moluccan PNG_Highlander PNG_Lowlander_48 PNG_Lowlander_95 Samoa Trobriand Australian_Cape_York Australian_Groote_Eylandt Australian_Kimberley Australian_Yuendumu Okinawan Ryukuan Buriat Korean Tuva Halkh Han Hoton Japanese Japanese_Kobe Kazakh Korean Manchu Mongolian Tuvinian Uygur Amerindian Canoncito Lacandon Maya Mixe Mixteco Pima_17 Pima_99 Seri Sioux Yupik Zapotec Zuni Bari Brazilian Guarani-Kaiowa Guarani-Nandeva Ticuna Central American
Labcode TWNLIN TWNLIN USATRA TWNLIN UKIMID UKIMID VNHNAN USAERL TWNLIN VNHNAN TWNLIN TWNLIN TWNLIN TWNLIN TWNLIN
IHWC 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th 13th
USAERL TWNLIN USAERL TWNLIN TWNLIN TWNLIN TWNLIN JAPSEK JAPSEK JAPSEK 12johnlee 12johnlee THACHI USAERL USAERL USAERL TWNLIN USAERL USAERL USAERL USAERL USARWL GERNAG USAGAO
13th 13th 13th 13th 13th 13th 13th
USAGAO USAGAO USAGAO USAPAK JPNTKN JPNTKN KORPMH USAERL JAPTSU JAPINK JAPTSU 12juji 12araki JAPINK JAPJUJ JAPJUJ JAPJUJ RUSKON JAPINK USAMFV USAERL MEXGOR USAERL Hollenbach Hollenbach USAERL USAERL MEXGOR USAERL USALEF Hollenbach USAERL VENLAY UKIMID BRAPTZ BRAPTZ USAERL USAERL
12W. A C B DRB1 106 106 106 106 101 101 101 101 282 281 282 55 55 55 55 149 149 572 572 102 124 107 101 54 102 102 102 102 83 51 51 51 51 55 55 55 55 50 50 50 50 50 50 50 50 51 51 51 51 86 51 98 30 55 51 50 12th 12th 12th 12th 12th 12th
243 241 245 162 1012 85
13th 13th 13th 13th 13th 13th 13th 13th 13th
86 51 92 30 55 51 50
51 99 30 55 51 50
162 162 162 199 1011 1012 42 57 94 94 94 50 50 49 50 50 50 25 92 77
DQA1
DQB1
DPA1
100 55
53
83
51 30 55 51 50 15 65 16 162 74 86 94
86
86 94
86 94
86 94
46 91 48
46 78 48
46 88 48
83
46 92 48 93
50 40 90
79 50
50
50
13th
103
89
100
99
99
99
96
13th 13th 13th 13th 13th 13th 13th 13th
75 36 191 105 142 140 191 189
73 28 192 105
75 38 193 104
41 190
41
41
38
200 174
200 180
12th
12th
12th 12th 12th 12th 12th 12th 12th 12th 12th 12th 12th
111
6
184 137 112 608 47 135 240 15 16 236 17
13th 13th 13th 13th
79
608 32
608 32
32
73
67
199
61
34
203
52 52
13th 13th 13th 13th 13th 12th 13th 13th 13th 13th 13th 13th 13th
72
116
199 189 40 57 85
41 57 84
41 57 84
41 57 85
30 39
32 39
30 39
39
66
66
66
65
40 162 15 52 52
40 162 15 52 52
40 162 15 52 52 97
40
160 191 257
12th 12th
DPB1
248
235
52 51 86
52 52 89
52 52
33
33
33
252 76
149 71
252 74
92 97 144 53
86 95 144 53
82 144 53
33 96 252 76 50
33 96 58 76 50
25 96 252 72 50
144 53 49 55
144 53 49 55
49 55
15 95 83 50
49 55
Table 3. Continued 13W.
Region
Population Name
152
10.S-America
Colombian
153 154 155
10.S-America 10.S-America 10.S-America
156 136
11.Other 11.Other
157 158
11.Other 11.Other
161
11.Other
Ecuadorian Kaingang Yukpa Brazilian_(Afr-Eur_descent) Mexican Cuban_(Afr-Eur_descent) Cuban_(Eur_descent) North American (His_descent)
162 163
11.Other 11.Other
12W . A
Labcode 12trachtenberg 12trachtenberg BRAPTZ VENLAY
IHWC
BRADON MEXGOR
13th 13th
99 62
106 29
UKIMID UKIMID
13th 13th
42 70
42 70
13th
247
246
USAMFV 12trachtenberg Colombian (Afr_descent) Zoroastrian 12hashemi
C
B
DRB1
12th 12th 12th 12th
12th 12th
9 185 69 40
DQA1
DQB1
DPA1
217
217
217
217
99 28 73
99 87 73
99 101 70
99
100 204
204
204
70 54
70 54
70
DPB1
240 70
As shown in Table 5, Southeast Asia is the best-represented region (26 population samples), largely because of the many Aboriginal populations sampled in Taiwan. By contrast, very few populations have been typed in Northeast Asia (only 3 population samples). As a consequence of this geographic distribution, the Austronesian (20 population samples), Indo-European (16) and Amerindian (14) linguistic families are the best represented. The Indo-European family is the most widespread linguistic group in the 13W dataset, with speakers in the Europe, South-West Asia, North America and Other regions. Some linguistic groups are absent, as is the case for Afroasiatic and Khoisan in sub-Saharan Africa. In summary, the 13W dataset represents a considerable contribution to the pool of populations tested for HLA polymorphism at class I and class II loci using high-resolution methods. A large number of populations (95) have been tested, approximately 42% of which are from Southeast Asia and North/Central America. Moreover, each dataset is accompanied by a short report providing contextual and historical background information on the population sample; this information is crucial for the proper interpretation of the resulting genetic analyses. At the same time, it must be noted that these 13W population samples have not all been typed for the same HLA loci, although similar numbers of samples have been tested for some class I and class II loci (Table 6). Even when the 68 12W datasets are included, analyses are limited to some 50–90 populations per locus (e.g., Chapter C.7). Future studies should encourage additional molecular-level typing of the same samples in these populations, with the goal of obtaining genotypes at all HLA loci in all sampled individuals.
Finally, in many cases the population sample sizes are too small (fewer than 40 individuals in some populations) to permit multi-locus analyses. As the number of distinct alleles increases with each report of the WHO Committee for Nomenclature for Factors of the HLA System, the problems presented by small sample sizes are compounded. As shown in Table 6, the average sample size at the class I and DRB1 loci is greater than 100 individuals, but this number is considerably lower than the number of currently distinguishable alleles. Because many different HLA alleles occur in human populations, the frequencies of most alleles are low (<5%), and there is a good chance that low-frequency alleles will not be detected when sample sizes are small. In addition, statistical tests performed on poorly sampled populations have low power and are not reliable. This is true at the single-locus level, and may be even more pronounced at the multi-locus level, where the estimation of haplotype frequencies and tests of linkage disequilibrium depend on accurate sampling of allelic diversity. Furthermore, the method for reducing genotype ambiguity (described in Section 2.I, below) depends on observing alleles in a number of different genotypes, the likelihood of which increases with the size of the population sample. In general, an effort should be made to sample at least 100 individuals per population (e.g., see Sanchez-Mazas 2002 (5)).
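The sample-size argument can be made concrete with a short calculation. Assuming that the 2n chromosomes of n sampled individuals are a random sample from the population (an idealization), an allele with frequency p is missed entirely with probability (1 − p)^{2n}. The sketch below evaluates this expression; the function names are illustrative and are not part of any workshop software.

```python
import math


def detection_probability(freq: float, n_individuals: int) -> float:
    """P(an allele of frequency `freq` appears at least once among 2n chromosomes)."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must lie strictly between 0 and 1")
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)


def min_individuals(freq: float, target: float = 0.95) -> int:
    """Smallest n for which detection_probability(freq, n) >= target."""
    if not 0.0 < freq < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("freq and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - freq)**(2n) >= target for n and round up.
    return math.ceil(math.log(1.0 - target) / (2.0 * math.log(1.0 - freq)))


for n in (40, 100, 200):
    print(n, round(detection_probability(0.01, n), 2))
print(min_individuals(0.01))  # 150
```

For an allele at 1% frequency, the chance of observing it at least once is roughly 0.55 with 40 individuals and 0.87 with 100, and about 150 individuals are needed to reach 95%, which illustrates why small samples are likely to miss rare alleles.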
Table 4. Linguistic representation in Kenya and Uganda¹
Nation    Niger-Congo    Nilo-Saharan    Afro-Asiatic
Kenya    17.9 million    7.5 million    715,000
Uganda    11.4 million    5.3 million    0
¹ These values are taken from reference (3).
Table 5. Geographic and linguistic distribution of 13W datasets
Linguistic groups: AA NC NS IE UY SC ED ST AU TK AN IP AB AL AM EA ND Total
Geographic regions:
01.SS-Africa 8 2 1 11
02.N-Africa 5 5
03.Europe 5 1 1 7
04.SW-Asia 3 2 2 1 8
05.SE-Asia 1 6 2 1 14 2 26
06.Oceania 6 3 9
07.Australia 4 4
08.NE-Asia 3 3
09.N-America 4 8 1 1 14
10.S-America 1 5 6
11.Other 2 2
Total 8 8 2 16 1 1 2 6 2 1 20 3 4 6 4 1 1 95
AA: Afro-Asiatic, NC: Niger-Congo, NS: Nilo-Saharan, IE: Indo-European, UY: Uralic-Yukaghir, SC: South-Caucasian, ED: Elamo-Dravidian, ST: Sino-Tibetan, AU: Austroasiatic, TK: Tai-Kadai, AN: Austronesian, IP: Indo-Pacific, AB: Australian, AL: Altaic, AM: Amerindian, EA: Eskimo-Aleut, ND: Na-Dene.
Table 6. 13W population datasets genotyped at each locus
HLA locus    Number of populations tested    Average sample size    Standard deviation
A    78    130.6    136.3
C    59    129.3    141.8
B    73    131.7    139.5
DRB1    59    114.8    132.1
DQA1    21    88.4    55.1
DQB1    33    103.7    68.1
DPA1    8    76.1    34.3
DPB1    21    72.7    30.4
Section 2. Pre-analytical dataset processing
Subsequent to submission, each 13W dataset was prepared and formatted for analysis in a multi-step process. First, all ambiguities associated with a given genotype were reduced to pairs of single alleles. Second, the genotype data for each population sample were merged with basic demographic and typing information in a standardized format. Third, the existence of each allele at each locus was verified by comparison to the March 2002 allele list approved by the WHO Committee for Nomenclature for Factors of the HLA System, and alleles were truncated to peptide-level (4-character) designations. Fourth, “binning” rules, which reassign alleles identified with high-resolution typing methods in a subset of populations to the variants detectable with lower-resolution methods, were applied to reduce these alleles to common-denominator categories. Each of these processes is described in detail in the following sections (I. Ambiguity Reduction, II. Datafile Formatting, III. Data Filtering, and IV. Binning). The overall extent of the changes made to datasets as a result of these processes is summarized in Section 3, below.
I. Ambiguity Reduction
Much of the 13W genotype data had not been resolved to the allelic level (i.e., exactly two alleles per genotype) when submitted for analysis, and instead contained ambiguous alleles and/or ambiguous genotypes (described below). The extent of allelic and genotypic ambiguity in the overall dataset is detailed in Table 7.
Table 7. Extent of ambiguous alleles and ambiguous genotypes in the 13W dataset
Allelic ambiguity
HLA locus: A, C, B, DRB1, DQA1, DQB1, DPA1, DPB1
Number of population datasets: 46, 39, 46, 29, 15, 21, 2, 12
Number of datasets with any ambiguity: 35, 30, 36, 4, 4, 5, 1, 0
Proportion of datasets with any ambiguity: 0.76087, 0.76923, 0.78261, 0.13793, 0.26667, 0.2381, 0.5, 0
Average proportion of ambiguity among datasets with any ambiguity: 0.6194, 0.49542, 0.30536, 0.06012, 0.31798, 0.10201, 0.33721, –
Genotypic ambiguity
HLA locus: A, C, B, DRB1, DQA1, DQB1, DPA1, DPB1
Number of population datasets: 46, 39, 46, 29, 15, 21, 2, 12
Number of datasets with any ambiguity: 30, 30, 30, 1, 1, 1, 0, 0
Proportion of datasets with any ambiguity: 0.65217, 0.76923, 0.65217, 0.03448, 0.06667, 0.04762, 0, 0
Average proportion of ambiguity among datasets with any ambiguity: 0.40851, 0.6293, 0.30565, 0.13061, 0.13061, 0.13061, –, –
Because most of the class I data were submitted for analysis in the form of probe reactivities, whereas many of the class II datasets were submitted as Available Data (with many ambiguities already reduced to individual allele calls before submission), both allele and genotype ambiguities are more extensive at the class I loci (ambiguities in 60–80% of population datasets) than at the class II loci (ambiguities in 5–20% of population datasets). These ambiguities had to be resolved to the allelic level before analysis could begin.
Ambiguous alleles are those that cannot be distinguished because the typing method does not assess all of the pertinent polymorphisms. For example, ‘A*020101, 0209, 0230, 0231’ is an ambiguous allele set representing an ambiguous allele, the actual identity of which could be any one of the four constituent alleles. An unambiguously assigned allele is characterized by a single, DNA-level designation (e.g., A*020101). Because ambiguous alleles result from the limitations of a given typing system, population samples typed using multiple systems may have similar but non-identical ambiguous allele sets (e.g., one system yields an ‘A*020101, 0209, 0230, 0231’ allele set, while a second system yields an ‘A*020101, 020102, 020104, 0230’ allele set).
Ambiguous genotypes arise when the phase of the assessed polymorphisms cannot be established from a given probe reactivity pattern. For example, ‘A*0101/*02011 or *0101/*0236 or *0106/*02011 or *0106/*0236’ denotes an ambiguous genotype set, the actual identity of which could be any one of the four constituent genotypes. Two ambiguously assigned allele sets appear in this example: the alleles in the A*01 serogroup constitute one set, and the alleles in the A*02 serogroup constitute the other.
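The notation used in these examples can be made explicit with a small parser. This is a minimal sketch, assuming that alternative genotypes are separated by ‘ or ’, the two alleles of a genotype by ‘/’, and the constituents of an ambiguous allele by commas; the function names are hypothetical, and the parser accepts only genotype notation (not a bare allele set).

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

AlleleSet = FrozenSet[str]                   # one allele, or an ambiguous allele set
GenotypeOption = Tuple[AlleleSet, AlleleSet]


def parse_genotype_set(text: str) -> Tuple[str, List[GenotypeOption]]:
    """Parse notation such as 'A*0101/*02011 or *0101/*0236'."""
    locus, _, body = text.partition("*")
    options: List[GenotypeOption] = []
    for alternative in body.split(" or "):
        left, right = alternative.split("/")
        # Each side may list several comma-separated constituent alleles.
        first, second = (
            frozenset(name.strip().lstrip("*").strip() for name in side.split(","))
            for side in (left, right)
        )
        options.append((first, second))
    return locus.strip(), options


def is_ambiguous_genotype(options: List[GenotypeOption]) -> bool:
    """More than one alternative genotype: phase could not be established."""
    return len(options) > 1


def has_ambiguous_allele(options: List[GenotypeOption]) -> bool:
    """At least one side of some alternative lists several constituent alleles."""
    return any(len(side) > 1 for option in options for side in option)


def concrete_genotypes(options: List[GenotypeOption]) -> Set[Tuple[str, str]]:
    """Expand every alternative into unordered pairs of single alleles."""
    result: Set[Tuple[str, str]] = set()
    for first, second in options:
        for a, b in product(first, second):
            low, high = sorted((a, b))
            result.add((low, high))
    return result


locus, options = parse_genotype_set(
    "A*0101/*02011 or *0101/*0236 or *0106/*02011 or *0106/*0236"
)
print(locus, len(options))             # A 4
print(is_ambiguous_genotype(options))  # True
print(has_ambiguous_allele(options))   # False
print(sorted(concrete_genotypes(options)))
```

Applied to the four-alternative example above, the parser reports four genotype options and four concrete genotypes; when one side of an alternative is itself an ambiguous allele set, the number of concrete genotypes grows multiplicatively.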
An unambiguously assigned genotype is characterized by a single possible genotype for a given sample (e.g., A*0101/*0236). Because ambiguous genotypes result from particular combinations of allele-specific probe reactivity patterns, some alleles will appear in a population only as part of an ambiguous genotype set, while other alleles will appear in both ambiguous and unambiguous genotypes. In some cases, one allele may be unambiguously assigned within an ambiguous genotype. For example, in the ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype, the A*0101 allele has been unambiguously assigned, while the identity of the allele in the A*02 allele set is obscured by the inability to set phase. Both types of ambiguity can occur in a single sample. For example, ‘A*0101/*02011, 0209, 0230, 0231 or *0101/*0236 or *0106/*02011, 0209, 0230, 0231 or *0106/*0236’ represents an ambiguous genotype with four possible constituent genotypes, two of which contain an ambiguous allele with four possible constituent alleles.
The method used to reduce ambiguities to the allele level resolves ambiguous genotypes separately from ambiguous alleles (Stages 1 and 2 below). It assumes that the sampled individuals are part of a single population with relatively little admixture, and that they were typed with a single typing system. In general, it is assumed that these populations will have low numbers of alleles within a particular serogroup, and that alleles with the same pattern of polymorphic sequence motifs share the same DNA sequence when observed in the same population.
Stage 1. Reduction of Ambiguous Genotypes
This method proceeds in four steps, as outlined here:
Step 1. Eliminate genotypes with alleles never seen in unambiguous genotypes.
Step 2. Reduce ambiguously assigned allele sets to common denominators.
Step 3. Rely on Hardy-Weinberg proportions to establish homozygotes.
Step 4. Consider all remaining ambiguous allele sets as ambiguous alleles.
Detailed description:
Step 1. Compile a list of all alleles (both ambiguous and unambiguous) observed in unambiguous genotypes, as well as the unambiguously assigned alleles in ambiguous genotypes. These are the “observed alleles”. In each ambiguous genotype set, eliminate those genotypes lacking observed alleles, reducing each set to the genotypes in which both alleles are observed alleles. If no genotype in a given set has two observed alleles, keep all genotypes with one observed allele and eliminate those with no observed alleles. If none of the genotypes in a set contains an observed allele, leave that set unchanged. For example, a hypothetical population with only two samples presents an unambiguous ‘A*02011/*3303’ genotype and an ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype. In this case, the A*0101, *02011, and *3303 alleles have been unambiguously assigned, and the ‘*0101/*0236’ genotype is eliminated because the A*0236 allele is never seen in an unambiguously assigned genotype. This step should be repeated until the ambiguous genotype sets no longer change, but it usually requires only one iteration. This step assumes that a population will have a small number of alleles in a given serogroup. When multiple alleles from a given serogroup do exist in a population, it assumes
that distinct patterns of ambiguity will result when these alleles are in a genotype with a given allele. This assumption may not hold for admixed populations.
Step 2. Compare the ambiguous allele sets that share at least one allele, and eliminate those genotypes (among those being compared) that contain alleles not found in all such ambiguous allele sets. For example, in the three ambiguous genotypes ‘A*0101/*02011 or *0101/*0209’, ‘A*2402/*02011 or *2402/*0236’, and ‘A*3303/*02011 or *3303/*0231’, the A*02011 allele is found in all of the A*02 sets, so the genotypes containing the other A*02 alleles are eliminated. Note that an ambiguous ‘A*0101/*02012 or *0101/*0235’ genotype would not influence this decision, as none of the A*02 alleles in this ambiguous allele set overlap with the other A*02 sets. As in Step 1, this step assumes that the number of alleles in a given serogroup in a population will be low, and that distinct alleles will present distinct patterns of ambiguity when present in a genotype with a given allele.
Step 3. Where a genotype may be either heterozygous or homozygous, assign homozygote or heterozygote status based on Hardy-Weinberg expectations. For example, the genotype ‘A*2402/*2402 or *2402/*2403’ could be either a homozygous ‘*2402/*2402’ genotype or a heterozygous ‘*2402/*2403’ genotype. Suppose that two such ambiguous genotypes are observed in a population of 100 individuals, with the *2402 allele observed in 25 other genotypes and the *2403 allele observed in two other genotypes.
In this case, the number of homozygous ‘*2402/*2402’ genotypes expected under Hardy-Weinberg equilibrium is calculated assuming that both ambiguous genotypes are homozygous (so that all four alleles in these genotypes are *2402 alleles), and the number of heterozygous ‘*2402/*2403’ genotypes expected under Hardy-Weinberg equilibrium is calculated assuming that both genotypes are heterozygous (so that two alleles are *2402 alleles and two are *2403 alleles). Under the first assumption there are 25 + 4 = 29 *2402 alleles among 200, giving 100 × (29/200)² ≈ 2.1 expected homozygotes; under the second there are 27 *2402 and 4 *2403 alleles, giving 2 × 100 × (27/200) × (4/200) = 0.54 expected heterozygotes. Because more homozygotes than heterozygotes are expected, the two ambiguous genotypes are re-assigned as homozygous for the *2402 allele. Alternatively, if the number of *2402 alleles observed in other genotypes were lower than the number of *2403 alleles observed in other genotypes (e.g., 13 versus 25), primarily heterozygous genotypes would be expected (e.g., 0.72 expected homozygotes versus 2.0 expected heterozygotes), and the two genotypes would be re-assigned as heterozygotes.
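The Step 3 comparison reduces to two expected counts. The sketch below reproduces the worked example; it assumes, as the example's figures imply, that each of the “other genotypes” carries one copy of the allele in question. The function names and the tie rule (favoring homozygotes when the two expectations are equal) are hypothetical choices not specified in the text.

```python
from typing import Tuple


def hwe_expectations(
    n_individuals: int, n_ambiguous: int, other_a: int, other_b: int
) -> Tuple[float, float]:
    """Expected a/a homozygotes and a/b heterozygotes under Hardy-Weinberg.

    Each ambiguous genotype reads 'a/a or a/b'; other_a and other_b count the
    copies of alleles a and b observed in the remaining genotypes.
    """
    total_alleles = 2 * n_individuals

    # Hypothesis 1: every ambiguous genotype is homozygous (two copies of a each).
    p_if_hom = (other_a + 2 * n_ambiguous) / total_alleles
    expected_hom = n_individuals * p_if_hom ** 2

    # Hypothesis 2: every ambiguous genotype is heterozygous (one a and one b each).
    p_if_het = (other_a + n_ambiguous) / total_alleles
    q_if_het = (other_b + n_ambiguous) / total_alleles
    expected_het = 2 * n_individuals * p_if_het * q_if_het

    return expected_hom, expected_het


def assign_zygosity(n_individuals: int, n_ambiguous: int, other_a: int, other_b: int) -> str:
    """Pick the hypothesis with the larger expected count (ties go to homozygous)."""
    hom, het = hwe_expectations(n_individuals, n_ambiguous, other_a, other_b)
    return "homozygous" if hom >= het else "heterozygous"


# Worked example: 100 individuals, two ambiguous genotypes,
# *2402 seen 25 times and *2403 twice elsewhere in the sample.
print(hwe_expectations(100, 2, 25, 2))  # approximately (2.1, 0.54)
print(assign_zygosity(100, 2, 25, 2))   # homozygous
print(assign_zygosity(100, 2, 13, 25))  # heterozygous
```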
This step assumes that the genotype proportions of the population in question are in Hardy-Weinberg equilibrium.
Step 4. For a given ambiguous genotype set, lump all alleles of a given serogroup to form an ambiguous allele. For example, the ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype would be changed to an unambiguous ‘A*0101/*02011, 0236’ genotype containing an ambiguous allele. It should be noted that this process has the potential to bias the dataset in favor of high-frequency alleles, resulting in an underestimation of allele diversity and the exclusion of low-frequency alleles at that locus. Because this method assumes a correspondence between serological specificity and the first two digits of the allele name, it cannot easily be used as described for the DPB1 locus. Overall, this method has been most effective on populations assumed to be relatively free of admixture (i.e., population complexity values of 1 and 2, as described in section II, below). SBT of selected samples has confirmed the results of the ambiguity reduction method, as summarized in Table 8. More generally, this method assumes that the genotype diversity in a population will yield enough unambiguous allele assignments to permit the reduction of ambiguity in the other allele assignments. Ultimately, the adequacy of this assumption rests on the size of the population sample that is genotyped. Irrespective of the genotyping method used, the larger the population sample genotyped, the greater the chance that genotype combinations will result in unambiguous allele assignments, so that the success of SSOP genotyping of populations using this ambiguity reduction approach becomes a function of sample size.
Stage 2. Reduction of Ambiguous Alleles
The IHWG Biostatistics core has compiled a database of HLA allele frequency distributions published for populations from around the world (4). The populations in this database have been divided into seven regions (Africa, Europe, Middle East, Asia, Siberia, South Pacific Islands, and the Americas) for the purpose of investigating geographic structure. This database was used to reduce ambiguous alleles in a given population by identifying which of the constituent alleles have been reported in the corresponding global region. This stage of the method assumes that the correlation between geography and genetic distance observed for many populations at other loci extends to MHC loci, and proceeds in three steps.
Step 1. Eliminate those constituent alleles that are not observed in the corresponding region.
Table 8. SBT confirmation of the reduction of ambiguous genotypes Population Paiwan_51
Sample ID PW34
Submitted Ambiguous Genotypes Allele 1 Allele 2 3501, 3507, 3511, 3523 40011, 40012 3520 4007
Reduced Genotypes Allele 1 Allele 2 3501 4001
SBT Genotypes Allele 1 3501
Allele 2 40011, 40012
Thao
TH05 1521
1525 39021, 39022
39011, 39013, 3905
1525
3901
1525
39011, 39013
Siraya
SL23
3501, 3507, 3511, 3523 3502, 3504, 35091, 35092 3515
4002 4003 4005
3501
4002
3501
4002
Hakka
HA07
15011, 1526N, 1533 1512, 1519 1532, 1535 1521 1521
4601 4601 4601 39011, 39013, 3905 3910
1512
4601
1512, 1519
4601
1521
3901
1521
39011, 39013
HE08
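Stage 2 can be summarized as a single decision function covering Step 1 (above) and Steps 2 and 3 (described next). This is a minimal sketch under stated assumptions: the regional frequency table (`region_freqs`), the list of nearby populations with their distances (`nearby_pops`), and the numeric tolerance used to decide that two frequencies are “similar” (`tie_tolerance`) are all hypothetical stand-ins for the IHWG Biostatistics database, whose format is not described here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-ins for the regional frequency database:
#   region_freqs maps each allele reported in the region to a summary frequency;
#   nearby_pops lists (distance_km, {allele: frequency}) for populations in that region.
NearbyPops = List[Tuple[float, Dict[str, float]]]


def allele_order(name: str) -> Tuple[int, str]:
    """Sort key: first four digits numerically, then the full name."""
    head = name[:4]
    return (int(head) if head.isdigit() else 10**6, name)


def reduce_ambiguous_allele(
    constituents: Sequence[str],
    region_freqs: Dict[str, float],
    nearby_pops: NearbyPops,
    tie_tolerance: float = 0.005,  # hypothetical threshold for "similar frequencies"
) -> str:
    # Step 1: keep only constituents reported in the corresponding region.
    observed = [a for a in constituents if a in region_freqs]

    # Step 3: if none is in the database, take the lowest-numbered constituent.
    if not observed:
        return min(constituents, key=allele_order)

    # Step 2: keep the allele with the highest regional frequency...
    top = max(region_freqs[a] for a in observed)
    tied = [a for a in observed if top - region_freqs[a] <= tie_tolerance]
    if len(tied) == 1:
        return tied[0]

    # ...resolving near-ties with the geographically closest population
    # that reports at least one of the tied alleles.
    for _, freqs in sorted(nearby_pops, key=lambda pop: pop[0]):
        present = [a for a in tied if a in freqs]
        if present:
            return max(present, key=lambda a: freqs[a])

    # Fallback (an assumption; the text does not cover this case).
    return min(tied, key=allele_order)


print(reduce_ambiguous_allele(["020101", "0209", "0230", "0231"],
                              {"020101": 0.25, "0209": 0.01}, []))  # 020101
```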
Step 2. Of the remaining constituent alleles, keep the allele with the highest frequency in that region and eliminate the rest. When multiple candidate alleles have similar frequencies, keep the allele with the highest frequency in the population geographically closest to the population being analyzed.
Step 3. If none of the constituent alleles is identified in the database, reduce the ambiguous allele to the lowest-numbered constituent allele.
It should be noted that the utility of this database increases with the number of populations typed at each locus, and that older published datasets will likely contain no data on recently identified alleles. As a result, novel and rare alleles are less likely to be detected, and a bias in favor of more widespread alleles will be introduced. However, this bias will be consistent between populations. In addition, genetic distances between populations within regions will be underestimated.
II. Datafile Format
Each population dataset consisted of a ‘header block’ describing the six-character IHWG labcode for the submitting laboratory, the typing method used, the ethnicity of the sampled population, the population’s region of origin, the site at which the population sample was collected, the latitude and longitude of the collection site, and the complexity of the population (described below), and a ‘data block’ that included the unique population name, sample ID, and genotype data for each sampled individual. These genotype data were organized by locus and presented in map order (HLA-A, C, B, DRA, DRB1, DQA1, DQB1, DPA1, and DPB1). The 95 13W population datasets are included in Appendix C, Table 1.
Header block fields (labcode, method, ethnic, contin, collect, latit, longit, and complex)
Typing methods (method field): The typing protocols used fell into five categories: (1) PCR-Sequence Specific Oligonucleotide Probe (SSO, SSOP) methods (11W, 12W, IHWG and local SSOP systems), (2) reverse hybridization format PCR-SSOP methods (IHWG Reverse Line Strip (RLS) and Innolipa PCR-SSO systems), (3) Sequence Specific Primer (SSP) methods (12W ARMS, Genovision SSP, and Dynal SSP systems), (4) Sequence Based Typing (SBT) methods (IHWG SBT and local SBT systems), and (5) PCR-single strand conformation polymorphism (SSCP) methods.
Ethnicity (ethnic field): A table of 268 linguistically and culturally defined ethnic codes and 10 admixture codes was provided to the data-submitting laboratories (see Appendix C, Table 3). Where an appropriate ethnic or admixture code was not available in this table, a new code was assigned for the new ethnic identifier. Wherever possible, the ethnic identity of each population sample is distinct from the unique population name and regional identification (see below).
Regional categories (contin field): Each population sample was assigned to one of eleven regional categories (Sub-Saharan Africa, North Africa, Europe, South-West Asia, South-East Asia, Oceania, Australia, North-East Asia, North America, South America, and Other), based on the geographic region of origin and the estimated degree of admixture of the sampled population. For non-indigenous populations, regional assignments were based on the historical locale of the ancestors of those populations 1000 years ago. Admixed populations were assigned to the Other category when their members were estimated to be descended from parent populations from different regional categories. Using these criteria,
populations of predominantly Sub-Saharan African descent living outside of Africa were assigned to the Sub-Saharan Africa region, and populations of predominantly European descent living outside of Europe were assigned to the Europe region, while populations of both Sub-Saharan African and European descent were assigned to the Other category. A map defining the boundaries of these regions is shown in Figure 1.
Latitude and longitude (latit and longit fields): Latitudes and longitudes were recorded in decimal degrees, with minutes and seconds expressed as fractions of a degree. North latitudes and east longitudes were recorded as positive values, while south latitudes and west longitudes were recorded as negative values. For example, 35 degrees 20 minutes south latitude would be recorded as latit = -35.33, and 2 degrees 30 minutes east longitude would be recorded as longit = 2.5.
Complexity (complex field): Each population sample was assigned to one of three complexity categories (ranging in value from 1 to 3), in an attempt to estimate the degree of potential sub-structure and admixture in each population sample. A population sample collected from a single settlement or group of closely related settlements was assigned a complexity of 1. A population sample collected from a group of disparate but discrete settlements, or across a large region of territory, was assigned a complexity of 2. A population sample collected in a metropolitan area, across an entire nation, or from an extremely admixed population was assigned a complexity of 3. These assignments were made conservatively, with the higher value assigned in equivocal cases. Given these designations, the ambiguity reduction process (section I, above) will function best on populations with low complexity values.
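The coordinate convention can be expressed as a small conversion function. This is an illustrative sketch; the function name is hypothetical, and rounding to two decimal places is an assumption chosen to match the example values (latit = -35.33, longit = 2.5).

```python
def to_decimal_degrees(
    degrees: float,
    minutes: float = 0.0,
    seconds: float = 0.0,
    hemisphere: str = "N",
    places: int = 2,
) -> float:
    """Convert degrees/minutes/seconds to a signed decimal value."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are recorded as negative values.
    if hemisphere in ("S", "W"):
        value = -value
    return round(value, places)


print(to_decimal_degrees(35, 20, hemisphere="S"))  # -35.33
print(to_decimal_degrees(2, 30, hemisphere="E"))   # 2.5
```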
Data block fields (populat, id, and locus names):
Population name (populat field): When possible, the population name supplied by the submitting laboratory was used. When multiple population samples were submitted with identical names, a unique population name was created by appending the sample size (n) to the end of the population name. For example, two population samples named ‘population’ with sample sizes of 140 and 200 would be noted as ‘population_140’ and ‘population_200’.
Sample Identifier (id field): Each sampled individual in a given population sample was assigned a unique identifier. These identifiers have been coded to protect the confidentiality of the individual, in accordance with the IHWG protocol for the use of human subjects in research. All samples have been obtained in accordance with applicable laws and regulations at the submitter’s institution, including any requirements regarding informed consent for prospective research use.
III. Data Filtering
In the next pre-analytical step of data processing, each allele name was inspected to ensure that it conformed to standard nomenclature formats. This data “filtration” step took the form of both data truncation and the reclassification of serologically designated alleles. Each of these processes is described below.
A. Data truncation
Because of the variety of genotyping methods used to generate the 13W datasets, alleles were reported at varying levels of specificity (e.g., A*24020101, *240201, *2402). Because these differences reflect synonymous nucleotide changes in most instances, allele names were truncated to a common, peptide-level (4-character) allele name (e.g., *24020101 was changed to *2402), and the existence of these truncated alleles was verified against the allele set in the IMGT/HLA database (approved by the WHO Committee for Nomenclature for Factors of the HLA System as of March 2002) as follows:
i. If a common 4-character substring was found between the truncated allele and an allele (or alleles) in the IMGT/HLA database (e.g., the truncated allele *2402 would match *24020101, *24020102L, *240202, *240203, and *240204, with ‘2402’ as the common substring), the truncated allele was used in the data analysis.
ii. If no substring match was found between the truncated allele and alleles in the IMGT/HLA database, analysis was halted for data review.
Table 9. Binning reassignment of high-resolution HLA alleles
Locus    High resolution allele    Reassigned allele name
A    2409N    2402
C    0706    0701
B    0706    0705
DRB1    1443    1405
DRB1    1506    1501
DQA1    0104    0101
DQA1    0302    0301
DQA1    0303    0301
DQA1    0502    0501
DQB1    0202    0201
DQB1    0309    0301
DQB1    0609    0605
DQB1    0611    0602
DPB1    2301    0401
DPB1    3901    0401
DPB1    4801    0201
DPB1    4901    0402
DPB1    5101    0402
DPB1    6201    4001
DPB1    7601    1401
Table 10. Fraction of HLA allele assignments that remained unchanged during pre-analytical data processing of 47 population datasets
Locus    Number of allele assignments unchanged    Total number of allele assignments (2n)    Percent of allele assignments unchanged
A    4608    9344    49.3
B    5802    9328    62.2
C    3981    8334    47.8
DRB1    6353    8728    72.8
DQA1    1456    1825    79.8
DQB1    2980    3755    79.4
DPA1    287    410    70.0
DPB1    856    876    97.7
All Loci    26323    42600    61.8
This table reflects modifications made to the following 47 population datasets: American_Samoa, Ami_97, Arab_Druze, Atayal, Bari, Brazilian(Af_Eu), Bulgarian, Bunun, Canoncito, Central American, Chaouya, Chinese, Croatian, Filipino, Finn_90, Georgian, Guarani-Kaiowa, Guarani-Nandeva, Hakka, Irish, Israeli_Jews, Ivatan, Kinh, Korean_200, Maya, Metalsa, Minnan, Moroccan_99, Muong, Paiwan_51, Pazeh, Pima_17, Puyuma_49, Rukai, Rwandan, Saisiat, Siraya, Tamil, Thao, Ticuna, Toroko, Tsou, Turk, Yami, Yupik, Zulu, and Zuni.
B. Serological reclassification
IHWG datasets in which more than 10% of the alleles were typed at serological to intermediate levels of resolution were excluded from the 13W dataset. In cases where serological-level allele designations were provided for less than 10% of the alleles at a given locus in a dataset (e.g., DQA1*03), those serological designations were coded as a 4-character allele name in the format XX00, where XX represents the serological designation for that allele (e.g., *03 is coded as *0300). These coded alleles were then assigned the name of an allele in the IMGT/HLA database using the following rules:
i. If other alleles with the same serological designation were observed in the population, the coded allele was renamed as the allele with the highest frequency in that population at that locus (e.g., if the coded allele was *0300, and *0301 was observed in that population with a frequency of 0.2 while *0302 was observed with a frequency of 0.05, the coded allele was renamed *0301).
ii. If no alleles with the same serological designation were observed at that locus in that population, the coded allele was renamed to correspond to the lowest-numbered allele in the IMGT/HLA database with the same serological designation (e.g., if no other *03 alleles were observed, all *0300 alleles were renamed *0301).
iii. If neither of the previous steps resulted in a name change to a coded allele, analysis was halted for data review.
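The filtering rules in A and B can be sketched as a few small functions. This is a minimal sketch, not the workshop's software: the IMGT/HLA allele list is represented by short hypothetical lists of names, the 4-character “common substring” test is implemented as a prefix match (which covers the example given for *2402), serological-level names are assumed to be exactly two characters long, and `DataReviewRequired` is a hypothetical stand-in for halting analysis for data review.

```python
from typing import Dict, Iterable, Sequence


class DataReviewRequired(Exception):
    """Raised where the text says analysis was halted for data review."""


def too_many_serological(alleles: Sequence[str], threshold: float = 0.10) -> bool:
    """Exclusion test, assuming serological-level names are 2 characters long."""
    low_res = sum(1 for name in alleles if len(name) <= 2)
    return low_res / len(alleles) > threshold


def truncate_allele(name: str, imgt_alleles: Iterable[str]) -> str:
    """A: truncate to 4 characters and check against the IMGT/HLA list."""
    short = name[:4]
    # Rule i: some catalogued allele shares the 4-character prefix.
    if any(known.startswith(short) for known in imgt_alleles):
        return short
    # Rule ii: no match, so stop for review.
    raise DataReviewRequired(f"no IMGT/HLA allele matches {name!r}")


def code_serological(designation: str) -> str:
    """B: code a serological designation as XX00 (e.g. '03' -> '0300')."""
    return designation.zfill(2) + "00"


def reclassify_serological(
    coded: str, population_freqs: Dict[str, float], imgt_alleles: Iterable[str]
) -> str:
    sero = coded[:2]
    # Rule i: most frequent allele of the same serological group in this population.
    same_group = {
        name: freq for name, freq in population_freqs.items()
        if name[:2] == sero and name != coded
    }
    if same_group:
        return max(same_group, key=lambda name: same_group[name])
    # Rule ii: lowest-numbered catalogued allele of the same group.
    catalogue = sorted({k[:4] for k in imgt_alleles if k[:2] == sero and k[:4] != coded})
    if catalogue:
        return catalogue[0]
    # Rule iii: nothing applies.
    raise DataReviewRequired(f"cannot rename {coded!r}")


a_alleles = ["24020101", "24020102L", "240202"]  # toy subset, not the real list
dqa1_alleles = ["0301", "0302", "0303"]           # toy subset, not the real list
print(truncate_allele("24020101", a_alleles))                       # 2402
coded = code_serological("03")                                       # '0300'
print(reclassify_serological(coded, {"0301": 0.2, "0302": 0.05}, dqa1_alleles))  # 0301
print(reclassify_serological(coded, {}, dqa1_alleles))                            # 0301
```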
Overall, this data “filtering” step reduces the number of alleles (k) when alleles that are identical at the peptide level but differ at the nucleotide level are reported in the same population. In addition, k may have been reduced in datasets generated with multiple typing systems, because alleles typed at different levels of resolution (e.g., A*2402 versus A*240202) that might represent distinct nucleotide-level variants were treated as identical. It should be noted that all analytical results and inferences are valid only for peptide-level allele variation.
IV. Binning
In the final step of pre-analytical data processing, alleles that were only detectable in a subset of samples (because higher-resolution genotyping methods were used for those samples) were reassigned to the level of resolution that could be detected using lower-resolution genotyping methods. For example, HLA-B*0706 alleles were reassigned as B*0705 alleles. This reassignment process is referred to as “binning”. The binning reassignments were made to facilitate comparisons across datasets genotyped using different methods, and are not reflected in the datasets available in Appendix C. Table 9 lists the alleles that were binned (High resolution allele) and the allelic category to which each was reassigned (Reassigned allele name).
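Because the binning rules form a fixed lookup table, they can be applied as a per-locus mapping. The sketch below transcribes Table 9 into a dictionary; the function names are hypothetical, and alleles not listed in Table 9 are assumed to pass through unchanged.

```python
from typing import Dict, List, Tuple

# Transcribed from Table 9: locus -> {high-resolution allele: reassigned allele name}.
BINNING_RULES: Dict[str, Dict[str, str]] = {
    "A": {"2409N": "2402"},
    "C": {"0706": "0701"},
    "B": {"0706": "0705"},
    "DRB1": {"1443": "1405", "1506": "1501"},
    "DQA1": {"0104": "0101", "0302": "0301", "0303": "0301", "0502": "0501"},
    "DQB1": {"0202": "0201", "0309": "0301", "0609": "0605", "0611": "0602"},
    "DPB1": {
        "2301": "0401", "3901": "0401", "4801": "0201", "4901": "0402",
        "5101": "0402", "6201": "4001", "7601": "1401",
    },
}


def bin_allele(locus: str, allele: str) -> str:
    """Return the binned name; alleles absent from Table 9 are left unchanged."""
    return BINNING_RULES.get(locus, {}).get(allele, allele)


def bin_genotypes(locus: str, genotypes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply binning to both alleles of every genotype at one locus."""
    return [(bin_allele(locus, a), bin_allele(locus, b)) for a, b in genotypes]


print(bin_allele("B", "0706"))  # 0705
print(bin_genotypes("DQA1", [("0302", "0501"), ("0101", "0502")]))
```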
Section 3. Overall modifications made to datasets
The extent of the modifications made to allele assignments before analysis is described in Table 10, which summarizes the fraction of allele assignments that were unchanged through each of the various steps of data processing in the 47 datasets for which raw data (i.e., including allele and genotype ambiguity) were available. The data in the remaining 48 datasets were submitted as Available Data and required substantially less modification. Overall, 60% of the submitted allele assignments in these 47 datasets were analyzed as they were submitted. For the purpose of this summary, reassignment includes the reduction of ambiguous allele sets to individual alleles; the truncation of nucleotide-level allele names to peptide-level allele names; the reassignment of serological allele designations to peptide-level allele names; the reclassification of improperly formatted allele names; and the binning of alleles genotyped at varying levels of resolution. Unambiguously assigned alleles that remained unchanged, but that were submitted in ambiguous genotype sets, were counted as fractions of alleles in proportion to the number of genotypes in that set. As expected given the greater extent of allele and genotype ambiguity observed in class I datasets, fewer class I allele assignments than class II allele assignments remained unchanged (50-60% versus 80%, respectively).
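The fractional counting rule can be illustrated as below. The text does not spell out the exact weighting, so this sketch assumes that each of the n genotypes in a submitted ambiguous set carries weight 1/n, and that an allele counts as unchanged when the same name appears in both a submitted genotype and the analyzed genotype (compared as unordered pairs). The functions `unchanged_allele_weight` and `fraction_unchanged` and the data layout are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Genotype = Tuple[str, str]


def unchanged_allele_weight(submitted: Sequence[Genotype],
                            analyzed: Genotype) -> float:
    """Fractional number of alleles left unchanged for one sample.

    ASSUMPTION: each of the n submitted genotypes is weighted 1/n.
    """
    n = len(submitted)
    target = Counter(analyzed)
    total = 0.0
    for genotype in submitted:
        # Multiset intersection handles homozygous genotypes correctly.
        shared = Counter(genotype) & target
        total += sum(shared.values()) / n
    return total


def fraction_unchanged(samples: List[Tuple[Sequence[Genotype], Genotype]]) -> float:
    """Fraction of all allele assignments (2 per sample) left unchanged."""
    unchanged = sum(unchanged_allele_weight(s, a) for s, a in samples)
    return unchanged / (2 * len(samples))
```

For example, a sample submitted as the ambiguous set {A*0101+A*0301, A*0101+A*0302} and analyzed as A*0101+A*0301 contributes 0.5 × 2 + 0.5 × 1 = 1.5 unchanged alleles out of 2, while a sample submitted as A*240202+A*0101 and analyzed as A*2402+A*0101 contributes 1, because truncation counts as a reassignment.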