Infection, Genetics and Evolution

Infection, Genetics and Evolution 10 (2010) 84–88

Contents lists available at ScienceDirect

Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Evolutionary selection associated with the multi-function of overlapping genes in the hepatitis B virus

Dake Zhang b,1, Jian Chen a,1, Libin Deng c, Qing Mao a, Jiang Zheng a, Jun Wu a, Changqing Zeng b, Yan Li a,*

a Southwest Hospital, Third Military Medical University, Chongqing 400038, China
b Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China
c Faculty of Basic Medical Science, Nanchang University, Nanchang 330046, China

ARTICLE INFO

Article history: Received 1 July 2009; received in revised form 10 October 2009; accepted 20 October 2009; available online 29 October 2009

Keywords: Evolution; Selection; Functional domain; Overlapping genes; Hepatitis B virus

ABSTRACT

It is a challenge to understand the discrete role of each point mutation in viral evolution, but overlapping genes provide an excellent entry point for investigating this complicated process. We obtained 132 sequences from the largest overlapping region in the HBV genome. Based on the genetic divergence between genotypes B and C, we distinguished a set of linked footprint mutations that are believed to reflect historical selection events. Examining the mutations in the functional domains, we found that the virus has adopted a coherent strategy in its evolutionary process that can be summarized as follows: (1) the distribution of mutations was non-random throughout the overlapping region, and more mutations were preserved in the sequence when one of the genes was under relaxed selection; (2) the viral domains were subject to different selective pressures; for instance, the PreS1 domain underwent strict selection, whereas the overlapping Spacer domain was relatively relaxed, tolerating non-synonymous mutations and showing a high dN/dS ratio; (3) different selective pressures on the two codon sites affected by each mutation ultimately determined the positions at which mutations were preserved. Taken together, the functional constraints of protein domains are believed to be primarily responsible for the different selection patterns exhibited by the distribution of mutations and amino acid changes in the region where overlapping genes reside. © 2009 Elsevier B.V. All rights reserved.

* Corresponding author at: Medical Research Center, Southwest Hospital, Third Military Medical University, Chongqing 400038, China. Tel.: +86 23 68765463; fax: +86 23 65428041. E-mail address: [email protected] (Y. Li).
1 These two authors contributed equally to the work.
1567-1348/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2009.10.006

1. Introduction

It is a challenge to obtain a full understanding of point mutations in viral evolution; however, the overlapping genes of the hepatitis B virus (HBV) provide a convenient window through which to explore such mutations. HBV is a small virus with a circular, partially double-stranded DNA genome of approximately 3.2 kb. The genome has four overlapping gene regions: the large S region (PreS/S), PreC/C, X, and P. Among them, the large S region is the largest overlapping gene region, occupying one-third of the genome (Fig. 1). It encompasses the PreS1/PreS2/S genes and, in another reading frame, encodes the Spacer region as well as the main part of the reverse transcriptase (RT) region of the P gene (Ganem and Varmus, 1987; Tiollais et al., 1985). The PreS1/PreS2/S gene encodes three viral envelope proteins, the large, middle, and small S proteins (Ganem and Varmus, 1987; Tiollais et al., 1985), which are involved in invasion of the host. The P gene encodes the polymerase, which has reverse transcriptase and RNase H activity, and the Spacer, which has not been associated with a specific function (Lin et al., 2001; Radziwill et al., 1990; Tiollais et al., 1985). The large S region is therefore a good model for revealing the relationship between function and evolution. Moreover, an overlapping gene region is helpful for evolutionary studies of point mutations because recombination is rare and any point mutation affects two genes at the same time.

Relevant studies on different viruses have been reported recently. The evolution of the HBV genes is interdependent and constrained by their overlap (Mizokami et al., 1997). The evolution of one protein encoded in overlapping reading frames might be constrained by negative selection while the other evolves more rapidly (Jordan et al., 2000), and overlapping genes might be subject to different selection (Pavesi, 2006). In addition, independent adaptive selection of both overlapping genes has been reported (Zaaijer et al., 2007). However, these differing reports do not give a clear picture of the evolutionary mechanism of overlapping genes.

Based on phylogenetic analyses, eight genotypes (A–H) have been identified for HBV worldwide, which show a distinctive geographical distribution (Miyakawa and Mizokami, 2003; Schaefer,


2007). The ninth genotype, I, was described recently (Tran et al., 2008), but its status as a genuine genotype is doubted because of complex recombination (Kurbanov et al., 2008), so its classification has not yet been settled. The most prevalent genotypes in China are genotypes B and C (Kramvis and Kew, 2005; Zeng et al., 2005). Because HBV has continued to evolve since the genotypes emerged, the genotypes provide not only relatively isolated populations but also time markers for HBV evolutionary studies (Simmonds, 2001; Twiddy et al., 2003).

Here, we focused on the large S region of 132 HBV dominant strains from different patients with chronic HBV infection in both northern and southern China. Based on the divergence of genotypes, we identified genetic markers from point mutations and reconstructed the historical selection event. By observing evolutionary selection during and after genotype emergence, we found more detailed evidence of the evolutionary mechanism of overlapping genes and elucidated the relationship between the independent selection patterns and different functional constraints.

Fig. 1. The structure of large S overlapping region.

2. Materials and methods

2.1. Serum samples

We collected serum samples from 120 HBsAg-positive patients with a viral load of more than 1 × 10⁵ copies/ml, who were from the regions of Shandong (in northern China) and Chongqing (in southern China). In addition to these samples, we selected 12 other sequences of the large S gene from the NCBI database as a complement.

2.2. Sequencing of HBV PCR products

2.2.1. Amplification of full-length HBV genome of the dominant strain

HBV DNA was extracted from the above serum samples with the QIAamp Ultrasens Virus kit (QIAGEN) according to the manual. Standard PCR was used to amplify the full-length HBV genome as described by Gunther et al. (1995), with TransTaq HiFi DNA Polymerase and primers P1 (5′-CCGGAAAGCTTGAGCTCTTCTTTTTCACCTCTGCCTAATCA-3′) and P2 (5′-CCGGAAAGCTTGAGCTCTTCAAAAAGTTGCATGGTGCTGG-3′), under the following PCR conditions: 95 °C for 2 min 30 s, followed by 30 cycles of 94 °C for 1 min, 58 °C for 90 s and 72 °C for 3 min, and a final extension at 72 °C for 10 min.

2.2.2. Purification of PCR products and DNA sequencing

PCR products were purified and concentrated using QIAquick silica columns, with a final concentration of 10 ng/ml for direct sequencing. The sequencing reactions were performed on an ABI 377 machine with the primers in Table 1. Every read was obtained after the sequence of the primer itself was trimmed away. The reads from every sample were assembled into the entire large S region with Phrap software (Gordon et al., 1998).

2.3. Bioinformatic analysis

The genotypes of the HBV samples were determined by phylogenetic analysis of the large S region using ClustalX 2.0 in combination with Treeview X software tools (Larkin et al., 2007). A set of sequences with known genotypes, as described by Lindh et al. (1997), served as a reference. SNP sites with a minor allele frequency (MAF) of more than 2% were counted with our in-house Perl scripts based on the multiple alignment result. The distribution of the substitution rate was calculated using the frequency function in Microsoft Excel. The actual numbers of synonymous and non-synonymous sites were counted with our in-house Perl scripts. Using MEGA 4 software (Tamura et al., 2007), the overall average ratio of non-synonymous to synonymous rates (dN/dS) per site was calculated using the modified Nei–Gojobori method (Nei and Gojobori, 1986). The codeml module of PAML 4.1 was applied to estimate the ratio of non-synonymous to synonymous substitutions (dN/dS) at sites in the overlapping reading frames of P and S separately by maximum likelihood (Yang, 1997; Yang and Nielsen, 2002). The nested site models (0, 1 and 2) of PAML were run on a PC with an AMD Athlon 64 X2 dual-core 5000+ processor and 2 GB of memory under Windows XP. Likelihood-ratio tests (LRT) and Bayes Empirical Bayes (BEB) statistics (Yang et al., 2005) were examined. To estimate whether some nucleotide substitutions consistently appear together with other nucleotide substitutions, the genetic linkage of the loci in the large S region was analyzed with Haploview software (Barrett et al., 2005).

3. Results

We obtained 132 sequences measuring 1072 nt in length. The sequence from 1 to 1072 nt in this study corresponded to 2848–3215(0)–704 nt in the HBV genome. In detail, the PreS1/PreS2 region overlapped approximately with the whole Spacer region, and the small S gene covered most of the RT region of the P gene (Fig. 1).
Phylogenetic analysis of the 132 large S regions demonstrated that 81 strains belonged to genotype B and the other 51 strains were of genotype C. The genotypes provided isolated viral populations for the evolutionary perspective study.
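The SNP counting step described in Section 2.3 was done with in-house Perl scripts that are not shown. As a rough illustration only, the following Python sketch counts polymorphic columns in a multiple alignment whose minor allele frequency exceeds a threshold. The function name `polymorphic_sites`, the treatment of the second most common base as the minor allele, the exclusion of gaps and ambiguous bases, and the toy alignment are all assumptions made for this example, not details taken from the study.

```python
from collections import Counter


def polymorphic_sites(alignment, maf_threshold=0.02):
    """Return {1-based position: Counter of bases} for alignment columns
    whose minor allele frequency (MAF) is above maf_threshold.

    Assumptions (not from the paper): the minor allele is the second most
    common base in the column, and only A/C/G/T are counted.
    """
    if not alignment:
        return {}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    sites = {}
    for pos in range(length):
        column = Counter(seq[pos].upper() for seq in alignment)
        # Drop gaps and ambiguity codes before computing frequencies.
        counts = Counter({b: n for b, n in column.items() if b in "ACGT"})
        total = sum(counts.values())
        if total == 0 or len(counts) < 2:
            continue  # monomorphic or empty column
        minor_count = counts.most_common()[1][1]
        if minor_count / total > maf_threshold:
            sites[pos + 1] = counts
    return sites


# Hypothetical toy alignment (four sequences, four columns).
toy = ["ACGT", "ACGA", "ATGA", "ACGT"]
snps = polymorphic_sites(toy)
biallelic = [pos for pos, counts in snps.items() if len(counts) == 2]
print(sorted(snps), biallelic)
```

The same per-column counts can also be used to tell bi-allelic sites from sites with three or four observed bases, which is the distinction drawn later in Section 3.4.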

Table 1 Sequencing primers.

| Name | Primer (5′–3′) | Position in genome | Length | Direction |
|------|----------------|--------------------|--------|-----------|
| ISP1 | TCACCATATTCTTGGGAACAAGA | 2817–2839 | 23 | Forward |
| SP1 | CTCCAGTTCAGGAACAGTAAACCC | 67–90 | 24 | Forward |
| SP2 | CGAGTCTAGACTCTGTGGTAA | 236–256 | 21 | Reverse |
| ISP2 | CGAACCACTGAACAAATGGC | 685–704 | 20 | Reverse |


Table 2 The number of mutation sites at different substitution levels.

| Substitution rates | The number of mutation sites in genotype B | The number of mutation sites in genotype C | The number of mutation sites in all strains |
|-----|-----|-----|-----|
| | 134 | 135 | 81 |
| | 53 | 37 | 23 |
| | 16 | 23 | 7 |
| | 1 | 3 | 5 |
| | 1 | 7 | 62 |
| 40% | 1 | 1 | 28 |

Table 3 Variant degree (VD) caused by genotype-specific mutations.

| Functional domain | dN/dS | VD |
|-----|-----|-----|
| PreS1 | 0.5 | 9.2% |
| PreS2 | 1.7 | 21.8% |
| Small S | 2.0 | 9.7% |
| Spacer | 6.0 | 25.1% |
| RT | 0.7 | 6.7% |

3.1. Footprint sites determined using the genetic markers of genotypes

SNP analysis of the 132 sequences of the large S region detected 206 mutation sites in this region. The total average substitution rate was about 5.03%. The number of mutation sites at different substitution levels is listed for all strains and for intra-genotype groups in Table 2. Interestingly, there were 90 sites with substitution rates of more than 30% in all strains; however, the substitution rates of 85 sites within this group decreased to 5.03% within each genotype. This observation suggests that these sites are relatively conserved within their own genotypes. Another outstanding feature was the genetic linkage: these nucleotide substitutions consistently appeared together with other nucleotide substitutions (D′ = 0.96 ± 0.05, LOD = 52.71 ± 11.17, r² = 0.833 ± 0.139). These genotype markers are therefore likely to be associated with the emergence of the different genotypes and to carry the historical footprint of evolution.

3.2. Historical selection event during genotype emergence

The genotype-specific mutations above represent the historical selection during the emergence of genotypes B and C. Fig. 2 shows the different amino acid changes in the S protein and the P protein caused by the 85 genotype-specific point mutations. Overall, the distribution of mutations was uneven in the overlapping region. Fifty-six mutations were located in the 5′ half

of the region, which caused 42 amino acid changes in the Spacer protein (Fig. 2b, white block), but only 11 amino acid changes in the PreS1 domain and 12 amino acid changes in the PreS2 domain (Fig. 2a, light blue and blue block). Only 29 mutations were located in the 3′ half of the region, which caused 18 amino acid changes in the small S protein (Fig. 2a, light green block) and 13 amino acid changes in the RT protein (Fig. 2b, light red block). Two parameters were used to estimate the selection on each of the protein domains: the dN/dS ratio and the variant degree (VD), defined as the number of changed amino acids divided by the length of each region. From the high dN/dS ratio alone, it was hard to judge whether the Spacer was subject to positive selection or was merely tolerant of non-synonymous mutations (Table 3). Meanwhile, most of the positively selected sites found by a BEB analysis using PAML coincided with the genotype-specific sites. Genotype-specific sites cannot simply be regarded as the result of positive selection, so these 'positively selected sites' in the Spacer region of the P gene are more likely to have arisen from relaxed selection than from genuine positive selection. The VD parameter could be used to clarify the result of the comparison between the dN/dS ratio and the BEB model. The Spacer and the PreS2 domains, which have dispensable functions, had more variants (VD > 20%) than the PreS1/S and RT regions (VD < 10%). This finding reflects the fact that VDs are negatively correlated with the importance of the biological function, with a higher VD being indicative of a dispensable role and vice versa. The VD parameter represents the level of functional constraint; therefore, it is more plausible that the Spacer and PreS2 domains were under relaxed selection and that the PreS1, small S and RT genes maintained their essential functions via strict selection.
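As an illustration of the VD definition given above (changed amino acids divided by domain length), the following Python sketch computes VD either from two aligned protein sequences or directly from counts. The function names and the toy sequences are hypothetical. The 55-residue length used for PreS2 in the last line is an assumption for the sake of the arithmetic, chosen because it reproduces the 21.8% reported in Table 3 from the 12 changes reported in the text.

```python
def variant_degree(seq_b, seq_c):
    """Fraction of aligned positions at which two protein sequences differ.

    seq_b and seq_c stand for the genotype B and genotype C versions of one
    domain, already aligned and of equal length (illustrative inputs only).
    """
    if len(seq_b) != len(seq_c):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_b:
        raise ValueError("sequences must not be empty")
    changed = sum(1 for b, c in zip(seq_b, seq_c) if b != c)
    return changed / len(seq_b)


def variant_degree_from_counts(changed, length):
    """VD from a count of changed residues and the domain length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return changed / length


# Hypothetical 10-residue toy domain with two differing positions.
print(variant_degree("MKQTLSAVGE", "MTQTLSAVGD"))  # 2 of 10 positions differ

# 12 changes over an assumed 55-residue PreS2 domain, as a percentage.
print(round(variant_degree_from_counts(12, 55) * 100, 1))
```

Unlike dN/dS, this quantity ignores synonymous changes entirely, which is why it can separate tolerance of amino acid change from an elevated substitution ratio.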

Fig. 2. Variant patterns of S protein and P protein caused by the footprint loci. The red cells represent non-synonymous mutations, in which the former amino acid is from genotype B and the latter is from genotype C. The 's' cells represent synonymous mutations. (a) The PreS1 region is the block in light blue, the PreS2 region is the block in dark blue, and the small S region is the block in light green, with the a-determinant fragment in light yellow. (b) The Spacer region is the block in white, and the RT region is the block in light red. Because of the gene overlap, each substitution affected the S gene and the P gene simultaneously and resulted in different changes in the two proteins. For instance, the first footprint site, at site 28 in the nucleotide sequence, was 'A' in genotype B but 'C' in genotype C; this substitution produced a change (K/Q) at amino acid 10 in the S protein and another change (K/T) at amino acid 9 in the P protein. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
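The site 28 example in the Fig. 2 legend can be reproduced with a small frame calculation. The Python sketch below maps a nucleotide position in the sequenced region to a codon index and a position within the codon in each of two reading frames, and labels the site in the form S1P2 (position 1 of an S codon, position 2 of a P codon). The frame starts used here (nt 1 for S, nt 3 for P) are assumptions chosen only so that site 28 falls in S codon 10 and P codon 9 as stated in the legend; they are not taken from the paper's numbering scheme.

```python
S_FRAME_START = 1  # assumed: the S frame begins at nt 1 of the region
P_FRAME_START = 3  # assumed: chosen so that nt 28 lies in P codon 9


def codon_site(nt_pos, frame_start):
    """Return (codon_index, position_in_codon), both 1-based, for a
    nucleotide in a frame whose first codon starts at frame_start."""
    offset = nt_pos - frame_start
    if offset < 0:
        raise ValueError("position lies before the start of the frame")
    return offset // 3 + 1, offset % 3 + 1


def paired_site(nt_pos):
    """Return (label, S codon index, P codon index) for one nucleotide."""
    s_codon, s_pos = codon_site(nt_pos, S_FRAME_START)
    p_codon, p_pos = codon_site(nt_pos, P_FRAME_START)
    return f"S{s_pos}P{p_pos}", s_codon, p_codon


for site in (28, 29, 30):
    print(site, paired_site(site))
```

Because the two frames are offset by one nucleotide, only three pairings are possible, and a change at the third position of an S codon always falls on the first position of a P codon.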


To further investigate the non-synonymous changes caused by the genotype-specific mutations, 50 complete genomes without indels were extracted from GenBank for each genotype (B and C), and the amino acid sequences of each functional domain (PreS, Spacer, small S and RT-part) were aligned. The alignment statistics are listed in Supplementary Table 1. As shown in Supplementary Figs. 1–4, the identity in the first line reflects the divergence of each site in the alignment. Interestingly, the sites with an identity of around 50% (the columns of about half height) were conserved within each genotype, which is consistent with the genotype-specific mutations in Fig. 2. These data show that the RT-part was highly conserved and that the small S and the PreS were relatively conserved as well, whereas the Spacer was much less constrained. Furthermore, global or pairwise alignment of all the genotypes (A–H) gave a similar result (data not shown). The PreS domain was an exception: the genotype-specific sites identified in this region between B and C did not always show specificity in other genotype pairs. Nevertheless, these divergent sites did not vary within any individual genotype and were visible only between different genotypes. This suggests that the PreS went through positive selection, within the limits of strict selection, during an earlier evolutionary phase.

3.3. The distribution of random mutations in the evolutionary process

Apart from the 85 footprint mutations, most of the other 121 point mutations occurred randomly rather than in connection with each other and were not responsible for genotype divergence. Among them, 11 mutations appeared only in strains of genotype B and 36 mutations only in those of genotype C. These mutations most likely appeared after genotype emergence. The other 74 mutations were found in both genotypes; 60 of these mutations had low mutation rates and 14 had high mutation rates. The sites of high mutation might support a sub-genotype, but they are only weakly linked together. Next, we examined the quantitative distribution of the mutations along the large S region with a sequence window of 104 nt per step (Fig. 3). Like the footprint sites, the random mutations were located predominantly in the 5′ half of the region rather than the 3′ half. Moreover, the curves of random mutations in the genotype B and C groups generally matched each other, hinting that the different genotypes of the virus adopted a similar strategy while mutating randomly. Interestingly, the peak of random mutations was located at 208–416 nt, avoiding the peak at 104–208 nt where the footprint sites were located. In addition, there were more random mutations at 832–936 nt. Either during or after genotype emergence, this overlapping region should have been facing the same functional constraints in both genotypes.

3.4. Polymorphism of mutations in the overlapping region

In this analysis, 93.7% (193/206) of the polymorphic sites had two possible nucleotides, that is, they were bi-allelic. The most frequent allele pairs were 'T/C' (27.24%) and 'A/G' (27.24%), followed by 'A/C' (19.84%). To understand why each mutation was generally limited to the bi-allelic state, we simulated mutations to the other two nucleotides. The result showed that the number of synonymous mutation sites decreased and the number of non-synonymous mutation sites increased, and the average dN/dS ratio increased from 1.76 to 2.47 and 2.79. A larger number of non-synonymous mutations would have a much higher probability of causing functional and structural changes, and such mutations are unlikely to be preserved if they reduce the fitness of the virus. This implies that limiting mutations to the bi-allelic state was essential for keeping function stable. It also indicates that the genotype-specific mutations were relatively stable, while other mutations occurred randomly at other sites.

3.5. Codon sites of mutations in different ORFs

Because the ORFs are in different reading frames, each mutation affects two different codon sites and causes different changes in the two proteins. For the large S gene and the P gene, the first codon site of the S gene (S1) corresponds to the second codon site of the P gene (P2); sites of this sort are denoted S1P2. Likewise, we define the other two codon sites as S2P3 and S3P1. In the PreS-Spacer overlapping region, the S3P1 mutations accounted for 38.8% of mutations, while the S1P2 and S2P3 mutations each accounted for 12.9%. In the S-RT overlapping region, the S2P3 mutations accounted for 15.3%, the S3P1 mutations for 11.8% and the S1P2 mutations for 8.2%. Thus, the S3P1 mutations gave more opportunity for synonymous changes in the PreS1 domain than in the Spacer domain, and the S2P3 mutations led to more non-synonymous changes in the small S domain than in the RT domain. This reflects the main role that the codon site of a mutation plays in regulating the different selection patterns of genes in the overlapping region. As for the codon sites of the random mutations, the percentage of S1P2 point mutations increased by approximately 8% in both the PreS-Spacer and S-RT regions in comparison with the footprint sites above. Because any mutation at the second site of a codon is non-synonymous, the S1P2 point mutations mainly challenged the P gene. The non-essential Spacer domain could continue to tolerate such changes, whereas the RT domain might face uncertain pressures and amino acid changes.

4. Discussion

Fig. 3. Quantitative distribution of mutation sites in the large S region.

Overlapping gene regions are useful for evolutionary studies because they allow comparative analysis of every region in different reading frames under relatively consistent conditions. In this study, we analyzed the largest overlapping region of the HBV genome and examined in depth the relationship between point mutations and evolution. Based on the divergence of genotypes B and C, the linked footprint mutations were first distinguished in order to investigate historical selection events. From the subsequent dynamic view of the mutation distribution, we found that the virus displays a coherent selection pattern during and after genotype emergence (Fig. 3). This pattern suggests a relatively stable and consistent strategy throughout the evolutionary process. Several features of the evolution of HBV have been clarified in our study. First, the selection of each functional domain of the


overlapping genes is independent and different. In spite of the interaction between the two reading frames, the mutations can lead to different VDs in the two proteins. Second, the distribution of mutations was uneven in the overlapping region. The number of mutations depends mainly on whether the domain is under relaxed or strict selection. The dN/dS (or Ka/Ks) ratio is widely used to estimate molecular sequence evolution. Whereas synonymous (silent) mutations are largely invisible to natural selection, non-synonymous (amino acid-replacing) mutations may be under strong selective pressure. From a high dN/dS ratio alone, however, it is hard to tell whether the Spacer domain of the HBV genome has been subject to positive selection. Neither of the genotypes has been naturally eliminated, and both are able to infect the same patient. Given this, the Spacer domain, with its dispensable role, is more likely to lack functional constraint and to tolerate non-synonymous mutations, giving rise to biological diversity. Therefore, to avoid drawing the opposite conclusion from the dN/dS (or Ka/Ks) ratio, the biological background of the object under study should be considered as well. In contrast to the dN/dS ratio, the VD value, which counts the amino acid changes over the length of a region, is a direct parameter for evaluating the relationship between functional constraint and evolutionary selection. The low VD of the PreS1 domain maintains the domain's stable binding to the potential host receptor, which is the crucial step in HBV's entry into a hepatocyte (Neurath et al., 1986; Pontisso et al., 1986). The RT domain, with a low VD, maintains the essential function of the polymerase for viral replication (Lin et al., 2001; Radziwill et al., 1990; Tiollais et al., 1985), in agreement with key domains in the polymerases of many other microbes (Kemdirim et al., 1986; Lin et al., 2001; Wei et al., 2008). In contrast, the Spacer and the PreS2 domains have high VDs, which are related to their dispensable roles.
As reported, the Spacer can be deleted, to a large extent, without much reduction in the activity of the endogenous polymerase (Radziwill et al., 1990). In addition, HBV particles lacking preS2 envelope protein expression may remain infectious (Brind et al., 1997). Taken together, the different functions of the overlapping genes essentially determine the different evolutionary selection patterns. This finding indicates that functional constraint is the basic mechanism of viral evolution. Owing to the tolerance of the Spacer domain, many mutations have been retained in the PreS1-Spacer overlapping region, but the S3P1 codon site predominantly maintains a low VD in the PreS1 domain. Moreover, we found that random mutations challenge the P gene, especially the RT domain. This might be related to the RT domain facing new pressures from antiretroviral therapies. We noticed that the polymorphisms in the studied region are mainly bi-allelic, which minimizes lethal mutations and also explains why random mutations preferentially occur at new sites rather than at footprint sites. As a result, accumulating point mutations appear at new sites rather than at the original mutation sites. Therefore, we propose that the genotype-specific sites should be distinguished before studying evolutionary speed or the effect of external factors on the mutations.

Acknowledgement

This work was supported partially by the Science and Technology Major Projects of 'AIDS and viral hepatitis prevention and treatment of major infectious diseases' under Award Number 2009ZX10004-109.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2009.10.006.

References

Barrett, J.C., Fry, B., Maller, J., Daly, M.J., 2005. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics (Oxford, England) 21, 263–265.
Brind, A., Jiang, J., Samuel, D., Gigou, M., Feray, C., Brechot, C., Kremsdorf, D., 1997. Evidence for selection of hepatitis B mutants after liver transplantation through peripheral blood mononuclear cell infection. Journal of Hepatology 26, 228–235.
Ganem, D., Varmus, H.E., 1987. The molecular biology of the hepatitis B viruses. Annual Review of Biochemistry 56, 651–693.
Gordon, D., Abajian, C., Green, P., 1998. Consed: a graphical tool for sequence finishing. Genome Research 8, 195–202.
Gunther, S., Li, B.C., Miska, S., Kruger, D.H., Meisel, H., Will, H., 1995. A novel method for efficient amplification of whole hepatitis B virus genomes permits rapid functional analysis and reveals deletion mutants in immunosuppressed patients. Journal of Virology 69, 5437–5444.
Jordan, I.K., Sutter, B.A.T., McClure, M.A., 2000. Molecular evolution of the Paramyxoviridae and Rhabdoviridae multiple-protein-encoding P gene. Molecular Biology and Evolution 17, 75–86.
Kemdirim, S., Palefsky, J., Briedis, D.J., 1986. Influenza B virus PB1 protein; nucleotide sequence of the genome RNA segment predicts a high degree of structural homology with the corresponding influenza A virus polymerase protein. Virology 152, 126–135.
Kramvis, A., Kew, M.C., 2005. Relationship of genotypes of hepatitis B virus to mutations, disease progression and response to antiviral therapy. Journal of Viral Hepatitis 12, 456–464.
Kurbanov, F., Tanaka, Y., Kramvis, A., Simmonds, P., Mizokami, M., 2008. When should "I" consider a new hepatitis B virus genotype? Journal of Virology 82, 8241–8242.
Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., Higgins, D.G., 2007. Clustal W and Clustal X version 2.0. Bioinformatics (Oxford, England) 23, 2947–2948.
Lin, X., Yuan, Z.H., Wu, L., Ding, J.P., Wen, Y.M., 2001. A single amino acid in the reverse transcriptase domain of hepatitis B virus affects virus replication efficiency. Journal of Virology 75, 11827–11833.
Lindh, M., Andersson, A.S., Gusdal, A., 1997. Genotypes, nt 1858 variants, and geographic origin of hepatitis B virus--large-scale analysis using a new genotyping method. The Journal of Infectious Diseases 175, 1285–1293.
Miyakawa, Y., Mizokami, M., 2003. Classifying hepatitis B virus genotypes. Intervirology 46, 329–338.
Mizokami, M., Orito, E., Ohba, K., Ikeo, K., Lau, J.Y., Gojobori, T., 1997. Constrained evolution with respect to gene overlap of hepatitis B virus. Journal of Molecular Evolution 44 (Suppl 1), S83–S90.
Nei, M., Gojobori, T., 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution 3, 418–426.
Neurath, A.R., Kent, S.B., Parker, K., Prince, A.M., Strick, N., Brotman, B., Sproul, P., 1986. Antibodies to a synthetic peptide from the preS 120-145 region of the hepatitis B virus envelope are virus neutralizing. Vaccine 4, 35–37.
Pavesi, A., 2006. Origin and evolution of overlapping genes in the family Microviridae. The Journal of General Virology 87, 1013–1017.
Pontisso, P., Schiavon, E., Fraiese, A., Pornaro, E., Realdi, G., Alberti, A., 1986. Antibody to the hepatitis B virus receptor for polymerized albumin in acute infection and in hepatitis B vaccine recipients. Journal of Hepatology 3, 393–398.
Radziwill, G., Tucker, W., Schaller, H., 1990. Mutational analysis of the hepatitis B virus P gene product: domain structure and RNase H activity. Journal of Virology 64, 613–620.
Schaefer, S., 2007. Hepatitis B virus taxonomy and hepatitis B virus genotypes. World Journal of Gastroenterology 13, 14–21.
Simmonds, P., 2001. Reconstructing the origins of human hepatitis viruses. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 356, 1013–1026.
Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24, 1596–1599.
Tiollais, P., Pourcel, C., Dejean, A., 1985. The hepatitis B virus. Nature 317, 489–495.
Tran, T.T., Trinh, T.N., Abe, K., 2008. New complex recombinant genotype of hepatitis B virus identified in Vietnam. Journal of Virology 82, 5657–5663.
Twiddy, S.S., Holmes, E.C., Rambaut, A., 2003. Inferring the rate and time-scale of dengue virus evolution. Molecular Biology and Evolution 20, 122–129.
Wei, D., Yang, B., Li, Y.L., Xue, C.F., Chen, Z.N., Bian, H., 2008. Characterization of the genome sequence of an oncolytic Newcastle disease virus strain Italien. Virus Research.
Yang, Z., 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Computer Applications in the Biosciences 13, 555–556.
Yang, Z., Nielsen, R., 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular Biology and Evolution 19, 908–917.
Yang, Z., Wong, W.S., Nielsen, R., 2005. Bayes empirical bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution 22, 1107–1118.
Zaaijer, H.L., van Hemert, F.J., Koppelman, M.H., Lukashov, V.V., 2007. Independent evolution of overlapping polymerase and surface protein genes of hepatitis B virus. The Journal of General Virology 88, 2137–2143.
Zeng, G., Wang, Z., Wen, S., Jiang, J., Wang, L., Cheng, J., Tan, D., Xiao, F., Ma, S., Li, W., Luo, K., Naoumov, N.V., Hou, J., 2005. Geographic distribution, virologic and clinical characteristics of hepatitis B virus genotypes in China. Journal of Viral Hepatitis 12, 609–617.