Semen-Specific Genetic Characteristics of Human Im
http://www.100md.com Journal of Virology, 2005, No. 3
     University of California, San Diego, La Jolla

    Veterans Administration, San Diego Healthcare System, San Diego, California

    ABSTRACT

    Human immunodeficiency virus type 1 (HIV-1) in the male genital tract may comprise virus produced locally in addition to virus transported from the circulation. Virus produced in the male genital tract may be genetically distinct, due to tissue-specific cellular characteristics and immunological pressures. HIV-1 env sequences derived from paired blood and semen samples from the Los Alamos HIV Sequence Database were analyzed to ascertain a male genital tract-specific viral signature. Machine learning algorithms could predict seminal tropism based on env sequences with accuracies exceeding 90%, suggesting that a strong genetic signature does exist for virus replicating in the male genital tract. Additionally, semen-derived viral populations exhibited constrained diversity (P < 0.05), decreased levels of positive selection (P < 0.025), decreased CXCR4 coreceptor utilization, and altered glycosylation patterns. Our analysis suggests that the male genital tract represents a distinct selective environment that contributes to the apparent genetic bottlenecks associated with the sexual transmission of HIV-1.

    INTRODUCTION

    Most human immunodeficiency virus (HIV) transmission events globally occur via mucosal exposure to male genital secretions carrying the virus (34, 46). Although the risk of sexual HIV transmission correlates with the amount of virus present in the blood of the source partner (36), the correlation between the viral load in the blood and genital compartment is inconsistent (3, 23, 24). The biological determinants that influence the transmissibility of different viral variants from within the genital tract of the HIV-infected source are still incompletely understood. Since transmitted virus represents the initial virus that the immune system encounters, the understanding of its composition will be critical in our attempts to develop a successful HIV vaccine (1, 7, 54).

    HIV in each chronically infected person exists as a diverse population of related genetic variants (5, 12, 20). Anatomic compartmentalization of these variants has been described in blood, lung, central nervous system, and genital tract (10, 16, 17, 20, 21, 32, 41, 50, 53). Male genital tract tissues (e.g., the prostate, seminal vesicles, and epididymis) serve as sites of viral replication and are likely to differ from peripheral tissues in immunological surveillance, target cell characteristics, and efficiencies of drug penetration (10, 17, 43). Virus replicating within the male genital tract could therefore develop distinct, compartment-specific characteristics in response to these local selective pressures (10, 16, 17, 20, 21, 32, 41, 50, 53). Although genetic differences between blood- and semen-derived HIV in an individual have been documented, a seminal signature sequence remains elusive (6, 10). This failure to identify a signature sequence could be attributable to the fact that previous efforts mainly focused on proviral DNA sequences, which often represent archival viral genotypes rather than contemporary, actively replicating variants (4, 44).

    We investigated viral genetics and compartmentalization within the male genital tract by applying a battery of computational techniques to paired semen- and blood-derived HIV-1 RNA env sequences. Our results suggest that the male genital tract can represent a legitimate viral compartment, although this compartmentalization is not absolute. Furthermore, when viral migration between blood plasma and the male genital tract is minimal and infrequent, there are several distinct genetic features associated with semen-derived HIV variants. Understanding these tissue-specific properties of HIV type 1 (HIV-1) will likely be crucial for the development of an effective vaccine.

    MATERIALS AND METHODS

    Sequence data. All of the semen-derived HIV-1 env sequences from the Los Alamos National Laboratory HIV Sequence Database with accompanying subject identification were downloaded. Blood-derived sequences from the same individuals were also downloaded; semen sequences without matching blood data were removed from the set. GenBank database accession numbers included in our analysis are AF098718 to AF098734, AF256230 to AF256465, AF373037 to AF373043, AF535219 to AF535859, AY005164 to AY005179, U00821 to U00843, U13381 to U13388, and U96502 to U96608. Duplicates, sequences derived by direct PCR sequencing, proviral DNA sequences, and nonfunctional open reading frames (containing frameshifts, premature stop codons, etc.) were deleted. The final set consisted of 659 env C2-V3 RNA sequences (spanning HXB2 coordinates 799 to 1410) from a total of 12 patients (376 sequences from plasma and 283 from semen).

    Phylogenetic reconstruction. Initial multiple sequence alignments were generated by using Multalin (8), with default gap parameters and the DNA 5-0 substitution matrix. Subsequent manual aligning was performed by using the Se-Al sequence alignment editor (37). Phylogenies describing sequences from each individual host were built by using FastDNAml (30), estimating base frequencies from the data and a transition/transversion ratio of 2.0. All diversity and divergence measurements were calculated by using dnadist (14). The absolute rate of molecular evolution (molecular clock) was estimated by running TipDate (38) on maximum likelihood phylogenies with dated tips. A master tree describing the entire data set was built by implementing dnadist and neighbor within the PHYLIP version 3.5c software package (14) by using the F84 model, gamma distributed rates across sites, and a transition/transversion ratio of 2.0. Trees were viewed with TreeView X (31).

    Evaluation of compartmentalization. The degree of segregation between compartments was assessed by testing for panmixis by using gene phylogenies (18, 42) as implemented in the MacClade program (Sinauer, Sunderland, Mass.). In brief, the minimum possible number of intercompartment migration events was tallied, based on the maximum likelihood trees for each individual subject's C2-V3 sequences and their characterization according to compartment of origin. This result was compared to the distribution of migration events for 1,000 randomly generated trees. Evidence of restricted gene flow (compartmentalization) was documented when <1% of the random trees required the same number of migration events as the sample data or fewer (29).
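
    The migration-count test described above can be illustrated with a minimal Python sketch. It is not the MacClade implementation. As assumptions made for illustration only: trees are hypothetical nested tuples of tip names, the minimum number of migration events is counted with Fitch parsimony, and the null distribution is built by shuffling compartment labels across the tips of the observed tree instead of generating random trees, which is a simplification of the procedure in the text. All function and tip names are hypothetical.

```python
import random

def fitch_migrations(tree, labels):
    """Return (state set, minimum number of state changes) for a
    binary tree written as nested tuples of tip names."""
    if isinstance(tree, str):                 # a tip
        return {labels[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_migrations(left, labels)
    right_states, right_changes = fitch_migrations(right, labels)
    shared = left_states & right_states
    if shared:                                # no change needed at this node
        return shared, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1

def tips(tree):
    """List the tip names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])

def panmixis_test(tree, labels, n_random=1000, seed=0):
    """Return the observed migration count and the fraction of label
    shufflings that need the same number of migrations or fewer."""
    rng = random.Random(seed)
    observed = fitch_migrations(tree, labels)[1]
    names = tips(tree)
    states = [labels[name] for name in names]
    hits = 0
    for _ in range(n_random):
        rng.shuffle(states)                   # assumption: permute tip labels
        shuffled = dict(zip(names, states))
        if fitch_migrations(tree, shuffled)[1] <= observed:
            hits += 1
    return observed, hits / n_random

# Toy tree (hypothetical): plasma tips p1-p4 and semen tips s1-s4
# fall into two separate clades.
toy_tree = ((("p1", "p2"), ("p3", "p4")), (("s1", "s2"), ("s3", "s4")))
toy_labels = {name: ("plasma" if name.startswith("p") else "semen")
              for name in tips(toy_tree)}
observed, fraction = panmixis_test(toy_tree, toy_labels)
print(observed, fraction)
```

    Under the criterion in the text, compartmentalization would be called when the returned fraction falls below 0.01; a toy tree this small cannot reach that threshold, so the example only shows the mechanics.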

    Machine learning classification. A machine learning approach was employed to look for a tissue-specific genetic signature. All classification experiments in this analysis were conducted by using WEKA (Waikato environment for knowledge analysis), an open source collection of data processing and machine learning algorithms (49). The J48 decision tree inducer, based on the C4.5 algorithm (35), was implemented with the parameter "MinNumObj" set at a value of 7 to limit the complexity of theories and minimize the risk of overfitting. Classifiers were evaluated by using 100 iterations of stratified 10-fold cross-validation, a procedure designed to reflect the performance of classification models on novel data sets. For each of 100 trials, the data set was randomly divided into 10 groups of approximately equal size and class distribution. For each "fold," the classifier was trained by using all but 1 of the 10 groups and then tested on the unseen group. This procedure was repeated for each of the 10 groups. The cross-validation score for one trial was the average performance across the 10 folds. The reported score is the average across the 100 trials (49). In addition, we have reported the true positive rate (TPR) and precision for these classification experiments: TPR = [number of true positives/(number of true positives + number of false negatives)]; precision = [number of true positives/(number of true positives + number of false positives)].
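
    The fold construction and the two reported metrics can be sketched in a few lines of Python. This is an illustration only and does not reproduce WEKA or J48; the function names, the "semen" positive class label, and the toy labels are assumptions introduced here.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)   # deal out round-robin
    return folds

def tpr_and_precision(true, predicted, positive="semen"):
    """TPR = TP / (TP + FN); precision = TP / (TP + FP)."""
    pairs = list(zip(true, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return tpr, precision

# Toy labels (hypothetical): 30 plasma and 20 semen sequences.
toy_labels = ["plasma"] * 30 + ["semen"] * 20
folds = stratified_folds(toy_labels, k=10)

# Toy predictions (hypothetical): 2 true positives, 1 false negative,
# 1 false positive.
true = ["semen", "semen", "semen", "plasma", "plasma"]
predicted = ["semen", "semen", "plasma", "semen", "plasma"]
tpr, precision = tpr_and_precision(true, predicted)
print(tpr, precision)
```

    In a full cross-validation run, each fold would serve once as the unseen test group while a classifier is trained on the other nine, and the whole procedure would be repeated with a different random seed for each trial.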

    Analysis of selection. A maximum likelihood method was used to detect and quantify positive and negative selection. All data sets were first evaluated by using a model selection procedure (22) to identify and correct for strong nucleotide substitution biases which are ubiquitous in HIV. The fixed-effects likelihood (FEL) approach (22) was employed to test for selective pressure at a given site. Maximum likelihood estimates of branch lengths and nucleotide substitution rate parameters were derived from the entire alignment. A full codon model, using a modified MG94 (28) rate matrix with site-specific instantaneous synonymous (alpha_s) and nonsynonymous (beta_s) rates was then fitted independently to every codon position in the data, under two hypotheses: H_0, neutral evolution (alpha_s equals beta_s); H_A, nonneutral evolution (alpha_s and beta_s are free to vary independently).

    When the hypothesis of neutrality was rejected at site s, it was called positively selected if beta_s was estimated to be greater than alpha_s. The FEL method was implemented on a cluster of computers by using the HyPhy package (22).
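
    The per-site decision rule can be sketched as a likelihood ratio test in Python. The text does not state the test statistic, so the following is an assumption: the two nested hypotheses are compared by using twice the log-likelihood difference, referred to a chi-square distribution with one degree of freedom. The significance threshold, the function names, and the numeric inputs are hypothetical, and the sketch does not use HyPhy.

```python
import math

def lrt_p_value(loglik_null, loglik_alt):
    """P value for 2 * (lnL_A - lnL_0) under chi-square with 1 df.
    For 1 df the survival function equals erfc(sqrt(x / 2))."""
    statistic = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return math.erfc(math.sqrt(statistic / 2.0))

def classify_site(alpha_s, beta_s, loglik_null, loglik_alt, threshold=0.05):
    """Label one codon site from its fitted rates and log-likelihoods."""
    if lrt_p_value(loglik_null, loglik_alt) >= threshold:
        return "neutral"                      # H_0 not rejected
    return "positive" if beta_s > alpha_s else "negative"

# Hypothetical fitted values for three sites.
print(classify_site(0.5, 2.0, -100.0, -95.0))
print(classify_site(1.0, 1.1, -100.0, -99.9))
print(classify_site(2.0, 0.5, -100.0, -95.0))
```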

    Coreceptor usage prediction. A support vector machine-based method was employed to predict the coreceptor usage of viruses based on the V3 loop amino acid sequence (33). This method is highly reliable and is reported to predict CXCR4 usage with a specificity of 93% (19). The coreceptor classifier is available for public use at: http://genomiac2.ucsd.edu:8080/wetcat/tropism.html.

    Glycosylation. GlycoTracker.pl (S. Pillai, unpublished data) was used to identify N-linked glycosylation sites within each sequence. The Perl script provides a tally of all sequons, along with their respective locations (numbered according to HXB2 gp160). We compared the extent and distribution of N-linked glycosylation across the C2-V3 region in both compartments by identifying NXS and NXT (where X is some other residue) motifs in plasma- and semen-derived sequences (25). All statistical comparisons were performed by using a Wilcoxon Mann-Whitney test (11).
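
    The sequon search can be illustrated with a short Python sketch. It is not GlycoTracker.pl, which is unpublished; the function name and the toy sequence are hypothetical. Excluding proline at the X position is a common convention for N-linked sequons but is an assumption here, so it is exposed as a switch. Positions are counted within the input string and would need to be mapped onto HXB2 gp160 numbering through an alignment.

```python
import re

def find_sequons(protein, exclude_proline=True, offset=1):
    """Return start positions of N-X-S and N-X-T motifs.
    A lookahead is used so that overlapping motifs are all reported."""
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile("(?=N" + middle + "[ST])")
    return [match.start() + offset for match in pattern.finditer(protein)]

# Toy amino acid string (hypothetical): NNT and NTS overlap,
# and NPS is skipped when proline is excluded.
toy = "ANNTSKNPSQ"
strict = find_sequons(toy)
permissive = find_sequons(toy, exclude_proline=False)
print(strict, permissive)
```

    The number of sequons per sequence is then the length of the returned list, which gives the per-compartment counts that a rank-based test can compare.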

    Codon usage analysis. The general codon usage analysis (GCUA) package was implemented to look for compartment-specific codon usage biases (26).

    RESULTS

    Compartmentalization of semen-derived virus. To determine if the male genital tract represents a viral compartment, we used systematic phylogenetic comparison of matched blood- and semen-derived HIV-1 RNA env sequences from 12 individuals. We hypothesized that if the male genital tract is indeed a viral compartment, semen-derived sequences within each individual should cluster independently, while exhibiting levels of diversity and divergence similar to those of matching plasma sequences given comparable effective population sizes (29). Maximum likelihood trees describing contemporaneous variants from both tissues revealed that the male genital tract represented a distinct virologic compartment in six individuals (identified as A to F) (Fig. 1a; see Fig. S1 in the supplemental material), based on phylogenetic segregation between blood and semen virus. In five of the individuals, sequences did not cluster with respect to compartment (Fig. 1b; see Fig. S3 in the supplemental material). In one individual, G, there were longitudinal data that showed compartmentalization at the earlier time points but then apparent panmixis at later time points (see Fig. S2 in the supplemental material). In accordance with previous reports, a neighbor-joining tree comprising pooled data from all compartmentalized patients revealed that host, rather than compartment of origin, was the strongest phylogenetic determinant (see Fig. S4 in the supplemental material).

    Genetic diversity in plasma- and semen-derived viral populations. Genetic diversity was characterized by calculating the average pairwise distance within a population, based on distance measurements obtained by using the F84 matrix. Data across multiple time points were pooled when available. Individuals with phylogenetically distinct virus in blood and semen consistently exhibited lower genetic diversity in semen-derived viral populations (P < 0.01 by a paired Wilcoxon test). Conversely, individuals with noncompartmentalized virus failed to demonstrate any significant differences in viral diversity between tissues (Fig. 2).
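
    The diversity measure, the average pairwise distance within a population, can be sketched in Python as follows. For simplicity the sketch uses the uncorrected proportion of differing sites (p-distance) in place of the F84 distances used in the analysis; that substitution, the function names, and the toy alignments are assumptions made for illustration.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in columns) / len(columns)

def mean_pairwise_distance(sequences):
    """Average distance over all unordered pairs in one population."""
    distances = [p_distance(a, b) for a, b in combinations(sequences, 2)]
    if not distances:
        raise ValueError("need at least two sequences")
    return sum(distances) / len(distances)

# Toy alignments (hypothetical), not data from the study.
plasma = ["ACGTACGT", "ACGTTCGT", "ACGAACGA"]
semen = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
plasma_diversity = mean_pairwise_distance(plasma)
semen_diversity = mean_pairwise_distance(semen)
print(plasma_diversity, semen_diversity)
```

    Computing one value per compartment for each individual yields the paired observations that a paired Wilcoxon test compares.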

    Analysis of longitudinal sequence data. Longitudinal sequence data spanning multiple years were available for five individuals (identified as F, G, I, J, and K). We first evaluated tissue-specific longitudinal genetic diversity in these individuals by computing average pairwise genetic distances for each time point where blood and semen sequences were available. The longitudinal data reinforced our aforementioned results; individual F, characterized by compartmentalized virus at all available time points, exhibited constrained viral diversity in semen throughout the 2-year monitored period (Fig. 3a). Individual G, who transitioned from compartmentalized to noncompartmentalized virus, showed considerable variation in tissue-specific diversity; semen diversity bounced between being greater and less than contemporaneous plasma diversity, in accordance with inconsistent trafficking between these tissues. Individuals I, J, and K were consistently characterized by noncompartmentalized virus and exhibited similar levels of viral diversity in blood and semen at nearly all sample points (see Fig. S5 in the supplemental material).

    We next looked at longitudinal divergence in these five individuals, by calculating the average genetic distance from sequences at each time point to an artificial, tissue-specific baseline consensus sequence. On average, the observed level of divergence was comparable across tissues in individuals with both compartmentalized and noncompartmentalized virus, consistent with actively replicating viral populations in both blood and male genital tract (see Fig. S5 in the supplemental material). We also calculated the divergence between blood- and semen-derived virus by computing the average genetic distance between these populations at each time point. Individual F, as expected, demonstrated continually increasing divergence between tissue-specific populations, most probably due to a combination of genetic drift and compartment-specific viral adaptation. Intercompartment genetic distance exceeded 5% at the last available sample point (Fig. 3b). Individual G showed declining intercompartment divergence at each time point, mirroring the increased contribution of systemic virus to the seminal viral population. Divergence steadily diminished from approximately 8% at the onset to 2% at the final sampling time. Finally, hosts I, J, and K, characterized by noncompartmentalized virus, maintained low levels of intercompartment divergence throughout the monitored period; distances stayed below 2% at nearly all time points (see Fig. S5 in the supplemental material).

    Estimation of molecular clock. We used dated maximum likelihood phylogenies of sequences from host F, the only individual with compartmentalized virus and with available longitudinal data, to compare the viral molecular clock between plasma and semen. The estimated absolute rates of molecular evolution based on these phylogenies were 0.01004877 and 0.00637917 substitutions/site/year for plasma- and semen-derived sequences, respectively.

    Semen-specific env genetic signature. Although phylogenetic evidence suggests that semen- and blood-derived viruses from a given host are more closely related to each other than to virus from corresponding tissues in other individuals, semen-derived viruses may still share genetic characteristics across individuals due to tissue-specific selective pressures that are common across hosts. We employed a machine learning approach (27, 33, 39) to identify a genetic signature associated with seminal tropism. The J48 decision tree inducer (based on the C4.5 algorithm) used in our analysis has been relied on extensively as an alternative to traditional discriminant analysis, due largely to its capacity to detect and exploit interactions between feature variables in training data sets (27). We first applied this algorithm to classify env sequences from all individuals based on tissue of origin. The training data for this experiment drew samples from the entire available sequence set, consisting of 376 plasma sequences and 283 from semen. Our results (Table 1) indicate that in this first classification only 65% of sequences were classified correctly, and seminal tropism was predicted with a true positive rate of 0.48.

    It is likely that a lack of apparent viral compartmentalization is due to persistent trafficking between blood and semen. To determine if these low scores were due to the presence of viral sequence data classified as semen-derived that actually represented a recent introgression of plasma virus into the male genital tract, we purged the training set of all data associated with noncompartmentalized hosts. We retained the sequence data from individual G at compartmentalized time points. This pruned set consisted of 143 plasma sequences and 122 from semen. Our results for this second trial (Table 1) demonstrate a strong genetic signature associated with semen-derived sequences; 82% of sequences were classified accurately based on tissue of origin, and seminal tropism was predicted with a precision of 0.842 and a TPR of 0.818 (well over 90% of sequences were classified accurately when the entire training set was used for testing). It is important to point out that the cross-validation procedure used to evaluate this model is quite conservative; the classifier is always tested on a subset of the sequence data that it did not encounter during the training process. The signature underlying seminal tropism comprises a total of four positions within the C2-V3 region (numbered from the start of HXB2 gp160): 270, 291, 387, and 464 (Fig. 4; see Fig. S6 in the supplemental material). The bulk of the signature focuses on either the amino acid character at position 464 or its immediate linkage with a single other env residue.

    Identification of positively selected sites. We used a maximum likelihood approach to identify sites within env that were under positive selection in both compartments, focusing on individuals with compartmentalized virus. We sought to determine if the overall extent of selection and the array of sites under selection varied between compartments, consistent with our finding of a male genital tract-specific genetic signature. Sequence data from hosts A to G (including only data from the initial compartmentalized points associated with subject G) were first individually evaluated on a per compartment basis by using a model selection procedure to account for any existing mutational biases. Next the FEL approach (22) was employed to test for selective pressure at a given site. All sites in both compartments that appeared to be under positive selection were cataloged and compared. The number of positively selected sites was universally lower in semen-derived viral populations (P < 0.01 by a paired Wilcoxon test) (Table 2). Four out of seven individuals failed to exhibit positive selection at any sites within the C2-V3 region in their seminal virus. Additionally, in most cases the sites determined to be under positive selection varied between compartments. Only 3 out of 10 sites identified in seminal populations were also positively selected in corresponding plasma populations (Table 2).

    N-linked glycosylation in plasma- and semen-derived viral populations. To investigate variation in selection pressure from the neutralizing antibody response, we examined glycosylation patterns across the viral envelope (48). If the antibody response is attenuated in the male genital tract, we might expect fewer glycosylation sites within semen-derived viral sequences. If the response is equivalent, but targeting different epitopes, we might expect a reassortment of sites though the overall number may remain constant. Our results demonstrate that the extent of glycosylation differs significantly in six out of seven patients characterized by compartmentalized virus, but the direction of the discrepancy is inconsistent (P < 0.05 for six intrapatient comparisons; Mann-Whitney test). Individuals A, E, and G have higher average numbers of sequons in semen-derived sequences, while the opposite condition holds true for individuals C, D, and F (Fig. 5).

    The distribution of glycosylation sites over time was tracked in the two individuals with compartmentalized virus and with associated longitudinal sequence data. Semen-derived sequences from individual F gradually acquired a single additional sequon at a site (position 411) that was never glycosylated in plasma populations. Plasma sequences demonstrated a continual reassortment of sites with negligible fluctuation in overall number, in accordance with the notion of an evolving "glycan shield" (48). Individual G exhibited a gradual increase in net number of glycosylation sites in both seminal and plasma-derived env sequences, with little reassortment in either compartment.

    Prediction of coreceptor usage. We predicted the chemokine receptor preference for all sequences derived from patients with compartmentalized virus to determine if seminal tropism was correlated with altered coreceptor usage. Our results suggest that a trend towards reduced CXCR4 usage in the male genital tract exists, although it is not statistically significant due to the rarity of the CXCR4 phenotype across individuals and compartments; only three out of seven hosts harbored variants predicted to use the CXCR4 receptor (Fig. 6).

    Evaluation of codon usage bias. It has previously been reported that the differential availability of nucleotide precursor pools in target cells may influence HIV-1 codon usage patterns. Additionally, the cytidine deaminase APOBEC3G, found in lymphocytes, induces G to A mutations that skew codon usage towards A-rich triplets (51). If viral target cells within the male genital tract differ from peripheral tissues in precursor frequencies and APOBEC3G expression levels, an altered codon usage bias may evolve in seminal virus. Our analysis revealed no significant differences in codon usage between blood and semen virus (data not shown).

    DISCUSSION

    In these investigations we applied a battery of computational techniques to paired semen- and blood-derived HIV-1 env sequences, which confirmed previous reports that HIV within the genital tract is different from that within the bloodstream (10, 20). This study extends those observations with findings important to the understanding of how HIV adapts to the male genital tract. First, the male genital tract can function as a viral compartment, but the extent of compartmentalization differs between individuals and within individuals over time. Second, there are discordant selective pressures operating in the male genital tract and blood. Third, semen-derived viruses share a genetic signature across individuals due to tissue-specific selective pressures that are common across hosts.

    Viral compartments are characterized by a restriction of gene flow between cells or tissues, usually identified by phylogenetic analysis (29). In this study, viral compartmentalization between blood and the male genital tract was identified in 6 out of 12 individuals, and another individual demonstrated compartmentalization of virus only at the earliest sampling times. Viral migration between blood plasma and the male genital tract was minimal and infrequent in these individuals, which reinforces the concept that a significant fraction of virus shed in semen is produced locally in the male genital tract. Furthermore, there was a lower genetic diversity and rate of molecular evolution in seminal sequences, probably reflecting a lower effective population size within the male genital tract. This lower effective population size may contribute to the genetic bottleneck associated with HIV-1 transmission. We cannot exclude the possibility, however, that sampling issues contributed to this phenomenon; the efficiency of RNA extraction and reverse transcription-PCR may be lower in semen than plasma, increasing the potential for resampling.

    The degree of compartmentalization varied among individuals and also within individuals over time. This may explain the observations of intermittent viral shedding in the semen of HIV-infected men (15, 47) and the increased viral shedding when the urethra is inflamed by concomitant bacterial or viral infection (40). Local inflammation is a likely explanation for increased trafficking of HIV from the circulation to the genital compartment. Future studies examining the relationship between sexually transmitted infections and seminal viral loads may provide valuable insight into viral adaptation and dynamics within the male genital tract. This understanding could be crucial in the development of methods to interrupt HIV transmission such as vaccines, microbicides, and antiretroviral suppression.

    Seeding of genital tissues occurs very early in infection before the development of any anti-HIV immune response (13). Once the host mounts an anti-HIV immune response, it most likely varies in strength and nature between compartments (29). We investigated the degree of selection on the virus within the two compartments and found that there was greater positive selection on virus in the blood than virus in the male genital tract. In six out of the seven individuals with compartmentalized virus, there were highly significant differences in env glycosylation but not in a consistent direction. While this reinforces the theory that virus is produced locally in the male genital tract and responds to local humoral immunity, it does not explain the recent reports that HIV transmission through heterosexual exposure involves viruses with fewer envelope glycans (11).

    Since cellular tropism may also play a role in viral compartmentalization and adaptation to the male genital tract, we investigated the coreceptor usage of viruses in blood and semen. It is provocative that in all individuals who harbored CXCR4-using viruses, these viruses were underrepresented in the genital tract. Selection favoring R5 variants in the male genital tract may explain the observation that newly infected individuals are disproportionately infected with CCR5-using viruses (54, 55).

    Although HIV within the male genital tract is often different from that within the bloodstream (10, 17, 32), the initially infecting virus (founding virus) and the individual's immune responses determine viral genetics more than tissue of origin (29). Therefore, it has been difficult to determine if semen-derived virus shares common genetic characteristics among individuals (10). Using machine learning techniques, we have found that semen-derived HIV-1 has a strong genetic signature among individuals with compartmentalized virus. The signature comprises several positions across C2-V3; however, the residue at position 464 appears to be the most critical in determining viral tropism to the male genital tract. This particular position, to the best of our knowledge, has not previously been reported within the context of tissue tropism or viral compartmentalization. Nevertheless, this classification trial presents convincing evidence that the male genital tract environment selects for similar, predictable genetic changes in env across individuals.

    The male genital tract has been characterized as a reservoir (43, 52), a compartment (10), and a drug sanctuary (45). All have significant implications for preventing the transmission of HIV by using various theoretical methods such as microbicides, vaccines, or antiretroviral therapy (2, 9, 10). Our investigations uniquely detail the viral compartmentalization dynamics and differing selection pressures between the blood and male genital tract and document a specific genetic signature of virus compartmentalized in the male genital tract. Taken together, these data offer important insights into the adaptation of HIV to the male genital tract, which may be valuable in the rational design of an effective vaccine.

    ACKNOWLEDGMENTS

    We are grateful to Susan Little and Simon Frost for their insightful comments. We also thank Brian Gaschen for assistance with assimilating the sequence data, John Day for his technical expertise, and Darica Smith and Sharon Wilcox for helping with the preparation of the manuscript.

    This work was supported by grants 5K23AI055276, AI27670, AI38858, AI43638, AI43752, AI36214 (UCSD Center for AIDS Research), AI29164, and AI047745 from the National Institutes of Health. Additional support was provided by the Research Center for AIDS and HIV Infection of the San Diego Veterans Affairs Healthcare System.

    Supplemental material for this article may be found at http://jvi.asm.org/.

    REFERENCES

    Altfeld, M., E. S. Rosenberg, R. Shankarappa, J. S. Mukherjee, F. M. Hecht, R. L. Eldridge, M. M. Addo, S. H. Poon, M. N. Phillips, G. K. Robbins, P. E. Sax, S. Boswell, J. O. Kahn, C. Brander, P. J. Goulder, J. A. Levy, J. I. Mullins, and B. D. Walker. 2001. Cellular immune responses and viral diversity in individuals treated during acute and early HIV-1 infection. J Exp. Med. 193:169-180.

    Auvert, B., S. Males, A. Puren, A. Taljaard, M. Carael, and B. Williams. 2004. Can highly active antiretroviral therapy reduce the spread of HIV?: a study in a township of South Africa. J. Acquir. Immune Defic. Syndr. 36:613-621.

    Chakraborty, H., P. K. Sen, R. W. Helms, P. L. Vernazza, S. A. Fiscus, J. J. Eron, B. K. Patterson, R. W. Coombs, J. N. Krieger, and M. S. Cohen. 2001. Viral burden in genital secretions determines male-to-female sexual transmission of HIV-1: a probabilistic empiric model. AIDS 15:621-627.

    Chun, T.-W., L. Carruth, D. Finzi, X. Shen, J. A. DiGiuseppe, H. Taylor, M. Hermankova, K. Chadwick, J. Margolick, T. C. Quinn, Y.-H. Kuo, R. Brookmeyer, M. A. Zeiger, P. Barditch-Crovo, and R. F. Siliciano. 1997. Quantification of latent tissue reservoirs and total body viral load in HIV-1 infection. Nature 387:183-188.

    Coffin, J. M. 1995. HIV population dynamics in vivo: implications for genetic variation, pathogenesis and therapy. Science 267:483-489.

    Coombs, R. W., P. S. Reichelderfer, and A. L. Landay. 2003. Recent observations on HIV type-1 infection in the genital tract of men and women. AIDS 17:455-480.

    Coombs, R. W., C. E. Speck, J. P. Hughes, W. Lee, R. Sampoleo, S. O. Ross, J. Dragavon, G. Peterson, T. M. Hooton, A. C. Collier, L. Corey, L. Koutsky, and J. N. Krieger. 1998. Association between culturable human immunodeficiency virus type 1 (HIV-1) in semen and HIV-1 RNA levels in semen and blood: evidence for compartmentalization of HIV-1 between semen and blood. J. Infect. Dis. 177:320-330.

    Corpet, F. 1988. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16:10881-10890.

    Davis, C. W., and R. W. Doms. 2004. HIV transmission: closing all the doors. J. Exp. Med. 199:1037-1040.

    Delwart, E. L., J. I. Mullins, P. Gupta, G. H. Learn, Jr., M. Holodniy, D. Katzenstein, B. D. Walker, and M. K. Singh. 1998. Human immunodeficiency virus type 1 populations in blood and semen. J. Virol. 72:617-623.

    Derdeyn, C. A., J. M. Decker, F. Bibollet-Ruche, J. L. Mokili, M. Muldoon, S. A. Denham, M. L. Heil, F. Kasolo, R. Musonda, B. H. Hahn, G. M. Shaw, B. T. Korber, S. Allen, and E. Hunter. 2004. Envelope-constrained neutralization-sensitive HIV-1 after heterosexual transmission. Science 303:2019-2022.

    Drew, W. L., R. C. Miner, D. F. Busch, S. E. Follansbee, J. Gullett, S. G. Mehalko, S. M. Gordon, W. F. Owen, Jr., T. R. Matthews, W. C. Buhles, and B. DeArmond. 1991. Prevalence of resistance in patients receiving ganciclovir for serious cytomegalovirus infection. J. Infect. Dis. 163:716-719.

    Dyer, J. R., B. L. Gilliam, J. J. Eron, Jr., M. S. Cohen, S. A. Fiscus, and P. L. Vernazza. 1997. Shedding of HIV-1 in semen during primary infection. AIDS 11:543-545.

    Felsenstein, J. 1993. PHYLIP-phylogeny inference package, version 3.5c. University of Washington, Seattle, Washington.

    Fiscus, S. A., P. L. Vernazza, B. Gilliam, J. Dyer, J. J. Eron, and M. S. Cohen. 1998. Factors associated with changes in HIV shedding in semen. AIDS Res. Hum. Retrovir. 14(Suppl. 1):S27-S31.

    Günthard, H. F., D. V. Havlir, S. Fiscus, Z. Q. Zhang, J. Eron, J. Mellors, R. Gulick, S. D. Frost, A. J. Brown, W. Schleif, F. Valentine, L. Jonas, A. Meibohm, C. C. Ignacio, R. Isaacs, R. Gamagami, E. Emini, A. Haase, D. D. Richman, and J. K. Wong. 2001. Residual human immunodeficiency virus (HIV) type 1 RNA and DNA in lymph nodes and HIV RNA in genital secretions and in cerebrospinal fluid after suppression of viremia for 2 years. J. Infect. Dis. 183:1318-1327.

    Gupta, P., C. Leroux, B. K. Patterson, L. Kingsley, C. Rinaldo, M. Ding, Y. Chen, K. Kulka, W. Buchanan, B. McKeon, and R. Montelaro. 2000. Human immunodeficiency virus type 1 shedding pattern in semen correlates with the compartmentalization of viral quasispecies between blood and semen. J. Infect. Dis. 182:79-87.

    Hudson, R. R., M. Slatkin, and W. P. Maddison. 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589.

    Jensen, M. A., and A. B. van't Wout. 2003. Predicting HIV-1 coreceptor usage with sequence analysis. AIDS Rev. 5:104-112.

    Kemal, K. S., B. Foley, H. Burger, K. Anastos, H. Minkoff, C. Kitchen, S. M. Philpott, W. Gao, E. Robison, S. Holman, C. Dehner, S. Beck, W. A. Meyer, III, A. Landay, A. Kovacs, J. Bremer, and B. Weiser. 2003. HIV-1 in genital tract and plasma of women: compartmentalization of viral sequences, coreceptor usage, and glycosylation. Proc. Natl. Acad. Sci. USA 100:12972-12977.

    Kiessling, A. K., G. Zheng, and R. C. Eyre. 1992. Semen producing organs are an isolated reservoir of HIV which may play a significant role in the development of drug resistant strains. J. Hum. Virol. 2:193.

    Kosakovsky-Pond, S., and S. D. W. Frost. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol. Biol. Evol., in press.

    Krieger, J. N., R. W. Coombs, A. C. Collier, D. D. Ho, S. O. Ross, J. E. Zeh, and L. Corey. 1995. Intermittent shedding of human immunodeficiency virus in semen: implications for sexual transmission. J. Urol. 154:1035-1040.

    Krieger, J. N., A. Nirapathpongporn, M. Chaiyaporn, G. Peterson, I. Nikolaeva, R. Akridge, S. O. Ross, and R. W. Coombs. 1998. Vasectomy and human immunodeficiency virus type 1 in semen. J. Urol. 159:820-825.

    Marshall, R. D. 1974. The nature and metabolism of the carbohydrate-peptide linkages of glycoproteins. Biochem. Soc. Symp. 40:17-26.

    McInerney, J. O. 1998. GCUA: general codon usage analysis. Bioinformatics 14:372-373.

    Mjolsness, E., and D. DeCoste. 2001. Machine learning for science: state of the art and future prospects. Science 293:2051-2055.

    Muse, S. V., and B. S. Gaut. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol. Biol. Evol. 11:715-724.

    Nickle, D. C., D. Shriner, J. E. Mittler, L. M. Frenkel, and J. I. Mullins. 2003. Importance and detection of virus reservoirs and compartments of HIV infection. Curr. Opin. Microbiol. 6:410-416.

    Olsen, G. J., H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. fastDNAml: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10:41-48.

    Page, R. D. M. 1996. TREEVIEW: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12:357-358.

    Paranjpe, S., J. Craigo, B. Patterson, M. Ding, P. Barroso, L. Harrison, R. Montelaro, and P. Gupta. 2002. Subcompartmentalization of HIV-1 quasispecies between seminal cells and seminal plasma indicates their origin in distinct genital tissues. AIDS Res. Hum. Retrovir. 18:1271-1280.

    Pillai, S., B. Good, D. Richman, and J. Corbeil. 2003. A new perspective on V3 phenotype prediction. AIDS Res. Hum. Retrovir. 19:145-149.

    Piot, P., M. Bartos, P. D. Ghys, N. Walker, and B. Schwartlander. 2001. The global impact of HIV/AIDS. Nature 410:968-973.

    Quinlan, J. R. 1993. C4.5: programs for machine learning. Morgan Kaufmann, San Francisco, Calif.

    Quinn, T. C., M. J. Wawer, N. Sewankambo, D. Serwadda, C. Li, F. Wabwire-Mangen, M. O. Meehan, T. Lutalo, R. H. Gray, et al. 2000. Viral load and heterosexual transmission of human immunodeficiency virus type 1. N. Engl. J. Med. 342:921-929.

    Rambaut, A. 2002. Se-Al sequence alignment editor version 2.0. Department of Zoology, University of Oxford, Oxford, United Kingdom.

    Rambaut, A. 2000. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16:395-399.

    Resch, W., N. Hoffman, and R. Swanstrom. 2001. Improved success of phenotype prediction of the human immunodeficiency virus type 1 from envelope variable loop 3 sequence using neural networks. Virology 288:51-62.

    Sadiq, S. T., S. Taylor, S. Kaye, J. Bennett, R. Johnstone, P. Byrne, A. J. Copas, S. M. Drake, D. Pillay, and I. Weller. 2002. The effects of antiretroviral therapy on HIV-1 RNA loads in seminal plasma in HIV-positive patients with and without urethritis. AIDS 16:219-225.

    Singh, A., G. Besson, A. Mobasher, and R. G. Collman. 1999. Patterns of chemokine receptor fusion cofactor utilization by human immunodeficiency virus type 1 variants from the lungs and blood. J. Virol. 73:6680-6690.

    Slatkin, M., and W. P. Maddison. 1989. A cladistic measure of gene flow inferred from the phylogenies of alleles. Genetics 123:603-613.

    Smith, D. M., J. D. Kingery, J. K. Wong, C. C. Ignacio, D. D. Richman, and S. J. Little. 2004. The prostate as a reservoir for HIV-1. AIDS 18:6-8.

    Strain, M. C., H. F. Günthard, D. V. Havlir, C. C. Ignacio, D. M. Smith, A. J. Leigh Brown, T. R. Macaranas, R. Y. Lam, O. A. Daly, M. Fischer, M. Opravil, H. Levine, L. Bacheler, C. A. Spina, D. D. Richman, and J. K. Wong. 2003. Heterogeneous clearance rates of long-lived lymphocytes infected with HIV: intrinsic stability predicts lifelong persistence. Proc. Natl. Acad. Sci. USA 100:4819-4824.

    Taylor, S., R. P. van Heeswijk, R. M. Hoetelmans, J. Workman, S. M. Drake, D. J. White, and D. Pillay. 2000. Concentrations of nevirapine, lamivudine and stavudine in semen of HIV-1-infected men. AIDS 14:1979-1984.

    UNAIDS/WHO. 2004. AIDS epidemic update: December 2003. UNAIDS/World Health Organization, Geneva, Switzerland.

    Vernazza, P. L., B. L. Gilliam, J. Dyer, S. A. Fiscus, J. J. Eron, A. C. Frank, and M. S. Cohen. 1997. Quantification of HIV in semen: correlation with antiviral treatment and immune status. AIDS 11:987-993.

    Wei, X., J. M. Decker, S. Wang, H. Hui, J. C. Kappes, X. Wu, J. F. Salazar-Gonzalez, M. G. Salazar, J. M. Kilby, M. S. Saag, N. L. Komarova, M. A. Nowak, B. H. Hahn, P. D. Kwong, and G. M. Shaw. 2003. Antibody neutralization and escape by HIV-1. Nature 422:307-312.

    Witten, I. H., and E. Frank. 2000. Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco, Calif.

    Wong, J. K., C. C. Ignacio, F. Torriani, D. Havlir, N. J. S. Fitch, and D. D. Richman. 1997. In vivo compartmentalization of HIV: evidence from the examination of pol sequences from autopsy tissues. J. Virol. 71:2059-2071.

    Yu, Q., R. Konig, S. Pillai, K. Chiles, M. Kearney, S. Palmer, D. Richman, J. M. Coffin, and N. R. Landau. 2004. Single-strand specificity of APOBEC3G accounts for minus-strand deamination of the HIV genome. Nat. Struct. Mol. Biol. 11:435-442.

    Zhang, H., G. Dornadula, M. Beumont, L. Livornese, B. Van Uitert, K. Henning, and R. J. Pomerantz. 1998. Human immunodeficiency virus type 1 in the semen of men receiving highly active antiretroviral therapy. N. Engl. J. Med. 339:1803-1809.

    Zhang, L., L. Rowe, T. He, C. Chung, J. Yu, W. Yu, A. Talal, M. Markowitz, and D. D. Ho. 2002. Compartmentalization of surface envelope glycoprotein of human immunodeficiency virus type 1 during acute and chronic infection. J. Virol. 76:9465-9473.

    Zhang, L. Q., P. MacKenzie, A. Cleland, E. C. Holmes, A. J. Leigh-Brown, and P. Simmonds. 1993. Selection for specific sequences in the external envelope protein of human immunodeficiency virus type 1 upon primary infection. J. Virol. 67:3345-3356.

    Zhu, T., H. Mo, N. Wang, D. S. Nam, Y. Cao, R. A. Koup, and D. D. Ho. 1993. Genotypic and phenotypic characterization of HIV-1 in patients with primary infection. Science 261:1179-1184.

    (Satish K. Pillai, Benjami)