


JXB Advance Access published online on April 23, 2007

Journal of Experimental Botany, doi:10.1093/jxb/erm054

© The Author [2007]. Published by Oxford University Press [on behalf of the Society for Experimental Biology]. All rights reserved. For Permissions, please e-mail: journals.permissions@oxfordjournals.org

RESEARCH PAPER

Class prediction of closely related plant varieties using gene expression profiling

G Ancillo1, J Gadea2, J Forment2, J Guerri1 and L Navarro1,*

1Instituto Valenciano de Investigaciones Agrarias (IVIA), Carretera Moncada-Náquera, Km. 4.5, 46113 Moncada (Valencia), Spain
2Instituto de Biología Molecular y Celular de Plantas (IBMCP), Universidad Politécnica de Valencia, Laboratorio de Genómica, Avenida de los Naranjos, s/n, 46022 Valencia, Spain

* To whom correspondence should be addressed. E-mail: lnavarro{at}ivia.es

Received 30 November 2006; Revised 23 February 2007; Accepted 27 February 2007


    Abstract
 
In recent years, class prediction experiments have been largely developed in cancer research with the aim of classifying unknown samples by examining their expression signature. In natural populations, a significant component of gene expression variability is also heritable. Citrus species are an ideal model to accomplish the study of these questions in plants, due to the existence of varieties derived from somatic mutations that are likely to differ from each other by one or a few point mutations but are phenotypically indistinguishable at early vegetative stages. The small genetic variability existing among these varieties makes molecular markers ineffective in distinguishing genotypes within a particular species. Gene expression profiles have been used to predict mandarin clementine varieties (Citrus clementina Hort. ex Tan.) by means of two independent supervised learning algorithms: Support Vector Machines and Prediction Analysis of Microarrays. The results show that transcriptional variation is variety-dependent in citrus, and supervised clustering methods may correctly assign blind samples to varieties when both training and test samples are under the same experimental conditions.

Key words: Citrus, class prediction, expression profiling, microarray, natural variation


    Introduction
 
Recent microarray studies have revealed that each biological stage is determined by the coordinated expression of a different set of genes. This fact has been exploited in class prediction experiments, with the aim of classifying unknown samples by examining their expression signature. In cancer research, new approaches based on gene expression profiles using supervised training methods are quickly being developed to provide more accurate diagnosis (Golub et al., 1999; Alizadeh et al., 2000; Khan et al., 2001; Kohlmann et al., 2004), complementing or replacing more subjective methods based on morphological characteristics, which depend heavily on the expertise of the pathologist.

Training algorithms have already been used in plants (Heath et al., 2002) but, as far as is known, this is the first time class prediction methods have been applied to address molecular classification of plant varieties.

Transcriptional differences have also been investigated in the context of natural populations (Jin et al., 2001; Enard et al., 2002; Oleksiak et al., 2002; Stamatoyannopoulos, 2004). These studies have shown that a significant component of gene expression variation is heritable. Different species, varieties, and populations also express a particular set of genes at a certain moment. These heritable gene expression patterns, when applied to supervised training algorithms similar to those used for cancer diagnosis, could possibly be used to solve complex phylogenetic problems of germplasm banks and culture collections, assigning relationships between morphologically similar varieties or populations, and helping to generate evolutionary trees in cases where other methods failed.

Citrus is an appropriate model for studying these questions in plants because most of the varieties from commercial species, such as the clementine (Citrus clementina Hort. ex Tan.) and sweet orange [Citrus sinensis (L.) Osb.] groups, have arisen by somatic mutation (Cameron and Frost, 1968). These mutations occur spontaneously and frequently in buds and limbs, representing the main natural source of new varieties (Spiegel-Roy and Goldschmidt, 1996). Sports showing beneficial horticultural traits (e.g. maturity, flowering time, and fruit characteristics) are perpetuated by growers through vegetative propagation by budding. Moreover, asexual reproduction by apomictic seed (nucellar polyembryony), which is an important characteristic of many citrus varieties, also contributes to preserving the variation generated by mutation or hybridization (Spiegel-Roy and Goldschmidt, 1996). Morphological identification may be particularly difficult or even impossible for many varieties at early stages of development, as they become distinguishable only as mature trees (particularly by fruit traits) after a juvenile period lasting years. Besides its importance in phylogenetic analyses, citrus identification is crucial to the citrus industry. The ability to discriminate between citrus varieties that frequently differ from each other by only a few genetic changes is of great importance for variety protection.

Molecular markers such as RFLPs, RAPDs, AFLPs, ISSR, and SCAR have already been used in citrus (Moore, 2001) and have proved to be very useful for comprehensive phylogenetic studies. However, the small genetic variability existing among some varieties makes this kind of marker inefficient in distinguishing genotypes within a particular species (Deng et al., 1995; Fang and Roose, 1997; Bretó et al., 2001). Frequently, these markers are only able to distinguish between closely related variety groups, but not varieties belonging to the same cluster (Fang and Roose, 1997).

In this work, gene expression profiles have been used to predict citrus varieties by means of two independent supervised learning algorithms, Support Vector Machines (SVM; Vapnik, 1998) and Prediction Analysis of Microarrays (PAM; Tibshirani et al., 2002), together with a citrus cDNA microarray containing more than 6500 different citrus unigenes developed within the Spanish Citrus Functional Genomics Project (http://citrusgenomics.ibmcp-ivia.upv.es). As a proof of principle, ‘Clemenules’, ‘Marisol’, and ‘Hernandina’, three clementines differing mainly in maturation time and in some fruit traits but phenotypically indistinguishable at vegetative stages, have been used. Herrero (1995) and Bretó et al. (2001) developed one RAPD and three IRAP markers, respectively, able to cluster ‘Marisol’ in a group different from the two other varieties, but no data have been reported to differentiate between ‘Clemenules’ and ‘Hernandina’, which always cluster in the same group. Here it is shown that transcriptional variation is variety-dependent in citrus and that, under appropriate experimental conditions, supervised clustering methods can correctly assign blind samples to phenotypically indistinguishable varieties.


    Materials and methods
 
Plant material
Three clementine (Citrus clementina Hort. ex Tan.) varieties were used for this study, namely Clemenules, Marisol, and Hernandina. Twenty samples of each variety (5 g of bark from the last finished sprouting branches) were harvested among 400 plants from the propagation block of the nursery AVASA located in Castellón, Spain in early summer. All plants were grown in the same screenhouse under the same conditions, and the effects of tree and branch position were minimized by random sampling.

Twelve additional samples were collected to check the feasibility of the generated models: seven (three from ‘Clemenules’ and ‘Hernandina’ and one from ‘Marisol’ varieties) were harvested 3 months later (early autumn) from the same trees sampled previously (growing in the AVASA nursery) and the other five (three from ‘Hernandina’ and one from ‘Clemenules’ and ‘Marisol’ varieties) were taken from a screenhouse at a different nursery (Alcanar, Tarragona, Spain) in the same week as those used to build the model. Trees at both nurseries were clonal propagations from the same original pathogen-free trees maintained at the Citrus Germplasm Bank of IVIA (http://www.ivia.es).

RNA isolation and labelling
Five grams of liquid nitrogen-powdered material were resuspended by vortexing in 5 ml of 0.2 M TRIS–HCl, pH 8, 50 mM EDTA, 0.2 M NaCl, 2% (w/v) SDS, 5 ml of H2O-saturated phenol, and 25 µl of β-mercaptoethanol, and incubated for 5 min at 50 °C. After centrifugation, the aqueous phase was extracted with chloroform:isoamyl alcohol (24:1, v/v). One volume of 6 M LiCl was added to the new aqueous phase, and RNA was left to precipitate overnight at –20 °C. After centrifugation and washing with 70% (v/v) ethanol, RNA was resuspended in diethylpyrocarbonate-treated H2O.

RNA was labelled following an indirect method (Randolph and Waggoner, 1997). Reverse transcription, cDNA purification, dye coupling, and fluorescent cDNA purification were accomplished as described by Forment et al. (2005), except that total RNA (40 µg) was used instead of poly(A)+ RNA. Sample RNA was labelled with Cy3, and reference RNA (pooled RNA consisting of an equal amount of RNA from each sample) was labelled with Cy5.

Microarray hybridization and scanning
The cDNA microarray developed under the Citrus Functional Genomics Project (http://citrusgenomics.ibmcp-ivia.upv.es) was used. The microarray contains probes corresponding to 6875 putative unigenes from citrus (Forment et al., 2005). Microarray hybridization, washing, and scanning were performed as described by Forment et al. (2005).

Data preprocessing
Spots with background-subtracted intensity greater than 2-fold the mean background intensity in at least one channel were selected and used for normalization with the GenePix 4.1 (Molecular Devices) software. For normalization, the mean of the ratio of medians of all the features was set to 1. Values derived from replicate spots were merged by averaging, and feature patterns with <80% of existing values were removed (since the total number of samples was 30, features with values in fewer than 24 of the samples were not considered). Missing values in the remaining patterns were imputed using K-Nearest Neighbors (Troyanskaya et al., 2001), replacing each missing value by the average value of the k=15 nearest patterns. Replicate merging, filtering of missing values, and imputation were carried out using the GEPAS interface (http://gepas.bioinfo.cnio.es; Vaquerizas et al., 2005).
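The filtering and imputation steps described above can be sketched as follows. This is a minimal illustration only, not the GEPAS implementation: the function names, the use of `None` for missing values, and the distance measure (root mean squared difference over the samples present in both patterns) are assumptions made for the example.

```python
import math

def filter_features(matrix, min_fraction=0.8):
    """Keep feature rows that have values in at least min_fraction of samples.

    Each row is one feature (gene pattern); missing values are None.
    """
    kept = []
    for row in matrix:
        present = sum(value is not None for value in row)
        if present >= min_fraction * len(row):
            kept.append(row)
    return kept

def knn_impute(matrix, k=15):
    """Replace each missing value by the mean of that sample's value in the
    k nearest feature rows that do have a value for it (simplified KNN)."""
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, value in enumerate(row) if value is None]
        if not missing:
            continue
        # Distance to every other row, computed over samples present in both.
        distances = []
        for h, other in enumerate(matrix):
            if h == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a is not None and b is not None]
            if shared:
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                distances.append((d, h))
        distances.sort()
        for j in missing:
            neighbours = [matrix[h][j] for _, h in distances
                          if matrix[h][j] is not None][:k]
            if neighbours:
                imputed[i][j] = sum(neighbours) / len(neighbours)
    return imputed
```

With 30 samples and `min_fraction=0.8`, a feature is kept only if at least 24 samples have a value, which matches the threshold stated in the text.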

Sample data to be used in the prediction were normalized and replicate spot values merged as described above. Filtering was not done, because the prediction set must contain the same features as the training set. For that purpose, local Perl scripts were used. To impute the missing values in the test set, data from the training set and the test set were joined and K-Nearest Neighbors imputation was applied as described above. After that, the data were split again into the two sets. Data joining was performed only to impute the missing values in the prediction set. During preprocessing, training set data were never contaminated with values from the prediction set.

SVM classification
Preprocessed data from a variable number of samples (depending on whether the suitable number of samples for the training set was being established or the definitive model was being built) were used to train SVM using the SVM train program from the GEPAS GUI (http://gepas.bioinfo.cnio.es; Vaquerizas et al., 2005). Three-fold cross-validation was used to establish the suitable number of samples for the training subset, and leave-one-out cross-validation (Scholkopf and Smola, 2002) was used for the definitive models. Linear classifications were performed, so kernel transformation of the inputs was not carried out. To predict the class of the new samples from the test set, the program SVM classify, accessible at http://gepas.bioinfo.cnio.es (Vaquerizas et al., 2005), was used.

PAM classification
Since PAM can perform multiclass prediction, a training set, containing the preprocessed sample data from the three clementine classes, was introduced in the Excel GUI software for PAM (http://www-stat.stanford.edu/~tibs/PAM) to build only one model. Automatic cross-validation (10-fold) was implemented and test errors estimated to optimize the shrinkage parameter (Δ), which is a threshold used to select genes for class prediction. The Δ value resulting in the best accuracy (lowest overall error rate) for the prediction with the lowest number of predictive features was chosen.
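The selection rule for Δ (lowest overall error rate, then the fewest predictive features) can be written as a simple ordering. The sketch below is illustrative only: the function name is hypothetical, and the candidate error rates and feature counts in the usage note are invented for the example rather than taken from the study.

```python
def choose_delta(candidates):
    """Pick the shrinkage threshold from (delta, error_rate, n_features) tuples.

    The lowest error rate wins; ties are broken by the smaller feature count.
    """
    best = min(candidates, key=lambda c: (c[1], c[2]))
    return best[0]
```

For example, `choose_delta([(3.0, 0.0, 120), (4.65, 0.0, 32), (6.0, 0.2, 5)])` returns `4.65`, because the first two candidates tie on error and the second uses fewer features.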

A test set consisting of preprocessed data from new samples of the three clementine varieties was used for prediction.


    Results
 
With the aim of building mathematical models able to predict clementine classes, screenhouse-grown plants from three different varieties (‘Clemenules’, ‘Marisol’, and ‘Hernandina’) were harvested and their RNA extracted and used to hybridize a citrus cDNA microarray (Forment et al., 2005) using a reference design (see Materials and methods). Resulting data were processed as described in Materials and methods and used for the predictions by two different algorithms, SVM and PAM.

Binary classification using SVM
SVM is a supervised machine learning algorithm that operates by finding a hypersurface in the space of possible inputs. This approach attempts to split two classes by maximizing the distance between the hypersurface and the closest points of each class (Vapnik, 1998). To predict three different clementine classes, three different biclassifications had to be performed.
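As a rough illustration of a linear two-class SVM, the sketch below minimizes the soft-margin hinge-loss objective by sub-gradient descent. It is a stand-in written for this explanation, not the GEPAS SVM program used in the study; the function names and the values of `lam`, `learning_rate`, and `epochs` are assumptions.

```python
def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(samples, labels, lam=0.01, learning_rate=0.01, epochs=500):
    """Fit a linear separator (weights, bias) for labels coded as +1 / -1.

    Each step shrinks the weights (L2 penalty) and, when a sample lies inside
    the margin (y * f(x) < 1), moves the hyperplane towards classifying it
    correctly with a margin.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            w = [wi * (1.0 - learning_rate * lam) for wi in w]
            if margin < 1.0:
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b

def predict(model, x):
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1
```

In this coding, one variety of a pair would be labelled +1 and the other -1, and each sample vector would hold the expression values of all features, since no kernel transformation is applied.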

To find the minimum number of samples necessary to build a model able to predict clementine classes with substantial accuracy, different numbers of samples (4, 6, 8, 10, 12, and 15) from Clemenules and Marisol were used to train the SVM algorithm. For each number of samples, 10 replica models (using a different sample subset for each) were created and, after 3-fold cross-validation, the accuracy of the models was tested. Three-fold cross-validation splits the data into three subsets, and uses two to build a model and one to classify. This is repeated iteratively, leaving out a different subset each time. For each number of samples, the accuracy was calculated as the average of the 10 accuracies obtained from the replica models (Fig. 1).
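The cross-validation loop described above can be sketched generically. The helper names are hypothetical, and the interleaved assignment of samples to folds is an assumption made for the example (the text does not state how subsets were drawn). Setting `k` equal to the number of samples gives leave-one-out cross-validation.

```python
def k_fold_splits(n_samples, k=3):
    """Yield (train_indices, test_indices) once for each of the k subsets."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]  # interleaved assignment
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

def cross_validated_accuracy(samples, labels, train_fn, predict_fn, k=3):
    """Train on k-1 subsets, classify the held-out subset, and pool the results."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(samples), k):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, samples[i]) == labels[i]
                       for i in test_idx)
    return correct / len(samples)
```

Any classifier can be plugged in through `train_fn` and `predict_fn`; averaging the returned accuracy over several replica subsets would mirror the procedure summarized in Fig. 1.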


Fig. 1. Accuracy of the models generated through SVM for different sample numbers of the mandarin varieties ‘Clemenules’ and ‘Marisol’. Accuracy is expressed as an average of the results obtained in 10 replica models.

 
Although >80% average accuracy was found using only eight samples per variety, the result was still quite dependent on the sample subset used for each replica model, indicating that sample variability still affected the output of the algorithm. This effect was reduced significantly by increasing the number of samples (see Fig. 1). Using 15 samples per variety, the total accuracy was 96.7% regardless of the sample subset used for training. Fewer than 15 samples could be used in cases where sample size is a limitation, provided that larger misclassification rates are acceptable.

Three different binary classification models were built pairwise. For each model, the preprocessed data set was split so that 15 samples from each of the two varieties formed the training set (30 in total). The apparent accuracy (internal validation) of the prediction, estimated by leave-one-out cross-validation, was 29/30 (96.7%) in all three binary models. The other five samples of each variety (10 in total), used as a test set, were correctly classified by the models ‘Clemenules–Marisol’ and ‘Marisol–Hernandina’. For the ‘Clemenules–Hernandina’ model, only one sample was misclassified (Table 1). Thus, the estimation of true accuracy (external validation) after prediction was 10/10 (100%) in the models ‘Clemenules–Marisol’ and ‘Marisol–Hernandina’, and 9/10 (90%) in the ‘Clemenules–Hernandina’ model. The results were consistent regardless of the individuals chosen in the training set (data not shown).
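The pairing of three varieties into three binary models, and the accuracy figures quoted above, can be checked with a few lines. This is only an arithmetic illustration; the helper names are hypothetical.

```python
from itertools import combinations

VARIETIES = ("Clemenules", "Marisol", "Hernandina")

def pairwise_models(classes):
    """List every two-class model needed to cover all pairs of classes."""
    return list(combinations(classes, 2))

def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)
```

Here `pairwise_models(VARIETIES)` yields the three pairs used in the text, `accuracy_percent(29, 30)` gives 96.7, and `accuracy_percent(9, 10)` gives 90.0.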



 
Table 1. Accuracy estimations in the different classification models

 
Classification using PAM
This algorithm classifies samples using the nearest shrunken centroid method with automatic gene selection, based on a ranking that uses a penalized t-statistic and soft-thresholding. The misclassification error rate is determined through 10-fold cross-validation, which works like the 3-fold procedure but splits the data into 10 subsets (Tibshirani et al., 2002). As PAM can perform multiclass classification, only one model was built, with 15 samples of each variety (45 in total). The thresholding parameter Δ (which determines the degree of shrinkage) used to select genes for class prediction was fixed at Δ=4.65 (Fig. 2) to minimize the cross-validated and test errors. The method eliminates components (features) from the class prediction as the Δ parameter is increased. Adding more features usually increases the error rate, whereas fewer features result in insufficient power to discriminate between classes. The chosen value for Δ yielded 32 clones/features as the gene set used for the prediction (Fig. 3).
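A simplified sketch of nearest shrunken centroid classification is given below to make the role of Δ concrete. It is not the PAM software: it assumes equal class priors, uses the scaling factor m_k = sqrt(1/n_k - 1/n) found in textbook presentations of the method, and takes the median pooled standard deviation as the offset s0; all function names are hypothetical.

```python
import math
from statistics import median

def soft_threshold(d, delta):
    """Shrink d towards zero by delta; anything within delta of zero becomes 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def fit_shrunken_centroids(samples, labels, delta):
    """Return shrunken class centroids, per-gene scales, and surviving genes."""
    classes = sorted(set(labels))
    n, p = len(samples), len(samples[0])
    overall = [sum(x[i] for x in samples) / n for i in range(p)]
    members = {k: [x for x, y in zip(samples, labels) if y == k] for k in classes}
    centroids = {k: [sum(x[i] for x in xs) / len(xs) for i in range(p)]
                 for k, xs in members.items()}
    # Pooled within-class standard deviation of each gene.
    s = [math.sqrt(sum((x[i] - centroids[y][i]) ** 2
                       for x, y in zip(samples, labels)) / (n - len(classes)))
         for i in range(p)]
    s0 = median(s)
    scale = [si + s0 for si in s]
    shrunken = {}
    for k in classes:
        m_k = math.sqrt(1.0 / len(members[k]) - 1.0 / n)
        shrunken[k] = []
        for i in range(p):
            d = (centroids[k][i] - overall[i]) / (m_k * scale[i])
            shrunken[k].append(overall[i] + m_k * scale[i] * soft_threshold(d, delta))
    # A gene survives if at least one class centroid still differs from the overall mean.
    selected = [i for i in range(p)
                if any(shrunken[k][i] != overall[i] for k in classes)]
    return {"centroids": shrunken, "scale": scale, "selected": selected}

def predict_class(model, x):
    """Assign x to the class whose shrunken centroid is nearest (scaled distance)."""
    def score(k):
        return sum(((xi - ci) / si) ** 2
                   for xi, ci, si in zip(x, model["centroids"][k], model["scale"]))
    return min(model["centroids"], key=score)
```

Raising `delta` removes more genes from `selected`, which is the behaviour the text describes: features drop out of the class prediction as Δ increases.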


Fig. 2. Errors in clementine classification as a function of the threshold parameter Δ. The chosen value Δ=4.65 yields a subset of 32 selected genes. H, Hernandina; N, Nules; S, Marisol.

 

Fig. 3. Heat map of the chosen 32 genes. Expression greater than the mean level is represented by red, expression less than the mean level by green, and expression close to the mean level by black. When available, the most similar protein in the NCBI nr database is indicated for each citrus gene.

 
Having identified the genes able to distinguish the three varieties, class prediction analyses were next performed to classify an independent data set. The test set consisted of preprocessed data from five samples of each variety (15 in total). All the samples were correctly assigned (Table 1) and the results were reproducible regardless of the individuals chosen in the training set (data not shown).

Influence of environmental factors on the prediction
To evaluate the effect of environmental factors on the feasibility of the generated models, two more sample sets were generated next following the same sampling strategy used previously: one set of samples was taken from the same trees of the former set but harvested some months later than the first sample set and another set was taken from a different nursery at the same time as those used to generate the model (see Materials and methods).

Prediction accuracies diminished considerably when these samples were used as a test set with either the SVM or the PAM models (Table 2). Even samples taken from the same individuals that could be reliably predicted in the first sampling (see above) were predicted now with low accuracy. In some cases, results were very close to those expected at random, indicating that both training and test samples need a controlled experimental design to assure feasibility of prediction, and that the model cannot be extended to samples grown in other environmental conditions.



 
Table 2. Accuracy in the prediction of samples used to test the feasibility of the different classification models

 

    Discussion
 
Supervised training methods using gene expression profiles have been mainly applied in recent years to cancer research (Golub et al., 1999; Alizadeh et al., 2000; Khan et al., 2001; Kohlmann et al., 2004). Here the same rationale has been used to elucidate whether closely related plant genotypes can be distinguished by their gene expression profile. By using three clementine varieties (‘Clemenules’, ‘Marisol’, and ‘Hernandina’), phenotypically indistinguishable at vegetative stages of the life cycle, it has been shown that these methods can assign a class to individuals of unknown origin. In citrus the genetic variation within each species is very narrow (Herrero et al., 1996) and all varieties of clementine mandarins most probably derive from a single plant, the product of uncontrolled crosses and spontaneous somatic mutations carefully selected by the growers and then vegetatively propagated (Bretó et al., 2001). This low genetic variation limits the use of molecular markers to identify varieties, and only two studies have reported molecular markers able to cluster ‘Marisol’ separately from ‘Clemenules’ and ‘Hernandina’. Herrero (1995), studying the genetic uniformity of clementine varieties, found that only one out of 40 RAPD markers was able to differentiate the ‘Marisol’ group from that in which ‘Clemenules’ and ‘Hernandina’ clustered. Bretó et al. (2001) obtained similar results and showed that the ‘Marisol’ cluster differed from the ‘Clemenules’ and ‘Hernandina’ group in three IRAP markers. However, none of these markers was able to differentiate ‘Clemenules’ from ‘Hernandina’.

Recently, several reports have addressed the question of the variability of gene expression in the context of natural populations and the extent to which inter-individual variability in the patterns of gene activity reflects heritable determinants of gene regulation versus the action of environmental factors (Oleksiak et al., 2002; Stamatoyannopoulos, 2004). Evidence exists that a significant component of the variability in gene expression is genetic, implying heritable variation in cis- and trans-acting factors among different individuals and populations (Enard et al., 2002; Stamatoyannopoulos, 2004). To separate the contributions of genotype and environment on gene expression in citrus, a sampling strategy that reduced environmental effects has been designed, eliminating systematic bias and randomizing the remaining variables.

Two different methods (SVM and PAM) (Vapnik, 1998; Tibshirani et al., 2002) were successful in predicting the three clementine varieties. In the SVM two-class analysis, a high predictive accuracy (100%) in assigning the blind samples to the correct class in the ‘Clemenules–Marisol’ and ‘Marisol–Hernandina’ models was obtained. When an attempt was made to classify the test samples using the ‘Clemenules–Hernandina’ model, the prediction accuracy decreased to 90% because one sample corresponding to the ‘Clemenules’ class was predicted as ‘Hernandina’. As implemented in GEPAS, SVM makes a binary prediction using the information from every gene. This kind of analysis has been reported to allow the noise associated with genes with little or no discriminatory power to degrade the performance of the algorithms (Ambroise and McLachlan, 2002). However, the low error rate obtained is still promising, and the single misclassification event is not surprising considering that ‘Clemenules’ and ‘Hernandina’ are very closely related and, therefore, their heritable transcriptional profiles are expected to be more similar and difficult to classify unambiguously.

The PAM algorithm, however, is a multiclass classifier that implements a feature-subset selection approach to reduce the number of genes used for prediction (Tibshirani et al., 2002), identifying those contributing most to the classification. The PAM algorithm correctly classified all the samples (100% accuracy) in the present study, based on the expression of 32 genes considered differentially expressed between ‘Clemenules’, ‘Marisol’, and ‘Hernandina’ (see Table 1). The use of this ranked feature-selection approach allows a relatively small number of predictive genes to be found. This contrasts with other noise reduction approaches based on principal component analysis or clustering, where the biological meaning of the entities analysed is to some extent lost (Mateos et al., 2001). In this study, a microarray containing around 7000 genes (representing all functional categories) was used, the majority of which were equally expressed in all three varieties according to the data obtained. It is interesting that, among the predictive genes, there is an increased fraction of genes coding for proteins related to the signal transduction of ABA (C02014F08, C07010B02, C02004A05, C02014C03, C18013C02; see Fig. 3) that are overexpressed in the ‘Hernandina’ variety. The possible biological significance of this result remains to be established. In any case, the risks of extracting biological information from biased expression data have been described (Novatchkova and Eisenhaber, 2001), and higher gene representation should be used to extract reliable information about distinct gene pathway regulation in the different varieties.

The prediction accuracy diminished considerably with changing environmental conditions, indicating that, as expected for these three closely related citrus varieties, the heritable variety-dependent gene expression variation contributes very little to total gene expression, and that a small perturbation in these conditions provokes major changes in gene expression that obscure the heritable effects. Nonetheless, highly similar citrus varieties were predicted if environmental differences were reduced to a minimum, indicating that heritable gene expression applied to supervised training methods could be a useful way of identifying variety if the experimental design is controlled. In nurseries, citrus varieties are usually grown under controlled conditions and, even when this is not the case, due to the ease of vegetative clonal propagation by budding of citrus plants, it would take just 6–8 weeks to grow new plants under the same greenhouse conditions. In non-clonal organisms where inter-population genetic variation is greater, a well-designed strategy is likely to allow prediction using supervised clustering algorithms.

The possibility of generating prediction models for different varieties could help to resolve legal issues regarding the protection of the rights of citrus breeders until variety-specific molecular markers such as SNPs are developed. However, carefully controlled experiments have to be designed. Moreover, as these algorithms assign any test sample to the most related class of the model, certified samples coming from new varieties or species could be tested on models generated from their possible parental or related counterparts. The model would thus reveal the most related species based on the expression profiles encountered, opening up new possibilities for phylogenetic studies.


    Acknowledgements
 
We thank Dr J Dopazo and R Diaz for helpful discussions, and Dr R Flores for critical reading of this manuscript. This work was supported by CICYT grant GEN2001-4885-C05-05 and by INIA grant RTA2005-00223.


    References
 
Alizadeh AA, Eisen MB, Davis RE, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511.

Ambroise C and McLachlan GJ. (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. Proceedings of the National Academy of Sciences, USA 99, 6562–6566.

Bretó MP, Ruiz C, Pina JA, Asíns MJ. (2001) The diversification of Citrus clementina Hort. ex Tan., a vegetatively propagated crop species. Molecular Phylogenetics and Evolution 21, 285–293.

Cameron JW and Frost HB. (1968) Genetics, breeding and nucellar embryony. In Reuther W, Batchelor LD, Webber HJ (Eds.). The citrus industry. Berkeley, CA: University of California, Vol. II, pp. 325–370.

Deng ZN, Gentile A, Nicolosi E, Vardi A, Tribulato E. (1995) Identification of in vivo and in vitro lemon mutants by RAPD markers. Journal of Horticultural Science 70, 117–125.

Enard W, Khaitovich P, Klose J, et al. (2002) Intra- and interspecific variation in primate gene expression patterns. Science 296, 340–343.

Fang DQ and Roose ML. (1997) Identification of closely related citrus cultivars with inter-simple sequence repeat markers. Theoretical and Applied Genetics 95, 408–417.

Forment J, Gadea J, Huerta L, et al. (2005) Development of a citrus genome-wide EST collection and cDNA microarray as resources for genomic studies. Plant Molecular Biology 57, 75–91.

Golub T, Slonim D, Tamayo P, et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537.

Heath LS, Ramakrishnan N, Sederoff RR, Whetten RW, Chevone BI, Struble CA, Jouenne VY, Chen D, van Zyl LM, Grene R. (2002) Studying the functional genomics of stress responses in Loblolly pine using the express microarray management system. Comparative and Functional Genomics 3, 226–243.

Herrero R. (1995) Genetic characterization, variability and phylogenetic relationships study of the Aurantioideae subfamily. PhD thesis, University of Valencia, Spain, 135–140.

Herrero R, Asíns MJ, Carbonell EA, Navarro L. (1996) Genetic diversity in the orange subfamily Aurantioideae. I. Intraspecies and intragenus genetic variability. Theoretical and Applied Genetics 92, 599–609.

Jin W, Riley RM, Wolfinger RD, White KP, Passador-Gurgel G, Gibson G. (2001) The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nature Genetics 29, 389–395.

Khan J, Wei JS, Ringner M, et al. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and neural networks. Nature Medicine 7, 673–679.

Kohlmann A, Schoch C, Schnittger S, Dugas M, Hiddemann W, Kern W, Haferlach T. (2004) Pediatric acute lymphoblastic leukemia (ALL) gene expression signatures classify an independent cohort of adult ALL patients. Leukemia 18, 63–71.

Mateos A, Herrero J, Tamames J, Dopazo J. (2001) Supervised neural networks for clustering conditions in DNA array data after reducing noise by clustering gene expression profiles. Microarray data analysis. Dordrecht: Kluwer Academic Publishers, Vol. II, pp. 91–103.

Moore GA. (2001) Oranges and lemons: clues to the taxonomy of Citrus from molecular markers. Trends in Genetics 17, 536–540.

Novatchkova M and Eisenhaber F. (2001) Can molecular mechanisms of biological processes be extracted from expression profiles? Case study, endothelial contribution to tumor-induced angiogenesis. Bioessays 23, 1159–1171.

Oleksiak MF, Churchill GA, Crawford DL. (2002) Variation in gene expression within and among populations. Nature Genetics 32, 261–266.

Randolph JB and Waggoner AS. (1997) Stability, specificity and fluorescence brightness of multiply-labelled fluorescent DNA probes. Nucleic Acids Research 25, 2923–2929.

Scholkopf B and Smola A. (2002) Learning with kernels. Cambridge, MA: MIT Press.

Spiegel-Roy P and Goldschmidt EE. (1996) Biology of citrus. Cambridge: Cambridge University Press.

Stamatoyannopoulos JA. (2004) The genomics of gene expression. Genomics 84, 449–457.

Tibshirani R, Hastie T, Narasimhan B, Chu G. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, USA 99, 6567–6572.

Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520–525.

Vaquerizas JM, Conde L, Yankilevich P, Cabezon A, Minguez P, Diaz-Uriarte R, Al-Shahrour F, Herrero J, Dopazo J. (2005) Gepas: an experiment-oriented pipeline for the analysis of microarray gene expression data. Nucleic Acids Research 33, W616–W620.

Vapnik V. (1998) Statistical learning theory. New York, NY: John Wiley-Interscience.





