YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking; www.yeastract.com) presently contains 30990 regulatory associations between yeast genes, based on more than 1000 bibliographic references. Each regulatory association was annotated manually, after examination of the relevant references. The database also contains the description of 281 specific DNA binding sites for a subgroup of 108 transcription factors. Since several transcription factors bind to the same DNA motifs, these 281 binding sites correspond to only 208 distinct nucleotide sequences. Further information about each yeast gene was obtained from the Saccharomyces Genome Database (SGD), Regulatory Sequence Analysis Tools (RSAT) and the Gene Ontology (GO) Consortium.
The YEASTRACT database assists in three major tasks: prediction of gene transcriptional regulation, DNA motif analysis and global expression analysis, all based on the yeast transcription networks described in the literature. This tutorial presents three case studies that exemplify the use of different query options and utilities; many other ways of exploiting these options and utilities are possible.
Throughout the YEASTRACT database and this tutorial, regulatory associations are classified as "Documented" or "Potential":
- a documented association between a Transcription Factor (TF) and a target gene is supported by published data showing at least one of the following types of experimental evidence: i) a change in the expression of the target gene caused by a deletion (or mutation) in the gene encoding the transcription factor, coming either from detailed gene-by-gene analysis or from genome-wide expression analysis; ii) binding of the transcription factor to the promoter region of the target gene, as shown by band-shift, footprinting or Chromatin ImmunoPrecipitation (ChIP) assays. Since these types of evidence differ in nature, the user is urged to check the literature references provided in the database to fully understand the experimental basis of each regulatory association.
- a potential association between a TF and a target gene is based on the occurrence of the TF binding site in the promoter region of the target gene. The binding sites associated with each TF in this database are supported by published experimental evidence for the binding of the TF to the specific nucleotide sequence (data coming from footprinting or ChIP assays). Again, the user is urged to check the literature references provided in the database.
Keeping the information gathered, curated and inserted in this database accurate and up to date is crucial to YEASTRACT users. We therefore welcome any contribution from the yeast community towards this goal.
Example 1: Identification of the documented and potential regulatory associations for an ORF/Gene
The functional analysis of an ORF or gene can be guided through the identification of its documented and potential transcription factors (TF). This example describes one of the possible ways to explore the regulatory associations for ORF YNR070w, encoding a putative ATP-binding cassette transporter, using various queries and utilities provided by YEASTRACT.
1.1 - Search for Documented Transcription Factors (TFs)
The "Search Transcription Factors" query identifies the TFs that are Documented and/or Potential transcriptional regulators of a given gene. The search for documented transcription factors acting upon YNR070w retrieves Nrg1p and Rap1p. The user may check the associated bibliographic references to learn the experimental basis of these regulatory associations.
According to the SGD descriptions of Nrg1p and Rap1p, these regulators are involved in glucose repression and chromatin silencing, respectively. It may therefore be of interest to examine a possible link between ORF YNR070w and these biological processes.
1.2 - Search for Potential Transcription Factors (TFs)
The "Search Transcription Factors" query can also identify the potential regulators of YNR070w. By default, all potential transcription factors found are displayed in tabular form. The Promoter link can be followed to see the binding sites of each TF in the promoter sequence of YNR070w. The distribution of TF binding sites in the promoter region of YNR070w can be viewed by checking the Image option while searching.
The display of potential TFs on the image can be controlled by unchecking their respective boxes in the color palette below the image and pressing the Redisplay button. The palette shows colors only for those TFs whose binding sites are found in the promoter region of the given gene(s). A close look at the image for the TFs that are documented regulators of YNR070w (i.e., Nrg1p and Rap1p) reveals that the binding site for Nrg1p is present, while that for Rap1p is not. Rap1p may therefore regulate YNR070w indirectly, or through a binding site that has not yet been described in the literature or is not listed in the database.
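The search for potential regulators can be pictured as a scan of the promoter for each TF's binding site on both DNA strands. The sketch below is illustrative only: it assumes binding sites are given as IUPAC consensus strings, and the function names, the TF names (TF_A, TF_B) and the promoter fragment are hypothetical placeholders, not YEASTRACT data or its actual matching algorithm.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(consensus: str, promoter: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) of every hit of a consensus on both strands."""
    promoter = promoter.upper()
    hits = []
    for strand, site in (("+", consensus.upper()), ("-", reverse_complement(consensus))):
        if strand == "-" and site == consensus.upper():
            continue  # palindromic site: the reverse scan would repeat the forward hits
        pattern = "".join(IUPAC[base] for base in site)
        # The lookahead lets overlapping occurrences count separately.
        hits += [(m.start(), strand) for m in re.finditer(f"(?=(?:{pattern}))", promoter)]
    return sorted(hits)


def potential_tfs(binding_sites: dict[str, list[str]], promoter: str) -> dict[str, list[tuple[int, str]]]:
    """Map each TF with at least one binding-site hit to its sorted list of hits."""
    result = {}
    for tf, sites in binding_sites.items():
        hits = sorted(hit for site in sites for hit in find_sites(site, promoter))
        if hits:
            result[tf] = hits
    return result


# Hypothetical TFs and promoter fragment, for illustration only.
sites = {"TF_A": ["CCCCT"], "TF_B": ["TGACTCA"]}
print(potential_tfs(sites, "ATTCCCCTGGAAGGGGA"))
# {'TF_A': [(3, '+'), (11, '-')]}
```

TF_A is reported twice because both CCCCT and its reverse complement AGGGG occur in the fragment, while TF_B is left out because neither its site nor its reverse complement is present, mirroring how only TFs with binding sites appear in the color palette.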
1.3 - Gene Grouping based on shared Gene Ontology (GO) terms
The YEASTRACT utility "Group Genes by GO" groups a list of genes according to the GO terms they share. The list of genes identified in 1.2 as potential regulators of YNR070w was subjected to GO-based grouping, selecting the Biological Process ontology and Level 5.
The output (Table 1) displays the GO terms in the first column, the percentage of genes from the given list associated with each GO term in the second column, and the cluster of genes associated with that GO term in the last column. The grouping may differ depending on the chosen ontology and level.
Table 1 - GO associations for genes using Biological Process at level 5
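The percentage in the second column can be read as the share of the input list annotated with each GO term. A minimal sketch follows, assuming the gene-to-term annotations have already been mapped to the chosen ontology and level; the function name, gene names and annotation dictionary are hypothetical toy data, not SGD or GO Consortium annotations.

```python
from collections import defaultdict


def group_by_go(genes: list[str], annotations: dict[str, set[str]]) -> list[tuple[str, float, list[str]]]:
    """Return (GO term, % of input genes with that term, genes) rows, largest groups first."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for gene in genes:
        for term in annotations.get(gene, set()):
            clusters[term].append(gene)
    rows = [(term, 100.0 * len(members) / len(genes), members) for term, members in clusters.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))  # ties broken alphabetically by term
    return rows


# Hypothetical annotations, assumed already mapped to the chosen ontology and level.
annotations = {
    "G1": {"cell cycle", "lipid metabolism"},
    "G2": {"cell cycle"},
    "G3": {"lipid metabolism"},
    "G4": set(),
}
for term, percent, members in group_by_go(["G1", "G2", "G3", "G4"], annotations):
    print(f"{term}\t{percent:.0f}%\t{', '.join(members)}")
```

Because a gene can carry several terms (G1 above), the percentages need not add up to 100%.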
The information in Table 1 reveals that most of the TFs potentially binding to the promoter region of YNR070W are involved in cell cycle, pseudohyphal growth, organic acid metabolism, response to abiotic stimulus and cellular lipid metabolism. A possible involvement of YNR070W in these processes can thus be hypothesized. The association of this ORF with the GO term "response to abiotic stimulus" appears consistent with its previous association with the PDR network (de Risi et al., 2000), as a gene encoding a putative multidrug transporter (Bauer et al., 1999).
If the ORF/gene under study is predicted to encode a TF, it is convenient to use the "Search Regulated Genes" query, with the Documented or Potential option, to retrieve all documented or potential targets of the TF, respectively. Grouping the retrieved target genes by GO may also provide clues about the biological processes or molecular functions controlled by the TF.
1.4 - References
Example 2: Microarray data clustering based on regulatory associations
YEASTRACT provides tools for classifying and grouping large lists of genes that genome-wide expression data show to be up- or down-regulated under a specific environmental or biological condition. These analyses are based on documented or algorithmically identified potential regulatory associations and on the GO-based schema. To exemplify the several utilities available in YEASTRACT, we test the list of genes up-regulated in response to the expression of PDR1-3, a point-mutant allele of the PDR1 gene, which encodes a transcription factor involved in Pleiotropic Drug Resistance in yeast. The data were retrieved from de Risi et al. (2000).
2.1 - Transform an ORF list into a Gene List and vice-versa
The ORF List<->Gene List utility converts a given list of ORFs into a list of gene names, or vice versa. In addition, it separates a mixed list into two lists, one of ORFs and one of gene names. This makes gene/ORF lists easier to read.
2.2 - Group Genes by Gene Ontology (GO)
Grouping genes by shared GO terms is a common feature of many microarray analysis software packages and is also implemented in YEASTRACT. The grouping may differ depending on the chosen ontology and level. Grouping the de Risi et al. (2000) gene list by the Biological Process ontology at level 5 yields the following table:
Table 2 - GO associations for genes using Biological Process at level 5
In agreement with the published analysis of these results (de Risi et al., 2000), the main functional groups include "response to abiotic stimulus" (drugs included), "drug transport" and "cellular lipid metabolism", among others.
2.3 - Group Genes by TF
The "Group Genes by TF" query groups a list of genes according to the documented or potential TFs that may be involved in their regulation. The groups are presented in the resulting table in decreasing order of the percentage of genes regulated by each TF, calculated relative to one of two options: i) the total number of genes in the input list; or ii) the number of genes in the genome that are regulated by the TF.
Below is the output of this utility for each of the two options, considering documented regulatory associations. In both Tables 3 and 4, the first column indicates the name of the TF, the second column the percentage of genes regulated by the TF, calculated as described above, and the third column the cluster of genes associated with the TF.
Table 3 - Genes grouped by TF ordered by the percentage of genes regulated by each TF, relative to the total number of genes in the list
Table 4 - Genes grouped by TF ordered by the percentage of genes regulated by each TF, relative to the number of genes in the genome which are regulated by the TF
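The two ordering options differ only in the denominator of the percentage. The sketch below, with a hypothetical function name and toy regulons, computes both: relative to the size of the input list (as in Table 3) or relative to the size of each TF's regulon (as in Table 4). It illustrates the arithmetic described above and is not YEASTRACT's implementation.

```python
def group_by_tf(genes: list[str], regulons: dict[str, set[str]],
                relative_to: str = "list") -> list[tuple[str, float, list[str]]]:
    """Group input genes by regulating TF.

    relative_to="list":    % of the input genes regulated by the TF (Table 3 style).
    relative_to="regulon": % of the TF's whole regulon present in the list (Table 4 style).
    """
    if relative_to not in ("list", "regulon"):
        raise ValueError("relative_to must be 'list' or 'regulon'")
    gene_set = set(genes)
    rows = []
    for tf, targets in regulons.items():
        members = sorted(gene_set & targets)
        if not members:
            continue
        denominator = len(gene_set) if relative_to == "list" else len(targets)
        rows.append((tf, 100.0 * len(members) / denominator, members))
    rows.sort(key=lambda row: (-row[1], row[0]))  # decreasing percentage
    return rows


# Hypothetical regulons: TF1 regulates 8 genes genome-wide, TF2 only one.
regulons = {"TF1": {"A", "B", "C", "D", "E", "F", "G", "H"}, "TF2": {"A"}}
genes = ["A", "B", "C", "D"]
print(group_by_tf(genes, regulons, "list"))     # TF1 100%, then TF2 25%
print(group_by_tf(genes, regulons, "regulon"))  # TF2 100%, then TF1 50%
```

In the toy data TF1 regulates every gene in the list yet only half of its regulon is present, which is the same kind of contrast discussed below for Pdr1p (100% of the list, but only 14% of its 168-gene regulon).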
Not surprisingly, the values in Table 3 show that 100% of the genes in the list are regulated by the TF Pdr1p and 92% by its close homologue Pdr3p. However, the values in Table 4 also indicate that only 14% of the known Pdr1p regulon (which includes 168 genes whose transcription was shown to be affected by Pdr1p) was found to be up-regulated under the experimental conditions used by de Risi and co-workers. Note that this calculated value is merely indicative. Indeed, the Pdr1p regulon includes genes that may not be direct targets, according to the definition of documented regulatory association given at the beginning of this tutorial. Moreover, the number of genes associated with any regulon depends on the amount of data published on the subject and gathered in this database. Nevertheless, this analysis indicates that a fraction of the genes whose transcription depends on Pdr1p were not identified under the experimental conditions designed by de Risi and co-workers (2000).
The output of this utility also indicates that Yrr1p and Pdr8p are involved in the transcriptional regulation of 24% and 20% of the genes in the test list, respectively (Table 3). These two TFs are close homologues of Pdr1p and Pdr3p and have overlapping targets (Le Crom et al., 2002; Hickell et al., 2003). The values in Table 4 indicate that about 30% of the genes in the described Yrr1p and Pdr8p regulons (which comprise 21 and 17 target genes, respectively) are included in the list of genes resulting from the study of de Risi et al. (2000). All the corresponding references may be checked for each TF using "Search Regulated Genes", option "Documented".
Table 3 also shows that 20% of the genes in the test list are documented targets of Sok2p, a TF involved in pseudohyphal growth, but these represent only 1.6% of the known Sok2p regulon, which includes over 300 genes (Table 4).
Table 4 also indicates that all the genes known to be regulated by each of the Pdr1p-homologous TFs Rds2p, Stb5p, Rds1p and Rds3p are present in the list of PDR1-3 targets described by de Risi et al. (2000). However, the known targets of these recently described TFs are so far limited to 2, 3, 1 and 2 genes, respectively.
The list under examination also contains 50% of the Ecm22p regulon (Table 4). Although this transcription factor regulates only 8% of the test gene list (Table 3), Ecm22p is associated with the regulation of sterol biosynthesis and is known to be involved in the regulation of PDR5 and PDR16 (Akache and Turcotte, 2002), which are part of the yeast PDR network. PDR16, encoding a putative phospholipid transporter, is involved in sterol biosynthesis (van den Hazel et al., 1999).
2.4 - Search for Regulatory Associations
The "Find regulatory associations" query may be used to group genes according to their documented and potential co-regulation. This query displays, in a single table, all the information obtained with the several options of the "Group genes by TF" functionality, allowing the comparison of the potential and documented regulons deduced for a gene list from an array experiment. To save space, this comparison is shown in Table 5 only for Pdr1p, although the full list of implicated TFs appears when using this functionality.
Table 5 clearly shows a significant discrepancy between the genes considered documented targets and those considered potential targets of Pdr1p. The same is observed for other TFs. The differences may arise because: i) the documented targets of each TF may include indirect targets; ii) the presence of a TF binding site in the promoter region of a gene does not necessarily make it a target of the corresponding TF; iii) there may be target genes and binding sites for a specific TF that are not yet described in the literature or included in the database. For example, HXK1, SCW11, MET17, FMP43, FRE4, DSE4, COS10 and REV1, all confirmed targets of Pdr1-3p, do not possess any Pdr1p binding site in their promoter regions. These genes may be indirect targets of Pdr1p, or their promoter regions may include a binding site for this TF that is not yet defined (or not yet included in this database).
Table 5 - Regulatory Associations
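Comparing the documented and potential regulons of one TF within an input list amounts to set operations on its target genes. A minimal sketch with a hypothetical function name and hypothetical gene sets; the "documented only" and "potential only" groups are the discrepancies that explanations i) to iii) above try to account for.

```python
def compare_regulons(genes: list[str], documented: set[str], potential: set[str]) -> dict[str, list[str]]:
    """Split the input genes targeted by one TF according to the evidence available."""
    gene_set = set(genes)
    doc = gene_set & documented
    pot = gene_set & potential
    return {
        "documented and potential": sorted(doc & pot),
        "documented only": sorted(doc - pot),  # e.g. indirect targets or undescribed sites
        "potential only": sorted(pot - doc),   # binding site present, no published evidence
    }


# Hypothetical target sets for a single TF.
result = compare_regulons(
    ["G1", "G2", "G3", "G4", "G5"],
    documented={"G1", "G2", "G3"},
    potential={"G2", "G3", "G5", "G9"},
)
print(result)
```

G4 appears in no group because it has neither kind of association, and G9 is ignored because it is not in the input list.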
Notice that the "Find regulatory associations" query offers two search options: Any Transcription Factors to Any Gene and All Transcription Factors to Any Gene. The former option was used in the previous analysis. The latter option searches for regulatory associations in which all the input TFs control the same input gene(s). This enables the identification of groups of genes whose transcription is potentially under the simultaneous control of several different transcription factors. For instance, we may search for the regulatory associations between the PDR-related TFs Pdr1p, Pdr3p, Pdr8p and Yrr1p and the de Risi gene list using the All Transcription Factors to Any Gene option:
Table 6 - Regulatory associations
The results in Table 6 reveal that there are four documented gene targets shared by Pdr1p, Pdr3p, Yrr1p and Pdr8p, although there is no potential target common to all four TFs within the list under examination. The possibility that these TFs act together in the up-regulation of their overlapping targets has been examined to some extent. For instance, Pdr1p and Pdr3p can act as homo- or heterodimers (Mamnun et al., 2002), and the transcriptional regulation of YRR1 or PDR8 was found to depend on Pdr1p or Pdr3p (Hickell et al., 2003; Akache et al., 2004).
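The difference between the two search options can be sketched as per-TF target lists versus an intersection across all input TFs. This sketch assumes regulons are available as sets of target genes; the function names and toy regulons are hypothetical and only mirror the behavior described in the text.

```python
def any_tf_to_any_gene(tfs: list[str], genes: list[str], regulons: dict[str, set[str]]) -> dict[str, list[str]]:
    """Targets in the gene list for each input TF that regulates at least one of them."""
    gene_set = set(genes)
    result = {}
    for tf in tfs:
        targets = sorted(gene_set & regulons.get(tf, set()))
        if targets:
            result[tf] = targets
    return result


def all_tfs_to_any_gene(tfs: list[str], genes: list[str], regulons: dict[str, set[str]]) -> list[str]:
    """Genes in the list that are regulated by every one of the input TFs."""
    shared = set(genes)
    for tf in tfs:
        shared &= regulons.get(tf, set())
    return sorted(shared)


regulons = {"TF1": {"A", "B", "C"}, "TF2": {"B", "C", "D"}, "TF3": {"C", "E"}}  # hypothetical
tfs, genes = ["TF1", "TF2", "TF3"], ["A", "B", "C", "D"]
print(any_tf_to_any_gene(tfs, genes, regulons))  # {'TF1': ['A', 'B', 'C'], 'TF2': ['B', 'C', 'D'], 'TF3': ['C']}
print(all_tfs_to_any_gene(tfs, genes, regulons))  # ['C']
```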
2.5 - References
Example 3: Search for a DNA motif within known TF binding sites and promoter regions
The search for consensus sequences or DNA motifs over-represented in the promoter regions of co-regulated genes, as revealed by global expression analysis, may contribute to identifying known or new regulatory associations underlying the yeast response under study. YEASTRACT provides the "Search by DNA Motif" option to facilitate this analysis. This is exemplified below for the motif CGGGC, found to be over-represented in the upstream regions of the genes up-regulated in yeast cells under glucose- or ethanol-limited growth (Wu et al., 2004).
3.1 Search for a DNA Motif within known TF Binding Sites
This query allows the user to check whether a DNA motif has already been documented as the binding site of a specific TF, that is, whether a newly identified motif matches exactly, is contained in, or contains a previously described TF binding site.
The result of this query shows that the CGGGC motif has no exact match among the 284 different TF binding sites described in the literature and compiled in YEASTRACT, but is contained in the Cup2p binding site, within its most degenerate region (HTHNNGCTGD; Beaudoin and Labbé, 2001). This result suggests that the examined motif does not correspond to any TF binding site described so far.
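The three outcomes of this query (exact match, contained in, contains) can be sketched by comparing IUPAC codes position by position, treating two codes as compatible when the sets of bases they denote overlap. This handling of degenerate positions is an assumption, the sketch compares the forward strand only, and the function names are hypothetical. With the Cup2p site quoted above, CGGGC aligns with the window HNNGC.

```python
from typing import Optional

# Set of bases denoted by each IUPAC nucleotide code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(x: str, y: str) -> bool:
    """Two IUPAC codes are compatible if they share at least one base."""
    return bool(set(IUPAC_BASES[x]) & set(IUPAC_BASES[y]))


def occurs_in(short: str, long: str) -> bool:
    """True if `short` aligns, position by position, with some window of `long`."""
    return any(
        all(compatible(s, long[start + i]) for i, s in enumerate(short))
        for start in range(len(long) - len(short) + 1)
    )


def classify(motif: str, site: str) -> Optional[str]:
    """Relate a motif to one known binding site (forward strand only)."""
    motif, site = motif.upper(), site.upper()
    if len(motif) == len(site) and occurs_in(motif, site):
        return "match"
    if len(motif) < len(site) and occurs_in(motif, site):
        return "contained in site"
    if len(motif) > len(site) and occurs_in(site, motif):
        return "contains site"
    return None


print(classify("CGGGC", "HTHNNGCTGD"))  # contained in site
```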
3.2 Search for Genes having a DNA Motif in their Promoter Regions
This query searches for the occurrence of a DNA motif in the promoter regions of all genes in the yeast genome. The result shows that the CGGGC motif occurs in the promoter regions of 2169 of the approximately 6000 yeast genes, and at least twice in the promoter regions of 567 of these genes. This information, together with tests of statistical significance (Wu et al., 2004), may be useful to anticipate the biological significance of a newly proposed consensus.
3.3 References