| Parameter | Description |
|---|---|
| Search both strands | Select this option to search for motifs in both the forward and reverse strands of the promoters of the input genes. |
| Minimum number of sequences (sieve rate) | Minimum percentage of the input genes that must contain the motifs in their promoters. |
| Lambda parameter (λ, size of λ-mers) | Size of the small sequences used to build the matrix of co-occurrences. |
| Epsilon parameter (ε, distance tolerance) | Distance tolerance in the configuration of a pair of λ-mers. A value of ε greater than zero allows the configurations of λ-mers to vary slightly from sequence to sequence. |
| Maximum p-value for assessment of relevant motifs for family generation | Maximum p-value of the subset of motifs that will be considered when generating the families. |
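To make the "Search both strands" and sieve rate parameters more concrete, the following is a minimal sketch of how they might interact. It assumes exact string matching of a single motif, which is a simplification and not the motif finder's actual procedure. The function names and the example promoters are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def passes_sieve(motif: str, promoters: list[str], min_percent: float,
                 both_strands: bool = True) -> bool:
    """Check whether a motif occurs in at least min_percent of the promoters.

    Assumption of this sketch: an occurrence is an exact substring match.
    With both_strands=True, a match of the reverse complement also counts.
    """
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    hits = 0
    for promoter in promoters:
        promoter = promoter.upper()
        if motif in promoter or (both_strands and rc_motif in promoter):
            hits += 1
    return 100.0 * hits / len(promoters) >= min_percent


# Hypothetical promoter fragments, for illustration only.
promoters = ["TTACGTAAGG", "CCTTACGTAA", "GGGGCCCCAA", "AATGCATTTT"]
forward_only = passes_sieve("CCTTAC", promoters, 50.0, both_strands=False)
both = passes_sieve("CCTTAC", promoters, 50.0, both_strands=True)
```

In this example the motif is found directly in one promoter and as a reverse complement in another, so it reaches a 50% sieve rate only when both strands are searched.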
An email will be sent to the given address once the algorithm finishes. This email contains information about the input parameters and a link to a web page, as shown in Figure 5.
A sample output page is shown in Figure 6, below. This page contains a link to the motif finder's output and a table listing the families of motifs. Each entry contains a logo depicting the PWM (Position Weight Matrix) of the family, the p-value of the root and a link to a file containing the list of motifs in the family and the PWM itself.
The resulting families are spread over several pages, which can be browsed using the links shown in Figure 7, below.
Each of the PWMs in this output file can be compared to the TFBS contained in the YEASTRACT database by following the steps depicted in Figure 8.
The default metric is "Sum of the Squared Distances", and the input PWM can also be trimmed. Trimming removes the columns at the edges of the PWM that have an information content below the selected threshold.
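The trimming step can be sketched as follows. This is an illustration under stated assumptions, not the site's implementation: information content is taken to be the standard Shannon measure in bits against a uniform background, each PWM column is a list of probabilities in the order A, C, G, T, and the names `information_content` and `trim_pwm` are hypothetical.

```python
import math


def information_content(column):
    """Information content of one PWM column, in bits (0 to 2).

    Assumption: uniform background, so IC = 2 - Shannon entropy.
    """
    entropy = -sum(p * math.log2(p) for p in column if p > 0)
    return 2.0 - entropy


def trim_pwm(pwm, threshold):
    """Remove edge columns whose information content is below threshold.

    Only the two edges are trimmed; low-information columns in the
    interior of the PWM are kept.
    """
    start, end = 0, len(pwm)
    while start < end and information_content(pwm[start]) < threshold:
        start += 1
    while end > start and information_content(pwm[end - 1]) < threshold:
        end -= 1
    return pwm[start:end]


# Hypothetical PWM: columns are probabilities for A, C, G, T.
pwm = [
    [0.25, 0.25, 0.25, 0.25],  # uninformative edge column, trimmed
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],  # uninformative interior column, kept
    [0.00, 0.00, 1.00, 0.00],
    [0.30, 0.20, 0.30, 0.20],  # nearly uninformative edge column, trimmed
]
trimmed = trim_pwm(pwm, threshold=0.5)
```

With a threshold of 0.5 bits, the first and last columns are dropped and the three columns between them are returned unchanged.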
The input PWM is compared with the TFBS of the YEASTRACT database using a procedure described in [2]. First, the TFBS of the YEASTRACT database are converted to PWMs using the IUPAC rules and assuming equiprobability between the nucleotides allowed by each symbol. A few examples are shown in Figure 9. The input PWM is then locally aligned with each of the TFBS PWMs using the Smith-Waterman local alignment algorithm, with the selected column distance metric used to score the alignment. Four distance metrics were implemented, among them the default "Sum of the Squared Distances" and the average log-likelihood ratio.
For the average log-likelihood ratio distance metric, the nucleotide background frequencies were corrected for the GC-content (38%) of S. cerevisiae promoter DNA.
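The GC correction can be illustrated with a short sketch. It assumes that the 38% GC-content is split equally between G and C (0.19 each) and the remainder equally between A and T (0.31 each), and it uses the commonly cited form of the average log-likelihood ratio for two columns of nucleotide counts. The pseudocount and the function name `allr` are assumptions of this example and do not come from the YEASTRACT documentation.

```python
import math

GC_CONTENT = 0.38  # GC-content of S. cerevisiae promoter DNA, from the text
# Assumption: G and C share the GC fraction equally, as do A and T.
BACKGROUND = {
    "A": (1 - GC_CONTENT) / 2,
    "C": GC_CONTENT / 2,
    "G": GC_CONTENT / 2,
    "T": (1 - GC_CONTENT) / 2,
}


def allr(counts_x, counts_y, background=BACKGROUND, pseudocount=1.0):
    """Average log-likelihood ratio between two columns of base counts.

    Each column is a dict of counts for A, C, G, T. A background-weighted
    pseudocount (an assumption of this sketch) avoids taking log(0).
    """
    def frequencies(counts):
        total = sum(counts.values()) + pseudocount
        return {b: (counts[b] + pseudocount * background[b]) / total
                for b in "ACGT"}

    freq_x, freq_y = frequencies(counts_x), frequencies(counts_y)
    numerator = sum(
        counts_x[b] * math.log(freq_y[b] / background[b])
        + counts_y[b] * math.log(freq_x[b] / background[b])
        for b in "ACGT"
    )
    return numerator / (sum(counts_x.values()) + sum(counts_y.values()))


col_a = {"A": 10, "C": 0, "G": 0, "T": 0}
col_g = {"A": 0, "C": 0, "G": 10, "T": 0}
same_score = allr(col_a, col_a)   # positive: columns agree
diff_score = allr(col_a, col_g)   # negative: columns disagree
```

Because the background enters each log ratio, a conserved G or C column scores higher against itself than a conserved A or T column in this AT-rich background.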
Each of these metrics compares two columns, evaluating their similarity numerically and either favouring or penalizing their alignment. After the input PWM has been aligned with all the TFBS, the alignments are ordered by score, and the twenty top-scoring alignments are displayed in the results table.
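The overall comparison procedure (IUPAC conversion, local alignment, ranking) can be sketched as below. This is a simplified illustration, not the procedure of [2] in full: the column score `1 - SSD`, the linear gap penalty, and all function names are assumptions made for this example, and the consensus strings are illustrative rather than taken from the database.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_pwm(site):
    """Convert an IUPAC consensus to a PWM (columns ordered A, C, G, T).

    Each symbol becomes a column in which the allowed bases are equiprobable.
    """
    return [[1.0 / len(IUPAC[s]) if b in IUPAC[s] else 0.0 for b in "ACGT"]
            for s in site.upper()]


def column_score(col_x, col_y):
    """Similarity derived from the sum of squared distances.

    Assumption: score = 1 - SSD, so identical columns score +1 and
    columns with no base in common score down to -1.
    """
    return 1.0 - sum((x - y) ** 2 for x, y in zip(col_x, col_y))


def smith_waterman(pwm_x, pwm_y, gap=1.0):
    """Best local alignment score between two PWMs (linear gap penalty)."""
    rows, cols = len(pwm_x) + 1, len(pwm_y) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0.0,
                h[i - 1][j - 1] + column_score(pwm_x[i - 1], pwm_y[j - 1]),
                h[i - 1][j] - gap,
                h[i][j - 1] - gap,
            )
            best = max(best, h[i][j])
    return best


def rank_sites(query_pwm, sites, top=20):
    """Score every site against the query and keep the top alignments."""
    scored = [(smith_waterman(query_pwm, iupac_to_pwm(s)), s) for s in sites]
    return sorted(scored, reverse=True)[:top]


# Illustrative consensus strings, not taken from the YEASTRACT database.
query = iupac_to_pwm("TGACTCA")
ranked = rank_sites(query, ["CACGTG", "TGASTCA", "TTACGTAA"])
```

Here `ranked` lists the sites from best to worst score, with `TGASTCA` first because it differs from the query only in one degenerate position. A fuller version would also align the reverse complement of each site, since the results table reports the strand on which the input PWM aligns.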
An example of the output obtained when a PWM of a family of motifs is compared with the YEASTRACT TFBS is shown in Figure 10. The results table contains the TFBS that were found to be similar to the input PWM. Each row contains the TFBS, the TF it belongs to, whether the input PWM aligns on the forward or the reverse strand of the TFBS, and the alignment of the input PWM with the PWM of the TFBS. An example of one local alignment is shown in Figure 11.
[1] Mendes N.D., Casimiro A.C., Santos P.M., Sá-Correia I., Oliveira A.L., Freitas A.T. (2006) MUSA: A parameter free algorithm for the identification of biologically significant motifs. Bioinformatics, 22: 2996-3002.
[2] Mahony S., Auron P.E., Benos P.V. (2007) DNA familial binding profiles made easy: Comparison of various motif alignment and clustering strategies. PLoS Comput Biol, 3(3): e61. doi:10.1371/journal.pcbi.0030061