| Pattern | Matches |
|---|---|
| TATATAAG | TATATAAG |
| TATAWAAM | TATAAAAA, TATAAAAC, TATATAAA, TATATAAC |
| TATA[GC]AA[AT] | TATAGAAT, TATAGAAA, TATACAAT, TATACAAA |
The query requires a DNA motif as input, which must be at least four bases long.
The search allows for substitutions, whose number (zero, one or two) can be selected using the box labelled Substitutions.
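As a rough illustration of the Substitutions setting, the sketch below counts position-wise mismatches between two sequences of equal length. It assumes that a substitution means a single-base mismatch (Hamming distance) and that both strings are simple nucleotide sequences; the function name `within_substitutions` is hypothetical and is not taken from YEASTRACT.

```python
def within_substitutions(motif: str, site: str, max_subs: int) -> bool:
    """Return True if motif and site differ at no more than max_subs positions.

    Assumption: only equal-length sequences are compared, and a substitution
    is a mismatch at one position (no insertions or deletions).
    """
    if len(motif) != len(site):
        return False
    mismatches = sum(1 for a, b in zip(motif, site) if a != b)
    return mismatches <= max_subs


# With zero substitutions only identical sequences pass.
print(within_substitutions("TATATAAG", "TATATAAG", 0))
# One mismatch (T -> C at position 5) needs at least one substitution.
print(within_substitutions("TATATAAG", "TATACAAG", 1))
```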
The input DNA motifs (and their complementary motifs) are compared with the described motifs contained in the database. The output is a list of TF binding sites that contain, are contained in, or precisely match the input motifs (or their complements). This search lets the user check whether a newly identified DNA motif precisely matches, is contained in, or contains a previously described TF binding site.
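The three possible outcomes (precise match, contained in, contains) can be sketched for simple nucleotide sequences as follows. This is only an illustration under stated assumptions: "complementary motif" is taken to mean the reverse complement, degenerate positions are ignored, and the names `reverse_complement` and `relation` are hypothetical rather than part of YEASTRACT.

```python
from typing import Optional

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def relation(motif: str, site: str) -> Optional[str]:
    """Classify how motif (or its reverse complement) relates to site."""
    for query in (motif, reverse_complement(motif)):
        if query == site:
            return "precise match"
        if query in site:
            return "motif is contained in site"
        if site in query:
            return "motif contains site"
    return None


print(relation("TATA", "TATATAAG"))      # motif is shorter than the site
print(relation("CTTATATA", "TATATAAG"))  # matches on the opposite strand
```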
If a TF binding site contains a long stretch of fully degenerate nucleotides (N), then many input DNA motifs could match these nucleotides. To avoid returning a long list of irrelevant matches, when a TF binding site contains an N repeat region more than two nucleotides long, only the terminal Ns of that region are considered in the comparison.
For example, a search for the DNA motif ATGAT identifies the binding site of the transcription factor Abf1p, with the consensus sequence TNNCGTNNNNNNTGAT. The motif aligns with the last five nucleotides of the consensus, which is valid because this region contains only one fully degenerate position (N). On the other hand, a search for the DNA motif AATGAT does not return this TF binding site. Although the motif aligns with the site, the aligned subsequence of the TF binding site contains two fully degenerate positions (NN) taken from an N repeat region six nucleotides long.
Now, consider the search for the motif TAACGT. This motif aligns perfectly at the beginning of the Abf1p TF binding site, which contains an N repeat region of length 2. In this case, the alignment is allowed; hence the TF binding site is said to contain the motif TAACGT.
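The three Abf1p examples above can be reproduced with a small sketch. It encodes one reading of the rule, stated here as an assumption: N runs of one or two nucleotides are fully usable, while in longer runs only the first and last N may take part in an alignment. It checks the forward strand only, handles consensus sequences made of A, C, G, T and N, and uses the hypothetical names `usable_positions` and `site_contains`; the actual YEASTRACT procedure may differ.

```python
import re
from typing import Set


def usable_positions(site: str) -> Set[int]:
    """Indices of the site that may take part in an alignment (assumed rule)."""
    usable = {i for i, base in enumerate(site) if base != "N"}
    for run in re.finditer(r"N+", site):
        start, end = run.span()
        if end - start <= 2:
            # Short N runs are kept whole.
            usable.update(range(start, end))
        else:
            # Long N runs contribute only their two terminal positions.
            usable.update({start, end - 1})
    return usable


def site_contains(site: str, motif: str) -> bool:
    """True if motif aligns somewhere in site using only usable positions."""
    usable = usable_positions(site)
    for offset in range(len(site) - len(motif) + 1):
        window = range(offset, offset + len(motif))
        if all(i in usable and site[i] in ("N", motif[i - offset]) for i in window):
            return True
    return False


ABF1 = "TNNCGTNNNNNNTGAT"
for query in ("ATGAT", "AATGAT", "TAACGT"):
    print(query, site_contains(ABF1, query))
```

Under this reading, ATGAT and TAACGT are accepted and AATGAT is rejected, in agreement with the examples in the text.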
Simple nucleotide sequences are strings that consist exclusively of the four characters that represent the DNA nucleotides: A, T, G and C. A search for a given simple nucleotide sequence only returns sequences that match the query string exactly.
The standard IUPAC nucleotide code is used to describe ambiguous sites in a given DNA sequence motif, where a single character may represent more than one nucleotide. The code is shown in the table below.
| IUPAC Code | Meaning | Origin of Description |
|---|---|---|
| G | G | Guanine |
| A | A | Adenine |
| T | T | Thymine |
| C | C | Cytosine |
| R | G or A | puRine |
| Y | T or C | pYrimidine |
| M | A or C | aMino |
| K | G or T | Ketone |
| S | G or C | Strong interaction |
| W | A or T | Weak interaction |
| H | A or C or T | not-G, H follows G in the alphabet |
| B | G or T or C | not-A, B follows A in the alphabet |
| V | G or C or A | not-T (not-U), V follows U in the alphabet |
| D | G or A or T | not-C, D follows C in the alphabet |
| N | G or A or T or C | aNy |
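The table can be written down as a lookup and used to turn an IUPAC motif into an ordinary regular expression. The sketch below is illustrative only: the dictionary mirrors the table above, while the name `iupac_to_regex` and the use of Python's `re` module are choices made for this example, not a description of how YEASTRACT is implemented.

```python
import re

# Each IUPAC code mapped to the nucleotides it stands for (from the table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regex over A, C, G and T."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError for characters outside the code
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


pattern = iupac_to_regex("TATAWAAM")
print(pattern)
print(bool(re.fullmatch(pattern, "TATATAAC")))
```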
A regular expression is a pattern containing characters and syntactic elements that matches a set of strings. The regular expression characters permitted in the searches for DNA motifs are those included in the IUPAC nucleotide code as well as the following syntactic element:
[] – Matches one of the characters contained in the brackets.
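To see which concrete sequences a pattern stands for, it can be expanded position by position. The following sketch handles IUPAC codes and the `[]` element together; it assumes well-formed patterns (characters that are neither IUPAC letters nor a complete bracket group are silently skipped), and the function name `expand` is hypothetical.

```python
import re
from itertools import product
from typing import List

# Each IUPAC code mapped to the nucleotides it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand(pattern: str) -> List[str]:
    """List every simple nucleotide sequence matched by the pattern."""
    # Each token is either a bracket group such as [GC] or a single letter.
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z])", pattern.upper())
    choices = []
    for group, single in tokens:
        codes = group or single
        # Union of the nucleotides allowed at this position.
        choices.append(sorted({base for code in codes for base in IUPAC[code]}))
    return ["".join(combo) for combo in product(*choices)]


for query in ("TATATAAG", "TATAWAAM", "TATA[GC]AA[AT]"):
    print(query, expand(query))
```

Each of the two degenerate patterns expands to four sequences, since each has two positions with two allowed nucleotides.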
[1] Biochem J. 1985 July 15; 229(2): 281–286. (PubMed)