Index
- Summary
- Input
- Options
- Output
- Notes
- Simple Nucleotide Sequences
- IUPAC Nucleotide Code
- Regular Expressions
- References
1. Summary
This utility searches for the motifs of one list within the motifs of another list.
It allows new DNA motifs (for instance, those over-represented in the promoter regions of co-regulated genes) to be compared with
DNA motifs that are not described in the YEASTRACT database (for instance, DNA motifs found to be conserved in the promoter regions of
closely related yeast species but not associated with a specific TF).
2. Input
The required input consists of two lists of DNA motifs. These motifs can be simple nucleotide sequences,
or they can contain IUPAC nucleotide codes or even regular expression elements.
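The three kinds of motif can be told apart by the characters they use. The sketch below is a hypothetical helper, `classify_motif`, that is not part of YEASTRACT; it assumes that the only regular expression element is the bracketed character class described in the Notes, and that lowercase input is acceptable.

```python
import re

NUCLEOTIDES = "ACGT"
IUPAC_LETTERS = "ACGTRYMKSWHBVDN"  # the single-letter codes from the IUPAC table


def classify_motif(motif: str) -> str:
    """Return 'simple', 'iupac' or 'regex' (hypothetical helper, not a YEASTRACT API)."""
    m = motif.upper()
    if re.fullmatch(f"[{NUCLEOTIDES}]+", m):
        return "simple"
    if re.fullmatch(f"[{IUPAC_LETTERS}]+", m):
        return "iupac"
    # Assumption: the only regular expression element is a [...] character class.
    if re.fullmatch(f"(?:[{IUPAC_LETTERS}]|\\[[{IUPAC_LETTERS}]+\\])+", m):
        return "regex"
    raise ValueError(f"unsupported motif: {motif!r}")


print(classify_motif("TGACTC"), classify_motif("TGASTCA"), classify_motif("TGA[CG]TCA"))
# prints: simple iupac regex
```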
3. Options
The search can determine whether the motifs in list 1 are contained in
the motifs in list 2, or vice versa. The user can allow up to two
nucleotide substitutions.
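One way to read "contained in, with up to two substitutions" is as a Hamming-distance search: slide the shorter motif along the longer one and count mismatched positions. The sketch below assumes simple A/C/G/T motifs and counts only substitutions (no insertions or deletions); the function name `occurrences_with_substitutions` is hypothetical, and this is not necessarily how the tool itself is implemented.

```python
from typing import List


def occurrences_with_substitutions(query: str, target: str, max_subs: int = 0) -> List[int]:
    """0-based start positions where `query` fits inside `target` with at most
    `max_subs` substituted nucleotides (a Hamming-distance search)."""
    if not 0 <= max_subs <= 2:
        raise ValueError("at most two substitutions are allowed")
    hits = []
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        # Count positions where the motif and the window disagree.
        mismatches = sum(1 for a, b in zip(query, window) if a != b)
        if mismatches <= max_subs:
            hits.append(start)
    return hits


print(occurrences_with_substitutions("GAGT", "TGACTCA"))     # prints []
print(occurrences_with_substitutions("GAGT", "TGACTCA", 1))  # prints [1]
```

With one substitution allowed, GAGT is found inside TGACTCA because GACT differs from it at a single position.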
Furthermore, the user may restrict the search to regulatory associations identified on the basis of direct or indirect evidence.
Direct evidence is considered to be provided by experiments that prove the direct binding of the TF to the target gene's promoter region, such as Chromatin Immunoprecipitation (ChIP), ChIP-on-chip and Electrophoretic Mobility Shift Assay (EMSA), or by analysing how site-directed mutation of the TF binding site in the target promoter affects target-gene expression, which strongly suggests that the TF interacts with that specific promoter.
Indirect evidence is attributed to experiments such as the comparative analysis of gene
expression changes occurring in response to the deletion, mutation or over-expression of a given TF.
The complete list of experimental approaches considered to provide direct or indirect evidence is given in the Evidence code list table.
4. Output
The output is the set of matching motif pairs, each consisting of one
motif from each list, together with the position at which the match
was found and the strand (forward or reverse).
5. Notes
Simple Nucleotide Sequences
Simple nucleotide sequences are strings that consist exclusively of the
four characters that represent the DNA nucleotides: A, T, G and C.
A search for a simple nucleotide sequence returns only sequences that
match the query string exactly.
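Exact matching of a simple sequence amounts to a plain substring search. In the minimal sketch below (the name `find_exact` is hypothetical), a zero-width lookahead lets `re.finditer` report overlapping occurrences as well, for example AA occurring at three positions in AAAA.

```python
import re
from typing import List


def find_exact(query: str, sequence: str) -> List[int]:
    """0-based start positions where a simple nucleotide sequence occurs exactly."""
    if not re.fullmatch(r"[ACGT]+", query):
        raise ValueError("simple sequences may only contain A, C, G and T")
    # The zero-width lookahead (?=...) lets overlapping occurrences be found.
    return [m.start() for m in re.finditer(f"(?={query})", sequence)]


print(find_exact("TGACTCA", "ATGACTCAT"))  # prints [1]
print(find_exact("TGACTCA", "ATGAGTCAT"))  # prints []: one mismatch is not a match
print(find_exact("AA", "AAAA"))            # prints [0, 1, 2]
```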
IUPAC Nucleotide Code
The standard IUPAC nucleotide code [1] is used to describe ambiguous
sites in a DNA sequence motif, where a single character may represent
more than one nucleotide. The code is shown in the table below.
IUPAC Code | Meaning | Origin of Description
-----------|---------|----------------------
G | G | Guanine
A | A | Adenine
T | T | Thymine
C | C | Cytosine
R | G or A | puRine
Y | T or C | pYrimidine
M | A or C | aMino
K | G or T | Ketone
S | G or C | Strong interaction
W | A or T | Weak interaction
H | A or C or T | not-G, H follows G in the alphabet
B | G or T or C | not-A, B follows A in the alphabet
V | G or C or A | not-T (not-U), V follows U in the alphabet
D | G or A or T | not-C, D follows C in the alphabet
N | G or A or T or C | aNy
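Because every IUPAC code stands for a set of nucleotides, an IUPAC motif can be expanded into a regular expression with one character class per ambiguous position. This is a minimal sketch with hypothetical function names (`iupac_to_regex`, `iupac_search`); only the code-to-nucleotide mapping comes from the table above, and case-insensitive input is an assumption.

```python
import re

# IUPAC code -> nucleotides it stands for (taken from the table above).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def iupac_to_regex(motif: str) -> str:
    """Expand each IUPAC letter into a character class, e.g. 'S' -> '[CG]'."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError for characters outside the code
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def iupac_search(motif: str, sequence: str) -> bool:
    """True if the IUPAC motif occurs anywhere in `sequence`."""
    return re.search(iupac_to_regex(motif), sequence) is not None


print(iupac_to_regex("TGASTCA"))             # prints TGA[CG]TCA
print(iupac_search("TGASTCA", "ATGAGTCAT"))  # prints True
print(iupac_search("TGASTCA", "ATGAATCAT"))  # prints False
```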
Regular Expressions
A regular expression is a pattern of characters and syntactic elements
that matches a set of strings. The characters permitted in regular
expressions for DNA motif searches are those of the IUPAC nucleotide
code, plus the following syntactic element:
[] : matches any one of the characters enclosed in the brackets (for example, [CG] matches C or G).
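A motif that mixes IUPAC letters with bracketed groups can be translated into a standard regular expression by expanding every IUPAC letter, including those inside brackets, into the nucleotides it represents. The sketch below is hypothetical (`motif_to_regex` is not a YEASTRACT function) and assumes that brackets are never nested and may themselves contain IUPAC codes; it is not a description of how the tool parses motifs.

```python
import re

# IUPAC code -> nucleotides it stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def motif_to_regex(motif: str) -> str:
    """Translate IUPAC letters and [...] groups into a standard regex."""
    motif = motif.upper()
    out = []
    i = 0
    while i < len(motif):
        if motif[i] == "[":
            end = motif.index("]", i)  # ValueError if the bracket is never closed
            group = motif[i + 1:end]
            if not group:
                raise ValueError("empty [] group")
            # Union of the nucleotides of every code inside the brackets.
            bases = sorted(set("".join(IUPAC[c] for c in group)))
            out.append("[" + "".join(bases) + "]")
            i = end + 1
        else:
            bases = IUPAC[motif[i]]  # KeyError for unsupported characters
            out.append(bases if len(bases) == 1 else "[" + bases + "]")
            i += 1
    return "".join(out)


pattern = motif_to_regex("TGA[CG]TCA")
print(pattern)                                      # prints TGA[CG]TCA
print(re.search(pattern, "ATGAGTCAT") is not None)  # prints True
```

Expanding codes inside brackets keeps the translated pattern within plain A/C/G/T character classes, so any standard regex engine can run it.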
6. References
[1] Biochem J. 1985 July 15; 229(2): 281–286.