PhyloGibbs
NEW: A significant enhancement, PhyloGibbs-MP, is now available. It is in late beta but almost final, and PhyloGibbs-1.0 (the version below, released with the original paper) is now deprecated. Please consider using PhyloGibbs-MP instead.
The original version of PhyloGibbs is maintained by Erik van Nimwegen's group in Basel and is available here. It, too, differs in some ways from PhyloGibbs-1.0. A web interface is available here.
PhyloGibbs is a motif finder that identifies binding sites for
transcription factors in cis-regulatory DNA sequences. It is based on
the Gibbs sampling algorithm, with the following enhancements:
- If sequences from closely related species are used, it
systematically accounts for non-functional conservation due to
phylogeny and modifies the scoring accordingly. In tests on
synthetic and real genomic data, we find that this
approach significantly increases specificity for known binding sites. Input
sequences need to be preprocessed by a multiple alignment program
and presented in "aligned fasta" or "multi-fasta" format; we have developed an
alignment program, Sigma, designed for
non-coding DNA, and also recommend Dialign,
but other programs may be used.
- It bypasses the problems of estimating the number of motifs in the
sequence, and of assessing significance of found motifs, by using a
two-stage motif-finding strategy: the first "simulated annealing"
phase finds a few high-quality groups of binding sites representing a
few different motifs, and the second "tracking" phase keeps statistics
on how much these groups hang together and what other sites get
co-clustered with them. The output is a list of putative binding
sites (not limited by the initial guess) and the fraction of time they
were co-clustered in that group (the most direct measure of their
significance).
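The co-clustering statistic kept by the tracking phase can be illustrated with a minimal Python sketch. This is not PhyloGibbs code: the function name `tracking_fractions`, the site labels, and the rule used to match a sampled cluster to the reference group (largest overlap) are all assumptions made for illustration; the phylogibbs_algorithm(7) manpage describes what the program actually does.

```python
from collections import Counter


def tracking_fractions(reference_group, samples):
    """Fraction of samples in which each site shared a cluster with the reference group.

    reference_group: iterable of site labels found in the annealing phase.
    samples: list of sampled configurations, each a list of sets of site labels.
    """
    reference = frozenset(reference_group)
    counts = Counter()
    for clusters in samples:
        # Assumption: the sampled cluster with the largest overlap with the
        # reference group stands in for that group in this sample.
        best = max(clusters, key=lambda c: len(reference & set(c)), default=())
        if reference & set(best):
            counts.update(set(best))
    n = len(samples)
    return {site: counts[site] / n for site in counts} if n else {}


# Hypothetical data: one reference group and four tracked samples.
reference_group = {"s1", "s2", "s3"}
samples = [
    [{"s1", "s2", "s3"}, {"s7"}],
    [{"s1", "s2", "s4"}, {"s3", "s7"}],
    [{"s1", "s2", "s3", "s4"}],
    [{"s5"}, {"s1", "s3"}],
]

fractions = tracking_fractions(reference_group, samples)
for site, frac in sorted(fractions.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{site}\t{frac:.2f}")
```

In this toy run, site s4 was not in the reference group but is still reported with a fraction of 0.50, which mirrors the point above that the output is not limited by the initial guess.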
References:
- Siddharthan R, van Nimwegen E, Siggia ED. (2004) PhyloGibbs: A Gibbs
sampler incorporating phylogenetic information, in Eskin E, Workman
C (eds), RECOMB 2004 Satellite Workshop on Regulatory Genomics,
LNBI 3318, pp. 30-41 (Springer-Verlag Berlin Heidelberg
2005). NOTE: This is a preliminary report and there have been
many changes in the program since then.
- Siddharthan R, Siggia ED, van Nimwegen E. (2005) PhyloGibbs: A Gibbs
sampling motif finder that incorporates phylogeny.
PLoS Comput Biol 1(7): e67
Read the phylogibbs(1) manpage for
basic usage of the program, and the phylogibbs_algorithm(7) manpage
for a description of the algorithm.
The code:
The final version (1.0) of PhyloGibbs has now been released.
A webserver where one can submit
PhyloGibbs jobs is also planned. The last feature-complete snapshot on
this page (November 15, 2005) had a small string-handling bug in
parsing the "-L" option, which apparently showed up only on very new Linux
systems. This is the only fix in 1.0, so if "-L" works for you in
the 20051115 version, you don't need to upgrade.
Further development continues and will be made available in later
versions.
- Source code tarball,
including instructions on compiling and usage (start with README), and
example output.
Requires the GSL and glib libraries and headers to be installed
(standard on most Linux systems). Should compile on most Unix-like
systems, and on Microsoft Windows in the
Cygwin environment. (185 KB)
- Linux binary (dynamic), requires
glibc 2.3, glib 2.x, gsl 1.x. (90 KB)
- Linux binary (static),
should run on most Linux systems (as well as BSD systems with Linux
emulation). (1.1 MB)
- Windows zip archive,
contains the phylogibbs.exe binary (compiled under Cygwin on Windows XP)
and the required DLLs from Cygwin. Also contains the manual pages
in HTML format, and the Examples subdirectory.
(Updated Jan 4 2006, compiled from
source file of Nov 15, but apparently not affected by the "-L" handling
bug above).
(Changes from the earlier snapshot of July 28: a change in the -L syntax
to conform with the standard Newick syntax; new options -M, -x, -y
(not described in the paper; see manpage).)
For Unix manual pages and examples, download the source distribution.
For more information, contact me or one of the other authors.
The authors of PhyloGibbs are:
- Rahul Siddharthan, The Institute of Mathematical Sciences
(much of this work was done at The Rockefeller University)
- Eric D. Siggia, The Rockefeller University
- Erik van Nimwegen, Biozentrum, University of Basel