THiCweed paper: supporting material
The "real genomic data" figures in the paper were produced from
these files. The input files (".fa", in FASTA format) were generated from the
ENCODE datasets (IDs as below) for those cell types. Peak positions were taken
from the narrowPeak files and extended by a flank of ±50 bp; the resulting
windows were merged across replicates; and for any merged region longer than
100 bp, only the central 100 bp was kept.
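The region-selection steps described above can be sketched in Python as follows. This is only an illustrative reconstruction, not the script actually used for the paper. It assumes that "peak position" means the summit (chromStart plus the offset in column 10 of the ENCODE narrowPeak format, falling back to the interval midpoint when the offset is -1), and that windows which overlap or touch are merged. The function names are hypothetical.

```python
from collections import defaultdict

FLANK = 50   # bp added on each side of the peak position
WIDTH = 100  # final width kept for regions longer than this


def summit_windows(narrowpeak_lines, flank=FLANK):
    """Yield (chrom, start, end) windows of +-flank around each peak summit.

    Assumption: the summit is chromStart + column 10 of the narrowPeak line;
    if that offset is -1 (not called), the interval midpoint is used instead.
    """
    for line in narrowpeak_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        offset = int(fields[9]) if len(fields) > 9 else -1
        summit = start + offset if offset >= 0 else (start + end) // 2
        yield chrom, max(0, summit - flank), summit + flank


def merge_windows(windows):
    """Merge overlapping or touching windows, pooled over all replicates."""
    by_chrom = defaultdict(list)
    for chrom, start, end in windows:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # overlaps or touches current region
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def central_regions(regions, width=WIDTH):
    """Trim every region longer than `width` to its central `width` bp."""
    trimmed = []
    for chrom, start, end in regions:
        if end - start > width:
            mid = (start + end) // 2
            start, end = mid - width // 2, mid + width // 2
        trimmed.append((chrom, start, end))
    return trimmed


def select_regions(replicates):
    """Run all three steps on an iterable of per-replicate line iterables."""
    windows = [w for rep in replicates for w in summit_windows(rep)]
    return central_regions(merge_windows(windows))
```

The resulting coordinates are 0-based and half-open, as in BED files, so they could be passed to any tool that extracts FASTA sequence from a reference genome.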
The output files share the input filenames, with the extensions ".txt" (output
"architectures") and ".tr" (position weight matrices corresponding to the
clusters found, in Transfac format). In two of the cases, the initial output
was reclustered to obtain fewer clusters.
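For readers who want to load the ".tr" matrices programmatically, a minimal parser might look like the sketch below. It assumes the common Transfac matrix layout: an "ID" line naming the matrix, a "P0" (or "PO") header line listing the base order, numbered rows of counts, and "//" ending each record. The exact layout of the files here has not been checked against this sketch, and the function name is hypothetical.

```python
def parse_transfac(text):
    """Parse Transfac-style matrices into {name: [ {base: count}, ... ]}.

    Assumed layout (not verified against the THiCweed output):
      ID <name>
      P0  A  C  G  T
      01  <count> <count> <count> <count> [consensus]
      ...
      //
    """
    matrices = {}
    name, columns, rows = None, None, []

    def flush():
        # Store the current record, naming it by position if no ID was seen.
        if rows:
            key = name if name is not None else f"matrix_{len(matrices) + 1}"
            matrices[key] = list(rows)

    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "ID" and len(fields) > 1:
            name = fields[1]
        elif tag in ("P0", "PO"):
            columns = fields[1:5]          # base order, usually A C G T
        elif tag.isdigit() and columns is not None:
            counts = [float(x) for x in fields[1:5]]
            rows.append(dict(zip(columns, counts)))
        elif tag == "//":
            flush()
            name, columns, rows = None, None, []
    flush()                                # last record may lack a final "//"
    return matrices
```

Each matrix is returned as a list with one dictionary per motif position, which is easy to normalise into frequencies or pass on to a plotting routine.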
The sequence logos were generated using scripts linked on the main THiCweed page. The other plots were generated using custom scripts in R.
THiCweed page