THiCweed paper: supporting material

The "real genomic data" figures in the paper come from these files. The input files (".fa", FASTA format) are generated from the ENCODE datasets (IDs as below) for those cell types, as follows: the peak positions in the narrowPeak files are taken with a flank of ±50 bp; the resulting regions are merged across replicates; and for any merged region longer than 100 bp, the central 100 bp is taken. The output files share the input filenames, with extensions ".txt" (output "architectures") and ".tr" (position weight matrices corresponding to the clusters found, in TRANSFAC format). In two of the cases, the initial output was reclustered to get fewer clusters.
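
The region-selection steps described above can be illustrated with a short Python sketch. This is not the script used for the paper; it is a minimal reconstruction under stated assumptions. It assumes that "peak position" means the summit, computed as the start coordinate plus the summit offset in column 10 of the narrowPeak format, with the interval midpoint as a fallback when no summit is given (offset -1); that "merged" means taking the union of overlapping or touching windows; and that windows from all replicates are pooled before merging. The function names summit_windows and merge_and_trim are hypothetical.

```python
from collections import defaultdict

FLANK = 50    # bp added on each side of the peak position
WIDTH = 100   # final width for merged regions longer than this


def summit_windows(narrowpeak_lines, flank=FLANK):
    """Yield (chrom, start, end) windows of peak position +/- flank."""
    for line in narrowpeak_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.split()
        chrom, start, end, offset = f[0], int(f[1]), int(f[2]), int(f[9])
        # Assumption: an offset of -1 means no summit was called,
        # so the midpoint of the peak interval is used instead.
        summit = start + offset if offset >= 0 else (start + end) // 2
        yield chrom, max(0, summit - flank), summit + flank


def merge_and_trim(windows, width=WIDTH):
    """Merge overlapping windows (pooled across replicates), then keep
    the central `width` bp of any merged region longer than `width`."""
    by_chrom = defaultdict(list)
    for chrom, s, e in windows:
        by_chrom[chrom].append((s, e))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for s, e in sorted(by_chrom[chrom]):
            if merged and s <= merged[-1][1]:
                # Overlapping or touching: extend the current region.
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        for s, e in merged:
            if e - s > width:
                mid = (s + e) // 2
                s, e = mid - width // 2, mid + width // 2
            regions.append((chrom, s, e))
    return regions
```

To use it, pass the lines of every replicate's narrowPeak file through summit_windows, concatenate the results, and give them to merge_and_trim. The returned coordinates are 0-based and half-open, as in BED files, and can then be used to extract sequences from the genome with any FASTA-aware tool.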

The sequence logos were generated using scripts linked on the main THiCweed page. The other plots were generated using custom scripts in R.


THiCweed page