THiCweed paper: supporting material

The "real genomic data" figures in the paper were produced from these files. The input files (".fa", FASTA format) were generated from the ENCODE datasets (IDs as below) for those cell types. Peak positions were taken from the narrowPeak files and extended by a flank of ±50 bp; the resulting regions were merged across replicates; and for any merged region longer than 100 bp, the central 100 bp was taken. The output files use the input filename as a prefix: ".txt" files contain the output "architectures", and ".tr" files contain the position weight matrices corresponding to the clusters found, in TRANSFAC format. In two of the cases (the files marked "reclust"), the initial output was reclustered to get fewer clusters.
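The region-selection steps described above can be sketched in Python. This is a minimal illustration, not the script actually used for the paper. It assumes that "peak position" means the summit (start plus the offset in the tenth narrowPeak column), that coordinates are half-open as in BED, and that overlapping or touching regions are merged. The function names `read_summits` and `flank_merge_center` are hypothetical.

```python
from collections import defaultdict

FLANK = 50   # bp added on each side of the peak position
WIDTH = 100  # maximum region length kept after merging


def read_summits(lines):
    """Yield (chrom, summit) pairs from narrowPeak-formatted lines.

    Assumption: the peak position is start + column 10 (summit offset).
    If the offset is -1 (no summit called), the region midpoint is used.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        offset = int(fields[9])
        summit = start + offset if offset >= 0 else (start + end) // 2
        yield chrom, summit


def flank_merge_center(summits, flank=FLANK, width=WIDTH):
    """Flank each summit, merge overlapping regions, trim long ones.

    `summits` may pool peak positions from several replicates.
    Returns a sorted list of (chrom, start, end) half-open regions.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in summits:
        by_chrom[chrom].append((max(0, pos - flank), pos + flank))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps (or touches) the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        for start, end in merged:
            if end - start > width:
                # Keep only the central `width` bp of a long merged region.
                mid = (start + end) // 2
                start, end = mid - width // 2, mid + width // 2
            regions.append((chrom, start, end))
    return regions


if __name__ == "__main__":
    # Two nearby summits (e.g. from two replicates) and one isolated summit.
    pooled = [("chr1", 1000), ("chr1", 1030), ("chr1", 5000)]
    print(flank_merge_center(pooled))
    # [('chr1', 965, 1065), ('chr1', 4950, 5050)]
```

In this example the two nearby summits give a merged 130 bp region, which is trimmed to its central 100 bp; the isolated summit gives a region of exactly 100 bp, which is kept unchanged. Extracting the corresponding sequences into a FASTA file would be a separate step against the relevant genome assembly.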

MAFF (from ENCSR000EGI):
K562_MAFF_all_m_lcr_100.fa
K562_MAFF_all_m_lcr_100.fa_archs.txt
K562_MAFF_all_m_lcr_100.fa_wms.tr
ATF1 (from ENCSR000DNZ):
K562_ATF1_all_m_lcr_100.fa
K562_ATF1_all_m_lcr_100.fa_archs_reclust.txt
K562_ATF1_all_m_lcr_100.fa_archs_reclust_wms.tr
RAD21 (from ENCSR000BMY and ENCSR000EAC):
GM12878_RAD21_all_m_lcr_100.fa
GM12878_RAD21_all_m_lcr_100.fa_archs_reclust.txt
GM12878_RAD21_all_m_lcr_100.fa_reclust_wms.tr

The sequence logos were generated using scripts linked on the main THiCweed page. The other plots were generated using custom R scripts.

THiCweed page