schicluster.cool.scool
Module Contents
- generate_scool_batch_data(cell_path_dict, resolution, chrom_offset, chrom_size_path, blacklist_1d_path, blacklist_2d_path, remove_duplicates, blacklist_resolution, output_path, chr1=1, chr2=5, pos1=2, pos2=6, min_pos_dist=2500)
- generate_scool_single_resolution(cell_path_dict, chrom_size_path, resolution, output_path, blacklist_1d_path, blacklist_2d_path, remove_duplicates, blacklist_resolution, chr1=1, chr2=5, pos1=2, pos2=6, min_pos_dist=2500, batch_n=20, cpu=1)
- generate_scool(contacts_table, output_prefix, chrom_size_path, resolutions, blacklist_1d_path=None, blacklist_2d_path=None, blacklist_resolution=10000, remove_duplicates=True, chr1=1, chr2=5, pos1=2, pos2=6, min_pos_dist=2500, cpu=1, batch_n=50)
Generate single-resolution cool files from the single-cell contact files listed in contacts_table.
- Parameters:
contacts_table – Tab-separated table with two columns and no header: 1) cell ID, 2) path to the cell's contact file (juicer-pre format).
output_prefix – Output prefix of the cool files. Output path will be {output_prefix}.{resolution_str}.cool
chrom_size_path – Path to the chromosome size file, in UCSC chrom.sizes format. This file is used as the reference to create the matrix. It is recommended to remove small contigs and chrM from this file.
resolutions – Resolutions to generate the matrix. Each resolution will be stored in a separate file.
blacklist_1d_path – Path to blacklist region BED file, such as ENCODE blacklist. Either side of the contact overlapping with a blacklist region will be removed.
blacklist_2d_path – Path to blacklist region pair BEDPE file. Contacts whose two sides both overlap the same blacklist region pair will be removed.
blacklist_resolution – Resolution in bp used when considering the 2D blacklist region pairs.
remove_duplicates – If True, remove duplicated contacts based on their [chr1, pos1, chr2, pos2] values.
chr1 – 0-based index of the chr1 column.
chr2 – 0-based index of the chr2 column.
pos1 – 0-based index of the pos1 column.
pos2 – 0-based index of the pos2 column.
min_pos_dist – Minimum distance for a fragment to be considered.
cpu – Number of CPUs to use in parallel.
batch_n – Number of cells to handle in each CPU process.
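
The inputs described above can be prepared with a few lines of Python. The following is a minimal sketch, not part of schicluster: the helper names (write_contacts_table, filter_chrom_sizes, run_generate_scool), the file paths, and the rule that contig names contain "_" are all illustrative assumptions. Passing resolutions as a list of integers is also an assumption, since the documentation above does not state the expected type. The final call is wrapped in a function that is not invoked here, because it needs schicluster installed and real contact files.

```python
import csv
import tempfile
from pathlib import Path


def write_contacts_table(cell_to_path, table_path):
    """Write the header-less, tab-separated table: cell id, contact file path."""
    with open(table_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for cell_id, contact_path in cell_to_path.items():
            writer.writerow([cell_id, str(contact_path)])


def filter_chrom_sizes(in_path, out_path, drop=("chrM",)):
    """Copy a UCSC chrom.sizes file without chrM and without small contigs.

    Assumption: contigs are recognised by "_" in their name
    (for example chrUn_... or ..._random); adjust for your genome build.
    """
    kept = []
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue
            name, size = line.split()[:2]
            if name in drop or "_" in name:
                continue
            dst.write(f"{name}\t{size}\n")
            kept.append(name)
    return kept


def run_generate_scool(contacts_table, chrom_size_path, output_prefix):
    """Call generate_scool with the keyword arguments documented above.

    Not executed in this sketch: requires schicluster and existing contact files.
    """
    from schicluster.cool.scool import generate_scool

    generate_scool(
        contacts_table=str(contacts_table),
        output_prefix=str(output_prefix),
        chrom_size_path=str(chrom_size_path),
        resolutions=[10000, 100000],  # assumed to accept a list of ints
        blacklist_1d_path=None,
        blacklist_2d_path=None,
        remove_duplicates=True,
        chr1=1, pos1=2, chr2=5, pos2=6,  # documented default column indices
        min_pos_dist=2500,
        cpu=1,
        batch_n=50,
    )


# Build small example inputs in a temporary directory.
workdir = Path(tempfile.mkdtemp())

raw_sizes = workdir / "raw.chrom.sizes"
raw_sizes.write_text(
    "chr1\t248956422\nchr2\t242193529\nchrM\t16569\nchrUn_KI270302v1\t2274\n"
)
clean_sizes = workdir / "clean.chrom.sizes"
kept_chroms = filter_chrom_sizes(raw_sizes, clean_sizes)

contacts_table = workdir / "contacts_table.tsv"
write_contacts_table(
    {
        "cell_1": "/path/to/cell_1.contact.tsv.gz",  # hypothetical paths
        "cell_2": "/path/to/cell_2.contact.tsv.gz",
    },
    contacts_table,
)
table_text = contacts_table.read_text()
```

With real contact files in place, run_generate_scool(contacts_table, clean_sizes, workdir / "dataset") would write one cool file per resolution, named {output_prefix}.{resolution_str}.cool as described above.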