CHISEL Parser
This module provides ChiselParser, specialized for parsing CHISEL outputs and exporting standardized matrices and helper files used by hcbench workflows.
Key features:
- Run the existing pipeline on CHISEL CNA tables
- Optional two-pass execution to export two different value columns into:
clone_level/,cell_level/ - Parse CHISEL cluster assignments into a standardized
clusters.csv - Export bin-level count matrices as
bin_counts.csv - Convert a cellSNP-like VAF long table into sparse matrix outputs
π Quick Start
𧬠1. Parse the CNA Matrix
from hcbench.parsers.chisel import ChiselParser
chisel_input = "/demo_output/chisel/calls/calls.tsv"
chisel_output = "/output/chisel/"
chisel_parser = ChiselParser(chisel_input, chisel_output,value_cols=["HAP_CN_CLONE", "HAP_CN_CELL"])
chisel_parser.run()
After running, the parser will read calls.tsv and results are saved to the output directory, typically containing the following files:
/output/chisel/
βββ clone-level/
β βββ haplotype_combined.csv
β βββ haplotype_1.csv
β βββ haplotype_2.csv
β βββ minor.csv
β βββ major.csv
β βββ minor_major.csv
βββ cell-level/
β βββ haplotype_combined.csv
β βββ haplotype_1.csv
β βββ haplotype_2.csv
β βββ minor.csv
β βββ major.csv
β βββ minor_major.csv
haplotype_combined.csvβ main CNA matrix (regions Γ cells). Each value represents the combined haplotype copy number in the form"hap1|hap2".
𧬠2. Parse the Cluster File
If you have a CHISEL cluster mapping file (commonly named mapping.tsv), you can parse it separately using the new get_cluster() method:
cluster_file = "/demo_output/chisel/clones/mapping.tsv"
chisel_parser.get_cluster(cluster_file)
An example of mapping.tsv:
#CELL CLUSTER CLONE
AAACCTGAGAAGGACA 3233 Clone3233
AAACCTGAGATCTGCT 1484 Clone1484
AAACCTGAGTAATCCC 1924 None
AAACCTGAGTGCTGCC 3233 Clone3233
After running this command, the following standardized file will be created:
/output/chisel/
βββclusters.csv
with content:
cell_id,clone_id
AAACCTGAGAAGGACA,3233
AAACCTGAGATCTGCT,1484
AAACCTGAGTAATCCC,1924
𧬠3. Parse the bin counts matrix
chisel_parser.get_bin_counts()
After running this command, the following standardized file will be created:
/output/chisel/
βββbin_counts.csv
𧬠4. Parse the VAF sparse matrices
chisel_parser.get_VAF_matrix(
vaf_file_path="/demo_output/chisel/baf/baf.tsv",
min_dp=3,
min_cells=10,
)
After running this command, the following standardized file will be created:
/output/chisel/
βββ VAF/
β βββcellSNP_*.mtx
βοΈ Initialization
ChiselParser(
input_path: str,
output_path: str,
barcode_path: str | None = None,
value_cols: list[str] | tuple[str, str] | None = None,
**kwargs
)
input_path: Path to the CHISEL CNA output table (long format), the output directory of CHISEL typically looks like this:
demo_output/chisel/
βββ calls/
β βββ calls.tsv
An example of calls.tsv:
#CHR START END CELL NORM_COUNT COUNT RDR A_COUNT B_COUNT BAF CLUSTER HAP_CN CORRECTED_HAP_CN
chr1 0 1000000 AAACAGGTACAT 16269 1590 0.7594 76 67 0.4685 55 1|1 1|1
chr1 0 1000000 AAATTTGCCTTA 16269 3003 1.3324 74 195 0.7249 55 6|2 6|2
chr1 0 1000000 AACACATCCATC 16269 1587 0.9759 78 78 0.5 55 1|1 1|1
The required columns are:
#CHR, START, END, CELL, CORRECTED_HAP_CN
β Please ensure these column names are spelled exactly as shown.
output_path: base output directory.
If value_cols contains two columns, ChiselParser will automatically write to:
{output_path}/clone_level{output_path}/cell_level
barcode_path (optional): Path to a barcode mapping file used to remap cell.
Mapping is applied in:
- CNA table
- cluster table
- counts table
value_cols (optional): Enables two-pass execution. Must be a list/tuple of length 2, e.g.:
value_cols=['CORRECTED_HAP_CN','HAP_CN']
value_cols[0]β written underclone_level/value_cols[1]β written undercell_level/
If not provided, the parser runs once using the default value_col = "HAP_CN".
π§ Core Methods
ChiselParser.run()
- Legacy mode (default): if
value_colsis not provided (or only one column is configured), it runs once: - Uses
self.value_col(default:HAP_CN) - Writes into
output_path - Two-pass mode: if
value_colshas length 2, it runs twice: self.value_col = value_cols[0]βoutput_path/clone_levelself.value_col = value_cols[1]βoutput_path/cell_level
ChiselParser.get_cluster(cluster_file_path)
Parses a CHISEL cluster mapping file and writes a standardized CSV.
Input
cluster_file_path: TSV file containing at least:#CELL,CLUSTER
Output writes to:
{self.output_path}/clusters.csv, for example:
| column | meaning |
|---|---|
| cell_id | cell identifier (possibly remapped) |
| clone_id | CHISEL cluster/clone label |
ChiselParser.get_bin_counts()
Creates a region-by-cell wide matrix of per-bin counts.
Output writes to:
{self.output_path}/bin_counts.csv
This is a wide matrix:
- rows:
region(e.g.,chr1:1000-2000) - columns: cells
- values: counts
ChiselParser.get_VAF_matrix(vaf_file_path, output_path=None, min_dp=1, min_cells=1, prefix="cellSNP")
Converts a cellSNP-like VAF long table into sparse matrix outputs using long_to_mtx().
Input format
vaf_file_path must be a tab-separated file with no header and exactly 5 columns:
| column index | meaning |
|---|---|
| 0 | chromosome |
| 1 | genomic position |
| 2 | cell identifier |
| 3 | allele A count |
| 4 | allele B count |
Parameters
output_path (optional)
- If provided: outputs under
{output_path}/VAF, else: outputs under{self.output_path}/VAF
min_dp
- filter low depth (typically
Acount + Bcount)
min_cells
- filter sites supported by too few cells
prefix
- output file prefix (default:
cellSNP)
Output
Creates a VAF/ directory containing files:
.../VAF/
βββcellSNP_*.mtx