CHISEL Parser

📂 Input Files

The output directory of CHISEL typically looks like this:

demo_output/chisel/
├── calls/
│   └── calls.tsv
└── clones/
    └── mapping.tsv

calls.tsv — the main file used to generate the CNA matrix.
mapping.tsv — an optional file mapping cells to their inferred clones.

An example of calls.tsv:

#CHR  START  END      CELL          NORM_COUNT  COUNT  RDR     A_COUNT  B_COUNT  BAF     CLUSTER  HAP_CN  CORRECTED_HAP_CN
chr1  0      1000000  AAACAGGTACAT  16269       1590   0.7594  76       67       0.4685  55       1|1     1|1
chr1  0      1000000  AAATTTGCCTTA  16269       3003   1.3324  74       195      0.7249  55       6|2     6|2
chr1  0      1000000  AACACATCCATC  16269       1587   0.9759  78       78       0.5     55       1|1     1|1

The required columns are:

#CHR, START, END, CELL, CORRECTED_HAP_CN

Please ensure these column names are spelled exactly as shown.
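
Because the column names must match exactly, it can help to check the header before running the parser. The following is a minimal sketch, not part of hcbench: the helper name missing_required_columns is hypothetical, and it assumes calls.tsv is tab-separated with the header on the first line.

```python
import csv
import io

# Column names copied from the list of required columns above.
REQUIRED_COLUMNS = ["#CHR", "START", "END", "CELL", "CORRECTED_HAP_CN"]


def missing_required_columns(handle):
    """Return the required column names that are absent from the header row.

    `handle` is any open text handle over a tab-separated file
    (assumption: the header is the first line of the file).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # empty list if the file has no lines
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# In-memory example with the CORRECTED_HAP_CN column misspelled.
sample = io.StringIO(
    "#CHR\tSTART\tEND\tCELL\tCORRECTED_HAPCN\n"
    "chr1\t0\t1000000\tAAACAGGTACAT\t1|1\n"
)
print(missing_required_columns(sample))  # prints ['CORRECTED_HAP_CN']
```

For a real file, the same function can be called on open("calls.tsv", newline="").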

To keep the coordinate convention consistent, the parser automatically adds 1 to each START value,
converting 0-based start coordinates to 1-based ones, in line with standard genomic conventions.
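
The START adjustment described above can be illustrated in a few lines. This is only a sketch of the arithmetic, not the parser's actual code: the function name to_one_based_start is hypothetical, and leaving END untouched is an assumption (the text above only mentions START).

```python
def to_one_based_start(start):
    """Convert a 0-based START value (string or int) to a 1-based integer."""
    return int(start) + 1


# First bin from the example calls.tsv: START 0, END 1000000.
row = {"#CHR": "chr1", "START": "0", "END": "1000000"}

# Copy the row and replace only START; END is kept as-is (assumption).
converted = dict(row, START=to_one_based_start(row["START"]))
print(converted["START"])  # prints 1
```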

📤 Output Files

After parsing, results are saved to the chisel_output/ directory, which contains the following six files:

haplotype_combined.csv
haplotype_1.csv
haplotype_2.csv
minor.csv
major.csv
minor_major.csv
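
The file names suggest how each output relates to a CORRECTED_HAP_CN value such as 6|2. The sketch below is a guess based on those names only and is not confirmed by the parser's source: it assumes the value before "|" is haplotype 1, the value after is haplotype 2, minor and major are the smaller and larger of the two, and the combined forms are written as "a|b". The function name split_haplotype_cn is hypothetical.

```python
def split_haplotype_cn(value):
    """Split an 'a|b' haplotype copy-number string into per-file values.

    The mapping from values to output files is an assumption inferred
    from the file names, not taken from the hcbench implementation.
    """
    first, second = (int(part) for part in value.split("|"))
    minor, major = min(first, second), max(first, second)
    return {
        "haplotype_combined": f"{first}|{second}",
        "haplotype_1": first,
        "haplotype_2": second,
        "minor": minor,
        "major": major,
        "minor_major": f"{minor}|{major}",
    }


# Value taken from the second row of the example calls.tsv.
print(split_haplotype_cn("6|2")["minor_major"])  # prints 2|6
```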

🚀 Example Usage

from hcbench.parsers.chisel import ChiselParser

chisel_input = "/mnt/cbc_adam/public/workspace/HCDSIM/demo_output/chisel/calls/calls.tsv"
chisel_output = "/home/jianganna/workspace/HCBench_project/HCBench/output/chisel/"

chisel_parser = ChiselParser(chisel_input, chisel_output)
chisel_parser.run()

After running, the parser will read calls.tsv and automatically generate the standardized output files listed above.
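
To confirm that a run produced all six files, a small check like the following can be used. It is a sketch independent of hcbench: the helper name missing_outputs is hypothetical, and the directory passed in the example call is a placeholder for whatever output path was given to the parser.

```python
from pathlib import Path

# File names copied from the Output Files list above.
EXPECTED_FILES = [
    "haplotype_combined.csv",
    "haplotype_1.csv",
    "haplotype_2.csv",
    "minor.csv",
    "major.csv",
    "minor_major.csv",
]


def missing_outputs(output_dir):
    """Return the expected output files that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (out / name).is_file()]


# Placeholder path; replace with the output directory used for the parser.
missing = missing_outputs("output/chisel/")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All six output files are present.")
```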