Overview
🧬 HCBench Parsers
Welcome to the HCBench Parser Suite — a unified framework for standardizing single-cell copy number alteration (CNA) outputs from multiple tools.
Each parser converts heterogeneous output formats from different algorithms (CHISEL, Alleloscope, CNRein, SEACON, and SIGNALS) into a single consistent format suitable for downstream benchmarking and visualization.
🎯 Common Standardized Outputs
While each tool requires different input files and formats, every parser in this suite is designed to output a canonical directory structure. Depending on the specific methods called, a fully processed output directory will typically contain:
- CNA Matrices: haplotype_combined.csv, or split minor.csv, major.csv, and minor_major.csv matrices (regions × cells).
- Cluster Mapping: clusters.csv containing standardized cell_id and clone_id columns.
- Bin Counts / RDR: bin_counts.csv or bin_rdr.csv formatted as a wide matrix of regions by cells.
- Sparse VAF Matrices: A VAF/ directory containing standard Matrix Market (.mtx) files for Allelic Depth (AD) and Read Depth (DP).
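
As a quick sanity check, the files listed above can be inventoried for a given output directory. The sketch below is illustrative only and is not part of the HCBench API: the helper name `inventory_outputs` and the grouping keys are hypothetical, and because the names of the `.mtx` files inside `VAF/` are not specified here, the code simply lists whatever `.mtx` files it finds.

```python
from pathlib import Path

# File names come from the list above; the grouping keys are illustrative only.
CANONICAL_FILES = {
    "cna_matrix": ["haplotype_combined.csv", "minor.csv", "major.csv", "minor_major.csv"],
    "clusters": ["clusters.csv"],
    "bin_signal": ["bin_counts.csv", "bin_rdr.csv"],
}


def inventory_outputs(output_dir):
    """Return {group: [file names found]} for one parser output directory.

    Hypothetical helper: it only checks for file presence and does not
    validate the contents of any file.
    """
    root = Path(output_dir)
    found = {
        group: [name for name in names if (root / name).is_file()]
        for group, names in CANONICAL_FILES.items()
    }
    # Sparse VAF matrices are expected in a VAF/ subdirectory as .mtx files.
    vaf_dir = root / "VAF"
    if vaf_dir.is_dir():
        found["sparse_vaf"] = sorted(p.name for p in vaf_dir.glob("*.mtx"))
    else:
        found["sparse_vaf"] = []
    return found
```

An empty list for a group does not necessarily indicate a failure, since which files are written depends on the parser methods that were called.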
📖 Parsers Overview
| Parser | Main Input | Additional Inputs | Format Types | Supported Outputs |
|---|---|---|---|---|
| CHISEL | calls.tsv | mapping.tsv, VAF table | Tab-delimited text | CNA Matrix, Clusters, Bin Counts, Sparse VAF |
| Alleloscope | .rds files | clusters.csv, raw counts TSV, cellSNP directory | R serialized data, TSV, CSV | CNA Matrix, Clusters, Bin Counts, Sparse VAF |
| CNRein | CNReinPrediction.csv | .npz arrays, split VCF files | CSV, NPZ, VCF | CNA Matrix, Bin RDR, Sparse VAF |
| SEACON | calls.tsv | counts.tsv, vaf.tsv | Tab-delimited text | Split CNA Matrices, Bin Counts, Sparse VAF |
| SIGNALS | hscn.rds (exported to .tsv) | cluster file, bin counts, VAF table | Tab-delimited text / CSV | CNA Matrix, Clusters, Bin Counts, Sparse VAF |
All parsers generate outputs in the same canonical structure, making it possible to directly compare results across tools in the HCBench benchmarking pipeline.
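
To illustrate what a shared layout makes possible, here is a minimal sketch that compares two wide CNA matrices (regions × cells) produced for different tools. It is not the HCBench benchmarking code. It assumes that the first column holds region labels, that the header row holds cell IDs, and that both tools use identical region labels and cell IDs; entries are compared as raw strings, so no particular copy-number encoding is assumed. The function names `read_wide_matrix` and `agreement` are hypothetical.

```python
import csv


def read_wide_matrix(path):
    """Read a regions x cells CSV into {(region, cell_id): value}.

    Assumes the first column holds region labels and the header row holds
    cell IDs. Values are kept as strings so any state encoding works.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        cells = header[1:]
        return {
            (row[0], cell): value
            for row in reader if row
            for cell, value in zip(cells, row[1:])
        }


def agreement(path_a, path_b):
    """Fraction of shared (region, cell) entries with identical values.

    Returns None when the two matrices share no (region, cell) pairs.
    """
    a = read_wide_matrix(path_a)
    b = read_wide_matrix(path_b)
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(a[key] == b[key] for key in shared) / len(shared)
```

Exact string matching on region labels is a deliberate simplification: if two tools bin the genome differently, the regions would need to be harmonized before a comparison like this is meaningful.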