Overview

🧬 HCBench Parsers

Welcome to the HCBench Parser Suite, a unified framework for standardizing single-cell copy number alteration (CNA) outputs from multiple tools.

Each parser converts the tool-specific output of one algorithm (CHISEL, Alleloscope, CNRein, SEACON, or SIGNALS) into a single consistent format suitable for downstream benchmarking and visualization.

🎯 Common Standardized Outputs

While each tool requires different input files and formats, every parser in this suite is designed to output a canonical directory structure. Depending on the specific methods called, a fully processed output directory will typically contain:

CNA Matrices: either a single haplotype_combined.csv, or separate minor.csv, major.csv, and minor_major.csv matrices (regions × cells).

Cluster Mapping: clusters.csv containing standardized cell_id and clone_id columns.

Bin Counts / RDR: bin_counts.csv or bin_rdr.csv formatted as a wide matrix of regions by cells.

Sparse VAF Matrices: A VAF/ directory containing standard Matrix Market (.mtx) files for Allelic Depth (AD) and Read Depth (DP).
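As a quick check on the layout described above, the following minimal sketch reports which canonical outputs are present in a processed directory. The function name `inventory` and the grouping keys are hypothetical and not part of the suite; only the file names come from the list above. The names of the individual `.mtx` files are not specified here, so the sketch only counts them.

```python
from pathlib import Path

# File names are taken from the list above; the grouping keys are
# illustrative only and are not defined by the parser suite.
EXPECTED = {
    "cna_combined": ["haplotype_combined.csv"],
    "cna_split": ["minor.csv", "major.csv", "minor_major.csv"],
    "clusters": ["clusters.csv"],
    "bin_counts": ["bin_counts.csv"],
    "bin_rdr": ["bin_rdr.csv"],
}


def inventory(output_dir):
    """Report which canonical outputs exist in output_dir (hypothetical helper)."""
    root = Path(output_dir)
    # A group counts as present only if every file in it exists.
    found = {
        group: all((root / name).is_file() for name in names)
        for group, names in EXPECTED.items()
    }
    # The .mtx file names are not specified, so just count them.
    vaf_dir = root / "VAF"
    found["vaf_mtx_files"] = len(list(vaf_dir.glob("*.mtx"))) if vaf_dir.is_dir() else 0
    return found
```

Because not every parser produces every output, a missing group is expected in many cases and is not necessarily an error.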

📖 Parsers Overview

| Parser | Main Input | Additional Inputs | Format Types | Supported Outputs |
| --- | --- | --- | --- | --- |
| CHISEL | `calls.tsv` | `mapping.tsv`, VAF table | Tab-delimited text | CNA Matrix, Clusters, Bin Counts, Sparse VAF |
| Alleloscope | `.rds` files | `clusters.csv`, raw counts TSV, cellSNP directory | R serialized data, TSV, CSV | CNA Matrix, Clusters, Bin Counts, Sparse VAF |
| CNRein | `CNReinPrediction.csv` | `.npz` arrays, split VCF files | CSV, NPZ, VCF | CNA Matrix, Bin RDR, Sparse VAF |
| SEACON | `calls.tsv` | `counts.tsv`, `vaf.tsv` | Tab-delimited text | Split CNA Matrices, Bin Counts, Sparse VAF |
| SIGNALS | `hscn.rds` (exported to `.tsv`) | cluster file, bin counts, VAF table | Tab-delimited text / CSV | CNA Matrix, Clusters, Bin Counts, Sparse VAF |

All parsers generate outputs in the same canonical structure, making it possible to directly compare results across tools in the HCBench benchmarking pipeline.
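To illustrate what a shared structure makes possible, here is a minimal sketch that finds the regions and cells two standardized CNA matrices have in common, a typical first step before comparing calls between tools. It assumes each CSV has a header row of cell IDs and a first column holding the region label; that exact layout is an assumption, not something stated above. The helper names `read_wide_matrix` and `shared_axes` are hypothetical.

```python
import csv


def read_wide_matrix(path):
    """Read a regions x cells CSV into (cell_ids, {region: [values]}).

    Assumes the first column is the region label and the header row
    holds cell IDs. Values are kept as strings.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        cells = header[1:]
        # Skip blank lines; key each row by its region label.
        rows = {row[0]: row[1:] for row in reader if row}
    return cells, rows


def shared_axes(path_a, path_b):
    """Return the sorted regions and cells present in both matrices."""
    cells_a, rows_a = read_wide_matrix(path_a)
    cells_b, rows_b = read_wide_matrix(path_b)
    regions = sorted(set(rows_a) & set(rows_b))
    cells = sorted(set(cells_a) & set(cells_b))
    return regions, cells
```

In practice the two paths would point at the same canonical file, such as `haplotype_combined.csv`, in the output directories of two different parsers.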