Cnclass
Function
cnclass(
    self,
    tool_hap1_cna_files: List[str],
    tool_hap2_cna_files: List[str],
    tool_names: List[str],
    profile_hap1_cna_file: str,
    profile_hap2_cna_file: str,
    type: str = "hcCNA",
    profile_bin_size=100000,
    outfile: str = "cnclass_results.csv",
) -> pd.DataFrame
This function performs CNA class–specific evaluation by stratifying genomic bins into predefined CNA categories (e.g., gain/neutral/loss, exact CN states) and then computing clone-level metrics for each category, haplotype, and tool.
It supports both:
- hcCNA: haplotype-specific copy numbers (hap1, hap2)
- acCNA: allele-specific copy numbers (minor, major)
For each tool, the function:
- Loads GT haplotype CNA profiles (hap1/hap2 or minor/major).
- Loads tool predictions for hap1 and hap2.
- Re-bins regions to a uniform resolution (profile_bin_size) and aligns GT vs predictions.
- For each CNA category (condition), saves category-specific bin tables into an output folder.
- Computes clone-level metrics within each category and aggregates results into one summary table.
Parameters
| Name | Type | Description |
|---|---|---|
| tool_hap1_cna_files | List[str] | List of tool prediction CSVs for haplotype 1 (or minor allele if type="acCNA"). Must align with tool_names. |
| tool_hap2_cna_files | List[str] | List of tool prediction CSVs for haplotype 2 (or major allele if type="acCNA"). Must align with tool_names. |
| tool_names | List[str] | Tool names used to create output directories and label result rows. |
| profile_hap1_cna_file | str | Path to GT CNA profile CSV for haplotype 1 (or minor). |
| profile_hap2_cna_file | str | Path to GT CNA profile CSV for haplotype 2 (or major). |
| type | str | CNA type. Must be "hcCNA" or "acCNA". Default: "hcCNA". |
| profile_bin_size | int | Bin size used to split regions for tool predictions before alignment. Default: 100000 (100 kb). |
| outfile | str | Output filename for the final aggregated result table. Default: "cnclass_results.csv". |
Input File Format
GT files: profile_hap1_cna_file, profile_hap2_cna_file
Expected to be CNA matrices with:
- one column named region
- remaining columns as cell IDs
- values as integer CNA states or strings convertible to numeric (depends on your pipeline)
Loaded via read_and_drop_empty(...) and then indexed by "region".
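As a rough illustration of this layout, the sketch below builds a tiny matrix in memory and indexes it by "region". The region strings and cell IDs are invented, the "chr:start-end" region format is an assumption, and pd.read_csv only stands in for read_and_drop_empty(...), whose exact behaviour is not described here.

```python
import io

import pandas as pd

# Hypothetical GT matrix: one "region" column plus one column per cell.
# The "chr:start-end" region format is an assumption for illustration only.
csv_text = """region,cell_1,cell_2,cell_3
chr1:0-100000,1,1,2
chr1:100000-200000,1,0,2
chr1:200000-300000,1,1,3
"""

# pd.read_csv stands in for read_and_drop_empty(...); index by "region".
gt = pd.read_csv(io.StringIO(csv_text)).set_index("region")

print(gt.shape)
print(gt.loc["chr1:100000-200000"])
```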
Tool prediction files: tool_hap1_cna_files, tool_hap2_cna_files
Expected to be CSVs with:
- region column
- cell columns
Implementation details:
- tool predictions are loaded with pd.read_csv(...).fillna(-1)
- regions are re-binned using: split_all_regions(df.set_index("region"), profile_bin_size)
- GT and predictions are aligned by:
  gt_h1, p_h1 = align(gt_h1_r, p_h1)
  gt_h2, p_h2 = align(gt_h2_r, p_h2)
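The re-binning and alignment steps can be pictured with the following minimal sketch. The helpers rebin and align_frames are hypothetical stand-ins written for this illustration, not the library's split_all_regions or align; the sketch also assumes regions are written as "chr:start-end" and that alignment means keeping the regions and cells shared by both tables.

```python
import pandas as pd


def rebin(df: pd.DataFrame, bin_size: int) -> pd.DataFrame:
    """Split each 'chr:start-end' row into bins of at most bin_size bases.

    Hypothetical stand-in for split_all_regions: every bin inherits the
    values of the region it was cut from.
    """
    rows, index = [], []
    for region, values in df.iterrows():
        chrom, span = region.split(":")
        start, end = (int(x) for x in span.split("-"))
        for s in range(start, end, bin_size):
            e = min(s + bin_size, end)
            index.append(f"{chrom}:{s}-{e}")
            rows.append(values.to_list())
    return pd.DataFrame(rows, index=pd.Index(index, name="region"), columns=df.columns)


def align_frames(gt: pd.DataFrame, pred: pd.DataFrame):
    """Keep only the regions and cells present in both tables (assumed behaviour)."""
    regions = gt.index.intersection(pred.index)
    cells = gt.columns.intersection(pred.columns)
    return gt.loc[regions, cells], pred.loc[regions, cells]


# Toy prediction with one 200 kb region and a missing value filled with -1.
pred = pd.DataFrame(
    {"region": ["chr1:0-200000"], "cell_1": [2], "cell_2": [float("nan")]}
).fillna(-1).set_index("region")

# Toy GT already at 100 kb resolution.
gt = pd.DataFrame(
    {"cell_1": [2, 1], "cell_2": [1, 1]},
    index=pd.Index(["chr1:0-100000", "chr1:100000-200000"], name="region"),
)

pred_binned = rebin(pred, 100000)
gt_aligned, pred_aligned = align_frames(gt, pred_binned)
print(pred_aligned)
```

After this step both tables share the same row and column labels, so element-wise comparison is well defined; the -1 placeholders mark cells for which the tool reported no value.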
CNA Categories (Conditions)
The function evaluates each tool under multiple CNA “classes”, defined by conditions:
- CN_Gain: >=2
- CN_Neutral: =1
- CN_Loss: =0
- exact CN states: =2, =3, ..., =10 (folders CN_equal_2 … CN_equal_10)
Each class corresponds to a subfolder name (used in output directory structure) and a condition string (used by downstream categorization).
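One way to picture the folder-to-condition mapping is the sketch below. The CONDITIONS dictionary mirrors the classes listed above; the condition_mask helper is hypothetical, and applying the condition to the GT copy numbers (rather than to the predictions) is an assumption made for this illustration.

```python
import operator

import pandas as pd

# Folder name -> condition string, following the classes listed above.
CONDITIONS = {"CN_Gain": ">=2", "CN_Neutral": "=1", "CN_Loss": "=0"}
CONDITIONS.update({f"CN_equal_{k}": f"={k}" for k in range(2, 11)})

_OPS = {">=": operator.ge, "=": operator.eq}


def condition_mask(gt: pd.DataFrame, condition: str) -> pd.DataFrame:
    """Boolean mask of GT entries satisfying a condition such as '>=2' or '=3'.

    Hypothetical helper; the real categorization code may differ.
    """
    for symbol in (">=", "="):  # test the longer operator first
        if condition.startswith(symbol):
            return _OPS[symbol](gt, int(condition[len(symbol):]))
    raise ValueError(f"Unsupported condition: {condition!r}")


# Toy GT matrix for one haplotype: two bins, two cells.
gt = pd.DataFrame({"cell_1": [0, 2], "cell_2": [1, 3]}, index=["bin_a", "bin_b"])

gain_mask = condition_mask(gt, CONDITIONS["CN_Gain"])
loss_mask = condition_mask(gt, CONDITIONS["CN_Loss"])
print(gain_mask)
```

Note that the classes overlap by construction: an entry with copy number 3 falls into both CN_Gain and CN_equal_3.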
Haplotype Labels
Haplotype naming depends on type:
- if type="hcCNA" → hap_list = ["hap1", "hap2"]
- if type="acCNA" → hap_list = ["minor", "major"]
These names are passed into downstream functions and appear in the final output table.
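The label selection amounts to a small lookup, sketched below. The function name resolve_hap_list is hypothetical, and raising ValueError for an unknown type is an assumption; the documentation above only states that type must be "hcCNA" or "acCNA".

```python
from typing import List


def resolve_hap_list(cna_type: str) -> List[str]:
    """Return the haplotype labels for a CNA type (hypothetical helper)."""
    labels = {
        "hcCNA": ["hap1", "hap2"],
        "acCNA": ["minor", "major"],
    }
    if cna_type not in labels:
        # Assumed behaviour: reject anything other than the two documented types.
        raise ValueError(f'type must be "hcCNA" or "acCNA", got {cna_type!r}')
    return labels[cna_type]


print(resolve_hap_list("acCNA"))
```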
Output
Directory structure (per tool × CNA class)
For each tool and each CNA class, the function creates:
{self.output_dir}/{tool}/{folder}/
Examples:
- .../CHISEL/CN_Gain/
- .../CHISEL/CN_equal_3/
Inside each folder, the function writes class-specific intermediate files generated by:
- categorize_and_save(...) (for hap1/minor and hap2/major)
- process_folder_for_metrics_clone(...) (reads folder contents and computes metrics)
Final aggregated CSV (outfile)
Saved to:
os.path.join(self.output_dir, outfile)
Default:
{self.output_dir}/cnclass_results.csv
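The resulting layout can be reproduced with the standard library alone, as in this sketch. It only creates empty folders in a temporary directory to show the tree; the tool names are taken from the example on this page, and the intermediate files written by the real function are not reproduced.

```python
import os
import tempfile

# Temporary directory standing in for self.output_dir.
output_dir = tempfile.mkdtemp()

tools = ["CHISEL", "SIGNALS"]
folders = ["CN_Gain", "CN_Neutral", "CN_Loss"] + [f"CN_equal_{k}" for k in range(2, 11)]

# One subfolder per tool and CNA class: {output_dir}/{tool}/{folder}/
for tool in tools:
    for folder in folders:
        os.makedirs(os.path.join(output_dir, tool, folder), exist_ok=True)

# Location of the final aggregated table with the default outfile name.
result_path = os.path.join(output_dir, "cnclass_results.csv")
print(result_path)
```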
Output Table Schema
The returned dataframe (and saved CSV) contains one row per:
(CNA class × haplotype × clone × tool)
Columns include:
| Column | Meaning |
|---|---|
| Type | CNA class folder name (e.g., CN_Gain, CN_equal_4) |
| Haplotype | hap1/hap2 or minor/major depending on type |
| Clone | Clone identifier returned by process_folder_for_metrics_clone |
| Tool | Tool name |
| (metrics...) | Additional metric fields returned in metrics dict (depends on your implementation of process_folder_for_metrics_clone) |
Return Value
Returns a pd.DataFrame with the aggregated per-class, per-haplotype, per-clone metrics for all tools.
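Because the result is a plain DataFrame, it can be sliced and reshaped with ordinary pandas operations. The sketch below uses a small hand-made table with the documented Type, Haplotype, Clone and Tool columns; the metric_value column and all of its numbers are invented placeholders, since the actual metric names depend on process_folder_for_metrics_clone.

```python
import pandas as pd

# Hand-made stand-in for the returned table; metric_value is a placeholder name
# and the numbers are illustrative only, not real results.
df = pd.DataFrame(
    {
        "Type": ["CN_Gain", "CN_Gain", "CN_Loss", "CN_Loss"],
        "Haplotype": ["hap1", "hap1", "hap1", "hap1"],
        "Clone": ["clone_A"] * 4,
        "Tool": ["CHISEL", "SIGNALS", "CHISEL", "SIGNALS"],
        "metric_value": [0.9, 0.8, 0.7, 0.6],
    }
)

# Rows for a single CNA class.
gain = df[df["Type"] == "CN_Gain"]

# Side-by-side comparison of tools: one column per tool.
wide = df.pivot_table(
    index=["Type", "Haplotype", "Clone"], columns="Tool", values="metric_value"
)
print(wide)
```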
Example
from hcbench.gtbench.gtbench import GTBench

bench = GTBench(output_dir="out/gt_output")
df = bench.cnclass(
    tool_hap1_cna_files=[
        "/path/to/chisel/hap1.csv",
        "/path/to/signals/hap1.csv",
    ],
    tool_hap2_cna_files=[
        "/path/to/chisel/hap2.csv",
        "/path/to/signals/hap2.csv",
    ],
    tool_names=["CHISEL", "SIGNALS"],
    profile_hap1_cna_file="/path/to/gt/hap1.csv",
    profile_hap2_cna_file="/path/to/gt/hap2.csv",
    type="hcCNA",
    profile_bin_size=100000,
    outfile="cnclass_results.csv",
)
print(df.head())