Cnclass

Function

cnclass(
    self,
    tool_hap1_cna_files: List[str],
    tool_hap2_cna_files: List[str],
    tool_names: List[str],
    profile_hap1_cna_file: str,
    profile_hap2_cna_file: str,
    type: str = "hcCNA",
    profile_bin_size: int = 100000,
    outfile: str = "cnclass_results.csv",
) -> pd.DataFrame

This function performs CNA class-specific evaluation: it stratifies genomic bins into predefined CNA categories (e.g., gain/neutral/loss, exact CN states) and then computes clone-level metrics for each category, haplotype, and tool.

It supports both:

  • hcCNA: haplotype-specific copy numbers (hap1, hap2)
  • acCNA: allele-specific copy numbers (minor, major)

For each tool, the function:

  1. Loads ground-truth (GT) haplotype CNA profiles (hap1/hap2 or minor/major).
  2. Loads tool predictions for hap1 and hap2.
  3. Re-bins regions to a uniform resolution (profile_bin_size) and aligns GT vs predictions.
  4. For each CNA category (condition), saves category-specific bin tables into an output folder.
  5. Computes clone-level metrics within each category and aggregates results into one summary table.

Parameters

Name | Type | Description
tool_hap1_cna_files | List[str] | List of tool prediction CSVs for haplotype 1 (or minor allele if type="acCNA"). Must align with tool_names.
tool_hap2_cna_files | List[str] | List of tool prediction CSVs for haplotype 2 (or major allele if type="acCNA"). Must align with tool_names.
tool_names | List[str] | Tool names used to create output directories and label result rows.
profile_hap1_cna_file | str | Path to GT CNA profile CSV for haplotype 1 (or minor).
profile_hap2_cna_file | str | Path to GT CNA profile CSV for haplotype 2 (or major).
type | str | CNA type. Must be "hcCNA" or "acCNA". Default: "hcCNA".
profile_bin_size | int | Bin size used to split regions for tool predictions before alignment. Default: 100000 (100 kb).
outfile | str | Output filename for the final aggregated result table. Default: "cnclass_results.csv".
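
Because the two file lists must align with tool_names, a quick pre-flight check before calling cnclass can catch mismatched inputs early. The helper below is a minimal sketch; check_tool_inputs is a hypothetical name and is not part of the documented API.

```python
from typing import List, Tuple


def check_tool_inputs(
    hap1_files: List[str], hap2_files: List[str], names: List[str]
) -> List[Tuple[str, str, str]]:
    """Hypothetical pre-flight check (not part of the documented API).

    Verifies that both file lists have one entry per tool name and
    returns (tool, hap1_file, hap2_file) triples in order.
    """
    if not (len(hap1_files) == len(hap2_files) == len(names)):
        raise ValueError(
            "tool_hap1_cna_files, tool_hap2_cna_files and tool_names "
            "must have the same length"
        )
    return list(zip(names, hap1_files, hap2_files))


triples = check_tool_inputs(
    ["/path/to/chisel/hap1.csv", "/path/to/signals/hap1.csv"],
    ["/path/to/chisel/hap2.csv", "/path/to/signals/hap2.csv"],
    ["CHISEL", "SIGNALS"],
)
print(triples[0][0])  # CHISEL
```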

Input File Format

GT files: profile_hap1_cna_file, profile_hap2_cna_file

Expected to be CNA matrices with:

  • one column named region
  • remaining columns as cell IDs
  • values as integer CNA states or strings convertible to numeric (depends on your pipeline)

Loaded via read_and_drop_empty(...) and then indexed by "region".
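
As a minimal sketch of the layout described above, the snippet below builds a tiny GT-style matrix in memory and indexes it by "region". The region strings (chr1:0-100000 and so on), the cell IDs, and the values are invented for illustration, and plain pd.read_csv stands in for read_and_drop_empty(...), whose exact behavior is not documented here.

```python
import io

import pandas as pd

# Hypothetical GT matrix: one "region" column, remaining columns are cell IDs.
# The "chr:start-end" region format is an assumption for illustration only.
csv_text = """region,cell_1,cell_2,cell_3
chr1:0-100000,1,1,2
chr1:100000-200000,1,0,2
chr1:200000-300000,3,1,1
"""

# Plain read_csv stands in for read_and_drop_empty(...).
gt_h1 = pd.read_csv(io.StringIO(csv_text)).set_index("region")

# Coerce values to numeric in case CN states were stored as strings.
gt_h1 = gt_h1.apply(pd.to_numeric, errors="coerce")

print(gt_h1.shape)          # (3, 3): 3 regions x 3 cells
print(list(gt_h1.columns))  # ['cell_1', 'cell_2', 'cell_3']
```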

Tool prediction files: tool_hap1_cna_files, tool_hap2_cna_files

Expected to be CSVs with:

  • region column
  • cell columns

Implementation details:

  • tool predictions are loaded with pd.read_csv(...).fillna(-1)
  • regions are re-binned using:
split_all_regions(df.set_index("region"), profile_bin_size)
  • GT and predictions are aligned by:
gt_h1, p_h1 = align(gt_h1_r, p_h1)
gt_h2, p_h2 = align(gt_h2_r, p_h2)
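
The re-binning and alignment steps can be pictured with the sketch below. It does not reproduce the library's split_all_regions or align; split_regions_sketch and align_sketch are hypothetical stand-ins, written under the assumptions that regions are "chr:start-end" strings, that a coarse region's values are copied to every bin it covers, and that alignment keeps only the regions and cells present in both tables.

```python
import pandas as pd


def split_regions_sketch(df: pd.DataFrame, bin_size: int) -> pd.DataFrame:
    """Hypothetical stand-in for split_all_regions.

    Cuts each 'chr:start-end' row into fixed-size bins and copies the
    row's values to every resulting bin.
    """
    rows, index = [], []
    for region, values in df.iterrows():
        chrom, span = region.split(":")
        start, end = (int(x) for x in span.split("-"))
        for s in range(start, end, bin_size):
            index.append(f"{chrom}:{s}-{min(s + bin_size, end)}")
            rows.append(values.to_numpy())
    return pd.DataFrame(
        rows, index=pd.Index(index, name="region"), columns=df.columns
    )


def align_sketch(gt: pd.DataFrame, pred: pd.DataFrame):
    """Hypothetical stand-in for align: keep shared regions and cells."""
    regions = gt.index.intersection(pred.index)
    cells = gt.columns.intersection(pred.columns)
    return gt.loc[regions, cells], pred.loc[regions, cells]


# Made-up prediction table with one 200 kb region and one missing value.
pred = pd.DataFrame(
    {"region": ["chr1:0-200000"], "c1": [2], "c2": [float("nan")]}
).fillna(-1)
p_h1 = split_regions_sketch(pred.set_index("region"), 100000)

# Made-up GT table at 100 kb resolution with an extra region and cell.
gt_h1_r = pd.DataFrame(
    {"c1": [1, 1, 3], "c2": [1, 0, 1], "c3": [2, 2, 1]},
    index=pd.Index(
        ["chr1:0-100000", "chr1:100000-200000", "chr1:200000-300000"],
        name="region",
    ),
)

gt_h1, p_h1_aligned = align_sketch(gt_h1_r, p_h1)
print(gt_h1.shape, p_h1_aligned.shape)
```

In this sketch the filled-in value -1 survives re-binning, so any downstream metric would need to decide how to treat bins with no prediction.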

CNA Categories (Conditions)

The function evaluates each tool under multiple CNA “classes”, defined by conditions:

  • CN_Gain: >=2
  • CN_Neutral: =1
  • CN_Loss: =0
  • exact CN states: =2, =3, ..., =10 (folders CN_equal_2 through CN_equal_10)

Each class corresponds to a subfolder name (used in output directory structure) and a condition string (used by downstream categorization).
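
The class list above can be written down as a small lookup table. The sketch below is illustrative only: the exact condition strings consumed by the downstream categorization are not shown in this page, so the conditions are expressed here as (comparison, threshold) pairs, and matching_classes is a hypothetical helper. It also makes visible that the classes overlap: a copy number of 3 satisfies both CN_Gain and CN_equal_3.

```python
import operator

# Folder name -> (comparison, threshold), transcribed from the list above.
CLASSES = {
    "CN_Gain": (operator.ge, 2),
    "CN_Neutral": (operator.eq, 1),
    "CN_Loss": (operator.eq, 0),
}
# Exact states =2 ... =10 map to folders CN_equal_2 ... CN_equal_10.
CLASSES.update({f"CN_equal_{k}": (operator.eq, k) for k in range(2, 11)})


def matching_classes(cn: int) -> list:
    """Return every class folder whose condition holds for one copy number."""
    return [name for name, (op, ref) in CLASSES.items() if op(cn, ref)]


print(matching_classes(0))   # ['CN_Loss']
print(matching_classes(3))   # ['CN_Gain', 'CN_equal_3']
print(matching_classes(11))  # ['CN_Gain']
```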


Haplotype Labels

Haplotype naming depends on type:

  • if type="hcCNA": hap_list = ["hap1", "hap2"]
  • if type="acCNA": hap_list = ["minor", "major"]

These names are passed into downstream functions and appear in the final output table.
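
The label selection can be sketched as a small lookup. haplotype_labels is a hypothetical helper, and raising ValueError for an unsupported type is an assumption; this page does not say how cnclass itself reacts to other values.

```python
def haplotype_labels(cna_type: str) -> list:
    """Hypothetical helper mirroring the label choice described above."""
    labels = {
        "hcCNA": ["hap1", "hap2"],
        "acCNA": ["minor", "major"],
    }
    if cna_type not in labels:
        # Assumed behavior: reject anything other than the two documented types.
        raise ValueError(f'type must be "hcCNA" or "acCNA", got {cna_type!r}')
    return labels[cna_type]


print(haplotype_labels("hcCNA"))  # ['hap1', 'hap2']
print(haplotype_labels("acCNA"))  # ['minor', 'major']
```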


Output

Directory structure (per tool × CNA class)

For each tool and each CNA class, the function creates:

{self.output_dir}/{tool}/{folder}/

Examples:

  • .../CHISEL/CN_Gain/
  • .../CHISEL/CN_equal_3/

Inside each folder, the function writes class-specific intermediate files generated by:

  • categorize_and_save(...) (for hap1/minor and hap2/major)
  • process_folder_for_metrics_clone(...) (reads folder contents and computes metrics)

Final aggregated CSV (outfile)

Saved to:

os.path.join(self.output_dir, outfile)

Default:

{self.output_dir}/cnclass_results.csv
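
To make the resulting layout concrete, the sketch below recreates the per-tool, per-class folder tree and the final CSV path in a temporary directory. It only mirrors the paths described above; the temporary directory stands in for self.output_dir, and no intermediate files are written.

```python
import os
import tempfile

output_dir = tempfile.mkdtemp()  # stands in for self.output_dir
tool_names = ["CHISEL", "SIGNALS"]
folders = ["CN_Gain", "CN_Neutral", "CN_Loss"] + [
    f"CN_equal_{k}" for k in range(2, 11)
]

# One folder per tool x CNA class: {output_dir}/{tool}/{folder}/
for tool in tool_names:
    for folder in folders:
        os.makedirs(os.path.join(output_dir, tool, folder), exist_ok=True)

# Location of the final aggregated table with the default outfile name.
result_path = os.path.join(output_dir, "cnclass_results.csv")

print(sorted(os.listdir(os.path.join(output_dir, "CHISEL")))[:3])
print(os.path.basename(result_path))  # cnclass_results.csv
```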

Output Table Schema

The returned dataframe (and saved CSV) contains one row per:

(CNA class × haplotype × clone × tool)

Columns include:

Column | Meaning
Type | CNA class folder name (e.g., CN_Gain, CN_equal_4)
Haplotype | hap1/hap2 or minor/major depending on type
Clone | Clone identifier returned by process_folder_for_metrics_clone
Tool | Tool name
(metrics...) | Additional metric fields returned in metrics dict (depends on your implementation of process_folder_for_metrics_clone)

Return Value

Returns a pd.DataFrame with the aggregated per-class, per-haplotype, per-clone metrics for all tools.
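
Since the table is in long format, it can be summarized with ordinary pandas operations. The sketch below averages one metric per CNA class and tool. The rows are placeholders shaped like the documented schema: example_metric is a hypothetical column name and every number is invented, because the real metric columns depend on process_folder_for_metrics_clone.

```python
import pandas as pd

# Placeholder rows shaped like the documented schema. "example_metric" is a
# hypothetical column name and the numbers are invented for illustration.
df = pd.DataFrame(
    {
        "Type": ["CN_Gain", "CN_Gain", "CN_Gain", "CN_Gain", "CN_Loss", "CN_Loss"],
        "Haplotype": ["hap1", "hap2", "hap1", "hap2", "hap1", "hap1"],
        "Clone": ["clone1"] * 6,
        "Tool": ["CHISEL", "CHISEL", "SIGNALS", "SIGNALS", "CHISEL", "SIGNALS"],
        "example_metric": [0.9, 0.7, 0.8, 0.8, 0.5, 0.6],
    }
)

# Mean of the metric per CNA class (rows) and tool (columns).
summary = (
    df.groupby(["Type", "Tool"])["example_metric"]
    .mean()
    .unstack("Tool")
)
print(summary)
```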


Example

from hcbench.gtbench.gtbench import GTBench

bench = GTBench(output_dir="out/gt_output")

df = bench.cnclass(
    tool_hap1_cna_files=[
        "/path/to/chisel/hap1.csv",
        "/path/to/signals/hap1.csv",
    ],
    tool_hap2_cna_files=[
        "/path/to/chisel/hap2.csv",
        "/path/to/signals/hap2.csv",
    ],
    tool_names=["CHISEL", "SIGNALS"],
    profile_hap1_cna_file="/path/to/gt/hap1.csv",
    profile_hap2_cna_file="/path/to/gt/hap2.csv",
    type="hcCNA",
    profile_bin_size=100000,
    outfile="cnclass_results.csv",
)

print(df.head())