Hccnchange
Function
hccnchange(
self,
tool_hap1_cna_files: List[str],
tool_hap2_cna_files: List[str],
tool_names: List[str],
changes_file: str,
profile_bin_size=100000,
outfile: str = "evolution_onset_CN_Change.csv",
) -> pd.DataFrame
This function evaluates haplotype-specific copy-number change (CN change) prediction for multiple tools during evolutionary onset.
Given a ground-truth change table (changes_file) and each tool’s haplotype CNA profiles (hap1 and hap2), it:
- Standardizes GT segments and tool CNA profiles to the same bin size (
profile_bin_size). - Joins GT change segments with predicted haplotype copy numbers using
_onset_join. - Produces predicted CN-change labels/values (
Change_predict) and compares them to GTChange. - Computes RMSE and ACC for CN-change prediction, stratified by change
Type. - Saves a per-tool, per-type summary table.
Parameters
| Name | Type | Description |
|---|---|---|
tool_hap1_cna_files |
List[str] |
List of tool CNA profile CSVs for haplotype 1. Must align with tool_names. |
tool_hap2_cna_files |
List[str] |
List of tool CNA profile CSVs for haplotype 2. Must align with tool_names. |
tool_names |
List[str] |
Tool names used in output rows. |
changes_file |
str |
Path to the ground-truth CN-change table (segments + change labels). |
profile_bin_size |
int |
Bin size used to split/standardize both GT segments and predictions. Default: 100000 (100kb). |
outfile |
str |
Output filename for the summary table. Default: "evolution_onset_CN_Change.csv". |
Input File Format
changes_file (ground-truth CN-change table)
Expected to contain at least:
Segment: genomic interval identifier used as the segment keyType: change category label (used for stratified evaluation)Change: ground-truth CN-change signal (numeric or categorical, depending on your pipeline)
Optional:
Haplotype: if present, haplotype labels are normalized viaself._normalize_hap_label(...)
Implementation details:
change_truth = read_and_drop_empty(changes_file)
if "Haplotype" in change_truth.columns:
change_truth["Haplotype"] = self._normalize_hap_label(change_truth["Haplotype"])
change_truth = split_all_regions(change_truth.set_index("Segment"), profile_bin_size)
change_truth = change_truth.reset_index().rename(columns={"index": "Segment"})
Tool CNA profiles (tool_hap1_cna_files, tool_hap2_cna_files)
Expected to be CNA matrices with:
regioncolumn- cell columns
- haplotype-specific CN values
Each file is loaded via read_and_drop_empty and re-binned:
p_h1 = split_all_regions(p_h1.set_index("region"), profile_bin_size).reset_index()
p_h2 = split_all_regions(p_h2.set_index("region"), profile_bin_size).reset_index()
Evaluation Logic
For each tool:
- Join GT changes with predicted haplotype CN:
h1 = self._onset_join(change_truth, p_h1, "hap1")
h2 = self._onset_join(change_truth, p_h2, "hap2")
comb = pd.concat([h1, h2], ignore_index=True)
The joined table is expected to include (at minimum):
TypeChange(GT)-
Change_predict(predicted CN-change) -
For each change
Type, compute CN-change metrics:
rmse = self._cn_change_rmse(gt, pd_)
acc = self._cn_change_acc(gt, pd_)
-
Append one summary row per
(Tool × Type): -
Tool TypeRMSEACC
Output
A single CSV summary table is written to self.output_dir:
os.path.join(self.output_dir, outfile)
Default:
{self.output_dir}/evolution_onset_CN_Change.csv
Output Table Schema
The output table contains one row per (Tool × Type):
| Column | Meaning |
|---|---|
Tool |
Tool name |
Type |
Change category label from the GT table |
RMSE |
CN-change RMSE computed by self._cn_change_rmse |
ACC |
CN-change accuracy computed by self._cn_change_acc |
Notes:
- The exact interpretation of
RMSEandACCdepends on howChange/Change_predictare encoded and how_cn_change_rmse/_cn_change_accare implemented (numeric vs categorical).
Return Value
Returns a pd.DataFrame with columns:
ToolTypeRMSEACC
Example
from hcbench.gtbench.gtbench import GTBench
bench = GTBench(output_dir="out/gt_output")
df = bench.hccnchange(
tool_hap1_cna_files=[
"/path/to/chisel/hap1.csv",
"/path/to/signals/hap1.csv",
],
tool_hap2_cna_files=[
"/path/to/chisel/hap2.csv",
"/path/to/signals/hap2.csv",
],
tool_names=["CHISEL", "SIGNALS"],
changes_file="/path/to/gt/changes_cn_change.csv",
profile_bin_size=100000,
outfile="evolution_onset_CN_Change.csv",
)
print(df)