cloneSizebycellprofile
Function
cloneSizebycellprofile(
self,
gt_cna_file: str,
tool_cna_files: List[str],
tool_names: List[str],
outfile: str = "clone_size_by_cell_profile.csv",
) -> pd.DataFrame
This function evaluates clone-size consistency at the cell profile level. It first derives clone sizes from the GT CNA matrix (cells sharing identical bin-level CNA profiles are grouped as one profile-clone), then compares each tool's inferred profile-clone sizes against GT.
The function writes:
- a detailed per-cell comparison table
- a mean summary grouped by GT
cluster_size
Parameters
| Name | Type | Description |
|---|---|---|
gt_cna_file |
str |
Path to GT CNA profile CSV. |
tool_cna_files |
List[str] |
List of tool CNA profile CSV files. Must align with tool_names in order. |
tool_names |
List[str] |
Tool names used as output column prefixes ({Tool}_pred_size). |
outfile |
str |
Detailed output filename. Default: "clone_size_by_cell_profile.csv". |
Input File Format
gt_cna_file and each file in tool_cna_files are expected to be CNA matrices:
- rows: genomic regions (first column as index when reading)
- columns: cell IDs
- values: CNA states (e.g.,
"1|1")
Example:
region,cell_001,cell_002,cell_003
chr1:1-100000,1|1,1|1,2|1
chr1:100001-200000,1|1,1|1,2|1
Implementation details from source:
- missing values are filled with
"1|1" - for tool matrices, all-empty columns are dropped before processing
Output
Two files are written to self.output_dir:
os.path.join(self.output_dir, outfile)os.path.join(self.output_dir, f"mean_{outfile}")
1) Detailed Table (outfile)
Index: GT cell ID
| Column | Meaning |
|---|---|
cluster_size |
GT profile-clone size for that cell |
{Tool}_pred_size |
Predicted profile-clone size for the same cell ID from each tool (NaN if cell is missing in tool result) |
2) Mean Table (mean_{outfile})
Grouped by GT cluster_size, numeric prediction columns are averaged:
| Column | Meaning |
|---|---|
cluster_size |
GT profile-clone size |
{Tool}_pred_size |
Mean predicted size for GT clones of that size |
Return Type
The function signature annotates -> pd.DataFrame, but current implementation does not explicitly return a value (runtime return is None).
Example
from hcbench.gtbench.gtbench import GTBench
bench = GTBench(output_dir="out/gt_output")
bench.cloneSizebycellprofile(
gt_cna_file="/path/to/gt/haplotype_combined.csv",
tool_cna_files=[
"/path/to/signals/haplotype_combined.csv",
"/path/to/seacon/haplotype_combined.csv",
],
tool_names=["signals", "seacon"],
outfile="clone_size_by_cell_profile.csv",
)