`pheno_prerank_enrich()` Enrichment: Gene Interaction List with Rank Scores for Phenotypes

Step 1: Prepare `rank_df`

Begin by preparing a list of gene interactions along with their preranked scores. Store this information in a DataFrame called rank_df.

The index of rank_df should contain the gene interaction symbols (e.g., Oas1a_Ifit1).
Each column of rank_df should correspond to one target phenotype (e.g., GK vs. WKY).

This format allows pheno_prerank_enrich to run enrichment analysis on the ranked interaction-level statistics of each phenotype.
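As a minimal sketch of the expected layout, the snippet below builds a small rank_df by hand, using the first three rows shown later in this tutorial, and runs a few sanity checks. The helper check_rank_df is hypothetical and is not part of GREA; it only illustrates the index and column conventions described above.

```python
import pandas as pd

# Toy rank_df: interaction symbols as the index, one column per phenotype.
rank_df_demo = pd.DataFrame(
    {"GK vs. WKY": [7.116836, 7.105887, 7.105887]},
    index=pd.Index(["Ifit3_Ifit1", "Usp18_Ifit1", "Oas1a_Ifit1"], name="names"),
)

def check_rank_df(df, sig_sep="_"):
    """Hypothetical helper: return a list of layout problems (empty if none)."""
    problems = []
    if not df.index.is_unique:
        problems.append("duplicate interaction symbols in the index")
    # Every index entry should contain the separator between gene names.
    missing_sep = [name for name in df.index if sig_sep not in str(name)]
    if missing_sep:
        problems.append(f"no '{sig_sep}' separator in: {missing_sep}")
    # Every phenotype column should hold numeric rank scores.
    non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems

print(check_rank_df(rank_df_demo))
```

An empty list means the toy frame follows the conventions; the same check can be run on a frame loaded from CSV before passing it to the enrichment function.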

In this tutorial, we use gene interaction data from the GSE13268 dataset, which includes two phenotypes:

GK (Goto-Kakizaki): An inbred rat model commonly used for diabetes research. GK rats exhibit a polygenic form of diabetes, closely mirroring human disease characteristics such as hyperglycemia, impaired glucose tolerance, and insulin resistance. This makes them a valuable model for studying human type 2 diabetes.
WKY (Wistar-Kyoto): A standard inbred laboratory rat strain often used as a control. In this study, WKY rats serve as the reference group for comparison with GK rats, enabling the investigation of insulin resistance and diabetes progression.

For each rat sample, gene–gene interactions and their corresponding entropy-based Critical Transition (CT) scores were precomputed by NIEE. A differential analysis was then performed between the GK and WKY groups. From this, we identified gene interactions with higher fluctuation in GK rats and ranked them by their z-scores from the differential test (grea/db/GSE13268_GK.csv).

%load_ext autoreload

import pandas as pd
rank_df = pd.read_csv("grea/db/GSE13268_GK.csv", index_col=0)
rank_df.head()

	GK vs. WKY
names
Ifit3_Ifit1	7.116836
Usp18_Ifit1	7.105887
Oas1a_Ifit1	7.105887
Gstm4_Gsta4	7.094938
Gstt1_Gsta4	7.094938

Step 2: Preparing Gene Set Libraries

There are several ways to prepare gene set libraries for use in GREA:

Option 1: Use Built-in Libraries

Simply specify the libraries you're interested in as a list. For example:

libraries = ['KEGG_2019_Mouse', 'WikiPathways_2024_Mouse']

You can use the grea.library.list_libraries() function to view all available pathway libraries included in GREA.

Option 2: Load from GMT File

You can load external gene set libraries from .gmt files using:

libraries = read_gmt('your_library_file.gmt')
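For orientation, a GMT file is a tab-separated text file in which each line holds a term name, a description, and then the member genes. The sketch below parses that layout into the same dictionary shape used in Option 3. The function parse_gmt_lines is a hypothetical stand-in written for this illustration; it is not GREA's read_gmt, whose import path and behaviour are not shown here.

```python
import io

def parse_gmt_lines(lines):
    """Parse GMT-formatted lines into {term: [genes]} (illustrative only)."""
    library = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        term, _description, *genes = fields
        library[term] = [g for g in genes if g]  # drop empty trailing fields
    return library

# Two toy lines standing in for the contents of a .gmt file.
gmt_text = io.StringIO(
    "term1\tdemo description\tOas1a\tIfit1\n"
    "term2\t\tGstm4\tOas1a\n"
)
gmt_library = parse_gmt_lines(gmt_text)
print(gmt_library)
```

To parse a file on disk, the same function can be given an open file handle instead of the StringIO object.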

Option 3: Define a Custom Library

Create your own gene set library using a Python dictionary, where each key is a pathway name and the corresponding value is a list of genes:

libraries = {
    'term1': ['Oas1a', 'Ifit1'],
    'term2': ['Gstm4', 'Oas1a']
}
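Before running enrichment with a custom library, it can help to check how many interactions actually touch each term. The sketch below assumes, purely for illustration, that an interaction's overlap with a term is the fraction of its genes found in that term; GREA's own overlap definition may differ. The names overlap_ratio, custom_library, and interactions are introduced here and are not part of the package.

```python
def overlap_ratio(interaction, gene_set, sig_sep="_"):
    """Fraction of the interaction's genes that belong to gene_set (assumed definition)."""
    genes = interaction.split(sig_sep)
    return sum(g in gene_set for g in genes) / len(genes)

custom_library = {
    "term1": ["Oas1a", "Ifit1"],
    "term2": ["Gstm4", "Oas1a"],
}
# The five interactions shown in the rank_df preview of Step 1.
interactions = ["Ifit3_Ifit1", "Usp18_Ifit1", "Oas1a_Ifit1", "Gstm4_Gsta4", "Gstt1_Gsta4"]

overlaps = {
    term: {name: overlap_ratio(name, set(genes)) for name in interactions}
    for term, genes in custom_library.items()
}
for term, ratios in overlaps.items():
    touched = [name for name, r in ratios.items() if r > 0]
    print(term, touched)
```

A term that no interaction touches cannot produce a meaningful enrichment score, so a check like this is a cheap way to catch gene symbol mismatches early.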

%autoreload
from grea.library import list_libraries

print(list_libraries())

['GeneSigDB', 'Enrichr_Submissions_TF-Gene_Coocurrence', 'SysMyo_Muscle_Gene_Sets', 'WikiPathway_2021_Human', 'HomoloGene', 'WikiPathways_2013', 'PFOCR_Pathways_2023', 'OMIM_Disease', 'Data_Acquisition_Method_Most_Popular_Genes', 'NIBR_Jensen_DISEASES_Curated_2025', 'Cancer_Cell_Line_Encyclopedia', 'WikiPathways_2016', 'WikiPathways_2015', 'RNAseq_Automatic_GEO_Signatures_Human_Up', 'Human_Gene_Atlas', 'KOMP2_Mouse_Phenotypes_2022', 'MoTrPAC_2023', 'Kinase_Perturbations_from_GEO_down', 'Disease_Signatures_from_GEO_down_2014', 'Disease_Perturbations_from_GEO_up', 'Old_CMAP_down', 'MCF7_Perturbations_from_GEO_up', 'NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions', 'DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019', 'PPI_Hub_Proteins', 'Disease_Signatures_from_GEO_up_2014', 'GTEx_Tissue_Expression_Up', 'NIBR_DRUGseq_2025_down', 'L1000_Kinase_and_GPCR_Perturbations_up', 'ARCHS4_Cell-lines', 'VirusMINT', 'KEGG_2019_Human', 'ARCHS4_Tissues', 'MGI_Mammalian_Phenotype_Level_4', 'The_Kinase_Library_2024', 'The_Kinase_Library_2023', 'MGI_Mammalian_Phenotype_Level_3', 'InterPro_Domains_2019', 'WikiPathways_2024_Mouse', 'DRUGseq_2025_up', 'KEGG_2015', 'MSigDB_Computational', 'KEGG_2013', 'TF-LOF_Expression_from_GEO', 'GWAS_Catalog_2019', 'KEGG_2016', 'NCI-Nature_2015', 'NCI-Nature_2016', 'CCLE_Proteomics_2020', 'PheWeb_2019', 'GeDiPNet_2023', 'RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO', 'LINCS_L1000_Chem_Pert_down', 'Old_CMAP_up', 'LINCS_L1000_Ligand_Perturbations_down', 'Enrichr_Users_Contributed_Lists_2020', 'NIH_Funded_PIs_2017_Human_GeneRIF', 'Jensen_TISSUES', 'Azimuth_Cell_Types_2021', 'DisGeNET', 'Panther_2016', 'LINCS_L1000_Ligand_Perturbations_up', 'Rare_Diseases_AutoRIF_Gene_Lists', 'Achilles_fitness_increase', 'TargetScan_microRNA', 'Panther_2015', 'WikiPathways_2019_Mouse', 'ARCHS4_TFs_Coexp', 'LINCS_L1000_Chem_Pert_up', 'MSigDB_Oncogenic_Signatures', 'Gene_Perturbations_from_GEO_down', 'Table_Mining_of_CRISPR_Studies', 'Rare_Diseases_GeneRIF_Gene_Lists', 
'Ligand_Perturbations_from_GEO_down', 'SILAC_Phosphoproteomics', 'Ligand_Perturbations_from_GEO_up', 'Drug_Perturbations_from_GEO_up', 'SynGO_2022', 'MGI_Mammalian_Phenotype_Level_4_2024', 'SynGO_2024', 'Allen_Brain_Atlas_10x_scRNA_2021', 'MGI_Mammalian_Phenotype_Level_4_2021', 'ClinVar_2019', 'GWAS_Catalog_2023', 'MAGMA_Drugs_and_Diseases', 'KEA_2015', 'KEA_2013', 'Microbe_Perturbations_from_GEO_up', 'Chromosome_Location', 'COVID-19_Related_Gene_Sets_2021', 'MGI_Mammalian_Phenotype_Level_4_2019', 'DRUGseqr_2025_down', 'ARCHS4_IDG_Coexp', 'NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions', 'Jensen_DISEASES_Experimental_2025', 'lncHUB_lncRNA_Co-Expression', 'PerturbAtlas', 'DrugMatrix', 'Virus_Perturbations_from_GEO_down', 'huMAP', 'L1000_Kinase_and_GPCR_Perturbations_down', 'Elsevier_Pathway_Collection', 'NIH_Funded_PIs_2017_Human_AutoRIF', 'Diabetes_Perturbations_GEO_2022', 'ESCAPE', 'RNAseq_Automatic_GEO_Signatures_Mouse_Down', 'UK_Biobank_GWAS_v1', 'Aging_Perturbations_from_GEO_up', 'Human_Phenotype_Ontology', 'Jensen_DISEASES_Curated_2025', 'Proteomics_Drug_Atlas_2023', 'dbGaP', 'SubCell_BarCode', 'Transcription_Factor_PPIs', 'GO_Cellular_Component_2017b', 'HuBMAP_ASCTplusB_augmented_2022', 'MSigDB_Hallmark_2020', 'GlyGen_Glycosylated_Proteins_2022', 'MAGNET_2023', 'CellMarker_2024', 'BioPlanet_2019', 'HDSigDB_Human_2021', 'GTEx_Tissues_V8_2023', 'GTEx_Tissue_Expression_Down', 'Metabolomics_Workbench_Metabolites_2022', 'Tissue_Protein_Expression_from_Human_Proteome_Map', 'Epigenomics_Roadmap_HM_ChIP-seq', 'PhenGenI_Association_2021', 'MCF7_Perturbations_from_GEO_down', 'ProteomicsDB_2020', 'Virus-Host_PPI_P-HIPSTer_2020', 'OMIM_Expanded', 'Reactome_2022', 'Genes_Associated_with_NIH_Grants', 'CellMarker_Augmented_2021', 'ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X', 'HDSigDB_Mouse_2021', 'Jensen_COMPARTMENTS', 'ChEA_2015', 'ChEA_2016', 'KEGG_2019_Mouse', 'ChEA_2022', 'LINCS_L1000_Chem_Pert_Consensus_Sigs', 'Drug_Perturbations_from_GEO_2014', 
'TargetScan_microRNA_2017', 'KEGG_2021_Mouse', 'Allen_Brain_Atlas_down', 'WikiPathways_2019_Human', 'Reactome_2013', 'BioCarta_2013', 'Rummagene_transcription_factors', 'Gene_Perturbations_from_GEO_up', 'GO_Cellular_Component_2015', 'Rummagene-signatures', 'GO_Cellular_Component_2013', 'BioCarta_2016', 'NIBR_Jensen_DISEASES_Experimental_2025', 'BioCarta_2015', 'Reactome_2015', 'Reactome_2016', 'GO_Cellular_Component_2018', 'GO_Cellular_Component_2017', 'GO_Cellular_Component_2023', 'GO_Cellular_Component_2021', 'WikiPathway_2021_Mouse', 'ENCODE_TF_ChIP-seq_2015', 'ENCODE_TF_ChIP-seq_2014', 'RNAseq_Automatic_GEO_Signatures_Mouse_Up', 'GO_Molecular_Function_2017b', 'DRUGseq_2025_down', 'FANTOM6_lncRNA_KD_DEGs', 'MGI_Mammalian_Phenotype_2013', 'GO_Cellular_Component_2025', 'HMS_LINCS_KinomeScan', 'NCI-60_Cancer_Cell_Lines', 'Azimuth_2023', 'MGI_Mammalian_Phenotype_2017', 'Rare_Diseases_GeneRIF_ARCHS4_Predictions', 'Virus_Perturbations_from_GEO_up', 'PFOCR_Pathways', 'IDG_Drug_Targets_2022', 'Enrichr_Libraries_Most_Popular_Genes', 'Orphanet_Augmented_2021', 'NIBR_DRUGseq_2025_up', 'GO_Biological_Process_2021', 'TRANSFAC_and_JASPAR_PWMs', 'Reactome_Pathways_2024', 'GO_Biological_Process_2023', 'Rare_Diseases_AutoRIF_ARCHS4_Predictions', 'COVID-19_Related_Gene_Sets', 'Kinase_Perturbations_from_GEO_up', 'Descartes_Cell_Types_and_Tissue_2021', 'Tabula_Muris', 'Tabula_Sapiens', 'GO_Biological_Process_2025', 'TF_Perturbations_Followed_by_Expression', 'Rummagene_kinases', 'GTEx_Aging_Signatures_2021', 'WikiPathways_2024_Human', 'Tissue_Protein_Expression_from_ProteomicsDB', 'DGIdb_Drug_Targets_2024', 'Serine_Threonine_Kinome_Atlas_2023', 'Aging_Perturbations_from_GEO_down', 'DepMap_CRISPR_GeneDependency_CellLines_2023', 'GO_Biological_Process_2013', 'GO_Biological_Process_2017b', 'GO_Biological_Process_2018', 'CORUM', 'GO_Biological_Process_2015', 'Phosphatase_Substrates_from_DEPOD', 'BioPlex_2017', 'TRRUST_Transcription_Factors_2019', 'GO_Biological_Process_2017', 
'Pfam_InterPro_Domains', 'HuBMAP_ASCT_plus_B_augmented_w_RNAseq_Coexpression', 'Pfam_Domains_2019', 'WikiPathway_2023_Human', 'Allen_Brain_Atlas_up', 'Genome_Browser_PWMs', 'NURSA_Human_Endogenous_Complexome', 'HumanCyc_2015', 'HumanCyc_2016', 'Rummagene_signatures', 'Chromosome_Location_hg19', 'Mouse_Gene_Atlas', 'ChEA_2013', 'miRTarBase_2017', 'GO_Molecular_Function_2023', 'Jensen_DISEASES', 'RNAseq_Automatic_GEO_Signatures_Human_Down', 'GO_Molecular_Function_2025', 'Rummagene-transcription-factors', 'ARCHS4_Kinases_Coexp', 'Microbe_Perturbations_from_GEO_down', 'DRUGseqr_2025_up', 'PanglaoDB_Augmented_2021', 'ENCODE_Histone_Modifications_2013', 'ENCODE_Histone_Modifications_2015', 'Achilles_fitness_decrease', 'DSigDB', 'DepMap_WG_CRISPR_Screens_Broad_CellLines_2019', 'Disease_Perturbations_from_GEO_down', 'Drug_Perturbations_from_GEO_down', 'GO_Molecular_Function_2021', 'GO_Molecular_Function_2017', 'GO_Molecular_Function_2018', 'Mitchell_Proteomics_Drug_Atlas_2023', 'GO_Molecular_Function_2013', 'GO_Molecular_Function_2015', 'Rummagene-kinases', 'TG_GATES_2020', 'KEGG_2021_Human', 'HMDB_Metabolites', 'LINCS_L1000_CRISPR_KO_Consensus_Sigs']

Step 3: Run Enrichment

To perform enrichment analysis, call the grea.pheno_prerank_enrich(rank_df, libraries) function. You can customize the analysis using the following arguments:

prob_method: Method for p-value calculation. Currently supports 'perm' for permutation-based testing.
n_perm: Number of permutations to use for estimating the null distribution.
sig_sep: The delimiter used to separate gene names in an interaction string (e.g., set sig_sep='_' for interactions like Oas1a_Ifit1).
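To make the role of n_perm concrete, here is a generic permutation test written with the standard library only. It is not GREA's implementation: the statistic (mean score of the hits) and the function perm_pvalue are stand-ins chosen for brevity. It does show why the number of permutations matters: a purely empirical estimate of this kind can never fall below 1 / (n_perm + 1).

```python
import random

def perm_pvalue(scores, hit_flags, n_perm=1000, seed=0):
    """Upper-tail empirical p-value for the mean score of the hits."""
    rng = random.Random(seed)
    hit_scores = [s for s, h in zip(scores, hit_flags) if h]
    n_hits = len(hit_scores)
    observed = sum(hit_scores) / n_hits
    as_extreme = 0
    for _ in range(n_perm):
        # Null model: draw the same number of interactions at random.
        sample = rng.sample(scores, n_hits)
        if sum(sample) / n_hits >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (as_extreme + 1) / (n_perm + 1)

# Toy ranking: 100 scores, with the five highest flagged as hits.
toy_scores = [float(s) for s in range(100, 0, -1)]
toy_hits = [i < 5 for i in range(100)]

p_many = perm_pvalue(toy_scores, toy_hits, n_perm=1000)
p_few = perm_pvalue(toy_scores, toy_hits, n_perm=10)
print(p_many, p_few)
```

With only 10 permutations the smallest attainable value is 1/11, which illustrates why a low n_perm gives coarse p-value estimates.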

The function returns a GREA object containing all enrichment results, including enrichment scores and statistical significance for each library term.

%autoreload

from grea import grea
libraries = ['WikiPathways_2024_Mouse', 'Mouse_Gene_Atlas']
n_perm = 10
prob_method = 'perm'
sig_sep = '_'
obj = grea.pheno_prerank_enrich(rank_df, libraries, n_perm=n_perm, prob_method=prob_method, sig_sep=sig_sep)
obj

---Finished: Load WikiPathways_2024_Mouse with 188 terms.
---WARMING: "Mouse_Gene_Atlas-testis" has 3059 genes, larger than max_size 1000, filter it out.
---Finished: Load Mouse_Gene_Atlas with 95 terms.
Low numer of permutations can lead to inaccurate p-value estimation. Symmetric Gamma distribution enabled to increase accuracy.
---WARMING: 98.0% of entries has zero overlap ratio.

<grea.grea._GREA at 0x200e9721be0>

Step 4: Check Enrichment Results

The GREA object stores all enrichment results, including enrichment scores and statistical significance for each library term. GREA supports five types of enrichment scores, each reflecting a different scoring strategy:

'KS-ES': Kolmogorov–Smirnov-based Enrichment Score, capturing the peak deviation between hit and miss distributions.
'KS-ESD': KS-based Enrichment Score Difference, the sum of the maximum positive and maximum negative deviations of the running score.
'RC-AUC': Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
'RC-nAUC': Normalized Area Under the Recovery Curve, i.e. the RC-AUC value after normalization.
'nRC-AUC': Area Under the normalized Recovery Curve, summarizing early enrichment of target genes along the ranking; ranges from 0 to 1.
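The ideas behind these scores can be sketched on a toy ranking. The code below uses an unweighted KS running sum and a simple normalized recovery curve; both are textbook simplifications chosen for illustration, and GREA's exact weighting and normalization are not shown in this tutorial, so treat the function names and formulas here as assumptions.

```python
def ks_running_sum(hit_flags):
    """Unweighted KS running sum over a ranking (best-ranked item first)."""
    n_hits = sum(hit_flags)
    n_miss = len(hit_flags) - n_hits
    running, curve = 0.0, []
    for is_hit in hit_flags:
        # Step up on a hit, step down on a miss; the curve ends near zero.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        curve.append(running)
    return curve

def ks_es(curve):
    """Peak deviation from zero, keeping its sign."""
    return max(curve, key=abs)

def ks_esd(curve):
    """Maximum positive deviation plus maximum negative deviation."""
    return max(max(curve), 0.0) + min(min(curve), 0.0)

def normalized_recovery_auc(hit_flags):
    """Mean fraction of hits recovered across all rank positions (in [0, 1])."""
    n_hits, recovered, area = sum(hit_flags), 0, 0.0
    for is_hit in hit_flags:
        recovered += is_hit
        area += recovered / n_hits
    return area / len(hit_flags)

# Five ranked interactions; the second and third belong to the term.
toy_flags = [False, True, True, False, False]
toy_curve = ks_running_sum(toy_flags)
print(ks_es(toy_curve), ks_esd(toy_curve), normalized_recovery_auc(toy_flags))
```

In this toy case the running sum dips below zero on the first miss before peaking, so the ESD value is smaller than the ES value; a term whose hits are concentrated at the very top would give nearly identical values for the two.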

You can select the appropriate metric depending on your analysis goal or data characteristics.

To retrieve the enrichment results as a long DataFrame, use the get_enrich_results(metric) function.

%autoreload

df = obj.get_enrich_results(metric='KS-ES')
df.head()

	Term	Obs	KS-ES	Prob_method	KS-ES_pval	N_lead_sigs	Lead_sigs	KS-ES_fdr	KS-ES_sidak
109	WikiPathways_2024_Mouse|Pentose Phosphate Path...	GK vs. WKY	0.741123	perm	0.01	40	Taldo1_Gpi;Taldo1_Eno1;Tkt_Pdhb;Tkt_Taldo1;Tkt...	0.014588	0.941822
209	Mouse_Gene_Atlas|mammary gland lact	GK vs. WKY	0.676365	perm	0.01	123	Pik3r1_Mtor;Pik3r1_Pdgfrb;Pik3r1_Grap2;Pik3r1_...	0.014588	0.941822
169	WikiPathways_2024_Mouse|Heme Biosynthesis WP18	GK vs. WKY	0.651665	perm	0.01	16	Fech_Abcb10;Alas2_Alad;Fech_Abcb6;Uros_Cpox;Sl...	0.014588	0.941822
183	WikiPathways_2024_Mouse|Methylation WP1247	GK vs. WKY	0.625270	perm	0.01	44	Cyp1a1_Comt;Drd4_Comt;Comt_Aldh3b1;Ugt2b10_Com...	0.014588	0.941822
3	WikiPathways_2024_Mouse|Dysregulated miRNA Tar...	GK vs. WKY	0.619595	perm	0.01	189	Pik3r1_Mtor;Pik3r1_Pdgfrb;Pik3r1_Grap2;Pik3r1_...	0.014588	0.941822
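The values in the KS-ES_sidak column can be checked by hand. They are consistent with the standard Šidák adjustment, 1 - (1 - p)^m, where m is the total number of terms loaded in Step 3 (188 + 95 = 283). That this is the formula GREA applies is an inference from the printed numbers, not something stated by the package; the function name sidak_adjust is introduced here for illustration.

```python
def sidak_adjust(p_value, n_tests):
    """Standard Sidak family-wise adjustment for n_tests comparisons."""
    return 1.0 - (1.0 - p_value) ** n_tests

# Term counts reported when the two libraries were loaded in Step 3.
n_terms = 188 + 95

adjusted_small = sidak_adjust(0.01, n_terms)
adjusted_large = sidak_adjust(0.99, n_terms)
print(round(adjusted_small, 6), round(adjusted_large, 6))
```

With 283 terms, even a raw p-value of 0.01 adjusts to roughly 0.94, which is why none of the rows above would pass a Šidák-corrected threshold despite their small raw p-values.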

%autoreload

df = obj.get_enrich_results(metric='KS-ESD')
df.head()

	Term	Obs	KS-ESD	Prob_method	KS-ESD_pval	N_lead_sigs	Lead_sigs	KS-ESD_fdr	KS-ESD_sidak
109	WikiPathways_2024_Mouse|Pentose Phosphate Path...	GK vs. WKY	0.731309	perm	0.01	40	Taldo1_Gpi;Taldo1_Eno1;Tkt_Pdhb;Tkt_Taldo1;Tkt...	0.014221	0.941822
209	Mouse_Gene_Atlas|mammary gland lact	GK vs. WKY	0.675842	perm	0.01	123	Pik3r1_Mtor;Pik3r1_Pdgfrb;Pik3r1_Grap2;Pik3r1_...	0.014221	0.941822
169	WikiPathways_2024_Mouse|Heme Biosynthesis WP18	GK vs. WKY	0.640218	perm	0.01	16	Fech_Abcb10;Alas2_Alad;Fech_Abcb6;Uros_Cpox;Sl...	0.014221	0.941822
183	WikiPathways_2024_Mouse|Methylation WP1247	GK vs. WKY	0.620004	perm	0.01	44	Cyp1a1_Comt;Drd4_Comt;Comt_Aldh3b1;Ugt2b10_Com...	0.014221	0.941822
3	WikiPathways_2024_Mouse|Dysregulated miRNA Tar...	GK vs. WKY	0.619071	perm	0.01	189	Pik3r1_Mtor;Pik3r1_Pdgfrb;Pik3r1_Grap2;Pik3r1_...	0.014221	0.941822
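The Lead_sigs column packs the leading interactions of a term into a single semicolon-separated string. A short sketch of how such a string could be unpacked to see which genes recur most often is given below. It uses only the four entries fully visible in the truncated display for the Pentose Phosphate term, and the helper count_lead_genes is hypothetical.

```python
from collections import Counter

def count_lead_genes(lead_sigs, sig_sep="_", item_sep=";"):
    """Count how often each gene appears among the leading interactions."""
    counts = Counter()
    for interaction in lead_sigs.split(item_sep):
        if interaction:
            counts.update(interaction.split(sig_sep))
    return counts

# Only the entries fully visible in the table; the real string is longer.
visible_lead_sigs = "Taldo1_Gpi;Taldo1_Eno1;Tkt_Pdhb;Tkt_Taldo1"
gene_counts = count_lead_genes(visible_lead_sigs)
print(gene_counts.most_common(2))
```

On a full results frame, the same function could be applied to each row's Lead_sigs value to highlight hub genes within an enriched term.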

%autoreload

df = obj.get_enrich_results(metric='RC-AUC')
df.head()

	Term	Obs	RC-AUC	Prob_method	RC-AUC_pval	RC-AUC_fdr	RC-AUC_sidak
16	WikiPathways_2024_Mouse|Focal Adhesion PI3K Ak...	GK vs. WKY	2466.959341	perm	0.01	0.03043	0.941822
246	Mouse_Gene_Atlas|embryonic stem line Bruce4 p13	GK vs. WKY	2176.005442	perm	0.99	0.99000	1.000000
171	WikiPathways_2024_Mouse|Insulin Signaling WP65	GK vs. WKY	1972.318166	perm	0.01	0.03043	0.941822
266	Mouse_Gene_Atlas|embryonic stem line V26 2 p16	GK vs. WKY	1819.335029	perm	0.99	0.99000	1.000000
118	WikiPathways_2024_Mouse|EGFR1 Signaling Pathwa...	GK vs. WKY	1807.684383	perm	0.01	0.03043	0.941822

%autoreload

df = obj.get_enrich_results(metric='RC-nAUC')
df.head()

	Term	Obs	RC-nAUC	Prob_method	RC-nAUC_pval	RC-nAUC_fdr	RC-nAUC_sidak
169	WikiPathways_2024_Mouse|Heme Biosynthesis WP18	GK vs. WKY	1.076391	perm	0.01	0.03043	0.941822
209	Mouse_Gene_Atlas|mammary gland lact	GK vs. WKY	1.005294	perm	0.01	0.03043	0.941822
109	WikiPathways_2024_Mouse|Pentose Phosphate Path...	GK vs. WKY	0.931951	perm	0.01	0.03043	0.941822
3	WikiPathways_2024_Mouse|Dysregulated miRNA Tar...	GK vs. WKY	0.929094	perm	0.01	0.03043	0.941822
183	WikiPathways_2024_Mouse|Methylation WP1247	GK vs. WKY	0.842388	perm	0.01	0.03043	0.941822

%autoreload

df = obj.get_enrich_results(metric='nRC-AUC')
df.head()

	Term	Obs	nRC-AUC	Prob_method	nRC-AUC_pval	nRC-AUC_fdr	nRC-AUC_sidak
109	WikiPathways_2024_Mouse|Pentose Phosphate Path...	GK vs. WKY	0.914688	perm	0.01	0.026204	0.941822
183	WikiPathways_2024_Mouse|Methylation WP1247	GK vs. WKY	0.875713	perm	0.01	0.026204	0.941822
169	WikiPathways_2024_Mouse|Heme Biosynthesis WP18	GK vs. WKY	0.860808	perm	0.01	0.026204	0.941822
3	WikiPathways_2024_Mouse|Dysregulated miRNA Tar...	GK vs. WKY	0.822837	perm	0.01	0.026204	0.941822
86	WikiPathways_2024_Mouse|Non Homologous End Joi...	GK vs. WKY	0.820583	perm	0.01	0.026204	0.941822

Step 5: Visualize Enrichment Results

To visualize the enrichment results, use the pl_running_sum(metric, term, pheno_id) function by specifying the desired metric, term, and target phenotype.

%autoreload
term = 'Mouse_Gene_Atlas|uterus'
pheno_id = 'GK vs. WKY'
fig = obj.pl_running_sum('KS-ES', term, pheno_id)
fig


%autoreload
term = 'Mouse_Gene_Atlas|uterus'
pheno_id = 'GK vs. WKY'
fig = obj.pl_running_sum('KS-ESD', term, pheno_id)
fig

%autoreload
term = 'Mouse_Gene_Atlas|uterus'
pheno_id = 'GK vs. WKY'
fig = obj.pl_running_sum('RC-AUC', term, pheno_id)
fig

%autoreload
term = 'Mouse_Gene_Atlas|uterus'
pheno_id = 'GK vs. WKY'
fig = obj.pl_running_sum('RC-nAUC', term, pheno_id)
fig

%autoreload
term = 'Mouse_Gene_Atlas|uterus'
pheno_id = 'GK vs. WKY'
fig = obj.pl_running_sum('nRC-AUC', term, pheno_id)
fig
