pheno_prerank_enrich()
Enrichment: Gene List with Rank Scores for Phenotypes
Step 1: Prepare rank_df
Begin by preparing a list of genes along with their preranked scores. Store this information in a DataFrame called rank_df.
- The index of rank_df should be the gene symbols.
- Each column of rank_df should correspond to one target phenotype (e.g., Case vs. Control).
This format allows pheno_prerank_enrich to perform enrichment analysis using gene-level statistics ranked for each phenotype.
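For illustration only, a minimal rank_df with this shape can be constructed by hand (the gene symbols and scores below are made up); the tutorial itself loads an example dataset in the next cell:
import pandas as pd
# Illustrative only: made-up genes and rank scores.
rank_df = pd.DataFrame(
    {'Case vs. Control': [2.3, 1.1, -0.4, -1.8]},
    index=['TP53', 'MYC', 'GAPDH', 'ACTB'],
)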
%load_ext autoreload
import pandas as pd
# The .tsv extension suggests tab-separated values; adjust sep if your file is comma-delimited.
rank_df = pd.read_csv("grea/db/ageing_muscle_gtex.tsv", sep='\t', index_col=0)
rank_df.columns = ['Case vs. Control']
rank_df.head()
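Optionally, a quick sanity check with plain pandas (not part of GREA) confirms that the gene symbols are unique and that no rank scores are missing:
# Optional checks, not part of GREA itself.
assert rank_df.index.is_unique, "duplicate gene symbols in the index"
assert rank_df.notna().all().all(), "missing rank scores"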
Step 2: Prepare Gene Set Libraries
There are several ways to prepare gene set libraries for use in GREA:
Option 1: Use Built-in Libraries
Simply specify the libraries you're interested in as a list. For example:
libraries = ['KEGG_2021_Human', 'MSigDB_Hallmark_2020']
You can use the grea.library.list_libraries() function to view all available pathway libraries included in GREA.
Option 2: Load from GMT File
You can load external gene set libraries from .gmt files using:
libraries = read_gmt('your_library_file.gmt')
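For reference, each line of a .gmt file describes one gene set as tab-separated fields: the term name, a description (often just 'na'), and then the member genes. A two-line illustrative example (tabs rendered as spaces here):
term1    na    GOLGA4    GID4
term2    na    ADO    CHUK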
Option 3: Define a Custom Library
Create your own gene set library using a Python dictionary, where each key is a pathway name and the corresponding value is a list of genes:
libraries = {
    'term1': ['GOLGA4', 'GID4'],
    'term2': ['ADO', 'CHUK']
}
%autoreload
from grea.library import list_libraries
print(list_libraries())
Step 3: Run Enrichment
To perform enrichment analysis, call the grea.pheno_prerank_enrich(rank_df, libraries) function. You can customize the analysis using the following arguments:
- prob_method: Method for p-value calculation. Currently supports 'perm' for permutation-based testing.
- n_perm: Number of permutations to use for estimating the null distribution.
The function returns a GREA object containing all enrichment results, including enrichment scores and statistical significance for each library term.
%autoreload
from grea import grea
libraries = ['MSigDB_Hallmark_2020']
n_perm = 10  # small value for a quick demo; use more permutations for stable p-values
prob_method = 'perm'
obj = grea.pheno_prerank_enrich(rank_df, libraries, n_perm=n_perm, prob_method=prob_method)
obj
Step 4: Check Enrichment Results
The GREA object stores all enrichment results, including enrichment scores and statistical significance for each library term. GREA supports five enrichment score metrics, each reflecting a different scoring strategy:
- 'KS-ES': Kolmogorov–Smirnov-based Enrichment Score, capturing the peak deviation between hit and miss distributions.
- 'KS-ESD': KS-based Enrichment Score Difference, the sum of the maximum positive and negative deviations of the running score.
- 'RC-AUC': Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
- 'RC-nAUC': Normalized Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
- 'nRC-AUC': Area Under the normalized Recovery Curve, summarizing early enrichment of target genes along the ranking; ranges from 0 to 1.
You can select the appropriate metric depending on your analysis goal or data characteristics.
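To build intuition for how the KS-style and recovery-curve metrics summarize the same ranked list differently, here is a toy sketch in plain NumPy. It is purely illustrative and is not GREA's implementation; the gene names and gene set are made up.
import numpy as np
# Toy ranked gene list (best score first) and a made-up gene set.
ranked_genes = ['G1', 'G2', 'G3', 'G4', 'G5', 'G6']
gene_set = {'G1', 'G3', 'G4'}
hits = np.array([g in gene_set for g in ranked_genes], dtype=float)
n, n_hit = len(hits), hits.sum()
# KS-style running sum: step up at hits, step down at misses.
running = np.cumsum(np.where(hits == 1, 1.0 / n_hit, -1.0 / (n - n_hit)))
ks_es = running[np.argmax(np.abs(running))]   # peak deviation (KS-ES idea)
ks_esd = running.max() + running.min()        # sum of extreme deviations (KS-ESD idea)
# Recovery curve: fraction of the gene set recovered at each rank position.
recovery = np.cumsum(hits) / n_hit
rc_auc = recovery.mean()                      # area under the recovery curve (RC-AUC idea)
print(ks_es, ks_esd, rc_auc)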
To retrieve the enrichment results as a long DataFrame, use the get_enrich_results(metric) function.
%autoreload
df = obj.get_enrich_results(metric='KS-ES')
df.head()
%autoreload
df = obj.get_enrich_results(metric='KS-ESD')
df.head()
%autoreload
df = obj.get_enrich_results(metric='RC-AUC')
df.head()
%autoreload
df = obj.get_enrich_results(metric='RC-nAUC')
df.head()
%autoreload
df = obj.get_enrich_results(metric='nRC-AUC')
df.head()
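The exact columns of the returned DataFrame depend on GREA's output schema, so a practical way to rank terms by significance is to inspect the columns first and then sort by the relevant one; the column name 'pval' below is only a placeholder:
%autoreload
print(df.columns.tolist())
# e.g., if a p-value column exists (placeholder name 'pval'):
# df.sort_values('pval').head()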
Step 5: Visualize Enrichment Results
To visualize the enrichment results, use the pl_running_sum(metric, term, pheno_id) function by specifying the desired metric, term, and target phenotype.
%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('KS-ES', term, pheno_id)
fig
%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('KS-ESD', term, pheno_id)
fig
%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('RC-AUC', term, pheno_id)
fig
%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('RC-nAUC', term, pheno_id)
fig
%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('nRC-AUC', term, pheno_id)
fig