pheno_prerank_enrich() Enrichment: Gene List with Rank Scores for Phenotypes

Step 1: Prepare rank_df

Begin by preparing a list of genes along with their preranked scores. Store this information in a DataFrame called rank_df.

  • The index of rank_df should contain the gene symbols.
  • Each column of rank_df should hold the rank scores for one target phenotype (e.g., Case vs. Control).

This format lets pheno_prerank_enrich run enrichment analysis on the ranked gene-level statistics of each phenotype.

%load_ext autoreload

import pandas as pd
rank_df = pd.read_csv("grea/db/ageing_muscle_gtex.tsv", index_col=0)
rank_df.columns = ['Case vs. Control']
rank_df.head()

        Case vs. Control
0
ADO            -7.833439
CHUK           -7.800920
GOLGA4         -7.787221
EIF3J          -7.716298
GID4           -7.551146
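
Before running enrichment, it can help to sanity-check rank_df. The sketch below is one possible cleanup and is not part of GREA: the helper name clean_rank_df is hypothetical and the toy scores are made up. It normalizes the gene symbols, drops genes without a score and duplicated symbols, and sorts by the first phenotype column.

```python
import pandas as pd

def clean_rank_df(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tidy a gene-by-phenotype score table."""
    out = df.copy()
    # Normalize gene symbols: strip whitespace and uppercase.
    out.index = out.index.astype(str).str.strip().str.upper()
    # Drop genes whose score is missing in every phenotype column.
    out = out.dropna(how="all")
    # Keep the first occurrence of each duplicated symbol.
    out = out[~out.index.duplicated(keep="first")]
    # Sort by the first phenotype column, most negative first.
    return out.sort_values(out.columns[0])

# Toy data (made up for illustration), with a duplicate and a missing score.
toy = pd.DataFrame(
    {"Case vs. Control": [-7.8, 1.2, None, 0.5, -7.8]},
    index=["ado", "CHUK ", "GID4", "EIF3J", "ADO"],
)
cleaned = clean_rank_df(toy)
print(cleaned)
```

Whether uppercasing is appropriate depends on the naming convention of the gene set library you plan to use.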

Step 2: Prepare Gene Set Libraries

There are several ways to prepare gene set libraries for use in GREA:

Option 1: Use Built-in Libraries

Simply specify the libraries you're interested in as a list. For example:

libraries = ['KEGG_2021_Human', 'MSigDB_Hallmark_2020']

You can use the grea.library.list_libraries() function to view all available pathway libraries included in GREA.

Option 2: Load from GMT File

You can load external gene set libraries from .gmt files using:

libraries = read_gmt('your_library_file.gmt')
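
For orientation, a GMT file is tab-separated text with one gene set per line: the set name, a description, then the member genes. The sketch below illustrates that layout with a hypothetical helper, parse_gmt_lines; it is not GREA's read_gmt, whose behavior may differ.

```python
def parse_gmt_lines(lines):
    """Hypothetical helper: parse GMT-formatted lines into {term: [genes]}."""
    library = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # need a name, a description, and at least one gene
        term, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        library[term] = [g for g in genes if g]
    return library

# Two toy GMT lines (made up); the second has a trailing tab.
example = [
    "term1\tfirst toy set\tGOLGA4\tGID4\n",
    "term2\tsecond toy set\tADO\tCHUK\t\n",
]
library = parse_gmt_lines(example)
print(library)
```

The result has the same dictionary shape as the custom library described in Option 3.
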

Option 3: Define a Custom Library

Create your own gene set library using a Python dictionary, where each key is a pathway name and the corresponding value is a list of genes:

libraries = {
    'term1': ['GOLGA4', 'GID4'],
    'term2': ['ADO', 'CHUK']
}

To list the built-in libraries:

%autoreload
from grea.library import list_libraries

print(list_libraries())
['GeneSigDB', 'Enrichr_Submissions_TF-Gene_Coocurrence', 'SysMyo_Muscle_Gene_Sets', 'WikiPathway_2021_Human', 'HomoloGene', 'WikiPathways_2013', 'PFOCR_Pathways_2023', 'OMIM_Disease', 'Data_Acquisition_Method_Most_Popular_Genes', 'NIBR_Jensen_DISEASES_Curated_2025', 'Cancer_Cell_Line_Encyclopedia', 'WikiPathways_2016', 'WikiPathways_2015', 'RNAseq_Automatic_GEO_Signatures_Human_Up', 'Human_Gene_Atlas', 'KOMP2_Mouse_Phenotypes_2022', 'MoTrPAC_2023', 'Kinase_Perturbations_from_GEO_down', 'Disease_Signatures_from_GEO_down_2014', 'Disease_Perturbations_from_GEO_up', 'Old_CMAP_down', 'MCF7_Perturbations_from_GEO_up', 'NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions', 'DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019', 'PPI_Hub_Proteins', 'Disease_Signatures_from_GEO_up_2014', 'GTEx_Tissue_Expression_Up', 'NIBR_DRUGseq_2025_down', 'L1000_Kinase_and_GPCR_Perturbations_up', 'ARCHS4_Cell-lines', 'VirusMINT', 'KEGG_2019_Human', 'ARCHS4_Tissues', 'MGI_Mammalian_Phenotype_Level_4', 'The_Kinase_Library_2024', 'The_Kinase_Library_2023', 'MGI_Mammalian_Phenotype_Level_3', 'InterPro_Domains_2019', 'WikiPathways_2024_Mouse', 'DRUGseq_2025_up', 'KEGG_2015', 'MSigDB_Computational', 'KEGG_2013', 'TF-LOF_Expression_from_GEO', 'GWAS_Catalog_2019', 'KEGG_2016', 'NCI-Nature_2015', 'NCI-Nature_2016', 'CCLE_Proteomics_2020', 'PheWeb_2019', 'GeDiPNet_2023', 'RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO', 'LINCS_L1000_Chem_Pert_down', 'Old_CMAP_up', 'LINCS_L1000_Ligand_Perturbations_down', 'Enrichr_Users_Contributed_Lists_2020', 'NIH_Funded_PIs_2017_Human_GeneRIF', 'Jensen_TISSUES', 'Azimuth_Cell_Types_2021', 'DisGeNET', 'Panther_2016', 'LINCS_L1000_Ligand_Perturbations_up', 'Rare_Diseases_AutoRIF_Gene_Lists', 'Achilles_fitness_increase', 'TargetScan_microRNA', 'Panther_2015', 'WikiPathways_2019_Mouse', 'ARCHS4_TFs_Coexp', 'LINCS_L1000_Chem_Pert_up', 'MSigDB_Oncogenic_Signatures', 'Gene_Perturbations_from_GEO_down', 'Table_Mining_of_CRISPR_Studies', 'Rare_Diseases_GeneRIF_Gene_Lists', 
'Ligand_Perturbations_from_GEO_down', 'SILAC_Phosphoproteomics', 'Ligand_Perturbations_from_GEO_up', 'Drug_Perturbations_from_GEO_up', 'SynGO_2022', 'MGI_Mammalian_Phenotype_Level_4_2024', 'SynGO_2024', 'Allen_Brain_Atlas_10x_scRNA_2021', 'MGI_Mammalian_Phenotype_Level_4_2021', 'ClinVar_2019', 'GWAS_Catalog_2023', 'MAGMA_Drugs_and_Diseases', 'KEA_2015', 'KEA_2013', 'Microbe_Perturbations_from_GEO_up', 'Chromosome_Location', 'COVID-19_Related_Gene_Sets_2021', 'MGI_Mammalian_Phenotype_Level_4_2019', 'DRUGseqr_2025_down', 'ARCHS4_IDG_Coexp', 'NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions', 'Jensen_DISEASES_Experimental_2025', 'lncHUB_lncRNA_Co-Expression', 'PerturbAtlas', 'DrugMatrix', 'Virus_Perturbations_from_GEO_down', 'huMAP', 'L1000_Kinase_and_GPCR_Perturbations_down', 'Elsevier_Pathway_Collection', 'NIH_Funded_PIs_2017_Human_AutoRIF', 'Diabetes_Perturbations_GEO_2022', 'ESCAPE', 'RNAseq_Automatic_GEO_Signatures_Mouse_Down', 'UK_Biobank_GWAS_v1', 'Aging_Perturbations_from_GEO_up', 'Human_Phenotype_Ontology', 'Jensen_DISEASES_Curated_2025', 'Proteomics_Drug_Atlas_2023', 'dbGaP', 'SubCell_BarCode', 'Transcription_Factor_PPIs', 'GO_Cellular_Component_2017b', 'HuBMAP_ASCTplusB_augmented_2022', 'MSigDB_Hallmark_2020', 'GlyGen_Glycosylated_Proteins_2022', 'MAGNET_2023', 'CellMarker_2024', 'BioPlanet_2019', 'HDSigDB_Human_2021', 'GTEx_Tissues_V8_2023', 'GTEx_Tissue_Expression_Down', 'Metabolomics_Workbench_Metabolites_2022', 'Tissue_Protein_Expression_from_Human_Proteome_Map', 'Epigenomics_Roadmap_HM_ChIP-seq', 'PhenGenI_Association_2021', 'MCF7_Perturbations_from_GEO_down', 'ProteomicsDB_2020', 'Virus-Host_PPI_P-HIPSTer_2020', 'OMIM_Expanded', 'Reactome_2022', 'Genes_Associated_with_NIH_Grants', 'CellMarker_Augmented_2021', 'ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X', 'HDSigDB_Mouse_2021', 'Jensen_COMPARTMENTS', 'ChEA_2015', 'ChEA_2016', 'KEGG_2019_Mouse', 'ChEA_2022', 'LINCS_L1000_Chem_Pert_Consensus_Sigs', 'Drug_Perturbations_from_GEO_2014', 
'TargetScan_microRNA_2017', 'KEGG_2021_Mouse', 'Allen_Brain_Atlas_down', 'WikiPathways_2019_Human', 'Reactome_2013', 'BioCarta_2013', 'Rummagene_transcription_factors', 'Gene_Perturbations_from_GEO_up', 'GO_Cellular_Component_2015', 'Rummagene-signatures', 'GO_Cellular_Component_2013', 'BioCarta_2016', 'NIBR_Jensen_DISEASES_Experimental_2025', 'BioCarta_2015', 'Reactome_2015', 'Reactome_2016', 'GO_Cellular_Component_2018', 'GO_Cellular_Component_2017', 'GO_Cellular_Component_2023', 'GO_Cellular_Component_2021', 'WikiPathway_2021_Mouse', 'ENCODE_TF_ChIP-seq_2015', 'ENCODE_TF_ChIP-seq_2014', 'RNAseq_Automatic_GEO_Signatures_Mouse_Up', 'GO_Molecular_Function_2017b', 'DRUGseq_2025_down', 'FANTOM6_lncRNA_KD_DEGs', 'MGI_Mammalian_Phenotype_2013', 'GO_Cellular_Component_2025', 'HMS_LINCS_KinomeScan', 'NCI-60_Cancer_Cell_Lines', 'Azimuth_2023', 'MGI_Mammalian_Phenotype_2017', 'Rare_Diseases_GeneRIF_ARCHS4_Predictions', 'Virus_Perturbations_from_GEO_up', 'PFOCR_Pathways', 'IDG_Drug_Targets_2022', 'Enrichr_Libraries_Most_Popular_Genes', 'Orphanet_Augmented_2021', 'NIBR_DRUGseq_2025_up', 'GO_Biological_Process_2021', 'TRANSFAC_and_JASPAR_PWMs', 'Reactome_Pathways_2024', 'GO_Biological_Process_2023', 'Rare_Diseases_AutoRIF_ARCHS4_Predictions', 'COVID-19_Related_Gene_Sets', 'Kinase_Perturbations_from_GEO_up', 'Descartes_Cell_Types_and_Tissue_2021', 'Tabula_Muris', 'Tabula_Sapiens', 'GO_Biological_Process_2025', 'TF_Perturbations_Followed_by_Expression', 'Rummagene_kinases', 'GTEx_Aging_Signatures_2021', 'WikiPathways_2024_Human', 'Tissue_Protein_Expression_from_ProteomicsDB', 'DGIdb_Drug_Targets_2024', 'Serine_Threonine_Kinome_Atlas_2023', 'Aging_Perturbations_from_GEO_down', 'DepMap_CRISPR_GeneDependency_CellLines_2023', 'GO_Biological_Process_2013', 'GO_Biological_Process_2017b', 'GO_Biological_Process_2018', 'CORUM', 'GO_Biological_Process_2015', 'Phosphatase_Substrates_from_DEPOD', 'BioPlex_2017', 'TRRUST_Transcription_Factors_2019', 'GO_Biological_Process_2017', 
'Pfam_InterPro_Domains', 'HuBMAP_ASCT_plus_B_augmented_w_RNAseq_Coexpression', 'Pfam_Domains_2019', 'WikiPathway_2023_Human', 'Allen_Brain_Atlas_up', 'Genome_Browser_PWMs', 'NURSA_Human_Endogenous_Complexome', 'HumanCyc_2015', 'HumanCyc_2016', 'Rummagene_signatures', 'Chromosome_Location_hg19', 'Mouse_Gene_Atlas', 'ChEA_2013', 'miRTarBase_2017', 'GO_Molecular_Function_2023', 'Jensen_DISEASES', 'RNAseq_Automatic_GEO_Signatures_Human_Down', 'GO_Molecular_Function_2025', 'Rummagene-transcription-factors', 'ARCHS4_Kinases_Coexp', 'Microbe_Perturbations_from_GEO_down', 'DRUGseqr_2025_up', 'PanglaoDB_Augmented_2021', 'ENCODE_Histone_Modifications_2013', 'ENCODE_Histone_Modifications_2015', 'Achilles_fitness_decrease', 'DSigDB', 'DepMap_WG_CRISPR_Screens_Broad_CellLines_2019', 'Disease_Perturbations_from_GEO_down', 'Drug_Perturbations_from_GEO_down', 'GO_Molecular_Function_2021', 'GO_Molecular_Function_2017', 'GO_Molecular_Function_2018', 'Mitchell_Proteomics_Drug_Atlas_2023', 'GO_Molecular_Function_2013', 'GO_Molecular_Function_2015', 'Rummagene-kinases', 'TG_GATES_2020', 'KEGG_2021_Human', 'HMDB_Metabolites', 'LINCS_L1000_CRISPR_KO_Consensus_Sigs']

Step 3: Run Enrichment

To perform enrichment analysis, call the grea.pheno_prerank_enrich(rank_df, libraries) function. You can customize the analysis using the following arguments:

  • prob_method: Method for p-value calculation. Currently supports 'perm' for permutation-based testing.
  • n_perm: Number of permutations to use for estimating the null distribution.

The function returns a GREA object containing all enrichment results, including enrichment scores and statistical significance for each library term.
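
To make the role of n_perm concrete, the sketch below shows a generic permutation test, not GREA's implementation: the statistic (mean score of the set), the helper name perm_pvalue, and the toy scores are all assumptions made for illustration. It compares the observed statistic against random gene sets of the same size.

```python
import random

def perm_pvalue(scores, gene_set, n_perm=1000, seed=0):
    """Two-sided empirical p-value for the mean score of gene_set."""
    rng = random.Random(seed)
    genes = list(scores)
    members = [g for g in genes if g in gene_set]
    k = len(members)
    observed = sum(scores[g] for g in members) / k
    exceed = 0
    for _ in range(n_perm):
        # Null draw: a random gene set of the same size.
        sample = rng.sample(genes, k)
        null_stat = sum(scores[g] for g in sample) / k
        if abs(null_stat) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (exceed + 1) / (n_perm + 1)

# Toy ranking (made up): 100 genes with scores from -50 to 49.
scores = {f"g{i}": float(i - 50) for i in range(100)}
top_set = {"g95", "g96", "g97", "g98", "g99"}
p10 = perm_pvalue(scores, top_set, n_perm=10)
p1000 = perm_pvalue(scores, top_set, n_perm=1000)
print(p10, p1000)
```

With a purely empirical estimate like this one, the smallest attainable p-value is 1 / (n_perm + 1), so 10 permutations cannot go below roughly 0.09. Larger n_perm values give finer resolution at the cost of run time.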

%autoreload

from grea import grea
libraries = ['MSigDB_Hallmark_2020']
n_perm = 10
prob_method = 'perm'
obj = grea.pheno_prerank_enrich(rank_df, libraries, n_perm=n_perm, prob_method=prob_method)
obj
---Finished: Load MSigDB_Hallmark_2020 with 50 terms.
Low numer of permutations can lead to inaccurate p-value estimation. Symmetric Gamma distribution enabled to increase accuracy.
---WARMING: 99.0% of entries has zero overlap ratio.
Please check the consistency (upper/lower case) of signature names in rand_df and libraries.
Current rand_df sig name: ADO, library sig name: MARCKS
Current setting - sig_upper=True

<grea.grea._GREA at 0x270b0883020>
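
The log above flags low overlap and suggests checking the letter case of gene names. One way to inspect this by hand, independent of GREA, is to compute the fraction of each term's genes that appear in the ranking. The helper overlap_report and the toy inputs below are hypothetical.

```python
def overlap_report(ranked_genes, library, upper=True):
    """Hypothetical helper: fraction of each term's genes found in the ranking."""
    norm = (lambda g: g.upper()) if upper else (lambda g: g)
    ranked = {norm(g) for g in ranked_genes}
    report = {}
    for term, genes in library.items():
        members = {norm(g) for g in genes}
        hits = members & ranked
        report[term] = len(hits) / len(members) if members else 0.0
    return report

# Toy inputs (made up); term1 uses mixed-case symbols on purpose.
ranked_genes = ["ADO", "CHUK", "GOLGA4", "EIF3J", "GID4"]
library = {"term1": ["golga4", "Gid4"], "term2": ["ADO", "MARCKS"]}
insensitive = overlap_report(ranked_genes, library)            # compare in uppercase
sensitive = overlap_report(ranked_genes, library, upper=False)  # compare as written
print(insensitive)
print(sensitive)
```

A term whose overlap jumps from 0 to a high value once case is ignored points to a naming mismatch rather than a real absence of genes.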

Step 4: Check Enrichment Results

The GREA object stores all enrichment results, including enrichment scores and statistical significance for each library term. GREA supports five enrichment scores, each reflecting a different scoring strategy:

  • 'KS-ES': Kolmogorov–Smirnov-based Enrichment Score, capturing the peak deviation between hit and miss distributions.
  • 'KS-ESD': KS-based Enrichment Score Difference, the sum of the maximum positive and negative deviations from the running score.
  • 'RC-AUC': Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
  • 'RC-nAUC': Normalized Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
  • 'nRC-AUC': Area Under the normalized Recovery Curve, summarizing early enrichment of target genes along the ranking; values range from 0 to 1.

You can select the appropriate metric depending on your analysis goal or data characteristics.
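
The ideas behind these scores can be sketched in a few lines. The code below is a simplified, unweighted illustration and not GREA's implementation, which may weight genes by their rank scores and normalize differently. All function names and the toy ranking are hypothetical, and the sketch assumes the gene set has at least one hit and one miss in the ranking.

```python
def running_sum(ranked_genes, gene_set):
    """Unweighted KS-style running sum over a ranked gene list."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    total, curve = 0.0, []
    for is_hit in hits:
        # Step up on a hit, down on a miss; the curve ends at zero.
        total += 1.0 / n_hit if is_hit else -1.0 / n_miss
        curve.append(total)
    return curve

def ks_es(curve):
    """Signed deviation with the largest absolute value."""
    return max(curve, key=abs)

def ks_esd(curve):
    """Sum of the maximum positive and maximum negative deviations."""
    return max(max(curve), 0.0) + min(min(curve), 0.0)

def recovery_auc(ranked_genes, gene_set):
    """Area under the recovery curve, scaled to lie between 0 and 1."""
    n_hit = sum(g in gene_set for g in ranked_genes)
    found, area = 0, 0.0
    for g in ranked_genes:
        found += g in gene_set
        area += found / n_hit  # fraction of the set recovered so far
    return area / len(ranked_genes)

ranked = ["A", "B", "C", "D", "E", "F"]
top_curve = running_sum(ranked, {"A", "B"})    # both hits at the top
split_curve = running_sum(ranked, {"A", "F"})  # one hit at each end
print(top_curve, ks_es(top_curve), ks_esd(top_curve))
print(split_curve, ks_es(split_curve), ks_esd(split_curve))
print(recovery_auc(ranked, {"A", "B"}), recovery_auc(ranked, {"A", "F"}))
```

In the second toy set the positive and negative deviations cancel in the difference score, while the peak score reports only one side. That is the kind of contrast to keep in mind when choosing between a peak-based and a difference-based metric.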

To retrieve the enrichment results as a long-format DataFrame, call the get_enrich_results(metric) method of the GREA object.

%autoreload

df = obj.get_enrich_results(metric='KS-ES')
df.head()
    Term                                               Obs               KS-ES     Prob_method  KS-ES_pval  N_lead_sigs  Lead_sigs                                          KS-ES_fdr  KS-ES_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  0.586199  perm         0.01        104          HTRA1;IL15;MGP;PCOLCE;BGN;CCN1;CTHRC1;VIM;IGFB...  0.013889   0.394994
21  MSigDB_Hallmark_2020|Hedgehog Signaling            Case vs. Control  0.568802  perm         0.01        20           MYH9;PML;THY1;NRP2;ETS2;CRMP1;L1CAM;OPHN1;ACHE...  0.013889   0.394994
39  MSigDB_Hallmark_2020|Angiogenesis                  Case vs. Control  0.563786  perm         0.01        19           LUM;VAV2;TIMP1;JAG2;STC1;APP;TNFRSF21;VCAN;POS...  0.013889   0.394994
41  MSigDB_Hallmark_2020|Coagulation                   Case vs. Control  0.530405  perm         0.01        39           MST1;CAPN2;HTRA1;PLAT;A2M;C3;GSN;CRIP2;CFD;C2;...  0.013889   0.394994
49  MSigDB_Hallmark_2020|Pancreas Beta Cells           Case vs. Control  0.504512  perm         0.01        7            STXBP1;ABCC8;LMO2;SRP14;ELP4;DPP4;PAK3             0.013889   0.394994
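
Because the result is an ordinary pandas DataFrame, the usual filtering tools apply. The sketch below works on made-up rows that mimic the column layout shown above; the 0.05 threshold and the derived column names Library, Pathway, and Lead_list are illustrative choices, not GREA conventions.

```python
import pandas as pd

# Toy rows that mimic the column layout shown above (values are made up).
df = pd.DataFrame({
    "Term": ["LibA|Pathway One", "LibA|Pathway Two", "LibB|Pathway Three"],
    "Obs": ["Case vs. Control"] * 3,
    "KS-ES": [0.59, -0.41, 0.12],
    "KS-ES_fdr": [0.01, 0.03, 0.40],
    "Lead_sigs": ["G1;G2;G3", "G4;G5", "G6"],
})

# Keep terms passing an FDR threshold (0.05 is an arbitrary choice here).
sig = df[df["KS-ES_fdr"] < 0.05].copy()
# Split "library|term" into two columns at the first "|".
sig[["Library", "Pathway"]] = sig["Term"].str.split("|", n=1, expand=True)
# Turn the semicolon-separated leading genes into Python lists.
sig["Lead_list"] = sig["Lead_sigs"].str.split(";")
# Order by absolute enrichment score, strongest first.
sig = sig.reindex(sig["KS-ES"].abs().sort_values(ascending=False).index)
print(sig[["Library", "Pathway", "KS-ES", "Lead_list"]])
```
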
%autoreload

df = obj.get_enrich_results(metric='KS-ESD')
df.head()
    Term                                               Obs               KS-ESD    Prob_method  KS-ESD_pval  N_lead_sigs  Lead_sigs                                          KS-ESD_fdr  KS-ESD_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  0.582130  perm         0.01         104          HTRA1;IL15;MGP;PCOLCE;BGN;CCN1;CTHRC1;VIM;IGFB...  0.012821    0.394994
21  MSigDB_Hallmark_2020|Hedgehog Signaling            Case vs. Control  0.558907  perm         0.01         20           MYH9;PML;THY1;NRP2;ETS2;CRMP1;L1CAM;OPHN1;ACHE...  0.012821    0.394994
39  MSigDB_Hallmark_2020|Angiogenesis                  Case vs. Control  0.541065  perm         0.01         19           LUM;VAV2;TIMP1;JAG2;STC1;APP;TNFRSF21;VCAN;POS...  0.012821    0.394994
41  MSigDB_Hallmark_2020|Coagulation                   Case vs. Control  0.529377  perm         0.01         39           MST1;CAPN2;HTRA1;PLAT;A2M;C3;GSN;CRIP2;CFD;C2;...  0.012821    0.394994
17  MSigDB_Hallmark_2020|Interferon Alpha Response     Case vs. Control  0.476122  perm         0.01         52           CD74;IL15;MOV10;LGALS3BP;IFITM1;DHX58;UBA7;IFI...  0.012821    0.394994
%autoreload

df = obj.get_enrich_results(metric='RC-AUC')
df.head()
    Term                                               Obs               RC-AUC      Prob_method  RC-AUC_pval  RC-AUC_fdr  RC-AUC_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  364.765125  perm         0.01         0.033333    0.394994
19  MSigDB_Hallmark_2020|Apical Junction               Case vs. Control  290.515232  perm         0.01         0.033333    0.394994
18  MSigDB_Hallmark_2020|Interferon Gamma Response     Case vs. Control  279.576303  perm         0.01         0.033333    0.394994
15  MSigDB_Hallmark_2020|Myogenesis                    Case vs. Control  252.340210  perm         0.01         0.033333    0.394994
22  MSigDB_Hallmark_2020|Complement                    Case vs. Control  229.770458  perm         0.01         0.033333    0.394994
%autoreload

df = obj.get_enrich_results(metric='RC-nAUC')
df.head()
    Term                                               Obs               RC-nAUC   Prob_method  RC-nAUC_pval  RC-nAUC_fdr  RC-nAUC_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  1.961103  perm         0.01          0.033333     0.394994
41  MSigDB_Hallmark_2020|Coagulation                   Case vs. Control  1.929050  perm         0.01          0.033333     0.394994
21  MSigDB_Hallmark_2020|Hedgehog Signaling            Case vs. Control  1.777970  perm         0.01          0.033333     0.394994
17  MSigDB_Hallmark_2020|Interferon Alpha Response     Case vs. Control  1.664051  perm         0.01          0.033333     0.394994
19  MSigDB_Hallmark_2020|Apical Junction               Case vs. Control  1.660087  perm         0.01          0.033333     0.394994
%autoreload

df = obj.get_enrich_results(metric='nRC-AUC')
df.head()
    Term                                               Obs               nRC-AUC   Prob_method  nRC-AUC_pval  nRC-AUC_fdr  nRC-AUC_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  0.828970  perm         0.01          0.021739     0.394994
21  MSigDB_Hallmark_2020|Hedgehog Signaling            Case vs. Control  0.818242  perm         0.01          0.021739     0.394994
41  MSigDB_Hallmark_2020|Coagulation                   Case vs. Control  0.806329  perm         0.01          0.021739     0.394994
39  MSigDB_Hallmark_2020|Angiogenesis                  Case vs. Control  0.785538  perm         0.01          0.021739     0.394994
17  MSigDB_Hallmark_2020|Interferon Alpha Response     Case vs. Control  0.758177  perm         0.01          0.021739     0.394994
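
The _fdr and _sidak columns hold multiplicity-adjusted p-values. Judging by the column names, a natural reading is Benjamini-Hochberg FDR and the Šidák correction, but that is an assumption about GREA's internals. The sketch below implements both textbook procedures on toy p-values chosen so the arithmetic lines up with the KS-ES table above (50 terms, 36 of them at p = 0.01); the p-values of the remaining terms are not shown in this document and are made up here.

```python
def sidak(pvals):
    """Sidak adjustment: 1 - (1 - p)^m for m tests."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# 50 tests: 36 at p = 0.01, the rest at a made-up p = 0.5.
pvals = [0.01] * 36 + [0.5] * 14
sidak_adj = sidak(pvals)
bh_adj = benjamini_hochberg(pvals)
print(round(sidak_adj[0], 6))  # 0.394994
print(round(bh_adj[0], 6))     # 0.013889
```

Under this reading, the Šidák value depends only on the raw p-value and the number of terms, which would explain why it is identical across metrics, while the FDR value changes with how many terms share the smallest p-value.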

Step 5: Visualize Enrichment Results

To visualize the enrichment results, call the pl_running_sum(metric, term, pheno_id) method with the desired metric, term, and target phenotype.

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('KS-ES', term, pheno_id)
fig
[Figure: pl_running_sum output for metric 'KS-ES', term 'MSigDB_Hallmark_2020|Hedgehog Signaling', phenotype 'Case vs. Control']

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('KS-ESD', term, pheno_id)
fig
[Figure: pl_running_sum output for metric 'KS-ESD', term 'MSigDB_Hallmark_2020|Hedgehog Signaling', phenotype 'Case vs. Control']

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('RC-AUC', term, pheno_id)
fig
[Figure: pl_running_sum output for metric 'RC-AUC', term 'MSigDB_Hallmark_2020|Hedgehog Signaling', phenotype 'Case vs. Control']

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('RC-nAUC', term, pheno_id)
fig
[Figure: pl_running_sum output for metric 'RC-nAUC', term 'MSigDB_Hallmark_2020|Hedgehog Signaling', phenotype 'Case vs. Control']

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('nRC-AUC', term, pheno_id)
fig
[Figure: pl_running_sum output for metric 'nRC-AUC', term 'MSigDB_Hallmark_2020|Hedgehog Signaling', phenotype 'Case vs. Control']