pheno_prerank_enrich() Enrichment: Gene List with Rank Scores for Phenotypes

Step 1: Prepare rank_df

Begin by preparing a list of genes along with their preranked scores. Store this information in a DataFrame called rank_df.

  • The index of rank_df should be the gene symbols.
  • Each column of rank_df should correspond to one target phenotype (e.g., Case vs. Control).

This format lets pheno_prerank_enrich run enrichment analysis on gene-level statistics that are ranked separately for each phenotype.
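As a minimal sketch of the expected layout, the snippet below builds a toy rank_df by hand. The gene names and scores are made up for illustration, and sorting by score is an assumption made for readability, not a documented requirement of GREA.

```python
import pandas as pd

# Hypothetical gene-level statistics (e.g., t-statistics); values are invented.
scores = {"GENE_A": 2.5, "GENE_B": -1.2, "GENE_C": 0.3, "GENE_D": -3.1}

# Index = gene symbols, one column per phenotype.
toy_rank_df = pd.DataFrame({"Case vs. Control": pd.Series(scores)})

# Sorting is optional here; it only makes the ranking easy to inspect.
toy_rank_df = toy_rank_df.sort_values("Case vs. Control")

# Basic sanity checks: unique gene symbols and no missing scores.
assert toy_rank_df.index.is_unique
assert toy_rank_df.notna().all().all()

print(toy_rank_df)
```

Checking for duplicate symbols and missing values before running the enrichment tends to surface input problems early.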

%load_ext autoreload

import pandas as pd
rank_df = pd.read_csv("grea/data/ageing_muscle_gtex.tsv", index_col=0)
rank_df.columns = ['Case vs. Control']
rank_df.head()
        Case vs. Control
0
ADO            -7.833439
CHUK           -7.800920
GOLGA4         -7.787221
EIF3J          -7.716298
GID4           -7.551146

Step 2: Preparing Gene Set Libraries

There are several ways to prepare gene set libraries for use in GREA:

Option 1: Use Built-in Libraries

Simply specify the libraries you're interested in as a list. For example:

libraries = ['KEGG_2021_Human', 'MSigDB_Hallmark_2020']

You can use the grea.library.list_libraries() function to view all available pathway libraries included in GREA.

Option 2: Load from GMT File

You can load external gene set libraries from .gmt files using:

libraries = read_gmt('your_library_file.gmt')
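For orientation, a GMT file is tab-separated text with one gene set per line: the term name, a description field (often empty), and then the member genes. The sketch below is a hypothetical stand-alone parser named parse_gmt that illustrates the format; it is not GREA's read_gmt, whose exact behavior is not shown here.

```python
import os
import tempfile


def parse_gmt(path):
    """Parse a GMT file into {term: [genes]} (illustrative helper, not GREA's API)."""
    library = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            term, _description, *genes = fields
            library[term] = [gene for gene in genes if gene]
    return library


# Write a tiny two-term GMT file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    gmt_path = os.path.join(tmp_dir, "toy.gmt")
    with open(gmt_path, "w", encoding="utf-8") as handle:
        handle.write("term1\tdemo set\tGOLGA4\tGID4\n")
        handle.write("term2\t\tADO\tCHUK\n")
    toy_library = parse_gmt(gmt_path)

print(toy_library)
```

The resulting dictionary has the same shape as a custom library defined directly in Python.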

Option 3: Define a Custom Library

Create your own gene set library using a Python dictionary, where each key is a pathway name and the corresponding value is a list of genes:

libraries = {
    'term1': ['GOLGA4', 'GID4'],
    'term2': ['ADO', 'CHUK']
}
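Before running enrichment with a custom library, it can help to check how many genes in each term actually appear in the ranking. The helper below, filter_library, is a hypothetical illustration; whether GREA performs a similar filtering step internally is not stated here.

```python
def filter_library(library, ranked_genes, min_size=1):
    """Keep only genes present in the ranking; drop terms smaller than min_size."""
    universe = set(ranked_genes)
    filtered = {}
    for term, genes in library.items():
        kept = [gene for gene in genes if gene in universe]
        if len(kept) >= min_size:
            filtered[term] = kept
    return filtered


# Gene symbols taken from the head of rank_df shown in Step 1.
ranked_genes = ["ADO", "CHUK", "GOLGA4", "EIF3J", "GID4"]

custom_library = {
    "term1": ["GOLGA4", "GID4"],
    "term2": ["ADO", "CHUK", "NOT_IN_RANKING"],  # one gene absent from the ranking
    "term3": ["NOT_IN_RANKING"],                 # no overlap at all
}

usable_library = filter_library(custom_library, ranked_genes)
print(usable_library)
```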

from grea.library import list_libraries

print(list_libraries())
['GeneSigDB', 'Enrichr_Submissions_TF-Gene_Coocurrence', 'SysMyo_Muscle_Gene_Sets', 'WikiPathway_2021_Human', 'HomoloGene', 'WikiPathways_2013', 'PFOCR_Pathways_2023', 'OMIM_Disease', 'Data_Acquisition_Method_Most_Popular_Genes', 'NIBR_Jensen_DISEASES_Curated_2025', 'Cancer_Cell_Line_Encyclopedia', 'WikiPathways_2016', 'WikiPathways_2015', 'RNAseq_Automatic_GEO_Signatures_Human_Up', 'Human_Gene_Atlas', 'KOMP2_Mouse_Phenotypes_2022', 'MoTrPAC_2023', 'Kinase_Perturbations_from_GEO_down', 'Disease_Signatures_from_GEO_down_2014', 'Disease_Perturbations_from_GEO_up', 'Old_CMAP_down', 'MCF7_Perturbations_from_GEO_up', 'NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions', 'DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019', 'PPI_Hub_Proteins', 'Disease_Signatures_from_GEO_up_2014', 'GTEx_Tissue_Expression_Up', 'NIBR_DRUGseq_2025_down', 'L1000_Kinase_and_GPCR_Perturbations_up', 'ARCHS4_Cell-lines', 'VirusMINT', 'KEGG_2019_Human', 'ARCHS4_Tissues', 'MGI_Mammalian_Phenotype_Level_4', 'The_Kinase_Library_2024', 'The_Kinase_Library_2023', 'MGI_Mammalian_Phenotype_Level_3', 'InterPro_Domains_2019', 'WikiPathways_2024_Mouse', 'DRUGseq_2025_up', 'KEGG_2015', 'MSigDB_Computational', 'KEGG_2013', 'TF-LOF_Expression_from_GEO', 'GWAS_Catalog_2019', 'KEGG_2016', 'NCI-Nature_2015', 'NCI-Nature_2016', 'CCLE_Proteomics_2020', 'PheWeb_2019', 'GeDiPNet_2023', 'RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO', 'LINCS_L1000_Chem_Pert_down', 'Old_CMAP_up', 'LINCS_L1000_Ligand_Perturbations_down', 'Enrichr_Users_Contributed_Lists_2020', 'NIH_Funded_PIs_2017_Human_GeneRIF', 'Jensen_TISSUES', 'Azimuth_Cell_Types_2021', 'DisGeNET', 'Panther_2016', 'LINCS_L1000_Ligand_Perturbations_up', 'Rare_Diseases_AutoRIF_Gene_Lists', 'Achilles_fitness_increase', 'TargetScan_microRNA', 'Panther_2015', 'WikiPathways_2019_Mouse', 'ARCHS4_TFs_Coexp', 'LINCS_L1000_Chem_Pert_up', 'MSigDB_Oncogenic_Signatures', 'Gene_Perturbations_from_GEO_down', 'Table_Mining_of_CRISPR_Studies', 'Rare_Diseases_GeneRIF_Gene_Lists', 
'Ligand_Perturbations_from_GEO_down', 'SILAC_Phosphoproteomics', 'Ligand_Perturbations_from_GEO_up', 'Drug_Perturbations_from_GEO_up', 'SynGO_2022', 'MGI_Mammalian_Phenotype_Level_4_2024', 'SynGO_2024', 'Allen_Brain_Atlas_10x_scRNA_2021', 'MGI_Mammalian_Phenotype_Level_4_2021', 'ClinVar_2019', 'GWAS_Catalog_2023', 'MAGMA_Drugs_and_Diseases', 'KEA_2015', 'KEA_2013', 'Microbe_Perturbations_from_GEO_up', 'Chromosome_Location', 'COVID-19_Related_Gene_Sets_2021', 'MGI_Mammalian_Phenotype_Level_4_2019', 'DRUGseqr_2025_down', 'ARCHS4_IDG_Coexp', 'NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions', 'Jensen_DISEASES_Experimental_2025', 'lncHUB_lncRNA_Co-Expression', 'PerturbAtlas', 'DrugMatrix', 'Virus_Perturbations_from_GEO_down', 'huMAP', 'L1000_Kinase_and_GPCR_Perturbations_down', 'Elsevier_Pathway_Collection', 'NIH_Funded_PIs_2017_Human_AutoRIF', 'Diabetes_Perturbations_GEO_2022', 'ESCAPE', 'RNAseq_Automatic_GEO_Signatures_Mouse_Down', 'UK_Biobank_GWAS_v1', 'Aging_Perturbations_from_GEO_up', 'Human_Phenotype_Ontology', 'Jensen_DISEASES_Curated_2025', 'Proteomics_Drug_Atlas_2023', 'dbGaP', 'SubCell_BarCode', 'Transcription_Factor_PPIs', 'GO_Cellular_Component_2017b', 'HuBMAP_ASCTplusB_augmented_2022', 'MSigDB_Hallmark_2020', 'GlyGen_Glycosylated_Proteins_2022', 'MAGNET_2023', 'CellMarker_2024', 'BioPlanet_2019', 'HDSigDB_Human_2021', 'GTEx_Tissues_V8_2023', 'GTEx_Tissue_Expression_Down', 'Metabolomics_Workbench_Metabolites_2022', 'Tissue_Protein_Expression_from_Human_Proteome_Map', 'Epigenomics_Roadmap_HM_ChIP-seq', 'PhenGenI_Association_2021', 'MCF7_Perturbations_from_GEO_down', 'ProteomicsDB_2020', 'Virus-Host_PPI_P-HIPSTer_2020', 'OMIM_Expanded', 'Reactome_2022', 'Genes_Associated_with_NIH_Grants', 'CellMarker_Augmented_2021', 'ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X', 'HDSigDB_Mouse_2021', 'Jensen_COMPARTMENTS', 'ChEA_2015', 'ChEA_2016', 'KEGG_2019_Mouse', 'ChEA_2022', 'LINCS_L1000_Chem_Pert_Consensus_Sigs', 'Drug_Perturbations_from_GEO_2014', 
'TargetScan_microRNA_2017', 'KEGG_2021_Mouse', 'Allen_Brain_Atlas_down', 'WikiPathways_2019_Human', 'Reactome_2013', 'BioCarta_2013', 'Rummagene_transcription_factors', 'Gene_Perturbations_from_GEO_up', 'GO_Cellular_Component_2015', 'Rummagene-signatures', 'GO_Cellular_Component_2013', 'BioCarta_2016', 'NIBR_Jensen_DISEASES_Experimental_2025', 'BioCarta_2015', 'Reactome_2015', 'Reactome_2016', 'GO_Cellular_Component_2018', 'GO_Cellular_Component_2017', 'GO_Cellular_Component_2023', 'GO_Cellular_Component_2021', 'WikiPathway_2021_Mouse', 'ENCODE_TF_ChIP-seq_2015', 'ENCODE_TF_ChIP-seq_2014', 'RNAseq_Automatic_GEO_Signatures_Mouse_Up', 'GO_Molecular_Function_2017b', 'DRUGseq_2025_down', 'FANTOM6_lncRNA_KD_DEGs', 'MGI_Mammalian_Phenotype_2013', 'GO_Cellular_Component_2025', 'HMS_LINCS_KinomeScan', 'NCI-60_Cancer_Cell_Lines', 'Azimuth_2023', 'MGI_Mammalian_Phenotype_2017', 'Rare_Diseases_GeneRIF_ARCHS4_Predictions', 'Virus_Perturbations_from_GEO_up', 'PFOCR_Pathways', 'IDG_Drug_Targets_2022', 'Enrichr_Libraries_Most_Popular_Genes', 'Orphanet_Augmented_2021', 'NIBR_DRUGseq_2025_up', 'GO_Biological_Process_2021', 'TRANSFAC_and_JASPAR_PWMs', 'Reactome_Pathways_2024', 'GO_Biological_Process_2023', 'Rare_Diseases_AutoRIF_ARCHS4_Predictions', 'COVID-19_Related_Gene_Sets', 'Kinase_Perturbations_from_GEO_up', 'Descartes_Cell_Types_and_Tissue_2021', 'Tabula_Muris', 'Tabula_Sapiens', 'GO_Biological_Process_2025', 'TF_Perturbations_Followed_by_Expression', 'Rummagene_kinases', 'GTEx_Aging_Signatures_2021', 'WikiPathways_2024_Human', 'Tissue_Protein_Expression_from_ProteomicsDB', 'DGIdb_Drug_Targets_2024', 'Serine_Threonine_Kinome_Atlas_2023', 'Aging_Perturbations_from_GEO_down', 'DepMap_CRISPR_GeneDependency_CellLines_2023', 'GO_Biological_Process_2013', 'GO_Biological_Process_2017b', 'GO_Biological_Process_2018', 'CORUM', 'GO_Biological_Process_2015', 'Phosphatase_Substrates_from_DEPOD', 'BioPlex_2017', 'TRRUST_Transcription_Factors_2019', 'GO_Biological_Process_2017', 
'Pfam_InterPro_Domains', 'HuBMAP_ASCT_plus_B_augmented_w_RNAseq_Coexpression', 'Pfam_Domains_2019', 'WikiPathway_2023_Human', 'Allen_Brain_Atlas_up', 'Genome_Browser_PWMs', 'NURSA_Human_Endogenous_Complexome', 'HumanCyc_2015', 'HumanCyc_2016', 'Rummagene_signatures', 'Chromosome_Location_hg19', 'Mouse_Gene_Atlas', 'ChEA_2013', 'miRTarBase_2017', 'GO_Molecular_Function_2023', 'Jensen_DISEASES', 'RNAseq_Automatic_GEO_Signatures_Human_Down', 'GO_Molecular_Function_2025', 'Rummagene-transcription-factors', 'ARCHS4_Kinases_Coexp', 'Microbe_Perturbations_from_GEO_down', 'DRUGseqr_2025_up', 'PanglaoDB_Augmented_2021', 'ENCODE_Histone_Modifications_2013', 'ENCODE_Histone_Modifications_2015', 'Achilles_fitness_decrease', 'DSigDB', 'DepMap_WG_CRISPR_Screens_Broad_CellLines_2019', 'Disease_Perturbations_from_GEO_down', 'Drug_Perturbations_from_GEO_down', 'GO_Molecular_Function_2021', 'GO_Molecular_Function_2017', 'GO_Molecular_Function_2018', 'Mitchell_Proteomics_Drug_Atlas_2023', 'GO_Molecular_Function_2013', 'GO_Molecular_Function_2015', 'Rummagene-kinases', 'TG_GATES_2020', 'KEGG_2021_Human', 'HMDB_Metabolites', 'LINCS_L1000_CRISPR_KO_Consensus_Sigs']

Step 3: Run Enrichment

To perform enrichment analysis, call the grea.pheno_prerank_enrich(rank_df, libraries) function. You can customize the analysis using the following arguments:

  • prob_method: Method for p-value calculation. Currently supports 'perm' for permutation-based testing.
  • n_perm: Number of permutations to use for estimating the null distribution.
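To see why n_perm matters, the sketch below estimates a permutation p-value for a simple statistic (the mean score of a gene set) on made-up data. It uses the common (extreme + 1) / (n_perm + 1) convention, which is an assumption and not necessarily what GREA computes. Under that convention, the smallest attainable p-value with n_perm = 10 is 1/11 ≈ 0.091, so a small n_perm limits the resolution of a purely empirical estimate.

```python
import random


def perm_pvalue(scores, gene_set, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the mean score of gene_set (illustrative)."""
    rng = random.Random(seed)
    genes = list(scores)
    members = [gene for gene in genes if gene in gene_set]
    observed = sum(scores[gene] for gene in members) / len(members)

    extreme = 0
    for _ in range(n_perm):
        # Null model: a random gene set of the same size.
        sample = rng.sample(genes, len(members))
        null_stat = sum(scores[gene] for gene in sample) / len(sample)
        if abs(null_stat) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical scores from -50 to 50; the gene set holds the five highest.
toy_scores = {f"G{i}": float(i - 50) for i in range(101)}
top_set = {"G96", "G97", "G98", "G99", "G100"}

p_few = perm_pvalue(toy_scores, top_set, n_perm=10)
p_many = perm_pvalue(toy_scores, top_set, n_perm=2000)
print(p_few, p_many)
```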

The function returns a GREA object containing all enrichment results, including enrichment scores and statistical significance for each library term.

%autoreload

from grea import grea
libraries = ['MSigDB_Hallmark_2020']
n_perm = 10
prob_method = 'perm'
obj = grea.pheno_prerank_enrich(rank_df, libraries, n_perm=n_perm, prob_method=prob_method)
obj
---Finished: Load MSigDB_Hallmark_2020 with 50 terms.
Low numer of permutations can lead to inaccurate p-value estimation. Symmetric Gamma distribution enabled to increase accuracy.

<grea.grea._GREA at 0x126397350>

Step 4: Check Enrichment Results

The GREA object holds the results for every library term. GREA supports three types of enrichment scores, each reflecting a different scoring strategy:

  • 'KS-ES': Kolmogorov–Smirnov-based Enrichment Score, capturing the peak deviation between hit and miss distributions.
  • 'KS-ESD': KS-based Enrichment Score Difference, the sum of the maximum positive and maximum negative deviations of the running score.
  • 'RC-AUC': Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
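The three scores can be sketched on a toy ranking as follows. This is a simplified, unweighted illustration under stated assumptions: each hit adds 1/n_hit, each miss subtracts 1/n_miss, and the recovery curve is integrated with the trapezoid rule. GREA's actual weighting and normalization are not described here and may differ.

```python
def running_sum(ranked_genes, gene_set):
    """Unweighted KS-style running sum along a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene_set must cover some, but not all, ranked genes")
    total, curve = 0.0, []
    for is_hit in hits:
        total += 1.0 / n_hit if is_hit else -1.0 / n_miss
        curve.append(total)
    return curve


def ks_es(curve):
    """Peak deviation of the running sum from zero (signed)."""
    return max(curve, key=abs)


def ks_esd(curve):
    """Sum of the maximum positive and maximum negative deviations."""
    return max(max(curve), 0.0) + min(min(curve), 0.0)


def rc_auc(ranked_genes, gene_set):
    """Area under the recovery curve (fraction of the set recovered vs. rank)."""
    n_hit = sum(gene in gene_set for gene in ranked_genes)
    recovered, area, previous = 0, 0.0, 0.0
    for gene in ranked_genes:
        if gene in gene_set:
            recovered += 1
        fraction = recovered / n_hit
        area += (previous + fraction) / 2.0  # trapezoid between adjacent ranks
        previous = fraction
    return area / len(ranked_genes)


ranked = ["A", "B", "C", "D", "E", "F"]
top_curve = running_sum(ranked, {"A", "B"})    # set concentrated at the top
split_curve = running_sum(ranked, {"A", "F"})  # set split between both ends

print(ks_es(top_curve), ks_esd(top_curve), rc_auc(ranked, {"A", "B"}))
print(ks_es(split_curve), ks_esd(split_curve), rc_auc(ranked, {"A", "F"}))
```

In the split case the positive and negative peaks cancel in the ESD-style score while the ES-style score still reports a peak, which illustrates how the two KS variants can disagree.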

You can select the appropriate metric depending on your analysis goal or data characteristics.

To retrieve the enrichment results as a long-format DataFrame, call get_enrich_results(metric) on the GREA object.

%autoreload

df = obj.get_enrich_results(metric='KS-ES')
df.head()
    Term                                               Obs               KS-ES     Prob_method  KS-ES_pval  N_lead_sigs  Lead_sigs                                          KS-ES_fdr  KS-ES_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  0.586199  perm         0.01        104          HTRA1;IL15;MGP;PCOLCE;BGN;CCN1;CTHRC1;VIM;IGFB...  0.013889   0.394994
21  MSigDB_Hallmark_2020|Hedgehog Signaling            Case vs. Control  0.568802  perm         0.01        20           MYH9;PML;THY1;NRP2;ETS2;CRMP1;L1CAM;OPHN1;ACHE...  0.013889   0.394994
39  MSigDB_Hallmark_2020|Angiogenesis                  Case vs. Control  0.563786  perm         0.01        19           LUM;VAV2;TIMP1;JAG2;STC1;APP;TNFRSF21;VCAN;POS...  0.013889   0.394994
41  MSigDB_Hallmark_2020|Coagulation                   Case vs. Control  0.530405  perm         0.01        39           MST1;CAPN2;HTRA1;PLAT;A2M;C3;GSN;CRIP2;CFD;C2;...  0.013889   0.394994
49  MSigDB_Hallmark_2020|Pancreas Beta Cells           Case vs. Control  0.504512  perm         0.01        7            STXBP1;ABCC8;LMO2;SRP14;ELP4;DPP4;PAK3             0.013889   0.394994

%autoreload

df = obj.get_enrich_results(metric='KS-ESD')
df.head()
    Term                                               Obs               KS-ESD    Prob_method  KS-ESD_pval  N_lead_sigs  Lead_sigs                                          KS-ESD_fdr  KS-ESD_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  0.582130  perm         0.01         104          HTRA1;IL15;MGP;PCOLCE;BGN;CCN1;CTHRC1;VIM;IGFB...  0.012821    0.394994
21  MSigDB_Hallmark_2020|Hedgehog Signaling            Case vs. Control  0.558907  perm         0.01         20           MYH9;PML;THY1;NRP2;ETS2;CRMP1;L1CAM;OPHN1;ACHE...  0.012821    0.394994
39  MSigDB_Hallmark_2020|Angiogenesis                  Case vs. Control  0.541065  perm         0.01         19           LUM;VAV2;TIMP1;JAG2;STC1;APP;TNFRSF21;VCAN;POS...  0.012821    0.394994
41  MSigDB_Hallmark_2020|Coagulation                   Case vs. Control  0.529377  perm         0.01         39           MST1;CAPN2;HTRA1;PLAT;A2M;C3;GSN;CRIP2;CFD;C2;...  0.012821    0.394994
17  MSigDB_Hallmark_2020|Interferon Alpha Response     Case vs. Control  0.476122  perm         0.01         52           CD74;IL15;MOV10;LGALS3BP;IFITM1;DHX58;UBA7;IFI...  0.012821    0.394994

%autoreload

df = obj.get_enrich_results(metric='RC-AUC')
df.head()
    Term                                               Obs               RC-AUC    Prob_method  RC-AUC_pval  RC-AUC_fdr  RC-AUC_sidak
29  MSigDB_Hallmark_2020|Epithelial Mesenchymal Tr...  Case vs. Control  0.828970  perm         0.01         0.021739    0.394994
21  MSigDB_Hallmark_2020|Hedgehog Signaling            Case vs. Control  0.818242  perm         0.01         0.021739    0.394994
41  MSigDB_Hallmark_2020|Coagulation                   Case vs. Control  0.806329  perm         0.01         0.021739    0.394994
39  MSigDB_Hallmark_2020|Angiogenesis                  Case vs. Control  0.785538  perm         0.01         0.021739    0.394994
17  MSigDB_Hallmark_2020|Interferon Alpha Response     Case vs. Control  0.758177  perm         0.01         0.021739    0.394994
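
The _sidak and _fdr columns in the outputs above are numerically consistent with a Šidák correction and a Benjamini-Hochberg adjustment over the 50 Hallmark terms. This is an inference from the printed numbers, not a statement about GREA's source code. The sketch below reproduces the arithmetic; the p-value vector is hypothetical (36 terms at p = 0.01, the rest at 1.0), chosen only because 0.01 × 50 / 36 matches the KS-ES_fdr value shown.

```python
def sidak(p_value, n_tests):
    """Sidak-adjusted p-value: probability of at least one hit among n_tests."""
    return 1.0 - (1.0 - p_value) ** n_tests


def bh_fdr(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n_tests = len(p_values)
    order = sorted(range(n_tests), key=lambda i: p_values[i])
    adjusted = [0.0] * n_tests
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n_tests, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n_tests / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for 50 terms: 36 tied at 0.01, 14 at 1.0.
toy_pvals = [0.01] * 36 + [1.0] * 14
toy_fdr = bh_fdr(toy_pvals)

print(round(sidak(0.01, 50), 6))  # 0.394994
print(round(toy_fdr[0], 6))       # 0.013889
```

Because Šidák depends only on the p-value and the number of tests, every term with p = 0.01 receives the same adjusted value, which is why that column is constant across the rows shown.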

Step 5: Visualize Enrichment Results

To visualize the enrichment results, use the pl_running_sum(metric, term, pheno_id) function by specifying the desired metric, term, and target phenotype.

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('KS-ES', term, pheno_id)
fig
[Figure: output of pl_running_sum for metric 'KS-ES', term 'MSigDB_Hallmark_2020|Hedgehog Signaling']

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('KS-ESD', term, pheno_id)
fig
[Figure: output of pl_running_sum for metric 'KS-ESD', term 'MSigDB_Hallmark_2020|Hedgehog Signaling']

%autoreload
term = 'MSigDB_Hallmark_2020|Hedgehog Signaling'
pheno_id = 'Case vs. Control'
fig = obj.pl_running_sum('RC-AUC', term, pheno_id)
fig
[Figure: output of pl_running_sum for metric 'RC-AUC', term 'MSigDB_Hallmark_2020|Hedgehog Signaling']