obs_prerank_enrich()
Enrichment: Gene Interaction List with Rank Scores for Observations
Step 1: Prepare rank_df
Begin by preparing a list of gene interactions along with their preranked scores. Store this information in a DataFrame called rank_df
.
- The index of
rank_df
should be the gene interaction symbols. e.g.A1CF_APOB
- The columns of
rank_df
should correspond to observation IDs (e.g., sample or cell identifier).
This format allows obs_prerank_enrich()
to perform enrichment analysis using observation-level ranked interaction-level statistics.
In this tutorial, we use gene interaction data from the GSE30550 dataset for H3N2-infected subjects sampling across time.
For each sample, gene–gene interactions and their corresponding entropy-based Critical Transition (CT) scores were precomputed by CRISGI (grea/data/GSE30550_H3N2_CRISGI_score.csv
).
%load_ext autoreload
import pandas as pd
rank_df = pd.read_csv("grea/db/GSE30550_H3N2_CRISGI_score.csv", index_col=0)
rank_df.head()
Step 2: Preparing Gene Set Libraries
There are several ways to prepare gene set libraries for use in GREA:
Option 1: Use Built-in Libraries
Simply specify the libraries
you're interested in as a list. For example:
libraries = ['KEGG_2021_Human']
You can use the grea.library.list_libraries()
function to view all available pathway libraries included in GREA.
Option 2: Load from GMT File
You can load external gene set libraries from .gmt
files using:
libraries = read_gmt('your_library_file.gmt')
Option 3: Define a Custom Library
Create your own gene set library using a Python dictionary, where each key is a pathway name and the corresponding value is a list of genes:
libraries = {
'term1': ['A2M', 'APP'],
'term2': ['A1CF', 'APOB']
}
%autoreload
from grea.library import list_libraries
print(list_libraries())
Step 3: Run Enrichment
To perform enrichment analysis, call the grea.obs_prerank_enrich(rank_df, libraries)
function. The observation-level enrichment do not estimate the p-value for efficiency. You can customize the analysis using the following arguments:
sig_sep
: The delimiter used to separate gene names in an interaction string (e.g., setsig_sep='_'
for interactions likeA2M_AMBP
).
The function returns a GREA object containing all enrichment results, including enrichment scores and statistical significance for each library term.
%autoreload
from grea import grea
libraries = ['KEGG_2021_Human']
sig_sep = '_'
obj = grea.obs_prerank_enrich(rank_df, libraries, sig_sep=sig_sep)
obj
Step 4: Check Enrichment Results
The GREA
object stores all enrichment results, including enrichment scores and statistical significance for each library term. GREA supports three types of enrichment scores, each reflecting a different scoring strategy:
'KS-ES'
: Kolmogorov–Smirnov-based Enrichment Score, capturing the peak deviation between hit and miss distributions.'KS-ESD'
: KS-based enrichment Score Difference, the sum of the maximum positive and negative deviations from the running score.'RC-AUC'
: Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.'RC-nAUC'
: The normalized Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.'nRC-AUC'
: Area Under the normalized Recovery Curve, summarizing early enrichment of target genes along the ranking, ranges from 0 to 1.
You can select the appropriate metric depending on your analysis goal or data characteristics.
To retrieve the enrichment results as a long DataFrame, use the get_enrich_results(metric)
function, as a wide DataFrame, use the get_enrich_score(metric)
.
%autoreload
df = obj.get_enrich_results(metric='KS-ES')
df.head()
%autoreload
df = obj.get_enrich_score(metric='KS-ES')
df.head()
%autoreload
df = obj.get_enrich_results(metric='KS-ESD')
df.head()
%autoreload
df = obj.get_enrich_score(metric='KS-ESD')
df.head()
%autoreload
df = obj.get_enrich_results(metric='RC-AUC')
df.head()
%autoreload
df = obj.get_enrich_score(metric='RC-AUC')
df.head()
%autoreload
df = obj.get_enrich_results(metric='RC-nAUC')
df.head()
%autoreload
df = obj.get_enrich_score(metric='RC-nAUC')
df.head()
%autoreload
df = obj.get_enrich_results(metric='nRC-AUC')
df.head()
%autoreload
df = obj.get_enrich_score(metric='nRC-AUC')
df.head()
Step 5: Visualize Enrichment Results
To visualize the enrichment results, use the pl_running_sum(metric, term, obs_id)
function by specifying the desired metric, term, and observation ID.
%autoreload
term = 'KEGG_2021_Human|Oocyte meiosis'
obs_id = 's1 -24h'
fig = obj.pl_running_sum('KS-ES', term, obs_id)
fig
%autoreload
term = 'KEGG_2021_Human|Oocyte meiosis'
obs_id = 's1 -24h'
fig = obj.pl_running_sum('KS-ESD', term, obs_id)
fig
%autoreload
term = 'KEGG_2021_Human|Oocyte meiosis'
obs_id = 's1 -24h'
fig = obj.pl_running_sum('RC-AUC', term, obs_id)
fig
%autoreload
term = 'KEGG_2021_Human|Oocyte meiosis'
obs_id = 's1 -24h'
fig = obj.pl_running_sum('RC-nAUC', term, obs_id)
fig
%autoreload
term = 'KEGG_2021_Human|Oocyte meiosis'
obs_id = 's1 -24h'
fig = obj.pl_running_sum('nRC-AUC', term, obs_id)
fig