obs_prerank_enrich()
Enrichment: Gene List with Rank Scores for Observations
Step 1: Prepare rank_df
Begin by preparing a list of genes along with their preranked scores (e.g., gene expression), and store this information in a DataFrame called rank_df.
- The index of rank_df should contain the gene symbols.
- Each column of rank_df should correspond to one observation ID (e.g., a sample or cell identifier).
This format allows obs_prerank_enrich() to perform enrichment analysis using observation-level ranked gene statistics.
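For reference, here is a minimal sketch of the expected layout built from toy values (the gene symbols, observation IDs, and scores below are illustrative only):
import pandas as pd
# Toy rank_df: gene symbols as the row index, one column per observation ID.
rank_df = pd.DataFrame(
    {
        "Sample_1": [2.3, -0.7, 1.1],   # preranked scores (e.g., expression) for Sample_1
        "Sample_2": [0.4, 1.9, -1.2],   # preranked scores for Sample_2
    },
    index=["A1cf", "A2ml1", "A4galt"],  # gene symbols
)
In this tutorial, we instead load a real expression matrix (GSE13268) from file: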
%load_ext autoreload
import pandas as pd
rank_df = pd.read_csv("grea/db/GSE13268_expr.csv", index_col=0)
rank_df.head()
Step 2: Preparing Gene Set Libraries
There are several ways to prepare gene set libraries for use in GREA:
Option 1: Use Built-in Libraries
Simply specify the libraries you're interested in as a list. For example:
libraries = ['KEGG_2019_Mouse', 'WikiPathways_2024_Mouse']
You can use the grea.library.list_libraries() function to view all available pathway libraries included in GREA.
Option 2: Load from GMT File
You can load external gene set libraries from .gmt files using:
libraries = read_gmt('your_library_file.gmt')
Option 3: Define a Custom Library
Create your own gene set library using a Python dictionary, where each key is a pathway name and the corresponding value is a list of genes:
libraries = {
'term1': ['A2ml1', 'A1cf'],
'term2': ['A3galt2', 'A4galt']
}
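For context, a .gmt file is a plain-text, tab-separated format in which each line holds a term name, a description, and then the member genes, which maps directly onto the dictionary structure shown in Option 3. The snippet below is an illustrative sketch of that mapping, not GREA's read_gmt implementation:
def parse_gmt(path):
    # Each GMT line: term <TAB> description <TAB> gene1 <TAB> gene2 ...
    libraries = {}
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            term, genes = fields[0], fields[2:]   # fields[1] is the description
            libraries[term] = genes
    return libraries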
%autoreload
from grea.library import list_libraries
print(list_libraries())
Step 3: Run Enrichment
To perform enrichment analysis, call the grea.obs_prerank_enrich(rank_df, libraries) function.
For efficiency, observation-level enrichment does not estimate p-values.
The function returns a GREA object containing all enrichment results, including the enrichment score of each library term for each observation.
%autoreload
from grea import grea
libraries = ['WikiPathways_2024_Mouse']
obj = grea.obs_prerank_enrich(rank_df, libraries)
obj
Step 4: Check Enrichment Results
The GREA object stores all enrichment results, including the enrichment score of each library term for each observation. GREA supports five types of enrichment scores, each reflecting a different scoring strategy:
- 'KS-ES': Kolmogorov–Smirnov-based Enrichment Score, capturing the peak deviation between hit and miss distributions.
- 'KS-ESD': KS-based Enrichment Score Difference, the sum of the maximum positive and negative deviations of the running score.
- 'RC-AUC': Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
- 'RC-nAUC': Normalized Area Under the Recovery Curve, summarizing early enrichment of target genes along the ranking.
- 'nRC-AUC': Area Under the normalized Recovery Curve, summarizing early enrichment of target genes along the ranking; ranges from 0 to 1.
You can select the appropriate metric depending on your analysis goal or data characteristics.
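To make these definitions concrete, the following NumPy sketch computes the three underlying quantities on a toy ranked gene list. It is illustrative only and not GREA's internal implementation; the toy genes, the two-gene set, and the unweighted running sum are assumptions for demonstration:
import numpy as np
# Toy ranking (highest score first) and a toy gene set.
genes = np.array(["g1", "g2", "g3", "g4", "g5", "g6"])
hits = np.isin(genes, ["g1", "g4"])            # is each ranked gene in the set?
# Running sum: step up at hits, step down at misses (unweighted for simplicity).
up, down = 1.0 / hits.sum(), 1.0 / (~hits).sum()
running = np.cumsum(np.where(hits, up, -down))
ks_es = running[np.argmax(np.abs(running))]    # peak deviation, as in 'KS-ES'
ks_esd = running.max() + running.min()         # sum of extreme deviations, as in 'KS-ESD'
# Recovery curve: fraction of the gene set recovered after each rank position.
recovery = np.cumsum(hits) / hits.sum()
rc_auc = recovery.mean()                       # area under the recovery curve, as in 'RC-AUC'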
To retrieve the enrichment results as a long DataFrame, use the get_enrich_results(metric) function; to retrieve them as a wide DataFrame of scores, use get_enrich_score(metric).
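As an optional, hypothetical post-processing step (not part of the GREA API shown here), you can rank terms per observation directly from the wide score matrix with pandas. This assumes the terms are the rows and the observation IDs are the columns of the returned DataFrame; inspect its index and columns and transpose if the orientation differs:
scores = obj.get_enrich_score(metric='KS-ES')
# Five highest-scoring terms for each observation (assumed orientation: terms x observations).
top_terms = {obs: scores[obs].nlargest(5).index.tolist() for obs in scores.columns}
top_terms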
%autoreload
df = obj.get_enrich_results(metric='KS-ES')
df
%autoreload
df = obj.get_enrich_score(metric='KS-ES')
df
%autoreload
df = obj.get_enrich_results(metric='KS-ESD')
df
%autoreload
df = obj.get_enrich_score(metric='KS-ESD')
df
%autoreload
df = obj.get_enrich_results(metric='RC-AUC')
df
%autoreload
df = obj.get_enrich_score(metric='RC-AUC')
df
%autoreload
df = obj.get_enrich_results(metric='RC-nAUC')
df
%autoreload
df = obj.get_enrich_score(metric='RC-nAUC')
df
Step 5: Visualize Enrichment Results
To visualize the enrichment results, use the pl_running_sum(metric, term, obs_id) function, specifying the desired metric, term, and observation ID.
%autoreload
term = 'WikiPathways_2024_Mouse|TCA Cycle WP434'
obs_id = 'GSM334850'
fig = obj.pl_running_sum('KS-ES', term, obs_id)
fig
%autoreload
term = 'WikiPathways_2024_Mouse|TCA Cycle WP434'
obs_id = 'GSM334850'
fig = obj.pl_running_sum('KS-ESD', term, obs_id)
fig
%autoreload
term = 'WikiPathways_2024_Mouse|TCA Cycle WP434'
obs_id = 'GSM334850'
fig = obj.pl_running_sum('RC-AUC', term, obs_id)
fig
%autoreload
term = 'WikiPathways_2024_Mouse|TCA Cycle WP434'
obs_id = 'GSM334850'
fig = obj.pl_running_sum('RC-nAUC', term, obs_id)
fig
%autoreload
term = 'WikiPathways_2024_Mouse|TCA Cycle WP434'
obs_id = 'GSM334850'
fig = obj.pl_running_sum('nRC-AUC', term, obs_id)
fig
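Assuming pl_running_sum returns a standard Matplotlib figure (as suggested by the fig variable above), the plot can be saved to disk in the usual way; the file name here is just an example:
# Save the last running-sum plot; assumes fig is a matplotlib Figure.
fig.savefig("running_sum_TCA_Cycle_WP434.png", dpi=300, bbox_inches="tight")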