Differential Gene Expression Analysis in Alzheimer’s Disease

  • Client or Institution: Reproducible Research Project
  • Project Goal: Identify and visualize differentially expressed genes (DEGs) in Alzheimer’s disease using RNA-Seq data and statistical methods
  • Tools Used: R, DESeq2, Ggplot2, ComplexHeatmap, Python (preprocessing)
  • Duration: 3 weeks
  • Outcome: Discovered top DEGs in Alzheimer’s disease; created a public Shiny app to interactively explore MA plots, dispersion, heatmaps, PCA, and gene counts
  • Dataset: GSE53697
  • Web App: Explore Webapp

From Preprocessing to Shiny App Exploration Using DESeq2

Abstract

This project presents a full workflow for differential gene expression analysis (DGE) using RNA-Seq data from the GSE53697 dataset. After performing quantile normalization and exploratory analysis using PCA, outliers were detected and excluded. Differential analysis was then carried out using the DESeq2 package in R.

The results were visualized through a variety of plots—MA, dispersion, volcano, PCA, and heatmaps—highlighting key transcriptional changes between Alzheimer’s patients and healthy controls. A Shiny dashboard was also created to allow collaborators and researchers to explore the data interactively.

🧠 Background & Problem Statement

Understanding how gene expression is altered in Alzheimer’s is crucial for identifying potential biomarkers and therapeutic targets. However, such datasets are high-dimensional and noisy, and require careful preprocessing, normalization, and statistical testing.

This project addresses the full spectrum of this challenge:

  • Preprocess the GSE53697 RNA-Seq dataset
  • Remove noise/outliers
  • Perform DGE analysis
  • Deliver interactive plots for better interpretation

🗂 Dataset

GSE53697 is a public gene expression dataset from the GEO repository. It contains RNA-Seq data from both Alzheimer’s and Control human samples.

📌 Preprocessing

  • Quantile normalization (via preprocessCore in R)
  • PCA used to identify and exclude strong outliers
  • Raw counts retained for DESeq2 analysis

⚙️ Workflow Overview

1. Exploratory Data Analysis (EDA)

PCA Scatter Plot

PCA was used to detect outlier samples and observe clustering by disease state.

2. Differential Expression Analysis with DESeq2

  • DESeq2 was used to fit models to count data
  • Statistical tests computed log2FoldChange, p-value, and adjusted p-value (padj)
  • Significant DEGs were defined by |log2FC| > 1 and padj < 0.05

📊 Visualizations

📈 Dispersion Plot

Assesses model fit and biological variability.

🧪 MA Plot

Mean of normalized counts vs log2 fold change.

🌋 Volcano Plot

Combines significance and magnitude for gene prioritization.

🧬 Heatmap of Top DEGs

Shows clustered gene expression patterns across conditions and genders.

🧰 Outcome

  • Identified multiple upregulated genes (e.g., CP, CD44, SERPINA3)
  • Built interactive Shiny app to explore:
    • PCA
    • Volcano plots
    • MA & dispersion plots
    • Count plots for any gene
    • Heatmaps with annotation tracks
  • Shiny dashboard uses .rds object and renders plots with ggplot2, plotly, ComplexHeatmap, and shinydashboard

💡 Lessons Learned

  • PCA was essential for quality control and sample exclusion
  • Outlier removal substantially improved the clarity of DE patterns
  • Automating Shiny dashboard deployment provides an efficient way to share complex analyses

💬 Discussion

This project offers a reproducible, interpretable, and shareable pipeline for DGE analysis. It highlights best practices in both statistical rigor (DESeq2, multiple testing correction) and communication (interactive visualization). The analysis aligns well with reproducibility standards and can serve as a blueprint for similar transcriptomics projects.

📣 Call to Action

Got a GEO dataset you’d like to explore?
🔬 I can help you turn raw expression data into clear, interactive insights.
💬 Start a conversation to build your own DESeq2 pipeline or Shiny dashboard.


Discover more from Your Bioinformatics Developer

Subscribe to get the latest posts sent to your email.


Comments

Leave a comment

Discover more from Your Bioinformatics Developer

Subscribe now to keep reading and get access to the full archive.

Continue reading