Author: Cyrille Njume
-

Benchmarking Germline and Somatic Variant Calling Pipelines Using CoSAP and Galaxy
A Comparative Study with 9 Pipelines on Real Exome Data. Abstract: This project evaluated nine variant calling pipelines using whole exome sequencing (WES) data. It involved combinations of two mappers (BWA, Bowtie2) and four variant callers (DeepVariant, HaplotypeCaller, Strelka, SomaticSniper) across both germline and…
-

From Curiosity to Code: My Path into Bioinformatics
A workshop on computer-aided drug design changed everything. What began as a dream to become a doctor evolved into a passion for bioinformatics. This is the story of how curiosity, coding, and real-world projects helped me find my path, and how I now help others find theirs.
-

Getting Started with Trimmomatic for Illumina Sequencing
Trimmomatic is a vital tool for preprocessing Illumina NGS data, addressing issues like adapter contamination and low-quality bases. It offers various quality filtering methods and supports paired-end reads. As an open-source tool, it enhances mapping accuracy and is essential for accurate downstream analyses in genomics research.
-

Quality Control with MultiQC: A Complete Guide
MultiQC is a bioinformatics tool that consolidates results from multiple samples into a single HTML report, streamlining data analysis from high-throughput sequencing. It supports over 100 tools, providing summary statistics and visualizations. This automation enhances quality control, saves time, and minimizes errors, making it ideal for various genomic workflows.
-

Quality Control of High-Volume Sequencing Data with FastQC: A Complete Guide
High-throughput sequencing technologies have transformed genomics by enabling massive data generation in record time. But with large volumes of data comes the responsibility of ensuring its quality. This is where FastQC steps in, offering a robust quality control solution tailored for sequencing data. What is FastQC? FastQC is a Java-based application designed to analyze sequencing…
-

Exploring Essential Data Types and Formats in Bioinformatics: Origins and Applications
Bioinformatics merges biology and computational science to analyze complex biological data, vital for genomics, transcriptomics, and proteomics. Understanding data types like sequencing and file formats is essential for effective analysis. Each format addresses specific needs, enhancing data management and interpretation, crucial for impactful discoveries in life sciences and personalized medicine.
-

Predicting Malaria Incidence from Climate Data Using Machine Learning
The project aimed to predict malaria incidence using climate and geographical data through machine learning, deploying a Streamlit web app for visualization across 98+ countries. With data from WHO and others, the CatBoost model achieved a 96.7% correlation. It provides easily accessible insights for researchers and policymakers, addressing malaria’s global health challenge.
-

AutomatedMLPack: A Python Package for End-to-End Automated Machine Learning
AutomatedMLPack is a Python package designed for streamlined automated machine learning workflows, enabling efficient data ingestion, model training, and evaluation through a command-line interface. It supports classification and regression tasks, offers flexible feature selection, and provides visualizations and evaluation reports, significantly enhancing productivity in machine learning projects.
-

Building Scalable ML Applications: A Practical Approach
The project involved developing a comprehensive machine learning pipeline for classification and regression tasks, culminating in a Flask-based web application deployed on Azure. It features automated deployment via GitHub Actions, ensuring a user-friendly interface for real-time predictions. Key achievements include a modular pipeline and seamless integration, enhancing accessibility in ML applications.
-

Dynamic Exploratory Data Analysis with Streamlit
The Dynamic Exploratory Data Analysis app simplifies EDA for users of all skill levels by allowing CSV uploads and generating insightful visualizations. Developed with Streamlit, it automates data type detection and offers various analysis modules. Key features include univariate, bivariate, and multivariate visualizations, making data exploration accessible and effective.
