-

Benchmarking Germline and Somatic Variant Calling Pipelines Using CoSAP and Galaxy
A Comparative Study with 9 Pipelines on Real Exome Data. Abstract: This project evaluated nine variant calling pipelines using whole exome sequencing (WES) data. It involved combinations of two…
-

From Curiosity to Code: My Path into Bioinformatics
A workshop on computer-aided drug design changed everything. What began as a dream to become a doctor evolved into a passion for bioinformatics. This is the story of how curiosity, coding, and real-world projects helped me find my path—and how…
-

Getting Started with Trimmomatic for Illumina Sequencing
Trimmomatic is a vital tool for preprocessing Illumina NGS data, addressing issues such as adapter contamination and low-quality bases. It offers several trimming and filtering steps, including sliding-window quality trimming and minimum-length filtering, and it supports paired-end reads. It is open source, and by removing these artifacts it improves mapping accuracy; it is essential for…
-

Quality Control with MultiQC: A Complete Guide
MultiQC is a bioinformatics tool that consolidates analysis results from multiple samples into a single HTML report, streamlining the review of high-throughput sequencing data. It supports over 100 tools, providing summary statistics and visualizations. This automation enhances quality control, saves time, and…
-

Quality Control of High-Volume Sequencing Data with FastQC: A Complete Guide
High-throughput sequencing technologies have transformed genomics by enabling massive data generation in record time. But with large volumes of data comes the responsibility of ensuring its quality. This is where FastQC steps in—offering a robust quality control solution tailored for…
-

Exploring Essential Data Types and Formats in Bioinformatics: Origins and Applications
Bioinformatics merges biology and computational science to analyze complex biological data, which is vital for genomics, transcriptomics, and proteomics. Understanding the main data types, such as sequencing data, and the file formats that store them is essential for effective analysis. Each format addresses specific needs, enhancing data management and interpretation,…
-

Predicting Malaria Incidence from Climate Data Using Machine Learning
The project aimed to predict malaria incidence from climate and geographical data using machine learning, and deployed a Streamlit web app that visualizes the results across 98+ countries. Using data from the WHO and other sources, the CatBoost model achieved a 96.7% correlation. It provides…
-

AutomatedMLPack: A Python Package for End-to-End Automated Machine Learning
AutomatedMLPack is a Python package designed for streamlined automated machine learning workflows, enabling efficient data ingestion, model training, and evaluation through a command-line interface. It supports classification and regression tasks, offers flexible feature selection, and provides visualizations and evaluation reports,…
-

Building Scalable ML Applications: A Practical Approach
The project involved developing a comprehensive machine learning pipeline for classification and regression tasks, culminating in a Flask-based web application deployed on Azure. Deployment is automated via GitHub Actions, and the app provides a user-friendly interface for real-time predictions. Key achievements include…
-

Dynamic Exploratory Data Analysis with Streamlit
The Dynamic Exploratory Data Analysis app simplifies EDA for users of all skill levels by allowing CSV uploads and generating insightful visualizations. Developed with Streamlit, it automates data type detection and offers various analysis modules. Key features include univariate, bivariate,…
