CompuBio Platform

Module Overview

The Bacteria Analysis module performs taxonomic profiling of bacterial communities from sequencing data, enabling detailed exploration of microbial composition and diversity.

What This Module Does

This module is designed to answer a fundamental question:

Which bacteria are present in my samples, and in what relative abundance?

It provides a standardized, reproducible workflow for profiling bacterial communities across samples, cohorts, or experimental conditions.

Common use cases include:

Microbiome research
Environmental sampling
Host-associated microbial studies
Comparative cohort analysis

Input Data

Required Inputs

FASTQ files
- Paired-end or single-end sequencing reads
- Supports common Illumina formats

Optional Inputs

Metadata table (CSV / TSV)
- Sample identifiers
- Cohort or condition labels
- Experimental variables

Metadata Best Practice

Including rich metadata enables stratified analysis, cohort comparisons, and more informative visualizations.

Processing & Algorithms

The Bacteria Analysis pipeline combines well-established tools with additional normalization and filtering layers.

Core Algorithms

Kraken2 — fast, k-mer–based taxonomic classification
Bracken — accurate abundance estimation from Kraken2 results
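As an illustration of how the two tools chain together, the sketch below builds the command lines for one sample. It is not the platform's actual invocation, which this page does not document. The function names, file paths, and default values are assumptions chosen for the example. The flags themselves (`--db`, `--threads`, `--paired`, `--report`, `--output` for Kraken2; `-d`, `-i`, `-o`, `-r`, `-l`, `-t` for Bracken) are the standard options of the public command-line tools.

```python
from typing import List


def kraken2_command(db: str, reads: List[str], report: str, output: str,
                    threads: int = 4) -> List[str]:
    """Build a Kraken2 argument list for one sample.

    `reads` holds one FASTQ path (single-end) or two (paired-end).
    """
    if len(reads) not in (1, 2):
        raise ValueError("expected one FASTQ file (single-end) or two (paired-end)")
    cmd = ["kraken2", "--db", db, "--threads", str(threads),
           "--report", report, "--output", output]
    if len(reads) == 2:
        cmd.append("--paired")
    return cmd + list(reads)


def bracken_command(db: str, kraken_report: str, output: str,
                    read_length: int = 150, level: str = "S",
                    threshold: int = 10) -> List[str]:
    """Build a Bracken argument list that re-estimates abundance from a Kraken2 report.

    The defaults here are illustrative, not the platform's settings.
    """
    return ["bracken", "-d", db, "-i", kraken_report, "-o", output,
            "-r", str(read_length), "-l", level, "-t", str(threshold)]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    k2 = kraken2_command("refdb", ["s1_R1.fastq", "s1_R2.fastq"],
                         "s1.kreport", "s1.kraken")
    br = bracken_command("refdb", "s1.kreport", "s1.bracken")
    print(" ".join(k2))
    print(" ".join(br))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where the tools and a reference database are installed. The Bracken step reads the report file written by Kraken2, so the two commands must run in that order.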

Additional Processing

Custom normalization layers
Low-abundance filtering
Sample-level quality checks
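The platform's own normalization and filtering rules are not specified here, so the following is only a minimal sketch of what such steps commonly look like: a minimum-depth check, conversion of counts to relative abundance, and removal of taxa below a fractional threshold. The function names, the `min_reads` and `min_fraction` parameters, and their default values are all assumptions.

```python
from typing import Dict


def passes_depth_check(counts: Dict[str, float], min_reads: int = 1000) -> bool:
    """Sample-level check: does the sample have enough classified reads?"""
    return sum(counts.values()) >= min_reads


def to_relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale per-taxon counts so that they sum to 1 within the sample."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}


def filter_low_abundance(abundances: Dict[str, float],
                         min_fraction: float = 0.001,
                         renormalize: bool = True) -> Dict[str, float]:
    """Drop taxa below `min_fraction`; optionally rescale the rest to sum to 1."""
    kept = {t: a for t, a in abundances.items() if a >= min_fraction}
    if renormalize and kept:
        total = sum(kept.values())
        kept = {t: a / total for t, a in kept.items()}
    return kept


if __name__ == "__main__":
    # Made-up counts for illustration.
    sample = {"Taxon A": 900, "Taxon B": 95, "Taxon C": 5}
    if passes_depth_check(sample, min_reads=500):
        rel = to_relative_abundance(sample)
        print(filter_low_abundance(rel, min_fraction=0.01))
```

Whether to renormalize after filtering is a design choice: rescaling keeps each sample summing to 1, while leaving the values as-is preserves the original proportions of the retained taxa.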

All steps are containerized and versioned to ensure full reproducibility.

Reproducible by Design

Algorithm versions, reference databases, and parameters are recorded for every run, enabling exact re-execution at any time.
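One way to picture such a record is a small manifest with a content fingerprint, so that two runs with identical settings can be recognized as equivalent. This is a hypothetical sketch: the field names, the `build_run_manifest` helper, and the placeholder values are assumptions, not the platform's actual record format.

```python
import hashlib
import json
from typing import Any, Dict


def build_run_manifest(tool_versions: Dict[str, str], database: str,
                       parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Collect run settings and attach a SHA-256 fingerprint of their canonical JSON form."""
    record: Dict[str, Any] = {
        "tool_versions": dict(tool_versions),
        "database": database,
        "parameters": dict(parameters),
    }
    # Sorting keys makes the fingerprint independent of dictionary insertion order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    manifest = build_run_manifest(
        tool_versions={"kraken2": "<version>", "bracken": "<version>"},
        database="<reference database identifier>",
        parameters={"read_length": 150, "level": "S"},
    )
    print(json.dumps(manifest, indent=2))
```

Comparing fingerprints is a quick check that two runs used the same tools, database, and parameters before their results are compared.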

Output & Results

After the pipeline completes, results are available through interactive dashboards and exportable artifacts.

Visual Outputs

Relative abundance bar plots
Heatmaps across samples or cohorts
Taxonomic composition summaries

Tabular Outputs

Taxonomic abundance tables
Normalized count matrices
Diversity summaries

Diversity Metrics

Shannon index
Simpson index
Observed taxa (if enabled)
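For reference, these metrics can be computed from per-taxon counts as sketched below. With p_i the proportion of taxon i, the Shannon index is H = -sum(p_i * ln p_i). The Simpson index has several conventions; this sketch assumes the Gini-Simpson form 1 - sum(p_i^2) and the natural logarithm, and the platform may use a different convention or log base. Function names are illustrative.

```python
import math
from typing import Dict, List


def _proportions(counts: Dict[str, float]) -> List[float]:
    """Proportions of taxa with non-zero counts within one sample."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return [n / total for n in counts.values() if n > 0]


def shannon_index(counts: Dict[str, float]) -> float:
    """Shannon index H = -sum(p * ln p), using the natural logarithm."""
    return -sum(p * math.log(p) for p in _proportions(counts))


def simpson_index(counts: Dict[str, float]) -> float:
    """Gini-Simpson index 1 - sum(p^2); higher values mean a more even community."""
    return 1.0 - sum(p * p for p in _proportions(counts))


def observed_taxa(counts: Dict[str, float]) -> int:
    """Number of taxa with a non-zero count."""
    return sum(1 for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Four equally abundant taxa (made-up counts).
    even = {"A": 25, "B": 25, "C": 25, "D": 25}
    print(round(shannon_index(even), 4))  # ln(4), about 1.3863
    print(simpson_index(even))            # 0.75
    print(observed_taxa(even))            # 4
```

A perfectly even community of four taxa gives the maximum Shannon value for that richness, ln(4), while a sample dominated by a single taxon drives both indices toward 0.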

Exploratory Analysis

Use interactive heatmaps and abundance plots to quickly identify dominant taxa, outliers, or cohort-specific signatures.

Best Practices

Keep sequencing depth comparable across samples, since diversity metrics are sensitive to depth
Apply the same reference database when comparing cohorts
Validate metadata alignment with sample IDs
Interpret low-abundance taxa cautiously, as they may reflect misclassification or contamination
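Metadata alignment can be checked before a run with a few lines of code. The sketch below compares the identifiers in a metadata table against the sample IDs derived from the FASTQ files. The `sample_id` column name and the `check_metadata_alignment` helper are assumptions for illustration; the platform's required column names are not specified here.

```python
import csv
import io
from typing import Iterable, Set, Tuple


def check_metadata_alignment(metadata_text: str, sample_ids: Iterable[str],
                             id_column: str = "sample_id",
                             delimiter: str = ",") -> Tuple[Set[str], Set[str]]:
    """Return (samples missing from the metadata, metadata rows with no matching sample)."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter=delimiter)
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"metadata has no '{id_column}' column")
    # Skip rows with an empty identifier; strip stray whitespace around IDs.
    metadata_ids = {row[id_column].strip() for row in reader if row[id_column]}
    samples = set(sample_ids)
    return samples - metadata_ids, metadata_ids - samples


if __name__ == "__main__":
    # Made-up metadata and sample IDs.
    table = "sample_id,cohort\nS1,control\nS2,treated\nS4,treated\n"
    missing, unmatched = check_metadata_alignment(table, ["S1", "S2", "S3"])
    print("Samples without metadata:", sorted(missing))
    print("Metadata rows without a sample:", sorted(unmatched))
```

For a TSV file, pass `delimiter="\t"`. Both returned sets should be empty before cohort comparisons are run.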

Interpretation Note

Taxonomic assignments depend on reference databases and read quality. Results should always be interpreted in biological and experimental context.

What’s Next?

Functional profiling modules
Diversity and differential abundance analysis
Integration with host genomics and AI models