Metabox 2.0 GUI Tutorial

Kwanjeera W

2026-04-05

METABOX 2.0 GUI

Metabox 2.0 GUI runs in a web browser and provides interactive data analysis and visualization through a user-friendly interface. It is designed for users with little or no experience in R programming. Below are alternative ways to use the Metabox 2.0 GUI:

Use Metabox 2.0 as an R package

##### INITIALIZATION #####
library(metabox2) #load the package
launch_gui()

Use Metabox 2.0 with Docker

cd [go to Dockerfile location]
docker build --no-cache=true --platform linux/x86_64 -t metaboxweb .
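docker run --name mbdocker -p 8081:3838 metaboxweb

The -p 8081:3838 option maps container port 3838 to port 8081 on the host, so the GUI should then be reachable in a web browser at http://localhost:8081.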

Use Metabox 2.0 via the online server

CONTENTS

This tutorial shows how to use the Metabox 2.0 GUI across five modules. Figure 1 provides an overview of the pipeline.


Figure 1. Overview of Metabox 2.0 GUI containing three key pipelines and five modules


PREPARE INPUT DATA

The input dataset should be a CSV table containing metadata columns along with feature variables, as illustrated in Table 1. Samples are in rows, and variables (features) are in columns. Experimental information—such as batch, sample type, and injection order—is required for IS- and QC-based normalization.

Table 1. Format of the input dataset. This example table includes experimental information (brown), metadata (blue), and metabolite features.

Sample SampleGroup RunOrder Batch SampleType IS_D3-Alanine IS_C13-Phenylalanine L-Alanine L-Valine Leucine
Sample11 DM 14 Batch1 Sample 3415100 2037269 3170319 3053960 918680
Sample14 DM 71 Batch1 Sample 6082869 4748038 5167928 4157701 1654664
Sample16 DM 19 Batch2 Sample 13975619 9542307 12652652 10157293 4233637
Sample20 KF 23 Batch2 Sample 14006790 9625350 7125928 3672381 525327
Sample23 KF 27 Batch2 Sample 6160146 4026638 3203784 2576521 816807
Sample25 KF 29 Batch2 Sample 11925818 10293816 7917620 5083482 2436176
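
Before uploading, it can help to confirm in R that a CSV file follows this layout. The sketch below is only an illustration: it rebuilds two rows of Table 1 (with a subset of the columns), writes them to a temporary file, and checks that the metadata columns are present and that every remaining column is numeric. The helper check_input() is hypothetical and is not part of Metabox 2.0.

```r
# Two rows of Table 1, limited to a few columns (values copied from the table).
example <- data.frame(
  Sample = c("Sample11", "Sample20"),
  SampleGroup = c("DM", "KF"),
  RunOrder = c(14, 23),
  Batch = c("Batch1", "Batch2"),
  SampleType = c("Sample", "Sample"),
  `IS_D3-Alanine` = c(3415100, 14006790),
  `L-Alanine` = c(3170319, 7125928),
  check.names = FALSE
)

csv_path <- tempfile(fileext = ".csv")
write.csv(example, csv_path, row.names = FALSE)

# check.names = FALSE keeps hyphenated column names such as "L-Alanine" intact.
input <- read.csv(csv_path, check.names = FALSE)

# Hypothetical helper (not a Metabox function): report missing metadata
# columns and any feature column that was not read as numeric.
check_input <- function(df, meta_cols) {
  feature_cols <- setdiff(names(df), meta_cols)
  is_num <- vapply(df[feature_cols], is.numeric, logical(1))
  list(
    missing_cols = setdiff(meta_cols, names(df)),
    feature_cols = feature_cols,
    non_numeric = feature_cols[!is_num]
  )
}

meta_cols <- c("Sample", "SampleGroup", "RunOrder", "Batch", "SampleType")
report <- check_input(input, meta_cols)
print(report)
```

An empty missing_cols and an empty non_numeric entry suggest that the file has samples in rows, the expected metadata columns, and numeric feature columns.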

Tip: The Metabox 2.0 GUI provides example datasets for testing all modules. To download the data, navigate to More > Example Data (see Figure 2).


Figure 2. How to access example datasets

Module 1-DATA PROCESSING

Step: Open the GUI → Click the DATA PROCESSING (1) button → Upload the data → Start the data processing steps

For this tutorial, download the GCGC_DM dataset (GCGC_DM_Samples.csv) from the provided Example Data (Figure 2) and save it to your working directory. This tutorial walks through all steps: missing value imputation, normalization, transformation, and scaling.

Table 2. Summary of normalization, transformation, and scaling in metabolomics.

Step | Purpose | Correction | Methods
Normalization | Remove technical bias | Injection variation, batch effects | IS-based, QC-based, median/TIC
Transformation | Improve data distribution | Skewness, heteroscedasticity | Log, square root, cube root
Scaling | Balance variable importance | Dominance of high-intensity metabolites | Pareto scaling, autoscaling, range scaling

Note: In practice, not all steps are required; the choice depends on the data. Users are encouraged to examine their data first.

1. Upload and set up input data

When uploading the data, specify the required arguments to set up the Metabox object (Figure 3):


Figure 3. Setup Metabox object

Output: See the Summary tab for an overview of the dataset: 75 samples across four groups, 91 metabolites, and 7% missing values.

2. Missing value imputation

This step replaces missing values using information from the existing data.

Specify the required arguments to impute missing data (Figure 4):


Figure 4. Missing value imputation

Output: See the Summary tab for an overview of the processed dataset. Several plots are provided to compare the dataset before and after imputation. Try other methods or cutoff values if preferred.

Optional: To export the imputed data, click EXPORT RESULTS
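
To make the idea of a method and a cutoff concrete, here is a minimal base R sketch. It assumes one simple strategy: features whose missing fraction exceeds a cutoff are removed, and the remaining gaps are filled with half of the feature's minimum observed value. The helper impute_half_min() and the toy data are hypothetical; the methods offered in the GUI may work differently.

```r
# Toy intensity table: samples in rows, features in columns; NA is a missing value.
x <- data.frame(
  met_A = c(100, NA, 80, 120),
  met_B = c(NA, NA, NA, 50),
  met_C = c(10, 20, 30, 40)
)

# Hypothetical helper (not a Metabox function):
# 1) drop features whose missing fraction exceeds `cutoff`;
# 2) replace the remaining NAs with half of the feature's minimum observed value.
impute_half_min <- function(df, cutoff = 0.5) {
  missing_fraction <- colMeans(is.na(df))
  kept <- df[, missing_fraction <= cutoff, drop = FALSE]
  for (col in names(kept)) {
    values <- kept[[col]]
    values[is.na(values)] <- min(values, na.rm = TRUE) / 2
    kept[[col]] <- values
  }
  kept
}

imputed <- impute_half_min(x, cutoff = 0.5)
print(imputed)
```

In this toy example, met_B is missing in three of four samples and is dropped, while the single gap in met_A is filled with half of its smallest observed value.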

3. Normalization

This step corrects technical variation using known reference compounds, quality control (QC) samples, or statistical assumptions about the dataset.

Try IS-based normalization with the CCMN method

Specify the required arguments to normalize data (Figure 5):

  • Choose normalization method: Select normalization method. → ccmn
  • Class/factor column: Select the category (factor) column. → SampleGroup
  • Internal standard column(s): Select the internal standard(s) (IS). → IS_D3-Alanine, IS_C13-Phenylalanine

The following arguments are for the ‘serrf’ method:

  • sampleType column (require at least 3 QCs): Select the sample type column.
  • injectionOrder column: Select the injection (run) order column.
  • batch column: Select the batch column.


Figure 5. CCMN normalization

Tip: Try other methods if preferred. In this demo dataset, the QC samples help show how much technical variation each method removes.
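
The principle behind IS-based normalization can be illustrated with a much simpler technique than CCMN: dividing each sample by the intensity of a single internal standard. The sketch below is not CCMN and not Metabox code. It uses three samples from Table 1, and normalize_by_is() is a hypothetical helper.

```r
# Three samples from Table 1: one internal standard and two metabolites.
x <- data.frame(
  `IS_D3-Alanine` = c(3415100, 6082869, 13975619),
  `L-Alanine` = c(3170319, 5167928, 12652652),
  `L-Valine` = c(3053960, 4157701, 10157293),
  check.names = FALSE
)

# Hypothetical helper (single-IS ratio normalization, NOT CCMN):
# divide every sample (row) by its IS intensity, then multiply by the
# mean IS intensity so values stay on the original intensity scale.
normalize_by_is <- function(df, is_col) {
  is_values <- df[[is_col]]
  features <- as.matrix(df[, setdiff(names(df), is_col), drop = FALSE])
  sweep(features, 1, is_values, "/") * mean(is_values)
}

normalized <- normalize_by_is(x, "IS_D3-Alanine")

# Coefficient of variation of L-Alanine before and after normalization.
cv <- function(v) sd(v) / mean(v)
cv_before <- cv(x[["L-Alanine"]])
cv_after <- cv(normalized[, "L-Alanine"])
print(c(before = cv_before, after = cv_after))
```

For these three samples, the spread of L-Alanine shrinks after dividing by the internal standard, because much of the sample-to-sample difference is shared with the standard. CCMN goes further by accounting for cross-contribution between analytes and internal standards, which this sketch does not attempt.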

4. Transformation and scaling

Under the Data-driven processing tab (Figure 6), users can perform sample-based normalization, transformation, or scaling.

Sample-based normalization corrects unwanted variation based on statistical assumptions about the dataset. Transformation changes the distribution of the data, while scaling adjusts the relative importance of variables (features) so that metabolites with large values do not dominate the analysis, particularly in multivariate analyses.

For this tutorial, a log2 transformation is performed before proceeding to the next modules.


Figure 6. Data-driven processing

Tip: Users can proceed to STATISTICAL ANALYSIS or BIOMARKER ANALYSIS from this page, or export the processed data for other downstream analyses.
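
For readers curious about what the transformation and scaling options described above compute, the following base R sketch applies a log2 transformation, autoscaling, and Pareto scaling to a few Leucine and L-Valine values from Table 1. It is an illustration under textbook definitions, not the code used by Metabox; pareto_scale() is a hypothetical helper.

```r
# Three samples and two metabolites taken from Table 1.
x <- matrix(
  c(918680, 1654664, 4233637,
    3053960, 4157701, 10157293),
  ncol = 2,
  dimnames = list(c("Sample11", "Sample14", "Sample16"),
                  c("Leucine", "L-Valine"))
)

# Log2 transformation reduces right skew in intensity data.
log_x <- log2(x)

# Autoscaling: mean-center each feature and divide by its standard deviation.
auto_x <- scale(log_x)

# Hypothetical helper for Pareto scaling: mean-center each feature and
# divide by the square root of its standard deviation.
pareto_scale <- function(m) {
  centered <- sweep(m, 2, colMeans(m), "-")
  sweep(centered, 2, sqrt(apply(m, 2, sd)), "/")
}
pareto_x <- pareto_scale(log_x)

print(round(log_x, 2))
print(round(pareto_x, 2))
```

Autoscaling gives every feature unit variance, whereas Pareto scaling shrinks large-variance features less aggressively, so it keeps more of the original data structure.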

Module 2-STATISTICAL ANALYSIS

After completing data processing, click the STATISTICAL ANALYSIS button to proceed.

Tip 1: Users can also access this module directly by clicking the STATISTICAL ANALYSIS (2) button on the homepage. In this case, the steps are: Open the GUI → Click the STATISTICAL ANALYSIS (2) button → Upload the data → Start the data analysis steps

Tip 2: Use the tabs to perform univariate analysis, multivariate analysis, correlation analysis, or linear mixed-effects modeling.

Univariate analysis

Choose the Univariate analysis tab, then perform the analysis.

Table 3. Summary of univariate statistical methods in Metabox 2.0.

Design | Parametric | Non-parametric
Pairwise, independent | t.test | wilcox.test (Mann-Whitney)
Pairwise, repeated | t.test (paired) | wilcox.test (paired)
ANOVA, independent | ANOVA -> posthoc.test | kruskal.test -> dunn.test, or Scheirer-Ray-Hare test -> wilcox.test (2W-ANOVA)
ANOVA, repeated | ANOVA -> pairwise.t.test | friedman.test -> dunn.test
Correlation | Pearson | Spearman
Linear modeling | Linear mixed-effect | N/A

Tip: Metabox automatically determines the test methods based on the provided arguments.

Specify the required arguments for univariate analysis (Figure 7):


Figure 7. Univariate analysis

Output 1: The statistical significance plot shows the -log10(adjusted p-value) of each variable. The dashed line marks the statistical significance cutoff (adjusted p-value < 0.05). The plot displays only the top 100 variables, sorted by adjusted p-value. Click on a dot to toggle its boxplot.

Output 2: See the Output table tab for statistical values. If a post hoc test is performed, a list of significant pairs is displayed.
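
The values shown in the significance plot can be reproduced in principle with a few lines of base R. The sketch below runs an independent two-group t-test per feature on simulated data, adjusts the p-values, and computes -log10(adjusted p-value). The data are simulated, and the Benjamini-Hochberg adjustment is an assumption; the adjustment method used by Metabox is not stated here.

```r
set.seed(1)

# Simulated data: 10 samples per group, three features.
group <- factor(rep(c("DM", "KF"), each = 10))
features <- data.frame(
  met_A = c(rnorm(10, mean = 10), rnorm(10, mean = 13)),  # simulated group shift
  met_B = rnorm(20, mean = 10),
  met_C = rnorm(20, mean = 10)
)

# One independent two-sample t-test per feature.
p_values <- vapply(
  features,
  function(v) t.test(v ~ group)$p.value,
  numeric(1)
)

# Multiple-testing adjustment (Benjamini-Hochberg is an assumption here).
adj_p <- p.adjust(p_values, method = "BH")

result <- data.frame(
  feature = names(features),
  p_value = p_values,
  adj_p_value = adj_p,
  neg_log10_adj_p = -log10(adj_p),
  significant = adj_p < 0.05  # same cutoff as the dashed line in the plot
)

# Sort by adjusted p-value, as in the significance plot.
result <- result[order(result$adj_p_value), ]
print(result)
```

A feature appears above the dashed line when its neg_log10_adj_p exceeds -log10(0.05), which is about 1.3.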

Multivariate analysis

Choose the Multivariate analysis tab, then perform PLS-DA.

Specify the required arguments for multivariate analysis (Figure 8):


Figure 8. PLS-DA

Output: Several plots are provided, including a score plot, a loading plot, and a VIP plot, along with a table of results.

Module 3-BIOMARKER ANALYSIS

Step: Open the GUI → Click the BIOMARKER ANALYSIS (3) button → Upload the data → Start the analysis

For this tutorial, download the LC_LN dataset (LC_LN_Samples.csv) from the provided Example Data (Figure 2) and save it to your working directory. This tutorial assumes the data is already processed. The dataset contains 116 samples across two groups and 9 metabolites.

Specify the required arguments for biomarker analysis (Figure 9):


Figure 9. Biomarker analysis

Output: Several plots are provided, including a variable importance plot and a performance plot, along with a table of results.

Module 4-INTEGRATIVE ANALYSIS

Step: Open the GUI → Click the INTEGRATIVE ANALYSIS (4) button → Upload the data → Start the analysis

For this tutorial, download the LC_Fat_Tissue dataset (LC_Fat_Tissue.csv) and the GC_Fat_Tissue dataset (GC_Fat_Tissue.csv) from the provided Example Data (Figure 2), and save them to your working directory. Integrative analysis requires at least two datasets from the same subjects measured on different platforms. Each dataset contains 94 samples; the LC dataset has 24 metabolites and the GC dataset has 137 metabolites.

1. Upload the data

For each dataset, specify the required upload arguments (Figure 10):


Figure 10. Setup data for integrative analysis

2. Perform the analysis

Specify the required arguments for integrative analysis (Figure 11):


Figure 11. Integrative analysis

Output: Several plots are provided, including a block importance plot, a variable importance plot, a loading plot, and a validation plot.

Module 5-DATA INTERPRETATION

Step: Open the GUI → Click the DATA INTERPRETATION (5) button → Upload the data → Start the analysis

Metabox 2.0 provides data interpretation in a pathway context for genes, proteins, and metabolites, as well as chemical class context for metabolites.

For this tutorial, download the metabolite_list dataset (metabolite_list.csv) from the provided Example Data (Figure 2), and save it to your working directory. It is assumed that users already have a list of metabolites and associated statistical information, as shown in the demo dataset.

Overrepresentation analysis (ORA)

For this tutorial, pathway overrepresentation analysis will be performed on a given list of metabolites using the KEGG database as the pathway resource.

Navigate to the Overrepresentation tab and specify the required arguments for overrepresentation analysis (Figure 12):

Tip: Chemical class overrepresentation analysis uses the HMDB database as the metabolite resource.


Figure 12. Overrepresentation analysis

Output: The results table contains statistical values for each pathway, along with the list of pathway members.
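
As a rough guide to how a pathway p-value in such a table can arise, the sketch below computes an overrepresentation p-value with the hypergeometric test, which is the usual choice for ORA. This is an assumption about the general technique, not a description of the Metabox implementation, and all counts are hypothetical.

```r
# Hypothetical counts, for illustration only.
background_size <- 500  # metabolites in the background set
pathway_size <- 20      # background metabolites annotated to the pathway
list_size <- 30         # metabolites in the submitted list
overlap <- 6            # submitted metabolites that belong to the pathway

# Probability of seeing `overlap` or more pathway members in a random
# list of `list_size` metabolites (upper tail of the hypergeometric).
p_value <- phyper(
  overlap - 1,
  pathway_size,
  background_size - pathway_size,
  list_size,
  lower.tail = FALSE
)

# The same result from a one-sided Fisher's exact test on the 2 x 2 table.
contingency <- matrix(
  c(overlap, list_size - overlap,
    pathway_size - overlap,
    background_size - pathway_size - (list_size - overlap)),
  nrow = 2
)
fisher_p <- fisher.test(contingency, alternative = "greater")$p.value

# Number of pathway members expected in the list by chance.
expected_overlap <- list_size * pathway_size / background_size
print(c(expected = expected_overlap, observed = overlap, p_value = p_value))
```

With these counts, about 1.2 pathway members would be expected by chance, so observing 6 yields a small p-value. In a real analysis, the p-values across all tested pathways would also be adjusted for multiple testing.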

Pathway enrichment analysis

Pathway enrichment analysis will be performed on a given list of metabolites, incorporating statistical values (e.g., p-, t-, or F-values) and optional directionality (e.g., fold changes), using the KEGG database as the pathway resource.

Navigate to the Enrichment tab and specify the required arguments for enrichment analysis (Figure 13):


Figure 13. Enrichment analysis

Output: The results table contains statistical values for each pathway, along with the list of pathway members.