The Metabox 2.0 GUI runs in a web browser and provides interactive data analysis and visualization through a user-friendly interface. It is designed for users with little or no experience in R programming. The GUI can be launched in two ways, from an R session or from a Docker container:
##### INITIALIZATION #####
library(metabox2) #load the package
launch_gui()
cd [go to Dockerfile location]
docker build --no-cache=true --platform linux/x86_64 -t metaboxweb .
docker run --name mbdocker -p 8081:3838 metaboxweb
This tutorial shows how to use the Metabox 2.0 GUI across five modules. Figure 1 provides an overview of the pipeline.
Figure 1. Overview of the Metabox 2.0 GUI, containing three key pipelines and five modules
The input dataset should be a CSV table containing metadata columns along with feature variables, as illustrated in Table 1. Samples are in rows, and variables (features) are in columns. Experimental information—such as batch, sample type, and injection order—is required for IS- and QC-based normalization.
Table 1. Format of the input dataset. This example table includes experimental information (brown), metadata (blue), and metabolite features.
| Sample | SampleGroup | RunOrder | Batch | SampleType | IS_D3-Alanine | IS_C13-Phenylalanine | L-Alanine | L-Valine | Leucine |
|---|---|---|---|---|---|---|---|---|---|
| Sample11 | DM | 14 | Batch1 | Sample | 3415100 | 2037269 | 3170319 | 3053960 | 918680 |
| Sample14 | DM | 71 | Batch1 | Sample | 6082869 | 4748038 | 5167928 | 4157701 | 1654664 |
| Sample16 | DM | 19 | Batch2 | Sample | 13975619 | 9542307 | 12652652 | 10157293 | 4233637 |
| Sample20 | KF | 23 | Batch2 | Sample | 14006790 | 9625350 | 7125928 | 3672381 | 525327 |
| Sample23 | KF | 27 | Batch2 | Sample | 6160146 | 4026638 | 3203784 | 2576521 | 816807 |
| Sample25 | KF | 29 | Batch2 | Sample | 11925818 | 10293816 | 7917620 | 5083482 | 2436176 |
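Before uploading, it can help to confirm in R that a file follows this layout. The sketch below is only an illustration: `check_input()` is a hypothetical helper, not a Metabox function, and the column names are assumed from Table 1, so they may differ in your own file.

```r
# Toy table mirroring the layout of Table 1 (values copied from its first rows).
dat <- data.frame(
  Sample      = c("Sample11", "Sample14", "Sample16"),
  SampleGroup = c("DM", "DM", "DM"),
  RunOrder    = c(14, 71, 19),
  Batch       = c("Batch1", "Batch1", "Batch2"),
  SampleType  = c("Sample", "Sample", "Sample"),
  `IS_D3-Alanine` = c(3415100, 6082869, 13975619),
  `L-Alanine`     = c(3170319, 5167928, 12652652),
  check.names = FALSE  # keep hyphens in feature names
)
# For a real file: dat <- read.csv("GCGC_DM_Samples.csv", check.names = FALSE)

# Hypothetical helper (not part of Metabox): returns a character vector of problems.
check_input <- function(df,
                        info_cols = c("Sample", "SampleGroup", "RunOrder",
                                      "Batch", "SampleType")) {
  problems <- character(0)
  missing_cols <- setdiff(info_cols, names(df))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  # Every column that is not experimental information is treated as a feature.
  feature_cols <- setdiff(names(df), info_cols)
  non_numeric <- feature_cols[!vapply(df[feature_cols], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    problems <- c(problems,
                  paste("non-numeric features:", paste(non_numeric, collapse = ", ")))
  }
  if ("Sample" %in% names(df) && anyDuplicated(df$Sample) > 0) {
    problems <- c(problems, "duplicated sample names")
  }
  problems
}

problems <- check_input(dat)
print(problems)  # an empty character vector means no problems were found
```

An empty result means the table has the assumed experimental columns, numeric features, and unique sample names.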
Tip: The Metabox 2.0 GUI provides example datasets for testing all modules. To download the data, navigate to More > Example Data (see Figure 2).
Figure 2. How to access example datasets
Step: Open the GUI → Click the DATA PROCESSING (1) button → Upload the data → Start the data processing steps
For this tutorial, download the GCGC_DM dataset (GCGC_DM_Samples.csv) from the provided Example Data (Figure 2) and save it to your working directory. This tutorial walks through all steps: missing value imputation, normalization, transformation, and scaling.
Table 2. Summary of normalization, transformation, and scaling in metabolomics.
| Step | Purpose | Correction | Methods |
|---|---|---|---|
| Normalization | Remove technical bias | Injection variation, batch effects | IS-, QC-based, median/TIC |
| Transformation | Improve data distribution | Skewness, heteroscedasticity | Log, square root, cube root |
| Scaling | Balance variable importance | Dominance of high-intensity metabolites | Pareto scaling, autoscaling, range scaling |
Note: In practice, not all steps are required; the choice depends on the data. Users are encouraged to examine their data first.
When uploading the data, specify the required arguments to set up the Metabox object (Figure 3):
Figure 3. Setting up the Metabox object
Output: See the Summary tab for an overview of the dataset: 75 samples across four groups, 91 metabolites, and 7% missing values.
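The kind of figures reported in the Summary tab can be reproduced by hand for any table in this format. The following is a minimal sketch on toy data, assuming `Sample` and `SampleGroup` are the only non-feature columns; it is not how Metabox computes its summary.

```r
# Toy data: 4 samples, 2 groups, 3 features, with some missing values.
dat <- data.frame(
  Sample      = paste0("S", 1:4),
  SampleGroup = c("DM", "DM", "KF", "KF"),
  A = c(10, NA, 12, 11),
  B = c(5, 6, NA, NA),
  C = c(1, 2, 3, 4)
)

info_cols <- c("Sample", "SampleGroup")  # assumed metadata columns
features  <- dat[, setdiff(names(dat), info_cols)]

n_samples   <- nrow(dat)
n_groups    <- length(unique(dat$SampleGroup))
n_features  <- ncol(features)
# Share of missing cells across the whole feature matrix, in percent.
pct_missing <- 100 * mean(is.na(as.matrix(features)))

cat(n_samples, "samples,", n_groups, "groups,", n_features, "features,",
    pct_missing, "% missing\n")
```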
This step replaces missing values using information from the existing data.
Specify the required arguments to impute missing data (Figure 4):
Figure 4. Missing value imputation
Output: See the Summary tab for an overview of the processed dataset. Several plots are provided to compare the dataset before and after imputation. Try other methods or cutoff values if preferred.
Optional: To export the imputed data, click EXPORT RESULTS.
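To make the idea of a cutoff followed by imputation concrete, here is a minimal base R sketch. It assumes the cutoff is the maximum fraction of missing values allowed per feature and uses half-minimum imputation, a common simple choice; `impute_half_min()` is a hypothetical helper and is not necessarily one of the methods offered in the GUI.

```r
features <- data.frame(
  A = c(10, NA, 12, 11),  # 25% missing: kept and imputed
  B = c(NA, 6, NA, NA),   # 75% missing: dropped at cutoff = 0.5
  C = c(4, 2, 3, 8)       # complete
)

# Hypothetical helper (not a Metabox function).
impute_half_min <- function(x, cutoff = 0.5) {
  frac_missing <- colMeans(is.na(x))
  kept <- x[, frac_missing <= cutoff, drop = FALSE]
  # Replace each remaining NA with half of the smallest observed value
  # of that feature.
  kept[] <- lapply(kept, function(v) {
    v[is.na(v)] <- min(v, na.rm = TRUE) / 2
    v
  })
  kept
}

imputed <- impute_half_min(features, cutoff = 0.5)
print(imputed)
```

Changing `cutoff` shows how a stricter or looser threshold alters which features survive, which mirrors the suggestion to try other cutoff values.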
This step corrects technical variation using known reference compounds, quality control (QC) samples, or statistical assumptions about the dataset.
Specify the required arguments to normalize data (Figure 5):
The following arguments are for the ‘serrf’ method:
Figure 5. CCMN normalization
Tip: Try other methods if preferred. In this demo dataset, the QC samples help indicate how much technical variation has been removed.
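As a rough illustration of normalization against a known reference compound, the sketch below divides each metabolite by the internal standard measured in the same sample. This is the simplest single-standard ratio, shown with values from Table 1; it is not the CCMN or SERRF algorithm, and the choice of `IS_D3-Alanine` as the reference is an assumption made for the example.

```r
# First three rows of Table 1, restricted to one internal standard
# and two metabolites.
dat <- data.frame(
  `IS_D3-Alanine` = c(3415100, 6082869, 13975619),
  `L-Alanine`     = c(3170319, 5167928, 12652652),
  `L-Valine`      = c(3053960, 4157701, 10157293),
  check.names = FALSE
)

is_col      <- "IS_D3-Alanine"               # assumed reference compound
metabolites <- setdiff(names(dat), is_col)

# Divide every row (sample) by that sample's internal standard intensity.
normalized <- sweep(as.matrix(dat[metabolites]), 1, dat[[is_col]], "/")
print(round(normalized, 3))
```

Raw intensities in these three samples differ several-fold, while the ratios to the internal standard are much closer to each other, which is the effect this kind of normalization aims for.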
Under the Data-driven processing tab (Figure 6), users can perform sample-based normalization, transformation, or scaling.
Sample-based normalization corrects unwanted variation based on statistical assumptions about the dataset. Transformation changes the distribution of the data, while scaling adjusts the relative importance of variables (features) so that metabolites with large values do not dominate the analysis, particularly in multivariate analyses.
For this tutorial, a log2 transformation is performed before proceeding to the next modules.
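The difference between transformation and scaling can be seen in a few lines of base R. The sketch below applies a log2 transformation, then shows autoscaling and Pareto scaling (mean-centering followed by division by the square root of the standard deviation) on a small matrix built from Table 1 values. The `pareto()` function is a hypothetical helper written for this example, not a Metabox function.

```r
x <- matrix(c(3170319, 5167928, 12652652,
              918680, 1654664, 4233637),
            ncol = 2,
            dimnames = list(NULL, c("L-Alanine", "Leucine")))

# Transformation: compress the intensity range.
# Note that zeros would become -Inf, so they need handling beforehand.
log2_x <- log2(x)

# Autoscaling: center each column and divide by its standard deviation.
auto <- scale(log2_x)

# Pareto scaling: center each column and divide by the square root of its
# standard deviation (hypothetical helper for illustration).
pareto <- function(m) {
  centered <- sweep(m, 2, colMeans(m), "-")
  sweep(centered, 2, sqrt(apply(m, 2, sd)), "/")
}
scaled <- pareto(log2_x)

print(round(scaled, 3))
```

Autoscaling gives every feature unit variance, whereas Pareto scaling only partly reduces the differences in variance, so large changes remain more visible.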
Figure 6. Data-driven processing
Tip: Users can proceed to STATISTICAL ANALYSIS or BIOMARKER ANALYSIS from this page, or export the processed data for other downstream analyses.
After completing data processing, click the STATISTICAL ANALYSIS button to proceed.
Tip 1: Users can access this module directly by clicking the STATISTICAL ANALYSIS (2) button on the homepage. In this case, the steps are: Open the GUI → Click the STATISTICAL ANALYSIS (2) button → Upload the data → Start the data analysis steps
Tip 2: Use the tabs to perform univariate analysis, multivariate analysis, correlation analysis, or linear mixed-effects modeling.
Choose the Univariate analysis tab, then perform the analysis.
Table 3. Summary of univariate statistical methods in Metabox 2.0.
| Type | Two groups, independent | Two groups, repeated | More than two groups, independent | More than two groups, repeated | Correlation | Linear modeling |
|---|---|---|---|---|---|---|
| Parametric | t.test | t.test (paired) | ANOVA -> posthoc.test | ANOVA -> pairwise.t.test | Pearson | Linear mixed-effect |
| Non-parametric | wilcox.test/mann-whitney | wilcox.test (paired) | kruskal.test -> dunn.test or Scheirer Ray Hare.test -> wilcox.test (2W-ANOVA) | friedman.test -> dunn.test | Spearman | N/A |
Tip: Metabox automatically determines the test methods based on the provided arguments.
Specify the required arguments for univariate analysis (Figure 7):
Figure 7. Univariate analysis
Output 1: The statistical significance plot shows the -log10(adjusted p-value) of each variable. The dashed line marks the statistical significance cutoff (adjusted p-value < 0.05). The plot displays only the top 100 variables, sorted by adjusted p-value. Click on a dot to toggle its boxplot.
Output 2: See the Output table tab for statistical values. If a post hoc test is performed, a list of significant pairs will be displayed.
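The quantities behind the significance plot can be sketched in base R for the simplest case in Table 3, two independent groups tested with `t.test`. The data are simulated, and the Benjamini-Hochberg adjustment is an assumption for this example, since the tutorial does not state which adjustment method the GUI applies.

```r
set.seed(1)
group <- factor(rep(c("DM", "KF"), each = 10))

# Simulated features: one with a clear group difference, two without.
features <- data.frame(
  shifted = c(rnorm(10, mean = 10), rnorm(10, mean = 14)),
  flat1   = rnorm(20, mean = 10),
  flat2   = rnorm(20, mean = 10)
)

# One Welch t-test per feature.
p_raw <- vapply(features, function(v) t.test(v ~ group)$p.value, numeric(1))

# Multiple-testing adjustment (BH assumed here).
p_adj <- p.adjust(p_raw, method = "BH")

# Value plotted on the y-axis of a significance plot.
score <- -log10(p_adj)

# Variables above the dashed line (adjusted p-value < 0.05).
significant <- names(p_adj)[p_adj < 0.05]
print(sort(score, decreasing = TRUE))
```

Sorting `score` in decreasing order gives the same ranking as sorting by adjusted p-value, which is how the plot orders its variables.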
Choose the Multivariate analysis tab, then perform PLS-DA.
Specify the required arguments for multivariate analysis (Figure 8):
Figure 8. PLS-DA
Output: Several plots are provided, including a score plot, a loading plot, and a VIP plot, along with a table of results.
Step: Open the GUI → Click the BIOMARKER ANALYSIS (3) button → Upload the data → Start the analysis
For this tutorial, download the LC_LN dataset (LC_LN_Samples.csv) from the provided Example Data (Figure 2) and save it to your working directory. This tutorial assumes the data is already processed. The dataset contains 116 samples across two groups and 9 metabolites.
Specify the required arguments for biomarker analysis (Figure 9):
Figure 9. Biomarker analysis
Output: Several plots are provided, including a variable importance plot and a performance plot, along with a table of results.
Step: Open the GUI → Click the INTEGRATIVE ANALYSIS (4) button → Upload the data → Start the analysis
For this tutorial, download the LC_Fat_Tissue dataset (LC_Fat_Tissue.csv) and the GC_Fat_Tissue dataset (GC_Fat_Tissue.csv) from the provided Example Data (Figure 2), and save them to your working directory. Integrative analysis requires at least two datasets from the same subjects measured on different platforms. Each dataset contains 94 samples; the LC dataset has 24 metabolites and the GC dataset has 137 metabolites.
For each dataset, specify the required arguments for uploading the datasets (Figure 10):
Figure 10. Setup data for integrative analysis
Specify the required arguments for integrative analysis (Figure 11):
Figure 11. Integrative analysis
Output: Several plots are provided, including a block importance plot, a variable importance plot, a loading plot, and a validation plot.
Step: Open the GUI → Click the DATA INTERPRETATION (5) button → Upload the data → Start the analysis
Metabox 2.0 provides data interpretation in a pathway context for genes, proteins, and metabolites, as well as chemical class context for metabolites.
For this tutorial, download the metabolite_list dataset (metabolite_list.csv) from the provided Example Data (Figure 2), and save it to your working directory. It is assumed that users already have a list of metabolites and associated statistical information, as shown in the demo dataset.
For this tutorial, pathway overrepresentation analysis will be performed on a given list of metabolites using the KEGG database as the pathway resource.
Navigate to the Overrepresentation tab and specify the required arguments for overrepresentation analysis (Figure 12):
Tip: Chemical class overrepresentation analysis uses the HMDB database as the metabolite resource.
Figure 12. Overrepresentation analysis
Output: The results table contains statistical values for each pathway, along with the list of pathway members.
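Overrepresentation analysis is commonly based on a hypergeometric test: given a list of metabolites of interest, it asks whether a pathway contains more of them than expected by chance. The sketch below shows that calculation in base R. The pathway definitions, metabolite identifiers, and the `ora()` helper are all hypothetical, and the tutorial does not confirm that Metabox uses exactly this test; in a real analysis the pathway members would come from KEGG.

```r
# Hypothetical pathway definitions (placeholders for KEGG pathways).
pathways <- list(
  pathway_A = c("m1", "m2", "m3", "m4"),
  pathway_B = c("m5", "m6", "m7", "m8", "m9", "m10")
)
background <- paste0("m", 1:20)        # all metabolites that could be detected
hits <- c("m1", "m2", "m3", "m11")     # metabolites of interest

# Hypothetical helper: overlap size and one-sided hypergeometric p-value.
ora <- function(members, hits, background) {
  members <- intersect(members, background)
  overlap <- length(intersect(members, hits))
  # P(X >= overlap) when drawing length(hits) metabolites from the background
  p <- phyper(overlap - 1,
              length(members),
              length(background) - length(members),
              length(hits),
              lower.tail = FALSE)
  c(overlap = overlap, p_value = p)
}

result <- t(vapply(pathways, ora, c(overlap = 0, p_value = 0),
                   hits = hits, background = background))
result <- cbind(result, p_adj = p.adjust(result[, "p_value"], method = "BH"))
print(result)
```

Each row corresponds to one pathway, with the number of overlapping metabolites, the raw p-value, and a BH-adjusted p-value.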
Pathway enrichment analysis will be performed on a given list of metabolites, incorporating statistical values (e.g., p-, t-, or F-values) and optional directionality (e.g., fold changes), using the KEGG database as the pathway resource.
Navigate to the Enrichment tab and specify the required arguments for enrichment analysis (Figure 13):
Figure 13. Enrichment analysis
Output: The results table contains statistical values for each pathway, along with the list of pathway members.