FAQ

1. Introduction

The property theory of traditional Chinese medicine (TCM) is a distinctive medical theory built on thousands of years of clinical practice; it guides TCM practitioners in selecting appropriate herbs to treat specific diseases. Under this theory, the target organs, flavors, and toxicities of a TCM summarize the drug's essential characteristics. Despite intensive investigation, accurately identifying TCM properties remains challenging, which hampers both the rational clinical application of TCM and TCM-based drug discovery.

Herein, the TCM Artificial Intelligence-Powered Platform (TCM-AIPP) has been developed using state-of-the-art deep learning technologies. TCM-AIPP contains four predictive tools that not only identify the potential target organs, flavors, and toxicities of TCMs and support formula design, but also provide comprehensive information on TCMs and their compounds, including candidate target profiling and functional enrichment data. Notably, the web server offers flexible network visualization: users can choose which relationships among herbs, compounds, targets, target organs, flavors, and toxicities to display according to their research aims, and can design and modify the network nodes and edges as needed.

Uncovering the properties of TCM is of great importance for both clinical applications and TCM-derived drug R&D. TCM-AIPP may help to facilitate the recognition of the properties of TCMs, explain the underlying mechanisms of TCM against various human diseases, and provide guidance for TCM practitioners.

Figure 1. Overview framework of TCM-AIPP

1.1 Highlights

① TCM-AIPP is the first web server for TCM property research
② TCM-AIPP integrates the specific characteristics of TCMs to enhance its prediction performance
③ TCM-AIPP provides an intuitive interface and customizable network visualizations, facilitating the exploration of TCM-related bioinformation

1.2 Citations

① Zhang Y, Li X, Shi Y, Chen T, Xu Z, Wang P, Yu M, Chen W, Li B, Jing Z, Jiang H, Fu L, Gao W, Jiang Y, Du X, Gong Z, Zhu W, Yang H, Xu H. ETCM v2.0: An update with comprehensive resource and rich annotations for traditional Chinese medicine. Acta Pharm Sin B. 2023 Jun;13(6):2559-2571. doi: 10.1016/j.apsb.2023.03.012
② Xu HY, Zhang YQ, Liu ZM, Chen T, Lv CY, Tang SH, Zhang XB, Zhang W, Li ZY, Zhou RR, Yang HJ, Wang XJ, Huang LQ. ETCM: an encyclopaedia of traditional Chinese medicine. Nucleic Acids Res. 2019 Jan 8;47(D1):D976-D982. doi: 10.1093/nar/gky987
③ Zhang Y, Wang N, Du X, Chen T, Yu Z, Qin Y, Chen W, Yu M, Wang P, Zhang H, Zhou X, Huang L, Xu H. SoFDA: an integrated web platform from syndrome ontology to network-based evaluation of disease-syndrome-formula associations for precision medicine. Sci Bull (Beijing). 2022 Jun 15;67(11):1097-1101. doi: 10.1016/j.scib.2022.03.013
④ Liu Y, Xu J, Yu Z, Chen T, Wang N, Du X, Wang P, Zhou X, Xu H, Zhang Y. Ontology characterization, enrichment analysis, and similarity calculation-based evaluation of disease-syndrome-formula associations by applying SoFDA. Imeta. 2023 Jan 10;2(2):e80. doi: 10.1002/imt2.80

2. Model Info

2.1 Model information and validation

A total of four tools for predicting the target organs, flavors, and toxicities of TCMs and for designing TCM formulas were developed in the TCM-AIPP web server, based on Random Forest (RF), Graph Attention Network (GAT), and graph autoencoder (GAE) algorithms. These tools contain 20 prediction models: 10 classification models and 10 regression models. For each endpoint, the dataset is randomly divided into training, validation (VAL), and test sets in a ratio of 8:1:1. The RF model is implemented using the Tree Ensemble Learner and Predictor nodes in KNIME; the Gini index is used as the split criterion, the square-root function is used for attribute sampling, and a different set of attributes is selected for each tree. The GAT and GAE models employ the Adam optimizer, with hyperparameters tuned via Bayesian optimization. Regression tasks were assessed using the coefficient of determination (R²), root mean squared error (RMSE), and mean absolute error (MAE), whereas classification tasks were evaluated based on accuracy, the area under the receiver operating characteristic curve (ROC-AUC), Matthews correlation coefficient (MCC), precision, specificity, and sensitivity. To ensure the reliability of the models, each training process was repeated 10 times, and the model with the best performance on the validation set was selected for deployment on the online platform.
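The evaluation metrics named above can be written out explicitly. The following is a minimal sketch using the standard textbook definitions, not the platform's own evaluation code; the function names are hypothetical, and ROC-AUC is omitted because it requires ranked prediction scores rather than confusion-matrix counts.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Classification metrics from confusion-matrix counts (standard definitions)."""
    total = tp + fp + tn + fn
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """R2, RMSE and MAE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        # R2 is undefined when all true values are identical.
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }
```

For multi-class endpoints such as the acute toxicity levels, these binary quantities would typically be computed per class and then averaged; how the platform averages them is not stated here.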

2.2 TCM target organ prediction (HerbAI Meri Navigator)

The TCM target organ prediction model (HerbAI Meri Navigator) was developed based on GAT and aims to reveal the tendency of TCMs to act on different organs. Because the targets of TCMs are complex and diverse and their mechanisms of action are difficult to explain intuitively, target organs need to be predicted by systematic network analysis. The model predicts the effects of TCMs on specific organs by integrating the human protein-protein interaction (PPI) network (1) with the effective targets of TCMs.

The model was constructed from several reliable data sources. Proteins specifically expressed in each organ were collected from The Human Protein Atlas, screened for "Enhanced" and "High" levels of evidence, and retained only if they occurred in a single organ. To further substantiate the independence of the organ target sets within the PPI network, a network separation analysis was used to differentiate between organ-specific target sets. In the GAT, organ-specific targets were mapped onto the PPI network, and interactions between protein nodes were analyzed through the graph attention mechanism to calculate a specificity score for each protein node and each organ. The model then aggregates these organ-specific scores across all effective targets of a given TCM to predict the herb's tendency to act on specific organs.
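The aggregation step can be illustrated with a short sketch. The text does not specify how scores are combined, so the example below assumes a simple mean over the herb's effective targets; the function names and the input layout (a dictionary of per-target organ scores) are hypothetical.

```python
from collections import defaultdict


def aggregate_organ_scores(
    target_scores: dict[str, dict[str, float]],
    herb_targets: list[str],
) -> dict[str, float]:
    """Average per-target organ scores over a herb's effective targets.

    Assumption: mean aggregation; the platform's actual rule is not documented here.
    Targets without scores (e.g. absent from the PPI network) are skipped.
    """
    totals: dict[str, float] = defaultdict(float)
    counted = 0
    for target in herb_targets:
        scores = target_scores.get(target)
        if scores is None:
            continue
        counted += 1
        for organ, score in scores.items():
            totals[organ] += score
    if counted == 0:
        return {}
    return {organ: total / counted for organ, total in totals.items()}


def rank_organs(herb_scores: dict[str, float]) -> list[str]:
    """Organs sorted from highest to lowest aggregated score (ties by name)."""
    return sorted(herb_scores, key=lambda organ: (-herb_scores[organ], organ))
```

With illustrative scores such as `{"A": {"Liver": 0.8, "Heart": 0.2}, "B": {"Liver": 0.4, "Heart": 0.6}}`, a herb acting on targets A and B would rank Liver ahead of Heart.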

To validate the performance of the model, we collected effective target data with high-quality literature support from the HIT database, which demonstrated the best prediction quality in several models. A total of 442 TCMs with documented organ effects, as recorded in the Chinese Pharmacopoeia, were analyzed. Their corresponding effective targets were extracted, resulting in 64,795 herb-target associations. During validation, a prediction was considered a true positive if the actual target organ of a TCM was among the top three organs predicted by the model (a single TCM is known to act on up to four organs). The model was first validated on the four organs documented in the Pharmacopoeia (liver, heart, lung, and kidney) and achieved satisfactory performance. Validation was then extended to additional organs: the cerebellum, pancreas, retina, skeletal muscle, and testis. Notably, with the exception of the heart and skeletal muscle pair, the target sets of these organs showed significant topological separation in the PPI network, which further enhanced the predictive performance of the model.
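The top-three criterion described above can be sketched as follows. This is an illustration only: it assumes that a hit requires at least one documented organ to appear among the top three predictions (the text does not say whether one or all must match), and the function names are hypothetical.

```python
def is_top_k_hit(
    predicted_scores: dict[str, float],
    actual_organs: set[str],
    k: int = 3,
) -> bool:
    """True if any documented organ is among the k highest-scoring predictions."""
    top_k = sorted(predicted_scores, key=predicted_scores.get, reverse=True)[:k]
    return any(organ in actual_organs for organ in top_k)


def hit_rate(records: list[tuple[dict[str, float], set[str]]], k: int = 3) -> float:
    """Fraction of TCMs whose prediction counts as a top-k hit."""
    hits = sum(is_top_k_hit(scores, actual, k) for scores, actual in records)
    return hits / len(records)
```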

Table 1. Basic information of TCM target organ prediction tool

Organs	Cerebellum, Heart, Kidney, Liver, Lung, Pancreas, Retina, Skeletal muscle, Testis
Algorithmic Model	Graph Attention Network (3)
End Point	Probability score of TCM effective target acting on a certain organ
Descriptors	Human protein interaction network
Standard Dataset	685 targets
Data Sources	(2)

2.3 TCM flavor prediction (HerbAI Flavor Atlas)

The flavor prediction model (HerbAI Flavor Atlas) of TCM-AIPP was constructed using the RF algorithm. This model evaluates the chemical structural similarities between the input compounds and compounds with known flavors obtained from PubChem, VirtualTaste and ChemTastesDB.

The flavors of a TCM were then predicted as a weighted average of the flavors of its compounds, with the index compounds recorded in the Chinese Pharmacopoeia 2020 and the remaining compounds assigned different weights. Predictive performance was evaluated on 558 TCMs with flavor records in the Chinese Pharmacopoeia 2020 and the corresponding 19,068 compounds obtained from BATMAN-TCM. A prediction was considered a true positive if the actual flavors of a TCM were included in the top three flavors predicted by the model (a single TCM generally has up to three flavors).
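A weighted average of compound-level flavor predictions might look like the sketch below. The actual weights used by TCM-AIPP are not given in this document; the values 2.0 (index compounds) and 1.0 (other compounds) are placeholders chosen purely for illustration, and the function names are hypothetical.

```python
FLAVORS = ("Bitter", "Pungent", "Salty", "Sour", "Sweet")


def herb_flavor_scores(
    compounds: list[tuple[dict[str, float], bool]],
    index_weight: float = 2.0,   # placeholder weight, not the platform's value
    other_weight: float = 1.0,   # placeholder weight, not the platform's value
) -> dict[str, float]:
    """Weighted mean of per-compound flavor probabilities.

    Each item is (flavor_probabilities, is_index_compound).
    """
    totals = dict.fromkeys(FLAVORS, 0.0)
    weight_sum = 0.0
    for probabilities, is_index in compounds:
        weight = index_weight if is_index else other_weight
        weight_sum += weight
        for flavor in FLAVORS:
            totals[flavor] += weight * probabilities.get(flavor, 0.0)
    if weight_sum == 0:
        raise ValueError("at least one compound with a positive weight is required")
    return {flavor: totals[flavor] / weight_sum for flavor in FLAVORS}


def top_flavors(scores: dict[str, float], k: int = 3) -> list[str]:
    """The k highest-scoring flavors (ties broken alphabetically)."""
    return sorted(scores, key=lambda flavor: (-scores[flavor], flavor))[:k]
```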

Table 2. Basic information of TCM flavor prediction tool

Flavors	Flavors of compounds
Algorithmic Model	Random Forest (4)
End Point	Sour, Bitter, Sweet, Pungent, Salty
Descriptors	Molecular fingerprints
Standard Dataset	1595 compounds
Data Sources	(4-6)

2.4 TCM toxicity prediction (HerbAI ToxWarning)

The toxicity prediction model (HerbAI ToxWarning) of TCM-AIPP was constructed using the GAT algorithm. The model evaluates the chemical structural similarity between the input compounds and compounds with known toxicities obtained from TOXRIC, DIRIL, DrugBank, and PubChem. On this basis, TCM-AIPP predicts the potential toxicities of the input compounds, namely acute toxicity and organ toxicities (cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, and respiratory toxicity), and estimates the toxic risk of TCMs from the number of toxic compounds in their chemical profiles. In addition, TCM-AIPP provides the putative targets of the toxic compounds contained in TCMs, together with their enriched biological functions and pathways. These data provide an important reference for the safety evaluation of TCMs and for investigating the underlying toxic mechanisms.

2.4.1 Acute toxicity

Two acute toxicity prediction models, one for rat and one for mouse, were developed for different application scenarios using LD50 data collected from the TOXRIC database and toxicity classification criteria from the Globally Harmonized System of Classification and Labelling of Chemicals (GHS). To minimize the risk of false negatives and to reduce overfitting, both Random Oversampling and the Adaptive Synthetic Sampling algorithm (ADASYN) were used to improve classification accuracy for the minority classes. Results from both sampling algorithms are available on the platform.

Table 3. Basic information of TCM acute toxicity prediction tool

Prediction Type	Lethal Dose value in mg/kg body weight (Rat and Mouse)
Algorithmic Model	Graph Attention Network (8)
End Point	Toxicity level from GHS (I, II, III, IV, V, VI)
Descriptors	Molecular fingerprints
Standard Dataset	9734 compounds (Rat); 21,831 compounds (Mouse)
Data Sampling	Random Oversampling and ADASYN
Data Sources	(7)

Table 4. Definition of the acute toxicity levels

Acute Toxicity Level	GHS (Oral LD50 mg/kg)
I	≤ 5
II	5 < LD50 ≤ 50
III	50 < LD50 ≤ 300
IV	300 < LD50 ≤ 2000
V	2000 < LD50 ≤ 5000
VI	> 5000
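The thresholds in Table 4 translate directly into a lookup. The sketch below mirrors the table; the function name is hypothetical, and rejecting non-positive values is an added assumption.

```python
from bisect import bisect_left

# Inclusive upper bounds (oral LD50, mg/kg) for levels I to V, as in Table 4.
_UPPER_BOUNDS = (5, 50, 300, 2000, 5000)
_LEVELS = ("I", "II", "III", "IV", "V", "VI")


def acute_toxicity_level(ld50_mg_per_kg: float) -> str:
    """Map an oral LD50 value (mg/kg body weight) to levels I-VI of Table 4."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    # bisect_left keeps each upper bound inside its own level (e.g. 50 -> II).
    return _LEVELS[bisect_left(_UPPER_BOUNDS, ld50_mg_per_kg)]
```

For example, an LD50 of 300 mg/kg falls in level III, while 300.1 mg/kg falls in level IV.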

Table 5. Comparison of the two sampling algorithms

Feature	Random Oversampling	ADASYN
Sampling Method	Directly duplicates existing minority class samples	Generates new synthetic samples based on data distribution
Basis for Sampling	Randomly selects and duplicates minority samples	Focuses on generating samples in areas with low minority density
Overfitting Risk	High, as repeated samples can lead to overfitting	Lower, since generated samples are new and introduce diversity
Ability to Handle Imbalance	Increases sample size but doesn't address distribution complexity	Improves classifier learning by focusing on difficult-to-classify regions
Characteristics of Generated Samples	Samples are identical to original ones (simple duplication)	Samples are synthetic, created through interpolation with diversity
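The contrast in Table 5 can be made concrete with a simplified sketch. The first function is plain random oversampling. The second shows only the interpolation step by which ADASYN-type methods create a synthetic sample between a minority sample and one of its minority-class neighbors; it does not implement ADASYN's density-based choice of how many samples to generate around each point. Both function names are hypothetical.

```python
import random


def random_oversample(
    minority: list[list[float]], target_size: int, rng: random.Random
) -> list[list[float]]:
    """Duplicate randomly chosen minority samples until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra


def interpolate_sample(
    sample: list[float], neighbor: list[float], rng: random.Random
) -> list[float]:
    """Create one synthetic point on the segment between sample and neighbor."""
    gap = rng.random()  # random position in [0, 1) along the segment
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]
```

Random oversampling returns only copies of existing samples, whereas interpolation produces points that did not exist in the original data, which is the source of the diversity noted in the table.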

2.4.2 Cardiotoxicity

A growing number of studies have reported drug-induced cardiotoxicity, which is often associated with inhibition of the human ether-à-go-go-related gene (hERG) channel. The hERG gene encodes a protein that forms potassium channels in the membranes of cardiomyocytes; these channels are essential for the normal electrical activity of the heart, especially during the repolarization phase. Many drugs inadvertently inhibit hERG channels, causing abnormal cardiac repolarization that can lead to arrhythmias and even sudden death. To construct the cardiotoxicity prediction model for TCM-AIPP, compounds with hERG inhibition values were obtained from the ChEMBL database and categorized into four cardiotoxicity classes based on their IC50 values.

Table 6. Basic information of TCM cardiotoxicity prediction tool

Prediction Type	Compounds induced cardiotoxicity
Algorithmic Model	Graph Attention Network (8)
End Point	Toxicity level by IC50 (I, II, III, IV)
Descriptors	Molecular fingerprints
Standard Dataset	8418 compounds
Data Sampling	ADASYN
Data Sources	(9,10)

Table 7. Definition of the hERG inhibition levels

Levels	hERG inhibition values [IC50 (μM)]
I	< 1
II	1 < IC50 ≤ 10
III	10 < IC50 ≤ 100
IV	> 100

2.4.3 Hepatotoxicity

Table 8. Basic information of TCM hepatotoxicity prediction tool

Prediction Type	Compounds induced hepatotoxicity
Algorithmic Model	Graph Attention Network (8)
End Point	Positive/Negative
Descriptors	Molecular fingerprints
Standard Dataset	2411 compounds
Data Sources	(5,7,11-15)

2.4.4 Nephrotoxicity

Table 9. Basic information of TCM nephrotoxicity prediction tool

Prediction Type	Compounds induced nephrotoxicity
Algorithmic Model	Graph Attention Network (8)
End Point	Positive/Negative
Descriptors	Molecular fingerprints
Standard Dataset	821 compounds
Data Sources	(11,16-19)

2.4.5 Neurotoxicity

Table 10. Basic information of TCM neurotoxicity prediction tool

Prediction Type	Compounds induced neurotoxicity
Algorithmic Model	Graph Attention Network (8)
End Point	Positive/Negative
Descriptors	Molecular fingerprints
Standard Dataset	757 compounds
Data Sources	(11,16,20-22)

2.4.6 Respiratory Toxicity

Table 11. Basic information of TCM respiratory toxicity prediction tool

Prediction Type	Compounds induced respiratory toxicity
Algorithmic Model	Graph Attention Network (8)
End Point	Positive/Negative
Descriptors	Molecular fingerprints
Standard Dataset	1760 compounds
Data Sources	(7,23,24)

2.5 TCM formula design (HerbAI Matrix)

The TCM formula design model (HerbAI Matrix) is built on the GAE model and leverages artificial intelligence to modernize the principles of herbal compatibility. The model systematically mines large-scale prescription data to extract herb-pair co-occurrence patterns and integrates multi-dimensional features (such as nature, flavor, meridian tropism, pharmacological action, and chemical composition) into unified digital descriptors.

Through GAE-based node embedding, each herb is projected into a latent feature space where hidden synergistic relationships are quantitatively inferred. These embeddings give rise to an herb-pairing network that captures the structural and therapeutic logic of TCM formulations. By applying the Louvain community detection algorithm, HerbAI Matrix further identifies functionally cohesive herb clusters, thereby uncovering novel, data-driven formula candidates that bridge traditional theory and modern computational discovery.
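The first step of this pipeline, extracting herb-pair co-occurrence patterns from prescriptions, can be sketched in a few lines. This is a simplified illustration under the assumption that each prescription is given as a list of herb names; the function name is hypothetical, and the GAE embedding and Louvain clustering steps are not shown.

```python
from collections import Counter
from itertools import combinations


def herb_pair_counts(prescriptions: list[list[str]]) -> Counter:
    """Count how often each unordered herb pair co-occurs in a prescription."""
    counts: Counter = Counter()
    for herbs in prescriptions:
        # Deduplicate and sort so that each pair has one canonical key.
        unique_herbs = sorted(set(herbs))
        counts.update(combinations(unique_herbs, 2))
    return counts
```

The resulting counts could serve as edge weights of an herb-pairing network, for example `herb_pair_counts([["A", "B", "C"], ["B", "A"]])[("A", "B")]` gives 2.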

2.6 Standard Datasets for Model Construction

Table 12. Detailed information of the standard datasets

Dataset	Total (Positive/Negative)	Training set (Positive/Negative)	Validation set (Positive/Negative)	Test set (Positive/Negative)
Cerebellum	685 (40/654)	549 (32/524)	68 (4/65)	68 (4/65)
Heart	685 (20/654)	549 (16/524)	68 (2/65)	68 (2/65)
Kidney	685 (51/654)	549 (41/524)	68 (5/65)	68 (5/65)
Liver	685 (31/654)	549 (25/524)	68 (3/65)	68 (3/65)
Lung	685 (14/654)	549 (10/524)	68 (2/65)	68 (2/65)
Pancreas	685 (24/654)	549 (18/524)	68 (3/65)	68 (3/65)
Retina	685 (37/654)	549 (29/524)	68 (4/65)	68 (4/65)
Skeletal muscle	685 (31/654)	549 (25/524)	68 (3/65)	68 (3/65)
Testis	685 (172/654)	549 (136/524)	68 (18/65)	68 (18/65)
Flavor (Bitter, Pungent, Salty, Sour, Sweet)	1595 (329, 304, 355, 282, 325)	1276 (263, 243, 284, 226, 260)	159 (33, 31, 35, 28, 32)	160 (33, 30, 36, 28, 33)
Acute toxicity (Rat: I, II, III, IV, V, VI)	9734 (236/720/1550/3693/2220/1315)	7787 (183/571/1237/2918/1817/1061)	973 (24/81/154/395/199/120)	974 (29/68/159/380/204/134)
Acute toxicity (Mouse: I, II, III, IV, V, VI)	21831 (140/696/3566/12829/3396/1204)	17464 (114/572/2848/10236/2737/957)	2183 (17/68/352/1297/341/108)	2184 (9/56/366/1296/318/139)
Cardiotoxicity (I, II, III, IV)	8418 (1581/3702/2629/506)	6734 (1265/2961/2103/405)	842 (158/370/263/51)	842 (158/371/263/50)
Hepatotoxicity	2411 (1135/1276)	1928 (908/1020)	241 (113/128)	242 (114/128)
Nephrotoxicity	821 (253/568)	656 (202/454)	82 (25/57)	83 (26/57)
Neurotoxicity	757 (194/563)	605 (155/450)	76 (20/56)	76 (19/57)
Respiratory toxicity	1760 (734/1026)	1408 (587/821)	176 (74/102)	176 (73/103)

2.7 Model Performance Evaluation

Table 13. Predictive performance of TCM target organ prediction model

Organ	Dataset	R²	RMSE	MAE
Cerebellum	Validation set	0.750 ± 0.026	0.090 ± 0.020	0.049 ± 0.002
Cerebellum	Test set	0.705 ± 0.073	0.101 ± 0.002	0.059 ± 0.002
Heart	Validation set	0.782 ± 0.014	0.081 ± 0.003	0.044 ± 0.002
Heart	Test set	0.768 ± 0.016	0.102 ± 0.003	0.050 ± 0.002
Kidney	Validation set	0.841 ± 0.047	0.111 ± 0.005	0.062 ± 0.010
Kidney	Test set	0.871 ± 0.039	0.090 ± 0.017	0.062 ± 0.010
Liver	Validation set	0.943 ± 0.009	0.042 ± 0.003	0.031 ± 0.003
Liver	Test set	0.823 ± 0.046	0.101 ± 0.013	0.054 ± 0.007
Lung	Validation set	0.793 ± 0.020	0.079 ± 0.004	0.048 ± 0.002
Lung	Test set	0.751 ± 0.002	0.062 ± 0.001	0.041 ± 0.002
Pancreas	Validation set	0.836 ± 0.007	0.070 ± 0.002	0.034 ± 0.001
Pancreas	Test set	0.739 ± 0.027	0.108 ± 0.006	0.046 ± 0.001
Retina	Validation set	0.815 ± 0.035	0.133 ± 0.014	0.047 ± 0.001
Retina	Test set	0.870 ± 0.032	0.076 ± 0.009	0.040 ± 0.001
Skeletal muscle	Validation set	0.944 ± 0.003	0.050 ± 0.001	0.030 ± 0.001
Skeletal muscle	Test set	0.718 ± 0.021	0.092 ± 0.004	0.051 ± 0.001
Testis	Validation set	0.877 ± 0.033	0.161 ± 0.024	0.129 ± 0.013
Testis	Test set	0.869 ± 0.031	0.166 ± 0.022	0.136 ± 0.013

Table 14. Predictive performance of TCM target organ prediction model

Dataset	Accuracy	ROC-AUC	MCC	Precision	Specificity	Sensitivity
Validation set	0.835 ± 0.034	0.724 ± 0.040	0.432 ± 0.054	0.833 ± 0.027	0.335 ± 0.049	0.994 ± 0.013
Test set	0.810 ± 0.037	0.605 ± 0.041	0.400 ± 0.057	0.825 ± 0.024	0.325 ± 0.053	0.969 ± 0.020

Table 15. Predictive performance of TCM flavor prediction model

Prediction Type	Dataset	Accuracy	ROC-AUC	MCC	Precision	Specificity	Sensitivity
Flavors of Compounds	Validation set	0.824 ± 0.020	0.890 ± 0.013	0.781 ± 0.025	0.825 ± 0.023	0.956 ± 0.005	0.823 ± 0.022
Flavors of Compounds	Test set	0.839 ± 0.025	0.899 ± 0.015	0.800 ± 0.031	0.831 ± 0.012	0.956 ± 0.004	0.823 ± 0.017
Flavors of TCMs	Validation set	0.921 ± 0.018	0.799 ± 0.061	0.403 ± 0.044	0.940 ± 0.007	0.310 ± 0.028	0.977 ± 0.019
Flavors of TCMs	Test set	0.926 ± 0.011	0.801 ± 0.045	0.434 ± 0.045	0.926 ± 0.011	0.300 ± 0.020	0.980 ± 0.030

Table 16. Predictive performance of various TCM toxicity prediction models

Toxicity Type	Dataset	Accuracy	ROC-AUC	MCC	Precision	Specificity	Sensitivity
Acute toxicity	Mouse (Random) Validation set	0.945 ± 0.003	0.991 ± 0.001	0.935 ± 0.003	0.946 ± 0.003	0.989 ± 0.001	0.945 ± 0.003
	Mouse (Random) Test set	0.947 ± 0.005	0.992 ± 0.001	0.937 ± 0.003	0.949 ± 0.002	0.989 ± 0.001	0.948 ± 0.004
	Mouse (ADASYN) Validation set	0.864 ± 0.003	0.968 ± 0.001	0.837 ± 0.003	0.866 ± 0.003	0.973 ± 0.001	0.865 ± 0.003
	Mouse (ADASYN) Test set	0.706 ± 0.006	0.887 ± 0.003	0.648 ± 0.007	0.700 ± 0.005	0.941 ± 0.001	0.703 ± 0.006
Cardiotoxicity	Validation set	0.822 ± 0.005	0.937 ± 0.004	0.769 ± 0.006	0.822 ± 0.005	0.863 ± 0.002	0.850 ± 0.004
	Test set	0.825 ± 0.005	0.942 ± 0.003	0.769 ± 0.006	0.822 ± 0.005	0.865 ± 0.002	0.852 ± 0.004
Hepatotoxicity	Validation set	0.796 ± 0.007	0.858 ± 0.002	0.590 ± 0.013	0.791 ± 0.008	0.820 ± 0.001	0.768 ± 0.018
	Test set	0.764 ± 0.007	0.850 ± 0.002	0.551 ± 0.022	0.743 ± 0.006	0.759 ± 0.003	0.753 ± 0.023
Nephrotoxicity	Validation set	0.772 ± 0.007	0.776 ± 0.016	0.481 ± 0.034	0.667 ± 0.018	0.833 ± 0.012	0.643 ± 0.039
	Test set	0.834 ± 0.009	0.883 ± 0.009	0.586 ± 0.025	0.798 ± 0.015	0.934 ± 0.007	0.867 ± 0.041
Neurotoxicity	Validation set	0.882 ± 0.005	0.899 ± 0.003	0.730 ± 0.008	0.888 ± 0.003	0.938 ± 0.003	0.867 ± 0.041
	Test set	0.892 ± 0.017	0.924 ± 0.012	0.666 ± 0.040	0.696 ± 0.054	0.926 ± 0.016	0.743 ± 0.035
Respiratory Toxicity	Validation set	0.778 ± 0.010	0.874 ± 0.004	0.551 ± 0.022	0.743 ± 0.006	0.798 ± 0.007	0.753 ± 0.023
	Test set	0.818 ± 0.007	0.899 ± 0.007	0.634 ± 0.015	0.770 ± 0.007	0.812 ± 0.007	0.826 ± 0.015

3. User Guide

3.1 Input information

The TCM-AIPP web server offers an intuitive and interactive interface that enables users to seamlessly perform multiple prediction tasks. Users can input official gene symbols—either individually or in batches—to predict target organs; provide Simplified Molecular-Input Line Entry System (SMILES) strings of compounds to infer their flavors, acute toxicities, and organ-specific toxicities; or upload ancient, proprietary, or clinically empirical TCM formulas along with the attributes of constituent herbs to facilitate rational TCM formula design.

3.2 Output information

Depending on the number of elements entered, the prediction results are presented in the browser as charts and tables. In the target organ prediction interface, when a single gene is entered, the system provides the TCMs related to the gene, the predicted target organs, and the network of neighboring targets that interact with the input gene; when multiple genes are entered as a batch, the results include a summary of the target organ classification of all genes, the related TCMs, and the TCMs significantly enriched according to their effective targets. The flavor and toxicity prediction pages provide the predicted flavor, acute toxicity, and organ toxicities of each compound, together with related TCMs, candidate targets, and biological function enrichment analysis based on the input SMILES. For batch input, the results present a categorized summary of all predictions, the related TCMs that are significantly enriched according to the compounds they contain, and the candidate targets with their biological function enrichment analysis; network visualization is supported to show the relationships among TCMs, compounds, and targets. The methodologies used for target prediction and enrichment analysis have been described in our previous publications (25).

Upon upload, the HerbAI Matrix page generates a series of quantitative and visual outputs, including herb-pair frequency analysis revealing traditional compatibility patterns, GAE performance metrics assessing model reliability, predicted herb pairs with reliability scores and pharmacological relevance (adjustable by threshold), and Louvain clustering that identifies candidate formulas, each annotated with a cluster-formation probability and an innovation score quantifying deviation from known prescriptions.

The server allows users to download the results in several file formats: .csv, .png, .svg, and .sif (which can be imported into Cytoscape for further customization and analysis).

3.3 Processing times

TCM-AIPP can process up to 3,000 targets, 1,000 SMILES strings, or [?] known formulas (comprising [?] unique herbs) in a single run, covering target organ prediction for genes, flavor and toxicity prediction for compounds, and the design of novel TCM formulas. When the server queue is empty and resources are sufficient, a single model returns its prediction in less than 1 minute.
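Inputs larger than these limits have to be split across several runs. A minimal sketch of such batching is shown below; the constant and function names are hypothetical, and only the limits of 3,000 targets and 1,000 SMILES strings come from the text above.

```python
from collections.abc import Iterator, Sequence

MAX_TARGETS_PER_RUN = 3000  # limit stated for target organ prediction
MAX_SMILES_PER_RUN = 1000   # limit stated for flavor and toxicity prediction


def chunked(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For instance, 2,500 SMILES strings would be submitted as three runs of 1,000, 1,000, and 500 entries.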

3.4 Quick start

This section provides a quick start guide for the key functionalities; subsections 3.4.1 to 3.4.5 describe the individual tools and databases. TCM-AIPP supports both Simplified Chinese and English, and users can switch languages in the upper-right corner of the interface. The prediction tools for TCM target organs, flavors, toxicities, and formula design can be accessed from the AI Tools section, either through the shortcut in the upper-left corner or via the Services panel in the center of the Home page. The upper-left shortcut offers a faster entry point, whereas the central panel presents a more informative overview.

3.4.1 TCM Target Organ Prediction

Users can input the Official Gene Symbol(s) or Entrez Gene ID(s) of one or multiple targets and click Submit to predict the target organs of TCMs (click on the example to see the demo). Batch input supports up to 3,000 targets. The latest release of TCM-AIPP supports prediction for nine organs: cerebellum, heart, kidney, liver, lung, pancreas, retina, skeletal muscle, and testis.

For a single input, TCM-AIPP outputs basic information on the input target, its related TCMs, and its potential target organs. The system also visualizes the network of neighboring targets that interact with the input target.

For multiple inputs, TCM-AIPP provides summary statistics on the potential target organs of the TCMs acting on the input targets, together with detailed information on these TCMs and the functions and pathways enriched among their effective targets. All results can be downloaded as images and tables.

3.4.2 TCM Flavor Prediction

Users can input the SMILES string(s) of one or multiple compounds and click Submit to predict TCM flavors (click on the example to see the demo). Batch input supports up to 1,000 SMILES strings. We advise users to standardize SMILES via PubChem or RDKit before submitting them. The latest release of TCM-AIPP supports prediction of five flavors: Bitter, Pungent, Salty, Sour, and Sweet.

For a single input, TCM-AIPP generates the structural diagram and potential flavors of the input compound, as well as its related TCMs and candidate targets. TCM-AIPP also provides the functions and pathways enriched among the candidate targets.

For multiple inputs, TCM-AIPP provides summary statistics on the potential flavors of all input compounds, together with their candidate targets and the functions and pathways involved. TCM-AIPP also lists the TCMs containing the input compounds, identified through enrichment analysis based on the SMILES strings. All results can be downloaded as images and tables.

NOTE: The reliability scores of the candidate targets provided here are higher than 0.6.

3.4.3 TCM Toxicity Prediction

Users can input the SMILES string(s) of one or multiple compounds and click Submit to predict TCM toxicities (click on the example to see the demo). Batch input supports up to 1,000 SMILES strings. We advise users to standardize SMILES via PubChem or RDKit before submitting them. The latest release of TCM-AIPP supports prediction of acute toxicity and of common organ toxicities, including cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, and respiratory toxicity.

For a single input, TCM-AIPP generates the structural diagram and the potential acute toxicity and organ toxicities of the input compound, as well as its related TCMs and candidate targets. TCM-AIPP also provides the functions and pathways enriched among the candidate targets.

For multiple inputs, TCM-AIPP provides summary statistics on the potential toxicities of all input compounds, together with their candidate targets and the functions and pathways involved. TCM-AIPP also lists the TCMs containing the input compounds, identified through enrichment analysis based on the SMILES strings. All results can be downloaded as images and tables.

NOTE: Acute toxicity is predicted by two models trained with different sampling methods (see 2.4.1). When the two predictions for a compound differ, we recommend taking the lower level, i.e., the more toxic one (see Table 4), as the reference. This choice is more conservative and is intended to avoid false negatives, which are more harmful than false positives. The reliability scores of the candidate targets provided here are higher than 0.6.
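The recommendation in the note can be expressed as a small helper. The sketch assumes that the "lower grade" is the level with the smaller Roman numeral, which in Table 4 corresponds to a smaller LD50 and therefore higher toxicity; the function name is hypothetical.

```python
_LEVEL_ORDER = ("I", "II", "III", "IV", "V", "VI")  # I = most toxic (Table 4)


def conservative_level(random_oversampling_level: str, adasyn_level: str) -> str:
    """Return the more toxic of the two predicted acute toxicity levels."""
    # _LEVEL_ORDER.index raises ValueError for labels outside I-VI.
    return min(random_oversampling_level, adasyn_level, key=_LEVEL_ORDER.index)
```

If the Random Oversampling model predicts level IV and the ADASYN model predicts level III, the helper returns level III.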

3.4.4 TCM formula design (HerbAI Matrix)

Users can upload existing formulas for specific diseases, including classical prescriptions, proprietary TCM formulas, and clinical empirical formulas. The characteristics of all constituent herbs must be provided in matrix format, with customizable features such as herb properties, efficacies, and pharmacological actions, depending on the disease context. Detailed data format specifications and upload requirements are provided in the right-side panel of the interface.

Upon completion of the calculations, HerbAI Matrix provides multi-dimensional visual and quantitative results to facilitate the interpretation of TCM formula patterns and the discovery of novel herb combinations, including:

a) Herb-pair frequency statistics – visualize co-occurrence frequencies from uploaded prescriptions to reveal traditional compatibility patterns.
b) GAE performance evaluation – assess model reliability using accuracy, ROC–AUC, and MCC, displayed through ROC curves and summary tables.
c) Predicted herb pairs – generate potential synergistic pairs with reliability scores and pharmacological relevance, adjustable via user-defined thresholds.
d) Candidate formula clustering – apply the Louvain algorithm to identify novel formula clusters, each annotated with cluster-formation probability and an innovation score quantifying deviation from known prescriptions.

Additionally, the system enables the visualization of the network of herbs-compounds-targets.

NOTE: HerbAI Matrix functions as a task-adaptive framework for TCM formula prediction, rather than a static model with fixed parameters. The model’s performance is inherently determined by the characteristics of the user-provided dataset and the specific objectives of each training task. Upon the input of new formula or compound-feature data, the system automatically reconstructs and retrains a tailored model. Accordingly, each prediction task constitutes an independent modeling process, enabling users to evaluate the validity and robustness of the generated results based on quantitative performance metrics.

3.4.5 TCM property database

Users can directly access the corresponding databases via the “Database” entry located in the upper-left corner of the homepage or through the central panel.

3.4.5.1 Herb database

The herb database provides comprehensive information on Chinese medicinal materials, including the Chinese name, Pinyin, English name, Latin name, family, four qi, five flavors, meridian tropism, toxicity, and actions. To guarantee the authority and accuracy of the information, this section currently includes only the Chinese medicinal materials recorded in the Chinese Pharmacopoeia.

For detailed information, the database provides data on related compounds, corresponding TCM targets, and the biological functions and pathways associated with their effective targets based on enrichment analysis.

3.4.5.2 Compound database

The compound database is a comprehensive repository of information on the compounds of Chinese medicinal materials, including molecular weight, SMILES, aliases, and other pertinent data.

For each compound, the detail page provides predicted physicochemical and pharmacognostic properties, as well as absorption, distribution, metabolism, and excretion (ADME) parameters.

Meanwhile, the database provides potential flavor, acute toxicity and organ toxicities of the compound.

3.4.5.3 Target database

The target database principally furnishes target information pertaining to Chinese medicinal materials, comprising gene symbol, alias, full name, gene type, species, related herbs, related compounds and probability of target acting on each organ.

4. API Tutorial

For users with more advanced analytical requirements, TCM-AIPP offers a straightforward POST interface that can be queried from the programming language of your choice. Although the website itself responds quickly, requests sent from Python scripts may be slightly delayed because user requests are queued. Please note that each source IP is limited to 100 API queries per day, and the interval between queries becomes longer as the number of requesting models increases. To run the sample script, Python (version 3.12 or higher) must be installed, and the script must be executed from the command line. You can download the script to your local computer and use it directly, or write your own using it as a reference: Sample API Script
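The daily limit described above can be tracked on the client side before any request is sent. The following is a minimal sketch and is not part of the TCM-AIPP sample script: the class name QuotaTracker, its methods, and the rolling 24-hour window are assumptions made for illustration, and the server may well count queries per calendar day instead.

```python
import time
from dataclasses import dataclass, field

DAILY_LIMIT = 100  # from the tutorial: 100 API queries per source IP per day


@dataclass
class QuotaTracker:
    """Hypothetical client-side bookkeeping for the documented daily limit."""

    limit: int = DAILY_LIMIT
    window_seconds: float = 24 * 60 * 60  # assumption: rolling 24-hour window
    _stamps: list = field(default_factory=list, init=False, repr=False)

    def remaining(self, now=None):
        """Return how many queries may still be sent inside the window."""
        now = time.time() if now is None else now
        # Forget queries that have aged out of the window.
        self._stamps = [t for t in self._stamps if now - t < self.window_seconds]
        return self.limit - len(self._stamps)

    def record(self, now=None):
        """Register one query, or raise if the local count hit the limit."""
        now = time.time() if now is None else now
        if self.remaining(now) <= 0:
            raise RuntimeError("daily API quota exhausted; retry later")
        self._stamps.append(now)
```

Calling record() immediately before each request keeps a batch job from sending queries that the server would most likely reject. The tracker only counts queries sent by the current process, so it cannot see other users behind the same IP.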

For TCM Target Organ Prediction

Replace “Target” in the example command with the official gene symbol or Entrez ID to query.

Example:

python tcmaipp_api.py -m m1 -o output.csv Target
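Because this mode accepts either an official gene symbol or an Entrez ID, it can help to check which of the two a query string looks like before spending one of the daily queries on it. The sketch below is a heuristic under stated assumptions, not the validation performed by TCM-AIPP: the function name classify_gene_query and both patterns are hypothetical, with Entrez IDs assumed to be plain positive integers and symbols assumed to start with a letter and contain only letters, digits, and hyphens.

```python
import re

# Assumption: an Entrez Gene ID is a positive integer without leading zeros.
_ENTREZ_RE = re.compile(r"[1-9][0-9]*")
# Assumption: a gene symbol starts with a letter, then letters, digits, hyphens.
_SYMBOL_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def classify_gene_query(query: str) -> str:
    """Return 'entrez_id' or 'gene_symbol'; raise ValueError otherwise."""
    q = query.strip()
    if _ENTREZ_RE.fullmatch(q):
        return "entrez_id"
    if _SYMBOL_RE.fullmatch(q):
        return "gene_symbol"
    raise ValueError(f"not a recognizable gene symbol or Entrez ID: {query!r}")
```

A query that passes this check can still be unknown to the server; the check only filters out obviously malformed input such as empty strings or text containing spaces.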

For TCM Flavor Prediction

Replace “SMILES” in the example command with the SMILES string to query.

Example:

python tcmaipp_api.py -m m2 -o output.csv SMILES
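A malformed SMILES string still consumes a query, so a cheap local pre-check can be worthwhile. The following sketch uses only the standard library and verifies structure (no whitespace, balanced parentheses, closed bracket atoms); it is not a chemistry-aware parser, and the function name looks_like_smiles is a hypothetical choice. A cheminformatics toolkit would be needed for real validation.

```python
def looks_like_smiles(text: str) -> bool:
    """Cheap structural pre-check for a SMILES string (heuristic only)."""
    s = text.strip()
    if not s or any(ch.isspace() for ch in s):
        return False
    depth = 0           # number of currently open "(" branches
    in_bracket = False  # True while inside a "[...]" bracket atom
    for ch in s:
        if ch == "[":
            if in_bracket:      # nested "[" is never valid
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:  # "]" without a matching "["
                return False
            in_bracket = False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # ")" closes a branch that was never opened
                return False
    return depth == 0 and not in_bracket
```

For example, looks_like_smiles("CC(=O)Oc1ccccc1C(=O)O") accepts the aspirin SMILES, while a truncated string such as "CC(=O" is rejected.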

For TCM Toxicity Prediction

Replace “SMILES” in the example command with the SMILES string to query.

Example:

python tcmaipp_api.py -m m3 -o output.csv SMILES
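The three example commands differ only in the value passed to -m, so they can be assembled by one helper. The sketch below assumes that the script takes a single query as its last argument, as in the examples above; the function name build_command and the task labels "target_organ", "flavor", and "toxicity" are hypothetical, while the flags -m and -o and the codes m1, m2, and m3 are taken from the examples.

```python
import shlex

# Mode codes as shown in the example commands; the dictionary keys are
# illustrative labels, not names used by the sample script.
MODES = {"target_organ": "m1", "flavor": "m2", "toxicity": "m3"}


def build_command(task, query, output="output.csv", script="tcmaipp_api.py"):
    """Return the argument list for one invocation of the sample script."""
    if task not in MODES:
        raise ValueError(f"unknown task: {task!r}")
    if not query.strip():
        raise ValueError("query must not be empty")
    return ["python", script, "-m", MODES[task], "-o", output, query]


if __name__ == "__main__":
    cmd = build_command("toxicity", "CC(=O)O")
    # shlex.join shows the command as it would need to be typed in a shell.
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) avoids shell quoting problems, since SMILES strings often contain characters such as parentheses and "#" that a shell would otherwise interpret.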

5. Acknowledgement

We would like to extend our great appreciation to Beijing Bencaofangyuan Pharmacy Group LTD, Days in the north Chinese medicine yinpian Co.Ltd, Beijing Junda pharmaceutical Co.Ltd and Sichuan Zhongyao Yipian Co.Ltd for providing the photographs of TCMs displayed on this web server.

6. References

Morselli Gysi, D., do Valle, I., Zitnik, M., Ameli, A., Gan, X., Varol, O., Ghiassian, S.D., Patten, J.J., Davey, R.A., Loscalzo, J. et al. (2021) Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proc Natl Acad Sci U S A, 118.
Uhlen, M., Fagerberg, L., Hallstrom, B.M., Lindskog, C., Oksvold, P., Mardinoglu, A., Sivertsson, A., Kampf, C., Sjostedt, E., Asplund, A. et al. (2015) Proteomics. Tissue-based map of the human proteome. Science, 347, 1260419.
Xiang, Z., Gong, W., Li, Z., Yang, X., Wang, J. and Wang, H. (2021) Predicting Protein-Protein Interactions via Gated Graph Attention Signed Network. Biomolecules, 11.
Fritz, F., Preissner, R. and Banerjee, P. (2021) VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res, 49, W679-W684.
Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen, P.A., Yu, B. et al. (2023) PubChem 2023 update. Nucleic Acids Res, 51, D1373-D1380.
Rojas, C., Ballabio, D., Pacheco Sarmiento, K., Pacheco Jaramillo, E., Mendoza, M. and Garcia, F. (2022) ChemTastesDB: A curated database of molecular tastants. Food Chem (Oxf), 4, 100090.
Wu, L., Yan, B., Han, J., Li, R., Xiao, J., He, S. and Bo, X. (2023) TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res, 51, D1432-D1445.
De Carlo, A., Ronchi, D., Piastra, M., Tosca, E.M. and Magni, P. (2024) Predicting ADMET Properties from Molecule SMILE: A Bottom-Up Approach Using Attention-Based Graph Neural Networks. Pharmaceutics, 16.
Zdrazil, B., Felix, E., Hunter, F., Manners, E.J., Blackshaw, J., Corbett, S., de Veij, M., Ioannidis, H., Lopez, D.M., Mosquera, J.F. et al. (2024) The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res, 52, D1180-D1192.
Pang, X., Yan, B., Zhou, J., Cao, X. and Peng, C. (2020) Research Progress of Chinese Materia Medica-induced Cardiotoxicity. Progress in Pharmaceutical Sciences, 44, 730-742.
Yu, X., Xin, E., Yang, W., Huang, J., Guo, X., Lu, Y. and Li, Y. (2022) Research progress on the material basis and mechanism of toxic TCM medicine. Clinical Journal of Chinese Medicine, 14, 141-145.
Peng, P. and Yuan, W. (2021) Research progress on drug-induced hepatotoxicity of Chinese materia medica. Drug Eval Res, 44, 1783-1792.
Hong, H., Du, W., Zhu, W., Hong, Z. and Ge, W. (2021) Research progress on organ toxicity of traditional Chinese medicine. China Journal of Traditional Chinese Medicine and Pharmacy, 36, 943-946.
Xiong, F., Jiang, F., Xiong, A., Ju, Z., Yang, L. and Wang, Z. (2020) Quantification of hepatotoxic pyrrolizidine alkaloid adonifoline in traditional Chinese medicine preparations containing Senecionis Scandentis Herba. China Journal of Chinese Materia Medica, 45, 92-97.
Zan, K., Jiang, H., Jin, H., Ma, S., Zhao, L. and Sun, Y. (2021) Research progress on quality control of hepatotoxic pyrrolizidine alkaloids in traditional Chinese medicine. Chinese Journal of Pharmaceutical Analysis, 41, 572-578.
Knox, C., Wilson, M., Klinger, C.M., Franklin, M., Oler, E., Wilson, A., Pon, A., Cox, J., Chin, N.E.L., Strawbridge, S.A. et al. (2024) DrugBank 6.0: the DrugBank Knowledgebase for 2024. Nucleic Acids Res, 52, D1265-D1275.
Connor, S., Li, T., Qu, Y., Roberts, R.A. and Tong, W. (2024) Generation of a drug-induced renal injury list to facilitate the development of new approach methodologies for nephrotoxicity. Drug Discov Today, 29, 103938.
Yang, L., Wu, R., Sa, L. and Ao, W. (2023) Research progress on nephrotoxic components in traditional Chinese medicine and its toxic mechanisms. Chinese Traditional and Herbal Drugs, 54, 7934-7952.
Shang, H., Pang, X., Zhang, Q., Shi, X., Zhang, Y., Han, J. and Zheng, W. (2021) Research Progress of Toxicity Mechanisms in Kidney Injury Associated with Chinese Herbs and Its Compositions. Herald of Medicine, 40, 1210-1215.
Lian, D., Hou, H., Zhang, G., Li, J., Ye, Z. and Peng, B. (2021) Neurotoxicity and mechanism of octopamine. Chin J Pharmacol Toxicol, 35, 788.
He, H., Zhao, S., Xing, Y., Wang, Z., Du, L. and Shao, J. (2023) Research progress on material basis for neurotoxicity of traditional Chinese medicine and its mechanism. China Pharmacy, 34, 251-256.
Hu, Z., Huang, L., Hou, J. and Wang, X. (2022) Research progress in toxicity of alkaloids in traditional Chinese medicine. Central South Pharmacy, 20, 633-641.
Zong, S., Liu, Y., Sun, T., Zhang, H., Wang, C., Zhi, W. and Li, Y. (2020) Research Progress in the Toxicity and Control Methods of Asari Radix et Rhizoma. Chinese Pharmacist, 23, 942-945.
Wang, Y., Zhang, H., Ma, D., Wu, D., Deng, X., Li, F., Wu, Q. and Guo, S. (2021) Amygdalin Ameliorates Respiratory Failure in Corpulmonale Rats and Regulates EGFR/MAPK Signaling Pathway. ZHEJIANG CHINESE MEDICAL UNIVERSITY XUE BAO, 45, 384-390.
Zhang, Y., Li, X., Shi, Y., Chen, T., Xu, Z., Wang, P., Yu, M., Chen, W., Li, B., Jing, Z. et al. (2023) ETCM v2.0: An update with comprehensive resource and rich annotations for traditional Chinese medicine. Acta Pharm Sin B, 13, 2559-2571.
