FAQ

1. Introduction

The property theory of traditional Chinese medicine (TCM) is a unique medical theory built on thousands of years of clinical practice, and it guides TCM practitioners in selecting appropriate herbs to treat specific diseases. According to this theory, the target organs, flavors, and toxicities of a TCM summarize the drug's characteristics. Despite intensive investigation, accurately identifying TCM properties remains challenging, which hampers both the rational clinical application of TCM and TCM-based drug discovery.

To address this, the TCM Artificial Intelligence-Powered Platform (TCM-AIPP) was developed using state-of-the-art deep learning technologies. TCM-AIPP contains four predictive tools that identify the potential target organs, flavors, and toxicities of TCMs and support TCM formula design. The tools also provide comprehensive information on TCMs and their compounds, including candidate target profiling and functional enrichment data. Notably, the web server offers flexible network visualization: users can choose which relationships among herbs, compounds, targets, target organs, flavors, and toxicities to display according to their research aims, and can design and modify the network nodes and edges at will.

Uncovering the properties of TCM is of great importance for both clinical applications and TCM-derived drug R&D. TCM-AIPP may help to facilitate the recognition of the properties of TCMs, explain the underlying mechanisms of TCM against various human diseases, and provide guidance for TCM practitioners.

Figure 1. Overview framework of TCM-AIPP

1.1 Highlights

TCM-AIPP is the first web server for TCM property research
TCM-AIPP integrates the specific characteristics of TCMs to enhance its prediction performance
TCM-AIPP provides an intuitive interface and customizable network visualizations, facilitating the exploration of TCM-related bioinformation

1.2 Citations

Zhang Y, Li X, Shi Y, Chen T, Xu Z, Wang P, Yu M, Chen W, Li B, Jing Z, Jiang H, Fu L, Gao W, Jiang Y, Du X, Gong Z, Zhu W, Yang H, Xu H. ETCM v2.0: An update with comprehensive resource and rich annotations for traditional Chinese medicine. Acta Pharm Sin B. 2023 Jun;13(6):2559-2571. doi: 10.1016/j.apsb.2023.03.012
Xu HY, Zhang YQ, Liu ZM, Chen T, Lv CY, Tang SH, Zhang XB, Zhang W, Li ZY, Zhou RR, Yang HJ, Wang XJ, Huang LQ. ETCM: an encyclopaedia of traditional Chinese medicine. Nucleic Acids Res. 2019 Jan 8;47(D1):D976-D982. doi: 10.1093/nar/gky987
Zhang Y, Wang N, Du X, Chen T, Yu Z, Qin Y, Chen W, Yu M, Wang P, Zhang H, Zhou X, Huang L, Xu H. SoFDA: an integrated web platform from syndrome ontology to network-based evaluation of disease-syndrome-formula associations for precision medicine. Sci Bull (Beijing). 2022 Jun 15;67(11):1097-1101. doi: 10.1016/j.scib.2022.03.013
Liu Y, Xu J, Yu Z, Chen T, Wang N, Du X, Wang P, Zhou X, Xu H, Zhang Y. Ontology characterization, enrichment analysis, and similarity calculation-based evaluation of disease-syndrome-formula associations by applying SoFDA. Imeta. 2023 Jan 10;2(2):e80. doi: 10.1002/imt2.80

2. Model Info

2.1 Model information and validation

Four tools for predicting the target organs, flavors, and toxicities of TCMs and for designing TCM formulas were developed in the TCM-AIPP web server. The flavor tool is based on the Random Forest (RF) algorithm, the target organ and toxicity tools on the Graph Attention Network (GAT), and the formula design tool on a graph autoencoder (GAE). These tools contain 20 prediction models, including 10 classification models and 10 regression models. For each endpoint, the dataset was randomly divided into training, validation, and test sets at a ratio of 8:1:1. The RF model was implemented with the Tree Ensemble Learner and Predictor nodes in KNIME, using the Gini index as the split criterion and square-root attribute sampling, with a different set of attributes selected for each tree. The GAT and GAE models were trained with the Adam optimizer, and their hyperparameters were tuned by Bayesian optimization. Regression tasks were assessed using the coefficient of determination (R2), root mean squared error (RMSE), and mean absolute error (MAE), whereas classification tasks were evaluated using accuracy, the area under the receiver operating characteristic curve (ROC-AUC), Matthews correlation coefficient (MCC), precision, specificity, and sensitivity. To ensure the reliability of the models, each training process was repeated 10 times, and the model with the best performance on the validation set was deployed to the online platform.
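The 8:1:1 random split can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the fixed seed, and the rounding rule (80% rounded down for training, the remainder halved) are assumptions. With that rounding rule, a dataset of 1595 items yields the same set sizes as the flavor dataset in Table 12.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    n = len(shuffled)
    n_train = n * 8 // 10        # 80% for training, rounded down (assumed rule)
    n_val = (n - n_train) // 2   # half of the remainder for validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # everything left goes to the test set
    return train, val, test


train, val, test = split_8_1_1(range(1595))
print(len(train), len(val), len(test))  # 1276 159 160
```

Whether the original split was stratified by class is not stated, so this sketch uses a plain shuffle.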

2.2 TCM target organ prediction (HerbAI Meri Navigator)

The TCM target organ prediction model (HerbAI Meri Navigator) was developed based on GAT, which aims to reveal the action tendency of TCM on different organs. The targets of TCM are complex and diverse, and their mechanisms of action are difficult to explain intuitively, so they need to be accurately predicted by systematic network analysis methods. This model can effectively predict the effects of TCM on specific organs by integrating the information of human protein-protein interaction (PPI) network (1) and the effective targets of TCM.

The model was constructed from several reliable data sources. Proteins specifically expressed in each organ were collected from The Human Protein Atlas. Only proteins with "Enhanced" or "High" levels of evidence that appeared in a single organ were retained. To further confirm that the organ target sets are independent within the PPI network, a network separation analysis was used to differentiate between organ-specific target sets. In the GAT, organ-specific targets were mapped to the PPI network, and interactions between protein nodes were analyzed through the graph attention mechanism to calculate the specificity score of each protein node for each organ. The model then aggregates these organ-specific scores across all effective targets of a given TCM to predict the herb's tendency to act on specific organs.

To validate the performance of the model, we collected effective target data with high-quality literature support from the HIT database, which gave the best prediction quality across several models. A total of 442 TCMs with documented organ effects, as recorded in the Chinese Pharmacopoeia, were analyzed. Their corresponding effective targets were extracted, resulting in a total of 64,795 herb-target associations. During validation, a prediction was counted as a true positive if the documented target organ of a TCM was among the top three organs predicted by the model (a single TCM is known to act on up to four organs). The model was first validated on four organs (liver, heart, lung, and kidney, as documented in the Pharmacopoeia) and achieved satisfactory performance. Subsequently, the modeling was extended to additional organs, including the cerebellum, pancreas, retina, skeletal muscle, and testis. Notably, with the exception of the heart and skeletal muscle pair, the target sets of these organs showed significant topological separation in the PPI network, which further enhanced the predictive performance of the model.
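The aggregation and top-three validation rule described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how target scores are aggregated, so a simple mean is assumed, and a prediction is counted as a hit if any documented organ appears in the top three. The function names, gene names, and scores are hypothetical.

```python
def rank_organs(target_scores):
    """Average per-target organ scores and return organs by descending mean score.

    target_scores maps each effective target to a dict of organ -> score.
    """
    totals = {}
    for scores in target_scores.values():
        for organ, score in scores.items():
            totals[organ] = totals.get(organ, 0.0) + score
    n = len(target_scores)
    means = {organ: total / n for organ, total in totals.items()}
    return sorted(means, key=means.get, reverse=True)


def is_top_k_hit(ranked_organs, documented_organs, k=3):
    """Return True if any documented organ is among the top-k predicted organs."""
    return any(organ in documented_organs for organ in ranked_organs[:k])


# Hypothetical scores for two effective targets of one herb
scores = {
    "GENE_A": {"Liver": 0.9, "Heart": 0.2, "Lung": 0.1, "Kidney": 0.4},
    "GENE_B": {"Liver": 0.7, "Heart": 0.1, "Lung": 0.3, "Kidney": 0.6},
}
ranking = rank_organs(scores)
print(ranking)                          # ['Liver', 'Kidney', 'Lung', 'Heart']
print(is_top_k_hit(ranking, {"Lung"}))  # True
```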

Table 1. Basic information of TCM target organ prediction tool

Organs: Cerebellum, Heart, Kidney, Liver, Lung, Pancreas, Retina, Skeletal muscle, Testis
Algorithmic Model: Graph Attention Network (3)
End Point: Probability score of TCM effective target acting on a certain organ
Descriptors: Human protein interaction network
Standard Dataset: 685 targets
Data Sources: (2)

2.3 TCM flavor prediction

The flavor prediction model (HerbAI Flavor Atlas) of TCM-AIPP was constructed using the RF algorithm. This model evaluates the chemical structural similarities between the input compounds and compounds with known flavors obtained from PubChem, VirtualTaste and ChemTastesDB.

The flavors of a TCM were then predicted as a weighted average of the flavors of its compounds, with the index compounds recorded in the Chinese Pharmacopoeia 2020 and the other compounds assigned different weights. The predictive performance of this model was evaluated on 558 TCMs with flavor records in the Chinese Pharmacopoeia 2020 and the corresponding 19,068 compounds obtained from BATMAN-TCM. If the actual flavors of a TCM were included in the top three flavors predicted by the model (a single TCM is generally known to have up to three flavors), the result was considered a true positive.
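A weighted average of compound-level flavor predictions might look like the sketch below. The actual weights are not given in the text, so the values used here (2.0 for index compounds, 1.0 for other compounds) are placeholders, and the function names and example probabilities are hypothetical.

```python
FLAVORS = ("Sour", "Bitter", "Sweet", "Pungent", "Salty")


def herb_flavor_scores(compounds, index_weight=2.0, other_weight=1.0):
    """Weighted average of per-compound flavor probabilities.

    compounds is a list of (flavor_probabilities, is_index_compound) pairs.
    The default weights are assumptions, not the platform's values.
    """
    weighted = {flavor: 0.0 for flavor in FLAVORS}
    total_weight = 0.0
    for probabilities, is_index in compounds:
        weight = index_weight if is_index else other_weight
        total_weight += weight
        for flavor in FLAVORS:
            weighted[flavor] += weight * probabilities.get(flavor, 0.0)
    if total_weight == 0:
        return weighted  # no compounds: all scores stay at zero
    return {flavor: value / total_weight for flavor, value in weighted.items()}


def top_flavors(scores, k=3):
    """Return the k flavors with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical herb with one index compound and one other compound
compounds = [
    ({"Bitter": 0.8, "Sweet": 0.2}, True),
    ({"Sweet": 0.6, "Pungent": 0.4}, False),
]
herb_scores = herb_flavor_scores(compounds)
print(top_flavors(herb_scores))  # ['Bitter', 'Sweet', 'Pungent']
```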

Table 2. Basic information of TCM flavor prediction tool

Flavors: Flavors of compounds
Algorithmic Model: Random Forest (4)
End Point: Sour, Bitter, Sweet, Pungent, Salty
Descriptors: Molecular fingerprints
Standard Dataset: 1595 compounds
Data Sources: (4-6)

2.4 TCM toxicity prediction (HerbAI ToxWarning)

The toxicity prediction model (HerbAI ToxWarning) of TCM-AIPP was constructed using the GAT algorithm. This model evaluates the chemical structural similarities between the input compounds and compounds with known toxicities obtained from TOXRIC, DIRIL, DrugBank and PubChem. On this basis, TCM-AIPP can predict the potential toxicities of the input compounds, namely acute toxicity and organ toxicities (cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, and respiratory toxicity), as well as the toxic risk of TCMs according to the number of toxic compounds in their chemical profiles. In addition, TCM-AIPP provides the putative targets of the toxic compounds contained in TCMs and their enriched biological functions and pathways. These data provide an important reference for the safety evaluation of TCMs and the investigation of the underlying toxic mechanisms.

2.4.1 Acute toxicity

Two acute toxicity prediction models, one for rat and one for mouse, were developed for different application scenarios using LD50 data collected from the TOXRIC database and toxicity classification criteria based on the Globally Harmonized System of Classification and Labelling of Chemicals (GHS). To minimize the risk of false negatives and to reduce overfitting, both Random Oversampling and the Adaptive Synthetic Sampling algorithm (ADASYN) were used to improve the classification accuracy for the minority classes of chemicals. The results of both sampling algorithms are available on the platform.

Table 3. Basic information of TCM acute toxicity prediction tool

Prediction Type: Lethal Dose value in mg/kg body weight (Rat and Mouse)
Algorithmic Model: Graph Attention Network (8)
End Point: Toxicity level from GHS (I, II, III, IV, V, VI)
Descriptors: Molecular fingerprints
Standard Dataset: 9734 compounds (Rat); 21,831 compounds (Mouse)
Data Sampling: Random Oversampling and ADASYN
Data Sources: (7)

Table 4. Definition of the acute toxicity levels

| Acute Toxicity Level | GHS (Oral LD50, mg/kg) |
| --- | --- |
| I | ≤ 5 |
| II | 5 < LD50 ≤ 50 |
| III | 50 < LD50 ≤ 300 |
| IV | 300 < LD50 ≤ 2000 |
| V | 2000 < LD50 ≤ 5000 |
| VI | > 5000 |
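The thresholds in Table 4 translate directly into a lookup from an oral LD50 value to a toxicity level. The sketch below is illustrative; the function name is hypothetical and is not part of the platform.

```python
def acute_toxicity_level(ld50_mg_per_kg):
    """Map an oral LD50 value (mg/kg) to the acute toxicity level of Table 4."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be a positive value in mg/kg")
    # Upper bounds are inclusive, matching the "≤" in Table 4
    upper_bounds = [(5, "I"), (50, "II"), (300, "III"), (2000, "IV"), (5000, "V")]
    for upper, level in upper_bounds:
        if ld50_mg_per_kg <= upper:
            return level
    return "VI"  # anything above 5000 mg/kg


print(acute_toxicity_level(5))     # I
print(acute_toxicity_level(250))   # III
print(acute_toxicity_level(6000))  # VI
```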

Table 5. Comparison of the two sampling algorithms

| Feature | Random Oversampling | ADASYN |
| --- | --- | --- |
| Sampling Method | Directly duplicates existing minority class samples | Generates new synthetic samples based on data distribution |
| Basis for Sampling | Randomly selects and duplicates minority samples | Focuses on generating samples in areas with low minority density |
| Overfitting Risk | High, as repeated samples can lead to overfitting | Lower, since generated samples are new and introduce diversity |
| Ability to Handle Imbalance | Increases sample size but doesn't address distribution complexity | Improves classifier learning by focusing on difficult-to-classify regions |
| Characteristics of Generated Samples | Samples are identical to original ones (simple duplication) | Samples are synthetic, created through interpolation with diversity |
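To make the first column of Table 5 concrete, the sketch below shows random oversampling using only the Python standard library. It is an illustration of the general technique, not the platform's implementation, and the function name and sample data are hypothetical. ADASYN is not shown because it needs nearest-neighbor interpolation in feature space.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())  # majority class size
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw exact copies of existing samples to fill the gap
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced data: six level-IV compounds and two level-I compounds
samples = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
labels = ["IV"] * 6 + ["I"] * 2
new_samples, new_labels = random_oversample(samples, labels)
print(Counter(new_labels))  # both classes now have six samples
```

Because the added samples are exact copies, the minority class gains no new information, which is the overfitting risk noted in the table.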

2.4.2 Cardiotoxicity

Accumulating studies have reported drug-induced cardiotoxicity, which is often associated with inhibition of the human ether-à-go-go-related gene (hERG) channel. The hERG gene encodes a protein that forms potassium channels in the membranes of cardiomyocytes. These channels are essential for the normal electrical activity of the heart, especially during the repolarization phase. Many drugs inadvertently inhibit hERG channels, causing abnormal cardiac repolarization that can lead to arrhythmias and even sudden death. To construct the cardiotoxicity prediction model for TCM-AIPP, compounds with hERG inhibition values were obtained from the ChEMBL database and categorized into four classes of cardiotoxicity based on their IC50 values.

Table 6. Basic information of TCM cardiotoxicity prediction tool

Prediction Type: Compounds induced cardiotoxicity
Algorithmic Model: Graph Attention Network (8)
End Point: Toxicity level by IC50 (I, II, III, IV)
Descriptors: Molecular fingerprints
Standard Dataset: 8418 compounds
Data Sampling: ADASYN
Data Sources: (9,10)

Table 7. Definition of the hERG inhibition levels

| Levels | hERG inhibition values [IC50 (μM)] |
| --- | --- |
| I | < 1 |
| II | 1 < IC50 ≤ 10 |
| III | 10 < IC50 ≤ 100 |
| IV | > 100 |

2.4.3 Hepatotoxicity

Table 8. Basic information of TCM hepatotoxicity prediction tool

Prediction Type: Compounds induced hepatotoxicity
Algorithmic Model: Graph Attention Network (8)
End Point: Positive/Negative
Descriptors: Molecular fingerprints
Standard Dataset: 2411 compounds
Data Sources: (5,7,11-15)

2.4.4 Nephrotoxicity

Table 9. Basic information of TCM nephrotoxicity prediction tool

Prediction Type: Compounds induced nephrotoxicity
Algorithmic Model: Graph Attention Network (8)
End Point: Positive/Negative
Descriptors: Molecular fingerprints
Standard Dataset: 821 compounds
Data Sources: (11,16-19)

2.4.5 Neurotoxicity

Table 10. Basic information of TCM neurotoxicity prediction tool

Prediction Type: Compounds induced neurotoxicity
Algorithmic Model: Graph Attention Network (8)
End Point: Positive/Negative
Descriptors: Molecular fingerprints
Standard Dataset: 757 compounds
Data Sources: (11,16,20-22)

2.4.6 Respiratory Toxicity

Table 11. Basic information of TCM respiratory toxicity prediction tool

Prediction Type: Compounds induced respiratory toxicity
Algorithmic Model: Graph Attention Network (8)
End Point: Positive/Negative
Descriptors: Molecular fingerprints
Standard Dataset: 1760 compounds
Data Sources: (7,23,24)

2.5 TCM formula design (HerbAI Matrix)

The TCM formula design model (HerbAI Matrix) is built on the GAE model and uses artificial intelligence to modernize the principles of herbal compatibility. The model systematically mines large-scale prescription data to extract herb-pair co-occurrence patterns and integrates multi-dimensional features (such as nature, flavor, meridian tropism, pharmacological action, and chemical composition) into unified digital descriptors.
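The herb-pair co-occurrence step can be illustrated with a small counting routine. This is a sketch of the general idea only: the function name and the placeholder herb names are hypothetical, and how HerbAI Matrix weights or normalizes these counts is not described in the text.

```python
from collections import Counter
from itertools import combinations


def herb_pair_counts(formulas):
    """Count how many formulas contain each unordered pair of herbs."""
    counts = Counter()
    for herbs in formulas:
        unique = sorted(set(herbs))  # ignore repeats, fix pair order
        counts.update(combinations(unique, 2))
    return counts


# Placeholder formulas; real input would list the herbs of each prescription
formulas = [
    ["herb_A", "herb_B", "herb_C"],
    ["herb_A", "herb_B"],
    ["herb_B", "herb_C", "herb_D"],
]
pair_counts = herb_pair_counts(formulas)
print(pair_counts[("herb_A", "herb_B")])  # 2
print(pair_counts.most_common(2))
```

The resulting pair counts can serve as edge weights of an herb co-occurrence graph, which is the kind of structure a graph autoencoder takes as input.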

Through GAE-based node embedding, each herb is projected into a latent feature space where hidden synergistic relationships are quantitatively inferred. These embeddings give rise to an herb-pairing network that captures the structural and therapeutic logic of TCM formulations. By applying the Louvain community detection algorithm, HerbAI Matrix further identifies functionally cohesive herb clusters, thereby uncovering novel, data-driven formula candidates that bridge traditional theory and modern computational discovery.

2.6 Standard Datasets for Model Construction

Table 12. Detailed information of the standard datasets

| Dataset | Total (Positive/Negative) | Training set (Positive/Negative) | Validation set (Positive/Negative) | Test set (Positive/Negative) |
| --- | --- | --- | --- | --- |
| Cerebellum | 685 (40/654) | 549 (32/524) | 68 (4/65) | 68 (4/65) |
| Heart | 685 (20/654) | 549 (16/524) | 68 (2/65) | 68 (2/65) |
| Kidney | 685 (51/654) | 549 (41/524) | 68 (5/65) | 68 (5/65) |
| Liver | 685 (31/654) | 549 (25/524) | 68 (3/65) | 68 (3/65) |
| Lung | 685 (14/654) | 549 (10/524) | 68 (2/65) | 68 (2/65) |
| Pancreas | 685 (24/654) | 549 (18/524) | 68 (3/65) | 68 (3/65) |
| Retina | 685 (37/654) | 549 (29/524) | 68 (4/65) | 68 (4/65) |
| Skeletal muscle | 685 (31/654) | 549 (25/524) | 68 (3/65) | 68 (3/65) |
| Testis | 685 (172/654) | 549 (136/524) | 68 (18/65) | 68 (18/65) |
| Flavor (Bitter, Pungent, Salty, Sour, Sweet) | 1595 (329, 304, 355, 282, 325) | 1276 (263, 243, 284, 226, 260) | 159 (33, 31, 35, 28, 32) | 160 (33, 30, 36, 28, 33) |
| Acute toxicity (Rat: I, II, III, IV, V, VI) | 9734 (236/720/1550/3693/2220/1315) | 7787 (183/571/1237/2918/1817/1061) | 973 (24/81/154/395/199/120) | 974 (29/68/159/380/204/134) |
| Acute toxicity (Mouse: I, II, III, IV, V, VI) | 21831 (140/696/3566/12829/3396/1204) | 17464 (114/572/2848/10236/2737/957) | 2183 (17/68/352/1297/341/108) | 2184 (9/56/366/1296/318/139) |
| Cardiotoxicity (I, II, III, IV) | 8418 (1581/3702/2629/506) | 6734 (1265/2961/2103/405) | 842 (158/370/263/51) | 842 (158/371/263/50) |
| Hepatotoxicity | 2411 (1135/1276) | 1928 (908/1020) | 241 (113/128) | 242 (114/128) |
| Nephrotoxicity | 821 (253/568) | 656 (202/454) | 82 (25/57) | 83 (26/57) |
| Neurotoxicity | 757 (194/563) | 605 (155/450) | 76 (20/56) | 76 (19/57) |
| Respiratory toxicity | 1760 (734/1026) | 1408 (587/821) | 176 (74/102) | 176 (73/103) |

2.7 Model Performance Evaluation

Table 13. Predictive performance of TCM target organ prediction model

| Organ | Dataset | R2 | RMSE | MAE |
| --- | --- | --- | --- | --- |
| Cerebellum | Validation set | 0.750 ± 0.026 | 0.090 ± 0.020 | 0.049 ± 0.002 |
| Cerebellum | Test set | 0.705 ± 0.073 | 0.101 ± 0.002 | 0.059 ± 0.002 |
| Heart | Validation set | 0.782 ± 0.014 | 0.081 ± 0.003 | 0.044 ± 0.002 |
| Heart | Test set | 0.768 ± 0.016 | 0.102 ± 0.003 | 0.050 ± 0.002 |
| Kidney | Validation set | 0.841 ± 0.047 | 0.111 ± 0.005 | 0.062 ± 0.010 |
| Kidney | Test set | 0.871 ± 0.039 | 0.090 ± 0.017 | 0.062 ± 0.010 |
| Liver | Validation set | 0.943 ± 0.009 | 0.042 ± 0.003 | 0.031 ± 0.003 |
| Liver | Test set | 0.823 ± 0.046 | 0.101 ± 0.013 | 0.054 ± 0.007 |
| Lung | Validation set | 0.793 ± 0.020 | 0.079 ± 0.004 | 0.048 ± 0.002 |
| Lung | Test set | 0.751 ± 0.002 | 0.062 ± 0.001 | 0.041 ± 0.002 |
| Pancreas | Validation set | 0.836 ± 0.007 | 0.070 ± 0.002 | 0.034 ± 0.001 |
| Pancreas | Test set | 0.739 ± 0.027 | 0.108 ± 0.006 | 0.046 ± 0.001 |
| Retina | Validation set | 0.815 ± 0.035 | 0.133 ± 0.014 | 0.047 ± 0.001 |
| Retina | Test set | 0.870 ± 0.032 | 0.076 ± 0.009 | 0.040 ± 0.001 |
| Skeletal muscle | Validation set | 0.944 ± 0.003 | 0.050 ± 0.001 | 0.030 ± 0.001 |
| Skeletal muscle | Test set | 0.718 ± 0.021 | 0.092 ± 0.004 | 0.051 ± 0.001 |
| Testis | Validation set | 0.877 ± 0.033 | 0.161 ± 0.024 | 0.129 ± 0.013 |
| Testis | Test set | 0.869 ± 0.031 | 0.166 ± 0.022 | 0.136 ± 0.013 |

Table 14. Predictive performance of TCM target organ prediction model

| Dataset | Accuracy | ROC-AUC | MCC | Precision | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- | --- |
| Validation set | 0.835 ± 0.034 | 0.724 ± 0.040 | 0.432 ± 0.054 | 0.833 ± 0.027 | 0.335 ± 0.049 | 0.994 ± 0.013 |
| Test set | 0.810 ± 0.037 | 0.605 ± 0.041 | 0.400 ± 0.057 | 0.825 ± 0.024 | 0.325 ± 0.053 | 0.969 ± 0.020 |

Table 15. Predictive performance of TCM flavor prediction model

| Prediction Type | Dataset | Accuracy | ROC-AUC | MCC | Precision | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Flavors of Compounds | Validation set | 0.824 ± 0.020 | 0.890 ± 0.013 | 0.781 ± 0.025 | 0.825 ± 0.023 | 0.956 ± 0.005 | 0.823 ± 0.022 |
| Flavors of Compounds | Test set | 0.839 ± 0.025 | 0.899 ± 0.015 | 0.800 ± 0.031 | 0.831 ± 0.012 | 0.956 ± 0.004 | 0.823 ± 0.017 |
| Flavors of TCMs | Validation set | 0.921 ± 0.018 | 0.799 ± 0.061 | 0.403 ± 0.044 | 0.940 ± 0.007 | 0.310 ± 0.028 | 0.977 ± 0.019 |
| Flavors of TCMs | Test set | 0.926 ± 0.011 | 0.801 ± 0.045 | 0.434 ± 0.045 | 0.926 ± 0.011 | 0.300 ± 0.020 | 0.980 ± 0.030 |

Table 16. Predictive performance of various TCM toxicity prediction models

| Toxicity Type | Dataset | Accuracy | ROC-AUC | MCC | Precision | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Acute toxicity | Mouse (Random) Validation set | 0.945 ± 0.003 | 0.991 ± 0.001 | 0.935 ± 0.003 | 0.946 ± 0.003 | 0.989 ± 0.001 | 0.945 ± 0.003 |
| Acute toxicity | Mouse (Random) Test set | 0.947 ± 0.005 | 0.992 ± 0.001 | 0.937 ± 0.003 | 0.949 ± 0.002 | 0.989 ± 0.001 | 0.948 ± 0.004 |
| Acute toxicity | Mouse (ADASYN) Validation set | 0.864 ± 0.003 | 0.968 ± 0.001 | 0.837 ± 0.003 | 0.866 ± 0.003 | 0.973 ± 0.001 | 0.865 ± 0.003 |
| Acute toxicity | Mouse (ADASYN) Test set | 0.706 ± 0.006 | 0.887 ± 0.003 | 0.648 ± 0.007 | 0.700 ± 0.005 | 0.941 ± 0.001 | 0.703 ± 0.006 |
| Cardiotoxicity | Validation set | 0.822 ± 0.005 | 0.937 ± 0.004 | 0.769 ± 0.006 | 0.822 ± 0.005 | 0.863 ± 0.002 | 0.850 ± 0.004 |
| Cardiotoxicity | Test set | 0.825 ± 0.005 | 0.942 ± 0.003 | 0.769 ± 0.006 | 0.822 ± 0.005 | 0.865 ± 0.002 | 0.852 ± 0.004 |
| Hepatotoxicity | Validation set | 0.796 ± 0.007 | 0.858 ± 0.002 | 0.590 ± 0.013 | 0.791 ± 0.008 | 0.820 ± 0.001 | 0.768 ± 0.018 |
| Hepatotoxicity | Test set | 0.764 ± 0.007 | 0.850 ± 0.002 | 0.551 ± 0.022 | 0.743 ± 0.006 | 0.759 ± 0.003 | 0.753 ± 0.023 |
| Nephrotoxicity | Validation set | 0.772 ± 0.007 | 0.776 ± 0.016 | 0.481 ± 0.034 | 0.667 ± 0.018 | 0.833 ± 0.012 | 0.643 ± 0.039 |
| Nephrotoxicity | Test set | 0.834 ± 0.009 | 0.883 ± 0.009 | 0.586 ± 0.025 | 0.798 ± 0.015 | 0.934 ± 0.007 | 0.867 ± 0.041 |
| Neurotoxicity | Validation set | 0.882 ± 0.005 | 0.899 ± 0.003 | 0.730 ± 0.008 | 0.888 ± 0.003 | 0.938 ± 0.003 | 0.867 ± 0.041 |
| Neurotoxicity | Test set | 0.892 ± 0.017 | 0.924 ± 0.012 | 0.666 ± 0.040 | 0.696 ± 0.054 | 0.926 ± 0.016 | 0.743 ± 0.035 |
| Respiratory Toxicity | Validation set | 0.778 ± 0.010 | 0.874 ± 0.004 | 0.551 ± 0.022 | 0.743 ± 0.006 | 0.798 ± 0.007 | 0.753 ± 0.023 |
| Respiratory Toxicity | Test set | 0.818 ± 0.007 | 0.899 ± 0.007 | 0.634 ± 0.015 | 0.770 ± 0.007 | 0.812 ± 0.007 | 0.826 ± 0.015 |
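For reference, the threshold-based classification metrics reported in Tables 14 to 16 can be computed from a binary confusion matrix as sketched below. These are the standard definitions; the function name and the example counts are hypothetical, and how the platform averages these metrics over multi-class endpoints (such as the acute toxicity levels) is not stated. ROC-AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity, specificity, and MCC from counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is absent
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical confusion matrix: 8 TP, 2 FP, 6 TN, 4 FN
metrics = binary_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A model with high sensitivity but low specificity, as in Table 14, rarely misses a true organ association but often flags associations that are not documented.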

3. User Guide

3.1 Input information

The TCM-AIPP web server offers an intuitive and interactive interface that enables users to seamlessly perform multiple prediction tasks. Users can input official gene symbols—either individually or in batches—to predict target organs; provide Simplified Molecular-Input Line Entry System (SMILES) strings of compounds to infer their flavors, acute toxicities, and organ-specific toxicities; or upload ancient, proprietary, or clinically empirical TCM formulas along with the attributes of constituent herbs to facilitate rational TCM formula design.

3.2 Output information

Depending on the number of elements entered by the user, the prediction results are presented in the browser as charts and tables. In the target organ prediction interface, when the user enters a single gene, the system provides the TCMs related to the gene, the predicted target organs, and the secondary network of targets interacting with the input gene. If multiple genes are entered as a batch, the results include a summary of the target organ classification of all the genes, the related TCMs, and the TCMs significantly enriched according to their effective targets. The flavor and toxicity prediction webpages provide the predicted flavor, acute toxicity, and organ toxicity of the compounds, together with related TCMs, candidate targets, and a biological function enrichment analysis based on the input SMILES. For batch input, the results present a categorized summary of all predictions, the related TCMs and the TCMs significantly enriched according to the compounds they contain, and the candidate targets with the corresponding biological function enrichment analysis. Network visualization is also supported to show the relationships among TCMs, compounds, and targets. The methods used for target prediction and enrichment analysis have been described in our previous publications (25).

Upon upload, the HerbAI Matrix webpage generates a series of quantitative and visual outputs, including herb-pair frequency analysis that reveals traditional compatibility patterns, GAE performance metrics that assess model reliability, predicted herb pairs with reliability scores and pharmacological relevance (adjustable by threshold), and Louvain clustering that identifies candidate formulas, each annotated with a cluster-formation probability and an innovation score quantifying deviation from known prescriptions.

The server allows users to download the results in several file formats: .csv, .png, .svg, and .sif (which supports linking to Cytoscape for further customization and analysis).

3.3 Processing times

TCM-AIPP can process up to 3,000 targets, 1,000 SMILES strings, or [?] known formulas (comprising [?] unique herbs) in a single run, enabling the prediction of target organs for specific genes, flavor and toxicity profiles for compounds, and the design of novel TCM formulas. When the server queue is free and resources are sufficient, a single model returns its prediction in less than 1 minute.
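When a query list exceeds these per-run limits, it can be divided into batches before submission. The helper below is a hypothetical convenience sketch and not part of TCM-AIPP. Removing duplicate entries first is a design choice made here to avoid spending the limit on repeated queries.

```python
MAX_TARGETS = 3000  # per-run limit for gene inputs stated above
MAX_SMILES = 1000   # per-run limit for SMILES inputs stated above


def make_batches(items, limit):
    """Drop duplicates (keeping order) and split items into lists of at most `limit`."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    unique = list(dict.fromkeys(items))  # preserves first-seen order
    return [unique[i:i + limit] for i in range(0, len(unique), limit)]


# Placeholder identifiers standing in for 2,500 distinct SMILES strings
queries = [f"compound_{i}" for i in range(2500)]
batches = make_batches(queries, MAX_SMILES)
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```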

3.4 Quick start

This section provides a quick start guide for key functionalities. The following subsections (3.4.1 to 3.4.5) detail the operation of specific tools. TCM-AIPP supports both Simplified Chinese and English, and users can switch languages in the upper-right corner of the interface. Prediction tools for TCM target organs, flavors, toxicities, and formula design can be accessed from the AI Tools section, either through the shortcut in the upper-left corner or via the Services panel in the center of the Home page. The upper-left shortcut offers a faster entry point, whereas the central panel presents a more informative overview.

3.4.1 TCM Target Organ Prediction

Users can input Official Gene Symbol(s) or Entrez Gene ID(s) of one or multiple targets and click Submit to predict target organs of TCMs (click on the example to see the demo). Multiple input supports up to 3000 targets. The latest release of TCM-AIPP supports prediction of nine organs, including cerebellum, heart, kidney, liver, lung, pancreas, retina, skeletal muscle and testis.

For a single input, TCM-AIPP outputs basic information on the input target, its related TCMs, and its potential target organs. The system also visualizes the network of neighboring targets that interact with the input target.

For multiple inputs, TCM-AIPP provides statistics on the potential target organs of the TCMs acting on the input targets, together with detailed information on these TCMs and the functions and pathways enriched among their effective targets. All results can be downloaded as images and tables.

3.4.2 TCM Flavor Prediction

Users can input SMILES string(s) of one or multiple compounds and click Submit to predict TCM flavors (click on the example to see the demo). Multiple input supports up to 1000 SMILES. We advise users to standardize SMILES via PubChem or RDKit before inputting them. The latest release of TCM-AIPP supports prediction of five flavors, including Bitter, Pungent, Salty, Sour and Sweet.

For a single input, TCM-AIPP generates the structural diagram and potential flavors of the input compound, as well as the related TCMs and candidate targets. TCM-AIPP also provides the functions and pathways enriched among its candidate targets.

For multiple inputs, TCM-AIPP provides statistics on the potential flavors of all input compounds, together with their candidate targets and the functions and pathways involved. TCM-AIPP also lists the TCMs containing the input compounds, identified through enrichment analysis based on the SMILES strings. All results can be downloaded as images and tables.

NOTE: The reliability scores of the candidate targets provided here are higher than 0.6.

3.4.3 TCM Toxicity Prediction

Users can input SMILES string(s) of one or multiple compounds and click Submit to predict TCM toxicities (click on the example to see the case). Multiple input supports up to 1000 SMILES. We advise users to standardize SMILES via PubChem or RDKit before inputting them. The latest release of TCM-AIPP supports predictions for acute toxicity and common organ toxicities including cardiotoxicity, hepatotoxicity, nephrotoxicity, neurotoxicity, and respiratory toxicity, of TCMs.

For a single input, TCM-AIPP generates the structural diagram and the potential acute toxicity and organ toxicities of the input compound, as well as the related TCMs and candidate targets. TCM-AIPP also provides the functions and pathways enriched among its candidate targets.

For multiple inputs, TCM-AIPP provides statistics on the potential toxicities of all input compounds, together with their candidate targets and the functions and pathways involved. TCM-AIPP also lists the TCMs containing the input compounds, identified through enrichment analysis based on the SMILES strings. All results can be downloaded as images and tables.

NOTE: Two sampling methods were used to predict acute toxicity (see 2.4.1). When the two models assign different levels to the same compound, we recommend taking the lower level (the more toxic one, see Table 4) as the reference. This approach is more conservative and is designed to avoid false negatives, which are more harmful than false positives. The reliability scores of the candidate targets provided here are higher than 0.6.
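The recommendation in the note above amounts to taking the more toxic of the two predicted levels. A minimal sketch, with a hypothetical function name:

```python
LEVELS = ["I", "II", "III", "IV", "V", "VI"]  # I is the most toxic (Table 4)


def conservative_level(level_random, level_adasyn):
    """Return the lower (more toxic) of the two predicted acute toxicity levels."""
    return min(level_random, level_adasyn, key=LEVELS.index)


print(conservative_level("IV", "III"))  # III
```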

3.4.4 TCM formula design (HerbAI Matrix)

Users can upload existing formulas for specific diseases, including classical prescriptions, proprietary TCM formulas, and clinical empirical formulas. The characteristics of all constituent herbs must be provided in matrix format, with customizable features such as herb properties, efficacies, and pharmacological actions, depending on the disease context. Detailed data format specifications and upload requirements are provided in the right-side panel of the interface.

Upon completion of the calculations, HerbAI Matrix provides multi-dimensional visual and quantitative results to facilitate the interpretation of TCM formula patterns and the discovery of novel herb combinations, including:

  • a) Herb-pair frequency statistics – visualize co-occurrence frequencies from uploaded prescriptions to reveal traditional compatibility patterns.

  • b) GAE performance evaluation – assess model reliability using accuracy, ROC–AUC, and MCC, displayed through ROC curves and summary tables.

  • c) Predicted herb pairs – generate potential synergistic pairs with reliability scores and pharmacological relevance, adjustable via user-defined thresholds.

  • d) Candidate formula clustering – apply the Louvain algorithm to identify novel formula clusters, each annotated with cluster-formation probability and an innovation score quantifying deviation from known prescriptions.

Additionally, the system enables the visualization of the network of herbs-compounds-targets.

NOTE: HerbAI Matrix functions as a task-adaptive framework for TCM formula prediction, rather than a static model with fixed parameters. The model’s performance is inherently determined by the characteristics of the user-provided dataset and the specific objectives of each training task. Upon the input of new formula or compound-feature data, the system automatically reconstructs and retrains a tailored model. Accordingly, each prediction task constitutes an independent modeling process, enabling users to evaluate the validity and robustness of the generated results based on quantitative performance metrics.

3.4.5 TCM property database

Users can directly access the corresponding databases via the “Database” entry located in the upper-left corner of the homepage or through the central panel.

3.4.5.1 Herb database

The database of Chinese medicinal materials provides a comprehensive compendium of information, including the Chinese name, Pinyin, English name, Latin name, family, four qi, five flavors, meridian tropism, toxicity, and actions of TCM. To guarantee the authority and accuracy of the information, this section of the data currently exclusively incorporates information on Chinese medicinal materials contained in the Chinese Pharmacopoeia.

For detailed information, the database provides data on related compounds, corresponding TCM targets, and the biological functions and pathways associated with their effective targets based on enrichment analysis.

3.4.5.2 Compound database

The compound database is a comprehensive repository of information on the compounds of Chinese medicinal materials, encompassing molecular weight, SMILES, aliases, and other pertinent data.

For detailed information, the database provides the predicted physicochemical and pharmacognostic properties of each constituent, as well as its absorption, distribution, metabolism, and excretion parameters.

Meanwhile, the database provides potential flavor, acute toxicity and organ toxicities of the compound.

3.4.5.3 Target database

The target database principally furnishes target information pertaining to Chinese medicinal materials, comprising gene symbol, alias, full name, gene type, species, related herbs, related compounds and probability of target acting on each organ.

4. API Tutorial

For users with more advanced analytical requirements, TCM-AIPP offers a straightforward POST interface that can be queried from the programming language of your choice. Although the website responds quickly, scripted requests may be slightly delayed because user requests are queued. Please note that a maximum of 100 API queries can be made per source IP per day, and that the interval between queries becomes longer as the number of requested models increases. The sample script requires Python (version 3.12 or higher) and is run from the command line. You can download the script to your local computer and use it directly, or use it as a reference for writing your own: Sample API Script

For TCM Target Organ Prediction

Supply the official gene symbol or Entrez ID of the query target as the final argument of the command (shown as Target in the example).

Example:

python tcmaipp_api.py -m m1 -o output.csv Target

For TCM Flavor Prediction

Supply the SMILES string of the query compound as the final argument of the command (shown as SMILES in the example).

Example:

python tcmaipp_api.py -m m2 -o output.csv SMILES

For TCM Toxicity Prediction

Supply the SMILES string of the query compound as the final argument of the command (shown as SMILES in the example).

Example:

python tcmaipp_api.py -m m3 -o output.csv SMILES
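The three commands above differ only in the model code and the query, so they can be assembled by one small helper. The sketch below is an illustration rather than part of the sample script: the function names `build_command` and `run_query` are hypothetical, and it assumes that `tcmaipp_api.py` is in the working directory and accepts exactly the arguments shown in the examples.

```python
import subprocess

# Model codes taken from the examples above.
MODELS = {
    "m1": "target organ prediction (official gene symbol or Entrez ID)",
    "m2": "flavor prediction (SMILES)",
    "m3": "toxicity prediction (SMILES)",
}


def build_command(model, query, output="output.csv", script="tcmaipp_api.py"):
    """Return the argument list for one query, mirroring the documented examples."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    if not query.strip():
        raise ValueError("query must not be empty")
    return ["python", script, "-m", model, "-o", output, query]


def run_query(model, query, output="output.csv"):
    """Run the sample script once; requires tcmaipp_api.py to be present locally."""
    # Passing a list (no shell) keeps SMILES characters such as ( ) = # intact.
    return subprocess.run(build_command(model, query, output), check=True)
```

For example, `build_command("m2", "CC(=O)O")` yields the argument list for a flavor prediction on acetic acid. When typing the command directly into a shell instead, a SMILES string that contains parentheses or `#` generally needs to be quoted.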

5. Acknowledgement

We would like to extend our sincere appreciation to Beijing Bencaofangyuan Pharmacy Group LTD, Days in the north Chinese medicine yinpian Co.Ltd, Beijing Junda pharmaceutical Co.Ltd and Sichuan Zhongyao Yipian Co.Ltd for providing the photographs of TCMs displayed on this web server.

6. References

  1. Morselli Gysi, D., do Valle, I., Zitnik, M., Ameli, A., Gan, X., Varol, O., Ghiassian, S.D., Patten, J.J., Davey, R.A., Loscalzo, J. et al. (2021) Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proc Natl Acad Sci U S A, 118.
  2. Uhlen, M., Fagerberg, L., Hallstrom, B.M., Lindskog, C., Oksvold, P., Mardinoglu, A., Sivertsson, A., Kampf, C., Sjostedt, E., Asplund, A. et al. (2015) Proteomics. Tissue-based map of the human proteome. Science, 347, 1260419.
  3. Xiang, Z., Gong, W., Li, Z., Yang, X., Wang, J. and Wang, H. (2021) Predicting Protein-Protein Interactions via Gated Graph Attention Signed Network. Biomolecules, 11.
  4. Fritz, F., Preissner, R. and Banerjee, P. (2021) VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res, 49, W679-W684.
  5. Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B.A., Thiessen, P.A., Yu, B. et al. (2023) PubChem 2023 update. Nucleic Acids Res, 51, D1373-D1380.
  6. Rojas, C., Ballabio, D., Pacheco Sarmiento, K., Pacheco Jaramillo, E., Mendoza, M. and Garcia, F. (2022) ChemTastesDB: A curated database of molecular tastants. Food Chem (Oxf), 4, 100090.
  7. Wu, L., Yan, B., Han, J., Li, R., Xiao, J., He, S. and Bo, X. (2023) TOXRIC: a comprehensive database of toxicological data and benchmarks. Nucleic Acids Res, 51, D1432-D1445.
  8. De Carlo, A., Ronchi, D., Piastra, M., Tosca, E.M. and Magni, P. (2024) Predicting ADMET Properties from Molecule SMILE: A Bottom-Up Approach Using Attention-Based Graph Neural Networks. Pharmaceutics, 16.
  9. Zdrazil, B., Felix, E., Hunter, F., Manners, E.J., Blackshaw, J., Corbett, S., de Veij, M., Ioannidis, H., Lopez, D.M., Mosquera, J.F. et al. (2024) The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res, 52, D1180-D1192.
  10. Pang, X., Yan, B., Zhou, J., Cao, X. and Peng, C. (2020) Research Progress of Chinese Materia Medica-induced Cardiotoxicity. Progress in Pharmaceutical Sciences, 44, 730-742.
  11. Yu, X., Xin, E., Yang, W., Huang, J., Guo, X., Lu, Y. and Li, Y. (2022) Research progress on the material basis and mechanism of toxic TCM medicine. Clinical Journal Of Chinese Medicine, 14, 141-145.
  12. Peng, P. and Yuan, W. (2021) Research progress on drug-induced hepatotoxicity of Chinese materia medica. Drug Eval Res, 44, 1783-1792.
  13. Hong, H., Du, W., Zhu, W., Hong, Z. and Ge, W. (2021) Research progress on organ toxicity of traditional Chinese medicine. China Journal of Traditional Chinese Medicine and Pharmacy, 36, 943-946.
  14. Xiong, F., Jiang, F., Xiong, A., Ju, Z., Yang, L. and Wang, Z. (2020) Quantification of hepatotoxic pyrrolizidine alkaloid adonifoline in traditional Chinese medicine preparations containing Senecionis Scandentis Herba. China Journal of Chinese Materia Medica, 45, 92-97.
  15. Zan, K., Jiang, H., Jin, H., Ma, S., Zhao, L. and Sun, Y. (2021) Research progress on quality control of hepatotoxic pyrrolizidine alkaloids in traditional Chinese medicine. Chinese Journal of Pharmaceutical Analysis, 41, 572-578.
  16. Knox, C., Wilson, M., Klinger, C.M., Franklin, M., Oler, E., Wilson, A., Pon, A., Cox, J., Chin, N.E.L., Strawbridge, S.A. et al. (2024) DrugBank 6.0: the DrugBank Knowledgebase for 2024. Nucleic Acids Res, 52, D1265-D1275.
  17. Connor, S., Li, T., Qu, Y., Roberts, R.A. and Tong, W. (2024) Generation of a drug-induced renal injury list to facilitate the development of new approach methodologies for nephrotoxicity. Drug Discov Today, 29, 103938.
  18. Yang, L., Wu, R., Sa, L. and Ao, W. (2023) Research progress on nephrotoxic components in traditional Chinese medicine and its toxic mechanisms. Chinese Traditional and Herbal Drugs, 54, 7934-7952.
  19. Shang, H., Pang, X., Zhang, Q., Shi, X., Zhang, Y., Han, J. and Zheng, W. (2021) Research Progress of Toxicity Mechanisms in Kidney Injury Associated with Chinese Herbs and Its Compositions. Herald of Medicine, 40, 1210-1215.
  20. Lian, D., Hou, H., Zhang, G., Li, J., Ye, Z. and Peng, B. (2021) Neurotoxicity and mechanism of octopamine. Chin J Pharmacol Toxicol, 35, 788.
  21. He, H., Zhao, S., Xing, Y., Wang, Z., Du, L. and Shao, J. (2023) Research progress on material basis for neurotoxicity of traditional Chinese medicine and its mechanism. China Pharmacy, 34, 251-256.
  22. Hu, Z., Huang, L., Hou, J. and Wang, X. (2022) Research progress in toxicity of alkaloids in traditional Chinese medicine. Central South Pharmacy, 20, 633-641.
  23. Zong, S., Liu, Y., Sun, T., Zhang, H., Wang, C., Zhi, W. and Li, Y. (2020) Research Progress in the Toxicity and Control Methods of Asari Radix et Rhizoma. Chinese Pharmacist, 23, 942-945.
  24. Wang, Y., Zhang, H., Ma, D., Wu, D., Deng, X., Li, F., Wu, Q. and Guo, S. (2021) Amygdalin Ameliorates Respiratory Failure in Corpulmonale Rats and Regulates EGFR/MAPK Signaling Pathway. Zhejiang Chinese Medical University Xue Bao, 45, 384-390.
  25. Zhang, Y., Li, X., Shi, Y., Chen, T., Xu, Z., Wang, P., Yu, M., Chen, W., Li, B., Jing, Z. et al. (2023) ETCM v2.0: An update with comprehensive resource and rich annotations for traditional Chinese medicine. Acta Pharm Sin B, 13, 2559-2571.