DILIst is a dataset of clinical liver-toxicity labels. It is widely used in drug discovery as a benchmark for machine-learning models of liver toxicity.

Drug-induced liver injury (DILI) presents a significant challenge in drug discovery, often leading to clinical trial failures and necessitating drug withdrawals. We introduce a novel method for DILI prediction that employs not only chemical data but also 9 predicted proxy-DILI labels (in vitro and in vivo data) and pharmacokinetic properties. These proxy-DILI labels offer complementary insights, enhancing prediction accuracy.

DILIPredictor bases its predictions on 9 predicted proxy-DILI labels, both in vitro (e.g., mitochondrial toxicity, bile salt export pump inhibition) and in vivo (e.g., preclinical rat hepatotoxicity studies), together with pharmacokinetic parameters, structural fingerprints and physicochemical parameters.
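
The feature combination described above can be sketched roughly as follows. This is only an illustration of concatenating the feature groups into one input vector, not DILIPredictor's actual code: the function name, the endpoint keys and all numeric values are hypothetical, and only the three proxy-DILI examples named in the text are included (the text describes 9 in total).

```python
from typing import Dict, List, Sequence

# Hypothetical endpoint keys, based on the three examples named in the text.
# The text describes 9 proxy-DILI labels; the remaining ones are not listed here.
PROXY_ENDPOINTS = (
    "mitochondrial_toxicity",
    "bsep_inhibition",
    "rat_hepatotoxicity",
)


def build_feature_vector(
    fingerprint_bits: Sequence[int],
    physchem: Sequence[float],
    pharmacokinetic: Sequence[float],
    proxy_predictions: Dict[str, float],
) -> List[float]:
    """Concatenate all feature groups into one flat vector for a final DILI model."""
    missing = [name for name in PROXY_ENDPOINTS if name not in proxy_predictions]
    if missing:
        raise ValueError(f"missing proxy-DILI predictions: {missing}")
    # A fixed endpoint order keeps column positions stable across compounds.
    proxy_block = [float(proxy_predictions[name]) for name in PROXY_ENDPOINTS]
    return (
        [float(b) for b in fingerprint_bits]
        + [float(x) for x in physchem]
        + [float(x) for x in pharmacokinetic]
        + proxy_block
    )


example = build_feature_vector(
    fingerprint_bits=[1, 0, 0, 1],   # toy 4-bit structural fingerprint
    physchem=[312.4, 2.1],           # made-up physicochemical values
    pharmacokinetic=[0.8],           # made-up pharmacokinetic value
    proxy_predictions={              # made-up predicted proxy-DILI probabilities
        "mitochondrial_toxicity": 0.72,
        "bsep_inhibition": 0.15,
        "rat_hepatotoxicity": 0.40,
    },
)
print(len(example))
```

In a sketch like this, the proxy-DILI values would come from separately trained models, so the final model sees predicted rather than measured assay outcomes.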

Using feature interpretation, DILIPredictor can identify the chemical structures and biological mechanisms relevant to a prediction, and can distinguish animal from human liver toxicity.

DILIPredictor takes advantage of the diverse biological data related to DILI mechanisms as well as chemical structure for enhanced early detection.

To reference DILIPredictor, please cite:
Seal et al. Improved Early Detection of Drug-Induced Liver Injury by Integrating Predicted in vivo and in vitro Data, bioRxiv 2024, 2024.01.10.575128. https://doi.org/10.1101/2024.01.10.575128.


compounds with liver toxicity readouts (DILI and proxy-DILI) covering compounds and approved, investigational, experimental and withdrawn drugs


curated in vivo and in vitro assays (proxy-DILI labels) and pharmacokinetic parameters related to liver injury


compounds with DILI labels from DILIst and DILIrank (716 toxic and 395 non-toxic compounds). This dataset is referred to as the gold standard DILI dataset.
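
Because the two classes are unequal in size, it can help to work out the class balance before interpreting benchmark scores. The short sketch below uses only the two counts stated above; the variable names are illustrative.

```python
# Class counts as stated for the gold standard DILI dataset.
n_toxic = 716
n_non_toxic = 395

n_total = n_toxic + n_non_toxic
toxic_fraction = n_toxic / n_total

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline_accuracy = max(n_toxic, n_non_toxic) / n_total

print(f"total compounds: {n_total}")
print(f"toxic fraction: {toxic_fraction:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
```

A model evaluated on this dataset should be compared against the majority-class baseline, and a balance-aware metric such as balanced accuracy may be more informative than raw accuracy.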

Schema Image