2025 Volume 50 Issue 7 Pages 309-324
Identifying the molecular targets of toxic compounds remains a major challenge in toxicology, particularly when adverse effects occur in off-target organs and the mechanism of action is unknown. To address this issue, a comprehensive computational pipeline was developed to perform high-throughput molecular docking across the entire AlphaFold2-predicted structural proteome of representative organisms such as human and mouse, followed by enrichment analysis to estimate biological processes potentially affected by ligand binding. The pipeline was first evaluated using six known drug–target pairs. In several cases, the known targets were ranked between the top 2 and 250 proteins (top 0.009–1.15%) among more than 21,000 proteins, and displayed docking poses consistent with experimentally observed binding conformations. However, performance was limited for certain targets, such as carbonic anhydrase II with acetazolamide, where the binding pocket was broad, leading to inaccurate docking results. The pipeline was subsequently applied to puberulic acid, a compound suspected of causing severe nephrotoxicity. Screening identified sodium/myo-inositol cotransporter 2 (SLC5A11) as a high-affinity target in both human and mouse, suggesting a mechanism involving disruption of renal osmoregulation. Although docking scores represent only theoretical binding estimates and do not directly imply physiological effects, their distribution was independent of protein length and AlphaFold2 confidence scores (pLDDT), supporting the methodological robustness. This in silico framework enables hypothesis-driven identification of potential target proteins for toxicants or therapeutics and offers a useful tool for predictive toxicology, particularly when experimental data are limited. The pipeline is available at: https://github.com/toxtoxcat/reAlldock.
Drug discovery is a daunting enterprise, typically spanning over a decade and costing hundreds of millions to billions of dollars, with a success rate of only 1 in 31,000 candidate compounds (Hughes et al., 2011). One major reason for this low success rate in early drug discovery phase is that most compounds manifest not only their primary effects but also unexpected side effects, particularly in off-target organs, which are notoriously difficult to predict in advance. Although animal studies are routinely used to evaluate these side effects, they primarily provide phenotypic information and make it difficult to identify the precise molecular targets underlying the observed toxicity. A notable example is a drug-related incident in Japan in 2024 involving a red yeast rice supplement suspected of causing kidney dysfunction (Ushimaru and Tominaga, 2024). Subsequent reports suggested that puberulic acid, a contaminant produced by Aspergillus during the culturing process, might be responsible. Knowledge regarding the toxicity of puberulic acid is relatively limited. Iwatsuki et al. reported its cytotoxicity against human MRC-5 cells with an IC50 value of 57.2 μg/mL, while Kikuchi et al. documented apoptosis-mediated cytotoxicity in human leukemia U937 cells (Iwatsuki et al., 2010; Kikuchi et al., 2024). However, its detailed mechanism of action remains unclear, underscoring the urgent need for more efficient methods to identify toxicity-related target molecules.
Historically, a variety of non-animal experimental methods, such as in vitro assays using cultured cells, have been employed to evaluate toxicity (Lynch et al., 2024). Although these in vitro assays offer higher throughput than in vivo animal studies, testing tens of thousands of compounds remains impractical. As an alternative, in silico approaches—virtual experiments performed on computers—have been explored for drug discovery. Common computational toxicity prediction techniques include quantitative structure–activity relationship (QSAR) models, which generate molecular descriptors based on chemical structure and use them to build machine learning models, and read-across methods, which infer the toxicity of an unknown compound from known toxicities of structurally similar compounds (Fisher et al., 2024). However, all of these approaches rely on structural similarities to known compounds and, therefore, struggle when confronted with novel chemical architectures or previously uncharacterized toxicity mechanisms. Although there are machine learning methods that integrate information on both ligands and their protein interaction partners, these approaches also rely heavily on existing databases of adverse effects of drugs (Raies and Bajic, 2016).
To develop new methods for predicting toxicity, it is useful to revisit how chemical substances exert toxic effects. In general, mode of action of toxicity can be described using the adverse outcome pathway (AOP) concept (Ankley et al., 2010; Vinken, 2013). The AOP framework divides toxicity into three stages: the molecular initiating event (MIE), in which the chemical binds to target biological molecules such as nucleic acids and proteins; the subsequent cellular or subcellular key events (KE); and the final adverse outcome (AO) in in vivo. Prediction methods for the binding of nucleic acids and chemical substances, specifically mutagenicity, are relatively well-developed. Studies have demonstrated that common machine learning techniques, such as support vector machines and gradient boosting, can predict Ames test outcomes with high accuracy (Chu et al., 2021). In contrast, proteins exhibit greater structural diversity and a wider range of functions compared to nucleic acids, making systematic understanding and prediction of their interactions and effects much more challenging and still an unmet goal; however, accurately predicting which proteins a chemical might bind to provides a viable strategy for forecasting toxicity. Molecular docking simulations—an in silico technique—offer a way to make such predictions (Trisciuzzi et al., 2018). These simulations place a chemical compound and a protein within a computational space and search for the most energetically favorable binding pose (Prieto-Martínez et al., 2018). Unlike in vitro and in vivo experiments, docking calculations are highly efficient, often taking only a few seconds per compound. For instance, one study screened 1,314 compounds for potential HIV protease inhibitors using the docking programs AutoDock4 and AutoDock Vina (Chang et al., 2010). However, molecular docking requires three-dimensional structures of both the chemical compounds and the proteins. Whereas three-dimensional structures of small molecules can be readily generated from their chemical formulas, experimentally determining protein structures—often comprising tens of thousands of atoms—has long required resource-intensive methods such as X-ray crystallography or cryo-electron microscopy (Bertoline et al., 2023). Recently, however, protein structure prediction techniques based on amino acid sequences, exemplified by AlphaFold2, have attained high accuracy (Jumper et al., 2021). This progress enables a comprehensive acquisition of structural proteomes, even for proteins with limited database information. In turn, the ability to perform new docking simulations on such proteins will extend toxicity predictions beyond what can be inferred from existing knowledge bases.
In this study, we implemented an in silico pipeline in which any chosen ligand is docked against the entire structural proteome—predicted by AlphaFold2—of an organism of interest. We then conducted an enrichment analysis on the set of proteins whose docking scores exceeded a certain threshold, thereby predicting the potential biological changes that might arise from ligand binding (Fig. 1).
Overview of the Binding-Proteomics pipeline. This Figure illustrates the three-step computational pipeline implemented in the scripts main_prepare.py, main_dock.py, and main_string.py. The pipeline begins with the acquisition of the structural proteome of a selected organism from the AlphaFold Protein Structure Database (main_prepare.py), followed by protein structure preprocessing using Open Babel and binding site detection with Fpocket. Next, ligand docking is performed across the entire proteome using Vina-GPU (main_dock.py), based on SMILES or SDF input. Finally, enrichment analysis is conducted on high-affinity target proteins using the STRING database API (main_string.py). The selection of high-affinity proteins is user-defined—either by docking score threshold (e.g., ≤ −8 kcal/mol) or top-ranking N proteins (default: top 100).
A script was developed—primarily using the wget command—to download the 3D protein structure files (PDB files) for any given species from the AlphaFold Protein Structure Database (Varadi et al., 2022). As of September 16, 2024, the database provides downloadable full proteomes for 48 different species. Prior to conducting the docking simulations, all downloaded PDB files were processed with Open Babel to add polar hydrogens, assign Gasteiger charges, and convert the files into PDBQT format (O’Boyle et al., 2011). Gasteiger charges are approximate atomic charges computed from electronegativity values (Gasteiger and Marsili, 1978). PDBQT files are a modified version of the PDB format used by AutoDock Vina (Eberhardt et al., 2021; Trott and Olson, 2010). They contain the partial charges (e.g., those assigned by the Gasteiger method) as well as additional docking-related information such as hydrogen-bond donor/acceptor status and aromatic ring membership.
Detection of ligand-binding pocketsLigand-binding pockets for each protein were identified using Fpocket, a high-speed pocket detection program (Le Guilloux et al., 2009). Default parameters were applied. If multiple pockets were detected within a given protein, the one with the highest Pocket Score was selected as the representative ligand-binding site.
Molecular docking simulationsDocking calculations were conducted on the entire structural proteome using Vina-GPU for GPU-accelerated molecular docking (Ding et al., 2023). The binding pocket center determined by Fpocket was set as the center of the docking box. By default, the box size was set to 20 × 20 × 20 Å (adjustable as needed); in this example, a 20 × 20 × 20 Å box was employed. All other parameters for Vina-GPU were kept at their default values.
Enrichment analysisEnrichment analyses were performed using the STRING database API (Application Programming Interface) on two sets of proteins: (1) those with docking scores above a threshold (default: −8.0 kcal/mol) and (2) the top n proteins (default: n = 100) ranked by docking score (Szklarczyk et al., 2021). The version of STRING used was 11.5. First, the “Mapping identifiers” method in the STRING API was applied to convert the UniProt Accessions, associated with each PDB file, into STRING Identifiers (Bateman et al., 2023). Subsequently, the “Performing functional enrichment” method in the STRING API was used to obtain functional enrichment data. These analyses included annotations for Gene Ontology, KEGG pathways, UniProt Keywords, PubMed publications, Pfam domains, InterPro domains, and SMART domains (Consortium et al., 2023; Kanehisa et al., 2023; Letunic et al., 2021; Mistry et al., 2021; Paysan-Lafosse et al., 2023; Sayers et al., 2022).
Pipeline implementationThe pipeline was designed to accept ligands not only in 3D structure file formats such as SDF or MOL2, but also as SMILES notation, which is then converted automatically. Both SDF and MOL2 are text-based file formats containing atomic coordinates and bonding information, whereas SMILES denotes molecular structures as a single line of text. When SDF or MOL2 files are provided, hydrogens are added, Gasteiger charges are assigned, and the files are converted into PDBQT format via Open Babel, following the same procedure used for protein files. In the case of SMILES input, RDKit is used to add hydrogens and generate a 3D structure via the ETKDG method before converting the file to SDF format; from there, the same procedure is applied to generate the final PDBQT files (Bento et al., 2020; Riniker and Landrum, 2015).
Performance evaluation of the pipelineTo evaluate the docking accuracy of the pipeline, six ligands with known experimental complex structures were selected and docked across the entire AlphaFold2-predicted structural proteome. For each ligand, the docking pose on its known target protein (based on the AlphaFold2 model) was extracted and compared with the experimentally observed binding conformation. The PDB IDs of the reference complexes, along with the UniProt accession numbers of the corresponding human and mouse proteins, are listed in Table 1. Structural alignment between predicted and experimental poses was performed using the Match command in ChimeraX 1.6. The docking times for these six compounds using two NVIDIA RTX A6000 GPUs ranged from 3.2 min per 100 proteins for the fastest compound, teriflunomide, to 5.9 minutes for the slowest, 3-quinuclidinyl benzilate, with an average of 4.4 ± 0.9 min (mean ± standard deviation).
Target protein prediction for puberulic acid using the pipelineThe pipeline was applied to predict potential target proteins of puberulic acid, a compound implicated in fatal kidney injury and associated with a major public health incident, despite its targets remaining unknown. The molecular structure of puberulic acid was obtained from PubChem (CID: 592860) and evaluated using the pipeline for both the human and mouse structural proteomes. Several statistical analyses were performed in JMP Pro 17.0 to assess the reliability of the comprehensive molecular docking approach. Pearson correlation coefficients were also computed between docking scores and two factors: pLDDT values and the protein’s amino acid sequence length.
Validative flexible and covalent dockingRefinement of the docking results for candidate binding proteins (MIOX and SMIT1/2) of puberulic acid identified in the comprehensive screening was carried out via flexible docking and covalent docking in the Schrödinger Suite. The OPLS4 force field was employed for energy minimization of both the protein and the ligand (Lu et al., 2021). Binding pockets were identified with SiteMap, and flexible docking was performed using Glide Induced Fit Docking with XP scoring and otherwise default conditions (Friesner et al., 2004). For comparison, the physiological ligand myo-inositol (PubChem CID: 892), which is the known substrate of both MIOX and SMIT1/2, was also subjected to flexible docking using the same procedure (Fig. 2). Covalent docking (CovDock) was then conducted on the poses obtained from flexible docking. In the case of SMIT2, two lysine residues—Lys157 and Lys316—were identified in close proximity to the docked puberulic acid and selected as potential sites for covalent bond formation (Zhu et al., 2014).
2D and 3D structure of myo-inositol and puberulic acid.
Performance of the pipeline was evaluated using six known ligand–target protein pairs. Fig. 3 shows the superimposed binding poses of the experimental structures and the docking results for the corresponding human and mouse target proteins. Table 2 lists the docking scores obtained by the pipeline, the rank of the known targets among all proteins, and the root mean square deviation (RMSD) values between the predicted and experimentally observed ligand conformations. For the pairs warfarin–VKORC1 and 3-quinuclidinyl benzilate–CHRM2, both human and mouse docking scores were below −10.0, placing them within the top 2–10 proteins—corresponding to the top 0.009–0.046% of the entire proteome. Their predicted poses closely matched the experimental ones, with RMSD values below 2.0. Similarly, pioglitazone–PPARγ showed docking scores below −9.0 in both species and ranked within the top 1.0%. Teriflunomide–DHODH scored −9.0 in human and slightly above −8.0 in mouse, ranking within the top 0.03% and 1.15%, respectively. While these ligands docked to the correct binding pockets, their poses showed noticeable deviations from the crystal structures, resulting in RMSD values ranging from 7.13 to 9.37. In contrast, the ibuprofen–COX-2 pair yielded less accurate results: the docking score was above −6.0 in human (top 4.91%) but even weaker in mouse, failing to place within the top 10%. For acetazolamide–CA2, scores were similarly poor in both species, with neither ranking within the top 10%. In both cases, the predicted binding poses were located at positions distinct from those in the experimental structures. The enrichment analysis results for proteins with docking scores below −8.0 for each compound are provided in the supplementary Excel file.
Binding poses of known drug–target protein pairs. Superimposed structures of experimentally observed complexes from the PDB (shown in blue) and docking poses calculated using the present pipeline (magenta for human, green for mouse). The root-mean-square deviation (RMSD) values between the PDB and docking structures are listed in Table 2. VKORC1: Vitamin K epoxide reductase complex subunit 1, CHRM2: Muscarinic acetylcholine receptor M2, CA2: Carbonic anhydrase, PPAR-γ: Peroxisome Proliferator-Activated Receptor γ, DHODH: Dihydroorotate dehydrogenase (quinone), COX-2: Cyclooxygenase-2.
The entire set of 21,615 mouse and 23,391 human protein PDB files was successfully converted into PDBQT format. Fpocket identified ligand-binding pockets in 21,581 mouse structures and 23,349 human structures. Docking simulations of puberulic acid against these proteins yielded an average docking score of −5.2 kcal/mol (median: −5.1, minimum: −8.8) for mouse and −5.1 kcal/mol (median: −5.1, minimum: −8.5) for human proteins. Fig. 4A shows a 3D scatter plot of docking scores, pLDDT values (a confidence metric for AlphaFold2-predicted structures), and protein sequence lengths for both species. Spearman correlation coefficients between docking scores and these variables were −0.35 for pLDDT and −0.11 for sequence length (p < 0.001), with similar results observed independently for human and mouse. Fig. 4B presents histograms of docking scores for the human and mouse proteomes. The Shapiro–Wilk test indicated non-normal distribution in both cases (p < 0.001). Proteins with docking scores equal to or below −8.0 were limited to 38 in human (top 0.16%) and 43 in mouse (top 0.20%). These subsets were subjected to subsequent enrichment analysis.
A) Three-dimensional scatter plot showing docking scores for puberulic acid across the combined human and mouse proteomes, along with pLDDT values and amino acid sequence lengths. All positive docking scores, which indicate extremely unfavorable binding, were converted to zero. The pLDDT value (range: 0–100) is an AlphaFold2-derived metric that reflects protein structure prediction accuracy. Although pLDDT values are computed per amino acid, each protein is represented by its mean pLDDT value. B) Histogram of docking scores. The vertical axis represents the number of proteins in each histogram bin. Proteins with scores ≤ -8 were subjected to subsequent enrichment analyses.
Enrichment analysis was performed on proteins with high binding affinity to puberulic acid (docking score ≤ −8 kcal/mol) in both the human and mouse proteomes. In the human dataset (Table 3), significantly enriched terms included alpha-1,4-glucosidase activity, glycoside hydrolase family 31, and several motifs associated with lipoprotein receptors, such as LDLR class B repeat and low-density lipoprotein receptor YWTD domain. Terms related to sodium-coupled transport were also enriched, including glucose:sodium symporter activity, sodium:solute symporter family, and sodium/glucose symporter superfamily. A domain of unknown function (DUF5050) was also among the top-ranked annotations. In the mouse dataset (Table 4), the top-ranked enriched term was inositol transporters (MMU-429593). Additional transport-related terms such as glucose:sodium symporter activity, sodium:solute symporter family, and sodium/glucose symporter superfamily were also identified. Enrichment was further observed for terms related to fructose-bisphosphate aldolase activity and domains, as well as annotations linked to heat shock proteins, protein refolding, and guanylate-binding proteins.
The top 20 proteins with the highest binding affinities for puberulic acid are listed in Table 5. Among them, six proteins were commonly identified in both the human and mouse datasets: inositol oxygenase (MIOX), sodium/myo-inositol cotransporter 2 (SLC5A11), ectonucleoside triphosphate diphosphohydrolase 6 (ENTPD6), GTPase IMAP family member GIMD1 (GIMD1), GTP-binding protein 4 (GTPBP4), and atlastin-2 (ATL2). Functional categorization showed that these proteins largely fell into two groups. MIOX and SLC5A11 are both inositol-related proteins; SLC5A11 (also known as SMIT2) is a sodium-dependent transporter that imports myo-inositol into cells in response to hyperosmotic stress, functioning as part of the osmotic regulation system. MIOX catalyzes the oxidative catabolism of myo-inositol to D-glucuronic acid and is the only known enzyme responsible for inositol degradation in humans. GIMD1, GTPBP4, and ATL2 are GTP-related proteins with roles in immune cell function, ribosome biogenesis, and endoplasmic reticulum morphology. ENTPD6, which hydrolyzes extracellular nucleotides, was also shared between the two species but did not fall clearly into either functional group. These results indicate a convergence of high-affinity binding proteins on pathways related to inositol handling and GTPase activity.
Since the inositol-related proteins identified as high-affinity targets—MIOX and SMIT2—were consistent with the known pathology of puberulic acid-induced nephrotoxicity, a detailed analysis of the binding modes between puberulic acid or myo-inositol and both human and mouse MIOX, SMIT2 (SLC5A11), and SMIT1 (a transporter with similar function to SMIT2) was conducted using flexible and covalent docking. To better reflect binding interactions in a physiological environment, flexible docking, which accounts for protein-side mobility (Glide induced fit docking), was conducted (Table 6). Puberulic acid demonstrated binding affinities comparable to those of the physiological ligand myo-inositol for all tested proteins. Observations of interacting amino acid residues revealed that lysine or arginine residues, which are capable of covalent bonding, were consistently located in close proximity to puberulic acid for all proteins (Fig. S1). Glide covalent docking (CovDock) was used to predict whether these residues could form imine bonds with the carbonyl group of puberulic acid via imine condensation. In all three proteins examined, puberulic acid formed stable covalent complexes with the designated lysine residues under the simulated conditions, while maintaining favorable binding affinities (Table 6, Fig. 5, Fig. S2, and S3). These findings suggest that covalent bond formation is feasible in silico; however, the results remain computational and do not provide experimental confirmation.
Theoretical binding-mode analysis of puberulic acid with human/mouse SMIT2 (SLC5A11), verified by flexible docking. Both the endogenous ligand myo-inositol and puberulic acid were subjected to Glide Induced Fit Docking, followed by additional covalent docking of puberulic acid with Lys157 and Lys316. The two-dimensional interaction diagrams show the key contacts between SMIT2 and each ligand, and the right panel illustrates the three-dimensional pose of puberulic acid covalently bound to Lys316.
The pipeline was developed to dock a target ligand against the entire AlphaFold2-predicted proteome of any species, followed by enrichment analysis of proteins that exceed a specified docking score threshold, thereby predicting potential binding targets and their physiological effects. Because molecular docking requires the definition of a search center within the protein, the high-speed, high-accuracy Fpocket tool was employed to identify ligand-binding pockets. Among the 21,615 mouse proteins, Fpocket failed to detect any ligand-binding pockets in only 34 proteins (0.16%). Similarly, among 23,391 human proteins, pockets were not detected in 42 proteins (0.18%). All of these proteins contained fewer than 100 amino acid residues, with a mean length of 31 residues. These proteins were considered too small to form a stable binding pocket and were excluded from subsequent analyses without significantly impacting the results. The six reference compounds selected for benchmarking covered a diverse range of target protein classes, including enzymes (VKORC1, CA2, DHODH, COX-2), a nuclear receptor (PPAR-γ), and a G protein-coupled receptor (CHRM2). These targets also represented a mix of soluble and membrane-associated proteins. For example, CA2 and PPAR-γ are soluble proteins without membrane-spanning regions. VKORC1 and CHRM2 are classic multi-pass transmembrane proteins, localized in the endoplasmic reticulum and plasma membrane, respectively. DHODH and COX-2, on the other hand, are peripheral membrane proteins; DHODH is anchored to the mitochondrial inner membrane, while COX-2 is associated with the endoplasmic reticulum and nuclear envelope via membrane-binding domains. This diversity was intentionally included to evaluate the pipeline’s performance across structurally and functionally distinct target classes. Benchmarking results revealed that the pipeline successfully ranked known targets highly for several compounds. Warfarin–VKORC1 and 3-quinuclidinyl benzilate–CHRM2, for instance, were ranked within the top 10 proteins and demonstrated excellent pose agreement with experimental structures. Teriflunomide–DHODH and pioglitazone–PPAR-γ also showed favorable docking scores and ranked within the top 1–2% of the proteome, despite notable deviations in pose alignment. These cases suggest that in the context of proteome-scale screening, accurate pose reproduction is not strictly necessary—high docking scores alone may be sufficient to prioritize true targets effectively. In contrast, compounds such as acetazolamide and ibuprofen yielded lower docking scores and failed to reproduce the correct binding sites. In the case of CA2, this may be attributed to the wide and solvent-exposed nature of the binding pocket (Fig. 3), which offers limited geometric constraints and may not favor high-scoring docked conformations. Overall, proteins with structurally well-defined and enclosed binding pockets, particularly transmembrane proteins such as VKORC1 and CHRM2, appeared more amenable to successful docking against AlphaFold2 models. While certain cases showed partial alignment—for example, enrichment analysis for warfarin retrieved pathways related to menaquinone metabolism, which is relevant to its inhibition of vitamin K recycling—clear reflection of known pharmacological mechanisms was generally limited. This limitation may stem from several factors: (1) imperfect docking accuracy leading to suboptimal ranking of the true targets; (2) the limited number of functionally similar proteins for highly specific targets, reducing statistical power in enrichment calculations. These findings suggest that while enrichment analysis can support exploratory applications, its utility for validating well-established drug–target relationships may be limited and should be interpreted with caution.
Comprehensive docking simulations using puberulic acid were subsequently performed to estimate its potential protein targets. The correlation coefficient between docking scores and pLDDT (a structural prediction accuracy metric) was −0.35, indicating that the effect of AlphaFold2’s prediction accuracy on calculated binding affinities is relatively small. To assist in the interpretation of docking scores, the binding affinity corresponding to a docking score of −8 kcal/mol was estimated based on thermodynamic principles. Although docking scores are only rough approximations and cannot be directly converted into experimental Kd or Ki values, they can be related to binding affinity through the equation: Kd/i = eΔG/RT, where ΔG represents the Gibbs free energy change (docking score), R is the universal gas constant, T is the absolute temperature (set to 310.15 K, corresponding to 37°C), Substituting ΔG of −8 kcal/mol yields an estimated Kd/i of 2.3 µM. For comparison, docking scores of −7 and −9 kcal/mol correspond to approximate values of 17 µM and 0.3 µM, respectively, reflecting the exponential sensitivity of binding affinity to small changes in docking energy. Inspection of the puberulic acid docking score distribution revealed that only a small number of proteins reached or exceeded −8 kcal/mol, indicating that this ligand may have a high affinity for only a select subset of proteins. This is likely due to the highly hydrophilic nature of puberulic acid, which is surrounded by multiple hydroxyl groups, in contrast to the generally hydrophobic characteristics of many small-molecule ligands. Although the actual number of proteins bound in vivo remains unclear, the observed distribution appears reasonable for screening purposes (Fig. 4-B).
Out of more than 20,000 proteins in both the human and mouse proteomes, only six proteins exhibited consistently high binding affinities for puberulic acid across both species. These proteins were either related to myo-inositol symport or GTP-binding functions. Notably, the symporter group (MIOX and SMIT2)—comprising proteins involved in renal osmoregulation—was considered particularly relevant to the nephrotoxic effects of puberulic acid. Both MIOX and SMIT2 recognize myo-inositol as their physiological substrate. Myo-inositol has a structure in which each hydrogen atom on the carbons of a cyclohexane ring is replaced by a hydroxyl group. Puberulic acid, which contains multiple hydroxyl groups on a seven-membered ring, shares a certain degree of structural similarity with myo-inositol (Fig. 2). This structural similarity supports the possibility that puberulic acid may bind to these targets. Induced fit docking, a high-precision method that accounts for protein flexibility, is particularly suited for capturing physiologically relevant binding poses. This analysis revealed that puberulic acid binds strongly to inositol-related proteins, with SMIT2 consistently exhibiting higher binding affinities than SMIT1, aligning with trends observed in the initial screening (Table 5). Additionally, interaction analyses identified reactive lysine and arginine residues near puberulic acid within the binding pockets of MIOX, SMIT2, and SMIT1—residues that are conserved across humans and mice (Fig. S1). To evaluate the potential for covalent bond formation, covalent docking simulations were performed by integrating induced fit docking with imine condensation modeling. Although these simulations are purely computational, they demonstrated that puberulic acid could form covalent bonds with lysine residues via imine condensation in all three binding pockets (Fig. 5, S2, and S3). These results suggest the possibility of covalent bond formation with inositol-related proteins; however, experimental confirmation is required. Notably, imine (Schiff base) bonds between carbonyl groups and lysine side chains are generally reversible under physiological conditions and may require additional stabilization mechanisms such as local structural constraints or conjugation. Collectively, these results suggest that puberulic acid may covalently bind to inositol-related proteins, potentially disrupting their physiological functions. Both SMIT1 (SLC5A3) and SMIT2 (SLC5A11) are sodium-dependent transporters which play key roles in osmoregulation by importing myo-inositol into cells, accumulating it under high extracellular osmotic pressure to maintain osmotic balance between the intracellular and extracellular environments (Coady et al., 2002; Kwon et al., 1992) and are highly expressed in the kidney as well as in the brain and small intestine (Aouameur et al., 2007; Fu et al., 2012). In the kidney of rodents, SMIT1 is highly expressed in the renal medulla, with its expression markedly upregulated under hyperosmotic stress while SMIT2, also regulated by hyperosmotic conditions, is predominantly expressed in the brush border membrane of proximal tubules, where it plays a key role in the reabsorption of myo-inositol (Lahjouji et al., 2007; Lewis et al., 2021). Inhibition of SMIT1/2 is likely to disrupt renal osmoregulation, potentially leading to kidney injury. Interestingly, methylene myo-inositol, an inhibitor of SMIT1, is known to cause tubular degeneration—primarily in the outer medulla (Kitamura et al., 1998). This phenotype closely resembles the tubular degeneration and necrosis, as well as the Fanconi syndrome, observed in puberulic acid toxicity. MIOX is the only enzyme known to be responsible for inositol catabolism in humans and is highly expressed in the proximal tubules of the kidney (Hankes et al., 1969). In contrast to SMIT1/2, MIOX inhibition is not generally considered harmful. In fact, under hyperglycemic conditions, MIOX expression in proximal tubules is upregulated, leading to excessive inositol oxidation and increased production of reactive oxygen species (ROS). As a result, MIOX inhibition has been proposed as a potential therapeutic strategy for managing diabetic nephropathy (Tominaga et al., 2016). However, the long-term and chronic effects of MIOX inhibition on whole-body physiology, including in humans, remain poorly understood. Based on these findings, a plausible mechanism for puberulic acid–induced renal damage involves its structural similarity to myo-inositol. Puberulic acid may be taken up by SMIT1/2, forming covalent bonds with their highly reactive lysine residues, thereby inhibiting symporter activity. Loss of this critical osmoregulatory function is likely to lead to proximal tubular injury. It is important to note that all findings in this study are simulation-based, thus experimental validation for the binding affinity between the puberulic acid and SMIT1/2 and its physiological effect is required in future studies.
In addition to inositol-related proteins, a group of GTP-related proteins were also identified as high-affinity targets of puberulic acid in both human and mouse structural proteomes (Table 5). GTP-binding proteins serve as essential molecular switches in a wide range of biological processes, including immune regulation, protein synthesis, and intracellular trafficking. Among the high-affinity targets detected in this study were ENTPD6 and GIMD1, which are involved in modulating immune responses; GTPBP4, a key regulator of ribosome biogenesis and protein production; and ATL2, a GTPase critical for maintaining the architecture and dynamics of the endoplasmic reticulum. The binding of puberulic acid to these proteins may interfere with normal GTP hydrolysis, thereby disrupting immune signaling, inhibiting protein synthesis, or inducing endoplasmic reticulum stress. These findings suggest that puberulic acid toxicity may extend beyond renal effects and potentially involve systemic mechanisms affecting immunity, translation, and organelle function. While this study focused on the human and mouse proteomes, the identification of conserved high-affinity targets across species underscores the broader applicability of the approach. This pipeline is theoretically applicable to any animal species, provided that their protein sequences are available. The AlphaFold database already provides ready-made ZIP files containing cross-species structural proteomes for 48 representative model organisms, and this pipeline also makes use of those files. However, by modifying the script to download structures sequentially based on species name searches, the pipeline can be extended, in principle, to virtually any species.
As a limitation of the present study, the analysis is restricted to an energy-based molecular docking approach followed by enrichment analysis and does not constitute a direct prediction of side effects. Rather, the pipeline is intended for use as a component of binding proteomics in toxicological research, complementing transcriptomic or proteomic analyses by providing candidate binding proteins and pathways to toxicologists. A related approach involves using large-scale molecular docking data on compound–protein pairs to build training sets for side-effect prediction (Sawada et al., 2024). However, such methods are often influenced by existing knowledge of compound toxicity, making them less suitable for identifying novel target proteins. By contrast, the pipeline described here provides purely energy-based rankings that may facilitate new discoveries. The fact that SMIT1 and SMIT2 have not been well recognized as a nephrotoxicity target supports the pipeline’s potential utility as an energy calculation-based in silico omics analysis. Various methods exist for molecular docking and ligand-binding pocket detection, each with its own advantages and limitations. It is important to acknowledge that no current approach can perfectly predict binding poses with absolute accuracy as shown in Table 2. This pipeline employs Vina-GPU and Fpocket to achieve high-speed, comprehensive molecular docking calculations, but further comprehensive benchmarking using well-characterized compounds with known targets is needed. Additionally, the pipeline adopts only the highest-ranked pocket identified by Fpocket in order to control computational complexity, despite the possibility that some ligands may bind to allosteric sites. Future work may involve increasing the accuracy of the comprehensive docking calculations and refining methods for inferring biological changes from the results. In addition to structure prediction methods such as RoseTTAFold-all-atom and AlphaFold3—which can model protein–ligand and protein–nucleic acid complexes and may offer future utility for toxicity prediction, albeit with current restrictions on docking use for AlphaFold3 structures (Abramson et al., 2024; Krishna et al., 2024)—molecular dynamics (MD) simulations also represent a promising avenue for enhancing the accuracy of protein-ligand binding site conformations. In particular, MD could be applied to refine AlphaFold2-predicted structures, potentially improving the suitability of binding pockets for docking simulations. However, due to the high computational (GPU) cost associated with MD, such approaches may be more practical when used either as a preprocessing step for a selected set of proteins or for post-screening refinement of candidate complexes. Integration of MD into the Binding Proteomics pipeline may thus enhance structural realism and binding pose reliability, contributing to further advances in target-based toxicity prediction. Since the advent of AlphaFold2, structural biology has undergone rapid and transformative advancements, not only in its core methodologies but also in its widespread application across various fields of biological research. The approach proposed in this study, termed "Binding Proteomics", is expected to further accelerate the integration of structural biology into toxicology, enhancing our ability to predict and understand toxicity at the molecular level.
In conclusion, this study proposes an in silico screening method for identifying candidate binding target proteins via comprehensive molecular docking calculations of an organism’s structural proteome. An analysis focusing on puberulic acid—a compound that has raised significant public health concerns in Japan due to its potential for lethal kidney damage—highlighted the sodium/myo-inositol cotransporter as a prominent target candidate. These findings underscore the need for further experimental investigations to validate the toxicological effects of puberulic acid.
This study was supported by Grants‐in‐Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan awarded to K. Takeda (23K14091). This research was supported by Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number JP23ama121027 awarded to M. Sekijima.
Conflict of interestThe authors declare that there is no conflict of interest.
Data availabilityThe source code implementing this pipeline is available at https://github.com/toxtoxcat/reAlldock.
Supplementary materialsSupplement file 2_1. Enrichment analysis of high-affinity binding proteins (docking score < −8 kcal/mol) for six reference ligands used in performance evaluation, based on the human AlphaFold2-predicted structural proteome.
Supplement file 2_2. Enrichment analysis of high-affinity binding proteins (docking score < −8 kcal/mol) for six reference ligands used in performance evaluation, based on the mouse AlphaFold2-predicted structural proteome.
Fig. S1. Interaction diagrams of induced-fit docking for MIOX, SMIT2, and SMIT1 in human and mouse.
Fig. S2. Interaction diagrams of covalent docking for SMIT1 in human and mouse.
Fig. S3. Interaction diagrams of covalent docking for MIOX in human and mouse.