Q1: How do you use AI to generate molecules?
A1: In SBDD approaches, AI generates the molecules from the binding pocket of the protein structure. AI extracts spatial and pharmacophore features from the binding pocket as vectors and then uses neural networks to generate molecules.
In LBDD approaches, AI generation algorithms incorporate 1D, 2D, or 3D molecular features of the reference compounds. These representations are fed into multiple neural networks, which generate novel molecules under the desired constraints (fixed-scaffold or scaffold-hopping). Goal-directed generation is enabled by reinforcement learning and chemical space exploration strategies.
Q2: When AI generates molecules, does it take synthetic feasibility, routes, and/or options into consideration, as a traditional med chemist would?
A2: Not yet. Currently, we generate the molecules first without considering synthetic routes, and estimate synthetic feasibility afterwards using a synthetic feasibility score and analysis by medicinal chemists.
We are exploring synthetic route design by combining AI with robotic automation. For more details, please attend the webinar on September 7 by Dr. Nathan Allen.
Q3: How many different ADMET models do you have for the lead optimization stage? Can these models apply to all chemical matters? How accurate can the models be?
A3: We can currently predict solubility, lipid bilayer permeability, MDCK cell permeability, Caco-2 cell permeability, liver microsome stability, site of metabolism and CYP inhibition, and hERG toxicity with our ADMET models.
No, these models do not apply to all chemical matter. Different chemical modalities require different models. For example, AI solubility predictions for peptides are not accurate because peptides can adopt multiple conformations.
In quantitative validations, our predictions reached r2 values of 0.7~0.8. When predicting sites for modification, as we did for hERG toxicity, we were able to improve the property, raising the IC50 by 5- to 10-fold.
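As a minimal sketch of how an r2 value like the one above can be computed, the function below evaluates the coefficient of determination between measured and predicted values. This assumes "r2" means 1 - SS_res / SS_tot; it could also denote the squared Pearson correlation, which can give a different number. The data are hypothetical and only illustrate the calculation.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_measured = sum(measured) / len(measured)
    # Residual sum of squares: how far predictions are from measurements
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    # Total sum of squares: spread of the measurements around their mean
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant; r2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical log-solubility values, for illustration only
measured = [-3.0, -2.0, -4.5, -1.0, -3.5]
predicted = [-2.6, -2.3, -4.0, -1.6, -3.1]
print(round(r_squared(measured, predicted), 3))
```

A value close to 1 means the predictions explain most of the variance in the measurements; the value is only meaningful on compounds that were not used for training.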
Q4: Could you use AIDD to design different modalities like PROTAC or peptide-based drugs?
A4: Yes. We can use AIDD in both cases. For peptide-based drugs, we completed one project through lead generation and published a paper on how to estimate binding affinity in protein-protein interactions (https://pubs.acs.org/doi/10.1021/acs.jcim.0c00679, Zou, J. et al., J Chem Inf Model, 60, 5794–5802 (2020)). For PROTAC, we have an ongoing project.
Q5: What are the best and worst classes of targets to investigate using AIDD?
A5: It depends on whether we use the SBDD or the LBDD approach. SBDD works best when we can specify the conformation of the protein structure and when the downstream mechanism is clear, as with most kinases. Conversely, SBDD works worst when the downstream mechanism is unclear and the protein has multiple conformational states, as with many ion channels. With the LBDD approach, the outcome depends on the quality and quantity of the data. The data set should have enough points and be well harmonized to produce an accurate QSAR correlation.
Q6: How much cost and time are saved when using AIDD compared to not using AIDD?
A6: While it is notoriously difficult to measure average discovery timelines, our experience suggests that AI is likely to shorten the discovery period. For fast-follow projects, AIDD could save about 50% of the time spent in the preclinical stage, which takes 5.5 years on average. In case study 2, one of our collaborations with a biotech company, it took us 7 months from project start to patent filing. For first-in-class projects requiring in vitro and in vivo model development, we expect years of discovery time. Costs differ between projects and are likewise estimated based on the timeline.
Q7: Are you confident in predicting poses for all protein targets with Xpose?
A7: Xpose is quite useful for most targets, such as kinases and GPCRs, but not for all proteins. If the binding pocket is large, or if the protein can undergo significant conformational changes, as happens in an ion channel, it is hard to predict the binding pose even with Xpose.
Q8: What is the accuracy of XFEP?
A8: XFEP errors are within 1 kcal/mol, corresponding to a 7- to 8-fold error in Ki, which is accurate enough to be useful for predicting potent compounds with Ki in the nM range. For more details, please see the paper from XtalPi (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c01329, Lin, Z. et al., J Chem Inf Model, 61, 2720–2732 (2021)).
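The link between a free-energy error and a fold error in Ki can be sketched with the standard relation ΔΔG = RT ln(Ki ratio). The snippet below is an illustration under that assumption, at 298.15 K; the function names are hypothetical and not part of XFEP. Under these assumptions a 1 kcal/mol error maps to roughly a 5-fold ratio, and a 7- to 8-fold ratio maps to about 1.2 kcal/mol, so the quoted figures are best read as approximate.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_error_from_ddg(ddg_kcal_per_mol, temperature_k=298.15):
    """Ki ratio implied by a binding free-energy error (assumes ddG = RT ln ratio)."""
    return math.exp(abs(ddg_kcal_per_mol) / (R_KCAL * temperature_k))


def ddg_from_fold_error(fold, temperature_k=298.15):
    """Free-energy error (kcal/mol) implied by a fold error in Ki."""
    return R_KCAL * temperature_k * math.log(fold)


print(round(fold_error_from_ddg(1.0), 2))   # fold error for 1 kcal/mol
print(round(ddg_from_fold_error(7.5), 2))   # kcal/mol for a 7.5-fold error
```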
Q9: Are we going to lose some good compounds if we were only aiming to make the top few compounds predicted by calculations (traditional SAR usually would include a lot more data points)?
A9: Not if we carefully account for diversity during virtual screening. The question can be rephrased as “Can we explore the appropriate chemical space?”. To avoid missing good compounds, we do not rely solely on AI-optimized routes. We also add diversity perturbations manually to explore more routes.
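One common automated way to keep a selection diverse is greedy MaxMin picking on fingerprint similarity. The sketch below is a generic illustration, not the manual procedure described above: fingerprints are represented as hypothetical sets of "on" bit indices, and the first (top-ranked) compound is used as the seed.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_diverse(candidates, n_pick, seed_index=0):
    """Greedy MaxMin: repeatedly add the candidate farthest from those already picked."""
    n_pick = min(n_pick, len(candidates))
    if n_pick == 0:
        return []
    picked = [seed_index]
    # min_dist[i] = distance from candidate i to its nearest picked compound
    min_dist = [1.0 - tanimoto(c, candidates[seed_index]) for c in candidates]
    while len(picked) < n_pick:
        remaining = (i for i in range(len(candidates)) if i not in picked)
        next_index = max(remaining, key=lambda i: min_dist[i])
        picked.append(next_index)
        for i, c in enumerate(candidates):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(c, candidates[next_index]))
    return picked


# Hypothetical fingerprints: three close analogs, two of another series, one singleton
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
print(pick_diverse(fingerprints, 3))
```

Picking only by predicted score would tend to select the three close analogs; the MaxMin step instead spreads the selection over different series.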
Q10: As for synthesizability, papers/products are using NN to find reaction pathways. Are they good enough to determine synthesizability?
A10: It is challenging, but we are trying to estimate synthesizability systematically. For example, we use synthetic feasibility scores (RASCORE, RASTEP2) to estimate the complexity of the predicted synthetic route. For more information, please attend MTX webinar #4, "Steps Towards Practical Automated Synthesis", by Dr. Nathan Allen on September 7, 2022.
Q11: What is the common source of these errors in the XFEP simulation or AI predictive models? How did you track the issues?
A11: Errors can arise from both XFEP and AI predictive models. XFEP errors can stem from intrinsic properties of the protein target, such as multiple conformational states, or from the quality of the structures. We reduce XFEP errors through a verification procedure that uses public data: with the available SAR data, we carefully investigate and verify the representative poses and energy states of the drug-protein interaction.
The accuracy of an AI prediction model depends on the quality of the data, the size of the data set, the choice of molecular representation, and the ML method used for optimization. To improve data quality, we curate the data so that it is better harmonized. If the size of the data set is the limitation, we can generate more data in our in-house facilities. The choice of fingerprint and ML method depends on the properties of the system, so we find an appropriate combination by exploring multiple methods.
Q12: Is it possible that AI virtual screening can have better accuracy than XFEP and if this happens in the future, how can we explain these AI inferences?
A12: AI virtual screening and XFEP are not mutually exclusive. AI virtual screening covers two approaches:
① Using physics-based methods as the scoring function, with an AI active-learning protocol to accelerate the whole process. In this case, AI does not improve accuracy but aims to greatly increase throughput.
② Using AI models as the scoring function. In LBDD, XFEP is not applicable, so QSAR is a very important scoring method. In SBDD, AI models are pre-trained on existing protein-ligand binding affinity data and then fine-tuned on project-specific data. However, we usually do not have a large amount of data, so XFEP is currently still more accurate than AI scoring functions. AI models are more likely to be used as a coarse screen before XFEP.
In the future, AI could outperform physical models such as XFEP overall if more data becomes available. However, this may not be what we see in real pipelines, because interesting targets are mostly under-explored targets with very little affinity data rather than well-explored targets with enough affinity data. AI can, however, use virtual data from FEP, which lets it more quickly explore chemical space close to the confined space already explored by FEP and find better chemical designs.
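The active-learning idea in ① can be sketched as a loop in which a cheap surrogate decides which candidates receive the expensive physics-based score. Everything below is a toy illustration under stated assumptions: `expensive_score` is a hypothetical stand-in for a costly calculation, candidates are described by a single number, and the surrogate is a simple 1-nearest-neighbour lookup rather than any model described above.

```python
import random


def expensive_score(x):
    """Hypothetical stand-in for a costly physics-based score (lower is better)."""
    return (x - 0.7) ** 2


def surrogate_predict(x, labeled):
    """Cheap surrogate: reuse the score of the nearest already-scored candidate."""
    nearest = min(labeled, key=lambda known_x: abs(known_x - x))
    return labeled[nearest]


def active_learning_screen(pool, n_rounds=4, batch_size=5, seed=0):
    """Score only a fraction of the pool, chosen round by round via the surrogate."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Round 0: score a random batch to give the surrogate something to learn from
    labeled = {x: expensive_score(x) for x in unlabeled[:batch_size]}
    unlabeled = unlabeled[batch_size:]
    for _ in range(n_rounds):
        if not unlabeled:
            break
        # Rank the remaining candidates by predicted score and take the best batch
        unlabeled.sort(key=lambda x: surrogate_predict(x, labeled))
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        for x in batch:
            labeled[x] = expensive_score(x)
    return labeled


pool = [i / 100 for i in range(100)]
scored = active_learning_screen(pool)
best = min(scored, key=scored.get)
print(len(scored), "of", len(pool), "candidates scored; best candidate:", best)
```

The throughput gain comes from scoring 25 of 100 candidates instead of all of them. This sketch is purely exploitative; practical protocols usually add an exploration or uncertainty term when choosing each batch.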
Q13: Which part uses AI in the SBDD section (before slide 16)? They seem to be all physics-based methods.
A13: An FEP ML model was trained on FEP scores and applied to AI molecular generation, steering it toward compounds predicted to be potent.
Q14: What is your pricing model for drug discovery service?
A14: For further details on pricing and service-related questions, please reach out to bd@xtalpi.com.