Learn from XtalPi's in-house experts
The 4-week "Meet the Xpert" webinar series gives attendees an opportunity to engage with XtalPi’s in-house subject matter experts, who will share their real-world experience in supporting drug development workflows. The series brings together scientists to discuss the challenges and advances in MicroED for solving crystal structures, AI in Drug Discovery, DNA-Encoded Library, and Innovation in Automated Synthesis.
Meet the Speakers
MicroED And Its Applications in Solid State Chemistry
Aug 24, 2022
11 am – Noon EDT
Dr. Xiang Liu
Xiang Liu received his Ph.D. in Chemistry from Drexel University in 2016. His research focused on the crystal structure of semiconductors, and he has many years of experience in solid-state physics. He joined XtalPi in 2019 as a scientist in the MicroED team, focusing on structure determination and technology development. He has successfully solved more than 100 MicroED cases and has been actively engaged in developing software for MicroED data collection and analysis.
Micro-crystal electron diffraction (MicroED) is an emerging technology that has been successfully applied to solving the crystal structures of various systems, such as small molecules, proteins, metal-organic frameworks, and covalent-organic frameworks. It has shown great value in the drug discovery industry. MicroED can determine crystal structures directly from powder samples without growing large single crystals. This helps us solve samples that could not be solved by single-crystal XRD (SCXRD) because large single crystals are difficult to grow. In this presentation, we will briefly introduce this technology and its applications in the pharmaceutical industry.
Q1: Can we perform MicroED experiments at temperatures higher than cryo temperatures?
A1: We cannot control the exact temperature during the experiment; we can only perform a MicroED experiment at cryo-temperature (CT) or room temperature (RT). There are some reports of samples that are damaged by the freezing process and diffract better at RT. However, almost every sample we have tested so far has given significantly better results after being frozen.
Q2: What is the thickness of the electron beam?
A2: The diameter of the electron beam is adjustable. The smallest diameter beam is ~700 nm. The diameter should be larger than the crystal particle, so that the whole particle stays inside the beam during the diffraction experiment. However, the diameter should not be too large, otherwise too much background noise would be introduced from the surroundings.
Q3: What thickness is desirable for the crystalline samples?
A3: MicroED does not work well if the crystal thickness is > 500 nm.
Q4: What would be the absolute minimum of material that you would need (provided that the crystallinity and purity are OK)?
A4: 10 mg is the absolute minimum for a project, but 50 mg is much better. If the sample crystal can be grown directly on the sample grid, then we will not need any extra sample.
Q5: Does your project report include non-C-bound H-atoms located from diffraction maps, or is that unrealistic for MicroED?
A5: N-bound H-atoms can be located in many cases, but O-bound H-atoms can be located only in very few (fewer than 5 of the cases we have studied so far). Locating them is possible with MicroED but is not guaranteed.
Q6: What is the lower OEL limit that you can work with?
A6: We can work with >100 ug per m^3, but it will mainly depend on the handling procedure. For example, if the handling procedure requires independent venting systems for the compound, it will be hard for us to handle it.
Q7: What is meant by grain thickness?
A7: If we think of the TEM grid as a plane, the sample particle is like a plate lying on the grid plane. The grain thickness is the thickness of the crystal perpendicular to the grid. If the sample is too thick, the electrons cannot penetrate. The thickness should be less than 500 nm.
Q8: Can MicroED resolve crystallized peptides and oligonucleotides?
A8: Theoretically this is possible. We are open to collaborating to test this service model further. The structures of macromolecules such as peptides and proteins can be solved by molecular replacement methods.
Q9: Does the remdesivir CSP use the MicroED-resolved ASU as input, or is it fully independent of the MicroED result?
A9: Fully independent CSP search. For more details:
How Artificial Intelligence Enhances Drug Discovery?
Aug 31, 2022
11 am – Noon EDT
Dr. Sang Eun Jee
Sang Eun Jee is an application scientist in the business development department and has been with XtalPi Inc. since 2022. She is responsible for Computer Aided Drug Discovery (CADD) at the preclinical stage.
Her expertise is in predicting drug-protein interactions, structural changes of proteins, and virtual screening. She received her Ph.D. from the chemical & biomolecular engineering school at Georgia Tech and completed two rounds of postdoctoral training, at Georgia Tech and at Washington University in Saint Louis. Before joining XtalPi, she worked in drug discovery at Humanwell Pharmaceuticals Inc. in the United States.
Drug discovery is time-consuming and expensive: it takes on average 2.6 billion dollars and over 10 years to bring a new drug to market. Artificial Intelligence (AI) is becoming more widely adopted in the pharmaceutical industry to make drug discovery more efficient. At the preclinical stage, AI helps reduce development time and cost significantly by removing compounds with undesirable potency before synthesis and biological testing. More importantly, AI can suggest new compounds that may have desirable potency, selectivity, and PK/PD properties by virtually exploring an expanded chemical space. Recent advances in AI drug discovery have been driven by faster computing via General Purpose computing on GPU (GPGPU) and steeply increasing database sizes, along with advances in neural network technology. Success in AI-driven drug discovery depends on how well AI is integrated into the biology and chemistry teams of the drug discovery effort. AI technology that is well embedded in the drug discovery cycle could shape the future landscape of drug discovery.
Q1: How do you use AI to generate molecules?
A1: In SBDD approaches, AI generates the molecules from the binding pocket of the protein structure. AI extracts spatial and pharmacophore features from the binding pocket as vectors and then uses neural networks to generate molecules.
In LBDD approaches, AI generation algorithms incorporate 1D, 2D, or 3D molecular features of the reference compounds. These representations are fed into multiple neural networks, which generate novel molecules under the desired constraints (fix-scaffold or scaffold-hopping). Goal-directed generation is enabled by reinforcement learning and chemical space exploration strategies.
Q2: When AI generates molecules, does it take into consideration the synthetic feasibility routes and/or options such as a traditional med chemist would do?
A2: It is not possible yet. Currently, we generate the molecules first without considering synthetic routes, and estimate synthetic feasibility afterwards using a synthetic feasibility score and analysis by medicinal chemists.
We are exploring synthetic routes by combining AI and robotic automation. For more details, please attend the webinar on September 7 by Dr. Nathan Allen.
Q3: How many different ADMET models do you have for the lead optimization stage? Can these models apply to all chemical matters? How accurate can the models be?
A3: With our ADMET models, we can currently predict solubility, lipid bilayer permeability, MDCK cell permeability, Caco-2 cell permeability, liver microsome stability, site of metabolism, CYP inhibition, and hERG toxicity.
No, the models do not apply to all chemical matter. Different chemical modalities require different models. For example, solubility prediction for peptide molecules by the AI model is not accurate because peptides can adopt multiple conformations.
We obtained r² values of 0.7~0.8 in quantitative validation of the predictions. When predicting sites for modification, as we did for hERG toxicity, we successfully improved the properties, increasing IC50 by 5- to 10-fold.
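As a rough illustration of how a validation figure like the one quoted above can be computed, here is a minimal sketch of the coefficient of determination. It assumes "r2" means R² = 1 - SS_res / SS_tot (the answer could equally refer to a squared Pearson correlation), and the function name and the sample values are hypothetical, not XtalPi data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far predictions are from measurements.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted values (e.g. log-solubility), for illustration only.
observed = [-2.0, -3.0, -4.0, -5.0, -6.0]
predicted = [-2.5, -3.5, -3.0, -5.5, -6.5]
score = r_squared(observed, predicted)
print(f"R^2 = {score:.2f}")
```

A value near 1 means the predictions explain most of the variance in the measurements; values of 0.7 to 0.8 leave a noticeable share unexplained.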
Q4: Could you use AIDD to design different modalities like PROTAC or peptide-based drugs?
A4: Yes. We can use AIDD in both cases. Regarding the peptide-based drug, we completed one project and successfully finished the lead generation. We published a paper on how to estimate the binding affinity in protein-protein interaction. (https://pubs.acs.org/doi/10.1021/acs.jcim.0c00679 , Zou, J. et al., J Chem Inf Model, 60, 5794–5802 (2020)) Regarding PROTAC, we have an ongoing project.
Q5: What are the best and worst classes of targets to investigate using AIDD?
A5: It depends on whether we use the SBDD or the LBDD approach. With SBDD, the best targets are those where we can specify the conformation of the protein structure and where the downstream mechanism is clear, as with most kinases. Likewise, the worst targets for SBDD are those where the downstream mechanism is unclear and the protein has multiple conformational states, as with many ion channels. With LBDD, it depends on the quality and quantity of the data. The data should have enough points and should be well harmonized to generate an accurate QSAR correlation.
Q6: How much cost and time are saved when using AIDD compared to not using AIDD?
A6: While it is notoriously difficult to measure average discovery timelines, our experience suggests that AI is likely to shorten the discovery period. For fast-follow projects, AIDD could save about 50% of the time of the preclinical stage, which is 5.5 years on average. In one of our collaborations with a biotech company (case study 2), it took us 7 months from project start to patent filing. For first-in-class projects requiring in vitro and in vivo model development, we expect years of discovery time. Costs differ from project to project and are also estimated based on the timeline.
Q7: Are you confident in predicting poses for all protein targets with Xpose?
A7: Xpose is quite useful for most targets, such as kinases and GPCRs, but not for all proteins. If the binding pocket is large, or if the protein can undergo significant conformational changes, as happens in an ion channel, it is hard to predict the binding pose even with Xpose.
Q8: What is the accuracy of XFEP?
A8: XFEP errors are within 1 kcal/mol, corresponding to a 7~8-fold error in the Ki value, which is accurate enough to be useful for predicting potent compounds with Ki in the nM range. For more details, please see the paper from XtalPi. (https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c01329 , Lin, Z. et al., J Chem Inf Model, 61, 2720–2732 (2021))
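The link between a free-energy error and a fold error in Ki follows from the standard relation ΔG = RT ln Ki, so an error of ΔΔG changes Ki by a factor of exp(ΔΔG / RT). The sketch below works through that arithmetic; the function names and the choice of 298.15 K are assumptions made for illustration and are not taken from the XFEP paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_error_in_ki(ddg_error_kcal, temperature_k=298.15):
    """Fold change in Ki implied by a free-energy error (kcal/mol)."""
    return math.exp(abs(ddg_error_kcal) / (R_KCAL * temperature_k))


def ddg_for_fold(fold, temperature_k=298.15):
    """Free-energy difference (kcal/mol) that corresponds to a given fold change in Ki."""
    return R_KCAL * temperature_k * math.log(fold)


fold_at_1kcal = fold_error_in_ki(1.0)
ddg_at_7_5_fold = ddg_for_fold(7.5)
print(f"1 kcal/mol  -> {fold_at_1kcal:.1f}-fold in Ki")
print(f"7.5-fold Ki -> {ddg_at_7_5_fold:.2f} kcal/mol")
```

Under these assumptions, 1 kcal/mol corresponds to roughly a 5-fold ratio at room temperature, and a 7- to 8-fold ratio corresponds to about 1.2 kcal/mol, so the figures quoted in the answer presumably rest on slightly different assumptions or rounding.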
Q9: Are we going to lose some good compounds if we were only aiming to make the top few compounds predicted by calculations (traditional SAR usually would include a lot more data points)?
A9: No, provided we carefully consider diversity at the virtual screening stage. Rephrased, the question becomes “Can we explore the appropriate chemical space?”. To avoid losing good compounds, we do not depend on AI-optimized routes; we add diversity perturbations manually to explore more routes.
Q10: As for synthesizability, papers/products are using NN to find reaction pathways. Are they good enough to determine synthesizability?
A10: It is challenging, but we are trying to estimate synthesizability systematically. For example, we use synthetic feasibility scores (RASCORE, RASTEP2) to estimate the complexity of the predicted synthetic route. For more information, please attend MTX webinar #4 "Steps Towards Practical Automated Synthesis" by Dr. Nathan Allen on September 7, 2022.
Q11: What is the common source of these errors in the XFEP simulation or AI predictive models? How did you track the issues?
A11: Errors can arise from both XFEP and AI predictive models. XFEP errors can come from intrinsic properties of the protein target, such as multiple conformational states, or from the quality of the structures. XFEP errors are reduced in a verification procedure that uses public data: using available SAR data, the representative poses and energy states of the drug-protein interaction are carefully investigated and verified.
The accuracy of an AI prediction model depends on the quality of the data, the size of the data set, the choice of molecular representation, and the ML method used for optimization. To improve data quality, we curate the data so that it is better harmonized. If the size of the data set is the limitation, we can generate more data in our in-house facilities. The choice of fingerprint and ML optimization method depends on the properties of the system. We find the appropriate fingerprint and ML method by exploring multiple options.
Q12: Is it possible that AI virtual screening can have better accuracy than XFEP and if this happens in the future, how can we explain these AI inferences?
A12: AI virtual screening and XFEP are not mutually exclusive. AI virtual screening takes two forms:
① Using physics-based methods as the scoring function, with an AI active-learning protocol to accelerate the whole process. In this case, AI does not improve accuracy but aims to greatly increase throughput.
② Using AI models as the scoring function. In LBDD, XFEP is not applicable, so QSAR is a very important scoring method. In SBDD, AI models are built using existing protein-ligand binding affinity data as the pre-training dataset and project-specific data as the fine-tuning set. However, we usually do not have a large amount of data, so currently XFEP is still more accurate than AI scoring functions. AI models are more likely to be used as a coarse screen before XFEP.
In the future, AI could outperform physical models such as XFEP overall if more data becomes available. However, we may not see this in real pipelines, since interesting targets are mostly under-explored targets with very little affinity data rather than well-explored targets with enough affinity data. But AI can use virtual data from FEP, so it can more quickly explore chemical space similar to the confined space explored by FEP and find better chemical designs.
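The first form described above, a physics-based scoring function accelerated by an active-learning protocol, can be sketched as a loop in which a cheap surrogate decides which candidates receive the expensive score. This is a toy sketch under stated assumptions: `expensive_score` is a hypothetical stand-in for a costly physics-based calculation, molecules are reduced to a single number, and the nearest-neighbor surrogate is chosen only to keep the example short. It is not XtalPi's protocol.

```python
import random


def expensive_score(x):
    """Hypothetical stand-in for a costly physics-based score (lower is better)."""
    return (x - 0.73) ** 2


def surrogate_predict(x, labeled, k=3):
    """Cheap surrogate: mean score of the k nearest already-scored candidates."""
    nearest = sorted(labeled, key=lambda item: abs(item[0] - x))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_screen(library, n_initial=10, batch_size=10, n_rounds=4, seed=0):
    rng = random.Random(seed)
    pool = list(library)
    rng.shuffle(pool)
    # Seed the surrogate with a small random sample scored the expensive way.
    labeled = [(x, expensive_score(x)) for x in pool[:n_initial]]
    pool = pool[n_initial:]
    history = [min(score for _, score in labeled)]
    for _ in range(n_rounds):
        # Rank the unscored pool with the cheap surrogate only.
        pool.sort(key=lambda x: surrogate_predict(x, labeled))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # Spend the expensive scoring budget on the most promising batch.
        labeled.extend((x, expensive_score(x)) for x in batch)
        history.append(min(score for _, score in labeled))
    best = min(labeled, key=lambda item: item[1])
    return best, len(labeled), history


library = [i / 1000 for i in range(1000)]  # toy "library" of 1000 candidates
best, n_scored, history = active_learning_screen(library)
print(f"best candidate {best[0]:.3f} after {n_scored} expensive evaluations")
```

Only 50 of the 1000 candidates are scored with the expensive function here, which is the throughput gain the answer refers to; the accuracy of each individual score is unchanged.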
Q13: Which part uses AI in the SBDD section (before slide 16)? They seem to be all physics-based methods.
A13: An FEP ML model was trained on FEP scores and applied to AI molecular generation, steering it toward compounds predicted to be potent.
Q14: What is your pricing model for drug discovery service?
A14: For further details on pricing and service-related questions, please reach out to email@example.com.
From DEL to X-DEL, Know-How and Beyond
Sept 7, 2022
11:00 AM – Noon EDT
Dr. Bing Xia
Dr. Bing Xia received his B.S. in chemistry from Lanzhou University. He completed his Ph.D. graduate work at Boston University with the widely recognized photochemist Guilford Jones (pathogen detection, dipolar [3+2] photocycloadditions via ESIPT in collaboration with Prof. John Porco, natural product total synthesis, and charge transfer in biomimetic and bioinspired systems). In 2008, Bing joined Encoded Library Technologies (ELTs) at GSK/Boston (Principal Scientist, Investigator, then GSK Associate Fellow), where he played a key role in a variety of projects, for example IDO1 and PDE12, across a wide range of therapeutic areas. Bing then focused on the development of on-DNA chemistry, the design and synthesis of DNA encoded libraries (DEL), and High Throughput Binder Confirmation (HTBC). While with GSK, Bing won 3 Exceptional Science Awards, 2 Cool Chemistry Finalists, 5 Silver Awards, and numerous Bronze Awards. Bing is now a Vice President at XtalPi, responsible for setting up two DEL R&D centers and leading the company’s DEL-AI platform. Bing is an active member of the community: he served as an organizing committee member for the 8th International Symposium on DNA-Encoded Chemical Libraries and for the GSK 2019 Global Chemistry Conference; he has published more than 20 articles, book chapters, and patents in various fields, in esteemed journals such as Nature Comm., JACS, and J. Med. Chem.; and he has reviewed over 100 manuscripts for 19 journals, for example Chem. Comm., JOC, and Bioorg. Med. Chem.
In this presentation, the traditional DNA encoded library (DEL) will be reviewed, followed by an introduction to XtalPi’s new, AI- and automation-powered version of DEL. The nonconventional application of DEL (X-DEL) outside the classical hit identification area will also be briefly discussed.
Q1: “HTS or DEL, side reactions can occur, so the molecule you think hit the target may not be the molecule of interest.” How do you deal with false-positive results in such situations?
A1: There are a number of ways to deal with this situation:
1) While conducting off-DNA resynthesis, we intentionally include the known by-products and cycle-missing products, and then submit them for testing.
2) We also do an on-DNA resynthesis followed by ASMS evaluation of the on-DNA compounds or of the small molecule mixture released by a photo-cleavable linker (I published an article in ACS Medicinal Chemistry Letters* on this). As you may know, the DNA actually encodes the recipe of the DEL manufacturing.
*Bing Xia, G. Joseph Franklin, Xiaojie Lu, Katie L. Bedard, LaShadric C. Grady, Jennifer D. Summerfield, Eric X. Shi, Bryan W. King, Kenneth E. Lind, Cynthia Chiu, Eleanor Watts, Vera Bodmer, Xiaopeng Bai, and Lisa A. Marcaurelle. DNA-Encoded Library Hit Confirmation: Bridging the Gap Between On-DNA and Off-DNA Chemistry. ACS Medicinal Chemistry Letters 2021, 12 (7), 1166-1172. DOI: 10.1021/acsmedchemlett.1c00156
Q2: “It is sometimes difficult to determine what constitutes a derived compound stemming from the original hit.” What would be your strategy?
A2: This question is very similar to the one above. An on-DNA follow-up strategy plus ASMS triage of the obtained molecules before traditional PSC should address this issue nicely.
Q3: DEL screening is a difficult task. An expert could perform two identical screens on the same target and get different hits.
A3: Yes. In the early days, there was too much information, and sifting through large data sets often introduced informational bias that directly affected decision making. To support better decision making, we have integrated AI-powered algorithms to remove this bias.
Q4: What is the relationship between HTS and DEL, and in which order should we employ them?
A4: They are complementary and orthogonal. Which campaign to carry out first is case-dependent. However, with limited resources and novel targets that we don't know much about, running a DEL selection would be a suitable option.
Q5: What’s your opinion of more and more DEL providers emerging?
A5: This is good news in general: it means the technology is widely accepted and more mature. It has gone from something nice to have to something you must have. Nowadays, to my knowledge, many pharma companies and biotechs have either a small team or a point person running DEL.
Q6: Is one type of target more DEL friendly than others?
A6: Yes, some are more DEL-friendly than others, kinases for instance. But it has been shown that DEL can screen a broad range of targets, including challenging ones such as PPIs, membrane proteins, and GPCRs. We have also found that the purer the target, the better the performance.
Q7: How much protein is needed to run a selection?
A7: This is a major advantage of DEL: only a small amount of protein is needed, 5 ug per condition and less than 1 mg for the whole project, from selection all the way to the assays for off-DNA compound confirmation.
Q8: Can you customize a DNA-Encoded Library, and what level of customization can you provide?
A8: Yes, we can. We will sit down with our customers, carefully go over the targets, and design a specific DEL for each customer using our DEL expertise and the customer's deep understanding of the target.
Q9: How many selections can I run for a DEL I custom synthesized through you guys?
A9: It depends on the amount of DEL synthesized for the customer, but normally the DEL will last a long time and be sufficient for more than a hundred selections.
Q10: What if I have multiple targets but, with limited resources, don’t know which one to follow up?
A10: This is exactly the scenario in which we recommend running a quick DEL screen, especially using XtalPi's DEL design for this purpose. The selection data will guide you to the most promising target.
Q11: Do you test the AI selected synthons for reactivity prior to inclusion in a DEL production? (validation)
A11: Yes, we include those BBs in the validation set and decide whether to keep them based on the reaction yield in a model reaction.
Q12: How many libraries has XtalPi designed using AI? Can you comment on the diversity at each cycle? Were you able to use AI to reduce the diversity significantly?
A12: We use AI to help design and collect BBs for most libraries in one way or another. At the current stage, we are not yet aiming to reduce the number of BBs used in each cycle; we are aiming to include more novel, AI-recommended BBs first.
Q13: What is your pricing model for drug discovery service?
A13: For further details on pricing and service-related questions, please reach out to firstname.lastname@example.org.
Steps Towards Practical Automated Synthesis
Sept 14, 2022
11:00 AM – Noon EDT
Dr. Nathan Allen
Director of BD
After finishing his Ph.D. in organic chemistry at UC-Irvine under Prof. W.J. Evans, Dr. Allen started his industrial career at the Rohm and Haas Company, which later became part of the Dow Chemical Company, working in various synthetic chemistry roles ranging from sodium borohydride synthesis to macromolecule synthesis. He then moved to Millipore-Sigma as Head of New Product R & D for organic chemistry, followed by a role as Principal Scientist at Ascensus Specialties. He is now Director of Business Development for XtalPi, covering Synthetic and Automated Chemistry.
Automated synthesis of drug discovery targets has been a long-standing goal in the life science industry, pursued with varying levels of success. The strategies and challenges of developing a pragmatic automated synthesis lab for commercial CRO purposes will be discussed, as well as future directions based on the data currently being generated.