Fundamentals of Drug Discovery
This 2-week webinar series is for busy pharmaceutical professionals to discuss challenges and advances in the field of drug discovery. Each webinar will feature a leading expert who will share their real-world experience in supporting discovery chemistry workflows. This series will bring together experts and like-minded chemists to discuss the journey of a molecule from “Hit Identification” to “Lead Optimization” and the factors critical to success in delivering a suitable “Pre-Clinical Candidate”.
Specially Invited Guests
The Power of Aria’s Symphony™ AI Platform
11:00 am – Noon EDT
Dr. Aaron C. Daugherty
As the Vice President of Discovery and one of Aria Pharmaceuticals’ first employees, Aaron Daugherty has built Aria’s AI drug discovery platform Symphony and leads the Discovery team’s efforts to discover potential treatments across a wide range of diseases. Aaron earned his Ph.D. in Genetics from Stanford University and was a Fulbright Scholar.
Aria’s Symphony™ AI platform can cut years from the timeline between project initiation and in-vivo results while generating a 30x higher hit rate at those milestones compared to traditional methods. Aria also mitigates clinical drug development risks with traceable and rationalized predictions to help its researchers evaluate both Phase 1 safety and Phase 2 efficacy for potential drug candidates.
What You Will Learn
Understand how Symphony is differentiated from other AI platforms in identifying hits for 1000+ diseases.
Learn how Aria selects hits with the best chance of preclinical and clinical success.
Gain insight into the latest results for disease programs in systemic lupus erythematosus (SLE), idiopathic pulmonary fibrosis (IPF), chronic kidney disease (CKD), and nonalcoholic steatohepatitis (NASH).
Q1: Is the Aria platform based on Graph neural nets or other forms of deep learning?
A1: No, it is not. In fact, Aria’s proprietary AI platform, Symphony™, was specifically built to address some of the limitations of standard deep learning (DL) models, of which graph neural networks are an example. The three primary limitations of standard DL models that Symphony™ addresses are:
The requirement for homogeneous or unimodal input data, which limits the possible understanding of disease biology; Symphony™ is unique in its ability to analyze heterogeneous and unrelated data in their native formats.
The need for very large amounts of input data, which limits which diseases you can readily model. We’ve carefully ensured Symphony™ can make confident predictions with realistic amounts of data; in fact, we estimate we can work in over 1,000 diseases as of today.
The lack of interpretable predictions, a particular concern for AI-in-biology tasks where interpretable predictions can impact downstream drug development. All predictions coming out of Symphony™ are entirely interpretable, allowing us to know exactly which pieces of evidence led to the identification of a hit.
Q2: Does Aria include a design of experiments module to guide experimentation that feeds into AI analytics?
A2: There are two ways in which our technology could help guide experiments.
The first is adding input data for our disease-specific models. This is something we can do today because we can readily identify which data modalities are lacking for a given disease. That said, it is something Aria has strategically decided not to do. Because of the wide swath of diseases our technology can work on, it does not make economic sense to pay for generating more data in disease A when we could instead turn to disease B immediately.
The second way in which our technology can help guide experimentation is preclinical testing of the hits Symphony™ identifies. We know that any preclinical model is, at best, a partial representation of the human disease, and so there is always a risk that a hit that could be efficacious in treating human disease erroneously fails a preclinical model. To address that, our team can first examine the interpretable predictions coming from Symphony™ to understand how that hit’s mechanism will be disease modifying and compare that to how the preclinical model represents the disease to ensure there is overlap. In more recent work, we have also experimented with repurposing our AI technology to build an in silico model of an animal model. We can then compare and contrast the in silico model we built for the human disease to the one built for the animal model and determine where there is overlap or not. This can be very helpful in selecting the best animal model, specifically for our hits.
Q3: What prevents you from finishing all 18 pipelines in 12 weeks?
A3: First, a clarification on where we typically get after 12 weeks or so. This encompasses all steps of our initial discovery process, which results in the identification of which molecules to preclinically test. These are the steps where our AI technology replaces traditional approaches, and they typically take a month to complete. After this we turn to more traditional methods, starting with preclinical testing. In some instances, for example our work in NASH, an in vitro model is an appropriate first screening step. Regardless, we eventually turn to in vivo models, and it’s the completion of those studies and our in silico work that typically takes us through 12 weeks. At that point, or after some targeted follow-up experiments, we identify our lead molecule. We then turn to medicinal chemistry to optimize our initial hit and generate our own chemistry IP before heading toward an IND and ultimately the clinic. This latter process can be variable but takes on the scale of 1.5 to 2 years.
As a quick aside, if you were instead asking about why we don’t process all 18 (or even more) programs at once, that all comes down to economics. Currently, using a virtual pharma model and our computational discovery process, our core team of fewer than 18 people can actively work on 18 programs, including 3 progressing toward the clinic. Most companies of this size could only work on one, potentially two programs at most, with the same resources. In Aria’s case, we have advanced 3 programs into optimization.
Q4: Why not weed out the compounds with safety concerns and poorly predicted ADME as the first step? In other words, why screen 50 million compounds when that could be reduced to 30 million? (Two questions like this were asked.)
A4: Great question and a topic I didn’t have time to address in the webinar. Behind our innovative AI is a great deal of data engineering. One small aspect of that data engineering is the processing pipeline we put all our molecules through. A full description is out of scope here, but suffice it to say that we do in fact weed out compounds before building our disease-specific models. As of today, our compound library has over 2 million compounds, but with the data vendors we work with, we could easily pull in tens of millions of compounds. We instead triage those tens of millions of compounds down to the ~2 million which are sufficiently well characterized for our AI technology to make a confident prediction.
We could of course continue to triage further, but to maximize our potential for success, we want to maintain a diversity of chemical structures and mechanisms in our compound library. Doing so increases our chances to discover novel biology even if that discovery comes via a molecule that has, for example, insufficient ADME properties.
In cases where Symphony™ identifies a hit that shows promising efficacy but that hit has other limitations, we have built a discovery process to address these issues. This is dramatically aided by the fact that all predictions from our AI are entirely interpretable. What this means in practice is that we can identify what is driving a molecule’s predicted efficacy and then determine if there are other molecules, even molecules outside of our compound library, that replicate that mechanism.
Q5: You've mentioned that one of your unique strengths is using purely patient data. Why do other people use data from animal models if enough patient data exists?
A5: This all comes down to our core technology strength: our ability to analyze completely unrelated multimodal data in their native formats. Even in the space of multimodal data analysis this is a unique ability. Other multimodal approaches require data to either share some relationship (e.g., all measurements come from the same patients), or the different data types are analyzed separately and the results are overlapped at the very end. Because we are looking at so many different types of raw data simultaneously, any single type of data does not need to be as crisp or abundant as you would need if you were relying on that data in isolation. Put another way, to have success, other approaches often need disease models to generate sufficient quantities of clean, homogeneous data. Aria’s solution instead has been to build proprietary technology to make use of reasonable amounts of heterogeneous data.
Q6: You mention you process heterogeneous data types in a way that is different than others because you look at multiple puzzle pieces simultaneously. Can you talk a little more about the impact of taking a multimodal approach? Maybe an example?
A6: We built Symphony™ to analyze unrelated multimodal data because looking at so many different data types simultaneously provides us with the most thorough understanding of disease biology possible. In effect, our AI can identify connections between distinct types of unrelated data, allowing us to distinguish aspects of disease biology that you could not otherwise see. The impact of this is best shown with an example: the case of our backup hit in lupus, TXR-712. Across all the individual data sources and analysis methods we used, TXR-712 was ranked too low to identify; it was indistinguishable from thousands of other molecules. However, when our technology brought all of those different data sources and methods together into a single ensemble model, TXR-712 rose right to the top of our predictions. We have since confirmed that result by showing TXR-712 is significantly disease modifying in a preclinical model of SLE.
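The ensemble effect described in this answer can be sketched in a few lines of Python. This is a purely illustrative toy, not Aria’s actual algorithm: the molecule names, data sources, and scores are all invented, and the combination rule (a simple average) is an assumption for demonstration only.

```python
# Illustrative sketch: a molecule that never tops any single evidence
# source can still surface first once the sources are combined.
# All names and numbers below are hypothetical.

def ensemble_rank(scores_by_source: dict[str, dict[str, float]]) -> list[str]:
    """Rank molecules by their average score across all sources."""
    molecules = next(iter(scores_by_source.values())).keys()
    combined = {
        m: sum(src[m] for src in scores_by_source.values()) / len(scores_by_source)
        for m in molecules
    }
    return sorted(combined, key=combined.get, reverse=True)

# Three hypothetical evidence sources; "mol_C" is only moderately ranked
# in each one, but consistently so, while the others spike in one source.
sources = {
    "transcriptomics": {"mol_A": 0.9, "mol_B": 0.1, "mol_C": 0.6},
    "proteomics":      {"mol_A": 0.1, "mol_B": 0.9, "mol_C": 0.6},
    "literature":      {"mol_A": 0.1, "mol_B": 0.1, "mol_C": 0.6},
}
print(ensemble_rank(sources))  # mol_C tops the combined ranking
```

Any source analyzed in isolation would have surfaced mol_A or mol_B first; only the combined view puts mol_C on top, which is the pattern the TXR-712 story describes.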
Q7: On slide 8 you estimate there are 1000+ diseases your platform can work in. Are there any areas where you’ve seen more success or less success? Are there diseases you don’t work in?
A7: Our technology is largely disease agnostic. We need sufficient amounts of data, which we can measure before we even begin a program, and we need a well-defined disease. As a third requirement, we’ve taken the strategic decision to work only in disease areas where a lack of disease understanding is the limiting step. This leaves many disease areas where we could work, but we find that our biggest advantage is in particularly complex diseases like autoimmune, inflammatory, or metabolic diseases, where understanding all the nuances and details of disease biology is extremely challenging for the human mind.
Q8: You showed your platform could achieve 80% success at Phase 2 milestones. Can you explain how you determined that value?
A8: At Aria we have always used our ability to rediscover previously investigated treatments as a measure of the quality of our disease-specific models. While we’re looking for novel treatments for a given disease, we know that if we see previously investigated treatments right alongside our potential hits, we have a predictive model.
More recently, however, we decided to start sharing those results as part of an extensive retrospective study. To do so, we built disease-specific models of over 30 diseases and made carefully blinded predictions on more than 420 previously completed Phase 2 trials. These are the exact same models we would use to discover novel treatments, but rather than looking to find new treatments, we examined the molecules other companies had put into the clinic. Specifically, we looked to see what portion of the molecules that we predicted would be efficacious in their disease ended up transitioning to Phase 3. The answer was over 80%.
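The metric described here is essentially a precision calculation: of the Phase 2 molecules the models flagged as efficacious, what fraction actually advanced to Phase 3? A minimal sketch, using invented trial records rather than the real 420-trial dataset:

```python
# Hedged sketch of the retrospective metric: among molecules predicted
# to be efficacious, the fraction that transitioned to Phase 3.
# The trial records below are hypothetical toy data.

def phase2_success_rate(trials: list[dict]) -> float:
    """Fraction of predicted-efficacious molecules that reached Phase 3."""
    predicted = [t for t in trials if t["predicted_efficacious"]]
    advanced = [t for t in predicted if t["reached_phase3"]]
    return len(advanced) / len(predicted)

trials = [
    {"molecule": "m1", "predicted_efficacious": True,  "reached_phase3": True},
    {"molecule": "m2", "predicted_efficacious": True,  "reached_phase3": True},
    {"molecule": "m3", "predicted_efficacious": True,  "reached_phase3": True},
    {"molecule": "m4", "predicted_efficacious": True,  "reached_phase3": True},
    {"molecule": "m5", "predicted_efficacious": True,  "reached_phase3": False},
    {"molecule": "m6", "predicted_efficacious": False, "reached_phase3": False},
]
print(phase2_success_rate(trials))  # 0.8 for this toy set
```

Note that trials the model did not flag (like m6 here) are excluded from the denominator, which matches the "portion of the molecules that we predicted would be efficacious" framing in the answer.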
Q9: Aria currently only works on small molecules. Can your AI technology work with other drug modalities?
A9: Yes, the technology is agnostic to drug modality and to where the starting compound library comes from. Symphony™ could work with large molecules like biologics, we could examine a partner’s compound library, and we could even make predictions on virtual libraries of never-before-made compounds. We have decided to focus on small molecules for the time being, as we see a large opportunity for orally bioavailable treatments for the complex and hard-to-treat diseases we are pursuing.
Q10: You mentioned that your predictions are interpretable. Can you talk about why that is important in drug discovery?
A10: In the AI for biology space, I would argue that an interpretable prediction is mandatory for an optimal outcome. By knowing not just whether a molecule is likely to be efficacious, but how it’s likely to be efficacious, you can dramatically improve your downstream drug development. You’ll be able to select models that better reflect the biology your mechanism is modifying.
You’ll be better able to modify that molecule because you know which of its characteristics are important for its efficacy. And even in the clinic, by knowing how your molecule is disease modifying, you can identify biomarkers to help with patient selection, dramatically increasing your chances of success.
Q11: On slide 11 you talked about how in your discovery process you triage molecules really rapidly. Can you talk a little about how you're able to do that?
A11: This is primarily about the unique discovery process we’ve built to harness our AI technology. The way this works is that a research scientist uses our web-based user interface (UI) to build a disease-specific model. This largely consists of targeted and AI-assisted data annotation. The concept is to have scientists make select key decisions and then allow software to perform the heavy lifting of analyzing and integrating data. This whole process usually takes one full-time scientist 1-2 working days. The output is an efficacy score for every molecule in our compound library and the identification of the molecules most likely to be efficacious.
Next we turn to identifying molecules that are not just efficacious but are likely to be good drugs. From the top 2 to 3 thousand molecules, and using additional AI in Symphony™’s UI, a single scientist then filters out molecules that are non-novel, duplicative with other hits, or clearly not safe for the disease. This typically takes an hour or two and results in roughly 90 molecules that are then examined in a more manual diligence process. During that diligence process we further investigate safety and ADME properties, but primarily dig into how a molecule is likely to be disease modifying. This is only possible because our predictions are entirely interpretable. We can quantify which targets, algorithms, and input data led to a hit, and manually verify the veracity of every key data input. The output of this process is typically 10 unique and novel molecules for preclinical testing and is complete about 4 weeks after starting a disease program.
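The staged narrowing described in these two answers (score everything, take the top few thousand, filter for novelty and safety, then hand a shortlist to manual diligence) can be sketched as a simple pipeline. Everything here is a hypothetical illustration: the field names, thresholds, and filter logic are invented and are not Aria’s actual software.

```python
# Minimal sketch of a staged triage funnel, with made-up fields and
# cutoffs. Each stage narrows the candidate pool, mirroring the process
# described above (full library -> top scorers -> filtered shortlist).

def triage(library: list[dict], top_n: int = 3000, final_n: int = 10) -> list[dict]:
    # Stage 1: keep the molecules with the highest predicted efficacy.
    ranked = sorted(library, key=lambda m: m["efficacy"], reverse=True)[:top_n]
    # Stage 2: filter out non-novel, duplicative, or clearly unsafe hits.
    filtered = [m for m in ranked if m["novel"] and not m["duplicate"] and m["safe"]]
    # Stage 3: manual diligence would happen here; we simply cap the output.
    return filtered[:final_n]

# Toy library: 100 molecules with synthetic scores; every other one is
# flagged as novel so the filter stage visibly thins the pool.
library = [
    {"name": f"mol{i}", "efficacy": i / 100, "novel": i % 2 == 0,
     "duplicate": False, "safe": True}
    for i in range(100)
]
shortlist = triage(library, top_n=20, final_n=5)
print([m["name"] for m in shortlist])
```

In the real process the second and third stages involve AI-assisted judgment and manual review rather than boolean flags; the sketch only shows the funnel shape.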
Q12: Can you talk more about how your AI platform is put to use in practice? Are scientists able to interact with it, or do you need to be able to write software to make use of the technology?
A12: I talked a little about this in the preceding question, but we made the decision early on that we wanted to put our AI directly into the hands of drug discovery researchers. Because of that, Symphony™ is used exclusively via our web-based user interface, and a researcher does not need to write any code to create a new disease model. I like to say that if you’re a drug discovery researcher who can make a slide deck, you can use Symphony™ to build a disease model.
Q13: Can you provide some background or history on how the technology was built?
A13: We started on the technology that would become Symphony™ when the company was founded in 2014. In the intervening 8 years, we have iteratively built Symphony™ up through over 340 versions of our software. Briefly, our process is a weekly cycle of using data-driven analysis to identify areas for improvement, building solutions (e.g., adding a new data modality, improving an algorithm, or creating a plot to inspect output), and quickly getting those out to researchers to experiment with the new and improved platform, before beginning the cycle again.
Chemistry in the Fast Lane: The Application of High Throughput Chemistry Technologies for Med Chem Research
11:00 am – Noon EDT
Dr. Ying Wang
Head of Advanced Chemistry Technologies Group
Ying Wang has more than 18 years of pharmaceutical industry experience. She successfully co-led the multimillion-dollar File Enhancement initiative at Abbott from 2006 to 2011 and has worked on and led lead generation and lead optimization medicinal chemistry programs. Since 2014, she has headed the Advanced Chemistry Technologies group at AbbVie, which is industry leading in the application of high throughput library synthesis in drug discovery. She led the implementation of the high throughput experimentation (HTE) function in 2016 and of various chemistry technology platforms within AbbVie Discovery. She is the leading/corresponding author of more than 35 publications and a key contributor on several patents.
The speed and efficiency of small molecule candidate discovery rely heavily on hit identification and lead optimization through structure-activity relationship studies. The ability to generate novel bioactive analogs quickly and effectively is of utmost importance for any medicinal chemistry program. In this context, the Advanced Chemistry Technologies (ACT) group at AbbVie utilizes high throughput chemistry technologies that have been developed and implemented internally to enable and facilitate drug discovery Med Chem research across AbbVie.
What You Will Learn
Library processes and library chemistry development at AbbVie.
The impact of high throughput chemistry technologies on Med Chem research for the last two decades at AbbVie.
Novel ChemBeads technology to enable micromole scale high throughput experimentation.
Q1: Can you comment on the scope of the data captured in your library dataset (slide 4)?
A1: We capture the structures of the common scaffolds, building blocks, and products. All the synthesis data, including reaction conditions, reagents, and isolated yields, are captured. All analytical and registration data, including crude LC/MS traces, purification information, and mass and NMR characterizations, are captured as well.
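As a rough illustration of what one row of such a library dataset might hold, here is a hypothetical record type built only from the fields listed in this answer. The actual AbbVie schema is not public, and every field name and value below is invented.

```python
# Hypothetical sketch of a single library record; field names and
# example values are invented, based only on the data categories
# mentioned in the answer above.
from dataclasses import dataclass

@dataclass
class LibraryRecord:
    scaffold: str          # common scaffold structure
    building_block: str    # building block structure
    product: str           # product structure
    conditions: str        # reaction conditions
    reagents: str          # reagents used
    isolated_yield: float  # isolated yield (%)
    crude_lcms: str        # reference to the crude LC/MS trace
    purification: str      # purification information
    characterization: str  # mass and NMR characterization data

# Example row (all values are made up for illustration).
record = LibraryRecord(
    scaffold="c1ccncc1",
    building_block="OB(O)c1ccccc1",
    product="c1ccc(-c2ccncc2)cc1",
    conditions="Pd cat., base, 80 C, 16 h",
    reagents="Pd(PPh3)4, K2CO3",
    isolated_yield=78.0,
    crude_lcms="lcms/run_0001",
    purification="reverse-phase HPLC",
    characterization="MS m/z 156 [M+H]+",
)
print(record.isolated_yield)
```

Capturing conditions, yields, and analytical traces in one structured record per reaction is what makes the trend analyses discussed in the following questions possible.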
Q2: Can you elaborate on the trends you observed in how Suzuki catalysts changed over the years?
A2: Polymer-supported catalysts were used predominantly in the old days because of the ease of operation and the perception that they helped the reverse phase purification process. As the structures of the substrates became more complex and diverse boronic acids/esters became more available, we switched to homogeneous catalysts because of their higher reactivity. This also became feasible with advancements in library purification techniques. The establishment of the HTE function made it possible to screen very diverse catalysts for optimal conditions, and that is reflected in the diverse catalysts used in libraries in recent years.
Q3: Any thoughts on how we can improve the C-N coupling success rates you mentioned on slide 6?
A3: The continued adoption of HTE technology, the ideal ways to capture, store and analyze the datasets thus generated, and close collaboration with academic groups on developing novel methods to address unmet needs on challenging substrate classes.
Q4: Can you give more details on why Negishi is picked over Kumada and Stille coupling (slide 7)?
A4: The broader commercial availability of organozinc reagents and the relative ease of synthesizing the organozinc reagents in-situ, the less stringent requirement on reaction conditions (a factor to consider for library synthesis), and the milder reaction conditions compared to the other two couplings.
Q5: Have you investigated the deoxygenative photoredox coupling method published recently by MacMillan’s lab (slide 8)?
A5: Yes, we did. It was published after we completed the study. It is also a very promising transformation for med chem SAR studies. Currently it is one of our go-to methods for enabling aryl-alkyl coupling library synthesis.
Q6: You alluded to the differences between HTE for process chemistry and for medicinal chemistry research. Can you elaborate more on the difference (slide 14)?
A6: First, there’s a scale difference. Medicinal chemists generally have much less material for HTE, often at least ten times less. We only ask for < 100>
Secondly, there’s a timeline difference. Fast-paced med chem programs mean med chemists need to synthesize analogs quickly to test a hypothesis. We strive for a less-than-48-hour turnaround time from initiation of the screen to providing the three top conditions, if successful.
Lastly, the endpoint is different. Our screens give a “hit”, a condition that is good enough to deliver the desired product to be tested on the med chem program. On the contrary, most of the time, HTE in Process is designed to find the optimal condition.
Q7: Do you always see good correlation between HTE and scaleup (slide 17)?
A7: Most of the time. In general, any lack of correlation came from chemists modifying the reaction conditions identified by the HTE screen, such as changing the base or solvent used.
Q8: Have you or do you plan to augment your datasets with literature datasets (slide 18)?
A8: Not at present. Our library and HTE datasets are inherently consistent to a great extent, have high data integrity, maintain a good balance of positive and negative data points, and are highly relevant to med chem research. This is mostly not the case with literature datasets.