[00:00:16] Speaker A: Welcome to the Few and Far Between podcast. I'm your host, Chris O'Brien. Microsoft Corporate Vice President Peter Lee famously said that AI is both smarter and dumber than any person you've ever met. In today's episode, we're going to find out exactly how smart AI is now and how integral it is becoming in the areas of healthcare and clinical research.
[00:00:37] Speaker A: AI and its associated large language models, such as GPT, are changing the industry in real time and becoming a more effective solution for both physicians and patients. Today's guest is going to spotlight the latest applications of these AI models: realizing efficiency, solving diagnostic challenges, and accelerating the clinical trial process. Zak Kohane is the professor and chair of the Department of Biomedical Informatics and professor of Pediatrics and Health Sciences and Technology at Harvard Medical School. He is also the inaugural editor-in-chief of the New England Journal of Medicine's AI publication, NEJM AI. Zak's insight into AI's past, present, and future has been integral to moving innovation toward practical usage in science and medicine. He and I had a chance to discuss where AI is now and where it's headed: from improving a physician's bedside manner to more agile patient assessments, from what primary care will look like in four to five years to the promise of AI in drug discovery. We'll also talk about exactly how we should be engaging with, balancing, and regulating AI. This is a must-listen episode for anyone trying to separate the hype from the reality of AI in this time of explosive change in the healthcare industry. Okay, let's start the podcast.
Professor Zak Kohane, you are the chair of the Department of Biomedical Informatics and the Marion V. Nelson Professor of Biomedical Informatics at Harvard Medical School. I am delighted to be having this conversation with you. Welcome to Few and Far Between.
[00:02:15] Speaker B: I'm really glad to join. Thank you for your patience in arranging this.
[00:02:19] Speaker A: Well, part of why I was excited is that, as we know, things are changing in nearly real time. And so this kind of a conversation enables us to see where the state of things is, no matter when we do it. So let's do a little bit of quick background on you. I think one of the things that makes you a particularly interesting thinker on these topics is your background and training in both biology and computer science. So will you tell us a little bit about how you got here?
[00:02:44] Speaker B: Very quickly. Born and bred in Geneva, Switzerland. Come as an immigrant when I'm 17 to Brown University, thinking I'm going to be a straight-up biology major. It's the late seventies, and I run into a computer center at Brown University and I realize this is the future, and I minor in computer science. I then go on to medical school and I realize, oh my gosh, this is a wonderful, wonderful profession, but it's not science. What have I done? Because I was the first in my family to go into medicine, I really did not know what I was getting into. And so I panic, and then I panic successfully, and I meet my thesis advisor, Peter Szolovits, at MIT. And I realize I can do a PhD in computer science in this then very exciting field, this is now the early eighties, of artificial intelligence in medicine. And the excitement was actually quite similar to what we have today, except we didn't have the right technology, the right infrastructure, or the right amount of data, and we did not have it released to the general public either. And so subsequently I went back, finished my clinical training, started a research group in biomedical informatics, and kept on doing work in machine learning and AI. And it became very exciting again with the advent of success in the use of convolutional neural nets in imaging, and then ultimately transformers, and now what we are seeing in the commercialization of these generative models. But along the way I created a research group, a center at Harvard Medical School, and the Department of Biomedical Informatics at Harvard Medical School that I'm the chair of. We have a bunch of great faculty doing wonderful work all the way from genomics to clinical care. So there you have it.
[00:04:25] Speaker A: Fantastic. And so I guess when you first started studying AI, it was pre AI winter, is that right?
[00:04:30] Speaker B: That's right. Yeah, that's right. I got the full dose of going from every venture capitalist being super excited and all the MIT professors telling us how we're all going to be obsolete, to, oh, I'm not even going to use the words artificial intelligence because it's in such disgrace.
[00:04:46] Speaker A: Yeah, fantastic. That's my sense, is that, well, there are lots of ways in which years of experience are valuable here, but one of them is having been through the boom and bust cycles that have just sort of been the nature of the beast here. Okay, well, let's fast forward to now because there's so much to talk about. Let's talk a little bit about the current state of AI in healthcare. In your book, The AI Revolution in Medicine: GPT-4 and Beyond, you talk about four paradigms for AI, going from trial to trainee to partner to torchbearer. Where do you see us in that journey today? Or maybe there are multiple places that we're at, but would you talk a little bit about how you're seeing things now?
[00:05:21] Speaker B: Yes, so we're all over the map. So the old school, and what I mean is pre-transformer, pre-generative-model AI, is actually going pretty well along the old paths. That is, there are well over 500 AI widgets, mostly in the imaging space, that are approved by the FDA under their software-as-a-medical-device rubric. And that's a fairly straightforward process. However, none of these large language models have gone through the FDA process. It's not even clear that they are going to be submitted to the FDA process, and yet they're being used by doctors right now in a slightly subversive fashion. That is, they're using it on their smartphones even when their hospitals are not supporting it. Obviously, patients are using it when they are stuck. But the place that it's getting the most traction is not actually, strictly speaking, healthcare delivery. It's the business of healthcare. 25% of the cost of healthcare in the United States is administrative overhead, and a lot of it is around just saying, this is how much I want you to pay me, and this is how much I'm willing to pay you. So both the billing and the reimbursement part. Historically, that's employed thousands of individuals with some medical training to adjudicate on how to bill and how to reimburse, at the insurance companies and at the hospitals. And all that's being replaced now by these generative programs, which can read the chart and say, this is likely going to be approved, this is likely not going to be approved, and the edge cases we'll send to a human.
[00:06:58] Speaker A: Do you have a feel for what the edge cases are, what that looks like? Does it become 80% automated? 90%?
[00:07:05] Speaker B: I think it's 90% automated. Yeah, it's 90% automated and probably more. And so it's already being used by these payer companies. We see hospitals replacing functions in their billing rooms. They have literally rooms of people whose only job is to look at the billing codes and say, how can I maximize reimbursement by changes in the billing codes? And so those people also are gradually, or not so gradually, being replaced. And I think it's going to be 90%. Now, the way doctors are using it today is, again, not so much for the diagnostic acumen, although I can talk to you about some doctors who are. It's more for, again, their administrative functions. They're dumping the patient history into a chat and saying, please generate the request, the authorization request. And that turns a five-minute thing into five seconds. And where the mainstream is going, and this is led by Microsoft working with Epic, but also a lot of other companies in the space, is: how can we turn the patient-doctor interaction into an automated process? So the doctor can work with the patient, and the words and actions that are said around the patient get turned into the clinical note, based on the previous clinical note and what was said.
[00:08:25] Speaker A: Kind of eliminating the administrative component of the doctor's day. Is that fair?
[00:08:31] Speaker B: Yeah, that's right. And so that is now in play, and it's in play at some of the leading hospitals. And I've spoken to some doctors who are old enough to have a 15-year history of being doctors, but young enough to have never used a paper medical record, right? And they tell me this is the first time in their life as a doctor that they're actually looking at their patients while they're talking to them.
[00:08:54] Speaker A: Oh, that's fascinating.
[00:08:55] Speaker B: As opposed to just looking at a computer and glancing over to the patient as they're entering the history. And so that's going to work well, but it still doesn't fit into the paradigm that I was talking about because we're not yet talking about diagnostics.
[00:09:09] Speaker A: Right. This is the least sexy stuff in a way. But incredibly, I mean, as you said, if 25% of healthcare costs are administrative and if you think there's a path to eliminating something like 90% of that, it's an eye popping impact on the whole delivery of care system.
[00:09:24] Speaker B: And also, by the way, there's another even less stated issue, which is that when you have essentially AI talking to AI, the billing AI talking to the reimbursement AI, that's going to happen much faster, and it's also going to become much clearer what the rules are. And I think that's going to cause, initially, some interesting gaming, but ultimately some real contracting about what are we really going to pay for and what are we really not going to pay for.
[00:09:49] Speaker A: So I want to double click on that, just to make sure I'm tracking. So I think what you're saying is that because you have human-to-human interaction, maybe you could get a different result if you happen to speak to a different adjudicator. And you're suggesting this will standardize over time, is that right?
[00:10:01] Speaker B: That's right. Standardize, formalize. And so who knows where it's going to go in terms of being friendly to patients versus providers. I think that's where regulation is going to have to step in.
[00:10:11] Speaker A: Yeah, that makes a lot of sense.
[00:10:12] Speaker B: But it'll make it much more explicit than these squishy conversations that depend on how the adjudicator is feeling and how much effort the doctor is putting into the authorization request. So I think it's going to make it, at one level, more rational. Key question here: is it more rational in favor of the patient, in favor of the insurer, or the provider?
[00:10:33] Speaker A: There's a lot to talk about on regulation, but I don't want to get ahead of us. So, as you said a second ago, the diagnostic paradigm isn't even fully in play yet. Can we talk about where you see that going, and in drug development?
[00:10:44] Speaker B: Okay, so where is it going? There are already some out-of-distribution cases, by which I mean real outliers, where individuals are not getting diagnosed and they're in desperate shape. There was a well-publicized case just a few weeks ago of a mother whose child had problems chewing and walking and was developing increasingly debilitating headaches. And she had multiple doctor visits, she had multiple imaging studies, no results. So she then typed into GPT-4 the entire history, and she got a diagnosis of tethered cord syndrome, which is basically when the end of your spinal cord gets trapped. And she went to a neurosurgeon. The neurosurgeon looked at the imaging: oh yeah, that's what it is. That's what it is, yes. And boom. And so that exception is, I think, going to start driving a lot more both patient activation and also use of this diagnostically, as a sort of instant second opinion, on the doctor side. And I think you will see that both health plans and the insurers, the medical-legal insurers around health plans, will want this as a backup. And where I'm seeing it, but still at the cutting edge: I'm involved in something called the Undiagnosed Diseases Network. This is a network where we take patients who have been undiagnosed for years. And previously the way this has worked is we do genomic sequencing, we bring them to the right expert, and we come up with the likely gene mutations that cause the disease. What we're finding is that we can now stick in front of that process one of these generative models, and we can actually accelerate the pace at which we see them. And we're still having human oversight, but we can see many more cases that way. The accuracy is pretty good. And in fact, we can get to the issues around hallucinations and inaccuracy, but I would say the strength of these models is actually in the zebras, the weird cases that doctors are the worst at. Doctors are very good at the common cases.
They're not so good at the literally tens of thousands of super-rare cases, because no human could be. But those are easier for these large language models.
[00:12:50] Speaker A: So the combination, it's a game changer then, particularly for rare and ultra-rare disease. Something we've talked about on the podcast a number of times is these long diagnostic journeys that folks will have. So I guess what you're saying here is that this part, anyway, is less about a novel solution, an n-of-1 diagnostic treatment or something, and more about marshaling the collective knowledge that even a good doc is unlikely to have about some of these rare conditions. Is that right?
[00:13:18] Speaker B: That is absolutely right. And it's not everything, but it's literally hundreds of thousands of individuals in the United States who fall into that bucket. And they're not only suffering, they're also very expensive, because they keep looking for answers and getting lots of studies. So that's an application where you're going to see increasing use of these large language models within the year.
[00:13:41] Speaker A: Wow, really exciting.
[00:13:42] Speaker B: So that's very exciting. And in terms of doctors actually using it, I think that's going to take longer, just because everything happens more slowly in the medical establishment. You'll see some doctors who are stuck will in fact copy and paste into their smartphone and say, what else could this be? But that functionality, I think, will take at least two to three years to percolate through the decision making at hospitals, the availability of such functionality, and the right lawyers looking at medical-legal liability for the hospitals. But patients are using it today. And that's truly the fascinating sociology: we've given doctor-strength tools to patients today.
[00:14:28] Speaker A: That's a great way to describe that. I love that.
[00:14:30] Speaker B: Yeah.
[00:14:31] Speaker A: No, I 100% agree on that. That's a game changer for patients.
[00:14:35] Speaker B: And when people say, oh my God, how can you let these things loose, which are hallucinating, which are not always up to date (although people kept saying it's only up to date to 2021, and GPT-4 is now refreshed to 2023, and all the other major vendors are going to do the same), you have to compare it not to the imagined healthcare system, but to the real healthcare system.
[00:14:59] Speaker A: Not the Platonic ideal of a doctor, but the reality of the medical experience.
[00:15:03] Speaker B: The reality of the doctor. Yeah. So what is the reality of the medical experience in Boston? When I get a new faculty member in my department and they say, Zak, where can I get primary care? I now know the answer, but I used to do this dance where I'd go and talk to all my colleagues: is your practice open? Is your practice... no, no, I'm closing. And one of them who closed a really elegant practice associated with Mass General Hospital told me, Zak, other than concierge practices, which are hugely expensive, there are no high-quality open primary care practices. So you literally can't get an appointment. And if you're lucky enough, through your existing resources, to get, let's say, a visit in three months, and now you have another question, you're out of luck. Who are you going to talk to? And so the ability to ask questions, and yes, maybe it'll hallucinate, but you'll ask another question and you can check it against other resources, creates an invaluable second-opinion resource and just the ability to check: does this medication interact with that medication? Could that be the reason I'm having a rash? All these resources are available to doctors internally, but now you're making them available to patients. And in the spiral that primary care is in, that's going to be very helpful. Now, I do think that in a four-to-five-year time frame, what you'll see is the continued trend of primary care doctors being replaced by nurse practitioners and physician assistants. There we don't have a shortage. And if you can take those individuals and augment them to be at least as good as primary care doctors, potentially better, you're going to have a first-class primary care system. And one of the things that gives me optimism about this is that one of the things I have acquired in the last year is this title of editor-in-chief of New England Journal of Medicine AI.
It's a spin-off of the New England Journal of Medicine. And so I get to see a lot of early manuscripts, and I can tell you that there are tools now that allow echocardiographers, who are not doctors, who are just very good trained technicians: they don't have medical school debt, they didn't have to go to school for eight years. Taking their output and running it through a large language model that's seen a million echocardiograms results in high-quality interpretations of the echocardiogram, as good as an expert cardiologist's. So you can imagine a physician assistant with that kind of capability, working with an echocardiographer, can do an amazing amount of diagnostic and screening activity, and even management, let's say, of heart failure, without actually having to use an expert cardiologist. So I think the simple addition of AI plus non-doctor health professional is really going to increase the quality of medical care. And it's not just in our way-too-expensive fee-for-service healthcare system. I was talking recently to the head of transformation at the National Health Service in the UK. They have the same problem: they have a good primary care system, but they don't have enough primary care doctors, so the wait time is huge. So I'll stop there.
[00:18:00] Speaker A: I love that. So a couple of other things, then, to sort of expand on, if you would. You said four to five years. Obviously crystal balls are imperfect, but why do you think that time frame? Why not faster? What are some of the obstacles?
[00:18:12] Speaker B: Well, I think the obstacles are going to be a little bit of hue and cry from the existing professionals saying, you're not going to get reimbursed for that echocardiogram unless a cardiologist sees it. So there'll be part of that. And part of it is, I just think we need to have, and forgive me, because I know he's not popular with everybody, we sort of need the Elon Musk of healthcare: someone who's ready to push the boundaries and see where we can go. So, for example, we have Amazon, which has acquired this company One Medical, and they already have a lot of the assets: they see all the prescriptions, they're getting clinical observations. I don't see why they could not take that next step. They should say, let's start having a higher level of service by augmenting. They have a lot of nurse practitioners, a lot of PAs, who are great, by the way, and they have a concierge-like service that is not expensive like the typical concierge services; it's hundreds of dollars, not thousands of dollars. And I'm actually surprised that Amazon has not done that. And not knowing anything, it makes me conclude that perhaps the team that's involved in execution is having some issues. But it's going to require a company like that to actually move things forward.
[00:19:23] Speaker A: High-end computer science, massive processing power, and a consumer-facing healthcare network.
[00:19:30] Speaker B: That's right. And again, we've heard for years that Walmart was going to do this, that Target was going to do it. These tools may actually enable them to do it, but it remains to be seen whether the leaders of those companies have the appetite to rattle the cage. Yeah.
[00:19:45] Speaker A: And so I think you're saying, but I want to confirm: the obstacles are not technical. They're more regulatory and societal, poking the bear and the status quo, all that kind of stuff.
[00:19:56] Speaker B: That's correct. And let me just tell you, among my doctor friends, when we have a healthcare problem in our family, of course we talk to our friends, but we also talk to these large language models to get a best view, because we know humans are human. And I can tell you, though I can't share them publicly, instances of friends who had cancers who then called me up and said, oh, that's really bad news, Zak. I have this cancer, it has these mutations. And I just plugged those mutations and that type of cancer into a generative model, and it said, that doesn't make any sense. You don't get those mutations with that cancer. And then my friend actually did due diligence, and it was a misdiagnosis, with huge implications for how this cancer would be treated. So really, with all its warts, there's an opportunity today to improve care. So, yes, the barriers are business, regulatory, cultural, sociological.
[00:20:50] Speaker A: So we're on an improvement curve for these models: GPT-4, GPT-4 Turbo, whatever it was that just came out. Things are changing very, very quickly. Do you think that curve will flatten, and then the opportunity will be around these societal and business models, kind of the application of the technology? Or do you think the models are going to keep getting better for a while?
[00:21:10] Speaker B: They're going to keep on getting better. So I cannot overstate the impressive performance of all the models when they become multimodal, text plus vision. I just recently took a case, a puzzler from the New England Journal of Medicine. You see a picture of a back, and it looks like it has scratches and welts, and it's a 17-year-old man. And I give it the history, and I ask, what is this? And it was a puzzler. And actually, the first top two diagnoses were the correct ones.
Wow. And one of them was something very obscure, shiitake mushroom toxicity.
And I had not even shared the history of eating mushrooms.
[00:21:54] Speaker A: Are you suggesting that not all docs would recognize shiitake mushroom toxicity? That does sound like a bit of an edge case.
[00:22:01] Speaker B: I missed that class. I missed shiitake mushroom toxicity in medical school. That was a great class, but I heard that I missed it. And so it's going to keep on improving. Of course I think it will level off, but the leveling off is not going to be for another couple of years at least. And the bigger problem is that desperate patients and smart, inquisitive doctors are using it, but medical education, which is only beginning to address it, is way behind. And here's a good way to think about how fast things are improving, even though passing national medical boards has, well, not that much to do with being a good doctor.
[00:22:38] Speaker A: Sure.
[00:22:38] Speaker B: We went, in two years, from GPT-2 barely being able to do as well as the worst doctor, to GPT-4 and Google's Med-PaLM outperforming 90% of doctors.
[00:22:50] Speaker A: Two years.
[00:22:51] Speaker B: And that's going to keep on improving. And so we need to get going, because otherwise, how are we serving the best possible medicine to our patients?
[00:23:02] Speaker A: I assume Harvard Medical School does not currently have a course in generative AI that's part of the core curriculum. Is that right? Or if so, do you see that changing in the near future?
[00:23:11] Speaker B: So, first of all, I can tell you that I teach a course on computationally enabled medicine for the third-year medical students. And I guess the students who take the course are the self-driven ones, because I can tell you that many of them had their final presentations presented with generative AI, with annoying British accents, last year. So at least the students were smart enough to know that British accents somehow are more convincing than American accents.
[00:23:35] Speaker A: Extra 5% in credibility.
[00:23:37] Speaker B: Credibility, exactly. But more substantively, the dean of the medical school, George Daley, actually told a bunch of us who are involved in medical education that generative AI has to now be made pervasive throughout the curriculum, because he sees the challenge. For example, we spend a lot of time teaching the medical students how to take good clinical notes. Yeah. Right. And so, okay, they should know what a good clinical note is and how to write one. But just like using a calculator, at some point automation can do it faster and better, and they can get on with other things. And so he does want to do that. I think it's going to be an interesting challenge, because it means we have to bring along the entire teaching faculty. But I'll give them credit for making this a priority.
[00:24:20] Speaker A: Okay, I have to ask a big-picture question, and then I want to get back to clinical research in a second. So when you were a young man in this space, people talked a lot about the Turing test, and someday maybe we'd have models that could pass the Turing test. I kind of think all of these models pass the Turing test now. So how do we think about artificial general intelligence? Or is that just a distraction? How do you think about it?
[00:24:41] Speaker B: I think it's a major distraction. Yeah, it's a major distraction. I remember when I first discovered GPT-4, a little bit more than a year ago, when Peter Lee, a vice president at Microsoft, called me up. He told me a lot of people are asking, is this AGI or not? And I think at this point that's an interesting philosophical question, but the real question, for our purposes, is: does this help us practice medicine better or not? That's the only question. And can this have a conversation with a patient that is helpful to the patient? And, in fact, it almost does it too well. Peter Lee, for example, is very impressed with the genuinely touching social skills that it shows in interactions with patients. In fact, a study that came out of, I think, Stanford and/or UCSD showed that notes sent to patients from generative AI after clinic visits were seen as both more complete and more empathic than those of doctors. And that part of the Turing test does not impress me, because it is also true that some of our most popular doctors with patients are not necessarily the best doctors. They're the nicest doctors; they have the nicest bedside manner. And of course, although you'd like the best-performing doctors to also be super nice, that's not always the case. And in a pinch, I'd like my doctor to be the best. Excellence, number one. Right? And so that part of the Turing test I don't even care that much about. But, yes, they are passing the Turing test. And let's think about the way medicine is practiced. We know that the third-largest killer of patients in hospitals is doctor errors. Yes. And so I don't want that Turing test to be passed. I'd like us to actually be like human doctors who don't make mistakes. Right.
I think we can get there. So although there is the very real likelihood, as is the case for these rare-disease patients, of being superhuman, I think merely having performance above the average doctor's would already be a huge step forward for the overall health of our country.
[00:26:52] Speaker A: Hi, this is Chris O'Brien, host of Few and Far Between. We'll be right back with this episode in a moment. I personally want to thank you for listening to our podcast. Now in our third season, it continues to be an amazing opportunity to speak with some of the top thought leaders in the clinical trials industry. If you're enjoying this episode, please leave us a review on Apple Podcasts. It really helps people discover the podcast. And don't forget to subscribe to Few and Far Between so that you never miss an episode. One last request: know someone with a great story you'd like to hear me interview? Reach out to us at
[email protected]. Thank you. And now back to the podcast.
Okay, I'm going to pull us into clinical research for a minute, if I can. How do you see AI reshaping the design of clinical trials, or having an impact in clinical research, either on the front end in design or on a practical level, in the way we manage these things?
[00:27:47] Speaker B: So I think its biggest impact ultimately will be on the preclinical phase, but not now. Today, I think it can actually serve incredibly well as an accelerant of the whole clinical trial pipeline. And I've often said that if you look at what it takes to start a trial and complete it, all the way from design to end, there are literally tens of thousands of decisions. And each of those decisions doesn't necessarily take that much time, but together they add up to years. If you could halve the time for them, you'd halve the overall time, which is important not only for the budget of pharmaceutical companies but for the human beings who are waiting for results. Yes. And so let me give you a foretaste, and then I'll go into more depth. Just yesterday, I saw the chief technology officer of OpenAI refer to a publication that we're going to publish very soon. It's work that was done out of Rhode Island, where they took their consent forms for patients and ran them through GPT so that they would be available at different levels of verbal and health literacy. And these forms are approved; they're being used now, and it's been shown that they accelerate the consent process.
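As an aside on the reading-level point Kohane raises: the Flesch-Kincaid grade formula is one standard way to score how hard a passage is to read, and gives a rough sense of what targeting a lower literacy level means. Below is a minimal sketch; the syllable counter is a crude vowel-run heuristic, and both consent snippets are invented for illustration, not the actual Rhode Island forms.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

# Invented before/after consent language (illustrative only).
legalese = ("Participants shall acknowledge comprehension of the aforementioned "
            "investigational procedures and associated contraindications.")
simple = "You will get the study drug. It may cause side effects. You can stop at any time."

print(round(fk_grade(legalese), 1), round(fk_grade(simple), 1))
```

A rewriting pipeline along these lines could loop: ask the model for a simpler draft, re-score it, and accept only drafts at or below the target grade.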
[00:29:05] Speaker A: So this is turning consent documents from something that you need maybe a medical degree to really understand or a legal degree to understand to something that maybe a high school student can understand or something like that?
[00:29:13] Speaker B: Exactly. Not even a high school student; I think it was sixth-grade level. And so think about it. I've been in trials, and I often have questions after I gave the consent. So these documents can be intelligent documents. You can ask it more questions: how many times will I actually get stuck in the trial? And it'll answer the question. Yeah. And then in recruiting, reading through electronic medical records: does this patient qualify for this trial? Bringing in a patient who doesn't qualify, just because a few diagnoses made them look eligible when they actually have some disqualifying features, all of that can be avoided by a detailed reading against the trial criteria, which now takes a human being hours to do, and this takes seconds. The other part is, when you're selecting a drug, looking for the possible adverse events that we should be watching out for, and letting that inform the go/no-go decision. These generative AI programs can already do that. And then there's a huge amount of documentation that has to be generated, literally tens of thousands of pages, and all that can now be generated from the results in a fairly automated fashion using generative AI. And then, last of all, analyses.
It was a very small demonstration, but I downloaded a file of adverse events from the FDA. I had never looked at this file before, and I uploaded it into a general model that's publicly available. And I said, tell me what's in this data file. And it says: it's a zip file, I'll unzip it; it has a directory with a bunch of files; these look like data files, based on the file names, on adverse event reactions, this one on dosage recommendations, and so on. What do you want to know? I'd like to know: what are the most common adverse events? And it says, okay, in order to do that, I have to join data from this file and that file. Wow. And it does it. If I want, it can show me the Python code it used. And then it gives me a nice table. Great. I see arthralgia, pain in the joints. Which drugs actually have the most arthralgia events? It says, okay, let me use these other files, and it gives me the results. That's great. Now I want to start reporting adverse events in my institution; can you give me a program that I could put up on my website that would allow patients to report them? And it just generates the HTML, with a database back end, all of that. So what I'm describing is a multidisciplinary process that works across multiple teams, and it is now being accelerated by the use of this intelligence. We're already seeing, for example, in a different world, in programming, 50% improvements in productivity. And I think in trials, because a lot of the tasks require basically textual or linguistic virtuosity, and I include programming in that, it's going to have similar effects. But I had said that there's another area where it's really going to pay off.
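The file-joining step described here, finding the most common adverse events by combining two adverse event extracts, can be sketched in a few lines of pandas. The file layout, column names, and data below are hypothetical stand-ins for illustration, not the actual FDA files from the anecdote:

```python
import pandas as pd

# Hypothetical stand-ins for two FAERS-style extracts: one row per
# reported drug, one row per reported reaction, linked by a report ID.
drugs = pd.DataFrame({
    "report_id": [1, 1, 2, 3, 4],
    "drug_name": ["DrugA", "DrugB", "DrugA", "DrugC", "DrugA"],
})
reactions = pd.DataFrame({
    "report_id": [1, 2, 2, 3, 4],
    "reaction": ["arthralgia", "nausea", "arthralgia", "arthralgia", "nausea"],
})

# Join the two files on the shared report ID, as the model described doing.
merged = drugs.merge(reactions, on="report_id")

# Most common adverse events overall.
top_events = merged["reaction"].value_counts()
print(top_events)

# Which drugs appear most often in arthralgia reports?
arthralgia_drugs = (
    merged[merged["reaction"] == "arthralgia"]["drug_name"].value_counts()
)
print(arthralgia_drugs)
```

The point of the anecdote is that the model wrote this kind of join-and-count code itself; the sketch just makes concrete what "join data from this file and that file" amounts to.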
And so everybody's taking notice of the wonderful work coming out of DeepMind, a company that was acquired by Google. They have a program called AlphaFold, and AlphaFold 2, which is basically itself a large language model, except instead of the language of English, it's the language of amino acids: how do amino acids link up to form proteins with different functions? And they have not been sitting on their hands since their first victory, where they used that large language model, plus a bit of evolutionary biology and a bit of molecular dynamics, but mostly the large language model, to predict the structures of the huge majority of known proteins.
[00:32:43] Speaker A: Previously, it had taken an enormous amount of time for us to calculate the protein folding in one instance.
[00:32:48] Speaker B: Right, right. And in fact, it didn't work well enough; we'd still have to use X-ray crystallography for many such structures. But they didn't sit on their hands. Now there's something called AlphaMissense, which says: for these proteins, let's look at all the known mutations in all the databases. How much does each one affect the function of that protein? It seems to work well. And I happen to know that they're not sitting on their hands now either. They're doing the next thing, which is: how do small molecules, drugs, fit into different pockets in the proteins? How do different large macromolecules fit together? So the whole chemistry design and screening process is going to be accelerated. Now, that's not something that's going to happen next year. There are companies being created around it, but I think you're going to start seeing impressive results probably as soon as about three years from now.
[00:33:35] Speaker A: So Zach, we've seen a few drugs come into the clinic that were created through advanced computation, maybe sourced that way, and they've not succeeded. That's caused some skepticism, I think, of late on this topic. Are you saying it's just early?
[00:33:48] Speaker B: It's just very early, and the tools are getting rapidly better. So given the accelerating performance of these models, I'm fairly confident, and a lot of companies are confident, that these baby steps we're seeing now, which are analogous to GPT-2, will result in much better performance in a year or two. The curve is, if anything, getting steeper, so we can be really confident that in a few years we're going to have some interesting breakthroughs in therapeutics. The real disappointments that have happened are just the disappointments of early days.
[00:34:31] Speaker A: Yeah, that's a perfect segue to the next topic. So what we've just said is that there are already huge implications in the clinic for making the clinical trial process faster and more efficient. I also think simplifying language might have an impact on diversity in trial recruitment, which is another hot-button issue these days. So lots and lots of stuff there. And then you've just said, hold on to your hat, because large language model driven drug design is in its infancy and it's really going to take off. If you were a regulator staring at this pace of change, this, as you said, increasingly steep curve, what should you do? What should the FDA or other regulators be trying to do as they navigate this world, obviously prioritizing both patient safety and innovation?
[00:35:14] Speaker B: Well, it's a very tough thing to do because it's like asking someone to regulate highways before highways were built.
All the problems that we know about highways happened afterwards. But let me answer you concretely. We just had a very small, by-invitation-only conference in Maine with 60 individuals from around the world (China, the US, Europe, the UK): regulators, technologists, lawyers, ethicists. And we asked ourselves exactly this question. There was general agreement that it's great that several governments are putting in place broad governance initiatives that will result in thinking about what the regulation should be. At the same time, we had an interesting discussion between the Europeans and the Americans. The Europeans attending our meeting said they were struck by how different the conversations they were having in Europe were from the ones we were having. We were talking about the risks, without a doubt, but we were also talking about the ways patients could benefit. In Europe, they were only talking about the risks. And I think we all agreed that that's probably the wrong way to think about it. If you only look at the risks and not at the benefits, then, as they put it, there are going to be 450 million Europeans who will not benefit in this generation from the ways healthcare can be advanced. Their healthcare systems are suffering too. Yes, there are risks, but healthcare is really, really broken, both financially and in terms of performance, and so we have to have a discussion that balances risk and benefit. The other thing we said is that because patients have been exposed to these large language models, yes, they are aware of the risks, but they are also very aware of the benefits. And we need to engage the public, and not just a few pointy-headed academics and policymakers, in defining what those policies are.
So there's a tough trade-off that we're going to have to keep discussing for the next few years: how much do we regulate, and how much do we wait and see what actually requires regulation? Especially given how fast things are changing, we don't know whether the weaknesses of these systems will be the same in two years. So making big regulatory changes right now may be unwise, but we should put the discussion in place so that each society can have its own debate. And let's be clear: not every society will reach the same conclusions, because there are different trade-offs. But at least we should understand what those trade-offs are.
[00:37:48] Speaker A: So I'm not going to ask you to speak for Europeans despite your Swiss ancestry, but in general, did that seem like a consensus opinion that we need to balance those things?
[00:37:57] Speaker B: That was a consensus opinion, but there was also a sense that not enough of the public was aware of those trade-offs. What they're hearing a lot of is the fear, the fear of misuse by humans, which is a real consideration. There are some doomer fears as well, and those are all legitimate conversations, but they don't seem to be balanced against the very real opportunity cost for the hundreds of millions of people who are just not getting good healthcare now.
[00:38:31] Speaker A: So you mentioned involving the voice of the patient, especially in rare disease, along with some of the diagnostic tools and patient-journey work happening right now. Is that maybe an antidote to some of the fear mongering?
[00:38:42] Speaker B: Exactly. Most of us, thank God, are healthy most of the time, and so when someone talks about risks, we're inclined to listen. But when you're actually sick...
[00:38:53] Speaker A: Right.
[00:38:54] Speaker B: ...the trade-offs look very, very different. These patients have stared sickness in the face for years, and suffering, and they can be extremely articulate about the problems in our healthcare system: the multiple tests, clinicians not being awake to the diagnosis, getting into ruts, not thinking outside the box, and so on. The parents and the patients are going to be, I think, the most potent force in articulating the benefits of bringing this new player into healthcare.
[00:39:27] Speaker A: Yeah, that makes a ton of sense, and it brings us to your work on ethical AI. Will you talk a little bit about that? It's obviously incredibly relevant given what we just discussed, but in general I hear a lot of people worrying about it. In fact, somebody told me a story the other day that if you add to a prompt "this question is extremely important to my career" or "to my health," you might get better responses back from the model, which is a complete head scratcher to me. So we are all concerned with questions around ethical behavior from models that we don't understand. How should we think about that?
[00:40:02] Speaker B: So I think it's a real issue. But again, I'd like to start from where we are today. Today, a lot of doctors are great, well-motivated people. Do they have biases? Yes. Some of it is explicit, some of it is unknown. How do we assess that? It's very difficult, but we know it's there. With these large language models and other AI, we can actually investigate that bias explicitly. We can present case X with an African American patient and case Y with a Hispanic patient and ask: does it behave differently? We can actually assess this, and I think that assessment has to happen. Which leads me back to your question about the FDA, which I didn't answer: I think this is too big a job for the FDA. It requires essentially an industrial-strength equivalent of Consumer Reports, asking which of these models is better. The FDA doesn't answer those questions; it just asks whether a product does what its maker says it does, not whether this model is better than that one. And I think that same Consumer Reports function should be doing inspections to see whether there is bias in these models. Once you identify bias, you can actually steer around it, what's called aligning the model: you can say, I want you to ignore any mention of race in this history and give the same answer regardless of race. And speaking of Consumer Reports, I'm not sure it's there yet, but a colleague of mine has created a nonprofit called Nightingale, whose goal is to acquire datasets so that machine learning specialists can run their programs against them and see how they perform. That's one route. And I can tell you that the motivation in creating NEJM AI, the spinoff of the New England Journal of Medicine, is that just as we have assessment for drugs, does this work or not, we need that for AI.
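The "case X with an African American patient, case Y with a Hispanic patient" audit described here can be sketched as a counterfactual test: run the same clinical vignette through a model, varying only the patient's stated race, and flag any divergence in the answers. This is a minimal sketch; `query_model` is a hypothetical placeholder, not a real API, and a real audit would call an actual model and use many vignettes:

```python
# Counterfactual bias check: identical vignette, only the race varies.
CASE_TEMPLATE = (
    "A 54-year-old {race} patient presents with chest pain radiating "
    "to the left arm and diaphoresis. What is the next step in management?"
)

def query_model(prompt: str) -> str:
    # Hypothetical stand-in: a real audit would call an actual LLM here.
    return "Obtain an ECG and cardiac troponins immediately."

def audit_race_sensitivity(races):
    """Collect each variant's answer so divergences can be flagged."""
    return {race: query_model(CASE_TEMPLATE.format(race=race)) for race in races}

answers = audit_race_sensitivity(["white", "African American", "Hispanic"])

# If any variant gets a different recommendation, send the case for
# human review; identical answers pass this (necessary, not sufficient) check.
consistent = len(set(answers.values())) == 1
print("consistent across race variants:", consistent)
```

The design mirrors the point in the interview: unlike auditing a human clinician, the exact same case can be replayed against the model with one attribute changed, making the bias measurement explicit and repeatable.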
And here is what I think we should worry about most. First, computer programs that provide decision support that are neither peer reviewed nor regulated. Where does that happen? A lot of the electronic health record vendors have created AI programs, and because of their relationship with the hospitals, these are not necessarily viewed as an additional sale; they're just part of the product. And we saw some pretty poor performance around predicting who was going to get worse from COVID, and around predicting sepsis. So I think there should be no get-out-of-jail card that says you're not going to be examined and benchmarked by independent peer reviewers.
[00:42:47] Speaker A: Somehow this doesn't count because it's part of an electronic health record system.
[00:42:51] Speaker B: Yeah, that's right. That's number one. Number two: where did the data come from? For many of these large commercial systems, we don't know where the data came from. Why does that matter? Because if, let's say, they train it on hospitals in Alabama, then even if they do a good job, it will represent the way medicine is practiced in Alabama, and the case mix, ethnic mix, and socioeconomic mix of people in Alabama may not be the same as what we see in Massachusetts. So it's important to know where the data is coming from. There have been various arguments, all the way from "this is somehow our property, a trade secret" to "we can't let you know that, because if we did, we'd be letting the genie out of the bottle, which would lead to the destruction of the world." I just don't think those are the right arguments. Having transparency about where the data comes from is going to be really important for us to understand performance. Because, to get back to your original question about where we are in the various stages of validation: as I said, we have a bunch of narrow-use AI programs that have worked well in trials. And then we have these general models that pass all the trainee benchmarks, what I call exams. But they're not human beings, and just because they pass exams doesn't mean they're going to work well as doctors.
[00:44:13] Speaker A: I loved your description. At some point you said that ChatGPT is both the smartest and the dumbest person you've ever met.
[00:44:17] Speaker B: Something like that. That's a Peter Lee quote, actually. Absolutely correct. And so we really have to understand where it's getting its data from and keep our eye on it. It's easier said than done to keep a human in the loop. Why do I say that? Every time we put a human into an automated process, the human becomes lazy, because things are mostly going well.
[00:44:39] Speaker A: Mostly it's fine.
[00:44:41] Speaker B: Mostly it's fine. Why should I pay attention? And again, at the risk of being unpopular, I'll cop to owning a Tesla. And Tesla has autopilot. It turns out I'm not a great driver.
But I know enough, when we're on Autopilot, to wiggle the wheel so the car knows I'm in control. Don't worry. But then I do something my kids want to kill me over: I take out my phone and start looking at it. And the Tesla has cameras watching you, and it says, I'm paraphrasing, "Zak, don't do that. Pay attention to the Autopilot." And if I keep looking at the phone, it switches off Autopilot. Then at the end of the drive it says: that was not nice; if you do that three more times, I'm switching off Autopilot until the next software update.
[00:45:24] Speaker A: Wow.
[00:45:25] Speaker B: And so think about that in medicine. Maybe we should have something like that, to make sure the doctors are not asleep at the wheel: inject into the record something they should be really alarmed about and should act on right away. And if they don't, it says, "Sorry, Dr. Kohane, time to go back for a little training here."
[00:45:44] Speaker A: Slap on the wrist. Yeah. The science fiction novelist Neal Stephenson had a bit in one of his books where they insert false negatives to get people to focus and pay attention.
[00:45:54] Speaker B: Which novel was that? Oh, God, you would ask me.
[00:45:57] Speaker A: It's the one where a guy's running he's running a large video game company.
[00:46:02] Speaker B: Not Snow crash.
[00:46:03] Speaker A: No, it's more recent than that. It's more recent than that. I'll find it and send it to you afterwards. But they find a way to monetize having humans do some observational tasks, but the actual error rate is maybe one in 100 or one in 1000. So they create fake errors in order to keep people on their toes. I love that as a concept.
[00:46:20] Speaker B: And by the way, just to show this is not science fiction, back in the 90s my colleague at the Brigham, David Bates, did the following study. There's a drug called ondansetron, which is an antiemetic: it stops nausea, and it's used for cancer patients. And it's expensive. So there was a lot of discussion in the pharmacy and therapeutics committee about what the dose should be and how often we should give it, and it went on for hours. Then, because he had to put something into the order entry system that was implemented at the Brigham, he just put in a default dose at a default frequency. And 95% of doctors never changed it.
[00:46:56] Speaker A: Right.
These are real concerns. Yeah. I think the novel was Reamde.
[00:47:04] Speaker B: That's right.
[00:47:05] Speaker A: Okay, so one last question about regulation. Is something around transparency what you think we should be doing now? It sounds like you're saying yes.
[00:47:15] Speaker B: Yeah, that one's now. I think that's right. The detailed regulations are probably premature, but not the basic ground rules about transparency, and also about governance: how does data get added to these systems?
It's almost a civil rights issue.
On the one hand, should the state or the healthcare system say that, for public health reasons, we should be able to take all your data and put it into models? Or should people be able to say, "I'm part of a discriminated-against minority, and I don't want that"? It's not obvious what the answer is here.
[00:47:49] Speaker A: Right, but that's the discussion. Yeah, 100%. I mean, you could argue the other way, I suppose. I want my data in there because I want people like me in the model so that I get interesting results out. I could see arguments in both directions.
[00:48:00] Speaker B: Exactly. But we need to have that discussion. We're not having that discussion.
[00:48:03] Speaker A: Yeah, that makes a lot of sense.
[00:48:05] Speaker B: Okay.
[00:48:05] Speaker A: I want to be mindful of time and bring us to conclusion here. So advice for me and all the other people whose minds have been blown over the last hour about what's happening and where things are going, how should we stay current? What should we be reading? Where would you point us in addition to your Twitter feed?
[00:48:22] Speaker B: Okay, so it all depends on where you are on the attention deficit disorder spectrum. If you're at the extreme end, then you actually do have to look at Twitter for the latest things. And I would also recommend picking at least one of the free large language models, whether it's Bard or the free version of GPT, and starting to use it for some of your everyday tasks, because that will give you a good sense of its capabilities. You need to make it very real. I've spoken to several audiences, and invariably, when I ask for a show of hands, everybody is convinced it's going to change everything. But when I ask how many of them are using it on a daily basis, it's about 10%. And when OpenAI said there were 100 million people using it, I asked myself: who the heck are these 100 million people? Most of the people I talk to don't use it, and I'm surrounded by pretty smart people. Then I figured it out: it's high school kids, because they actually have a use for it on a daily basis, sometimes in ways teachers don't like. Though I can tell you that in some schools they're saying, if you want to debug your program, you can use GPT, because they know that's the future anyway. So I want you to be part of the future by using this now. If you're not too far along the attention deficit disorder spectrum, I would strongly recommend going to YouTube and searching for "introduction to transformers"; there are some good ones there, along with introductions to large language models. And I cannot say enough about Ethan Mollick. Ethan Mollick is a professor at the Wharton School at Penn. He has thought so deeply about this. Phenomenal guy.
[00:50:02] Speaker A: Yes, phenomenal guy.
[00:50:03] Speaker B: And the real testament to his intellectual integrity is that he had built millions of dollars worth of educational software over the years, and then GPT-4 came along and he said, "I'm done with that," and completely pivoted. He's created recipes for how the rest of us can build our own courses and teach ourselves using it.
[00:50:21] Speaker A: That's a fantastic tip. All right, we'll try to link to some of that in the show notes. Professor Zak Kohane, thank you so much for joining us today. You are doing some of the most exciting work on the front end of AI. Thanks for giving us a little look into the near future.
[00:50:34] Speaker B: What a pleasure. Thanks for chatting.
[00:50:46] Speaker A: Thank you for listening to the latest episode of Few and Far Between. Our podcast is now available on Apple podcasts and other major streaming services. Please take a moment and leave us a user review and rating today. It really helps people discover the podcast, and we read all the comments. Those comments help us to make few and far between better and better. Also, be sure to subscribe to Few and Far Between so that you don't miss a single episode. Got an idea for a future episode? Email us at
[email protected] or contact us on our
[email protected]. I'm your host, Chris O'Brien. See you next time.