
It has been just over a month since the Food and Drug Administration announced the agency-wide rollout of Elsa, a generative AI tool “designed to help employees—from scientific reviewers to investigators—work more efficiently.” The launch, which was completed ahead of schedule, came after FDA Commissioner Marty Makary said he had been “blown away by the success of [the agency’s] first AI-assisted scientific review pilot.”

According to several sources, including STAT News, Elsa, a large language model (LLM), was built with Deloitte on Anthropic’s Claude platform within Amazon Web Services’ secure GovCloud environment. It evolved within the Center for Drug Evaluation and Research under an AI development program that has been under way across FDA divisions since 2020.
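The FDA has not published Elsa’s internals, but the reported stack (a Claude model served from AWS GovCloud) maps onto a familiar pattern. The sketch below shows roughly how an application might call a Claude model through Amazon Bedrock in a GovCloud region; the region name, model ID, and prompt are illustrative assumptions, not disclosed details of Elsa.

```python
# A minimal, illustrative sketch (not the FDA's implementation): calling a
# Claude model through Amazon Bedrock from an AWS GovCloud region, the kind
# of stack reported for Elsa. Region, model ID, and prompt are assumptions.
import json

import boto3

# bedrock-runtime is the service endpoint for model inference;
# us-gov-west-1 is a GovCloud region (assumed here, not confirmed for Elsa).
client = boto3.client("bedrock-runtime", region_name="us-gov-west-1")

response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # hypothetical model choice
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {
                "role": "user",
                "content": "Summarize the key endpoints in this clinical protocol.",
            }
        ],
    }),
)

# The response body is a JSON stream; the model's text sits under "content".
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```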

In sharing news of the program’s agency-wide launch on June 2, Makary said, “The agency is using Elsa to expedite clinical protocol reviews and reduce the overall time to complete scientific reviews.” FDA Chief AI Officer Jeremy Walsh characterized Elsa as the “dawn of the AI era at the FDA.”

Makary’s and Walsh’s enthusiasm for Elsa reflects the broader, high-profile embrace of AI by the Trump administration. As Applied Policy has previously noted, the January 23 executive order, Removing Barriers to American Leadership in Artificial Intelligence, both revoked Biden-era AI regulations and promoted federal preemption of state-level oversight. The administration is also pursuing legislative changes to accelerate the adoption of AI across both public and private sectors.

This AI-forward approach—most notably championed by former special government employee Elon Musk and DOGE—has not been without public stumbles. The most notable may have been the “Make Our Children Healthy Again Assessment,” released by the Make America Healthy Again Commission in May. While the report presented ambitious goals, investigative reporting quickly revealed serious flaws: citations referencing at least seven studies that do not exist and telltale signs of AI “hallucination,” including footnotes marked “oaicite,” which suggest that parts of the report may have been generated by OpenAI’s ChatGPT.

The errors in the MAHA report have added to concerns about the use of AI in policymaking. Walsh addressed some of these directly in remarks he delivered at the DIA Global Conference in June. Emphasizing that the FDA has put safeguards in place for the use of Elsa, Walsh said, “It can’t hallucinate; it’s not allowed to come up with figments of its imagination.” He later clarified to Regulatory Focus that the system does not hallucinate when used as intended. It remains unclear whether Walsh was dismissing the New York Times reporting that some FDA employees had stated Elsa did hallucinate, or whether he was suggesting that any such instances would have resulted from improper use. As STAT reported, to access the program FDA staff are required to click through an agreement acknowledging that “While Elsa is a powerful tool, it can make errors.”

Concerns over the use of AI are not limited to its potential to make up “facts.” A growing body of evidence suggests that LLMs fall short in accurately analyzing and interpreting scientific data, and this limitation may be worsening even as the models themselves advance. In a study published last month in Royal Society Open Science, researchers found that summaries prepared by LLMs “may oversimplify or exaggerate scientific findings, which can lead to large-scale misunderstandings of science.” Counterintuitively, the research also suggests that explicitly requesting that an LLM provide accurate responses can increase such generalizations, and that including evidence in a prompt may reduce the accuracy of the model’s responses.
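To make the prompting finding concrete, here is a minimal sketch of the kind of paired prompt conditions such a study might compare; the wording of both prompts is hypothetical and is not drawn from the study itself.

```python
# Hypothetical prompt pair illustrating the comparison described above:
# a neutral summarization request versus one that explicitly asks for
# accuracy. The research suggests the accuracy-emphasizing style can,
# counterintuitively, increase overgeneralized summaries.
NEUTRAL_PROMPT = "Summarize the findings of the following abstract:\n\n{abstract}"

ACCURACY_PROMPT = (
    "Summarize the findings of the following abstract. Be as accurate as "
    "possible and do not overstate or generalize the results:\n\n{abstract}"
)


def build_prompts(abstract: str) -> dict[str, str]:
    """Return both prompt variants for a given abstract, for side-by-side testing."""
    return {
        "neutral": NEUTRAL_PROMPT.format(abstract=abstract),
        "accuracy": ACCURACY_PROMPT.format(abstract=abstract),
    }
```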

With cybersecurity a growing concern and healthcare a favored target of hackers and state-sponsored cybercriminals, questions arise about how the agency will secure the system and protect the information shared with it. Acknowledging those questions, Makary said, “All information stays within the agency, and the AI models are not being trained on data submitted by the industry.”

While several pharmaceutical and medical device manufacturers have expressed enthusiasm for the FDA’s use of AI, many of the industry’s concerns remain unanswered. The agency has not yet specified how Elsa has been, or will be, verified and validated. Nor has it detailed what safeguards will ensure that the system’s use does not expose trade secrets or sensitive device data. It also remains unclear whether and how applicants will be informed about the extent to which AI tools are used in reviewing their submissions.

The FDA’s agency-wide rollout of Elsa marked a shift from controlled testing to real-time implementation. It also made the program subject to sharper scrutiny from a broader range of stakeholders. The coming months will test not only the tool’s utility, but also the agency’s ability to navigate the challenges that come with deploying AI in a regulatory setting.