It’s Time for Clinical Trial Data Review to Evolve

by

Sanjay Kunchakarra

September 5, 2025

Clean, reliable data is the foundation of every trial.

Before any real insights can be gleaned from a trial’s results, all collected data needs to be cleaned, verified and signed off on. But while drug modalities and trial designs have seen incredible innovation over the last few decades, the way the clinical research industry reviews and cleans clinical trial data has not changed much.

Most studies today rely on batch-based query cycles: monthly data freezes, static error reports and several weeks of waiting before queries reach sites. For data managers and clinical research teams, this means weeks of delay (nearly 45 days before a query reaches a site and 2-3 months before data is considered “clean”), stale insights, and errors that surface long after they should have been caught.

The Batch Review Status Quo

In most trials, the process works like this:

Data Freeze: EDC study data is “frozen” before a review cycle begins (every 2-4 weeks)
Listings Prepared: Pre-configured SAS listings are prepared based on the frozen study data
Error Reports Generated: SAS programmers generate and distribute error reports to data managers, medical monitors, and clinical operations
Reports Reviewed and Queries Issued: Teams review the reports and issue queries in the EDC over the following 2-4 weeks

For data managers, this process involves weeks of manual investigation. And for the site, it means getting queries about data they entered weeks ago, sometimes even when the patient has already moved on to their next visit.
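To make the delay concrete, here is a minimal sketch of the batch timeline described above. The function name, the example dates, and the 28-day freeze and 21-day review values are illustrative assumptions picked from within the 2-4 week ranges, not figures from any particular study.

```python
from datetime import date, timedelta

def batch_query_date(entry: date, first_freeze: date,
                     freeze_every_days: int, review_days: int) -> date:
    """Earliest date a query could reach the site under batch review.

    Hypothetical model: data waits for the next freeze on or after its
    entry date, then for the review period that follows that freeze.
    """
    freeze = first_freeze
    while freeze < entry:
        freeze += timedelta(days=freeze_every_days)
    return freeze + timedelta(days=review_days)

# Data entered one day after a freeze waits almost a full cycle.
entry = date(2025, 1, 2)
query = batch_query_date(entry, first_freeze=date(2025, 1, 1),
                         freeze_every_days=28, review_days=21)
print((query - entry).days)  # 48
```

Under these assumed values, a data point entered just after a freeze waits 48 days for its query, which is in line with the roughly 45 days mentioned earlier.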

Why Is This Bad?

Until now, the industry has had no choice but to rely on freezing the EDC system and reviewing in batches. It simply isn’t humanly possible for a data manager to review and analyze all the EDC data before the next update is made. But the status quo of batch-based review unfortunately leads to real (and costly) problems:

Trial Milestone Delays - Queries raised weeks late may delay access to clean data needed for interim analyses, DMC reviews, or database lock. Even a one-week delay in DB lock can cost sponsors ~$250K in added expense
Stale Data for Review - By the time trial teams sit down with reports, much of the data is already outdated, making it harder to act with confidence
Site Burden and Frustration - Sites struggle to respond to queries about old visits, increasing cycle times and challenging site relationships
Inefficient Use of Data Management Talent - Highly trained data managers spend hours reconciling listings and drafting repetitive queries instead of focusing on oversight and decision-making
Regulatory Risk - Under ICH E6(R3), regulators now expect proactive sponsor oversight. A delayed batch process makes it harder to demonstrate compliance

Opportunity to Move from Batch to Real-Time Review

The technological advances in AI have opened up new possibilities in the clinical research industry. The amount of AI noise can feel overwhelming at times, but our team has seen first-hand the real impact that today’s large language models can have on real clinical research data and processes. It’s not a dream anymore.

The limitations of technology forced the industry to rely on batch-based review. But now that we can create AI assistants (a.k.a. AI agents) that harness the immense power of LLMs and analyze trial data around the clock, data review can be performed continuously and in real time.

Let me explain how our platform, Reveal, uses AI assistants to create a live feed of data quality issues so that data review can be performed in real time:

First, updates to study data are analyzed daily
Then, our assistants gather all updated data and identify (using their powerful underlying large language models) any relevant data inconsistencies or discrepancies
Once these discrepancies are found, our AI assistants recommend a path to resolution. If it makes sense to issue a query, context-specific query messages are drafted and surfaced to data managers for their approval
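The three steps above can be sketched roughly as follows. This is an illustration only, not Reveal's actual implementation: every class, field, and function name here is hypothetical, and a simple date rule stands in for the language-model check so that the example runs on its own.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional

@dataclass
class Record:
    subject_id: str
    form: str
    start_date: date
    end_date: Optional[date]
    updated_at: datetime

@dataclass
class DraftQuery:
    subject_id: str
    form: str
    message: str
    status: str = "pending_approval"  # nothing is issued without approval

def find_discrepancy(rec: Record) -> Optional[str]:
    # Stand-in for the model-based check (assumption): one simple date rule.
    if rec.end_date is not None and rec.end_date < rec.start_date:
        return f"End date {rec.end_date} is before start date {rec.start_date}."
    return None

def daily_review(records: List[Record], last_run: datetime) -> List[DraftQuery]:
    # Step 1: look only at data updated since the previous run.
    updated = [r for r in records if r.updated_at > last_run]
    drafts = []
    for rec in updated:
        # Step 2: identify discrepancies in the updated data.
        issue = find_discrepancy(rec)
        if issue:
            # Step 3: draft a query for a data manager to approve.
            message = f"{issue} Please verify and correct."
            drafts.append(DraftQuery(rec.subject_id, rec.form, message))
    return drafts

records = [
    Record("001", "AE", date(2025, 3, 5), date(2025, 3, 1), datetime(2025, 3, 6, 9, 0)),
    Record("002", "AE", date(2025, 3, 1), date(2025, 3, 4), datetime(2025, 3, 6, 10, 0)),
    Record("003", "AE", date(2025, 2, 9), date(2025, 2, 1), datetime(2025, 2, 10, 8, 0)),
]
drafts = daily_review(records, last_run=datetime(2025, 3, 6, 0, 0))
for d in drafts:
    print(d.subject_id, d.status, d.message)
```

In this example only subject 001 produces a draft: subject 002 is consistent, and subject 003 was not updated since the last run. The draft stays in a pending state, reflecting the point that data managers approve queries before they are issued.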

In this new paradigm, data quality issues can be flagged within 24 hours of data entry and queries can be issued within 7 days. Compared with the roughly 45 days it takes a query to reach a site under batch review, this represents a decrease of about 80% in the cycle time between data entry and query issuance. By compressing the cycle time, we can resolve issues more quickly, analyze fresh data throughout a trial and avoid delays in trial timelines.
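As a quick check on that figure, the reduction can be computed directly. This assumes the baseline is the roughly 45 days quoted earlier for a query to reach a site; the function name is just for illustration.

```python
def percent_reduction(before_days: float, after_days: float) -> float:
    """Percentage decrease in cycle time from before_days to after_days."""
    return 100 * (before_days - after_days) / before_days

# Assumed baseline: ~45 days under batch review, 7 days with daily review.
print(round(percent_reduction(45, 7)))  # 84
```

With a 45-day baseline the reduction comes out slightly above 80%; a shorter baseline would bring it closer to that number.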

To Sum Up

Batch review has been the clinical research industry’s only way to generate clean trial data. But now that we have the tech to eliminate the bottlenecks in the data review process, it’s time for the industry to level up. Let’s give data managers the tools they need to deliver higher-quality data and faster turnarounds, so we can put the days of keeping study teams in the dark behind us.
