Rating Scales In Clinical Trials
10 Ways To Minimize The Data Noise In Clinical Research
By Monika Vance
The main thing about rating scales is exactly what Dr. Stephen Covey teaches over and over:
“The main thing is to keep the main thing the main thing.”
Every scale has its unique utility. It is designed to measure a main thing, within a specific context and within a specific group of people.
Rating scales are also referred to by different names, depending on the professional background of the individual talking about them and on their general utility in a given setting…
…psychological tests, psychiatric measures, clinical outcome assessments, psychiatric scales, psychiatric outcome measures, patient-reported outcomes, clinician-reported outcomes, patient-experience measures, neuropsychological assessments…
I know, it sounds a lot like an excerpt from George Carlin’s Euphemisms routine. And, then there are the acronyms!
Here, I want to address how rating scales impact international clinical trials. Any trial, in any therapeutic area, that has one or more rating scales designated as a primary, or even secondary, study endpoint. Based on today’s implementation practices, there are things we can do, right away, to fix the noise in the data the scales generate.
What are you measuring, exactly?
I’ve been around the clinical development community for nearly 20 years.
I've spent the last 8 years as an entrepreneur with an ambitious goal, and just the right amount of naivete blended with just the right amount of fearless craziness. I'm really glad that I didn't know then what I know today about working in the pharma industry, because it would have seemed insanely intimidating! Maybe I'd find myself in some alternate reality today.
My work is never boring.
This is despite the fact that when I tell most people who don’t know me about what I do for a living, they look like they just took a sleeping pill. I make it quick, but still wonder if they’re momentarily meditating with visions of serenity in the Maldives instead of staying with my enthusiastic spiel about rating scales. I’ll never know, because they’re too well mannered. Except one person. His reaction made me cut it down and make it fit the attention span of a squirrel.
It came down to…”I proudly peddle rating scales in pharma!”
My focus is simple, pointed, but it has broad applications – it’s on rating scales used in clinical research and in private practice. In pharma, my awesome team and I help research teams find, select, adapt, license and use rating scales that can actually measure what they want to measure on that very specific group of subjects.
If this sounds like a piece of cake to you, let me tell you, this work is not easy!
It’s really hard. It’s worse than selling the value of rater training.
Our pharma industry customers are skeptical scientists and medical directors, and process-focused clinical operations leaders. We debate the virtues of clinical outcome measures, whether or not it makes sense to switch from a standard FDA-friendly scale to a new one, or if it’s feasible to adapt items, or maybe even develop something new. Imagine the anxious abstractness of that!
(What would that alternate reality look like?…)
Depending on what role each of us plays in the greater context of getting a new drug to market, our common goals are organized in a different order of priority. This naturally changes our focus and our perspective on the importance and value of what a service provider brings to the clinical program. Peddling rating scales is hence no better or worse than peddling molecules; it's just waaay less glamorous in every conceivable way!
Rating scales represent a tiny fraction of the overall clinical development program to-do list. The task is minuscule and relatively short-term, so there is little time for sponsors to retain its importance and value in collective memory. If it were more prominent on the list of priorities, it would be possible to plan farther ahead: to evaluate what is available out there, to select with greater scrutiny and rigor, and to adapt and validate, with translation included. It would be nice, but right now it is what it is.
My ethical role is to remind those who select and implement rating scales in study protocols that they’re gambling with data integrity when they choose scales just because the FDA is familiar with them, and because regulatory review will take a shorter time, and/or because someone else had used them successfully. My role is to remind my customers that their peers or competitors weren’t measuring the exact same thing, with the same molecule, on the same group of people, and maybe even not at the same sites. Why that matters, I’ll explain in another article.
The main thing about rating scales that impacts your trial success
Interestingly, the subject of rating scales (as opposed to the broader subject of measurement) is not what clinical research people enjoy talking about. Since rating scales represent a tiny fraction of the overall development program, the approach to selection, adaptation and implementation has become more mechanical than thoughtful. There are exceptions, of course.
In the minds of clinical researchers who design studies, and of those who rate study subjects with them, rating scales have acquired a questionable reputation as reliable trial outcome measures. The general attitude toward them is, “they’ll be gone as soon as we can replace them with biomarkers”. While I totally agree that the addition of biomarkers would be a stellar improvement in outcome measurement, let’s face it, we are still far away from that. Far enough that we have to fix what we are doing over and over, even though we’ve acknowledged at scientific meetings that what we are doing with measurement isn’t working very well. Especially in CNS.
Trials fail for many reasons. A researcher friend from the industry says, “we have to celebrate those failures rather than condemn them, because we just learned something we didn’t know before; and in science, that’s important and expected.” When you break it down, no matter which way you look at it, trial failure boils down to human error. Whether it’s understanding etiology, mechanism of action, formulating the research question, applying study design, project management, vendor selection, subject enrollment, rating quality, construct of measurement, rating scale selection, etc., it’s all related to how much we know about the problem we’re trying to solve, how much experience we have with each aspect, and how much time we are willing to put into learning about it.
Start fixing noisy rating scale data today!
In the interest of managing tight deadlines, there are steps we can take right away to start fixing what we do over and over and which contributes to noisy trial data.
We can start with not messing with the specifics of the main thing, and hence with quality assurance. This may not prevent a trial from failing, but it will naturally boost the chance of success, and crystallize the story that your data will convey.
This is a starting list of things we can change as of today. It is by no means complete, but includes the prominent ones…
1. Choosing clinical endpoints based on available scales
If you don’t have the necessary resources to explore adaptation or development of customized scales, choose your clinical endpoints based on PROs and ClinROs that are already available. By this, I mean what’s available worldwide, and definitely outside of the relatively narrow kaleidoscope of existing pharma protocols. Be open to something new. The FDA will accept it if you can psychometrically demonstrate that the scale measures that main thing, and within the group on which you’re using it.
2. Choosing clinical endpoints based on construct
If adaptation or development of customized scales is an option, you’ll have more creative freedom in your investigation, and you can choose your clinical endpoints based on the construct of what you want to measure.
Understand that construct really well. Once you begin testing it, you’re guaranteed to learn a lot more about it and that will help you refine your study’s clinical endpoints to a greater degree of precision.
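To make “testing the construct” concrete: one common first check is an internal-consistency statistic such as Cronbach’s alpha, which asks whether a scale’s items move together as a single construct. Here is a minimal sketch in plain Python; the function and the item scores are my own illustration, not part of any specific scale’s validation protocol, and a real validation would go well beyond a single coefficient.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-subject item-score rows.

    item_scores: one row per subject; each row holds that subject's
    score on every item of the scale.
    """
    k = len(item_scores[0])            # number of items
    items = list(zip(*item_scores))    # transpose: one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Invented scores for 5 subjects on a hypothetical 4-item scale.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(scores), 2))  # → 0.95
```

An alpha this high suggests the items are measuring one thing; a low value is an early warning that the item set does not cohere around the construct you think you are measuring.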
3. Planning farther ahead for translation
Allow more than 12 weeks for translation.
We can debate this, but in my extensive experience, 12 weeks is really not enough time for a translation company to do a good job. It's definitely not enough when the instrument is a technical clinician-reported scale that involves a subject interview component. Twelve weeks is just enough time for a mediocre job.
4. Qualifying translators
Why trust a translation, when you don’t know the translator? Think about it. This is especially true with a sensitive psychometric instrument. Who, exactly, is translating the measure that will generate your data? You’d be amazed who translates highly technical ClinROs.
Qualify your translators – ask for the CVs (build that into your scope of work requirements), and take the time to set criteria that will help you hire them just like you would hire a site principal investigator. The fields of psychology and linguistics have done this work already. No need to reinvent the wheel.
Your translation company is not the translator, so your relationship with them is not relevant here. They are the project management team.
As it is, you’re already committing a sizable chunk of your budget to the service, and to enhancing the value of intellectual property that doesn’t belong to you. It’s counter-intuitive from a business standpoint, but in the end, you own the data and the quality thereof, which can be more valuable than the scale itself.
5. Validating translations
Do not skip the validation part of the Linguistic Validation process.
It’s part of the bare-bones process for a really good psychometric reason, and it’s equally important for the integrity of your data.
Linguistic validation in itself is an extremely minimalistic approach to adapting a psychometric instrument to a new cultural group. It’s practical for the pharma industry, but keep in mind that this type of translation already adds noise to your data set. By psychometric inference, not validating your translation makes the noise in your data set even louder.
6. Changing the test-retest time frame
Do not change the re-test time frame without testing it for sensitivity to the change you just made to it. You cannot back up the psychometric integrity of that change without testing it. If you don’t test and the scale isn’t sensitive to your new re-test time frame, then you’re losing important data that can crash your trial.
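Quantifying that sensitivity usually starts with a test-retest reliability estimate: a correlation between scores from the two administrations. An intraclass correlation is the more rigorous choice; a Pearson correlation is the quick first look. A sketch with invented scores, purely for illustration:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between paired score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Invented total scores for 6 subjects at test and re-test.
test   = [22, 15, 30, 18, 25, 27]
retest = [21, 16, 29, 20, 24, 28]

# A correlation well below the conventional ~0.7-0.8 range would
# flag that the new re-test time frame may be degrading reliability.
print(round(pearson_r(test, retest), 2))
```

The thresholds are conventions, not regulatory rules; the point is that changing the time frame without re-estimating reliability leaves you with no number to defend.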
7. Changing the items/questions
Never change any of the items without testing them for equivalence in test-taker comprehension and relevance to the construct. Validation is essential here. In some countries, the construct will be different and difficult to rate if not adapted.
8. Changing the clinical setting
Do not change the clinical setting. For example, if the scale was designed for inpatient groups and validated on that group, don’t use it on outpatient groups.
9. Changing the subject group
Do not change the target subject group without adaptation and validation. For example, if the scale was tested on outpatients with major depressive disorder in the U.S., don’t use it on outpatients with major depressive disorder in India, China or Russia (to name some).
Clinical training, medical judgment and culturally driven symptom acknowledgment – without translation and adequate calibration training – will introduce some pretty funky scores.
10. Knowing what version of the scale you have
If you don’t know where the version of the scale in your possession came from, or why the items on your version are modified from the source version, don’t use it. Again, you cannot back up the psychometric integrity of that version, in case the FDA requests it.
Get rid of it and find the original – and hopefully validated – version. Use that instead, and/or adapt it and test it.
If you do just these 10 things starting today, you will do what nearly all of our customers were not doing, simply because they didn’t know how important taking this kind of action is.
We are overthinking where the failure happens. It happens most blatantly within the process loopholes and shortcuts that are sometimes, unfortunately, approved as “ok” by inexperienced or passive service providers on whom sponsors rely for advice, and whose priorities are not aligned with the sponsors’.
There will always be a myriad of reasons for trial failure for as long as humans make decisions about the myriad of actions that must be taken to get the study done. But all of this, my colleagues, is a gap between success and failure in clinical trials that we can actually close.
Do this. Today. And when you get stuck and need solid unbiased advice, now you know where to find me:
I am and shall remain (here) at Santium, proudly peddling rating scales in pharma!
(Originally published on June 7, 2017 on LinkedIn.com)