Alyson Essex (SAMHSA) said that SAMHSA’s preliminary recovery measurement pilot study was designed as part of the agency’s recovery support strategic initiative. The goal of the pilot was to test recovery measures for potential use with populations in SAMHSA’s discretionary grants program. This effort is largely separate from the work conducted to expand the collection of data on recovery in the general population, but it is highly relevant.
Essex said that before her tenure at SAMHSA, there had been discussions about using the brief version of the World Health Organization Quality of Life Instrument (WHOQOL-BREF; see Chapter 5) in the pilot study. The concern was the length of the instrument. She said that after conversations with the World Health Organization, the idea of using a new tool called the WHOQOL-8 emerged. The WHOQOL-8 covers four domains: physical health, psychological health, social relationships, and environment. This instrument was identified as best capturing the four dimensions of recovery that have been discussed: health, home, purpose, and community.
Preliminary work on the WHOQOL-8 has been done in 10 countries, and the scale demonstrates good psychometric properties. SAMHSA is the first group to test this tool in the U.S. population.
The WHOQOL-8 includes two questions on overall health: “How would you rate your quality of life?” and “How satisfied are you with your health?” There are also two physical health questions: “Do you have enough energy for everyday life?” and “How satisfied are you with your ability to perform your daily activities?” There is one psychological question: “How satisfied are you with yourself?” There is one question on social relationships: “How satisfied are you with your personal relationships?” Finally, the scale contains two questions that measure the environment: “Have you enough money to meet your needs?” and “How satisfied are you with the conditions of your living space?”
Essex said that, building partly on the WHOQOL-8, SAMHSA developed what the agency calls a recovery measurement package. The package includes the WHOQOL-8; Government Performance and Results Act (GPRA) measures on alcohol use, drug use, and mental health recovery, which were selected on the basis of a workgroup meeting; and one measure of empowerment.
The goal of the pilot study was to collect data from 300 clients in SAMHSA’s discretionary grant programs and conduct psychometric testing. The initial design called for a longitudinal self-administered survey: data would be collected as part of the standard GPRA data collection at intake, followed by another survey 6 months later. Essex said that despite several activities to promote the study to grantee representatives, participation was lower than anticipated because grantees were concerned about the additional burden the study would impose on staff and clients. As a result, SAMHSA decided to conduct a one-time baseline data collection instead of the intended longitudinal study.
Data were collected between February and July 2015 from 171 individuals at 14 grantee sites. Participants were enrolled in one of three SAMHSA grant programs, which focused on housing and recovery services for individuals experiencing chronic homelessness; expansion of infrastructure to integrate co-occurring and housing services; and integrated primary and behavioral health care for individuals with serious mental illness.
In terms of psychometric testing, Essex said that because of the small sample size, the agency was not able to use advanced techniques such as structural equation modeling or confirmatory factor analysis. Instead, SAMHSA used principal components analysis, which indicated that the eight-item measure was unidimensional, as hypothesized. The scale had a coefficient alpha of 0.848, indicating a high degree of internal-consistency reliability.
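Coefficient alpha compares the sum of the individual item variances with the variance of the total score, and it approaches 1 when items vary together. As an illustration only, the sketch below computes it from a respondents-by-items matrix. The function name and the ratings are hypothetical, and this is not the code SAMHSA used.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient (Cronbach's) alpha for a respondents-by-items score matrix.

    responses: list of rows, one per respondent, each holding k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    Population variances are used throughout; because the same divisor appears
    in the numerator and denominator, the n vs. n - 1 choice cancels out.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variance_sum = sum(pvariance(item) for item in zip(*responses))
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 1-5 ratings from four respondents on three items.
example = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 2]]
print(round(cronbach_alpha(example), 3))
```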
SAMHSA also conducted psychometric testing on all 21 items included in the recovery measurement package, and the results provided conditional evidence that the items were measuring one underlying construct. Two items had loadings low enough to suggest that they were not strongly associated with the underlying construct: self-efficacy for managing one’s health care needs and enrollment in a job or training program. The coefficient alpha for this 21-item scale was 0.745, which is in the acceptable range for scales in early development.
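Item loadings of the kind described here can be read off the first principal component of the item correlation matrix. The following minimal sketch, assuming items are stored as columns of a respondents-by-items list and that no item is constant, finds that component by power iteration and flags items below a cutoff. The 0.40 cutoff and all function names are assumptions for illustration, not values or methods reported by SAMHSA.

```python
from math import sqrt
from statistics import mean, pstdev


def correlation_matrix(responses):
    """Pearson correlations between item columns (items must not be constant)."""
    columns = list(zip(*responses))
    n = len(responses)
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(x - m) / s for x in col])  # standardized scores
    k = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]


def first_component_loadings(responses, iterations=500):
    """Loadings on the first principal component, found by power iteration.

    Returns (loadings, eigenvalue); loading = eigenvector entry * sqrt(eigenvalue).
    """
    r = correlation_matrix(responses)
    k = len(r)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * r[i][j] * v[j] for i in range(k) for j in range(k))
    sign = 1.0 if sum(v) >= 0 else -1.0  # fix the arbitrary sign of the eigenvector
    return [sign * x * sqrt(eigenvalue) for x in v], eigenvalue


def weak_items(loadings, names, cutoff=0.40):
    """Names of items whose absolute loading falls below a rule-of-thumb cutoff."""
    return [name for name, loading in zip(names, loadings) if abs(loading) < cutoff]
```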
Essex said that although the results were positive, the findings must be interpreted with caution, given the small sample size. She added that information gleaned from the survey provides a framework for a more robust study and analysis. SAMHSA is leaning toward supporting the use of the WHOQOL-8 with its discretionary grantees, but additional testing will be needed for the other 13 questions in the package. The agency is also considering customizing the tool for specific SAMHSA grantee populations and exploring the development of an adolescent recovery measure.
Kim Mueser (Boston University) commented that these types of scales can behave differently depending on a person’s specific type of mental disorder. For example, people with schizophrenia spectrum disorders tend to rate themselves higher on functioning items than people with mood disorders, even though objective measures of the same individuals indicate that they are functioning at a somewhat lower level. Because of these differences, if a pilot test does not include a sufficiently large sample of people with disorders such as schizophrenia, generalizability could be a concern.
Michael Dennis (Chestnut Health Systems) said that he and his colleagues had also conducted psychometric testing of the WHOQOL-8, using a sample of 480 women being released from jail. The sample included women with both substance use and mental health issues, and about 10 percent had serious mental illness. The sample was followed for 3 years. Dennis said that the researchers observed differences similar to those Mueser described. However, the WHOQOL-8 worked well within subjects.
Essex asked whether Dennis had included questions other than the WHOQOL-8 in his study. Dennis said that he and his colleagues included the same questions that were in the original SAMHSA proposal, and the additional questions increased the effect size. He added that the scale also worked well in distinguishing people in the community who had no substance use, abuse, or dependence.
Mueser noted that improvements in housing stability could have a big impact on the findings. He said that among people with severe mental illness, the biggest improvement in general life satisfaction is typically seen when they move from being homeless, in jail, or in the hospital to stable housing in the community. Dennis said that he and his colleagues have not yet analyzed their data in a way that would allow them to examine this issue. Essex said that the SAMHSA sample included a large proportion of people experiencing homelessness, and the reliability of the 21-item tool increased when the home item was removed.
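The pattern Essex describes, reliability rising when one item is dropped, is usually checked with an “alpha if item deleted” table. A minimal sketch, assuming the same respondents-by-items layout and using hypothetical item names (including a stand-in “home” item), recomputes alpha with each item removed in turn. The data are made up; they are not SAMHSA’s.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha for a respondents-by-items score matrix."""
    k = len(responses[0])
    item_variance_sum = sum(pvariance(item) for item in zip(*responses))
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def alpha_if_item_deleted(responses, item_names):
    """Map each item name to the alpha of the scale recomputed without that item.

    Needs at least three items so that every reduced scale still has two.
    """
    if len(item_names) < 3:
        raise ValueError("need at least three items")
    result = {}
    for idx, name in enumerate(item_names):
        reduced = [row[:idx] + row[idx + 1:] for row in responses]
        result[name] = cronbach_alpha(reduced)
    return result


# Hypothetical data: the "home" item is unrelated to the other two.
rows = [[1, 1, 1], [2, 2, -1], [3, 3, -1], [4, 4, 1]]
print(cronbach_alpha(rows))
print(alpha_if_item_deleted(rows, ["item_a", "item_b", "home"]))
```

In this toy example, dropping the unrelated item raises alpha, which is the signature of an item that does not share the construct measured by the rest of the scale.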
Alexandre Laudet (Center for the Study of Addictions and Recovery, National Development and Research Institutes, director emeritus) noted that the psychometric properties from the tests conducted in various countries look very strong. She said she hoped that the instrument would work equally well in the context of addiction research. She had recommended the use of the WHOQOL-BREF (see Chapter 5), but a shorter version would be much more practical to administer, and, along with the substance use items, this set could provide all of the data that are needed.
Dennis began his discussion of the tradeoffs among data collection designs by describing some common strategies for measuring recovery: duration questions, multiple intervals or recency, event history, and repeated measures.
Duration Questions: One option for measuring recovery is to ask duration questions, which can provide data on (1) the prevalence of various durations of abstinence or remission and (2) changes in the facets of recovery over that duration, which taps into the idea of recovery as a process. The main advantage of this approach is its very low respondent burden. The disadvantage is that such questions can be subject to recall bias.
Multiple Intervals or Recency: Another strategy for measuring recovery is to collect data for multiple intervals or on recency. This approach provides data on (1) the prevalence of various durations, (2) changes in facets, (3) the number and pattern of episodes, and (4) trajectories and trends. The main advantages of this approach, Dennis said, are that it allows researchers to capture a clear, clinical definition of remission and that respondent burden is only moderate. Recall bias is still possible, however, and the data can be combined in only a limited number of ways, depending on how many periods are asked about.
Event History: Event-history questions collect dates for key events, such as when abstinence began and ended and when treatment began and ended. This approach can provide information on (1) the prevalence of various durations, (2) changes in facets, (3) the number and pattern of episodes, and (4) trajectories and trends. The main advantage of this strategy, Dennis pointed out, is that the data can be summarized in multiple ways; for example, researchers can create summary measures for different time periods. This strategy, too, can be subject to recall bias, he noted, and respondent burden increases as more measures are added.
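To make the summarizing step concrete, here is a minimal sketch, assuming each respondent’s event history is stored as (person, kind, start date, end date) records with an empty end date for an episode still ongoing at the interview. It derives episode counts, total days, and the current duration of an ongoing episode; the record layout and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def summarize_event_history(episodes, as_of):
    """Summarize dated episodes into counts and total days per person and kind.

    episodes: iterable of (person_id, kind, start_date, end_date) tuples, where
    end_date is None for an episode still ongoing at the interview date as_of.
    Ongoing episodes are counted up to as_of.
    """
    summary = defaultdict(lambda: defaultdict(lambda: {"episodes": 0, "days": 0}))
    for person, kind, start, end in episodes:
        stop = end if end is not None else as_of
        summary[person][kind]["episodes"] += 1
        summary[person][kind]["days"] += (stop - start).days
    return summary


def current_duration_days(episodes, person, kind, as_of):
    """Days since the start of the person's ongoing episode of this kind (0 if none)."""
    for p, k, start, end in episodes:
        if p == person and k == kind and end is None:
            return (as_of - start).days
    return 0


# Hypothetical respondent with two abstinence episodes and one treatment episode.
history = [
    ("p1", "abstinence", date(2014, 1, 1), date(2014, 3, 1)),
    ("p1", "treatment", date(2014, 5, 1), date(2014, 5, 31)),
    ("p1", "abstinence", date(2014, 6, 1), None),
]
summary = summarize_event_history(history, as_of=date(2015, 6, 1))
print(dict(summary["p1"]))
print(current_duration_days(history, "p1", "abstinence", date(2015, 6, 1)))
```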
Repeated Measures: Collecting data prospectively more than once is the most elaborate design option. It allows researchers to examine patterns of change within individuals and evaluate predictors of transition. This method has the lowest potential for recall bias, but a study of this type can be logistically more difficult to conduct.
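With repeated measures, transitions can be counted directly within individuals. The sketch below is a simplification that assumes a first-order Markov model with the same probabilities at every wave; it estimates wave-to-wave transition probabilities from a hypothetical panel of status labels. The status names and function name are illustrative.

```python
from collections import Counter, defaultdict


def transition_probabilities(panel):
    """Estimate wave-to-wave transition probabilities from repeated measures.

    panel: dict mapping person id -> list of statuses observed at successive waves.
    Pools every adjacent pair of waves, which assumes transition probabilities are
    the same at every wave (a first-order, time-homogeneous Markov assumption).
    Returns {from_status: {to_status: probability}}.
    """
    counts = defaultdict(Counter)
    for statuses in panel.values():
        for before, after in zip(statuses, statuses[1:]):
            counts[before][after] += 1
    return {
        before: {after: n / sum(c.values()) for after, n in c.items()}
        for before, c in counts.items()
    }


# Hypothetical three-wave panel for two people.
panel = {
    "a": ["using", "treatment", "recovery"],
    "b": ["using", "using", "recovery"],
}
print(transition_probabilities(panel))
```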
Dennis noted two crosscutting issues with implications for study design. One is multi-morbidity, which is common and can lead to specification errors when researchers estimate effects. He also argued that there is a great need to study service utilization and costs, because services in these areas are underfunded and it is important to demonstrate their value.
Dennis discussed the advantages and disadvantages of the National Survey on Drug Use and Health (NSDUH) as a vehicle for measuring recovery. One advantage is that the NSDUH has a very large cross-sectional sample that supports estimates by state planning district. It measures the prevalence, recency, and frequency of substance use, as well as past-year substance use disorder symptoms by substance. The survey also collects data on some symptoms of mood disorders and on prior diagnoses of mood or anxiety disorders, and it includes several measures of past-year service utilization.
One disadvantage of the NSDUH as a potential vehicle for measuring recovery is that the current survey lacks data on duration of abstinence, multiple time periods, event history, or repeated measures for substance use or other mental disorders. Although some mental disorders have been measured periodically or as part of substudies linked to the NSDUH, the survey does not regularly collect data on internalizing disorders, such as anxiety, trauma, and suicide, or on externalizing disorders, such as attention deficit disorder, hyperactivity, gambling, and impulse control. In addition, the NSDUH has no data on multi-morbidity, and it does not measure quality of life. Finally, Dennis pointed out that the NSDUH’s data on service utilization and costs could be more comprehensive.
Dennis also discussed the SAMHSA GPRA measures, noting that there are separate measures for substance use and mental health. The substance use data on individuals served by grants are typically collected at intake and at 6 months, and from patient records at discharge. The measures include detailed days of substance use by substance in the past month and days of mental health problems by symptom. Self-reported data are also collected on past-month days of service utilization in 12 areas: substance use, mental health, and physical health services in outpatient, inpatient, and emergency department settings; days of medication; and arrest and incarceration. Medical records data are collected for treatment episodes in more than 40 areas. The GPRA measures also include a lifetime trauma symptom screener and past-30-day social connectedness measures. Dennis said that a particularly problematic characteristic of these measures is that the self-report items refer only to the past 30 days. Although data are obtained from patient records at discharge, the grantee records do not contain information on treatment or services received from other sources. He added that the GPRA data collection instrument for substance use is long and has many redundant items.
The schedule for GPRA data collected on individuals served by mental health grants is similar to that for the substance use data: information is obtained at intake, at 6 months, and at discharge. The data collection includes past-month Likert-type measures of functioning, substance use, depression and trauma symptoms, perception of care, and social connectedness. In addition, there are yes/no questions on 20 types of service utilization during the treatment episode. Dennis noted that yes/no questions on service use cannot capture change or important distinctions, such as that between 1 day and 10 days in a hospital. Like the GPRA substance use measure, the mental health measure also lacks self-reported information about services received during key periods.
Dennis noted that neither GPRA data collection instrument includes measures of substance use disorder, although, as noted, some data on substance use are collected. Neither includes a scale or calculation of multi-morbidity, and neither includes quality-of-life measures. He added that the GPRA measures have no published psychometric properties, no mapping onto the existing literature, and no linkages to other measures or to NSDUH norms.
Dennis next described in more detail the data collection approaches he had listed earlier and provided some examples from his own work. In one study, he and his colleagues examined how the duration of abstinence predicts the risk of relapse in the following year.1 Using an event-history approach, the researchers asked participants at the 7-year interview how long they had been abstinent. They found that among people who had been abstinent for 1-12 months at year 7, 64 percent had relapsed by year 8. Among those who had been abstinent for 1-3 years, only 35 percent had relapsed by year 8, and among those abstinent for 4-7 years, only 14 percent had. Dennis said that abstinence duration is measured with a single item, yet the results delineate a process. Consistent with other research, Dennis and his colleagues found that the turning point seemed to come at around 3 years of abstinence, when one begins to see some stability.
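The single-item analysis reduces to grouping respondents by reported duration and computing the share who relapse. A minimal sketch follows; the month cut points for the three bands, the function name, and the sample records are assumptions for illustration, not the study’s data.

```python
# Duration bands from the talk, expressed in months; the exact cut points
# (12 and 36 months) are an assumption of this sketch.
BANDS = [("1-12 months", 1, 12), ("1-3 years", 13, 36), ("4-7 years", 37, 84)]


def relapse_by_duration(records):
    """records: iterable of (months_abstinent_at_year_7, relapsed_by_year_8) pairs.

    Returns {band label: (number of respondents, share relapsed or None)}.
    Durations outside every band are ignored.
    """
    tallies = {label: [0, 0] for label, _, _ in BANDS}  # [n, number relapsed]
    for months, relapsed in records:
        for label, low, high in BANDS:
            if low <= months <= high:
                tallies[label][0] += 1
                tallies[label][1] += int(relapsed)
                break
    return {label: (n, r / n if n else None) for label, (n, r) in tallies.items()}


# Hypothetical records: (months abstinent at year 7, relapsed by year 8).
records = [(6, True), (10, False), (24, False), (30, True), (60, False)]
print(relapse_by_duration(records))
```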
However, Dennis noted that recovery is not just about abstinence. To examine how the duration of abstinence is related to other aspects of recovery, the researchers looked at changes across three periods of abstinence in the same study. In the first 12 months of abstinence, they found more clean and sober friends; less illegal activity and incarceration; less homelessness, violence, and victimization; and less substance use by others at home, at work, and among social peers. Between 1 and 3 years of abstinence, the researchers found the virtual elimination of illegal activity and illegal income, better housing and living situations, and increasing employment and income. Years 4-7 of abstinence were characterized by more social and spiritual support, better mental health, continued improvement in housing and living situations, a dramatic rise in employment and income, and a dramatic drop in the proportion of people living below the poverty line.
1Dennis, M.L., Foss, M.A., and Scott, C.K. (2007). An eight-year perspective on the relationship between the duration of abstinence and other aspects of recovery. Evaluation Review, 31(6), 585-612.
Dennis argued that instead of trying to define recovery with a single number, which other speakers had ruled out as a reasonable option, simply adding a duration item to a data collection instrument makes it possible to convey the richness of how recovery changes over time. He added that this approach can also illustrate the duration of remission from substance use disorder, where most of the change happens in the first 1-3 years after remission.
As others have discussed, Dennis said, asking about multiple intervals or recency can involve questions about lifetime addiction followed by questions about the past year, or questions about the past year only. In an as-yet unpublished study, he used data from the National Comorbidity Survey to calculate the prevalence of remission among those with a lifetime disorder. Dennis noted that how the diagnostic categories are collapsed makes a difference. Because of comorbidity, remission rates for drug use and for alcohol use, each considered separately, are higher than the remission rate for substance use overall: a person with both problems who has remitted from only one is counted as remitted for that one but not for the combined category. The same is true for remission from certain mental disorders, such as an overall category of anxiety compared with specific types of anxiety.
Dennis added that the more disorders are included when studying remission, the fewer people will be categorized as being in remission. Failing to account for this, especially when using community data, can lead to the incorrect impression that most people are getting better, or are getting better without treatment. The odds of getting better are lower for people with three or four disorders than for those with one disorder, and treatment and services are more important for them.
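The collapsing effect can be reproduced with a few made-up respondents. In the sketch below, a person counts as remitted in the combined category only if every lifetime disorder is in remission. The data and function name are hypothetical; in this particular sample the combined rate (0.4) falls below both the alcohol rate (0.5) and the drug rate (about 0.67).

```python
def remission_rates(people):
    """Remission rates by disorder and for the combined ("any") category.

    people: list of dicts, one per person with at least one lifetime disorder,
    mapping each lifetime disorder to True (remitted) or False (still present).
    A person counts as remitted in the combined category only if every one of
    their lifetime disorders is in remission.
    """
    rates = {}
    disorders = sorted({d for person in people for d in person})
    for d in disorders:
        having = [person[d] for person in people if d in person]
        rates[d] = sum(having) / len(having)
    rates["any"] = sum(all(person.values()) for person in people) / len(people)
    return rates


# Hypothetical respondents; two of them have both disorders.
sample = [
    {"alcohol": True},
    {"drug": True},
    {"alcohol": True, "drug": False},
    {"alcohol": False},
    {"alcohol": False, "drug": True},
]
print(remission_rates(sample))
```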
Dennis said that event-history measures can record frequency, quantity of use, or problems by specific calendar date. The method can capture start and end dates for episodes of abstinence, treatment, incarceration, or other events in a log format. The data can then be used to approximate repeated measures by summarizing across multiple combinations of time periods (e.g., rates per week or year). One limitation of event-history measures is that they are typically time-consuming to collect, and the more dimensions are measured, the higher the burden on respondents. In addition, it can be difficult to establish the correct temporal order and get the timing of predictors right unless this information is also collected with the same event-history grid. Dennis said that this can be done to answer a specific question, but the burden can be very high.
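Approximating repeated measures from a daily event-history log amounts to averaging the log over whatever periods are of interest. A minimal sketch, assuming a log keyed by calendar date with a 0/1 value for each day, summarizes it by ISO week or by calendar year; the log and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_key(day):
    """ISO (year, week number) for a calendar date."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def year_key(day):
    """Calendar year for a date."""
    return day.year


def rate_by_period(daily_log, period_key):
    """Average daily value within each period of an event-history log.

    daily_log: dict mapping date -> value recorded for that day (for example,
    1 if the substance was used and 0 if not). Only days present in the log
    are counted in each period's denominator.
    """
    totals = defaultdict(float)
    days = defaultdict(int)
    for day, value in daily_log.items():
        key = period_key(day)
        totals[key] += value
        days[key] += 1
    return {key: totals[key] / days[key] for key in totals}


# Hypothetical two-week log: use on 3 of the first 7 days, then every day.
start = date(2015, 2, 2)
log = {start + timedelta(days=i): int(i in (0, 2, 4) or i >= 7) for i in range(14)}
print(rate_by_period(log, week_key))
print(rate_by_period(log, year_key))
```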
Dennis argued that repeated measures are the sine qua non for studying change over time. The follow-up could be limited to people with certain disorders and perhaps a random sample of people who do not have disorders. Studying change at the cohort level without repeated measures can lead to an ecological fallacy. For example, the group may appear to improve steadily, particularly around the time of treatment, but what is being observed is the group mean. At the individual level, one study found that more than one-half of participants changed status each year among relapse, incarceration, treatment, and recovery.2 Dennis added that once the focus is on individual patterns, it is possible to assign probabilities to specific transitions. For example, the probability of moving from using in the community to being in recovery decreases as the number of mental health problems increases, and the probability of sustaining abstinence increases as the number of sober friends increases. Dennis said that it is difficult to understand the process and see the influence of the various factors that affect recovery unless repeated measures are used.
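The ecological fallacy can be illustrated with a toy panel. In the hypothetical data below, the group prevalence of recovery is identical at every wave even though every individual changes status each time, which only repeated measures on the same people reveal. The function names are illustrative.

```python
def prevalence_by_wave(panel, status):
    """Share of the panel in the given status at each wave (a group-level view)."""
    n_waves = len(next(iter(panel.values())))
    return [
        sum(statuses[w] == status for statuses in panel.values()) / len(panel)
        for w in range(n_waves)
    ]


def share_of_transitions_with_change(panel):
    """Share of person-level wave-to-wave transitions in which status changed."""
    changed = total = 0
    for statuses in panel.values():
        for before, after in zip(statuses, statuses[1:]):
            total += 1
            changed += before != after
    return changed / total


# Hypothetical panel: the group prevalence of recovery never moves,
# yet every individual changes status at every wave.
panel = {
    "a": ["recovery", "using", "recovery"],
    "b": ["using", "recovery", "using"],
}
print(prevalence_by_wave(panel, "recovery"))
print(share_of_transitions_with_change(panel))
```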
Dennis reiterated that multi-morbidity is an important consideration when studying recovery. Looking at the prevalence of several common past-year problems in the 2011 NSDUH, Dennis noted that 60 percent of the U.S. population has one or more of the following problems: any health problem, any missed work, any mental health problem, any substance use disorder, any school problem, any justice system involvement, or any violence. Twenty percent had two or more of these problems in the past year. By contrast, most clients in treatment present with three or more problems of this type. In other words, Dennis said, the two populations are not similar, and generalizations from one to the other might not hold. In addition, substance use disorder severity is strongly related to multi-morbidity: co-occurring disorders are approximately 26 percent more likely among people with severe substance use disorder. Finally, multi-morbidity is also related to health care utilization and costs. Dennis added that the National Institutes of Health’s common data workgroup3 recommended a common set of 15 measures of service utilization (from the Global Appraisal of Individual Needs) and quality of life (from the EQ-5D instrument) that already have extensive norms.
2Scott, C.K., Foss, M.A., and Dennis, M.L. (2005). Pathways in the relapse-treatment-recovery cycle over 3 years. Journal of Substance Abuse Treatment, 28(2), S63-S72.
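Counting problems per respondent is the basic calculation behind these multi-morbidity figures. The sketch below uses hypothetical shorthand labels for the seven past-year problems and optional survey weights; it does not reproduce the NSDUH estimation procedure, which also involves the survey’s complex sample design.

```python
# Shorthand labels (hypothetical) for the seven past-year problems.
PROBLEMS = frozenset({
    "health", "missed_work", "mental_health", "substance_use_disorder",
    "school", "justice", "violence",
})


def multimorbidity_shares(respondents, weights=None):
    """Weighted share of respondents reporting at least 1, 2, and 3 problems.

    respondents: list of sets of problem labels reported for the past year.
    weights: optional survey weights (defaults to 1 for everyone). Labels not
    in PROBLEMS are ignored.
    """
    if weights is None:
        weights = [1.0] * len(respondents)
    total = sum(weights)
    counts = [len(PROBLEMS & set(r)) for r in respondents]
    return {
        k: sum(w for c, w in zip(counts, weights) if c >= k) / total
        for k in (1, 2, 3)
    }


example = [set(), {"health"}, {"health", "violence"},
           {"substance_use_disorder", "justice", "mental_health"}]
print(multimorbidity_shares(example))
```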
Dennis concluded his presentation by summarizing his main points and offering some suggestions for SAMHSA. He said that recovery is a process and that it is important to understand how long it lasts and how its facets change over time. Measuring lifetime remission is feasible but requires at least two periods, with recency or repeated measures. Because people cycle through multiple periods of using, incarceration, treatment, and recovery, it is important to examine within-person change and the predictors of transition. Multi-morbidity is important to measure and understand because it is common and affects rates of remission, service utilization, and cost. Finally, there is a need for more integration, norms, and cross-validation of the NSDUH and GPRA measures to better support program evaluation.
Dennis said that the number of items needed to measure recovery is a concern, especially if the NSDUH is being considered as a potential data collection vehicle. One way to address this concern is to administer the questions to only a subset of the sample, oversampling those who have disorders and are likely to require services. Instead of screening for one disorder at a time, SAMHSA could consider screening for classes of disorders. Dennis pointed out that the 20-item Global Appraisal of Individual Needs Short Screener can identify 90 percent of the people who have a disorder and rule out 90 percent of those who do not (that is, a sensitivity and a specificity of about 90 percent). As discussed, a longitudinal component would also be useful.
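A screener’s sensitivity and specificity do not by themselves say how many screen-positive respondents actually have a disorder. That share, the positive predictive value, also depends on prevalence, which matters when deciding who receives follow-up items. A minimal sketch, using the 90 percent figures cited above and illustrative prevalence values that are assumptions:

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity (share of people with a disorder who screen positive) and
    specificity (share of people without one who screen negative)."""
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of screen-positive respondents who actually have a disorder."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Illustrative prevalences only; 0.90/0.90 mirrors the figures cited for the screener.
for prevalence in (0.1, 0.3, 0.5):
    print(prevalence, round(positive_predictive_value(0.9, 0.9, prevalence), 3))
```

At lower prevalence, a larger fraction of screen-positives are false positives, so a two-stage design would route more people without a disorder into the longer follow-up module.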
In terms of specific measures, Dennis said that he would add a one-item symptom duration question, as well as questions on the recency of symptoms of substance use disorder and of internalizing and externalizing disorders. He said that this could be accomplished with approximately 20 questions, with more questions asked if the disorder is severe. A quality-of-life measure is also needed, Dennis said, whether the WHOQOL-8, the EQ-5D, or another instrument. Finally, he said he also considers it important to add questions on service utilization.
Wilson Compton (National Institute on Drug Abuse) asked why SAMHSA had piloted only the WHOQOL-8 and not the EQ-5D. Essex said that the decision to test the WHOQOL-BREF was made before she started working on the project, based on input from an expert panel. Once she came on board, the staff learned about the shorter version of the scale, the WHOQOL-8, and proceeded to pilot test that. Dennis added that the WHOQOL overlaps to some extent with the SAMHSA definition of recovery, whereas the EQ-5D measures only health-related quality of life and is very much focused on the absence of dysfunction.
Compton said that it would appear important to add quality-adjusted life years (QALYs). Dennis said that if one had U.S. norms and could collect the data, it could be done. Sherry Glied (New York University) said that the advantage of the EQ-5D is that it is widely used to create QALY measures, so it can be used across disorders, which would be ideal for policy purposes. Dennis commented that replacing the WHOQOL-8 with the EQ-5D would mean that dimensions such as life satisfaction would not be measured and would have to be captured some other way.
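QALYs weight time by a utility index, so a common calculation is the area under the utility curve between measurement points. A minimal sketch, assuming EQ-5D responses have already been converted to utilities with an appropriate value set (for example, U.S. norms) and assuming linear change between measurements; the function name and values are hypothetical:

```python
def qalys(times_years, utilities):
    """Quality-adjusted life years as the area under a utility curve.

    times_years: increasing measurement times in years (e.g., 0, 0.5, 1.0).
    utilities: utility index values at those times (1 = full health, 0 = dead),
    assumed to be already derived from EQ-5D responses with a value set.
    Uses the trapezoidal rule, i.e., linear change between measurements.
    """
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching times and utilities, at least two points")
    total = 0.0
    for i in range(1, len(times_years)):
        width = times_years[i] - times_years[i - 1]
        total += width * (utilities[i] + utilities[i - 1]) / 2
    return total


# Hypothetical utilities at intake, 6 months, and 12 months.
print(qalys([0.0, 0.5, 1.0], [0.6, 0.8, 0.8]))
```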