[Table fragment: median and 25th/75th percentile values (e.g. 4, 4/5; entries marked * and *** indicate significant differences) whose row and column labels were lost in extraction and cannot be reconstructed.]
Survey responses to questionnaires by profession
| | | | | | |
---|---|---|---|---|---|
3. My boss provides conditions for conducting safe care | Median, 25th/75th | 5, 5/5 | 4, 4/5 | 5, 4.75/5 | 4, 4/5 |
4. In my workplace, we learn from what works well | Median, 25th/75th | 5, – | 4, 4/4.5 | 4.5, 4/5 | 4, 4/5 |
5. In my workplace, we always act on the risks we see | Median, 25th/75th | 4, 4/4 | 4, 4/5 | 4, 4/5 | 4, 4/5 |
6. In my workplace, improvements are always made after negative events | Median, 25th/75th | 4, 4/4 | 4, 3/4 | 4, 4/5 | 4, 4/4 |
7. I point out when I think something is about to go wrong | Median, 25th/75th | 5, 5/5 | 4, 4/5 | 4, 3.75/5 | 5, 4/5 |
8. I dare to talk about my mistakes | Median, 25th/75th | 4, – | 4, 4/5 | 4.5, 4/5 | 4, 4/5 |
9. I am always well received at my workplace when I need help | Median, 25th/75th | 5, 5/5 | 4, 4/5 | 5, 3.75/5 | 4, 4/5 |
10. At my workplace, we have a well-functioning collaboration with other units | Median, 25th/75th | 4, 4/4 | 4, 3/4 | 4*¹, 4/4.25 | 4, 3/4 |
11. At my workplace, we adapt the work so that safety is maintained when conditions change | Median, 25th/75th | 4, 4/4 | 4, 4/4 | 5**¹ **², 4.75/5 | 4, 4/4 |
13. I would feel safe if a close relative was cared for at my workplace | Median, 25th/75th | 5, – | 4, 4/5 | 5, 4.75/5 | 4, 4/5 |
14. At my workplace, we offer parents/relatives the opportunity to be involved in our patient safety work | Median, 25th/75th | 3, 3/3 | 4, 3/4 | 3.5, 2.75/5 | 4, 3/4 |
The answer alternatives in the questionnaire were a five-point Likert scale, 1–5 (1 = I totally disagree, 5 = I fully agree) (Additional file 1). N is the number of answers. Values are the median and the 25th/75th percentiles. * = P < 0.05, ** = P < 0.01, *** = P < 0.001. *¹ and **¹: doctors compared with assistant nurses; **²: doctors compared with nurses
Due to the small number of respondents in the group “other occupational group”, the 25th and 75th percentiles are not reported for questions 4, 8, and 13
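The summary statistics reported in the table (median and 25th/75th percentiles of five-point Likert scores) can be derived as sketched below. The paper does not state which software was used; this is an illustrative sketch with a hypothetical response list, not the study's data.

```python
# Hedged sketch: deriving the table's summary statistics (median and
# 25th/75th percentiles) from raw five-point Likert responses.
# The response list below is illustrative, not data from the study.
from statistics import median, quantiles

def summarise(responses):
    """Return (median, 25th percentile, 75th percentile) for Likert scores."""
    q1, _q2, q3 = quantiles(responses, n=4, method="inclusive")
    return median(responses), q1, q3

# Hypothetical answers from one profession to one questionnaire item
item_scores = [4, 4, 5, 4, 5, 4, 4, 5]
med, p25, p75 = summarise(item_scores)
print(f"Median {med:g}, 25th/75th {p25:g}/{p75:g}")  # Median 4, 25th/75th 4/5
```

The `method="inclusive"` option treats the sample as the whole population of answers, which matches how quartiles of a complete survey group are usually reported.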
Survey responses to the question, “The Green Line reflections lead to learning”
| | | | | |
---|---|---|---|---|
Comparison/occasion | | | | |
Median, 25th/75th | 4, 3/4 | 4, 3/4 | 4, 3/4 | |
Comparison/profession | | | | |
Median, 25th/75th | 4, 4/4 | 3.5, 3/4 | 4, 4/4 | 4, 3/4 |
Comparison/year at the unit | | | | |
Median, 25th/75th | 4, 3/4 | 4, 3/4 | | |
The answer alternatives in the questionnaire were a five-point Likert scale, 1–5 (1 = I totally disagree, 5 = I fully agree) (Additional file 1). The comparisons are made across three occasions, by profession, and by years at the unit, respectively
N is the number of answers. Values are the median and the 25th/75th percentiles. * = P < 0.05, ** = P < 0.01, *** = P < 0.001
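The tables report P-values for between-group comparisons, but the footnotes here do not name the test. For ordinal Likert data a rank test such as the Mann–Whitney U test is a common choice; the sketch below hand-rolls one with midranks for ties and a normal approximation (illustrative only, and it omits the tie correction a statistics package would apply).

```python
# Hedged sketch: a Mann-Whitney U test for comparing Likert scores of two
# groups (e.g. two professions). Normal approximation, no tie correction.
from math import erf, sqrt

def midranks(values):
    """Rank all values (1-based), averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U statistic for group a, two-sided p via normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = midranks(list(a) + list(b))
    r1 = sum(ranks[:n1])                 # rank sum of group a
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return u1, p

# Hypothetical Likert answers from two professions
u, p = mann_whitney_u([5, 4, 4, 5], [4, 3, 4, 4])
```

A real analysis would use a library implementation (which also corrects the variance for ties); this sketch only shows the shape of the computation behind the starred entries.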
Demographic data for each of the interviews are presented in Table 8.
Demographic information of interviewed persons, their profession, years of employment at the unit and number of occasions they participated in the safety huddles
Participants | Profession | Year at the unit | Participation in safety huddles |
---|---|---|---|
P1 | Assistant nurse | > 10 years | 5–10 times |
P2 | Nurse | > 10 years | > 10 times |
P3 | Nurse | > 10 years | > 10 times |
P4 | Doctor | > 10 years | 5–10 times |
P5 | Nurse | ≤ 10 years | > 10 times |
P6 | Nurse | ≤ 10 years | 5–10 times |
P7 | Manager | ≤ 10 years | > 10 times |
P8 | Nurse | ≤ 10 years | 1–5 times |
P9 | Doctor | > 10 years | 1–5 times |
P10 | Manager | > 10 years | 5–10 times |
P11 | Nurse | > 10 years | > 10 times |
P12 | Assistant nurse | > 10 years | 5–10 times |
P13 | Assistant nurse | > 10 years | > 10 times |
P14 | Nurse | ≤ 10 years | 1–5 times |
a Interviewed persons cited in the article
The experiences reported were categorised according to the potentials described in resilience, i.e. respond, monitor, learn, and anticipate. There were many examples from learn and respond, fewer from anticipate and only one example from monitor.
The respondents gave examples of how they adapted to the conditions of work, for example by distributing patients and staff more evenly over the ward. NICU staff, both new and experienced ones, said that training was necessary to be able to respond properly to unusual situations.
“When a very sick child arrives, suddenly large parts of the staff disappear to care for that child. How did we then manage to take care of all the other patients? A parent could take care of their child in a way that we had not planned. Someone from the general paediatric unit came to help us…” (P3).
“Everyone knows this except me…then you realise that almost everyone wanted training…some things are unusual…” (P6).
This staff member indicated that she thought she was the only one interested in these issues, but the safety huddles showed that her concerns were shared by her colleagues. The respondents believed it was important to highlight examples in the safety huddles of how things worked out, and how they had managed and responded to different situations.
Only one experience emerged that can be traced to the monitor potential. The respondent described the importance of staff being with the patients and observing changes in the patient's status so they do not miss anything. “You should not miss anyone…you have to observe, it is how I think…we observe them” (P12). This experience is at the individual patient level; no experiences of the monitor potential at the system level were reported.
Experiences were described of learning from activities in daily work, from mistakes and learning from colleagues but it was described as difficult to learn from situations that had been resolved.
“Narcanti, we have two different dilutions. First a mistake was discussed in the Green Line and then it was close to a mistake again…it was a nurse who reacted before the drug was given, we had talked about that kind of problem before (in the safety huddles)” (P7).
“…we highlight the good examples and learn from them…what have we done well today?—Well, we have substituted for each other in coffee breaks, that was good…There were no real learning opportunities” (P5).
The learning in the safety huddles was mainly from negative events; there were very few examples from things that had gone well or from problems that were resolved. It seemed difficult to get an in-depth reflection on why situations were resolved in a good way.
The respondents experienced that everyday work was becoming more and more unpredictable and complex, and some examples came up in the safety huddles of how they anticipated problems to be better prepared to deal with them.
“….plan your day with the person you work with…lunch and everything…who should go first…otherwise…no one has a break” (P1).
“Say you need help, instead of saying today I did not get a break…don’t think that you should sort it out yourself” (P1).
Work is often unpredictable, but it is important to plan the day when it is busy, for example deciding who should go on a break first and when, and to ask for help instead of complaining afterwards. So even though anticipation was not an intended part of the safety huddle format, preparedness for difficulties in the coming work shifts was a subject raised, according to the respondents.
In both the analysis of interviews and the open answers from the questionnaire, two themes emerged, “Supporting factors” and “Hindering factors”. There was also an overlap between the codes; hence the results are reported together, since both sources were reflections on the same phenomenon. Three codes were found for the first theme and two for the second (Table 9, Additional file 8).
Themes and codes from the analysis of the interviews and from the questionnaire
Theme | Code |
---|---|
Supporting factors | Seeing benefits with reflection |
| Learning from what happens |
| Finding improvements for a rewarding reflection |
Hindering factors | Seeing difficulties with reflection |
| The impact of the work climate |
The theme “Supporting factors” describes the codes "Seeing benefits with reflection", "Learning from what happens" and "Finding improvements for a rewarding reflection".
All respondents mentioned that it was valuable to have reflections in general. The safety huddles offer an opportunity for those who do not speak up in any other context in daily work or at staff meetings at the unit. There is an opportunity to get confirmation that they have done the right thing, and they can get input from others' solutions.
On the basis of good examples from the reflections in the huddles it can be easier to address negative things. The safety huddles also support creating common values and cohesion in a unit where the employees are working in different areas.
“I really think it's great… It’s not only mistakes that should be noticed, you can learn a lot from each other, everyone has different experiences…” (P7). “It's such a scattered department you may not even see others throughout the shift… I think there will be a little more cohesion in the group because of reflections…” (P11).
Respondents said it is valuable to reflect on what happened during a day, and that the safety huddles improve the cohesion in the working group. Without reflections in the huddles it is difficult to get the opportunity to share experiences, because the unit is divided into different care rooms.
It was experienced as difficult to talk about and learn from things that went well; these positive experiences are taken for granted, and it was easier to talk about something negative. But the view was expressed that it was good to highlight and concretise things that went well so others could learn from them. There were also notes that negative comments were not taken seriously. Sometimes it was perceived as taboo to talk about when something went wrong.
“It is difficult because…for everything in life really, if you do not hear anything, then it is probably often good… you are only told the bad things” (P1).
“No one dared to say anything that was negative” (free comment survey 2020).
This may depend on expectations that only positive events should be addressed in the huddle, or on the working climate and the degree of openness. It is difficult to focus on what goes well when nothing negative happens, and the challenge is to talk about both the positive and the negative things instead of just the negative.
The respondents pointed out suggestions for improvements to make reflections in the patient safety huddles more useful and valuable. The role of the safety huddle leader was important; he/she needed to be interested, direct the conversations and believe that the reflection was important. It was good that the quality and patient safety developer sometimes facilitated the huddles. It was important to develop the method without changing to new methods or giving new names to the method; the method was just the tool. The huddles needed to be varied and inspiring, not static with too limited conversation. It was good to vary the questions for example with different focus topics.
“… Someone who is clear about the purpose and who agrees with the purpose, I think so, not just someone who is set to lead that reflection” (P9).“…they became more inspired… the (reflection) leader must have the ability to angle the questions” (P2). “It became more lively when we started using focus topics” (P11).
The safety huddles should be regular, short, objective and with the right focus. There were different opinions about the frequency: every day, twice a week or on demand. The safety huddles needed to be planned in the schedule, so that doctors could also participate, since they were better if all professions participated. It was also regarded as important that the managers were involved and supportive.
“…that the managers try to participate and are interested and also think it is important” (P3).
It seems necessary to clarify the purpose of the safety huddles and to find ways to spread lessons from them. The leadership role in the safety huddle is important, as is the leader's ability to elicit in-depth reflections. It is important to involve everyone and to schedule all professions so that they can participate in the safety huddles.
The theme “Hindering factors” describes the codes: "Seeing difficulties with reflection" and "The impact of the work climate”.
Some respondents said that safety huddles got stuck in the format, so that the format was more important than what was reflected on, which was perceived as inhibiting, and there was a need to clarify the purpose of the huddles. There were no learning opportunities and it was hard to keep them serious and focused. There was more focus on staff working hours and breaks than on the actual task of creating good care. When the reflection was based on what went well, it was often the same things that came up, which was not useful. It was difficult to find times that suited all employees to attend the huddles. Things that needed to come up were not discussed, and things that came up were not carried forward, since those who could answer were not present, which was frustrating.
“…There was often a lot of repetition, it was the same thing. And everything that becomes the same thing becomes very boring” (P2). “Feedback is given but stays there…it may need to reach other people…or make improvements…often it stays in the small group…and the challenge may continue to bother you” (P6).
Difficulties with the safety huddles were described; it was hard to keep the reflections serious and focused, there were no learning opportunities and the purpose was not clear. It was difficult to make improvements when not all professions participated.
The respondents felt that how easy it was to dare to reflect openly varied, depending on the situation and the constellation of participants. Comments emerged that the atmosphere during the safety huddles was sometimes not inviting and the conversations were superficial. There was a desire for an open and permissive climate, but the experience was that this was not always the case. The work in the unit meant that the employees were scattered across different care rooms and did not see each other during the working day if there was no opportunity to gather, for example during a safety huddle. There was a great deal of work experience and skill in the group; many had worked there for a long time, and it was difficult for a new person to dare to talk and to join the working group. The managers were important for creating the climate and supporting the Green Line reflections.
“We should have a slightly more open climate in our department…the attitude of some in the staff group may be…judges a little too easily sometimes” (P6). “…when you are new, you are invisible” (P5). “Managers must be involved in the Green Line project and support an open climate…” (P3).
The impact of the work climate, and the difficulties that arise when employees are scattered in individual care rooms throughout the day, were described. It was difficult to join a working group with a great deal of accumulated experience and skill. A safety huddle could lead to improved cohesion and community in the group but needed support from the managers.
This study evaluates the introduction of Safety-II inspired reflections in patient safety huddles for staff at a hospital ward. Thus, it is an attempt to draw empirical knowledge from interventions designed to operationalise changes based on Safety-II and resilience engineering principles [ 8 , 9 ]. Most respondents were positive towards safety huddles generally, but it was found that to really lead to learning and improvement, the format and support for a Safety-II inspired reflection needs to be developed and the purpose needs to be clarified further. There were different opinions about what was easy or difficult when performing the safety huddles. Our findings suggest these matters depended on the situation, who took part in the safety huddle and who led it. There were minor changes in some aspects of patient safety culture measurements over time during the study period. In the experiences discussed in the safety huddles, there were examples of the system potentials of resilience: learn, respond and anticipate, but only one of the potential monitor.
It was perceived as difficult to reflect on and learn from what was going well, the Safety-II perspective. The literature on learning from a Safety-II perspective is still sparse. In one study of an intervention based on written reports of things that had gone well, the number of reports was smaller than expected, problems getting staff engaged on a wide scale were discussed, and it was concluded that learning from how things go well is a simple yet compelling concept [ 16 ]. Our study supports this reflection; Safety-I learning took precedence over Safety-II in the safety huddles at the NICU, even though both can co-exist. We normally “see” when an adverse event takes place, but we do not “see” when an adverse event does not take place, when things go well [ 15 ]. Healthcare professionals are trained to see and report adverse events [ 2 ]. If they do not see what is going well, it is difficult for them to understand and describe it. In a study of nurses' experiences of the incident reporting culture after implementation of the Green Cross method, it was found that it was not good to focus only on things that went wrong, and it was suggested that health care would benefit from learning both from successes and errors [ 22 ]. In the present study it was found necessary that staff understand that shifting focus from Safety-I to Safety-II should include learning from both perspectives [ 23 ].
One goal of the Green Line reflections was to support learning. Adults learn what they experience as meaningful, they take as much responsibility as they are interested in, and they do not get involved if they do not see any meaning to what they are learning [ 24 ]. Leadership is important to create good conditions and a permissive climate for learning [ 25 ]. Our study supports the view that the role of managers is important; reflections in patient safety huddles need support from clear leadership by the managers at the unit, and the purpose of the safety huddles needs to be constantly clarified [ 14 ]. Managers at the clinical level are central to the system’s capacity for expressing resilience but they need more models and training in how to approach their work [ 25 ]. Managers need to continuously follow up an intervention to reinforce commitment for a change to be fully accepted and established in the workplace [ 26 ]. A development-oriented leadership where managers support employees' learning as part of development can be successful. The manager's role is to clarify expectations, prioritise development issues, create resources, and to follow up [ 27 ]. In this study, there were shortcomings in how the improvement work was followed up in the long term.
The impact of the work environment is central. Psychological safety, a belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns or mistakes, is important in a workplace to support tolerance and openness [ 28 ]. Tolerance and openness in a workplace help patient safety huddles to be perceived as rewarding, so that they support learning based on reflections on both negative and positive events. The desire is to improve the work environment based on experiences expressed in the reflections; but there also needs to be a good work environment to encourage people to risk sharing experiences. There was a significant change between October 2019 and December 2020 regarding whether the employees pointed out when something was about to go wrong (with a lower rating in December 2020). Apart from aspects of psychological safety that were expressed in the interviews, this may also be explained by the management changes, and by changes in the format of and support for the safety huddles. In an NICU, collective learning based on safety reporting and accumulated knowledge in prioritizing and performing the work may be difficult, since the work is performed inside individual care rooms, as has been pointed out by Hybinette et al. [ 25 ]. To see each other and share experiences may contribute to psychological safety. Safety huddles for all staff can offer an opportunity to share experiences and increase knowledge in a unit where work is dispersed, as in a NICU.
In this study, the Safety-II inspired safety huddles were found sometimes to be worthwhile and sometimes not. They turned out differently depending on who led the reflections, their experience in doing so, their knowledge of the theoretical background, and their ability to get an in-depth reflection. How open and tolerant the participants and the work climate were, was also of importance. Furthermore, it was appreciated if someone else from outside the unit, for example the quality and patient safety developer, sometimes led the reflection to ensure it was deeper. In a study of the Green Cross method in healthcare (i.e. Safety-I), Schwarz et al. also found that the leadership role in the meeting is important [ 14 ]. It helps to have supporting questions and open questions, and the questions need to be varied. In addition, it is good if all professions participate; in our study it was reported that physicians did not attend the safety huddles as regularly as other professions. In another study on the effect of the Green Cross method on incident reporting the participation of physicians was also highlighted [ 29 ]. For learning according to Safety-II to happen, the importance of reflecting on everyday practice, and ensuring that such reflection is routinely carried out in practice, is important [ 30 ]. Our findings support this; schedule planning is needed and the safety huddles have to occur regularly. The safety huddles need to be tailored to the staff's needs and have an actual impact on improving their work to be experienced as rewarding and valuable. There is a need to develop methods to spread lessons and support improvements based on positive events in the same manner as from negative events, i.e. to explore everyday work [ 31 ].
The respondents experienced that everyday life was becoming increasingly complex and that there was a need to adapt to different situations. In complex enterprises such as modern health care it is necessary to make pragmatic adaptations to changing contexts, including in the introduction of improvement interventions [ 32 ]. To support the development and testing of improvements in complex healthcare systems, PDSA cycles are well established. In these, ideas are transformed into action, the actions are tested and studied to learn and to improve them, and this is continued in a cycle, for continuous improvement [ 17 , 33 ]. PDSA is an established improvement tool in the NICU, and it was used initially to support the Green Line work, but not consistently over time. It might have been valuable to continue the PDSA cycles until the Green Line reflections had been satisfactorily established.
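The PDSA loop described above can be sketched as a simple data structure. Everything below is illustrative: the cycle contents are hypothetical examples, not taken from the study's improvement work.

```python
# Illustrative sketch of the Plan-Do-Study-Act (PDSA) improvement cycle:
# each cycle records a change idea, a small-scale trial, what was learned,
# and the resulting decision (adopt, adapt, or abandon).
from dataclasses import dataclass

@dataclass
class PDSACycle:
    plan: str   # the change idea and prediction
    do: str     # carry out the change on a small scale
    study: str  # compare outcomes against the prediction
    act: str    # "adopt", "adapt", or "abandon"

def run_improvement(cycles):
    """Iterate cycles until a change is adopted; return the cycles run."""
    history = []
    for cycle in cycles:
        history.append(cycle)
        if cycle.act == "adopt":
            break
    return history

# Hypothetical cycles for an intervention like the Green Line reflections
cycles = [
    PDSACycle("Add focus topics to huddles", "pilot for two weeks",
              "participation rose", "adapt"),
    PDSACycle("Schedule huddles so doctors can attend", "pilot on day shifts",
              "all professions attended", "adopt"),
]
history = run_improvement(cycles)
```

The point of the structure is the one made in the text: stopping the loop before the change is established leaves the "study" and "act" knowledge incomplete.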
All four potentials that have been suggested to describe a resilient system [ 7 ] were exemplified in the reflections, but to varying degrees. Communication, for example the safety huddles in this study, can contribute to the four potentials, but does not directly contribute to resilient performance [ 34 ]. There were more examples reported in the interviews from the potentials learn and respond, while there were fewer from anticipate and only one from monitor. There is possibly a greater propensity for healthcare professionals to act than to be actively aware of what they can expect from the future and from measurements. This can be exemplified by the work situation at an NICU, as has been described by Hybinette et al. The focus in an NICU is on unpredictable factors such as acute admissions, where one has to quickly readjust plans and actions and where the inflow of emergency patients may have the highest priority [ 25 ]. One possible way forward to better highlight and describe examples of resilient capacity might be to use a set of pre-defined questions for the reflections that draw attention to all four potentials.
Different methods were used with the aim of capturing aspects of the experience and impact of the Green Line method. This project was originally designed as an improvement project, not a research study. Had it been a research study, another approach to evaluating possible effects on patient safety culture would have been chosen. Measuring patient safety culture using questionnaires can be useful. Hospitals with good results in patient safety measurements also have lower numbers of adverse safety events, but further research is needed to investigate the relationship between the measured safety culture and improvements in clinical safety [ 35 ]. The instrument chosen for evaluation of patient safety culture is widely known and used in Swedish healthcare in different contexts [ 18 , 19 ], and was therefore chosen by the improvement group. The introduction of the Green Line reflections alone was unlikely to improve the patient safety culture; however, the hope that it would was one of the reasons behind the project for the ward management and the improvement group. It was therefore considered most faithful to the project and the workplace to include this measurement in the study.
When the interviews were conducted, there had been no safety huddles for a while, which may have affected the answers. It is difficult to draw conclusions from comparisons (surveys) over time when the conditions are constantly changing; there were changes in the local context and in the design of the improvement work, which may have affected the results. The response rate was quite low in the survey responses; hence, conclusions have to be drawn carefully. However, the survey was supplemented with interviews and the results were largely consistent across both methods.
The first author's pre-understanding from being a quality and patient safety developer at the department of paediatrics and from taking an active part in the improvement work may have affected the results. Pre-understanding can also be important in the analysis and interpretation of data and can contribute to in-depth knowledge and understanding [ 36 ].
Based on this study´s results, it may be difficult to introduce reflections based on learning from everything that happens, including when things go well (Safety-II) into patient safety huddles. Careful planning is important for such interventions to be able to succeed. To make the reflections better, it is important to have support from managers, and for those who lead the safety huddles to have knowledge of the theories underpinning the Safety-II approach. For the participants, there needs to be an open and permissive climate, a plan to ensure that all professions can participate, and stable conditions in management and support of the safety huddles for them to be experienced as valuable for learning. Further studies are needed to understand how Safety-II-inspired safety huddles are best implemented and to determine whether increased understanding amongst employees of the purpose of the huddles may contribute to better patient safety and an improved patient safety culture.
Erik Hollnagel, for support when designing the improvement work.
Bo Rolander, statistician at Futurum, for help with the statistics.
Maria Olsson, librarian at Futurum library, for help with the reference list.
All authors contributed to the design of the study. KW made the quantitative analysis, along with a statistician. All authors contributed to the qualitative analysis as described in the methods section. KW made the first draft of the manuscript which was then finalised by all authors, who then read and approved the final manuscript.
Research funding was obtained from Futurum – the academy of health and care, Region Jönköping County.
Declarations.
Ethical approval for the study was obtained from the Swedish Ethical Review Authority (registration number 2020–04448). Participation was voluntary and oral consent was collected from each participant; the use of oral informed consent was approved by the Swedish Ethical Review Authority. The material was saved without linking it to individual participants. The study was in all aspects performed in accordance with the Declaration of Helsinki.
Not applicable.
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Karina Wahl, Email: [email protected] .
Margaretha Stenmarker, Email: [email protected] .
Axel Ros, Email: [email protected] .
It’s been two years since the U.S. Supreme Court ruling in the Dobbs case that overturned the federal right to an abortion, and since the troubling concurring opinion by Justice Clarence Thomas in which he expressed a desire to “revisit” other landmark precedents, including the freedom to marry for same-sex couples, codified nationally by the Obergefell Supreme Court decision nine years ago Wednesday.
Since that ruling, the LGBTQ+ and allied community has done much to protect the fundamental freedom to marry — passing the Respect for Marriage Act in Congress in 2022; sharing their stories this year to mark the 20th anniversary of the first state legalization of same-sex marriages, in Massachusetts; and launching ballot campaigns in California, Hawaii and Colorado to repeal dormant but still-on-the-books anti-marriage constitutional amendments.
This winter, I worked with a team at the Williams Institute at UCLA School of Law to survey nearly 500 married LGBTQ+ people about their relationships. Respondents included couples from every state in the country; on average they had been together for more than 16 years and married for more than nine years. Sixty-two percent married after the court’s 2015 Obergefell marriage decision, although their relationships started before that. More than 30% of the couples had children and another 25% wanted children in the future.
One finding that jumped out of the data: Almost 80% of married same-sex couples surveyed said they were “very” or “somewhat” concerned about the Obergefell decision being overturned. Around a quarter of them said they’d taken action to shore up their family’s legal protections — pursuing a second-parent adoption, having children earlier than originally planned or marrying on a faster-than-expected timeline — because of concerns about marriage equality being challenged. One respondent said, “We got engaged the day that the Supreme Court ruled on the Dobbs decision and got married one week after.”
As we examined the survey results, it became clearer than ever why LGBTQ+ families and same-sex couples are fighting so hard to protect marriage access — and the answer is really quite simple: The freedom to marry has been transformative for them. It has not only granted them hundreds of additional rights and responsibilities, but it has also strengthened their bonds in very real ways.
Nearly every person surveyed (93%) said they married for love; three-quarters added that they married for companionship or legal protections. When asked how marriage changed their lives, 83% reported positive changes in their sense of safety and security, and 75% reported positive changes in terms of life satisfaction. “I feel secure in our relationship in a way I never thought would be possible,” one participant told us. “I love being married.”
I’ve been studying LGBTQ+ people and families for my entire career — and even still, many of the findings of the survey touched and inspired me.
Individual respondents talked about the ways that marriage expanded their personal family networks, granting them (for better and worse!) an additional set of parents, siblings and loved ones. More than 40% relied on each other’s families of origin in times of financial or healthcare crisis, or to help out with childcare. Some told of in-laws who provided financial assistance to buy a house, or cared for them while they were undergoing chemotherapy for cancer.
And then there was the effect on children. Many respondents explained that their marriage has provided security for their children, and dignity and respect for the family unit. Marriage enabled parents to share child-rearing responsibilities — to take turns being the primary earner (and carrying the health insurance), and spending more time at home with the kids.
The big takeaway from this study is that same-sex couples have a lot on the line when it comes to the freedom to marry — and they’re going to do everything possible to ensure that future political shifts don’t interfere with their lives. As couples across the country continue to speak out, share their stories — and in California, head to the ballot box in November to protect their hard-earned freedoms — it’s clear to me that it’s because they believe wholeheartedly, and with good reason, that their lives depend on it.
Abbie E. Goldberg is an affiliated scholar at the Williams Institute at UCLA School of Law and a psychology professor at Clark University, where she directs the women’s and gender studies program.
Establishing thresholds of change that are actually meaningful for the patient in an outcome measurement instrument is paramount. This concept is called the minimum clinically important difference (MCID). We summarize available MCID calculation methods relevant to spine surgery, and outline key considerations, followed by a step-by-step working example of how MCID can be calculated, using publicly available data, to enable the readers to follow the calculations themselves.
Thirteen MCID calculation methods were summarized, including anchor-based methods, distribution-based methods, the Reliable Change Index, 30% Reduction from Baseline, the Social Comparison Approach and the Delphi method. All methods, except the latter two, were used to calculate MCID for improvement of the Zurich Claudication Questionnaire (ZCQ) Symptom Severity of patients with lumbar spinal stenosis. The Numeric Rating Scale for Leg Pain and the Japanese Orthopaedic Association Back Pain Evaluation Questionnaire Walking Ability domain were used as anchors.
The MCID for improvement of ZCQ Symptom Severity ranged from 0.8 to 5.1. On average, distribution-based methods yielded lower MCID values than anchor-based methods. The percentage of patients who achieved the calculated MCID threshold ranged from 9.5% to 61.9%.
MCID calculations are encouraged in spinal research to evaluate treatment success. Anchor-based methods, relying on scales assessing patient preferences, continue to be the “gold-standard” with receiver operating characteristic curve approach being optimal. In their absence, the minimum detectable change approach is acceptable. The provided explanation and step-by-step example of MCID calculations with statistical code and publicly available data can act as guidance in planning future MCID calculation studies.
The notion of minimum clinically important difference (MCID) was introduced to establish thresholds of change in an outcome measurement instrument that are actually meaningful for the patient. Jaeschke et al. originally defined it “as the smallest difference in score in the domain of interest which the patient perceives as beneficial and which would mandate, in the absence of troublesome side-effects and excessive cost, a change in the patient’s management” [ 1 ].
In many clinical trials, statistical analysis focuses only on intergroup comparisons of raw outcome scores using parametric/non-parametric tests, deriving conclusions based on the p-value. Using the classical threshold of p-value < 0.05 only suggests that the observed effect is unlikely to have occurred by chance, but it does not equate to a change that is clinically meaningful for the patient [ 2 ]. Calculating MCID scores, and using them as thresholds for “treatment success”, ensures that patients’ needs and preferences are considered and allows for comparison of the proportion of patients experiencing a clinically relevant improvement among different groups [ 3 ]. Through MCID, clinicians can better understand the impact of an intervention on their patients’ lives, sample size calculations can become more robust and health policy makers may decide which treatments deserve reimbursement [ 4 , 5 , 6 ].
The MCID can be determined from the patient’s perspective, where it is the patient who decides whether a change in their health was meaningful [ 4 , 7 , 8 , 9 ]. This is the most common, “gold-standard” approach and the one we will focus on. Occasionally, the clinician’s perspective can also be used to determine the MCID. However, MCID for a clinician may not necessarily mean an increase in a patient’s functionality, but rather a change in disease survival or treatment planning [ 10 ]. MCID can also be defined at a societal level, for example as an improvement in a patient’s functionality significant enough to aid their return to work [ 11 ].
MCID thresholds are intended to assess an individual’s clinical improvement and ought not to be applied to mean scores of entire groups post-intervention, as doing so may falsely over-estimate treatment effectiveness. It is also worth noting that obtained MCID values are not treatment-specific but broadly disease category-specific. They rely on a patient’s perception of clinical benefit, which is influenced by their diagnosis and subsequent symptoms, not just the treatment modality.
In this study, we summarize available MCID calculation methods and outline key considerations when designing a MCID study, followed by a step-by-step working example of how MCID can be calculated.
To illustrate the MCID methods and to enable the reader to follow the practical calculation guide of different MCID values along the way, a previously published data set of 84 patients, as described in Minetama et al., was used under a CC0 1.0 license [ 12 ]. Data can be downloaded at https://data.mendeley.com/datasets/vm8rg6rvsw/1 . The statistical R code can be found in Supplementary Content 1, including instructions on formatting the data set for MCID calculations. The titles of the different MCID methods in this paper (listed below) and their numbers correspond to the same titles and respective numbers in the R code. All analyses in this case study were carried out using R, version 2023.12 + 402 (The R Foundation for Statistical Computing, Vienna, Austria) [ 13 ].
The aim of Minetama et al. was to compare the effectiveness of supervised physical therapy (PT) with unsupervised at-home exercises (HE) in patients with lumbar spinal stenosis (LSS). The main inclusion criteria were: presence of neurogenic intermittent claudication and pain and/or numbness in the lower extremities, with or without back pain, in patients > 50 years of age; diagnosis of LSS confirmed on MRI; and a history of ineffective response to therapy for ≥ 3 months. Patients were then randomized into a 6-week PT or HE programme [ 12 ]. All data were pooled, as a clinically significant benefit for patients is independent of group allocation and because MCID is disease-specific. Therefore, the derived MCID will be applicable to most patients with lumbar spinal stenosis, irrespective of treatment modality. Change scores were calculated by subtracting baseline scores from follow-up scores.
There are multiple approaches to calculate MCID, mainly divided into anchor-based and distribution-based methods (Fig. 1 ) [ 4 , 10 , 14 , 15 , 16 , 17 ]. Before deciding on the method, it needs to be defined whether the calculated MCID will be for improvement or for deterioration [ 18 ]. Most commonly, MCID is used to measure improvement (as per the Jaeschke et al. definition) [ 1 , 4 , 7 , 14 , 15 , 16 , 19 , 20 ]. The value of MCID for improvement should not be directly applied in reverse to determine whether a decrease in patients’ scores signifies a clinically meaningful deterioration; those are two separate concepts [ 18 ]. In addition, the actual MCID value ought to be applied to the post-intervention score of an individual patient (not the overall score for the whole group), to determine whether, at follow-up, he or she experienced a change equal to the MCID or more, compared to their baseline score. Such a patient is then classified as a “responder”.
Flow diagram presenting range of Minimum clinically important difference calculation methods stratified into anchor, distribution-based and “other” described in the study. MCID, Minimum Clinically Important Difference; MIC, Minimal Important Change
According to the Consensus-based Standards for the selection of health measurement instruments (COSMIN) guidelines, the “anchor-based” approach is regarded as the “gold-standard” [ 21 , 22 , 23 ]. In this approach, we determine the MCID of a chosen outcome measurement based on whether a pre-defined MCID (usually derived from another published study) was achieved on an external criterion, known as the anchor, usually another patient-reported outcome measure (PROM) or an objective test of functionality [ 4 , 7 , 8 , 15 , 16 , 17 , 18 , 20 ]. It is best to use scales which allow the patient to rate the specific aspect of their health related to the disease of interest post-intervention compared to baseline on a Likert-type scale. This scale may range, for example, from “much worse”, “somewhat worse”, “about the same”, “somewhat better”, to “much better”, such as the established Global Assessment Rating tool [ 7 , 8 , 24 , 25 ]. Depending on the scale, some studies determine MCID by calculating change scores only for patients who ranked themselves as “somewhat better”, and some only consider patients who ranked themselves as “much better” [ 7 , 25 , 26 , 27 , 28 , 29 ]. This discrepancy likely explains why a single outcome measure can have a range of MCID values depending on the methodology; there appears to be no single “correct” approach. One alternative to the Global Assessment Rating is the health transition item (HTI) from the SF-36 questionnaire, where patients are asked about their overall health compared to one year ago [ 7 , 30 , 31 ]. Although quick and easy to administer, the patient’s response may be influenced by comorbid health issues other than those targeted by the intervention. Nevertheless, any anchor where the patient is the one to decide what change is clinically meaningful captures the true essence of the MCID.
One should, however, be mindful of recall bias with such anchors, which is not easily addressed: patients at times do not reliably remember their baseline health status [ 32 ]. Moreover, what the above anchors do not consider is whether the patient would still choose the intervention for the same condition despite experiencing side-effects or cost. That can be addressed through implementing anchors such as the Satisfaction with Results scale described in Copay et al., who found that MCID values based on the Satisfaction with Results scale were slightly higher than those derived from the HTI of the SF-36 [ 7 , 33 ].
Other commonly used outcome scales, such as the Oswestry Disability Index (ODI), Roland–Morris Disability Questionnaire (RMDQ), Visual Analogue Scale (VAS), or EQ-5D-3L Health-Related Quality of Life, can also act as anchors [ 7 , 14 , 16 , 34 , 35 ]. In such instances, patients complete the “anchor” questionnaire at baseline and post-intervention, and the MCID of that anchor is derived from a previous publication [ 12 , 16 , 35 ]. Before deciding on the MCID, full understanding of how it was derived in that previous publication is crucial. Ideally, it should have been derived in a population similar to our study cohort, with comparable follow-up periods [ 18 , 20 ]. Correlations between the anchor instrument and the investigated outcome measurement instrument must be recorded, and ought to be at least moderate (> 0.5), as that is the best indicator of construct validity (whether both the anchor instrument and outcome instrument represent a similar construct of patient health) [ 18 , 36 ]. If such a correlation is not available, the anchor-based MCID credibility instrument is available to aid in assessing construct proximity between the two [ 36 , 37 ].
Once the process for selecting an anchor and classifying “responders” and “non-responders” is established, the MCID can be calculated. The outcome instrument of interest will be defined as the outcome for which we want to calculate the MCID. The first anchor-based method (within-patient change) focuses on the average improvement seen among clear responders on the anchor. The between-patient change anchor-based method additionally subtracts the average improvement seen among non-responders (unchanged and/or worsened) and consequently ends up with a smaller MCID value. Finally, there is an anchor-based method based on Receiver Operating Characteristic (ROC) curve analysis, which can be considered the current “gold standard”; it effectively treats the MCID calculation as a sort of diagnostic instrument and aims to improve the discriminatory performance of our MCID threshold. In the following paragraphs, the three anchor-based methods are described in more detail. The R code (Supplementary Content 1 ) enables the reader to follow the text and to calculate MCID for the Zurich Claudication Questionnaire (ZCQ) Symptom Severity domain, based on a publicly available dataset [ 12 ].
The chosen outcome measurement instrument in this case study, for which the MCID for improvement will be calculated, is the ZCQ Symptom Severity domain [ 12 ]. The ZCQ is composed of three subscales: symptom severity (7 questions, score per question ranging from 1 to 5 points); physical function (5 questions, score per question ranging from 1 to 4 points) and patient satisfaction with treatment (6 questions, score per question ranging from 1 to 4 points). Higher scores indicate greater disability/worse satisfaction [ 38 ]. To visualize different MCID values, the Numeric Rating Scale (NRS) for Leg Pain (score from 0, “no pain”, to 10, “worst possible pain”) and the Japanese Orthopaedic Association Back Pain Evaluation Questionnaire (JOABPEQ) Walking Ability domain were chosen, as they showed high responsiveness in patients with LSS post-operatively [ 39 ]. Through 25 questions, the JOABPEQ assesses five distinctive domains: pain-related symptoms, lumbar spine dysfunction, walking ability, impairment in social functioning and psychological disturbances. The score for each domain ranges from 0 to 100 points (higher score indicating better health status) [ 40 ]. The correlation of ZCQ Symptom Severity with the NRS Leg Pain and the JOABPEQ Walking Ability domain is 0.56 and − 0.51, respectively [ 39 ]. For a patient to be classified as a “responder” using the NRS for Leg Pain or JOABPEQ Walking Ability, the score at 6-week follow-up must have improved by 1.6 points or 20 points, respectively [ 7 , 40 , 41 ].
This publicly available dataset does not report patient satisfaction or any kind of global assessment rating.
To enable calculation of global assessment rating-based MCID methods for educational purposes, despite very limited availability of studies providing MCID for deterioration of the JOABPEQ, we decided to stratify patients in this dataset into the three following groups, based on the JOABPEQ Walking Ability as an anchor: likely improved (change score above 20 points, according to Kasai et al.), no significant change (change score between − 20 and + 20 points), and likely deteriorated (change score below − 20 points) [ 41 ]. As obtained MCID values were expected to be negative, all values, for clarity of presentation, were multiplied by − 1, except in Method (IX), where the graphical data distribution was shown.
Method (I): Calculating MCID using “within-patient” score change
The first method focuses on calculating the change between the baseline and post-intervention score of our outcome instrument for each patient classified as a “responder”. A “responder” is a patient who, at follow-up, has achieved the pre-defined MCID of the anchor (or ranks themselves high enough on a Global Assessment Rating-type scale, depending on our methodology). The MCID is then defined as the mean change in the outcome instrument of interest among those classified as “responders” [ 4 , 7 , 16 , 31 ].
The corresponding R-Code formula is described in Step 5a of Supplementary Content 1 . The calculated within-patient MCID of ZCQ Symptom Severity based on the NRS Leg Pain and the JOABPEQ Walking Ability domain was 4.4 and 4.2, respectively.
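Method (I) reduces to averaging the change scores of anchor-defined responders. The paper's actual implementation is the R code in Supplementary Content 1; below is an illustrative Python sketch with hypothetical change scores and responder flags (here a positive change is treated as improvement, for simplicity):

```python
from statistics import mean

# Hypothetical change scores (follow-up minus baseline) for the outcome
# instrument; positive change = improvement in this toy example
change_scores = [5, 4, 6, 3, 1, 0, 2, 5]
# Anchor-derived responder flags (True = achieved the anchor's pre-defined MCID)
responder = [True, True, True, True, False, False, False, True]

# Method (I): within-patient change = mean change among responders only
mcid_within = mean(c for c, r in zip(change_scores, responder) if r)
print(mcid_within)
```

With these toy numbers, the within-patient MCID is simply the mean of the five responders' change scores.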
In this approach, the mean change in our outcome instrument is calculated not only for “responders” but also for “non-responders”. “Non-responders” are patients who did not achieve the pre-defined MCID of our anchor or who did not rank themselves high enough (unchanged, or sometimes unchanged + worsened) on a Global Assessment Rating-type scale, according to our methodology. The minimum clinically important difference of our outcome instrument is then defined as the difference between the mean change scores of “responders” and “non-responders” [ 4 , 7 , 16 , 19 ].
The corresponding R-Code formula is described in Step 5b of Supplementary Content 1 . The calculated between-patient MCID of ZCQ Symptom Severity based on the NRS Leg Pain and the JOABPEQ Walking Ability domain was 3.5 and 2.8, respectively.
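Method (II) subtracts the non-responders' mean change from the responders' mean change, which by construction gives a smaller threshold than Method (I). An illustrative Python sketch on hypothetical data (not the authors' R code):

```python
from statistics import mean

# Hypothetical change scores and anchor-derived responder flags
change_scores = [5, 4, 6, 3, 1, 0, 2, 5]
responder = [True, True, True, True, False, False, False, True]

resp = [c for c, r in zip(change_scores, responder) if r]
nonresp = [c for c, r in zip(change_scores, responder) if not r]

# Method (II): between-patient change = mean(responders) - mean(non-responders)
mcid_between = mean(resp) - mean(nonresp)
print(mcid_between)
```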
Here the MCID is derived through ROC analysis to identify the “threshold” score of our outcome instrument that best discriminates between “responders” and “non-responders” of the anchor [ 4 , 7 , 16 , 19 , 27 ]. To understand ROC, one must familiarize oneself with the concept of sensitivity and specificity. In ROC analysis, sensitivity is defined as the ability of the test to correctly detect “true positives”, which in this context refers to patients who have achieved a clinically meaningful change.
A “false negative” would be a patient who was classified as a “non-responder” but is really a “responder”. Specificity is defined as the ability of a test to correctly detect a “true negative” result: a patient who did not achieve a clinically meaningful change, i.e. a “non-responder” [ 25 ].
A “false positive” would be a patient who was classified as a “responder” but who was really a “non-responder”. Values for sensitivity and specificity range from 0 to 1. A sensitivity of 1 means that the test detects 100% of “true positives” (“responders”), while a specificity of 1 reflects the ability to detect 100% of “true negatives” (“non-responders”). It is unclear what the minimum sensitivity and specificity should be for a “gold-standard” MCID, which is why the most established approach is to opt for a MCID threshold that maximizes both sensitivity and specificity at the same time, which can be done using ROC analysis [ 4 , 7 , 25 , 31 , 42 ]. During ROC analysis, the “closest-to-(0,1)-criterion” (the top-left-most point of the curve) or the Youden index are the two methods to automatically determine the optimal threshold point [ 43 ].
When conducting the ROC analysis, the area under the curve (AUC) is also determined: a measure of how well the MCID threshold discriminates responders and non-responders in general. AUC values range from 0 to 1. An AUC of 0.5 signifies that the score discriminates no better than random chance, whereas a value of 1 means that the score perfectly discriminates between responders and non-responders. In the literature, an AUC between 0.7 and 0.8 is deemed fair (acceptable), while ≥ 0.8 to < 0.9 is considered good and values ≥ 0.9 are considered excellent [ 44 ]. Calculating the AUC provides a rough estimate of how well the chosen MCID threshold performs. The corresponding R-Code formula is described in Step 5c of Supplementary Content 1 . The statistical package pROC was used. The calculated MCID of ZCQ Symptom Severity based on the NRS Leg Pain and the JOABPEQ Walking Ability domain was 1.5 for both.
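The ROC-based threshold search can be sketched without any ROC library: try each observed change score as a candidate MCID, compute sensitivity and specificity against the anchor labels, and keep the threshold closest to the ideal point (0, 1). This is an illustrative Python stand-in for the paper's pROC-based R code, using hypothetical data:

```python
import math

# Hypothetical change scores and anchor labels (1 = responder, 0 = non-responder)
change = [5, 4, 6, 3, 1, 0, 2, 5, 2, 1]
label = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]

def sens_spec(threshold):
    # Classify as "responder" when change >= threshold
    tp = sum(1 for c, y in zip(change, label) if c >= threshold and y == 1)
    fn = sum(1 for c, y in zip(change, label) if c < threshold and y == 1)
    tn = sum(1 for c, y in zip(change, label) if c < threshold and y == 0)
    fp = sum(1 for c, y in zip(change, label) if c >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def distance_to_ideal(threshold):
    # Closest-to-(0,1) criterion: distance from (1 - specificity, sensitivity)
    # to the perfect-classification corner (0, 1) of the ROC plot
    sens, spec = sens_spec(threshold)
    return math.hypot(1 - spec, 1 - sens)

mcid_roc = min(sorted(set(change)), key=distance_to_ideal)
print(mcid_roc)
```

The Youden index (maximizing sensitivity + specificity − 1) is the common alternative selection rule and often identifies the same threshold.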
Calculation of MCID using the distribution-based approach focuses on statistical properties of the dataset [ 7 , 14 , 16 , 27 , 45 ]. Those methods are objective, easy to calculate, and in some cases, yield values close to anchor-based MCID. The advantage of this approach is that it does not rely on any external criterion or require additional studies on previously established MCIDs or other validated “gold standard” questionnaires for the specific disease in each clinical setting. However, it fails to include the patient’s perspective of a clinically meaningful change, which will be discussed later in this study. In this sense, distribution-based methods focus on finding MCID thresholds that enable mathematical distinction of what is considered a changed vs. unchanged score, whereas anchor-based methods focus on finding MCID thresholds which represent a patient-centered, meaningful improvement.
The standard error of measurement (SEM) conceptualizes the reliability of the outcome measure by determining how repeated measurements of an outcome may differ from the “true score”. A greater SEM equates to lower reliability, which is suggestive of meaningful inconsistencies in the values produced by the outcome instrument despite similar measuring conditions. Hence, it has been theorized that 1 SEM is equal to the MCID, because a change score ≥ 1 SEM is unlikely to be due to measurement error and therefore is also more likely to be clinically meaningful [ 46 , 47 ]. The following formula is used: SEM = SD(baseline) × √(1 − ICC) [ 1 , 7 , 35 , 46 , 48 ].
The ICC, also called the reliability coefficient, signifies the level of agreement or consistency between measurements taken on different occasions or by different raters [ 49 ]. There are various ways of calculating the ICC depending on the model used, with values < 0.5, 0.5–0.75, 0.75–0.9 and > 0.90 indicating poor, moderate, good and excellent reliability, respectively [ 49 ]. While a value of 1 × SEM is probably the most established way to calculate MCID, a range of multiplication factors for SEM-based MCID have been used in the literature, including 1.96 SEM or even 2.77 SEM, to identify a more specific threshold for improvement [ 48 , 50 ]. The corresponding R-Code formula is described in Step 6a of Supplementary Content 1 . The chosen ZCQ Symptom Severity ICC was 0.81 [ 51 ]. The SEM-based MCID was 1.9.
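The SEM-based MCID needs only the baseline SD and a published ICC. The Python below is an illustrative sketch (the paper's calculation is in R): the baseline scores are hypothetical, while the ICC of 0.81 mirrors the ZCQ value quoted in the text:

```python
from math import sqrt
from statistics import stdev

# Hypothetical baseline scores of the outcome instrument
baseline = [18, 22, 25, 30, 21, 27, 24, 19]
icc = 0.81  # published reliability coefficient, as quoted in the text

# SEM = SD(baseline) * sqrt(1 - ICC); 1 x SEM is taken as the MCID
mcid_sem = stdev(baseline) * sqrt(1 - icc)
print(round(mcid_sem, 2))
```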
Effect size (ES) is a standardized measure of the strength of the relationship or difference between two variables [ 52 ]. It is described by Cohen et al. as the “degree to which the null hypothesis (there is no difference between the two groups) is false”. It allows for direct comparison of different instruments with different units between studies. There are multiple ways to calculate ES, but for the purpose of MCID calculations, the ES represents the number of SDs by which the post-intervention score has changed from the baseline score. It is calculated with the following formula, incorporating the average change score divided by the SD of the baseline score: ES = mean change score / SD(baseline) [ 52 ].
According to Cohen et al., 0.2 is considered a small ES, 0.5 a moderate ES and 0.8 or more a large ES [ 53 ]. Most commonly, a change score with an ES of 0.2 is considered equivalent to the MCID [ 7 , 16 , 31 , 54 , 55 , 56 ]. Using this method, we are essentially identifying the mean change score (in this case reflecting the MCID) that equates to an ES of 0.2: MCID = 0.2 × SD(baseline) [ 7 , 55 ].
Practically, if a patient experienced a small improvement in an outcome measure post-intervention, the ES will be smaller than for a patient who experienced a large improvement in the outcome measure. The corresponding R-Code formula is described in Step 6b of Supplementary Content 1 . The ES-based MCID was 0.9.
The Standardized Response Mean (SRM) aims to gauge the responsiveness of an outcome similarly to the ES. Initially described by Cohen et al. as a derivative of ES assessing differences of paired observations in a single sample, and later renamed SRM, it is also considered an “index of responsiveness” [ 38 , 53 ]. However, the denominator is the SD of the change scores, not the SD of the baseline scores, while the numerator remains the average change score from baseline to follow-up: SRM = mean change score / SD(change) [ 10 , 45 , 57 , 58 , 59 ].
Similarly to Cohen’s rule for interpreting ES, it has been theorized that responsiveness can be considered low if the SRM is 0.2–0.5, moderate if > 0.5–0.8 and large if > 0.8 [ 58 , 59 , 60 ]. Again, a change score equating to an SRM of 0.2 (although SRMs of 1/3 or 0.5 have also been proposed) can be considered the MCID, although studies have used the overall SRM as MCID as well [ 45 , 54 , 56 , 61 ]. However, since the SRM is a standardized index, similarly to the ES, the aim of the SRM-based method ought to be to identify a change score that indicates responsiveness of 0.2: MCID = 0.2 × SD(change) [ 61 ].
Similar to the ES-based method, the SRM-based approach for calculating the MCID is not commonly used in spine surgery studies [ 14 ]. It is a measure of responsiveness, which is the ability to detect change over time in the construct measured by the instrument, and ought therefore to be calculated for the study-specific change score rather than extrapolated as a “universal” MCID threshold to other studies. The corresponding R-Code formula is described in Step 6c of Supplementary Content 1 . The SRM-based MCID was 0.8.
The limitations of using Methods (V) and (VI) in MCID calculations are described later in the Discussion.
The standard deviation (SD) represents the average spread of individual data points around the mean value of the outcome measure. Norman et al. found in their review of studies using MCID in health-related quality of life instruments that most studies had an average ES of 0.5, which equated to a clinically meaningful change score of 0.5 × SD of the baseline score [ 7 , 16 , 30 ].
The corresponding R-Code formula is described in Step 6d of Supplementary Content 1 . The SD-based MCID was 2.1.
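Methods (V)-(VII) are all simple rescalings of a standard deviation, so one sketch can cover all three. The Python below is illustrative (the paper's implementation is in R) and the paired scores are hypothetical:

```python
from statistics import stdev

# Hypothetical paired scores: baseline and 6-week follow-up
baseline = [18, 22, 25, 30, 21, 27, 24, 19]
follow_up = [14, 20, 19, 24, 18, 21, 22, 17]
change = [f - b for b, f in zip(baseline, follow_up)]

sd_baseline = stdev(baseline)
sd_change = stdev(change)

mcid_es = 0.2 * sd_baseline       # Method (V): change score with an ES of 0.2
mcid_srm = 0.2 * sd_change        # Method (VI): change score with an SRM of 0.2
mcid_half_sd = 0.5 * sd_baseline  # Method (VII): Norman's 0.5 SD rule
print(mcid_es, mcid_srm, mcid_half_sd)
```

Note how the three thresholds differ only in which SD is used and by what factor it is multiplied.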
The minimum detectable change (MDC) is defined as the minimal change below which there is a 95% chance that the observed change is due to measurement error of the outcome measurement instrument: MDC = z × √2 × SEM [ 7 , 61 ].
The value z corresponds to the desired level of confidence, which for a 95% confidence level is 1.96. Although the MDC, like all distribution-based methods, does not consider whether a change is clinically meaningful, the calculated MCID should be at least the same as or greater than the MDC, to enable distinguishing true mathematical change from measurement noise. The 95% MDC calculation is the most common distribution-based approach in spinal surgery, and it appears to most closely resemble anchor-derived MCID values, as demonstrated by Copay et al. [ 7 , 14 , 62 ]. The corresponding R-Code formula is described in Step 6e of Supplementary Content 1 . The 95% MDC was 5.1.
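The MDC calculation only needs the SEM and the chosen z value. An illustrative Python sketch (hypothetical baseline scores; the ICC of 0.81 mirrors the value quoted for the ZCQ):

```python
from math import sqrt
from statistics import stdev

# Hypothetical baseline scores and a published ICC
baseline = [18, 22, 25, 30, 21, 27, 24, 19]
icc = 0.81

sem = stdev(baseline) * sqrt(1 - icc)
# MDC95 = z * sqrt(2) * SEM; sqrt(2) reflects measurement error occurring
# at both baseline and follow-up, z = 1.96 for 95% confidence
mdc95 = 1.96 * sqrt(2) * sem
print(round(mdc95, 2))
```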
Another, less frequently applied, method through which “responders” and “non-responders” can be classified, but which does not rely on an external criterion, is the Reliable Change Index (RCI), also called the Jacobson–Truax index [ 63 , 64 ]. It indicates whether an individual change score is statistically significantly greater than a change in score that could have occurred due to random measurement error alone [ 63 ].
In theory, a patient can be considered to experience a statistically reliably identifiable improvement ( p < 0.05) if the individual RCI is > 1.96. Again, it does not reflect whether the change is clinically meaningful for the patient, but rather that the change should not be attributed to measurement error alone and likely has a component of true score change. Therefore, this method is discouraged in MCID calculations, as it relies on statistical properties of the sample and not patient preferences, as all distribution-based methods do [ 65 ]. In the example of Bolton et al., who focused on the Bournemouth Questionnaire in patients with neck pain, the RCI was used to discriminate between “responders” and “non-responders”. The ROC analysis approach was then used to determine the MCID [ 64 ]. The corresponding R-Code formula is described in Step 6f of Supplementary Content 1 . Again, the pROC package was used. The ROC-derived MCID was 2.5.
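The Jacobson-Truax index itself is straightforward to compute per patient. The sketch below (illustrative Python on hypothetical data, not the authors' R code) flags patients whose change exceeds what measurement error alone would plausibly produce:

```python
from math import sqrt
from statistics import stdev

# Hypothetical paired scores and a published ICC
baseline = [18, 22, 25, 30, 21, 27, 24, 19]
follow_up = [14, 20, 19, 24, 18, 21, 22, 17]
icc = 0.81

sem = stdev(baseline) * sqrt(1 - icc)
s_diff = sem * sqrt(2)  # standard error of the difference between two scores

# RCI per patient; |RCI| > 1.96 suggests change beyond measurement error (p < 0.05)
rci = [(f - b) / s_diff for b, f in zip(baseline, follow_up)]
n_reliable = sum(abs(r) > 1.96 for r in rci)
print(n_reliable)
```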
Method (X): Calculating MCID through the anchor-based Minimal Important Change (MIC) distribution model
In theory, combining anchor- and distribution-based methods could yield superior results. Some suggestions include averaging the values of various methods, or simply combining two different methods (i.e. both an anchor-based criterion, such as a ROC-based MCID from patient satisfaction, and a 95% MDC-based MCID have to be met to consider a patient as having achieved the MCID) [ 25 ]. In 2007, de Vet et al. introduced a new visual method of MCID calculation that does not only combine but also integrates both anchor- and distribution-based calculations [ 25 ]. In addition, their method allows the calculation of both MCID for improvement and for deterioration, as these can differ.
In short, using an anchor, patients were divided into three groups: “importantly improved”, “not importantly changed” and “importantly deteriorated” (Fig. 2 ). Then the distributions, expressed in percentiles, of patients who “importantly improved”, “importantly deteriorated” and were “not importantly changed” were plotted on a graph. This is the anchor-based part of the approach, ensuring that the MCID thresholds chosen have clinical value.
Distribution of the Zurich Claudication Questionnaire Symptom Severity change scores for patients categorized as experiencing “important improvement”, “no important change” or “important deterioration” in JOABPEQ walking ability as an anchor (Method (X)). For ZCQ Symptom Severity score to improve, the actual value must decrease explaining the negative values in the model. ROC , Receiver Operating Characteristic; ZCQ , Zurich Claudication Questionnaire; JOABPEQ , Japanese Orthopaedic Association Back Pain Evaluation Questionnaire
The second part of the approach is then entirely focused on the group of patients determined by the anchor to be “unchanged”, and can be either distribution- or anchor-based:
In the first and more anchor-based method, the ROC-based method described in Method (III) is applied to find the threshold for improvement (by finding the ROC-based threshold point that optimizes sensitivity and specificity of identifying improved vs unchanged patients) or for deterioration (by finding the ROC-based threshold point that optimizes sensitivity and specificity of identifying deteriorated vs unchanged patients). For example, the threshold for improvement is found by combining the improved and unchanged groups, and then testing out different thresholds for discriminating those two groups from each other. The optimal point on the resulting ROC curve based on the closest-to-(0,1)-criterion is then found.
In the second method, which is distribution-based, the upper 95% (for improvement) and lower 95% (for deterioration) limits are found based solely on the group of patients determined to be unchanged. The following formula is used, adding 1.645 × SD of the unchanged group’s change scores to their mean change score for improvement, and subtracting it instead for deterioration: [ 25 ]
The corresponding R-Code formula can be found under Step 7a in Supplementary Content 1. The model is presented in Fig. 2. The 95% upper and lower limits were 4.1 and −7.2, respectively. The ROC-derived MCID using the RCI was −2.5 (important improvement vs unchanged) and −0.5 (important deterioration vs unchanged). For the purpose of the model, MCID values were not multiplied by −1 but remained in their original form.
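The 95% limits can be sketched as follows (hypothetical unchanged-group change scores; the paper's actual implementation is the R code under Step 7a in Supplementary Content 1):

```python
# Sketch of the distribution-based 95% limits derived solely from the
# "unchanged" group (made-up change scores for illustration).
import statistics

unchanged_changes = [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]  # hypothetical data

mean = statistics.mean(unchanged_changes)
sd = statistics.stdev(unchanged_changes)  # sample SD

upper_limit = mean + 1.645 * sd  # change beyond this exceeds "no change"
lower_limit = mean - 1.645 * sd

print(round(upper_limit, 2), round(lower_limit, 2))
```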
In recent years, a simple 30% reduction from baseline has been introduced as an alternative to MCID calculations [66]. It has been speculated that absolute-point changes are difficult to interpret and have limited value in the context of "ceiling" and "floor" effects (i.e. values at the extreme ends of the measurement scale) [4]. To overcome this, Khan et al. found that a 30% reduction in PROMs was similarly effective to traditional anchor- or distribution-based methods in detecting patients with clinically meaningful differences after lumbar spine surgery [15]. The corresponding R-Code formula can be found under Step 7b in Supplementary Content 1.
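The 30%-reduction rule reduces to a one-line check per patient. A minimal sketch with hypothetical baseline/follow-up pairs (lower ZCQ Symptom Severity = better; the paper's own code is R, Step 7b):

```python
# Sketch of the "30% reduction from baseline" responder rule on made-up
# (baseline, follow-up) score pairs, where a lower score is better.

patients = [(23.0, 14.0), (20.0, 18.0), (25.0, 16.0)]  # hypothetical data

def is_responder(baseline, followup):
    # Responder if the score dropped by at least 30% of the baseline value
    return (baseline - followup) >= 0.30 * baseline

responders = [is_responder(b, f) for b, f in patients]
print(responders)  # [True, False, True]
```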
The Delphi Method is a systematic approach that uses the collective opinion of experts to establish consensus on a medical issue [67]. It has mostly been used to develop best-practice guidelines [68], but it can also aid MCID determination [69]. Questionnaires or surveys are distributed to a panel of experts. The anonymized answers are grouped together and shared with the panel again in subsequent rounds. This allows the experts to reflect on their opinions and consider the strengths and weaknesses of the others' responses. The process is repeated until consensus is reached. Anonymity prevents potential bias arising from a participant's concern about how their own opinion is viewed, or from influence by other personal factors [67].
The final approach asks patients to compare themselves to other patients, which requires time and resources [70]. In a study by Redelmeier et al., patients with chronic obstructive pulmonary disease in a rehabilitation program were organized into small groups and observed each other on multiple occasions [70]. Additionally, each patient was paired with another participant and had a one-to-one interview with them discussing different aspects of their health. Finally, each patient anonymously rated themselves against their partner on a scale of "much better", "somewhat better", "a little bit better", "about the same", "a little bit worse", "somewhat worse" and "much worse". The MCID was then calculated as the mean change score of patients who rated themselves "a little bit better" (MCID for improvement) or "a little bit worse" (MCID for deterioration), as in the within-patient and between-patient change methods described in Methods (I) and (II) [70].
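The final averaging step of this between-patient approach can be sketched as follows (ratings and change scores are entirely hypothetical; negative change = improvement):

```python
# Sketch: MCID from between-patient comparisons, averaging the change scores
# of patients who rated themselves "a little bit better" / "a little bit worse"
# relative to their partner (made-up data for illustration).
from statistics import mean

ratings = ["a little bit better", "about the same", "a little bit better",
           "a little bit worse", "much better", "a little bit worse"]
changes = [-3.0, 0.5, -2.0, 2.5, -8.0, 1.5]  # hypothetical change scores

mcid_improve = mean(c for r, c in zip(ratings, changes)
                    if r == "a little bit better")
mcid_deteriorate = mean(c for r, c in zip(ratings, changes)
                        if r == "a little bit worse")
print(mcid_improve, mcid_deteriorate)  # -2.5 2.0
```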
Over the years, it has been noted that MCIDs based either purely on distribution-based methods or only on the group of patients rating themselves as "somewhat better" or "slightly better" do not necessarily constitute a change that patients would consider beneficial enough "to mandate, in the absence of troublesome side effects and excessive cost, to undergo the treatment again" [3, 24]. Therefore, the concept of substantial clinical benefit (SCB) has been introduced to identify a threshold of clinical success of an intervention, rather than a "floor" value for improvement, which is what the MCID represents [24]. For example, in Carreon et al., ROC-derived SCB thresholds were defined as a change score with equal sensitivity and specificity to distinguish "much better" from "somewhat better" patients after cervical spinal fusion [71]. Glassman et al., on the other hand, used ROC-derived SCB thresholds to discriminate between "much better" and "about the same" patients following lumbar spinal fusion. The authors stress that SCB and MCID are indeed separate entities, and one should not be used to derive the other [24]. Thus, while the methods to derive SCB and MCID thresholds can be carried out similarly based on anchors, the ultimate goal of applying SCB versus MCID is different.
Using the various methods explained above, the MCID for improvement in the ZCQ Symptom Severity domain ranged overall from 0.8 to 5.1 (Table 1). Here, readers can check their own results for correctness. On average, distribution-based MCID values were lower than anchor-based MCID values. Within the distribution-based approach, Method (VIII), "Minimum detectable change", resulted in an MCID of 5.1, which exceeded the MCIDs derived using the "gold-standard" anchor-based approaches. The average MCID based on the anchors of NRS Leg Pain and JOABPEQ Walking Ability was 3.1 and 2.8, respectively. Depending on the method used, the percentage of responders to the HE and PT intervention ranged from 9.5% for the "30% Reduction from Baseline" method to 61.9% using the ES- and SRM-based methods (Table 2). Method (X) is graphically presented in Fig. 2.
As demonstrated above, the MCID is dependent upon the methodology and the chosen anchor, highlighting the necessity for careful preparation in MCID calculations. The lowest MCID of 0.8 was calculated for Method (VI), the SRM. Logically, if a patient on average had a baseline ZCQ Symptom Severity score of 23.2, an improvement of 0.8 is unlikely to be clinically meaningful, even if rounded up. It rather informs on the measurement error properties of the instrument, as explained by COSMIN. Additionally, the distribution-based methods rely on statistical properties of the sample, which vary from cohort to cohort, making them generalizable only to patient groups with a similar SD and not applicable to others with a different spread of data [52]. Not surprisingly, anchor-based methods considering patient preferences yielded on average higher MCID values than distribution-based methods, and these again varied from anchor to anchor. The mean MCID for improvement calculated for NPRS Leg Pain was 3.1, while for JOABPEQ Walking Ability it was 2.8; such similar values underscore the importance of selecting responsive anchors with at least moderate correlations. Despite assessing different aspects of LSS disease, the MCID remained comparable in this specific case.
Interestingly, Method (VIII), the MDC, yielded the highest value of 5.1, exceeding the "gold-standard" ROC-derived MCID. This suggests that, in this example, using this ROC-derived MCID in clinical practice would be illogical, as the value falls within the measurement error determined by the MDC. Here, it would be appropriate to choose the MDC approach as the MCID. Interestingly, the ROC-derived MCID based on Global Assessment Rating-like stratification of patients by their JOABPEQ Walking Ability (Method (X)) was higher than that in Method (III). This may be attributed to a more balanced distribution of "responders" and "non-responders" (only unchanged patients) in Method (X), unlike in Method (III), where patients were strictly categorized into "responders" and "non-responders" (including both deteriorated and unchanged patients). This further highlights the importance of using global assessment rating-type scales in determining the extent of clinical benefit.
Although ES-based (Method (V)) and SRM-based (Method (VI)) MCID calculations have been described in the literature, the ES and SRM were originally created to quantify the strength of the relationship between scores of two samples (ES) and the change score of paired observations in one sample (SRM) [53, 58, 59]. They do offer an alternative to MCID calculations; however, verification with other MCID calculation methods, ideally anchor-based, is strongly recommended. As seen in this case study and in other MCIDs derived similarly, they often result in small estimates [7, 55]. There is also no consensus regarding the choice of the SD of the change score versus the SD of the baseline score as the denominator. Additionally, whether the calculated MCID (mean change score) should correspond to an ES of 0.2 (a small effect) or 0.5 (a moderate effect) is currently arbitrary and often relies on the researcher's preference [53, 55, 59]. Both ES and SRM can be used to assess whether the overall change score observed in a single study suggests a clinically meaningful benefit in that specific cohort or, in the case of the SRM, whether the outcome measure is responsive. However, it is our perspective that extending such a value as an "MCID" from one study to another is not recommended.
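The denominator distinction between ES and SRM can be made concrete with a short sketch (hypothetical paired scores; ES divides the mean change by the SD of the baseline scores, SRM by the SD of the change scores):

```python
# Sketch of effect size (ES) and standardized response mean (SRM) on
# hypothetical paired baseline/follow-up scores (lower = better).
from statistics import mean, stdev

baseline = [24.0, 20.0, 26.0, 22.0, 23.0]  # made-up data
followup = [18.0, 19.0, 20.0, 17.0, 21.0]
changes = [f - b for b, f in zip(baseline, followup)]

es = mean(changes) / stdev(baseline)   # mean change / SD of baseline scores
srm = mean(changes) / stdev(changes)   # mean change / SD of change scores
print(round(es, 2), round(srm, 2))
```

Negative values here simply reflect that improvement is a score decrease; note how the two statistics differ only in the choice of denominator.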
One can argue whether there is even a place for distribution-based methods in MCID calculations. They ultimately fail to provide an MCID value that meets the original definition of Jaeschke et al. of the "smallest change in the outcome that the patient would identify as important": at no point are patients asked what constitutes a meaningful change for them, and the value is derived solely from the statistical properties of the sample [1]. Nevertheless, conducting MCID studies implementing scales such as a Global Assessment Rating is time-consuming, and performing such studies for every patient outcome and every disease is likely not feasible. Distribution-based methods still have some merit in that they, like the 95% MDC method, can help distinguish measurement noise and inaccuracy from true change. Even if anchor-based methods should probably be used to define MCID thresholds, they ought to be supported by a calculation of the MDC, so that it can be decided whether the chosen threshold makes sense mathematically (i.e., can reliably be distinguished from measurement inaccuracies), as seen in our case study.
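Such an MDC cross-check is a one-liner once the test-retest reliability is known. A sketch under assumed values (the SD and ICC below are hypothetical, and the formula MDC95 = 1.96 × √2 × SEM with SEM = SD × √(1 − ICC) is the conventional one):

```python
# Sketch: using the 95% minimum detectable change (MDC95) to sanity-check an
# anchor-based MCID (hypothetical SD of baseline scores and ICC).
import math

sd_baseline = 4.0  # made-up SD of baseline scores
icc = 0.85         # made-up test-retest reliability

sem = sd_baseline * math.sqrt(1 - icc)  # standard error of measurement
mdc95 = 1.96 * math.sqrt(2) * sem

anchor_mcid = 2.5  # hypothetical anchor-based threshold
# If the MCID falls within measurement error, it cannot be reliably detected
print(anchor_mcid < mdc95)  # True: this threshold is within measurement error
```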
Previously, MCID thresholds for outcome measurement instruments were calculated for generic populations, such as patients suffering from low back pain. More recently, MCID values for commonly used PROMs in spine surgery, such as the ODI, RMDQ or NRS, have been calculated for more narrowly defined diagnoses, such as lumbar disc herniation (LDH) or LSS. The question arises as to whether a separate MCID is needed for each of the different spinal conditions. In general, establishing an MCID specific to these patient groups is only recommended if these patients' perception of meaningful change differs from that of low back pain patients in general. Importantly, again, the MCID should not be treatment-specific, but rather broadly disease-specific. Therefore, it is advisable to use an MCID based on patients whose disease characteristics are most similar to those of the cohort under study. For example, an MCID for NRS Back Pain based on a study group composed of different types of lumbar degenerative disease may, in some cases, be applied to a study cohort composed solely of patients with LDH. However, no such extrapolation should be performed for populations with back pain secondary to malignancy, due to a totally different pathogenesis and associated symptoms, such as fatigue or anorexia, that may influence the ability to detect a clinically meaningful change in NRS Back Pain.
Regardless of robust methodology, it can be expected that it is impossible to obtain the same MCID on different occasions, even in the same population, due to the inherent subjectivity of what is perceived as "clinically beneficial" and day-to-day symptom fluctuation. Moreover, it has been found that patients with worse baseline scores, reflecting e.g. more advanced disease, require a greater overall change at follow-up to report it as clinically meaningful [72]. One should also be mindful of "regression to the mean", whereby extremely high- or low-scoring patients subsequently score closer to the mean at a second measurement [73]. Therefore, adequate cohort characteristics need to be presented for readers to judge how generalizable the MCID may be to their study cohort. If a patient pre-operatively reports an NRS Leg Pain of 1 and the MCID is 1.6, they cannot achieve the MCID at all, as the maximum possible change score is smaller than the MCID threshold (a "floor effect"). A similar situation can occur with patients closer to the higher end of the scale (a "ceiling effect"). The general rule is that if at least 15% of the study cohort has the highest or lowest possible score for a given outcome instrument, one can expect significant ceiling/floor effects [50]. One way to overcome this is to transform absolute MCID scores into percentage change scores [4, 45]. However, percentage change scores only account for high baseline scores if high baseline scores indicate larger disability (as with the ODI) and thus allow a larger change. If a high score on an instrument reflects better health status (as with the SF-36), then percentage change scores will increase the association with the baseline score [4].
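The 15% rule of thumb for flagging ceiling/floor effects is straightforward to apply. A minimal sketch with hypothetical scores on a 0-10 NRS scale:

```python
# Sketch of the 15% ceiling/floor rule of thumb: flag likely ceiling/floor
# effects when at least 15% of the cohort sits at a scale extreme
# (made-up baseline NRS scores for illustration).

scores = [0, 1, 2, 0, 5, 7, 0, 3, 0, 6]  # hypothetical data
scale_min, scale_max = 0, 10

floor_pct = sum(s == scale_min for s in scores) / len(scores)
ceiling_pct = sum(s == scale_max for s in scores) / len(scores)

print(floor_pct >= 0.15, ceiling_pct >= 0.15)  # True False
```

With 40% of this toy cohort at the scale minimum, a substantial floor effect would be expected, and absolute-point MCIDs should be interpreted with caution.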
In general, it is important to consider which patients to exclude from certain analyses when applying the MCID: for example, patients without relevant disease preoperatively (such as those exhibiting so-called "patient-acceptable symptom states", PASS) should probably be excluded altogether when reporting the percentage of patients achieving the MCID [74].
Establishing reliable MCID thresholds is key in clinical research and forms the basis of patient-centered treatment evaluations using patient-reported outcome measures or objective functional tests. MCID thresholds can be calculated using a variety of different methods, each yielding markedly different results, as demonstrated in this practical guide. Generally, anchor-based methods relying on scales assessing patient preferences/satisfaction or global assessment ratings remain the "gold-standard" approach, the most common being ROC analysis. In the absence of appropriate anchors, a distribution-based MCID based on the 95% MDC approach is acceptable, as it appears to yield the results most similar to those of anchor-based approaches. Moreover, we recommend using it as a supplement to any anchor-based MCID threshold, to check whether the threshold can reliably distinguish true change from measurement inaccuracy. The explanations provided in this practical guide, with step-by-step examples along with public data and statistical code, can serve as guidance for future studies calculating MCID thresholds.
Jaeschke R, Singer J, Guyatt GH (1989) Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 10:407–415. https://doi.org/10.1016/0197-2456(89)90005-6
Concato J, Hartigan JA (2016) P values: from suggestion to superstition. J Investig Med 64:1166. https://doi.org/10.1136/jim-2016-000206
Zannikos S, Lee L, Smith HE (2014) Minimum clinically important difference and substantial clinical benefit: Does one size fit all diagnoses and patients? Semin Spine Surg 26:8–11. https://doi.org/10.1053/j.semss.2013.07.004
Copay AG, Subach BR, Glassman SD et al (2007) Understanding the minimum clinically important difference: a review of concepts and methods. Spine J 7:541–546. https://doi.org/10.1016/j.spinee.2007.01.008
Lanario J, Hyland M, Menzies-Gow A et al (2020) Is the minimally clinically important difference (MCID) fit for purpose? A planned study using the SAQ. Eur Respir J. https://doi.org/10.1183/13993003.congress-2020.2241
Neely JG, Karni RJ, Engel SH, Fraley PL, Nussenbaum B, Paniello RC (2007) Practical guides to understanding sample size and minimal clinically important difference (MCID). Otolaryngol Head Neck Surg 136(1):14–18. https://doi.org/10.1016/j.otohns.2006.11.001
Copay AG, Glassman SD, Subach BR et al (2008) Minimum clinically important difference in lumbar spine surgery patients: a choice of methods using the Oswestry disability index, medical outcomes study questionnaire short form 36, and pain scales. Spine J 8:968–974. https://doi.org/10.1016/j.spinee.2007.11.006
Andersson EI, Lin CC, Smeets RJ (2010) Performance tests in people with chronic low back pain: responsiveness and minimal clinically important change. Spine 35(26):E1559-1563. https://doi.org/10.1097/BRS.0b013e3181cea12e
Mannion AF, Porchet F, Kleinstück FS, Lattig F, Jeszenszky D, Bartanusz V, Dvorak J, Grob D (2009) The quality of spine surgery from the patient’s perspective. Part 1: the core outcome measures index in clinical practice. Eur Spine J 18:367–373. https://doi.org/10.1007/s00586-009-0942-8
Crosby RD, Kolotkin RL, Williams GR (2003) Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol 56:395–407. https://doi.org/10.1016/S0895-4356(03)00044-1
Gatchel RJ, Mayer TG (2010) Testing minimal clinically important difference: consensus or conundrum? Spine J 10:321–327. https://doi.org/10.1016/j.spinee.2009.10.015
Minetama M, Kawakami M, Teraguchi M et al (2019) Supervised physical therapy vs. home exercise for patients with lumbar spinal stenosis: a randomized controlled trial. Spine J 19:1310–1318. https://doi.org/10.1016/j.spinee.2019.04.009
R Core Team (2021) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria
Chung AS, Copay AG, Olmscheid N, Campbell D, Walker JB, Chutkan N (2017) Minimum clinically important difference: current trends in the spine literature. Spine 42(14):1096–1105. https://doi.org/10.1097/BRS.0000000000001990
Khan I, Pennings JS, Devin CJ, Asher AM, Oleisky ER, Bydon M, Asher AL, Archer KR (2021) Clinically meaningful improvement following cervical spine surgery: 30% reduction versus absolute point-change MCID values. Spine 46(11):717–725. https://doi.org/10.1097/BRS.0000000000003887
Gautschi OP, Stienen MN, Corniola MV et al (2016) Assessment of the minimum clinically important difference in the timed up and go test after surgery for lumbar degenerative disc disease. Neurosurgery. https://doi.org/10.1227/NEU.0000000000001320
Kulkarni AV (2006) Distribution-based and anchor-based approaches provided different interpretability estimates for the hydrocephalus outcome questionnaire. J Clin Epidemiol 59:176–184. https://doi.org/10.1016/j.jclinepi.2005.07.011
Wang Y, Devji T, Qasim A et al (2022) A systematic survey identified methodological issues in studies estimating anchor-based minimal important differences in patient-reported outcomes. J Clin Epidemiol 142:144–151. https://doi.org/10.1016/j.jclinepi.2021.10.028
Parker SL, Godil SS, Shau DN et al (2013) Assessment of the minimum clinically important difference in pain, disability, and quality of life after anterior cervical discectomy and fusion: clinical article. J Neurosurg Spine 18:154–160. https://doi.org/10.3171/2012.10.SPINE12312
Carrasco-Labra A, Devji T, Qasim A et al (2021) Minimal important difference estimates for patient-reported outcomes: a systematic survey. J Clin Epidemiol 133:61–71. https://doi.org/10.1016/j.jclinepi.2020.11.024
Prinsen CAC, Mokkink LB, Bouter LM et al (2018) COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res 27:1147–1157. https://doi.org/10.1007/s11136-018-1798-3
Mokkink LB, de Vet HCW, Prinsen CAC et al (2018) COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures. Qual Life Res 27:1171–1179. https://doi.org/10.1007/s11136-017-1765-4
Terwee CB, Prinsen CAC, Chiarotto A et al (2018) COSMIN methodology for evaluating the content validity of patient-reported outcome measures: a Delphi study. Qual Life Res 27:1159–1170. https://doi.org/10.1007/s11136-018-1829-0
Glassman SD, Copay AG, Berven SH et al (2008) Defining substantial clinical benefit following lumbar spine arthrodesis. J Bone Joint Surg Am 90:1839–1847. https://doi.org/10.2106/JBJS.G.01095
de Vet HCW, Ostelo RWJG, Terwee CB et al (2007) Minimally important change determined by a visual method integrating an anchor-based and a distribution-based approach. Qual Life Res 16:131–142. https://doi.org/10.1007/s11136-006-9109-9
Solberg T, Johnsen LG, Nygaard ØP, Grotle M (2013) Can we define success criteria for lumbar disc surgery? Acta Orthop 84:196–201. https://doi.org/10.3109/17453674.2013.786634
Power JD, Perruccio AV, Canizares M et al (2023) Determining minimal clinically important difference estimates following surgery for degenerative conditions of the lumbar spine: analysis of the Canadian spine outcomes and research network (CSORN) registry. Spine J 23:1323–1333. https://doi.org/10.1016/j.spinee.2023.05.001
Asher AL, Kerezoudis P, Mummaneni PV et al (2018) Defining the minimum clinically important difference for grade I degenerative lumbar spondylolisthesis: insights from the quality outcomes database. Neurosurg Focus 44:E2. https://doi.org/10.3171/2017.10.FOCUS17554
Cleland JA, Whitman JM, Houser JL et al (2012) Psychometric properties of selected tests in patients with lumbar spinal stenosis. Spine J 12:921–931. https://doi.org/10.1016/j.spinee.2012.05.004
Norman GR, Sloan JA, Wyrwich KW (2003) Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 41:582–592. https://doi.org/10.1097/01.MLR.0000062554.74615.4C
Parker SL, Mendenhall SK, Shau DN et al (2012) Minimum clinically important difference in pain, disability, and quality of life after neural decompression and fusion for same-level recurrent lumbar stenosis: understanding clinical versus statistical significance. J Neurosurg Spine 16:471–478. https://doi.org/10.3171/2012.1.SPINE11842
Gatchel RJ, Mayer TG, Chou R (2012) What does/should the minimum clinically important difference measure?: a reconsideration of its clinical value in evaluating efficacy of lumbar fusion surgery. Clin J Pain 28:387. https://doi.org/10.1097/AJP.0b013e3182327f20
Lloyd H, Jenkinson C, Hadi M et al (2014) Patient reports of the outcomes of treatment: a structured review of approaches. Health Qual Life Outcomes 12:5. https://doi.org/10.1186/1477-7525-12-5
Beighley A, Zhang A, Huang B et al (2022) Patient-reported outcome measures in spine surgery: a systematic review. J Craniovertebr Junction Spine 13:378–389. https://doi.org/10.4103/jcvjs.jcvjs_101_22
Ogura Y, Ogura K, Kobayashi Y et al (2020) Minimum clinically important difference of major patient-reported outcome measures in patients undergoing decompression surgery for lumbar spinal stenosis. Clin Neurol Neurosurg 196:105966. https://doi.org/10.1016/j.clineuro.2020.105966
Wang Y, Devji T, Carrasco-Labra A et al (2023) An extension minimal important difference credibility item addressing construct proximity is a reliable alternative to the correlation item. J Clin Epidemiol 157:46–52. https://doi.org/10.1016/j.jclinepi.2023.03.001
Devji T, Carrasco-Labra A, Qasim A et al (2020) Evaluating the credibility of anchor based estimates of minimal important differences for patient reported outcomes: instrument development and reliability study. BMJ 369:m1714. https://doi.org/10.1136/bmj.m1714
Stucki G, Daltroy L, Liang MH et al (1996) Measurement properties of a self-administered outcome measure in lumbar spinal stenosis. Spine 21:796
Fujimori T, Ikegami D, Sugiura T, Sakaura H (2022) Responsiveness of the Zurich claudication questionnaire, the Oswestry disability index, the Japanese orthopaedic association back pain evaluation questionnaire, the 8-item short form health survey, and the Euroqol 5 dimensions 5 level in the assessment of patients with lumbar spinal stenosis. Eur Spine J 31:1399–1412. https://doi.org/10.1007/s00586-022-07236-5
Fukui M, Chiba K, Kawakami M et al (2009) JOA back pain evaluation questionnaire (JOABPEQ)/ JOA cervical myelopathy evaluation questionnaire (JOACMEQ) the report on the development of revised versions April 16, 2007: the subcommittee of the clinical outcome committee of the Japanese orthopaedic association on low back pain and cervical myelopathy evaluation. J Orthop Sci 14:348–365. https://doi.org/10.1007/s00776-009-1337-8
Kasai Y, Fukui M, Takahashi K et al (2017) Verification of the sensitivity of functional scores for treatment results–substantial clinical benefit thresholds for the Japanese orthopaedic association back pain evaluation questionnaire (JOABPEQ). J Orthop Sci 22:665–669. https://doi.org/10.1016/j.jos.2017.02.012
Glassman SD, Carreon LY, Anderson PA, Resnick DK (2011) A diagnostic classification for lumbar spine registry development. Spine J 11:1108–1116. https://doi.org/10.1016/j.spinee.2011.11.016
Perkins NJ, Schisterman EF (2006) The inconsistency of “optimal” cutpoints obtained using two criteria based on the receiver operating characteristic curve. Am J Epidemiol 163:670–675. https://doi.org/10.1093/aje/kwj063
Nahm FS (2022) Receiver operating characteristic curve: overview and practical use for clinicians. Korean J Anesthesiol 75:25–36. https://doi.org/10.4097/kja.21209
Angst F, Aeschlimann A, Angst J (2017) The minimal clinically important difference raised the significance of outcome effects above the statistical level, with methodological implications for future studies. J Clin Epidemiol 82:128–136. https://doi.org/10.1016/j.jclinepi.2016.11.016
Wyrwich KW, Tierney WM, Wolinsky FD (1999) Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 52:861–873. https://doi.org/10.1016/s0895-4356(99)00071-2
Wolinsky FD, Wan GJ, Tierney WM (1998) Changes in the SF-36 in 12 months in a clinical sample of disadvantaged older adults. Med Care 36:1589–1598
Wyrwich KW, Nienaber NA, Tierney WM, Wolinsky FD (1999) Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care 37:469–478. https://doi.org/10.1097/00005650-199905000-00006
Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15:155–163. https://doi.org/10.1016/j.jcm.2016.02.012
McHorney CA, Tarlov AR (1995) Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 4:293–307
Hara N, Matsudaira K, Masuda K et al (2016) Psychometric assessment of the Japanese version of the Zurich claudication questionnaire (ZCQ): reliability and validity. PLoS ONE 11:e0160183. https://doi.org/10.1371/journal.pone.0160183
Kazis LE, Anderson JJ, Meenan RF (1989) Effect sizes for interpreting changes in health status. Med Care 27:S178–S189. https://doi.org/10.1097/00005650-198903001-00015
Cohen J (1988) Statistical power analysis for the behavioral sciences. L Erlbaum Associates, Hillsdale, NJ
Franceschini M, Boffa A, Pignotti E et al (2023) The minimal clinically important difference changes greatly based on the different calculation methods. Am J Sports Med 51:1067–1073. https://doi.org/10.1177/03635465231152484
Samsa G, Edelman D, Rothman ML et al (1999) Determining clinically important differences in health status measures: a general approach with illustration to the health utilities index mark II. Pharmacoeconomics 15:141–155. https://doi.org/10.2165/00019053-199915020-00003
Wright A, Hannon J, Hegedus EJ, Kavchak AE (2012) Clinimetrics corner: a closer look at the minimal clinically important difference (MCID). J Man Manip Ther 20:160–166. https://doi.org/10.1179/2042618612Y.0000000001
Stucki G, Liang MH, Fossel AH, Katz JN (1995) Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol 48:1369–1378. https://doi.org/10.1016/0895-4356(95)00054-2
Liang MH, Fossel AH, Larson MGS (1990) Comparisons of five health status instruments for orthopedic evaluation. Med Care 28:632–642
Middel B, Van Sonderen E (2002) Statistical significant change versus relevant or important change in (quasi) experimental design: some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. Int J Integr Care. https://doi.org/10.5334/ijic.65
Revicki D, Hays RD, Cella D, Sloan J (2008) Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 61:102–109. https://doi.org/10.1016/j.jclinepi.2007.03.012
Woaye-Hune P, Hardouin J-B, Lehur P-A et al (2020) Practical issues encountered while determining minimal clinically important difference in patient-reported outcomes. Health Qual Life Outcomes 18:156. https://doi.org/10.1186/s12955-020-01398-w
Parker SL, Mendenhall SK, Shau D et al (2012) Determination of minimum clinically important difference in pain, disability, and quality of life after extension of fusion for adjacent-segment disease. J Neurosurg Spine 16:61–67. https://doi.org/10.3171/2011.8.SPINE1194
Jacobson NS, Truax P (1991) Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol 59:12–19
Bolton JE (2004) Sensitivity and specificity of outcome measures in patients with neck pain: detecting clinically significant improvement. Spine 29(21):2410–2417. https://doi.org/10.1097/01.brs.0000143080.74061.25
Blampied NM (2022) Reliable change and the reliable change index: Still useful after all these years? Cogn Behav Ther 15:e50. https://doi.org/10.1017/S1754470X22000484
Asher AM, Oleisky ER, Pennings JS et al (2020) Measuring clinically relevant improvement after lumbar spine surgery: Is it time for something new? Spine J 20:847–856. https://doi.org/10.1016/j.spinee.2020.01.010
Barrett D, Heale R (2020) What are Delphi studies? Evid Based Nurs 23:68–69. https://doi.org/10.1136/ebnurs-2020-103303
Droeghaag R, Schuermans VNE, Hermans SMM et al (2021) Evidence-based recommendations for economic evaluations in spine surgery: study protocol for a Delphi consensus. BMJ Open 11:e052988. https://doi.org/10.1136/bmjopen-2021-052988
Henderson EJ, Morgan GS, Amin J et al (2019) The minimum clinically important difference (MCID) for a falls intervention in Parkinson’s: a delphi study. Parkinsonism Relat Disord 61:106–110. https://doi.org/10.1016/j.parkreldis.2018.11.008
Redelmeier DA, Guyatt GH, Goldstein RS (1996) Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol 49:1215–1219. https://doi.org/10.1016/s0895-4356(96)00206-5
Carreon LY, Glassman SD, Campbell MJ, Anderson PA (2010) Neck disability index, short form-36 physical component summary, and pain scales for neck and arm pain: the minimum clinically important difference and substantial clinical benefit after cervical spine fusion. Spine J 10:469–474. https://doi.org/10.1016/j.spinee.2010.02.007
Wang Y-C, Hart DL, Stratford PW, Mioduski JE (2011) Baseline dependency of minimal clinically important improvement. Phys Ther 91:675–688. https://doi.org/10.2522/ptj.20100229
Tenan MS, Simon JE, Robins RJ et al (2021) Anchored minimal clinically important difference metrics: considerations for bias and regression to the mean. J Athl Train 56:1042–1049. https://doi.org/10.4085/1062-6050-0368.20
Staartjes VE, Stumpo V, Ricciardi L et al (2022) FUSE-ML: development and external validation of a clinical prediction model for mid-term outcomes after lumbar spinal fusion for degenerative disease. Eur Spine J 31:2629–2638. https://doi.org/10.1007/s00586-022-07135-9
Open access funding provided by University of Zurich. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Authors and affiliations.
Department of Neurosurgery, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences, Amsterdam, The Netherlands
Anita M. Klukowska & W. Peter Vandertop
Department of Neurosurgery, University Clinical Hospital of Bialystok, Bialystok, Poland
Anita M. Klukowska
Department of Neurosurgery, Park Medical Center, Rotterdam, The Netherlands
Marc L. Schröder
Machine Intelligence in Clinical Neuroscience and Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
Victor E. Staartjes
Correspondence to Victor E. Staartjes.
Conflict of interest.
The authors declare that the article and its content were composed in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Klukowska, A.M., Vandertop, W.P., Schröder, M.L. et al. Calculation of the minimum clinically important difference (MCID) using different methodologies: case study and practical guide. Eur Spine J (2024). https://doi.org/10.1007/s00586-024-08369-5
Received: 03 May 2024
Revised: 17 May 2024
Accepted: 10 June 2024
Published: 28 June 2024
DOI: https://doi.org/10.1007/s00586-024-08369-5
Choice of outcome measurement instrument for the MCID calculation case study. The outcome measurement instrument chosen for this case study, for which the MCID for improvement will be calculated, is the ZCQ Symptom Severity domain. The ZCQ is composed of three subscales: symptom severity (7 questions, score per question ranging from 1 to 5 points); physical ...
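To make the scoring concrete, here is a minimal sketch of computing a ZCQ symptom severity subscale score and a distribution-based MCID estimate. Two assumptions are labeled explicitly: the subscale score is taken as the mean of the 7 item responses (a common ZCQ scoring convention, not stated in the excerpt above), and the half-standard-deviation rule is used as the MCID method purely for illustration; the cohort data are fabricated.

```python
import statistics

def zcq_symptom_severity(responses):
    """Score the ZCQ symptom severity subscale as the mean of its
    7 item responses, each rated 1-5 (mean-based scoring is an
    assumption here; the excerpt only states 7 items scored 1-5)."""
    if len(responses) != 7 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected 7 item responses, each between 1 and 5")
    return sum(responses) / 7

def half_sd_mcid(baseline_scores):
    """Distribution-based MCID estimate: half the sample standard
    deviation of baseline scores (one of several MCID methods)."""
    return 0.5 * statistics.stdev(baseline_scores)

# Fabricated baseline item responses for a three-patient cohort.
baseline = [zcq_symptom_severity(p) for p in [
    [4, 3, 4, 5, 3, 4, 4],
    [2, 3, 3, 2, 4, 3, 2],
    [5, 4, 5, 5, 4, 5, 4],
]]
print(round(half_sd_mcid(baseline), 2))  # → 0.47
```

A distribution-based estimate like this reflects only measurement variability; anchor-based methods, which tie score change to a patient-reported judgment of improvement, are the alternative family of methods the paper compares.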