
Systematic Reviews

  • Getting Started
  • Additional Frameworks
  • More Types of Reviews
  • Timeline & Resources
  • Inclusion/Exclusion Criteria
  • Resources & More

PICOT Tutorials

What is PICOT - A Tutorial

Using PICOT to Formulate Your Literature Search


Developing Your Question

Developing your research question is one of the most important steps in the review process. At this stage, you and your team have identified a knowledge gap in your field and are aiming to answer a specific question, such as:

  • If X is prescribed, what happens to Y in patients?

OR assess an intervention:

  • How does X affect Y?

OR synthesize the existing evidence:

  • What is the nature of X?

Whatever your aim, formulating a clear, well-defined research question of appropriate scope is key to a successful review. The research question is the foundation of your review; from it, your research team will identify two to five search concepts, which will later be used to build your search strategy.
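As an illustrative sketch (not part of this guide), the step from search concepts to a search strategy can be mocked up in code: synonyms within each concept are OR-ed together, and the concept groups are then AND-ed. The concept terms below are hypothetical examples, not a validated search strategy.

```python
# Hypothetical sketch: turning 2-5 search concepts (each a list of
# synonyms) into a boolean search string.
def build_search(concepts):
    # OR synonyms within each concept, then AND the concept groups together.
    groups = ["(" + " OR ".join(f'"{term}"' for term in terms) + ")"
              for terms in concepts]
    return " AND ".join(groups)

concepts = [
    ["multiple sclerosis", "MS"],                 # Population
    ["health care services", "health services"],  # Intervention/Exposure
    ["patient experience", "perceptions"],        # Outcome
]
print(build_search(concepts))
```

A real strategy would also use database-specific subject headings and field tags; this only shows the concept-grouping logic.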

PICOT Questions

Formulating a research question takes time, and your team may go through several versions before settling on the right one. A research question framework can help structure your systematic review question.

PICO/T is an acronym which stands for:

  • P - Population/Problem
  • I - Intervention/Exposure
  • C - Comparison
  • O - Outcome
  • T - Time

Each PICO question includes at least a P, an I, and an O; some also include a C or a T. Below are some sample PICO/T questions to help you use the framework to your advantage.

For an intervention/therapy

In _______ (P), what is the effect of _______ (I) on ______ (O) compared with _______ (C) within ________ (T)?

For etiology

Are ____ (P) who have _______ (I) at ___ (increased/decreased) risk for/of _______ (O) compared with ______ (P) with/without ______ (C) over _____ (T)?

For diagnosis or a diagnostic test

Is/are _________ (I) more accurate in diagnosing ________ (P) compared with ______ (C) for _______ (O)?

For prevention

For ________ (P), does the use of ______ (I) reduce the future risk of ________ (O) compared with _________ (C)?

For prognosis/prediction

Does __________ (I) influence ________ (O) in patients who have _______ (P) over ______ (T)?

For meaning

How do ________ (P) diagnosed with _______ (I) perceive ______ (O) during _____ (T)?

Melnyk, B. M., & Fineout-Overholt, E. (2010). Evidence-based practice in nursing & healthcare. New York, NY: Lippincott Williams & Wilkins.

Ghezzi-Kopel, Kate. (2019, September 16). Developing your research question [research guide].

  • Last Updated: Feb 6, 2024 11:21 AM

Systematic Reviews - Research Guide

  • Defining your review question
  • Starting a Systematic Review
  • Developing your search strategy
  • Where to search
  • Appraising Your Results
  • Documenting Your Review
  • Find Systematic Reviews
  • Software and Tools for Systematic Reviews
  • Guidance for conducting systematic reviews by discipline
  • Library Support

Review question

A systematic review aims to answer a clear and focused clinical question. The question guides the rest of the systematic review process. This includes determining inclusion and exclusion criteria, developing the search strategy, collecting data and presenting findings. Therefore, developing a clear, focused and well-formulated question is critical to successfully undertaking a systematic review.

A good review question:

  • allows you to find information quickly
  • allows you to find information that is relevant (applicable to the patient) and valid (accurately measures the stated objectives)
  • provides a checklist of the main concepts to be included in your search strategy.

How to define your systematic review question and create your protocol

  • Starting the process
  • Defining the question
  • Creating a protocol

Types of clinical questions

  • PICO/PICo framework
  • Other frameworks

Research topic vs review question

A research topic is the area of study you are researching, and the review question is the straightforward, focused question that your systematic review will attempt to answer.

Developing a suitable review question from a research topic can take some time. You should:

  • perform some scoping searches
  • use a framework such as PICO  
  • consider the FINER criteria; review questions should be Feasible, Interesting, Novel, Ethical and Relevant
  • check for existing or prospective systematic reviews.

When considering the feasibility of a potential review question, there should be enough evidence to answer the question whilst ensuring that the quantity of information retrieved remains manageable. A scoping search will aid in defining the boundaries of the question and determining feasibility.

For more information on FINER criteria in systematic review questions, read Section 2.1 of the Cochrane Handbook.

Check for existing or prospective systematic reviews

Before finalising your review question, you should determine whether any other systematic review on your intended question is in progress or has been completed (i.e. consider whether the review is Novel).

To find systematic reviews you might search specialist resources such as the Cochrane Library, Joanna Briggs Institute EBP Database or the Campbell Collaboration. "Systematic review" can also be used as a search term or limit when searching the recommended databases.

You should appraise any systematic reviews you find to assess their quality. An article may include ‘systematic review’ in its title without correctly following systematic review methodology. Checklists, including those developed by AMSTAR and JBI, are useful tools for appraisal.

You may undertake a review of a similar question if a previously published review had methodological issues, such as lacking a comprehensive search strategy. You may also choose to narrow the parameters of a previously conducted search, or to update a review that was published some years ago.

Searching a register of prospective systematic reviews, such as PROSPERO, will allow you to check that you are not duplicating research already underway.

Once you have performed scoping searches and checked for other systematic reviews on your topic, you can focus and refine your review question. Any PICO elements identified during the initial development of the review question from the research topic should now be further refined.

The review question should always be:

  • unambiguous
  • structured.

Review questions may be broad or narrow in focus; however, you should consider the FINER criteria when determining the breadth of the PICO elements of your review question.

A question that is too broad may present difficulty with searching, data collection, analysis, and writing, as the number of studies retrieved would be unwieldy. A broad review question could be more suited to another type of review.

A question that is too narrow may not have enough evidence to allow you to answer your review question. Table 2.3.a in the Cochrane Handbook summarises the advantages and disadvantages of broad versus narrow reviews and provides examples of how you could broaden or narrow different PICO elements.

It is essential to formulate your research question with care to avoid missing relevant studies or collecting a potentially biased result set.

A systematic review protocol is a document that describes the rationale, question, and planned methods of a systematic review. Creating a protocol is an essential part of the systematic review process, ensuring careful planning and detailed documentation of what is planned before undertaking the review.

The Preferred Reporting Items for Systematic review and Meta-Analysis Protocols (PRISMA-P) checklist outlines recommended items to address in a systematic review protocol, including:

  • review question, with PICO elements defined
  • eligibility criteria 
  • information sources (e.g. planned databases, trial registers, grey literature sources, etc.)
  • draft search strategy. 

Systematic reviews must have pre-specified criteria for including and excluding studies in the review. The Cochrane Handbook states that "predefined, unambiguous eligibility criteria are a fundamental prerequisite for a systematic review." 

The first step in developing a protocol is determining the PICO elements of the review question and how the intervention produces the expected outcomes in the specified population. You should then specify the types of studies that will provide the evidence to answer your review question, and outline inclusion and exclusion criteria based on these PICO elements.
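As a minimal sketch (a hypothetical structure, not a standard protocol format), the steps above can be made concrete by recording the PICO elements, eligible study designs, and exclusions as one explicit record. The example values are loosely based on the Multiple Sclerosis review discussed later in this document.

```python
# Hypothetical sketch of protocol eligibility criteria built from PICO
# elements plus study-design and exclusion lists.
from dataclasses import dataclass, field

@dataclass
class EligibilityCriteria:
    population: str
    intervention: str
    comparison: str
    outcome: str
    study_designs: list = field(default_factory=list)
    exclusions: list = field(default_factory=list)

criteria = EligibilityCriteria(
    population="adults (18+) diagnosed with Multiple Sclerosis",
    intervention="use of routine health care services",
    comparison="",  # not every review question needs a comparator
    outcome="patient-reported experiences of care",
    study_designs=["qualitative data collection and analysis"],
    exclusions=["mixed-condition samples", "conference abstracts"],
)
print(criteria.study_designs)
```

Writing the criteria down in an unambiguous, structured form like this is the point of a protocol: it fixes the eligibility decisions before screening begins.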

For more information on defining eligibility criteria, see Chapter 3 of the Cochrane Handbook .

A key purpose of a protocol is to make plans to minimise bias in the findings of the review; where possible, changes should not be made to the eligibility criteria of a published protocol. Where such changes are made, they must be justified and documented in the review. Appropriate time and consideration should be given to creating the protocol.

You may wish to register your protocol in a publicly accessible registry. This helps prevent others from unknowingly duplicating a review on your topic.

If you intend to publish a systematic review in the health sciences, it should conform to the IOM Standards for Reporting Systematic Reviews.

If you intend to publish a systematic review in the Cochrane Database of Systematic Reviews, it should conform to the Methodological Expectations of Cochrane Intervention Reviews (MECIR).

A clinical question needs to be directly relevant to the patient or problem and phrased to facilitate the search for an answer. A clear and focused question is more likely to lead to a credible and useful answer, whereas a poorly formulated question can lead to an uncertain answer and create confusion.

The population and intervention should be specific, but if either or both are described too narrowly, it may be difficult to find relevant studies or sufficient data to demonstrate a reliable answer.

PICO is a framework for developing a focused clinical question. 

Slightly different versions of this framework are used to search for quantitative and qualitative reviews; examples are given below.

PICO for quantitative studies

Here is an example of a clinical question that outlines the PICO components:


PICo for qualitative studies

Here is an example of a clinical question that outlines the PICo (Population, phenomenon of Interest, Context) components:


Two other mnemonics may be used to frame questions for qualitative and quantitative studies: SPIDER and SPICE.

SPIDER for qualitative or quantitative studies

SPIDER (Sample, Phenomenon of Interest, Design, Evaluation, Research type) can be used for both qualitative and quantitative studies:

Within social sciences research, SPICE (Setting, Perspective, Intervention, Comparison, Evaluation) may be more appropriate for formulating research questions:

More question frameworks

For more question frameworks, see the following:

  • Table 1: Types of reviews, from 'What kind of systematic review should I conduct? A proposed typology and guidance for systematic reviewers in the medical and health sciences'
  • Framing your research question, CQ University
  • Asking focused questions, Centre for Evidence Based Medicine: tips and examples for formulating focused questions
  • Cochrane Handbook, Chapter 2: Determining the scope of the review and the questions it will address: discusses the formulation of review questions in detail
  • PICO for Evidence-Based Veterinary Medicine: EBVM Toolkit from the RCVS
  • PICO worksheet
  • PICo worksheet
  • Last Updated: Dec 14, 2023 1:37 PM

Synthesize evidence at rapid speed

Intelligently accelerate your research and innovation with PICO Portal.


PICO Portal's AI-powered Systematic Literature Review platform and managed services let you focus on developing high quality research, generating evidence you can trust - faster than ever.


Supercharged Systematic Reviews

Relevant articles first.

Review the most relevant articles first to conduct the screening process efficiently, leveraging our ML prediction model.

Fast and Accurate

Reduce review timeline by 40-60% compared to traditional tools with the most accurate prediction and deduplication technology on the market.

Fully Customizable Reviews

One size does not fit all. Set up your workflow to suit your needs. Follow single or dual review, create tags, customize keywords, complete PRISMA, and more.

Professional Services

Staff your reviews how and when you need to with our certified methodologists and flexible managed services.

PICO Portal in numbers


Institutions Trust Us


Using AI prediction, we identified 95% to 99% of all eligible citations by screening fewer than 10% of our initial citations. We're all used to the standard of two people having to independently screen every citation. Do we need to rethink that?


We had 15 committee members at 15 different institutions that all needed to help with this review. We identified 5,172 references and the AI showed likely includes before likely excludes. That helped us get the literature out to the community sooner.


The value for us was the tagging function. It was just a matter of adding a couple of clicks of the mouse and we were able to watch the distribution of literature across this wide range and approach it as evidence ecology. We looked at the full range of research designs from RCTs to qualitative research.

Tiered pricing for teams of all sizes.

Get the power of an enterprise solution without the high price tag. Ready to partner with us for your upcoming research?

Available on All Devices

PICO Portal's SLR platform supports screening in your desktop browser (such as Chrome, Safari or Firefox), in your tablet browser, and on your mobile device. Mobile support lets you screen your citations and full text on the go!


Your Partner in Evidence Synthesis

With the most comprehensive technology, including artificial intelligence, deduplication, and natural language processing, PICO Portal helps institutions of all kinds with citation screening and systematic review.

  • US Department of Veterans Affairs
  • National Academy of Sciences
  • Johns Hopkins University
  • University of Colorado
  • Abilene Christian University
  • University of Arizona
  • Oregon Health and Science University
  • Cornell University
  • Florida Atlantic University
  • Florida Gulf Coast University
  • Indiana University
  • Loma Linda University
  • Monash University
  • University of Minnesota
  • University of Toronto
  • Royal Victoria Regional Health Center
  • University of Cologne
  • University of South Florida

What our customers are saying.

“PICO Portal has greatly streamlined the review process: de-duplication, highlighting important keywords during abstract review, and presenting the full text for full-text review. All the members of the team were able to access the portal at the same time, thus enabling seamless collaboration across teams and locations. I would absolutely recommend it to other researchers for their reviews because it is easy to set up, intuitive to use, it standardizes the process, and it saves time.”

Researcher, University of South Florida

“It was easy to observe the efficiency of PICO Portal for the logistics management for multiple party classification of the same judgements, because I’ve built systems to do such a thing. But here I didn’t have to build the system, and it was very smooth. Knowing what it takes to build that system, that was a very efficient way to get it done. Multiple people could jump in and screen in a streamlined way.“

Brian S. Alper, MD, MSPH, FAAFP, FAMIA

Founder, COVID-19 Knowledge Accelerator (COKA)

“PICO Portal is easy to use, efficient, and time saving; especially when collaborating with colleagues. At Coreva Scientific, we conduct many literature reviews, and therefore we greatly appreciate that the PICO Portal team had an open ear to our suggestions. We were also impressed by how quickly they implemented additional functions we needed to streamline our review process. Functions such as tags, extensive reports, and highlighting of key words are very helpful.“

Juliane Hafermann

HEOR Medical Writer, Coreva

“All in all, I would recommend PICO Portal and would use it again. It makes it easy to manage the large volume of studies in my review. It is also helpful when working with a large team with members from multiple institutions who may not have access to the same shared drives or intranet."

Researcher in a Government Organization

“My experience with PICO Portal has been extremely positive. PICO Portal has efficiently and expediently helped us work through the process of systematically reviewing articles. The team at PICO Portal is helpful and I would recommend PICO Portal to other researchers as well."

Researcher at a University in Florida

“Highlighting and bolding of words specific to our review made things so much faster. It made it quick to look – you see it right there. You know within seconds it’s something you want to include. You click ‘include’ and that’s all you have to do."

Researcher at the University of Minnesota

  • Research article
  • Open access
  • Published: 21 November 2014

PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews

  • Abigail M. Methley,
  • Stephen Campbell,
  • Carolyn Chew-Graham,
  • Rosalind McNally &
  • Sudeh Cheraghi-Sohi

BMC Health Services Research, volume 14, Article number: 579 (2014)


Qualitative systematic reviews are increasing in popularity in evidence based health care. Difficulties have been reported in conducting literature searches of qualitative research using the PICO search tool. An alternative search tool, entitled SPIDER, was recently developed for more effective searching of qualitative research, but remained untested beyond its development team.

In this article we tested the ‘SPIDER’ search tool in a systematic narrative review of qualitative literature investigating the health care experiences of people with Multiple Sclerosis. Identical search terms were combined into the PICO or SPIDER search tool and compared across Ovid MEDLINE, Ovid EMBASE and EBSCO CINAHL Plus databases. In addition, we added to this method by comparing initial SPIDER and PICO tools to a modified version of PICO with added qualitative search terms (PICOS).

Results showed a greater number of hits from the PICO searches than from the SPIDER searches, with greater sensitivity. SPIDER searches showed the greatest specificity for every database. The modified PICO demonstrated equal or higher sensitivity than SPIDER searches, and equal or lower specificity than SPIDER searches. The modified PICO demonstrated lower sensitivity and greater specificity than PICO searches.


The recommendations for practice are therefore to use the PICO tool for a fully comprehensive search, but the PICOS tool where time and resources are limited. Based on these limited findings, the SPIDER tool would not be recommended due to the risk of not identifying relevant papers, but it has potential due to its greater specificity.


Systematic reviews are a crucial method, underpinning evidence based practice and informing health care decisions [ 1 ],[ 2 ]. Traditionally systematic reviews are completed using an objective and primarily quantitative approach [ 3 ] whereby a comprehensive search is conducted, attempting to identify all relevant articles which are then integrated and assimilated through statistical analysis. The comprehensiveness of the search process has been viewed as a key factor in preventing bias and providing a true representation of available research [ 4 ]. Current research investigating the process of quantitative systematic reviews therefore focuses on methods for ensuring the most comprehensive and bias free searches possible [ 5 ]. Because of the time and resources required to complete a systematic and comprehensive search, efforts have been made to investigate the sensitivity of searches, and thus lessen the amount of time spent reviewing irrelevant articles with no benefit [ 6 ].

However, conducting comprehensive searches also forms the bedrock of qualitative or narrative reviews, now commonly referred to as qualitative evidence syntheses [ 7 ]. Qualitative evidence syntheses are now acknowledged as a necessary and valuable type of information to answer health services research questions [ 8 ]. However, difficulties in completing a sensitive yet comprehensive search of qualitative literature have been previously noted [ 9 ]-[ 11 ] including: poor indexing and use of key words of qualitative studies, the common use of titles that lack the keywords describing the article, and unstructured abstracts.

When devising a search strategy, a search tool is used as an organising framework to list terms by the main concepts in the search question, especially in teams where it is not possible to have an experienced information specialist as a member of the review team. The PICO tool focuses on the Population, Intervention, Comparison and Outcomes of a (usually quantitative) article. It is commonly used to identify components of clinical evidence for systematic reviews in evidence based medicine and is endorsed by the Cochrane Collaboration [ 2 ]. Due to its target literature base several of these search terms such as “control group” and “intervention” are not relevant to qualitative research which traditionally does not utilise control groups or interventions, and therefore may not appropriately locate qualitative research. However, these terms may become more relevant in the future as more trials and interventions incorporate qualitative research [ 12 ].

As the PICO tool does not currently accommodate terms relating to qualitative research or specific qualitative designs, it has often been modified in practice to “PICOS” where the “S” refers to the Study design [ 4 ], thus limiting the number of irrelevant articles.

Cooke et al. also addressed this issue of relevance by developing a new search tool entitled “SPIDER” (sample, phenomenon of interest, design, evaluation, research type), designed specifically to identify relevant qualitative and mixed-method studies [ 9 ]. The key features and differences of the SPIDER and PICO search tools are shown in Table  1 . The addition of the “design” and “research type” categories to the SPIDER tool was intended to further increase the ability of this tool to identify qualitative articles, whilst removing irrelevant PICO categories such as the “comparison” group [ 9 ].

Cooke et al. recommended that the SPIDER tool be tested further in qualitative literature searches [ 9 ]. Although it has been used previously in a scoping review to investigate gaps in an evidence base on community participation in rural health care [ 13 ], SPIDER has not yet been tested and evaluated in a qualitative systematic narrative review context. The authors of this article recently completed a systematic review of the qualitative research investigating experiences of health care services for people with Multiple Sclerosis [ 14 ]. On embarking on this review topic we faced many of the difficulties commonly discussed in identifying qualitative literature on a given topic, and identified SPIDER as a potential way of overcoming some of these difficulties. Therefore, the aim of this article was to test SPIDER by broadly replicating the work of Cooke et al. [ 9 ], specifically by comparing the two approaches: 1) the traditional PICO method of searching electronic databases with 2) the newly devised SPIDER tool, developed for qualitative and mixed-method research. In addition, we wished to build and expand on the work of Cooke et al. [ 9 ], and so our third aim was to compare PICO and SPIDER to a modified PICO with qualitative study designs (PICOS, see Table 1) by investigating specificity and sensitivity across three major databases.

Inclusion and exclusion criteria

Studies eligible for inclusion were those that qualitatively investigated patients’ experiences, views, attitudes to and perceptions of health care services for Multiple Sclerosis. No date restriction was imposed on searches as this was an original review. Qualitative research, for this purpose, was defined by the Cochrane qualitative methods group [ 7 ] as using both a qualitative data collection method and qualitative analysis. Quantitative and mixed method studies were therefore excluded.

We define experience as “ Patients’ reports of how care was organised and delivered to meet their needs p.301” [ 15 ]. Patients’ reports could refer to either experience of health care services delivery and organisation overall or their experiences of care by specific health care personnel. We included studies that investigated adults (aged 18 years old and older) with a diagnosis of Multiple Sclerosis, who had experience of utilising health care services at any time point. There were no restrictions on subtype of Multiple Sclerosis, gender, ethnicity or frequency of use of health care. Health care in this sense referred to routine clinical care (either state funded or privately funded) not trial protocols or interventions. Excluded studies included studies that focussed on self-management and studies that investigated quality of life.

Because of the focus on Multiple Sclerosis, studies were excluded if they used a mixed sample of various conditions (e.g. studies reported a mixed sample of people with neurological conditions) or if they used a sample of mixed respondents (i.e. people with Multiple Sclerosis and their carers) where results of patients with Multiple Sclerosis could not be clearly separated. If an article had a section or subtheme on health care services but this was not the main research area of the article, then that article was included; however only data from the relevant subtheme were extracted and included in the findings. Additional exclusion criteria were articles that only described carer or health care professional experiences not patient experiences. Conference abstracts, editorials and commentaries were not included.

Search strategy

For this systematic search we developed a detailed search strategy in collaboration with a specialist librarian and information specialist. This search strategy was tailored to the three largest medical and nursing databases (Ovid MEDLINE, Ovid EMBASE, and EBSCO CINAHL Plus), as in Cooke et al.'s study [ 9 ], and search terms used a mixture of medical subject headings and keywords. To investigate the benefit of the SPIDER, PICO and PICOS tools we used identical search terms but combined them in different ways, as shown in Tables 2, 3 and 4 below.

One reviewer judged titles and abstracts against the inclusion criteria. If a title and abstract met the inclusion criteria then full text copies of all articles were retrieved for further investigation. Two authors reviewed these full text articles independently for relevance to the search aim (i.e. patients/service users with multiple sclerosis, experiences of health care services and qualitative research). Any disagreements were resolved via discussion. Data from included studies were extracted by both reviewers independently to ensure accuracy and then stored on a Microsoft Excel spread sheet. No ethical approval was required for this study.

All searches spanned from database inception until 12th October 2013. As in Cooke et al. [ 9 ], we reviewed our findings based on two metrics: the number of hits generated and, of these, the number relevant to the search aim (see Table 5).

Number of articles generated

As found in Cooke et al. [ 9 ], PICO created a much greater number of hits compared to SPIDER. A total of 23758 hits were generated using PICO, 448 hits were generated using PICOS and 239 hits were generated using SPIDER. Overall, the average reduction of hits (% across all three databases) was 98.58% for SPIDER vs. PICO, 97.94% for PICO vs. PICOS and 68.64% for PICOS vs. SPIDER. The time spent screening hits for relevant articles equated to weeks for the PICO hits and hours for the PICOS and SPIDER hits.

Proportion of relevant articles

Articles which met the inclusion criteria after full text review are displayed in Table 6 [ 16 ]-[ 33 ]. Examination of the titles and abstracts of the identified articles resulted in 18 articles being confirmed as relevant at full text, across all databases and search tools.

For the PICO tool in CINAHL Plus, 5.78% of hits were deemed relevant after the title and abstract stage (78 articles/1350 articles), and 14/78 articles (17.95%) were confirmed to meet the inclusion criteria after full text review. For the PICO tool in MEDLINE, 0.42% of hits were deemed relevant after the title and abstract stage (34 articles/8158 articles) and 12/34 (35.29%) articles were confirmed to meet the inclusion criteria after full text review. For the PICO tool in EMBASE, 0.25% of hits were deemed relevant after the title and abstract stage (35 articles/14250 articles) and 14/35 (40%) articles were confirmed to meet the inclusion criteria after full text review.

For the PICOS tool in CINAHL Plus, 38.36% of articles were relevant after the title and abstract stage (56 articles/146 articles) and 12/56 (21.43%) were confirmed to meet the inclusion criteria after full text review. For the PICOS tool in MEDLINE, 14.16% of articles were relevant after the title and abstract stage (16 articles/113 articles) and 6/16 (37.5%) were confirmed to meet the inclusion criteria after full text review. For the PICOS tool in EMBASE, 7.94% of articles were deemed relevant after the title and abstract stage (15 articles/189 articles) and 7/15 (46.67%) were confirmed to meet the inclusion criteria after full text review.


For the SPIDER tool in CINAHL Plus, 38.36% of articles were relevant after the title and abstract stage (56 articles/146 articles) and 12/56 (21.43%) were confirmed to meet the inclusion criteria after full text review. For the SPIDER tool in MEDLINE, 36.81% of hits were deemed relevant at the title stage (14 articles/38 articles) and 5/14 articles (35.71%) were confirmed to meet the inclusion criteria after full text review. For the SPIDER tool in EMBASE, 16.36% were relevant at the title stage (9 articles/55 articles) and 3/9 (33.33%) were confirmed to meet the inclusion criteria after full text review.

Sensitivity and specificity

The SPIDER tool identified 13 relevant articles out of 239 articles across all three databases (5.43%), compared to PICOS, which identified 13 articles out of 448 articles (2.90%), and PICO, which identified 18 articles out of 23758 articles (0.076%). Of the 18 relevant articles identified by the PICO tool, 66.66% came from both MEDLINE and CINAHL Plus (12 articles each), and 72.22% came from EMBASE (13 articles). Of the 13 relevant articles identified by the PICOS tool, 46.15% came from MEDLINE (6 articles), 53.84% came from EMBASE (7 articles) and 92.31% came from CINAHL Plus (12 articles). Of the 13 relevant articles identified by SPIDER, 38.46% came from MEDLINE (5 articles), 23.07% came from EMBASE (3 articles) and 92.30% came from CINAHL Plus (12 articles) (Table 7).
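The overall yield figures quoted above follow directly from the reported counts (relevant articles divided by total hits across all three databases); allowing for small rounding differences, they can be recomputed as:

```python
# Recomputing overall yield (relevant articles / total hits) for each
# search tool from the counts reported in the text.
totals = {"PICO": (18, 23758), "PICOS": (13, 448), "SPIDER": (13, 239)}
for tool, (relevant, hits) in totals.items():
    print(f"{tool}: {relevant}/{hits} = {100 * relevant / hits:.2f}%")
```

This makes the specificity trade-off concrete: SPIDER's small result set yields the highest proportion of relevant articles, while PICO's comprehensive search yields well under 0.1%.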

Different articles were found across different tools and databases (as shown in Table  6 ). All three databases were checked for all articles. One article was available in CINAHL Plus but not identified by any of the tools [ 17 ]. Two papers were identified in all databases through all search tools. Five papers were identified in MEDLINE through all search tools, three identified in EMBASE through all search tools and 12 identified in CINAHL through all search tools. Five papers were identified solely in CINAHL Plus, with one of these papers only identified using the PICO search method. One paper was identified by all search tools in EMBASE but not identified by any in MEDLINE. No new studies were identified using the SPIDER or PICOS tools alone in any database.

In this article we addressed the aim of replicating a comparison between the SPIDER, PICOS and PICO search tools. As previously described in Cooke et al. [ 9 ], the SPIDER tool produced a greatly reduced number of initial hits to sift through; however, in this study it missed five studies that were identified through the PICO method. This may be partly explained by the nature of the research question prompting the search. As this study included subthemes of studies whose focus differed from the initial research question (i.e. only a smaller section of the paper related to health care), it is possible that these studies were picked up by a broader search but not the highly specific SPIDER search. Other authors researching the process of qualitative literature reviews have previously commented that there appears to be a trade-off between the comprehensiveness of findings and the accuracy of the studies identified [ 11 ]. Given how common it is to use sub-sections of papers for systematic reviews, our findings suggest that comprehensiveness needs to be the priority for this type of search.

The PICOS tool was more specific than the PICO tool, but did not identify any additional relevant hits to the SPIDER tool, suggesting it is of approximately equal sensitivity. PICOS identified the same number of papers as the SPIDER tool and both generated substantially fewer hits than a regular PICO search. The SPIDER tool showed the greatest specificity due to the small number of hits generated. This may mean that review teams with very limited resources or time, and who are not aiming for a totally comprehensive search (i.e. in the case of scoping studies), would benefit from using the SPIDER tool. This might be applicable particularly to studies such as qualitative syntheses, where the research aim is theoretical saturation, not a comprehensive search [ 34 ]. In addition, articles written to influence policy often require swift publication, providing another area in which either SPIDER or PICOS might improve current practice.

The issue of time was also related to the number of relevant articles identified per database. Whilst EMBASE generated nearly twice as many hits as MEDLINE, only one additional paper was found. The PICO tool identified all articles, suggesting that where time is not a factor, it might be of more benefit to use this tool, as SPIDER demonstrated lower sensitivity, did not identify any new articles and identified fewer relevant articles than PICO.

Our findings indicate that it is worthwhile testing a chosen search tool across various databases as they produce different results; i.e. CINAHL Plus identified papers not identified in the MEDLINE or EMBASE databases. It is therefore important for future research to investigate the potential of the SPIDER vs. PICOS and PICO tools as a base for the recommended comprehensive searching process, by investigating the contribution of the SPIDER and PICOS tools at every stage from the initial search hits to the final included relevant articles.

As CINAHL is a database dedicated to nursing and allied health research, it was expected that it would produce a greater number of relevant articles than more medically focussed databases [ 10 ], as nursing and allied areas have traditionally been at the forefront of qualitative investigations into Multiple Sclerosis.

SPIDER proved to be a tool designed to formulate search terms easily, as it naturally fits the crucial elements of the search question. However, even though some qualitative keywords are necessary to identify qualitative studies, requiring the words "qualitative research" AND the name of the type of research, e.g. "grounded theory", might be too restrictive, particularly given the poor use of the qualitative index term, and might partly explain why SPIDER identified fewer studies than PICO. Studies not identified by the SPIDER model in the MEDLINE and EMBASE databases did not use keywords such as "qualitative", but some described qualitative methods, such as "phenomenological-hermeneutic" [ 16 ] or "interview(s)" [ 20 ],[ 23 ].

In all PICO searches for MEDLINE and EMBASE the word "qualitative" combined with the phrase "multiple sclerosis" identified many quantitative studies reporting brain scan assessments that were wholly unrelated to the search aim. This was because the word "qualitative" in this context referred to using a qualitative method to provide information about the quality of the scan and any potential flaws [ 35 ]. This caused a problem with specificity, resulting in thousands of inappropriate hits, as there was no way to exclude studies with the word "qualitative" unless all articles clearly utilised and indexed qualitative research methods in the title, abstract and keywords.

Many studies were excluded at the full-text stage on the basis that the samples were mixed: comprising either various neurological conditions or mixed groups of people, i.e. patients and carers, or patients and health care professionals, and so forth. Without clearer titles and abstracts, and potentially an indexing phrase that indicates mixed samples, there is no way of avoiding this issue. Excluding the phrases "caregivers" or "health care professionals" would have excluded any studies that merely used these phrases (for example in the introduction or implications for future research sections), and therefore it is difficult to see how this could be prevented. A strength and limitation of our study is that whilst it details a real-world example of evidence searching, it only addresses one topic. Further research should test these search tools against a wider variety of narrative review and meta-synthesis topics.

SPIDER greatly reduced the initial number of articles identified on a given search due to increased specificity; however, because of lower sensitivity, it omitted many relevant papers. The PICOS tool resulted in an overall more sensitive search, but still demonstrated poor specificity on this topic. Further investigations of the specificity and sensitivity of SPIDER and PICOS on varied topics will be of benefit to research teams with limited time and resources, or to those producing articles intended to influence policy or change current practice. However, where comprehensiveness is a key factor we suggest that the PICO tool should be used preferentially. Part of the lower identification rate for SPIDER (in comparison to PICO) was due to poor labelling and use of qualitative keywords in indexing studies. As both individual research submissions and journal/database indexers improve, or standardise, the indexing of qualitative studies, it is likely that the relevance of the SPIDER tool will increase. The recommendation for current practice therefore is to use the PICO tool across a variety of databases. In this article we have shown that SPIDER is relevant for those researchers completing systematic narrative reviews of qualitative literature, but not as effective as PICO. Future research should investigate the use of SPIDER and PICOS across varied databases.

Authors’ information

Carolyn Chew-Graham is part-funded by the National Institute for Health Research (NIHR) Collaborations for Leadership in Applied Health Research and Care West Midlands.

Stevens KR: Systematic reviews: the heart of evidence-based practice. AACN Clin Issues. 2001, 12 (4): 529-538.


Higgins JPT, Green S: Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0. The Cochrane Collaboration. 2013

Dixon-Woods M, Cavers D, Agarwal S, Annandale E, Arthur A, Harvey J, Hsu R, Katbamna S, Olson R, Smith L, Riley R, Sutton AJ: Conducting a critical interpretive synthesis of the literature on access to healthcare by vulnerable groups. BMC Med Res Methodol. 2006, 6: 3-10.1186/1471-2288-6-35.


Centre for reviews and dissemination: Systematic Reviews: CRD’s Guidance for Undertaking Reviews in Health Care. 2006, University of York, York

Helmer D, Savoie I, Green C, Kazanjian A: Evidence-based practice: extending the search to find material for the systematic review. Bull Med Libr Assoc. 2001, 89 (4): 346-352.


Stevinson C, Lawlor DA: Searching multiple databases for systematic reviews: added value or diminishing returns?. Complement Ther Med. 2004, 12: 228-232. 10.1016/j.ctim.2004.09.003.


Noyes J: Never mind the qualitative feel the depth! The evolving role of qualitative research in Cochrane intervention reviews. J Res Nurs. 2010, 15: 525-534. 10.1177/1744987110381696.

Noyes J, Popay J, Pearson A, Hannes K, Booth A: Chapter 20: qualitative research and cochrane reviews. Cochrane Handbook for Systematic Reviews of Interventions. Edited by: Higgins JPT, Green S. 2008, 20.1-20.18

Cooke A, Smith D, Booth A: Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qual Health Res. 2012, 22: 1435-1443. 10.1177/1049732312452938.


Evans D: Database searches for qualitative research. J Med Libr Assoc. 2002, 90: 290-293.

Shaw RL, Booth A, Sutton AJ, Miller T, Smith JA, Young B, Jones DR, Dixon-Woods M: Finding qualitative research: an evaluation of search strategies. BMC Med Res Methodol. 2004, 4: 5-10.1186/1471-2288-4-5.


Lewin S, Glenton C, Oxman AD: Use of qualitative methods alongside randomised controlled trials of complex healthcare interventions: methodological study. BMJ. 2009, 339: b3496.

Kenny A, Hyett N, Sawtell J, Dickson-Swift V, Farmer J, O’Meara P: Community participation in rural health: a scoping review. BMC Health Serv Res. 2013, 13: 64-10.1186/1472-6963-13-64.

Methley AM, Chew-Graham C, Campbell S, Cheraghi-Sohi S: Experiences of UK health care services for people with Multiple Sclerosis: a systematic narrative review. Health Expect. 2014, epub ahead of print.

Sinfield P, Baker R, Camosso- Stefinovic J, Colman AM, Tarrant C, Mellon JK, Steward W, Kockelbergh R, Agarwal S: Men’s and carer’s experiences of care for prostate cancer: a narrative literature review. Health Expect. 2009, 12: 301-312. 10.1111/j.1369-7625.2009.00546.x.

Lohne V, Aasgaard T, Caspari S, Slettebo A, Naden D: The lonely battle for dignity: individuals struggling with multiple sclerosis. Nurs Ethics. 2010, 17 (3): 301-311. 10.1177/0969733010361439.

Mackereth PA, Booth K, Hillier VF, Caress A: Reflexology and progressive muscle relaxation training for people with multiple sclerosis: a crossover trial. Complement Ther Clin Pract. 2008, 15: 14-21. 10.1016/j.ctcp.2008.07.002.

Isaksson AK, Ahlström G: Managing chronic sorrow: experiences of patients with multiple sclerosis. J Neurosci Nurs. 2008, 40 (3): 180-191. 10.1097/01376517-200806000-00009.

Edwards RG, Barlow JH, Turner AP: Experiences of diagnosis and treatment among people with multiple sclerosis. J Eval Clin Pract. 2008, 14 (3): 460-464. 10.1111/j.1365-2753.2007.00902.x.

Barker-Collo S, Cartwright C, Read J: Into the unknown: The experiences of individuals living with multiple sclerosis. J Neurosci Nurs. 2006, 38 (6): 435-441. 10.1097/01376517-200612000-00008.

Isaksson AK, Ahlström G: From symptom to diagnosis: illness experiences of multiple sclerosis patients. J Neurosci Nurs. 2006, 38 (4): 229-237. 10.1097/01376517-200608000-00005.

Miller CE, Jezewski MA: Relapsing MS patients’ experiences with glatiramer acetate treatment: a phenomenologic study. J Neurosci Nurs. 2006, 38 (1): 37-41. 10.1097/01376517-200602000-00008.

Johnson J: On receiving the diagnosis of multiple sclerosis: managing the transition. Mult Scler. 2003, 9 (1): 82-88. 10.1191/1352458503ms856oa.

Miller CE, Jezewski MA: A phenomenologic assessment of relapsing MS patients’ experiences during treatment with Interferon Beta-1(*). J Neurosci Nurs. 2001, 33 (5): 240-244. 10.1097/01376517-200110000-00004.

Miller CM: The lived experience of relapsing multiple sclerosis: a phenomenological study. J Neurosci Nurs. 1997, 29 (5): 294-304. 10.1097/01376517-199710000-00003.

Aars H, Bruusgaard D: Chronic disease and sexuality: an interview study on sexual dysfunction in patients with multiple sclerosis. Tidsskrift Den Norske Laegeforening. 1989, 109 (32): 3352-3354.


Rintell DJ, Frankel D, Minden SL, Glanz BI: Patients’ perspectives on quality of mental health care for people with MS. Gen Hosp Psychiatry. 2012, 34 (6): 604-610. 10.1016/j.genhosppsych.2012.04.001.

Laidlaw A, Henwood S: Patients with multiple sclerosis: their experiences and perceptions of the MRI investigation. J Diagn Radiogr Imaging. 2003, 5 (1): 19-25. 10.1017/S146047280300004X.

Koopman W, Schweitzer A: The journey to multiple sclerosis: a qualitative study. J Neurosci Nurs. 1999, 31 (1): 17-26. 10.1097/01376517-199902000-00003.

Hansen AK, Krogh H, Bangsgaard L, Aabling S: Facing the diagnosis. Sygeplejersken. 2008, 4: 52-56.

Loveland CA: The experiences of African americans and euro-americans with multiple sclerosis. Sex Disabil. 1999, 17 (1): 19-35. 10.1023/A:1021499628918.

Moriya R, Suzuki S: A qualitative study relating to the experiences of people with MS: differences by disease severity. Br J Neurosci Nurs. 2011, 7 (4): 593-600. 10.12968/bjnn.2011.7.4.593.

Classen S, Lou JQ: Exploring rehabilitation and wellness needs of people with MS living in South Florida: a pilot study. Intern J MS Care. 2004, 1: 26-31. 10.7224/1537-2073-6.1.26.

Booth A: Cochrane or cock-eyed? How should we conduct systematic reviews of qualitative research? Qualitative Evidence-Based Practice Conference. 2001, Coventry

Chalavi S, Simmons A, Dijkstra H, Barker GJ, Reindeers AATS: Quantitative and qualitative assessment of structural magnetic resonance imaging data in a two-center study. BMC Med Imaging. 2012, 12: 27-10.1186/1471-2342-12-27.



Acknowledgements

This study was funded by a School for Primary Care Research PhD studentship from the National Institute for Health Research. Support in selecting search terms is acknowledged from Olivia Walsby, Academic Engagement Librarian at the University of Manchester. We are grateful to Professor Peter Bower for his comments on the protocol.

Author information

Authors and affiliations

University of Manchester, Centre for Primary Care, Williamson Building, Oxford Road, Manchester, M13 9PL, UK

Abigail M Methley, Stephen Campbell & Sudeh Cheraghi-Sohi

Institute of Primary Care and Health Sciences, Keele University, Keele, UK

Carolyn Chew-Graham

Central Manchester Hospitals Site, Manchester Mental Health and Social Care Trust, Research and Innovation 3rd Floor, Rawnsley Building, Hathersage Road, Manchester, M13 9WL, UK

Rosalind McNally

NIHR Greater Manchester Primary Care Patient Safety Translational Research Centre, Institute of Population Health, The University of Manchester, Manchester, M13 9WL, UK

Stephen Campbell & Sudeh Cheraghi-Sohi


Corresponding author

Correspondence to Abigail M Methley .

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AM designed the study, conducted all searches, appraised all potential studies and wrote and revised the draft manuscript and subsequent manuscripts. SC made significant contributions to the conception and design of the study, assisted with the presentation of findings and assisted with drafting and revising the manuscript. CCG and RM made significant contributions to the conception and design of the study, assisted with the presentation of findings and assisted with drafting and revising the manuscript. SCS conceived and designed the study, assisted with searches, appraised relevant studies and assisted with drafting and revising the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article.

Methley, A.M., Campbell, S., Chew-Graham, C. et al. PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res 14, 579 (2014).


Received : 03 February 2014

Accepted : 03 November 2014

Published : 21 November 2014



  • Health care
  • Users’ experiences
  • Multiple sclerosis (MS)
  • Research evaluation
  • Qualitative
  • Systematic reviews

BMC Health Services Research

ISSN: 1472-6963



PICO: all you need to know


PICO framework for systematic reviews

PICO is an essential tool for systematic reviewers who are studying the effects of interventions. It’s a simple framework for formulating a research question and deciding on the eligibility of studies for inclusion in the review. It considers four basic elements: 

👩‍👩‍👦‍👦 The Population (or patients)

💊 The Intervention

💊 The Control

📏 The Outcome

Structuring a review question in this way brings useful focus and helps review teams to make a detailed plan of the work ahead. Think of PICO as a methodological golden thread that runs all the way through your review and holds everything together conceptually. 

So how does that work in practice? Let’s walk through the systematic review process step by step to take a look at PICO in action.

1. The review question ❓

The PICO framework is the basis for most review questions, for example:

🌓 “In children with nocturnal enuresis (population), how effective are alarms (intervention) versus drug treatments (comparison) for the prevention of bedwetting (outcome)?”

Sometimes the review question misses out the comparison, for example if the review compares the intervention with no treatment or with usual care:

👟 “How effective is physical therapy (intervention) for reducing foot pain (outcome) in runners with plantar fasciitis (population)?”

🤧 “How effective is echinacea (intervention) in preventing common cold (outcome) in healthy adults (population)?”

2. The search strategy 🔍

PICO then informs the search strategy by providing relevant search terms that will be used to retrieve studies. Although it is good practice to use PICO to inform the search, it is not usually advisable to use the ‘Outcomes’ element of PICO in a search strategy. This is because it can reduce the recall of the search, which risks excluding potentially relevant studies (for example, those that measured the outcome of interest but did not report it in the published paper).
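To make this concrete, here is a hypothetical sketch of assembling a Boolean search string from PICO concepts. The terms are illustrative, borrowed from the bedwetting example above, and the Outcome terms are deliberately kept out of the final query:

```python
# Hypothetical sketch: assemble a Boolean search string from PICO concepts.
# Outcome terms are recorded but not searched on, since filtering on
# outcomes can reduce recall (a study may measure an outcome without
# mentioning it in the title or abstract).
pico = {
    "population": ["children", "nocturnal enuresis", "bedwetting"],
    "intervention": ["alarm*", "enuresis alarm"],
    "comparison": ["desmopressin", "drug therapy"],
    "outcome": ["dry nights"],  # kept out of the query on purpose
}

def or_block(terms):
    # Quote multi-word phrases and join synonyms with OR.
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"

query = " AND ".join(
    or_block(pico[part]) for part in ("population", "intervention", "comparison")
)
print(query)
```

Real database syntax (field tags, MeSH terms, truncation rules) varies by platform, so a working strategy would adapt this skeleton to each database.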

Let’s take a look at the search strategy used for the Cochrane review ‘Wheat flour fortification with iron and other micronutrients for reducing anaemia and improving iron status in populations’ in Figure 1.

[Figure 1: search strategy for the wheat flour fortification review]

The search term “Iron/ or Ferrous Compounds” in line 1 aims to identify studies containing the intervention of interest (fortified wheat flour). Lines 2 to 11 develop this to make the search for studies with a relevant intervention as comprehensive as possible. In this review, the same search terms are used for both the intervention and the comparison (non-fortified wheat flour).

Line 12 relates to the population . This review doesn’t specify particular characteristics of the population in the review title. A review with a narrower scope might, for example, look at pregnant women, or patients with celiac disease. In this example, the reviewers are conducting a broad search and just want to make sure that the studies they retrieve pertain to humans.

The second portion of line 1, “Anemia, Iron-Deficiency/” could relate to the outcomes of interest (iron levels in the study participants will be measured to assess the effectiveness of the intervention). However, anemia and iron deficiency are more likely to be included here as baseline characteristics in the study population (i.e. some of the individuals that are randomized to the treatment or the control arm at the start of the trials will have anemia). That’s because specifying outcomes in the search strategy might exclude relevant data, as we saw in section 2.

3. Screening the studies 👀

Once the search is complete, the PICO-informed inclusion and exclusion criteria support the efficient screening of studies. Conveniently, studies imported into Covidence can be screened alongside a list of inclusion and exclusion criteria at both the title and abstract and the full text review stages.

4. Data extraction 📥

PICO isn’t just used to determine the eligibility of studies for a review. It can also be used to group the study data for analysis. Covidence’s data extraction template uses the PICO elements to organise relevant information about the characteristics of the included studies (Figure 2). It’s easy to use and fully customizable. Sorting study data in this way prepares reviewers for the analysis and synthesis that comes next.

[Figure 2: Covidence data extraction template organised by PICO elements]

If your review doesn’t follow the PICO format, you can still use Covidence to collect and extract data from included studies. ‘Population’ is just the label applied to the unit that is randomly assigned to the intervention or the comparison. If that unit is not an individual patient but, for example, a school, an eye, or a side of the mouth (hello, dentists! 👋), the data extraction form can still be used to sort the study data according to your needs.

5. Data synthesis 📈

To re-cap, we’ve seen so far that PICO is used to set the inclusion criteria and to organise study data ahead of synthesis. Cochrane makes a useful distinction between these two uses. The first is called the review PICO because it’s what is used in setting the review question. The second is the PICO for each synthesis. This specifies how data will be grouped, or perhaps split, into a number of separate syntheses. These two concepts are distinct from a third, the PICO of the included studies. This is the PICO defined by the individual studies whose data are included in the review.

How would this look in practice? Well, for illustrative purposes (no real trials or real systematic reviews here), the three PICOs could look like those shown in Figure 3. The review PICO produces a relatively broad question. To answer it, reviewers must examine data from eligible studies, each of which has its own PICO (the PICO of the included studies). The review team then uses a PICO for each synthesis to make decisions on how to analyse and present the gathered data.
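For illustration only (the data below are invented, echoing the idea in Figure 3), the distinction between the review PICO, the study PICOs, and the PICO for each synthesis can be sketched as a simple grouping step:

```python
# Invented example: three levels of PICO in a review.
# The review PICO defines eligibility; each included study has its own
# PICO; the PICO for each synthesis decides how data are grouped.
review_pico = {"population": "adults with anaemia",
               "intervention": "fortified wheat flour"}

included_studies = [
    {"id": "Study A", "population": "pregnant women"},
    {"id": "Study B", "population": "school children"},
    {"id": "Study C", "population": "pregnant women"},
]

# PICO for each synthesis: split the analysis by population subgroup.
syntheses = {}
for study in included_studies:
    syntheses.setdefault(study["population"], []).append(study["id"])

print(syntheses)
# → {'pregnant women': ['Study A', 'Study C'], 'school children': ['Study B']}
```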

[Figure 3: the review PICO, the PICOs of the included studies, and the PICO for each synthesis]

Deciding on exactly how to organise the data for analysis in a review will depend on factors including the type and amount of data. If you have reason to expect that the treatment effect could differ according to a characteristic of the population such as age or sex, you could plan a subgroup analysis. But to avoid the pitfalls of data dredging and to minimise the risk of bias, subgroup analyses should be specified and justified when writing the protocol or project plan. 

Planning is an essential part of conducting a successful systematic review. PICO helps reviewers to plan each stage of a review, from the initial question, through searching and selecting studies, right up to the data collection and synthesis. Taking the time to understand PICO and how it can be used to best effect is a good way to prepare for the work ahead and to ensure your analysis stays focused on, and relevant to, the review question. 

1. Frandsen TF, Nielsen MFB, Lindhardt CL, Eriksen MB. Using the full PICO model as a search tool for systematic reviews resulted in lower recall for some PICO elements. Journal of Clinical Epidemiology. 2020;127:69–75.

2. Field MS, Mithra P, Peña-Rosas JP. Wheat flour fortification with iron and other micronutrients for reducing anaemia and improving iron status in populations. Cochrane Database of Systematic Reviews 2021, Issue 1. Art. No.: CD011302. DOI: 10.1002/14651858.CD011302.pub3.

3. McKenzie JE, Brennan SE, Ryan RE, Thomson HJ, Johnston RV, Thomas J. Chapter 3: Defining the criteria for including studies and how they will be grouped for the synthesis. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.2 (updated February 2021). Cochrane, 2021.


Laura Mellor. Portsmouth, UK



Pubrica Academy


What are the PICO elements in systematic review?

  • The PICO framework is used in evidence-based practice, and especially in evidence-based medicine, to formulate a clinical or healthcare-related question
  • In a systematic review, the PICO framework is also used to develop literature search strategies to ensure comprehensive and bias-free searches
  • The Cochrane Handbook for Systematic Reviews of Interventions recommends the PICO framework as a model for developing a review question, ensuring that the relevant components of the question are well defined

A systematic review supports evidence-based practice and healthcare decisions, primarily through a quantitative approach in which a comprehensive search is conducted to identify all related publications, which are then integrated and assimilated through statistical analysis. A comprehensive search process is crucial for a systematic review because it reduces the risk of bias and thus provides a quality representation of the available research. Current research therefore focuses on different methods of ensuring comprehensive and bias-free searches for quantitative systematic reviews. Because completing a systematic literature search consumes considerable time and resources, multiple efforts have been made to study the sensitivity of searches and thereby reduce the time spent reviewing irrelevant articles. When devising a search strategy, a search tool can serve as an organizing framework that lists terms by the main concepts in the search question; such frameworks are also very helpful for teams conducting a systematic review without an experienced information specialist. The PICO framework focuses on the Population, Intervention, Comparison and Outcomes. It is a commonly used tool in quantitative systematic reviews for identifying the components of clinical evidence, and it is recognized by the Cochrane Collaboration.

Practitioners of Evidence-Based Practice (EBP) often use a specialized framework, called PICO, to form the question and facilitate the literature search. A systematic review question typically focuses on narrow parameters and usually fits the PICO question format.

P – Patient | Population

Most important characteristics of patients. Examples: Gender, age, and disease or condition

I – Intervention or exposure

Main intervention. Examples: Drug treatment, diagnostic and screening test

C – Comparison or control

Main alternative. Examples: Standard therapy, placebo, no treatment, and a gold standard

O – Outcome

What you are trying to accomplish, improve, measure, affect. Examples: Reduced mortality or morbidity, and improved memory

PICO can be extended with variants such as PICOS (S = study design), PICOC (C = context), and PICOT (T = timeframe).
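As a toy illustration (the helper name and exact wording are invented), the standard intervention-question template used with PICO/T can be filled in programmatically from its components:

```python
# Hypothetical helper: fill the standard PICO/T intervention template
# "In (P), what is the effect of (I) on (O) compared with (C)?"
def pico_question(p, i, c, o, t=None):
    q = f"In {p}, what is the effect of {i} on {o} compared with {c}"
    if t:
        q += f" over {t}"  # the optional T (timeframe) element
    return q + "?"

print(pico_question("adults with hypertension", "daily exercise",
                    "standard care", "blood pressure", "6 months"))
```

The same components then double as the concept list for the search strategy, which is the sense in which PICO links the question to the search.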

The PICO model was developed to help structure a well-built clinical question and enable a literature search of relevant citations. Since its introduction, it has played an essential role as a conceptualizing model for evidence-based medicine. The PICO framework also helps to reduce search time and retrieve related documents, supporting a bias-free, quality systematic review, and it improves the transparency of evidence synthesis results and findings.

  • Wilson MC, Richardson WS, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP Journal Club. 1995;123(3):A12.
  • The Systematic Review: An Overview. American Journal of Nursing. 2014;114(3):53-58. DOI: 10.1097/01.NAJ.0000444496.24228.2c
  • Eriksen MB. The impact of patient, intervention, comparison, outcome (PICO) as a search strategy tool on literature search quality: a systematic review. J Med Libr Assoc. 2018;106(4):420–431. DOI: 10.5195/jmla.2018.345
  • Pollock A, Berge E. How to do a systematic review. International Journal of Stroke. 2018;13(2):138–156. DOI: 10.1177/1747493017743796


Cleveland Clinic Florida - How to Conduct Systematic Reviews: What is PICO

  • What is PICO
  • Step 1: Choose your topic
  • Step 2: Identify your keywords
  • Step 3: Connect your keywords
  • Step 4: Choose your databases
  • Step 5: Find your subjects
  • Step 6: Run your search
  • Step 7: Apply your criteria
  • Step 8: Manage your citations
  • Outline of the Process
  • Goldblatt Library Assistance
  • Goldblatt Library Homepage

Using PICO to formulate a search question

The Cochrane Library Searching using PICOT

What is PICO?

According to the Centre for Evidence Based Medicine (CEBM), well-formed clinical questions are essential in practicing EBM. "To benefit patients and clinicians, such questions need to be both directly relevant to patients' problems and phrased in ways that direct your search to relevant and precise answers." - CEBM, University of Toronto,  Asking Focused Questions

The PICO model is a tool that can help you formulate a good clinical question. It is sometimes referred to as PICO-T, which adds an optional fifth element, Time.

  • Asking an Answerable Question (Cochrane Library)
  • PICO Cochrane Library Tutorial (University of Oxford)
  • Question Templates for PICOT (Sonoma State University)

This page was adapted from (PA/MPH) PICO by George Washington University, Himmelfarb Health Sciences Library.

  • << Previous: Home
  • Next: Step 1: Choose your topic >>
  • Last Updated: Jul 26, 2023 12:07 PM
  • URL:

PolyU Library

  • The Hong Kong Polytechnic University
  • Guides & Tutorials

Systematic Search for Systematic Review

  • Formulate Research Question Using PICO
  • Introduction
  • Find Systematic Reviews (SR)
  • Databases Selection for Conducting SR
  • Step 1. Set Preferences in EndNote
  • Step 2. Create Groups in EndNote
  • Step 3. Export Search Results from Databases to EndNote
  • Step 4. Add Name of Database to References
  • Step 5. Remove Duplicate Records
  • Step 6. Share References with Teammates
  • Step 7. Find Full Text Articles
  • [Optional] Export References to Excel

Worksheets for Documenting & Reporting Search Process

Here are some resources for you to document and report your search process in a systematic review. 

  • Workbook for documenting systematic search
  • PRISMA Flow Diagram A flow diagram to depict the flow of information through the different phases of a systematic review. It maps out the number of records identified, included and excluded, and the reasons for exclusions.

Understanding SR

  • What are systematic reviews? (Cochrane)
  • Intro to Systematic Reviews & Meta-Analyses
  • Using PICO to formulate a search question   (CEBM)
  • Turning search terms into a search   (CEBM)
  • Turning your search strategy into results: PubMed demonstration   (CEBM)

Understanding study design

  • What is a randomised trial?
  • Epidemiology Study Types: Randomized Control Trial
  • Epidemiology Study Types: Cohort and Case-Control
  • Cohort, Case-Control, Meta-Analysis, Cross-sectional Study Designs & Definition 

Copyright Disclaimer

Creative Commons License

Except where otherwise noted, the content of this guide is licensed under a  CC BY-NC 4.0 License .

A systematic review aims to answer a specific research (clinical) question. A well-formulated question will guide many aspects of the review process, including determining eligibility criteria, searching for studies, collecting data from included studies, and presenting findings ( Cochrane Handbook , Sec. 5.1.1).

To define a researchable question, the most commonly used structure is PICO, which specifies the type of Patient or Population, the type of Intervention (and Comparison, if any), and the type of Outcomes that are of interest.

The table below gives an example of how a research question is framed using the PICO structure. You may also use the PICO components to write the objective and title of your review, and later to structure your inclusion and exclusion criteria for study selection. This ensures that the whole review process is guided by your research question.
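Where a team is weighing several candidate questions, it can help to treat the PICO components as structured data that later drive the objective, the eligibility criteria, and the search concepts. Below is a minimal sketch (our illustration, not from any guide or tool; the class name and example question are hypothetical) that renders PICO components into a template question:

```python
# Illustrative sketch: PICO components as structured data.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PICO:
    population: str
    intervention: str
    outcome: str
    comparison: Optional[str] = None  # optional C
    time: Optional[str] = None        # optional T

    def question(self) -> str:
        """Render the components into a template question."""
        q = (f"In {self.population}, what is the effect of "
             f"{self.intervention} on {self.outcome}")
        if self.comparison:
            q += f" compared with {self.comparison}"
        if self.time:
            q += f" within {self.time}"
        return q + "?"

# Hypothetical example question:
pico = PICO(
    population="adults with type 2 diabetes",
    intervention="structured exercise programs",
    outcome="HbA1c levels",
    comparison="usual care",
)
print(pico.question())
```

Each field can then be expanded into a search concept (synonyms, subject headings) when the search strategy is built.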

Type of Question and Study Design

While formulating your research question, it's also important to consider the  type of question  you are asking because this will affect the type of studies (or study design ) to be included in your review.

Each type of question calls for particular types of studies to provide the best evidence. For example, to answer a therapeutic question, you should include as many Randomized Controlled Trials (RCTs) as possible, because RCTs are considered to offer the highest level of evidence (least bias) for a therapeutic problem.

The table below suggests the best designs for specific types of questions. The Level of Evidence pyramid, widely adopted in medical research, shows a hierarchy of the quality of medical research evidence across different types of studies ( Level of Evidence (2011), Oxford Centre for Evidence-based Medicine, CEBM ).

Usually, the study design of a research work is clearly indicated in its title or abstract, especially for RCTs. Some databases also allow you to search or refine results by study design, which helps you locate as many relevant studies as possible. If you are unsure of the study design of a research work, refer to this brief guide to spotting study designs (by CEBM).

Learn to Build a Good Clinical Question

Learn to build a good clinical question with this EBP tutorial, Module 1: "Introduction to Evidence-Based Practice".

It is provided by Duke University and University of North Carolina at Chapel Hill, USA.

PICO Framework and the Question Statement, a section of the library guide Evidence-Based Practice in Health from the University of Canberra Library, explains the PICO framework with examples across various question types.

Documenting Your Search Process

A systematic review requires detailed, structured reporting of the search strategy and selection criteria used in the review. We therefore strongly advise you to document your search process from the very beginning. You may use this workbook to help with the documentation.

The documentation should include:

  • Research concepts in PICO structure and research question,
  • Type of studies you intend to include, and
  • Inclusion and exclusion criteria in PICO structure

and the whole search process, including:

  • Databases searched (including hosting platforms), and journals and other sources covered in handsearching
  • Date of search
  • Search strategy, including keywords and subject headings used and the combination of searches (usually copy-pasted from the database search page)
  • Filters used in the initial search or to refine results, including year coverage, type of studies, age, etc.
  • Number of results retrieved after each search and refinement in each database
  • Total number of results from all databases searched
  • Duplicates identified from all results
  • Number of results with full text

Eventually, you will need to include the information above when you start writing your review. A highly recommended structure for reporting the search process is the PRISMA Flow Diagram . You may also use PRISMA Flow Diagram Generator to generate a diagram in a different format (based on your input). 
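As a rough illustration of the arithmetic behind the PRISMA flow, the sketch below tallies counts through the identification, deduplication, and screening stages (all numbers are invented for the example):

```python
# Illustrative only: the running totals a PRISMA flow diagram reports.
per_database = {"PubMed": 412, "Embase": 389, "CINAHL": 151}  # hypothetical
total_identified = sum(per_database.values())                 # records identified

duplicates_removed = 187                                      # hypothetical
screened = total_identified - duplicates_removed              # title/abstract screening

excluded_on_title_abstract = 690                              # hypothetical
full_text_assessed = screened - excluded_on_title_abstract

full_text_excluded = 52                                       # hypothetical
included = full_text_assessed - full_text_excluded            # studies in the review

print(total_identified, screened, full_text_assessed, included)
```

Each intermediate count, plus the reasons for full-text exclusions, is what the flow diagram asks you to record.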

  • << Previous: Find Systematic Reviews (SR)
  • Next: Databases Selection for Conducting SR >>
  • Last Updated: Oct 30, 2023 10:57 AM
  • URL:

Becker Medical Library

  • Library Hours
  • (314) 362-7080
  • [email protected]
  • Library Website
  • Electronic Books & Journals
  • Database Directory
  • Catalog Home
  • Library Home

Systematic Review Guide

  • Preliminary Search
  • PICO format
  • Full Text Retrieval
  • Covidence Screening Application
  • Evidence Grading & Appraisal
  • Risk of Bias Tools
  • Manuscript Preparation
  • Other Review Types

On the Systematic Review Request form you will be asked to outline your research question in PICO format. This allows us to easily understand the main concepts of your research question. Here is what PICO stands for:

P = Problem/Population

I = Intervention (or the experimental variable)

C = Comparison (or the control variable) [Optional]

O = Outcome

If your research question does not fit neatly into PICO, that is okay. Just try to map the elements of your question onto the format as closely as possible. Your collaborating librarian will discuss any questions or concerns about your research topic before putting together your systematic review search strategy.

  • << Previous: Preliminary Search
  • Next: Full Text Retrieval >>
  • Last Updated: Nov 6, 2023 1:27 PM
  • URL:

A Review of the PubMed PICO Tool: Using Evidence-Based Practice in Health Education


  • 1 Walden University, Minneapolis, MN, USA.
  • PMID: 31874567
  • DOI: 10.1177/1524839919893361

The PubMed PICO (Patient, Intervention, Comparison, Outcome) tool from the National Library of Medicine provides health education professionals and students a method to conduct evidence-based practice literature searches to enhance the quality of new and existing health education interventions and programs. This review provides an overview of evidence-based practice and of the National Commission for Health Education Credentialing Inc. competencies related to evidence-based practice. It introduces the PubMed PICO tool and provides suggestions on how health education professionals can use the tool more effectively. Through the use of the PubMed PICO tool, health education students and professionals can enhance their literature search strategies to help ensure a comprehensive and exhaustive literature review.

Keywords: career development/professional preparation; health research; program planning and evaluation; technology.

Publication types

  • Education, Professional*
  • Evidence-Based Practice*
  • Health Personnel* / education

Systematic Reviews and Other Expert Reviews

  • Welch Systematic Review Collaboration

Initiate Your Review

Identification of evidence, selection of evidence, evaluation of evidence, analysis and interpretation of evidence, and reporting of your findings.

  • Systematic Review Standards and Guidelines
  • Types of Expert Reviews
  • Expert Searching Tips
  • More Welch Guides
  • Copyright and Attribution Statement
  • Is a Review Needed?
  • Define Your Question
  • Assemble Your Team
  • Develop Your Protocol
  • Tools for Question Formulation

Prior to starting your project, you should search the literature to determine if a review has recently been published on your topic of interest. Depending on the quality of a recent review, you may decide your planned review is not needed at this time and/or that your review question should change.

For help with searching the literature for recent reviews, contact your Welch Medical Library Informationist . 

A well-thought-out and clearly formulated research question is the foundation for a successful systematic review.  It provides the basis for the search terms used to identify evidence and informs the generation of inclusion/exclusion criteria used to select evidence.  A research question includes multiple components, which may include a population, an exposure or intervention, a setting or context, a comparison, an outcome, and a timeframe.  The inclusion/exclusion criteria specifically address these components and are used to decide which studies will be included in or excluded from the review.

For more information about the process of defining the research question and developing criteria for inclusion see Core Methods section of the Cochrane Handbook .

The PICO (Patient or Problem, Intervention, Comparison, Outcome) framework is a commonly used tool for formulating research questions. Davies (2011) reviews PICO and numerous PICO-derived frameworks for developing evidence based practice questions.

Systematic reviews are specialized research projects that require multiple types of expertise.  These include subject matter knowledge, familiarity with systematic review methodology, skills for searching in a variety of databases, and expertise in statistical methods.

The inclusion of a trained information professional on the team has become part of established standards for high-quality systematic reviews (IOM, Cochrane, AHRQ). Contact your Welch Informationist about becoming a part of your team.

A protocol is a key document that outlines how the many steps of a systematic review will be carried out.  It provides a roadmap to the review team and shows transparently how the review will be conducted.

It's a good idea to register your protocol.  It increases transparency, minimizes bias, and reduces the redundancy of groups working on the same topics ( PLoS Medicine Editors, 2011 ).

The following registries publish protocols of systematic reviews and include information about registering a protocol:

  • The Cochrane Library
  • The Campbell Collaboration Online Library
  • JBI Systematic Review Register
  • CAMARADES-NC3Rs Protocols

Contact your Welch Informationist with questions about developing your protocol and where to register it.

The following resources list commonly used frameworks for developing research questions:

  • McMaster University
  • Welch Library Expert Searching Guide - Formulating Your Research Question

The following are free tools for analyzing topics and visualizing connections within large sets of documents:

  • SWIFT-Review
  • General Principles
  • Search Filters
  • Grey Literature
  • Handsearching & Citation Searching

Databases must be searched as comprehensively as possible in order to identify all of the documents relevant to your systematic review.  The following components are used in comprehensive searching:

  • Controlled vocabulary terms (e.g. as found in PubMed's MeSH Database )
  • Keywords (also called natural language terms or text words)
  • Boolean operators (AND, OR) and/or proximity operators
  • Database-specific syntax that allows for the searching of phrases, parts of documents, variants of terms, etc.
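To illustrate how these components combine, here is a hedged sketch (the helper functions and search terms are our invention) that ORs the synonyms within each concept and then ANDs the concept blocks together, using PubMed's documented [MeSH] and [tiab] field tags:

```python
# Illustrative sketch: assembling a PubMed-style Boolean search string.
# Synonyms within a concept are OR'd; concept blocks are AND'd.
def concept_block(mesh_terms, keywords):
    """One search concept: controlled vocabulary plus title/abstract keywords."""
    parts = [f'"{t}"[MeSH]' for t in mesh_terms]
    parts += [f'"{k}"[tiab]' for k in keywords]
    return "(" + " OR ".join(parts) + ")"

def build_search(concepts):
    """AND together a block for each (mesh_terms, keywords) concept."""
    return " AND ".join(concept_block(m, k) for m, k in concepts)

# Hypothetical concepts for a diabetes/exercise question:
query = build_search([
    (["Diabetes Mellitus, Type 2"], ["type 2 diabetes", "T2DM"]),
    (["Exercise"], ["physical activity", "exercise program*"]),
])
print(query)
```

In practice each database has its own syntax for field tags, truncation, and proximity, so the translated strategy for every database should be recorded verbatim in your search documentation.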

Visit our  Expert Searching  guide to learn more about these and other searching concepts. 

When Welch informationists are on systematic review teams, they develop comprehensive search strategies for multiple databases.

When searching for health and medical information you are expected to search PubMed , which includes the MEDLINE database of indexed citations. PubMed contains a very large collection of literature, but to be comprehensive you will need to supplement your PubMed search with additional databases. Always check the requirements or recommendations specified in guidelines for the type of review you're conducting (See this guide's " Standards and Guidelines " page for a list of guidelines).  

Things to consider when selecting additional databases include

  • subject area(s),
  • regional coverage, and
  • language coverage.

The Welch Medical Library website offers a searchable and browsable list of Hopkins-subscribed databases. This list covers topics including biomedicine, health, engineering, business, economics, criminal justice, public policy, etc. In addition, the Hopkins Sheridan Libraries maintain a collection of subject guides that link to subject-specific databases.

Search filters, also known as "search hedges," are pre-developed search strategies for topics and study types. We recommend only using validated filters that have been tested for reliability and accuracy.  

Do not confuse filters with pre-set limits in databases, even when these limits are called filters (as they are in PubMed). Pre-set limits such as "Article types," "Humans," and "Ages" will limit the search to controlled vocabulary terms, and exclude articles that would be found with the combination of controlled vocabulary and keywords used by validated filters.

See this guide's " Expert Searching Tips " page for more information about search filters, including examples of validated filters for finding randomized controlled trials. Contact your Welch Informationist for questions about the appropriate use of search filters.

Grey literature is information produced by government agencies, academic institutions, and the for-profit sector that is not made available by commercial publishers. This kind of literature can be more difficult to find than the journal literature, but you may wish to include it because it may contain unique and relevant evidence related to your review.  

Examples of grey literature document types have been compiled by GreyNet International and include clinical trials, reports, proceedings, dissertations and theses, white papers, newsletters, and patents. Some of these types of documents can be found along with the journal literature in some databases, but grey literature searches often require the use of other specialized sources.

Contact your Welch Informationist for questions about grey literature and how to search for it.

Handsearching involves the page-by-page examination of key journals or conference proceedings for relevant documents.

Handsearching may identify documents that are missed by database searching because of how articles are described, or indexed, in databases such as PubMed or Embase. Gaps in indexing may result from whole journals not being indexed, portions of journals not being indexed, lack of appropriate indexing terms, inconsistencies in indexing, or poorly reported research that does not allow accurate indexing ( Hopewell, Clarke, Lefebvre, & Scherer, 2007 ). 

Citation searching involves manually exploring the citation networks of selected relevant articles as another way of identifying documents that may have been missed through other means of searching. Forward searching identifies later articles that have cited articles of interest.  Backward searching identifies earlier articles from the reference lists of articles of interest. The databases  Scopus  and Web of Science  have excellent features for citation searching.

Contact your  Welch Informationist with questions about best practices for handsearching and citation searching.

The following are free tools for identifying MeSH terms and keywords from text or articles:

  • MeSH on Demand - MeSH terms only
  • Yale MeSH Analyzer - MeSH terms and author-supplied keywords for up to 20 articles
  • PubMed PubReMiner - MeSH terms and title/abstract keywords for searches up to 10,000 articles
  • Screening Studies
  • Citation Management Tools
  • Screening Tools

A citation management tool is needed to organize the results of searches and to remove and store duplicates. Typically, records are then exported from the citation management tool to a screening tool.

A pre-established set of inclusion/exclusion criteria is employed when screening studies.

In systematic reviews, the selection of studies is a two-step process that includes a title/abstract review followed by a full-text review. Studies are reviewed according to the inclusion/exclusion criteria by two reviewers, who must agree to include or exclude. Any conflicts are resolved by a third reviewer.
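The two-reviewer rule described above can be sketched as follows (illustrative only; the function, study identifiers, and decisions are hypothetical):

```python
# Illustrative sketch of dual-reviewer screening with third-party adjudication:
# agreement decides inclusion/exclusion; conflicts go to a third reviewer.
def resolve(reviewer_a: dict, reviewer_b: dict, third: dict):
    decisions, conflicts = {}, []
    for study in reviewer_a:
        if reviewer_a[study] == reviewer_b[study]:
            decisions[study] = reviewer_a[study]   # reviewers agree
        else:
            conflicts.append(study)                # disagreement recorded
            decisions[study] = third[study]        # third reviewer settles it
    return decisions, conflicts

# Hypothetical title/abstract decisions for three studies:
a = {"s1": "include", "s2": "exclude", "s3": "include"}
b = {"s1": "include", "s2": "include", "s3": "include"}
t = {"s2": "exclude"}  # adjudication needed only for the conflict
decisions, conflicts = resolve(a, b, t)
```

Screening tools such as Covidence or Rayyan implement this workflow for you and keep an audit trail of each reviewer's votes.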

EndNote is the preferred tool for large systematic reviews because of the following features:

  • Bulk import of large files
  • Advanced options for removing duplicate records
  • Bulk retrieval of full text

RefWorks is sometimes used for smaller reviews for the following reasons:

  • An institution-wide subscription for the Johns Hopkins community
  • Collaborative features

Information about these and other citation management tools is available from the library's Citation Management guide.

Tools that are Best Used for Screening (Basic Data Abstraction Features, If Any)

  • Covidence - Johns Hopkins has an institution-wide subscription and the Welch Medical Library provides support for Covidence.
  • abstrackr (free)
  • Rayyan (free)

Tools that Offer More Advanced Features (Beyond Screening)

  • DistillerSR ($)
  • EPPI-Reviewer ($)
  • HAWC - Health Assessment Workspace Collaborative (free) - toxicology
  • PICO Portal (free/$)
  • SWIFT-Active Screener ($)
  • sysrev (free/$)
  • Systematic Review Facility (free) - developed for animal studies

Additional tools can be found by searching the Systematic Review Toolbox (SR Toolbox).

  • Data Abstraction
  • Study Appraisal
  • Data Abstraction Tools
  • Study Appraisal Tools

Data abstraction is the systematic collection of data elements from included studies in a review. Data elements could include study design, intervention, outcomes measured, results, etc.  This is best achieved through a standardized form derived from your research question and inclusion/exclusion criteria.  Pilot the use of this form before full implementation.

Study appraisal systematically examines factors such as

  • the appropriateness of study design,
  • outcome measures,
  • methodological quality and the risk of bias, and
  • the quality of reporting.

Tools that Have Robust Data Abstraction Features:

  • Covidence (licensed through Hopkins)
  • HAWC - Health Assessment Workspace Collaborative (free) - developed for toxicology
  • Qualtrics  - (licensed through Hopkins) survey tool for data capture
  • REDCap - (licensed through Hopkins) Research Electronic Data Capture
  • SRDR - Systematic Review Data Repository (free)

Tools for Measuring Study Quality

  • Centre for Evidence-Based Medicine (CEBM) - Critical Appraisal Tools
  • Cochrane Collaboration's Risk of Bias Tool
  • Cochrane GRADE Training
  • GRADE Handbook
  • GRADEpro GDT - GRADE software
  • GRADE-CERQUAL - for assessing the quality of reviews of qualitative research
  • Jadad Scale (see article appendix)
  • Newcastle-Ottawa Scale - quality assessment scale for non-randomized studies
  • OHAT Risk of Bias Rating Tool for Human and Animal Studies
  • SYRCLE's Risk of Bias Tool - for animal studies
  • Synthesis of Evidence

The synthesis of evidence includes a qualitative synthesis of included studies which describes study methodology, strengths and limitations, patterns across studies, potential bias in study design, and the relevance of studies to the populations, comparisons, cointerventions, settings, and outcomes or measures of interest. 

The synthesis of evidence may also include a meta-analysis that pools data from included studies.  A meta-analysis will address the heterogeneity among study effects, statistical uncertainty, and the sensitivity of conclusions to changes in the protocol, assumptions, and study selection ( Institute of Medicine, 2011 ).
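As a minimal sketch of the core calculation such tools perform, the following implements fixed-effect inverse-variance pooling (the study effects and variances are invented; real meta-analysis software such as metafor or RevMan adds heterogeneity statistics, random-effects models, and much more):

```python
import math

# Illustrative sketch: fixed-effect (inverse-variance) meta-analysis.
# Each study's effect estimate is weighted by the inverse of its variance.
def fixed_effect_pool(effects, variances):
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))          # standard error of pooled effect
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # approximate 95% CI
    return pooled, se, ci

# Hypothetical study effects (e.g., mean differences) and their variances:
pooled, se, ci = fixed_effect_pool([0.30, 0.10, 0.25], [0.04, 0.09, 0.05])
print(f"pooled={pooled:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that pooling is only appropriate after the qualitative synthesis has established that the studies are similar enough to combine.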

Tools for the Synthesis of Evidence

  • metafor (free) - a meta-analysis package for R
  • OpenMeta[Analyst] (free)
  • RevMan (free) - designed for the Cochrane Review community

Additional tools can be found by searching the Systematic Review Toolbox (SR Toolbox).  

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) is an established guideline for reporting systematic reviews.

  • PRISMA Statement
  • PRISMA Explanation and Elaboration
  • PRISMA Checklist
  • PRISMA Flow Diagram - documents the screening, inclusion, and exclusion of records
  • PRISMA Extensions - includes reporting guidelines for different types or aspects of systematic reviews

See the Standards and Guidelines  page of this guide for other evidence synthesis reporting guidelines.

  • << Previous: Welch Systematic Review Collaboration
  • Next: Systematic Review Standards and Guidelines >>
  • Last Updated: Feb 6, 2024 9:28 AM
  • URL:

  • PMC10248995

Guidance to best tools and practices for systematic reviews

Kat Kolaski

1 Departments of Orthopaedic Surgery, Pediatrics, and Neurology, Wake Forest School of Medicine, Winston-Salem, NC USA

Lynne Romeiser Logan

2 Department of Physical Medicine and Rehabilitation, SUNY Upstate Medical University, Syracuse, NY USA

John P. A. Ioannidis

3 Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University School of Medicine, Stanford, CA USA

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of these issues and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy.

A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be utilized. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between those tools used by authors to develop their syntheses as opposed to those used to ultimately judge their work.

Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence. We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these is encouraged, but we caution against their superficial application and emphasize their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13643-023-02255-9.

Part 1. The state of evidence synthesis

Evidence syntheses are commonly regarded as the foundation of evidence-based medicine (EBM). They are widely accredited for providing reliable evidence and, as such, they have significantly influenced medical research and clinical practice. Despite their uptake throughout health care and ubiquity in contemporary medical literature, some important aspects of evidence syntheses are generally overlooked or not well recognized. Evidence syntheses are mostly retrospective exercises, they often depend on weak or irreparably flawed data, and they may use tools that have acknowledged or yet unrecognized limitations. They are complicated and time-consuming undertakings prone to bias and errors. Production of a good evidence synthesis requires careful preparation and high levels of organization in order to limit potential pitfalls [ 1 ]. Many authors do not recognize the complexity of such an endeavor and the many methodological challenges they may encounter. Failure to do so is likely to result in research and resource waste.

Given their potential impact on people’s lives, it is crucial for evidence syntheses to correctly report on the current knowledge base. In order to be perceived as trustworthy, reliable demonstration of the accuracy of evidence syntheses is equally imperative [ 2 ]. Concerns about the trustworthiness of evidence syntheses are not recent developments. From the early years when EBM first began to gain traction until recent times, when thousands of systematic reviews are published monthly, [ 3 ] the rigor of evidence syntheses has always varied. Many systematic reviews and meta-analyses had obvious deficiencies because original methods and processes had gaps, lacked precision, and/or were not widely known. The situation has improved with empirical research concerning which methods to use and standardization of appraisal tools. However, given the geometric increase in the number of evidence syntheses being published, a relatively larger pool of unreliable evidence syntheses is being published today.

Publication of methodological studies that critically appraise the methods used in evidence syntheses is increasing at a fast pace. This reflects the availability of tools specifically developed for this purpose [ 4 – 6 ]. Yet many clinical specialties report that alarming numbers of evidence syntheses fail on these assessments. The syntheses identified report on a broad range of common conditions including, but not limited to, cancer, [ 7 ] chronic obstructive pulmonary disease, [ 8 ] osteoporosis, [ 9 ] stroke, [ 10 ] cerebral palsy, [ 11 ] chronic low back pain, [ 12 ] refractive error, [ 13 ] major depression, [ 14 ] pain, [ 15 ] and obesity [ 16 , 17 ]. The situation is even more concerning with regard to evidence syntheses included in clinical practice guidelines (CPGs) [ 18 – 20 ]. Astonishingly, in a sample of CPGs published in 2017–18, more than half did not apply even basic systematic methods in the evidence syntheses used to inform their recommendations [ 21 ].

These reports, while not widely acknowledged, suggest there are pervasive problems not limited to evidence syntheses that evaluate specific kinds of interventions or include primary research of a particular study design (eg, randomized versus non-randomized) [ 22 ]. Similar concerns about the reliability of evidence syntheses have been expressed by proponents of EBM in highly circulated medical journals [ 23 – 26 ]. These publications have also raised awareness about redundancy, inadequate input of statistical expertise, and deficient reporting. These issues plague primary research as well; however, there is heightened concern for the impact of these deficiencies given the critical role of evidence syntheses in policy and clinical decision-making.

Methods and guidance to produce a reliable evidence synthesis

Several international consortiums of EBM experts and national health care organizations currently provide detailed guidance (Table 1). They draw criteria from the reporting and methodological standards of currently recommended appraisal tools, and regularly review and update their methods to reflect new information and changing needs. In addition, they endorse the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for rating the overall quality of a body of evidence [ 27 ]. These groups typically certify or commission systematic reviews that are published in exclusive databases (eg, Cochrane, JBI) or are used to develop government or agency sponsored guidelines or health technology assessments (eg, National Institute for Health and Care Excellence [NICE], Scottish Intercollegiate Guidelines Network [SIGN], Agency for Healthcare Research and Quality [AHRQ]). They offer developers of evidence syntheses various levels of methodological advice, technical and administrative support, and editorial assistance. Use of specific protocols and checklists is required for development teams within these groups, but their online methodological resources are accessible to any potential author.

Guidance for development of evidence syntheses

Notably, Cochrane is the largest single producer of evidence syntheses in biomedical research; however, these only account for 15% of the total [ 28 ]. The World Health Organization requires Cochrane standards be used to develop evidence syntheses that inform their CPGs [ 29 ]. Authors investigating questions of intervention effectiveness in syntheses developed for Cochrane follow the Methodological Expectations of Cochrane Intervention Reviews [ 30 ] and undergo multi-tiered peer review [ 31 , 32 ]. Several empirical evaluations have shown that Cochrane systematic reviews are of higher methodological quality compared with non-Cochrane reviews [ 4 , 7 , 9 , 11 , 14 , 32 – 35 ]. However, some of these assessments have biases: they may be conducted by Cochrane-affiliated authors, and they sometimes use scales and tools developed and used in the Cochrane environment and by its partners. In addition, evidence syntheses published in the Cochrane database are not subject to space or word restrictions, while non-Cochrane syntheses are often limited. As a result, information that may be relevant to the critical appraisal of non-Cochrane reviews is often removed or is relegated to online-only supplements that may not be readily or fully accessible [ 28 ].

Influences on the state of evidence synthesis

Many authors are familiar with the evidence syntheses produced by the leading EBM organizations but can be intimidated by the time and effort necessary to apply their standards. Instead of following their guidance, authors may employ methods that are discouraged or outdated [ 28 ]. Suboptimal methods described in the literature may then be taken up by others. For example, the Newcastle–Ottawa Scale (NOS) is a commonly used tool for appraising non-randomized studies [ 36 ]. Many authors justify their selection of this tool with reference to a publication that describes the unreliability of the NOS and recommends against its use [ 37 ]. Obviously, the authors who cite this report for that purpose have not read it. Authors and peer reviewers have a responsibility to use reliable and accurate methods and not copycat previous citations or substandard work [ 38 , 39 ]. Similar cautions may potentially extend to automation tools. These have concentrated on evidence searching [ 40 ] and selection, given how demanding it is for humans to maintain truly up-to-date evidence [ 2 , 41 ]. Cochrane has deployed machine learning to identify randomized controlled trials (RCTs) and studies related to COVID-19 [ 2 , 42 ], but such tools are not yet commonly used [ 43 ]. The routine integration of automation tools in the development of future evidence syntheses should not displace the interpretive part of the process.

Editorials about unreliable or misleading systematic reviews highlight several of the intertwining factors that may contribute to continued publication of unreliable evidence syntheses: shortcomings and inconsistencies of the peer review process, lack of endorsement of current standards on the part of journal editors, the incentive structure of academia, industry influences, publication bias, and the lure of “predatory” journals [ 44 – 48 ]. At this juncture, clarification of the extent to which each of these factors contribute remains speculative, but their impact is likely to be synergistic.

Over time, the generalized acceptance of the conclusions of systematic reviews as incontrovertible has affected trends in the dissemination and uptake of evidence. Reporting of the results of evidence syntheses and recommendations of CPGs has shifted beyond medical journals to press releases and news headlines and, more recently, to the realm of social media and influencers. The lay public and policy makers may depend on these outlets for interpreting evidence syntheses and CPGs. Unfortunately, communication to the general public often reflects intentional or non-intentional misrepresentation or “spin” of the research findings [ 49 – 52 ]. News and social media outlets also tend to reduce conclusions on a body of evidence and recommendations for treatment to binary choices (eg, “do it” versus “don’t do it”) that may be assigned an actionable symbol (eg, red/green traffic lights, smiley/frowning face emoji).

Strategies for improvement

Many authors and peer reviewers are volunteer health care professionals or trainees who lack formal training in evidence synthesis [ 46 , 53 ]. Informing them about research methodology could increase the likelihood they will apply rigorous methods [ 25 , 33 , 45 ]. We tackle this challenge, from both a theoretical and a practical perspective, by offering guidance applicable to any specialty. It is based on recent methodological research that is extensively referenced to promote self-study. However, the information presented is not intended to be a substitute for committed training in evidence synthesis methodology; instead, we hope to inspire our target audience to seek such training. We also hope to inform a broader audience of clinicians and guideline developers influenced by evidence syntheses. Notably, these communities often include the same members who serve in different capacities.

In the following sections, we highlight methodological concepts and practices that may be unfamiliar, problematic, confusing, or controversial. In Part 2, we consider various types of evidence syntheses and the types of research evidence summarized by them. In Part 3, we examine some widely used (and misused) tools for the critical appraisal of systematic reviews and reporting guidelines for evidence syntheses. In Part 4, we discuss how to meet methodological conduct standards applicable to key components of systematic reviews. In Part 5, we describe the merits and caveats of rating the overall certainty of a body of evidence. Finally, in Part 6, we summarize suggested terminology, methods, and tools for development and evaluation of evidence syntheses that reflect current best practices.

Part 2. Types of syntheses and research evidence

A good foundation for the development of evidence syntheses requires an appreciation of their various methodologies and the ability to correctly identify the types of research potentially available for inclusion in the synthesis.

Types of evidence syntheses

Systematic reviews have historically focused on the benefits and harms of interventions; over time, various types of systematic reviews have emerged to address the diverse information needs of clinicians, patients, and policy makers [ 54 ]. Systematic reviews with traditional components have become defined by the different topics they assess (Table 2.1). In addition, other distinctive types of evidence syntheses have evolved, including overviews or umbrella reviews, scoping reviews, rapid reviews, and living reviews. The popularity of these has been increasing in recent years [ 55 – 58 ]. A summary of the development, methods, available guidance, and indications for these unique types of evidence syntheses is available in Additional File 2 A.

Types of traditional systematic reviews

Both Cochrane [ 30 , 59 ] and JBI [ 60 ] provide methodologies for many types of evidence syntheses; they describe these with different terminology, but there is obvious overlap (Table 2.2 ). The majority of evidence syntheses published by Cochrane (96%) and JBI (62%) are categorized as intervention reviews. This reflects the earlier development and dissemination of their intervention review methodologies; these remain well-established [ 30 , 59 , 61 ] as both organizations continue to focus on topics related to treatment efficacy and harms. In contrast, intervention reviews represent only about half of the total published in the general medical literature, and several non-intervention review types contribute to a significant proportion of the other half.

Evidence syntheses published by Cochrane and JBI

a Data from . Accessed 17 Sep 2022

b Data obtained via personal email communication on 18 Sep 2022 with Emilie Francis, editorial assistant, JBI Evidence Synthesis

c Includes the following categories: prevalence, scoping, mixed methods, and realist reviews

d This methodology is not supported in the current version of the JBI Manual for Evidence Synthesis

Types of research evidence

There is consensus on the importance of using multiple study designs in evidence syntheses; at the same time, there is a lack of agreement on methods to identify included study designs. Authors of evidence syntheses may use various taxonomies and associated algorithms to guide selection and/or classification of study designs. These tools differentiate categories of research and apply labels to individual study designs (eg, RCT, cross-sectional). A familiar example is the Design Tree endorsed by the Centre for Evidence-Based Medicine [ 70 ]. Such tools may not be helpful to authors of evidence syntheses for multiple reasons.

Suboptimal levels of agreement and accuracy even among trained methodologists reflect challenges with the application of such tools [ 71 , 72 ]. Problematic distinctions or decision points (eg, experimental or observational, controlled or uncontrolled, prospective or retrospective) and design labels (eg, cohort, case control, uncontrolled trial) have been reported [ 71 ]. The variable application of ambiguous study design labels to non-randomized studies is common, making them especially prone to misclassification [ 73 ]. In addition, study labels do not denote the unique design features that make different types of non-randomized studies susceptible to different biases, including those related to how the data are obtained (eg, clinical trials, disease registries, wearable devices). Given this limitation, it is important to be aware that design labels preclude the accurate assignment of non-randomized studies to a “level of evidence” in traditional hierarchies [ 74 ].

These concerns suggest that available tools and nomenclature used to distinguish types of research evidence may not uniformly apply to biomedical research and non-health fields that utilize evidence syntheses (eg, education, economics) [ 75 , 76 ]. Moreover, primary research reports often do not describe study design or do so incompletely or inaccurately; thus, indexing in PubMed and other databases does not address the potential for misclassification [ 77 ]. Yet proper identification of research evidence has implications for several key components of evidence syntheses. For example, search strategies limited by index terms using design labels or study selection based on labels applied by the authors of primary studies may cause inconsistent or unjustified study inclusions and/or exclusions [ 77 ]. In addition, because risk of bias (RoB) tools consider attributes specific to certain types of studies and study design features, results of these assessments may be invalidated if an inappropriate tool is used. Appropriate classification of studies is also relevant for the selection of a suitable method of synthesis and interpretation of those results.

An alternative to these tools and nomenclature involves application of a few fundamental distinctions that encompass a wide range of research designs and contexts. While these distinctions are not novel, we integrate them into a practical scheme (see Fig. 1) designed to guide authors of evidence syntheses in the basic identification of research evidence. The initial distinction is between primary and secondary studies. Primary studies are then further distinguished by: 1) the type of data reported (qualitative or quantitative); and 2) two defining design features (group or single-case and randomized or non-randomized). The different types of studies and study designs represented in the scheme are described in detail in Additional File 2 B. It is important to conceptualize their methods as complementary as opposed to contrasting or hierarchical [ 78 ]; each offers advantages and disadvantages that determine their appropriateness for answering different kinds of research questions in an evidence synthesis.


Distinguishing types of research evidence
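As a purely hypothetical illustration (the function, parameter names, and labels below are ours, not part of the published figure), the scheme's basic distinctions can be sketched as a short decision routine:

```python
def classify_study(is_primary: bool,
                   data_type: str,       # "qualitative" or "quantitative"
                   unit: str = "group",  # "group" or "single-case"
                   randomized: bool = False) -> str:
    """Illustrative sketch of the scheme's basic distinctions:
    primary vs secondary first, then (for primary studies) the type of
    data reported and two defining design features."""
    if not is_primary:
        return "secondary study (e.g., systematic review)"
    if data_type == "qualitative":
        return "primary qualitative study"
    design = "randomized" if randomized else "non-randomized"
    return f"primary quantitative {unit} study, {design}"

# An RCT, under this sketch, is a randomized primary quantitative group study
print(classify_study(True, "quantitative", "group", True))
```

Note the sketch deliberately avoids conventional design labels (cohort, case control, and the like); it records only the defining features, consistent with the scheme's rationale.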

Application of these basic distinctions may avoid some of the potential difficulties associated with study design labels and taxonomies. Nevertheless, debatable methodological issues are raised when certain types of research identified in this scheme are included in an evidence synthesis. We briefly highlight those associated with inclusion of non-randomized studies, case reports and series, and a combination of primary and secondary studies.

Non-randomized studies

When investigating an intervention’s effectiveness, it is important for authors to recognize the uncertainty of observed effects reported by studies with high RoB. Results of statistical analyses that include such studies need to be interpreted with caution in order to avoid misleading conclusions [ 74 ]. Review authors may consider excluding randomized studies with high RoB from meta-analyses. Non-randomized studies of interventions (NRSI) are affected by a greater potential range of biases and thus vary more than RCTs in their ability to estimate a causal effect [ 79 ]. If data from NRSI are synthesized in meta-analyses, it is helpful to separately report their summary estimates [ 6 , 74 ].

Nonetheless, certain design features of NRSI (eg, which parts of the study were prospectively designed) may help to distinguish stronger from weaker ones. Cochrane recommends that authors of a review including NRSI focus on relevant study design features when determining eligibility criteria instead of relying on non-informative study design labels [ 79 , 80 ]. This process is facilitated by a study design feature checklist; guidance on using the checklist is included with the developers’ description of the tool [ 73 , 74 ]. Authors collect information about these design features during data extraction and then consider it when making final study selection decisions and when performing RoB assessments of the included NRSI.

Case reports and case series

Correctly identified case reports and case series can contribute evidence not well captured by other designs [ 81 ]; in addition, some topics may be limited to a body of evidence that consists primarily of uncontrolled clinical observations. Murad and colleagues offer a framework for how to include case reports and series in an evidence synthesis [ 82 ]. Distinguishing between cohort studies and case series in these syntheses is important, especially for those that rely on evidence from NRSI. Additional data obtained from studies misclassified as case series can potentially increase the confidence in effect estimates. Mathes and Pieper provide authors of evidence syntheses with specific guidance on distinguishing between cohort studies and case series, but emphasize the increased workload involved [ 77 ].

Primary and secondary studies

Synthesis of combined evidence from primary and secondary studies may provide a broad perspective on the entirety of available literature on a topic. This is, in fact, the recommended strategy for scoping reviews that may include a variety of sources of evidence (eg, CPGs, popular media). However, except for scoping reviews, the synthesis of data from primary and secondary studies is discouraged unless there are strong reasons to justify doing so.

Combining primary and secondary sources of evidence is challenging for authors of other types of evidence syntheses for several reasons [ 83 ]. Assessments of RoB for primary and secondary studies are derived from conceptually different tools, thus obfuscating the ability to make an overall RoB assessment of a combination of these study types. In addition, authors who include primary and secondary studies must devise non-standardized methods for synthesis. Note this contrasts with well-established methods available for updating existing evidence syntheses with additional data from new primary studies [ 84 – 86 ]. However, a new review that synthesizes data from primary and secondary studies raises questions of validity and may unintentionally support a biased conclusion because no existing methodological guidance is currently available [ 87 ].


We suggest that journal editors require authors to identify which type of evidence synthesis they are submitting and reference the specific methodology used for its development. This will clarify the research question and methods for peer reviewers and potentially simplify the editorial process. Editors should announce this practice and include it in the instructions to authors. To decrease bias and apply correct methods, authors must also accurately identify the types of research evidence included in their syntheses.

Part 3. Conduct and reporting

The need to develop criteria to assess the rigor of systematic reviews was recognized soon after the EBM movement began to gain international traction [ 88 , 89 ]. Systematic reviews rapidly became popular, but many were very poorly conceived, conducted, and reported. These problems remain highly prevalent [ 23 ] despite development of guidelines and tools to standardize and improve the performance and reporting of evidence syntheses [ 22 , 28 ]. Table 3.1  provides some historical perspective on the evolution of tools developed specifically for the evaluation of systematic reviews, with or without meta-analysis.

Tools specifying standards for systematic reviews with and without meta-analysis

a Currently recommended

b Validated tool for systematic reviews of interventions developed for use by authors of overviews or umbrella reviews

These tools are often interchangeably invoked when referring to the “quality” of an evidence synthesis. However, quality is a vague term that is frequently misused and misunderstood; more precisely, these tools specify different standards for evidence syntheses. Methodological standards address how well a systematic review was designed and performed [ 5 ]. RoB assessments refer to systematic flaws or limitations in the design, conduct, or analysis of research that distort the findings of the review [ 4 ]. Reporting standards help systematic review authors describe the methodology they used and the results of their synthesis in sufficient detail [ 92 ]. It is essential to distinguish between these evaluations: a systematic review may be biased, it may fail to report sufficient information on essential features, or it may exhibit both problems; a thoroughly reported evidence synthesis may still be biased and flawed, while an otherwise unbiased one may suffer from deficient documentation.

We direct attention to the currently recommended tools listed in Table 3.1  but concentrate on AMSTAR-2 (update of AMSTAR [A Measurement Tool to Assess Systematic Reviews]) and ROBIS (Risk of Bias in Systematic Reviews), which evaluate methodological quality and RoB, respectively. For comparison and completeness, we include PRISMA 2020 (update of the 2009 Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement), which offers guidance on reporting standards. The exclusive focus on these three tools is by design; it addresses concerns related to the considerable variability in tools used for the evaluation of systematic reviews [ 28 , 88 , 96 , 97 ]. We highlight the underlying constructs these tools were designed to assess, then describe their components and applications. Their known (or potential) uptake, impact, and limitations are also discussed.

Evaluation of conduct


AMSTAR [ 5 ] was in use for a decade prior to the 2017 publication of AMSTAR-2; both provide a broad evaluation of methodological quality of intervention systematic reviews, including flaws arising through poor conduct of the review [ 6 ]. ROBIS, published in 2016, was developed to specifically assess RoB introduced by the conduct of the review; it is applicable to systematic reviews of interventions and several other types of reviews [ 4 ]. Both tools reflect a shift to a domain-based approach as opposed to generic quality checklists. There are a few items unique to each tool; however, similarities between items have been demonstrated [ 98 , 99 ]. AMSTAR-2 and ROBIS are recommended for use by: 1) authors of overviews or umbrella reviews and CPGs to evaluate systematic reviews considered as evidence; 2) authors of methodological research studies to appraise included systematic reviews; and 3) peer reviewers for appraisal of submitted systematic review manuscripts. For authors, these tools may function as teaching aids and inform conduct of their review during its development.


Systematic reviews that include randomized and/or non-randomized studies as evidence can be appraised with AMSTAR-2 and ROBIS. Other characteristics of AMSTAR-2 and ROBIS are summarized in Table 3.2 . Both tools define categories for an overall rating; however, neither tool is intended to generate a total score by simply calculating the number of responses satisfying criteria for individual items [ 4 , 6 ]. AMSTAR-2 focuses on the rigor of a review’s methods irrespective of the specific subject matter. ROBIS places emphasis on a review’s results section; this suggests it may be optimally applied by appraisers with some knowledge of the review’s topic, as they may be better equipped to determine if certain procedures (or lack thereof) would impact the validity of a review’s findings [ 98 , 100 ]. Reliability studies show AMSTAR-2 overall confidence ratings strongly correlate with the overall RoB ratings in ROBIS [ 100 , 101 ].
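To illustrate why an overall rating is not a summed score, the following is a minimal sketch of the published AMSTAR-2 rating rules [ 6 ], in which the overall confidence rating hinges on flaws in critical domains rather than on a count of satisfied items (the function name is ours, and real appraisal requires judgment the sketch cannot capture):

```python
def amstar2_overall(critical_flaws: int, noncritical_weaknesses: int) -> str:
    """Simplified sketch of the AMSTAR-2 overall confidence rating:
    a single critical flaw caps the rating at "low" no matter how many
    other items are satisfied, so summing responses would mislead."""
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    # No critical flaws: rating depends on non-critical weaknesses only
    return "moderate" if noncritical_weaknesses > 1 else "high"
```

For example, a review satisfying 14 of 16 items still rates "critically low" if both unmet items fall in critical domains, which is precisely the behavior a naive total score would obscure.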

Comparison of AMSTAR-2 and ROBIS

a ROBIS includes an optional first phase to assess the applicability of the review to the research question of interest. The tool may be applicable to other review types in addition to the four specified, although modification of this initial phase will be needed (Personal Communication via email, Penny Whiting, 28 Jan 2022)

b AMSTAR-2 item #9 and #11 require separate responses for RCTs and NRSI

Interrater reliability has been shown to be acceptable for AMSTAR-2 [ 6 , 11 , 102 ] and ROBIS [ 4 , 98 , 103 ] but neither tool has been shown to be superior in this regard [ 100 , 101 , 104 , 105 ]. Overall, variability in reliability for both tools has been reported across items, between pairs of raters, and between centers [ 6 , 100 , 101 , 104 ]. The effects of appraiser experience on the results of AMSTAR-2 and ROBIS require further evaluation [ 101 , 105 ]. Updates to both tools should address items shown to be prone to individual appraisers’ subjective biases and opinions [ 11 , 100 ]; this may involve modifications of the current domains and signaling questions as well as incorporation of methods to make an appraiser’s judgments more explicit. Future revisions of these tools may also consider the addition of standards for aspects of systematic review development currently lacking (eg, rating overall certainty of evidence, [ 99 ] methods for synthesis without meta-analysis [ 105 ]) and removal of items that assess aspects of reporting that are thoroughly evaluated by PRISMA 2020.


A good understanding of what is required to satisfy the standards of AMSTAR-2 and ROBIS involves study of the accompanying guidance documents written by the tools’ developers; these contain detailed descriptions of each item’s standards. In addition, accurate appraisal of a systematic review with either tool requires training. Most experts recommend independent assessment by at least two appraisers with a process for resolving discrepancies as well as procedures to establish interrater reliability, such as pilot testing, a calibration phase or exercise, and development of predefined decision rules [ 35 , 99 – 101 , 103 , 104 , 106 ]. These methods may, to some extent, address the challenges associated with the diversity in methodological training, subject matter expertise, and experience using the tools that are likely to exist among appraisers.

The standards of AMSTAR, AMSTAR-2, and ROBIS have been used in many methodological studies and epidemiological investigations. However, the increased publication of overviews or umbrella reviews and CPGs has likely been a greater influence on the widening acceptance of these tools. Critical appraisal of the secondary studies considered as evidence is essential to the trustworthiness of both the recommendations of CPGs and the conclusions of overviews. Currently both Cochrane [ 55 ] and JBI [ 107 ] recommend AMSTAR-2 and ROBIS in their guidance for authors of overviews or umbrella reviews. However, ROBIS and AMSTAR-2 were released in 2016 and 2017, respectively; thus, to date, limited data have been reported about the uptake of these tools or which of the two may be preferred [ 21 , 106 ]. Currently, in relation to CPGs, AMSTAR-2 appears to be overwhelmingly popular compared to ROBIS. A Google Scholar search of this topic (search terms “AMSTAR 2 AND clinical practice guidelines,” “ROBIS AND clinical practice guidelines,” 13 May 2022) found 12,700 hits for AMSTAR-2 and 1,280 for ROBIS. The apparent greater appeal of AMSTAR-2 may relate to its longer track record, given the original version of the tool was in use for 10 years prior to its update in 2017.

Barriers to the uptake of AMSTAR-2 and ROBIS include the real or perceived time and resources necessary to complete the items they include and appraisers’ confidence in their own ratings [ 104 ]. Reports from comparative studies available to date indicate that appraisers find AMSTAR-2 questions, responses, and guidance to be clearer and simpler compared with ROBIS [ 11 , 101 , 104 , 105 ]. This suggests that for appraisal of intervention systematic reviews, AMSTAR-2 may be a more practical tool than ROBIS, especially for novice appraisers [ 101 , 103 – 105 ]. The unique characteristics of each tool, as well as their potential advantages and disadvantages, should be taken into consideration when deciding which tool should be used for an appraisal of a systematic review. In addition, the choice of one or the other may depend on how the results of an appraisal will be used; for example, a peer reviewer’s appraisal of a single manuscript versus an appraisal of multiple systematic reviews in an overview or umbrella review, CPG, or systematic methodological study.

Authors of overviews and CPGs report results of AMSTAR-2 and ROBIS appraisals for each of the systematic reviews they include as evidence. Ideally, an independent judgment of their appraisals can be made by the end users of overviews and CPGs; however, most stakeholders, including clinicians, are unlikely to have a sophisticated understanding of these tools. Nevertheless, they should at least be aware that AMSTAR-2 and ROBIS ratings reported in overviews and CPGs may be inaccurate because the tools are not applied as intended by their developers. This can result from inadequate training of the overview or CPG authors who perform the appraisals, or from modifications of the appraisal tools imposed by them. The potential variability in overall confidence and RoB ratings highlights why appraisers applying these tools need to support their judgments with explicit documentation; this allows readers to judge for themselves whether they agree with the criteria used by appraisers [ 4 , 108 ]. When these judgments are explicit, the underlying rationale used when applying these tools can be assessed [ 109 ].

Theoretically, we would expect an association of AMSTAR-2 with improved methodological rigor and an association of ROBIS with lower RoB in recent systematic reviews compared to those published before 2017. To our knowledge, this has not yet been demonstrated; however, like reports about the actual uptake of these tools, time will tell. Additional data on user experience is also needed to further elucidate the practical challenges and methodological nuances encountered with the application of these tools. This information could potentially inform the creation of unifying criteria to guide and standardize the appraisal of evidence syntheses [ 109 ].

Evaluation of reporting

Complete reporting is essential for users to establish the trustworthiness and applicability of a systematic review’s findings. Efforts to standardize and improve the reporting of systematic reviews resulted in the 2009 publication of the PRISMA statement [ 92 ] with its accompanying explanation and elaboration document [ 110 ]. This guideline was designed to help authors prepare a complete and transparent report of their systematic review. In addition, adherence to PRISMA is often used to evaluate the thoroughness of reporting of published systematic reviews [ 111 ]. The updated version, PRISMA 2020 [ 93 ], and its guidance document [ 112 ] were published in 2021. Items on the original and updated versions of PRISMA are organized by the six basic review components they address (title, abstract, introduction, methods, results, discussion). The PRISMA 2020 update is a considerably expanded version of the original; it includes standards and examples for the 27 original and 13 additional reporting items that capture methodological advances and may enhance the replicability of reviews [ 113 ].

The original PRISMA statement fostered the development of various PRISMA extensions (Table 3.3 ). These include reporting guidance for scoping reviews and reviews of diagnostic test accuracy, and for intervention reviews that report on the following: harms outcomes, equity issues, the effects of acupuncture, the results of network meta-analyses, and analyses of individual participant data. Detailed reporting guidance for specific systematic review components (abstracts, protocols, literature searches) is also available.

PRISMA extensions

PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses

a Note the abstract reporting checklist is now incorporated into PRISMA 2020 [ 93 ]

Uptake and impact

The 2009 PRISMA standards [ 92 ] for reporting have been widely endorsed by authors, journals, and EBM-related organizations. We anticipate the same for PRISMA 2020 [ 93 ] given its co-publication in multiple high-impact journals. However, to date, there is a lack of strong evidence for an association between improved systematic review reporting and endorsement of PRISMA 2009 standards [ 43 , 111 ]. Most journals require that a PRISMA checklist accompany submissions of systematic review manuscripts. However, the accuracy of information presented on these self-reported checklists is not necessarily verified. It remains unclear which strategies (eg, authors’ self-report of checklists, peer reviewer checks) might improve adherence to the PRISMA reporting standards; in addition, the feasibility of any potentially effective strategies must be taken into consideration given the structure and limitations of current research and publication practices [ 124 ].

Pitfalls and limitations of PRISMA, AMSTAR-2, and ROBIS

Misunderstanding of the roles of these tools and their misapplication may be widespread problems. PRISMA 2020 is a reporting guideline that is most beneficial if consulted when developing a review as opposed to merely completing a checklist when submitting to a journal; at that point, the review is finished, with good or bad methodological choices. PRISMA checklists evaluate how completely an element of review conduct was reported, but they do not evaluate the caliber of conduct or performance of a review. Thus, review authors and readers should not think that a rigorous systematic review can be produced by simply following the PRISMA 2020 guidelines. Similarly, it is important to recognize that AMSTAR-2 and ROBIS are tools to evaluate the conduct of a review but do not substitute for conceptual methodological guidance. In addition, they are not intended to be simple checklists. In fact, they have the potential for misuse or abuse if applied as such; for example, by calculating a total score to make a judgment about a review’s overall confidence or RoB. Proper selection of a response for the individual items on AMSTAR-2 and ROBIS requires training or at least reference to their accompanying guidance documents.

Not surprisingly, it has been shown that compliance with the PRISMA checklist is not necessarily associated with satisfying the standards of ROBIS [ 125 ]. AMSTAR-2 and ROBIS were not available when PRISMA 2009 was developed; however, they were considered in the development of PRISMA 2020 [ 113 ]. Therefore, future studies may show a positive relationship between fulfillment of PRISMA 2020 standards for reporting and meeting the standards of tools evaluating methodological quality and RoB.

Choice of an appropriate tool for the evaluation of a systematic review first involves identification of the underlying construct to be assessed. For systematic reviews of interventions, recommended tools include AMSTAR-2 and ROBIS for appraisal of conduct and PRISMA 2020 for completeness of reporting. All three tools were developed rigorously and provide easily accessible and detailed user guidance, which is necessary for their proper application and interpretation. When considering a manuscript for publication, training in these tools can sensitize peer reviewers and editors to major issues that may affect the review’s trustworthiness and completeness of reporting. Judgment of the overall certainty of a body of evidence and formulation of recommendations rely, in part, on AMSTAR-2 or ROBIS appraisals of systematic reviews. Therefore, training on the application of these tools is essential for authors of overviews and developers of CPGs. Peer reviewers and editors considering an overview or CPG for publication must hold their authors to a high standard of transparency regarding both the conduct and reporting of these appraisals.

Part 4. Meeting conduct standards

Many authors, peer reviewers, and editors erroneously equate fulfillment of the items on the PRISMA checklist with superior methodological rigor. For direction on methodology, we refer them to available resources that provide comprehensive conceptual guidance [ 59 , 60 ] as well as primers with basic step-by-step instructions [ 1 , 126 , 127 ]. This section is intended to complement study of such resources by facilitating use of AMSTAR-2 and ROBIS, tools specifically developed to evaluate methodological rigor of systematic reviews. These tools are widely accepted by methodologists; however, in the general medical literature, they are not uniformly selected for the critical appraisal of systematic reviews [ 88 , 96 ].

To enable their uptake, Table 4.1  links review components to the corresponding appraisal tool items. Expectations of AMSTAR-2 and ROBIS are concisely stated, and reasoning provided.

Systematic review components linked to appraisal with AMSTAR-2 and ROBIS a

CoI conflict of interest, MA meta-analysis, NA not addressed, PICO participant, intervention, comparison, outcome, PRISMA-P Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols, RoB risk of bias

a Components shown in bold are chosen for elaboration in Part 4 for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors; and/or 2) the component is evaluated by standards of an AMSTAR-2 “critical” domain

b Critical domains of AMSTAR-2 are indicated by *

Issues involved in meeting the standards for seven review components (identified in bold in Table 4.1 ) are addressed in detail. These were chosen for elaboration for one (or both) of two reasons: 1) the component has been identified as potentially problematic for systematic review authors based on consistent reports of their frequent AMSTAR-2 or ROBIS deficiencies [ 9 , 11 , 15 , 88 , 128 , 129 ]; and/or 2) the review component is judged by standards of an AMSTAR-2 “critical” domain. These have the greatest implications for how a systematic review will be appraised: if standards for any one of these critical domains are not met, the review is rated as having “critically low confidence.”

Research question

Specific and unambiguous research questions may have more value for reviews that deal with hypothesis testing. Mnemonics for the various elements of research questions are suggested by JBI and Cochrane (Table 2.1 ). These prompt authors to consider the specialized methods involved for developing different types of systematic reviews; however, while inclusion of the suggested elements aligns a review with the specialized methods of its type, it does not necessarily make the research question appropriate. Table 4.2  lists acronyms that may aid in developing the research question. They include overlapping concepts of importance in this time of proliferating reviews of uncertain value [ 130 ]. If these issues are not prospectively contemplated, systematic review authors may establish an overly broad scope, or develop runaway scope allowing them to stray from predefined choices relating to key comparisons and outcomes.

Research question development

a Cummings SR, Browner WS, Hulley SB. Conceiving the research question and developing the study plan. In: Hulley SB, Cummings SR, Browner WS, editors. Designing clinical research: an epidemiological approach; 4th edn. Lippincott Williams & Wilkins; 2007. p. 14–22

b Doran, GT. There’s a S.M.A.R.T. way to write management’s goals and objectives. Manage Rev. 1981;70:35-6.

c Johnson BT, Hennessy EA. Systematic reviews and meta-analyses in the health sciences: best practice methods for research syntheses. Soc Sci Med. 2019;233:237–51

Once a research question is established, searching on registry sites and databases for existing systematic reviews addressing the same or a similar topic is necessary in order to avoid contributing to research waste [ 131 ]. Repeating an existing systematic review must be justified, for example, if previous reviews are out of date or methodologically flawed. A full discussion on replication of intervention systematic reviews, including a consensus checklist, can be found in the work of Tugwell and colleagues [ 84 ].

Protocol development is considered a core component of systematic reviews [ 125 , 126 , 132 ]. Review protocols may allow researchers to plan and anticipate potential issues, assess validity of methods, prevent arbitrary decision-making, and minimize bias that can be introduced by the conduct of the review. Registration of a protocol that allows public access promotes transparency of the systematic review’s methods and processes and reduces the potential for duplication [ 132 ]. Thinking early and carefully about all the steps of a systematic review is pragmatic and logical and may mitigate the influence of the authors’ prior knowledge of the evidence [ 133 ]. In addition, the protocol stage is when the scope of the review can be carefully considered by authors, reviewers, and editors; this may help to avoid production of overly ambitious reviews that include excessive numbers of comparisons and outcomes or are undisciplined in their study selection.

An association with attainment of AMSTAR standards in systematic reviews with published prospective protocols has been reported [ 134 ]. However, completeness of reporting does not seem to be different in reviews with a protocol compared to those without one [ 135 ]. PRISMA-P [ 116 ] and its accompanying elaboration and explanation document [ 136 ] can be used to guide and assess the reporting of protocols. A final version of the review should fully describe any protocol deviations. Peer reviewers may compare the submitted manuscript with any available pre-registered protocol; this is required if AMSTAR-2 or ROBIS are used for critical appraisal.

There are multiple options for the recording of protocols (Table 4.3 ). Some journals will peer review and publish protocols. In addition, many online sites offer date-stamped and publicly accessible protocol registration. Some of these are exclusively for protocols of evidence syntheses; others are less restrictive and offer researchers the capacity for data storage, sharing, and other workflow features. These sites document protocol details to varying extents and have different requirements [ 137 ]. The most popular site for systematic reviews, the International Prospective Register of Systematic Reviews (PROSPERO), for example, only registers reviews that report on an outcome with direct relevance to human health. The PROSPERO record documents protocols for all types of reviews except literature and scoping reviews. Of note, PROSPERO requires that authors register their review protocols prior to any data extraction [ 133 , 138 ]. The electronic records of most of these registry sites allow authors to update their protocols and facilitate transparent tracking of protocol changes, which are not unexpected during the progress of the review [ 139 ].

Options for protocol registration of evidence syntheses

a Authors are advised to contact their target journal regarding submission of systematic review protocols

b Registration is restricted to approved review projects

c The JBI registry lists review projects currently underway by JBI-affiliated entities. These records include a review’s title, primary author, research question, and PICO elements. JBI recommends that authors register eligible protocols with PROSPERO

d See Pieper and Rombey [ 137 ] for detailed characteristics of these five registries

e See Pieper and Rombey [ 137 ] for other systematic review data repository options

Study design inclusion

For most systematic reviews, broad inclusion of study designs is recommended [ 126 ]. This may allow comparison of results between contrasting study design types [ 126 ]. Certain study designs may be considered preferable depending on the type of review and nature of the research question. However, prevailing stereotypes about what each study design does best may not be accurate. For example, in systematic reviews of interventions, randomized designs are typically thought to answer highly specific questions while non-randomized designs often are expected to reveal greater information about harms or real-world evidence [ 126 , 140 , 141 ]. This may be a false distinction; randomized trials may be pragmatic [ 142 ], they may offer important (and more unbiased) information on harms [ 143 ], and data from non-randomized trials may not necessarily be more real-world-oriented [ 144 ].

Moreover, there may not be any available evidence reported by RCTs for certain research questions; in some cases, there may not be any RCTs or NRSI. When the available evidence is limited to case reports and case series, it is not possible to test hypotheses nor provide descriptive estimates or associations; however, a systematic review of these studies can still offer important insights [ 81 , 145 ]. When authors anticipate that limited evidence of any kind may be available to inform their research questions, a scoping review can be considered. Alternatively, decisions regarding inclusion of indirect as opposed to direct evidence can be addressed during protocol development [ 146 ]. Including indirect evidence at an early stage of intervention systematic review development allows authors to decide if such studies offer any additional and/or different understanding of treatment effects for their population or comparison of interest. Issues of indirectness of included studies are accounted for later in the process, during determination of the overall certainty of evidence (see Part 5 for details).

Evidence search

Both AMSTAR-2 and ROBIS require systematic and comprehensive searches for evidence. This is essential for any systematic review. Both tools discourage search restrictions based on language and publication source. Given increasing globalism in health care, the practice of including English-only literature should be avoided [ 126 ]. There are many examples in which language bias (different results in studies published in different languages) has been documented [ 147 , 148 ]. This does not mean that all literature, in all languages, is equally trustworthy [ 148 ]; however, the only way to formally probe for the potential of such biases is to consider all languages in the initial search. Searches of the gray literature and of trial registries may also reveal important details about topics that would otherwise be missed [ 149 – 151 ]. Again, inclusiveness will allow review authors to investigate whether results differ in gray literature and trial registries [ 41 , 151 – 153 ].

Authors should make every attempt to complete their review within one year, as that is the likely viable life of a search. If that is not possible, the search should be updated close to the time of completion [ 154 ]. Some research topics warrant an even shorter interval: in rapidly changing fields (as in the case of the COVID-19 pandemic), even one month may radically change the available evidence.

Excluded studies

AMSTAR-2 requires authors to provide references for any studies excluded at the full text phase of study selection along with reasons for exclusion; this allows readers to feel confident that all relevant literature has been considered for inclusion and that exclusions are defensible.

Risk of bias assessment of included studies

The design of the studies included in a systematic review (eg, RCT, cohort, case series) should not be equated with appraisal of its RoB. To meet AMSTAR-2 and ROBIS standards, systematic review authors must examine RoB issues specific to the design of each primary study they include as evidence. It is unlikely that a single RoB appraisal tool will be suitable for all research designs. In addition to tools for randomized and non-randomized studies, specific tools are available for evaluation of RoB in case reports and case series [ 82 ] and single-case experimental designs [ 155 , 156 ]. Note the RoB tools selected must meet the standards of the appraisal tool used to judge the conduct of the review. For example, AMSTAR-2 identifies four sources of bias specific to RCTs and NRSI that must be addressed by the RoB tool(s) chosen by the review authors. The Cochrane RoB-2 tool [ 157 ] for RCTs and ROBINS-I [ 158 ] for NRSI meet the AMSTAR-2 standards. Appraisers on the review team should not modify any RoB tool without complete transparency and acknowledgment that they have invalidated the interpretation of the tool as intended by its developers [ 159 ]. Conduct of RoB assessments is not addressed by AMSTAR-2; to meet ROBIS standards, two independent reviewers should complete RoB assessments of included primary studies.

Implications of the RoB assessments must be explicitly discussed and considered in the conclusions of the review. Discussion of the overall RoB of included studies may consider the weight of the studies at high RoB, the importance of the sources of bias in the studies being summarized, and if their importance differs in relationship to the outcomes reported. If a meta-analysis is performed, serious concerns for RoB of individual studies should be accounted for in these results as well. If the results of the meta-analysis for a specific outcome change when studies at high RoB are excluded, readers will have a more accurate understanding of this body of evidence. However, while investigating the potential impact of specific biases is a useful exercise, it is important to avoid over-interpretation, especially when there are sparse data.

Synthesis methods for quantitative data

Syntheses of quantitative data reported by primary studies are broadly categorized as one of two types: meta-analysis, and synthesis without meta-analysis (Table 4.4 ). Before deciding on one of these methods, authors should seek methodological advice about whether reported data can be transformed or used in other ways to provide a consistent effect measure across studies [ 160 , 161 ].

Common methods for quantitative synthesis

CI confidence interval (or credible interval, if analysis is done in Bayesian framework)

a See text for descriptions of the types of data combined in each of these approaches

b See Additional File 4  for guidance on the structure and presentation of forest plots

c General approach is similar to aggregate data meta-analysis but there are substantial differences relating to data collection and checking and analysis [ 162 ]. This approach to syntheses is applicable to intervention, diagnostic, and prognostic systematic reviews [ 163 ]

d Examples include meta-regression, hierarchical and multivariate approaches [ 164 ]

e In-depth guidance and illustrations of these methods are provided in Chapter 12 of the Cochrane Handbook [ 160 ]

Systematic reviews that employ meta-analysis should not be referred to simply as “meta-analyses.” The term meta-analysis strictly refers to a specific statistical technique used when study effect estimates and their variances are available, yielding a quantitative summary of results. In general, methods for meta-analysis involve use of a weighted average of effect estimates from two or more studies. When conducted carefully, meta-analysis increases the precision of the estimated magnitude of effect and can offer useful insights about heterogeneity and estimates of effects. We refer to standard references for a thorough introduction and formal training [ 165 – 167 ].
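The weighted-average principle described above can be made concrete with a minimal sketch of inverse-variance, fixed-effect pooling. The function name and the study data below are hypothetical illustrations, not taken from any review discussed in this article; real meta-analyses also require decisions about random-effects models and heterogeneity that this sketch omits.

```python
import math

def fixed_effect_meta(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of study effect estimates.

    Each study is weighted by 1/SE^2, so more precise studies
    contribute more to the pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # 95% confidence interval on the pooled effect
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci

# Three hypothetical studies reporting log odds ratios with standard errors
log_ors = [-0.40, -0.25, -0.10]
ses = [0.20, 0.15, 0.25]
pooled, se, ci = fixed_effect_meta(log_ors, ses)
```

Note how the middle study, with the smallest standard error, pulls the pooled estimate toward its own value; this weighting is what distinguishes meta-analysis from a simple average of study results.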

There are three common approaches to meta-analysis in current health care–related systematic reviews (Table 4.4 ). Aggregate data meta-analysis is the most familiar to authors of evidence syntheses and their end users. This standard approach combines data on effect estimates reported by studies that investigate similar research questions involving direct comparisons of an intervention and comparator. Results of these analyses provide a single summary intervention effect estimate. If the included studies in a systematic review measure an outcome differently, their reported results may be transformed to make them comparable [ 161 ]. Forest plots visually present essential information about the individual studies and the overall pooled analysis (see Additional File 4  for details).

Less familiar and more challenging meta-analytical approaches used in secondary research include individual participant data (IPD) and network meta-analyses (NMA); PRISMA extensions provide reporting guidelines for both [ 117 , 118 ]. In IPD, the raw data on each participant from each eligible study are re-analyzed as opposed to the study-level data analyzed in aggregate data meta-analyses [ 168 ]. This may offer advantages, including the potential for limiting concerns about bias and allowing more robust analyses [ 163 ]. As suggested by the description in Table 4.4 , NMA is a complex statistical approach. It combines aggregate data [ 169 ] or IPD [ 170 ] for effect estimates from direct and indirect comparisons reported in two or more studies of three or more interventions. This makes it a potentially powerful statistical tool; while multiple interventions are typically available to treat a condition, few have been evaluated in head-to-head trials [ 171 ]. Both IPD and NMA facilitate a broader scope, and potentially provide more reliable and/or detailed results; however, compared with standard aggregate data meta-analyses, their methods are more complicated, time-consuming, and resource-intensive, and they have their own biases, so one needs sufficient funding, technical expertise, and preparation to employ them successfully [ 41 , 172 , 173 ].

Several items in AMSTAR-2 and ROBIS address meta-analysis; thus, understanding the strengths, weaknesses, assumptions, and limitations of methods for meta-analyses is important. According to the standards of both tools, plans for a meta-analysis must be addressed in the review protocol, including reasoning, description of the type of quantitative data to be synthesized, and the methods planned for combining the data. This should not consist of stock statements describing conventional meta-analysis techniques; rather, authors are expected to anticipate issues specific to their research questions. Concern for the lack of training in meta-analysis methods among systematic review authors cannot be overstated. For those with training, the use of popular software (eg, RevMan [ 174 ], MetaXL [ 175 ], JBI SUMARI [ 176 ]) may facilitate exploration of these methods; however, such programs cannot substitute for the accurate interpretation of the results of meta-analyses, especially for more complex meta-analytical approaches.

Synthesis without meta-analysis

There are varied reasons a meta-analysis may not be appropriate or desirable [ 160 , 161 ]. Syntheses that informally use statistical methods other than meta-analysis are variably referred to as descriptive, narrative, or qualitative syntheses or summaries; these terms are also applied to syntheses that make no attempt to statistically combine data from individual studies. However, use of such imprecise terminology is discouraged; in order to fully explore the results of any type of synthesis, some narration or description is needed to supplement the data visually presented in tabular or graphic forms [ 63 , 177 ]. In addition, the term “qualitative synthesis” is easily confused with a synthesis of qualitative data in a qualitative or mixed methods review. “Synthesis without meta-analysis” is currently the preferred description of other ways to combine quantitative data from two or more studies. Use of this specific terminology when referring to these types of syntheses also implies the application of formal methods (Table 4.4 ).

Methods for syntheses without meta-analysis involve structured presentations of the data in any tables and plots. In comparison to narrative descriptions of each study, these are designed to more effectively and transparently show patterns and convey detailed information about the data; they also allow informal exploration of heterogeneity [ 178 ]. In addition, acceptable quantitative statistical methods (Table 4.4 ) are formally applied; however, it is important to recognize these methods have significant limitations for the interpretation of the effectiveness of an intervention [ 160 ]. Nevertheless, when meta-analysis is not possible, the application of these methods is less prone to bias compared with an unstructured narrative description of included studies [ 178 , 179 ].

Vote counting is commonly used in systematic reviews and involves a tally of studies reporting results that meet some threshold of importance applied by review authors. Until recently, it has not typically been identified as a method for synthesis without meta-analysis. Guidance on an acceptable vote counting method based on direction of effect is currently available [ 160 ] and should be used instead of narrative descriptions of such results (eg, “more than half the studies showed improvement”; “only a few studies reported adverse effects”; “7 out of 10 studies favored the intervention”). Unacceptable methods include vote counting by statistical significance or magnitude of effect or some subjective rule applied by the authors.
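Vote counting by direction of effect can be sketched as a tally of effect directions combined with an exact sign test, in the spirit of (though not identical to) the Cochrane guidance cited above. The function name and the per-study directions below are hypothetical illustrations.

```python
import math

def sign_test_p(k, n):
    """Two-sided exact binomial (sign) test of H0: P(favors) = 0.5."""
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical direction of effect per included study:
# +1 favors the intervention, -1 favors the comparator.
directions = [+1, +1, -1, +1, +1, -1, +1]
favoring = sum(1 for d in directions if d > 0)
n = len(directions)
p_value = sign_test_p(favoring, n)  # 5 of 7 studies favor the intervention
```

Counting only the direction of effect, regardless of statistical significance or magnitude, is what makes this approach acceptable; the unacceptable variants named above tally something other than direction.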

AMSTAR-2 and ROBIS standards do not explicitly address conduct of syntheses without meta-analysis, although AMSTAR-2 items 13 and 14 might be considered relevant. Guidance for the complete reporting of syntheses without meta-analysis for systematic reviews of interventions is available in the Synthesis without Meta-analysis (SWiM) guideline [ 180 ] and methodological guidance is available in the Cochrane Handbook [ 160 , 181 ].

Familiarity with AMSTAR-2 and ROBIS makes sense for authors of systematic reviews as these appraisal tools will be used to judge their work; however, training is necessary for authors to truly appreciate and apply methodological rigor. Moreover, judgment of the potential contribution of a systematic review to the current knowledge base goes beyond meeting the standards of AMSTAR-2 and ROBIS. These tools do not explicitly address some crucial concepts involved in the development of a systematic review; this further emphasizes the need for author training.

We recommend that systematic review authors incorporate specific practices or exercises when formulating a research question at the protocol stage. These should be designed to raise the review team’s awareness of how to prevent research and resource waste [ 84 , 130 ] and to stimulate careful contemplation of the scope of the review [ 30 ]. Authors’ training should also focus on justifiably choosing a formal method for the synthesis of quantitative and/or qualitative data from primary research; both types of data require specific expertise. For typical reviews that involve syntheses of quantitative data, statistical expertise is necessary, initially for decisions about appropriate methods [ 160 , 161 ] and then to inform any meta-analyses [ 167 ] or other statistical methods applied [ 160 ].

Part 5. Rating overall certainty of evidence

Report of an overall certainty of evidence assessment in a systematic review is an important new reporting standard of the updated PRISMA 2020 guidelines [ 93 ]. Systematic review authors are well acquainted with assessing RoB in individual primary studies, but much less familiar with assessment of overall certainty across an entire body of evidence. Yet a reliable way to evaluate this broader concept is now recognized as a vital part of interpreting the evidence.

Historical systems for rating evidence are based on study design and usually involve hierarchical levels or classes of evidence that use numbers and/or letters to designate the level/class. These systems were endorsed by various EBM-related organizations. Professional societies and regulatory groups then widely adopted them, often with modifications for application to the available primary research base in specific clinical areas. In 2002, a report issued by the AHRQ identified 40 systems to rate quality of a body of evidence [ 182 ]. A critical appraisal of systems used by prominent health care organizations published in 2004 revealed limitations in sensibility, reproducibility, applicability to different questions, and usability for different end users [ 183 ]. Persistent use of hierarchical rating schemes to describe overall quality continues to complicate the interpretation of evidence. This is indicated by recent reports of poor interpretability of systematic review results by readers [ 184 – 186 ] and misleading interpretations of the evidence related to the “spin” systematic review authors may put on their conclusions [ 50 , 187 ].

Recognition of the shortcomings of hierarchical rating systems raised concerns that misleading clinical recommendations could result even if based on a rigorous systematic review. In addition, the number and variability of these systems were considered obstacles to quick and accurate interpretations of the evidence by clinicians, patients, and policymakers [ 183 ]. These issues contributed to the development of the GRADE approach. An international working group, which continues to actively evaluate and refine it, first introduced GRADE in 2004 [ 188 ]. Currently more than 110 organizations from 19 countries around the world have endorsed or are using GRADE [ 189 ].

GRADE approach to rating overall certainty

GRADE offers a consistent and sensible approach for two separate processes: rating the overall certainty of a body of evidence and the strength of recommendations. The former is the expected conclusion of a systematic review, while the latter is pertinent to the development of CPGs. As such, GRADE provides a mechanism to bridge the gap from evidence synthesis to application of the evidence for informed clinical decision-making [ 27 , 190 ]. We briefly examine the GRADE approach but only as it applies to rating overall certainty of evidence in systematic reviews.

In GRADE, the term “certainty” of a body of evidence is preferred over “quality” [ 191 ]. Certainty refers to the level of confidence systematic review authors have that, for each outcome, an effect estimate represents the true effect. The GRADE approach to rating confidence in estimates begins with identifying the study type (RCT or NRSI) and then systematically considers criteria to rate the certainty of evidence up or down (Table 5.1 ).
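The starting point and up/down movements just described can be caricatured in a few lines of code. This is strictly an illustrative device: GRADE itself uses labels and structured judgment, not arithmetic on scores, and its developers stress that the ratings form a continuum. The function and numeric scale below are hypothetical.

```python
# Illustrative numeric scale only; GRADE uses labels plus judgment,
# not scores, and thresholds for rating up or down are not mechanical.
LEVELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}

def grade_certainty(randomized, downgrades=0, upgrades=0):
    """Start at 'high' for a body of RCT evidence or 'low' for NRSI,
    then move down one level per serious concern (risk of bias,
    inconsistency, indirectness, imprecision, publication bias) and
    up per rating-up criterion (eg, large effect, dose-response).
    """
    start = 4 if randomized else 2
    score = max(1, min(4, start - downgrades + upgrades))
    return LEVELS[score]

# Example: RCT evidence with serious risk of bias and serious imprecision
rating = grade_certainty(randomized=True, downgrades=2)  # "low"
```

The clamping to the 1–4 range mirrors the fact that evidence cannot be rated below “very low” or above “high”; everything else about a real GRADE assessment requires the explicit, documented reasoning the text describes.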

GRADE criteria for rating certainty of evidence

a Applies to randomized studies

b Applies to non-randomized studies

This process results in assignment of one of the four GRADE certainty ratings to each outcome; these are clearly conveyed with the use of basic interpretation symbols (Table 5.2 ) [ 192 ]. Notably, when multiple outcomes are reported in a systematic review, each outcome is assigned a unique certainty rating; thus different levels of certainty may exist in the body of evidence being examined.

GRADE certainty ratings and their interpretation symbols a

a From the GRADE Handbook [ 192 ]

GRADE’s developers acknowledge some subjectivity is involved in this process [ 193 ]. In addition, they emphasize that both the criteria for rating evidence up and down (Table 5.1 ) as well as the four overall certainty ratings (Table 5.2 ) reflect a continuum as opposed to discrete categories [ 194 ]. Consequently, deciding whether a study falls above or below the threshold for rating up or down may not be straightforward, and preliminary overall certainty ratings may be intermediate (eg, between low and moderate). Thus, the proper application of GRADE requires systematic review authors to take an overall view of the body of evidence and explicitly describe the rationale for their final ratings.

Advantages of GRADE

Outcomes important to the individuals who experience the problem of interest maintain a prominent role throughout the GRADE process [ 191 ]. These outcomes must inform the research questions (eg, PICO [population, intervention, comparator, outcome]) that are specified a priori in a systematic review protocol. Evidence for these outcomes is then investigated and each critical or important outcome is ultimately assigned a certainty of evidence as the end point of the review. Notably, limitations of the included studies have an impact at the outcome level. Ultimately, the certainty ratings for each outcome reported in a systematic review are considered by guideline panels. They use a different process to formulate recommendations that involves assessment of the evidence across outcomes [ 201 ]. It is beyond our scope to describe the GRADE process for formulating recommendations; however, it is critical to understand how these two outcome-centric concepts of certainty of evidence in the GRADE framework are related and distinguished. An in-depth illustration using examples from recently published evidence syntheses and CPGs is provided in Additional File 5 A (Table AF5A-1).

The GRADE approach is applicable irrespective of whether the certainty of the primary research evidence is high or very low; in some circumstances, indirect evidence of higher certainty may be considered if direct evidence is unavailable or of low certainty [ 27 ]. In fact, most interventions and outcomes in medicine have low or very low certainty of evidence based on GRADE and there seems to be no major improvement over time [ 202 , 203 ]. This is still a very important (even if sobering) realization for calibrating our understanding of medical evidence. A major appeal of the GRADE approach is that it offers a common framework that enables authors of evidence syntheses to make complex judgments about evidence certainty and to convey these with unambiguous terminology. This prevents some common mistakes made by review authors, including overstating results (or under-reporting harms) [ 187 ] and making recommendations for treatment. This is illustrated in Table AF5A-2 (Additional File 5 A), which compares the concluding statements made about overall certainty in a systematic review with and without application of the GRADE approach.

Theoretically, application of GRADE should improve consistency of judgments about certainty of evidence, both between authors and across systematic reviews. In one empirical evaluation conducted by the GRADE Working Group, interrater reliability of two individual raters assessing certainty of the evidence for a specific outcome increased from ~ 0.3 without using GRADE to ~ 0.7 by using GRADE [ 204 ]. However, others report variable agreement among those experienced in GRADE assessments of evidence certainty [ 190 ]. Like any other tool, GRADE requires training in order to be properly applied. The intricacies of the GRADE approach and the necessary subjectivity involved suggest that improving agreement may require strict rules for its application; alternatively, use of general guidance and consensus among review authors may result in less consistency but provide important information for the end user [ 190 ].
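Interrater reliability figures such as the ~0.3 to ~0.7 improvement cited above are commonly computed as Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch follows; the function name and the two raters’ certainty ratings are hypothetical illustrations.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    categories = set(ratings_a) | set(ratings_b)
    # Chance agreement from each rater's marginal category frequencies
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n)
        for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical GRADE certainty ratings for five outcomes by two raters
rater_1 = ["high", "low", "low", "moderate", "low"]
rater_2 = ["high", "low", "moderate", "moderate", "low"]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on four of five outcomes (raw agreement 0.8), but kappa is lower because some agreement would occur by chance alone; this chance correction is why kappa, rather than percent agreement, is the usual reliability metric in such evaluations.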

GRADE caveats

Simply invoking “the GRADE approach” does not automatically ensure GRADE methods were employed by authors of a systematic review (or developers of a CPG). Table 5.3 lists the criteria the GRADE working group has established for this purpose. These criteria highlight the specific terminology and methods that apply to rating the certainty of evidence for outcomes reported in a systematic review [ 191 ], which is different from rating overall certainty across outcomes considered in the formulation of recommendations [ 205 ]. Modifications of standard GRADE methods and terminology are discouraged as these may detract from GRADE’s objectives to minimize conceptual confusion and maximize clear communication [ 206 ].

Criteria for using GRADE in a systematic review a

a Adapted from the GRADE working group [ 206 ]; this list does not contain the additional criteria that apply to the development of a clinical practice guideline

Nevertheless, GRADE is prone to misapplications [ 207 , 208 ], which can distort a systematic review’s conclusions about the certainty of evidence. Systematic review authors without proper GRADE training are likely to misinterpret the terms “quality” and “grade” and to misunderstand the constructs assessed by GRADE versus other appraisal tools. For example, review authors may reference the standard GRADE certainty ratings (Table 5.2 ) to describe evidence for their outcome(s) of interest. However, these ratings are invalidated if authors omit or inadequately perform RoB evaluations of each included primary study. Such deficiencies in RoB assessments are unacceptable but not uncommon, as reported in methodological studies of systematic reviews and overviews [ 104 , 186 , 209 , 210 ]. GRADE ratings are also invalidated if review authors do not formally address and report on the other criteria (Table 5.1 ) necessary for a GRADE certainty rating.

Other caveats pertain to application of a GRADE certainty of evidence rating in various types of evidence syntheses. Current adaptations of GRADE are described in Additional File 5 B and included on Table 6.3 , which is introduced in the next section.

Concise Guide to best practices for evidence syntheses, version 1.0 a

AMSTAR A MeaSurement Tool to Assess Systematic Reviews, CASP Critical Appraisal Skills Programme, CERQual Confidence in the Evidence from Reviews of Qualitative research, ConQual Establishing Confidence in the output of Qualitative research synthesis, COSMIN COnsensus-based Standards for the selection of health Measurement Instruments, DTA diagnostic test accuracy, eMERGe meta-ethnography reporting guidance, ENTREQ enhancing transparency in reporting the synthesis of qualitative research, GRADE Grading of Recommendations Assessment, Development and Evaluation, MA meta-analysis, NRSI non-randomized studies of interventions, P protocol, PRIOR Preferred Reporting Items for Overviews of Reviews, PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, PROBAST Prediction model Risk Of Bias ASsessment Tool, QUADAS quality assessment of studies of diagnostic accuracy included in systematic reviews, QUIPS Quality In Prognosis Studies, RCT randomized controlled trial, RoB risk of bias, ROBINS-I Risk Of Bias In Non-randomised Studies of Interventions, ROBIS Risk of Bias in Systematic Reviews, ScR scoping review, SWiM systematic review without meta-analysis

a Superscript numbers represent citations provided in the main reference list. Additional File 6 lists links to available online resources for the methods and tools included in the Concise Guide

b The MECIR manual [ 30 ] provides Cochrane’s specific standards for both reporting and conduct of intervention systematic reviews and protocols

c Editorial and peer reviewers can evaluate completeness of reporting in submitted manuscripts using these tools. Authors may be required to submit a self-reported checklist for the applicable tools

d The decision flowchart described by Flemming and colleagues [ 223 ] is recommended for guidance on how to choose the best approach to reporting for qualitative reviews

e SWiM was developed for intervention studies reporting quantitative data. However, if there is not a more directly relevant reporting guideline, SWiM may prompt reviewers to consider the important details to report. (Personal Communication via email, Mhairi Campbell, 14 Dec 2022)

f JBI recommends their own tools for the critical appraisal of various quantitative primary study designs included in systematic reviews of intervention effectiveness, prevalence and incidence, and etiology and risk as well as for the critical appraisal of systematic reviews included in umbrella reviews. However, except for the JBI Checklists for studies reporting prevalence data and qualitative research, the development, validity, and reliability of these tools are not well documented

g Studies that are not RCTs or NRSI require tools developed specifically to evaluate their design features. Examples include single case experimental design [ 155 , 156 ] and case reports and series [ 82 ]

h The evaluation of methodological quality of studies included in a synthesis of qualitative research is debatable [ 224 ]. Authors may select a tool appropriate for the type of qualitative synthesis methodology employed. The CASP Qualitative Checklist [ 218 ] is an example of a published, commonly used tool that focuses on assessment of the methodological strengths and limitations of qualitative studies. The JBI Critical Appraisal Checklist for Qualitative Research [ 219 ] is recommended for reviews using a meta-aggregative approach

i Consider including risk of bias assessment of included studies if this information is relevant to the research question; however, scoping reviews do not include an assessment of the overall certainty of a body of evidence

j Guidance available from the GRADE working group [ 225 , 226 ]; also recommend consultation with the Cochrane diagnostic methods group

k Guidance available from the GRADE working group [ 227 ]; also recommend consultation with Cochrane prognostic methods group

l Used for syntheses in reviews with a meta-aggregative approach [ 224 ]

m Chapter 5 in the JBI Manual offers guidance on how to adapt GRADE to prevalence and incidence reviews [ 69 ]

n Janiaud and colleagues suggest criteria for evaluating evidence certainty for meta-analyses of non-randomized studies evaluating risk factors [ 228 ]

o The COSMIN user manual provides details on how to apply GRADE in systematic reviews of measurement properties [ 229 ]

The expected culmination of a systematic review should be a rating of overall certainty of a body of evidence for each outcome reported. The GRADE approach is recommended for making these judgments for outcomes reported in systematic reviews of interventions and can be adapted for other types of reviews. This represents the initial step in the process of making recommendations based on evidence syntheses. Peer reviewers should ensure authors meet the minimal criteria for supporting the GRADE approach when reviewing any evidence synthesis that reports certainty ratings derived using GRADE. Authors and peer reviewers of evidence syntheses unfamiliar with GRADE are encouraged to seek formal training and take advantage of the resources available on the GRADE website [ 211 , 212 ].

Part 6. Concise Guide to best practices

Accumulating data in recent years suggest that many evidence syntheses (with or without meta-analysis) are not reliable. This relates in part to the fact that their authors, who are often clinicians, can be overwhelmed by the plethora of ways to evaluate evidence. They tend to resort to familiar but often inadequate, inappropriate, or obsolete methods and tools and, as a result, produce unreliable reviews. These manuscripts may not be recognized as such by peer reviewers and journal editors, who may themselves disregard current standards. When such a systematic review is published or included in a CPG, clinicians and stakeholders tend to believe that it is trustworthy. A vicious cycle, in which inadequate methodology is rewarded and potentially misleading conclusions are accepted, is thus perpetuated. There is no quick or easy way to break this cycle; however, increasing awareness of best practices among all these stakeholder groups, who often have minimal (if any) training in methodology, may begin to mitigate it. This is the rationale for inclusion of Parts 2 through 5 in this guidance document. These sections present core concepts and important methodological developments that inform current standards and recommendations. We conclude by taking a direct and practical approach.

Inconsistent and imprecise terminology used in the context of development and evaluation of evidence syntheses is problematic for authors, peer reviewers and editors, and may lead to the application of inappropriate methods and tools. In response, we endorse use of the basic terms (Table 6.1 ) defined in the PRISMA 2020 statement [ 93 ]. In addition, we have identified several problematic expressions and nomenclature. In Table 6.2 , we compile suggestions for preferred terms less likely to be misinterpreted.

Terms relevant to the reporting of health care–related evidence syntheses a

a Reproduced from Page and colleagues [ 93 ]

Terminology suggestions for health care–related evidence syntheses

a For example, meta-aggregation, meta-ethnography, critical interpretative synthesis, realist synthesis

b This term may best apply to the synthesis in a mixed methods systematic review in which data from different types of evidence (eg, qualitative, quantitative, economic) are summarized [ 64 ]

We also propose a Concise Guide (Table 6.3 ) that summarizes the methods and tools recommended for the development and evaluation of nine types of evidence syntheses. Suggestions for specific tools are based on the rigor of their development as well as the availability of detailed guidance from their developers to ensure their proper application. The formatting of the Concise Guide addresses a well-known source of confusion by clearly distinguishing the underlying methodological constructs that these tools were designed to assess. Important clarifications and explanations follow in the guide’s footnotes; associated websites, if available, are listed in Additional File 6 .

To encourage uptake of best practices, journal editors may consider adopting or adapting the Concise Guide in their instructions to authors and peer reviewers of evidence syntheses. Given the evolving nature of evidence synthesis methodology, the suggested methods and tools are likely to require regular updates. Authors of evidence syntheses should monitor the literature to ensure they are employing current methods and tools. Some types of evidence syntheses (eg, rapid, economic, methodological) are not included in the Concise Guide; for these, authors are advised to obtain recommendations for acceptable methods by consulting with their target journal.

We encourage the appropriate and informed use of the methods and tools discussed throughout this commentary and summarized in the Concise Guide (Table 6.3 ). However, we caution against their application in a perfunctory or superficial fashion. This is a common pitfall among authors of evidence syntheses, especially as the standards of such tools become associated with acceptance of a manuscript by a journal. Consequently, published evidence syntheses may show improved adherence to the requirements of these tools without necessarily making genuine improvements in their performance.

In line with our main objective, the suggested tools in the Concise Guide address the reliability of evidence syntheses; however, we recognize that the utility of systematic reviews is an equally important concern. An unbiased and thoroughly reported evidence synthesis may still not be highly informative if the summarized evidence itself is sparse, weak, and/or biased [ 24 ]. Many intervention systematic reviews, including those developed by Cochrane [ 203 ] and those applying GRADE [ 202 ], ultimately find no evidence, or find the evidence to be inconclusive (eg, “weak,” “mixed,” or of “low certainty”). This often reflects the primary research base; however, it is important to know what is known (or not known) about a topic when considering an intervention for patients and discussing treatment options with them.

Alternatively, the frequency of “empty” and inconclusive reviews published in the medical literature may relate to limitations of conventional methods that focus on hypothesis testing; these have emphasized the importance of statistical significance in primary research and effect sizes from aggregate meta-analyses [ 183 ]. It is becoming increasingly apparent that this approach may not be appropriate for all topics [ 130 ]. Development of the GRADE approach has facilitated a better understanding of significant factors (beyond effect size) that contribute to the overall certainty of evidence. Other notable responses include the development of integrative synthesis methods for the evaluation of complex interventions [ 230 , 231 ], the incorporation of crowdsourcing and machine learning into systematic review workflows (eg, the Cochrane Evidence Pipeline) [ 2 ], the paradigm shift to living systematic review and NMA platforms [ 232 , 233 ], and the proposal of a new evidence ecosystem that fosters bidirectional collaborations and interactions among a global network of evidence synthesis stakeholders [ 234 ]. These evolutions in data sources and methods may ultimately make evidence syntheses more streamlined, less duplicative, and, most importantly, more useful for timely policy and clinical decision-making; however, that will only be the case if they are rigorously conducted and reported.

We look forward to others’ ideas and proposals for the advancement of methods for evidence syntheses. For now, we encourage dissemination and uptake of the currently accepted best tools and practices for their development and evaluation; at the same time, we stress that uptake of appraisal tools, checklists, and software programs cannot substitute for proper education in the methodology of evidence syntheses and meta-analysis. Authors, peer reviewers, and editors must strive to make accurate and reliable contributions to the present evidence knowledge base; online alerts, upcoming technology, and accessible education may make this more feasible than ever before. Our intention is to improve the trustworthiness of evidence syntheses across disciplines, topics, and types of evidence syntheses. All of us must continue to study, teach, and act cooperatively for that to happen.


Acknowledgements

The authors thank Michelle Oakman Hayes for her assistance with the graphics, Mike Clarke for his willingness to answer our seemingly arbitrary questions, and Bernard Dan for his encouragement of this project.

Authors’ contributions

All authors participated in the development of the ideas, writing, and review of this manuscript. The author(s) read and approved the final manuscript.

The work of John Ioannidis has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.


The authors declare no competing interests.

This article has been published simultaneously in BMC Systematic Reviews, Acta Anaesthesiologica Scandinavica, BMC Infectious Diseases, British Journal of Pharmacology, JBI Evidence Synthesis, the Journal of Bone and Joint Surgery Reviews, and the Journal of Pediatric Rehabilitation Medicine.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

HK 710: Meta-Analysis

Types of Reviews

There are several different kinds of articles frequently found in the literature for medical and health sciences. Here are the strongest:

  • Meta-analysis (PLATINUM STANDARD!)   - A method of synthesizing the data from more than one study, in order to produce a summary statistic. 
  • Systematic review – An approach that involves capturing and assessing the evidence by some systematic method, where all the components of the approach and the assessment are made explicit and documented.  Some systematic reviews include a meta-analysis (see above).
  • Scoping review - A type of knowledge synthesis that uses a systematic and iterative approach to identify and synthesize an existing or emerging body of literature on a given topic. The key differences between systematic reviews and scoping reviews are differing purposes and aims. The purpose of a scoping review is to map the body of literature on a topic area. The purpose of a systematic review is to sum up the best available research on a specific question.
  • Randomized Controlled Trial  – An experimental study in which users are randomly allocated to one of two or more options, where some get the option of interest and others get another option (e.g. a standard service).

Content Contributor: Booth A & Brice A (2004) Evidence Based Practice for Information Professionals: A Handbook . London: Facet Publishing.


  • CINAHL Plus with Full Text: the world's most comprehensive source for nursing and allied health journals, providing full text for more than 770 journals indexed in CINAHL.
  • MEDLINE (via Ebsco): the National Library of Medicine's premier database in the life sciences, with a concentration in biomedicine.
  • SPORTDiscus: a comprehensive source of full text for sports and sports medicine journals.
  • PubMed: contains more than 34 million citations and abstracts from the biomedical literature. PubMed facilitates searching across three National Library of Medicine resources: MEDLINE (includes Medical Subject Headings: MeSH), PubMed Central, and Bookshelf.
  • PubMed Clinical Queries: uses predefined filters to help you quickly refine PubMed searches on clinical or disease-specific topics. These specialized searches include COVID-19 Articles, Clinical Study Categories, and Medical Genetics.
  • Embase: a comprehensive biomedical database that focuses on drugs and pharmacology, medical devices, clinical medicine, and basic science relevant to clinical medicine.
  • Cochrane Library: includes reliable evidence from Cochrane and other systematic reviews, clinical trials, and results of the world's best medical research studies. The Cochrane Reviews are provided in full text.
  • Scopus: a comprehensive abstract and citation database containing both peer-reviewed research literature and quality web sources. With over 18,000 titles from more than 5,000 international publishers, Scopus supports research needs in the scientific, technical, medical, and social sciences fields as well as in the arts and humanities.
  • Google Scholar: provides a search of scholarly literature across many disciplines and sources, including articles, theses, books, and abstracts.

  • Meta-Analysis Example Paper
  • Systematic Reviews Library Guide

When conducting literature searches, you should know that the average time to complete is about 44 hours!

The UNC Health Sciences Library guide provides the following helpful information. When you design a search strategy to find all of the articles related to your research question, you will:

  • Define the main concepts of your topic
  • Choose which databases you want to search
  • List terms to describe each concept
  • Add terms from controlled vocabulary like MeSH
  • Use field tags to tell the database where to search for terms
  • Combine terms and concepts with Boolean operators AND and OR
  • Translate your search strategy to match the format standards for each database
  • Save a copy of your search strategy and details about your search
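As a rough illustration, the term-combination steps above can be sketched in code: the synonyms for each concept are combined with OR, and the resulting concept groups are joined with AND. The concepts, terms, and PubMed-style field tags below are hypothetical examples, and the exact syntax varies by database:

```python
# Sketch of a Boolean search builder (hypothetical terms and
# PubMed-style field tags; real syntax differs per database).
concepts = {
    "population": ['"older adults"[tiab]', 'elderly[tiab]', '"Aged"[MeSH]'],
    "intervention": ['"resistance training"[tiab]', '"strength training"[tiab]'],
    "outcome": ['falls[tiab]', '"Accidental Falls"[MeSH]'],
}

def build_search(concepts):
    # OR together the terms within each concept, wrap each group
    # in parentheses, then AND the groups together.
    groups = ["(" + " OR ".join(terms) + ")" for terms in concepts.values()]
    return " AND ".join(groups)

print(build_search(concepts))
```

Translating the resulting string to another database then mostly means swapping field tags and subject-heading syntax, which is the final translation step listed above.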

Step 1: Preparation  To complete the PRISMA diagram, save a copy of the diagram to use alongside your searches. It can be downloaded from the PRISMA website. 

Step 2: Doing the Database Search  Run the search for each database individually, including ALL your search terms, any MeSH or other subject headings, truncation (like hemipleg*), and/or wildcards (like sul?ur). Apply all your limits (such as date range, English language only, and so on). Once all search terms have been combined and you have applied all relevant limits, you should have a final number of records or articles for each database. Enter this information in the top left box of the PRISMA flow chart. You should add the total number of combined results from all databases (including duplicates) after the equal sign where it says Databases (n=). Many researchers also add notations in the box for the number of results from each database search, for example, PubMed (n=335), Embase (n=600), and so on. If you search trial registers, such as CENTRAL, ICTRP, or others, you should enter that number after the equal sign in Registers (n=).

NOTE:  Some citation managers automatically remove duplicates with each file you import.  Be sure to capture the number of articles from your database searches before any duplicates are removed.

Records identified from databases or registers

Step 3: Remove All Duplicates  To avoid reviewing duplicate articles, you need to remove any articles that appear more than once in your results. You may want to export the entire list of articles from each database to a citation manager such as EndNote, Zotero, or Mendeley (including both citation and abstract in your file) and remove the duplicates there. If you are using Covidence for your review, you should also add the duplicate articles identified in Covidence to the citation manager number.  Enter the number of records removed as duplicates in the second box on your PRISMA template.  If you are using automation tools to help evaluate the relevance of citations in your results, you would also enter that number here.

Records removed before screening: duplicates, automation tool exclusions, or other reasons
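The duplicate-removal step can be approximated in code: exported records are matched on DOI when available, otherwise on a normalized title, and only the first occurrence is kept. This is a simplified sketch with invented records; citation managers and Covidence use more sophisticated fuzzy matching:

```python
def normalize(title):
    # Lowercase and strip non-alphanumeric characters so minor
    # punctuation/case differences don't hide duplicates.
    return "".join(ch for ch in title.lower() if ch.isalnum())

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        # Prefer the DOI as a match key; fall back to the title.
        key = rec.get("doi") or normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"title": "Exercise and Falls in Older Adults", "doi": "10.1000/x1"},
    {"title": "Exercise and falls in older adults.", "doi": "10.1000/x1"},  # duplicate
    {"title": "A Different Study", "doi": None},
]
print(len(deduplicate(records)))  # prints 2 (one duplicate removed)
```

The number of records dropped here (total minus unique) is the figure entered in the "Records removed before screening" box.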

NOTE: If you are using Covidence to screen your articles , you can copy the numbers from the PRISMA diagram in your Covidence review into the boxes mentioned below.  Covidence does not include the number of results from each database, so you will need to keep track of that  number yourself.

Step 4: Records Screened (Title/Abstract Screening)  The next step is to add the number of articles that you will screen. This should be the number of records identified minus the number from the duplicates removed box.

Number of records screened in Title/Abstract level

Step 5: Records Excluded (Title/Abstract Screening)  You will need to screen the titles and abstracts for articles which are relevant to your research question. Any articles that appear to help you answer your research question should be included. Record the number of articles excluded through title/abstract screening in the box to the right titled "Records excluded." You can optionally add exclusion reasons at this level, but they are not required until full text screening.

Records excluded after title & abstract screening

Step 6: Reports Sought for Retrieval  This is the number of articles you obtain in preparation for full text screening.  Subtract the number of excluded records (Step 5) from the total number screened (Step 4) and this will be your number sought for retrieval.

Reports sought for retrieval

Step 7: Reports Not Retrieved  List the number of articles for which you are unable to find the full text.  Remember to use Find@UNC and  Interlibrary Loan  to request articles to see if we can order them from other libraries before automatically excluding them.

Reports not retrieved

Step 8: Reports Assessed for Eligibility- Full Text Screening   This should be the number of reports sought for retrieval (Step 6) minus the number of reports not retrieved (Step 7). Review the full text for these articles to assess their eligibility for inclusion in your systematic review. 

Reports assessed for eligibility

Step 9: Reports Excluded  After reviewing all articles in the full-text screening stage for eligibility, enter the total number of articles you exclude in the box titled "Reports excluded," and then list your reasons for excluding the articles as well as the number of records excluded for each reason. Examples include wrong setting, wrong patient population, wrong intervention, wrong dosage, etc. You should only count an excluded article once in your list even if it meets multiple exclusion criteria.

Reports excluded, including reason for exclusion and number

Step 10: Included Studies  The final step is to subtract the number of records excluded during the eligibility review of full-texts (Step 9) from the total number of articles reviewed for eligibility (Step 8). Enter this number in the box labeled "Studies included in review," combining numbers with your grey literature search results in this box if needed.  You have now completed your PRISMA flow diagram, unless you have also performed searches in non-database sources.

Studies included in review

If you have also searched additional sources, such as professional organization websites, cited or citing references, etc., complete the additional steps listed in the following box  Documenting Your Grey Literature Search .
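The arithmetic linking Steps 2 through 10 can be summarized in a few lines of code. All counts below are invented for illustration; only the two database totals echo the example figures mentioned in Step 2:

```python
# Hypothetical counts for a PRISMA 2020 flow diagram.
identified = 335 + 600        # Step 2: e.g., PubMed (n=335) + Embase (n=600)
duplicates_removed = 180      # Step 3

screened = identified - duplicates_removed                  # Step 4
excluded_title_abstract = 640                               # Step 5
sought_for_retrieval = screened - excluded_title_abstract   # Step 6
not_retrieved = 5                                           # Step 7
assessed_full_text = sought_for_retrieval - not_retrieved   # Step 8
excluded_full_text = 88       # Step 9: each report counted once
included = assessed_full_text - excluded_full_text          # Step 10

print(identified, screened, sought_for_retrieval,
      assessed_full_text, included)
# prints: 935 755 115 110 22
```

Each computed value corresponds to one box on the flow diagram, so running the subtractions this way is a quick consistency check before filling in the template.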

When designing and conducting literature searches, a librarian can help advise you on:

  • How to create a search strategy with Boolean operators, database-specific syntax, subject headings, and appropriate keywords 
  • How to apply previously published systematic review search strategies to your current search
  • How to translate a search strategy from one database's preferred structure to another


Systematic Review PICO Tools

P: Patient, Problem or Population  – What are the most important characteristics of the patient and their health status?

I: Intervention  – What main intervention are you considering (medical, surgical, preventative)?

C: Comparison  – What are the alternative benchmark or gold standard options being considered, if any?

O: Outcome  – What is the estimated likelihood of a clinical outcome attributable to a specific disease, condition or injury?

T: Type of Question/Study  – You can have questions of different types. They can be categorized as a diagnosis, prognosis, therapy, etiology/harm, or prevention question. What study design would best answer the question: randomized controlled trial; cohort study; case-control study; case series; case report; etc.?

  • Last Updated: Feb 14, 2024 1:56 PM

Rapid reviews methods series: guidance on rapid qualitative evidence synthesis

  • Andrew Booth 1 , 2 ,
  • Isolde Sommer 3 , 4 ,
  • Jane Noyes 2 , 5 ,
  • Catherine Houghton 2 , 6 ,
  • Fiona Campbell 1 , 7
  • The Cochrane Rapid Reviews Methods Group and Cochrane Qualitative and Implementation Methods Group (CQIMG)
  • 1 EnSyGN Sheffield Evidence Synthesis Group , University of Sheffield , Sheffield , UK
  • 2 Cochrane Qualitative and Implementation Methods Group (CQIMG) , London , UK
  • 3 Department for Evidence-based Medicine and Evaluation , University for Continuing Education Krems , Krems , Austria
  • 4 Cochrane Rapid Reviews Group & Cochrane Austria , Krems , Austria
  • 5 Bangor University , Bangor , UK
  • 6 University of Galway , Galway , Ireland
  • 7 University of Newcastle upon Tyne , Newcastle upon Tyne , UK
  • Correspondence to Professor Andrew Booth, Univ Sheffield, Sheffield, UK; a.booth{at}

This paper forms part of a series of methodological guidance from the Cochrane Rapid Reviews Methods Group and addresses rapid qualitative evidence syntheses (QESs), which use modified systematic, transparent and reproducible methods to accelerate the synthesis of qualitative evidence when faced with resource constraints. This guidance covers the review process as it relates to synthesis of qualitative research. ‘Rapid’ or ‘resource-constrained’ QES require use of templates and targeted knowledge user involvement. Clear definition of perspectives and decisions on indirect evidence, sampling and use of existing QES help in targeting eligibility criteria. Involvement of an information specialist, especially in prioritising databases, targeting grey literature and planning supplemental searches, can prove invaluable. Use of templates and frameworks in study selection and data extraction can be accompanied by quality assurance procedures targeting areas of likely weakness. Current Cochrane guidance informs selection of tools for quality assessment and of synthesis method. Thematic and framework synthesis facilitate efficient synthesis of large numbers of studies or plentiful data. Finally, judicious use of the Grading of Recommendations Assessment, Development and Evaluation approach for assessing the Confidence in the Evidence from Reviews of Qualitative research (GRADE-CERQual) and of software as appropriate help to achieve a timely and useful review product.

  • Systematic Reviews as Topic
  • Patient Care

Data availability statement

No data are available. Not applicable. All data is from published articles.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: .


Rapid Qualitative Evidence Synthesis (QES) is a relatively recent innovation in evidence synthesis and few published examples currently exist.

Guidance for authoring a rapid QES is scattered and requires compilation and summary.


This paper represents the first attempt to compile current guidance, illustrated by the experience of several international review teams.

We identify features of rapid QES methods that could be accelerated or abbreviated and where methods resemble those for conventional QESs.


This paper offers guidance for researchers when conducting a rapid QES and informs commissioners of research and policy-makers what to expect when commissioning such a review.


This paper forms part of a series from the Cochrane Rapid Reviews Methods Group providing methodological guidance for rapid reviews. While other papers in the series 1–4 focus on generic considerations, we aim to provide in-depth recommendations specific to a resource-constrained (or rapid) qualitative evidence synthesis (rQES). 5 This paper is accompanied by recommended resources (online supplemental appendix A) and an elaboration with practical considerations (online supplemental appendix B).

Supplemental material

The role of qualitative evidence in decision-making is increasingly recognised. 6 This, in turn, has led to appreciation of the value of qualitative evidence syntheses (QESs) that summarise findings across multiple contexts. 7 Recognition of the need for such syntheses to be available at the time most useful to decision-making has, in turn, driven demand for rapid qualitative evidence syntheses. 8 The breadth of potential rQES mirrors the versatility of QES in general (from focused questions to broad overviews) and outputs range from descriptive thematic maps through to theory-informed syntheses (see table 1 ).


Glossary of important terms (alphabetically)

As with other resource-constrained reviews, no one size fits all. A team should start by specifying the phenomenon of interest, the review question, 9 the perspectives to be included 9 and the sample to be determined and selected. 10 Subsequently, the team must finalise the appropriate choice of synthesis. 11 Above all, the review team should consider the intended knowledge users, 3 including requirements of the funder.

An rQES team, in particular, cannot afford any extra time or resource requirements that might arise from either a misunderstanding of the review question, an unclear picture of user requirements or an inappropriate choice of methods. The team seeks to align the review question and the requirements of the knowledge user with available time and resources. They also need to ensure that the choice of data and choice of synthesis are appropriate to the intended ‘knowledge claims’ (epistemology) made by the rQES. 11 This involves the team asking ‘what types of data are meaningful for this review question?’, ‘what types of data are trustworthy?’ and ‘is the favoured synthesis method appropriate for this type of data?’. 12 This paper aims to help rQES teams to choose methods that best fit their project while understanding the limitations of those choices. Our recommendations derive from current QES guidance, 5 evidence on modified QES methods, 8 13 and practical experience. 14 15

This paper presents an overview of considerations and recommendations as described in table 2 . Supplemental materials, including additional resources, details of our recommendations and practical examples, are provided in online supplemental appendices A and B .

Recommendations for resource-constrained qualitative evidence synthesis (rQES)

Setting the review question and topic refinement

Rapid reviews summarise information from multiple research studies to produce evidence for ‘the public, researchers, policymakers and funders in a systematic, resource-efficient manner’. 16 Involvement of knowledge users is critical. 3 Given time constraints, individual knowledge users could be asked to give feedback only on very specific decisions and tasks or on selected sections of the protocol. Specifically, whenever a QES is abbreviated or accelerated, a team should ensure that the review question is agreed by a minimum number of knowledge users with expertise or experience that reflects all the important review perspectives and with authority to approve the final version 2 5 11 ( table 2 , item R1).

Involvement of topic experts can ensure that the rQES is responsive to need. 14 17 One Cochrane rQES saved considerable time by agreeing the review topic within a single meeting and one-phase iteration. 9 Decisions on topics to be omitted are also informed by a knowledge of existing QESs. 17

An information specialist can help to manage the quantity and quality of available evidence by setting conceptual boundaries and logistic limits. A structured question format, such as Setting-Perspective-Interest (phenomenon of)-Comparison-Evaluation (SPICE) or Population-Interest (phenomenon of)-Context (PICo), helps in communicating the scope and, subsequently, in operationalising study selection. 9 18

Scoping (of review parameters) and mapping (of key types of evidence and likely richness of data) help when planning the review. 5 19 The option to choose purposive sampling over comprehensive sampling approaches, as offered by standard QES, may be particularly helpful in the context of a rapid QES. 8 Once a team knows the approximate number and distribution of studies (perhaps mapping them against country, age, ethnicity, etc), they can decide whether or not to use purposive sampling. 12 An rQES for the WHO combined purposive with variation sampling. Sampling in two stages started by reducing the initial number of studies to a more manageable sampling frame and then sampling approximately a third of the remaining studies from within the sampling frame. 20
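The two-stage approach described above (first narrow to a manageable sampling frame, then draw roughly a third of it) can be sketched in a few lines. This is a hypothetical illustration, not the WHO team's actual procedure: the study records, the `frame_filter` criterion and the one-third fraction are all assumptions.

```python
import random

def two_stage_sample(studies, frame_filter, fraction=1 / 3, seed=42):
    """Stage 1: reduce the initial studies to a sampling frame.
    Stage 2: randomly sample a fraction of the frame."""
    frame = [s for s in studies if frame_filter(s)]
    k = max(1, round(len(frame) * fraction))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible/auditable
    return rng.sample(frame, k)

# Hypothetical records: an id plus a 'rich' flag noted during scoping
studies = [{"id": i, "rich": i % 2 == 0} for i in range(30)]
sampled = two_stage_sample(studies, frame_filter=lambda s: s["rich"])
```

Seeding the random draw is a deliberate choice here: it lets the review team document exactly how the sample was produced.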

Sampling may target richer studies and/or privilege diversity. 8 21 A rich qualitative study typically illustrates findings with verbatim extracts from transcripts from interviews or textual responses from questionnaires. Rich studies are often found in specialist qualitative research or social science journals. In contrast, less rich studies may itemise themes with an occasional indicative text extract and tend to summarise findings. In clinical or biomedical journals less rich findings may be placed within a single table or box.

No rule exists on an optimal number of studies; too many studies make it challenging to ‘maintain insight’, 22 while too few do not sustain rigorous analysis. 23 Guidance on sampling is available from the forthcoming Cochrane-Campbell QES Handbook.

A review team can use templates to fast-track writing of a protocol. The protocol should always be publicly available ( table 2 , item R2). 24 25 Formal registration may require that the team has not commenced data extraction but should be considered if it does not compromise the rQES timeframe. Time pressures may require that methods are left suitably flexible to allow well-justified changes to be made as a detailed picture of the studies and data emerge. 26 The first Cochrane rQES drew heavily on text from a joint protocol/review template previously produced within Cochrane. 24

Setting eligibility criteria

An rQES team may need to limit the number of perspectives, focusing on those most important for decision-making 5 9 27 ( table 2 , item R3). Beyond the patients/clients, each additional perspective (eg, family members, health professionals, other professionals) multiplies the effort involved.

A rapid QES may require strict date and setting restrictions 17 and language restrictions that accommodate the specific requirements of the review. Specifically, the team should consider whether changes in context over time or substantive differences between geographical regions could be used to justify a narrower date range or a limited coverage of countries and/or languages. The team should also decide if ‘indirect evidence’ is to substitute for the absence of direct evidence. An rQES typically focuses on direct evidence, except when only indirect evidence is available 28 ( table 2 , item R4). Decisions on relevance are challenging: precautions for swine influenza may inform precautions for bird influenza. 28 A smoking ban may operate similarly to seat belt legislation, etc. A review team should identify where such shared mechanisms might operate. 28 An rQES team must also decide whether to use frameworks or models to focus the review. Theories may be unearthed within the topic search or be already known to team members, for example, Theory of Planned Behaviour. 29

Options for managing the quantity and quality of studies and data emerge during the scoping (see above). In summary, the review team should consider privileging rich qualitative studies 2 ; consider a stepwise approach to inclusion of qualitative data and explore the possibility of sampling ( table 2 , item R5). For example, where data is plentiful, an rQES may be limited to qualitative research and/or to mixed methods studies. Where data is less plentiful, surveys or other qualitative data sources may need to be included. Where plentiful reviews already exist, a team may decide to conduct a review of reviews 5 by including multiple QES within a mega-synthesis 28 29 ( table 2 , item R6).

Searching for QES merits its own guidance; 21–23 30 this section reinforces important considerations from guidance specific to qualitative research. Generic guidance for rapid reviews in this series broadly applies to rapid QESs. 1

In addition to journal articles, by far the most plentiful source, qualitative research is found in book chapters, theses and in published and unpublished reports. 21 Searches to support an rQES can (a) limit the number of databases searched, deliberately selecting databases from diverse disciplines, (b) use abbreviated study filters to retrieve qualitative designs and (c) employ high yield complementary methods (eg, reference checking, citation searching and Related Articles features). An information specialist (eg, librarian) should be involved in prioritising sources and search methods ( table 2 , item R7). 11 14

According to empirical evidence, optimal database combinations include Scopus plus CINAHL or Scopus plus ProQuest Dissertations and Theses Global (two-database combinations), and Scopus plus CINAHL plus ProQuest Dissertations and Theses Global (three-database combination), with both choices retrieving between 89% and 92% of relevant studies. 30
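The 89–92% figures above are recall rates: the share of a gold-standard set of relevant studies that a database combination retrieves. A minimal sketch of that calculation follows; the record IDs and per-database result sets are invented for illustration and are not the data behind the cited study.

```python
def combo_recall(gold, db_results, combo):
    """Recall of a database combination = |retrieved ∩ gold| / |gold|."""
    retrieved = set().union(*(db_results[db] for db in combo))
    return len(retrieved & gold) / len(gold)

# Illustrative only: 10 'relevant' studies and what each source happens to index
gold = set(range(1, 11))
db_results = {
    "Scopus": {1, 2, 3, 4, 5, 6, 7, 8},
    "CINAHL": {7, 8, 9},
    "PQDT Global": {9, 10},
}
two_db = combo_recall(gold, db_results, ["Scopus", "CINAHL"])
three_db = combo_recall(gold, db_results, ["Scopus", "CINAHL", "PQDT Global"])
```

On these toy sets the two-database combination misses the studies only indexed in the third source, which is exactly the trade-off the empirical comparisons quantify.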

If resources allow, searches should include one or two specialised databases ( table 2 , item R8) from different disciplines or contexts 21 (eg, social science databases, specialist discipline databases or regional or institutional repositories). Even when resources are limited, the information specialist should factor in time for peer review of at least one search strategy ( table 2 , item R9). 31 Searches for ‘grey literature’ should selectively target appropriate types of grey literature (such as theses or process evaluations) and supplemental searches, including citation chaining or Related Articles features ( table 2 , item R10). 32 The first Cochrane rQES reported that searching reference lists of key papers yielded an extra 30 candidate papers for review. However, the team documented exclusion of grey literature as a limitation of their review. 15

Study selection

Consistency in study selection is achieved by using templates, by gaining a shared team understanding of the audience and purpose, and by ongoing communication within, and beyond, the team. 2 33 Individuals may work in parallel on the same task, as in the first Cochrane rQES, or follow a ‘segmented’ approach where each reviewer is allocated a different task. 14 The use of machine learning in the specific context of rQES remains experimental. However, the possibility of developing qualitative study classifiers comparable to those for randomised controlled trials offers an achievable aspiration. 34

Title and abstract screening

The entire screening team should use pre-prepared, pretested title and abstract templates to limit the scale of piloting, calibration and testing ( table 2 , item R11). 1 14 The first Cochrane rQES team double-screened titles and abstracts within Covidence review software. 14 Disagreements were resolved with reference to a third reviewer, achieving a shared understanding of the eligibility criteria and enhancing familiarity with target studies and insight from data. 14 The team should target and prioritise identified risks of either over-zealous inclusion or over-exclusion specific to each rQES ( table 2 , item R12). 14 The team should maximise opportunities to capture divergent views and perspectives within study findings. 35

Full-text screening

Full-text screening similarly benefits from using a pre-prepared pretested standardised template where possible 1 14 ( table 2 , item R11). If a single reviewer undertakes full-text screening, 8 the team should identify likely risks to trustworthiness of findings and focus quality control procedures (eg, use of additional reviewers and percentages for double screening) on specific threats 14 ( table 2 , item R13). The Cochrane rQES team opted for double screening to assist their immersion within the topic. 14

Data extraction

Data extraction of descriptive/contextual data may be facilitated by review management software (eg, EPPI-Reviewer) or home-made approaches using Google Forms, or other survey software. 36 Where extraction of qualitative findings requires line-by-line coding with multiple iterations of the data, a qualitative data management analysis package, such as QSR NVivo, reaps dividends. 36 The team must decide if, collectively, they favour extracting data to a template or coding directly within an electronic version of an article.

Quality control must be fit for purpose but not excessive. Published examples typically use a single reviewer for data extraction 8 with use of two independent reviewers being the exception. The team could limit data extraction to minimal essential items. They may also consider re-using descriptive details and findings previously extracted within previous well-conducted QES ( table 2 , item R14). A pre-existing framework, where readily identified, may help to structure the data extraction template. 15 37 The same framework may be used to present the findings. Some organisations may specify a preferred framework, such as an evidence-to-decision-making framework. 38

Assessment of methodological limitations

The QES community assess ‘methodological limitations’ rather than use ‘risk of bias’ terminology. An rQES team should pick an approach appropriate to their specific review. For example, a thematic map may not require assessment of individual studies—a brief statement of the generic limitations of the set of studies may be sufficient. However, for any synthesis that underpins practice recommendations 39 assessment of included studies is integral to the credibility of findings. In any decision-making context that involves recommendations or guidelines, an assessment of methodological limitations is mandatory. 40 41

Each review team should work with knowledge users to determine a review-specific approach to quality assessment. 27 While ‘traffic lights’, similar to the outputs from the Cochrane Risk of Bias tool, may facilitate rapid interpretation, accompanying textual notes are invaluable in highlighting specific areas for concern. In particular, the rQES team should demonstrate that they are aware (a) that research designs for qualitative research seek to elicit divergent views, rather than control for variation; (b) that, for qualitative research, the selection of the sample is far more informative than the size of the sample; and (c) that researchers from primary research, and equally reviewers for the qualitative synthesis, need to be thoughtful and reflexive about their possible influences on interpretation of either the primary data or the synthesised findings.

Selection of checklist

Numerous scales and checklists exist for assessing the quality of qualitative studies. In the absence of validated risk of bias tools for qualitative studies, the team should choose a tool according to Cochrane Qualitative and Implementation Methods Group (CQIMG) guidance together with expediency (according to ease of use, prior familiarity, etc) ( table 2 , item R15). 41 In comparison to the Critical Appraisal Skills Programme checklist, which was never designed for use in synthesis, 42 the Cochrane qualitative tool is similarly easy to use and was designed for QES use. Work is underway to identify an assessment process that is compatible with QESs that support decision-making. 41 For now the choice of a checklist remains determined by interim Cochrane guidance and, beyond this, by personal preference and experience. For an rQES a team could use a single reviewer to assess methodological limitations, with verification of judgements (and support statements) by a second reviewer ( table 2 , item R16).

The CQIMG endorses three types of synthesis: thematic synthesis, framework synthesis and meta-ethnography ( box 1 ). 43 44 Rapid QES favour descriptive thematic synthesis 45 or framework synthesis, 46 47 except when theory generation (meta-ethnography 48 49 or analytical thematic synthesis) is a priority ( table 2 , item R17).

Choosing a method for rapid qualitative synthesis

Thematic synthesis: first choice method for rQES. 45 For example, in their rapid QES Crooks and colleagues 44 used a thematic synthesis to understand the experiences of both academic and lived experience coresearchers within palliative and end of life research. 45

Framework synthesis: alternative where a suitable framework can be speedily identified. 46 For example, Bright and colleagues 46 considered ‘best-fit framework synthesis’ as appropriate for mapping study findings to an ‘a priori framework of dimensions measured by prenatal maternal anxiety tools’ within their ‘streamlined and time-limited evidence review’. 47

Less commonly, an adapted meta-ethnographical approach was used for an implementation model of social distancing where supportive data (29 studies) was plentiful. 48 However, this QES demonstrates several features that subsequently challenge its original identification as ‘rapid’. 49

Abbreviations: QES, qualitative evidence synthesis; rQES, resource-constrained qualitative evidence synthesis.

The team should consider whether a conceptual model, theory or framework offers a rapid way of organising, coding, interpreting and presenting findings ( table 2 , item R18). If the extracted data appears rich enough to sustain further interpretation, data from a thematic or framework synthesis can be explored within a subsequent meta-ethnography. 43 However, this requires a team with substantial interpretative expertise. 11

Assessments of confidence in the evidence 4 are central to any rQES that seeks to support decision-making, and the QES-specific Grading of Recommendations Assessment, Development and Evaluation approach for assessing Confidence in the Evidence from Reviews of Qualitative research (GRADE-CERQual) is designed to assess confidence in qualitative evidence. 50 This can be performed by a single reviewer, confirmed by a second reviewer. 26 Additional reviewers could verify all, or a sample of, assessments. For a rapid assessment a team must prioritise findings, using objective criteria; a WHO rQES focused only on the three ‘highly synthesised findings’. 20 The team could consider reusing GRADE-CERQual assessments from published QESs if findings are relevant and of demonstrable high quality ( table 2 , item R19). 50 No rapid approach to full application of GRADE-CERQual currently exists.

Reporting and record management

Little is written on optimal use of technology. 8 A rapid review is not a good time to learn review management software or qualitative analysis management software. Using such software for all general QES processes ( table 2 , item R20), and then harnessing these skills and tools when specifically under resource pressures, is a sounder strategy. Good file labelling and folder management and a ‘develop once, re-use multi-times’ approach facilitates resource savings.

Reporting requirements include the meta-ethnography reporting guidance (eMERGe) 51 and the Enhancing transparency in reporting the synthesis of qualitative research (ENTREQ) statement. 52 An rQES should describe limitations and their implications for confidence in the evidence even more thoroughly than a regular QES, detailing the consequences of fast-tracking, streamlining or omitting processes altogether. 8 Time spent documenting reflexivity is similarly important. 27 If QES methodology is to remain credible, rapid approaches must be applied with insight and documented with circumspection. 53 54

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval



Contributors All authors (AB, IS, JN, CH, FC) have made substantial contributions to the conception and design of the guidance document. AB led on drafting the work and revising it critically for important intellectual content. All other authors (IS, JN, CH, FC) contributed to revisions of the document. All authors (AB, IS, JN, CH, FC) have given final approval of the version to be published. As members of the Cochrane Qualitative and Implementation Methods Group and/or the Cochrane Rapid Reviews Methods Group all authors (AB, IS, JN, CH, FC) agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests AB is co-convenor of the Cochrane Qualitative and Implementation Methods Group. In the last 36 months, he received royalties from Systematic Approaches To a Successful Literature Review (Sage 3rd edition), honoraria from the Agency for Healthcare Research and Quality, and travel support from the WHO. JN is lead convenor of the Cochrane Qualitative and Implementation Methods Group. In the last 36 months, she has received honoraria from the Agency for Healthcare Research and Quality and travel support from the WHO. CH is co-convenor of the Cochrane Qualitative and Implementation Methods Group.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; internally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.


Computer Science > Computation and Language

Title: A Systematic Review of Data-to-Text NLG

Abstract: This systematic review aims to provide a comprehensive analysis of the state of data-to-text generation research, focusing on identifying research gaps, offering future directions, and addressing challenges found during the review. We thoroughly examined the literature, including approaches, datasets, evaluation metrics, applications, multilingualism, and hallucination mitigation measures. Our review provides a roadmap for future research in this rapidly evolving field.



Suicide Risk Screening Tools for Pediatric Patients: A Systematic Review of Test Accuracy

Screening tools reviewed: Ask Suicide-Screening Questions; Behavioral Health Screen; Computerized Adaptive Screen for Suicidal Youth; Patient Health Questionnaire-9, Item 9; Risk of Suicide Questionnaire.

FUNDING: No external funding.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no conflicts of interest to report.


Nathan J. Lowry , Pauline Goger , Maria Hands Ruz , Fangfei Ye , Christine B. Cha; Suicide Risk Screening Tools for Pediatric Patients: A Systematic Review of Test Accuracy. Pediatrics 2024; e2023064172. 10.1542/peds.2023-064172


Health care settings have increasingly adopted universal suicide risk screening tools into nonpsychiatric pediatric care; however, a systematic review examining the accuracy of these tools does not yet exist.

Identify and review research on the test accuracy of suicide risk screening tools for pediatric patients in nonpsychiatric medical settings.

PubMed and PsycINFO were searched to identify peer-reviewed articles published before March 23, 2023.

Articles that quantified the accuracy of a suicide risk screening tool (eg, sensitivity, specificity) in a nonpsychiatric medical setting (eg, primary care, specialty care, inpatient or surgical units, or the emergency department) were included.

A total of 13 studies were included in this review. Screening tool psychometric properties and study risk of bias were evaluated.

Sensitivity among individual studies ranged from 50% to 100%, and specificity ranged from 58.8% to 96%. Methodological quality was relatively varied, and applicability concerns were low. When stratifying results by screening tool, the Ask Suicide-Screening Questions and Computerized Adaptive Screen for Suicidal Youth had the most robust evidence base.

Because of considerable study heterogeneity, a meta-analytic approach was deemed inappropriate. This prevented us from statistically testing for differences between identified screening tools.

The Ask Suicide-Screening Questions and Computerized Adaptive Screen for Suicidal Youth exhibit satisfactory test accuracy and appear promising for integration into clinical practice. Although initial findings are promising, additional research targeted at examining the accuracy of screening tools among diverse populations is needed to ensure the equity of screening efforts.

Youth suicide is a prevalent and complex public health problem. 1 Moreover, it is resource- and time-intensive to effectively assess, predict, and treat. Universal suicide risk screening (ie, screening all pediatric patients for suicide risk regardless of presenting problem) is one way for medical settings to feasibly identify and manage pediatric suicide risk and quickly “rule out” those who do not require further assessment. 2–6

The practice of universal suicide risk screening can help guide medical providers on where to direct their clinical attention and resources. However, validated tools are needed to ensure that screening is both accurate and feasible. Emerging research suggests that single item screens are often inaccurate, resulting in both under- and over-detection of suicidality. 7 , – 10 Missing youth at risk or overburdening limited mental health resources can directly result in negative clinical outcomes. Thus, it is important that tools are not just evidence-based, but also validated among the populations that are being screened.

Previous reviews 11 12 have identified evidence-based screening tools for pediatric patients; however, a formal synthesis of research examining the accuracy (ie, the ability of a test to correctly identify the presence or absence of suicidality) of these tools does not exist. Moreover, it is unclear how these tools perform in nonpsychiatric medical settings, where suicidality is not as prevalent or may not be the primary concern. Recently, medical organizations such as the American Academy of Pediatrics have recommended suicide risk screening for all pediatric patients ages 12 and older. 13 Thus, as nonpsychiatric health care settings increasingly adopt universal suicide risk screening tools into practice, it is important to evaluate their accuracy. To address this gap, this systematic review aims to (1) identify research examining the accuracy of suicide risk screening tools for pediatric patients in nonpsychiatric medical settings, (2) summarize psychometric properties of suicide risk screening tools, and (3) identify areas for future research.

This review follows Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 14 reporting guidelines and was preregistered with the International Prospective Register of Systematic Reviews (CRD42023406150).

Medline/PubMed and PsycINFO/EBSCO were searched to identify peer-reviewed articles published before March 23, 2023. The following search terms were used: (((“Adolescent”[Mesh]) OR “Child”[Mesh] OR “Youth” OR “Pediatric”) AND Suicid* AND (Screen* OR “Risk Assessment”[Mesh])). Two authors (M.H.-R. and F.Y.) independently screened abstracts for eligibility. Disagreements were resolved through the consensus of a third blind reviewer (N.L.). This process was repeated when screening full-text articles for eligibility. Screening was completed using the online platform, Covidence.

Eligible empirical articles met the following inclusion criteria: (1) written in English; (2) published in a peer-reviewed journal; (3) sample is youth aged 25 years old or younger; (4) study was conducted in a nonpsychiatric medical setting (eg, primary care, specialty care, inpatient or surgical units, or the emergency department [ED]); and (5) outcomes quantified the accuracy of a suicide risk screening tool (eg, sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV], area under the curve [AUC]). Articles were not excluded based on year published.
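The accuracy outcomes enumerated in criterion (5) are all simple ratios over a screening confusion matrix. A generic sketch of the four threshold-based measures follows; the counts in the example are invented for illustration and are not drawn from any included study.

```python
def screening_accuracy(tp, fp, fn, tn):
    """Standard test-accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # flagged, among those truly at risk
        "specificity": tn / (tn + fp),  # cleared, among those not at risk
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Invented example: 9 of 10 at-risk patients flagged, 80 of 100 others cleared
metrics = screening_accuracy(tp=9, fp=20, fn=1, tn=80)
```

Note that PPV and NPV, unlike sensitivity and specificity, shift with the prevalence of suicidality in the screened population, which is one reason accuracy in nonpsychiatric settings needs separate evaluation.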

Two authors (M.H.-R. and F.Y.) completed data extraction; data from all included articles were double coded to ensure accuracy. The following information was extracted from each article: (1) sample characteristics; (2) study setting; (3) screening tool characteristics; and (4) screening tool psychometric properties. If studies stratified results by nonpsychiatric and psychiatric medical settings, sample characteristics and results were extracted only from the nonpsychiatric population.

The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2)15 tool was used to evaluate study risk of bias. QUADAS-2 signaling questions were adapted to evaluate potential bias in participant selection, index test, reference standard, and study flow and timing. We classified studies that lost at least 20% of participants to follow-up as having “high” risk of bias.16 Additionally, applicability of patient selection, index test, and reference standard was evaluated. Using the signaling questions, each study received a rating of low, high, or unclear for risk of bias and applicability. One author (P.G.) evaluated all included articles using the predetermined QUADAS-2 rating criteria. Articles that required additional consensus were evaluated by 2 authors (N.L. and C.C.) using the same QUADAS-2 criteria.

Extracted information was summarized using a narrative synthesis approach, with a focus on summary measures that quantify the accuracy of suicide risk screening tools (eg, sensitivity, specificity, PPV, NPV, AUC).

A total of 3560 articles were identified by our search; the oldest was published in 1970. Given our broad search criteria, 3520 articles were deemed ineligible at the title and abstract screening phase. A total of 40 full-text articles were screened; 13 studies17–29 met eligibility criteria and were included in this systematic review. Figure 1 provides a PRISMA diagram for the study selection process. Inter-rater agreement was moderate for the abstract screening phase (κ = 0.42) and almost perfect for the full-text screening phase (κ = 0.82).30
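
The reported inter-rater agreement statistic, Cohen's κ, corrects raw percent agreement for agreement expected by chance. A minimal sketch for binary include/exclude decisions (the ratings below are hypothetical, not the review's actual screening data):

```python
def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two raters making binary (1=include, 0=exclude) calls."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    p_incl = (sum(a) / n) * (sum(b) / n)            # chance both say include
    p_excl = (1 - sum(a) / n) * (1 - sum(b) / n)    # chance both say exclude
    pe = p_incl + p_excl                            # expected agreement by chance
    return (po - pe) / (1 - pe)

# Hypothetical screening decisions from two raters over 8 abstracts:
print(cohens_kappa([1, 1, 0, 0, 1, 0, 0, 0],
                   [1, 0, 0, 0, 1, 0, 1, 0]))  # ~0.47, "moderate" on common benchmarks
```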

PRISMA flow diagram.


Sensitivity among individual studies ranged from 50% to 100%, and specificity ranged from 58.8% to 96% (Table 1). For studies that reported AUC, values ranged from 0.754 to 0.956.

Study Characteristics and Diagnostic Test Accuracy

C-SSRS, Columbia Suicide Severity Rating Scale; RSQ, Risk of Suicide Questionnaire; SPS, Suicide Probability Scale; SA, Suicide Attempt; SSI, Scale for Suicidal Ideation.

Statistic not reported.

Across the 13 included studies, participant ages ranged from 8 to 22 years (Table 2). Study samples were primarily female, white, and non-Hispanic. Sample sizes also varied considerably, ranging from 100 to 13 420.

Study Sample Characteristics

Hopper et al (2012) enrolled a sample of Australian youth. Horowitz et al (2022) is a secondary analysis that examined non-Hispanic Black and non-Hispanic white youth. Uygun et al (2022) enrolled a sample of Turkish youth.

Eight studies identified by this review examined the accuracy of the Ask Suicide-Screening Questions (ASQ).17–19,21,23–25,29 The ASQ is a 4-item self-report screening tool developed to identify clinically significant suicidal ideation or past suicidal behavior. The ASQ was first developed and validated among a sample of youth aged 10 to 21 years presenting to the ED: when compared with the reference standard Suicidal Ideation Questionnaire (SIQ/SIQ-JR),31,32 the ASQ had a sensitivity of 96.9% and specificity of 87.6%.23 Subsequent research has examined the ASQ’s psychometric properties in pediatric medical and surgical inpatient units, outpatient specialty care, and outpatient primary care. Using the SIQ/SIQ-JR as the reference standard, the ASQ demonstrated a sensitivity of 96.7% and specificity of 91.1% among medical and surgical inpatients aged 10 to 21 years.24 In outpatient specialty care, sensitivity was 100% and specificity 91.2%; in outpatient primary care, they were 100% and 87.9%.17

Additional research has evaluated the ASQ’s predictive validity. One retrospective cohort study19 reviewed medical records to determine the association between the ASQ and a suicide-related ED visit or death 3 months after screening; in this study, the ASQ yielded a sensitivity of 60% and specificity of 92.7%. A separate retrospective cohort study21 examined medical records of pediatric ED patients to determine how accurately the ASQ identified individuals with a suicide-related ED visit 3 months after screening, finding a sensitivity of 66.7% and specificity of 84.2%. Lastly, in 1 prospective study of pediatric ED patients,18 the ASQ had a sensitivity of 95.1% and specificity of 58.8% when using a suicide attempt 3 months after screening as the reference standard.

Two studies25,29 examined the cross-cultural validity of the ASQ. One examined the validity of a Turkish-language version of the ASQ for screening in a Turkish ED. Using the Suicide Probability Scale33 as the reference standard, the Turkish ASQ had a sensitivity of 100% and specificity of 75.3%. The second study examined the accuracy of the ASQ among Black youth.25 This secondary analysis used data from 3 previous ASQ studies17,23,24 and found no significant differences in ASQ psychometric properties between Black youth (sensitivity: 94%, specificity: 91.4%) and white youth (sensitivity: 90.9%, specificity: 91.8%).25

One study20 examined the accuracy of the Behavioral Health Screen (BHS), a self-report screening tool designed for use in outpatient primary care settings. This internet-based tool encompasses 13 domains ranging from nutrition to anxiety. One module within the BHS, the “suicide and self-harm” subscale, comprises 5 core items and 5 follow-up items and identifies past-week and lifetime suicide risk among adolescents and young adults. In its original validation study, the BHS “suicide and self-harm” subscale demonstrated a sensitivity of 83% and specificity of 87%20 when compared with the Scale for Suicidal Ideation.34

Two studies18,27 examined the accuracy of the Computerized Adaptive Screen for Suicidal Youth (CASSY), an electronic self-report screening tool designed to identify youth at risk for future suicidal behavior. The CASSY is a computerized adaptive test: an algorithm determines which items are administered based on how the respondent answers. Items are drawn from a pool of 72 covering various domains, including posttraumatic stress disorder, social adjustment, sleep, substance use, anger, and aggression. On average, the CASSY administers 11 items (range 5 to 21). Of note, the CASSY includes 3 of the 4 ASQ items in every administration.

The CASSY was originally developed using prospective data from pediatric EDs and then independently validated with a sample of pediatric ED patients aged 12 to 17 years.27 In this study, the CASSY had a sensitivity of 82.4% and specificity of 72.5% when compared with the reference standard (a self-reported suicide attempt 3 months after screening).27 Another study18 using a sample of pediatric ED patients extended this research and found the CASSY to have a sensitivity of 94.5% and specificity of 64.3%.

Two studies26,28 examined the accuracy of the Patient Health Questionnaire-9 (PHQ-9). The PHQ-9 is a self-report depression screening tool that evaluates depressive symptoms over the past 2 weeks and includes 1 item (Item 9) that screens for thoughts of suicide and self-harm.35 One study28 examined the psychometric properties of PHQ-9 Item 9 in an outpatient specialty care unit for patients aged 12 to 22 years, using the Columbia Suicide Severity Rating Scale36 as the reference standard. In this cross-sectional study, PHQ-9 Item 9 had a sensitivity of 53.3% and specificity of 95.7%.28

Another study identified by this review assessed the validity of Item 9 of the Patient Health Questionnaire-Adolescent Version (PHQ-A).26 This study used a convenience sample of medical and surgical inpatients aged 10 to 21 years. Using the SIQ/SIQ-JR31 as the reference standard, PHQ-A Item 9 had a sensitivity of 70.0% and specificity of 96.0%.26

One study22 investigated the accuracy of the Risk of Suicide Questionnaire (RSQ), a 4-item verbally administered screening tool that identifies current suicidality. This study used a sample of pediatric ED patients aged 13 to 18 years without mental health complaints or recent psychiatric history.22 Of note, none of the participants who screened positive on the RSQ also screened positive on the reference standard SIQ (cutoff score >41) or SIQ-JR (cutoff score >31). When using more liberal cutoff scores of 30 on the SIQ and 23 on the SIQ-JR, the RSQ demonstrated a sensitivity of 50% and specificity of 79%.22

Methodological quality varied across domains (Fig 2). The index test and patient selection domains exhibited low risk of bias, apart from 1 and 3 studies rated as high risk in these categories, respectively. In contrast, at least 50% of studies were classified as high risk of bias for the reference standard and the flow and timing domains, mostly because survey measures were used as the reference standard or because the same version of the reference standard was not administered to all participants. Applicability concerns were minimal across domains.

QUADAS-2 ratings of included studies.


The findings of this review demonstrate that certain screening tools for suicide risk exhibit satisfactory accuracy and hold promise for integration into clinical practice. We provide commentary on the accuracy of identified tools, where study methodology can be improved, considerations for the cross-cultural validity of screening tools, the role of screening within clinical pathways, and optimal clinical targets for youth risk screening.

The ASQ was the most widely studied tool in this review and demonstrated strong psychometric properties. Additionally, it was the only tool evaluated and validated for use among pediatric patients in the ED, inpatient medical and surgical units, and outpatient primary and specialty care. The newly developed CASSY also shows promise as a screening tool, demonstrating strong sensitivity and specificity in the ED while leveraging novel adaptive testing methods. However, the accuracy of the CASSY in other medical settings (eg, outpatient primary care) has yet to be examined and is an area for future research.

The BHS displayed appropriate accuracy in outpatient primary care. However, the BHS is a broadband measure that screens for other psychopathology and risky behavior, and it is unclear how its suicide risk subscale would perform in isolation. Further research examining the accuracy of the BHS in other medical settings is needed to determine its broader utility.

Our findings suggest that caution should be exercised when using certain screening tools. Both PHQ-9/PHQ-A Item 9 and the RSQ demonstrated poor sensitivity in their respective studies. This may partly reflect the fact that neither tool was specifically developed for universal suicide risk screening. The PHQ has been validated as a depression screening tool;37 however, its reliance on a single item to screen for suicide risk may make it susceptible to missing patients who warrant further examination. Similarly, the RSQ was originally developed to identify suicide risk among psychiatric inpatients, which may explain why its psychometric properties did not generalize to a nonpsychiatric medical population. Additionally, in its original validation study, the RSQ demonstrated a sensitivity of 98% and specificity of 37%,38 suggesting that it may also be prone to generating false positives.

Using the QUADAS-2,15 this review identified areas where the methodological quality of suicide screening research can be strengthened to avoid risk of bias, specifically in the reference standard and flow and timing domains. Many studies17,20,22–26,29 used a self-report survey measure as the reference standard (eg, the SIQ or Scale for Suicidal Ideation), which inherently introduces the possibility of measurement error and risk of bias and may have resulted in over- or underestimation of screening tool accuracy. Future research should aspire to use clinician assessment or behavioral outcomes (eg, suicide attempts) as the “gold standard” when validating tools.

Flow and timing of administration represented a significant area in need of improvement across studies. Several studies17,22–26 administered a different reference standard to participants based on age (eg, the SIQ and SIQ-JR). Administering either the SIQ or SIQ-JR was necessary once the SIQ was chosen, because these tools are validated for different age groups;31,32 however, this is not considered best practice when determining the diagnostic test accuracy of a tool,15 and other options (eg, clinician assessment) were available that would have addressed this issue. Other studies suffered from participant attrition, either by losing more than 20% of participants to follow-up18,27 or by retroactively selecting only complete screening cases from medical records.19,21 Participant retention is often a challenge in longitudinal studies, though the potential for misclassification resulting in overestimation of sensitivity and specificity should not be overlooked. Lastly, 1 study21 reported that the index test was not administered with fidelity.

There is a clear need for additional research examining the accuracy of suicide risk screening tools among diverse youth populations. This is particularly relevant because cultural differences can influence how an individual perceives and discusses suicidality.39 Overall, samples across studies were primarily female, white, and non-Hispanic (Table 2). Moreover, only 1 study directly examined the validity of screening among youth of color (ie, Black youth),25 raising concerns about the generalizability of findings.

Translations of screening tools are also needed to account for the diversity of the youth who present for screening, and it is crucial to validate these translations to ensure that tools use relevant vernacular and retain the intended meaning of questions. One identified study validated a Turkish-language version of the ASQ.29 Other research that did not meet inclusion criteria has examined the validity of screening tool translations,40,41 and work is underway to validate further translations.42 Nevertheless, additional research in this area is needed to ensure the equity of screening.

Screening tool accuracy should also be evaluated in the context of larger clinical pathways. The American Academy of Pediatrics (AAP) provides guidance on youth suicide prevention strategies in clinical settings.43 Referencing clinical pathways developed by an American Academy of Child and Adolescent Psychiatry workgroup,44 the AAP emphasizes screening as the first step in a 3-tiered pathway. Within these pathways, screening tool sensitivity is valued over specificity, which may result in false positives. To address this issue, patients who screen positive on the initial suicide risk screen receive a brief suicide safety assessment, which acts as a triage step to confirm the presence of suicidality. Using information collected during the brief suicide safety assessment, a provider can then determine appropriate safety precautions and next steps for care (eg, a full mental health evaluation, outpatient referral, or no action). Thus, lower screening tool specificity values may be tolerable. However, there is currently no standard of care for managing suicide risk in the medical setting, and the pathways highlighted by the AAP represent one possible approach. Future research aimed at studying these pathways would help clarify the role of screening and the accuracy necessary to optimize the feasibility of risk identification.

One important discrepancy between studies identified by this review was the reference standard used for validation. Although all the tools identified by this review screen for “suicide risk,” some studies used a reference standard of current suicidal ideation (ie, the SIQ), whereas others used future suicidal behavior (ie, suicide attempts). This difference reflects current debate on the optimal target of suicide risk screening.45–47 Notably, the use of different reference standards may have important implications for the accuracy of suicide risk screening tools and the clinical information these tools provide to clinicians.

Suicide attempts are an important outcome to prevent; however, future suicidal behavior is notoriously difficult to predict,48 which may result in poor suicide risk screening tool accuracy. Instead, a de-emphasis on future behavior and a focus on detecting clinically significant suicidal ideation may improve the accuracy of screening tools. Moreover, although suicidal ideation is a poor predictor of suicide mortality,49,50 identifying youth with suicidal thoughts has important clinical utility, regardless of whether they go on to die by suicide. Compared with adolescents without suicidal ideation, adolescents with suicidal ideation are significantly more likely to develop psychopathology and have poor overall functioning by age 30.51 Thus, suicidal ideation appears to be an important clinical indicator of distress that is relevant to the detection of youth suicidality and overall well-being. However, this is an emerging area of research, and future studies should investigate how different reference standards affect suicide risk screening tool accuracy and clinical outcomes.

The findings of this review should be interpreted in light of the following limitations. Foremost, despite a comprehensive search, it is possible that articles relevant to this review were missed. Additionally, because of considerable study heterogeneity, a meta-analytic approach was deemed inappropriate, which prevented us from statistically testing for differences between identified screening tools.

The findings of this review demonstrate that screening tools for suicide risk exhibit satisfactory test accuracy and hold promise for integration into clinical practice. However, caution is necessary when choosing screening tools, as some commonly recommended tools lack sufficient research to support their validity for pediatric patients and require further evaluation. Additionally, the low number of studies identified by this review suggests that this is a novel field of research that requires further study. Future research targeting the accuracy of screening tools among diverse populations is particularly needed to ensure the equity of universal suicide risk screening efforts.

Mr Lowry conceptualized and designed the study and interpreted data; Dr Goger interpreted data; Dr Cha conceptualized and designed the study, interpreted data, and provided supervision; and all authors drafted the initial manuscript, critically reviewed and revised the manuscript for important intellectual content, approved the final manuscript as submitted, and agree to be accountable for all aspects of the work.

PHQ, Patient Health Questionnaire; PHQ-A, Patient Health Questionnaire, Adolescent Version; SIQ, Suicidal Ideation Questionnaire; SIQ-JR, Suicidal Ideation Questionnaire, Junior Version.
