

Research methods--quantitative, qualitative, and more: overview.

  • Quantitative Research
  • Qualitative Research
  • Data Science Methods (Machine Learning, AI, Big Data)
  • Text Mining and Computational Text Analysis
  • Evidence Synthesis/Systematic Reviews
  • Get Data, Get Help!

About Research Methods

This guide provides an overview of research methods, how to choose and use them, and supports and resources at UC Berkeley. 

As Patten and Newhart note in the book Understanding Research Methods, "Research methods are the building blocks of the scientific enterprise. They are the 'how' for building systematic knowledge. The accumulation of knowledge through research is by its nature a collective endeavor. Each well-designed study provides evidence that may support, amend, refute, or deepen the understanding of existing knowledge... Decisions are important throughout the practice of research and are designed to help researchers collect evidence that includes the full spectrum of the phenomenon under study, to maintain logical rules, and to mitigate or account for possible sources of bias. In many ways, learning research methods is learning how to see and make these decisions."

The choice of methods varies by discipline, by the kind of phenomenon being studied and the data being used to study it, by the technology available, and more.  This guide is an introduction, but if you don't see what you need here, always contact your subject librarian, and/or take a look to see if there's a library research guide that will answer your question. 

Suggestions for changes and additions to this guide are welcome! 

START HERE: SAGE Research Methods

Without question, the most comprehensive resource available from the library is SAGE Research Methods. See the online guide to this one-stop-shopping collection; some helpful links are below:

  • SAGE Research Methods
  • Little Green Books  (Quantitative Methods)
  • Little Blue Books  (Qualitative Methods)
  • Dictionaries and Encyclopedias  
  • Case studies of real research projects
  • Sample datasets for hands-on practice
  • Streaming video--see methods come to life
  • Methodspace--a community for researchers
  • SAGE Research Methods Course Mapping

Library Data Services at UC Berkeley

Library Data Services Program and Digital Scholarship Services

The LDSP offers a variety of services and tools! From this link, check out pages for each of the following topics: discovering data, managing data, collecting data, GIS data, text data mining, publishing data, digital scholarship, open science, and the Research Data Management Program.

Be sure also to check out the visual guide to where to seek assistance on campus with any research question you may have!

Library GIS Services

Other Data Services at Berkeley

  • D-Lab: Supports Berkeley faculty, staff, and graduate students with research in data-intensive social science, including a wide range of training and workshop offerings.
  • Dryad: A simple self-service tool for researchers to use in publishing their datasets. It provides tools for the effective publication of and access to research data.
  • Geospatial Innovation Facility (GIF): Provides leadership and training across a broad array of integrated mapping technologies on campus.
  • Research Data Management: A UC Berkeley guide and consulting service for research data management issues.

General Research Methods Resources

Here are some general resources for assistance:

  • Assistance from ICPSR (must create an account to access): Getting Help with Data, and Resources for Students
  • Wiley Stats Ref for background information on statistics topics
  • Survey Documentation and Analysis (SDA): a program for easy web-based analysis of survey data


  • D-Lab/Data Science Discovery Consultants: Request help with your research project from peer consultants.
  • Research data management (RDM) consulting: Meet with RDM consultants before designing the data security, storage, and sharing aspects of your qualitative project.
  • Statistics Department Consulting Services: A service in which advanced graduate students, under faculty supervision, are available to consult during specified hours in the Fall and Spring semesters.

Related Resources

  • IRB / CPHS: Qualitative research projects with human subjects often require that you go through an ethics review.
  • OURS (Office of Undergraduate Research and Scholarships): OURS supports undergraduates who want to embark on research projects and assistantships. In particular, check out their "Getting Started in Research" workshops.
  • Sponsored Projects: Works with researchers applying for major external grants.
  • Last Updated: Apr 3, 2023 3:14 PM
  • URL: https://guides.lib.berkeley.edu/researchmethods

Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

Definition: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction, in which summarization and categorization help find patterns and themes in the data for easy identification and linking. The third is data analysis itself, which researchers carry out in both top-down and bottom-up fashion.

LEARN ABOUT: Research Process Steps

On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “the data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research and data analysis.”

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But, what if there is no question to ask? Well! It is possible to explore data even without a problem – we call it ‘Data Mining’, which often reveals some interesting patterns within the data that are worth exploring.

Irrelevant to the type of data researchers explore, their mission and audiences’ vision guide them to find the patterns to shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, sometimes, data analysis tells the most unforeseen yet exciting stories that were not expected when initiating data analysis. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research. 


Types of data in research

Every kind of data has the quality of describing things once a specific value is assigned to it. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can be in different forms; here are the primary data types.

  • Qualitative data: When the data presented has words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: questions about age, rank, cost, length, weight, scores, etc. all yield this type of data. You can present such data in graphical formats or charts, or apply statistical analysis methods to it. The OMS (Outcomes Measurement Systems) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: Data presented in groups. An item included in categorical data cannot belong to more than one group. Example: a person describing their living style, marital status, smoking habit, or drinking habit in a survey provides categorical data. A chi-square test is a standard method used to analyze this data.
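Since the chi-square test is named above as the standard method for categorical data, here is a minimal sketch of how the statistic itself is computed, using only the Python standard library; the contingency table of smoking versus drinking habits is invented for illustration:

```python
# Hypothetical 2x2 contingency table: smoking habit vs. drinking habit
# (counts are invented for illustration).
observed = [
    [30, 20],  # smokers: drinkers, non-drinkers
    [10, 40],  # non-smokers: drinkers, non-drinkers
]

def chi_square(table):
    """Pearson chi-square statistic for a two-way contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of the two habits
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    return stat

print(round(chi_square(observed), 2))  # → 16.67
```

The statistic would then be compared against a chi-square distribution with the appropriate degrees of freedom; in practice a library such as SciPy handles that step.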

Learn More : Examples of Qualitative Data in Education

Data analysis in qualitative research

Data analysis in qualitative research works a little differently from the analysis of numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complex information is an involved process; hence it is typically used for exploratory research and data analysis.

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is largely manual: researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find  “food”  and  “hunger” are the most commonly used words and will highlight them for further analysis.
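The word-frequency pass described above is easy to sketch with Python's standard library; the survey responses and stopword list here are invented for illustration:

```python
from collections import Counter

# Hypothetical open-ended survey responses (invented for illustration).
responses = [
    "Food prices keep rising and hunger is widespread",
    "Hunger affects children the most",
    "We need better access to food and clean water",
]

# Small stopword list so function words don't dominate the counts
STOPWORDS = {"and", "the", "is", "to", "we", "most"}

words = [
    w for response in responses
    for w in response.lower().split()
    if w not in STOPWORDS
]

# The two most frequent content words; here "food" and "hunger" surface
common = Counter(words).most_common(2)
print(common)
```

In a real project the researcher would still read the surrounding text before highlighting a word for further analysis; counts only point to where to look.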

LEARN ABOUT: Level of Analysis

The keyword context is another widely used word-based technique. In this method, the researcher tries to understand the concept by analyzing the context in which the participants use a particular keyword.  

For example , researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
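The keyword-context technique can likewise be sketched as a small keyword-in-context (KWIC) routine that shows the words surrounding each occurrence of a keyword; the transcript text is invented:

```python
def keyword_in_context(text, keyword, window=3):
    """Return (left-context, right-context) pairs for each keyword hit."""
    tokens = text.lower().split()
    contexts = []
    for i, tok in enumerate(tokens):
        if tok.strip(".,") == keyword:  # ignore trailing punctuation
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append((" ".join(left), " ".join(right)))
    return contexts

# Hypothetical interview transcript (invented for illustration).
transcript = ("My doctor said diabetes runs in the family, "
              "so I worry about diabetes every time I eat sweets.")

for left, right in keyword_in_context(transcript, "diabetes"):
    print(f"... {left} [diabetes] {right} ...")
```

Reading the contexts side by side is what lets the researcher see whether respondents use the keyword in a medical, emotional, or everyday sense.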

The scrutiny-based technique is also one of the highly recommended  text analysis  methods used to identify a quality data pattern. Compare and contrast is the widely used method under this technique to differentiate how a specific text is similar or different from each other. 

For example: to find out the "importance of a resident doctor in a company," the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.

LEARN ABOUT: Qualitative Research Questions and Questionnaires

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods,

  • Content Analysis:  It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze the documented information from text, images, and sometimes from the physical items. It depends on the research questions to predict when and where to use this method.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and  surveys . The majority of times, stories, or opinions shared by people are focused on finding answers to the research questions.
  • Discourse Analysis:  Similar to narrative analysis, discourse analysis is used to analyze the interactions with people. Nevertheless, this particular method considers the social context under which or within which the communication between the researcher and respondent takes place. In addition to that, discourse analysis also focuses on the lifestyle and day-to-day environment while deriving any conclusion.
  • Grounded Theory:  When you want to explain why a particular phenomenon happened, then using grounded theory for analyzing quality data is the best resort. Grounded theory is applied to study data about the host of similar cases occurring in different settings. When researchers are using this method, they might alter explanations or produce new ones until they arrive at some conclusion.

LEARN ABOUT: 12 Best Tools for Researchers

Data analysis in quantitative research

The first stage in research and data analysis is to prepare the data for analysis so that nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to understand whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four stages:

  • Fraud: To ensure an actual human being records each response to the survey or the questionnaire
  • Screening: To make sure each participant or respondent is selected or chosen in compliance with the research criteria
  • Procedure: To ensure ethical standards were maintained while collecting the data sample
  • Completeness: To ensure that the respondent has answered all the questions in an online survey. Else, the interviewer had asked all the questions devised in the questionnaire.
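Several of the validation stages above lend themselves to simple programmatic checks. A minimal sketch covering screening, procedure, and completeness, with hypothetical field names and criteria:

```python
# Hypothetical survey records; None marks a skipped question.
records = [
    {"id": 1, "age": 34, "consented": True,  "q1": "yes", "q2": "no"},
    {"id": 2, "age": 17, "consented": True,  "q1": "no",  "q2": None},
    {"id": 3, "age": 52, "consented": False, "q1": "yes", "q2": "yes"},
]

def validate(record, min_age=18):
    """Return the list of validation stages this response fails."""
    problems = []
    if record["age"] < min_age:        # screening: meets research criteria
        problems.append("screening")
    if not record["consented"]:        # procedure: ethical standards upheld
        problems.append("procedure")
    if any(v is None for v in record.values()):  # completeness: no skips
        problems.append("completeness")
    return problems

report = {r["id"]: validate(r) for r in records}
print(report)
```

Fraud checks (confirming a real human answered) usually need metadata such as response timing or CAPTCHA results, so they are omitted from this sketch.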

Phase II: Data Editing

More often than not, an extensive research data sample comes loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process wherein the researchers confirm that the provided data is free of such errors. They need to conduct necessary checks, including outlier checks, to edit the raw data and make it ready for analysis.

Phase III: Data Coding

Out of all three, this is the most critical phase of data preparation, associated with grouping and assigning values to the survey responses. If a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish the respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
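The age-bracket coding described above can be sketched in a few lines; the ages and bracket boundaries are invented for illustration:

```python
from collections import Counter

# Hypothetical age responses coded into brackets for easier analysis.
ages = [19, 23, 31, 45, 52, 67, 38, 29]

def age_bracket(age):
    """Map a raw age onto an illustrative bracket label."""
    if age < 25:
        return "18-24"
    elif age < 40:
        return "25-39"
    elif age < 60:
        return "40-59"
    return "60+"

# Count respondents per coded bucket instead of handling raw ages.
coded = Counter(age_bracket(a) for a in ages)
print(coded)
```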

LEARN ABOUT: Steps in Qualitative Research

After the data is prepared for analysis, researchers are open to using different research and data analysis methods to derive meaningful insights. Statistical methods are the most favored for analyzing numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. Statistical methods fall into two groups: 'descriptive statistics', used to describe data, and 'inferential statistics', used to draw comparisons and conclusions from the data.

Descriptive statistics

This method is used to describe the basic features of versatile types of data in research. It presents the data in such a meaningful way that patterns in the data start making sense. Nevertheless, descriptive analysis does not support conclusions beyond the data at hand; any conclusions are tied to the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to summarize a distribution with a single central value.
  • Researchers use this method when they want to showcase the most commonly or averagely indicated response.

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest scores.
  • Variance and standard deviation measure how far observed scores deviate from the mean.
  • It is used to identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is and how strongly that spread affects the mean.

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores helping researchers to identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average count.
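The four families of descriptive measures above map directly onto Python's standard statistics module, plus a little arithmetic for range and percentile rank; the scores are invented for illustration:

```python
import statistics

# Hypothetical test scores (invented for illustration).
scores = [72, 85, 85, 90, 64, 78, 85, 92]

# Measures of central tendency
mean = statistics.mean(scores)      # 81.375
median = statistics.median(scores)  # 85
mode = statistics.mode(scores)      # 85

# Measures of dispersion or variation
spread = max(scores) - min(scores)  # range = 28
stdev = statistics.pstdev(scores)   # population standard deviation

# Measure of position: percentile rank of the score 85
# (share of scores strictly below it)
rank = 100 * sum(s < 85 for s in scores) / len(scores)  # 37.5

print(mean, median, mode, spread, round(stdev, 2), rank)
```

Measures of frequency are just counts and percentages of each value, as sketched in the coding example earlier.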

For quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are rarely sufficient to demonstrate the rationale behind them. It is necessary to choose the method of research and data analysis best suited to your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students' average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare average voting done in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample representing that population. For example, you can ask some 100-odd audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It's about sampling research data to answer the survey research questions. For example, researchers might be interested to understand if the new shade of lipstick recently launched is good or not, or if the multivitamin capsules help children perform better at games.
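Parameter estimation from the movie-theater example can be sketched as a confidence interval for a proportion. This sketch uses the normal approximation, and the count of 85 who liked the film is an assumed figure for illustration:

```python
import math

# Hypothetical: 85 of 100 sampled moviegoers say they like the film.
n, liked = 100, 85
p_hat = liked / n  # sample proportion, our estimate of the parameter

# 95% confidence interval via the normal approximation (z = 1.96)
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"Estimated share who like the movie: {p_hat:.0%} "
      f"(95% CI roughly {low:.0%} to {high:.0%})")
```

The interval comes out to roughly 78%-92%, which is how a researcher can responsibly report "about 80-90% of people like the movie" from a sample of 100.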

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. It is often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental research or quasi-experimental research wherein the researchers are interested to understand the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables,  cross-tabulation  is used to analyze the relationship between multiple variables.  Suppose provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: To understand the strength of the relationship between two variables, researchers rarely look beyond the primary and commonly used regression analysis method, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable and one or more independent variables, and you work out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free random manner.
  • Frequency tables: The statistical procedure used to summarize how often each value or category occurs in a dataset. Frequency tables are especially useful for getting a first overview of categorical variables before applying further tests.
  • Analysis of variance: The statistical procedure is used for testing the degree to which two or more vary or differ in an experiment. A considerable degree of variation means research findings were significant. In many contexts, ANOVA testing and variance analysis are similar.
Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers must possess more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Usually, research and data analytics projects differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps design a survey questionnaire, select data collection methods, and choose samples.
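Of the inferential methods listed above, correlation is the simplest to compute by hand. A minimal sketch of the Pearson correlation coefficient, with invented paired observations:

```python
import math
import statistics

# Hypothetical paired observations: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 70, 74, 83]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A value near +1 indicates a strong positive linear relationship.
print(round(pearson_r(hours, scores), 3))
```

Correlation quantifies association only; deciding whether one variable drives the other is the job of the experimental design, not the coefficient.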

LEARN ABOUT: Best Data Collection Tools

  • The primary aim of data research and analysis is to derive ultimate insights that are unbiased. Any mistake in collecting data, selecting an analysis method, or choosing an audience sample, or approaching any of these with a biased mind, is likely to produce a biased inference.
  • No level of sophistication in research data analysis can rectify poorly defined objectives or outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity can mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find a way to deal with everyday challenges like outliers, missing data, data altering, data mining , or developing graphical representation.

LEARN MORE: Descriptive Research vs Correlational Research

The sheer amount of data generated daily is staggering, especially now that data analysis has taken center stage. In 2018 alone, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises willing to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


QuestionPro is an online survey platform that empowers organizations in data analysis and research and provides them with a medium to collect data by creating appealing surveys.



Grad Coach

What Is Research Methodology? A Plain-Language Explanation & Definition (With Examples)

By Derek Jansen (MBA)  and Kerryn Warren (PhD) | June 2020 (Last updated April 2023)

If you’re new to formal academic research, it’s quite likely that you’re feeling a little overwhelmed by all the technical lingo that gets thrown around. And who could blame you – “research methodology”, “research methods”, “sampling strategies”… it all seems never-ending!

In this post, we’ll demystify the landscape with plain-language explanations and loads of examples (including easy-to-follow videos), so that you can approach your dissertation, thesis or research project with confidence. Let’s get started.

Research Methodology 101

  • What exactly research methodology means
  • What qualitative , quantitative and mixed methods are
  • What sampling strategy is
  • What data collection methods are
  • What data analysis methods are
  • How to choose your research methodology
  • Example of a research methodology


What is research methodology?

Research methodology simply refers to the practical “how” of a research study. More specifically, it’s about how  a researcher  systematically designs a study  to ensure valid and reliable results that address the research aims, objectives and research questions . Specifically, how the researcher went about deciding:

  • What type of data to collect (e.g., qualitative or quantitative data )
  • Who  to collect it from (i.e., the sampling strategy )
  • How to  collect  it (i.e., the data collection method )
  • How to  analyse  it (i.e., the data analysis methods )

Within any formal piece of academic research (be it a dissertation, thesis or journal article), you'll find a research methodology chapter or section which covers the aspects mentioned above. Importantly, a good methodology chapter explains not just what methodological choices were made, but also why they were made. In other words, the methodology chapter should justify the design choices, by showing that the chosen methods and techniques are the best fit for the research aims, objectives and research questions.

So, it’s the same as research design?

Not quite. As we mentioned, research methodology refers to the collection of practical decisions regarding what data you’ll collect, from who, how you’ll collect it and how you’ll analyse it. Research design, on the other hand, is more about the overall strategy you’ll adopt in your study. For example, whether you’ll use an experimental design in which you manipulate one variable while controlling others. You can learn more about research design and the various design types here .


What are qualitative, quantitative and mixed-methods?

Qualitative, quantitative and mixed-methods are different types of methodological approaches, distinguished by their focus on words, numbers or both. This is a bit of an oversimplification, but it's a good starting point for understanding.

Let’s take a closer look.

Qualitative research refers to research which focuses on collecting and analysing words (written or spoken) and textual or visual data, whereas quantitative research focuses on measurement and testing using numerical data . Qualitative analysis can also focus on other “softer” data points, such as body language or visual elements.

It’s quite common for a qualitative methodology to be used when the research aims and research questions are exploratory  in nature. For example, a qualitative methodology might be used to understand peoples’ perceptions about an event that took place, or a political candidate running for president. 

Contrasted to this, a quantitative methodology is typically used when the research aims and research questions are confirmatory  in nature. For example, a quantitative methodology might be used to measure the relationship between two variables (e.g. personality type and likelihood to commit a crime) or to test a set of hypotheses .

As you’ve probably guessed, the mixed-method methodology attempts to combine the best of both qualitative and quantitative methodologies to integrate perspectives and create a rich picture. If you’d like to learn more about these three methodological approaches, be sure to watch our explainer video below.

What is sampling strategy?

Simply put, sampling is about deciding who (or where) you’re going to collect your data from . Why does this matter? Well, generally it’s not possible to collect data from every single person in your group of interest (this is called the “population”), so you’ll need to engage a smaller portion of that group that’s accessible and manageable (this is called the “sample”).

How you go about selecting the sample (i.e., your sampling strategy) will have a major impact on your study.  There are many different sampling methods  you can choose from, but the two overarching categories are probability   sampling and  non-probability   sampling .

Probability sampling involves using a completely random sample from the group of people you're interested in. This is comparable to throwing the names of all potential participants into a hat, shaking it up, and picking out the "winners". By using a completely random sample, you'll minimise the risk of selection bias and the results of your study will be more generalisable to the entire population.

Non-probability sampling , on the other hand,  doesn’t use a random sample . For example, it might involve using a convenience sample, which means you’d only interview or survey people that you have access to (perhaps your friends, family or work colleagues), rather than a truly random sample. With non-probability sampling, the results are typically not generalisable .
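The contrast between the two sampling categories can be sketched in a few lines of Python; the population of 1,000 numbered people is hypothetical:

```python
import random

# Hypothetical population of 1,000 people, identified by number.
population = list(range(1000))

random.seed(42)  # fixed seed so the example is reproducible

# Probability sampling: every member has an equal chance of selection.
probability_sample = random.sample(population, k=50)

# Non-probability (convenience) sampling: take whoever is accessible,
# e.g. the first 50 people you happen to know.
convenience_sample = population[:50]

print(len(probability_sample), len(convenience_sample))
```

The convenience sample systematically misses everyone outside your reach, which is exactly why its results are typically not generalisable.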

To learn more about sampling methods, be sure to check out the video below.

What are data collection methods?

As the name suggests, data collection methods simply refers to the way in which you go about collecting the data for your study. Some of the most common data collection methods include:

  • Interviews (which can be unstructured, semi-structured or structured)
  • Focus groups and group interviews
  • Surveys (online or physical surveys)
  • Observations (watching and recording activities)
  • Biophysical measurements (e.g., blood pressure, heart rate, etc.)
  • Documents and records (e.g., financial reports, court records, etc.)

The choice of which data collection method to use depends on your overall research aims and research questions , as well as practicalities and resource constraints. For example, if your research is exploratory in nature, qualitative methods such as interviews and focus groups would likely be a good fit. Conversely, if your research aims to measure specific variables or test hypotheses, large-scale surveys that produce large volumes of numerical data would likely be a better fit.

What are data analysis methods?

Data analysis methods refer to the methods and techniques that you’ll use to make sense of your data. These can be grouped according to whether the research is qualitative (words-based) or quantitative (numbers-based).

Popular data analysis methods in qualitative research include:

  • Qualitative content analysis
  • Thematic analysis
  • Discourse analysis
  • Narrative analysis
  • Interpretative phenomenological analysis (IPA)
  • Visual analysis (of photographs, videos, art, etc.)

Qualitative data analysis begins with data coding, after which an analysis method is applied. In some cases, more than one analysis method is used, depending on the research aims and research questions. In the video below, we explore some common qualitative analysis methods, along with practical examples.
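To make the coding step concrete, here is a minimal sketch of what happens after a first coding pass: each interview segment has been assigned a code, and tallying the codes reveals candidate themes. The excerpts and code labels are entirely hypothetical, and real thematic analysis involves far more interpretation than a frequency count.

```python
from collections import Counter

# Hypothetical coded interview excerpts: (segment, assigned code).
coded_segments = [
    ("I never have enough time to study", "time_pressure"),
    ("The library is always full", "resources"),
    ("Deadlines pile up at the end of term", "time_pressure"),
    ("I can't find quiet space at home", "resources"),
    ("Juggling work and classes is hard", "time_pressure"),
]

# A first step toward identifying themes: tally how often each code occurs.
# Grouping related codes into themes is then done interpretively by the
# researcher, not by the software.
code_counts = Counter(code for _, code in coded_segments)
print(code_counts.most_common())  # [('time_pressure', 3), ('resources', 2)]
```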

Moving on to the quantitative side of things, popular data analysis methods in this type of research include:

  • Descriptive statistics (e.g. means, medians, modes)
  • Inferential statistics (e.g. correlation, regression, structural equation modelling)
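The difference between the two groups can be shown with a short Python sketch using only the standard library. The study-hours and exam-score figures are invented for illustration.

```python
import statistics

# Hypothetical data: hours studied and exam scores for ten students.
hours = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
scores = [52, 55, 61, 60, 68, 70, 75, 80, 85, 91]

# Descriptive statistics summarise the sample itself.
print("mean:", statistics.mean(hours))      # 5.9
print("median:", statistics.median(hours))  # 5.5
print("mode:", statistics.mode(hours))      # 5

# Inferential statistics estimate relationships you hope generalise beyond
# the sample, e.g. the Pearson correlation between hours and scores.
n = len(hours)
mean_x, mean_y = statistics.mean(hours), statistics.mean(scores)
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / (n - 1)
r = cov / (statistics.stdev(hours) * statistics.stdev(scores))
print("Pearson r:", round(r, 3))
```

A strong positive r would suggest a relationship worth testing formally (e.g. with a regression), whereas the descriptive figures only describe these ten students.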

Again, the choice of which data analysis method to use depends on your overall research aims and objectives, as well as practicalities and resource constraints. In the video below, we explain some core concepts central to quantitative analysis.

How do I choose a research methodology?

As you’ve probably picked up by now, your research aims and objectives have a major influence on the research methodology . So, the starting point for developing your research methodology is to take a step back and look at the big picture of your research, before you make methodology decisions. The first question you need to ask yourself is whether your research is exploratory or confirmatory in nature.

If your research aims and objectives are primarily exploratory in nature, your research will likely be qualitative and therefore you might consider qualitative data collection methods (e.g. interviews) and analysis methods (e.g. qualitative content analysis). 

Conversely, if your research aims and objectives are looking to measure or test something (i.e. they’re confirmatory), then your research will quite likely be quantitative in nature, and you might consider quantitative data collection methods (e.g. surveys) and analyses (e.g. statistical analysis).

Designing your research and working out your methodology is a large topic, which we cover extensively on the blog. For now, however, the key takeaway is that you should always start with your research aims, objectives and research questions (the golden thread). Every methodological choice you make needs to align with those three components.

Example of a research methodology chapter

In the video below, we provide a detailed walkthrough of a research methodology from an actual dissertation, as well as an overview of our free methodology template .


Psst… there’s more (for free)

This post is part of our dissertation mini-course, which covers everything you need to get started with your dissertation, thesis or research project. 



How to cite this article:

MLA: Jansen, Derek, and Kerryn Warren. “What (Exactly) Is Research Methodology?” Grad Coach, June 2021, gradcoach.com/what-is-research-methodology/.

APA: Jansen, D., & Warren, K. (2021, June). What (Exactly) Is Research Methodology? Grad Coach. https://gradcoach.com/what-is-research-methodology/




Organizing Your Social Sciences Research Paper

6. The Methodology

The methods section describes the actions taken to investigate a research problem and the rationale for applying the specific procedures or techniques used to identify, select, process, and analyze information. A well-written methods section allows the reader to critically evaluate a study’s overall validity and reliability. The methodology section of a research paper answers two main questions: How was the data collected or generated? And how was it analyzed? The writing should be direct and precise, and always written in the past tense.

Kallet, Richard H. "How to Write the Methods Section of a Research Paper." Respiratory Care 49 (October 2004): 1229-1232.

Importance of a Good Methodology Section

You must explain how you obtained and analyzed your results for the following reasons:

  • Readers need to know how the data was obtained because the method you chose affects the results and, by extension, how you interpreted their significance in the discussion section of your paper.
  • Methodology is crucial for any branch of scholarship because an unreliable method produces unreliable results and, as a consequence, undermines the value of your analysis of the findings.
  • In most cases, there are a variety of different methods you can choose to investigate a research problem. The methodology section of your paper should clearly articulate the reasons why you have chosen a particular procedure or technique.
  • The reader wants to know that the data was collected or generated in a way that is consistent with accepted practice in the field of study. For example, if you are using a multiple choice questionnaire, readers need to know that it offered your respondents a reasonable range of answers to choose from.
  • The method must be appropriate to fulfilling the overall aims of the study. For example, you need to ensure that you have a large enough sample size to be able to generalize and make recommendations based upon the findings.
  • The methodology should discuss the problems that were anticipated and the steps you took to prevent them from occurring. For any problems that do arise, you must describe the ways in which they were minimized or why these problems do not impact in any meaningful way your interpretation of the findings.
  • In the social and behavioral sciences, it is important to always provide sufficient information to allow other researchers to adopt or replicate your methodology. This information is particularly important when a new method has been developed or an innovative use of an existing method is utilized.

Bem, Daryl J. Writing the Empirical Journal Article. Psychology Writing Center. University of Washington; Denscombe, Martyn. The Good Research Guide: For Small-Scale Social Research Projects . 5th edition. Buckingham, UK: Open University Press, 2014; Lunenburg, Frederick C. Writing a Successful Thesis or Dissertation: Tips and Strategies for Students in the Social and Behavioral Sciences . Thousand Oaks, CA: Corwin Press, 2008.

Structure and Writing Style

I.  Groups of Research Methods

There are two main groups of research methods in the social sciences:

  • The empirical-analytical group approaches the study of social sciences in a manner similar to how researchers study the natural sciences. This type of research focuses on objective knowledge, research questions that can be answered yes or no, and operational definitions of variables to be measured. The empirical-analytical group employs deductive reasoning that uses existing theory as a foundation for formulating hypotheses that need to be tested. This approach is focused on explanation.
  • The interpretative group of methods is focused on understanding phenomena in a comprehensive, holistic way. Interpretive methods focus on analytically disclosing the meaning-making practices of human subjects [the why, how, or by what means people do what they do], while showing how those practices are arranged to generate observable outcomes. Interpretive methods allow you to recognize your connection to the phenomena under investigation. However, the interpretative group requires careful examination of variables because it focuses more on subjective knowledge.

II.  Content

The introduction to your methodology section should begin by restating the research problem and underlying assumptions underpinning your study. This is followed by situating the methods you used to gather, analyze, and process information within the overall “tradition” of your field of study and within the particular research design you have chosen to study the problem. If the method you choose lies outside of the tradition of your field [i.e., your review of the literature demonstrates that the method is not commonly used], provide a justification for how your choice of methods specifically addresses the research problem in ways that have not been utilized in prior studies.

The remainder of your methodology section should describe the following:

  • Decisions made in selecting the data you have analyzed or, in the case of qualitative research, the subjects and research setting you have examined,
  • Tools and methods used to identify and collect information, and how you identified relevant variables,
  • The ways in which you processed the data and the procedures you used to analyze that data, and
  • The specific research tools or strategies that you utilized to study the underlying hypothesis and research questions.

In addition, an effectively written methodology section should:

  • Introduce the overall methodological approach for investigating your research problem . Is your study qualitative or quantitative or a combination of both (mixed method)? Are you going to take a special approach, such as action research, or a more neutral stance?
  • Indicate how the approach fits the overall research design . Your methods for gathering data should have a clear connection to your research problem. In other words, make sure that your methods will actually address the problem. One of the most common deficiencies found in research papers is that the proposed methodology is not suitable to achieving the stated objective of your paper.
  • Describe the specific methods of data collection you are going to use, such as surveys, interviews, questionnaires, observation, or archival research. If you are analyzing existing data, such as a data set or archival documents, describe how it was originally created or gathered and by whom. Also be sure to explain how older data is still relevant to investigating the current research problem.
  • Explain how you intend to analyze your results. Will you use statistical analysis? Will you use specific theoretical perspectives to help you analyze a text or explain observed behaviors? Describe how you plan to obtain an accurate assessment of relationships, patterns, trends, distributions, and possible contradictions found in the data.
  • Provide background and a rationale for methodologies that are unfamiliar to your readers. Very often in the social sciences, research problems and the methods for investigating them require more explanation and rationale than the widely accepted procedures governing the natural and physical sciences. Be clear and concise in your explanation.
  • Provide a justification for subject selection and sampling procedure. For instance, if you propose to conduct interviews, how do you intend to select the sample population? If you are analyzing texts, which texts have you chosen, and why? If you are using statistics, why is this set of data being used? If other data sources exist, explain why the data you chose is most appropriate to addressing the research problem.
  • Provide a justification for case study selection. A common method of analyzing research problems in the social sciences is to analyze specific cases. A case can be a person, place, event, phenomenon, or other type of subject of analysis, examined either as a singular topic of in-depth investigation or as one of multiple topics studied for the purpose of comparing or contrasting findings. In either approach, you should explain why a case or cases were chosen and how they specifically relate to the research problem.
  • Describe potential limitations. Are there any practical limitations that could affect your data collection? How will you attempt to control for potential confounding variables and errors? If your methodology may lead to problems you can anticipate, state this openly and show why pursuing this methodology outweighs the risk of these problems cropping up.

NOTE: Once you have written all of the elements of the methods section, subsequent revisions should focus on how to present those elements as clearly and as logically as possible. The description of how you prepared to study the research problem, how you gathered the data, and the protocol for analyzing the data should be organized chronologically. For clarity, when a large amount of detail must be presented, information should be presented in sub-sections according to topic. If necessary, consider using appendices for raw data.

ANOTHER NOTE: If you are conducting a qualitative analysis of a research problem, the methodology section generally requires a more elaborate description of the methods used, as well as an explanation of the processes applied to gathering and analyzing data, than is generally required for studies using quantitative methods. Because you are the primary instrument for generating the data [e.g., through interviews or observations], the process for collecting that data has a significantly greater impact on producing the findings. Therefore, qualitative research requires a more detailed description of the methods used.

YET ANOTHER NOTE: If your study involves interviews, observations, or other qualitative techniques involving human subjects, you may be required to obtain approval from the university's Office for the Protection of Research Subjects before beginning your research. This is not a common procedure for most undergraduate level student research assignments. However, if your professor states you need approval, you must include a statement in your methods section that you received official approval from the office, that participants gave adequate informed consent, and that there was a clear assessment and minimization of risks to participants and to the university. This statement informs the reader that your study was conducted in an ethical and responsible manner. In some cases, the approval notice is included as an appendix to your paper.

III.  Problems to Avoid

Irrelevant Detail The methodology section of your paper should be thorough but concise. Do not provide any background information that does not directly help the reader understand why a particular method was chosen, how the data was gathered or obtained, and how the data was analyzed in relation to the research problem [note: analyzed, not interpreted! Save how you interpreted the findings for the discussion section]. With this in mind, the page length of your methods section will generally be shorter than that of any other section of your paper except the conclusion.

Unnecessary Explanation of Basic Procedures Remember that you are not writing a how-to guide about a particular method. You should make the assumption that readers possess a basic understanding of how to investigate the research problem on their own and, therefore, you do not have to go into great detail about specific methodological procedures. The focus should be on how you applied a method , not on the mechanics of doing a method. An exception to this rule is if you select an unconventional methodological approach; if this is the case, be sure to explain why this approach was chosen and how it enhances the overall process of discovery.

Problem Blindness It is almost a given that you will encounter problems when collecting or generating your data, or that gaps will exist in existing data or archival materials. Do not ignore these problems or pretend they did not occur. Often, documenting how you overcame obstacles can form an interesting part of the methodology. It demonstrates to the reader that you can provide a cogent rationale for the decisions you made to minimize the impact of any problems that arose.

Literature Review Just as the literature review section of your paper provides an overview of sources you have examined while researching a particular topic, the methodology section should cite any sources that informed your choice and application of a particular method [i.e., the choice of a survey should include any citations to the works you used to help construct the survey].

It’s More than Sources of Information! A description of a research study's method should not be confused with a description of the sources of information. Such a list of sources is useful in and of itself, especially if it is accompanied by an explanation about the selection and use of the sources. The description of the project's methodology complements a list of sources in that it sets forth the organization and interpretation of information emanating from those sources.

Azevedo, L.F. et al. "How to Write a Scientific Paper: Writing the Methods Section." Revista Portuguesa de Pneumologia 17 (2011): 232-238; Blair, Lorrie. "Choosing a Methodology." In Writing a Graduate Thesis or Dissertation, Teaching Writing Series. (Rotterdam: Sense Publishers, 2016), pp. 49-72; Butin, Dan W. The Education Dissertation: A Guide for Practitioner Scholars. Thousand Oaks, CA: Corwin, 2010; Carter, Susan. Structuring Your Research Thesis. New York: Palgrave Macmillan, 2012; Kallet, Richard H. "How to Write the Methods Section of a Research Paper." Respiratory Care 49 (October 2004): 1229-1232; Lunenburg, Frederick C. Writing a Successful Thesis or Dissertation: Tips and Strategies for Students in the Social and Behavioral Sciences. Thousand Oaks, CA: Corwin Press, 2008; Methods Section. The Writer's Handbook. Writing Center. University of Wisconsin, Madison; Rudestam, Kjell Erik and Rae R. Newton. "The Method Chapter: Describing Your Research Plan." In Surviving Your Dissertation: A Comprehensive Guide to Content and Process. (Thousand Oaks, CA: Sage Publications, 2015), pp. 87-115; What is Interpretive Research. Institute of Public and International Affairs, University of Utah; Writing the Experimental Report: Methods, Results, and Discussion. The Writing Lab and The OWL. Purdue University; Methods and Materials. The Structure, Format, Content, and Style of a Journal-Style Scientific Paper. Department of Biology. Bates College.

Writing Tip

Statistical Designs and Tests? Do Not Fear Them!

Don't avoid using a quantitative approach to analyzing your research problem just because you fear the idea of applying statistical designs and tests. A qualitative approach, such as conducting interviews or content analysis of archival texts, can yield exciting new insights about a research problem, but it should not be undertaken simply because you have a disdain for running a simple regression. A well-designed quantitative research study can often be accomplished in clear and direct ways, whereas a comparable qualitative study usually requires considerable time to analyze large volumes of data and considerable effort to create new paths for analysis where none previously existed for your research problem.

To locate data and statistics, GO HERE.

Another Writing Tip

Knowing the Relationship Between Theories and Methods

There can be multiple meanings associated with the term "theories" and the term "methods" in social sciences research. A helpful way to delineate between them is to understand "theories" as representing different ways of characterizing the social world when you research it and "methods" as representing different ways of generating and analyzing data about that social world. Framed in this way, all empirical social sciences research involves theories and methods, whether they are stated explicitly or not. However, while theories and methods are often related, it is important that, as a researcher, you deliberately separate them in order to avoid your theories playing a disproportionate role in shaping what outcomes your chosen methods produce.

Engage in an ongoing dialectic between theories and methods: use the outcomes of your methods to interrogate and develop new theories, or new ways of framing the research problem conceptually. This is how scholarship grows and branches out into new intellectual territory.

Reynolds, R. Larry. Ways of Knowing. Alternative Microeconomics . Part 1, Chapter 3. Boise State University; The Theory-Method Relationship. S-Cool Revision. United Kingdom.

Yet Another Writing Tip

Methods and the Methodology

Do not confuse the terms "methods" and "methodology." As Schneider notes, a method refers to the technical steps taken to do research. Descriptions of methods usually include defining and stating why you have chosen specific techniques to investigate a research problem, followed by an outline of the procedures you used to systematically select, gather, and process the data [remember to always save the interpretation of data for the discussion section of your paper].

The methodology refers to a discussion of the underlying reasoning why particular methods were used. This discussion includes describing the theoretical concepts that inform the choice of methods to be applied, placing the choice of methods within the more general nature of academic work, and reviewing its relevance to examining the research problem. The methodology section also includes a thorough review of the methods other scholars have used to study the topic.

Bryman, Alan. "Of Methods and Methodology." Qualitative Research in Organizations and Management: An International Journal 3 (2008): 159-168; Schneider, Florian. “What's in a Methodology: The Difference between Method, Methodology, and Theory…and How to Get the Balance Right?” PoliticsEastAsia.com. Chinese Department, University of Leiden, Netherlands.

  • Last Updated: Oct 10, 2023 1:30 PM
  • URL: https://libguides.usc.edu/writingguide


What Is Qualitative Research? | Methods & Examples

Published on June 19, 2020 by Pritha Bhandari . Revised on June 22, 2023.

Qualitative research involves collecting and analyzing non-numerical data (e.g., text, video, or audio) to understand concepts, opinions, or experiences. It can be used to gather in-depth insights into a problem or generate new ideas for research.

Qualitative research is the opposite of quantitative research , which involves collecting and analyzing numerical data for statistical analysis.

Qualitative research is commonly used in the humanities and social sciences, in subjects such as anthropology, sociology, education, health sciences, history, etc.

Examples of qualitative research questions include:

  • How does social media shape body image in teenagers?
  • How do children and adults interpret healthy eating in the UK?
  • What factors influence employee retention in a large organization?
  • How is anxiety experienced around the world?
  • How can teachers integrate social issues into science curriculums?

Table of contents

  • Approaches to qualitative research
  • Qualitative research methods
  • Qualitative data analysis
  • Advantages of qualitative research
  • Disadvantages of qualitative research
  • Other interesting articles
  • Frequently asked questions about qualitative research

Qualitative research is used to understand how people experience the world. While there are many approaches to qualitative research, they tend to be flexible and focus on retaining rich meaning when interpreting data.

Common approaches include grounded theory, ethnography , action research , phenomenological research, and narrative research. They share some similarities, but emphasize different aims and perspectives.

Note that qualitative research is at risk for certain research biases including the Hawthorne effect , observer bias , recall bias , and social desirability bias . While not always totally avoidable, awareness of potential biases as you collect and analyze your data can prevent them from impacting your work too much.


Each of the research approaches involves using one or more data collection methods. These are some of the most common qualitative methods:

  • Observations: recording what you have seen, heard, or encountered in detailed field notes.
  • Interviews:  personally asking people questions in one-on-one conversations.
  • Focus groups: asking questions and generating discussion among a group of people.
  • Surveys : distributing questionnaires with open-ended questions.
  • Secondary research: collecting existing data in the form of texts, images, audio or video recordings, etc.
For example, in a study of a company's culture, you might combine several of these methods:

  • You take field notes with observations and reflect on your own experiences of the company culture.
  • You distribute open-ended surveys to employees across all the company’s offices by email to find out if the culture varies across locations.
  • You conduct in-depth interviews with employees in your office to learn about their experiences and perspectives in greater detail.

Qualitative researchers often consider themselves “instruments” in research because all observations, interpretations and analyses are filtered through their own personal lens.

For this reason, when writing up your methodology for qualitative research, it’s important to reflect on your approach and to thoroughly explain the choices you made in collecting and analyzing the data.

Qualitative data can take the form of texts, photos, videos and audio. For example, you might be working with interview transcripts, survey responses, fieldnotes, or recordings from natural settings.

Most types of qualitative data analysis share the same five steps:

  • Prepare and organize your data. This may mean transcribing interviews or typing up fieldnotes.
  • Review and explore your data. Examine the data for patterns or repeated ideas that emerge.
  • Develop a data coding system. Based on your initial ideas, establish a set of codes that you can apply to categorize your data.
  • Assign codes to the data. For example, in qualitative survey analysis, this may mean going through each participant’s responses and tagging them with codes in a spreadsheet. As you go through your data, you can create new codes to add to your system if necessary.
  • Identify recurring themes. Link codes together into cohesive, overarching themes.
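The five steps above can be made concrete with a small sketch. This is an illustrative Python example only: the survey responses and the keyword codebook are invented, and in real qualitative analysis codes are assigned through researcher judgment, not keyword matching, which merely stands in for that judgment here.

```python
from collections import defaultdict

# Step 1: prepare and organize the data (here, already-transcribed responses).
responses = [
    "I love the flexible hours, but meetings run too long.",
    "Management listens, and the flexible schedule helps my family life.",
    "Too many meetings; otherwise the team is supportive.",
]

# Step 3: a coding system drafted after reviewing the data (step 2).
# These keywords are hypothetical stand-ins for a researcher's codebook.
codebook = {
    "flexibility": ["flexible", "schedule", "hours"],
    "meeting_load": ["meetings"],
    "support": ["listens", "supportive"],
}

# Step 4: assign codes to each response.
coded = defaultdict(list)
for i, text in enumerate(responses):
    lower = text.lower()
    for code, keywords in codebook.items():
        if any(k in lower for k in keywords):
            coded[code].append(i)

# Step 5: codes that recur across responses point toward candidate themes.
themes = {code: len(ids) for code, ids in coded.items()}
print(themes)  # {'flexibility': 2, 'meeting_load': 2, 'support': 2}
```

Note that new codes can be added to the codebook as you work through the data, which is exactly the flexibility described in step 4.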

There are several specific approaches to analyzing qualitative data. Although these methods share similar processes, they emphasize different concepts.

Qualitative research often tries to preserve the voice and perspective of participants and can be adjusted as new research questions arise. Qualitative research is good for:

  • Flexibility

The data collection and analysis process can be adapted as new ideas or patterns emerge. They are not rigidly decided beforehand.

  • Natural settings

Data collection occurs in real-world contexts or in naturalistic ways.

  • Meaningful insights

Detailed descriptions of people’s experiences, feelings and perceptions can be used in designing, testing or improving systems or products.

  • Generation of new ideas

Open-ended responses mean that researchers can uncover novel problems or opportunities that they wouldn’t have thought of otherwise.

Researchers must consider practical and theoretical limitations in analyzing and interpreting their data. Qualitative research suffers from:

  • Unreliability

The real-world setting often makes qualitative research unreliable because of uncontrolled factors that affect the data.

  • Subjectivity

Due to the researcher’s primary role in analyzing and interpreting data, qualitative research cannot be replicated . The researcher decides what is important and what is irrelevant in data analysis, so interpretations of the same data can vary greatly.

  • Limited generalizability

Small samples are often used to gather detailed data about specific contexts. Despite rigorous analysis procedures, it is difficult to draw generalizable conclusions because the data may be biased and unrepresentative of the wider population .

  • Labor-intensive

Although software can be used to manage and record large amounts of text, data analysis often has to be checked or performed manually.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Chi square goodness of fit test
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Inclusion and exclusion criteria

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

There are five common approaches to qualitative research :

  • Grounded theory involves collecting data in order to develop new theories.
  • Ethnography involves immersing yourself in a group or organization to understand its culture.
  • Narrative research involves interpreting stories to understand how people make sense of their experiences and perceptions.
  • Phenomenological research involves investigating phenomena through people’s lived experiences.
  • Action research links theory and practice in several cycles to drive innovative changes.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .

Cite this Scribbr article


Bhandari, P. (2023, June 22). What Is Qualitative Research? | Methods & Examples. Scribbr. Retrieved December 19, 2023, from https://www.scribbr.com/methodology/qualitative-research/



Research Methods Knowledge Base


By the time you get to the analysis of your data, most of the really difficult work has been done. It’s much more difficult to: define the research problem; develop and implement a sampling plan; conceptualize, operationalize and test your measures; and develop a design structure. If you have done this work well, the analysis of the data is usually a fairly straightforward affair.

In most social research the data analysis involves three major steps, done in roughly this order:

  • Cleaning and organizing the data for analysis ( Data Preparation )
  • Describing the data ( Descriptive Statistics )
  • Testing Hypotheses and Models ( Inferential Statistics )

Data Preparation involves checking or logging the data in; checking the data for accuracy; entering the data into the computer; transforming the data; and developing and documenting a database structure that integrates the various measures.
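A short sketch can illustrate these data-preparation tasks: logging records in, checking them for accuracy, transforming a variable, and documenting what was rejected. The field names and plausibility ranges below are hypothetical, chosen only to show the pattern.

```python
# Hypothetical raw records, as they might arrive from data entry.
raw_records = [
    {"id": 1, "age": 34, "score": 78},
    {"id": 2, "age": -5, "score": 92},   # the accuracy check should flag this
    {"id": 3, "age": 51, "score": 66},
]

clean, rejected = [], []
for rec in raw_records:
    # Accuracy check: values must fall within plausible ranges.
    if 0 <= rec["age"] <= 120 and 0 <= rec["score"] <= 100:
        # Transformation: derive an age group for later analysis,
        # part of documenting a database structure for the measures.
        rec = {**rec, "age_group": "under_40" if rec["age"] < 40 else "40_plus"}
        clean.append(rec)
    else:
        rejected.append(rec["id"])

print(len(clean), rejected)  # 2 [2]
```

Keeping a record of rejected IDs, rather than silently dropping them, is part of documenting the preparation step.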

Descriptive Statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply describing what is, what the data shows.

Inferential Statistics investigate questions, models and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population thinks. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what’s going on in our data.
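The contrast between the two can be shown in a few lines. The sketch below uses only the Python standard library on two invented groups of scores: the descriptive step summarizes each sample, and the inferential step computes Welch's two-sample t statistic by hand (only the statistic is shown, not a p-value, which would require a distribution table or a stats library).

```python
import statistics as st

# Invented scores for two groups, purely for illustration.
group_a = [72, 75, 78, 80, 74]
group_b = [68, 70, 66, 71, 69]

# Descriptive statistics: simply describe each sample.
for name, g in [("A", group_a), ("B", group_b)]:
    print(name, round(st.mean(g), 2), round(st.stdev(g), 2))

# Inferential statistics: Welch's t statistic asks whether the observed
# difference in means is dependable or plausibly due to chance.
na, nb = len(group_a), len(group_b)
va, vb = st.variance(group_a), st.variance(group_b)
t = (st.mean(group_a) - st.mean(group_b)) / (va / na + vb / nb) ** 0.5
print(round(t, 2))  # 4.2
```

A large t statistic relative to its reference distribution is what licenses the inference from sample to population that the paragraph above describes.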

In most research studies, the analysis section follows these three phases of analysis. Descriptions of how the data were prepared tend to be brief and to focus on only the more unique aspects of your study, such as specific data transformations that were performed. The descriptive statistics that you actually look at can be voluminous. In most write-ups, these are carefully selected and organized into summary tables and graphs that only show the most relevant or important information. Usually, the researcher links each of the inferential analyses to specific research questions or hypotheses that were raised in the introduction, or notes any models that were tested that emerged as part of the analysis. In most analysis write-ups it's especially critical not to "miss the forest for the trees." If you present too much detail, the reader may not be able to follow the central line of the results. Often extensive analysis details are appropriately relegated to appendices, reserving only the most critical analysis summaries for the body of the report itself.


  • Open access
  • Published: 07 September 2020

A tutorial on methodological studies: the what, when, how and why

  • Lawrence Mbuagbaw   ORCID: orcid.org/0000-0001-5855-5461 1 , 2 , 3 ,
  • Daeria O. Lawson 1 ,
  • Livia Puljak 4 ,
  • David B. Allison 5 &
  • Lehana Thabane 1 , 2 , 6 , 7 , 8  

BMC Medical Research Methodology, volume 20, Article number: 226 (2020)


Methodological studies – studies that evaluate the design, analysis or reporting of other research-related reports – play an important role in health research. They help to highlight issues in the conduct of research with the aim of improving health research methodology, and ultimately reducing research waste.

We provide an overview of some of the key aspects of methodological studies such as what they are, and when, how and why they are done. We adopt a “frequently asked questions” format to facilitate reading this paper and provide multiple examples to help guide researchers interested in conducting methodological studies. Some of the topics addressed include: is it necessary to publish a study protocol? How to select relevant research reports and databases for a methodological study? What approaches to data extraction and statistical analysis should be considered when conducting a methodological study? What are potential threats to validity and is there a way to appraise the quality of methodological studies?

Appropriate reflection and application of basic principles of epidemiology and biostatistics are required in the design and analysis of methodological studies. This paper provides an introduction for further discussion about the conduct of methodological studies.


The field of meta-research (or research-on-research) has proliferated in recent years in response to issues with research quality and conduct [ 1 , 2 , 3 ]. As the name suggests, this field targets issues with research design, conduct, analysis and reporting. Various types of research reports are often examined as the unit of analysis in these studies (e.g. abstracts, full manuscripts, trial registry entries). Like many other novel fields of research, meta-research has seen a proliferation of use before the development of reporting guidance. For example, this was the case with randomized trials for which risk of bias tools and reporting guidelines were only developed much later – after many trials had been published and noted to have limitations [ 4 , 5 ]; and for systematic reviews as well [ 6 , 7 , 8 ]. However, in the absence of formal guidance, studies that report on research differ substantially in how they are named, conducted and reported [ 9 , 10 ]. This creates challenges in identifying, summarizing and comparing them. In this tutorial paper, we will use the term methodological study to refer to any study that reports on the design, conduct, analysis or reporting of primary or secondary research-related reports (such as trial registry entries and conference abstracts).

In the past 10 years, there has been an increase in the use of terms related to methodological studies (based on records retrieved with a keyword search [in the title and abstract] for “methodological review” and “meta-epidemiological study” in PubMed up to December 2019), suggesting that these studies may be appearing more frequently in the literature. See Fig.  1 .

Figure 1. Trends in the number of studies that mention "methodological review" or "meta-epidemiological study" in PubMed.

The methods used in many methodological studies have been borrowed from systematic and scoping reviews. This practice has influenced the direction of the field, with many methodological studies including searches of electronic databases, screening of records, duplicate data extraction and assessments of risk of bias in the included studies. However, the research questions posed in methodological studies do not always require the approaches listed above, and guidance is needed on when and how to apply these methods to a methodological study. Even though methodological studies can be conducted on qualitative or mixed methods research, this paper focuses on and draws examples exclusively from quantitative research.

The objectives of this paper are to provide some insights on how to conduct methodological studies so that there is greater consistency between the research questions posed, and the design, analysis and reporting of findings. We provide multiple examples to illustrate concepts and a proposed framework for categorizing methodological studies in quantitative research.

What is a methodological study?

Any study that describes or analyzes methods (design, conduct, analysis or reporting) in published (or unpublished) literature is a methodological study. Consequently, the scope of methodological studies is quite extensive and includes, but is not limited to, topics as diverse as: research question formulation [ 11 ]; adherence to reporting guidelines [ 12 , 13 , 14 ] and consistency in reporting [ 15 ]; approaches to study analysis [ 16 ]; investigating the credibility of analyses [ 17 ]; and studies that synthesize these methodological studies [ 18 ]. While the nomenclature of methodological studies is not uniform, the intents and purposes of these studies remain fairly consistent – to describe or analyze methods in primary or secondary studies. As such, methodological studies may also be classified as a subtype of observational studies.

Parallel to this are experimental studies that compare different methods. Even though they play an important role in informing optimal research methods, experimental methodological studies are beyond the scope of this paper. Examples of such studies include the randomized trials by Buscemi et al., comparing single data extraction to double data extraction [ 19 ], and Carrasco-Labra et al., comparing approaches to presenting findings in Grading of Recommendations, Assessment, Development and Evaluations (GRADE) summary of findings tables [ 20 ]. In these studies, the unit of analysis is the person or groups of individuals applying the methods. We also direct readers to the Studies Within a Trial (SWAT) and Studies Within a Review (SWAR) programme operated through the Hub for Trials Methodology Research, for further reading as a potential useful resource for these types of experimental studies [ 21 ]. Lastly, this paper is not meant to inform the conduct of research using computational simulation and mathematical modeling for which some guidance already exists [ 22 ], or studies on the development of methods using consensus-based approaches.

When should we conduct a methodological study?

Methodological studies occupy a unique niche in health research that allows them to inform methodological advances. Methodological studies should also be conducted as pre-cursors to reporting guideline development, as they provide an opportunity to understand current practices, and help to identify the need for guidance and gaps in methodological or reporting quality. For example, the development of the popular Preferred Reporting Items of Systematic reviews and Meta-Analyses (PRISMA) guidelines were preceded by methodological studies identifying poor reporting practices [ 23 , 24 ]. In these instances, after the reporting guidelines are published, methodological studies can also be used to monitor uptake of the guidelines.

These studies can also be conducted to inform the state of the art for design, analysis and reporting practices across different types of health research fields, with the aim of improving research practices, and preventing or reducing research waste. For example, Samaan et al. conducted a scoping review of adherence to different reporting guidelines in health care literature [ 18 ]. Methodological studies can also be used to determine the factors associated with reporting practices. For example, Abbade et al. investigated journal characteristics associated with the use of the Participants, Intervention, Comparison, Outcome, Timeframe (PICOT) format in framing research questions in trials of venous ulcer disease [ 11 ].

How often are methodological studies conducted?

There is no clear answer to this question. Based on a search of PubMed, the use of related terms (“methodological review” and “meta-epidemiological study”) – and therefore, the number of methodological studies – is on the rise. However, many other terms are used to describe methodological studies. There are also many studies that explore design, conduct, analysis or reporting of research reports, but that do not use any specific terms to describe or label their study design in terms of “methodology”. This diversity in nomenclature makes a census of methodological studies elusive. Appropriate terminology and key words for methodological studies are needed to facilitate improved accessibility for end-users.

Why do we conduct methodological studies?

Methodological studies provide information on the design, conduct, analysis or reporting of primary and secondary research and can be used to appraise the quality, quantity, completeness, accuracy and consistency of health research. These issues can be explored in specific fields, journals, databases, geographical regions and time periods. For example, Areia et al. explored the quality of reporting of endoscopic diagnostic studies in gastroenterology [ 25 ]; Knol et al. investigated the reporting of p-values in baseline tables in randomized trials published in high impact journals [ 26 ]; Chen et al. describe adherence to the Consolidated Standards of Reporting Trials (CONSORT) statement in Chinese journals [ 27 ]; and Hopewell et al. describe the effect of editors’ implementation of CONSORT guidelines on reporting of abstracts over time [ 28 ]. Methodological studies provide useful information to researchers, clinicians, editors, publishers and users of health literature. As a result, these studies have been a cornerstone of important methodological developments in the past two decades and have informed the development of many health research guidelines, including the highly cited CONSORT statement [ 5 ].

Where can we find methodological studies?

Methodological studies can be found in most common biomedical bibliographic databases (e.g. Embase, MEDLINE, PubMed, Web of Science). However, the biggest caveat is that methodological studies are hard to identify in the literature due to the wide variety of names used and the lack of comprehensive databases dedicated to them. A handful can be found in the Cochrane Library as “Cochrane Methodology Reviews”, but these studies only cover methodological issues related to systematic reviews. Previous attempts to catalogue all empirical studies of methods used in reviews were abandoned 10 years ago [ 29 ]. In other databases, a variety of search terms may be applied with different levels of sensitivity and specificity.

Some frequently asked questions about methodological studies

In this section, we have outlined responses to questions that might help inform the conduct of methodological studies.

Q: How should I select research reports for my methodological study?

A: Selection of research reports for a methodological study depends on the research question and eligibility criteria. Once a clear research question is set and the nature of literature one desires to review is known, one can then begin the selection process. Selection may begin with a broad search, especially if the eligibility criteria are not apparent. For example, a methodological study of Cochrane Reviews of HIV would not require a complex search as all eligible studies can easily be retrieved from the Cochrane Library after checking a few boxes [ 30 ]. On the other hand, a methodological study of subgroup analyses in trials of gastrointestinal oncology would require a search to find such trials, and further screening to identify trials that conducted a subgroup analysis [ 31 ].

The strategies used for identifying participants in observational studies can apply here. One may use a systematic search to identify all eligible studies. If the number of eligible studies is unmanageable, a random sample of articles can be expected to provide comparable results if it is sufficiently large [ 32 ]. For example, Wilson et al. used a random sample of trials from the Cochrane Stroke Group’s Trial Register to investigate completeness of reporting [ 33 ]. It is possible that a simple random sample would lead to underrepresentation of units (i.e. research reports) that are smaller in number. This is relevant if the investigators wish to compare multiple groups but have too few units in one group. In this case a stratified sample would help to create equal groups. For example, in a methodological study comparing Cochrane and non-Cochrane reviews, Kahale et al. drew random samples from both groups [ 34 ]. Alternatively, systematic or purposeful sampling strategies can be used and we encourage researchers to justify their selected approaches based on the study objective.
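The sampling options above (simple random, stratified) can be sketched in a few lines of code. The snippet below draws an equal-sized random sample from each stratum of a hypothetical sampling frame of research reports; the field names (`id`, `review_type`) and counts are illustrative, not taken from any of the cited studies.

```python
import random

def stratified_sample(records, stratum_key, n_per_stratum, seed=2020):
    """Draw an equal-sized simple random sample from each stratum,
    so that smaller groups are not underrepresented."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    strata = {}
    for rec in records:
        strata.setdefault(rec[stratum_key], []).append(rec)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical sampling frame: Cochrane vs. non-Cochrane reviews
frame = [{"id": i, "review_type": "Cochrane" if i % 3 == 0 else "non-Cochrane"}
         for i in range(300)]
sample = stratified_sample(frame, "review_type", 50)
counts = {}
for rec in sample:
    counts[rec["review_type"]] = counts.get(rec["review_type"], 0) + 1
print(counts)  # {'Cochrane': 50, 'non-Cochrane': 50}
```

Recording the seed alongside the search date makes the selection reproducible, in keeping with the transparency advice elsewhere in this tutorial.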

Q: How many databases should I search?

A: The number of databases one should search would depend on the approach to sampling, which can include targeting the entire “population” of interest or a sample of that population. If you are interested in including the entire target population for your research question, or drawing a random or systematic sample from it, then a comprehensive and exhaustive search for relevant articles is required. In this case, we recommend using systematic approaches for searching electronic databases (i.e. at least 2 databases with a replicable and time-stamped search strategy). The results of your search will constitute a sampling frame from which eligible studies can be drawn.

Alternatively, if your approach to sampling is purposeful, then we recommend targeting the database(s) or data sources (e.g. journals, registries) that include the information you need. For example, if you are conducting a methodological study of high impact journals in plastic surgery and they are all indexed in PubMed, you likely do not need to search any other databases. You may also have a comprehensive list of all journals of interest and can search by journal name in your database (or access the journal archives directly from each journal’s website). Even though one could search journals’ web pages directly, using a database such as PubMed has multiple advantages, such as the use of filters, so the search can be narrowed down to a certain period or to study types of interest. Furthermore, individual journals’ websites may have different search functionalities, which do not necessarily yield consistent output.

Q: Should I publish a protocol for my methodological study?

A: A protocol is a description of intended research methods. Currently, only protocols for clinical trials require registration [ 35 ]. Protocols for systematic reviews are encouraged but no formal recommendation exists. The scientific community welcomes the publication of protocols because they help protect against selective outcome reporting and the use of post hoc methodologies to embellish results, and they help to avoid duplication of effort [ 36 ]. While the latter two risks exist in methodological research, the negative consequences may be substantially less than for clinical outcomes. In a sample of 31 methodological studies, 7 (22.6%) referenced a published protocol [ 9 ]. In the Cochrane Library, there are 15 protocols for methodological reviews (21 July 2020). This suggests that publishing protocols for methodological studies is not uncommon.

Authors can consider publishing their study protocol in a scholarly journal as a manuscript. Advantages of such publication include obtaining peer-review feedback about the planned study, and easy retrieval by searching databases such as PubMed. The disadvantages of trying to publish protocols include the delays associated with manuscript handling and peer review, as well as costs: few journals publish study protocols, and those that do mostly charge article-processing fees [ 37 ]. Authors who would like to make their protocol publicly available without publishing it in a scholarly journal can deposit their study protocol in a publicly available repository, such as the Open Science Framework ( https://osf.io/ ).

Q: How should I appraise the quality of a methodological study?

A: To date, there is no published tool for appraising the risk of bias in a methodological study, but in principle, a methodological study could be considered as a type of observational study. Therefore, during conduct or appraisal, care should be taken to avoid the biases common in observational studies [ 38 ]. These concerns include selection bias, poor comparability of groups, and errors in the ascertainment of exposures or outcomes. In other words, to generate a representative sample, a comprehensive reproducible search may be necessary to build a sampling frame. Additionally, random sampling may be necessary to ensure that all the included research reports have the same probability of being selected, and the screening and selection processes should be transparent and reproducible. To ensure that the groups compared are similar in all characteristics, matching, random sampling or stratified sampling can be used. Statistical adjustments for between-group differences can also be applied at the analysis stage. Finally, duplicate data extraction can reduce errors in the assessment of exposures or outcomes.

Q: Should I justify a sample size?

A: In all instances where one is not using the target population (i.e. the group to which inferences from the research report are directed) [ 39 ], a sample size justification is good practice. The sample size justification may take the form of a description of what is expected to be achieved with the number of articles selected, or a formal sample size estimation that outlines the number of articles required to answer the research question with a certain precision and power. Sample size justifications in methodological studies are reasonable in the following instances:

Comparing two groups

Determining a proportion, mean or another quantifier

Determining factors associated with an outcome using regression-based analyses

For example, El Dib et al. computed a sample size requirement for a methodological study of diagnostic strategies in randomized trials, based on a confidence interval approach [ 40 ].
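As a worked illustration of a precision-based (confidence interval) justification, the sketch below uses the standard normal-approximation formula for estimating a single proportion, n = z² · p(1 − p) / d². The inputs are illustrative; this is not El Dib et al.'s calculation.

```python
import math
from statistics import NormalDist

def articles_needed(expected_proportion, margin, confidence=0.95):
    """Articles required to estimate a proportion p within an absolute
    margin of error d at the given confidence (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p = expected_proportion
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# e.g. estimating adherence to a reporting item, assumed to be near 50%,
# to within +/- 5 percentage points with 95% confidence:
print(articles_needed(0.5, 0.05))  # 385
```

Assuming p = 0.5 is the conservative choice, since p(1 − p) is largest there; a wider acceptable margin shrinks the required number of articles quickly.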

Q: What should I call my study?

A: Other terms that have been used to describe or label methodological studies include “methodological review”, “methodological survey”, “meta-epidemiological study”, “systematic review”, “systematic survey”, “meta-research”, “research-on-research” and many others. We recommend that the study nomenclature be clear, unambiguous, informative and allow for appropriate indexing. Methodological study nomenclature that should be avoided includes “systematic review” – as this will likely be confused with a systematic review of a clinical question. “Systematic survey” may also lead to confusion about whether the survey was systematic (i.e. using a preplanned methodology) or a survey using “systematic” sampling (i.e. a sampling approach using specific intervals to determine who is selected) [ 32 ]. Any of the above meanings of the word “systematic” may be true for methodological studies and could be potentially misleading. “Meta-epidemiological study” is ideal for indexing, but not very informative as it describes an entire field. The term “review” may point towards an appraisal or “review” of the design, conduct, analysis or reporting (or methodological components) of the targeted research reports, yet it has also been used to describe narrative reviews [ 41 , 42 ]. The term “survey” is also in line with the approaches used in many methodological studies [ 9 ], and would be indicative of the sampling procedures of this study design. However, in the absence of guidelines on nomenclature, the term “methodological study” is broad enough to capture most of the scenarios of such studies.

Q: Should I account for clustering in my methodological study?

A: Data from methodological studies are often clustered. For example, articles from a specific source (e.g. the Cochrane Library) may share reporting standards. Articles within the same journal may be similar due to editorial practices and policies, reporting requirements and endorsement of guidelines. There is emerging evidence that these are real concerns that should be accounted for in analyses [ 43 ]. Some cluster variables are described in the section: “What variables are relevant to methodological studies?”

A variety of modelling approaches can be used to account for correlated data, including the use of marginal, fixed or mixed effects regression models with appropriate computation of standard errors [ 44 ]. For example, Kosa et al. used generalized estimating equations to account for the correlation of articles within journals [ 15 ]. Not accounting for clustering could lead to incorrect p-values, unduly narrow confidence intervals, and biased estimates [ 45 ].
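One back-of-the-envelope way to see why clustering matters is the design effect, DEFF = 1 + (m − 1) × ICC, which quantifies how much the variance is inflated when m articles per cluster (e.g. per journal) share an intra-cluster correlation ICC. The sketch below is a heuristic illustration with made-up numbers, not the regression-based approaches cited in the text.

```python
def design_effect(articles_per_cluster, icc):
    """Variance inflation from treating clustered articles as independent:
    DEFF = 1 + (m - 1) * ICC."""
    return 1 + (articles_per_cluster - 1) * icc

def effective_sample_size(n_articles, articles_per_cluster, icc):
    """Number of truly independent observations the clustered sample is worth."""
    return n_articles / design_effect(articles_per_cluster, icc)

# e.g. 200 articles drawn 10 per journal, with a modest within-journal ICC
deff = design_effect(10, 0.05)
n_eff = effective_sample_size(200, 10, 0.05)
print(round(deff, 2), round(n_eff, 1))  # 1.45 137.9
```

Even a modest within-journal correlation shrinks the effective sample noticeably, which is why naive analyses yield unduly narrow confidence intervals.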

Q: Should I extract data in duplicate?

A: Yes. Duplicate data extraction takes more time but results in fewer errors [ 19 ]. Data extraction errors can in turn distort the effect estimate [ 46 ], and should therefore be mitigated. Duplicate data extraction should be considered in the absence of other approaches to minimize extraction errors. Much like systematic reviews, this area will likely see rapid advances in machine learning and natural language processing technologies to support researchers with screening and data extraction [ 47 , 48 ]. However, experience plays an important role in the quality of extracted data, and inexperienced extractors should be paired with experienced extractors [ 46 , 49 ].

Q: Should I assess the risk of bias of research reports included in my methodological study?

A: Risk of bias is most useful in determining the certainty that can be placed in the effect measure from a study. In methodological studies, risk of bias may not serve the purpose of determining the trustworthiness of results, as effect measures are often not the primary goal of methodological studies. Determining risk of bias in methodological studies is likely a practice borrowed from systematic review methodology, but one whose intrinsic value is not obvious in methodological studies. When it is part of the research question, investigators often focus on one aspect of risk of bias. For example, Speich investigated how blinding was reported in surgical trials [ 50 ], and Abraha et al. investigated the application of intention-to-treat analyses in systematic reviews and trials [ 51 ].

Q: What variables are relevant to methodological studies?

A: There is empirical evidence that certain variables may inform the findings in a methodological study. We outline some of these and provide a brief overview below:

Country: Countries and regions differ in their research cultures, and the resources available to conduct research. Therefore, it is reasonable to believe that there may be differences in methodological features across countries. Methodological studies have reported loco-regional differences in reporting quality [ 52 , 53 ]. This may also be related to challenges non-English speakers face in publishing papers in English.

Authors’ expertise: The inclusion of authors with expertise in research methodology, biostatistics, and scientific writing is likely to influence the end-product. Oltean et al. found that among randomized trials in orthopaedic surgery, the use of analyses that accounted for clustering was more likely when specialists (e.g. statistician, epidemiologist or clinical trials methodologist) were included on the study team [ 54 ]. Fleming et al. found that including methodologists in the review team was associated with appropriate use of reporting guidelines [ 55 ].

Source of funding and conflicts of interest: Some studies have found that funded studies report better [ 56 , 57 ], while others have found no such difference [ 53 , 58 ]. The presence of funding may indicate the availability of resources deployed to ensure optimal design, conduct, analysis and reporting. However, the source of funding may introduce conflicts of interest and warrant assessment. For example, Kaiser et al. investigated the effect of industry funding on obesity or nutrition randomized trials and found that reporting quality was similar [ 59 ]. Thomas et al. looked at the reporting quality of long-term weight loss trials and found that industry-funded studies were reported better [ 60 ]. Kan et al. examined the association between industry funding and “positive trials” (trials reporting a significant intervention effect) and found that industry funding was highly predictive of a positive trial [ 61 ]. This finding is similar to that of a recent Cochrane Methodology Review by Hansen et al. [ 62 ]

Journal characteristics: Certain journals’ characteristics may influence the study design, analysis or reporting. Characteristics such as journal endorsement of guidelines [ 63 , 64 ], and Journal Impact Factor (JIF) have been shown to be associated with reporting [ 63 , 65 , 66 , 67 ].

Study size (sample size/number of sites): Some studies have shown that reporting is better in larger studies [ 53 , 56 , 58 ].

Year of publication: It is reasonable to assume that design, conduct, analysis and reporting of research will change over time. Many studies have demonstrated improvements in reporting over time or after the publication of reporting guidelines [ 68 , 69 ].

Type of intervention: In a methodological study of reporting quality of weight loss intervention studies, Thabane et al. found that trials of pharmacologic interventions were reported better than trials of non-pharmacologic interventions [ 70 ].

Interactions between variables: Complex interactions between the previously listed variables are possible. High income countries with more resources may be more likely to conduct larger studies and incorporate a variety of experts. Authors in certain countries may prefer certain journals, and journal endorsement of guidelines and editorial policies may change over time.

Q: Should I focus only on high impact journals?

A: Investigators may choose to investigate only high impact journals because they are more likely to influence practice and policy, or because they assume that methodological standards would be higher. However, the JIF may severely limit the scope of articles included and may skew the sample towards articles with positive findings. The generalizability and applicability of findings from a handful of journals must be examined carefully, especially since the JIF varies over time. Even among journals that are all “high impact”, variations exist in methodological standards.

Q: Can I conduct a methodological study of qualitative research?

A: Yes. Even though a lot of methodological research has been conducted in the quantitative research field, methodological studies of qualitative studies are feasible. Certain databases that catalogue qualitative research, including the Cumulative Index to Nursing & Allied Health Literature (CINAHL), have defined subject headings that are specific to methodological research (e.g. “research methodology”). Alternatively, one could also conduct a qualitative methodological review; that is, use qualitative approaches to synthesize methodological issues in qualitative studies.

Q: What reporting guidelines should I use for my methodological study?

A: There is no guideline that covers the entire scope of methodological studies. One adaptation of the PRISMA guidelines has been published, which works well for studies that aim to use the entire target population of research reports [ 71 ]. However, it is not widely used (40 citations in 2 years as of 09 December 2019), and methodological studies that are designed as cross-sectional or before-after studies require a more fit-for-purpose guideline. A more encompassing reporting guideline for a broad range of methodological studies is currently under development [ 72 ]. However, in the absence of formal guidance, the requirements for scientific reporting should be respected, and authors of methodological studies should focus on transparency and reproducibility.

Q: What are the potential threats to validity and how can I avoid them?

A: Methodological studies may be compromised by a lack of internal or external validity. The main threats to internal validity in methodological studies are selection bias and confounding bias. Investigators must ensure that the methods used to select articles do not make them differ systematically from the set of articles to which they would like to make inferences. For example, attempting to make extrapolations to all journals after analyzing high-impact journals would be misleading.

Many factors (confounders) may distort the association between the exposure and outcome if the included research reports differ with respect to these factors [ 73 ]. For example, when examining the association between source of funding and completeness of reporting, it may be necessary to account for journals that endorse the guidelines. Confounding bias can be addressed by restriction, matching and statistical adjustment [ 73 ]. Restriction appears to be the method of choice for many investigators who choose to include only high impact journals or articles in a specific field. For example, Knol et al. examined the reporting of p -values in baseline tables of high impact journals [ 26 ]. Matching is also sometimes used. In the methodological study of non-randomized interventional studies of elective ventral hernia repair, Parker et al. matched prospective studies with retrospective studies and compared reporting standards [ 74 ]. Some other methodological studies use statistical adjustments. For example, Zhang et al. used regression techniques to determine the factors associated with missing participant data in trials [ 16 ].
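As one concrete option for stratified statistical adjustment, the sketch below computes a Mantel–Haenszel pooled odds ratio from per-stratum 2 × 2 tables (e.g. one table per journal, the confounder defining the strata). The counts and variable roles are invented for illustration; the cited studies did not necessarily use this estimator.

```python
def mantel_haenszel_or(tables):
    """Confounder-adjusted odds ratio across 2x2 tables (a, b, c, d),
    one table per stratum: OR_MH = sum(a*d/n) / sum(b*c/n)."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical counts per stratum: (funded & complete, funded & incomplete,
#                                   unfunded & complete, unfunded & incomplete)
strata = [(10, 10, 5, 20),   # stratum 1: odds ratio = 4.0
          (8, 4, 10, 20)]    # stratum 2: odds ratio = 4.0
print(round(mantel_haenszel_or(strata), 6))  # 4.0
```

Because both strata carry the same odds ratio here, the pooled estimate reproduces it exactly; when stratum-specific estimates differ, the Mantel–Haenszel weights favour the better-populated strata.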

With regard to external validity, researchers interested in conducting methodological studies must consider how generalizable or applicable their findings are. This should tie in closely with the research question and should be explicit. For example, findings from methodological studies on trials published in high impact cardiology journals cannot be assumed to be applicable to trials in other fields. However, investigators must ensure that their sample truly represents the target population, either by a) conducting a comprehensive and exhaustive search, or b) using an appropriate and justified, randomly selected sample of research reports.

Even applicability to high impact journals may vary based on the investigators’ definition, and over time. For example, for high impact journals in the field of general medicine, Bouwmeester et al. included the Annals of Internal Medicine (AIM), BMJ, the Journal of the American Medical Association (JAMA), Lancet, the New England Journal of Medicine (NEJM), and PLoS Medicine ( n  = 6) [ 75 ]. In contrast, the high impact journals selected in the methodological study by Schiller et al. were BMJ, JAMA, Lancet, and NEJM ( n  = 4) [ 76 ]. Another methodological study by Kosa et al. included AIM, BMJ, JAMA, Lancet and NEJM ( n  = 5). In the methodological study by Thabut et al., journals with a JIF greater than 5 were considered to be high impact. Riado Minguez et al. used first quartile journals in the Journal Citation Reports (JCR) for a specific year to determine “high impact” [ 77 ]. Ultimately, the definition of high impact will be based on the number of journals the investigators are willing to include, the year of impact and the JIF cut-off [ 78 ]. We acknowledge that the term “generalizability” may apply differently for methodological studies, especially when in many instances it is possible to include the entire target population in the sample studied.

Finally, methodological studies are not exempt from information bias which may stem from discrepancies in the included research reports [ 79 ], errors in data extraction, or inappropriate interpretation of the information extracted. Likewise, publication bias may also be a concern in methodological studies, but such concepts have not yet been explored.

A proposed framework

In order to inform discussions about methodological studies and the development of guidance on what should be reported, we have outlined some key features of methodological studies that can be used to classify them. For each of the categories outlined below, we provide an example. In our experience, the choice of approach to completing a methodological study can be informed by asking the following four questions:

What is the aim?

Methodological studies that investigate bias

A methodological study may be focused on exploring sources of bias in primary or secondary studies (meta-bias), or how bias is analyzed. We have taken care to distinguish bias (i.e. systematic deviations from the truth irrespective of the source) from reporting quality or completeness (i.e. not adhering to a specific reporting guideline or norm). An example of where this distinction would be important is in the case of a randomized trial with no blinding. This study (depending on the nature of the intervention) would be at risk of performance bias. However, if the authors report that their study was not blinded, they would have reported adequately. In fact, some methodological studies attempt to capture both “quality of conduct” and “quality of reporting”, such as Richie et al., who reported on the risk of bias in randomized trials of pharmacy practice interventions [ 80 ]. Babic et al. investigated how risk of bias was used to inform sensitivity analyses in Cochrane reviews [ 81 ]. Further, biases related to the choice of outcomes can also be explored. For example, Tan et al. investigated differences in treatment effect size based on the outcome reported [ 82 ].

Methodological studies that investigate quality (or completeness) of reporting

Methodological studies may report quality of reporting against a reporting checklist (i.e. adherence to guidelines) or against expected norms. For example, Croituro et al. report on the quality of reporting in systematic reviews published in dermatology journals based on their adherence to the PRISMA statement [ 83 ], and Khan et al. described the quality of reporting of harms in randomized controlled trials published in high impact cardiovascular journals based on the CONSORT extension for harms [ 84 ]. Other methodological studies investigate reporting of certain features of interest that may not be part of formally published checklists or guidelines. For example, Mbuagbaw et al. described how often the implications for research are elaborated using the Evidence, Participants, Intervention, Comparison, Outcome, Timeframe (EPICOT) format [ 30 ].

Methodological studies that investigate the consistency of reporting

Sometimes investigators may be interested in how consistent reports of the same research are, as it is expected that there should be consistency between: conference abstracts and published manuscripts; manuscript abstracts and manuscript main text; and trial registration and published manuscript. For example, Rosmarakis et al. investigated consistency between conference abstracts and full text manuscripts [ 85 ].

Methodological studies that investigate factors associated with reporting

In addition to identifying issues with reporting in primary and secondary studies, authors of methodological studies may be interested in determining the factors that are associated with certain reporting practices. Many methodological studies incorporate this, albeit as a secondary outcome. For example, Farrokhyar et al. investigated the factors associated with reporting quality in randomized trials of coronary artery bypass grafting surgery [ 53 ].

Methodological studies that investigate methods

Methodological studies may also be used to describe methods or compare methods, and the factors associated with methods. Muller et al. described the methods used for systematic reviews and meta-analyses of observational studies [ 86 ].

Methodological studies that summarize other methodological studies

Some methodological studies synthesize results from other methodological studies. For example, Li et al. conducted a scoping review of methodological reviews that investigated consistency between full text and abstracts in primary biomedical research [ 87 ].

Methodological studies that investigate nomenclature and terminology

Some methodological studies may investigate the use of names and terms in health research. For example, Martinic et al. investigated the definitions of systematic reviews used in overviews of systematic reviews (OSRs), meta-epidemiological studies and epidemiology textbooks [ 88 ].

Other types of methodological studies

In addition to the previously mentioned experimental methodological studies, there may exist other types of methodological studies not captured here.

What is the design?

Methodological studies that are descriptive

Most methodological studies are purely descriptive and report their findings as counts (percent) and means (standard deviation) or medians (interquartile range). For example, Mbuagbaw et al. described the reporting of research recommendations in Cochrane HIV systematic reviews [ 30 ]. Gohari et al. described the quality of reporting of randomized trials in diabetes in Iran [ 12 ].

Methodological studies that are analytical

Some methodological studies are analytical, wherein “analytical studies identify and quantify associations, test hypotheses, identify causes and determine whether an association exists between variables, such as between an exposure and a disease” [ 89 ]. In the case of methodological studies, all of these investigations are possible. For example, Kosa et al. investigated the association between agreement in primary outcome from trial registry to published manuscript and study covariates. They found that larger and more recent studies were more likely to have agreement [ 15 ]. Tricco et al. compared the conclusion statements from Cochrane and non-Cochrane systematic reviews with a meta-analysis of the primary outcome and found that non-Cochrane reviews were more likely to report positive findings. These results amount to a test of the null hypothesis that the proportions of Cochrane and non-Cochrane reviews reporting positive results are equal [ 90 ].
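A test of that kind of null hypothesis can be illustrated with a standard two-proportion z-test using a pooled standard error. The counts below are hypothetical and are not Tricco et al.'s data.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2, using the pooled proportion
    for the standard error. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 60/100 non-Cochrane vs. 40/100 Cochrane reviews "positive"
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.83, p = 0.005
```

With these invented counts the null hypothesis of equal proportions would be rejected at the conventional 0.05 level; a chi-square test on the same 2 × 2 table is an equivalent alternative.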

What is the sampling strategy?

Methodological studies that include the target population

Methodological reviews with narrow research questions may be able to include the entire target population. For example, in the methodological study of Cochrane HIV systematic reviews, Mbuagbaw et al. included all of the available studies ( n  = 103) [ 30 ].

Methodological studies that include a sample of the target population

Many methodological studies use random samples of the target population [ 33 , 91 , 92 ]. Alternatively, purposeful sampling may be used, limiting the sample to a subset of research-related reports published within a certain time period, or in journals with a certain ranking or on a topic. Systematic sampling can also be used when random sampling may be challenging to implement.

What is the unit of analysis?

Methodological studies with a research report as the unit of analysis

Many methodological studies use a research report (e.g. full manuscript of study, abstract portion of the study) as the unit of analysis, and inferences can be made at the study-level. However, both published and unpublished research-related reports can be studied. These may include articles, conference abstracts, registry entries etc.

Methodological studies with a design, analysis or reporting item as the unit of analysis

Some methodological studies report on items which may occur more than once per article. For example, Paquette et al. report on subgroup analyses in Cochrane reviews of atrial fibrillation in which 17 systematic reviews planned 56 subgroup analyses [ 93 ].

This framework is outlined in Fig. 2.

Fig. 2: A proposed framework for methodological studies


Methodological studies have examined different aspects of reporting such as quality, completeness, consistency and adherence to reporting guidelines. As such, many of the methodological study examples cited in this tutorial are related to reporting. However, as an evolving field, the scope of research questions that can be addressed by methodological studies is expected to increase.

In this paper we have outlined the scope and purpose of methodological studies, along with examples of instances in which various approaches have been used. In the absence of formal guidance on the design, conduct, analysis and reporting of methodological studies, we have provided some advice to help make methodological studies consistent. This advice is grounded in good contemporary scientific practice. Generally, the research question should tie in with the sampling approach and planned analysis. We have also highlighted the variables that may inform findings from methodological studies. Lastly, we have provided suggestions for ways in which authors can categorize their methodological studies to inform their design and analysis.

Availability of data and materials

Data sharing is not applicable to this article as no new data were created or analyzed in this study.


Abbreviations

CONSORT: Consolidated Standards of Reporting Trials

EPICOT: Evidence, Participants, Intervention, Comparison, Outcome, Timeframe

GRADE: Grading of Recommendations, Assessment, Development and Evaluations

PICOT: Participants, Intervention, Comparison, Outcome, Timeframe

PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses

SWAR: Studies Within a Review

SWAT: Studies Within a Trial

References

Chalmers I, Glasziou P. Avoidable waste in the production and reporting of research evidence. Lancet. 2009;374(9683):86–9.


Chan AW, Song F, Vickers A, Jefferson T, Dickersin K, Gotzsche PC, Krumholz HM, Ghersi D, van der Worp HB. Increasing value and reducing waste: addressing inaccessible research. Lancet. 2014;383(9913):257–66.


Ioannidis JP, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Increasing value and reducing waste in research design, conduct, and analysis. Lancet. 2014;383(9912):166–75.

Higgins JP, Altman DG, Gotzsche PC, Juni P, Moher D, Oxman AD, Savovic J, Schulz KF, Weeks L, Sterne JA. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ. 2011;343:d5928.

Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JP, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med. 2009;6(7):e1000100.

Shea BJ, Hamel C, Wells GA, Bouter LM, Kristjansson E, Grimshaw J, Henry DA, Boers M. AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol. 2009;62(10):1013–20.

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, Moher D, Tugwell P, Welch V, Kristjansson E, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

Lawson DO, Leenus A, Mbuagbaw L. Mapping the nomenclature, methodology, and reporting of studies that review methods: a pilot methodological review. Pilot Feasibility Studies. 2020;6(1):13.

Puljak L, Makaric ZL, Buljan I, Pieper D. What is a meta-epidemiological study? Analysis of published literature indicated heterogeneous study designs and definitions. J Comp Eff Res. 2020.

Abbade LPF, Wang M, Sriganesh K, Jin Y, Mbuagbaw L, Thabane L. The framing of research questions using the PICOT format in randomized controlled trials of venous ulcer disease is suboptimal: a systematic survey. Wound Repair Regen. 2017;25(5):892–900.

Gohari F, Baradaran HR, Tabatabaee M, Anijidani S, Mohammadpour Touserkani F, Atlasi R, Razmgir M. Quality of reporting randomized controlled trials (RCTs) in diabetes in Iran; a systematic review. J Diabetes Metab Disord. 2015;15(1):36.

Wang M, Jin Y, Hu ZJ, Thabane A, Dennis B, Gajic-Veljanoski O, Paul J, Thabane L. The reporting quality of abstracts of stepped wedge randomized trials is suboptimal: a systematic survey of the literature. Contemp Clin Trials Commun. 2017;8:1–10.

Shanthanna H, Kaushal A, Mbuagbaw L, Couban R, Busse J, Thabane L. A cross-sectional study of the reporting quality of pilot or feasibility trials in high-impact anesthesia journals. Can J Anaesth. 2018;65(11):1180–95.

Kosa SD, Mbuagbaw L, Borg Debono V, Bhandari M, Dennis BB, Ene G, Leenus A, Shi D, Thabane M, Valvasori S, et al. Agreement in reporting between trial publications and current clinical trial registry in high impact journals: a methodological review. Contemporary Clinical Trials. 2018;65:144–50.

Zhang Y, Florez ID, Colunga Lozano LE, Aloweni FAB, Kennedy SA, Li A, Craigie S, Zhang S, Agarwal A, Lopes LC, et al. A systematic survey on reporting and methods for handling missing participant data for continuous outcomes in randomized controlled trials. J Clin Epidemiol. 2017;88:57–66.


Hernández AV, Boersma E, Murray GD, Habbema JD, Steyerberg EW. Subgroup analyses in therapeutic cardiovascular clinical trials: are most of them misleading? Am Heart J. 2006;151(2):257–64.

Samaan Z, Mbuagbaw L, Kosa D, Borg Debono V, Dillenburg R, Zhang S, Fruci V, Dennis B, Bawor M, Thabane L. A systematic scoping review of adherence to reporting guidelines in health care literature. J Multidiscip Healthc. 2013;6:169–88.

Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. J Clin Epidemiol. 2006;59(7):697–703.

Carrasco-Labra A, Brignardello-Petersen R, Santesso N, Neumann I, Mustafa RA, Mbuagbaw L, Etxeandia Ikobaltzeta I, De Stio C, McCullagh LJ, Alonso-Coello P. Improving GRADE evidence tables part 1: a randomized trial shows improved understanding of content in summary-of-findings tables with a new format. J Clin Epidemiol. 2016;74:7–18.

The Northern Ireland Hub for Trials Methodology Research: SWAT/SWAR Information [ https://www.qub.ac.uk/sites/TheNorthernIrelandNetworkforTrialsMethodologyResearch/SWATSWARInformation/ ]. Accessed 31 Aug 2020.

Chick S, Sánchez P, Ferrin D, Morrice D. How to conduct a successful simulation study. In: Proceedings of the 2003 winter simulation conference: 2003; 2003. p. 66–70.


Mulrow CD. The medical review article: state of the science. Ann Intern Med. 1987;106(3):485–8.

Sacks HS, Reitman D, Pagano D, Kupelnick B. Meta-analysis: an update. Mount Sinai J Med New York. 1996;63(3–4):216–24.


Areia M, Soares M, Dinis-Ribeiro M. Quality reporting of endoscopic diagnostic studies in gastrointestinal journals: where do we stand on the use of the STARD and CONSORT statements? Endoscopy. 2010;42(2):138–47.

Knol M, Groenwold R, Grobbee D. P-values in baseline tables of randomised controlled trials are inappropriate but still common in high impact journals. Eur J Prev Cardiol. 2012;19(2):231–2.

Chen M, Cui J, Zhang AL, Sze DM, Xue CC, May BH. Adherence to CONSORT items in randomized controlled trials of integrative medicine for colorectal Cancer published in Chinese journals. J Altern Complement Med. 2018;24(2):115–24.

Hopewell S, Ravaud P, Baron G, Boutron I. Effect of editors' implementation of CONSORT guidelines on the reporting of abstracts in high impact medical journals: interrupted time series analysis. BMJ. 2012;344:e4178.

The Cochrane Methodology Register Issue 2 2009 [ https://cmr.cochrane.org/help.htm ]. Accessed 31 Aug 2020.

Mbuagbaw L, Kredo T, Welch V, Mursleen S, Ross S, Zani B, Motaze NV, Quinlan L. Critical EPICOT items were absent in Cochrane human immunodeficiency virus systematic reviews: a bibliometric analysis. J Clin Epidemiol. 2016;74:66–72.

Barton S, Peckitt C, Sclafani F, Cunningham D, Chau I. The influence of industry sponsorship on the reporting of subgroup analyses within phase III randomised controlled trials in gastrointestinal oncology. Eur J Cancer. 2015;51(18):2732–9.

Setia MS. Methodology series module 5: sampling strategies. Indian J Dermatol. 2016;61(5):505–9.

Wilson B, Burnett P, Moher D, Altman DG, Al-Shahi Salman R. Completeness of reporting of randomised controlled trials including people with transient ischaemic attack or stroke: a systematic review. Eur Stroke J. 2018;3(4):337–46.

Kahale LA, Diab B, Brignardello-Petersen R, Agarwal A, Mustafa RA, Kwong J, Neumann I, Li L, Lopes LC, Briel M, et al. Systematic reviews do not adequately report or address missing outcome data in their analyses: a methodological survey. J Clin Epidemiol. 2018;99:14–23.

De Angelis CD, Drazen JM, Frizelle FA, Haug C, Hoey J, Horton R, Kotzin S, Laine C, Marusic A, Overbeke AJPM, et al. Is this clinical trial fully registered?: a statement from the International Committee of Medical Journal Editors*. Ann Intern Med. 2005;143(2):146–8.

Ohtake PJ, Childs JD. Why publish study protocols? Phys Ther. 2014;94(9):1208–9.

Rombey T, Allers K, Mathes T, Hoffmann F, Pieper D. A descriptive analysis of the characteristics and the peer review process of systematic review protocols published in an open peer review journal from 2012 to 2017. BMC Med Res Methodol. 2019;19(1):57.

Grimes DA, Schulz KF. Bias and causal associations in observational research. Lancet. 2002;359(9302):248–52.

Porta M, editor. A dictionary of epidemiology. 5th ed. Oxford: Oxford University Press; 2008.

El Dib R, Tikkinen KAO, Akl EA, Gomaa HA, Mustafa RA, Agarwal A, Carpenter CR, Zhang Y, Jorge EC, Almeida R, et al. Systematic survey of randomized trials evaluating the impact of alternative diagnostic strategies on patient-important outcomes. J Clin Epidemiol. 2017;84:61–9.

Helzer JE, Robins LN, Taibleson M, Woodruff RA Jr, Reich T, Wish ED. Reliability of psychiatric diagnosis. I. a methodological review. Arch Gen Psychiatry. 1977;34(2):129–33.

Chung ST, Chacko SK, Sunehag AL, Haymond MW. Measurements of gluconeogenesis and Glycogenolysis: a methodological review. Diabetes. 2015;64(12):3996–4010.


Sterne JA, Juni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in 'meta-epidemiological' research. Stat Med. 2002;21(11):1513–24.

Moen EL, Fricano-Kugler CJ, Luikart BW, O’Malley AJ. Analyzing clustered data: why and how to account for multiple observations nested within a study participant? PLoS One. 2016;11(1):e0146721.

Zyzanski SJ, Flocke SA, Dickinson LM. On the nature and analysis of clustered data. Ann Fam Med. 2004;2(3):199–200.

Mathes T, Klassen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC Med Res Methodol. 2017;17(1):152.

Bui DDA, Del Fiol G, Hurdle JF, Jonnalagadda S. Extractive text summarization system to aid data extraction from full text in systematic review development. J Biomed Inform. 2016;64:265–72.

Bui DD, Del Fiol G, Jonnalagadda S. PDF text classification to leverage information extraction from publication reports. J Biomed Inform. 2016;61:141–8.

Maticic K, Krnic Martinic M, Puljak L. Assessment of reporting quality of abstracts of systematic reviews with meta-analysis using PRISMA-A and discordance in assessments between raters without prior experience. BMC Med Res Methodol. 2019;19(1):32.

Speich B. Blinding in surgical randomized clinical trials in 2015. Ann Surg. 2017;266(1):21–2.

Abraha I, Cozzolino F, Orso M, Marchesi M, Germani A, Lombardo G, Eusebi P, De Florio R, Luchetta ML, Iorio A, et al. A systematic review found that deviations from intention-to-treat are common in randomized trials and systematic reviews. J Clin Epidemiol. 2017;84:37–46.

Zhong Y, Zhou W, Jiang H, Fan T, Diao X, Yang H, Min J, Wang G, Fu J, Mao B. Quality of reporting of two-group parallel randomized controlled clinical trials of multi-herb formulae: A survey of reports indexed in the Science Citation Index Expanded. Eur J Integrative Med. 2011;3(4):e309–16.

Farrokhyar F, Chu R, Whitlock R, Thabane L. A systematic review of the quality of publications reporting coronary artery bypass grafting trials. Can J Surg. 2007;50(4):266–77.

Oltean H, Gagnier JJ. Use of clustering analysis in randomized controlled trials in orthopaedic surgery. BMC Med Res Methodol. 2015;15:17.

Fleming PS, Koletsi D, Pandis N. Blinded by PRISMA: are systematic reviewers focusing on PRISMA and ignoring other guidelines? PLoS One. 2014;9(5):e96407.

Balasubramanian SP, Wiener M, Alshameeri Z, Tiruvoipati R, Elbourne D, Reed MW. Standards of reporting of randomized controlled trials in general surgery: can we do better? Ann Surg. 2006;244(5):663–7.

de Vries TW, van Roon EN. Low quality of reporting adverse drug reactions in paediatric randomised controlled trials. Arch Dis Child. 2010;95(12):1023–6.

Borg Debono V, Zhang S, Ye C, Paul J, Arya A, Hurlburt L, Murthy Y, Thabane L. The quality of reporting of RCTs used within a postoperative pain management meta-analysis, using the CONSORT statement. BMC Anesthesiol. 2012;12:13.

Kaiser KA, Cofield SS, Fontaine KR, Glasser SP, Thabane L, Chu R, Ambrale S, Dwary AD, Kumar A, Nayyar G, et al. Is funding source related to study reporting quality in obesity or nutrition randomized control trials in top-tier medical journals? Int J Obes. 2012;36(7):977–81.

Thomas O, Thabane L, Douketis J, Chu R, Westfall AO, Allison DB. Industry funding and the reporting quality of large long-term weight loss trials. Int J Obes. 2008;32(10):1531–6.

Khan NR, Saad H, Oravec CS, Rossi N, Nguyen V, Venable GT, Lillard JC, Patel P, Taylor DR, Vaughn BN, et al. A review of industry funding in randomized controlled trials published in the neurosurgical literature-the elephant in the room. Neurosurgery. 2018;83(5):890–7.

Hansen C, Lundh A, Rasmussen K, Hrobjartsson A. Financial conflicts of interest in systematic reviews: associations with results, conclusions, and methodological quality. Cochrane Database Syst Rev. 2019;8:Mr000047.

Kiehna EN, Starke RM, Pouratian N, Dumont AS. Standards for reporting randomized controlled trials in neurosurgery. J Neurosurg. 2011;114(2):280–5.

Liu LQ, Morris PJ, Pengel LH. Compliance to the CONSORT statement of randomized controlled trials in solid organ transplantation: a 3-year overview. Transpl Int. 2013;26(3):300–6.

Bala MM, Akl EA, Sun X, Bassler D, Mertz D, Mejza F, Vandvik PO, Malaga G, Johnston BC, Dahm P, et al. Randomized trials published in higher vs. lower impact journals differ in design, conduct, and analysis. J Clin Epidemiol. 2013;66(3):286–95.

Lee SY, Teoh PJ, Camm CF, Agha RA. Compliance of randomized controlled trials in trauma surgery with the CONSORT statement. J Trauma Acute Care Surg. 2013;75(4):562–72.

Ziogas DC, Zintzaras E. Analysis of the quality of reporting of randomized controlled trials in acute and chronic myeloid leukemia, and myelodysplastic syndromes as governed by the CONSORT statement. Ann Epidemiol. 2009;19(7):494–500.

Alvarez F, Meyer N, Gourraud PA, Paul C. CONSORT adoption and quality of reporting of randomized controlled trials: a systematic analysis in two dermatology journals. Br J Dermatol. 2009;161(5):1159–65.

Mbuagbaw L, Thabane M, Vanniyasingam T, Borg Debono V, Kosa S, Zhang S, Ye C, Parpia S, Dennis BB, Thabane L. Improvement in the quality of abstracts in major clinical journals since CONSORT extension for abstracts: a systematic review. Contemporary Clin trials. 2014;38(2):245–50.

Thabane L, Chu R, Cuddy K, Douketis J. What is the quality of reporting in weight loss intervention studies? A systematic review of randomized controlled trials. Int J Obes. 2007;31(10):1554–9.

Murad MH, Wang Z. Guidelines for reporting meta-epidemiological methodology research. Evidence Based Med. 2017;22(4):139.

METRIC - MEthodological sTudy ReportIng Checklist: guidelines for reporting methodological studies in health research [ http://www.equator-network.org/library/reporting-guidelines-under-development/reporting-guidelines-under-development-for-other-study-designs/#METRIC ]. Accessed 31 Aug 2020.

Jager KJ, Zoccali C, MacLeod A, Dekker FW. Confounding: what it is and how to deal with it. Kidney Int. 2008;73(3):256–60.

Parker SG, Halligan S, Erotocritou M, Wood CPJ, Boulton RW, Plumb AAO, Windsor ACJ, Mallett S. A systematic methodological review of non-randomised interventional studies of elective ventral hernia repair: clear definitions and a standardised minimum dataset are needed. Hernia. 2019.

Bouwmeester W, Zuithoff NPA, Mallett S, Geerlings MI, Vergouwe Y, Steyerberg EW, Altman DG, Moons KGM. Reporting and methods in clinical prediction research: a systematic review. PLoS Med. 2012;9(5):1–12.

Schiller P, Burchardi N, Niestroj M, Kieser M. Quality of reporting of clinical non-inferiority and equivalence randomised trials--update and extension. Trials. 2012;13:214.

Riado Minguez D, Kowalski M, Vallve Odena M, Longin Pontzen D, Jelicic Kadic A, Jeric M, Dosenovic S, Jakus D, Vrdoljak M, Poklepovic Pericic T, et al. Methodological and reporting quality of systematic reviews published in the highest ranking journals in the field of pain. Anesth Analg. 2017;125(4):1348–54.

Thabut G, Estellat C, Boutron I, Samama CM, Ravaud P. Methodological issues in trials assessing primary prophylaxis of venous thrombo-embolism. Eur Heart J. 2005;27(2):227–36.

Puljak L, Riva N, Parmelli E, González-Lorenzo M, Moja L, Pieper D. Data extraction methods: an analysis of internal reporting discrepancies in single manuscripts and practical advice. J Clin Epidemiol. 2020;117:158–64.

Ritchie A, Seubert L, Clifford R, Perry D, Bond C. Do randomised controlled trials relevant to pharmacy meet best practice standards for quality conduct and reporting? A systematic review. Int J Pharm Pract. 2019.

Babic A, Vuka I, Saric F, Proloscic I, Slapnicar E, Cavar J, Pericic TP, Pieper D, Puljak L. Overall bias methods and their use in sensitivity analysis of Cochrane reviews were not consistent. J Clin Epidemiol. 2019.

Tan A, Porcher R, Crequit P, Ravaud P, Dechartres A. Differences in treatment effect size between overall survival and progression-free survival in immunotherapy trials: a Meta-epidemiologic study of trials with results posted at ClinicalTrials.gov. J Clin Oncol. 2017;35(15):1686–94.

Croitoru D, Huang Y, Kurdina A, Chan AW, Drucker AM. Quality of reporting in systematic reviews published in dermatology journals. Br J Dermatol. 2020;182(6):1469–76.

Khan MS, Ochani RK, Shaikh A, Vaduganathan M, Khan SU, Fatima K, Yamani N, Mandrola J, Doukky R, Krasuski RA. Assessing the quality of reporting of harms in randomized controlled trials published in high impact cardiovascular journals. Eur Heart J Qual Care Clin Outcomes. 2019.

Rosmarakis ES, Soteriades ES, Vergidis PI, Kasiakou SK, Falagas ME. From conference abstract to full paper: differences between data presented in conferences and journals. FASEB J. 2005;19(7):673–80.

Mueller M, D’Addario M, Egger M, Cevallos M, Dekkers O, Mugglin C, Scott P. Methods to systematically review and meta-analyse observational studies: a systematic scoping review of recommendations. BMC Med Res Methodol. 2018;18(1):44.

Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, Wang M, Bhatt M, Zielinski L, Sanger N, et al. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC Med Res Methodol. 2017;17(1):181.

Krnic Martinic M, Pieper D, Glatt A, Puljak L. Definition of a systematic review used in overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med Res Methodol. 2019;19(1):203.

Analytical study [ https://medical-dictionary.thefreedictionary.com/analytical+study ]. Accessed 31 Aug 2020.

Tricco AC, Tetzlaff J, Pham B, Brehaut J, Moher D. Non-Cochrane vs. Cochrane reviews were twice as likely to have positive conclusion statements: cross-sectional study. J Clin Epidemiol. 2009;62(4):380–6 e381.

Schalken N, Rietbergen C. The reporting quality of systematic reviews and Meta-analyses in industrial and organizational psychology: a systematic review. Front Psychol. 2017;8:1395.

Ranker LR, Petersen JM, Fox MP. Awareness of and potential for dependent error in the observational epidemiologic literature: A review. Ann Epidemiol. 2019;36:15–9 e12.

Paquette M, Alotaibi AM, Nieuwlaat R, Santesso N, Mbuagbaw L. A meta-epidemiological study of subgroup analyses in cochrane systematic reviews of atrial fibrillation. Syst Rev. 2019;8(1):241.



Funding

This work did not receive any dedicated funding.

Author information

Authors and affiliations

Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, ON, Canada

Lawrence Mbuagbaw, Daeria O. Lawson & Lehana Thabane

Biostatistics Unit/FSORC, 50 Charlton Avenue East, St Joseph’s Healthcare—Hamilton, 3rd Floor Martha Wing, Room H321, Hamilton, Ontario, L8N 4A6, Canada

Lawrence Mbuagbaw & Lehana Thabane

Centre for the Development of Best Practices in Health, Yaoundé, Cameroon

Lawrence Mbuagbaw

Center for Evidence-Based Medicine and Health Care, Catholic University of Croatia, Ilica 242, 10000, Zagreb, Croatia

Livia Puljak

Department of Epidemiology and Biostatistics, School of Public Health – Bloomington, Indiana University, Bloomington, IN, 47405, USA

David B. Allison

Departments of Paediatrics and Anaesthesia, McMaster University, Hamilton, ON, Canada

Lehana Thabane

Centre for Evaluation of Medicine, St. Joseph’s Healthcare-Hamilton, Hamilton, ON, Canada

Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada



Contributions

LM conceived the idea and drafted the outline and paper. DOL and LT commented on the idea and draft outline. LM, LP and DOL performed literature searches and data extraction. All authors (LM, DOL, LT, LP, DBA) reviewed several draft versions of the manuscript and approved the final manuscript.

Corresponding author

Correspondence to Lawrence Mbuagbaw.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

DOL, DBA, LM, LP and LT are involved in the development of a reporting guideline for methodological studies.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Cite this article

Mbuagbaw, L., Lawson, D.O., Puljak, L. et al. A tutorial on methodological studies: the what, when, how and why. BMC Med Res Methodol 20 , 226 (2020). https://doi.org/10.1186/s12874-020-01107-7


Received : 27 May 2020

Accepted : 27 August 2020

Published : 07 September 2020

DOI : https://doi.org/10.1186/s12874-020-01107-7


Keywords

  • Methodological study
  • Meta-epidemiology
  • Research methods
  • Research-on-research

BMC Medical Research Methodology

ISSN: 1471-2288


Your Modern Business Guide To Data Analysis Methods And Techniques


Table of Contents

1) What Is Data Analysis?

2) Why Is Data Analysis Important?

3) What Is The Data Analysis Process?

4) Types Of Data Analysis Methods

5) Top Data Analysis Techniques To Apply

6) Quality Criteria For Data Analysis

7) Data Analysis Limitations & Barriers

8) Data Analysis Skills

9) Data Analysis In The Big Data Environment

In our data-rich age, understanding how to analyze and extract true meaning from our business’s digital insights is one of the primary drivers of success.

Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery, improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a vast amount of data.

With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield – but online data analysis is the solution.

In science, data analysis uses a more complex approach with advanced techniques to explore and experiment with data. On the other hand, in a business context, data is used to make data-driven decisions that will enable the company to improve its overall performance. In this post, we will cover the analysis of data from an organizational point of view while still going through the scientific and statistical foundations that are fundamental to understanding the basics of data analysis. 

To put all of that into perspective, we will answer a host of important analytical questions and explore analytical methods and techniques, demonstrating how to perform analysis in the real world with a 17-step blueprint for success.

What Is Data Analysis?

Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.

All these various methods are largely based on two core areas: quantitative and qualitative research.


Gaining a better understanding of the different techniques and methods in quantitative research, as well as of qualitative insights, will give your analysis efforts a more clearly defined direction, so it’s worth taking the time to let this knowledge sink in. Additionally, you will be able to create a comprehensive analytical report that strengthens your analysis.

Apart from the qualitative and quantitative categories, there are also other types of data that you should be aware of before diving into complex data analysis processes. These categories include: 

  • Big data: Refers to massive data sets that need to be analyzed using advanced software to reveal patterns and trends. It is considered to be one of the best analytical assets as it provides larger volumes of data at a faster rate. 
  • Metadata: Putting it simply, metadata is data that provides insights about other data. It summarizes key information about specific data that makes it easier to find and reuse for later purposes. 
  • Real-time data: As its name suggests, real-time data is presented as soon as it is acquired. From an organizational perspective, this is the most valuable data as it can help you make important decisions based on the latest developments. Our guide on real-time analytics will tell you more about the topic. 
  • Machine data: This is more complex data that is generated solely by machines such as phones, computers, or even websites and embedded systems, without prior human interaction.

Why Is Data Analysis Important?

Before we go into detail about the categories of analysis along with its methods and techniques, you must understand the potential that analyzing data can bring to your organization.

  • Informed decision-making: From a management perspective, you can benefit from analyzing your data as it helps you make decisions based on facts and not simple intuition. For instance, you can understand where to invest your capital, detect growth opportunities, predict your income, or tackle uncommon situations before they become problems. Through this, you can extract relevant insights from all areas in your organization, and with the help of dashboard software, present the data in a professional and interactive way to different stakeholders.
  • Reduce costs: Another great benefit is to reduce costs. With the help of advanced technologies such as predictive analytics, businesses can spot improvement opportunities, trends, and patterns in their data and plan their strategies accordingly. In time, this will help you save money and resources on implementing the wrong strategies. And not just that: by predicting different scenarios such as sales and demand, you can also anticipate production and supply. 
  • Target customers better: Customers are arguably the most crucial element in any business. By using analytics to get a 360° vision of all aspects related to your customers, you can understand which channels they use to communicate with you, their demographics, interests, habits, purchasing behaviors, and more. In the long run, it will drive the success of your marketing strategies, allow you to identify new potential customers, and avoid wasting resources on targeting the wrong people or sending the wrong message. You can also track customer satisfaction by analyzing your clients’ reviews or your customer service department’s performance.

What Is The Data Analysis Process?

Data analysis process graphic

When we talk about analyzing data, there is an order to follow to extract the needed conclusions. The analysis process consists of 5 key stages. We will cover each of them in more detail later in the post, but to provide the context needed to understand what is coming next, here is a rundown of the 5 essential steps of data analysis. 

  • Identify: Before you get your hands dirty with data, you first need to identify why you need it in the first place. The identification is the stage in which you establish the questions you will need to answer. For example, what is the customer's perception of our brand? Or what type of packaging is more engaging to our potential customers? Once the questions are outlined you are ready for the next step. 
  • Collect: As its name suggests, this is the stage where you start collecting the needed data. Here, you define which sources of data you will use and how you will use them. The collection of data can come in different forms such as internal or external sources, surveys, interviews, questionnaires, and focus groups, among others.  An important note here is that the way you collect the data will be different in a quantitative and qualitative scenario. 
  • Clean: Once you have the necessary data, it is time to clean it and leave it ready for analysis. Not all the data you collect will be useful; when collecting large amounts of data in different formats, it is very likely that you will find yourself with duplicate or badly formatted records. To avoid this, before you start working with your data, make sure to remove any white spaces, duplicate records, or formatting errors. This way you avoid hurting your analysis with bad-quality data. 
  • Analyze: With the help of various techniques such as statistical analysis, regressions, neural networks, text analysis, and more, you can start analyzing and manipulating your data to extract relevant conclusions. At this stage, you find the trends, correlations, variations, and patterns that can help you answer the questions you first posed in the identify stage. Various technologies on the market assist researchers and average users with the management of their data. Some of them include business intelligence and visualization software, predictive analytics, and data mining, among others. 
  • Interpret: Last but not least comes one of the most important steps: interpreting your results. This stage is where the researcher comes up with courses of action based on the findings. For example, here you would determine whether your clients prefer packaging that is red or green, plastic or paper, etc. Additionally, at this stage, you can identify limitations and work on them. 
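The collect, clean, analyze, and interpret stages can be sketched in a few lines of Python; the survey responses below are invented for illustration:

```python
from collections import Counter

# Collect: raw survey responses, with the messiness real data tends to have
# (trailing spaces, inconsistent case, a missing answer).
raw = ["red ", "Green", "red", "green", "GREEN", "red", "red ", None]

# Clean: drop missing values, trim whitespace, normalize case.
clean = [r.strip().lower() for r in raw if r is not None]

# Analyze: a simple frequency count answers the question from the identify
# stage: which packaging color do customers prefer?
counts = Counter(clean)

# Interpret: the most common answer suggests a course of action.
preferred, votes = counts.most_common(1)[0]
print(preferred, votes)  # red 4
```

Real projects swap the list for a database or survey export and the frequency count for richer statistics, but the sequence of stages stays the same.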

Now that you have a basic understanding of the key data analysis steps, let’s look at the top 17 essential methods.

17 Essential Types Of Data Analysis Methods

Before diving into the 17 essential types of methods, it is important that we briefly go over the main analysis categories. From descriptive up to prescriptive analysis, the complexity and effort of data evaluation increase, but so does the added value for the company.

a) Descriptive analysis - What happened.

The descriptive analysis method is the starting point for any analytic reflection, and it aims to answer the question of what happened? It does this by ordering, manipulating, and interpreting raw data from various sources to turn it into valuable insights for your organization.

Performing descriptive analysis is essential, as it enables us to present our insights in a meaningful way. It is worth mentioning that, on its own, this analysis will not allow you to predict future outcomes or tell you why something happened, but it will leave your data organized and ready for further investigation.

b) Exploratory analysis - How to explore data relationships.

As its name suggests, the main aim of exploratory analysis is to explore. Prior to it, there is still no notion of the relationship between the data and the variables. Once the data is investigated, exploratory analysis helps you find connections and generate hypotheses and solutions for specific problems. A typical area of application for it is data mining.

c) Diagnostic analysis - Why it happened.

Diagnostic data analytics empowers analysts and executives by helping them gain a firm contextual understanding of why something happened. If you know why something happened as well as how it happened, you will be able to pinpoint the exact ways of tackling the issue or challenge.

Designed to provide direct and actionable answers to specific questions, this is one of the most important methods in research, alongside other key organizational functions such as retail analytics.

d) Predictive analysis - What will happen.

The predictive method allows you to look into the future to answer the question: what will happen? In order to do this, it uses the results of the previously mentioned descriptive, exploratory, and diagnostic analyses, in addition to machine learning (ML) and artificial intelligence (AI). Through this, you can uncover future trends, potential problems or inefficiencies, connections, and causalities in your data.

With predictive analysis, you can unfold and develop initiatives that will not only enhance your various operational processes but also help you gain an all-important edge over the competition. If you understand why a trend, pattern, or event happened through data, you will be able to develop an informed projection of how things may unfold in particular areas of the business.

e) Prescriptive analysis - How to make it happen.

Another of the most effective types of analysis methods in research, prescriptive data techniques cross over from predictive analysis in that they revolve around using patterns or trends to develop responsive, practical business strategies.

By drilling down into prescriptive analysis, you will play an active role in the data consumption process, taking well-arranged sets of visual data and using them as a powerful fix for emerging issues in a number of key areas, including marketing, sales, customer experience, HR, fulfillment, finance, logistics analytics, and others.

Top 17 data analysis methods

As mentioned at the beginning of the post, data analysis methods can be divided into two big categories: quantitative and qualitative. Each of these categories holds a powerful analytical value that changes depending on the scenario and type of data you are working with. Below, we will discuss 17 methods that are divided into qualitative and quantitative approaches. 

Without further ado, here are the 17 essential types of data analysis methods with some use cases in the business world: 

A. Quantitative Methods 

To put it simply, quantitative analysis refers to all methods that use numerical data, or data that can be turned into numbers (e.g. categorical variables like gender or age), to extract valuable insights. It is used to examine relationships and differences and to test hypotheses. Below we discuss some of the key quantitative methods. 

1. Cluster analysis

The action of grouping a set of data elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups – hence the term ‘cluster.’ Since there is no target variable when clustering, the method is often used to find hidden patterns in the data. The approach is also used to provide additional context to a trend or dataset.

Let's look at it from an organizational perspective. In a perfect world, marketers would be able to analyze each customer separately and give them the best personalized service, but let's face it, with a large customer base, that is practically impossible. That's where clustering comes in. By grouping customers into clusters based on demographics, purchasing behaviors, monetary value, or any other factor that might be relevant for your company, you will be able to immediately optimize your efforts and give your customers the best experience based on their needs.
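As a rough sketch of the idea, here is a tiny one-dimensional k-means run that groups customers by annual spend. The spend figures and starting centroids are invented for illustration; a real clustering project would use a library and multiple features:

```python
def kmeans_1d(values, centroids, iterations=10):
    """Repeatedly assign each value to its nearest centroid,
    then move each centroid to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for v in values:
            nearest = min(centroids, key=lambda c: abs(v - c))
            clusters[nearest].append(v)
        centroids = [sum(vals) / len(vals) for vals in clusters.values() if vals]
    return sorted(centroids)

spend = [120, 150, 130, 900, 950, 870, 3000, 3100]  # annual spend per customer
centers = kmeans_1d(spend, centroids=[100, 1000, 3000])
print(centers)  # three groups emerge: low, mid, and high spenders
```

Each resulting centre can then be treated as a customer segment and targeted with its own campaign.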

2. Cohort analysis

This type of data analysis approach uses historical data to examine and compare a determined segment of users' behavior, which can then be grouped with others with similar characteristics. By using this methodology, it's possible to gain a wealth of insight into consumer needs or a firm understanding of a broader target group.

Cohort analysis can be really useful for performing analysis in marketing as it will allow you to understand the impact of your campaigns on specific groups of customers. To exemplify, imagine you send an email campaign encouraging customers to sign up for your site. For this, you create two versions of the campaign with different designs, CTAs, and ad content. Later on, you can use cohort analysis to track the performance of the campaign for a longer period of time and understand which type of content is driving your customers to sign up, repurchase, or engage in other ways.  

A useful tool for getting started with the cohort analysis method is Google Analytics. You can learn more about the benefits and limitations of using cohorts in GA in this useful guide. In the image below, you can see an example of how a cohort is visualized in this tool. The segments (device traffic) are divided into date cohorts (usage of devices) and then analyzed week by week to extract insights into performance.

Cohort analysis chart example from google analytics
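Stripped of tooling, the core of a cohort computation is small. The users and months below are invented for illustration:

```python
# Each user belongs to the cohort of the month they signed up,
# and records the months in which they were active.
users = [
    {"cohort": "2021-01", "active": {"2021-01", "2021-02", "2021-03"}},
    {"cohort": "2021-01", "active": {"2021-01"}},
    {"cohort": "2021-02", "active": {"2021-02", "2021-03"}},
    {"cohort": "2021-02", "active": {"2021-02"}},
]

def retention(users, cohort, month):
    """Share of a cohort's users still active in a given month."""
    members = [u for u in users if u["cohort"] == cohort]
    active = [u for u in members if month in u["active"]]
    return len(active) / len(members)

print(retention(users, "2021-01", "2021-02"))  # 0.5
```

Running `retention` for each cohort and each subsequent month produces exactly the triangular retention table that tools like GA visualize.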

3. Regression analysis

Regression uses historical data to understand how a dependent variable's value is affected when one (linear regression) or more independent variables (multiple regression) change or stay the same. By understanding each variable's relationship and how it developed in the past, you can anticipate possible outcomes and make better decisions in the future.

Let's break it down with an example. Imagine you did a regression analysis of your sales in 2019 and discovered that variables like product quality, store design, customer service, marketing campaigns, and sales channels affected the overall result. Now you want to use regression to analyze which of these variables changed, or whether any new ones appeared, during 2020. For example, you couldn’t sell as much in your physical store due to COVID lockdowns. Therefore, your sales could’ve either dropped in general or increased in your online channels. Through this, you can understand which independent variables affected the overall performance of your dependent variable, annual sales.

If you want to go deeper into this type of analysis, check out this article and learn more about how you can benefit from regression.
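The fitting step at the heart of simple linear regression, ordinary least squares, takes only a few lines. The spend and sales figures below are invented purely to illustrate the fit:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for one independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x  # (slope, intercept)

marketing_spend = [1, 2, 3, 4, 5]     # e.g. in $1,000s
monthly_sales = [12, 14, 16, 18, 20]  # e.g. in units sold
slope, intercept = linear_fit(marketing_spend, monthly_sales)
print(slope, intercept)  # 2.0 10.0
```

Here the fitted slope says each extra $1k of spend is associated with roughly two additional sales; with several independent variables, the same idea generalizes to multiple regression.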

4. Neural networks

The neural network forms the basis for the intelligent algorithms of machine learning. It is a form of analytics that attempts, with minimal intervention, to understand how the human brain would generate insights and predict values. Neural networks learn from each and every data transaction, meaning that they evolve and advance over time.

A typical area of application for neural networks is predictive analytics. There are BI reporting tools that have this feature implemented within them, such as the Predictive Analytics Tool from datapine. This tool enables users to quickly and easily generate all kinds of predictions. All you have to do is select the data to be processed based on your KPIs, and the software automatically calculates forecasts based on historical and current data. Thanks to its user-friendly interface, anyone in your organization can manage it; there’s no need to be an advanced scientist. 
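To make "learning from each data transaction" concrete, here is the smallest possible neural network, a single perceptron, learning the logical AND function. This is a sketch of the learning loop only; real networks stack many such units and use richer update rules:

```python
def step(x):
    return 1 if x >= 0 else 0

def train(samples, epochs=20, lr=0.1):
    """Nudge the weights after every sample, in the direction
    that reduces the prediction error."""
    w0, w1, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), target in samples:
            error = target - step(w0 * x0 + w1 * x1 + bias)
            w0 += lr * error * x0
            w1 += lr * error * x1
            bias += lr * error
    return w0, w1, bias

samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, bias = train(samples)
print([step(w0 * x0 + w1 * x1 + bias) for (x0, x1), _ in samples])  # [0, 0, 0, 1]
```

After a handful of passes over the data the weights stop changing, which is exactly the "evolve and advance over time" behavior described above, at toy scale.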

Here is an example of how you can use the predictive analysis tool from datapine:

Example on how to use predictive analytics tool from datapine


5. Factor analysis

Factor analysis, also called “dimension reduction,” is a type of data analysis used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. The aim here is to uncover independent latent variables, making it an ideal method for streamlining specific segments.

A good way to understand this data analysis method is a customer evaluation of a product. The initial assessment is based on different variables like color, shape, wearability, current trends, materials, comfort, the place where they bought the product, and frequency of usage. The list can be endless, depending on what you want to track. Here, factor analysis comes into the picture by summarizing all of these variables into homogeneous groups, for example, by grouping the variables color, materials, quality, and trends into a broader latent variable of design.

If you want to start analyzing data using factor analysis we recommend you take a look at this practical guide from UCLA.

6. Data mining

Data mining is an umbrella term for engineering metrics and insights for additional value, direction, and context. By using exploratory statistical evaluation, data mining aims to identify dependencies, relations, patterns, and trends to generate advanced knowledge. When considering how to analyze data, adopting a data mining mindset is essential to success - as such, it’s an area worth exploring in greater detail.

An excellent use case of data mining is datapine intelligent data alerts . With the help of artificial intelligence and machine learning, they provide automated signals based on particular commands or occurrences within a dataset. For example, if you’re monitoring supply chain KPIs , you could set an intelligent alarm to trigger when invalid or low-quality data appears. By doing so, you will be able to drill down deep into the issue and fix it swiftly and effectively.

In the following picture, you can see how the intelligent alarms from datapine work. By setting up ranges on daily orders, sessions, and revenues, the alarms will notify you if the goal was not completed or if it exceeded expectations.

Example on how to use intelligent alerts from datapine

7. Time series analysis

As its name suggests, time series analysis is used to analyze a set of data points collected over a specified period of time. The point is not simply to collect data over time: by monitoring data points at regular intervals rather than intermittently, researchers can understand whether variables changed over the duration of the study, how the different variables depend on one another, and how the end result was reached. 

In a business context, this method is used to understand the causes of different trends and patterns to extract valuable insights. Another way of using this method is with the help of time series forecasting. Powered by predictive technologies, businesses can analyze various data sets over a period of time and forecast different future events. 

A great use case to put time series analysis into perspective is seasonality effects on sales. By using time series forecasting to analyze sales data of a specific product over time, you can understand if sales rise over a specific period of time (e.g. swimwear during summertime, or candy during Halloween). These insights allow you to predict demand and prepare production accordingly.  
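One of the simplest forecasting approaches, the seasonal-naive method, just repeats the most recent season. The monthly sales figures below are invented to show a recurring mid-year peak:

```python
# Two years of made-up monthly sales with a seasonal peak in months 4-6.
sales = [10, 11, 12, 30, 32, 31, 12, 11, 10, 9, 10, 11,   # year 1
         12, 13, 14, 33, 35, 34, 14, 13, 12, 11, 12, 13]  # year 2

def seasonal_naive(series, season_length=12):
    """Forecast the next season by repeating the most recent one."""
    return series[-season_length:]

forecast = seasonal_naive(sales)
print(forecast[3:6])  # predicted peak months for year 3: [33, 35, 34]
```

Real time series forecasting layers trend and noise handling on top of this, but even the naive version already captures the seasonality that matters for production planning.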

8. Decision Trees 

The decision tree analysis aims to act as a support tool to make smart and strategic decisions. By visually displaying potential outcomes, consequences, and costs in a tree-like model, researchers and company users can easily evaluate all factors involved and choose the best course of action. Decision trees are helpful to analyze quantitative data and they allow for an improved decision-making process by helping you spot improvement opportunities, reduce costs, and enhance operational efficiency and production.

But how does a decision tree actually work? This method works like a flowchart that starts with the main decision you need to make and branches out based on the different outcomes and consequences of each choice. Each outcome outlines its own consequences, costs, and gains and, at the end of the analysis, you can compare them and make the smartest decision. 

Businesses can use them to understand which project is more cost-effective and will bring more earnings in the long run. For example, imagine you need to decide if you want to update your software app or build a new app entirely.  Here you would compare the total costs, the time needed to be invested, potential revenue, and any other factor that might affect your decision.  In the end, you would be able to see which of these two options is more realistic and attainable for your company or research.
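That comparison reduces to an expected-value calculation over the branches. Every probability and dollar figure below is a made-up assumption for illustration:

```python
# Each branch of the tree: an upfront cost plus (probability, revenue) outcomes.
options = {
    "update_app": {"cost": 50_000,
                   "outcomes": [(0.7, 120_000), (0.3, 60_000)]},
    "new_app":    {"cost": 200_000,
                   "outcomes": [(0.5, 400_000), (0.5, 100_000)]},
}

def expected_payoff(option):
    """Probability-weighted revenue minus the upfront cost."""
    return sum(p * revenue for p, revenue in option["outcomes"]) - option["cost"]

best = max(options, key=lambda name: expected_payoff(options[name]))
print(best, round(expected_payoff(options[best])))  # update_app 52000
```

The numbers here make updating the existing app the marginally better bet, but the value of the tree is that every assumption is explicit and easy to revisit.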

9. Conjoint analysis 

Next, we have conjoint analysis. This approach is usually used in surveys to understand how individuals value different attributes of a product or service, and it is one of the most effective methods for extracting consumer preferences. When it comes to purchasing, some clients might be more price-focused, others more features-focused, and others might have a sustainability focus. Whatever your customers' preferences are, you can find them with conjoint analysis. Through this, companies can define pricing strategies, packaging options, subscription packages, and more. 

A great example of conjoint analysis is in marketing and sales. For instance, a cupcake brand might use conjoint analysis and find that its clients prefer gluten-free options and cupcakes with healthier toppings over super sugary ones. Thus, the cupcake brand can turn these insights into advertisements and promotions to increase sales of this particular type of product. And not just that, conjoint analysis can also help businesses segment their customers based on their interests. This allows them to send different messaging that will bring value to each of the segments. 

10. Correspondence Analysis

Also known as reciprocal averaging, correspondence analysis is a method used to analyze the relationship between categorical variables presented within a contingency table. A contingency table is a table that displays two (simple correspondence analysis) or more (multiple correspondence analysis) categorical variables across rows and columns that show the distribution of the data, which is usually answers to a survey or questionnaire on a specific topic. 

This method starts by calculating an “expected value” for each cell, which is done by multiplying the corresponding row total by the column total and dividing by the grand total of the table. The expected value is then subtracted from the observed value, resulting in a “residual,” which is what allows you to extract conclusions about relationships and distribution. The results of this analysis are then displayed on a map that represents the relationships between the different values: the closer two values are on the map, the stronger the relationship. Let’s put it into perspective with an example. 

Imagine you are carrying out a market research analysis about outdoor clothing brands and how they are perceived by the public. For this analysis, you ask a group of people to match each brand with a certain attribute which can be durability, innovation, quality materials, etc. When calculating the residual numbers, you can see that brand A has a positive residual for innovation but a negative one for durability. This means that brand A is not positioned as a durable brand in the market, something that competitors could take advantage of. 
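The expected-value and residual step described above can be reproduced in a few lines, here on a made-up brand-by-attribute contingency table:

```python
# Rows: brands A and B; columns: counts of "innovation" and "durability"
# attributions from a hypothetical survey.
observed = [[30, 10],
            [10, 30]]

grand = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# expected = row total * column total / grand total; residual = observed - expected
residuals = [[observed[i][j] - row_totals[i] * col_totals[j] / grand
              for j in range(len(col_totals))] for i in range(len(row_totals))]
print(residuals)  # [[10.0, -10.0], [-10.0, 10.0]]
```

Brand A's positive residual for innovation and negative one for durability mirror the interpretation in the example above; a full correspondence analysis would go on to plot these residual patterns on a map.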

11. Multidimensional Scaling (MDS)

MDS is a method used to observe the similarities or disparities between objects, which can be colors, brands, people, geographical coordinates, and more. The objects are plotted on an “MDS map” that positions similar objects together and disparate ones far apart. The (dis)similarities between objects are represented using one or more dimensions that can be observed on a numerical scale. For example, if you want to know how people feel about the COVID-19 vaccine, you can use 1 for “don’t believe in the vaccine at all” and 10 for “firmly believe in the vaccine,” with 2 through 9 capturing responses in between. When analyzing an MDS map, the only thing that matters is the distance between the objects; the orientation of the dimensions is arbitrary and has no meaning at all. 

Multidimensional scaling is a valuable technique for market research, especially when it comes to evaluating product or brand positioning. For instance, if a cupcake brand wants to know how they are positioned compared to competitors, it can define 2-3 dimensions such as taste, ingredients, shopping experience, or more, and do a multidimensional scaling analysis to find improvement opportunities as well as areas in which competitors are currently leading. 

Another business example is in procurement when deciding on different suppliers. Decision makers can generate an MDS map to see how the different prices, delivery times, technical services, and more of the different suppliers differ and pick the one that suits their needs the best. 

A final example comes from a research paper, “An Improved Study of Multilevel Semantic Network Visualization for Analyzing Sentiment Word of Movie Review Data.” The researchers picked a two-dimensional MDS map to display the distances and relationships between different sentiments in movie reviews. They used 36 sentiment words and distributed them based on their emotional distance, as we can see in the image below, where the words “outraged” and “sweet” are on opposite sides of the map, marking the distance between the two emotions very clearly.

Example of multidimensional scaling analysis

Aside from being a valuable technique to analyze dissimilarities, MDS also serves as a dimension-reduction technique for large dimensional data. 

B. Qualitative Methods

Qualitative data analysis methods are defined as the observation of non-numerical data that is gathered and produced using methods of observation such as interviews, focus groups, questionnaires, and more. As opposed to quantitative methods, qualitative data is more subjective and highly valuable in analyzing customer retention and product development.

12. Text analysis

Text analysis, also known in the industry as text mining, works by taking large sets of textual data and arranging them in a way that makes it easier to manage. By working through this cleansing process in stringent detail, you will be able to extract the data that is truly relevant to your organization and use it to develop actionable insights that will propel you forward.

Modern software accelerates the application of text analytics. Thanks to the combination of machine learning and intelligent algorithms, you can perform advanced analytical processes such as sentiment analysis. This technique allows you to understand the intentions and emotions of a text, for example, whether it's positive, negative, or neutral, and then give it a score depending on certain factors and categories that are relevant to your brand. Sentiment analysis is often used to monitor brand and product reputation and to understand how successful your customer experience is. To learn more about the topic, check out this insightful article.
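At its simplest, sentiment analysis can be sketched as a lexicon lookup. The word lists here are tiny illustrative assumptions; production tools rely on machine-learned models and far richer lexicons:

```python
import re

POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment(text):
    """Score = positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this product, the quality is great"))  # positive
print(sentiment("Terrible support, poor build quality"))       # negative
```

Even this toy version shows the shape of the pipeline: normalize the text, score it against categories relevant to your brand, and aggregate the labels across reviews.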

By analyzing data from various word-based sources, including product reviews, articles, social media communications, and survey responses, you will gain invaluable insights into your audience, as well as their needs, preferences, and pain points. This will allow you to create campaigns, services, and communications that meet your prospects’ needs on a personal level, growing your audience while boosting customer retention. There are various other “sub-methods” that are an extension of text analysis. Each of them serves a more specific purpose and we will look at them in detail next. 

13. Content Analysis

This is a straightforward and very popular method that examines the presence and frequency of certain words, concepts, and subjects in different content formats such as text, image, audio, or video. For example, the number of times the name of a celebrity is mentioned on social media or online tabloids. It does this by coding text data that is later categorized and tabulated in a way that can provide valuable insights, making it the perfect mix of quantitative and qualitative analysis.

There are two types of content analysis. The first one is the conceptual analysis which focuses on explicit data, for instance, the number of times a concept or word is mentioned in a piece of content. The second one is relational analysis, which focuses on the relationship between different concepts or words and how they are connected within a specific context. 

Content analysis is often used by marketers to measure brand reputation and customer behavior, for example, by analyzing customer reviews. It can also be used to analyze customer interviews and find directions for new product development. It is also important to note that, in order to extract the maximum potential out of this analysis method, it is necessary to have a clearly defined research question. 
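A conceptual content analysis, counting the presence of coded concepts, reduces to word counting. The reviews and the coding scheme below are invented for illustration:

```python
from collections import Counter
import re

reviews = [
    "Delivery was fast but the packaging was damaged",
    "Great packaging and fast delivery",
    "The packaging could be better",
]
concepts = {"packaging", "delivery", "fast"}  # the coding scheme

counts = Counter(word for review in reviews
                 for word in re.findall(r"[a-z]+", review.lower())
                 if word in concepts)
print(counts.most_common())  # packaging is mentioned most often
```

A relational analysis would go one step further and also record which concepts co-occur within the same review.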

14. Thematic Analysis

Very similar to content analysis, thematic analysis also helps in identifying and interpreting patterns in qualitative data, with the main difference being that content analysis can also be applied to quantitative data. The thematic method analyzes large pieces of text data, such as focus group transcripts or interviews, and groups them into themes or categories that come up frequently within the text. It is a great method when trying to figure out people's views and opinions about a certain topic. For example, if you are a brand that cares about sustainability, you can survey your customers to analyze their views and opinions about sustainability and how they apply it to their lives. You can also analyze customer service call transcripts to find common issues and improve your service. 

Thematic analysis is a very subjective technique that relies on the researcher's judgment. Therefore, to avoid bias, it follows six steps: familiarization, coding, generating themes, reviewing themes, defining and naming themes, and writing up. It is also important to note that, because it is a flexible approach, the data can be interpreted in multiple ways, and it can be hard to select which data to emphasize. 

15. Narrative Analysis 

A bit more complex in nature than the two previous ones, narrative analysis is used to explore the meaning behind the stories that people tell and most importantly, how they tell them. By looking into the words that people use to describe a situation you can extract valuable conclusions about their perspective on a specific topic. Common sources for narrative data include autobiographies, family stories, opinion pieces, and testimonials, among others. 

From a business perspective, narrative analysis can be useful to analyze customer behaviors and feelings towards a specific product, service, feature, or others. It provides unique and deep insights that can be extremely valuable. However, it has some drawbacks.  

The biggest weakness of this method is that the sample sizes are usually very small due to the complexity and time-consuming nature of the collection of narrative data. Plus, the way a subject tells a story will be significantly influenced by his or her specific experiences, making it very hard to replicate in a subsequent study. 

16. Discourse Analysis

Discourse analysis is used to understand the meaning behind any type of written, verbal, or symbolic discourse based on its political, social, or cultural context. It mixes the analysis of languages and situations together. This means that the way the content is constructed and the meaning behind it is significantly influenced by the culture and society it takes place in. For example, if you are analyzing political speeches you need to consider different context elements such as the politician's background, the current political context of the country, the audience to which the speech is directed, and so on. 

From a business point of view, discourse analysis is a great market research tool. It allows marketers to understand how the norms and ideas of the specific market work and how their customers relate to those ideas. It can be very useful to build a brand mission or develop a unique tone of voice. 

17. Grounded Theory Analysis

Traditionally, researchers decide on a method and hypothesis and then collect data to test that hypothesis. Grounded theory is different: it doesn't require an initial research question or hypothesis, as its value lies in the generation of new theories. With the grounded theory method, you go into the analysis process with an open mind and explore the data to generate new theories through tests and revisions. In fact, it is not necessary to finish collecting the data before starting to analyze it; researchers often find valuable insights as they are gathering the data. 

All of these elements make grounded theory a very valuable method as theories are fully backed by data instead of initial assumptions. It is a great technique to analyze poorly researched topics or find the causes behind specific company outcomes. For example, product managers and marketers might use the grounded theory to find the causes of high levels of customer churn and look into customer surveys and reviews to develop new theories about the causes. 

How To Analyze Data? Top 17 Data Analysis Techniques To Apply

17 top data analysis techniques by datapine

Now that we’ve answered the questions “what is data analysis?” and “why is it important?”, and covered the different data analysis types, it’s time to dig deeper into how to perform your analysis by working through these 17 essential techniques.

1. Collaborate your needs

Before you begin analyzing or drilling down into any techniques, it’s crucial to sit down collaboratively with all key stakeholders within your organization, decide on your primary campaign or strategic goals, and gain a fundamental understanding of the types of insights that will best benefit your progress or provide you with the level of vision you need to evolve your organization.

2. Establish your questions

Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important techniques as it will shape the very foundations of your success.

To ensure your data works for you, you have to ask the right data analysis questions.

3. Data democratization

After giving your data analytics methodology some real direction, and knowing which questions need answering to extract optimum value from the information available to your organization, you should continue with democratization.

Data democratization is an action that aims to connect data from various sources efficiently and quickly so that anyone in your organization can access it at any given moment. You can extract data in text, images, videos, numbers, or any other format, and then perform cross-database analysis to achieve more advanced insights to share with the rest of the company interactively.  

Once you have decided on your most valuable sources, you need to take all of this into a structured format to start collecting your insights. For this purpose, datapine offers an easy all-in-one data connectors feature to integrate all your internal and external sources and manage them at your will. Additionally, datapine’s end-to-end solution automatically updates your data, allowing you to save time and focus on performing the right analysis to grow your company.

data connectors from datapine

4. Think of governance 

When collecting data in a business or research context you always need to think about security and privacy. With data breaches becoming a topic of concern for businesses, the need to protect your client's or subject’s sensitive information becomes critical. 

To ensure that all this is taken care of, you need to think of a data governance strategy. According to Gartner, this concept refers to “the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.” In simpler words, data governance is a collection of processes, roles, and policies that ensure the efficient use of data while still achieving the main company goals. It ensures that clear roles are in place for who can access the information and how they can access it. In time, this not only ensures that sensitive information is protected but also allows for an efficient analysis as a whole. 

5. Clean your data

After harvesting from so many sources you will be left with a vast amount of information that can be overwhelming to deal with. At the same time, you can be faced with incorrect data that can be misleading to your analysis. The smartest thing you can do to avoid dealing with this in the future is to clean the data. This is fundamental before visualizing it, as it will ensure that the insights you extract from it are correct.

There are many things that you need to look for in the cleaning process. The most important one is to eliminate any duplicate observations; this usually appears when using multiple internal and external sources of information. You can also add any missing codes, fix empty fields, and eliminate incorrectly formatted data.

Another usual form of cleaning is done with text data. As we mentioned earlier, most companies today analyze customer reviews, social media comments, questionnaires, and several other text inputs. In order for algorithms to detect patterns, text data needs to be revised to avoid invalid characters or any syntax or spelling errors. 

Most importantly, the aim of cleaning is to prevent you from arriving at false conclusions that can damage your company in the long run. By using clean data, you will also help BI solutions to interact better with your information and create better reports for your organization.
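The cleaning steps named above, trimming white space, dropping duplicates, and discarding badly formatted records, can be sketched like this on an invented list of email sign-ups:

```python
raw = [" alice@example.com ", "bob@example.com", "alice@example.com", "not-an-email"]

def clean(records):
    seen, cleaned = set(), []
    for record in records:
        record = record.strip()        # erase stray white space
        if "@" not in record:          # drop badly formatted entries
            continue
        if record in seen:             # drop duplicate observations
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

print(clean(raw))  # ['alice@example.com', 'bob@example.com']
```

Real pipelines add format-specific validation (dates, currencies, character encodings), but the pattern of normalize, validate, and deduplicate stays the same.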

6. Set your KPIs

Once you’ve set your sources, cleaned your data, and established clear-cut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

KPIs are critical to both qualitative and quantitative research, and setting them is one of the primary steps of data analysis you certainly shouldn’t overlook.

To help you set the best possible KPIs for your initiatives and activities, here is an example of a relevant logistics KPI: transportation-related costs. If you want to see more, explore our collection of key performance indicator examples.

Transportation costs logistics KPIs
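Computing a KPI like this is usually a simple aggregation over your records. A minimal sketch, using invented shipment data:

```python
# Hypothetical shipment records; the tracked KPIs are average cost
# per shipment and cost per mile.
shipments = [
    {"cost": 420.0, "miles": 600},
    {"cost": 380.0, "miles": 500},
    {"cost": 200.0, "miles": 250},
]

total_cost  = sum(s["cost"] for s in shipments)
total_miles = sum(s["miles"] for s in shipments)

cost_per_shipment = total_cost / len(shipments)   # 333.33
cost_per_mile     = total_cost / total_miles      # ~0.74
```

Tracked month over month, these two numbers are exactly the kind of figures a logistics dashboard would chart.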

7. Omit useless data

Having given your data analysis tools and techniques true purpose and defined your mission, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for chopping out any information you deem useless.

Trimming the informational fat is one of the most crucial methods of analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.
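One simple way to enforce this is to keep an explicit whitelist of metrics that map to a KPI and filter everything else out. A small sketch with invented metric names:

```python
# Metrics that back a defined KPI; everything else is "informational fat".
kpi_metrics = {"transportation_cost", "delivery_time", "order_accuracy"}

collected = {
    "transportation_cost": 0.74,
    "delivery_time": 2.3,
    "office_plant_count": 12,   # no KPI behind it, so it gets dropped
    "order_accuracy": 0.98,
}

lean_data = {k: v for k, v in collected.items() if k in kpi_metrics}
```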

8. Build a data management roadmap

While, at this point, this particular step is optional (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data management roadmap will help your data analysis methods and techniques succeed on a more sustainable basis. These roadmaps, if developed properly, are also built so they can be tweaked and scaled over time.

Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional – one of the most powerful types of data analysis methods available today.

9. Integrate technology

There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.

Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that offer actionable insights; they will also present that data in a digestible, visual, interactive format from one central, live dashboard. A data methodology you can count on.

By integrating the right technology within your data analysis methodology, you’ll avoid fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum value from your business’s most valuable insights.

For a look at the power of software for the purpose of analysis and to enhance your methods of analyzing, glance over our selection of dashboard examples .

10. Answer your questions

By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most burning business questions. Arguably, the best way to make your data concepts accessible across the organization is through data visualization.

11. Visualize your data

Online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the organization to extract meaningful insights that aid business evolution – and it covers all the different ways to analyze data.

The purpose of analyzing is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this is simpler than you think, as demonstrated by our marketing dashboard .

An executive dashboard example showcasing high-level marketing KPIs such as cost per lead, MQL, SQL, and cost per customer.

This visual, dynamic, and interactive online dashboard is a data analysis example designed to give Chief Marketing Officers (CMOs) an overview of relevant metrics to help them understand whether they achieved their monthly goals.

In detail, this example generated with a modern dashboard creator displays interactive charts for monthly revenues, costs, net income, and net income per customer; all of them are compared with the previous month so that you can understand how the data fluctuated. In addition, it shows a detailed summary of the number of users, customers, SQLs, and MQLs per month to visualize the whole picture and extract relevant insights or trends for your marketing reports .

The CMO dashboard is perfect for c-level management as it can help them monitor the strategic outcome of their marketing efforts and make data-driven decisions that can benefit the company exponentially.

12. Be careful with the interpretation

We already dedicated an entire post to data interpretation, as it is a fundamental part of the process of data analysis. It gives meaning to the analytical information and aims to derive a concise conclusion from the analysis results. Since companies are most often dealing with data from many different sources, the interpretation stage needs to be done carefully and properly in order to avoid misinterpretations. 

To help you through the process, here we list three common practices that you need to avoid at all costs when looking at your data:

  • Correlation vs. causation: The human brain is wired to find patterns. This instinct leads to one of the most common mistakes when interpreting data: confusing correlation with causation. Although the two can exist simultaneously, it is not correct to assume that because two things happened together, one provoked the other. To avoid falling into this mistake, never trust intuition alone; trust the data. If there is no objective evidence of causation, then always stick to correlation. 
  • Confirmation bias: This phenomenon describes the tendency to select and interpret only the data necessary to prove one hypothesis, often ignoring the elements that might disprove it. Even if it's not done on purpose, confirmation bias can represent a real problem, as excluding relevant information can lead to false conclusions and, therefore, bad business decisions. To avoid it, always try to disprove your hypothesis instead of proving it, share your analysis with other team members, and avoid drawing any conclusions before the entire analytical project is finalized.
  • Statistical significance: In short, statistical significance helps analysts understand whether a result is actually meaningful or whether it arose from sampling error or pure chance. The level of statistical significance needed may depend on the sample size and the industry being analyzed. In any case, ignoring the significance of a result when it might influence decision-making can be a huge mistake.
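The first and third pitfalls can be made concrete in a few lines of code. The sketch below computes a Pearson correlation from scratch, plus the t statistic used to judge whether that correlation is statistically distinguishable from zero. Comparing |t| against roughly 2 is only a rule of thumb for alpha ≈ 0.05 at moderate sample sizes; a proper analysis would compute an exact p-value. The data is invented:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t statistic for testing a correlation r against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Invented example: two series that move together.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]
r = pearson_r(xs, ys)         # ~0.92: strongly correlated
t = t_statistic(r, len(xs))   # ~4.6: unlikely to be pure chance
```

A high r with a large t tells you the association is probably real; it still tells you nothing about which variable, if either, causes the other.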

13. Build a narrative

Now, we’re going to look at how you can bring all of these elements together in a way that will benefit your business - starting with a little something called data storytelling.

The human brain responds incredibly well to strong stories or narratives. Once you’ve cleansed, shaped, and visualized your most invaluable data using various BI dashboard tools , you should strive to tell a story - one with a clear-cut beginning, middle, and end.

By doing so, you will make your analytical efforts more accessible, digestible, and universal, empowering more people within your organization to use your discoveries to their actionable advantage.

14. Consider autonomous technology

Autonomous technologies, such as artificial intelligence (AI) and machine learning (ML), play a significant role in the advancement of understanding how to analyze data more effectively.

Gartner predicts that by the end of this year, 80% of emerging technologies will be developed with AI foundations. This is a testament to the ever-growing power and value of autonomous technologies.

At the moment, these technologies are revolutionizing the analysis industry. Some examples that we mentioned earlier are neural networks, intelligent alarms, and sentiment analysis.

15. Share the load

If you work with the right tools and dashboards, you will be able to present your metrics in a digestible, value-driven format, allowing almost everyone in the organization to connect with and use relevant data to their advantage.

Modern dashboards consolidate data from various sources, providing access to a wealth of insights in one centralized location, no matter if you need to monitor recruitment metrics or generate reports that need to be sent across numerous departments. Moreover, these cutting-edge tools offer access to dashboards from a multitude of devices, meaning that everyone within the business can connect with practical insights remotely - and share the load.

Once everyone is able to work with a data-driven mindset, you will catalyze the success of your business in ways you never thought possible. And when it comes to knowing how to analyze data, this kind of collaborative approach is essential.

16. Data analysis tools

In order to perform high-quality analysis of data, it is fundamental to use tools and software that will ensure the best results. Here we leave you a small summary of four fundamental categories of data analysis tools for your organization.

  • Business Intelligence: BI tools allow you to process significant amounts of data from several sources in any format. Through this, you can not only analyze and monitor your data to extract relevant insights but also create interactive reports and dashboards to visualize your KPIs and use them for your company's good. datapine is an amazing online BI software that is focused on delivering powerful online analysis features that are accessible to beginner and advanced users. In this way, it offers a full-service solution that includes cutting-edge analysis of data, KPI visualization, live dashboards, reporting, and artificial intelligence technologies to predict trends and minimize risk.
  • Statistical analysis: These tools are usually designed for scientists, statisticians, market researchers, and mathematicians, as they allow them to perform complex statistical analyses with methods like regression analysis, predictive analysis, and statistical modeling. A good tool for this type of analysis is RStudio, as it offers powerful data modeling and hypothesis testing features that cover both academic and general data analysis. It is one of the industry favorites due to its capabilities for data cleaning, data reduction, and advanced analysis with several statistical methods. Another relevant tool to mention is SPSS from IBM. The software offers advanced statistical analysis for users of all skill levels. Thanks to a vast library of machine learning algorithms, text analysis, and a hypothesis testing approach, it can help your company find relevant insights to drive better decisions. SPSS also works as a cloud service that enables you to run it anywhere.
  • SQL Consoles: SQL is a programming language often used to handle structured data in relational databases. Tools like these are popular among data scientists as they are extremely effective in unlocking these databases' value. Undoubtedly, one of the most used SQL software in the market is MySQL Workbench . This tool offers several features such as a visual tool for database modeling and monitoring, complete SQL optimization, administration tools, and visual performance dashboards to keep track of KPIs.
  • Data Visualization: These tools are used to represent your data through charts, graphs, and maps that allow you to find patterns and trends in the data. datapine's already mentioned BI platform also offers a wealth of powerful online data visualization tools with several benefits. Some of them include: delivering compelling data-driven presentations to share with your entire company, the ability to see your data online with any device wherever you are, an interactive dashboard design feature that enables you to showcase your results in an interactive and understandable way, and to perform online self-service reports that can be used simultaneously with several other people to enhance team productivity.
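To make the SQL category tangible, here is a self-contained sketch using Python's built-in sqlite3 module; the table and rows are invented, but the GROUP BY query is exactly the kind of aggregation you would run in a SQL console such as MySQL Workbench:

```python
import sqlite3

# In-memory database with an illustrative orders table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)],
)

# Aggregate revenue per region.
rows = con.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
# rows == [("APAC", 50.0), ("EMEA", 200.0)]
```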

17. Refine your process constantly 

Last is a step that might seem obvious to some people, but it can be easily ignored if you think you are done. Once you have extracted the needed results, you should always take a retrospective look at your project and think about what you can improve. As you saw throughout this long list of techniques, data analysis is a complex process that requires constant refinement. For this reason, you should always go one step further and keep improving. 

Quality Criteria For Data Analysis

So far we’ve covered a list of methods and techniques that should help you perform efficient data analysis. But how do you measure the quality and validity of your results? This is done with the help of scientific quality criteria. Here we will go into a more theoretical area that is critical to understanding the fundamentals of statistical analysis in science. However, you should also be aware of these steps in a business context, as they will allow you to assess the quality of your results in the correct way. Let’s dig in. 

  • Internal validity: The results of a survey are internally valid if they measure what they are supposed to measure and thus provide credible results. In other words, internal validity reflects the trustworthiness of the results and how they can be affected by factors such as the research design, operational definitions, how the variables are measured, and more. For instance, imagine you are conducting interviews asking people whether they brush their teeth twice a day. While most of them will answer yes, you may notice that their answers simply correspond to what is socially acceptable, which is to brush your teeth at least twice a day. In this case, you can’t be 100% sure whether respondents actually brush their teeth twice a day or just say that they do; therefore, the internal validity of this interview is very low. 
  • External validity: Essentially, external validity refers to the extent to which the results of your research can be applied to a broader context. It basically aims to prove that the findings of a study can be applied in the real world. If the research can be applied to other settings, individuals, and times, then the external validity is high. 
  • Reliability : If your research is reliable, it means that it can be reproduced. If your measurement were repeated under the same conditions, it would produce similar results. This means that your measuring instrument consistently produces reliable results. For example, imagine a doctor building a symptoms questionnaire to detect a specific disease in a patient. Then, various other doctors use this questionnaire but end up diagnosing the same patient with a different condition. This means the questionnaire is not reliable in detecting the initial disease. Another important note here is that in order for your research to be reliable, it also needs to be objective. If the results of a study are the same, independent of who assesses them or interprets them, the study can be considered reliable. Let’s see the objectivity criteria in more detail now. 
  • Objectivity: In data science, objectivity means that the researcher needs to stay fully objective when it comes to the analysis. The results of a study need to be determined by objective criteria and not by the beliefs, personality, or values of the researcher. Objectivity needs to be ensured when gathering the data; for example, when interviewing individuals, the questions need to be asked in a way that doesn't influence the results. It also needs to be considered when interpreting the data: if different researchers reach the same conclusions, then the study is objective. For this last point, you can set predefined criteria for interpreting the results to ensure all researchers follow the same steps. 

The discussed quality criteria cover mostly potential influences in a quantitative context. Analysis in qualitative research has, by default, additional subjective influences that must be controlled in a different way. Therefore, there are other quality criteria for this kind of research, such as credibility, transferability, dependability, and confirmability. You can see each of them in more detail in this resource. 
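Objectivity and reliability can also be checked numerically. A very simple proxy is percent agreement between two independent raters; a more rigorous check would use Cohen's kappa, which additionally corrects for chance agreement. The labels below are hypothetical:

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters assign the same label."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Two doctors labeling the same five patients (hypothetical data).
doctor_a = ["disease", "healthy", "disease", "healthy", "disease"]
doctor_b = ["disease", "healthy", "healthy", "healthy", "disease"]

agreement = percent_agreement(doctor_a, doctor_b)   # 4 of 5 labels match
```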

Data Analysis Limitations & Barriers

Analyzing data is not an easy task. As you’ve seen throughout this post, there are many steps and techniques that you need to apply in order to extract useful information from your research. While a well-performed analysis can bring various benefits to your organization, it doesn't come without limitations. In this section, we will discuss some of the main barriers you might encounter when conducting an analysis. Let’s see them in more detail. 

  • Lack of clear goals: No matter how good your data or analysis might be, if you don’t have clear goals or a hypothesis, the process might be worthless. While we mentioned some methods that don’t require a predefined hypothesis, it is always better to enter the analytical process with some clear guidelines about what you expect to get out of it, especially in a business context in which data is used to support important strategic decisions. 
  • Objectivity: Arguably one of the biggest barriers when it comes to data analysis in research is to stay objective. When trying to prove a hypothesis, researchers might find themselves, intentionally or unintentionally, directing the results toward an outcome that they want. To avoid this, always question your assumptions and avoid confusing facts with opinions. You can also show your findings to a research partner or external person to confirm that your results are objective. 
  • Data representation: A fundamental part of the analytical procedure is the way you represent your data. You can use various graphs and charts to represent your findings, but not all of them will work for all purposes. Choosing the wrong visual can not only damage your analysis but also mislead your audience; therefore, it is important to understand when to use each type of visual depending on your analytical goals. Our complete guide on the types of graphs and charts lists 20 different visuals with examples of when to use them. 
  • Flawed correlation : Misleading statistics can significantly damage your research. We’ve already pointed out a few interpretation issues previously in the post, but it is an important barrier that we can't avoid addressing here as well. Flawed correlations occur when two variables appear related to each other but they are not. Confusing correlations with causation can lead to a wrong interpretation of results which can lead to building wrong strategies and loss of resources, therefore, it is very important to identify the different interpretation mistakes and avoid them. 
  • Sample size: A very common barrier to a reliable and efficient analysis process is the sample size. For the results to be trustworthy, the sample size should be representative of what you are analyzing. For example, imagine you have a company of 1,000 employees and you ask the question “do you like working here?” to 50 employees, of whom 49 say yes, which is 98%. Now, imagine you ask the same question to all 1,000 employees and 980 say yes, which is also 98%. Saying that 98% of employees like working in the company when the sample size was only 50 is not a representative or trustworthy conclusion. The results are far more reliable when surveying a bigger sample size.   
  • Privacy concerns: In some cases, data collection can be subjected to privacy regulations. Businesses gather all kinds of information from their customers from purchasing behaviors to addresses and phone numbers. If this falls into the wrong hands due to a breach, it can affect the security and confidentiality of your clients. To avoid this issue, you need to collect only the data that is needed for your research and, if you are using sensitive facts, make it anonymous so customers are protected. The misuse of customer data can severely damage a business's reputation, so it is important to keep an eye on privacy. 
  • Lack of communication between teams : When it comes to performing data analysis on a business level, it is very likely that each department and team will have different goals and strategies. However, they are all working for the same common goal of helping the business run smoothly and keep growing. When teams are not connected and communicating with each other, it can directly affect the way general strategies are built. To avoid these issues, tools such as data dashboards enable teams to stay connected through data in a visually appealing way. 
  • Innumeracy : Businesses are working with data more and more every day. While there are many BI tools available to perform effective analysis, data literacy is still a constant barrier. Not all employees know how to apply analysis techniques or extract insights from them. To prevent this from happening, you can implement different training opportunities that will prepare every relevant user to deal with data. 
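The sample-size point can be quantified with a confidence interval. Using the standard normal approximation for a proportion, the same observed percentage carries very different uncertainty at n = 50 versus n = 1,000; the counts below roughly mirror the employee-survey example:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence half-width for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, half_width

p_small, hw_small = proportion_ci(49, 50)     # small 50-person sample
p_large, hw_large = proportion_ci(980, 1000)  # full-company survey
# hw_small ~ 0.039 (about ±3.9 points), hw_large ~ 0.009 (about ±0.9 points)
```

Same percentage, but the small sample's interval is roughly four times wider, which is exactly why the 50-person conclusion is less trustworthy.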

Key Data Analysis Skills

As you've learned throughout this lengthy guide, analyzing data is a complex task that requires a lot of knowledge and skills. That said, thanks to the rise of self-service tools the process is way more accessible and agile than it once was. Regardless, there are still some key skills that are valuable to have when working with data, we list the most important ones below.

  • Critical and statistical thinking: To successfully analyze data you need to be creative and think out of the box. Yes, that might sound like a strange statement considering that data is often tied to facts. However, a great level of critical thinking is required to uncover connections, come up with a valuable hypothesis, and extract conclusions that go a step beyond the surface. This, of course, needs to be complemented by statistical thinking and an understanding of numbers. 
  • Data cleaning: Anyone who has ever worked with data will tell you that the cleaning and preparation process accounts for up to 80% of a data analyst's work; therefore, the skill is fundamental. Beyond that, failing to clean the data adequately can significantly damage the analysis, which can lead to poor decision-making in a business scenario. While there are multiple tools that automate the cleaning process and reduce the possibility of human error, it is still a valuable skill to master. 
  • Data visualization: Visuals make the information easier to understand and analyze, not only for professional users but especially for non-technical ones. Having the necessary skills to not only choose the right chart type but know when to apply it correctly is key. This also means being able to design visually compelling charts that make the data exploration process more efficient. 
  • SQL: The Structured Query Language or SQL is a programming language used to communicate with databases. It is fundamental knowledge as it enables you to update, manipulate, and organize data from relational databases which are the most common databases used by companies. It is fairly easy to learn and one of the most valuable skills when it comes to data analysis. 
  • Communication skills: This is a skill that is especially valuable in a business environment. Being able to clearly communicate analytical outcomes to colleagues is incredibly important, especially when the information you are trying to convey is complex for non-technical people. This applies to in-person communication as well as written format, for example, when generating a dashboard or report. While this might be considered a “soft” skill compared to the other ones we mentioned, it should not be ignored as you most likely will need to share analytical findings with others no matter the context. 

Data Analysis In The Big Data Environment

Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

To inspire your efforts and put the importance of big data into context, here are some insights that you should know:

  • By 2026, the big data industry is expected to be worth approximately $273.4 billion.
  • 94% of enterprises say that analyzing data is important for their growth and digital transformation. 
  • Companies that exploit the full potential of their data can increase their operating margins by 60% .
  • We have already discussed the benefits of artificial intelligence earlier in this article. This industry's financial impact is expected to grow to $40 billion by 2025.

Data analysis concepts may come in many forms, but fundamentally, any solid methodology will help to make your business more streamlined, cohesive, insightful, and successful than ever before.

Key Takeaways From Data Analysis 

As we reach the end of our data analysis journey, we leave a small summary of the main methods and techniques to perform excellent analysis and grow your business.

17 Essential Types of Data Analysis Methods:

  • Cluster analysis
  • Cohort analysis
  • Regression analysis
  • Factor analysis
  • Neural Networks
  • Data Mining
  • Text analysis
  • Time series analysis
  • Decision trees
  • Conjoint analysis 
  • Correspondence Analysis
  • Multidimensional Scaling 
  • Content analysis 
  • Thematic analysis
  • Narrative analysis 
  • Grounded theory analysis
  • Discourse analysis 

Top 17 Data Analysis Techniques:

  • Collaborate your needs
  • Establish your questions
  • Data democratization
  • Think of data governance 
  • Clean your data
  • Set your KPIs
  • Omit useless data
  • Build a data management roadmap
  • Integrate technology
  • Answer your questions
  • Visualize your data
  • Interpretation of data
  • Build a narrative
  • Consider autonomous technology
  • Share the load
  • Data Analysis tools
  • Refine your process constantly 

We’ve pondered the data analysis definition and drilled down into the practical applications of data-centric analytics, and one thing is clear: by taking measures to arrange your data and making your metrics work for you, it’s possible to transform raw information into action - the kind that will push your business to the next level.

Yes, good data analytics techniques result in enhanced business intelligence (BI). To help you understand this notion in more detail, read our exploration of business intelligence reporting .

And, if you’re ready to perform your own analysis, drill down into your facts and figures while interacting with your data on astonishing visuals, you can try our software for a free, 14-day trial .


Front Physiol

Meta-Analytic Methodology for Basic Research: A Practical Guide

Nicholas Mikolajewicz

1 Faculty of Dentistry, McGill University, Montreal, QC, Canada

2 Shriners Hospital for Children-Canada, Montreal, QC, Canada

Svetlana V. Komarova


Basic life science literature is rich with information; however, methodical quantitative attempts to organize this information are rare. Unlike clinical research, where consolidation efforts are facilitated by systematic review and meta-analysis, the basic sciences seldom use such rigorous quantitative methods. The goal of this study is to present a brief theoretical foundation, computational resources, and a workflow outline, along with a working example, for performing systematic or rapid reviews of basic research followed by meta-analysis. Conventional meta-analytic techniques are extended to accommodate methods and practices found in basic research. Emphasis is placed on handling the heterogeneity that is inherently prevalent in studies that use diverse experimental designs and models. We introduce MetaLab, a meta-analytic toolbox developed in MATLAB R2016b which implements the methods described in this methodology and is provided for researchers and statisticians at a Git repository ( https://github.com/NMikolajewicz/MetaLab ). Through the course of the manuscript, a rapid review of intracellular ATP concentrations in osteoblasts is used as an example to demonstrate the workflow and the intermediate and final outcomes of basic research meta-analyses. In addition, features pertaining to larger datasets are illustrated with a systematic review of mechanically-stimulated ATP release kinetics in mammalian cells. We discuss the criteria required to ensure outcome validity, as well as exploratory methods to identify influential experimental and biological factors. Thus, meta-analyses provide informed estimates for biological outcomes and the range of their variability, which are critical for hypothesis generation and the evidence-driven design of translational studies, as well as the development of computational models.


Evidence-based medical practice aims to consolidate best research evidence with clinical and patient expertise. Systematic reviews and meta-analyses are essential tools for synthesizing evidence needed to inform clinical decision making and policy. Systematic reviews summarize available literature using specific search parameters followed by critical appraisal and logical synthesis of multiple primary studies (Gopalakrishnan and Ganeshkumar, 2013 ). Meta-analysis refers to the statistical analysis of the data from independent primary studies focused on the same question, which aims to generate a quantitative estimate of the studied phenomenon, for example, the effectiveness of the intervention (Gopalakrishnan and Ganeshkumar, 2013 ). In clinical research, systematic reviews and meta-analyses are a critical part of evidence-based medicine. However, in basic science, attempts to evaluate prior literature in such rigorous and quantitative manner are rare, and narrative reviews are prevalent. The goal of this manuscript is to provide a brief theoretical foundation, computational resources and workflow outline for performing a systematic or rapid review followed by a meta-analysis of basic research studies.
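The core computation of a meta-analysis under the common fixed-effect model is an inverse-variance weighted average of the study effects. A minimal generic sketch with illustrative numbers (this is not MetaLab itself, just the standard textbook calculation):

```python
import math

def fixed_effect_meta(effects, variances):
    """Pooled estimate and standard error under a fixed-effect model,
    weighting each study by the inverse of its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Three hypothetical studies: (effect estimate, variance of the estimate).
pooled, se = fixed_effect_meta([2.0, 2.5, 1.8], [0.10, 0.20, 0.05])
# pooled ~ 1.957, se ~ 0.169: the most precise study dominates the average
```

Random-effects models extend this by adding a between-study variance component to each weight, which matters when studies are heterogeneous, as basic research studies typically are.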

Meta-analyses can be a challenging undertaking, requiring tedious screening and statistical understanding. There are several guides available that outline how to undertake a meta-analysis in clinical research (Higgins and Green, 2011 ). Software packages supporting clinical meta-analyses include the Excel plugins MetaXL (Barendregt and Doi, 2009 ) and Mix 2.0 (Bax, 2016 ), Revman (Cochrane Collaboration, 2011 ), Comprehensive Meta-Analysis Software [CMA (Borenstein et al., 2005 )], JASP (JASP Team, 2018 ) and MetaFOR library for R (Viechtbauer, 2010 ). While these packages can be adapted to basic science projects, difficulties may arise due to specific features of basic science studies, such as large and complex datasets and heterogeneity in experimental methodology. To address these limitations, we developed a software package aimed to facilitate meta-analyses of basic research, MetaLab in MATLAB R2016b, with an intuitive graphical interface that permits users with limited statistical and coding background to proceed with a meta-analytic project. We organized MetaLab into six modules ( Figure 1 ), each focused on different stages of the meta-analytic process, including graphical-data extraction, model parameter estimation, quantification and exploration of heterogeneity, data-synthesis, and meta-regression.

Figure 1. General framework of MetaLab. The Data Extraction module assists with graphical data extraction from study figures. The Fit Model module applies a Monte-Carlo error propagation approach to fit complex datasets to a model of interest. Prior to further analysis, reviewers have the opportunity to manually curate and consolidate data from all sources. The Prepare Data module imports datasets from a spreadsheet into MATLAB in a standardized format. The Heterogeneity, Meta-analysis, and Meta-regression modules facilitate meta-analytic synthesis of the data.

In the present manuscript, we describe each step of the meta-analytic process with emphasis on the specific considerations made when conducting a review of basic research. The complete workflow of parameter estimation using MetaLab is demonstrated for the evaluation of intracellular ATP content in osteoblasts (OB[ATP]ic dataset), based on a rapid literature review. In addition, features pertaining to larger datasets are explored with the ATP release kinetics from mechanically-stimulated mammalian cells (ATP release dataset), obtained as a result of a systematic review in our prior work (Mikolajewicz et al., 2018 ).

MetaLab can be freely accessed at the GitHub repository ( https://github.com/NMikolajewicz/MetaLab ), and detailed documentation of how to use MetaLab, together with a working example, is available in the Supporting materials.

Validity of Evidence in the Basic Sciences

To evaluate the translational potential of basic research, the validity of the evidence must first be assessed, usually by examining the approach taken to collect and evaluate the data. Studies in the basic sciences are broadly grouped as hypothesis-generating and hypothesis-driven. The former tend to be small-sampled proof-of-principle studies and are typically exploratory and less valid than the latter. An argument can even be made that studies reporting novel findings fall into this group as well, since their findings remain subject to external validation before being accepted by the broader scientific community. In contrast, hypothesis-driven studies build upon what is known or strongly suggested by earlier work. These studies can also validate prior experimental findings with incremental contributions. Although such studies are often overlooked and even dismissed due to a lack of substantial novelty, their role in the external validation of prior work is critical for establishing the translational potential of findings.

Another dimension to the validity of evidence in the basic sciences is the selection of experimental model. The human condition is near-impossible to recapitulate in a laboratory setting, therefore experimental models (e.g., cell lines, primary cells, animal models) are used to mimic the phenomenon of interest, albeit imperfectly. For these reasons, the best quality evidence comes from evaluating the performance of several independent experimental models. This is accomplished through systematic approaches that consolidate evidence from multiple studies, thereby filtering the signal from the noise and allowing for side-by-side comparison. While systematic reviews can be conducted to accomplish a qualitative comparison, meta-analytic approaches employ statistical methods which enable hypothesis generation and testing. When a meta-analysis in the basic sciences is hypothesis-driven, it can be used to evaluate the translational potential of a given outcome and provide recommendations for subsequent translational- and clinical-studies. Alternatively, if meta-analytic hypothesis testing is inconclusive, or exploratory analyses are conducted to examine sources of inconsistency between studies, novel hypotheses can be generated, and subsequently tested experimentally. Figure 2 summarizes this proposed framework.

Figure 2. Schematic of the proposed hierarchy of translational potential in basic research.

Steps in Quantitative Literature Review

All meta-analytic efforts follow a similar workflow, outlined as follows:

  • Define primary and secondary objectives
  • Determine breadth of question
  • Construct search strategy: rapid or systematic search
  • Screen studies and determine eligibility
  • Extract data from relevant studies
  • Collect relevant study-level characteristics and experimental covariates
  • Evaluate quality of studies
  • Estimate model parameters for complex relationships (optional)
  • Compute appropriate outcome measure
  • Evaluate extent of between-study inconsistency (heterogeneity)
  • Perform relevant data transformations
  • Select meta-analytic model
  • Pool data and calculate summary measure and confidence interval
  • Explore potential sources of heterogeneity (ex. biological or experimental)
  • Subgroup and meta-regression analyses
  • Interpret findings
  • Provide recommendations for future work

Meta-Analysis Methodology

Search and selection strategies.

The first stage of any review involves formulating a primary objective in the form of a research question or hypothesis. Reviewers must explicitly define the objective of the review before starting the project, which serves to reduce the risk of data dredging, where reviewers assign meaning to significant findings after the fact. Secondary objectives may also be defined; however, precaution must be taken, as the search strategies formulated for the primary objective may not entirely encompass the body of work required to address the secondary objective. Depending on the purpose of a review, reviewers may choose to undertake a rapid or systematic review. While the meta-analytic methodology is similar for systematic and rapid reviews, the scope of the literature assessed tends to be significantly narrower for rapid reviews, permitting the project to proceed faster.

Systematic Review and Meta-Analysis

Systematic reviews involve comprehensive search strategies that enable reviewers to identify all relevant studies on a defined topic (DeLuca et al., 2008 ). Meta-analytic methods then permit reviewers to quantitatively appraise and synthesize outcomes across studies to obtain information on statistical significance and relevance. Systematic reviews of basic research data have the potential to produce information-rich databases which allow extensive secondary analysis. To comprehensively examine the pool of available information, search criteria must be sensitive enough not to miss relevant studies. Key terms and concepts, expressed as synonymous keywords and index terms such as Medical Subject Headings (MeSH), must be combined using the Boolean operators AND, OR, and NOT (Ecker and Skelly, 2010 ). Truncations, wildcards, and proximity operators can also help refine a search strategy by including spelling variations and different wordings of the same concept (Ecker and Skelly, 2010 ). Search strategies can be validated using a selection of expected relevant studies. If the search strategy fails to retrieve even one of the selected studies, it requires further optimization. This process is iterated, updating the search strategy at each step, until the search performs at a satisfactory level (Finfgeld-Connett and Johnson, 2013 ). A comprehensive search is expected to return a large number of studies, many of which are not relevant to the topic, commonly resulting in a specificity of <10% (McGowan and Sampson, 2005 ). Therefore, the initial stage of sifting through the library to select relevant studies is time-consuming (it may take 6 months to 2 years) and prone to human error. At this stage, it is recommended to include at least two independent reviewers to minimize selection bias and related errors. Nevertheless, systematic reviews have the potential to provide the highest quality quantitative evidence synthesis to directly inform experimental and computational basic, preclinical, and translational studies.
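The validation loop described above, in which a candidate search is checked against a benchmark set of known relevant studies and broadened until none are missed, can be sketched in a few lines. The record IDs and function name below are illustrative, not part of MetaLab:

```python
# Sketch: validate a search strategy against a benchmark set of known
# relevant studies (hypothetical record IDs); a non-empty result signals
# that the strategy needs another iteration of broadening.

def missing_benchmarks(retrieved_ids, benchmark_ids):
    """Return benchmark studies the search failed to retrieve."""
    return sorted(set(benchmark_ids) - set(retrieved_ids))

# Hypothetical example: IDs returned by a database search vs. the
# reviewer-selected validation set.
retrieved = {"pmid:111", "pmid:222", "pmid:444"}
benchmark = {"pmid:111", "pmid:222", "pmid:333"}

gaps = missing_benchmarks(retrieved, benchmark)  # ["pmid:333"] here
```

In practice each iteration would modify the Boolean query, re-run the search, and repeat until `gaps` is empty.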

Rapid Review and Meta-Analysis

The goal of a rapid review, as the name implies, is to decrease the time needed to synthesize information. Rapid reviews are a suitable alternative to systematic approaches if reviewers prefer to get a general idea of the state of the field without an extensive time investment. Search strategies are constructed by increasing search specificity, thus reducing the number of irrelevant studies identified by the search at the expense of search comprehensiveness (Haby et al., 2016 ). The strength of a rapid review is its flexibility to adapt to the needs of the reviewer, which results in a lack of standardized methodology (Mattivi and Buchberger, 2016 ). Common shortcuts made in rapid reviews are: (i) narrowing search criteria, (ii) imposing date restrictions, (iii) conducting the review with a single reviewer, (iv) omitting expert consultation (i.e., a librarian for search strategy development), (v) narrowing language criteria (ex. English only), (vi) foregoing the iterative process of searching and search term selection, (vii) omitting quality checklist criteria, and (viii) limiting the number of databases searched (Ganann et al., 2010 ). These shortcuts limit the initial pool of studies returned from the search, thus expediting the selection process, but potentially result in the exclusion of relevant studies and the introduction of selection bias. While there is a consensus that rapid reviews do not sacrifice quality or synthesize misrepresentative results (Haby et al., 2016 ), it is recommended that critical outcomes be later verified by systematic review (Ganann et al., 2010 ). Nevertheless, rapid reviews are a viable alternative when parameters for computational modeling need to be estimated. While systematic and rapid reviews rely on different strategies to select relevant studies, the statistical methods used to synthesize data from systematic and rapid reviews are identical.

Screening and Selection

When the literature search is complete (the date articles were retrieved from the databases needs to be recorded), articles are extracted and stored in a reference manager for screening. Before study screening, the inclusion and exclusion criteria must be defined to ensure consistency in study identification and retrieval, especially when multiple reviewers are involved. The critical steps in screening and selection are (1) removing duplicates, (2) screening for relevant studies by title and abstract, and (3) inspecting full texts to ensure they fulfill the eligibility criteria. Several reference managers are available, including Mendeley and Rayyan, the latter developed specifically to assist with the screening of systematic reviews. However, 98% of authors report using EndNote, Reference Manager or RefWorks to prepare their reviews (Lorenzetti and Ghali, 2013 ). Reference managers often have deduplication functions; however, these can be tedious and error-prone (Kwon et al., 2015 ). A protocol for faster and more reliable deduplication in EndNote has recently been proposed (Bramer et al., 2016 ). The selection of articles should be sufficiently broad not to be dominated by a single lab or author. In basic research articles, it is common to find data sets that are reused by the same group in multiple studies. Therefore, additional precautions should be taken when deciding to include multiple studies published by a single group. At the end of the search, screening and selection process, the reviewer obtains a complete list of eligible full-text manuscripts. The entire screening and selection process should be reported in a PRISMA diagram, which maps the flow of information throughout the review according to prescribed guidelines published elsewhere (Moher et al., 2009 ). Figure 3 provides a summary of the workflow of search and selection strategies using the OB[ATP]ic rapid review and meta-analysis as an example.
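As a rough illustration of the deduplication step (independent of any particular reference manager; the record fields are hypothetical), duplicates can be keyed on DOI when available and on a normalized title otherwise:

```python
# Sketch: de-duplicate search records by DOI when available, else by a
# normalized title (lower-cased, punctuation collapsed). Field names are
# illustrative, not a specific reference manager's schema.
import re

def dedup_key(record):
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)  # keep first occurrence of each key
    return unique
```

Fuzzy matching (e.g., on author/year/journal) catches further duplicates but also risks false positives, which is why manual verification of flagged pairs is usually retained.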

Figure 3. Example of the rapid review literature search. (A) Development of the search parameters to find literature on the intracellular ATP content in osteoblasts. (B) PRISMA diagram of the information flow.

Data Extraction, Initial Appraisal, and Preparation

Identification of parameters to be extracted.

It is advised to predefine analytic strategies before data extraction and analysis. However, the availability of reported effect measures and study designs will often influence this decision. When reviewers aim to estimate the absolute mean difference (absolute effect), normalized mean difference, response ratio or standardized mean difference (ex. Hedges' g), they need to extract study-level means (θ_i), standard deviations (sd(θ_i)), and sample sizes (n_i) for the control (denoted θ_i^c, sd(θ_i^c), and n_i^c) and intervention (denoted θ_i^r, sd(θ_i^r), and n_i^r) groups of each study i. To estimate the absolute mean effect, only the mean (θ_i^r), standard deviation (sd(θ_i^r)), and sample size (n_i^r) are required. In basic research, it is common for a single study to present variations of the same observation (ex. measurements of the same entity using different techniques). In such cases, each point may be treated as an individual observation, or common outcomes within a study can be pooled by taking the mean weighted by the sample size. Another consideration is inconsistency between effect size units reported on the absolute scale; for example, protein concentrations can be reported as g/cell, mol/cell, g/g wet tissue or g/g dry tissue. In such cases, conversion to a common representation is required for comparison across studies, for which appropriate experimental parameters and calibrations need to be extracted from the studies. While some parameters can be approximated by reviewers, such as cell-related parameters found in the BioNumbers database (Milo et al., 2010 ) and equipment-related parameters presumed from manufacturer manuals, reviewers should exercise caution when making such approximations, as they can introduce systematic errors that manifest throughout the analysis.
When data conversion is judged to be difficult but negative/basal controls are available, scale-free measures (i.e., normalized, standardized, or ratio effects) can still be used in the meta-analysis without the need to convert effects to common units on the absolute scale. In many cases, reviewers may only be able to decide on a suitable effect size measure after data extraction is complete.
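For instance, a conversion to a common absolute scale might look like the following sketch, which converts an ATP amount reported per cell from mol/cell to g/cell using the molar mass of ATP (approximately 507.2 g/mol, a textbook value). Any other calibration parameters would have to be extracted from the individual studies; the function name and example value are invented for illustration:

```python
# Sketch: convert study-level outcomes reported in mol/cell to a common
# g/cell scale via molar mass. The molar mass of ATP (~507.2 g/mol) is a
# textbook value; study-specific calibrations must come from each study.

ATP_MOLAR_MASS = 507.2  # g/mol (approximate)

def mol_per_cell_to_g_per_cell(amount_mol, molar_mass=ATP_MOLAR_MASS):
    """Convert an amount per cell from moles to grams."""
    return amount_mol * molar_mass

# e.g., a hypothetical study reporting 5 fmol ATP per cell:
grams = mol_per_cell_to_g_per_cell(5e-15)  # g/cell
```

Documenting every such conversion factor alongside the extracted data makes the systematic-error audit described above feasible.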

It is regrettably common to encounter unclear or incomplete reporting, especially of sample sizes and uncertainties. Reviewers may choose to reject studies with such problems due to quality concerns, or to employ conservative assumptions to estimate the missing data. For example, if it is unclear whether a study reports the standard deviation or the standard error of the mean, the value can be assumed to be a standard error, which yields the more conservative estimate. If a study does not report uncertainties but is deemed important because it focuses on a rare phenomenon, imputation methods have been proposed to estimate the uncertainty terms (Chowdhry et al., 2016 ). If a study reports a range of sample sizes, reviewers should extract the lowest value. Strategies to handle missing data should be pre-defined and thoroughly documented.
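The conservative rules above can be encoded directly. This sketch (the rule set and names are illustrative, not from MetaLab) treats an ambiguous dispersion value as a standard error, converting it to the larger implied standard deviation, and takes the lowest reported sample size:

```python
# Sketch of conservative handling of ambiguous reporting: an ambiguous
# dispersion value is assumed to be a standard error, so the implied
# standard deviation sd = se * sqrt(n) is the larger (more conservative)
# interpretation; a reported range of sample sizes yields its minimum.
import math

def conservative_sd(dispersion, n, is_ambiguous):
    """Return a standard deviation, assuming SE when the label is unclear."""
    return dispersion * math.sqrt(n) if is_ambiguous else dispersion

def conservative_n(n_range):
    """Extract the lowest value from a reported range of sample sizes."""
    return min(n_range)
```

Whatever the exact rules, pre-registering them (and flagging every imputed value in the extraction sheet) keeps the analysis auditable.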

In addition to identifying relevant primary parameters, a priori defined study-level characteristics that have a potential to influence the outcome, such as species, cell type, specific methodology, should be identified and collected in parallel to data extraction. This information is valuable in subsequent exploratory analyses and can provide insight into influential factors through between-study comparison.

Quality Assessment

Formal quality assessment allows the reviewer to appraise the quality of identified studies and to make informed and methodical decisions regarding the exclusion of poorly conducted studies. In general, based on an initial evaluation of the full texts, each study is scored to reflect its overall quality and scientific rigor. Several quality-related characteristics have been described (Sena et al., 2007 ), such as: (i) publication in a peer-reviewed journal, (ii) complete statistical reporting, (iii) randomization of treatment or control, (iv) blinded analysis, (v) sample size calculation prior to the experiment, (vi) investigation of a dose-response relationship, and (vii) a statement of compliance with regulatory requirements. We also suggest that reviewers of basic research studies assess (viii) objective alignment between the study in question and the meta-analytic project. This involves noting whether the outcome of interest was the primary study objective or was reported as a supporting or secondary outcome, which may not receive the same experimental rigor and is subject to expectation bias (Sheldrake, 1997 ). Additional quality criteria specific to the experimental design may be included at the discretion of the reviewer. Once study scores have been assembled, study-level aggregate quality scores are determined by summing the number of satisfied criteria, and then evaluating how outcome estimates and heterogeneity vary with study quality. Significant variation arising from poorer quality studies may justify study omission in subsequent analysis.
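A minimal sketch of the aggregate scoring described above, assuming each criterion (i)-(viii) is recorded as a boolean; the key names are invented for illustration:

```python
# Sketch: aggregate a study-level quality score by summing satisfied
# checklist criteria (i)-(viii) described in the text. Keys are
# illustrative labels, one per criterion.

CRITERIA = [
    "peer_reviewed", "complete_stats", "randomized", "blinded",
    "a_priori_power", "dose_response", "regulatory_compliance",
    "objective_alignment",
]

def quality_score(study):
    """Count satisfied criteria; higher scores indicate greater rigor."""
    return sum(bool(study.get(c)) for c in CRITERIA)
```

Scores computed this way can then be used as a grouping or regression covariate when checking whether outcome estimates shift with study quality.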

Extraction of Tabular and Graphical Data

The next step is to compile the meta-analytic data set, which reviewers will use in subsequent analysis. For each study, the complete dataset needs to be extracted, including the parameters required to estimate the target outcome, study characteristics, and any data necessary for unit conversion. Data in basic research are commonly reported in tabular or graphical form. Reviewers can accurately extract tabular data from the text or tables. However, graphical data must often be extracted from the graph directly using time-consuming and error-prone methods. The Data Extraction Module in MetaLab was developed to facilitate systematic and unbiased data extraction; reviewers provide study figures as inputs, then specify reference points that are used to calibrate the axes and extract the data ( Figures 4A,B ).
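The axis-calibration idea can be illustrated with a small sketch (pixel coordinates and reference values are invented; this is not MetaLab's internal implementation): two reference points per axis define a linear pixel-to-data mapping, which is then applied to each marked data point:

```python
# Sketch of axis calibration for graphical data extraction: given two
# reference points per axis (pixel coordinate, data value), map a marked
# pixel to data coordinates by linear interpolation. A log-scaled axis
# would need the same mapping applied to log-transformed values.

def make_axis(px0, val0, px1, val1):
    """Return a pixel -> data-value mapping for one linear axis."""
    scale = (val1 - val0) / (px1 - px0)
    return lambda px: val0 + (px - px0) * scale

# Invented calibration: x pixels 100..500 span 0..10 units;
# y pixels 400..80 span 0..50 units (image y decreases upward).
x_axis = make_axis(100, 0.0, 500, 10.0)
y_axis = make_axis(400, 0.0, 80, 50.0)

point = (x_axis(300), y_axis(240))  # a marker clicked at pixel (300, 240)
```

Error bars are digitized the same way, as a second pixel per point whose mapped distance from the marker gives the uncertainty.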

Figure 4. MetaLab data extraction is accurate, unbiased, and robust to the quality of data presentation. (A,B) Example of graphical data extraction using MetaLab. (A) Original figure (Bodin et al., 1992 ) with axes, data points, and corresponding errors marked by the reviewer. (B) Extracted data with error terms. (C–F) Validation of the MetaLab data-extraction module. (C) Synthetic datasets were constructed using randomly generated data coordinates and marker sizes. (D) Extracted values were consistent with true values, evaluated by linear regression with slope β_slope; red line: line of equality. (E) Data extraction was unbiased, evaluated from the distribution of percent errors between true and extracted values. E_mean, E_median, E_min, and E_max are the mean, median, minimum, and maximum % error, respectively. (F) The absolute errors of extracted data were independent of data marker size; red line: linear regression with slope β_slope.

To validate the performance of the MetaLab Data Extraction Module, we generated figures using 319 synthetic data points plotted with varying marker sizes ( Figure 4C ). Extracted and actual values were correlated ( R² = 0.99), with the slope of the relationship estimated as 1.00 (95% CI: 0.99 to 1.01) ( Figure 4D ). Bias was absent, with a mean percent error of 0.00% (95% CI: −0.02 to 0.02%) ( Figure 4E ). The narrow range of errors between −2.00 and 1.37%, and the consistency between the median and mean error, indicated no skewness. Data marker size did not contribute to the extraction error, as 0.00% of the variation in absolute error was explained by marker size, and the slope of the relationship between marker size and extraction error was 0.000 (95% CI: −0.001, 0.002) ( Figure 4F ). These data demonstrate that graphical data can be reliably extracted using MetaLab.

Extracting Data From Complex Relationships

Basic science often focuses on natural processes and phenomena characterized by complex relationships between a series of inputs (e.g., exposures) and outputs (e.g., responses). The results are commonly explained by an accepted model of the relationship, such as the Michaelis-Menten model of enzyme kinetics, which involves two parameters: V_max, the maximum rate, and K_m, the substrate concentration at which the rate is half of V_max. For meta-analysis, the model parameters characterizing complex relationships are of interest, as they allow direct comparison of different multi-observational datasets. However, study-level outcomes for complex relationships often (i) lack consistency in reporting, and (ii) lack estimates of uncertainty for the model parameters. Therefore, reviewers wishing to perform a meta-analysis of complex relationships may need to fit study-level data to a unified model y = f(x, β) to estimate the parameter set β characterizing the relationship ( Table 1 ) and to assess the uncertainty in β.

Commonly used models of complex relationships in basic sciences.

The study-level data can be fitted to a model using conventional fitting methods, in which the model parameter error terms depend on the goodness of fit and the number of available observations. Alternatively, a Monte Carlo simulation approach (Cox et al., 2003 ) allows the propagation of study-level variances (uncertainty in the model inputs) to the uncertainty in the model parameter estimates ( Figure 5 ). Suppose that study i reported a set of k predictor variables x = {x_j | 1 ≤ j ≤ k} for a set of outcomes θ = {θ_j | 1 ≤ j ≤ k}, with a corresponding set of standard deviations sd(θ) = {sd(θ_j) | 1 ≤ j ≤ k} and sample sizes n = {n_j | 1 ≤ j ≤ k} ( Figure 5A ). The Monte Carlo error propagation method assumes that outcomes are normally distributed, enabling pseudo-random observations to be sampled from a distribution approximated by N(θ_j, sd(θ_j)²). The pseudo-random observations are then averaged to obtain a Monte-Carlo estimate θ_j* for each observation such that

θ_j* = (1/n_j) Σ_(m=1..n_j) θ_(j,m)*

Figure 5. Model parameter estimation with the Monte-Carlo error propagation method. (A) Study-level data taken from the ATP release meta-analysis. (B) Assuming a sigmoidal model, parameters were estimated using the MetaLab Fit Model module by randomly sampling data from distributions defined by the study-level data. Model parameters were estimated for each set of sampled data. (C) Final model using parameters estimated from 400 simulations. (D) Distributions of parameters estimated for the given dataset are unimodal and symmetrical.

where θ_(j,m)* represents a pseudo-random variable sampled n_j times from N(θ_j, sd(θ_j)²). The relationship between x and θ* = {θ_j* | 1 ≤ j ≤ k} is then fitted with the model of interest using the least-squares method to obtain an estimate of the model parameters β ( Figure 5B ). After many iterations of resampling and fitting, a distribution of parameter estimates N(β̄, sd(β̄)²) is obtained, from which the parameter means β̄ and variances sd(β̄)² can be estimated ( Figures 5C,D ). As the number of iterations M tends to infinity, the parameter estimates converge to the expected value E(β).

It is critical for reviewers to ensure that the data are consistent with the model, such that the estimated parameters sufficiently capture the information conveyed in the underlying study-level data. In general, reliable model fittings are characterized by normal parameter distributions ( Figure 5D ) and a high goodness of fit as quantified by R². The advantage of the Monte-Carlo approach is that it works as a black-box procedure that does not require complex error propagation formulas, thus allowing correlated and independent parameters to be handled without additional consideration.
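The resampling-and-refitting loop can be sketched end to end as follows. MetaLab itself is a MATLAB package; this Python stand-in fits a Michaelis-Menten model by Lineweaver-Burk linearization rather than nonlinear least squares, and the study-level dataset is invented from V_max = 10 and K_m = 2:

```python
# Sketch of Monte-Carlo error propagation: resample each observation
# n_j times from N(theta_j, sd_j^2), average, refit the model, and
# repeat to build distributions of the model parameters.
import random, statistics

def fit_mm(x, v):
    # Lineweaver-Burk: regress 1/v on 1/x; intercept = 1/Vmax, slope = Km/Vmax.
    xs = [1.0 / xi for xi in x]
    ys = [1.0 / vi for vi in v]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
            sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    vmax = 1.0 / intercept
    return vmax, slope * vmax  # (Vmax, Km)

def monte_carlo_fit(x, mean, sd, n, iters=400, seed=1):
    rng = random.Random(seed)
    vmaxs, kms = [], []
    for _ in range(iters):
        # Monte-Carlo estimate theta_j*: average of n_j pseudo-random draws.
        v_star = [statistics.fmean(rng.gauss(m, s) for _ in range(nj))
                  for m, s, nj in zip(mean, sd, n)]
        vm, km = fit_mm(x, v_star)
        vmaxs.append(vm)
        kms.append(km)
    return (statistics.fmean(vmaxs), statistics.stdev(vmaxs),
            statistics.fmean(kms), statistics.stdev(kms))

# Invented study-level dataset generated from Vmax = 10, Km = 2:
x = [0.5, 1, 2, 4, 8, 16]
mean = [10 * xi / (2 + xi) for xi in x]
sd = [0.1] * len(x)
n = [5] * len(x)
vmax, vmax_sd, km, km_sd = monte_carlo_fit(x, mean, sd, n)
```

The parameter standard deviations returned here play the role of sd(β̄) above; inspecting the histograms of `vmaxs` and `kms` is the practical check for the unimodal, symmetric distributions that indicate a reliable fit.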

Study-Level Effect Sizes

Depending on the purpose of the review, study-level outcomes θ_i can be expressed as one of several effect size measures. The absolute effect size, computed as a mean outcome or an absolute difference from baseline, is the simplest, is independent of variance, and retains information about the context of the data (Baguley, 2009 ). However, the use of the absolute effect size requires authors to report on a common scale or provide conversion parameters. In cases where a common scale is difficult to establish, a scale-free measure, such as a standardized, normalized or relative measure, can be used. Standardized mean differences, such as Hedges' g or Cohen's d, report the outcome as the size of the effect (the difference between the means of the experimental and control groups) relative to the overall variance (the pooled and weighted standard deviation of the combined experimental and control groups). The standardized mean difference, in addition to odds or risk ratios, is widely used in meta-analyses of clinical studies (Vesterinen et al., 2014 ), since it allows reviewers to summarize metrics that do not have a unified meaning (e.g., a pain score) and takes into account the variability in the samples. However, the standardized measure is rarely used in basic science, since study outcomes are commonly a defined measure, sample sizes are small, and variances are highly influenced by experimental and biological factors. Other measures more suited to basic science are the normalized mean difference, which expresses the difference between the outcome and baseline as a proportion of the baseline (alternatively called the percentage difference), and the response ratio, which reports the outcome as a proportion of the baseline. All of the discussed measures have been included in MetaLab ( Table 2 ).

Types of effect sizes.

Provided are formulas to calculate the mean and standard error for the specified effect sizes.
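Two of the effect sizes above can be computed as in the following sketch. The formulas (percent change from control, and Hedges' g with the usual small-sample correction J = 1 − 3/(4·df − 1)) follow standard meta-analytic references and should be checked against Table 2 before use:

```python
# Sketch: two scale-free effect sizes discussed in the text.
import math

def normalized_mean_difference(m_r, m_c):
    """Percent difference of the intervention mean from the control mean."""
    return 100.0 * (m_r - m_c) / m_c

def hedges_g(m_r, sd_r, n_r, m_c, sd_c, n_c):
    """Standardized mean difference with small-sample bias correction."""
    df = n_r + n_c - 2
    s_pooled = math.sqrt(((n_r - 1) * sd_r**2 + (n_c - 1) * sd_c**2) / df)
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    return j * (m_r - m_c) / s_pooled
```

For small basic-science samples the correction factor J matters: with n_r = n_c = 10 it shrinks the uncorrected standardized difference by roughly 4%.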

Data Synthesis

The goal of any meta-analysis is to provide an outcome estimate that is representative of all study-level findings. One important feature of meta-analysis is its ability to incorporate information about the quality and reliability of the primary studies by weighting larger, better reported studies more heavily. The two quantities of interest are the overall estimate and the measure of the variability in this estimate. Study-level outcomes θ_i are synthesized as a weighted mean θ̂ according to the study-level weights w_i:

θ̂ = Σ_(i=1..N) w_i θ_i / Σ_(i=1..N) w_i     (Equation 3)

where N is the number of studies or datasets. The choice of a weighting scheme dictates how study-level variances are pooled to estimate the variance of the weighted mean. The weighting scheme thus significantly influences the outcome of the meta-analysis and, if poorly chosen, risks over-weighting less precise studies and generating a less valid, non-generalizable outcome. Thus, the notion of defining an a priori analysis protocol has to be balanced with the need to ensure that the dataset is compatible with the chosen analytic strategy, which may be uncertain prior to data extraction. We provide strategies to compute and compare different study-level and global outcomes and their variances.

Weighting Schemes

To generate valid estimates of cumulative knowledge, studies are weighted according to their reliability. This conceptual framework, however, deteriorates if the reported measures of precision are themselves flawed. The most commonly used measure of precision is the inverse variance, which is a composite measure of total variance and sample size, such that studies with larger sample sizes and lower experimental errors are considered more reliable and are weighted more heavily. Inverse variance weighting schemes are valid when (i) the sampling error is random, (ii) the reported effects are homoscedastic, i.e., have equal variance, and (iii) the sample size reflects the number of independent experimental observations. When assumptions (i) or (ii) are violated, sample-size weighting can be used as an alternative. Despite sample size and sample variance being such critical parameters in the estimation of the global outcome, they are often subject to deficient reporting practices.

Potential problems with sample variance and sample size

The standard error se(θ_i) is required to compute inverse variance weights; however, the primary literature, as well as meta-analysis reviewers, often confuse standard errors with standard deviations sd(θ_i) (Altman and Bland, 2005 ). Additionally, many assays used in basic research have uneven error distributions, such that the variance component arising from experimental error depends on the magnitude of the effect (Bittker and Ross, 2016 ). Such uneven error distributions lead to biased weighting that does not reflect the true precision of the measurement. Fortunately, the standard error and standard deviation have characteristic properties that can be assessed by the reviewer to determine whether inverse variance weights are appropriate for a given dataset. The study-level standard error se(θ_i) is a measure of precision and is estimated as the product of the sample standard deviation sd(θ_i) and the factor 1/√n_i for study i. Therefore, the standard error is expected to be approximately inversely proportional to the square root of the study-level sample size n_i.

Unlike the standard error, the standard deviation, a measure of the variance of a random variable sd(θ)², is assumed to be independent of the sample size because it is a descriptive statistic rather than a precision statistic. Since the total observed study-level sample variance is the sum of natural variability (assumed to be constant for a phenomenon) and random error, no relationship is expected between reported standard deviations and sample sizes. These assumptions can be tested by correlation analysis and can be used to inform the reviewer about the reliability of the study-level uncertainty measures. For example, a relationship between sample size and sample variance was observed for the OB[ATP]ic dataset ( Figure 6A ), but not for the ATP release data ( Figure 6B ). Therefore, in the case of the OB[ATP]ic data set, lower variances are not associated with higher precision, and inverse variance weighting is not appropriate. Sample sizes are also frequently misrepresented in the basic sciences, as experimental replicates and repeated experiments are often reported interchangeably (incorrectly) as sample sizes (Vaux et al., 2012 ). Repeated (independent) experiments refer to the number of randomly sampled observations, while replicates refer to repeated measurements of a sample from one experiment to improve measurement precision. Statistical inference theory assumes random sampling, which is satisfied by independent experiments but not by replicate measurements. Misrepresentative reporting of replicates as the sample size may artificially inflate the reliability of results. While this is difficult to identify, poor reporting may be reflected in the overall quality score of a study.
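The correlation check described above can be sketched as follows. The study-level variances and sample sizes are invented to show the suspicious pattern seen in the OB[ATP]ic data set, where reported variance shrinks as sample size grows:

```python
# Sketch of the reliability check: correlate study-level variances
# sd(theta_i)^2 with sample sizes n_i. A clear association argues
# against inverse-variance weighting. Data values are invented.
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

n = [3, 4, 6, 8, 12, 20]              # study-level sample sizes
var = [9.0, 7.5, 6.1, 5.0, 3.8, 2.2]  # sd^2, shrinking with n (suspicious)
r = pearson_r(n, var)                 # strongly negative for this dataset
```

A near-zero correlation (as in the ATP release data set) is consistent with reliably reported descriptive statistics; a strong association, as simulated here, suggests switching to sample-size weighting.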

Figure 6. Assessment of study-level outcomes. (A,B) Reliability of study-level error measures. Study-level squared standard deviations sd(θ_i)² and sample sizes n_i are assumed to be independent when reliably reported. An association between sd(θ_i)² and n_i was present in the OB[ATP]ic data set (A) and absent in the ATP release data set (B); red line: linear regression. (C,D) Distributions of study-level outcomes. Assessment of unweighted (UW, black) and weighted (fixed effect, FE, blue; random effects, RE, red; sample-size weighting, N, green) study-level distributions of data from the OB[ATP]ic (C) and ATP release (D) data sets, before (left) and after (right) log10 transformation. Heterogeneity was quantified by the Q, I², and H² statistics. (E,F) After log10 transformation, the H² heterogeneity statistic increased for the OB[ATP]ic data set (E) and decreased for the ATP release data set (F).

Inverse variance weighting

The inverse variance is the most common measure of precision, representing a composite measure of total variance and sample size. The most widely used weighting schemes based on the inverse variance are the fixed effect and random effects meta-analytic models. The fixed effect model assumes that all studies sample one true effect γ. The observed outcome θ_i for study i is then a function of a within-study error ε_i, θ_i = γ + ε_i, where ε_i is normally distributed, ε_i ~ N(0, se(θ_i)²). The standard error se(θ_i) is calculated from the sample standard deviation sd(θ_i) and sample size n_i as:

se(θ_i) = sd(θ_i) / √n_i

Alternatively, the random effects model supposes that each study samples a different true outcome μ i , such that the combined effect μ is the mean of a population of true effects. The observed effect θ i for study i is then influenced by the intrastudy error ε i and interstudy error ξ i , θ i = μ i + ε i + ξ i , where ξ i is also assumed to be normally distributed ξ i ~ N (0, τ 2 ), with τ 2 representing the extent of heterogeneity, or between-study (interstudy) variance.

Study-level estimates for a fixed effect or random effects model are weighted using the inverse variance:

w_i = 1 / se(θ_i)² (fixed effect), or w_i = 1 / (se(θ_i)² + τ²) (random effects)

These weights are used to calculate the global outcome θ̂ (Equation 3) and the corresponding standard error se(θ̂):

se(θ̂) = √(1 / Σ_{i=1}^{N} w_i)

where N = number of datasets/studies. In practice, random effects models are favored over the fixed effect model, due to the prevalence of heterogeneity in experimental methods and biological outcomes. However, when there is no between-study variability (τ 2 = 0), the random effects model reduces to a fixed effect model. In contrast, when τ 2 is exceedingly large and interstudy variance dominates the weighting term [ τ 2 ≫ s e ( θ i ) 2 ] , random effects estimates will tend to an unweighted mean.
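As an illustrative sketch (hypothetical numbers, not MetaLab code), the fixed and random effects estimates differ only in whether τ² enters the weights:

```python
import math

def weighted_estimate(theta, se, tau2=0.0):
    """Inverse-variance weighted global estimate and its standard error.
    tau2 = 0 gives the fixed effect model; tau2 > 0 the random effects model."""
    w = [1.0 / (s ** 2 + tau2) for s in se]
    theta_hat = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
    se_hat = math.sqrt(1.0 / sum(w))
    return theta_hat, se_hat

# Hypothetical study-level outcomes and standard errors
theta_i = [1.2, 0.8, 1.0]
se_i = [0.2, 0.4, 0.1]
fe, se_fe = weighted_estimate(theta_i, se_i)             # fixed effect
re, se_re = weighted_estimate(theta_i, se_i, tau2=0.05)  # random effects
```

As τ² grows large relative to the squared standard errors, the weights approach equality and the random effects estimate tends toward the unweighted mean, as noted in the text.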

Interstudy variance τ² estimators. Under the assumptions of a random effects model, the total variance is the sum of the intrastudy variance (experimental sampling error) and the interstudy variance τ² (variability of true effects). Since the distribution of true effects is unknown, we must estimate the value of τ² from study-level outcomes (Borenstein, 2009 ). The DerSimonian and Laird (DL) method is the most commonly used in meta-analyses (DerSimonian and Laird, 1986 ). Other estimators, such as the Hunter-Schmidt (Hunter and Schmidt, 2004 ), Hedges (Hedges and Olkin, 1985 ), Hartung-Makambi (Hartung and Makambi, 2002 ), Sidik-Jonkman (Sidik and Jonkman, 2005 ), and Paule-Mandel (Paule and Mandel, 1982 ) estimators, have been proposed as alternatives or improvements over the DL estimator (Sanchez-Meca and Marin-Martinez, 2008 ) and have been implemented in MetaLab ( Table 3 ). Negative values of τ² are truncated at zero. An overview of the various τ² estimators, along with recommendations on their use, can be found elsewhere (Veroniki et al., 2016 ).
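A minimal sketch of the DerSimonian-Laird estimator (the standard method-of-moments formula, with hypothetical data; not the MetaLab implementation):

```python
def tau2_dl(theta, se):
    """DerSimonian-Laird estimate of the between-study variance tau^2.
    Negative estimates are truncated at zero, as described in the text."""
    w = [1.0 / s ** 2 for s in se]                       # fixed effect weights
    theta_fe = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
    q = sum(wi * (ti - theta_fe) ** 2 for wi, ti in zip(w, theta))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(theta) - 1)) / c)
```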

Interstudy variance estimators.

N = number of datasets/studies .

Sample-size weighting

Sample-size weighting is preferred in cases where variance estimates are unavailable or unreliable. Under this weighting scheme, study-level sample sizes are used in place of inverse variances as weights. The sampling error is then unaccounted for; however, since sampling error is random, larger sample sizes will effectively average out the error and produce more dependable results. This is contingent on reliable reporting of sample sizes, which is difficult to assess and can be erroneous, as detailed above. For a sample-size weighted estimate, study-level sample sizes n_i replace the weights used to calculate the global effect size θ̂, such that

θ̂ = (Σ_{i=1}^{N} n_i θ_i) / (Σ_{i=1}^{N} n_i)

The pooled standard error s e ( θ ^ ) for the global effect is then:

While sample-size weighting is less affected by sampling variance, the performance of this estimator depends on the number of available studies (Marin-Martinez and Sanchez-Meca, 2010 ). When variances are reliably reported, sample-size weights should roughly correspond to inverse variance weights under the fixed effect model.
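The sample-size weighted global effect from the formula above can be sketched in one line (hypothetical numbers):

```python
def n_weighted_estimate(theta, n):
    """Sample-size weighted global effect: weights w_i = n_i."""
    return sum(ni * ti for ni, ti in zip(n, theta)) / sum(n)

# Hypothetical outcomes with sample sizes; the larger study dominates
theta_i = [2.0, 1.0]
n_i = [30, 10]
global_effect = n_weighted_estimate(theta_i, n_i)
```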

Meta-Analytic Data Distributions

One important consideration the reviewer should attend to is the normality of the study-level effects distributions assumed by most meta-analytic methods. Non-parametric methods that do not assume normality are available but are more computationally intensive and inaccessible to non-statisticians (Karabatsos et al., 2015 ). The performance of parametric meta-analytic methods has been shown to be robust to non-normally distributed effects (Kontopantelis and Reeves, 2012 ). However, this robustness is achieved by deriving artificially high estimates of heterogeneity for non-normally distributed data, resulting in conservatively wide confidence intervals and severely underpowered results (Jackson and Turner, 2017 ). Therefore, it is prudent to characterize the underlying distribution of study-level effects and perform transformations to normalize distributions to preserve the inferential integrity of the meta-analysis.

Assessing data distributions

Graphical approaches, such as the histogram, are commonly used to assess the distribution of data; however, in a meta-analysis, an unweighted histogram can misrepresent the true distribution of effect sizes because unequal weights are assigned to each study. To address this, we can use a weighted histogram to evaluate effect size distributions ( Figure 6 ). A weighted histogram can be constructed by first binning studies according to their effect sizes. Each bin is then assigned a weighted frequency, calculated as the sum of the study-level weights within the given bin, normalized by the sum of all weights across all bins:

P_j = (Σ_i w_ij) / (Σ_{j=1}^{nBins} Σ_i w_ij)

where P_j is the weighted frequency for bin j, w_ij is the weight for the effect size in bin j from study i, and nBins is the total number of bins. If the distribution is found to deviate from normality, the most common explanations are that (i) the distribution is skewed due to inconsistencies between studies, (ii) subpopulations exist within the dataset, giving rise to multimodal distributions, or (iii) the studied phenomenon is not normally distributed. The sources of inconsistency and multimodality can be explored during the analysis of heterogeneity (i.e., to determine whether study-level characteristics can explain observed discrepancies). Skewness may, however, be inherent to the data when values are small, variances are large, and values cannot be negative (Limpert et al., 2001 ), and has been described as characteristic of natural processes (Grönholm and Annila, 2007 ). For sufficiently large sample sizes, the central limit theorem holds that the means of skewed data are approximately normally distributed. However, due to the common limitation in the number of studies available for meta-analyses, meta-analytic global estimates of skewed distributions are often sensitive to extreme values. In these cases, data transformation can be used to achieve a normal distribution on the logarithmic scale (i.e., a lognormal distribution).
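The weighted histogram construction described above can be sketched as follows (equal-width bins; hypothetical data, not MetaLab code):

```python
def weighted_histogram(theta, w, n_bins):
    """Weighted frequencies P_j: bin the effects, sum the study-level
    weights per bin, and normalize by the total weight across all bins."""
    lo, hi = min(theta), max(theta)
    width = (hi - lo) / n_bins or 1.0                # guard against all-equal effects
    freq = [0.0] * n_bins
    for t, wi in zip(theta, w):
        j = min(int((t - lo) / width), n_bins - 1)   # clamp the maximum into the last bin
        freq[j] += wi
    total = sum(freq)
    return [f / total for f in freq]
```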

Lognormal distributions

Since meta-analytic methods typically assume normality, the log transformation is a useful tool for normalizing skewed distributions ( Figures 6C–F ). In the ATP release dataset, we found that log transformation normalized the data distribution. However, in the case of the OB [ATP] ic dataset, log transformation revealed a bimodal distribution that was otherwise not obvious on the raw scale.

Data normalization by log transformation allows meta-analytic techniques to maintain their inferential properties. The outcomes synthesized on the logarithmic scale can then be transformed back to the original raw scale to obtain asymmetrical confidence intervals, which further accommodate the skew in the data. Study-level effect sizes θ_i can be related to the logarithmic mean Θ_i through the forward log transformation, meta-analyzed on the logarithmic scale, and back-transformed to the original scale using one of the back-transformation methods ( Table 4 ). We have implemented three different back-transformation methods in MetaLab: the geometric approximation (anti-log), the naïve approximation (rearrangement of the forward-transformation method) and the Taylor series approximation (Higgins et al., 2008 ). The geometric back-transformation yields an estimate of θ̂ that is approximately equal to the median of the study-level effects. The naïve and Taylor series approximations differ in how the standard error is approximated, which is used to obtain a point estimate on the original raw scale. The naïve and Taylor series approximations were shown to maintain adequate inferential properties in the meta-analytic context (Higgins et al., 2008 ).
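A sketch of the geometric (anti-log) back-transformation for log10-synthesized outcomes; the resulting CI is asymmetric on the raw scale. (The naïve and Taylor series variants differ only in how the raw-scale standard error is approximated, and are not shown here.)

```python
def backtransform_geometric(theta_log, se_log, v=1.96):
    """Anti-log back-transformation of a log10-scale estimate.
    Returns a point estimate (approximately the median effect) and an
    asymmetric confidence interval on the raw scale."""
    point = 10.0 ** theta_log
    ci = (10.0 ** (theta_log - v * se_log), 10.0 ** (theta_log + v * se_log))
    return point, ci
```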

Logarithmic Transformation Methods.

Forward-transformation of study-level estimates θ i to corresponding log-transformed estimates Θ i , and back-transformation of meta-analysis outcome Θ ^ to the corresponding outcome θ ^ on the raw scale (Higgins et al., 2008 ). v 1−α/2 : confidence interval critical value at significance level α .

Confidence Intervals

Once the meta-analysis global estimate and standard error have been computed, reviewers may proceed to construct the confidence interval (CI). The CI represents the range of values within which the true mean outcome is contained with probability 1 − α. In meta-analyses, the CI conveys information about the significance, magnitude and direction of an effect, and is used for inference and generalization of an outcome. Values that do not fall within the CI may be interpreted as significantly different. In general, the CI is computed from the product of the standard error se(θ̂) and the critical value v_{1−α/2}:

CI = θ̂ ± v_{1−α/2} · se(θ̂)

CI estimators

The critical value v_{1−α/2} is derived from a theoretical distribution and represents the significance threshold for level α. A theoretical distribution describes the probability of each possible outcome of a phenomenon; extreme outcomes that lie furthest from the mean form the tails. The most commonly used theoretical distributions are the z-distribution and the t-distribution, which are both symmetrical and bell-shaped but differ in how far-reaching or “heavy” their tails are. Heavier tails result in larger critical values, which translate to wider confidence intervals, and vice versa. Critical values drawn from a z-distribution, known as z-scores (z), are used when data are normal and a sufficiently large number of studies is available (>30). The tails of a z-distribution are independent of the sample size and reflect those expected for a normal distribution. Critical values drawn from a t-distribution, known as t-scores (t), also assume normally distributed data, but are used when fewer studies are available (<30) because the t-distribution tails are heavier. This produces more conservative (wider) CIs, which help ensure that the data are not misleading or misrepresentative when limited evidence is available. The heaviness of the t-distribution tails is dictated by the degrees of freedom df, which are related to the number of available studies N (df = N − 1), such that fewer studies result in heavier t-distribution tails and therefore larger critical values. Importantly, the t-distribution is asymptotically normal and thus converges to a z-distribution for a sufficiently large number of studies, yielding similar critical values. For example, for a significance level α = 0.05 (5% false positive rate), the z-distribution always yields a critical value v = 1.96, regardless of how many studies are available. The t-distribution, however, yields v = 2.78 for 5 studies, v = 2.26 for 10 studies, v = 2.05 for 30 studies and v = 1.98 for 100 studies, gradually converging to 1.96 as the number of studies increases. We have implemented the z-distribution and t-distribution CI estimators in MetaLab.
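The z-distribution critical value can be computed with the Python standard library; a sketch (the t-distribution quantile is not in the standard library and would require a statistics package such as scipy):

```python
from statistics import NormalDist

def z_critical(alpha=0.05):
    """Two-sided z-distribution critical value v_{1-alpha/2}."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)
```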

Evaluating Meta-Analysis Performance

In general, 95% of study-level outcomes are expected to fall within the range of the 95% global CI. To determine whether the global 95% CI is consistent with the underlying study-level outcomes, the coverage of the CI can be computed as the proportion of study-level 95% CIs that overlap with the global 95% CI:

coverage = (number of study-level 95% CIs overlapping the global 95% CI) / N × 100%

The coverage is a performance measure used to determine whether inference made on the study-level is consistent with inference made on the meta-analytic level. Coverage that is less than expected for a specified significance level (i.e., <95% coverage for α = 0.05) may be indicative of inaccurate estimators, excessive heterogeneity or inadequate choice of meta-analytic model, while coverage exceeding 95% may indicate an inefficient estimator that results in insufficient statistical power.
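This coverage check reduces to counting interval overlaps; a minimal sketch with hypothetical intervals:

```python
def coverage(study_cis, global_ci):
    """Percentage of study-level CIs (lo, hi) that overlap the global CI."""
    g_lo, g_hi = global_ci
    hits = sum(1 for lo, hi in study_cis if hi >= g_lo and lo <= g_hi)
    return 100.0 * hits / len(study_cis)
```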

Overall, the performance of a meta-analysis is heavily influenced by the choice of weighting scheme and data transformation ( Figure 7 ). This is especially evident in smaller datasets, such as our OB [ATP] ic example, where both the global estimates and the confidence intervals differ dramatically under different weighting schemes ( Figure 7A ). Working with larger datasets, such as ATP release kinetics, somewhat reduces the influence of the assumed model ( Figure 7B ). However, normalizing the data distribution by log transformation produces much more consistent outcomes across weighting schemes for both datasets, regardless of the number of available studies ( Figures 7A,B , log 10 synthesis ).


Comparison of global effect estimates using different weighting schemes. (A,B) Global effect estimates for OB [ATP] ic (A) and ATP release (B) following synthesis of original data (raw, black ) or of log 10 -transformed data followed by back-transformation to original scale (log 10 , gray ). Global effects ± 95% CI were obtained with unweighted data (UW), or using fixed effect (FE), random effects (RE), and sample-size ( n ) weighting schemes.

Analysis of Heterogeneity

Heterogeneity refers to inconsistency between studies. A large part of conducting a meta-analysis involves quantifying and accounting for sources of heterogeneity that may compromise the validity of the meta-analysis. Basic research meta-analytic datasets are expected to be heterogeneous because (i) basic research literature searches tend to retrieve more studies than clinical literature searches and (ii) experimental methodologies used in basic research are more diverse and less standardized than in clinical research. The presence of heterogeneity may limit the generalizability of an outcome due to the lack of study-level consensus. Nonetheless, exploring the sources of heterogeneity can be insightful for the field in general, as it can identify biological or methodological factors that influence the outcome.

Quantifying Heterogeneity

Higgins and Thompson emphasized that a heterogeneity metric should be (i) dependent on the magnitude of heterogeneity, (ii) independent of measurement scale, (iii) independent of sample size and (iv) easily interpretable (Higgins and Thompson, 2002 ). Regrettably, the most commonly used test of heterogeneity is Cochran's Q test (Borenstein, 2009 ), which has been repeatedly shown to have undesirable statistical properties (Higgins et al., 2003 ). Nonetheless, we introduce it here, not because of its widespread use, but because it is an intermediary statistic used to obtain more useful measures of heterogeneity, H² and I². The measure of total variation, the Q_total statistic, is calculated as the sum of the weighted squared differences between the study-level means θ_i and the fixed effect estimate θ̂_FE:

Q_total = Σ_{i=1}^{N} w_i (θ_i − θ̂_FE)²

The Q_total statistic is compared to a chi-square (χ²) distribution (df = N − 1) to obtain a p-value which, if significant, supports the presence of heterogeneity. However, the Q-test has been shown to be inadequately powered when the number of studies is too low (N < 10) and excessively powered when the number of studies is too high (N > 50) (Gavaghan et al., 2000 ; Higgins et al., 2003 ). Additionally, the Q_total statistic is not a measure of the magnitude of heterogeneity due to its inherent dependence on the number of studies. To address this limitation, the H² heterogeneity statistic was developed as the relative excess of Q_total over its degrees of freedom df:

H² = Q_total / df

H 2 is independent of the number of studies in the meta-analysis and is indicative of the magnitude of heterogeneity (Higgins and Thompson, 2002 ). For values <1, H 2 is truncated at 1, therefore values of H 2 can range from one to infinity, where H 2 = 1 indicates homogeneity. The corresponding confidence intervals for H 2 are

Intervals that do not overlap with 1 indicate significant heterogeneity. A more easily interpretable measure of heterogeneity is the I² statistic, which is a transformation of H²:

I² = (H² − 1) / H² × 100%

The corresponding 95% CI for I² is derived by applying the same transformation to the bounds of the 95% CI for H²:

I²_CI = (H²_CI − 1) / H²_CI × 100%

Values of I² range between 0 and 100% and describe the percentage of total variation that is attributed to heterogeneity. Like H², I² provides a measure of the magnitude of heterogeneity. Values of I² of 25, 50, and 75% are generally graded as low, moderate and high heterogeneity, respectively (Higgins and Thompson, 2002 ; Pathak et al., 2017 ). However, several limitations have been noted for the I² statistic. I² has a non-linear dependence on τ², so I² appears to saturate as it approaches 100% (Huedo-Medina et al., 2006 ). In cases of excessive heterogeneity, even if heterogeneity is partially explained through subgroup analysis or meta-regression, the residual unexplained heterogeneity may still be sufficient to keep I² near saturation. I² will then fail to convey the decline in overall heterogeneity, whereas the H² statistic, which has no upper limit, allows changes in heterogeneity to be tracked more meaningfully. In addition, a small number of studies (<10) will bias I² estimates, contributing to the uncertainties inevitably associated with small meta-analyses (von Hippel, 2015 ). Of the three heterogeneity statistics described (Q_total, H² and I²), we recommend H², as it best satisfies the criteria for a heterogeneity statistic defined by Higgins and Thompson ( 2002 ).
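The three statistics follow directly from the formulas above; a self-contained sketch with hypothetical data:

```python
def heterogeneity_stats(theta, se):
    """Q_total, H^2 (truncated at 1), and I^2 (%) from study-level data."""
    w = [1.0 / s ** 2 for s in se]                       # fixed effect weights
    theta_fe = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
    q = sum(wi * (ti - theta_fe) ** 2 for wi, ti in zip(w, theta))
    h2 = max(1.0, q / (len(theta) - 1))                  # H^2 = Q / df, truncated at 1
    i2 = 100.0 * (h2 - 1.0) / h2                         # I^2 in percent
    return q, h2, i2
```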

Identifying bias

Bias refers to distortions in the data that may result in misleading meta-analytic outcomes. In the presence of bias, meta-analysis outcomes are often contradicted by higher-quality, large-sample studies (Egger et al., 1997 ), compromising the validity of the meta-analytic study. Sources of observed bias include publication bias, methodological inconsistencies and quality, data irregularities due to poor quality design, inadequate analysis or fraud, and availability or selection bias (Egger et al., 1997 ; Ahmed et al., 2012 ). At the level of study identification and inclusion, systematic searches are preferred over rapid review search strategies, as narrow search strategies may omit relevant studies. Withholding negative results is also a common source of publication bias, which is further exacerbated by the small-study effect (the phenomenon by which smaller studies produce results with larger effect sizes than larger studies) (Schwarzer et al., 2015 ). By extension, smaller studies that produce negative results are less likely to be published than larger studies that produce negative results. Identifying all sources of bias is infeasible; however, tools are available to estimate the extent of bias present.

Funnel plots . Funnel plots have been widely used to assess the risk of bias and examine meta-analysis validity (Light and Pillemer, 1984 ; Borenstein, 2009 ). The logic underlying the funnel plot is that in the absence of bias, studies are symmetrically distributed around the fixed effect size estimate, due to sampling error being random. Moreover, precise study-level estimates are expected to be more consistent with the global effect size than less precise studies, where precision is inversely related to the study-level standard error. Thus, for an unbiased set of studies, study-level effects θ i plotted in relation to the inverse standard error 1/ se (θ i ) will produce a funnel shaped plot. Theoretical 95% CIs for the range of plotted standard errors are included as reference to visualize the expected distribution of studies in the absence of bias (Sterne and Harbord, 2004 ). When bias is present, study-level effects will be asymmetrically distributed around the global fixed-effect estimate. In the past, funnel plot asymmetries have been attributed solely to publication bias, however they should be interpreted more broadly as a general presence of bias or heterogeneity (Sterne et al., 2011 ). It should be noted that rapid reviews ( Figure 8A , left ) are far more subject to bias than systematic reviews ( Figure 8A , right ), due to the increased likelihood of relevant study omission.


Analysis of heterogeneity and identification of influential studies. (A) Bias and heterogeneity in OB [ATP] ic ( left ) and ATP release ( right ) data sets were assessed with funnel plots. Log 10 -transformed study-level effect sizes (black markers) were plotted in relation to their precision assessed as inverse of standard error (1/SE). Blue dashed line : fixed effect estimate, red dashed line : random effects estimate, gray lines : Expected 95% confidence interval (95% CI) in the absence of bias/heterogeneity. (B) OB [ATP] ic were evaluated using Baujat plot and inconsistent and influential studies were identified in top right corner of plot ( arrows ). (C,D) Effect of the single study exclusion (C) and cumulative sequential exclusion of the most inconsistent studies (D) . Left : heterogeneity statistics, H 2 ( red line ) and I 2 ( black line ). Right : 95% CI ( red band ) and Q -test p -value ( black line ). Arrows : influential studies contributing to heterogeneity (same as those identified on Baujat Plot). Dashed Black line : homogeneity threshold T H where Q -test p = 0.05.

Heterogeneity sensitivity analyses

Inconsistencies between studies can arise for a number of reasons, including methodological or biological heterogeneity (Patsopoulos et al., 2008 ). Since accounting for heterogeneity is an essential part of any meta-analysis, it is of interest to identify influential studies that may contribute to the observed heterogeneity.

Baujat plot. The Baujat plot was proposed as a diagnostic tool to identify the studies that contribute most to heterogeneity and influence the global outcome (Baujat, 2002 ). The graph illustrates the contribution Q_i^inf of each study to heterogeneity on the x-axis,

Q_i^inf = w_i (θ_i − θ̂_FE)²

and the contribution θ_i^inf to the global effect on the y-axis,

θ_i^inf = (θ̂_(−i) − θ̂)² / se(θ̂_(−i))²

where θ̂_(−i) and se(θ̂_(−i)) are the global estimate and standard error computed with study i omitted.
Studies that strongly influence the global outcome and contribute to heterogeneity are visualized in the upper right corner of the plot ( Figure 8B ). This approach has been used to identify outlying studies in the past (Anzures-Cabrera and Higgins, 2010 ).

Single-study exclusion sensitivity . Single-study exclusion analysis assesses the sensitivity of the global outcome and heterogeneity to exclusion of single studies. The global outcomes and heterogeneity statistics are computed for a dataset with a single omitted study; single study exclusion is iterated for all studies; and influential outlying studies are identified by observing substantial declines in observed heterogeneity, as determined by Q total , H 2 , or I 2 , and by significant differences in the global outcome ( Figure 8C ). Influential studies should not be blindly discarded, but rather carefully examined to determine the reason for inconsistency. If a cause for heterogeneity can be identified, such as experimental design flaw, it is appropriate to omit the study from the analysis. All reasons for omission must be justified and made transparent by reviewers.
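Single-study exclusion can be sketched by recomputing Q_total with each study left out in turn; a sharp drop flags an influential study (hypothetical data, not MetaLab code):

```python
def q_total(theta, se):
    """Cochran's Q for a set of study-level outcomes."""
    w = [1.0 / s ** 2 for s in se]
    fe = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
    return sum(wi * (ti - fe) ** 2 for wi, ti in zip(w, theta))

def single_exclusion_q(theta, se):
    """Q_total recomputed with each study omitted in turn."""
    return [q_total(theta[:k] + theta[k + 1:], se[:k] + se[k + 1:])
            for k in range(len(theta))]

# Hypothetical outcomes: the last study is an outlier, so omitting it
# produces the largest drop in Q_total
qs = single_exclusion_q([1.0, 1.0, 1.0, 5.0], [1.0, 1.0, 1.0, 1.0])
```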

Cumulative-study exclusion sensitivity. Cumulative study exclusion sequentially removes studies so as to maximize the decrease in total variance Q_total, such that a more homogeneous set of studies, with updated heterogeneity statistics, is achieved with each iteration of exclusion ( Figure 8D ).

This method was proposed by Patsopoulos et al. to achieve desired levels of homogeneity (Patsopoulos et al., 2008 ); however, Higgins argued that its application should remain limited to (i) quantifying the extent to which heterogeneity permeates the set of studies and (ii) identifying sources of heterogeneity (Higgins, 2008 ). We propose the homogeneity threshold T_H as a measure of heterogeneity that can be derived from cumulative-study exclusion sensitivity analysis. The homogeneity threshold describes the percentage of studies that need to be removed (by the maximal Q-reduction criterion) before a homogeneous set of studies is achieved. For example, in the OB [ATP] ic dataset, the homogeneity threshold was 71%, since removal of 71% of the most inconsistent studies resulted in a homogeneous dataset ( Figure 8D , right ). After homogeneity is attained by cumulative exclusion, the global effect generally stabilizes with respect to subsequent study removal. This metric quantifies the extent of inconsistency present in the set of studies, is independent of both the measurement scale and the number of studies, and is easily interpretable.
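A toy sketch of the homogeneity threshold; for simplicity this version stops when Q_total ≤ df (i.e., H² ≤ 1), a rough stand-in for the Q-test p = 0.05 stopping rule in the text, which would require a χ² quantile:

```python
def q_total(theta, se):
    w = [1.0 / s ** 2 for s in se]
    fe = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
    return sum(wi * (ti - fe) ** 2 for wi, ti in zip(w, theta))

def homogeneity_threshold(theta, se):
    """Fraction of studies removed (greedy maximal Q-reduction) until Q <= df."""
    theta, se = list(theta), list(se)
    removed, n0 = 0, len(theta)
    while len(theta) > 2 and q_total(theta, se) > len(theta) - 1:
        # remove the study whose exclusion reduces Q_total the most
        k = min(range(len(theta)),
                key=lambda i: q_total(theta[:i] + theta[i + 1:], se[:i] + se[i + 1:]))
        del theta[k], se[k]
        removed += 1
    return removed / n0
```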

Exploratory Analyses

The purpose of an exploratory analysis is to understand the data in ways that may not be represented by a pooled global estimate. This involves identifying sources of observed heterogeneity related to biological and experimental factors. Subgroup and meta-regression analyses are techniques used to explore known data groupings defined by study-level characteristics (i.e., covariates). Additionally, we introduce the cluster-covariate dependence analysis, an unsupervised exploratory technique used to identify covariates that coincide well with natural groupings within the data, and the intrastudy regression analysis, which is used to validate meta-regression outcomes.

Cluster-covariate dependence analysis

Natural groupings within the data can be informative and serve as a basis to guide further analysis. Using an unsupervised k-means clustering approach (Lloyd, 1982 ), we can identify natural groupings within the study-level data and assign cluster memberships to these data ( Figure 9A ). Reviewers then have two choices: either proceed directly to subgroup analysis ( Figure 9B ) or look for covariates that co-cluster with cluster memberships ( Figure 9C ). In the latter case, dependencies between cluster memberships and known data covariates can be tested using Pearson's chi-squared test for independence. Covariates that coincide with clusters can be verified by subgroup analysis ( Figure 9D ). The dependence test is limited by the availability of studies and requires that at least 80% of covariate-cluster pairs are represented by at least 5 studies (McHugh, 2013 ). Clustering results should be considered exploratory and warrant further investigation due to several limitations. If subpopulations are identified through clustering but do not correspond to any extracted covariate, reviewers risk assigning misrepresentative meaning to these clusters. Moreover, conventional clustering methods always converge to a result, so the data will be partitioned even in the absence of natural data groupings. Future adaptations of this method might involve using different clustering algorithms (e.g., hierarchical clustering) or independence tests (e.g., the G-test for independence), as well as introducing weighting terms to bias clustering to reflect study-level precisions.


Exploratory subgroup analysis. (A) Exploratory k-means clustering was used to partition OB [ATP] ic ( left ) and ATP release ( right ) data into potential clusters/subpopulations of interest. (B) Subgroup analysis of OB [ATP] ic data by differentiation status (immature – 0 to 3 day osteoblasts vs. mature – 4 to 28 day osteoblasts). Subgroup outcomes (fmol ATP/cell) estimated using sample-size weighting-scheme; black markers : Study-level outcomes ± 95% CI, marker sizes are proportional to sample size n . Orange and green bands : 95% CI for immature and mature osteoblast subgroups, respectively. (C) Dependence between ATP release cluster membership and known covariates/characteristics was assessed using Pearson's χ 2 independence test. Black bars : χ 2 test p -values for each covariate-cluster dependence test. Red line : α = 0.05 significance threshold. Arrow : most influential covariate (ex. recording method). (D) Subgroup analysis of ATP release by recording method. Subgroup outcomes (t half ) estimated using random effects weighting, τ 2 computed using DerSimonian-Laird estimator. Round markers : subgroup estimates ± 95% CI, marker sizes are proportional to number of studies per subgroup N . Gray band/diamond : global effect ± 95% CI.

Subgroup analysis

Subgroup analyses attempt to explain heterogeneity and explore differences in effects by partitioning studies into characteristic groups defined by study-level categorical covariates ( Figures 9B,D ; Table 5 ). Subgroup effects are estimated along with corresponding heterogeneity statistics. To evaluate the extent to which subgroup covariates contribute to observed inconsistencies, the explained heterogeneity Q between and unexplained heterogeneity Q within can be calculated.

Exploratory subgroup analysis.

Effect and heterogeneity estimates of ATP release by recording method .

Q_within = Σ_{j=1}^{S} Σ_{i=1}^{N_j} w_ij (θ_ij − θ̂_j)²

where S is the total number of subgroups for a given covariate and each subgroup j contains N_j studies. The explained heterogeneity Q_between is then the difference between the total and subgroup heterogeneity:

Q_between = Q_total − Q_within

If the p-value for the χ²-distributed Q_between statistic is significant, the subgrouping can be assumed to explain a significant amount of heterogeneity (Borenstein, 2009 ). Similarly, the Q_within statistic can be used to test whether residual heterogeneity remains within the subgroups.

The related statistic R²_explained describes the percentage of total heterogeneity explained by the covariate and is estimated as

R²_explained = (1 − τ²_within / τ²_total) × 100%

where the pooled heterogeneity within subgroups, τ²_within, represents the remaining unexplained variation (Borenstein, 2009 ):

Subgroup analysis of the ATP release dataset revealed that recording method had a major influence on ATP release outcome, such that method A produced significantly lower outcomes than method B ( Figure 9D ; Table 5 , significance determined by non-overlapping 95% CIs). Additionally, recording method accounted for a significant amount of heterogeneity ( Q_between , p < 0.001); however, it represented only 4% ( R²_explained ) of the total observed heterogeneity, and the remaining 96% of heterogeneity is still significant ( Q_within , p < 0.001). To explore the remaining heterogeneity, additional subgroup analysis can be conducted by further stratifying the method A and method B subgroups by other covariates. However, in many meta-analyses, multi-level data stratification may be infeasible if covariates are unavailable or if the number of studies within subgroups is low.

Multiple comparisons. When multiple subgroups are present for a given covariate, and the reviewer wishes to investigate the statistical differences between the subgroups, the problem of multiple comparisons should be addressed. Error rates are multiplicative and increase substantially as the number of subgroup comparisons increases. The Bonferroni correction has been advocated to control for false positive findings in meta-analyses (Hedges and Olkin, 1985 ), and involves adjusting the significance threshold:

α* = α / m

α * is the adjusted significance threshold to attain intended error rates α for m subgroup comparisons. Confidence intervals can then be computed using α * in place of α:
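A minimal sketch of the adjustment, assuming normal-theory confidence intervals (the estimate, standard error, and number of comparisons below are hypothetical):

```python
from statistics import NormalDist

def bonferroni_ci(mean, se, m, alpha=0.05):
    """CI using the Bonferroni-adjusted threshold alpha* = alpha / m."""
    alpha_star = alpha / m
    z = NormalDist().inv_cdf(1 - alpha_star / 2)
    return mean - z * se, mean + z * se

# Hypothetical subgroup estimate, one of m = 3 comparisons.
lo, hi = bonferroni_ci(mean=1.5, se=0.2, m=3)
print(round(lo, 3), round(hi, 3))
```

The adjusted interval is wider than the unadjusted 95% CI (±1.96 · SE), which is the price paid for controlling the family-wise error rate.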


Meta-regression attempts to explain heterogeneity by examining the relationship between study-level outcomes and continuous covariates while incorporating the influence of categorical covariates ( Figure 10A ). The main differences between conventional linear regression and meta-regression are (i) the incorporation of weights and (ii) covariates that are defined at the level of the study rather than the individual sample. Of interest when conducting a meta-regression is the magnitude of the relationship β_n between covariate x_n,i and outcome y_i for study i and covariate n. It should be noted that the intercept β_0 of a meta-regression with negligible covariate effects is equivalent to the estimate approximated by a weighted mean (Equation 3). The generalized meta-regression model is specified as

y_i = β_0 + Σ_n β_n x_n,i + η_i + ε_i

Figure 10. Meta-regression analysis and validation. (A) Relationship between osteoblast differentiation day (covariate) and intracellular ATP content (outcome) investigated by meta-regression analysis. Outcomes are on a log10 scale; meta-regression marker sizes are proportional to weights. Red bands : 95% CI. Gray bands : 95% CI of the intercept-only model. Solid red lines : intrastudy regression. (B) Meta-regression coefficient β_inter ( black ) compared to the pooled intrastudy regression coefficient β_intra ( red ). Shown are regression coefficients ± 95% CI.

where the intrastudy variance ε_i is

ε_i ~ N(0, σ_i²)

and the deviation from the distribution of effects η_i depends on the chosen meta-analytic model:

η_i = 0 for the fixed effects model, or η_i ~ N(0, τ²) for the random effects model.

The residual Q statistic, which quantifies the dispersion of the studies about the regression line, is calculated as

Q_residual = Σ_i w_i (y_i − ŷ_i)²

where ŷ_i is the value predicted at x_i by the meta-regression model. Q_residual is analogous to the Q_within computed during subgroup analysis and is used to test the degree of remaining unaccounted heterogeneity. Q_residual is also used to approximate the unexplained interstudy variance τ²_residual,

τ²_residual = (Q_residual − df_residual) / C

where df_residual is the residual degrees of freedom and C is a scaling constant computed from the weights, which in turn can be used to estimate

R²_explained = (1 − τ²_residual / τ²_total) × 100%

Q_model quantifies the amount of heterogeneity explained by the regression model and is analogous to the Q_between computed during subgroup analysis.
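To make the weighted-regression mechanics concrete, the following is a minimal pure-Python sketch of a fixed-effect meta-regression with a single covariate (all numbers are hypothetical; a random effects version would add τ² to each study variance before weighting, and MetaLab performs these computations in MATLAB):

```python
# Hypothetical study-level data: covariate x, outcome y, within-study variance v.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.9, 1.4, 2.1, 2.4, 3.1]
v = [0.04, 0.05, 0.04, 0.06, 0.05]
w = [1.0 / vi for vi in v]            # inverse-variance weights

# Closed-form weighted least squares for one covariate.
sw = sum(w)
xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
beta1 = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y)) / \
        sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
beta0 = ybar - beta1 * xbar

# Residual Q: weighted squared deviations of studies from the regression line.
y_hat = [beta0 + beta1 * xi for xi in x]
q_residual = sum(wi * (yi - yh) ** 2 for wi, yi, yh in zip(w, y, y_hat))
print(round(beta0, 3), round(beta1, 3), round(q_residual, 3))
```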

Intrastudy regression analysis . The challenge of interpreting results from a meta-regression is that relationships that exist within studies may not necessarily exist across studies, and vice versa. Such inconsistencies are known as aggregation bias and, in the context of meta-analyses, can arise from excess heterogeneity or from confounding factors at the level of the study. This problem has been acknowledged in clinical meta-analyses (Thompson and Higgins, 2002), but cannot be corrected without access to individual patient data. Fortunately, basic research studies often report outcomes at varying predictor levels (e.g., dose-response curves), permitting intrastudy (within-study) relationships to be evaluated by the reviewer. If study-level regression coefficients can be computed for several studies ( Figure 10A , red lines ), they can be pooled to estimate an overall effect β_intra. The meta-regression interstudy coefficient β_inter and the overall intrastudy regression coefficient β_intra can then be compared in magnitude and sign. Similarity in both magnitude and sign validates the existence of the relationship and characterizes its strength, while similarity in sign but not magnitude still supports the presence of the relationship but calls for additional experiments to characterize it further. For the Ob [ATP]i dataset, the magnitude of the relationship between osteoblast differentiation day and intracellular ATP concentration was inconsistent between intrastudy and interstudy estimates; however, the estimates were of consistent sign ( Figure 10B ).

Limitations of exploratory analyses

When performed with knowledge and care, exploratory analysis of meta-analytic data has enormous potential for hypothesis generation, cataloging current practices and trends, and identifying gaps in the literature. Nevertheless, we emphasize the inherent limitations of exploratory analyses:

Data dredging . A major pitfall in meta-analyses is data dredging (also known as p-hacking): searching for significant outcomes and only assigning meaning to them later. While exploring the dataset for potential patterns can identify outcomes of interest, reviewers must be wary of spurious patterns that can arise in any dataset. Therefore, if a relationship is observed, it should be used to generate hypotheses, which can then be tested on new datasets. Steps to avoid data dredging include defining an a priori analysis plan for study-level covariates, limiting exploratory analysis in rapid-review meta-analyses, and correcting for multiple comparisons.

Statistical power . Statistical power reflects the probability of rejecting the null hypothesis when the alternative is true. Meta-analyses are believed to have higher statistical power than the underlying primary studies, but this is not always true (Hedges and Pigott, 2001; Jackson and Turner, 2017). Random effects meta-analyses handle data heterogeneity by accounting for between-study variance, which weakens the inference properties of the model. To maintain statistical power that exceeds that of the contributing studies in a random effects meta-analysis, at least five studies are required (Jackson and Turner, 2017). This limits subgroup analyses that partition studies into smaller groups to isolate covariate-dependent effects; reviewers should therefore ensure that subgroups are not under-represented. Another determinant of statistical power is the expected effect size: a small effect is much more difficult to support with existing evidence than a large one. Thus, if reviewers find insufficient evidence to conclude that a small effect exists, this should not be interpreted as evidence of no effect.
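The loss of power caused by between-study variance can be illustrated with a normal-approximation sketch in the spirit of Hedges and Pigott (2001); all inputs below (effect size, study variances, τ²) are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def meta_power(delta, variances, tau2=0.0, alpha=0.05):
    """Approximate power of the pooled two-sided z-test on the overall effect."""
    nd = NormalDist()
    se = sqrt(1.0 / sum(1.0 / (v + tau2) for v in variances))  # SE of pooled estimate
    lam = delta / se                                           # non-centrality
    z = nd.inv_cdf(1 - alpha / 2)
    return (1 - nd.cdf(z - lam)) + nd.cdf(-z - lam)

v = [0.08] * 5                                   # five hypothetical studies
p_fixed = meta_power(delta=0.3, variances=v)     # no between-study variance
p_random = meta_power(delta=0.3, variances=v, tau2=0.08)  # tau2 halves each weight
print(round(p_fixed, 3), round(p_random, 3))
```

With the same five studies, adding between-study variance markedly reduces the power to detect the same true effect, which is why random effects meta-analyses of few studies can be underpowered.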

Causal inference . Meta-analyses are not a tool for establishing causal inference. However, several criteria for causality can be investigated through exploratory analyses, including consistency, strength of association, dose-dependence, and plausibility (Weed, 2000, 2010). For example, consistency, strength of association, and dose-dependence can help establish that the outcome depends on the exposure. Reviewers are nevertheless faced with the challenge of accounting for confounding factors and bias. Therefore, while meta-analyses can explore various criteria for causality, causal claims are inappropriate, and outcomes should remain associative.


Meta-analyses of basic research can offer critical insights into the current state of knowledge. In this manuscript, we have adapted meta-analytic methods to basic science applications and provided a theoretical foundation, using the Ob [ATP]i and ATP release datasets to illustrate the workflow. Since the generalizability of any meta-analysis relies on transparent, unbiased, and accurate methodology, we discussed the implications of deficient reporting practices and the limitations of meta-analytic methods, with emphasis on the analysis and exploration of heterogeneity. Additionally, several alternative and supporting methods were proposed, including a method for validating meta-regression outcomes (intrastudy regression analysis) and a novel measure of heterogeneity (the homogeneity threshold). All analyses were conducted using MetaLab , a meta-analysis toolbox that we developed in MATLAB R2016b. MetaLab is freely available to promote meta-analyses in basic research ( https://github.com/NMikolajewicz/MetaLab ).

In its current state, the translational pipeline from benchtop to bedside is inefficient, in one case estimated to produce ~1 favorable clinical outcome per ~1,000 basic research studies (O'Collins et al., 2006). The methods described here serve as a general framework for comprehensive data consolidation, knowledge-gap identification, evidence-driven hypothesis generation, and informed parameter estimation in computational modeling, which we hope will contribute to meta-analytic outcomes that better inform translational studies, thereby minimizing current failures in translational research.

Author Contributions

Both authors contributed to the study conception and design, data acquisition and interpretation and drafting and critical revision of the manuscript. NM developed MetaLab. Both authors approved the final version to be published.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


This work was supported by Natural Sciences and Engineering Research Council (NSERC, RGPIN-288253) and Canadian Institutes for Health Research (CIHR MOP-77643). NM was supported by the Faculty of Dentistry, McGill University and le Réseau de Recherche en Santé Buccodentaire et Osseuse (RSBO). Special thanks to Ali Mohammed (McGill University) for help with validation of MetaLab data extraction module.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphys.2019.00203/full#supplementary-material

  • Ahmed I., Sutton A. J., Riley R. D. (2012). Assessment of publication bias, selection bias, and unavailable data in meta-analyses using individual participant data: a database survey. Br. Med. J. 344:d7762. 10.1136/bmj.d7762
  • Altman D. G., Bland J. M. (2005). Standard deviations and standard errors. Br. Med. J. 331, 903. 10.1136/bmj.331.7521.903
  • Anzures-Cabrera J., Higgins J. P. T. (2010). Graphical displays for meta-analysis: an overview with suggestions for practice. Res. Synth. Methods 1, 66–80. 10.1002/jrsm.6
  • Baguley T. (2009). Standardized or simple effect size: what should be reported? Br. J. Soc. Psychol. 100, 603–617. 10.1348/000712608X377117
  • Barendregt J., Doi S. (2009). MetaXL User Guide: Version 1.0. Wilston, QLD: EpiGear International Pty Ltd.
  • Baujat B. (2002). A graphical method for exploring heterogeneity in meta-analyses: application to a meta-analysis of 65 trials. Stat. Med. 21:18. 10.1002/sim.1221
  • Bax L. (2016). MIX 2.0 – Professional Software for Meta-analysis in Excel. Version BiostatXL. Available online at: https://www.meta-analysis-made-easy.com
  • Bittker J. A., Ross N. T. (2016). High Throughput Screening Methods: Evolution and Refinement. Cambridge: Royal Society of Chemistry. 10.1039/9781782626770
  • Bodin P., Milner P., Winter R., Burnstock G. (1992). Chronic hypoxia changes the ratio of endothelin to ATP release from rat aortic endothelial cells exposed to high flow. Proc. Biol. Sci. 247, 131–135. 10.1098/rspb.1992.0019
  • Borenstein M. (2009). Introduction to Meta-Analysis. Chichester: John Wiley & Sons. 10.1002/9780470743386
  • Borenstein M., Hedges L., Higgins J. P. T., Rothstein H. R. (2005). Comprehensive Meta-Analysis (Version 2.2.027) [Computer software]. Englewood, CO.
  • Bramer W. M., Giustini D., de Jonge G. B., Holland L., Bekhuis T. (2016). De-duplication of database search results for systematic reviews in EndNote. J. Med. Libr. Assoc. 104, 240–243. 10.3163/1536-5050.104.3.014
  • Chowdhry A. K., Dworkin R. H., McDermott M. P. (2016). Meta-analysis with missing study-level sample variance data. Stat. Med. 35, 3021–3032. 10.1002/sim.6908
  • Cochrane Collaboration (2011). Review Manager (RevMan) [Computer program]. Copenhagen.
  • Cox M., Harris P., Siebert B. R.-L. (2003). Evaluation of measurement uncertainty based on the propagation of distributions using Monte Carlo simulation. Measure. Techniq. 46, 824–833. 10.1023/B:METE.0000008439.82231.ad
  • DeLuca J. B., Mullins M. M., Lyles C. M., Crepaz N., Kay L., Thadiparthi S. (2008). Developing a comprehensive search strategy for evidence based systematic reviews. Evid. Based Libr. Inf. Pract. 3, 3–32. 10.18438/B8KP66
  • DerSimonian R., Laird N. (1986). Meta-analysis in clinical trials. Control. Clin. Trials 7, 177–188. 10.1016/0197-2456(86)90046-2
  • Ecker E. D., Skelly A. C. (2010). Conducting a winning literature search. Evid. Based Spine Care J. 1, 9–14. 10.1055/s-0028-1100887
  • Egger M., Smith G. D., Schneider M., Minder C. (1997). Bias in meta-analysis detected by a simple, graphical test. Br. Med. J. 315, 629–634. 10.1136/bmj.315.7109.629
  • Finfgeld-Connett D., Johnson E. D. (2013). Literature search strategies for conducting knowledge-building and theory-generating qualitative systematic reviews. J. Adv. Nurs. 69, 194–204. 10.1111/j.1365-2648.2012.06037.x
  • Ganann R., Ciliska D., Thomas H. (2010). Expediting systematic reviews: methods and implications of rapid reviews. Implementation Sci. 5, 56. 10.1186/1748-5908-5-56
  • Gavaghan D. J., Moore R. A., McQuay H. J. (2000). An evaluation of homogeneity tests in meta-analyses in pain using simulations of individual patient data. Pain 85, 415–424. 10.1016/S0304-3959(99)00302-4
  • Gopalakrishnan S., Ganeshkumar P. (2013). Systematic reviews and meta-analysis: understanding the best evidence in primary healthcare. J. Fam. Med. Prim. Care 2, 9–14. 10.4103/2249-4863.109934
  • Grönholm T., Annila A. (2007). Natural distribution. Math. Biosci. 210, 659–667. 10.1016/j.mbs.2007.07.004
  • Haby M. M., Chapman E., Clark R., Barreto J., Reveiz L., Lavis J. N. (2016). What are the best methodologies for rapid reviews of the research evidence for evidence-informed decision making in health policy and practice: a rapid review. Health Res. Policy Syst. 14:83. 10.1186/s12961-016-0155-7
  • Hartung J., Makambi K. H. (2002). Positive estimation of the between-study variance in meta-analysis: theory and methods. S. Afr. Stat. J. 36, 55–76.
  • Hedges L. V., Olkin I. (1985). Statistical Methods for Meta-Analysis. New York, NY: Academic Press.
  • Hedges L. V., Pigott T. D. (2001). The power of statistical tests in meta-analysis. Psychol. Methods 6, 203–217. 10.1037/1082-989X.6.3.203
  • Higgins J. P. (2008). Commentary: heterogeneity in meta-analysis should be expected and appropriately quantified. Int. J. Epidemiol. 37, 1158–1160. 10.1093/ije/dyn204
  • Higgins J. P., Green S. (Eds.) (2011). Cochrane Handbook for Systematic Reviews of Interventions, Vol. 4. Oxford: John Wiley & Sons.
  • Higgins J. P., Thompson S. G. (2002). Quantifying heterogeneity in a meta-analysis. Stat. Med. 21, 1539–1558. 10.1002/sim.1186
  • Higgins J. P., Thompson S. G., Deeks J. J., Altman D. G. (2003). Measuring inconsistency in meta-analyses. Br. Med. J. 327, 557–560. 10.1136/bmj.327.7414.557
  • Higgins J. P., White I. R., Anzures-Cabrera J. (2008). Meta-analysis of skewed data: combining results reported on log-transformed or raw scales. Stat. Med. 27, 6072–6092. 10.1002/sim.3427
  • Huedo-Medina T. B., Sanchez-Meca J., Marin-Martinez F., Botella J. (2006). Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychol. Methods 11, 193–206. 10.1037/1082-989X.11.2.193
  • Hunter J. E., Schmidt F. L. (2004). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Thousand Oaks, CA: Sage.
  • Jackson D., Turner R. (2017). Power analysis for random-effects meta-analysis. Res. Synth. Methods 8, 290–302. 10.1002/jrsm.1240
  • JASP Team (2018). JASP (Version 0.9) [Computer software]. Amsterdam.
  • Karabatsos G., Talbott E., Walker S. G. (2015). A Bayesian nonparametric meta-analysis model. Res. Synth. Methods 6, 28–44. 10.1002/jrsm.1117
  • Kontopantelis E., Reeves D. (2012). Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: a simulation study. Stat. Methods Med. Res. 21, 409–426. 10.1177/0962280210392008
  • Kwon Y., Lemieux M., McTavish J., Wathen N. (2015). Identifying and removing duplicate records from systematic review searches. J. Med. Libr. Assoc. 103, 184–188. 10.3163/1536-5050.103.4.004
  • Light R. J., Pillemer D. B. (1984). Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press.
  • Limpert E., Stahel W. A., Abbt M. (2001). Log-normal distributions across the sciences: keys and clues. AIBS Bull. 51, 341–352.
  • Lloyd S. (1982). Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137. 10.1109/TIT.1982.1056489
  • Lorenzetti D. L., Ghali W. A. (2013). Reference management software for systematic reviews and meta-analyses: an exploration of usage and usability. BMC Med. Res. Methodol. 13, 141. 10.1186/1471-2288-13-141
  • Marin-Martinez F., Sanchez-Meca J. (2010). Weighting by inverse variance or by sample size in random-effects meta-analysis. Educ. Psychol. Meas. 70, 56–73. 10.1177/0013164409344534
  • Mattivi J. T., Buchberger B. (2016). Using the AMSTAR checklist for rapid reviews: is it feasible? Int. J. Technol. Assess. Health Care 32, 276–283. 10.1017/S0266462316000465
  • McGowan J., Sampson M. (2005). Systematic reviews need systematic searchers. J. Med. Libr. Assoc. 93, 74–80.
  • McHugh M. L. (2013). The Chi-square test of independence. Biochem. Med. 23, 143–149. 10.11613/BM.2013.018
  • Mikolajewicz N., Mohammed A., Morris M., Komarova S. V. (2018). Mechanically-stimulated ATP release from mammalian cells: systematic review and meta-analysis. J. Cell Sci. 131:22. 10.1242/jcs.223354
  • Milo R., Jorgensen P., Moran U., Weber G., Springer M. (2010). BioNumbers—the database of key numbers in molecular and cell biology. Nucleic Acids Res. 38, D750–D753. 10.1093/nar/gkp889
  • Moher D., Liberati A., Tetzlaff J., Altman D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6:e1000097. 10.1371/journal.pmed.1000097
  • O'Collins V. E., Macleod M. R., Donnan G. A., Horky L. L., van der Worp B. H., Howells D. W. (2006). 1,026 experimental treatments in acute stroke. Ann. Neurol. 59, 467–477. 10.1002/ana.20741
  • Pathak M., Dwivedi S. N., Deo S. V. S., Sreenivas V., Thakur B. (2017). Which is the preferred measure of heterogeneity in meta-analysis and why? A revisit. Biostat. Biometrics Open Acc. 1, 1–7. 10.19080/BBOAJ.2017.01.555555
  • Patsopoulos N. A., Evangelou E., Ioannidis J. P. A. (2008). Sensitivity of between-study heterogeneity in meta-analysis: proposed metrics and empirical evaluation. Int. J. Epidemiol. 37, 1148–1157. 10.1093/ije/dyn065
  • Paule R. C., Mandel J. (1982). Consensus values and weighting factors. J. Res. Natl. Bur. Stand. 87, 377–385. 10.6028/jres.087.022
  • Sanchez-Meca J., Marin-Martinez F. (2008). Confidence intervals for the overall effect size in random-effects meta-analysis. Psychol. Methods 13, 31–48. 10.1037/1082-989X.13.1.31
  • Schwarzer G., Carpenter J. R., Rücker G. (2015). Small-study effects in meta-analysis, in Meta-Analysis with R, eds Schwarzer G., Carpenter J. R., Rücker G. (Cham: Springer International Publishing), 107–141.
  • Sena E., van der Worp H. B., Howells D., Macleod M. (2007). How can we improve the pre-clinical development of drugs for stroke? Trends Neurosci. 30, 433–439. 10.1016/j.tins.2007.06.009
  • Sheldrake R. (1997). Experimental effects in scientific research: how widely are they neglected? Bull. Sci. Technol. Soc. 17, 171–174. 10.1177/027046769701700405
  • Sidik K., Jonkman J. N. (2005). Simple heterogeneity variance estimation for meta-analysis. J. R. Stat. Soc. Ser. C Appl. Stat. 54, 367–384. 10.1111/j.1467-9876.2005.00489.x
  • Sterne J. A., Sutton A. J., Ioannidis J. P., Terrin N., Jones D. R., Lau J., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. Br. Med. J. 343:d4002. 10.1136/bmj.d4002
  • Sterne J. A. C., Harbord R. (2004). Funnel plots in meta-analysis. Stata J. 4, 127–141. 10.1177/1536867X0400400204
  • Thompson S. G., Higgins J. P. (2002). How should meta-regression analyses be undertaken and interpreted? Stat. Med. 21, 1559–1573. 10.1002/sim.1187
  • Vaux D. L., Fidler F., Cumming G. (2012). Replicates and repeats—what is the difference and is it significant? A brief discussion of statistics and experimental design. EMBO Rep. 13, 291–296. 10.1038/embor.2012.36
  • Veroniki A. A., Jackson D., Viechtbauer W., Bender R., Bowden J., Knapp G., et al. (2016). Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res. Synth. Methods 7, 55–79. 10.1002/jrsm.1164
  • Vesterinen H. M., Sena E. S., Egan K. J., Hirst T. C., Churolov L., Currie G. L., et al. (2014). Meta-analysis of data from animal studies: a practical guide. J. Neurosci. Methods 221, 92–102. 10.1016/j.jneumeth.2013.09.010
  • Viechtbauer W. (2010). Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48. 10.18637/jss.v036.i03
  • von Hippel P. T. (2015). The heterogeneity statistic I² can be biased in small meta-analyses. BMC Med. Res. Methodol. 15:35. 10.1186/s12874-015-0024-z
  • Weed D. L. (2000). Interpreting epidemiological evidence: how meta-analysis and causal inference methods are related. Int. J. Epidemiol. 29, 387–390. 10.1093/intjepid/29.3.387
  • Weed D. L. (2010). Meta-analysis and causal inference: a case study of benzene and non-Hodgkin lymphoma. Ann. Epidemiol. 20, 347–355. 10.1016/j.annepidem.2010.02.001


What is Data Analysis? Process, Types, Methods, and Techniques


Businesses today need every edge and advantage they can get. Thanks to obstacles like rapidly changing markets, economic uncertainty, shifting political landscapes, finicky consumer attitudes, and even global pandemics, businesses today are working with slimmer margins for error.

Companies that want to stay in business and thrive can improve their odds of success by making smart choices while answering the question: “What is data analysis?” And how does an individual or organization make these choices? They collect as much useful, actionable information as possible and then use it to make better-informed decisions!


This strategy is common sense, and it applies to personal life as well as business. No one makes important decisions without first finding out what’s at stake, the pros and cons, and the possible outcomes. Similarly, no company that wants to succeed should make decisions based on bad data. Organizations need information; they need data. This is where data analysis or data analytics enters the picture.

The job of understanding data is one of today's fastest-growing fields, with data often described as the "new oil" of the market. Now, before getting into the details of data analysis methods, let us first answer the question: what is data analysis?


What Is Data Analysis?

Although many groups, organizations, and experts approach data analysis in different ways, most approaches can be distilled into a single working definition. Data analysis is the process of cleaning, transforming, and processing raw data to extract actionable, relevant information that helps businesses make informed decisions. The procedure helps reduce the risks inherent in decision-making by providing useful insights and statistics, often presented in charts, images, tables, and graphs.

A simple example of data analysis can be seen whenever we make a decision in our daily lives by evaluating what has happened in the past, or what is likely to happen if we make a particular choice. In essence, this is the process of analyzing the past or predicting the future and making a decision based on that analysis.

It’s not uncommon to hear the term “big data” brought up in discussions about data analysis. Data analysis plays a crucial role in processing big data into useful information. Neophyte data analysts who want to dig deeper by revisiting big data fundamentals should go back to the basic question, “What is data?”


Why is Data Analysis Important?

Here is a list of reasons why data analysis is crucial to doing business today.

  • Better Customer Targeting: You don’t want to waste your business’s precious time, resources, and money putting together advertising campaigns targeted at demographic groups that have little to no interest in the goods and services you offer. Data analysis helps you see where you should be focusing your advertising and marketing efforts.
  • You Will Know Your Target Customers Better: Data analysis tracks how well your products and campaigns are performing within your target demographic. Through data analysis, your business can get a better idea of your target audience’s spending habits, disposable income, and most likely areas of interest. This data helps businesses set prices, determine the length of ad campaigns, and even help project the number of goods needed.
  • Reduce Operational Costs: Data analysis shows you which areas in your business need more resources and money, and which areas are not producing and thus should be scaled back or eliminated outright.
  • Better Problem-Solving Methods: Informed decisions are more likely to be successful decisions. Data provides businesses with information. You can see where this progression is leading. Data analysis helps businesses make the right choices and avoid costly pitfalls.
  • You Get More Accurate Data: If you want to make informed decisions, you need data, but there’s more to it. The data in question must be accurate. Data analysis helps businesses acquire relevant, accurate information, suitable for developing future marketing strategies, business plans, and realigning the company’s vision or mission.

What Is the Data Analysis Process?

Answering the question “what is data analysis?” is only the first step; now we will look at how it’s performed. The process of data analysis involves gathering all the information, processing it, exploring the data, and using it to find patterns and other insights. The steps of data analysis are:

Data Requirement Gathering

Ask yourself why you’re doing this analysis, what type of data you want to use, and what data you plan to analyze.

Data Collection

Guided by your identified requirements, it’s time to collect the data from your sources. Sources include case studies, surveys, interviews, questionnaires, direct observation, and focus groups. Make sure to organize the collected data for analysis.

Data Cleaning

Not all of the data you collect will be useful, so it’s time to clean it up. This process is where you remove white spaces, duplicate records, and basic errors. Data cleaning is mandatory before sending the information on for analysis.
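As a toy illustration of those three cleaning steps (white space, duplicates, basic errors), here's a short Python sketch on made-up records:

```python
# Toy records showing the problems mentioned above: stray whitespace,
# duplicates, and a malformed (empty) entry.
raw = [
    "  alice@example.com ",
    "bob@example.com",
    "alice@example.com",
    "   ",
    "bob@example.com ",
]

seen, cleaned = set(), []
for record in raw:
    value = record.strip()          # remove white space
    if not value:                   # drop basic errors (empty rows)
        continue
    if value in seen:               # drop duplicate records
        continue
    seen.add(value)
    cleaned.append(value)

print(cleaned)  # only unique, trimmed records remain
```

Real projects typically do this with a library such as pandas, but the logic is the same: normalize, validate, de-duplicate.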

Data Analysis

Here is where you use data analysis software and other tools to help you interpret and understand the data and arrive at conclusions. Data analysis tools include Excel, Python, R, Looker, RapidMiner, Chartio, Metabase, Redash, and Microsoft Power BI.

Data Interpretation

Now that you have your results, you need to interpret them and come up with the best courses of action based on your findings.

Data Visualization

Data visualization is a fancy way of saying, “graphically show your information in a way that people can read and understand it.” You can use charts, graphs, maps, bullet points, or a host of other methods. Visualization helps you derive valuable insights by helping you compare datasets and observe relationships.

Types of Data Analysis

A half-dozen popular types of data analysis are in use today, commonly employed in the worlds of technology and business. They are:

Descriptive Analysis

Descriptive analysis involves summarizing and describing the main features of a dataset. It focuses on organizing and presenting the data in a meaningful way, often using measures such as mean, median, mode, and standard deviation. It provides an overview of the data and helps identify patterns or trends.
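The measures named above map directly onto Python's standard `statistics` module. A short sketch, using an invented list of daily order counts:

```python
import statistics

# Hypothetical daily order counts for one week.
orders = [12, 15, 12, 18, 20, 12, 17]

mean = statistics.mean(orders)      # central tendency: average
median = statistics.median(orders)  # central tendency: middle value
mode = statistics.mode(orders)      # most frequent value
stdev = statistics.stdev(orders)    # spread: sample standard deviation

print(mean, median, mode, round(stdev, 2))
```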

Inferential Analysis

Inferential analysis aims to make inferences or predictions about a larger population based on sample data. It involves applying statistical techniques such as hypothesis testing, confidence intervals, and regression analysis. It helps generalize findings from a sample to a larger population.
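As one concrete instance of inference, here is a large-sample 95% confidence interval for a population mean, built with only the standard library (the sample values are made up; for small samples a t-interval via SciPy would be the usual choice):

```python
import statistics
from statistics import NormalDist

# Hypothetical sample of 40 measurements drawn from a larger population.
sample = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 10.3] * 5
n = len(sample)

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean

z = NormalDist().inv_cdf(0.975)            # ~1.96 for a 95% interval
low, high = mean - z * sem, mean + z * sem
print(f"95% CI for the population mean: ({low:.3f}, {high:.3f})")
```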

Exploratory Data Analysis (EDA)

EDA focuses on exploring and understanding the data without preconceived hypotheses. It involves visualizations, summary statistics, and data profiling techniques to uncover patterns, relationships, and interesting features. It helps generate hypotheses for further analysis.

Diagnostic Analysis

Diagnostic analysis aims to understand the cause-and-effect relationships within the data. It investigates the factors or variables that contribute to specific outcomes or behaviors. Techniques such as regression analysis, ANOVA (Analysis of Variance), or correlation analysis are commonly used in diagnostic analysis.

Predictive Analysis

Predictive analysis involves using historical data to make predictions or forecasts about future outcomes. It utilizes statistical modeling techniques, machine learning algorithms, and time series analysis to identify patterns and build predictive models. It is often used for forecasting sales, predicting customer behavior, or estimating risk.
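A minimal predictive sketch: fitting a least-squares trend line by hand and extrapolating one step ahead. In practice you would reach for scikit-learn or statsmodels; the monthly sales figures here are invented:

```python
# Least-squares trend line fitted by hand; the sales figures are hypothetical.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 125, 130, 145, 150]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n

# slope = covariance(x, y) / variance(x)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x

# Predict the next (unseen) month from the fitted trend.
forecast_month_7 = intercept + slope * 7
print(f"forecast for month 7: {forecast_month_7:.1f}")
```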

Prescriptive Analysis

Prescriptive analysis goes beyond predictive analysis by recommending actions or decisions based on the predictions. It combines historical data, optimization algorithms, and business rules to provide actionable insights and optimize outcomes. It helps in decision-making and resource allocation.

Next, we will look at data analysis methods in more depth.

Data Analysis Methods

Some professionals use the terms “data analysis methods” and “data analysis techniques” interchangeably. To further complicate matters, sometimes people throw the previously discussed “data analysis types” into the fray as well! Our hope here is to establish a distinction between the kinds of data analysis that exist and the various ways they are used.

Although there are many data analysis methods available, they all fall into one of two primary types: qualitative analysis and quantitative analysis.

Qualitative Data Analysis

The qualitative data analysis method works with non-numerical data such as words, symbols, pictures, and observations. This method doesn’t rely on statistics. The most common qualitative methods include:

  • Content Analysis, for analyzing behavioral and verbal data.
  • Narrative Analysis, for working with data culled from interviews, diaries, and surveys.
  • Grounded Theory, for developing causal explanations of a given event by studying and extrapolating from one or more past cases.

Quantitative Data Analysis

Also known as statistical analysis, quantitative data analysis methods collect raw data and process it into numerical data. Quantitative analysis methods include:

  • Hypothesis Testing, for assessing the truth of a given hypothesis or theory for a data set or demographic.
  • Mean, or average, for determining a subject’s overall trend by dividing the sum of a list of numbers by the number of items in the list.
  • Sample Size Determination, for analyzing a small sample taken from a larger group of people; the results are treated as representative of the entire body.

We can further expand our discussion of data analysis by showing various techniques, broken down by different concepts and tools. 

How to Analyze Data? Top Data Analysis Techniques to Apply

To analyze data effectively, you can apply various data analysis techniques. Here are some top techniques to consider:

Define Your Objectives

Clearly define the objectives of your data analysis. Understand the questions you want to answer or the insights you want to gain from the data. This will guide your analysis process.

Data Cleaning

Start by cleaning the data to ensure its quality and reliability. Remove duplicates, handle missing values, and correct any errors or inconsistencies. Data cleaning is crucial for accurate analysis.

Descriptive Statistics

Calculate descriptive statistics to understand the main characteristics of the data. Compute measures such as mean, median, mode, standard deviation, and percentiles. These statistics provide insights into the data's central tendency, spread, and distribution.

Data Visualization

Create visual representations of the data using charts, graphs, or plots. Visualization helps spot patterns, trends, or outliers that may not be immediately apparent in the raw data. Use appropriate visualizations based on the type of data and the insights you want to convey.

Exploratory Data Analysis (EDA)

Perform EDA techniques to explore the data deeply. Use data profiling, summary statistics, and visual exploration to identify patterns, relationships, or interesting features within the data. EDA helps generate hypotheses and guides further analysis.

Inferential Statistics

Apply inferential statistics to draw conclusions about the larger population based on sample data. Use techniques like hypothesis testing, confidence intervals, and regression analysis to test relationships, make predictions, or assess the significance of findings.
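For hypothesis testing, `scipy.stats.ttest_ind` is the usual tool; as a stdlib-only sketch, a large-sample two-sample z-test comparing two invented groups looks like this:

```python
import statistics
from math import sqrt
from statistics import NormalDist

# Hypothetical A/B samples; with 40 observations each, a z-test is a
# reasonable large-sample stand-in for a t-test.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0] * 5
group_b = [5.6, 5.4, 5.7, 5.5, 5.8, 5.3, 5.6, 5.5] * 5

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# Test statistic for the difference of means, then a two-sided p-value.
z = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here would lead us to reject the hypothesis that the two groups share the same mean.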

Machine Learning Algorithms

Utilize machine learning algorithms to analyze data and make predictions or classifications. Choose appropriate algorithms based on the nature of your data and the problem you're trying to solve. Train models using historical data and evaluate their performance on new data.

Clustering and Segmentation

Employ clustering techniques to identify groups or segments within your data. Clustering helps in understanding patterns or similarities between data points. It can be useful for customer segmentation, market analysis, or anomaly detection.
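A toy illustration of the idea: a hand-rolled one-dimensional k-means with k = 2 (scikit-learn's `KMeans` is the usual tool; the customer-spend values are invented, and the sketch assumes both clusters stay non-empty):

```python
# Minimal 1-D k-means sketch with k = 2; spend values are hypothetical.
spend = [12.0, 14.0, 13.0, 80.0, 85.0, 82.0, 15.0, 79.0]

centers = [min(spend), max(spend)]  # naive initialization
for _ in range(10):                 # a few refinement passes
    groups = [[], []]
    for x in spend:
        # Assign each point to its nearest center.
        nearest = min(range(2), key=lambda i: abs(x - centers[i]))
        groups[nearest].append(x)
    # Move each center to the mean of its assigned points.
    centers = [sum(g) / len(g) for g in groups]

print(sorted(round(c, 1) for c in centers))
```

With this data the algorithm separates low spenders from high spenders, which is exactly the kind of segmentation described above.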

Time Series Analysis

If your data is collected over time, apply time series analysis techniques. Study trends, seasonality, and patterns in the data to forecast future values or identify underlying patterns or cycles.
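One of the simplest time-series techniques is smoothing with a moving average, which suppresses noise so the underlying trend stands out. A sketch over an invented monthly series:

```python
# 3-month moving average over a short, hypothetical monthly series.
series = [120, 130, 150, 140, 160, 170, 165, 180]
window = 3

smoothed = [
    sum(series[i - window + 1 : i + 1]) / window
    for i in range(window - 1, len(series))
]
print([round(v, 1) for v in smoothed])
```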

Text Mining and NLP

If working with textual data, employ text mining and natural language processing techniques. Analyze sentiment, extract topics, classify text, or conduct entity recognition to derive insights from unstructured text data.
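At its simplest, text mining starts with a word-frequency count; real NLP work would use libraries such as NLTK or spaCy, but the idea can be sketched with the standard library over invented review snippets:

```python
import re
from collections import Counter

# Hypothetical product reviews (unstructured text).
reviews = [
    "Great battery life, great screen.",
    "Battery died fast; screen is great though.",
]

# Lowercase and tokenize into words, then count occurrences.
words = re.findall(r"[a-z']+", " ".join(reviews).lower())
freq = Counter(words)
print(freq.most_common(3))
```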

Remember, the choice of techniques depends on your specific data, objectives, and the insights you seek. It's essential to have a systematic and iterative approach, using multiple techniques to gain a comprehensive understanding of your data.

What Is the Importance of Data Analysis in Research?

A huge part of a researcher’s job is to sift through data. That is literally the definition of “research.” However, today’s Information Age routinely produces a tidal wave of data, enough to overwhelm even the most dedicated researcher. From a bird’s-eye view, data analysis:

1. Plays a key role in distilling this information into a more accurate and relevant form, making it easier for researchers to do their job.

2. Provides researchers with a vast selection of different tools, such as descriptive statistics, inferential analysis, and quantitative analysis.

3. Offers researchers better data and better ways to analyze and study said data.


Top Data Analysis Tools

Here is a list of some of the most popular data analysis tools, judged in terms of popularity, ease of learning, and performance:

  • Tableau Public
  • R Programming
  • Apache Spark

Choose the Right Program

Looking to build a career in the exciting field of data analytics? Our Data Analytics courses are designed to provide you with the skills and knowledge you need to excel in this rapidly growing industry. Our expert instructors will guide you through hands-on projects, real-world scenarios, and case studies, giving you the practical experience you need to succeed. With our courses, you'll learn to analyze data, create insightful reports, and make data-driven decisions that can help drive business success.

| | Data Analyst | Post Graduate Program In Data Analytics | Data Analytics Bootcamp |
|---|---|---|---|
| Geo | All Geos | All Geos | US |
| University | Simplilearn | Purdue | Caltech |
| Course Duration | 11 Months | 8 Months | 6 Months |
| Coding Experience Required | No | Basic | No |
| Skills You Will Learn | 10+ skills including Python, MySQL, Tableau, NumPy and more | Data Analytics, Statistical Analysis using Excel, Data Analysis Python and R, and more | Data Visualization with Tableau, Linear and Logistic Regression, Data Manipulation and more |
| Additional Benefits | Applied Learning via Capstone and 20+ industry-relevant Data Analytics projects | Purdue Alumni Association Membership, Free IIMJobs Pro-Membership of 6 months | Access to Integrated Practical Labs, Caltech CTME Circle Membership |
| Cost | $$ | $$$$ | $$$$ |

How to Become a Data Analyst

Now that we have answered the question “what is data analysis?”, if you want to pursue a career in data analytics, you should start by first researching what it takes to become a data analyst. You can even check out the PG Program in Data Analytics in partnership with Purdue University. This program provides a hands-on approach with case studies and industry-aligned projects to bring the relevant concepts to life. You will get broad exposure to key technologies and skills currently used in data analytics.

According to Forbes, the data analytics profession is exploding. The United States Bureau of Labor Statistics forecasts impressively robust growth for data science job skills and predicts that the data science field will grow about 28 percent through 2026. So, if you want a career that pays handsomely and will always be in demand, then check out Simplilearn and get started on your new, brighter future!

1. What is meant by data analysis?

Data analysis refers to the process of inspecting, cleaning, transforming, and interpreting data to discover valuable insights, draw conclusions, and support decision-making. It involves using various techniques and tools to analyze large sets of data and extract meaningful patterns, trends, correlations, and relationships within the data. Data analysis is essential across various industries and disciplines, as it helps uncover valuable information that can be used to optimize processes, solve problems, and make informed decisions.

2. What is the purpose of data analysis?

The purpose of data analysis is to gain meaningful insights from raw data to support decision-making, identify patterns, and extract valuable information. Some of the key objectives of data analysis include: Identifying trends and patterns, Making data-driven decisions, Finding correlations and relationships, Detecting anomalies, Improving performance, and Predictive modeling.

3. What are the types of data analytics?

Diagnostic Analysis, Predictive Analysis, Prescriptive Analysis, Text Analysis, and Statistical Analysis are the most commonly used data analytics types. Statistical analysis can be further broken down into Descriptive Analytics and Inferential Analysis.

4. What are the analytical tools used in data analytics?

The top 10 data analytical tools are Sequentum Enterprise, Datapine, Looker, KNIME, Lexalytics, SAS Forecasting, RapidMiner, OpenRefine, Talend, and NodeXL. The tools aid different data analysis processes, from data gathering to data sorting and analysis. 

5. What is the career growth in data analytics?

Starting off as a Data Analyst, you can quickly move into Senior Analyst, then Analytics Manager, Director of Analytics, or even Chief Data Officer (CDO).

6. Why Is Data Analytics Important?

Data Analysis is essential as it helps businesses understand their customers better, improves sales, improves customer targeting, reduces costs, and allows for the creation of better problem-solving strategies. 

7. Who Is Using Data Analytics?

Data Analytics has now been adopted across almost every industry. Regardless of company size or industry popularity, data analytics plays a huge part in helping businesses understand their customers’ needs and then use that understanding to tweak their products or services. Data Analytics is prominently used across industries such as Healthcare, Travel, Hospitality, and even FMCG products.

8. Is SQL good for data analysis?

Yes, SQL (Structured Query Language) is an excellent tool for data analysis, especially when dealing with structured data in relational databases. SQL is specifically designed for managing and manipulating structured data, making it a powerful language for data analysis tasks like filtering, sorting, aggregating, and joining datasets. It allows users to retrieve specific subsets of data from large databases efficiently.

While SQL excels at handling structured data, it may not be the best choice for all types of data analysis. For tasks involving more complex data manipulation, statistical analysis, or machine learning, other tools like Python or R may be more suitable.
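As a sketch of the kind of SQL-based analysis described above, Python's built-in `sqlite3` module can run such queries against an in-memory database (the table name and rows are invented):

```python
import sqlite3

# In-memory SQLite demo of filtering and aggregating; data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 200.0), ("East", 50.0)],
)

# Aggregate total sales per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders "
    "GROUP BY region ORDER BY SUM(amount) DESC"
).fetchall()
print(rows)
conn.close()
```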

9. What is data analysis in Excel?

Data analysis in Excel refers to the process of using Microsoft Excel's built-in features and functions to perform data analysis tasks. Excel provides a range of tools for basic data analysis, making it accessible to a wide range of users, including those without advanced programming or statistical knowledge. Some common data analysis tasks in Excel include: Filtering and sorting data, PivotTables and PivotCharts, Formulas and Functions, and Charts and Graphs.

While Excel is useful for basic data analysis, it may become limited when dealing with larger datasets or more complex analyses. In such cases, more advanced tools like Python, R, or dedicated data analysis software might be more suitable.

10. What is data analysis in Python?

Data analysis in Python refers to using Python programming language and its associated libraries to perform various data manipulation, exploration, and analysis tasks. Python has become popular for data analysis due to its simplicity, versatility, and the availability of numerous powerful libraries tailored for data science tasks.

Some commonly used Python libraries for data analysis include Pandas, NumPy, Matplotlib, Seaborn, SciPy, and Scikit-learn. Python's flexibility and the extensive range of data analysis libraries make it a powerful tool for handling various data analysis challenges, from exploratory data analysis to sophisticated machine learning tasks.


About the author

Karin Kelley

Karin has spent more than a decade writing about emerging enterprise and cloud technologies. A passionate and lifelong researcher, learner, and writer, Karin is also a big fan of the outdoors, music, literature, and environmental and social sustainability.


What is Research Methodology? Definition, Types, and Examples


Research methodology 1,2 is a structured and scientific approach used to collect, analyze, and interpret quantitative or qualitative data to answer research questions or test hypotheses. A research methodology is like a plan for carrying out research and helps keep researchers on track by limiting the scope of the research. Several aspects must be considered before selecting an appropriate research methodology, such as research limitations and ethical concerns that may affect your research.

The research methodology section in a scientific paper describes the different methodological choices made, such as the data collection and analysis methods, and why these choices were selected. The reasons should explain why the methods chosen are the most appropriate to answer the research question. A good research methodology also helps ensure the reliability and validity of the research findings. There are three types of research methodology—quantitative, qualitative, and mixed-method, which can be chosen based on the research objectives.

What is research methodology?

A research methodology describes the techniques and procedures used to identify and analyze information regarding a specific research topic. It is a process by which researchers design their study so that they can achieve their objectives using the selected research instruments. It includes all the important aspects of research, including research design, data collection methods, data analysis methods, and the overall framework within which the research is conducted. While these points can help you understand what is research methodology, you also need to know why it is important to pick the right methodology.

Why is research methodology important?

Having a good research methodology in place has the following advantages: 3

  • Helps other researchers who may want to replicate your research; the explanations will be of benefit to them.
  • You can easily answer any questions about your research if they arise at a later stage.
  • A research methodology provides a framework and guidelines for researchers to clearly define research questions, hypotheses, and objectives.
  • It helps researchers identify the most appropriate research design, sampling technique, and data collection and analysis methods.
  • A sound research methodology helps researchers ensure that their findings are valid and reliable and free from biases and errors.
  • It also helps ensure that ethical guidelines are followed while conducting research.
  • A good research methodology helps researchers in planning their research efficiently, by ensuring optimum usage of their time and resources.

Types of research methodology

There are three types of research methodology based on the type of research and the data required. 1

  • Quantitative research methodology focuses on measuring and testing numerical data. This approach is good for reaching a large number of people in a short amount of time. This type of research helps in testing the causal relationships between variables, making predictions, and generalizing results to wider populations.
  • Qualitative research methodology examines the opinions, behaviors, and experiences of people. It collects and analyzes words and textual data. This research methodology requires fewer participants but is still more time consuming because the time spent per participant is quite large. This method is used in exploratory research where the research problem being investigated is not clearly defined.
  • Mixed-method research methodology uses the characteristics of both quantitative and qualitative research methodologies in the same study. This method allows researchers to validate their findings, verify if the results observed using both methods are complementary, and explain any unexpected results obtained from one method by using the other method.

What are the types of sampling designs in research methodology?

Sampling 4 is an important part of a research methodology and involves selecting a representative sample of the population to conduct the study, making statistical inferences about them, and estimating the characteristics of the whole population based on these inferences. There are two types of sampling designs in research methodology—probability and nonprobability.

  • Probability sampling

In this type of sampling design, a sample is chosen from a larger population using some form of random selection, that is, every member of the population has an equal chance of being selected. The different types of probability sampling are:

  • Systematic —sample members are chosen at regular intervals. It requires selecting a starting point for the sample and a fixed sampling interval that is repeated through the population list. Because the range is predefined, it is the least time consuming method.
  • Stratified —researchers divide the population into smaller groups that don’t overlap but together represent the entire population. A sample can then be drawn from each group separately.
  • Cluster —the population is divided into clusters based on parameters like location, and clusters are then randomly selected for study.

  • Nonprobability sampling

In this type of sampling design, participants are selected based on non-random criteria, so not every member of the population has a chance of being included. The different types of nonprobability sampling are:

  • Convenience —selects participants who are most easily accessible to researchers due to geographical proximity, availability at a particular time, etc.
  • Purposive —participants are selected at the researcher’s discretion. Researchers consider the purpose of the study and the understanding of the target audience.
  • Snowball —already selected participants use their social networks to refer the researcher to other potential participants.
  • Quota —while designing the study, the researchers decide how many people with which characteristics to include as participants. The characteristics help in choosing people most likely to provide insights into the subject.
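The systematic design described above can be sketched in a few lines of Python (the population, sample size, and fixed random seed are hypothetical choices for the demo):

```python
import random

# Systematic sampling sketch: every k-th member after a random start.
population = list(range(1, 101))    # hypothetical population of 100 member IDs
sample_size = 10
k = len(population) // sample_size  # sampling interval

random.seed(42)                     # fixed seed so the demo is reproducible
start = random.randrange(k)         # random starting point within the first interval
sample = population[start::k]
print(sample)
```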

What are data collection methods?

During research, data are collected using various methods depending on the research methodology being followed and the research methods being undertaken. Both qualitative and quantitative research have different data collection methods, as listed below.

Qualitative research 5

  • One-on-one interviews: Helps the interviewers understand a respondent’s subjective opinion and experience pertaining to a specific topic or event
  • Document study/literature review/record keeping: Researchers’ review of already existing written materials such as archives, annual reports, research articles, guidelines, policy documents, etc.
  • Focus groups: Constructive discussions that usually include a small sample of about 6-10 people and a moderator, to understand the participants’ opinion on a given topic.
  • Qualitative observation: Researchers collect data using their five senses (sight, smell, touch, taste, and hearing).

Quantitative research 6

  • Sampling: The most common type is probability sampling.
  • Interviews: Commonly telephonic or done in-person.
  • Observations: Structured observations are most commonly used in quantitative research. In this method, researchers make observations about specific behaviors of individuals in a structured setting.
  • Document review: Reviewing existing research or documents to collect evidence for supporting the research.
  • Surveys and questionnaires. Surveys can be administered both online and offline depending on the requirement and sample size.

What are data analysis methods?

The data collected using the various methods for qualitative and quantitative research need to be analyzed to generate meaningful conclusions. These data analysis methods 7 also differ between quantitative and qualitative research.

Quantitative research involves a deductive method for data analysis where hypotheses are developed at the beginning of the research and precise measurement is required. The methods include statistical analysis applications to analyze numerical data and are grouped into two categories—descriptive and inferential.

Descriptive analysis is used to describe the basic features of different types of data to present it in a way that ensures the patterns become meaningful. The different types of descriptive analysis methods are:

  • Measures of frequency (count, percent, frequency)
  • Measures of central tendency (mean, median, mode)
  • Measures of dispersion or variation (range, variance, standard deviation)
  • Measure of position (percentile ranks, quartile ranks)

Inferential analysis is used to make predictions about a larger population based on analysis of data collected from a sample of that population. This analysis is used to study the relationships between different variables. Some commonly used inferential data analysis methods are:

  • Correlation: To understand the relationship between two or more variables.
  • Cross-tabulation: Analyze the relationship between multiple variables.
  • Regression analysis: Study the impact of independent variables on the dependent variable.
  • Frequency tables: To understand the frequency of data.
  • Analysis of variance (ANOVA): To test whether the means of two or more groups differ significantly in an experiment.
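As an illustration of the correlation method listed above, Pearson's r can be computed by hand (`scipy.stats.pearsonr` is the usual shortcut; the paired values here are invented):

```python
from math import sqrt

# Pearson correlation computed by hand; the paired data are hypothetical.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]

n = len(hours_studied)
mean_x = sum(hours_studied) / n
mean_y = sum(exam_score) / n

# r = covariance / (spread of x * spread of y), so r always lies in [-1, 1].
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours_studied, exam_score))
sd_x = sqrt(sum((x - mean_x) ** 2 for x in hours_studied))
sd_y = sqrt(sum((y - mean_y) ** 2 for y in exam_score))

r = cov / (sd_x * sd_y)
print(f"r = {r:.3f}")  # close to +1: a strong positive relationship
```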

Qualitative research involves an inductive method for data analysis where hypotheses are developed after data collection. The methods include:

  • Content analysis: For analyzing documented information from text and images by determining the presence of certain words or concepts in texts.
  • Narrative analysis: For analyzing content obtained from sources such as interviews, field observations, and surveys. The stories and opinions shared by people are used to answer research questions.
  • Discourse analysis: For analyzing interactions with people considering the social context, that is, the lifestyle and environment, under which the interaction occurs.
  • Grounded theory: Involves hypothesis creation by data collection and analysis to explain why a phenomenon occurred.
  • Thematic analysis: To identify important themes or patterns in data and use these to address an issue.

How to choose a research methodology?

Here are some important factors to consider when choosing a research methodology: 8

  • Research objectives, aims, and questions —these would help structure the research design.
  • Review existing literature to identify any gaps in knowledge.
  • Check the statistical requirements —if data-driven or statistical results are needed then quantitative research is the best. If the research questions can be answered based on people’s opinions and perceptions, then qualitative research is most suitable.
  • Sample size —sample size can often determine the feasibility of a research methodology. For a large sample, less effort- and time-intensive methods are appropriate.
  • Constraints —constraints of time, geography, and resources can help define the appropriate methodology.

How to write a research methodology?

A research methodology should include the following components: 3,9

  • Research design —should be selected based on the research question and the data required. Common research designs include experimental, quasi-experimental, correlational, descriptive, and exploratory.
  • Research method —this can be quantitative, qualitative, or mixed-method.
  • Reason for selecting a specific methodology —explain why this methodology is the most suitable to answer your research problem.
  • Research instruments —explain the research instruments you plan to use, mainly referring to the data collection methods such as interviews, surveys, etc. Here as well, a reason should be mentioned for selecting the particular instrument.
  • Sampling —this involves selecting a representative subset of the population being studied.
  • Data collection —involves gathering data using several data collection methods, such as surveys, interviews, etc.
  • Data analysis —describe the data analysis methods you will use once you’ve collected the data.
  • Research limitations —mention any limitations you foresee while conducting your research.
  • Validity and reliability —validity helps identify the accuracy and truthfulness of the findings; reliability refers to the consistency and stability of the results over time and across different conditions.
  • Ethical considerations —research should be conducted ethically. The considerations include obtaining consent from participants, maintaining confidentiality, and addressing conflicts of interest.

Frequently Asked Questions

Q1. What are the key components of research methodology?

A1. A good research methodology has the following key components:

  • Research design
  • Data collection procedures
  • Data analysis methods
  • Ethical considerations

Q2. Why is ethical consideration important in research methodology?

A2. Ethical consideration is important in research methodology to assure readers of the reliability and validity of the study. Researchers must clearly mention the ethical norms and standards followed during the conduct of the research and also mention whether the research has been cleared by an institutional board. The following 10 points are the important principles related to ethical considerations: 10

  • Participants should not be subjected to harm.
  • Respect for the dignity of participants should be prioritized.
  • Full consent should be obtained from participants before the study.
  • Participants’ privacy should be ensured.
  • Confidentiality of the research data should be ensured.
  • Anonymity of individuals and organizations participating in the research should be maintained.
  • The aims and objectives of the research should not be exaggerated.
  • Affiliations, sources of funding, and any possible conflicts of interest should be declared.
  • Communication in relation to the research should be honest and transparent.
  • Misleading information and biased representation of primary data findings should be avoided.

Q3. What is the difference between methodology and method?

A3. Research methodology is different from a research method, although both terms are often confused. Research methods are the tools used to gather data, while the research methodology provides a framework for how research is planned, conducted, and analyzed. The latter guides researchers in making decisions about the most appropriate methods for their research. Research methods refer to the specific techniques, procedures, and tools used by researchers to collect, analyze, and interpret data, for instance surveys, questionnaires, interviews, etc.

Research methodology is, thus, an integral part of a research study. It helps ensure that you stay on track to meet your research objectives and answer your research questions using the most appropriate data collection and analysis tools based on your research design.



Data Analysis – Process, Methods and Types

Data Analysis


Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The data analysis process typically proceeds through the following steps:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.
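As a minimal sketch of this step, the example below uses hypothetical survey records and mean imputation as one possible strategy for filling missing values:

```python
# Minimal sketch of the cleaning step: hypothetical survey records with a
# missing age; fill the gap with the mean of the observed values.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing value to be imputed
    {"id": 3, "age": 28},
]

observed = [r["age"] for r in records if r["age"] is not None]
mean_age = sum(observed) / len(observed)  # (34 + 28) / 2 = 31.0

cleaned = [
    {**r, "age": r["age"] if r["age"] is not None else mean_age}
    for r in records
]
```

In practice the imputation strategy (mean, median, deletion, model-based) should be chosen to fit the data and the analysis goal.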

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.

Types of Data Analysis

Types of Data Analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.
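These summary statistics can be computed directly with Python's standard `statistics` module; the dataset here is illustrative:

```python
import statistics

# Descriptive summary of a small illustrative dataset.
data = [4, 8, 6, 5, 3, 8]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),    # 5.5
    "mode": statistics.mode(data),        # 8 (appears twice)
    "stdev": statistics.stdev(data),      # sample standard deviation
    "range": max(data) - min(data),       # 8 - 3 = 5
}
```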

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.
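A simple illustration of inference is attaching a confidence interval to a sample mean. The sketch below uses a normal approximation with a fixed 1.96 critical value (for small samples a t-distribution would be more appropriate); the measurements are hypothetical:

```python
import math
import statistics

# Estimate a population mean from a sample and attach a 95% confidence
# interval (normal approximation).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
z = 1.96  # two-sided 95% critical value for the normal distribution

ci_low, ci_high = mean - z * sem, mean + z * sem
```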

Diagnostic Analysis

This type of analysis involves identifying and diagnosing problems or issues within a dataset. Diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

Data Analysis Methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.
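As one concrete example of correlation analysis, the Pearson coefficient can be computed from first principles (the data pairing study hours with exam scores is illustrative):

```python
import math

# Pearson correlation coefficient computed from first principles.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 68]
r = pearson(hours_studied, exam_score)  # close to +1: strong positive association
```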

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.
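The first step in most text analysis pipelines is tokenisation and frequency counting, sketched below with the standard library (a real NLP workflow would add stop-word removal, stemming, and so on):

```python
import re
from collections import Counter

# Tokenise a snippet of text and count word frequencies -- the raw
# material for topic modeling or sentiment scoring.
text = "Data analysis turns raw data into insights. Raw data needs cleaning."
tokens = re.findall(r"[a-z]+", text.lower())
freq = Counter(tokens)
```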

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
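The simplest of the smoothing techniques mentioned is a moving average, sketched here over an illustrative monthly series:

```python
# Simple moving average: a basic smoothing technique for time series.
def moving_average(series, window):
    return [
        sum(series[i : i + window]) / window
        for i in range(len(series) - window + 1)
    ]

monthly_sales = [10, 12, 11, 15, 14, 16, 18]
smoothed = moving_average(monthly_sales, window=3)
# The smoothed series is shorter by (window - 1) points.
```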

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytical hierarchy process, TOPSIS, and ELECTRE.
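The named techniques (AHP, TOPSIS, ELECTRE) are more elaborate, but the core idea can be sketched with a simple weighted-sum model; the criteria, weights, and scores below are purely illustrative:

```python
# Weighted-sum multi-criteria decision sketch: each alternative is scored
# 0-10 per criterion (higher is better) and ranked by weighted total.
weights = {"cost": 0.5, "quality": 0.3, "delivery": 0.2}

alternatives = {
    "supplier_a": {"cost": 7, "quality": 9, "delivery": 6},
    "supplier_b": {"cost": 9, "quality": 6, "delivery": 8},
}

def weighted_score(scores):
    return sum(weights[c] * s for c, s in scores.items())

ranking = sorted(alternatives,
                 key=lambda a: weighted_score(alternatives[a]),
                 reverse=True)
```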

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL : A query language used to manage and manipulate relational databases.
  • R : An open-source programming language and software environment for statistical computing and graphics.
  • Python : A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau : A data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS : A statistical analysis software used for data management, analysis, and reporting.
  • SPSS : A statistical analysis software used for data analysis, reporting, and modeling.
  • Matlab : A numerical computing software that is widely used in scientific research and engineering.
  • RapidMiner : A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business : Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare : Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education : Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance : Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government : Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports : Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing : Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science : Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving : When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization : Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation : Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment : Data analysis can help you assess and mitigate risks, whether it is financial, operational, or related to safety.
  • Market research : Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some Examples of Data Analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real-time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring : Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real-time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis : Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management : Energy companies use data analysis to monitor energy consumption in real-time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

Characteristics of Data Analysis are as follows:

  • Objective : Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic : Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate : Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant : Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive : Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely : Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible : Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable : Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns : Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management : Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error : Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost : Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming : Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


  • Open access
  • Published: 15 December 2023

A comprehensive assessment of exome capture methods for RNA sequencing of formalin-fixed and paraffin-embedded samples

  • Liang Zong 1 , 2   na1 ,
  • Yabing Zhu 3   na1 ,
  • Yuan Jiang 1 ,
  • Ying Xia 1 ,
  • Qun Liu 1 &
  • Sanjie Jiang 3  

BMC Genomics, volume 24, Article number: 777 (2023)


RNA-Seq analysis of Formalin-Fixed and Paraffin-Embedded (FFPE) samples has emerged as a highly effective approach and is increasingly being used in clinical research and drug development. However, the processing and storage of FFPE samples are known to cause extensive degradation of RNAs, which limits the discovery of gene expression or gene fusion-based biomarkers using RNA sequencing, particularly methods reliant on Poly(A) enrichment. Recently, researchers have developed an exome-targeted RNA-Seq methodology that uses biotinylated oligonucleotide probes to enrich RNA transcripts of interest, which could overcome these limitations. Nevertheless, the standardization of this experimental framework, including probe designs, sample multiplexing, sequencing read length, and bioinformatic pipelines, remains an essential requirement. In this study, we conducted a comprehensive comparison of three main commercially available exome capture kits and evaluated key experimental parameters to provide an overview of the advantages and limitations associated with the selection of library preparation protocols and sequencing platforms. The results provide valuable insights into best practices for obtaining high-quality data from FFPE samples.



RNA sequencing has been developed as one of the most sensitive tools for gene expression analysis. Among the library preparation methods, the standard Poly(A) enrichment protocol provides a comprehensive and accurate view of polyadenylated RNAs. This method allows for simultaneous quantification of a multitude of RNA transcripts, enabling unbiased annotation of splicing variants, novel transcripts, and non-coding RNAs. It is widely used in the investigation of human diseases, as well as the identification of novel drug targets and biomarkers. However, when working with human tissue specimens from biobanks, hospitals, and other clinical research facilities, the quality and yield of RNA can often be compromised by several factors, including sampling techniques and preservation conditions, which in turn affects downstream bioinformatic analysis [1].

Fusion genes play a crucial role in tumorigenesis and are involved in approximately 20% of human cancer cases. The rapid and accurate identification of fusion genes holds significant promise for understanding cancer pathogenesis, enabling precise therapeutic interventions, and targeting drugs that can effectively inhibit these abnormal gene fusions [2]. There is therefore great potential in fully assessing the performance characteristics of RNA-Seq for detecting fusion events, including accuracy, reproducibility, and analytical sensitivity.

In recent years, RNA-Seq library preparation protocols specifically designed for FFPE samples have been developed [3], such as the RiboZero and TruSeq RNA Exome kits (Illumina). RNA capture is a novel approach used to profile RNA samples of low integrity [4]. This method employs capture probes that target known exons, allowing for the enrichment of coding RNAs. Biotech brands, including Illumina, Agilent, and Nanodigmbio, have developed commercial products, each utilizing distinct mechanisms and technologies. These products offer standardized, reproducible, and user-friendly protocols, making them suitable for gene expression studies conducted in various research settings. However, there is a lack of studies that investigate differences among their applications and provide insights into fundamental technical questions [5]. To address this gap, we designed this study to evaluate the performance of three exome capture-based library preparation kits on human reference RNA from the Sequencing Quality Control consortium [6] and commercially available FFPE sections (Fig. 1). To our knowledge, this research is the first to compare the exome capture-based kits with the well-established rRNA depletion protocols specifically on FFPE samples. Additionally, we investigated gene expression measurement in comparison with the TaqMan standard data and assessed the detection of fusion genes engineered into the FFPE reference RNA. The results aim to provide the scientific community with a comprehensive assessment of exome capture methods for RNA sequencing of FFPE samples.

Figure 1

Experimental design of this study. 100 ng of heat-degraded Universal Human Reference RNA (UHRR) and one commercially available Formalin-Fixed Paraffin-Embedded (FFPE) sample were utilized as input materials. Three commercially available exome capture kits, namely Illumina (IL), Agilent (AG), and NadPrep (NP/NS), were employed. The rRNA depletion method (RD) was employed as the benchmark for bioinformatics analysis

To emulate the effects of formalin fixation-induced degradation, the human reference RNA (UHRR, #740000, Agilent Technologies) was incubated at 94 °C for 60 minutes to obtain fragmented RNA [ 7 ]. The peak observed in the Bioanalyzer (Agilent Technologies) trace of the fragmented sample was below 200 nt, and the peaks corresponding to 18S and 28S rRNA were absent.

The Onco Fusion FFPE RNA Reference Standard (#GW-OPSM001, GeneWell Biotechnology), containing multiple engineered, clinically relevant fusion genes (Table 1), was procured, and total RNA was extracted using the RNeasy FFPE Kit (#73504, Qiagen) following the manufacturer’s instructions. The RNA Integrity Number (RIN) of the sample was 2.2, indicating significant degradation during FFPE section preparation [8].

Library preparation

rRNA-depleted RNA was generated with the Hieff NGS MaxUp Human rRNA Depletion Kit (#12257ES96, Yeasen Biotechnology), and strand-specific libraries were then prepared using the Hieff NGS Ultima Dual-mode RNA Library Prep Kit (#12310ES96, Yeasen Biotechnology). For Illumina libraries, the TruSeq RNA Library Prep and Enrichment kit (#20020189, Illumina) was employed. Agilent and NadPrep libraries were generated using the Hieff NGS Ultima Dual-mode RNA Library Prep Kit, followed by exome capture with either SureSelect Human All Exon V6 (#5190–8864, Agilent Technologies) or Exome Plus Panel v2.0 (#1001841, Nanodigmbio). Multiplexing refers to the experimental strategy of pooling individual libraries before probe hybridization; it is used in the Agilent and NadPrep kits to reduce the cost of the capture experiment, where 2-plex means pooling two DNA libraries into a single tube. Because the Illumina kit requires proportionally more probes and hybridization reagent when libraries are pooled, multiplexing was not used for it in this study. The NadPrep libraries were divided into strand-specific and non-strand-specific groups. All protocols were conducted in accordance with the manufacturers’ instructions, with the recommended starting material of 100 ng of input RNA. The quality and yield of the prepared libraries were assessed using an Agilent 2100 Bioanalyzer.

All the prepared DNBSEQ libraries were sequenced on the MGISEQ-2000 platform (MGI Tech) with PE150 cycles, except for the NadPrep libraries, which underwent an additional run with PE100 cycles. The Illumina libraries, on the other hand, were sequenced on the NovaSeq 6000 platform (Illumina). Each library generated more than 30 million paired-end reads.

Data processing and quality control

Ribosomal RNA reads were removed by bowtie2 (v2.4.5) alignment with the parameters “--very-sensitive-local --no-unal -I 1 -X 1000”. The duplication rate was calculated using fastp (v0.23.2). SOAPnuke (v1.5.6) was employed to filter out low-quality reads meeting any of the following conditions: (1) reads in which 5′ or 3′ sequencing adapters covered more than 50% of the read length; (2) reads consisting of more than 1% ambiguous bases; (3) reads with more than 20% low-quality bases (quality score below 15). These filtering steps were carried out prior to any further analysis.
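The last two SOAPnuke-style filters can be illustrated in a few lines of Python. This is an illustrative re-implementation of the filtering logic described above, not the actual SOAPnuke code; the example reads are synthetic:

```python
# Discard reads with more than 1% ambiguous (N) bases or more than 20%
# low-quality bases (Phred quality below 15).
def passes_qc(seq, quals, max_n_frac=0.01, max_lowq_frac=0.20, min_q=15):
    n_frac = seq.upper().count("N") / len(seq)
    lowq_frac = sum(q < min_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac

good = passes_qc("ACGTACGTAC" * 10, [30] * 100)  # clean 100 bp read
bad = passes_qc("ACGTN" * 20, [30] * 100)        # 20% N bases: rejected
```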

Alignment and quantification

The high-quality reads obtained from each sample were aligned to the human reference genome (NCBI version GCF_000001405.39_GRCh38.p13) using HISAT2 (v2.2.1) with the parameters “--sensitive --no-discordant --no-mixed -I 1 -X 1000 --rna-strandness RF”. The sense rate and the distribution across genome features were calculated by RSeQC (v4.0.0). Simultaneously, the reads were also mapped to the reference mRNA sequences using bowtie2 (v2.4.5) with the parameters “--sensitive --dpad 0 --gbar 300 --mp 1,1 --np 1 --score-min L,0,-0.1 -I 1 -X 1000 --no-mixed --no-discordant -k 200”. RSEM (v1.3.1) was utilized to estimate normalized gene abundances, expressed as Fragments Per Kilobase of transcript per Million mapped fragments (FPKM).
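The FPKM normalization can be written out directly. This sketch assumes uniquely assigned fragment counts, whereas RSEM itself resolves multi-mapping reads probabilistically; the counts below are illustrative:

```python
# FPKM: fragments per kilobase of transcript per million mapped fragments.
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    return fragments / (gene_length_bp / 1e3) / (total_mapped_fragments / 1e6)

# A gene of 2 kb with 500 fragments in a 30M-fragment library.
value = fpkm(fragments=500, gene_length_bp=2000,
             total_mapped_fragments=30_000_000)
```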

Differentially expressed gene (DEG) identification

The DESeq2 package (v1.31.16) was employed for the identification of DEGs. The two replicate libraries for each sample were treated as a single group. DEGs were determined based on the criteria of a fold change greater than or equal to 1 and an adjusted p-value less than or equal to 0.05 when comparing the two groups.
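Applying such thresholds to a DESeq2-style results table is straightforward. In this sketch the stated fold-change cutoff is read as |log2 fold change| ≥ 1 (i.e. a two-fold change), DESeq2's usual reporting scale; this interpretation and all values are assumptions for illustration:

```python
# Hypothetical DESeq2-style results: gene -> (log2 fold change, adjusted p).
results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (0.4, 0.900),
    "GENE_C": (-1.8, 0.010),
}

# Keep genes passing both the fold-change and adjusted p-value cutoffs.
degs = [
    gene for gene, (log2fc, padj) in results.items()
    if abs(log2fc) >= 1 and padj <= 0.05
]
```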

Fusion gene detection

Three distinct approaches were employed for the identification of fusion genes. The first was EricScript (v0.5.5), which uses the BWA aligner (v0.7.17) for mapping against the transcriptome reference; Samtools (v0.1.19) was utilized to handle the SAM/BAM files, and BLAT (v35) for the recalibration of exon junction references. The second was FusionCatcher (v1.33), which uses two aligners (Bowtie v1.2.3 and STAR v2.7.2b) for read mapping and candidate fusion gene finding; FusionCatcher can filter out likely false-positive candidates such as pseudogenes, paralogous genes, and miRNA genes. The last was STAR-Fusion (v1.12.0), which leverages the output of the STAR aligner (v2.7.8a) to map both junction reads and spanning reads against the reference annotation set GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play from the Trinity Cancer Transcriptome Analysis Toolkit (CTAT) genome lib. All methods were run with default parameter settings.

To evaluate the performance of exome capture-based RNA-Seq methods in profiling FFPE samples, we conducted a meticulous technical assessment of the three protocols described above, namely Illumina (IL), Agilent (AG), and NadPrep with both strand-specific (NP) and non-strand-specific (NS) treatment, on human reference RNA (UHRR) and commercially available FFPE RNA (FFPE). The schematic of the workflow, along with statistical information at each library preparation step, is presented in Table S1. We used this dataset to assess the library preparation protocols in three respects. First, we examined their ability to maintain consistent alignment rates. Second, we assessed their accuracy in quantifying gene expression by comparing the results to TaqMan data. Finally, we explored their capability to identify fusion genes that have been experimentally validated within the samples.

Alignment statistics

Overall, we prepared a total of 20 libraries for sequencing; both the DNBSEQ and NovaSeq platforms generated comparably high-quality reads. The NovaSeq libraries exhibited a significantly higher duplication rate than the DNBSEQ libraries (Table S1), which can be attributed to the exponential amplification during the clustering process [9]; this may affect data saturation, particularly when sequencing resources are limited. We initially examined the overall alignment rates to the human genome (Table 2). All the protocols performed well, with alignment rates above 90%. The AG libraries exhibited relatively higher rRNA contamination, suggesting lower specificity of the exome capture procedure in this protocol.

The sense rate is a metric that measures the percentage of aligned forward reads mapped to the antisense gDNA strand and the percentage of aligned reverse reads mapped to the sense gDNA strand. Strand-specific RNA-Seq protocols offer the advantage of resolving read ambiguity where overlapping genes are transcribed from opposite strands; this specificity enhances the accuracy of gene quantification and fusion gene prediction. The IL libraries showed the highest sense rate, indicating a more efficient strand-specific library preparation process [ 10 ]. At the same time, the IL libraries yielded the lowest count of identified genes and transcripts, which may be attributed to differences in target and exome panel design among the kits. We can make a preliminary inference about the genes for which the Illumina kit lacks probes by identifying genes that are consistently expressed across all other libraries yet record an FPKM value of zero in the IL libraries. Most such genes are non-annotated, but some, such as C4orf48, may be disease-relevant and important for clinical studies [ 11 ]. The NP libraries exhibited a higher genome mapping rate but a lower number of detected genes than the NS libraries. These findings align with previous studies based on Poly(A) RNA sequencing [ 12 ].
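The probe-gap inference described above reduces to a simple filter: keep genes with zero FPKM in the kit of interest but consistent expression everywhere else. The FPKM values and the 1.0 expression floor below are illustrative assumptions, not the study's data:

```python
# Sketch: flag genes likely absent from one kit's probe panel, i.e. genes
# expressed (FPKM above a floor) in every other protocol but zero in the
# kit of interest. All values below are illustrative.

def missing_probe_candidates(fpkm, kit, floor=1.0):
    """fpkm: {gene: {protocol: FPKM}}. Return genes with zero FPKM in
    `kit` but at least `floor` FPKM in all other protocols."""
    out = []
    for gene, by_proto in fpkm.items():
        others = [v for p, v in by_proto.items() if p != kit]
        if by_proto.get(kit, 0.0) == 0.0 and others and min(others) >= floor:
            out.append(gene)
    return sorted(out)

fpkm = {
    "C4orf48": {"IL": 0.0,  "AG": 5.2,  "NS": 4.8,  "NP": 6.1},
    "GAPDH":   {"IL": 88.0, "AG": 91.0, "NS": 90.5, "NP": 87.2},
    "GENE_X":  {"IL": 0.0,  "AG": 0.4,  "NS": 0.2,  "NP": 0.3},  # too low everywhere
}
print(missing_probe_candidates(fpkm, kit="IL"))  # → ['C4orf48']
```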

In addition, the protocols showed notable differences in the proportions of reads aligned to exons, introns, and intergenic regions (Table S2). For the NS and NP libraries, more than 94% of reads aligned to exons, indicating the highest efficiency of exome pull-down by the capture approach. As anticipated [ 13 ], in the RD libraries approximately half of the reads mapped to exons, while the remainder predominantly mapped to intronic regions (32 to 42%).

Transcript coverage

We proceeded to assess coverage across the full length of genes and observed that all protocols demonstrated broad and uniform transcript coverage (Fig.  2 a). Notably, the exome capture-based protocols exhibited a slight 5′ bias, which can be attributed to the secondary structure of the transcripts and the mechanism of reverse transcription [ 14 ]. In other words, a higher proportion of reads mapped to the 5′ region of the transcripts compared with other methods, particularly those using Poly(A) enrichment. It is well established that with Poly(A)-selected mRNA, the sensitivity of fusion detection is determined by the level of coverage at the position where the gene fusion resides, and that when the input RNA is degraded this coverage decreases with distance from the 3′ end of the mRNA [ 15 ]. Furthermore, the proportion of reads mapping to coding regions was significantly elevated compared with the RD libraries, making the exome capture-based protocols well suited for gene fusion detection in FFPE-derived or other highly degraded samples.

figure 2

Statistics of alignment and expression of the experiments. a Normalized transcript coverage of UHRR libraries. b Expression level of genes detected in UHRR libraries. Correlation heatmap c between UHRR libraries and TaqMan data, d of fold-change values of genes between the two samples (FFPE vs UHRR), and e among NP libraries with different read lengths

Gene expression

We first investigated the abundance distribution of all genes detected by the different protocols [ 16 ] and categorized them into groups based on expression level (Fig. 2 b). Notably, the IL libraries exhibited the largest proportion of genes with low FPKM values (less than 1) while detecting the smallest total number of genes. This is a potential concern, since clinical studies often set thresholds on gene expression levels, such as FPKM greater than 0.3 or 0.5, because of the methodology’s high sensitivity and inherent false-positive rate [ 17 ]. Such mechanical filtering may exclude clinically relevant targets with low abundance or insufficient full-length coverage, and may further compromise the ability of IL libraries to identify rare fusion events.

We then assessed the agreement between the protocols by calculating the Pearson coefficient between all the UHRR libraries and the TaqMan reference data [ 18 ]. The exome capture-based libraries exhibited lower correlation (0.66 to 0.76) than the RD libraries (exceeding 0.87). When the protocols were compared against each other, the gene expression values showed significant protocol-specific biases, leading to reduced agreement (correlations from 0.46 to 0.88). However, consistency within replicates and across multiplexing was excellent, with correlations greater than 0.99 (Fig. 2 c).
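The pairwise agreement measured here reduces to a Pearson correlation over (typically log-transformed) expression values. A minimal sketch, with illustrative FPKM values and an assumed pseudocount of 0.01:

```python
# Sketch: Pearson correlation of log-transformed expression between two
# libraries, as used to compare protocols, replicates and TaqMan data.
import math

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def log2_fpkm(values, pseudo=0.01):
    """log2(FPKM + pseudocount); the pseudocount choice is an assumption."""
    return [math.log2(v + pseudo) for v in values]

# Illustrative FPKM vectors for two technical replicates:
rep1 = [12.0, 0.5, 88.0, 3.2, 0.0]
rep2 = [11.4, 0.6, 91.0, 2.9, 0.1]
r = pearson(log2_fpkm(rep1), log2_fpkm(rep2))
print(round(r, 3))
```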

Finally, we assessed the agreement between the protocols by calculating the Pearson coefficient of the fold-change values of differentially expressed genes between the two samples (FFPE vs UHRR, Fig. 2 d). The fold-change values correlated acceptably across the entire dynamic range of expression between protocols (correlations from 0.68 to 0.73), suggesting that the DEGs of paired samples can be compared across methods when consistency throughout a whole study cannot be guaranteed.

Multiple bioinformatic pipelines have been developed to identify candidate fusion genes from RNA-Seq data [ 19 ]. Predicted fusions are typically supported by fragments found as junction reads, which directly overlap the splicing site, or as spanning reads, where each read of a pair maps to a different partner of the fusion (Fig.  3 a). In this study, we assessed three of the most cited methods for fusion detection: EricScript [ 20 ], FusionCatcher, and STAR-Fusion. EricScript and FusionCatcher called more fusion genes than STAR-Fusion (Fig. 3 b), but for the FFPE samples, EricScript reported only 1 of the 6 validated transcripts, whereas FusionCatcher and STAR-Fusion reported all 6 events (Table S3). Although it could be argued that these fusions represent a limited target reference for method comparison, EricScript exhibited lower sensitivity and a higher false-positive rate. Notably, EricScript also reported fusion transcripts involving HLA-C and HLA-A as opposite partners with an EricScore greater than 0.95, indicating poor performance in distinguishing highly homologous genes. On the other hand, FusionCatcher reported more than 300 fusion candidates per library; although the majority were marked as exonic or in-frame, researchers would face significant challenges in filtering and validating all these predictions.

figure 3

Statistics of fusion genes detection. a Diagram of COSF1196 transcript, depicting spanning and junction reads used to identify fusion genes. b Statistics of fusion events and the corresponding supporting read counts of all FFPE libraries by different pipelines. The circular diameter represents the total read counts for the cumulative fusion events, while the color corresponds to the mean read counts supporting each individual fusion event. Enrichment of 6 fusions between exome capture-based and rRNA depleted libraries calculated by c Star-Fusion, d FusionCatcher, and e using reads of different lengths

We then examined the target enrichment rate of the different protocols by comparing Fusion Fragments Per Million mapped reads (FFPM) between exome capture-based and rRNA-depleted data (Table S3 and Fig. 3 c, d). STAR-Fusion exhibited better reproducibility in replicate or multiplexed libraries, and achieved mean enrichments of 1.75-, 5.03-, 5.63- and 4.65-fold for the IL, AG, NS and NP libraries, respectively, illustrating the advantage of exome capture-based protocols over conventional methods for fusion gene detection, particularly in highly degraded samples [ 21 ].
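The enrichment comparison is a ratio of FFPM values, where FFPM normalises the fusion-supporting fragment count by sequencing depth. A minimal sketch with illustrative counts, not the study's real numbers:

```python
# Sketch: fusion fragments per million mapped reads (FFPM) and the
# capture-vs-depletion enrichment ratio described in the text.

def ffpm(supporting_fragments, total_mapped_reads):
    """Supporting fragments per million mapped reads."""
    return supporting_fragments / (total_mapped_reads / 1e6)

def enrichment(capture_ffpm, depleted_ffpm):
    """Fold enrichment of a fusion in capture vs rRNA-depleted data."""
    return capture_ffpm / depleted_ffpm

# Illustrative counts:
cap = ffpm(supporting_fragments=120, total_mapped_reads=40_000_000)  # 3.0 FFPM
dep = ffpm(supporting_fragments=30, total_mapped_reads=50_000_000)   # 0.6 FFPM
print(enrichment(cap, dep))  # → 5.0
```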

Impact of read length

The length of the sequencing reads is a crucial factor when designing RNA-Seq experiments. Previous studies suggest that 50 bp single-end reads are generally sufficient for generating a list of DEGs [ 22 ]. For isoform detection, however, longer reads are preferred to capture more comprehensive information, although longer reads are not necessarily significantly better than shorter reads for differential expression analysis.

The DNBSEQ series of sequencers, including DNBSEQ-T7, MGISEQ-2000, and DNBSEQ-G99, offers a range of sequencing throughputs and read lengths for various research applications. In clinical settings, researchers often face the decision between PE100 and PE150 run cycles, weighing factors such as sample quality, experimental costs, turnaround time requirements, and the challenge of pooling enough libraries to fill a sequencing run. To investigate the impact of read length on FFPE RNA-Seq outcomes, we performed an additional MGISEQ-2000 PE100 run for the NP libraries, yielding three groups of NP data with different read lengths: PE150, PE150 trimmed to PE100 (PE150Trim), and PE100. The overall sequencing statistics are presented in Table S4.
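The PE150Trim group can be reproduced in silico by truncating each read to its first 100 bases. A simplified sketch over 4-line FASTQ records (real data would typically be processed with a dedicated trimming tool):

```python
# Sketch: trimming PE150 reads to 100 bp in silico (the PE150Trim group),
# keeping the 5' end of each read. FASTQ parsing is simplified to plain
# 4-line records without multi-line sequences.

def trim_fastq_records(lines, length=100):
    """Trim the sequence and quality lines of 4-line FASTQ records."""
    out = []
    for i, line in enumerate(lines):
        if i % 4 in (1, 3):          # sequence and quality lines
            out.append(line[:length])
        else:                        # header and separator lines
            out.append(line)
    return out

record = ["@read1", "A" * 150, "+", "I" * 150]
trimmed = trim_fastq_records(record, length=100)
print(len(trimmed[1]), len(trimmed[3]))  # → 100 100
```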

As expected, the PE100 data exhibited improved performance in terms of adapter contamination, reference alignment, and number of genes detected. Nevertheless, it showed no significant changes in the accuracy of expression measurement (Fig. 2 e) or the target enrichment rate of fusion genes (Table S3 and Fig. 3 e). These findings suggest that while the PE100 improvements are beneficial in some respects, they may not provide additional advantages for clinical study applications. Meanwhile, when comparing FFPE_PE150_1 with FFPE_PE100_1 (Table S5), 8904 transcripts were detected only in the PE150 data, and hundreds of longer isoforms of clinically relevant transcripts could be detected only in the PE150 data at low to median abundance (FPKM ≥ 1), suggesting the potential of longer reads in disease studies.


In this study, we present a comprehensive assessment encompassing the crucial factors to consider in RNA-Seq experiments to make the best use of clinical FFPE samples. Researchers are thereby equipped to make informed decisions regarding the selection of capture probes, library preparation and multiplexing, sequencing parameters, and bioinformatic pipelines, especially when sample quality is severely compromised (Table  3 ). For fusion gene detection, prudent filtering techniques and critical experimental validation are imperative to ensure the accuracy and reliability of results [ 23 ]. We recommend adopting exome capture-based RNA-Seq protocols when input samples are not suitable for conventional methods. Although the commercial kits offer advantages in the sequencing depth of coding regions, inconsistent capture efficiency and residual rRNA merit careful attention and pre-experiment consideration to ensure the optimal application of the selected protocols. We anticipate that exome capture-based RNA-Seq methodologies will see growing adoption in clinical settings for the diagnosis of fusion genes, further advancing our understanding of fusion gene biology and enhancing cancer diagnostics.

Availability of data and materials

All sequencing data have been deposited in the CNGB Nucleotide Sequence Archive ( https://db.cngb.org/cnsa ) under the accession number CNP0004621.

Boneva S, Schlecht A, Böhringer D, Mittelviefhaus H, Reinhard T, Agostini H, et al. 3′ MACE RNA-sequencing allows for transcriptome profiling in human tissue samples after long-term storage. Lab Invest. 2020;100(10):1345–55. https://doi.org/10.1038/s41374-020-0446-z .


Winters JL, Davila JI, McDonald AM, Nair AA, Fadra N, Wehrs RN, et al. Development and verification of an RNA sequencing (RNA-Seq) assay for the detection of gene fusions in tumors. J Mol Diagn. 2018;20(4):495–511. https://doi.org/10.1016/j.jmoldx.2018.03.007 .

Jacobsen SB, Tfelt-Hansen J, Smerup MH, Andersen JD, Morling N. Comparison of whole transcriptome sequencing of fresh, frozen, and formalin-fixed, paraffin-embedded cardiac tissue. PLoS One. 2023;18(3):e0283159. https://doi.org/10.1371/journal.pone.0283159 .


Heyer EE, Deveson IW, Wooi D, Selinger CI, Lyons RJ, Hayes VM, et al. Diagnosis of fusion genes using targeted RNA sequencing. Nat Commun. 2019;10(1):1388. https://doi.org/10.1038/s41467-019-09374-9 .

Endrullat C, Glökler J, Franke P, Frohme M. Standardization and quality management in next-generation sequencing. Appl Transl Genom. 2016;10:2–9. https://doi.org/10.1016/j.atg.2016.06.001 .


Wang C, Gong B, Bushel PR, Thierry-Mieg J, Thierry-Mieg D, Xu J, et al. The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol. 2014;32(9):926–32. https://doi.org/10.1038/nbt.3001 .

Schuierer S, Carbone W, Knehr J, Petitjean V, Fernandez A, Sultan M, et al. A comprehensive assessment of RNA-seq protocols for degraded and low-quantity samples. BMC Genomics. 2017;18(1):442. https://doi.org/10.1186/s12864-017-3827-y .

Decruyenaere P, Verniers K, Poma-Soto F, Van Dorpe J, Offner F, Vandesompele J. RNA extraction method impacts quality metrics and sequencing results in formalin-fixed, paraffin-embedded tissue samples. Lab Invest. 2023;103(2):100027. https://doi.org/10.1016/j.labinv.2022.100027 .


Li Q, Zhao X, Zhang W, Wang L, Wang J, Xu D, et al. Reliable multiplex sequencing with rare index mis-assignment on DNB-based NGS platform. BMC Genomics. 2019;20(1):215. https://doi.org/10.1186/s12864-019-5569-5 .

Harrington CA, Fei SS, Minnier J, Carbone L, Searles R, Davis BA, et al. RNA-Seq of human whole blood: evaluation of globin RNA depletion on Ribo-zero library method. Sci Rep. 2020;10(1):6271. https://doi.org/10.1038/s41598-020-62801-6 .

Endele S, Nelkenbrecher C, Bördlein A, Schlickum S, Winterpacht A. C4ORF48, a gene from the Wolf-Hirschhorn syndrome critical region, encodes a putative neuropeptide and is expressed during neocortex and cerebellar development. Neurogenetics. 2011;12(2):155–63. https://doi.org/10.1007/s10048-011-0275-8 .


Zhao S, Zhang Y, Gordon W, Quan J, Xi H, Du S, et al. Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap. BMC Genomics. 2015;16(1):675. https://doi.org/10.1186/s12864-015-1876-7 .

Zhao S, Zhang Y, Gamini R, Zhang B, von Schack D. Evaluation of two main RNA-seq approaches for gene quantification in clinical RNA sequencing: polyA+ selection versus rRNA depletion. Sci Rep. 2018;8(1):4781. https://doi.org/10.1038/s41598-018-23226-4 .

Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63. https://doi.org/10.1038/nrg2484 .

Davila JI, Fadra NM, Wang X, McDonald AM, Nair AA, Crusan BR, et al. Impact of RNA degradation on fusion detection by RNA-seq. BMC Genomics. 2016;17(1):814. https://doi.org/10.1186/s12864-016-3161-9 .

Piovesan A, Caracausi M, Antonaros F, Pelleri MC, Vitale L. GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics. Database. 2016;2016:baw153. https://doi.org/10.1093/database/baw153 .

Adiconis X, Borges-Rivera D, Satija R, DeLuca DS, Busby MA, Berlin AM, et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods. 2013;10(7):623–9. https://doi.org/10.1038/nmeth.2483 .

Kumar A, Kankainen M, Parsons A, Kallioniemi O, Mattila P, Heckman CA. The impact of RNA sequence library construction protocols on transcriptomic profiling of leukemia. BMC Genomics. 2017;18(1):629. https://doi.org/10.1186/s12864-017-4039-1 .

Haas BJ, Dobin A, Li B, Stransky N, Pochet N, Regev A. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 2019;20(1):213. https://doi.org/10.1186/s13059-019-1842-9 .

Benelli M, Pescucci C, Marseglia G, Severgnini M, Torricelli F, Magi A. Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript. Bioinformatics (Oxford, England). 2012;28(24):3232–9. https://doi.org/10.1093/bioinformatics/bts617 .

Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, Craig DW. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet. 2016;17(5):257–71. https://doi.org/10.1038/nrg.2016.10 .

Chhangawala S, Rudy G, Mason CE, Rosenfeld JA. The impact of read length on quantification of differentially expressed genes and splice junction detection. Genome Biol. 2015;16(1):131. https://doi.org/10.1186/s13059-015-0697-y .

Liu Y, Bhagwate A, Winham SJ, Stephens MT, Harker BW, McDonough SJ, et al. Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics. BMC Med Genomics. 2022;15(1):195. https://doi.org/10.1186/s12920-022-01355-0 .



We would like to thank the BGI sequencing team for the generous technical advice. We would like to thank Xiaohui Li and Lina Wang for the help in managing this project.

Not applicable.

Author information

Liang Zong and Yabing Zhu contributed equally to this work.

Authors and Affiliations

Wuhan BGI Technology Service Co., Ltd. BGI-Wuhan, Wuhan, China

Liang Zong, Yuan Jiang, Ying Xia & Qun Liu

College of Life and Health Sciences, Wuhan University of Science and Technology, Wuhan, China

BGI Tech Solutions Co., Ltd. BGI-Shenzhen, Shenzhen, China

Yabing Zhu & Sanjie Jiang



L.Z. and Y.Z. contributed equally to this work. L.Z. and Y.Z. conceived the study. L.Z., Y.J., Y.X. and Q.L. performed sample extraction, library preparation and sequencing experiments. Y.Z. and L.Z. performed bioinformatic analysis and wrote the initial draft. S.J. oversaw the project and reviewed the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sanjie Jiang .

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Institutional Review Board on Bioethics and Bio-safety of BGI (BGI-IRB) under the approval number 20230713791.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Zong, L., Zhu, Y., Jiang, Y. et al. A comprehensive assessment of exome capture methods for RNA sequencing of formalin-fixed and paraffin-embedded samples. BMC Genomics 24 , 777 (2023). https://doi.org/10.1186/s12864-023-09886-1


Received : 22 August 2023

Accepted : 08 December 2023

Published : 15 December 2023

DOI : https://doi.org/10.1186/s12864-023-09886-1


  • FFPE samples
  • RNA sequencing (RNA-Seq)
  • Gene fusion



  • Open access
  • Published: 16 December 2023

Validation of method for faecal sampling in cats and dogs for faecal microbiome analysis

  • Xavier Langon 1  

BMC Veterinary Research volume  19 , Article number:  274 ( 2023 ) Cite this article


Reproducible and reliable studies of cat and dog faecal microbiomes depend on many methodological variables, including how the faecal stools are sampled and stored prior to processing. The current study aimed to establish an appropriate method for sampling and storing faecal stools from cats and dogs which may also be applied to privately-owned pets. The approach investigated the effects of storing faeces for up to 12 h at room temperature and of sampling from various locations within the stool, in terms of microbial diversity, relative taxa abundances and DNA yield. Faeces were collected from 10 healthy cats and 10 healthy dogs and stored at room temperature (20 °C). Samples were taken from various locations within the stool (the first emitted part (i), the middle (ii) and the last emitted end (iii), at either surface or core) at 0, 0.5, 1, 2, 3, 6 and 12 h, stabilised and stored at -80 °C. DNA was extracted from all samples and sequenced on the Illumina NovaSeq platform.

The faecal bacterial composition of dogs and cats showed no statistically significant differences in alpha diversity. Bacteroidetes, Firmicutes, Proteobacteria and Actinobacteria were the most prevalent phyla. Both cat and dog samples were characterized by a dominance of Prevotella, with Fusobacterium absent from feline stools. Room-temperature storage of cat and dog faecal samples generally had no significant effect on alpha diversity, relative taxa abundance or DNA yield for up to 12 h. Sampling from regions i, ii or iii of the stool, at the surface or the core, did not significantly influence the outcome. However, surface cat faecal samples stored at room temperature for 12 h showed a significant increase in two measures of alpha diversity, and there was a tendency for a similar effect in dogs. When comparing samples with beta diversity measures, the individual effect had the strongest impact on the observed microbial diversity for both dog and cat samples (R² = 0.64 and 0.88), whereas sampling time, depth and horizontal location affected the microbial diversity significantly but with less impact.

Cat and dog faeces were stable at room temperature for up to 12 h, with no significant changes in alpha diversity, relative taxa abundance or DNA concentration. Beta diversity analysis demonstrated that, despite an impact of storage time and sampling depth, the microbial structure linked to the individual was preserved. Finally, the data suggest that faecal stools stored for more than 6 h at room temperature should be sampled at the core, not the surface.


Billions of microorganisms, in particular bacteria, inhabit the gastrointestinal tract of cats and dogs where they play a key role in sustaining host health [ 1 ]. Studies continue to use increasingly sophisticated molecular techniques to characterise the cat and dog gastrointestinal microbiome and to understand how it is influenced by factors such as age, diet, medication, and disease state [ 2 , 3 , 4 , 5 , 6 , 7 , 8 ]. It is recognised that the microbial community alters along the various sections of the gastrointestinal tract, with faecal samples offering an accepted and practical method of gut microbiome investigation [ 9 , 10 ].

Studies of the cat and dog faecal microbiome have largely involved relatively limited numbers of research colony animals and there is a need for investigation in larger cohorts that represent the broader population. Sample collection, handling and storage are critical steps that can alter the accuracy and reproducibility of downstream bacterial DNA analysis therefore it is critical to standardise these factors before embarking on such studies.

One of the issues associated with faecal analysis in privately-owned cats and dogs is the difficulty of obtaining a fresh sample, especially from cats, which may defaecate in the litter box overnight. Human studies have shown that faeces stored for 48 h at ambient temperature show changes in faecal microbiota composition [ 11 , 12 ]. Reports describing the effect of companion animal faecal sample storage at ambient temperature on the bacterial population are scarce and tend to study much longer storage intervals. One such study found that storage of cat faeces for up to four days at ambient temperature had no apparent effect on the bacterial population [ 13 ]. A study on dog faeces found that storage of stabilised samples at ambient temperature for 14 days did not affect alpha or beta microbial diversity, although unstabilised samples stored under the same conditions showed higher alpha diversity and altered relative abundance of dominant bacterial phyla, highlighting the importance of a stabilisation system [ 14 ].

There is evidence in humans to suggest that rectal swabs show bacterial composition differences when compared with rectal stool samples, with differing oxygen gradients in mucosal and luminal samples reported to influence gut microbiota data [ 15 , 16 ]. However, there are no reports comparing the gut microbiota at different locations of the faecal stool. These findings suggest that the influence of horizontal sampling location on the outcome should also be considered as well as the depth of sampling (surface versus core). Taken together, the above evidence supports the need to validate a standardised approach to faecal sampling and processing that will be appropriate for gut microbiome analysis within the research environment, and which can also be successfully applied to privately-owned pets.

The current study aims to validate a cat and dog faecal sampling and processing method in a group of research colony animals to inform on a suitable approach for further application onsite and in privately-owned animals. In particular, the study aims to understand the effect of storage for up to 12 h at room temperature on bacterial DNA concentration, microbial diversity, and relative abundance of taxa in cat and dog faecal samples. A maximum time of 12 h was selected as this corresponds with the maximum expected delay between a client-owned animal defaecating at home and the owner collecting and stabilising the sample. The study also aims to clarify how the above measures are influenced by horizontal and depth sampling location within the faecal stool by comparing samples from the first emitted part (i), the middle (ii) and last emitted end (iii), taken from the surface (S) or core (C). In addition, two stabilisation solutions are compared to determine their influence on the measures described.

A total of 263 samples taken from 20 stools from 10 dogs and 10 cats were extracted and sequenced. Two samples were missing due to insufficient sample size, meaning that the two-hour timepoint (T2) for cats number 7 (C7) and number 8 (C8) was omitted. All faeces produced were of acceptable consistency, graded 2–3 for cats and 2.5–3 for dogs. Over 90% of the total sequencing reads were high quality.

Effect of animal species on faecal metagenomic profiles and diversity

Samples clustered by individual, producing a unique faecal microbial signature for almost all animals. One dog sample (dog number 4 (D4) at time 720 min (T720), surface) was very dissimilar to all other dog samples, with only 14 bacterial species detected, and so was removed from all other statistical comparisons (supplementary Fig.  1 ).
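Clustering samples by individual relies on a between-sample (beta-diversity) distance; Bray-Curtis dissimilarity is a common choice. The sketch below uses made-up count vectors, and the study's exact distance metric is not restated in this excerpt:

```python
# Sketch: Bray-Curtis dissimilarity between taxon count vectors, a common
# beta-diversity measure; 0 means identical composition, 1 no overlap.

def bray_curtis(a, b):
    """a, b: taxon count vectors of equal length."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den

# Illustrative counts: two timepoints from one cat vs a different cat.
same_cat_t0  = [500, 300, 150, 50]
same_cat_t12 = [480, 320, 140, 60]
other_cat    = [100, 50, 600, 250]
print(round(bray_curtis(same_cat_t0, same_cat_t12), 3))  # small distance
print(round(bray_curtis(same_cat_t0, other_cat), 3))     # large distance
```

The pattern above mirrors the reported result: within-individual distances stay small across timepoints, so the individual signature dominates.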

Overall, consistent differences were found in the faecal bacterial composition of dogs when compared with cats. At the phylum level, all cat faeces were dominated by Bacteroidetes, followed by Firmicutes/Actinobacteria and Proteobacteria (Fig.  1 ; Table  1 ). Spirochaetes and Fusobacteria were absent in cats.

All dogs, with the exception of two individuals, showed a high relative abundance of Bacteroidetes, followed by Firmicutes, Proteobacteria and Actinobacteria (Fig.  1 ; Table  1 ). Spirochaetes and Fusobacteria were also present in some dogs with overall prevalence of 38.1% and 10.4% respectively. Bacteroidetes, Actinobacteria and Firmicutes were the only phyla showing 100% prevalence in both cats and dogs.

figure 1

Relative abundance of predominant phyla in faecal samples from 10 healthy cats and 10 healthy dogs

When considering relative abundance at the genus level, Prevotella was dominant across all cats, making up 66.7% of classified reads (Fig.  2 ; Table  2 ). Although still the most dominant genus in dogs, Prevotella made up 49.0% of classified reads (Fig.  2 ; Table  3 ). The prevalence of Prevotella was 100% in both cats and dogs. Bacteroides also showed 100% prevalence in cats and dogs and was the next most abundant genus in both species (means 14.7% and 7.8% respectively), followed by an unclassified Proteobacteria in dogs (mean abundance 7.1%) and Collinsella in cats (mean abundance 4.0%). Other major genus-level differences between cats and dogs included a 1.7% mean abundance of Bifidobacteria in cats (< 0.1% in dogs), 5.4% mean abundance of Escherichia in dogs (< 0.01% in cats), 3.6% Streptococcus in dogs (< 0.001% in cats), and a 3.6% mean abundance of Megasphaera in cats, which was absent in dogs.

figure 2

Relative abundance of predominant genera in faecal samples from 10 healthy cats and 10 healthy dogs

Comparisons of relative abundance by genus across individuals demonstrated individual variability (Fig.  3 ).

figure 3

Relative abundance of predominant genera in faecal samples from 10 healthy cats ( C ) and 10 healthy dogs ( D ), by individual

Species detected in almost all cats but not in dogs were: Blautia wexlerae, Collinsella aerofaciens, Flavonifractor plautii, Megasphaera elsdenii, Ruminococcaceae bacterium D16, Fusicatenibacter saccharivorans and Bifidobacterium pullorum. Gemmiger sp. An194 and Helicobacter bilis were detected in all dogs and in no cats. When relative abundances of species with high prevalence (> 60% in both cats and dogs) were compared between the two host species, five were significantly different (adjusted p  < 0.05) and four highly significantly different (adjusted p  < 0.01; Table  3 ).

Alpha diversity was compared between cats and dogs using a standardised T0 sample taken from the first emitted part (i) core region of the faecal stool. Similar numbers of bacterial species were identified in cats and dogs, with a mean of 43.4 ± 6.3 bacterial species in cats (median 45.5) and 41.4 ± 11.5 in dogs (median 39.0; adjusted p  = 0.20; Fig.  4 ).

figure 4

Cat and dog faecal microbial diversity according to three methods. Samples were taken at T0 from region ii of faeces from 10 healthy cats and 10 healthy dogs

Although not statistically significant, mean alpha diversity as measured by the Inverse Simpson Index tended to be higher in dogs (4.4 ± 1.6) than in cats (2.7 ± 2.0; adjusted p  = 0.05). Similarly, mean Shannon Index diversity tended to be higher in dogs than in cats (adjusted p  = 0.18). Overall, there were no statistically significant differences in alpha diversity between dogs and cats, although there was a pattern towards slightly higher diversity in dogs.
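The three alpha-diversity measures compared throughout (number of species, Shannon Index, Inverse Simpson Index) can be computed directly from taxon counts; the counts below are illustrative:

```python
# Sketch: the three alpha-diversity measures used in this study,
# computed from raw taxon counts for one sample.
import math

def richness(counts):
    """Number of taxa observed (count > 0)."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon diversity index (natural log)."""
    total = sum(counts)
    ps = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in ps)

def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum of squared proportions."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)

counts = [500, 300, 150, 50, 0]  # illustrative taxon counts
print(richness(counts))          # → 4
print(round(shannon(counts), 3))
print(round(inverse_simpson(counts), 3))
```

Note that the two indices weight evenness differently, which is why a sample can shift significantly on Shannon or Inverse Simpson without changing its species count, as seen in the 12-hour surface samples above.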

Effect of experimental variables on microbial diversity and relative taxa abundance

In standardised samples taken at T0 (core), there was no significant effect of the three horizontal sampling locations on microbial diversity in cats or dogs, as determined by Inverse Simpson Index (cat adjusted p  = 0.06; dog adjusted p  = 0.81), number of species (cat adjusted p  = 0.85; dog adjusted p  = 0.81) and Shannon Index (cat adjusted p  = 0.06; dog adjusted p  = 0.81) (Fig.  5 ).

figure 5

Comparisons of microbial diversity between regions i, ii and iii in healthy cat ( A-C ) and healthy dog ( D-F ) stool samples at T0

Comparisons of core and surface samples from region ii at T0 showed no effect of sampling depth on microbial diversity in cats and dogs (adjusted p values all > 0.05). At T12, cat surface samples from region iii showed increased mean alpha diversity when compared with core samples from the same region (Shannon Index: 2.3 ± 0.7 at surface and 1.8 ± 0.7 in core, adjusted p  = 0.018; Inverse Simpson Index: 2.7 ± 1.6 at surface and 2.3 ± 1.8 in core, adjusted p  = 0.048), but this was not true for diversity measured by number of species (adjusted p  = 0.06; Fig.  6 ). There was increased alpha diversity as measured by Shannon Index for dogs in region iii surface (2.6 ± 0.7) compared with region iii core samples (2.4 ± 0.7) at T12 (adjusted p  = 0.027) but not according to Inverse Simpson Index or number of species (adjusted p values 0.09 and 0.37 respectively; Fig.  6 ).

figure 6

Comparisons of microbial diversity between core and surface samples from healthy cat ( A-C ) and healthy dog ( D-F ) faecal stools stored at room temperature for 12 h

Storage time at room temperature did not influence alpha diversity in cat or dog samples (Kruskal-Wallis test; adjusted p  = 0.74 for all diversity measures in cats and 1.00 in dogs).

There were no differences in within-dog diversity when comparing the two stabilisation buffers (for all diversity measures, adjusted p  = 0.5).

Beta diversity results are presented in Fig.  7 . Animal individuality was the most important factor affecting diversity (R² of 0.64 and 0.88 for cats and dogs respectively; PERMANOVA Pr(>F) < 0.001; Fig.  7 b and c). For the cat samples, sampling depth and horizontal location were also statistically significant (Pr(>F) < 0.001 for both), but their measured contributions were smaller (R² of 0.027 and 0.037 respectively). Sampling time did not affect beta diversity in cats (Pr(>F) = 0.07).

figure 7

Beta diversity representation with Principal Component Analysis (PCA). A : PCA representation with all samples from the study; B : PERMANOVA results for the cat samples; C : PERMANOVA results for the dog samples; D : PCA representation with dog samples; E PCA representation with cat samples

For dog samples, sampling depth was the second most important factor affecting diversity (Pr(>F) = 0.001), but its measured effect was far smaller than the individual effect (R² of 0.008). Sampling time was also significant at a threshold of 0.05 (Pr(>F) = 0.016), with an effect size similar in magnitude to that of sampling depth (R² = 0.08). The type of stabilisation buffer had no observed effect on diversity (PERMANOVA p value > 0.3); any biases associated with the buffers would be shared, as the same trends were observed with both.

Effect of experimental variables on DNA concentration in samples

There was individual variation in sample DNA concentration, as shown by pooled data for individuals at T0 (Fig.  8 ), and a significantly higher DNA concentration in dog (6.3 ± 3.9 ng/µl) versus cat samples (3.1 ± 1.9 ng/µl; adjusted p  = 0.016; Fig.  9 ). Variation in DNA concentration was also notably wider in dog samples.

figure 8

DNA concentration in T0 samples for 10 individual healthy cats ( C ) and 10 individual healthy dogs ( D )

figure 9

DNA concentration in core faecal samples from region ii at T0 in 10 healthy cats and 10 healthy dogs

T0 R surface samples from cat faeces showed significantly higher DNA concentration when compared with T0 R core samples (5.8 ± 1.5 and 2.8 ± 2.2 ng/µl respectively; adjusted p  = 0.044). No other significant effects of horizontal or depth sampling location on DNA concentration were seen in cats or dogs.

There was no significant change in DNA concentration across all timepoints from T0 to T6 when comparing region ii core samples (adjusted p  = 0.99 for both cat and dog samples). DNA concentration was higher at T12 than at T0 for cat surface samples, increasing from 3.5 ± 2.1 ng/µl at T0 to 10.7 ± 7.56 ng/µl at T12 (adjusted p  = 0.045). For cat core samples and for all dog samples, there was no significant change in DNA concentration from T0 to T12 (adjusted p all > 0.05).

The results of the study suggest that DNA concentration, alpha diversity and taxa abundance in cat and dog faecal samples are not significantly influenced by storage for up to 12 h at room temperature, by the horizontal area of faeces sampled, by the depth of sampling or by the choice between two stabilisation solutions. A more global approach using the Bray-Curtis beta diversity metric suggests that the effect of sample collection is minor compared with the effect of the individual. Sampling time slightly affected diversity in dog samples but not in cat samples. Sampling location also changed the observed diversity for both species, but with a low contribution to the overall variability. The two stabilisation buffers tested on dog samples, however, did not affect beta diversity. On the PCA, the effects of sampling time and sampling location did not impair the clustering of samples according to animal ID, highlighting their minor influence. This was confirmed by the PERMANOVA results, where the percentage of variability explained by the individual was much larger than the other R 2 values (Fig.  7 ).

These results support an argument for core stool sampling in cats when studying the faecal microbiome, avoiding the possibility of surface sample contamination by litter box substrate. The lack of influence of horizontal sampling location on the outcomes was unexpected, given previous observations suggesting an effect of the intestinal oxygen gradient on microbial profile [ 15 , 16 ]. The proximity of the rectal end of the stool to an increased oxygen gradient, and the marked dehydration of the stool at this location, were expected to influence the microbial profile. However, stool samples are likely to represent only the transient luminal bacteria rather than the adherent mucosal bacteria, the latter being more sensitive to the changing oxygen gradient along the colorectum [ 16 ]. In considering prolonged exposure of faeces to room temperature, the study aimed to understand how the methodology could transfer to the privately owned pet in the home environment. Whilst dog owners can collect a fresh stool from their pet during a walk, cat stools may be voided in the litter box overnight, which means a delay of up to 12 h before the sample is collected and chilled. The current study suggests that storage at room temperature for up to 12 h is acceptable for microbial analysis studies. However, the significant increase in alpha diversity observed in cat surface samples taken at 12 h when compared with core samples, and indications of a similar effect in dogs, indicate that faecal stools exposed to room temperature for > 6 h should be sampled from the core. This was further supported by the observation of a significant increase in DNA concentration from T0 to T12 in cat surface but not core samples.

Molecular-based microbial analysis of the faeces provided valuable insights into the differences between healthy cat and dog faecal bacteria in terms of prevalence and relative abundance of certain taxa. The four most abundant phyla in cats and dogs were Bacteroidetes, Firmicutes, Proteobacteria and Actinobacteria, an observation reported in several publications [ 4 , 7 , 17 , 18 , 19 , 20 , 21 ]. In agreement with other similar studies, the current study found Proteobacteria to be more abundant in the faeces of dogs whilst Actinobacteria were more abundant in cats [ 22 , 23 ]. The absence of Fusobacteria in cats was surprising, given that it has been described as one of the most prevalent phyla in faecal samples, although abundance in healthy cats is routinely reported at < 0.5% [ 8 , 18 , 19 , 20 ]. According to a short-term (five-week) feeding study, abundance of Fusobacteria was significantly lower in cats fed dry diets (0.3%) compared with those fed wet diets (23.1%) [ 24 ]. All cats in the current study were fed a dry diet, with one also receiving a pouch of wet diet daily, which could account for the lack of Fusobacteria found here. Dietary protein level is also reported to influence the abundance of Fusobacteria, which was reduced in growing cats fed moderate compared to high protein diets [ 25 ].

The Prevotella genus showed 100% prevalence in cats and dogs, with respective mean relative abundances of 66.7 and 49.0%. It has been suggested that the broad variety of fibres and carbohydrates typically included in commercialised pet food has promoted colonisation of the domesticated cat and dog gastrointestinal tract with genera such as Prevotella that are capable of degrading a wide range of polysaccharides [ 26 ].

The Bacteroides genus represented the second most abundant genus in cats and dogs at 14.7 and 7.8% relative abundance respectively. In dogs, the combined relative abundances of Bacteroides and Prevotella appear to be inversely related to the abundance of the Fusobacteria phylum which suggests they may occupy the same niche [ 9 ]. Evidence to support this theory is apparent in the current study where a very low relative abundance of the Fusobacteria phylum was found in dogs alongside a high Bacteroides/Prevotella abundance. Based on this, an over-representation of Bacteroides and Prevotella in cats in the current study, combining to account for 74.5% relative abundance, may help to explain the absence of Fusobacteria.

At the species level, the difference in relative abundance of Prevotella copri between cats and dogs was the most striking. Whilst prevalence was 100% and 82.1% in cats and dogs respectively, cats showed a significantly higher mean relative abundance (65.1%) than dogs (29.5%). A recent study using whole genome shotgun metagenomic sequencing also found P. copri to be the most abundant species in the healthy cat gut microbiome, at 12.9% [ 27 ]. The unusually high relative abundance of P. copri in cats in the current study is difficult to explain. P. copri abundance tends to be increased in obese compared to normal bodyweight cats and dogs [ 27 , 28 ]; however, none of the animals included in the current study were obese. The dominance of P. copri in cats influenced overall microbial diversity and, although not statistically significant, alpha diversity tended to be lower in cats than in dogs. Conversely, previous studies have indicated higher gut microbiome diversity in cats compared with dogs, although further studies are required to corroborate these findings [ 7 , 22 , 29 ].

The variation in microbial profiles observed across individuals was not surprising as proportions of faecal bacterial phyla in cats and dogs are known to be influenced by several factors including age, breed, diet and individual variation [ 7 , 9 , 30 , 31 , 32 , 33 ]. The current study deliberately included a non-standardised variety of animals to ensure a broad picture of the bacteria present in dog and cat faeces. However, the absence of standardisation of factors such as breed and age could have influenced inter-animal variation.

Some association patterns were observed between dietary fibre level or type and faecal microbial composition in cats and dogs (data not shown), in agreement with previous observations [ 34 , 35 , 36 ]. However, the small number of animals and diversity of diets in the current study necessitate additional investigation to explore this further.

There are several limitations associated with the current study that should be considered when interpreting the results. The exploratory nature of the study means that animal numbers were low and controlling for variables such as diet, age and breed was not required. The results may be limited to animals fed a dry, commercial diet, and the findings should be confirmed in animals fed different formats as well as in different cohorts, including privately-owned pets. The methodology used did not confirm the description, function or viability of the bacterial strains identified, and diversity could not be estimated using rarefaction curves as MetaPhlAn 3 does not provide access to raw counts.


This exploratory study in dogs and cats showed that, based on DNA concentration and alpha diversity, faecal samples are generally stable for up to 12 h at room temperature. The study also showed that sampling at three different regions of the faecal stool, either at the surface or within the core, had no or only minor influence on the outcomes; individual specificity was the major driver of the observed diversity. There were indications that surface samples should not be taken from faeces stored at room temperature for > 6 h. The lack of influence of stabilisation buffer allows flexible choice for further studies, which will be controlled for variables such as diet.

The results of this study will be applied to study designs for further in-house studies as well as those in privately-owned pets.

Ten healthy adult cats, mean age 3.8 ± 2.7y (range 2.2-11.1y), and 10 healthy adult dogs, mean age 3.2 ± 1.5y (range 1.5-5.1y), were included in the study, which took place in July 2021 (see Table  1 for animal details). Animals qualified as healthy if not affected by any disease, as verified by clinical screening and haemato-chemical monitoring. Cats were acquired from the Isoquimen company, and dogs from breeders approved by the French Research Ministry. The study was powered to detect an effect size of 1 in terms of change from baseline. Animals were selected from a preliminary screening study, based on regular production of a stool at least 15 cm in length. Animals received appropriate routine external anti-parasitic and de-wormer treatment and were excluded if they had received any drugs in the 14 days, or antibiotics in the six weeks, prior to the study. Animals were housed at the Royal Canin Research Centre, Aimargues, France. Housing and protocols adhered to European regulatory rules for animal welfare and the study was approved by the Royal Canin Ethics Committee, Aimargues. Animals were fed a variety of test diets (of confidential nutritional composition); this had no deleterious impact on the study objectives and may even have benefited them by increasing the bacterial diversity monitored. The trial was conducted as an authorised digestibility trial (Ref CEO90). Dogs were housed indoors individually for the duration of the study, with free access to shared outdoor runs. The inside temperature varied between 22.3 and 22.9 °C. Artificial light was provided in addition to natural light, between 06:30 and 18:30. All dogs received outdoor exercise sessions of 40-60 min per day and dog-human socialisation for 20 min per day. Cats were group-housed in social rooms with a room temperature ranging between 22.2 and 23.6 °C at the time of study. All cats received daily socialisation sessions and had free access to enclosed outdoor runs.
Cats defaecated in litter boxes containing Cat’s Best® Comfort non-clumping wood-based litter substrate. Animals were fed dry, nutritionally complete and balanced diets, according to individual energy needs. One cat also received a pouch of wet food daily.

Sample collection

A single fresh, naturally voided faecal stool was collected from each animal and stored at room temperature (20 °C) in a sealed, sterile container. Cat faeces were collected before the cat buried the stool in litter substrate. All faeces were scored using an adaptation of the method of Moxham 2001 [ 37 ]. Briefly, faeces were visually evaluated on a nine-point scale from 1 to 5 in half-point increments, where a score of 1 is hard, dry and crumbly and a score of 5 is liquid. Faecal scores of 2-3.5 were considered acceptable for the study. A sample of approximately 1 g was taken from stored faeces at 0, 0.5, 1, 2, 3, 6 and 12 h post-collection. At T0, three horizontal regions of the faecal stool were sampled: the first emitted part (i), the middle (ii) and the last emitted part (iii), taking a sample from two depths (surface and core) at each location to give six samples at this timepoint per animal. The i and iii ends of the stool were identified using the observation that the i end is dry and the iii end is moist and soft. Surface samples of cat stools were taken from an area that had not been in contact with litter substrate. At T0.5, T1, T2, T3 and T6, the sample was taken from the core of region ii for each animal, and at T12, both a core and a surface sample were taken from region iii. Each sample was stored at -80 °C in DNA/RNA Shield™ (Ozyme) stabilisation solution prior to processing. To compare the use of an alternative stabilisation solution, additional aliquots from five dog samples (first emitted part (i) core, T0-T1) were placed in PERFORMABiome™ 200 stabilising solution (DNAgenotek®, Ottawa, Canada). All samples were uniquely coded and analysed blinded.
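The sampling scheme above yields 13 samples per animal: six at T0 (three regions × two depths), one at each of T0.5, T1, T2, T3 and T6, and two at T12. A minimal Python sketch enumerating the plan; the region/depth/timepoint labels are illustrative, not the study's actual sample codes:

```python
# Illustrative enumeration of the per-animal sampling plan described above.
# Labels ("i", "surface", "T0", ...) are hypothetical codes for this sketch.

def sampling_plan():
    samples = []
    # T0: three horizontal regions (i, ii, iii) x two depths (surface, core)
    for region in ("i", "ii", "iii"):
        for depth in ("surface", "core"):
            samples.append(("T0", region, depth))
    # T0.5-T6: core of region ii only
    for t in ("T0.5", "T1", "T2", "T3", "T6"):
        samples.append((t, "ii", "core"))
    # T12: surface and core of region iii
    for depth in ("surface", "core"):
        samples.append(("T12", "iii", depth))
    return samples

plan = sampling_plan()
print(len(plan))  # 13 samples per animal
```

With 20 animals, this corresponds to 260 primary samples, before the five additional dog aliquots used for the buffer comparison.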

DNA extraction

A magnetic bead-based DNA extraction was performed at Eurofins Genomics Europe (Germany), according to the manufacturer's instructions, using a KingFisher™ Flex (ThermoFisher Scientific, Waltham, MA, US) with an additional sample shredding step prior to lysis. Extracted DNA was stored in 1.5 ml Eppendorf microcentrifuge tubes at −80 °C until further analysis.

DNA preparation and sequencing

Shotgun sequencing libraries were prepared using the NEBNext Ultra II FS DNA Library Prep Kit for Illumina, in combination with enzymatic fragmentation. Shotgun metagenomic sequencing was performed on an Illumina NovaSeq 6000. At least 20 M paired-end reads per sample were obtained, with sequences of approximately 150 bp in length.

Bioinformatics analyses and statistics

Paired-end reads were assembled and low-quality reads (those with Phred quality score < 15) removed. Reads were subjected to taxonomic profiling with MetaPhlAn 3.0 [ 38 ] using the ChocoPhlAn database (version mpa_v30_ChocoPhLan_201901). MetaPhlAn estimates the relative abundance of microbial taxa in a metagenome using the coverage of clade-specific marker genes. The outputs were used to generate relative abundance bar plots at phylum, genus and species level.
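MetaPhlAn profiles report one taxon per line as a pipe-delimited clade string (e.g. `k__Bacteria|p__Bacteroidetes|g__Prevotella`) followed by its relative abundance. A minimal Python sketch of pulling genus-level abundances from such output; the exact column layout varies by MetaPhlAn version, so this assumes clade then abundance, and the embedded example values are illustrative, not study data:

```python
def genus_abundances(profile_text):
    """Extract genus-level relative abundances from a MetaPhlAn-style
    profile: one tab-separated line per clade, abundance in column 2
    (assumed layout; adjust for your MetaPhlAn version)."""
    out = {}
    for line in profile_text.splitlines():
        if not line or line.startswith("#"):
            continue                         # skip header/comment lines
        clade, abundance = line.split("\t")[:2]
        last = clade.split("|")[-1]
        if last.startswith("g__"):           # keep genus-level rows only
            out[last[3:]] = float(abundance)
    return out

example = (
    "#clade_name\trelative_abundance\n"
    "k__Bacteria|p__Bacteroidetes|g__Prevotella\t66.7\n"
    "k__Bacteria|p__Bacteroidetes|g__Bacteroides\t14.7\n"
)
print(genus_abundances(example))
```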

Alpha diversity of the samples was evaluated by the Shannon Index [ 39 ] and the Inverse Simpson Index [ 40 ], which account for both the evenness and the richness of the microbial species present. Both indices take values > 0, with higher values denoting greater diversity.
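Both indices are simple functions of the species proportions: Shannon is H = −Σ p ln p and Inverse Simpson is 1/Σ p². A short Python sketch (the example communities are illustrative, not study data; the study itself computed these in R):

```python
import math

def shannon(proportions):
    """Shannon Index H = -sum(p * ln p); higher = more diverse."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

def inverse_simpson(proportions):
    """Inverse Simpson Index 1 / sum(p^2); higher = more diverse."""
    return 1.0 / sum(p * p for p in proportions)

# A community dominated by one species (as with P. copri in cats)
# scores lower than a perfectly even one:
dominated = [0.85, 0.05, 0.05, 0.05]
even = [0.25, 0.25, 0.25, 0.25]
print(inverse_simpson(even))  # 4.0: behaves like four equally abundant species
print(inverse_simpson(dominated) < inverse_simpson(even))  # True
```

This is why the P. copri-dominated cat samples tended towards lower alpha diversity despite a similar species count to dogs: both indices penalise uneven communities.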

Beta diversity was assessed using the Bray-Curtis distance on the relative abundance table at the species level. Principal component analysis (PCA) was used to visualise sample similarities: one PCA was performed with all samples from dogs and cats, and two others represented the cat and dog samples separately. Using the dog and cat sample data, two PERMANOVAs (adonis2 function from R package vegan 2.6-4) were performed to evaluate the variance explained by the different sample characteristics (e.g., animal ID, sampling time, depth sampling location, horizontal sampling location, tube).
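The Bray-Curtis dissimilarity between two abundance profiles is Σ|u − v| / Σ(u + v), ranging from 0 (identical) to 1 (no shared taxa). A minimal Python sketch; the example abundance vectors are illustrative, not study data:

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles:
    sum(|u_i - v_i|) / sum(u_i + v_i); 0 = identical, 1 = disjoint."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

a = [66.7, 14.7, 4.0]   # hypothetical genus abundances for one sample
b = [49.0, 7.8, 0.1]    # hypothetical abundances for another sample
print(bray_curtis(a, a))  # 0.0
print(bray_curtis(a, b))
```

A matrix of these pairwise distances is what PERMANOVA (adonis2 in vegan) partitions among factors such as animal ID and sampling location.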

Statistical comparisons were performed using non-parametric tests. For comparisons between two groups, Wilcoxon rank-sum tests were used; Wilcoxon signed-rank tests were applied for paired comparisons. For comparisons between more than two groups, Kruskal-Wallis rank-sum tests were used. To correct for multiple comparisons, Benjamini-Hochberg adjustments were made. Comparisons were deemed statistically significant at p  < 0.05. Data management and statistical comparisons were carried out in R statistical software v 4.0.3.
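The Benjamini-Hochberg adjustment multiplies each p-value by m/rank (rank in ascending order) and then enforces monotonicity from the largest p-value down. A Python sketch of the standard procedure, intended to match R's p.adjust(method = "BH"), which is what the study's R workflow would have used; the example p-values are illustrative:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values: p * m / rank, made
    monotone by taking a running minimum from the largest p down."""
    m = len(pvals)
    # Indices ordered from largest p-value to smallest
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos                      # rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

The running minimum guarantees that a smaller raw p-value never ends up with a larger adjusted value than a bigger one, which is why several raw p-values can share one adjusted value.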

Data Availability

The datasets generated and/or analysed during the current study are available from NCBI under BioProject accession PRJNA939514, or from the corresponding author on reasonable request.

Suchodolski JS. Intestinal microbiota of dogs and cats: a bigger world than we thought. Vet Clin North Am Small Anim Pract. 2011;41(2):261–72.


Deusch O, O’Flynn C, Colyer A, Morris P, Allaway D, Jones PG, et al. Deep Illumina-based shotgun sequencing reveals dietary effects on the structure and function of the fecal microbiome of growing kittens. PLoS ONE. 2014;9(7):e101021.

Guard BC, Barr JW, Reddivari L, Klemashevich C, Jayaraman A, Steiner JM, et al. Characterization of microbial dysbiosis and metabolomic changes in dogs with acute diarrhea. PLoS ONE. 2015;10(5):e0127259–e.

Guard BC, Mila H, Steiner JM, Mariani C, Suchodolski JS, Chastant-Maillard S. Characterization of the fecal microbiome during neonatal and early pediatric development in puppies. PLoS ONE. 2017;12(4):e0175718.

Hand D, Wallis C, Colyer A, Penn CW. Pyrosequencing the canine faecal microbiota: breadth and depth of biodiversity. PLoS ONE. 2013;8(1):e53115.


Minamoto Y, Otoni CC, Steelman SM, Büyükleblebici O, Steiner JM, Jergens AE, et al. Alteration of the fecal microbiota and serum metabolite profiles in dogs with idiopathic inflammatory bowel disease. Gut Microbes. 2015;6(1):33-47.

Jha AR, Shmalberg J, Tanprasertsuk J, Perry L, Massey D, Honaker RW. Characterization of gut microbiomes of household pets in the United States using a direct-to-consumer approach. PLoS ONE. 2020;15(2):e0227289.

Whittemore JC, Stokes JE, Price JM, Suchodolski JS. Effects of a synbiotic on the fecal microbiome and metabolomic profiles of healthy research cats administered clindamycin: a randomized, controlled trial. Gut Microbes. 2019;10(4):521–39.

Pilla R, Suchodolski JS. The role of the canine gut microbiome and metabolome in health and gastrointestinal disease. Front Vet Sci. 2020;6.

Wernimont SM, Radosevich J, Jackson MI, Ephraim E, Badri DV, MacLeay JM, et al. The effects of nutrition on the gastrointestinal microbiome of cats and dogs: impact on health and disease. Front Microbiol. 2020;11:1266.

Carroll IM, Ringel-Kulka T, Siddle JP, Klaenhammer TR, Ringel Y. Characterization of the fecal microbiota using high-throughput sequencing reveals a stable microbial community during storage. PLoS ONE. 2012;7(10):e46953.

Roesch LF, Casella G, Simell O, Krischer J, Wasserfall CH, Schatz D, et al. Influence of fecal sample storage on bacterial community diversity. Open Microbiol J. 2009;3:40–6.

Tal M, Verbrugghe A, Gomez DE, Chau C, Weese JS. The effect of storage at ambient temperature on the feline fecal microbiota. BMC Vet Res. 2017;13(1):256.

Lin C-Y, Cross T-WL, Doukhanine E, Swanson KS. An ambient temperature collection and stabilization strategy for canine microbiota studies. Sci Rep. 2020;10(1):13383.

Sun S, Zhu X, Huang X, Murff HJ, Ness RM, Seidner DL, et al. On the robustness of inference of association with the gut microbiota in stool, rectal swab and mucosal tissue samples. Sci Rep. 2021;11(1):14828.

Jones RB, Zhu X, Moan E, Murff HJ, Ness RM, Seidner DL, et al. Inter-niche and inter-individual variation in gut microbial community assessment using stool, rectal swab, and mucosal samples. Sci Rep. 2018;8(1):4139.

Hernandez J, Rhimi S, Kriaa A, Mariaule V, Boudaya H, Drut A, et al. Domestic environment and gut microbiota: lessons from Pet Dogs. Microorganisms. 2022;10(5):949.

Ritchie LE, Burke KF, Garcia-Mazcorro JF, Steiner JM, Suchodolski JS. Characterization of fecal microbiota in cats using universal 16S rRNA gene and group-specific primers for Lactobacillus and Bifidobacterium spp. Vet Microbiol. 2010;144(1–2):140–6.


Tun HM, Brar MS, Khin N, Jun L, Hui RK, Dowd SE, et al. Gene-centric metagenomics analysis of feline intestinal microbiome using 454 junior pyrosequencing. J Microbiol Methods. 2012;88(3):369–76.

Fischer MM, Kessler AM, Kieffer DA, Knotts TA, Kim K, Wei A, et al. Effects of obesity, energy restriction and neutering on the faecal microbiota of cats. Br J Nutr. 2017;118(7):513–24.

Deusch O, O’Flynn C, Colyer A, Swanson KS, Allaway D, Morris P. A longitudinal study of the feline faecal microbiome identifies changes into early adulthood irrespective of sexual development. PLoS ONE. 2015;10(12):e0144881.

Handl S, Dowd SE, Garcia-Mazcorro JF, Steiner JM, Suchodolski JS. Massive parallel 16S rRNA gene pyrosequencing reveals highly diverse fecal bacterial and fungal communities in healthy dogs and cats. FEMS Microbiol Ecol. 2011;76(2):301–10.

Moon CD, Young W, Maclean PH, Cookson AL, Bermingham EN. Metagenomic insights into the roles of Proteobacteria in the gastrointestinal microbiomes of healthy dogs and cats. Microbiologyopen. 2018;7(5):e00677.

Bermingham EN, Young W, Kittelmann S, Kerr KR, Swanson KS, Roy NC, et al. Dietary format alters fecal bacterial populations in the domestic cat (Felis catus). Microbiologyopen. 2013;2(1):173–81.

Hooda S, Vester Boler BM, Kerr KR, Dowd SE, Swanson KS. The gut microbiome of kittens is affected by dietary protein:carbohydrate ratio and associated with blood metabolite and hormone concentrations. Br J Nutr. 2013;109(9):1637–46.

Alessandri G, Milani C, Mancabelli L, Mangifesta M, Lugli GA, Viappiani A et al. The impact of human-facilitated selection on the gut microbiota of domesticated mammals. FEMS Microbiol Ecol. 2019;95(9).

Ma X, Brinker E, Graff EC, Cao W, Gross AL, Johnson AK, et al. Whole-genome shotgun metagenomic sequencing reveals distinct gut microbiome signatures of obese cats. Microbiol Spectr. 2022;e00837-22.

You I, Kim MJ. Comparison of gut microbiota of 96 healthy dogs by individual traits: breed, age, and body condition score. Animals (Basel). 2021;11(8).

Garcia-Mazcorro JF, Barcenas-Walls JR, Suchodolski JS, Steiner JM. Molecular assessment of the fecal microbiota in healthy cats and dogs before and during supplementation with fructo-oligosaccharides (FOS) and inulin using high-throughput 454-pyrosequencing. PeerJ. 2017;5:e3184.

Deng P, Swanson KS. Gut microbiota of humans, dogs and cats: current knowledge and future opportunities and challenges. Br J Nutr. 2015;113(Suppl):6–17.


Bermingham EN, Young W, Butowski CF, Moon CD, Maclean PH, Rosendale D, et al. The fecal microbiota in the domestic cat (Felis catus) is influenced by interactions between age and diet; a five year longitudinal study. Front Microbiol. 2018;9.

Bermingham EN, Kittelmann S, Henderson G, Young W, Roy NC, Thomas DG. Five-week dietary exposure to dry diets alters the faecal bacterial populations in the domestic cat (Felis catus). Br J Nutr. 2011;106(Suppl 1):49–52.

Reddy KE, et al. Impact of breed on the fecal microbiome of dogs under the same dietary condition. J Microbiol Biotechnol. 2019;29(12):1947-56.

Middelbos IS, Vester Boler BM, Qu A, White BA, Swanson KS, Fahey GC. Jr. Phylogenetic characterization of fecal microbial communities of dogs fed diets with or without supplemental dietary fiber using 454 pyrosequencing. PLoS ONE. 2010;5(3):e9768.

Swanson KS, Dowd SE, Suchodolski JS, Middelbos IS, Vester BM, Barry KA, et al. Phylogenetic and gene-centric metagenomics of the canine intestinal microbiome reveals similarities with humans and mice. ISME J. 2011;5(4):639-49.

Finet SE, Southey BR, Rodriguez-Zas SL, He F, de Godoy MRC. Miscanthus grass as a novel functional fiber source in extruded feline diets. Front Vet Sci. 2021;8.

Moxham G. WALTHAM feces scoring system - a tool for veterinarians and pet owners: how does your pet rate? WALTHAM Focus. 2001. p. 24-5.

Beghini F, McIver LJ, Blanco-Míguez A, Dubois L, Asnicar F, Maharjan S, et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife. 2021;10:e65088.

Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27(3):379–423.

Simpson EH. Measurement of diversity. Nature. 1949;163(4148):688.


Robinson V. Finding alternatives: an overview of the 3Rs and the use of animals in research. Sch Sci Rev. 2005;87:111–4.

Percie du Sert N, Hurst V, Ahluwalia A, Alam S, Avey MT, Baker M, et al. The ARRIVE guidelines 2.0: updated guidelines for reporting animal research. PLoS Biol. 2020;18(7):e3000410.



We wish to thank Marie-Odile Fardet (Specific Protocol Technician, Royal Canin) for assisting with sample collection, Jonathan Plassais (Biostatistician consultant, Stat-Alliance) and Alban Mathieu (Biostatistician consultant, AM Analysis) for their expertise and Dirk Berkelmann (Project Manager NGS, Eurofins Genomic) for his technical assistance with the laboratory work.

The study was funded by Royal Canin, a division of Mars Petcare and received no external funding.

Author information

Authors and affiliations.

Royal Canin Sas, 650 avenue de la Petite Camargue, AIMARGUES Cedex, CS, 10309, 30470, France

Xavier Langon



Xavier Langon designed the study, coordinated faecal collection, and was responsible for faecal sample control and shipment. Xavier Langon was responsible for data interpretation. Xavier Langon drafted and approved the manuscript.

Corresponding author

Correspondence to Xavier Langon .

Ethics declarations

Competing interests.

Xavier Langon is an employee of Royal Canin, with an interest in petfood manufacture.

Ethics approval and consent to participate

All dogs and cats were animals from the authors' own facilities. All experiments were performed in accordance with relevant guidelines and regulations. This study was conducted in accordance with the Mars Animal Research Policy ( www.mars.com ), adhering to the 3Rs approach to animal research as described by Robinson (2005) [ 41 ], and complies with the ARRIVE guidelines [ 42 ]. All studies were approved by the Royal Canin Ethics Committee.

Consent for publication

Not applicable.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.


Supplementary Material 1: Heatmap of D4 samples using the Bray-Curtis distance and the Ward linkage method. Relative abundances of species are indicated by the intensity of the red colour.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Cite this article.

Langon, X. Validation of method for faecal sampling in cats and dogs for faecal microbiome analysis. BMC Vet Res 19 , 274 (2023). https://doi.org/10.1186/s12917-023-03842-7

Received : 18 September 2022

Accepted : 03 December 2023

Published : 16 December 2023

DOI : https://doi.org/10.1186/s12917-023-03842-7


  • Gastrointestinal
  • Temperature

BMC Veterinary Research

ISSN: 1746-6148



Amazon Value Chain Analysis

Value chain analysis is an analytical framework that helps identify the business activities that can create value and competitive advantage for a business. The figure below illustrates the essence of Amazon value chain analysis.

Amazon Value Chain Analysis

Amazon Primary Activities

Amazon Inbound Logistics

Inbound logistics within Amazon value chain analysis involves receiving and storing raw materials to produce goods and services. Due to the massive global scope of its operations, the e-commerce giant maintains complex but sophisticated inbound logistics operations.

Generally, Amazon does not have long-term contracts or arrangements with its vendors to guarantee the availability of merchandise, particular payment terms, or the extension of credit limits. Fulfilment by Amazon (FBA) is the cornerstone of Amazon inbound logistics for the company-owned retail business. Moreover, economies of scale are an important source of value creation for Amazon inbound logistics.

Sellers can also use FBA by stowing their inventory in Amazon fulfilment centres. In this case, Amazon assumes full responsibility for logistics, customer service, and product returns. If a customer orders an FBA item and an Amazon owned-inventory item, the company ships both items to the customer in one box, which is a significant gain in efficiency. The use of FBA is optional for sellers, and choosing it makes the products of third-party sellers eligible for Amazon Prime free two-day shipping, free shipping, and other benefits.

Amazon's logistics operations also extend beyond serving Amazon Marketplace: the company has recently begun offering logistics services to third parties. For example, Beijing Century Joyo Courier Services, an Amazon subsidiary, is registered with the U.S. government as an ocean shipping provider. [1] From this point of view, an efficient logistics infrastructure also belongs on the list of Amazon's competitive advantages.

Amazon Operations

Operations generally comprise the process of transforming raw materials into goods to be sold or the process of providing services. Amazon operations are organized into three segments:

1. North America . This segment operates North America-focused websites such as www.amazon.com, www.amazon.ca, and www.amazon.com.mx. Sales in this segment increased by 18.4% in 2021 compared to the previous year. [2] Although 18.4% would be considered a healthy growth rate by many businesses, it is relatively modest for Amazon, whose growth rate rarely falls below 20%.

2. International . This segment operates internationally-focused websites such as www.amazon.com.au, www.amazon.com.br, www.amazon.cn and others. Revenues from international operations are subject to changes in currency exchange rates.

3. Amazon Web Services (AWS) . This segment deals with global sales of computing, storage, database, and other service offerings for start-ups, enterprises, government agencies, and academic institutions. AWS achieved a USD 50 billion annualized run rate in 2020. [3] In 2021, AWS revenues increased by 40% to reach USD 17.8 billion.

AWS offers pay-as-you-go cloud storage, compute, and networking services, and its major customers include Pinterest, Dropbox, and Airbnb. Moreover, AWS is positioned as a platform for building applications: businesses such as GE, Major League Baseball, Tata Motors, and Qantas have built applications on AWS ranging from apps for crowdsourcing and personalized health care to mobile apps for managing fleets of trucks.

A creative and innovative approach to problem-solving is one of the major sources of value creation associated with Amazon operations. Specifically, the tech giant initially developed cloud storage and compute resources for its own business needs, to streamline its internal operations. Once their benefits became evident, the company commercialized these cloud services.

Amazon Outbound Logistics

Outbound logistics refers to warehousing and delivering goods to the end user. In the past, Amazon relied on overnight delivery businesses such as UPS, FedEx, and TNT, but it has been shifting its outbound logistics operations in-house for the past few years. Specifically, the online retail behemoth has evolved into one of the biggest transportation companies in the world, with 400,000 drivers, 40,000 semi-trucks, 30,000 vans, and a fleet of more than 70 planes worldwide. [4]

Generally, Amazon outbound logistics integrates the following:

1. Fulfilment centres . The e-commerce giant operates more than 175 fulfilment centres around the globe and the company uses robotic technology in an extensive manner to manage receipt, stowing, picking, and shipment of products. [5]

2. Co-sourced and outsourced arrangements

3. Digital delivery . This relates to products and services that can be downloaded from the Amazon website.

4. Physical stores . Having started as a purely online business, Amazon nowadays operates seven different types of physical stores, such as Amazon Go Grocery, Whole Foods, Amazon Books, and Amazon Pop Up themed kiosks.

Amazon Marketing and Sales

In 2016, Amazon spent more on marketing than Wal-Mart Stores, Target, Best Buy, Home Depot, and Kroger combined. [6] In 2021 alone, the largest internet retailer in the world by revenue spent USD 16.9 billion on advertising and promotions worldwide. It can therefore be argued that marketing and sales is one of the major sources of value in Amazon's chain of operations, although this value is generated through heavy marketing investments.

Amazon marketing message conveys the promises of the largest selection of products and services, attractive prices, fast delivery of products and overall superior customer services. Several components of the marketing communication mix such as print and media advertising, sales promotion, events and experiences, public relations and direct marketing are used in an integrated way in order to communicate the marketing message to the target customer segment.

Amazon Service

Exceptional customer service is a major source of value creation for the e-commerce and cloud computing company. Amazon's annual report states that the company "seek[s] to be Earth's most customer-centric company" [7], and its customer service reflects this ambition. Customers can contact Amazon by phone, email, chat, or social media.

Amazon Marketplace and Prime have two types of customers – sellers on and buyers from the Amazon platform. For sellers in particular, Amazon offers the Selling Coach program, “alerting sellers about opportunities to avoid going out-of-stock, add selection that’s selling, and sharpen their prices to be more competitive” [8].

Moreover, “Amazon’s returns process is dealt with entirely online through a customer’s account. If there is an issue that does require a customer to speak with a customer service assistant over the phone, they will have access to the customer’s account and order details, meaning that any issues can be dealt with quickly and efficiently.” [9] The e-commerce giant offers free returns with no box, tape, or label needed, which is an unprecedented offer worldwide.

Amazon.com Inc. Report contains a full version of Amazon value chain analysis. The report illustrates the application of major analytical strategic frameworks in business studies, such as SWOT, PESTEL, Porter’s Five Forces, the Ansoff Matrix, and the McKinsey 7S Model, to Amazon. Moreover, the report contains analyses of Amazon’s leadership, business strategy, organizational structure, and organizational culture. The report also discusses Amazon’s marketing strategy and ecosystem and addresses issues of corporate social responsibility.

Amazon.com Inc. Report 2022

[1] Schreiber, Z. (2016) “Is Logistics About To Get Amazon’ed?” Tech Crunch, Available at: https://techcrunch.com/2016/01/29/is-logistics-about-to-get-amazoned/

[2] Davis, D (2022) “Amazon’s North America revenue ticks up 18.4% in 2021” Digital Commerce, Available at: https://www.digitalcommerce360.com/article/amazon-sales/

[3] Annual Report (2020) Amazon.com Inc.

[4] Schoolov, K. (2021) “Amazon is now shipping cargo for outside customers in its latest move to compete with FedEx and UPS” CNBC, Available at: https://www.cnbc.com/2021/09/04/how-amazon-is-shipping-for-third-parties-to-compete-with-fedex-and-ups.html

[5] Why Amazon warehouses are called fulfilment centres (n.d.) Amazon, Available at: https://www.aboutamazon.co.uk/amazon-fulfilment/our-fulfilment-centres/why-amazon-warehouses-are-called-fulfilment-centers

[6] Green, T. (2017) “Amazon spends more on advertising than Walmart, Target, Best Buy, Home Depot, and Kroger combined” Business Insider, Available at: https://www.businessinsider.com/amazon-stock-price-advertising-spending-compared-competitors-2017-6         

[7] Annual Report (2016) Amazon.com Inc.

[8] Annual Report (2014) Amazon

[9] Amazon CRM Case Study (2017) Expert CRM Software, Available at: http://crmsystems.expertmarket.co.uk/Amazon-CRM-Case-Study

Published on 18.12.2023 in Vol 25 (2023)

Existing Barriers Faced by and Future Design Recommendations for Direct-to-Consumer Health Care Artificial Intelligence Apps: Scoping Review

Authors of this article:

  • Xin He, MA   ; 
  • Xi Zheng, MA   ; 
  • Huiyuan Ding, BA  

School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China

Corresponding Author:

School of Mechanical Science and Engineering

Huazhong University of Science and Technology

Luoyu Road 1037

Hongshan District

Wuhan, 430074

Phone: 86 18707149470

Email: [email protected]

Background: Direct-to-consumer (DTC) health care artificial intelligence (AI) apps hold the potential to bridge the spatial and temporal disparities in health care resources, but they also come with individual and societal risks due to AI errors. Furthermore, the manner in which consumers interact directly with health care AI is reshaping traditional physician-patient relationships. However, the academic community lacks a systematic comprehension of the research overview for such apps.

Objective: This paper systematically delineated and analyzed the characteristics of the included studies, identified existing barriers and design recommendations for DTC health care AI apps mentioned in the literature, and provided a reference for future design and development.

Methods: This scoping review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines and was conducted according to Arksey and O’Malley’s 5-stage framework. Peer-reviewed papers on DTC health care AI apps published until March 27, 2023, in Web of Science, Scopus, the ACM Digital Library, IEEE Xplore, PubMed, and Google Scholar were included. The papers were analyzed using Braun and Clarke’s reflective thematic analysis approach.

Results: Of the 2898 papers retrieved, 32 (1.1%) covering this emerging field were included. The included papers were recently published (2018-2023), and most (23/32, 72%) were from developed countries. The medical field was mostly general practice (8/32, 25%). In terms of users and functionalities, some apps were designed solely for single-consumer groups (24/32, 75%), offering disease diagnosis (14/32, 44%), health self-management (8/32, 25%), and health care information inquiry (4/32, 13%). Other apps connected to physicians (5/32, 16%), family members (1/32, 3%), nursing staff (1/32, 3%), and health care departments (2/32, 6%), generally to alert these groups to abnormal conditions of consumer users. In addition, 8 barriers and 6 design recommendations related to DTC health care AI apps were identified. Some more subtle obstacles that are particularly worth noting and corresponding design recommendations in consumer-facing health care AI systems, including enhancing human-centered explainability, establishing calibrated trust and addressing overtrust, demonstrating empathy in AI, improving the specialization of consumer-grade products, and expanding the diversity of the test population, were further discussed.

Conclusions: The booming DTC health care AI apps present both risks and opportunities, which highlights the need to explore their current status. This paper systematically summarized and sorted the characteristics of the included studies, identified existing barriers faced by, and made future design recommendations for such apps. To the best of our knowledge, this is the first study to systematically summarize and categorize academic research on these apps. Future studies conducting the design and development of such systems could refer to the results of this study, which is crucial to improve the health care services provided by DTC health care AI apps.


The scarcity and uneven distribution of health care resources, such as medical facilities and professionals, often impede people’s access to timely and effective health care services and professional medical advice, which has been a significant health concern worldwide [ 1 ]. The World Health Organization (WHO) and other institutions have identified artificial intelligence (AI) as a technology that has the potential to fundamentally transform health care and help address these challenges, especially by reducing health inequalities in low- and middle-income countries (LMICs) [ 2 , 3 ].

Among AI programs that provide health care functions, there is a significant surge in health care apps that are sold directly to consumers for personal use. Most of these apps are based on predictive or diagnostic functions, providing consumers with a purportedly inexpensive and accurate diagnosis of various conditions [ 4 ]. A well-known example is the Apple Watch for atrial fibrillation, which has been authorized as a class II (moderate-risk) device [ 5 ]. The increased emphasis on telemedicine and home health care in the era of the COVID-19 pandemic [ 6 ], as well as the current advancements in generative AI technologies, such as ChatGPT (where GPT stands for Generative Pretrained Transformer), further stimulate and drive the emergence of direct-to-consumer (DTC) health care AI apps. Large enterprises are racing to deploy research and development of DTC health care AI apps. For example, Dr Karen DeSalvo, Google’s chief health officer, argued at “Check Up 2023” that the future of health is consumer driven. As a company with advanced AI technologies, Google will drive AI-enabled insights, services, and care across a range of health care use cases, from search to symptom tracking and treatment [ 7 ].

However, on the one hand, existing DTC health care AI apps carry risks of errors at both the individual and the societal level. At the individual level, consumers may face the costs and consequences of overdiagnosis or underdiagnosis when using these apps. For example, Google announced an AI-powered dermatology assist app that, according to the company, can use deep learning to identify 288 skin, hair, and nail conditions based on user-submitted images [ 8 ]. However, the app has a significant limitation due to its lack of data diversity, which could lead to overdiagnosis or underdiagnosis in non-White patients [ 9 ]. At the societal level, DTC health care AI apps are designed for cost-effective, immediate, and repeated use, increasing the likelihood that their errors will spread rapidly and place a significant burden on the overall health care system [ 4 ].

On the other hand, the manner in which consumers interact directly with AI in DTC health care AI apps is transformative and alters the traditional physician-patient relationships. These apps can directly provide consumers with various functions, such as heart dysfunction identification [ 10 , 11 ], eye disease diagnosis [ 12 ], and emotion regulation and treatment [ 13 ], which were previously provided by human health care experts. However, in the process of consumers directly interacting with AI, failure to incorporate consumer behavior insights into AI technological development will undermine their experience with AI [ 14 ], thereby affecting their adoption of such apps [ 15 ].

In the context of a surge in DTC health care AI apps, academic research focusing on consumers in the health care AI field is relatively scarce, and there is limited understanding of consumer acceptance of AI in the health care domain [ 16 ]. Furthermore, most trials of clinical AI tools omit the evaluation of patients’ attitudes [ 17 ]. The majority of existing reviews either concentrate on health care AI systems for expert users, such as health care providers [ 18 , 19 ], or do not clearly differentiate the user categories for AI apps in health care [ 20 , 21 ]. There is a need for a deeper understanding of how consumers interact with DTC health care AI apps, beyond merely considering the system’s technical specifications [ 4 ]. Previous studies have reviewed AI apps that are patient oriented and have unique features, functionalities, or formats [ 22 - 24 ]. However, the overall landscape of DTC health care AI apps in academic research remains unclear. There is also a lack of studies that systematically summarize the potential barriers faced by these apps, as well as design recommendations for future research.

To the best of our knowledge, this is the first academic study to systematically summarize and sort out the profile of health care AI apps directly targeting consumers. The objectives of this research are twofold: first, to provide a comprehensive overview of existing studies related to DTC health care AI apps, exploring and mapping out their study characteristics, and, second, to summarize observed barriers and future design recommendations in the literature. Understanding these issues is crucial for the future research, design, development, and adoption of DTC health care AI apps.

Study Design

A scoping review was conducted in line with Arksey and O’Malley’s 5-stage framework [ 25 ]. Study results were reported according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist [ 26 ] ( Multimedia Appendix 1 ).

Stage 1: Identifying the Research Question

To address the aim of this study, 3 research questions were formulated:

  • Research question 1: What characteristics of DTC health care AI apps have been identified in existing research?
  • Research question 2: What barriers are faced by DTC health care AI apps in existing research?
  • Research question 3: What design recommendations for DTC health care AI Apps have been put forward in existing research?

Stage 2: Identifying Relevant Studies

Studies were searched from inception until March 27, 2023. We searched 5 databases (Web of Science, Scopus, the ACM Digital Library, IEEE Xplore, and PubMed) for 4 concept areas and their lexical variants and synonyms ( Textbox 1 ): AI (technical basis), health care (application domain), consumer (user), and app (carrier). In addition, we retrieved gray literature from the top 10 pages of Google Scholar search results. Gray literature encompasses the literature produced by various levels of government, academia, business, and industry in both print and electronic formats, which is not controlled by commercial publishers [ 27 ]. Its forms include academic papers, dissertations, research and committee reports, government publications, conference papers, and ongoing research, among others.

Search concepts combined using “AND”

  • Artificial intelligence (AI)
  • Health care
  • Consumer
  • App

Search terms combined using “OR”

  • AI, artificial intelligence, ML, machine learning, DL, deep learning
  • Health care, health, medical
  • Consumer, consumers
  • Application, applications, app, apps, system, systems, service, mHealth, eHealth

We also conducted snowball sampling on the reference lists of related papers included in the full-text review. The specific database search strings combined with Boolean operators are detailed in Multimedia Appendix 2 .
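The concept-group structure in Textbox 1 lends itself to mechanical query assembly: synonyms within a concept are joined with OR, and the concept groups are joined with AND. A minimal Python sketch (term lists abridged from Textbox 1; exact field tags and truncation syntax vary by database):

```python
# Build a Boolean search string: multi-word terms are quoted, synonyms are
# OR-ed inside parentheses, and concept groups are AND-ed together.
concepts = {
    "AI": ["AI", "artificial intelligence", "machine learning", "deep learning"],
    "healthcare": ["health care", "health", "medical"],
    "consumer": ["consumer", "consumers"],
    "app": ["application", "app", "system", "service", "mHealth", "eHealth"],
}

def build_query(groups):
    ors = ["(" + " OR ".join(f'"{t}"' if " " in t else t for t in terms) + ")"
           for terms in groups.values()]
    return " AND ".join(ors)

print(build_query(concepts))
```

Running this prints one line that can be pasted into a database's advanced-search field and then adapted to that database's syntax.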

Stage 3: Study Selection

Inclusion criteria for this review were (1) peer-reviewed studies, (2) research papers, (3) papers published in English, (4) research topics focused on DTC health care AI apps or systems, and (5) either consumers as target users or multistakeholder users with consumers as main users. Exclusion criteria were (1) duplicate papers not identified by bibliography software, (2) nonresearch papers (eg, editorials, commentaries, perspectives, opinion papers, or reports), (3) papers not published in English, (4) inability to obtain the full text, and (5) app only intended to be used by professionals.

Inclusion and exclusion criteria ( Table 1 ) were used to screen titles, abstracts, and full-text papers. When the 2 authors (XH and XZ) disagreed on the selection of studies, consensus was reached through discussion.


Stage 4: Charting the Data

Two authors (XH and XZ) extracted the following data for each paper: title, author, publication year, country, publication type, study objective, study design, medical field, app type, user, existing barriers, and design recommendations. We exclusively extracted data related to barriers and design recommendations from the results or discussions within the papers (eg, insights, such as opinions expressed by consumers after using the apps or recommendations proposed by researchers following app evaluations). Descriptions that were not validated through the empirical research section of the papers were not extracted (eg, viewpoints that appeared only in the Introduction or Background section).

Stage 5: Collating, Summarizing, and Reporting Results

The extracted data related to RQ1 were mapped and summarized. A reflexive thematic analysis [ 28 - 30 ] was conducted on the data related to RQ2 and RQ3 to summarize existing barriers faced by and design recommendations for DTC health care AI apps through inductive coding. NVivo (QSR International) was used to facilitate data management and analysis. The analysis proceeded through 6 steps: familiarizing with the data set; coding; generating initial themes; developing and reviewing themes; refining, defining, and naming themes; and writing up. The coding and data analysis for this study were performed in parallel, and we addressed differences and reached consensus by discussing uncertainties.
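As a rough illustration of the coding bookkeeping in steps 2-4 (all excerpts, codes, and theme labels below are invented for illustration, not drawn from the study's data):

```python
# Group coded excerpts into candidate themes: each excerpt carries a code,
# and codes are mapped to the theme they support.
from collections import defaultdict

coded_excerpts = [
    ("AI felt cold and scripted", "lack of empathy"),
    ("I don't understand why it diagnosed that", "unclear explanation"),
    ("too much technical detail in the answer", "information overload"),
]

code_to_theme = {
    "lack of empathy": "Lack of empathy",
    "unclear explanation": "Explainability",
    "information overload": "Explainability",
}

themes = defaultdict(list)
for excerpt, code in coded_excerpts:
    themes[code_to_theme[code]].append((code, excerpt))

for theme, items in themes.items():
    print(theme, len(items))
```

In practice this bookkeeping is done in software such as NVivo, and themes are revised iteratively rather than fixed in a static mapping.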

Search Results

The initial search resulted in the retrieval of 4055 records. After removing duplicates, 2898 (71.5%) records remained. After screening titles and abstracts, 2752 records (95%) were excluded, and the remaining 146 (5%) records were assessed for eligibility through full-text review. An additional 3 records were obtained through a snowball search of the reference lists in the included full-text papers. Of these 149 records, 115 (77.2%) were excluded for reasons shown in Figure 1 , resulting in 32 (21.5%) papers being included in the final scoping review. Figure 1 shows the PRISMA-ScR (Preferred Reporting Item for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) flow.


Research Question 1: Study Characteristics

An overview of the 32 papers included in the scoping review is provided in Tables 2 - 4 , including author, publication year, country, publication type, study objective, study design, medical field, app type, and user. We did not restrict the search year intentionally, as most health care AI review papers do [ 31 - 33 ]. However, the results indicated that the reviewed papers were fairly recent, with all the 32 (100%) included studies published between 2018 and 2023. Papers were from North America (7/32, 22%) [ 10 , 13 , 15 , 34 - 38 ], Asia (6/32, 19%) [ 39 - 44 ], Europe (6/32, 19%) [ 12 , 45 - 49 ], and Oceania (2/32, 6%) [ 17 , 50 ]. In addition, multiple regional cooperation was also prevalent (11/32, 34%) [ 51 - 61 ]. Publication types included 23 (72%) journal papers ( Tables 2 and 3 ) [ 10 , 12 , 15 , 17 , 34 , 37 , 39 , 41 , 43 , 45 - 49 , 52 - 62 ] and 9 (28%) conference papers ( Table 4 ) [ 13 , 35 , 36 , 38 , 40 , 42 , 44 , 50 , 51 ]. Study designs included quantitative research (22/32, 69%) [ 12 , 13 , 15 , 34 , 37 , 39 , 40 , 42 - 44 , 47 - 52 , 54 , 55 , 57 - 59 , 61 ], qualitative research (2/32, 6%) [ 35 , 60 ], and mixed methods studies (4/32, 12%) [ 38 , 41 , 45 , 46 ], in addition to systematic reviews (4/32, 12%) [ 17 , 36 , 53 , 56 ]. Most studies chose general practice (8/32, 25%) [ 34 , 37 , 40 , 41 , 46 , 49 , 54 , 55 ] as the target medical field. 
The app types mentioned in the studies included diagnosis (apps make determinations about the cause of a disease or pathology based on information provided by consumers; 14/32, 44%) [ 12 , 38 , 40 - 42 , 47 , 48 , 51 , 52 , 54 , 55 , 57 , 60 , 61 ], health self-management (apps encourage consumers to take actions to manage their continuous health status and quality of life, often in the management of chronic diseases or health problems; 8/32, 25%) [ 13 , 43 , 44 , 49 , 50 , 56 , 58 , 59 ], and health care information inquiry (apps extract relevant information from a large amount of health care information and generate answers based on consumer questions in common forms, such as conversational agents; 4/32, 13%) [ 35 , 37 , 39 , 46 ]. There were also review papers (4/32, 13%) [ 17 , 36 , 53 , 56 ] that reviewed apps involving more than 1 of the aforementioned function types. Some of these apps were aimed at the single-consumer group (24/32, 75%) [ 12 , 13 , 15 , 34 - 43 , 45 - 48 , 50 , 51 , 54 - 57 , 60 , 61 ], while other apps not only targeted consumers as the main users but also targeted user groups with other identities, including physicians (5/32, 16%) [ 17 , 49 , 52 , 53 , 59 ], health departments (2/32, 6%) [ 42 , 44 ], nursing staff (1/32, 3%) [ 58 ], and patients’ family members (1/32, 3%) [ 59 ]. Figure 2 shows an overview of the study characteristics of DTC health care AI apps, including country, year, application type, user, medical field, and study design.


Research Question 2: Barriers

We identified 8 barriers to designing and developing DTC health care AI apps: (1) lack of explainability and inappropriate explainability, (2) lack of empathy, (3) effect of information input method and content on usability, (4) concerns about the privacy protection ability, (5) concerns about the AI accountability system, (6) lack of trust and overtrust, (7) concerns about specialization, and (8) the unpredictable future physician-patient relationship. These 8 existing barriers faced by DTC health care AI apps, along with their related subthemes, and the number of studies mentioning them are shown in Figure 3 .



Lack of Explainability

Of the 32 studies, 10 (31%) [ 10 , 34 , 36 , 38 , 41 , 46 , 51 , 52 , 54 , 60 ] pointed out that the explanations provided by existing DTC health care AI apps are insufficient. Existing studies mostly provided explanations primarily for domain experts, paying less attention to the explainability needs of lay users, such as consumers [ 41 ]. In addition, 2 (6%) studies [ 46 , 51 ] pointed out that current DTC health care AI apps lack explanations of relevant knowledge in the AI field (ie, explanations of the working principle of the machine learning algorithm used by the apps, such as how AI correctly responds to consumers’ health consultations [ 46 ]). Furthermore, 4 (13%) studies [ 34 , 46 , 51 , 54 ] indicated that current DTC health care AI apps lack explanations of relevant knowledge in the medical field, such as highly specialized medical terminology [ 34 ] and rare diseases that have only been discussed in the professional literature [ 54 ], and 4 (13%) studies [ 36 , 38 , 51 , 60 ] pointed out the disadvantages of this lack of explainability, which caused consumers to doubt the usefulness, accuracy, and safety of the apps and even possibly view them as a threat. Moreover, 1 (3%) study [ 51 ] mentioned the advantages of providing explanations, which aided consumers in understanding the reasoning of the system; this understanding was crucial for boosting the trust of lay users.

Inappropriate Explainability

Of the 32 studies, 3 (9%) [ 38 , 41 , 60 ] highlighted that current DTC health care AI apps contain inappropriate explanations. Specifically, 2 (6%) studies [ 38 , 41 ] mentioned that excessive explanations can result in information overload for users, which in turn would negatively impact the user experience and might cause users to ignore system prompts or suggestions. In addition, 1 (3%) study [ 60 ] pointed out that the poor information quality of explanations would be considered by users as “invalid, meaningless, not legit, or a bunch of crap” and even cause users to perceive it as a risk, prompting them to seek secondary confirmation of information through other channels (eg, online search or consultation with a doctor) to ensure their own safety. Furthermore, 2 (6%) studies [ 38 , 41 ] indicated that improper levels of transparency or inappropriate presentation formats in explanations can pose risks, potentially harming the interests of other stakeholders in the AI system or affecting the authenticity of users’ future performances. Specifically, inappropriate transparency of explanations might lead to the disclosure of sensitive details and intrusion of systems, harming the interests of AI service providers and violating the privacy of other consumers [ 38 ]. Explaining to users how a particular feature would accurately affect the disease diagnosis might affect their performance authenticity in the future diagnosis of related diseases, allowing them to manipulate the likelihood of being diagnosed or not diagnosed by deliberately meeting or avoiding meeting the characteristic threshold, respectively [ 41 ]. 
Inappropriate presentation forms of explanations, such as the function of counterfactual explanations that allowed users to freely edit data to view different diagnostic results, were popular with physicians because they met the needs of medical users to test different data and corresponding diagnostic possibilities, but they might become technical loopholes in the commercialization of DTC health care AI apps. Users could exploit this feature to input data for multiple individuals and view different results, thereby avoiding multiple payments and compromising the economic interests of the AI service provider [ 41 ].

Lack of Empathy

In a total of 8 (25%) studies [ 17 , 36 , 39 , 41 , 45 , 46 , 51 , 60 ], users felt that AI lacked empathy and was impersonal. Among them, users in 2 (6%) studies [ 45 , 46 ] felt that AI was unable to understand emotion-related issues, especially mental health problems, and 2 (6%) studies [ 41 , 60 ] pointed out that the way AI conveys information, such as transmitting complex disease information without a human present [ 60 ] or explaining a disease from the perspective of “how bad it is” [ 41 ], could also lead users to think that AI is indifferent and inhumane. In addition, 5 (16%) studies [ 36 , 39 , 41 , 46 , 60 ] reported that the lack of empathy would lead to a series of negative consequences, including triggering users’ frustration, disappointment, anxiety, and other negative emotions [ 36 , 60 ]; impeding users’ acceptance of such apps [ 39 , 46 ]; and even affecting their subsequent treatments [ 41 ]. Furthermore, according to 2 (6%) studies [ 46 , 51 ], some users preferred to consult human physicians rather than AI because humans could offer comfort and spiritual support.

Restricted Information Input Method

Of the 32 studies, 2 (6%) [ 36 , 54 ] pointed out that the restricted information input method in DTC health care AI apps (eg, a single way of typing) made users feel helpless and frustrated, which was contrary to their usage expectations, and even made them inclined to discontinue use.

Lack of Actionable Information

Of the 32 studies, 2 (6%) [ 10 , 54 ] pointed out that DTC health care AI apps lacked actionable information content, failing to inform users of the next actions to take, such as where to seek medical assistance.

Privacy Concerns

In total, 4 (12%) studies [ 15 , 46 , 51 , 60 ] raised concerns about the ability of DTC health care AI apps to protect privacy, such as safeguarding users’ sensitive health-related information from data breaches. Users were concerned that their personal information (eg, habits, preferences, and health records) would be collected without their knowledge [ 46 ], that anonymous data would be re-identified through AI processes [ 15 ], that data would be sold by companies for secondary exploitation [ 51 ], and that their health data would be hacked and used against them [ 60 ].

Accountability and Supervision

In total, 4 (12%) studies [ 12 , 17 , 41 , 60 ] raised concerns about the accountability of DTC health care AI apps, and 2 (50%) of these studies [ 17 , 41 ] indicated that studies on the distribution of AI responsibilities remain few and controversial. Another study [ 12 ] cited the practice of some app manufacturers of attaching general recommendations (eg, “recommend emergency care”) to almost every diagnosis, thereby transferring responsibility to users. In some countries, according to 1 (3%) study [ 17 ], there were also concerns about the supervision of DTC health care AI apps: the absence of human supervision during the design, development, and deployment of AI not only fails to ensure the anticipated benefits but also poses a risk of injury to users.

Lack of Trust

A total of 10 (31%) studies [ 15 , 17 , 36 , 41 , 43 , 46 , 52 , 54 , 60 , 61 ] pointed out that users lacked trust in DTC health care AI apps. Among them, users in 5 (50%) studies [ 15 , 17 , 54 , 60 , 61 ] distrusted AI due to inadequate performance or a lack of performance explanations; 3 (30%) studies [ 41 , 43 , 46 ] found that even when AI performed as well as or better than human physicians, users still placed more trust in and reliance on humans; and 3 (30%) studies [ 15 , 36 , 52 ] indicated that users’ lack of trust might cause them to disregard AI recommendations or even stop using such apps.

Based on the calibration between trust and competence, trust can be divided into 3 levels: calibrated trust, distrust, and overtrust. Distrust refers to users being less willing to trust AI compared to similar human providers, even if AI shows superior performance; overtrust refers to the user’s trust in the system beyond its actual capabilities [ 63 ]. Of the 32 studies, 2 (6%) [ 47 , 52 ] indicated that users’ overtrust issues in DTC health care AI apps would impose a double burden on both individuals [ 47 , 52 ] and society [ 47 ]. At the individual level, 2 (6%) studies [ 47 , 52 ] pointed out that overtrusting false-positive results could result in users’ negative emotions (eg, stress [ 52 ]). Tools with a high rate of false positives might also reduce users’ trust in true-positive results [ 47 ]. In addition, 1 (3%) study [ 52 ] pointed out that overtrusting false-positive results could trigger users’ unnecessary behaviors, such as unnecessary medical treatment, while 1 (3%) study [ 47 ] pointed out that overtrusting false-negative results would provide users with a false sense of security and delay the disease diagnosis. At the societal level, 1 (3%) study [ 47 ] indicated that individuals’ overtrust in false-positive results could overwhelm the entire health care system, whereas individuals’ overtrust in false-negative results could exacerbate the social transmission of diseases (eg, COVID-19).
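The distinction among calibrated trust, distrust, and overtrust can be made concrete with a small sketch. The Python snippet below is purely illustrative and not drawn from the reviewed studies; the 0.7 threshold, the interaction-log format, and the function name are all assumptions. It classifies a user's trust pattern from logged interactions, where each record pairs whether the AI advice was correct with whether the user accepted it:

```python
# Illustrative sketch only: the 0.7 threshold, the interaction-log format,
# and the function name are assumptions, not taken from the reviewed studies.

def classify_trust(interactions, threshold=0.7):
    """Classify trust from logs of (advice_was_correct, user_accepted) pairs."""
    correct = [accepted for ok, accepted in interactions if ok]
    incorrect = [accepted for ok, accepted in interactions if not ok]
    accept_correct = sum(correct) / len(correct) if correct else 0.0
    accept_incorrect = sum(incorrect) / len(incorrect) if incorrect else 0.0
    if accept_incorrect >= threshold:
        return "overtrust"   # accepts advice even when it is wrong
    if accept_correct < threshold:
        return "distrust"    # rejects advice even when it is right
    return "calibrated"      # accepts correct advice, rejects erroneous advice

# A user who accepts everything, including 3 pieces of wrong advice
log = [(True, True), (True, True), (False, True), (False, True), (False, True)]
print(classify_trust(log))  # -> overtrust
```

The key point the sketch captures is that trust quality cannot be judged from acceptance rates alone; it is the acceptance rate conditioned on the advice's correctness that separates the three levels.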


Specialization

In total, 2 (6%) studies [ 48 , 51 ] raised concerns about the specialization of DTC health care AI apps. Specifically, users in 1 (3%) study [ 51 ] doubted the feasibility of substituting consumer-grade equipment for professional medical-grade equipment; for example, they argued that an artificial intelligence–electrocardiogram (AI-ECG) smartwatch measuring only at the wrist could not replace a traditional 12-electrode ECG machine for detecting heart disease. The other study [ 48 ] pointed out that the professional performance of DTC health care AI apps is influenced by the environment of use; for example, an app that detects obstructive sleep apnea and is sensitive to background noise might work under tightly controlled laboratory conditions but be less accurate in home environments.

Physician-Patient Relationship

In total, 2 (6%) studies [ 17 , 53 ] argued that DTC health care AI apps would make the physician-patient relationship less predictable. As a result of AI user empowerment and the emergence of “do-it-yourself” medicine, users become less reliant on medical experts [ 17 ] and expert medical advice [ 53 ]. The effects of AI on the physician-patient relationship remain to be evaluated in further studies [ 53 ].

Research Question 3: Design Recommendations

The design recommendation themes covered 6 types of recommendations, and their specific contents, mentioned by existing studies for designing and developing DTC health care AI apps: (1) enhance explainability, (2) improve empathy, (3) improve usability, (4) enhance privacy protection, (5) address AI accountability at both the individual and the government level, and (6) improve the diversity of participants to enhance inclusion. These 6 design recommendations, as well as the related subthemes and the number of studies mentioning them, are shown in Figure 4 .


Enhance Explainability

Of the 32 studies, 5 (16%) [ 41 , 43 , 46 , 54 , 60 ] suggested designing and developing explainable DTC health care AI apps from 3 perspectives: the primary content of explanations, their presentation form, and legislative safeguards. First, 4 (13%) studies [ 41 , 46 , 54 , 60 ] provided content recommendations for explanations: input (explanations of the input data) [ 41 , 54 ], output (explanations of the generated output) [ 41 ], the how (explanations of how the system as a whole works) [ 41 , 54 , 60 ], performance (explanations of the capabilities, limitations, and verification process of the current system) [ 41 , 46 , 54 , 60 ], the why (explanations as to why, and why not, the system made a specific decision) [ 41 ], what-if (explanations that speculate on the system’s output under a particular set of settings and describe what the system would do) [ 41 ], responsibility (explanations of the system’s accountability) [ 41 ], ethics (explanations of information from regulatory approvals or peer-reviewed publications that validated the system) [ 41 ], the social effect (explanations of the results other social actors obtained when using the system) [ 41 ], and domain knowledge (explanations of specific AI or medical terms and information sources in the system) [ 41 , 54 ]. Second, given the diversity of consumer groups with varying domain knowledge, cognitive styles, and urgency of symptoms, 1 (3%) study [ 41 ] suggested a presentation form for explanations: a progressive disclosure approach that presents various levels and formats of explanations to meet the needs of a wider consumer group. Third, 1 (3%) study [ 43 ] offered a legislative suggestion: governments and regulatory agencies, particularly in the medical field, will need to further establish and improve the legal framework for transparent AI to safeguard consumers’ right to explanations of algorithmic decisions.
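The progressive disclosure idea can be sketched as a layered interface: a minimal explanation is shown by default, and deeper layers are revealed only on request. In the Python sketch below, the layer names, their ordering, and the example texts are invented for illustration and are not taken from the cited study:

```python
# Hypothetical sketch of progressive disclosure: the layer names, ordering,
# and example texts are invented for illustration, not taken from the study.

EXPLANATION_LAYERS = [
    ("output", "Your symptom pattern matches migraine (confidence 72%)."),
    ("why", "Key factors: one-sided headache, light sensitivity, nausea."),
    ("how", "The system compares your answers with patterns from past cases."),
    ("performance", "Accuracy varies across age groups; see the validation notes."),
]

def disclose(depth):
    """Return the explanation layers revealed after `depth` expansions."""
    depth = max(1, min(depth, len(EXPLANATION_LAYERS)))  # always show layer 1
    return [text for _, text in EXPLANATION_LAYERS[:depth]]

print(disclose(1))       # a first-time user sees only the output layer
print(len(disclose(3)))  # a curious user has expanded 3 layers
```

The design choice worth noting is that the default depth serves users with urgent symptoms or limited domain knowledge, while expert or curious users can drill down without the extra layers burdening everyone else.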

Improve Empathy

In total, 6 (19%) studies [ 15 , 36 , 41 , 49 , 55 , 60 ] offered suggestions for designing and developing empathetic DTC health care AI apps. Specifically, 3 (9%) studies [ 15 , 36 , 49 ] suggested that such apps could directly incorporate conversational agents or draw on research in this field to embed richer semantics [ 49 ] and add more social cues [ 15 ], while 2 (6%) studies [ 41 , 60 ] suggested focusing on skills for delivering stressful information.

Improve Usability

In total, 6 (19%) studies [ 34 , 38 , 41 , 49 , 54 , 60 ] offered suggestions for enhancing the usability of DTC health care AI apps in 3 aspects: information input method, result output form, and content actionability. Concerning the information input method, 1 (3%) study [ 54 ] suggested simplifying the way consumers input data (eg, by sharing and describing information in the form of audio recordings) to save time and effort, while another study [ 49 ] suggested simplifying data input (eg, by barcode-scanning prescription data) to reduce the risk of manual data entry errors. Concerning the result output form, 1 (3%) study [ 34 ] suggested translating or simplifying highly specialized language that is difficult for consumers to understand (eg, rare diseases discussed only in the professional literature) and providing illustrations to summarize the output, while 2 (6%) studies [ 38 , 41 ] suggested avoiding outputting too much or too detailed information at once to prevent information overload. Concerning content actionability, 1 (3%) study [ 54 ] suggested providing introductory materials at the initial stage of interaction to teach consumers the most effective way to use the technology (eg, introducing basic functions, limitations, and the process of use); 1 (3%) study [ 41 ] suggested, during the interaction, clearly explaining the purpose of the current operation and context-related information and displaying the results of the operation directly on the interface; and 1 (3%) study [ 54 ] suggested, at the end of the interaction, informing consumers of the next step (eg, where to seek medical help).

Enhance Privacy

Of the 32 studies, 3 (9%) [ 15 , 38 , 51 ] suggested enhancing the privacy protection capabilities of DTC health care AI apps to prevent consumers’ privacy from being violated. Specifically, they recommended using state-of-the-art technology to encrypt and authenticate users’ health data [ 51 ], obtaining informed consent for health care purposes to prevent data from being resold and exploited [ 15 ], and avoiding explanations with inappropriate transparency (eg, leaking flaws in algorithms or revealing sensitive data sources) to prevent systems from being intruded upon [ 38 ].

Address Accountability

In total, 4 (12%) studies [ 43 , 45 , 48 , 56 ] addressed the accountability issues of DTC health care AI apps from both individual and government perspectives. At the individual level, 1 (3%) study [ 47 ] addressed accountability by informing consumers whether the app was officially certified and encouraging them to seek professional medical advice or clinical testing beyond the app, and 1 (3%) study [ 49 ] empowered patients with more responsibilities (eg, motivating patients to take their medications while informing them of possible drug interactions) but still assigned human medical staff the responsibility for complete drug therapy. At the government level, 1 (3%) study [ 60 ] suggested developing policies or guidelines to regulate the use of such apps and establishing accountability mechanisms for AI output through legislation, and 1 (3%) study [ 52 ] suggested that national health authorities clarify the position of these apps in the health care system (eg, whether they are intended for laypersons, general practitioners, or specialists).

Improve Diversity

In total, 6 (19%) studies [ 41 , 46 , 52 , 54 , 55 , 60 ] recommended diversifying the test populations for the diseases targeted by DTC health care AI apps in future work. Specifically, studies called for including clinical populations [ 46 ], community populations [ 46 ], marginalized populations (eg, populations with low education levels [ 60 ] and older adults [ 54 , 60 ]), and children [ 55 ], as well as attending to the cultural and social factors in these populations [ 54 ], in order to capture more diverse user needs and develop more comprehensive solutions.

Principal Findings

In the context of a surge in DTC health care AI apps, this scoping review identified 32 studies in the existing academic literature that address this topic. The review summarized the characteristics of existing studies on DTC health care AI apps, highlighted 8 categories of extant barriers, and pointed out 6 categories of design recommendations.

Study Characteristics

In terms of the developmental timeline, although AI has been extensively used across various sectors of health care, studies focusing on DTC health care AI apps are still in their nascent stages. We did not artificially restrict the time frame for our review; however, the papers included in our results were all published recently (between 2018 and 2023).

In terms of geographical origins, the studies on DTC health care AI apps predominantly came from high-income countries, particularly the United States. This aligns with other reviews in the domain of health care AI [ 21 , 31 , 64 ] and is intrinsically tied to the more advanced digital health care infrastructure present in these countries (eg, electronic health records, health information exchanges, and telehealth platforms). More geographically diverse research is needed in the future, and we particularly expect a surge in studies originating from low- and middle-income countries (LMICs), because AI is considered a technology that can help bridge the digital gap and reduce health inequities worldwide [ 2 , 3 , 64 ]. However, study outcomes from high-income countries cannot be directly transferred to low-income regions due to significant risks, such as output bias, poor performance, or erroneous results, when AI solutions are trained in contexts that differ substantially from the local populations [ 65 ]. When AI systems are applied to new populations with differing living environments or cultural backgrounds, adaptations to the local clinical settings and practices are required, and the measures and outcomes for design, development, and evaluation may vary [ 41 , 66 ].

In terms of study design, the majority of the papers we reviewed opted for quantitative methods to evaluate the apps, such as collecting performance metrics while consumers used the apps or obtaining quantitative data on established user experience dimensions through questionnaires. Fewer papers delved into the barriers and recommendations arising from users’ interactions with DTC health care AI apps. However, given that the emergence of such apps is still a nascent phenomenon, future work requires more qualitative research to explore the effects these technological systems generate when used in society, to uncover initially overlooked themes and deeper insights, and to assess user experiences beyond what short-term metrics can capture, while also incorporating edge cases that large-scale studies may overlook [ 67 , 68 ].

In terms of medical fields, existing studies on DTC health care AI apps primarily focused on the field of general practice. This is understandable because general practice usually serves as the first medical contact point for patients [ 69 ], thereby having a broad spectrum of user needs. Moreover, the health issues diagnosed and treated in general practice are generally more common and less complex [ 70 ], thereby presenting relatively lower risks. Consequently, most studies chose general practice as the entry point for the medical fields of designing and developing DTC health care AI apps.

In terms of intended users and provided functionalities among studies on DTC health care AI apps, some were designed solely for single-consumer user groups, offering functions such as disease diagnosis, health self-management, and health care information inquiry. Others also connected with other user groups, including physicians, family members, nursing staff, and health care departments, generally to alert these groups to abnormal conditions of consumer users. For example, these functionalities may include alerting hospitals about consumer user falls due to stroke, notifying physicians and family members about medication adherence issues, referring users with high-risk skin cancer ratings to doctors, or informing health care departments about potential diagnoses of COVID-19 or other infectious diseases. However, it is crucial to note that although such intelligent functionalities for alerting other groups about users’ anomalies may contribute positively to users’ health and the efficient functioning of health care systems, they also pose risks related to consumers’ human rights, democracy, false positives due to erroneous data capture, and even the manipulation of users with low behavioral capacity [ 71 ]. Future DTC health care AI apps, when designing features that involve 2 or more user groups, must consider how to allocate, balance, and constrain power among various stakeholders, while simultaneously ensuring ethical and legal compliance as they seek to benefit consumer groups in need.

Barriers and Design Recommendations

In terms of barriers and design recommendations, it is noteworthy that many challenges are not confined solely to apps targeting consumers; rather, they exhibit considerable similarities with the issues encountered by health care AI systems designed for other user groups, such as health care professionals. First, privacy concerns have been widely recognized as a significant barrier to the application of AI in the health care domain [ 20 , 21 , 72 , 73 ]. Privacy protection has become a hot topic in the health care AI research field [ 74 ], with numerous studies dedicated to developing innovative privacy-preserving solutions without compromising the performance of big data–driven AI models. These include developing privacy-enhancing technologies, such as homomorphic encryption [ 75 ], secure multiparty computation and differential privacy [ 76 ], and exploring new training methods and data governance models, such as distributed federated machine learning using synthesized data from multiple organizations [ 77 ], data-sharing pools [ 78 ], data trusts [ 79 ], and data cooperatives [ 80 ]. Second, the lack of clarity in accountability and regulation has also been universally identified in prior research as a key obstacle to the application of AI in health care [ 81 - 83 ]. Despite the existence of various worldwide policies and regulations concerning AI accountability and regulation, such as guidance from the World Health Organization (WHO) [ 84 ], the General Data Protection Regulation (GDPR) [ 85 ], the US Food and Drug Administration (FDA) [ 86 ], Health Canada [ 87 ], and the European Union’s AI Act [ 88 ], the rapid advancement of AI technology makes it difficult for existing regulatory frameworks to keep up, let alone anticipate its potential risks and impacts. Taking the AI Act, currently being advanced in Europe, as an example, the emergence of new generative AI systems, such as ChatGPT, has already posed challenges to the universality and applicability of this legislation [ 89 ].
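As a minimal illustration of one of the privacy-enhancing technologies named above, the following Python sketch applies the Laplace mechanism of differential privacy to a count query over health records. The records, the predicate, and the epsilon value are invented for demonstration; this is a sketch of the general technique, not an implementation from any of the cited works:

```python
# Minimal differential privacy sketch: release a noisy count instead of the
# exact count. Records, predicate, and epsilon are made up for demonstration.
import random

def dp_count(records, predicate, epsilon=1.0):
    """Release a differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    The difference of two Exponential(1) draws is a Laplace(0, 1) variate.
    """
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_count + noise

# Toy records: flag people older than 60 (values are invented)
records = [{"age": a} for a in (34, 52, 67, 71, 29)]
noisy = dp_count(records, lambda r: r["age"] > 60, epsilon=0.5)
# noisy fluctuates around the true count (2) on every call
```

Smaller epsilon values add more noise and hence stronger privacy; the design trade-off is exactly the one the surveyed literature describes, protecting individual records without destroying the aggregate signal the model or analyst needs.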
Furthermore, usability has also been shown in previous studies concerning physicians as an aspect that doctors wish to see improved in health care AI tools, such as clinical decision support systems [ 41 , 66 ]. Additionally, the evolution of physician-patient relationships has been identified as a key point requiring long-term tracking following the deployment of various types of health care AI systems [ 90 ].

In addition to identifying challenges similar to those faced by health care AI systems targeted at other user groups, this review further identified some more subtle obstacles that are particularly worth noting in consumer-facing systems and distilled corresponding design recommendations, including enhancing human-centered explainability, establishing calibrated trust and addressing overtrust, demonstrating empathy in AI, improving the specialization of consumer-grade products, and expanding the diversity of the test population.

Enhance Human-Centered Explainability

The review findings identified current barriers to explainability in DTC health care AI apps, which included not only providing inadequate explanations to consumers (a lack of explanations relating to both AI and medical domain knowledge) but also providing inappropriate explanations to consumers (excessive content caused information overload to consumers, low-quality content exposed consumers to risks and burdens, and improper transparency and presentation forms could adversely impact other stakeholders’ interests in the system). To address these barriers, our review offered design recommendations for improvements in the content, form, and legislative aspects of explanations, which future research can consider.

Furthermore, we believe that the review results demonstrate and re-emphasize the importance of designing, developing, and evaluating AI explainability from a human-centered perspective. As AI increasingly powers decision-making in high-risk areas, such as health care, explainable artificial intelligence (XAI), aimed at enabling humans to understand the logic and outcomes of AI systems, has become a research hotspot in recent years [ 91 - 95 ]. Within this interdisciplinary field, algorithm-centered approaches aim to enhance the transparency of AI models and to develop inherently explainable models [ 96 ], while human-centered approaches emphasize considerations such as who the users of explanations are, why explanations are needed (eg, how social and individual factors influence explainability objectives), and what the timing and context of providing explanations are (eg, contextual variations in explainability across different application domains) [ 97 , 98 ]. As shown in our findings, consumers of health care AI had various needs concerning the content and form of explanations, and their interactions with explanations could influence their adoption of the apps and their subsequent behavior. Furthermore, poorly designed explanations could produce knock-on effects for other stakeholders in the AI system. All these findings indicate that the challenges in explainability in DTC health care AI apps are not merely technical issues concerning algorithmic transparency but also significantly involve human factors. Future studies need to enhance the explainability of DTC health care AI apps from a human-centered perspective, focusing on the cognitive abilities, physical characteristics, and social and psychological factors of the human in the loop, as well as how these human factors interact with explanations, AI systems, and the environment.
This will enable the design of DTC health care AI apps that meet user needs and enhance human performance, safety, and overall well-being.

Establish Calibrated Trust and Pay Special Attention to Overtrust

Our findings indicated that current DTC health care AI apps face challenges related to trust, including both a lack of trust and overtrust. The need to establish calibrated trust in AI systems, that is, cultivating users’ ability to know when to trust (accept correct advice) or not trust (reject erroneous advice) AI [ 99 ], is a consensus in current research [ 100 ]. Under this premise, we believe that future designs of DTC health care AI apps should pay more attention to the issue of overtrust, for multiple reasons. On the one hand, from an academic research perspective, most extant studies on AI trust center on enhancing users’ trust [ 101 - 104 ], with less attention given to overtrust; on the other hand, from a practical application perspective, 3 influencing factors also need to be considered:

  • First, the users’ background knowledge. Consumers often possess limited prior knowledge of both medical and AI domains related to these apps [ 4 ], affecting their receptivity to AI advice. Research has shown that domain experts are more likely to question AI suggestions, whereas nonexperts are more receptive to them [ 105 ].
  • Second, the differential risk in decision-making. Consumers and health care professionals differ in their risk assessments when facing AI advice. Typical consumers are loss averse; for them, changes for the worse (losses) loom larger than equivalent changes for the better [ 106 ]. Hence, they are more inclined to accept AI advice and take subsequent medical action rather than risk missing out on timely disease diagnosis and treatment by not adopting the advice [ 4 ]. In contrast, the biggest concern of health care professionals when adopting new products to assist medical diagnosis may not be the pursuit of improved work performance but the potential risks to patients’ lives and health [ 107 ], so their adoption is relatively cautious.
  • Third, the drive for commercial interests may also prompt these apps to exaggerate their capabilities, thereby further exacerbating the issue of consumer overtrust [ 36 ].

Therefore, in summary, although both domain expert and nonexpert users may display overreliance on automation [ 108 ], physicians’ overtrust in AI diagnostic features is not commonly observed at the current stage of medical AI development: many reviews concerning physician users, while identifying trust issues, primarily discuss a lack of trust [ 66 , 109 ]. However, consumer overtrust in health care AI, along with the ensuing personal and societal effects, has already emerged as an issue that needs to be addressed sooner rather than later.

Demonstrate Empathy in Artificial Intelligence

Our review indicated that even if AI can be more accurate and logical, its lack of empathy may hinder consumer acceptance of DTC health care AI apps. Empathy, defined as the ability to understand or feel what other individuals are experiencing from their frame of reference [ 110 ], is widely acknowledged as a fundamental value for achieving optimal health care practices. It is crucial for enhancing patient satisfaction, treatment compliance, and clinical outcomes [ 111 - 113 ]. In conventional medical settings, health care professionals act as the conveyors of empathy, while patients are the recipients [ 114 ]. In human-AI collaborative medical settings, such as physicians using AI for diagnostic assistance, AI primarily contributes to improving efficiency and decision-making quality, allowing health care professionals more time and energy to convey empathy and improve overall treatment satisfaction [ 115 ]. However, in DTC health care AI scenarios, the initial touchpoint no longer has a human element, necessitating that AI become the direct conveyor of empathy.

The topic of AI empathy in health care has become a research hotspot [ 116 - 118 ]. To address this challenge, our review offered several design recommendations: embedding richer semantics and social cues through conversational agents, as well as techniques for conveying stressful information. Current cutting-edge research supports these design suggestions for enhancing empathy through conversational agents. Studies indicate that the new generation of AI chatbots, such as ChatGPT, has scored higher than human doctors in terms of empathy [ 119 ]. Our review is current up to March 2023, and the research included in the review has not yet covered ChatGPT. Therefore, the future integration of ChatGPT or similar large language model chatbots could potentially help alleviate the empathy barriers in DTC health care AI apps.

Improve Specialization of Consumer-Grade Products

Concerns regarding the specialization of DTC health care AI apps are understandable. First, from a scientific and technological standpoint, many health care AI apps on the consumer market have scarcely undergone original effectiveness research or are loosely based on scientific studies without a scientific consensus on their efficacy [ 120 ]. Furthermore, the data collection devices for these apps are often consumer-owned smartphones, personal computers, or wearables designed for portability, rather than specialized medical devices tailored to specific disease domains.

Second, in terms of regulatory frameworks, in the United States, where most companies producing DTC health care AI products are located, existing tiered regulatory systems permit the manufacture of general wellness products without adhering to regulations typically applicable to devices intended for diagnosing or treating diseases [ 86 ]. Consequently, driven by commercial interests, the current market is flooded with numerous tools that are approved as general health products but subtly imply that they can be used for diagnosis or treatment. Consumers can easily access these products, although the products may not have undergone rigorous testing and regulation, thus rendering their effectiveness uncertain [ 71 , 121 ].

Existing research is working to close the performance gap between consumer-grade products and clinical-grade medical devices through technological innovations, for example, developing high-precision flexible sensors to improve the data collection capabilities of wearable devices [ 122 , 123 ], as well as through algorithm-hardware cooptimization to ensure model quality is not compromised while achieving device miniaturization [ 124 ]. However, overcoming this barrier will require not only technological advancements but also further refinement of the approval and regulatory frameworks for consumer-grade AI products in the future.

Expand the Diversity of Test Populations

The need to expand the diversity of test populations is also a future direction in the design and development of DTC health care AI apps, as identified by our review. It is worth noting that whenever this theme is mentioned in the papers included in our review, it appears in the Limitations or Future Work section, which indirectly indicates that it is a prevalent yet unresolved issue in this field. In existing research, the test population either involves a small subset of patients in the specific disease area, with limited demographic characteristics and health information literacy, or is not even the target population for the disease but rather comprises participants recruited through convenience sampling. However, if such apps truly enter the market, their actual consumer users will constitute an extremely broad and heterogeneous group, with widely varying demographic characteristics, education levels, and health and information literacy [ 125 ]. Applying AI models trained on small samples, and user feedback obtained from those samples, to a broader population could pose multiple risks, including inaccuracies in AI diagnostics and predictions, poor generalization to unseen patient data, and perpetuation of biases and exclusions against marginalized groups [ 126 ]. These risks could consequently misguide clinical decisions, exacerbate health care inequalities, and trigger legal and ethical crises. Future studies on DTC health care AI apps need to consider the diversity of the consumer population in terms of culture, society, demographics, and knowledge levels in order to develop more accurate and inclusive health care AI solutions.


Limitations

This study has a few limitations. First, we retrieved only papers written in English, thereby potentially overlooking influential papers published in other languages. Additionally, we captured only papers retrieved by our search strategy; given the novelty of the field and of the terminology associated with DTC health care AI apps, some relevant studies may have been omitted. However, we attempted to mitigate this limitation by using Google Scholar to search for gray literature and by snowball-sampling from the reference lists of relevant papers. Because of its wide-ranging formats and scope, gray literature often serves as a robust source of evidence in systematic reviews, offering data not found in commercial publications, thereby reducing publication bias and enabling a more balanced view of the evidence [ 27 ]. Google Scholar’s gray literature includes papers from databases that host work not yet formally published, such as arXiv and medRxiv, helping capture research that might otherwise be overlooked due to the novelty of the field and its terminology.

Furthermore, when using qualitative thematic analysis to synthesize study findings and generate themes, the themes produced were potentially influenced by the prior research experience and personal understanding of the 3 authors. Therefore, the themes may not be entirely comprehensive or may differ when other researchers replicate the coding process. To minimize potential coding bias, we strictly adhered to the 6 key steps of qualitative thematic analysis: familiarizing oneself with the data set; coding; generating initial themes; developing and reviewing themes; refining, defining, and naming themes; and writing up. Each step underwent group discussions, triangulation, and interrater reliability checks among the 3 authors to resolve disagreements and reach a final consensus, thereby striving to maintain consistency and reduce individual differences.
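The interrater reliability checks described above are commonly quantified with Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance alone. The sketch below is a minimal illustration of that calculation; the code labels and coding data are hypothetical, not the actual codes from this review:

```python
def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' categorical codes.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each coder's
    marginal code frequencies.
    """
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    categories = set(codes_a) | set(codes_b)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    expected = sum(
        (codes_a.count(c) / n) * (codes_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical example: two coders label 6 excerpts as "barrier" or
# "design"; they disagree on one excerpt.
coder_1 = ["barrier", "design", "barrier", "design", "barrier", "barrier"]
coder_2 = ["barrier", "design", "barrier", "barrier", "barrier", "barrier"]
print(cohens_kappa(coder_1, coder_2))  # 4/7, approximately 0.571
```

Values near 1 indicate agreement well beyond chance; disagreements flagged by a low kappa are exactly the codes a team would take to group discussion, as was done here.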

To the best of our knowledge, this is the first study to systematically summarize and organize academic research targeting consumers through DTC health care AI apps. In this study, we delineated the current characteristics of studies focusing on DTC health care AI apps, identified 8 existing barriers, and offered 6 design recommendations. We believe that future research, by considering the key points raised in this study, addressing existing barriers, and referencing design recommendations, can better advance the study, design, and development of DTC health care AI apps, thus improving the health care services they provide.


Acknowledgments

This work was supported by the Teaching Research Project of the Huazhong University of Science and Technology (grant number 2023038).

Data Availability

All data generated and analyzed during this study are included in this published paper and its Multimedia Appendices.

Conflicts of Interest

None declared.

Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) checklist.

Database search details.

  • World Health Organization. Global Strategy on Human Resources for Health: Workforce 2030. Geneva. World Health Organization; 2016.
  • Alami H, Rivard L, Lehoux P, Hoffman SJ, Cadeddu SBM, Savoldelli M, et al. Artificial intelligence in health care: laying the foundation for responsible, sustainable, and inclusive innovation in low- and middle-income countries. Global Health. 2020 Jun 24;16(1):52 [ https://globalizationandhealth.biomedcentral.com/articles/10.1186/s12992-020-00584-1 ] [ CrossRef ] [ Medline ]
  • Global strategy on digital health 2020-2025. World Health Organization. 2021. URL: https://www.who.int/docs/default-source/documents/gs4dhdaa2a9f352b0445bafbc79ca799dce4d.pdf [accessed 2023-12-08]
  • Babic B, Gerke S, Evgeniou T, Cohen IG. Direct-to-consumer medical machine learning and artificial intelligence applications. Nat Mach Intell. 2021 Apr 20;3(4):283-287 [ CrossRef ]
  • De novo classification request for irregular rhythm notification feature. Food and Drug Administration. Aug 8, 2018. URL: https://www.accessdata.fda.gov/cdrh_docs/reviews/DEN180042.pdf [accessed 2023-12-08]
  • Yan L, Zhang H, Goncalves J, Xiao Y, Wang M, Guo Y, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell. 2020 May 14;2(5):283-288 [ CrossRef ]
  • The check up with Google Health. Google. 2023. URL: https://health.google/the-check-up/#latest-events [accessed 2023-12-08]
  • Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, et al. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020 Jun 18;26(6):900-908 [ CrossRef ] [ Medline ]
  • Feathers T. Google's new dermatology app wasn’t designed for people with darker skin. Vice Media Group. May 20, 2021. URL: https:/​/www.​vice.com/​en/​article/​m7evmy/​googles-new-dermatology-app-wasnt-designed-for-people-with-darker-skin [accessed 2023-12-08]
  • Attia ZI, Harmon DM, Dugan J, Manka L, Lopez-Jimenez F, Lerman A, et al. Prospective evaluation of smartwatch-enabled detection of left ventricular dysfunction. Nat Med. 2022 Dec 14;28(12):2497-2503 [ https://europepmc.org/abstract/MED/36376461 ] [ CrossRef ] [ Medline ]
  • Zhu H, Cheng C, Yin H, Li X, Zuo P, Ding J, et al. Automatic multilabel electrocardiogram diagnosis of heart rhythm or conduction abnormalities with deep learning: a cohort study. Lancet Digital Health. 2020 Jul;2(7):e348-e357 [ CrossRef ]
  • Ćirković A. Evaluation of four artificial intelligence-assisted self-diagnosis apps on three diagnoses: two-year follow-up study. J Med Internet Res. 2020 Dec 04;22(12):e18097 [ https://www.jmir.org/2020/12/e18097/ ] [ CrossRef ] [ Medline ]
  • Ameko M, Beltzer M, Cai L, Boukhechba M, Teachman B, Barnes L. Offline contextual multi-armed bandits for mobile health interventions: a case study on emotion regulation. 2020 Presented at: 14th ACM Conference on Recommender Systems; September 22-26, 2020; Virtual p. 249-258 [ CrossRef ]
  • Puntoni S, Reczek RW, Giesler M, Botti S. Consumers and artificial intelligence: an experiential perspective. J Mark. 2020 Oct 16;85(1):131-151 [ CrossRef ]
  • Esmaeilzadeh P. Use of AI-based tools for healthcare purposes: a survey study from consumers' perspectives. BMC Med Inform Decis Mak. 2020 Jul 22;20(1):170 [ https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-020-01191-1 ] [ CrossRef ] [ Medline ]
  • Longoni C, Bonezzi A, Morewedge C. Resistance to medical artificial intelligence. J Consum Res. 2019;46(4):629-650 [ CrossRef ]
  • Scott IA, Carter SM, Coiera E. Exploring stakeholder attitudes towards AI in clinical practice. BMJ Health Care Inform. 2021 Dec 09;28(1):e100450 [ https://informatics.bmj.com/lookup/pmidlookup?view=long&pmid=34887331 ] [ CrossRef ] [ Medline ]
  • Buchanan C, Howitt ML, Wilson R, Booth RG, Risling T, Bamford M. Predicted influences of artificial intelligence on the domains of nursing: scoping review. JMIR Nurs. 2020 Dec 17;3(1):e23939 [ https://nursing.jmir.org/2020/1/e23939/ ] [ CrossRef ] [ Medline ]
  • Garvey KV, Thomas Craig KJ, Russell R, Novak LL, Moore D, Miller BM. Considering clinician competencies for the implementation of artificial intelligence-based tools in health care: findings from a scoping review. JMIR Med Inform. 2022 Nov 16;10(11):e37478 [ https://medinform.jmir.org/2022/11/e37478/ ] [ CrossRef ] [ Medline ]
  • Chew HSJ, Achananuparp P. Perceptions and needs of artificial intelligence in health care to increase adoption: scoping review. J Med Internet Res. 2022 Jan 14;24(1):e32939 [ https://www.jmir.org/2022/1/e32939/ ] [ CrossRef ] [ Medline ]
  • Sharma M, Savage C, Nair M, Larsson I, Svedberg P, Nygren JM. Artificial intelligence applications in health care practice: scoping review. J Med Internet Res. 2022 Oct 05;24(10):e40238 [ https://www.jmir.org/2022/10/e40238/ ] [ CrossRef ] [ Medline ]
  • You Y, Ma R, Gui X. User experience of symptom checkers: a systematic review. 2022 Presented at: AMIA 2022 Annual Symposium; November 5-9, 2022; Washington, DC
  • Parmar P, Ryu J, Pandya S, Sedoc J, Agarwal S. Health-focused conversational agents in person-centered care: a review of apps. NPJ Digit Med. 2022 Feb 17;5(1):21 [ https://doi.org/10.1038/s41746-022-00560-6 ] [ CrossRef ] [ Medline ]
  • Kocaballi AB, Sezgin E, Clark L, Carroll JM, Huang Y, Huh-Yoo J, et al. Design and evaluation challenges of conversational agents in health care and well-being: selective review study. J Med Internet Res. 2022 Nov 15;24(11):e38525 [ https://www.jmir.org/2022/11/e38525/ ] [ CrossRef ] [ Medline ]
  • Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005 Feb;8(1):19-32 [ CrossRef ]
  • Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018 Oct 02;169(7):467-473 [ https://www.acpjournals.org/doi/abs/10.7326/M18-0850?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed ] [ CrossRef ] [ Medline ]
  • Paez A. Gray literature: an important resource in systematic reviews. J Evid Based Med. 2017 Aug 31;10(3):233-240 [ CrossRef ] [ Medline ]
  • Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006 Jan;3(2):77-101 [ CrossRef ]
  • Braun V, Clarke V. Thematic Analysis: A Practical Guide. Thousand Oaks, CA. SAGE Publications; 2022.
  • Braun V, Clarke V. Thematic analysis. University of Auckland. 2022. URL: https://www.thematicanalysis.net/ [accessed 2023-12-08]
  • Yin J, Ngiam KY, Teo HH. Role of artificial intelligence applications in real-life clinical practice: systematic review. J Med Internet Res. 2021 Apr 22;23(4):e25759 [ https://www.jmir.org/2021/4/e25759/ ] [ CrossRef ] [ Medline ]
  • Crossnohere NL, Elsaid M, Paskett J, Bose-Brill S, Bridges JFP. Guidelines for artificial intelligence in medicine: literature review and content analysis of frameworks. J Med Internet Res. 2022 Aug 25;24(8):e36823 [ https://www.jmir.org/2022/8/e36823/ ] [ CrossRef ] [ Medline ]
  • Beets B, Newman TP, Howell EL, Bao L, Yang S. Surveying public perceptions of artificial intelligence in health care in the United States: systematic review. J Med Internet Res. 2023 Apr 04;25:e40337 [ https://www.jmir.org/2023//e40337/ ] [ CrossRef ] [ Medline ]
  • Demner-Fushman D, Mrabet Y, Ben Abacha A. Consumer health information and question answering: helping consumers find answers to their health-related information needs. J Am Med Inform Assoc. 2020 Feb 01;27(2):194-201 [ https://europepmc.org/abstract/MED/31592532 ] [ CrossRef ] [ Medline ]
  • Oniani D, Wang Y. A qualitative evaluation of language models on automatic question-answering for COVID-19. 2020 Presented at: BCB '20: 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics; September 21-24, 2020; Virtual p. Article 33 [ CrossRef ]
  • Su Z, Figueiredo M, Jo J, Zheng K, Chen Y. Analyzing description, user understanding and expectations of AI in mobile health applications. 2020 Presented at: AMIA 2020 Annual Symposium; November 14-18, 2020; Virtual p. 1170-1179
  • Savery M, Abacha AB, Gayen S, Demner-Fushman D. Question-driven summarization of answers to consumer health questions. Sci Data. 2020 Oct 02;7(1):322 [ https://doi.org/10.1038/s41597-020-00667-z ] [ CrossRef ] [ Medline ]
  • Tsai C, You Y, Gui X, Kou Y, Carroll J. Exploring and promoting diagnostic transparency and explainability in online symptom checkers. 2021 Presented at: 2021 CHI Conference on Human Factors in Computing Systems; May 8-13, 2021; Online p. Article 152 [ CrossRef ]
  • Almalki M. Exploring the influential factors of consumers' willingness toward using COVID-19 related chatbots: an empirical study. Med Arch. 2021 Feb;75(1):50-55 [ https://europepmc.org/abstract/MED/34012200 ] [ CrossRef ] [ Medline ]
  • Gupta P, Suryavanshi A, Maheshwari S, Shukla A, Tiwari R. Human-machine interface system for pre-diagnosis of diseases using machine learning. 2018 Presented at: ICMVA 2018: International Conference on Machine Vision and Applications; April 23-25, 2018; Singapore p. 71-75 [ CrossRef ]
  • He X, Hong Y, Zheng X, Zhang Y. What are the users’ needs? Design of a user-centered explainable artificial intelligence diagnostic system. Int J Hum–Comput Interact. 2022 Jul 26;39(7):1519-1542 [ CrossRef ]
  • Iqbal M, Faiz M. Active surveillance for COVID-19 through artificial intelligence using real-time speech-recognition mobile application. 2020 Presented at: 2020 IEEE International Conference on Consumer Electronics (ICCE); September 28-30, 2020; Taoyuan City, Taiwan [ CrossRef ]
  • Kyung N, Kwon HE. Rationally trust, but emotionally? The roles of cognitive and affective trust in laypeople's acceptance of AI for preventive care operations. Product Oper Manag. 2022 Jul 31:1-20 [ CrossRef ]
  • Park S, Hussain I, Hong S, Kim D, Park H, Benjamin H. Real-time gait monitoring system for consumer stroke prediction service. 2020 Presented at: 2020 IEEE International Conference on Consumer Electronics (ICCE); September 28-30, 2020; Taoyuan City, Taiwan p. 4-6 [ CrossRef ]
  • van Bussel MJP, Odekerken-Schröder GJ, Ou C, Swart RR, Jacobs MJG. Analyzing the determinants to accept a virtual assistant and use cases among cancer patients: a mixed methods study. BMC Health Serv Res. 2022 Jul 09;22(1):890 [ https://bmchealthservres.biomedcentral.com/articles/10.1186/s12913-022-08189-7 ] [ CrossRef ] [ Medline ]
  • Nadarzynski T, Miles O, Cowie A, Ridge D. Acceptability of artificial intelligence (AI)-led chatbot services in healthcare: a mixed-methods study. Digit Health. 2019;5:2055207619871808 [ https://journals.sagepub.com/doi/abs/10.1177/2055207619871808?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed ] [ CrossRef ] [ Medline ]
  • Ponomarchuk A, Burenko I, Malkin E, Nazarov I, Kokh V, Avetisian M, et al. Project Achoo: a practical model and application for COVID-19 detection from recordings of breath, voice, and cough. IEEE J Sel Top Signal Process. 2022 Feb;16(2):175-187 [ CrossRef ]
  • Romero HE, Ma N, Brown GJ, Hill EA. Acoustic screening for obstructive sleep apnea in home environments based on deep neural networks. IEEE J Biomed Health Inform. 2022 Jul;26(7):2941-2950 [ CrossRef ]
  • Tschanz M, Dorner TL, Holm J, Denecke K. Using eMMA to manage medication. Computer. 2018 Aug;51(8):18-25 [ CrossRef ]
  • Sellak H, Grobler M. mHealth4U: designing for health and wellbeing self-management. 2021 Presented at: 35th IEEE/ACM International Conference on Automated Software Engineering Workshops; December 21-25, 2020; Virtual p. 41-46 [ CrossRef ]
  • Baldauf M, Fröehlich P, Endl R. Trust me, I’m a doctor – user perceptions of AI-driven apps for mobile health diagnosis. 2020 Presented at: MUM 2020: 19th International Conference on Mobile and Ubiquitous Multimedia; November 22-25, 2020; Essen, Germany p. 167-178 [ CrossRef ]
  • de Carvalho TM, Noels E, Wakkee M, Udrea A, Nijsten T. Development of smartphone apps for skin cancer risk assessment: progress and promise. JMIR Dermatol. 2019 Jul 11;2(1):e13376 [ CrossRef ]
  • Denecke K, Gabarron E, Grainger R, Konstantinidis ST, Lau A, Rivera-Romero O, et al. Artificial intelligence for participatory health: applications, impact, and future implications. Yearb Med Inform. 2019 Aug;28(1):165-173 [ http://www.thieme-connect.com/DOI/DOI?10.1055/s-0039-1677902 ] [ CrossRef ] [ Medline ]
  • Fan X, Chao D, Zhang Z, Wang D, Li X, Tian F. Utilization of self-diagnosis health chatbots in real-world settings: case study. J Med Internet Res. 2021 Jan 06;23(1):e19928 [ https://www.jmir.org/2021/1/e19928/ ] [ CrossRef ] [ Medline ]
  • Koren G, Souroujon D, Shaul R, Bloch A, Leventhal A, Lockett J, et al. “A patient like me” – an algorithm-based program to inform patients on the likely conditions people with symptoms like theirs have. Medicine. 2019;98(42):e17596 [ CrossRef ]
  • Lau AYS, Staccini P, Section Editors for the IMIA Yearbook Section on EducationConsumer Health Informatics. Artificial intelligence in health: new opportunities, challenges, and practical implications. Yearb Med Inform. 2019 Aug;28(1):174-178 [ http://www.thieme-connect.com/DOI/DOI?10.1055/s-0039-1677935 ] [ CrossRef ] [ Medline ]
  • Sangers T, Reeder S, van der Vet S, Jhingoer S, Mooyaart A, Siegel DM, et al. Validation of a market-approved artificial intelligence mobile health app for skin cancer screening: a prospective multicenter diagnostic accuracy study. Dermatology. 2022 Feb 4;238(4):649-656 [ https://doi.org/10.1159/000520474 ] [ CrossRef ] [ Medline ]
  • Sefa-Yeboah SM, Osei Annor K, Koomson VJ, Saalia FK, Steiner-Asiedu M, Mills GA. Development of a mobile application platform for self-management of obesity using artificial intelligence techniques. Int J Telemed Appl. 2021 Aug 27;2021:6624057 [ https://doi.org/10.1155/2021/6624057 ] [ CrossRef ] [ Medline ]
  • da Silva VJ, da Silva Souza V, da Cruz RG, Vidal Martinez de Lucena JM, Jazdi N, de Lucena Junior VF. Commercial devices-based system designed to improve the treatment adherence of hypertensive patients. Sensors (Basel). 2019 Oct 18;19(20):4539 [ https://www.mdpi.com/resolver?pii=s19204539 ] [ CrossRef ] [ Medline ]
  • Zhang Z, Citardi D, Wang D, Genc Y, Shan J, Fan X. Patients' perceptions of using artificial intelligence (AI)-based technology to comprehend radiology imaging data. Health Informatics J. 2021 Apr 29;27(2):14604582211011215 [ https://journals.sagepub.com/doi/abs/10.1177/14604582211011215?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed ] [ CrossRef ] [ Medline ]
  • Zhang Z, Genc Y, Wang D, Ahsen ME, Fan X. Effect of AI explanations on human perceptions of patient-facing AI-powered healthcare systems. J Med Syst. 2021 May 04;45(6):64 [ CrossRef ] [ Medline ]
  • Jaswal G, Bharadwaj R, Tiwari K, Thapar D, Goyal P, Nigam A. AI-biometric-driven smartphone app for strict post-COVID home quarantine management. IEEE Consumer Electron Mag. 2021 May 1;10(3):49-55 [ CrossRef ]
  • Ullrich D, Butz A, Diefenbach S. The development of overtrust: an empirical simulation and psychological analysis in the context of human–robot interaction. Front Robot AI. 2021 Apr 13;8:554578 [ https://europepmc.org/abstract/MED/33928129 ] [ CrossRef ] [ Medline ]
  • Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob Health. 2018;3(4):e000798 [ https://gh.bmj.com/lookup/pmidlookup?view=long&pmid=30233828 ] [ CrossRef ] [ Medline ]
  • Ciecierski-Holmes T, Singh R, Axt M, Brenner S, Barteit S. Artificial intelligence for strengthening healthcare systems in low- and middle-income countries: a systematic scoping review. NPJ Digit Med. 2022 Oct 28;5(1):162 [ https://doi.org/10.1038/s41746-022-00700-y ] [ CrossRef ] [ Medline ]
  • Wang D, Wang L, Zhang Z, Wang D, Zhu H, Gao Y. “Brilliant AI doctor” in rural clinics: challenges in AI-powered clinical decision support system deployment. 2021 Presented at: 2021 CHI Conference on Human Factors in Computing Systems; May 8-13, 2021; Online p. 1-18 [ CrossRef ]
  • Sofaer S. Qualitative methods: what are they and why use them? Health Serv Res. 1999 Dec;34(5 Pt 2):1101-1118 [ https://europepmc.org/abstract/MED/10591275 ] [ Medline ]
  • Queirós A, Faria D, Almeida F. Strengths and limitations of qualitative and quantitative research methods. Eur J Educ Stud. 2017;3(9):369-387 [ CrossRef ]
  • The European definition of general practice/family medicine. WONCA Europe. 2023. URL: http://tinyurl.com/4muu47zc [accessed 2023-12-08]
  • DerSarkissian C. What is a general practitioner? WebMD. 2023. URL: https://www.webmd.com/a-to-z-guides/what-is-a-general-practitioner [accessed 2023-12-08]
  • Simon DA, Evans BJ, Shachar C, Cohen IG. Should Alexa diagnose Alzheimer's?: legal and ethical issues with at-home consumer devices. Cell Rep Med. 2022 Dec 20;3(12):100692 [ https://linkinghub.elsevier.com/retrieve/pii/S2666-3791(22)00228-2 ] [ CrossRef ] [ Medline ]
  • Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019 Jan 7;25(1):44-56 [ CrossRef ] [ Medline ]
  • Castagno S, Khalifa M. Perceptions of artificial intelligence among healthcare staff: a qualitative survey study. Front Artif Intell. 2020 Oct 21;3:578983 [ https://europepmc.org/abstract/MED/33733219 ] [ CrossRef ] [ Medline ]
  • Price WN, Cohen IG. Privacy in the age of medical big data. Nat Med. 2019 Jan 7;25(1):37-43 [ https://europepmc.org/abstract/MED/30617331 ] [ CrossRef ] [ Medline ]
  • Munjal K, Bhatia R. A systematic review of homomorphic encryption and its contributions in healthcare industry. Complex Intell Syst. 2022 May 03;9(4):1-28 [ https://europepmc.org/abstract/MED/35531323 ] [ CrossRef ] [ Medline ]
  • Kairouz P, Oh S, Viswanath P. Secure multi-party differential privacy. Adv NeurIPS. 2015;28:1-9
  • Yuan Y, Liu J, Jin D, Yue Z, Yang T, Chen R, et al. DeceFL: a principled fully decentralized federated learning framework. Natl Sci Open. 2023 Jan 10;2(1):20220043 [ CrossRef ]
  • Schneider G. Digital health research and health data pools. In: Health Data Pools Under European Data Protection and Competition Law: Health as a Digital Business. Switzerland. Springer Nature; 2022:7-60
  • Delacroix S, Lawrence N. Bottom-up data trusts: disturbing the ‘one size fits all’approach to data governance. Int Data Privacy Law. 2019;9(4):236-252 [ CrossRef ]
  • Luengo-Oroz M, Hoffmann Pham K, Bullock J, Kirkpatrick R, Luccioni A, Rubel S, et al. Artificial intelligence cooperation to support the global response to COVID-19. Nat Mach Intell. 2020 May 22;2(6):295-297 [ CrossRef ]
  • Choudhury A, Asan O. Impact of accountability, training, and human factors on the use of artificial intelligence in healthcare: exploring the perceptions of healthcare practitioners in the US. Hum Factors Healthc. 2022 Dec;2:100021 [ CrossRef ]
  • Smith H. Clinical AI: opacity, accountability, responsibility and liability. AI Soc. 2020 Jul 25;36(2):535-545 [ CrossRef ]
  • Habli I, Lawton T, Porter Z. Artificial intelligence in health care: accountability and safety. Bull World Health Organ. 2020 Feb 25;98(4):251-256 [ CrossRef ]
  • World Health Organization. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. Geneva. World Health Organization; 2021.
  • General Data Protection Regulation (GDPR). Intersoft Consulting. 2018. URL: https://gdpr-info.eu/ [accessed 2023-12-08]
  • Classify your medical device. Food and Drug Administration. 2020. URL: http://tinyurl.com/2n9ta6uy [accessed 2023-12-08]
  • Guidance document: software as a medical device (SaMD): definition and classification. Government of Canada. 2019. URL: http://tinyurl.com/4sc5wdkd [accessed 2023-12-08]
  • Madiega T. Artificial Intelligence Act. European Parliamentary Research Service. 2021. URL: https://www.europarl.europa.eu/RegData/etudes/BRIE/2021/698792/EPRS_BRI(2021)698792_EN.pdf [accessed 2023-12-08]
  • Helberger N, Diakopoulos N. ChatGPT and the AI Act. Internet Policy Rev. 2023;12(1):1-6 [ CrossRef ]
  • de Miguel I, Sanz B, Lazcoz G. Machine learning in the EU health care context: exploring the ethical, legal and social issues. Inf Commun Soc. 2020 Jul 13;23(8):1139-1153 [ CrossRef ]
  • Gunning D. Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA). 2017. URL: https://asd.gsfc.nasa.gov/conferences/ai/program/003-XAIforNASA.pdf [accessed 2023-12-08]
  • Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138-52160 [ CrossRef ]
  • Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. 2020 Jun;58:82-115 [ CrossRef ]
  • Loh HW, Ooi CP, Seoni S, Barua PD, Molinari F, Acharya UR. Application of explainable artificial intelligence for healthcare: a systematic review of the last decade (2011-2022). Comput Methods Programs Biomed. 2022 Nov;226:107161 [ CrossRef ] [ Medline ]
  • Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, et al. Explainable artificial intelligence (XAI): what we know and what is left to attain trustworthy artificial intelligence. Inf Fusion. 2023 Nov;99:101805 [ CrossRef ]
  • Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv. 2018 Aug 22;51(5):1-42 [ CrossRef ]
  • Schoonderwoerd TA, Jorritsma W, Neerincx MA, van den Bosch K. Human-centered XAI: developing design patterns for explanations of clinical decision support systems. Int J Hum-Comput Stud. 2021 Oct;154:102684 [ CrossRef ]
  • Ehsan U, Liao Q, Muller M, Riedl M, Weisz J. Towards social transparency in AI systems. 2021 Presented at: 2021 CHI Conference on Human Factors in Computing Systems; May 8-13, 2021; Online [ CrossRef ]
  • Lee JD, See KA. Trust in automation: Designing for appropriate reliance. Hum Factors. 2004;46(1):50-80 [ CrossRef ]
  • Wischnewski M, Krämer N, Müller E. Measuring and understanding trust calibrations for automated systems: a survey of the state-of-the-art and future directions. 2023 Presented at: 2023 CHI Conference on Human Factors in Computing Systems; April 23-28, 2023; Hamburg, Germany [ CrossRef ]
  • Arnold M, Bellamy RKE, Hind M, Houde S, Mehta S, Mojsilovic A, et al. FactSheets: increasing trust in AI services through supplier's declarations of conformity. IBM J Res Dev. 2019 Jul 1;63(4/5):6:1-6:13 [ CrossRef ]
  • Bedué P, Fritzsche A. Can we trust AI? An empirical investigation of trust requirements and guide to successful AI adoption. J Enterp Inf Manag. 2021 Apr 30;35(2):530-549 [ CrossRef ]
  • Gillath O, Ai T, Branicky MS, Keshmiri S, Davison RB, Spaulding R. Attachment and trust in artificial intelligence. Comput Hum Behav. 2021 Feb;115:106607 [ CrossRef ]
  • Alboqami H. Trust me, I'm an influencer! - causal recipes for customer trust in artificial intelligence influencers in the retail industry. J Retail Consum Serv. 2023 May;72:103242 [ CrossRef ]
  • Logg JM, Minson JA, Moore DA. Algorithm appreciation: people prefer algorithmic to human judgment. Organ Behav Hum Decis Process. 2019 Mar;151:90-103 [ CrossRef ]
  • Novemsky N, Kahneman D. The boundaries of loss aversion. J Mark Res. 2018 Oct 10;42(2):119-128 [ CrossRef ]
  • Fan W, Liu J, Zhu S, Pardalos PM. Investigating the impacting factors for the healthcare professionals to adopt artificial intelligence-based medical diagnosis support system (AIMDSS). Ann Oper Res. 2018 Mar 19;294(1-2):567-592 [ CrossRef ]
  • Cabitza F, Campagner A, Ronzio L, Cameli M, Mandoli GE, Pastore MC, et al. Rams, hounds and white boxes: investigating human-AI collaboration protocols in medical diagnosis. Artif Intell Med. 2023 Apr;138:102506 [ https://linkinghub.elsevier.com/retrieve/pii/S0933-3657(23)00020-9 ] [ CrossRef ] [ Medline ]
  • Boillat T, Nawaz FA, Rivas H. Readiness to embrace artificial intelligence among medical doctors and students: questionnaire-based study. JMIR Med Educ. 2022 Apr 12;8(2):e34973 [ https://mededu.jmir.org/2022/2/e34973/ ] [ CrossRef ] [ Medline ]
  • Bellet PS. The importance of empathy as an interviewing skill in medicine. JAMA. 1991 Oct 02;266(13):1831 [ CrossRef ]
  • Compassion in practice: evidencing the impact. NHS England. May 2016. URL: https://www.england.nhs.uk/wp-content/uploads/2016/05/cip-yr-3.pdf [accessed 2023-12-08]
  • Spiro H. Commentary: the practice of empathy. Acad Med. 2009 Sep;84(9):1177-1179 [ CrossRef ] [ Medline ]
  • Tweedie J, Hordern J, Dacre J. Advancing Medical Professionalism. London, UK. Royal College of Physicians; 2018.
  • Decety J. Empathy in medicine: what it is, and how much we really need it. Am J Med. 2020 May;133(5):561-566 [ CrossRef ] [ Medline ]
  • Kerasidou A. Artificial intelligence and the ongoing need for empathy, compassion and trust in healthcare. Bull World Health Organ. 2020 Apr 01;98(4):245-250 [ https://europepmc.org/abstract/MED/32284647 ] [ CrossRef ] [ Medline ]
  • Pepito JA, Ito H, Betriana F, Tanioka T, Locsin RC. Intelligent humanoid robots expressing artificial humanlike empathy in nursing situations. Nurs Philos. 2020 Oct 20;21(4):e12318 [ CrossRef ] [ Medline ]
  • Montemayor C, Halpern J, Fairweather A. In principle obstacles for empathic AI: why we can't replace human empathy in healthcare. AI Soc. 2022 May 26;37(4):1353-1359 [ https://europepmc.org/abstract/MED/34054228 ] [ CrossRef ] [ Medline ]
  • Morrow E, Zidaru T, Ross F, Mason C, Patel KD, Ream M, et al. Artificial intelligence technologies and compassion in healthcare: a systematic scoping review. Front Psychol. 2022;13:971044 [ https://europepmc.org/abstract/MED/36733854 ] [ CrossRef ] [ Medline ]
  • Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023 Jun 01;183(6):589-596 [ CrossRef ] [ Medline ]
  • Wexler A, Reiner PB. Oversight of direct-to-consumer neurotechnologies. Science. 2019 Jan 18;363(6424):234-235 [ https://europepmc.org/abstract/MED/30655433 ] [ CrossRef ] [ Medline ]
  • De Zambotti M, Cellini N, Goldstone A, Colrain I, Baker F. Wearable sleep technology in clinical and research settings. Med Sci Sports Exerc. 2019;51(7):1538 [ CrossRef ]
  • Ates HC, Nguyen PQ, Gonzalez-Macia L, Morales-Narváez E, Güder F, Collins JJ, et al. End-to-end design of wearable sensors. Nat Rev Mater. 2022 Jul 22;7(11):887-907 [ https://europepmc.org/abstract/MED/35910814 ] [ CrossRef ] [ Medline ]
  • Ryu WM, Lee Y, Son Y, Park G, Park S. Thermally drawn multi-material fibers based on polymer nanocomposite for continuous temperature sensing. Adv Fiber Mater. 2023 Jun 12;5(5):1712-1724 [ CrossRef ]
  • Ran S, Yang X, Liu M, Zhang Y, Cheng C, Zhu H, et al. Homecare-oriented ECG diagnosis with large-scale deep neural network for continuous monitoring on embedded devices. IEEE Trans Instrum Meas. 2022;71:1-13 [ CrossRef ]
  • Edgren L. Health consumer diversity and its implications. J Syst Sci Syst Eng. 2006 Mar;15(1):34-47 [ CrossRef ]
  • Willemink MJ, Koszek WA, Hardell C, Wu J, Fleischmann D, Harvey H, et al. Preparing medical imaging data for machine learning. Radiology. 2020 Apr;295(1):4-15 [ https://europepmc.org/abstract/MED/32068507 ] [ CrossRef ] [ Medline ]


Edited by A Mavragani; submitted 01.07.23; peer-reviewed by Z Zhang, A Nagappan, L Weinert; comments to author 19.07.23; revised version received 20.09.23; accepted 28.11.23; published 18.12.23

©Xin He, Xi Zheng, Huiyuan Ding. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.12.2023.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.


  1. Research Methodology Phases

    analysis in research methodology

  2. 8 Types of Analysis in Research

    analysis in research methodology

  3. Example Of Methodology

    analysis in research methodology

  4. What Is Data Analysis In Research Process

    analysis in research methodology

  5. 1. Research methodology

    analysis in research methodology

  6. PPT

    analysis in research methodology


  1. Approaches to Content Analysis

  2. Research Methodology Course Part 3

  3. Research Methodology-21

  4. Research Methodology-20

  5. Content Analysis vs Thematic Analysis

  6. Research Methodology and Data Analysis-Refresher Course


  1. Research Methods

    Learn how to collect and analyze data for your research design. Find out the pros and cons of different types of data collection methods (qualitative, quantitative, primary, secondary, descriptive, experimental) and data analysis methods (statistical, thematic, content). See examples of each method and get tips on how to prevent plagiarism.

  2. What Is a Research Methodology?

    Your research methodology discusses and explains the data collection and analysis methods you used in your research. A key part of your thesis, dissertation, or research paper, the methodology chapter explains what you did and how you did it, allowing readers to evaluate the reliability and validity of your research and your dissertation topic.

  3. Research Methods--Quantitative, Qualitative, and More: Overview

    About Research Methods. This guide provides an overview of research methods, how to choose and use them, and supports and resources at UC Berkeley. As Patten and Newhart note in the book Understanding Research Methods, "Research methods are the building blocks of the scientific enterprise. They are the "how" for building systematic knowledge.

  4. Learning to Do Qualitative Data Analysis: A Starting Point

    For many researchers unfamiliar with qualitative research, determining how to conduct qualitative analyses is often quite challenging. Part of this challenge is due to the seemingly limitless approaches that a qualitative researcher might leverage, as well as simply learning to think like a qualitative researcher when analyzing data, from framework analysis (Ritchie & Spencer, 1994) to content analysis.

  5. Data Analysis in Research: Types & Methods

    Covers what data analysis in research is, why researchers analyze data, data analysis in qualitative research, and finding patterns in qualitative data.

  6. What Is Research Methodology? Definition + Examples

    Learn what research methodology means, how to choose between qualitative, quantitative and mixed-methods, and how to select data collection and analysis methods. Find out the difference between research design and methodology, and see examples of research methodologies with videos and tips.

  7. How to Do Thematic Analysis

    Like all academic texts, writing up a thematic analysis requires an introduction to establish our research question, aims, and approach. We should also include a methodology section, describing how we collected the data (e.g. through semi-structured interviews or open-ended survey questions) and explaining how we conducted the thematic analysis.

  8. Content Analysis

    Content analysis is a research method used to identify patterns in recorded communication. To conduct content analysis, you systematically collect data from a set of texts, which can be written, oral, or visual: books, newspapers, and magazines; speeches and interviews; web content and social media posts; photographs and films.
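    The systematic-counting step of content analysis can be sketched in a few lines of Python. This is a minimal illustration, not a full coding workflow: the corpus, the coding frame, and the signal terms below are all invented for the example.

    ```python
    from collections import Counter
    import re

    # Hypothetical corpus: short text excerpts to be coded.
    texts = [
        "Patients reported anxiety about cost and access.",
        "Access to care was the most common concern.",
        "Cost, cost, and again cost dominated the interviews.",
    ]

    # A simple coding frame: each code maps to the terms that signal it.
    codes = {"cost": {"cost"}, "access": {"access"}, "emotion": {"anxiety", "fear"}}

    # Count how often each code's terms appear across the corpus.
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for code, terms in codes.items():
            counts[code] += sum(1 for t in tokens if t in terms)

    print(counts)  # frequency of each code across the three texts
    ```

    In practice the coding frame would be developed iteratively and validated (e.g. with a second coder), but the core move — mapping texts onto codes and tallying — is the same.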

  9. Data analysis

    data analysis, the process of systematically collecting, cleaning, transforming, describing, modeling, and interpreting data, generally employing statistical techniques. Data analysis is an important part of both scientific research and business, where demand has grown in recent years for data-driven decision making.

  10. 6. The Methodology

    Methodology is crucial for any branch of scholarship because an unreliable method produces unreliable results and, as a consequence, undermines the value of your analysis of the findings. In most cases, there are a variety of different methods you can choose to investigate a research problem.

  11. Textual Analysis

    Textual analysis is a broad term for various research methods used to describe, interpret and understand texts. All kinds of information can be gleaned from a text, from its literal meaning to the subtext, symbolism, assumptions, and values it reveals. The methods used to conduct textual analysis depend on the field and the aims of the research.

  12. The Beginner's Guide to Statistical Analysis

    Step 1: Write your hypotheses and plan your research design. Step 2: Collect data from a sample. Step 3: Summarize your data with descriptive statistics. Step 4: Test hypotheses or make estimates with inferential statistics. Step 5: Interpret your results.
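    Steps 3 and 4 — descriptive then inferential statistics — can be sketched with the Python standard library. The sample values and the hypothesized population mean below are invented for illustration; a real analysis would also compute a p-value from the t distribution.

    ```python
    import math
    import statistics

    # Hypothetical sample: reaction times (ms) from a small experiment.
    sample = [512, 480, 495, 530, 508, 475, 490, 520]

    # Step 3: descriptive statistics summarize the sample.
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

    # Step 4: a one-sample t statistic compares the sample mean against
    # a hypothesized population mean (here, 500 ms).
    mu0 = 500
    n = len(sample)
    t = (mean - mu0) / (sd / math.sqrt(n))

    print(f"mean={mean:.1f}, sd={sd:.1f}, t={t:.2f}")
    ```

    A small |t| (relative to the critical value for n - 1 degrees of freedom) would mean the sample gives no evidence that the true mean differs from 500 ms.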

  13. Introduction to systematic review and meta-analysis

    A systematic review collects all possible studies related to a given topic and design, and reviews and analyzes their results [1]. During the systematic review process, the quality of studies is evaluated, and a statistical meta-analysis of the study results is conducted on the basis of their quality.

  14. What Is Qualitative Research?

    Qualitative research involves collecting and analyzing non-numerical data (e.g., text, video, or audio) to understand concepts, opinions, or experiences. It can be used to gather in-depth insights into a problem or generate new ideas for research. Qualitative research is the opposite of quantitative research, which involves collecting and analyzing numerical data.

  15. Analysis

    In most social research the data analysis involves three major steps, done in roughly this order: cleaning and organizing the data for analysis (data preparation); describing the data (descriptive statistics); and testing hypotheses and models (inferential statistics). Data preparation involves checking or logging the data in; checking the data for accuracy; entering the data into the computer; transforming the data; and developing and documenting a database structure.
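    The data-preparation step — checking records for accuracy and transforming them into a consistent form — is easy to sketch in Python. The raw records and the validity ranges below are hypothetical, chosen only to show the pattern of type-coercion plus range checks.

    ```python
    # Hypothetical raw survey records, as they might arrive from data entry.
    raw = [
        {"id": "1", "age": "34", "score": " 7 "},
        {"id": "2", "age": "unknown", "score": "5"},   # invalid age: dropped
        {"id": "3", "age": "41", "score": "9"},
    ]

    def clean(record):
        """Return a typed, range-checked record, or None if it fails a check."""
        try:
            age = int(record["age"])
            score = int(record["score"].strip())
        except ValueError:
            return None
        if not (0 <= age <= 120 and 0 <= score <= 10):
            return None
        return {"id": record["id"], "age": age, "score": score}

    prepared = [r for r in (clean(rec) for rec in raw) if r is not None]
    print(prepared)  # only the records that pass the accuracy checks
    ```

    Documenting which records were dropped, and why, is part of the same step: it is what lets later descriptive and inferential analyses be audited.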

  16. How to Write Research Methodology: Overview, Tips, and Techniques

    Methods and methodology in the context of research refer to two related but different things: method is the technique used in gathering evidence; methodology, on the other hand, "is the underlying theory and analysis of how research does or should proceed" (Kirsch & Sullivan, 1992, p. 2).

  17. A tutorial on methodological studies: the what, when, how and why

    Methodological studies - studies that evaluate the design, analysis or reporting of other research-related reports - play an important role in health research. They help to highlight issues in the conduct of research with the aim of improving health research methodology, and ultimately reducing research waste.

  18. What is data analysis? Methods, techniques, types & how-to

    Data analysis is the process of collecting, modeling, and analyzing data using various statistical and logical methods and techniques. Businesses rely on analytics processes and tools to extract insights that support strategic and operational decision-making.

  19. Research Methodology

    Qualitative Research Methodology. This is a research methodology that involves the collection and analysis of non-numerical data such as words, images, and observations. This type of research is often used to explore complex phenomena, to gain an in-depth understanding of a particular topic, and to generate hypotheses.

  20. Meta-Analytic Methodology for Basic Research: A Practical Guide

    Meta-analysis refers to the statistical analysis of the data from independent primary studies focused on the same question, which aims to generate a quantitative estimate of the studied phenomenon, for example, the effectiveness of the intervention (Gopalakrishnan and Ganeshkumar, 2013).
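    The simplest quantitative estimate of this kind is a fixed-effect, inverse-variance pooled effect: each study is weighted by the precision of its estimate. The per-study effects and standard errors below are invented for illustration; a real meta-analysis would also assess heterogeneity (e.g. with a random-effects model).

    ```python
    import math

    # Hypothetical per-study effect estimates and their standard errors.
    studies = [
        {"effect": 0.30, "se": 0.10},
        {"effect": 0.10, "se": 0.20},
        {"effect": 0.25, "se": 0.15},
    ]

    # Fixed-effect (inverse-variance) pooling: weight each study by 1/se^2,
    # so more precise studies count for more.
    weights = [1 / s["se"] ** 2 for s in studies]
    pooled = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))

    print(f"pooled effect = {pooled:.3f} (SE {pooled_se:.3f})")
    ```

    Note that the pooled standard error is smaller than any single study's, which is the point of pooling: combining independent estimates increases precision.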

  21. What is Data Analysis?: Process, Types, Methods, and Techniques

    Data analysis is the process of cleaning, changing, and processing raw data and extracting actionable, relevant information that helps businesses make informed decisions. The procedure helps reduce the risks inherent in decision-making by providing useful insights and statistics, often presented in charts, images, tables, and graphs.

  22. What is Research Methodology? Definition, Types, and Examples

    A research methodology describes the techniques and procedures used to identify and analyze information regarding a specific research topic. It is a process by which researchers design their study so that they can achieve their objectives using the selected research instruments.

  23. Data Analysis

    Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation. Data analysis methods include statistical analysis, which involves the use of mathematical models and statistical tools to analyze and interpret data.
