Introduction
Critical thinking, along with creative thinking, cooperative skills, and communication skills, is regarded as an essential competency for human productivity and development and is described as a necessary skill for individuals in the 21st century. Critical thinking serves as a valuable guide for human action: it embodies rigorous standards, keen insight, a prudent attitude, and accurate evaluation of human understanding and problem-solving processes, and it comprehensively assesses the clarity, precision, and consistency of human thought and conclusions. Critical thinking skills thus underpin the effectiveness of human action, making critical thinking a sophisticated ability that individuals must continually pursue. Cultivating it is a central objective of education: only by preparing capable thinkers for the marketplace can society in the 21st century thrive amid various forms of competition and pressure.
Critical thinking is vital to human beings, and clarifying its nature is a prerequisite for cultivating and learning it. Indeed, numerous experts and scholars have researched and defined its nature. Paul and Elder (2004) hold that critical thinking is centered on judgment and evaluation, and that its essence involves rethinking the human thinking process. Iswara et al. (2021) emphasize that critical thinking is a form of decision-making. Uliyandari et al. (2021) argue that critical thinking is based on the ability to reason and on the logic of things, making it a cognitive skill. Sagkal Midilli and Altas (2020) conclude that critical thinking, as a targeted and highly analytical type of thinking, is closely related to problem-solving. Taken together, critical thinking is a cognitive skill, a thinking process, or a mental activity grounded in strong logical reasoning, which critically judges and evaluates one's thought processes for accurate decision-making and problem-solving.
In the information-saturated 21st century, critical thinking emerges as an indispensable cognitive competency, enabling learners to systematically decode, evaluate, and synthesize multidimensional information (Kusmaharti & Yustitia, 2022). Moreover, critical thinking exhibits three distinct and essential functions: (1) professional problem-solving capacity through evidence-based reasoning (Nadeak & Naibaho, 2020); (2) cultivation of intellectual virtues including resilience, inquisitiveness, and innovative capacity (Sunubi & Bachtiar, 2022); and (3) integration of success determinants through metacognitive regulation (Sarwanto et al., 2021). Given its transformative potential across cognitive, affective, and behavioral domains, critical thinking constitutes an educational imperative requiring curricular prioritization (Elbyaly & Elfeky, 2023; Hussin et al., 2019; Supriyatno et al., 2020).
Educators have thus implemented a diverse range of methods to develop students’ critical thinking comprehensively and efficiently. Afdareza et al. (2020) utilized a novel 21st-century learning tool to enhance students’ critical thinking skills. Kusmaharti and Yustitia (2022) adopted self-directed learning-based approaches to boost students’ critical thinking. Some researchers suggest that problem-based learning effectively promotes the development of students’ critical thinking abilities (Sholihah & Lastariwati, 2020). Furthermore, other researchers have employed a variety of blended approaches to the same end. For example, Palinussa et al. (2023) combined problem-based learning with discovery learning to enhance students’ critical thinking, while other scholars have used advocacy for learning mathematical problems to foster critical thinking in mathematics (Ibrahim et al., 2021). Further hybrid approaches include hands-on STEM activities (Asigigan & Samur, 2021), blended learning (Sunubi & Bachtiar, 2022), smart e-books combined with problem-based learning (Ridho et al., 2021), worksheets combined with problem-based learning (Erlangga et al., 2021), problem-based learning models assisted by e-modules (Rahmat et al., 2020), problem-solving learning models assisted by student worksheets (Is’ad & Sukarmin, 2022), and the Jigsaw model combined with problem-based learning (Saputra et al., 2019). In summary, more than a dozen approaches exist to enhance students’ critical thinking, among which problem-based learning and its associated blended approaches are widely applied.
PBL has its origin in the theory of experiential learning, with problem-solving serving as the linchpin of this approach (Razak et al., 2022). Essentially, PBL is a learning method grounded in real-world problems (Amin et al., 2020), centered around students (Masruro et al., 2021), and encouraging them to take the initiative in applying their knowledge and skills to address various real-world issues (Nadeak & Naibaho, 2020) through teamwork and independent thinking (Lubis et al., 2019). Consequently, PBL is a method that promotes real-world problem-solving, student-centered learning, and encourages initiative in applying knowledge and skills through teamwork and independent thinking.
Scholarly consensus confirms that Problem-Based Learning (PBL) serves as a potent cognitive scaffold for cultivating critical thinking through three synergistic mechanisms: (a) Authentic problem-driven inquiry that bridges academic content with real-world contexts, fostering analytical engagement and intrinsic motivation (Sholihah & Lastariwati, 2020; Uliyandari et al., 2021); (b) Structured cognitive processes encompassing systematic problem deconstruction, evidence evaluation, and conclusion formulation under guided epistemic frameworks (Akhdinirwanto et al., 2020; Mardi et al., 2021); and (c) Self-regulated learning cycles characterized by iterative questioning, metacognitive monitoring, and adaptive decision-making that stimulate neural plasticity (Mardi et al., 2021; Sholihah & Lastariwati, 2020). This tripartite architecture positions PBL as an epistemic engine driving critical thinking competencies. In summary, PBL is a scientific and systematic learning method that significantly impacts the development of critical thinking.
Problem-Based Learning (PBL), as a pedagogical strategy for cultivating core competencies such as critical thinking and self-directed learning in the 21st century, has had its comprehensive efficacy systematically validated through meta-analytic approaches (Manuaba et al., 2024). Numerous studies utilizing functionalities like effect size synthesis and heterogeneity testing in software such as CMA and Review Manager have elucidated the differential impacts of PBL across disciplines (e.g., medicine, education) and instructional contexts (Ranggi et al., 2021). These analyses have identified key moderating variables, including instructional strategies and assessment tools, thereby enhancing the methodological rigor of meta-analyses in deciphering PBL’s mechanisms of action. However, research employing meta-analysis to comprehensively investigate PBL’s effects on the development of critical thinking across various levels and types remains scarce, warranting further exploration.
Despite the widespread application of Problem-Based Learning across diverse fields to enhance students’ critical thinking, its effectiveness exhibits significant heterogeneity. This heterogeneity stems from multidimensional differences across studies, including sample sizes (ranging from 30 to 500 participants), measurement tools (standardized tests vs. self-developed instruments), instructional strategies (isolated or blended approaches), interactions, and learning stages. The substantial fluctuations in effect sizes pose challenges to accurately quantifying PBL’s overall efficacy and impact levels, while the heterogeneity of variables prevents individual studies from comprehensively revealing moderating factors influencing PBL outcomes. Consequently, two critical research gaps persist: first, existing evidence lacks a systematic synthesis of PBL’s cross-domain overall effectiveness on critical thinking development; second, empirical support remains insufficient to clarify how learning stage, teaching method, sample size, and measuring tool dynamically moderate PBL’s impact, thereby constraining a holistic understanding of its role. These research gaps necessitate the use of meta-analysis as the most appropriate methodology, as it enables the integration of effect sizes—either through fixed-effects or random-effects models—to synthesize findings from multiple studies, thereby systematically evaluating the overall impact of the intervention. Additionally, meta-analysis categorizes subgroups based on factors (moderating variables) and explains the consistency and variability of the effect sizes within these subgroups, thus guiding the researcher in targeting the moderation of the various factors in order to improve the effectiveness of the intervention. Therefore, this paper focuses on two core research questions:
1. How effective is problem-based learning in improving students’ critical thinking skills compared to conventional methods?
2. How does the effectiveness of problem-based learning vary according to the learning stage, teaching method, sample size, measuring tool, and subjects?
The first question focuses on the effectiveness of PBL in developing critical thinking compared to traditional methods. In experimental studies, effectiveness is indicated by the level of the effect size (Antonio & Prudente, 2024), which essentially means that the magnitude of the effect size of PBL is investigated. The effect size referred to here is the overall effect size derived from the meta-analysis, i.e., the weighted average of the specific effect sizes. Each specific effect size originates from a primary study and is calculated from the means and variances of the experimental and control groups. Generally, the experimental group adopts PBL as the teaching method while the control group uses the traditional method, making the experiment a comparison of the effectiveness of the two approaches; the specific effect size therefore indicates the effectiveness of PBL relative to the traditional method. Hence, this research calculates the overall effect size in the meta-analysis, determines the effectiveness of PBL compared to traditional methods from the significance of the overall effect size, and assesses the level of that effectiveness from its magnitude.
The second question mainly analyzes various factors affecting PBL, including the learning stage, teaching method, sample size, measuring tool, and subject. These will be regarded as moderating variables in the meta-analysis, and a meticulous subgroup analysis will be carried out to clarify their specific influence on the effect size of PBL. The magnitude of these effects is primarily determined by the significance of the corresponding effect size of PBL under each moderating variable. The analysis of the moderating variables offers a clear understanding of the various factors affecting PBL and provides an effective strategy for the improved utilization of PBL.
Methodology
In order to conduct a detailed and thorough analysis of the actual effects of problem-based learning on critical thinking, this paper systematically investigates the relevant literature using a meta-analytical approach. Meta-analysis, a widely used literature review method, primarily calculates, analyzes, and compares the effect sizes of quantitative studies and, on that basis, evaluates the overall effect size of the original documents to draw reliable conclusions. The use of meta-analysis requires two prerequisites. First, a set of specific effect sizes must be provided, each derived from an actual study; the number of included studies is not fixed but generally ranges from 10 to 30. Second, risk-of-bias analyses must be performed, and each specific effect size can only enter the meta-analysis after passing a risk-of-bias test, primarily to ensure the accuracy of the meta-analysis. To satisfy these prerequisites, this study screened 30 studies through strict criteria, calculated their specific effect sizes using the effect size formula, and then evaluated them by applying five risk tests. This process excluded 5 articles that presented a risk of publication bias and ensured that the 25 remaining articles complied with all the requirements of the meta-analysis.
Research Design
This meta-analysis will be conducted using CMA V3.0 software and consists of the following steps: firstly, identifying the research question, thoroughly understanding the current state of research on the adoption of PBL to improve critical thinking, and exploring unresolved issues regarding the effectiveness of PBL, in order to set the research topic and objective of analyzing the effectiveness of PBL on the development of critical thinking and its influencing factors through meta-analysis; secondly, collecting literature, identifying keywords and screening criteria, conducting a preliminary search in the databases, and selecting the literature that matches the requirements; thirdly, calculating effect sizes and conducting publication bias tests to identify the literature that ultimately qualifies for meta-analysis; fourthly, conducting the meta-analysis to calculate overall effect sizes and analyze heterogeneity and moderating variables; and fifthly, engaging in a discussion. The design is systematic and compact, ensuring that the study is conducted efficiently and in an orderly manner.
Collection of Data
The research questions in this paper include two primary ones: Firstly, is the effect size of PBL in improving students’ critical thinking significant compared to conventional methods? What level of effect size has been achieved? Secondly, is the effect size of PBL in enhancing students’ critical thinking influenced by the learning stage, teaching method, sample size, measuring tool, and subject? Focusing on these two questions, the researcher will first define the keywords and identify the databases from which the literature will be searched. Then, the researcher will apply the keywords to comprehensively retrieve the relevant literature from the databases.
There are three primary keywords for the search: critical thinking, problem-based learning (PBL), and enhancement or improvement. In this paper, we applied these three keywords to search several databases, including Scopus, Google Scholar, ERIC, Web of Science, and PsycINFO. The initial search yielded a total of 203 relevant papers. After that, the 203 initial papers were progressively screened based on strict criteria as follows: first, limited to papers published between 2014 and 2024; second, written in English and available in full text; third, the study is a quasi-experiment or a true experiment employing a PBL or blended PBL methodology as the treatment, with students’ critical thinking as the dependent variable; fourth, the study can provide data for effect size calculations, including mean, standard deviation, sample size, t-value, F-value, or p-value; fifth, the sample includes participants aged 6 years and older.
Table 1. Inclusion Criteria
| Screening Dimension | Inclusion Criteria |
| Publication date | January 2014 – March 2024 |
| Language & accessibility | English language, full-text available |
| Research design | Quasi-experimental or true experimental design with PBL/blended PBL as intervention and critical thinking as dependent variable |
| Data completeness | Sufficient data for effect size calculation (mean, SD, sample size, t/F/p-values) |
| Sample characteristics | Participants aged ≥6 years (covering K-12 to higher education) |
Inter-Rater Reliability and Dispute Resolution Process
To ensure inter-rater reliability, a coding manual was first developed based on the five screening criteria. Two independent reviewers applied this manual to screen and select studies, and only studies meeting all criteria were included. Inter-rater reliability was validated through two sequential phases: (1) a pilot phase, during which the reviewers independently coded 10 randomly selected articles, yielding a Cohen’s kappa coefficient of κ = 0.85 (almost perfect agreement) for inclusion decisions, followed by (2) a complete screening phase, which involved weekly calibration meetings to resolve disputed cases through iterative discussion. Disagreements were systematically addressed through a tripartite protocol: inclusion criteria were rechecked against the coding manual, consensus was negotiated through deliberative dialogue, and persisting ambiguities (less than 5% of cases) were adjudicated by an independent senior researcher with domain expertise in PBL meta-analyses. The following is the exact process employed to screen the articles:
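As an illustration of how the pilot-phase agreement statistic can be computed, the sketch below implements Cohen’s kappa for two raters’ include/exclude decisions. The ten decisions shown are hypothetical, not the actual pilot data, so the resulting value differs from the reported κ = 0.85.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Observed agreement: share of identical decisions
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal proportions
    cats = set(rater_a) | set(rater_b)
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include (1) / exclude (0) decisions for 10 pilot articles
a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
b = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
print(round(cohens_kappa(a, b), 2))  # → 0.78
```

Kappa discounts the agreement two raters would reach by chance alone, which is why it is preferred over raw percent agreement for screening decisions.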

Figure 1. Flow of Literature Collection
Calculation of Effect Size
In order to perform an effect size analysis on the screened articles, the first step is to adopt a unified effect size formula to calculate the effect size of each study. Generally, Cohen’s d or Hedges’ g is used in meta-analysis to calculate the effect size. In this paper, Cohen’s d will be used as the formula for calculating the effect size. The specific formula is as follows:
d = (X̄₁ − X̄₂) / √((SD₁² + SD₂²) / 2)

where d is the effect size, X̄₁ and X̄₂ are the means of the experimental and control groups, and SD₁ and SD₂ are the standard deviations of the two groups. The effect size expresses the degree of difference between the two groups attributable to the treatment: the difference between the two group means divided by the root mean square of the two groups’ standard deviations.
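As a minimal sketch of this calculation (the group statistics below are hypothetical, not values drawn from any included study):

```python
import math

def cohens_d(mean_exp, mean_ctrl, sd_exp, sd_ctrl):
    """Cohen's d: mean difference over the root mean square of the two SDs."""
    pooled_sd = math.sqrt((sd_exp ** 2 + sd_ctrl ** 2) / 2)
    return (mean_exp - mean_ctrl) / pooled_sd

# Hypothetical post-test statistics for an experimental and a control group
d = cohens_d(mean_exp=82.0, mean_ctrl=74.0, sd_exp=9.0, sd_ctrl=11.0)
print(round(d, 3))  # → 0.796
```

A positive d indicates the experimental (PBL) group outperformed the control group, in units of the pooled standard deviation.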
In the process of effect size analysis, determining the level of the effect size serves as the basis for comparing and evaluating the effectiveness of PBL. In the meta-analysis, the level of effect size will be classified using three levels: high, medium, and low. Specific criteria are as follows:
Table 2. Effect Size Categories (Supriyadi et al., 2023)
| Effect size | Category |
| 0 ≤ ES ≤ 0.2 | Low |
| 0.2 < ES ≤ 0.8 | Medium |
| ES > 0.8 | High |
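The thresholds in Table 2 can be encoded as a simple classification rule, sketched below (the three sample values are illustrative):

```python
def effect_size_level(es):
    """Classify an effect size using the low/medium/high thresholds of Table 2."""
    if es <= 0.2:
        return "low"
    if es <= 0.8:
        return "medium"
    return "high"

print([effect_size_level(es) for es in (0.15, 0.556, 2.227)])  # → ['low', 'medium', 'high']
```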
Analysis of Publication Bias
Publication bias is a critical threat to the validity of a meta-analysis; therefore, publication bias in the primary literature needs to be ruled out before formal analysis. In this study, the researchers employed the publication bias procedures of the CMA software and conducted a comprehensive analysis through five tests: the funnel plot, the fail-safe N, Kendall’s tau test, Egger’s regression intercept test, and Duval and Tweedie’s trim and fill. The funnel plot judges publication bias from the position of the primary studies within the confidence interval and the uniformity and symmetry of their distribution in the funnel; the fail-safe N judges bias from the number of additional studies that would be needed to overturn the significance of the existing composite effect size; Kendall’s tau test and Egger’s regression intercept test are judged by the significance of their p-values, with p greater than alpha .05 indicating no bias; and finally, Duval and Tweedie’s trim and fill determines the risk of bias from the number of studies that must be trimmed and filled (Siagian et al., 2023).
In the CMA software, the publication bias test is conducted subsequent to the preliminary analysis of overall effect sizes. Users can navigate to the Analyses tab and activate the Publication Bias module to perform the five bias detection tests. Following this, clicking on Funnel Plot will display the funnel plot, while clicking Table displays quantitative outputs for the remaining four tests, including p-values, fail-safe N values, and adjusted effect size estimates.

Figure 2. Funnel Plot of Standard Error
In the first-round publication bias analysis, five articles were identified as carrying publication bias because their effect sizes exceeded 3, an extremely high value. After excluding these articles, a second-round publication bias analysis was conducted. As the funnel plot in Figure 2 shows, the remaining 25 original papers are distributed mainly toward the top of the funnel, indicating small standard errors. The majority of the papers fall within the confidence interval, and their distribution on the left and right is fairly uniform, suggesting that publication bias in the original papers is minimal.
Second, as shown in Table 3, the fail-safe N equals 2152. This indicates that more than 2,000 additional null-result studies would be needed to raise the p-value of the overall effect size above the alpha value of .05 and thereby render the combined effect of the 25 original studies nonsignificant. Such a number is implausible, suggesting that there is no publication bias in the literature. Third, the p-value of Kendall’s tau test (p = .1755) is greater than alpha .05, indicating that publication bias is not significant. Fourth, Egger’s regression intercept test yields a p-value (p = .06975) greater than alpha .05, which also indicates no publication bias. Finally, Duval and Tweedie’s trim and fill shows that none of the 25 articles need to be trimmed and filled. Together, these analyses indicate that there is no publication bias in the 25 original articles.
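For illustration, the tabled value can be reproduced with Rosenthal’s classic fail-safe N formula, N = k[(Z/Z_α)² − 1], using the combined Z-value and study count reported in Table 3. This sketch assumes CMA applies that classic formula and rounds up to a whole study:

```python
import math

def classic_failsafe_n(z_combined, k, z_alpha=1.9599):
    """Rosenthal's classic fail-safe N: how many unpublished null studies
    would pull the combined Z below the two-tailed alpha threshold."""
    return math.ceil(k * ((z_combined / z_alpha) ** 2 - 1))

# Combined Z = 18.2857 over k = 25 studies, as reported in Table 3
print(classic_failsafe_n(18.2857, 25))  # → 2152
```

The larger this number relative to the number of included studies, the less plausible it is that file-drawer studies could overturn the result.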
After risk testing, 25 original articles were ultimately identified as meeting the rigorous requirements of the meta-analysis, with an overall sample size of 1846. Among the participants, 461 are primary school students (24.97%), 137 are middle school students (7.42%), 770 are high school students (41.71%), and 478 are university students (25.89%). The literature spans the period from 2014 to 2024. These articles are well-documented, clearly argued, and cover the latest primary literature on the use of PBL to enhance critical thinking, and their data can adequately support the findings of the meta-analysis.
Table 3. Analysis of Publication Bias
| Publication bias analysis | Statistic | Result |
| Fail-safe N | Z-value | 18.2857 |
| | Z for alpha | 1.9599 |
| | P-value | 0.0000 |
| | N | 25 |
| | Missing studies needed to bring p > alpha | 2152 |
| Kendall’s tau test | Tau | 0.1933 |
| | Z-value | 1.3546 |
| | P-value (2-tailed) | 0.1755 |
| Egger’s regression intercept test | Standard error | 1.07507 |
| | t-value | 1.90215 |
| | P-value (2-tailed) | 0.06975 |
| Duval and Tweedie’s trim and fill | Q-value | 79.8274 |
| | Studies trimmed | 0 |
Analysis of Data
After the publication bias analysis, the final 25 papers advanced to the final stage of the meta-analysis. CMA V3.0 was used to systematically analyze the effect size of each paper. The analyzed items included the effect size d (ES), standard error (SE), variance (V), lower limit (LL), upper limit (UL), Z-value (Z), and p-value (p), as well as the overall effect size under the random and fixed models. The confidence interval used throughout the analysis was 95%, as specified in Table 4 below:
Table 4. Analysis of Effect Size
Study Name | ES | SE | V | LL | UL | Z | p |
Amin et al. (2020) | 2.227 | 0.437 | 0.191 | 1.371 | 3.083 | 5.101 | 0.000 |
Silviariza et al. (2020) | 1.270 | 0.274 | 0.075 | 0.734 | 1.806 | 4.642 | 0.000 |
Arifin et al. (2020) | 0.556 | 0.148 | 0.022 | 0.266 | 0.847 | 3.752 | 0.000 |
Dakabesi and Luoise (2019b) | 0.423 | 0.186 | 0.035 | 0.059 | 0.787 | 2.277 | 0.023 |
Palinussa et al. (2023) | 1.293 | 0.355 | 0.126 | 0.597 | 1.989 | 3.643 | 0.000 |
Umar et al. (2020) | 1.443 | 0.235 | 0.055 | 0.982 | 1.903 | 6.136 | 0.000 |
Wahyudiati (2022) | 1.247 | 0.269 | 0.072 | 0.720 | 1.773 | 4.642 | 0.000 |
Aswan et al. (2018) | 0.794 | 0.354 | 0.125 | 0.101 | 1.487 | 2.245 | 0.025 |
Hidayati and Purwaningsih (2023) | 1.568 | 0.555 | 0.308 | 0.418 | 2.655 | 2.827 | 0.005 |
Rahmat et al. (2020) | 0.622 | 0.309 | 0.095 | 0.017 | 1.227 | 2.014 | 0.044 |
Widyatiningtyas et al. (2015) | 2.081 | 0.247 | 0.061 | 1.598 | 2.564 | 8.439 | 0.000 |
Qondias et al. (2022) | 1.211 | 0.184 | 0.034 | 0.851 | 1.571 | 6.539 | 0.000 |
Table 4. Continued
Study Name | ES | SE | V | LL | UL | Z | p |
Gholami et al. (2016) | 0.345 | 0.334 | 0.111 | -0.309 | 0.999 | 1.034 | 0.301 |
Ibrahim et al. (2020) | 0.839 | 0.247 | 0.061 | 0.355 | 1.324 | 3.396 | 0.001 |
Pramestika et al. (2020) | 1.022 | 0.292 | 0.085 | 0.449 | 1.595 | 3.495 | 0.000 |
Cecily and Omoush (2014) | 0.802 | 0.285 | 0.081 | 0.242 | 1.361 | 2.809 | 0.005 |
Santuthi et al. (2020) | 1.504 | 0.323 | 0.104 | 0.871 | 2.137 | 4.655 | 0.000 |
Putranta and Kuswanto (2018) | 1.469 | 0.323 | 0.104 | 0.836 | 2.102 | 4.547 | 0.000 |
Anazifa (2016) | 0.671 | 0.279 | 0.078 | 0.123 | 1.218 | 2.401 | 0.016 |
Yolanda (2019) | 0.536 | 0.246 | 0.060 | 0.055 | 1.018 | 2.182 | 0.029 |
Zamroni et al. (2020) | 0.762 | 0.352 | 0.124 | 0.072 | 1.452 | 2.165 | 0.030 |
Zuryanty et al. (2019) | 0.953 | 0.271 | 0.073 | 0.422 | 1.483 | 3.520 | 0.000 |
Dianita and Tiarani (2023) | 2.220 | 0.610 | 0.372 | 1.025 | 3.416 | 3.640 | 0.000 |
Wenno et al. (2021) | 1.743 | 0.358 | 0.128 | 1.042 | 2.444 | 4.872 | 0.000 |
Mardi et al. (2021) | 0.308 | 0.765 | 0.585 | -1.191 | 1.807 | 0.402 | 0.688 |
Fixed model | 0.998 | 0.055 | 0.003 | 0.890 | 1.105 | 18.195 | 0.000 |
Random model | 1.081 | 0.106 | 0.011 | 0.874 | 1.288 | 10.248 | 0.000 |
Analysis of Overall Effect Size
The data in Table 4 display the specific effect size and its significance for each study. When p is less than the alpha value of .05, PBL has a significantly greater impact on the development of critical thinking than the conventional method. From the table, 16 of the 25 original papers have high-level effect sizes and 9 have medium-level effect sizes. Twenty-three have effect sizes with p-values below .05, indicating that in those experiments PBL significantly improved students’ critical thinking compared to the conventional method. Two studies, Gholami et al. (2016) (ES = 0.345, p = .301) and Mardi et al. (2021) (ES = 0.308, p = .688), had p-values greater than .05, indicating that PBL played a positive role in these two studies but its advantage over the conventional method was not statistically significant. In terms of the overall effect size, the values under the fixed-effects and random-effects models were 0.998 and 1.081, respectively, both with p-values below .05.
The meta-analytic synthesis of 25 independent studies revealed a robust aggregated effect size, indicating that problem-based learning (PBL) pedagogy demonstrates statistically superior efficacy (p < .05) in enhancing students' critical thinking competencies relative to conventional instructional approaches.
Table 5. Result of Heterogeneity Test
| Q-value | df(Q) | p | I² (%) |
| 79.827 | 24 | 0.000 | 69.935 |
After the comprehensive meta-analysis, the effect sizes under both the fixed and the random model are known. A heterogeneity test then determines which model to use for the overall effect size. When significant heterogeneity exists among studies, the random model is employed. The core rationale is that this model acknowledges that variation in true effect sizes across studies arises not only from sampling error but may also reflect potential confounding factors such as study design, sample characteristics, or intervention conditions. Consequently, it incorporates this variation into the calculation of the overall effect size, yielding broader confidence intervals and more conservative effect size estimates. In contrast, the fixed model assumes that all studies are drawn from a homogeneous population, attributing observed differences in effect sizes solely to sampling error without considering other influencing factors. Therefore, in the presence of substantial heterogeneity, the random-effects model is necessary to obtain a more accurate overall effect size (Suryono et al., 2023). In the heterogeneity analysis, a p-value less than alpha .05 indicates significant imbalance in the composite effect size, and heterogeneity is moderate when I² is between 25% and 75% (Aytaç & Kula, 2020).
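The contrast between the two models can be sketched as follows: a fixed-effects estimate weights each study by the inverse of its variance, while a random-effects estimate adds a between-study variance τ² to every study’s variance before weighting (here via the DerSimonian-Laird method, a common random-effects estimator; whether CMA used exactly this estimator is an assumption). The three effect sizes and variances below are illustrative, not values from the 25 included studies.

```python
def pool_effects(es, var):
    """Pool study effect sizes under fixed- and random-effects models.

    Fixed: inverse-variance weights. Random: DerSimonian-Laird tau^2
    added to each study's variance before weighting.
    """
    w = [1.0 / v for v in var]
    sum_w = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, es)) / sum_w
    # Cochran's Q and I^2 quantify heterogeneity around the fixed estimate
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, es))
    df = len(es) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    w_rand = [1.0 / (v + tau2) for v in var]
    random_ = sum(wi * e for wi, e in zip(w_rand, es)) / sum(w_rand)
    return fixed, random_, q, i2

# Illustrative effect sizes and variances for three hypothetical studies
fixed, random_, q, i2 = pool_effects([0.5, 1.0, 1.5], [0.04, 0.09, 0.25])
print(round(fixed, 3), round(random_, 3), round(q, 3), round(i2, 1))
```

Note that with the heterogeneity reported in Table 5 (Q = 79.827, df = 24), the same I² formula gives (79.827 − 24) / 79.827 × 100 ≈ 69.93%, consistent with the value reported by CMA.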
The results of the heterogeneity test in Table 5 show a p-value of .000 and an I² of 69.935%, indicating significant heterogeneity across the 25 original papers (Q = 79.827, p < .05, I² = 69.935). A random-effects model will therefore be used in this analysis; that is, the overall effect size for the 25 original papers is 1.081 with a confidence interval (CI) of [0.874, 1.288] and p < .05. The significant heterogeneity observed across the 25 included studies suggests that certain moderating variables may systematically influence the effectiveness of PBL in developing critical thinking skills, necessitating a rigorous examination of heterogeneity sources through subgroup analysis to optimize the effectiveness of PBL.
Analysis of Heterogeneity
The analysis of heterogeneity can determine which factors promote the effectiveness of PBL and which might hinder it. In meta-analyses of critical thinking, the common moderating variables for subgroup analysis include learning stage, teaching method, sample size, measuring tool, and subject (Xu et al., 2023). The learning stage encompasses all levels from primary school to university. The teaching method includes PBL and blended PBL (i.e., PBL combined with other methods, denoted PBL+). The sample size categories are fewer than 30, 30 to 60, and more than 60. The measuring tool includes both standardized tests and self-developed tests. The subject includes all disciplines from primary school to university. The following are the results of the subgroup analysis:
Table 6. Analysis of Heterogeneity
Group | Effect size and 95% confidence interval | Test of null [2-Tail] | Heterogeneity | ||||||||
N | ES | SE | V | LL | UL | Z | p | Q | p | I² |
Learning stage | |||||||||||
Elementary | 6 | 1.240 | 0.113 | 0.013 | 1.020 | 1.461 | 11.026 | 0.000 | 5.384 | 0.371 | 7.133 |
Junior high | 2 | 0.891 | 0.196 | 0.038 | 0.508 | 1.274 | 4.555 | 0.000 | 5.683 | 0.017 | 82.400 |
Senior high | 8 | 0.895 | 0.085 | 0.007 | 0.729 | 1.061 | 10.570 | 0.000 | 46.250 | 0.000 | 84.870 |
University | 9 | 0.975 | 0.107 | 0.011 | 0.765 | 1.184 | 9.134 | 0.000 | 16.030 | 0.042 | 50.100 |
Total within | 73.354 | 0.000 | |||||||||
Total between | 6.474 | 0.091 | |||||||||
Over all | 25 | 0.998 | 0.055 | 0.003 | 0.890 | 1.105 | 18.195 | 0.000 | 79.827 | 0.000 | 69.935 |
Teaching method | |||||||||||
PBL | 18 | 1.076 | 0.064 | 0.004 | 0.950 | 1.202 | 16.768 | 0.000 | 60.581 | 0.000 | 71.938 |
PBL+ | 7 | 0.786 | 0.106 | 0.011 | 0.579 | 0.993 | 7.443 | 0.000 | 13.727 | 0.033 | 56.289 |
Total within | 74.308 | 0.000 | |||||||||
Total between | 5.520 | 0.019 | |||||||||
Over all | 25 | 0.998 | 0.055 | 0.003 | 0.890 | 1.105 | 18.195 | 0.000 | 79.827 | 0.000 | 69.935 |
Sample size | |||||||||||
30-60 | 9 | 0.933 | 0.111 | 0.012 | 0.716 | 1.150 | 8.425 | 0.000 | 20.542 | 0.000 | 61.056 |
Less than 30 | 3 | 1.516 | 0.362 | 0.131 | 0.807 | 2.224 | 4.191 | 0.000 | 3.838 | 0.147 | 47.894 |
More than 60 | 13 | 1.003 | 0.064 | 0.004 | 0.878 | 1.129 | 15.649 | 0.000 | 53.043 | 0.000 | 77.377 |
Total within | 77.423 | 0.000 | |||||||||
Total between | 2.404 | 0.301 | |||||||||
Over all | 25 | 0.998 | 0.055 | 0.003 | 0.890 | 1.105 | 18.195 | 0.000 | 79.827 | 0.000 | 69.935 |
Measurement tool | |||||||||||
SD | 9 | 0.999 | 0.077 | 0.006 | 0.848 | 1.151 | 12.924 | 0.000 | 45.523 | 0.000 | 82.426 |
SFD | 16 | 0.997 | 0.078 | 0.006 | 0.844 | 1.149 | 12.807 | 0.000 | 34.304 | 0.003 | 56.273 |
Total within | 79.827 | 0.000 | |||||||||
Total between | 0.001 | 0.982 | |||||||||
Over all | 25 | 0.998 | 0.055 | 0.003 | 0.890 | 1.105 | 18.195 | 0.000 | 79.827 | 0.000 | 69.935 |
Subject | |||||||||||
Accountancy | 1 | 0.308 | 0.765 | 0.585 | 1.191 | 1.807 | 0.402 | 0.688 | 0.000 | 1.000 | 0.000 |
Biology | 2 | 0.755 | 0.193 | 0.037 | 0.376 | 1.133 | 3.910 | 0.000 | 0.302 | 0.583 | 0.000 |
Chemistry | 2 | 0.690 | 0.153 | 0.023 | 0.390 | 0.989 | 4.514 | 0.000 | 6.358 | 0.012 | 84.273 |
Geography | 2 | 1.540 | 0.232 | 0.054 | 1.085 | 1.994 | 6.642 | 0.000 | 3.451 | 0.063 | 71.023 |
Law | 1 | 0.762 | 0.352 | 0.124 | 0.072 | 1.452 | 2.165 | 0.030 | 0.000 | 1.000 | 0.000 |
Mathematics | 7 | 0.991 | 0.090 | 0.008 | 0.815 | 1.167 | 11.042 | 0.000 | 36.290 | 0.000 | 83.466 |
Nursing | 2 | 0.609 | 0.217 | 0.047 | 0.184 | 1.034 | 2.806 | 0.005 | 1.081 | 0.298 | 7.496 |
Physics | 1 | 1.469 | 0.323 | 0.104 | 0.836 | 2.102 | 4.547 | 0.000 | 0.000 | 1.000 | 0.000 |
Science | 7 | 1.216 | 0.112 | 0.013 | 0.996 | 1.436 | 10.837 | 0.000 | 10.833 | 0.094 | 44.615 |
Total within | 58.315 | 0.000 | |||||||||
Total between | 21.512 | 0.006 | |||||||||
Over all | 25 | 0.998 | 0.055 | 0.003 | 0.890 | 1.105 | 18.195 | 0.000 | 79.827 | 0.000 | 69.935 |
N: number, ES: effect size, SE: standard error, V: variance, LL: low limit, UL: up limit, PBL: problem-based learning, PBL+: bland problem learning, SD: standard, SFD: self-developed.
The results of the heterogeneity analyses for learning stage (Q = 6.474, p = .091), teaching method (Q = 5.520, p = .019), sample size (Q = 2.404, p = .301), measurement tool (Q = 0.001, p = .982), and subject (Q = 21.512, p = .006) are shown in Table 6. The p-values for learning stage, sample size, and measurement tool are greater than the alpha level of .05, indicating that the between-group heterogeneity of these three moderators is not significant, while the p-values for teaching method and subject are less than .05, indicating significant heterogeneity for these two moderators. The characteristics of each group are analyzed next, especially those of the groups with significant heterogeneity.
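The Q and I² statistics reported in Table 6 can be computed from per-study effect sizes and standard errors. The following is a minimal sketch under a fixed-effect model; the input values and the `heterogeneity` helper are illustrative, not the study's actual data or code.

```python
# Illustrative sketch (not the authors' code): fixed-effect pooling,
# Cochran's Q, and the I^2 statistic from per-study effect sizes (es)
# and standard errors (se), as in a heterogeneity analysis like Table 6.
from scipy.stats import chi2

def heterogeneity(es, se):
    """Return (pooled ES, Q, p-value of Q, I^2 in percent)."""
    w = [1 / s**2 for s in se]                     # inverse-variance weights
    pooled = sum(wi * e for wi, e in zip(w, es)) / sum(w)
    q = sum(wi * (e - pooled)**2 for wi, e in zip(w, es))  # Cochran's Q
    df = len(es) - 1
    p = chi2.sf(q, df)                             # Q ~ chi-square under H0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, p, i2

# Hypothetical mini-subgroup of three studies
pooled, q, p, i2 = heterogeneity([1.2, 0.9, 1.0], [0.11, 0.20, 0.09])
print(round(pooled, 3), round(q, 2), round(i2, 1))
```

An I² near 0% means the studies' variation is no more than sampling error would predict; values above roughly 50%, as in several subgroups of Table 6, indicate substantial heterogeneity.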
Analysis of Moderating Variables
In terms of learning stage, the effect of using PBL to improve students’ critical thinking is consistent across all stages of learning, from primary school to university. The specific effect sizes are: elementary (ES = 1.240, p < .05), junior high school (ES = 0.891, p < .05), senior high school (ES = 0.895, p < .05), and university (ES = 0.975, p < .05). Thus, PBL demonstrates consistent pedagogical efficacy across all learning stages, with nonsignificant cross-stage variation in robustly high effect sizes, substantiating its broad applicability as an evidence-based strategy for systematic critical thinking development throughout multilevel education systems.
Regarding sample size, there was no significant difference in the effect of PBL across sample sizes: fewer than 30 (ES = 1.516, p < .05), between 30 and 60 (ES = 0.933, p < .05), and more than 60 (ES = 1.003, p < .05). PBL thus exhibited sample-size-independent effects on critical thinking enhancement, with robust effect sizes maintained across both small-scale and large-scale intervention studies, demonstrating its scalability as a reliable pedagogical intervention for critical thinking cultivation in varied educational contexts.
Concerning the measurement tool, PBL improved students’ critical thinking regardless of whether a standardized measurement tool (ES = 0.999, p < .05) or a self-developed measurement tool (ES = 0.997, p < .05) was used. This empirically substantiates that PBL demonstrates measurement-invariant robustness in enhancing critical thinking, with converging high-level effect sizes indicating an educational value that transcends any specific evaluation framework.
In terms of teaching methods, both single PBL (ES = 1.076, p < .05) and blended PBL+ (ES = 0.786, p < .05) had significant effects on improving students’ critical thinking. However, there was a significant difference between them: the effect of single PBL (ES = 1.076) exceeds 0.8 and belongs to the high-level effect-size range, whereas the effect of blended PBL+ (ES = 0.786) falls below 0.8 and belongs to the medium level, indicating that single PBL improves students’ critical thinking more markedly.
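The significance of such a subgroup difference is judged by the between-group Q statistic, which measures how far each subgroup's pooled effect diverges from the weighted grand mean. A sketch under a fixed-effect model, using the pooled effects for PBL and PBL+ from Table 6 with the PBL+ standard error approximated from its confidence interval (the `q_between` helper is illustrative, not the authors' code):

```python
# Illustrative fixed-effect test of between-subgroup heterogeneity
# (Q_between), comparing subgroup pooled effects such as PBL vs PBL+.
from scipy.stats import chi2

def q_between(subgroups):
    """subgroups: list of (pooled_es, se) pairs, one per subgroup.
    Returns (Q_between, degrees of freedom, p-value)."""
    w = [1 / s**2 for _, s in subgroups]           # inverse-variance weights
    es = [e for e, _ in subgroups]
    grand = sum(wi * e for wi, e in zip(w, es)) / sum(w)
    q = sum(wi * (e - grand)**2 for wi, e in zip(w, es))
    df = len(subgroups) - 1
    return q, df, chi2.sf(q, df)

# PBL vs PBL+ pooled effects from Table 6; SE for PBL+ approximated
# from its 95% CI width as (0.993 - 0.579) / (2 * 1.96) ~= 0.106.
q, df, p = q_between([(1.076, 0.064), (0.786, 0.106)])
print(q, df, p)
```

With these inputs the computed Q lands close to the 5.520 reported in Table 6, and p < .05, matching the paper's conclusion that the teaching-method subgroups differ significantly.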
With respect to subject, there are significant differences in the effect of PBL on critical thinking across subjects. Geography (ES = 1.540, p < .05), mathematics (ES = 0.991, p < .05), physics (ES = 1.469, p < .05), and science (ES = 1.216, p < .05) have effect sizes at the high level, indicating a very strong effect. For biology (ES = 0.755, p < .05), chemistry (ES = 0.690, p < .05), law (ES = 0.762, p < .05), and nursing (ES = 0.609, p < .05), the effect sizes are at a moderate level and significant. Accountancy (ES = 0.308, p = .688), however, is non-significant (p > .05), indicating no significant effect of PBL on critical thinking in that subject. In conclusion, PBL demonstrates its highest effect sizes in geography, mathematics, physics, and science, with statistically significant outcomes; its cross-disciplinary applicability in cultivating critical thinking is evident, achieving optimal efficacy in those discipline-specific contexts.
In summary, across the 25 studies applying PBL to improve students’ critical thinking, the subgroup heterogeneity analysis of learning stage, teaching method, sample size, measuring tool, and subject showed that the heterogeneity of learning stage, sample size, and measuring tool was not significant, while that of teaching method and subject was significant. The results indicate that single PBL enhances students’ critical thinking more effectively than blended PBL, and that PBL-related methods significantly improve the critical thinking of students in four subjects in particular: geography, mathematics, physics, and science.
Findings
The Effect Size and Significance of PBL Compared to the Traditional Method
Following a comprehensive and systematic meta-analysis of 25 articles, the results indicated that the overall effect size of these articles was at a high level (ES = 1.081, p < .05). This suggests that, compared with the conventional approach, PBL significantly enhances students’ critical thinking, which is consistent with the findings of numerous researchers. Numerous studies have demonstrated that PBL is an effective strategy for enhancing critical thinking (Hussin et al., 2019). Oderinu et al. (2020) conducted an educational experiment on cariology, comparing the effects of PBL and traditional teaching methods on improving students’ critical thinking. The study found a significant difference (p < .05) between PBL and traditional teaching methods in improving students’ critical thinking, indicating that PBL was effective in enhancing this skill. Amin et al. (2020) employed the PBL approach to enhance students’ critical thinking in an experiment within a social science education program. After one semester of the experiment, the conclusions reached showed that there was a significant effect of PBL on students’ critical thinking compared to the conventional method (p < .05).
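For context, each per-study effect size pooled in such a meta-analysis is typically a standardized mean difference between the PBL group and the control group. A sketch of the usual computation, with hypothetical post-test scores and Hedges' small-sample correction assumed (the paper does not state its exact effect-size formula):

```python
# Illustrative computation of a standardized mean difference (Hedges' g)
# from experimental vs control group post-test statistics. The input
# numbers below are hypothetical, not taken from any included study.
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d with Hedges' small-sample correction."""
    # Pooled standard deviation of the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp                      # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)         # correction factor J
    return j * d

# Hypothetical PBL class vs traditional-method class
g = hedges_g(82.4, 8.1, 35, 74.9, 8.5, 35)
print(round(g, 3))
```

By the convention used in this paper, a value of g above 0.8 would count as a high-level effect, 0.5 to 0.8 as medium, and below 0.5 as small.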
Masruro et al. (2021), through a systematic review of 43 articles, highlighted that PBL enhances critical thinking via the following mechanisms: transforming students from passive learners into autonomous problem-solvers; converting monotonous learning environments into structured problem-solving scenarios; shifting teacher-centered instruction to student-centered classrooms; and fostering collaborative deep thinking (e.g., mutual motivation and brainstorming) over individual cognition. This process activates students’ integration of critical thinking skills—conceptualization, analysis, application, synthesis, and evaluation—thereby advancing higher-order cognitive development. Razak et al. (2022), based on an analysis of 20 studies spanning 2016–2021, demonstrated that PBL’s effectiveness in enhancing critical thinking stems from its use of authentic problems (enabling real-world problem-solving), peer collaboration, and teacher guidance. These elements strengthen students’ learning accountability and independence, positioning them as proactive learners, reflective practitioners, and critical thinkers. Monalisa et al. (2019) found that, compared to traditional methods, PBL improves critical thinking by elevating students’ learning initiative, intrinsic motivation, interpersonal regulation, and skill development. This approach equips learners to identify and resolve practical problems while reinforcing self-reflection and evaluative capacities during problem-solving. Kusuma et al. (2018) emphasized that PBL elevates critical thinking by framing real-world problems as platforms for cognitive growth. Students refine their analytical, inferential, and evaluative skills through problem-solving, supported by perseverance, responsibility, and cooperative engagement, collectively driving efficient skill advancement.
As a teaching methodology, PBL actively guides students to identify problems, motivates them to explore and interpret those problems independently, encourages collaboration with peers to find solutions, and provides timely supervision and feedback. It also pushes students to refine their methods to solve problems. This process positively contributes to the development of students' critical thinking (Nadeak & Naibaho, 2020).
The Influence of Learning Stages on the Effectiveness of PBL
This study also found that PBL had a significant effect on improving critical thinking for students at all learning stages, with a high-level effect size. This indicates that PBL can be applied to a wide range of student audiences to enhance critical thinking at every stage of learning, from primary school to university.
This conclusion was supported by other studies. For instance, Zuryanty et al. (2019) employed PBL as a teaching method in an elementary science course. After one semester of study, preliminary evidence suggested that PBL demonstrated potential efficacy in fostering the development of critical thinking among elementary school students. Febrianto et al. (2021) combined PBL with STEM to improve junior high school students’ critical thinking, and the results showed that this approach facilitated their development in virtual science learning. In a study on training high school students in writing, Hidayat et al. (2019) concluded that PBL was effective in stimulating critical thinking in writing among high school students. Saepuloh et al. (2021) utilized PBL to enhance students’ critical thinking in a higher-order thinking study, and the results indicated that PBL had a significant effect on the critical thinking skills of university-level business school students. Therefore, PBL can be widely applied to effectively improve the critical thinking skills of elementary, middle, high school, and university students.
The Influence of Samples and Measurement Tools on the Effectiveness of PBL
Another finding of this study is that the size of the experimental sample, as well as the type of measurement tool used in the experiment, has no significant effect on the effectiveness of PBL in improving critical thinking. Whether the sample selected in the experiment is less than 30, between 30 and 60, or greater than 60, PBL significantly improves students’ critical thinking. Additionally, whether the measurement tool used in the experiment is a standard measurement tool (ES = 0.999, p < .05) or a self-developed measurement tool (ES = 0.997, p < .05), PBL significantly improves students’ critical thinking. In a review study of PBL, Haniko et al. (2023) concluded that PBL enhances students’ critical thinking mainly because the problems presented by PBL stimulate students to engage in a complete and systematic process of investigation, exploration, and problem-solving. During this process, students independently construct knowledge and form concepts, thereby promoting the development of critical thinking.
Although the enhancement of students’ critical thinking by PBL is influenced by various factors, the sample size is not a key factor affecting the effectiveness of PBL. Fadilla et al. (2021), in their review study on PBL, pointed out that the factors affecting the effectiveness of PBL include students’ interest, the opportunity for students to present their own views, the use of various teaching methods, and the motivation of both teachers and students. Thus, the sample size does not impact the effectiveness of PBL. Regarding different measurement tools, since both standard measurement tools and self-developed tools have been tested for reliability and validity, and tools that fail these tests will not be used to assess the level of students’ critical thinking, any measurement tool employed can accurately detect the true level of students’ critical thinking. Therefore, the sample size in the experimental study and the use of different measurement tools with established reliability and validity will not interfere with the effect of PBL on students’ critical thinking.
Effectiveness of Single PBL Versus Blended PBL
This study also found a significant difference between the effect of applying PBL alone as an instructional method and that of a mixed method combining other methods with PBL in improving students’ critical thinking. Specifically, PBL alone is more effective in enhancing students’ critical thinking (ES = 1.076, p < .05), with a high-level effect. In contrast, the effect of blended PBL+ (ES = 0.786, p < .05), while still significant, was smaller and at the medium level. This result was corroborated by other researchers. Setiawan and Islami (2020) used single PBL to guide students’ mental activities in problem-solving within a physics course. After three rounds of study, the results showed that students developed all critical-thinking indicators, particularly the self-correction and analysis indicators. In a study of critical thinking in mathematics, Susilo et al. (2020) noted that the application of PBL led to an intermediate level of development in students’ critical thinking. Masruro et al. (2021), in a literature review, stated that the use of the PBL model can increase students’ critical thinking by 50%. Therefore, PBL is effective in improving students’ critical thinking.
Studies highlight blended PBL’s effectiveness in enhancing critical thinking. Rahmawati et al. (2021) found e-module-based blended PBL improves physics students’ critical thinking compared to traditional methods by fostering group learning, problem-solving autonomy, and knowledge application, though no direct comparison with single PBL was made. Similarly, Rahmat et al. (2020) noted blended PBL’s superiority over traditional approaches but lacked comparison to single PBL. Rahmadita et al. (2021) reported technology-assisted PBL moderately boosted critical thinking, while Febrianto et al. (2021) showed Schoology-assisted PBL achieved 82.5% task completion in high schoolers, with structured support and management enhancing skill development. Lukitasari et al. (2019) observed blended PBL strengthened college students’ critical thinking by boosting motivation, peer communication, and active learning engagement.
This was mainly because blended PBL enhances students’ learning motivation, encourages active engagement in learning, strengthens communication among students, and promotes their participation in learning. Thus, blended PBL is conducive to the development of students’ critical thinking.
Yennita and Zukmadini (2021) combined PBL with online learning to improve students’ critical thinking in a biochemistry course. The study concluded that this hybrid method could enhance students’ critical thinking after two rounds of application, increasing their criticality by 64.43%. This improvement was attributed to hybrid PBL encouraging students to search for information in online libraries and engage in more active communication and information exchange. As a result, students’ knowledge expands, their motivation to participate in activities increases, and their frequency of participation rises. Additionally, it increases opportunities for communication between students and teachers. Therefore, this hybrid PBL positively impacts students’ motivation, learning, and communication, effectively promoting the development of students’ critical thinking.
In conclusion, whether it is single or blended PBL, both can improve students’ critical thinking, with PBL playing a core role. The reason for using blended PBL is to adapt to the requirements of certain subjects or the characteristics of students. Thus, the choice of method should be based on the actual situation.
Influence of Subject on the Effectiveness of PBL
PBL significantly improves critical thinking in four subjects: geography, mathematics, physics, and science. It also has a notable effect on the development of critical thinking in biology, chemistry, law, and nursing. This finding aligns with other studies. Miterianifa et al. (2019), in a meta-analysis, highlighted that PBL has wide disciplinary adaptability, particularly in physics, biology, and chemistry: because PBL not only analyzes but also solves problems, teaches both concepts and methods, and assimilates theories while accumulating experience, it can enhance students’ critical thinking across these subjects.
Santyasa et al. (2019) utilized PBL to improve high school students’ critical thinking in physics, and the results demonstrated that PBL significantly enhanced students’ critical thinking. The experimental study by Fitriani et al. (2020) confirmed that PBL was effective in improving students’ critical thinking in biology, and that this improvement positively influenced their performance in the biology course. Dakabesi and Luoise (2019a) enhanced students’ critical thinking using PBL in chemistry courses, achieving significant results. In their study, Susilo et al. (2020) found that PBL can be effectively applied to improve students’ critical thinking in mathematical problem-solving.
This provides ample evidence that PBL can be employed in a variety of subjects to foster the development of critical thinking. In fact, its high adaptability makes it a reliable choice. With the advancement of technology and the evolution of educational paradigms, future research on PBL to promote critical thinking development may focus on three key aspects: systematically tracking the long-term sustained impact of PBL on students' critical thinking; conducting empirical research to reveal the effects of PBL on critical thinking development across gender dimensions, learning motivation levels, and cross-cultural contexts; and advancing the integration mechanisms between artificial intelligence technologies and PBL frameworks to enhance critical thinking cultivation. Research in these directions will significantly strengthen the efficacy of PBL in fostering critical thinking development.
Conclusion
This paper is dedicated to using meta-analysis methods to summarize the overall effect size of problem-based learning on the development of critical thinking and to analyze the moderating variables affecting PBL. This article makes a multifaceted and important contribution to the literature in the field of PBL and adds to the body of knowledge in the field. First, it provides robust empirical evidence demonstrating PBL's superior efficacy over traditional instructional methods, thereby reinforcing its practical value in educational practice. Second, the research elucidates the intrinsic mechanisms through which PBL fosters critical thinking: stimulating interest in real-world problem exploration, promoting multi-strategy integration, encouraging proactive solution-seeking, and guiding reflective self-evaluation. These findings offer novel perspectives for understanding PBL's pedagogical processes. Third, regarding moderating variables, the study confirms that both pure PBL and blended PBL effectively enhance critical thinking, with cross-disciplinary applicability unaffected by learning stages, sample sizes, or assessment tools, thus expanding traditional understanding of PBL's implementation contexts. The innovative construction of a moderating variable framework incorporating instructional approaches and subject disciplines not only addresses methodological gaps in PBL meta-analysis research but also exemplifies the precise application of meta-analytic techniques in evaluating educational interventions, advancing methodological progress in educational research.
Recommendations
When implementing PBL, educators should establish problem scenarios closely tied to students' lived experiences and construct engaging learning environments to stimulate inquiry motivation. By allocating sufficient time for analysis and problem-solving to reinforce student agency, teachers should cultivate responsibility and collaborative skills during task execution. They should guide students to develop reflective thinking and independent judgment through structured discussions, information synthesis, and evaluation processes, while optimizing the learning cycle via intelligent monitoring, timely feedback, and equitable assessment. Teachers may flexibly integrate educational technologies and digital platforms based on disciplinary characteristics, student profiles, and personal teaching styles, particularly prioritizing PBL in subjects like geography, mathematics, and physics to enhance critical thinking. Implementation must rigorously adhere to core procedures including problem design, group guidance, outcome presentation, and comprehensive evaluation.
Future research should prioritize investigating the core mechanisms through which PBL enhances critical thinking in high-efficacy disciplines such as geography, mathematics, and physics, extracting transferable strategies applicable to humanities and social sciences. Comparative studies must examine differences in critical thinking outcomes between traditional PBL and technology-enhanced variants through meta-analysis (e.g., AI-powered problem generation, virtual collaboration platforms). Researchers should systematically develop longitudinal assessment tools to map trajectories of cognitive development, addressing PBL's extended cultivation cycle, thereby providing robust methodological support for optimizing critical thinking pedagogy across educational contexts.
Limitations
This study has several limitations that should be noted. First, the meta-analysis included only 25 studies, and several subject subgroups (e.g., accountancy, law, and physics) contained only one or two studies, so those subgroup estimates should be interpreted with caution. Second, the moderating variables examined were limited to learning stage, teaching method, sample size, measurement tool, and subject; other potentially influential factors, such as intervention duration, student gender, and cultural context, were not analyzed. Third, because only published experimental studies were included, publication bias may inflate the pooled effect size. Future meta-analyses should incorporate a larger and more diverse body of studies and examine additional moderators to enhance the generalizability of the findings.
Authorship Contribution Statement
Li: Concept and design, data collection, data analysis, statistical analysis, and drafting manuscript. Mustakim: Admin, technical, software maintenance, securing funding, and data proofreading. Muhamad: Critical revision of manuscript, manuscript review, supervision, final approval.
Generative AI Statement
The authors have not used generative AI or AI-supported technologies.