Game-Based Assessment of Statistical Self-Efficacy

An Alternative to the Self-Report of Internal Unobservable Beliefs

G. Curt Fulwider

2026-06-16

Self-Efficacy

Defined as the belief one holds about their ability to manage and execute actions required to achieve specific outcomes, self-efficacy plays a critical role in how people think, behave, and feel [1].

Why self-efficacy matters

  • Self-efficacy is pivotal in understanding motivation and achievement [2].
  • It shapes whether students engage, avoid, persist, or withdraw.
  • It is one of the strongest self-belief predictors of academic achievement [3].

Perceived threat versus perceived opportunity figure

Self-belief bias

Overconfident learner example

Underconfident learner example

  • Self-efficacy beliefs can distort how students interpret their own performance, leading to overconfidence or underconfidence [4].
  • Self-efficacy can change performance itself, so assessment scores are not insulated from self-belief [1], [4].
  • Regularly assessing self-efficacy can help identify and address these biases.

The Problem

Assessing Self-Efficacy

Hard to measure

  • Self-efficacy is internal and not directly observable.
    • Behavioral side effects (e.g., persistence) can be observed, but the beliefs themselves cannot.
  • It is also dynamic and task-specific.
  • Self-report is the only established method.

Self-report is Problematic

  • Self-report introduces both bias and burden [5], [6], [7].
  • Added time and effort required to complete surveys [6], [7].
  • Overtested students and overburdened teachers.

Design Constraints

  • To make this work, the study had to satisfy three design constraints.
  • The game had to elicit relevant behavior.
    • Sufficient difficulty to elicit persistence and risk-taking
    • Opportunities for goal setting and interest expression
  • The assessment had to remain unobtrusive.
    • Interrupting play breaks authentic behavior
    • Unobtrusive measurement allows more frequent measurement opportunities
  • The models had to be feasible to deploy in practice.

The Idea

Bridge Variables

  • Observable behaviors can serve as bridge variables to reach an unobservable construct.
  • Relied on evidence-centered design (ECD) to construct the logic [8].
  • The variables persistence, goal setting, and risk-taking are…
    • Theoretically related to self-efficacy beliefs.
    • Directly observable (by definition) in a learning game context.
Bridge variables for self-efficacy

Bridge Variable Operationalization

Bridge variable   Operationalized definition
Persistence       Higher self-efficacy: more attempts and more time spent on task
Goal setting      Higher self-efficacy: sets higher goals
Risk-taking       Higher self-efficacy: more willing to take risks
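
As a rough illustration of the table above, the sketch below turns event-level telemetry into one row of bridge-variable features per player. The event schema (`player`, `kind`, `seconds`, `goal`, `risky`) and the feature names are assumptions made for this example; they are not the actual Mean Alchemy log format or the study's feature set.

```python
from collections import defaultdict

# Hypothetical telemetry: one dict per event. Field names and values are
# assumptions for illustration, not the game's real logging schema.
events = [
    {"player": "p1", "kind": "attempt", "seconds": 30, "goal": 3, "risky": True},
    {"player": "p1", "kind": "attempt", "seconds": 45, "goal": 4, "risky": False},
    {"player": "p1", "kind": "attempt", "seconds": 20, "goal": 5, "risky": True},
    {"player": "p2", "kind": "attempt", "seconds": 15, "goal": 1, "risky": False},
]


def bridge_features(events):
    """Aggregate attempt events into one feature row per player."""
    by_player = defaultdict(list)
    for event in events:
        if event["kind"] == "attempt":
            by_player[event["player"]].append(event)

    rows = {}
    for player, attempts in by_player.items():
        n = len(attempts)
        rows[player] = {
            # Persistence: more attempts and more time spent on task
            "attempts": n,
            "time_on_task": sum(a["seconds"] for a in attempts),
            # Goal setting: average difficulty of the goals chosen
            "mean_goal": sum(a["goal"] for a in attempts) / n,
            # Risk-taking: share of attempts flagged as risky
            "risk_rate": sum(a["risky"] for a in attempts) / n,
        }
    return rows


features = bridge_features(events)
print(features["p1"])
```

Each feature maps to one row of the table, so a reader can trace any model input back to the theoretical claim that motivates it.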

Educational Data Mining

Educational Data Mining (EDM) offers a complementary response to the limitations of ECD by focusing on the extraction of meaningful patterns from educational data to better understand and improve learning processes [9].

  • In practice, simple models often did not recover the outcome well.
  • The relationship between self-efficacy and its behavioral correlates may be too complex to reduce cleanly to a simple pattern.
    • Consider a 6-dimensional scatterplot.
  • EDM helped by supporting large-scale feature engineering and fast model testing.
  • That made it possible to search for stable patterns across many theoretically derived variables.

The Method

Study context

  • Setting: two-day classroom study in a video game design course at a K–12 research school in the southeastern United States
  • Population: grades 8–12; predominantly male, consistent with the course context
  • Gameplay: about 48 minutes across two class periods, generating full event-level telemetry

Sample stage                          n
Full gameplay telemetry               109
Matched pretest survey + gameplay     102
Self-efficacy posttest available      95
Content posttest available            93
Common analytic modeling subset       86

Mean Alchemy

  • A custom learning game designed to elicit interpretable behavioral evidence of statistical self-efficacy.
  • Targets middle school students (grades 6–8) learning about measures of center and spread.

Cycle of gameplay flow

Bounty board screenshot

Screenshot of the alchemy table

Example figure three

Study Design

  • Design:
    • quantitative,
    • nonexperimental predictive,
    • multiverse-style model search [10]
  • Treated model-building choices as analytic uncertainty rather than committing to one final model.
    • Not asking: “Which single model wins?”
    • Asking instead: “What patterns remain when reasonable analytic choices change?”
  • In practice, that meant varying feature subsets, preprocessing pipelines, model families, and hyperparameters while holding the theoretical target constant.
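
One way to picture a multiverse-style search is as a grid of analytic specifications that all share the same prediction target. The sketch below only enumerates such a grid; every option name in it (features, preprocessing steps, model families, hyperparameter values) is a placeholder assumed for illustration, not the set of choices used in the study.

```python
from itertools import combinations, product

# All option names are illustrative assumptions; the actual feature sets,
# pipelines, model families, and hyperparameters are not listed here.
FEATURES = ["persistence", "goal_setting", "risk_taking", "interest"]
PREPROCESSING = ["none", "standardize"]
MODELS = {
    "l1_logistic": [{"C": 0.1}, {"C": 1.0}],
    "naive_bayes": [{}],
}


def feature_subsets(features, min_size=2):
    """Yield every feature subset containing at least `min_size` features."""
    for size in range(min_size, len(features) + 1):
        yield from combinations(features, size)


def specifications():
    """Yield one dict per analytic specification in the multiverse."""
    for subset, prep in product(feature_subsets(FEATURES), PREPROCESSING):
        for model, grid in MODELS.items():
            for params in grid:
                yield {
                    "features": subset,
                    "prep": prep,
                    "model": model,
                    "params": params,
                }


specs = list(specifications())
print(len(specs), "specifications, all predicting the same target")
```

Fitting and scoring each specification is left out on purpose. The point is that results are read across the whole list rather than from its single best entry.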

The Results

Main findings

  • When predicting pretest self-efficacy, there was a modest but consistent signal across models.
  • When predicting posttest self-efficacy, the signal was weaker and less consistent across models.
  • The strongest recurring features were:
    • goal setting,
    • persistence after failure,
    • and interest.
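
"Recurring" can be made concrete by counting how many top models rank a feature among their most important. The sketch below does that on invented input: the model names echo the families named in this talk, but the feature rankings are toy values for illustration only and are not the study's results.

```python
from collections import Counter

# Toy input, invented for illustration: the top-ranked features of each
# model. These are NOT the rankings reported in the study.
top_features = {
    "l1_logistic": ["goal_setting", "persistence_after_failure", "interest"],
    "naive_bayes": ["goal_setting", "interest", "risk_taking"],
    "logistic": ["persistence_after_failure", "goal_setting", "time_on_task"],
}


def recurring(top_features, min_models=2):
    """Return features ranked highly by at least `min_models` models."""
    counts = Counter(
        feature
        for features in top_features.values()
        for feature in set(features)  # count each model at most once
    )
    kept = [f for f, c in counts.items() if c >= min_models]
    # Most widely shared first; ties broken alphabetically
    return sorted(kept, key=lambda f: (-counts[f], f))


print(recurring(top_features))
```

Raising `min_models` tightens the definition of agreement, which is one simple way to check how robust a "recurring feature" claim is.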

PREtest Model Metric Profile

Pretest model metric profile

POSTtest Model Metric Profile

Posttest model metric profile

Model Metric Heatmaps

Pretest top models heatmap

Pretest models

  • Stronger agreement across model families
  • Average metrics higher than posttest

Posttest top models heatmap

Posttest models

  • Weaker agreement across models for posttest
  • A single model performed well, maybe too well…

Feature Importance from Top Three Models

L1 logistic regression feature importance

Naive Bayes feature importance

Logistic regression feature importance

Final pretest (L1 Logistic Regression) confusion matrix

Confusion matrix for final pretest model

What changed across targets

  • Dynamic or shifting cases were harder to classify.
  • Aggregated participant-level features may be more useful for static constructs.
  • The most informative errors pointed toward movement in self-efficacy over time.
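
The comparison between correctly and incorrectly classified students can be sketched as a confusion matrix plus a per-group summary of self-efficacy change. The records below are invented, and the two-class `high`/`low` labeling and the `change` field (posttest minus pretest) are assumptions for illustration rather than the study's coding.

```python
from collections import Counter, defaultdict

# Toy records, invented for illustration: true class, predicted class, and
# posttest-minus-pretest change in a self-efficacy score.
records = [
    {"true": "high", "pred": "high", "change": 0.0},
    {"true": "high", "pred": "low", "change": -1.5},
    {"true": "low", "pred": "low", "change": 0.5},
    {"true": "low", "pred": "high", "change": 2.0},
    {"true": "low", "pred": "low", "change": -0.5},
]

# Confusion matrix as counts keyed by (true class, predicted class)
confusion = Counter((r["true"], r["pred"]) for r in records)


def mean_abs_change_by_group(records):
    """Mean absolute change for on-diagonal (correct) and off-diagonal cases."""
    groups = defaultdict(list)
    for r in records:
        key = "on_diagonal" if r["true"] == r["pred"] else "off_diagonal"
        groups[key].append(abs(r["change"]))
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


summary = mean_abs_change_by_group(records)
print(dict(confusion))
print(summary)
```

In this toy data the misclassified cases show larger movement, which is the kind of pattern the bullets above describe. Whether it holds in real data is an empirical question.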

On- versus off-diagonal content and interest

On- versus off-diagonal content and interest comparison

Self-efficacy change by classification group

Self-efficacy change by classification group

What It Means

Interpretation

  • Behavioral traces can carry information about self-efficacy beliefs.
  • The value is in recurring agreement across models, not one isolated result.
  • This supports the use of theory-guided behavioral proxies for hard-to-measure constructs.

Methodological implication

  • Participant-level aggregate features appear to compress too much within-session variation.
  • Dynamic constructs likely require finer-grained temporal modeling.
  • Future work should preserve more of the sequence and timing of behavior.
Bridge variables for self-efficacy
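
To see why participant-level aggregates can hide within-session variation, the sketch below compares a single attempt count with counts per time window. The 48-minute session length comes from the study context; the 12-minute window and the two timestamp lists are assumptions chosen only to make the contrast visible.

```python
def windowed_counts(timestamps, session_seconds, window_seconds):
    """Count events in consecutive fixed-width windows of a session."""
    n_windows = -(-session_seconds // window_seconds)  # ceiling division
    counts = [0] * n_windows
    for t in timestamps:
        # Clamp so an event at the very end falls in the last window
        index = min(int(t // window_seconds), n_windows - 1)
        counts[index] += 1
    return counts


SESSION = 48 * 60  # about 48 minutes of gameplay, in seconds
WINDOW = 12 * 60   # hypothetical 12-minute window

# Invented attempt timestamps (seconds) for two hypothetical players
front_loaded = [10, 50, 100, 200, 250, 290]
steady = [100, 600, 1000, 1500, 2000, 2500]

# The aggregate feature is identical for both players...
print(len(front_loaded), len(steady))
# ...but the windowed features separate them.
print(windowed_counts(front_loaded, SESSION, WINDOW))
print(windowed_counts(steady, SESSION, WINDOW))
```

Windowed counts are only one option. Sequence models or event-level features would preserve even more of the timing.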

Limits and Implications

Limits

  • Modest sample
  • One game context
  • Short exposure in an MVP environment
  • Predominantly male sample from a video game design course

Implications

  • Accessibility: game-based assessment could reduce reliance on burdensome testing formats for students whose performance is undermined by test anxiety or other barriers tied to traditional assessment.
  • Differentiation: if self-belief and related behaviors can be assessed continuously, support could be offered sooner and with better targeting rather than waiting for failure on a later test.
  • More complete education: the goal is not only that students leave knowing statistical facts, but that they leave believing they can use those ideas and keep learning beyond the classroom.

Closing

Main takeaway

Behavior in a learning game produced a modest but credible signal for statistical self-efficacy, suggesting a viable path beyond self-report alone.

Questions

Thank you.

References

[1] A. Bandura, Self-efficacy: The exercise of control. W. H. Freeman, 1997.
[2] D. H. Schunk and F. Pajares, “The development of academic self-efficacy,” in Development of achievement motivation, A. Wigfield and J. S. Eccles, Eds., Academic Press, 2002, pp. 15–31. doi: 10.1016/B978-012750053-9/50003-6.
[3] L. Stankov, S. Morony, and Y. P. Lee, “Confidence: The best non-cognitive predictor of academic achievement?” Educational Psychology, vol. 34, no. 1, pp. 9–28, Jan. 2014, doi: 10.1080/01443410.2013.814194.
[4] F. Pajares and D. H. Schunk, “Self-beliefs and school success: Self-efficacy, self-concept, and school achievement,” in Self perception, in International perspectives on individual differences, vol. 2. Westport, CT, US: Ablex Publishing, 2001, pp. 239–265.
[5] C. Kormos and R. Gifford, “The validity of self-report measures of proenvironmental behavior: A meta-analytic review,” Journal of Environmental Psychology, vol. 40, pp. 359–371, Dec. 2014, doi: 10.1016/j.jenvp.2014.09.003.
[6] K. Watson, T. Baranowski, D. Thompson, R. Jago, J. Baranowski, and L. M. Klesges, “Innovative application of a multidimensional item response model in assessing the influence of social desirability on the pseudo-relationship between self-efficacy and behavior,” Health Education Research, vol. 21, pp. i85–i97, Oct. 2006, doi: 10.1093/her/cyl137.
[7] P. Ben-Nun, “Respondent fatigue,” in Encyclopedia of survey research methods, Thousand Oaks, CA: Sage Publications, Inc., 2008, p. 743. Accessed: Sep. 17, 2024. [Online]. Available: https://methods.sagepub.com/reference/encyclopedia-of-survey-research-methods/n480.xml
[8] S. Toulmin, The uses of argument. Cambridge: Cambridge University Press, 1958.
[9] R. S. Baker and P. S. Inventado, “Educational Data Mining and Learning Analytics,” in Learning Analytics, J. A. Larusson and B. White, Eds., New York, NY: Springer New York, 2014, pp. 61–75. doi: 10.1007/978-1-4614-3305-7_4.
[10] S. Steegen, F. Tuerlinckx, A. Gelman, and W. Vanpaemel, “Increasing transparency through a multiverse analysis,” Perspectives on Psychological Science, vol. 11, no. 5, pp. 702–712, 2016, doi: 10.1177/1745691616658637.