How (and How Not) Do Code Complexity Measures Predict Cognitive Load? (ICER 2026 - Research Papers)

Who

Sverrir Thorgeirsson, Jan Vahrenhold

Track

ICER 2026 Research Papers

This program is tentative and subject to change.

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

When

Wed 12 Aug 2026 11:20 - 11:45 at Main conference room - Measures

Abstract

Background and Context: Code complexity measures have been used to guide the design of various activities within computing education, such as instructional sequencing and assessment. However, empirical evidence linking these measures to actual cognitive difficulty remains mixed, and prior studies have suffered from small sample sizes and uncontrolled experimental designs.

Objectives: We sought to investigate how well code complexity measures predict university students' cognitive load when tracing code, and whether their predictive power is moderated by computer science achievement. We also compared these measures against comparative judgments from an 18-member expert panel.

Methods: We conducted a preregistered laboratory study to investigate how strongly the code complexity measures identified in a recent neuroimaging study predict cognitive load. In this controlled study, N=551 university students traced a random selection of 24 expert-curated code snippets in Java, Python, and C++, and then reported their cognitive load on two validated cognitive load instruments. We evaluated the predictive strength of the code complexity measures, and possible moderation, using preregistered hierarchical regression models.

Findings: Contrary to the findings of the previous neuroimaging study, we could not confirm data-flow complexity as the strongest predictor of measured cognitive load; instead, the simple source-lines-of-code measure dominated all other static measures. A recent, more sophisticated measure also fared poorly, whereas expert ratings were strongly predictive of cognitive load.
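As a rough illustration of the source-lines-of-code measure, the sketch below counts non-blank lines that are not single-line comments and orders snippets by that count. The abstract does not state the exact SLOC definition used in the study; the comment-prefix table, the decision to ignore block comments, and the helper names `count_sloc` and `order_by_sloc` are assumptions made for this example.

```python
# Hypothetical single-line comment prefixes; block comments (/* ... */) and
# docstrings are NOT handled in this simplified sketch.
LINE_COMMENT_PREFIXES = {
    "python": ("#",),
    "java": ("//",),
    "cpp": ("//",),
}


def count_sloc(source: str, language: str) -> int:
    """Count non-blank lines that are not single-line comments (assumed SLOC rule)."""
    prefixes = LINE_COMMENT_PREFIXES[language]
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith(prefixes):
            continue  # skip comment-only lines
        count += 1
    return count


def order_by_sloc(snippets):
    """Sort (name, language, source) tuples from fewest to most source lines."""
    return sorted(snippets, key=lambda s: count_sloc(s[2], s[1]))


# Usage: two toy tracing snippets (hypothetical, not from the study's item pool).
snippets = [
    ("loop", "java",
     "int s = 0;\nfor (int i = 0; i < 3; i++) {\n    s += i;\n}\n\nSystem.out.println(s);\n"),
    ("swap", "python",
     "a, b = 1, 2\n# swap them\na, b = b, a\nprint(a, b)\n"),
]
print([name for name, _, _ in order_by_sloc(snippets)])  # ['swap', 'loop']
```

Because `sorted` is stable, snippets with equal line counts keep their original relative order, which may matter when many short snippets tie.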

Implications: In the educational context studied, counting source lines of code is a simple and effective heuristic for ordering tracing tasks by difficulty, and it outperforms more sophisticated measures based on data and control flow. The unexpected finding that rankings from pairwise-comparison sessions with experts, which are easy to obtain, predict cognitive load much more strongly than static metrics opens up avenues for follow-up research.
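Rankings from pairwise comparative judgment are often scaled with a Bradley-Terry model. The abstract does not say how the expert rankings were computed, so the sketch below only illustrates one common approach under that assumption: it fits Bradley-Terry strengths with the standard minorization-maximization (MM) update from a list of hypothetical (harder, easier) judgments. The function name `bradley_terry`, the pair format, and the fixed iteration count are illustrative choices.

```python
from collections import defaultdict


def bradley_terry(judgments, iterations=200):
    """Fit Bradley-Terry strengths from (harder, easier) pairs via the MM update.

    Update rule: p_i <- W_i / sum_j n_ij / (p_i + p_j), then normalize so the
    strengths sum to 1. Assumes every pair contains two distinct items.
    """
    items = sorted({x for pair in judgments for x in pair})
    wins = defaultdict(int)         # W_i: times item i was judged harder
    pair_counts = defaultdict(int)  # n_ij: times items i and j were compared
    for harder, easier in judgments:
        wins[harder] += 1
        pair_counts[frozenset((harder, easier))] += 1

    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        new = {}
        for i in items:
            # Every item appears in at least one comparison, so denom > 0.
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            new[i] = wins[i] / denom
        total = sum(new.values())
        strength = {i: s / total for i, s in new.items()}
    return strength


# Usage: three hypothetical snippets, each pair compared three times.
judgments = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("B", "C"), ("B", "C"), ("C", "B"),
    ("A", "C"), ("A", "C"), ("C", "A"),
]
strength = bradley_terry(judgments)
print(sorted(strength, key=strength.get, reverse=True))  # ['A', 'B', 'C']
```

In a balanced design like this one, where every pair is compared equally often, the fitted order matches the raw win counts; the model becomes more useful when experts see different subsets of pairs.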

Sverrir Thorgeirsson

ETH Zurich

Switzerland

Jan Vahrenhold

University of Münster

Germany