MSc Dissertation

Global trends and cognitive correlates of adolescent reading habits using PISA data (2000–2022).

2025 · 4 months

Overview

This dissertation grew out of a long-standing interest in reading, both as a personal habit and as a cognitive activity. Using large-scale PISA datasets spanning more than two decades, the project examines global trends in adolescent reading behaviour, how those trends differ across socio-economic contexts, and which cognitive and metacognitive outcomes reading engagement may be associated with.

Working with data at this scale meant constantly balancing ambition with restraint: asking meaningful questions while remaining honest about what can and cannot be inferred from observational data.

Why It Matters

I enjoy reading, yet persistent claims suggest that reading is in decline, particularly among younger cohorts. At the same time, the benefits of reading are often asserted loosely, without much clarity about what those benefits actually are or how far they extend.

This project emerged from a desire to examine those claims more carefully. Rather than treating reading as inherently virtuous, I was interested in whether it is meaningfully associated with how people understand, monitor, and regulate their thinking. That interest led naturally into the neuroscience and psychology of reading, especially work on metacognition, comprehension, and cognitive engagement.

Large-scale datasets like PISA force difficult methodological questions: what counts as reading, which cognitive outcomes are measurable at scale, and how much interpretation is justified when working with proxies rather than direct measures. Those tensions became central to the project, particularly in a cultural context often described as shaped by fragmented attention and shallow media consumption.

[Figure: statistical models and methodology]

Data & Methods

The analysis draws on repeated cross-sectional PISA data from over 80 countries, combining behavioural indicators, socio-economic measures, and cognitive and metacognitive variables. Working with such a broad dataset required careful preprocessing, extensive validation, and constant cross-checking against the underlying survey design.
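The page does not say exactly how the survey design was handled, so the following is only a minimal sketch of the standard OECD approach, not the dissertation's own code. It assumes the data have already been extracted into NumPy arrays: a final student weight (`W_FSTUWT` in PISA files), 80 replicate weights (`W_FSTURWT1` to `W_FSTURWT80`) used with Fay's balanced repeated replication (k = 0.5), and plausible values for reading (`PV1READ` onwards; ten per student from 2015, five in earlier cycles). The function names are hypothetical.

```python
import numpy as np
from typing import Tuple

# Fay adjustment factor used in PISA's balanced repeated replication (BRR).
FAY_K = 0.5


def weighted_mean(values: np.ndarray, weights: np.ndarray) -> float:
    """Survey-weighted mean of a single variable."""
    return float(np.sum(values * weights) / np.sum(weights))


def brr_mean(values: np.ndarray, final_weight: np.ndarray,
             replicate_weights: np.ndarray) -> Tuple[float, float]:
    """Weighted mean plus its Fay-BRR sampling variance.

    replicate_weights has shape (n_students, G); in PISA G = 80.
    """
    estimate = weighted_mean(values, final_weight)
    g = replicate_weights.shape[1]
    replicates = np.array(
        [weighted_mean(values, replicate_weights[:, r]) for r in range(g)]
    )
    # Squared deviations of replicate estimates, scaled by 1 / (G * (1 - k)^2);
    # with G = 80 and k = 0.5 this is division by 20.
    variance = np.sum((replicates - estimate) ** 2) / (g * (1 - FAY_K) ** 2)
    return estimate, float(variance)


def pv_mean(plausible_values: np.ndarray, final_weight: np.ndarray,
            replicate_weights: np.ndarray) -> Tuple[float, float]:
    """Combine per-plausible-value estimates with Rubin's rules.

    plausible_values has shape (n_students, M), one column per plausible value.
    Returns (estimate, standard error).
    """
    m = plausible_values.shape[1]
    results = [brr_mean(plausible_values[:, j], final_weight, replicate_weights)
               for j in range(m)]
    estimates = np.array([est for est, _ in results])
    sampling_var = float(np.mean([var for _, var in results]))  # average BRR variance
    imputation_var = float(np.var(estimates, ddof=1))           # between-PV variance
    total_var = sampling_var + (1 + 1 / m) * imputation_var
    return float(estimates.mean()), float(np.sqrt(total_var))
```

The same replication logic applies to any statistic, including regression coefficients: re-estimate with each replicate weight and with each plausible value, then combine as above. Treating students as a simple random sample instead would typically understate standard errors.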

Methodologically, this was as much a learning exercise as an analytical one. Translating theoretical concepts from psychology and neuroscience into operationalisable variables involved repeated iteration and close engagement with the literature.

Key Findings

The results point to a sustained global decline in voluntary reading, with increasingly pronounced differences along socio-economic lines. Adolescents from more advantaged backgrounds tend to maintain higher levels of reading engagement, while engagement declines more sharply among their less advantaged peers.
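One simple way to quantify a socio-economic gap like this is the difference in the weighted share of students who read for enjoyment between the top and bottom quartiles of PISA's index of economic, social and cultural status (ESCS). The sketch below is hypothetical: the 0/1 `reads_for_enjoyment` recode is an assumed stand-in for whichever questionnaire item was actually used, and the quartile cut points are unweighted for brevity.

```python
import numpy as np


def weighted_share(flag: np.ndarray, weights: np.ndarray) -> float:
    """Weighted proportion of students with flag == 1."""
    return float(np.sum(flag * weights) / np.sum(weights))


def escs_gap(reads_for_enjoyment: np.ndarray, escs: np.ndarray,
             weights: np.ndarray) -> float:
    """Top-minus-bottom ESCS quartile gap in the weighted share of readers.

    reads_for_enjoyment: hypothetical 0/1 recode of a questionnaire item.
    escs: PISA index of economic, social and cultural status.
    weights: final student weights.
    """
    # Simplification: unweighted quartile cut points.
    q1, q3 = np.quantile(escs, [0.25, 0.75])
    bottom = escs <= q1
    top = escs >= q3
    return (weighted_share(reads_for_enjoyment[top], weights[top])
            - weighted_share(reads_for_enjoyment[bottom], weights[bottom]))
```

Computing this gap separately for each PISA cycle, and for each country, gives a series in which a widening gap would show up as an increasing value over time.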

Reading engagement is positively associated with metacognitive outcomes, such as self-monitoring and reflective strategies, and with related socio-emotional measures such as empathy. These associations are small in magnitude but statistically reliable across cohorts and regions. Because the data are observational, they are consistent with reading playing a meaningful role in cognitive development but do not establish it.
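"Small in magnitude" is usually judged on standardised coefficients. The sketch below is an assumed, simplified version of such a model, not the dissertation's actual specification: a survey-weighted least-squares regression of a standardised outcome index on standardised reading engagement, with ESCS as a control. Variable names in the usage note are hypothetical.

```python
import numpy as np


def standardise(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Weighted z-score, so coefficients read in standard-deviation units."""
    mean = np.sum(w * x) / np.sum(w)
    sd = np.sqrt(np.sum(w * (x - mean) ** 2) / np.sum(w))
    return (x - mean) / sd


def weighted_ols(y: np.ndarray, X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Weighted least squares with an intercept; returns [intercept, slopes...].

    Rows are rescaled by sqrt(w), so ordinary least squares on the rescaled
    system minimises sum(w * residual**2).
    """
    design = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta
```

In use, something like `weighted_ols(standardise(metacog_index, w), np.column_stack([standardise(reading_engagement, w), escs]), w)` would return the standardised association in position 1. Point estimates only: standard errors would still need the replicate-weight procedure, and a positive coefficient remains an association rather than a causal effect.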

[Figure: regression results]

Challenges

This was a demanding project from the outset. Identifying the right data was only the first step; cleaning, restructuring, and validating it proved far more time-consuming than anticipated. Building custom indices for cognitive and metacognitive outcomes was the most difficult component, requiring sustained engagement with the neuroscience and psychology literature to ensure conceptual coherence.
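The exact construction of the custom indices is not described here. A common, simple approach, shown below as a hypothetical sketch, is to reverse-code negatively worded Likert items, average the item z-scores, and check internal consistency with Cronbach's alpha. PISA's own questionnaire indices are IRT-scaled, so this is a simpler stand-in, and the item matrix and scale bounds are assumptions.

```python
import numpy as np


def reverse_items(items: np.ndarray, columns, scale_min: float,
                  scale_max: float) -> np.ndarray:
    """Flip negatively worded items so higher always means more of the construct."""
    out = np.asarray(items, dtype=float).copy()
    for j in columns:
        out[:, j] = scale_min + scale_max - out[:, j]
    return out


def build_index(items: np.ndarray) -> np.ndarray:
    """Average of item z-scores; NaN responses are skipped per student."""
    z = (items - np.nanmean(items, axis=0)) / np.nanstd(items, axis=0)
    return np.nanmean(z, axis=1)


def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency on complete cases: k/(k-1) * (1 - sum of item variances / total variance)."""
    complete = items[~np.isnan(items).any(axis=1)]
    k = complete.shape[1]
    item_var = complete.var(axis=0, ddof=1).sum()
    total_var = complete.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_var / total_var))
```

A low alpha does not by itself invalidate an index, but it is a useful prompt to revisit whether the chosen items measure one coherent construct.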

In total, the analysis relied on roughly twenty modular Python scripts, each designed for specific stages of loading, cleaning, transforming, and analysing the data. Much of the work involved debugging edge cases, revisiting assumptions, and learning when a model was answering the wrong question, even if it appeared statistically sound.
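The scripts themselves are not shown on this page. The sketch below only illustrates the general idea of modular stages applied in order, with a row count logged after each one so that unexpected data loss is easy to spot. The missing-value codes and column names are hypothetical; real PISA non-response codes vary by variable and cycle and should be taken from the codebook.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]
Stage = Callable[[List[Row]], List[Row]]

# Hypothetical sentinel codes for non-response; check the codebook per variable.
MISSING_CODES = {97, 98, 99}


def recode_missing(rows: List[Row]) -> List[Row]:
    """Replace sentinel non-response codes with None."""
    return [{k: (None if v in MISSING_CODES else v) for k, v in r.items()}
            for r in rows]


def require(*columns: str) -> Stage:
    """Build a stage that drops rows missing any of the given columns."""
    def stage(rows: List[Row]) -> List[Row]:
        return [r for r in rows if all(r.get(c) is not None for c in columns)]
    stage.__name__ = f"require({', '.join(columns)})"
    return stage


def run_pipeline(rows: List[Row],
                 stages: List[Stage]) -> Tuple[List[Row], List[Tuple[str, int]]]:
    """Apply stages in order, recording the row count after each for auditing."""
    log = []
    for stage in stages:
        rows = stage(rows)
        log.append((stage.__name__, len(rows)))
    return rows, log
```

Keeping each stage a small pure function makes it possible to test edge cases in isolation and to rerun only the part of the pipeline affected by a changed assumption.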

[Table: summary statistics]

Learning

The project clarified how much care is required when working with large observational datasets. Small modelling choices, variable definitions, and assumptions about what survey items represent can materially change the story a model appears to tell, even when results look statistically robust.

The main value of the work was therefore methodological. It involved learning how to test ideas cautiously, document decisions clearly, and recognise when a model answers a different question than the one originally posed.

Built with

  • Excel
  • Python
  • Overleaf