Dataset of an Inferred Bayesian Model of Word Learning

Published: 10-07-2020 | Version 2 | DOI: 10.17632/3jkv7hggbb.2
Hannah Marlatte


Theories of word learning differ in how they weigh two factors: repeated experience with a novel item, which leads to internalization of statistical regularities over time, and the learner's use of prior knowledge to make in-the-moment inferences. Bayesian theories suggest both are critical, but which is weighed more heavily depends on how ambiguous the situation is. To examine this interplay and how it relates to memory, we adapted a Bayesian model of learning (Tenenbaum, Kemp, Griffiths, & Goodman, 2011; Xu & Tenenbaum, 2007) to an inferential word learning task with novel animals, as outlined in the following article: “Bayesians learn best: an inferred Bayesian model accounts for individual differences in prior knowledge use during word learning.” Briefly, the model used (i) contextual information provided in the task, quantified by collecting norms for how informative each trial was (likelihood), and (ii) participants’ trial selection accuracy (posterior distribution) to (iii) infer their prior distribution, a proxy for their belief before exposure to the contextual information. Trial accuracy data for the word learning task were collected on one day, and free recall and recognition memory tests for the learned animal names were completed the next day. Norms for how informative each trial was in guiding correct selection were collected in a single session with a separate group of participants. Primary data include the trial informativeness norms and trial accuracy in the task, both of which were used as input for the Bayesian model. The model infers prior distribution shape parameters from task accuracy and trial norms. This inference was completed using the Excel add-in Solver and is also included in the Primary dataset. Model outputs were used to mathematically derive measures of central tendency and spread for participants’ inferred prior distributions, which are included in the Secondary dataset. These values, along with average block accuracy, were regressed for each participant to examine change across the task.
Outputs from these regressions (slope, intercept, and error terms) were used in the statistical analyses with memory measures, which can be found in the Secondary dataset.
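
For readers who want to reproduce the derived measures outside Excel, the following is a minimal sketch, not the procedure used to build this dataset. It assumes the inferred prior is a Beta distribution with two shape parameters (the description only states that shape parameters were inferred), takes mean, mode, and variance as the measures of central tendency and spread, and assumes an ordinary least-squares regression of a per-block value on block number. The function names and the example shape parameters are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def beta_summary(a, b):
    """Mean, mode, and variance of a Beta(a, b) distribution.

    Assumption: the inferred prior is a Beta distribution; the dataset
    description only says that shape parameters were inferred.
    """
    if a <= 0 or b <= 0:
        raise ValueError("shape parameters must be positive")
    mean = a / (a + b)
    # The mode has this closed form only when both shape parameters exceed 1.
    mode = (a - 1) / (a + b - 2) if a > 1 and b > 1 else None
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return {"mean": mean, "mode": mode, "variance": variance}


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, residual standard error).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_se = sqrt(sse / (n - 2))
    return slope, intercept, residual_se


# Hypothetical shape parameters for one participant across four blocks.
blocks = [1, 2, 3, 4]
shapes = [(2.0, 2.0), (3.0, 2.0), (4.0, 2.0), (6.0, 2.0)]
prior_means = [beta_summary(a, b)["mean"] for a, b in shapes]
slope, intercept, residual_se = fit_line(blocks, prior_means)
```

Under these assumptions, a positive slope would indicate that the mean of the inferred prior increases across blocks for that participant. The same `fit_line` call could be applied to average block accuracy.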