Multisensory expectations about dynamic visual objects facilitate early sensory processing of congruent sounds
In everyday life, the perception of a moving object can lead to the expectation of the object’s sound, yet little is known about how visual expectations influence early auditory processing. We examined how dynamic visual input – an object moving continuously across the visual field – influences early auditory processing of a sound that is either congruent with the object’s motion, and thus likely perceived as part of the visual object, or incongruent with the object’s motion. In Experiment 1, EEG activity was recorded from 29 adults who passively viewed a ball that appeared at either the far left or far right boundary of a display and traversed continuously along the horizontal midline until it made contact with, and appeared to bounce off, the opposite boundary. Our main analysis focused on the N1 component of the auditory-evoked event-related potential (ERP). In audio-visual (AV) trials, a knocking sound accompanied the visual input at the moment the ball made contact with the opposite boundary (AV-synchronous), or the sound occurred shortly before contact (AV-asynchronous). We also included audio-only and visual-only trials. In Experiment 1, AV-synchronous sounds elicited an earlier and attenuated auditory N1 response relative to AV-asynchronous or audio-only events. Experiment 2 examined the roles of expectancy and multisensory integration in shaping this early auditory response. In addition to the audio-only, AV-synchronous, and AV-asynchronous conditions, 19 adults were shown a ball that became visually occluded before reaching the boundary of the display but still elicited the expected knocking sound at the moment of the occluded collision (AV-occluded). The auditory N1 response in the AV-occluded condition resembled that of the audio-only condition, indicating that concurrent multisensory input is essential for altered processing of an auditory stimulus under conditions of synchronous AV expectation.
However, additional exploratory analyses of the later P2 component suggest that expectancy influences later auditory processing, pointing to important temporal differences in how multisensory integration and expectation shape audition. Taken together, these findings indicate that dynamic visual stimuli can generate expectations about the timing of auditory events, which in turn facilitate the processing of auditory information that matches these expectations. The attached EEG/ERP data (.set/.fdt files) were processed in MATLAB using EEGLAB/ERPLAB software and can be found in the "Segmented ERP Data" folder. EEG/ERP data were processed using the scripts contained in the "EEG ERP Scripts (MATLAB)" folder, following the specifications outlined below. N1 and P2 peak amplitude and latency statistics, along with other relevant data sets, are contained in the "Data Sets (Long Format)" folder. Lastly, all statistical analyses were conducted in R, and the corresponding scripts can be found in the "R Scripts" folder.
Steps to reproduce
Raw EEG data sets are available upon request, as the raw recordings (~45 minutes each) are extremely large. Movie stimuli were borrowed from the labs of Dr. Dima Amso and Dr. David Lewkowicz and were created using Adobe After Effects. A full description of these stimuli can be found in Werchan et al., 2018 (see link below for reference).

Continuous EEG was recorded via a 128-channel HydroCel Geodesic Sensor Net (Electrical Geodesics, Inc.; EGI). Impedances were kept below 50 kOhms at all electrodes, and the raw EEG was referenced online to the vertex (Cz) and digitized at 500 Hz. EEG data were amplified according to the default settings of an EGI internal amplifier (model: Net Amps 300).

All data were processed off-line using MATLAB (MathWorks, Inc.) and EEGLAB/ERPLAB software (Delorme & Makeig, 2004; Lopez-Calderon & Luck, 2014). The raw EEG data were first digitally filtered using a 0.05–50 Hz bandpass (Butterworth) filter and a 60 Hz notch filter. Data were then manually inspected for individual bad channels present throughout at least 50% of the recording, as well as for electromyographic (EMG) and other movement artifacts; data with evidence of egregious EMG, movement, or muscle artifacts were rejected from the analysis. Data from bad channels were replaced using a spherical spline interpolation algorithm. The cleaned EEG data were then submitted to an independent component analysis (ICA), and components reflecting ocular artifacts (eye blinks and saccades) were removed from the data set. The EEG data were then segmented into 1000 ms epochs (-200 to 800 ms relative to stimulus onset) and baseline corrected using the mean voltage during the 200 ms pre-stimulus period. ERPs were time-locked to the onset of the sound in all conditions except the visual-only condition, for which ERPs were time-locked to the exact moment the ball touched the boundary. Each segmented data set was again manually inspected for excessive artifacts.
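The segmentation and baseline-correction step described above was performed with the released MATLAB/EEGLAB scripts; for readers working outside MATLAB, the logic can be sketched as follows. This is a minimal Python/NumPy illustration only (the function name and array layout are hypothetical, not part of the released scripts), assuming a 500 Hz sampling rate and -200 to 800 ms epochs:

```python
import numpy as np

def segment_and_baseline(continuous, event_samples, sfreq=500.0,
                         tmin=-0.2, tmax=0.8):
    """Cut continuous EEG (n_channels x n_samples) into epochs around each
    event sample and subtract the mean of the pre-stimulus baseline.

    Hypothetical helper mirroring the -200 to 800 ms epoching described
    above; it is not one of the released EEGLAB/ERPLAB scripts.
    """
    pre = int(round(-tmin * sfreq))    # samples before event (100 at 500 Hz)
    post = int(round(tmax * sfreq))    # samples after event (400 at 500 Hz)
    epochs = []
    for ev in event_samples:
        if ev - pre < 0 or ev + post > continuous.shape[1]:
            continue                   # skip events too close to the edges
        epoch = continuous[:, ev - pre:ev + post].astype(float)
        baseline = epoch[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(epoch - baseline)
    return np.stack(epochs)            # (n_epochs, n_channels, n_samples)
```

Because the baseline mean is subtracted per channel and per epoch, any constant offset in the pre-stimulus window is removed before averaging, which is the standard rationale for pre-stimulus baseline correction.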
Once artifact rejection was completed, the EEG data were filtered again, this time using a 30 Hz lowpass (Butterworth) filter, and then re-referenced to an average reference. Average ERPs were then obtained for each participant by averaging all available epochs within each condition. The N1 was operationalized as the minimum peak amplitude and its latency within 100–200 ms after sound onset; the P2 was operationalized as the maximum peak amplitude and its latency within 200–300 ms after sound onset. Both time windows and the region of interest were selected based on our hypotheses about the timing of each ERP component (Stekelenburg & Vroomen, 2007; Vroomen & Stekelenburg, 2010) and on visual inspection of the grand-averaged ERP across all participants and conditions. A six-channel fronto-central auditory region was constructed to evaluate differences in N1 and P2 activity between the sensory conditions.
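As an illustration of these peak definitions, the sketch below extracts peak amplitude and latency from a single averaged waveform. It is a hypothetical Python/NumPy stand-in for the released MATLAB/R analysis code (the function name and the synthetic waveform are invented for illustration); the windows mirror the N1 (100–200 ms, minimum) and P2 (200–300 ms, maximum) definitions above:

```python
import numpy as np

def peak_measures(erp, times, window, polarity):
    """Peak amplitude and latency (ms) of an ERP within a time window.

    polarity='neg' takes the minimum peak (N1-like); 'pos' takes the
    maximum peak (P2-like). Hypothetical helper for illustration only.
    """
    mask = (times >= window[0]) & (times <= window[1])
    seg, seg_times = erp[mask], times[mask]
    idx = np.argmin(seg) if polarity == 'neg' else np.argmax(seg)
    return seg[idx], seg_times[idx]

# Example: a synthetic 500 Hz averaged waveform spanning -200 to 800 ms,
# with an N1-like trough near 150 ms and a P2-like peak near 250 ms.
times = np.arange(-200, 800, 2.0)                   # 2 ms steps at 500 Hz
erp = (-4.0 * np.exp(-((times - 150) / 20) ** 2)
       + 3.0 * np.exp(-((times - 250) / 20) ** 2))
n1_amp, n1_lat = peak_measures(erp, times, (100, 200), 'neg')
p2_amp, p2_lat = peak_measures(erp, times, (200, 300), 'pos')
```

For the synthetic waveform above, the N1-like measure lands near 150 ms with a negative amplitude and the P2-like measure near 250 ms with a positive amplitude, matching how the component windows are intended to bracket each deflection.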