Contributors: Thomas Hueber, Gérard Bailly
... This article investigates the use of statistical mapping techniques for the conversion of articulatory movements into audible speech, with no restriction on the vocabulary, in the context of a silent speech interface driven by ultrasound and video imaging. As a baseline, we first evaluated the GMM-based mapping with dynamic features proposed by Toda et al. (2007) for voice conversion. We then proposed a ‘phonetically informed’ version of this technique, based on full-covariance HMMs. This approach aims (1) to model explicitly the articulatory timing of each phonetic class, and (2) to exploit linguistic knowledge to regularize the problem of silent speech conversion. Both techniques were compared on continuous speech for two French speakers (one male, one female). For modal speech, the HMM-based technique showed lower spectral distortion (objective evaluation). However, perceptual tests (transcription and XAB discrimination tests) showed better intelligibility for the GMM-based technique, probably related to its less fluctuating quality. For silent speech, a perceptual identification test revealed better segmental intelligibility for the HMM-based technique on consonants.
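The baseline approach can be illustrated with a minimal sketch of joint-GMM conditional mapping: fit a Gaussian mixture on concatenated articulatory/acoustic vectors, then convert a new articulatory frame via the conditional expectation E[y | x]. The feature arrays below are synthetic placeholders, and the dynamic-feature (MLPG) step of Toda et al. (2007) is omitted; this is a sketch of the underlying idea, not the authors' exact system.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical paired training frames: articulatory features X, acoustic features Y.
X = rng.normal(size=(500, 4))
Y = X @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(500, 3))

dx = X.shape[1]
Z = np.hstack([X, Y])  # joint feature vectors z = [x; y]
gmm = GaussianMixture(n_components=4, covariance_type='full',
                      random_state=0).fit(Z)

def convert(x):
    """Map one articulatory frame x to acoustic space via the GMM
    conditional expectation E[y | x] (minimum mean-square-error mapping)."""
    w = np.empty(gmm.n_components)
    cond = np.empty((gmm.n_components, Z.shape[1] - dx))
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mux, muy = mu[:dx], mu[dx:]
        Sxx, Sxy = S[:dx, :dx], S[:dx, dx:]
        # posterior responsibility of component k given x alone
        w[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mux, Sxx)
        # per-component conditional mean E[y | x, component k]
        cond[k] = muy + Sxy.T @ np.linalg.solve(Sxx, x - mux)
    w /= w.sum()
    return w @ cond  # responsibility-weighted conditional mean

y_hat = convert(X[0])  # estimated acoustic frame for the first articulatory frame
```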
Contributors: Justin London, Birgitta Burger, Marc Thompson, Petri Toiviainen
... Musical tempo is most strongly associated with the rate of the beat or “tactus,” which may be defined as the most prominent rhythmic periodicity present in the music, typically in a range of 1.67–2 Hz. However, other factors such as rhythmic density, mean rhythmic inter-onset interval, metrical (accentual) structure, and rhythmic complexity can affect perceived tempo (Drake, Gros, & Penel, 1999; London, 2011). Visual information can also give rise to a perceived beat/tempo (Iversen et al., 2015), and auditory and visual temporal cues can interact and mutually influence each other (Soto-Faraco & Kingstone, 2004; Spence, 2015). A five-part experiment was performed to assess the integration of auditory and visual information in judgments of musical tempo. Participants rated the speed of six classic R&B songs on a seven-point scale while observing an animated figure dancing to them. Participants were presented with original and time-stretched (±5%) versions of each song in audio-only, audio+video (A+V), and video-only conditions. In some videos the animations were of spontaneous movements to the different time-stretched versions of each song, and in other videos the animations were of “vigorous” versus “relaxed” interpretations of the same auditory stimulus. Two main results were observed. First, in all conditions with audio, even though participants were able to correctly rank the original vs. time-stretched versions of each song, a song-specific tempo-anchoring effect was observed, such that sped-up versions of slower songs were judged to be faster than slowed-down versions of faster songs, even when their objective beat rates were the same. Second, when viewing a vigorous dancing figure in the A+V condition, participants gave faster tempo ratings than from the audio alone or when viewing the same audio with a relaxed dancing figure.
The implications of this illusory tempo percept for cross-modal sensory integration and working memory are discussed, and an “energistic” account of tempo perception is proposed.
Contributors: Mike Kuznetsov, Andreas Friedrich, Gerold Stern, Natalie Kotchourko, Simon Jallais, Beatrice L'Hostis
... A series of medium-scale experiments on vented hydrogen deflagration was carried out at the KIT test site in a chamber of 1 × 1 × 1 m size, with different vent areas. The experimental program was divided into three series: (1) uniform hydrogen–air mixtures; (2) stratified hydrogen–air mixtures within the enclosure; (3) deflagration of a uniform-mixture layer. Uniform hydrogen–air mixtures from 7 to 18% hydrogen were tested with vent areas varying from 0.01 to 1.0 m². One test was conducted with a rich mixture of 50% H2. To vary the concentration gradient, all experiments with stratified hydrogen–air mixtures had about 4% H2 at the bottom and 10–25% H2 at the top of the enclosure. The measurement system consisted of a set of pressure sensors and thermocouples inside and outside the enclosure. Four cameras combined with a background-oriented schlieren (BOS) system were used for visual observation of the combustion process through transparent sidewalls. Four experiments were selected as benchmarks for comparison with the FM Global tests at four times larger scale (Bauwens et al., 2011) and to provide experimental data for further CFD modelling. The nature of the external explosion leading to the multiple-pressure-peak structure was investigated in detail. The current work addresses knowledge gaps regarding indoor hydrogen accumulations and vented deflagrations. The experiments carried out within this work are intended to contribute data towards improved criteria for hydrogen–air mixture and enclosure parameters to avoid unacceptable explosion overpressure. Based on theoretical analysis and the current experimental data, a vent sizing methodology for hydrogen deflagrations in confined spaces should be further developed, taking into account the peculiarities of hydrogen–air deflagrations in the presence of obstacles, concentration gradients of hydrogen–air mixtures, the dimensions of the flammable layer, vent inertia, etc.
Contributors: Mark A. Bee
... The perceptual analysis of acoustic scenes involves binding together sounds from the same source and separating them from other sounds in the environment. In large social groups, listeners experience increased difficulty performing these tasks due to high noise levels and interference from the concurrent signals of multiple individuals. While a substantial body of literature on these issues pertains to human hearing and speech communication, few studies have investigated how nonhuman animals may be evolutionarily adapted to solve biologically analogous communication problems. Here, I review recent and ongoing work aimed at testing hypotheses about perceptual mechanisms that enable treefrogs in the genus Hyla to communicate vocally in noisy, multi-source social environments. After briefly introducing the genus and the methods used to study hearing in frogs, I outline several functional constraints on communication posed by the acoustic environment of breeding “choruses”. Then, I review studies of sound source perception aimed at uncovering how treefrog listeners may be adapted to cope with these constraints. Specifically, this review covers research on the acoustic cues used in sequential and simultaneous auditory grouping, spatial release from masking, and dip listening. Throughout the paper, I attempt to illustrate how broad-scale, comparative studies of carefully considered animal models may ultimately reveal an evolutionary diversity of underlying mechanisms for solving cocktail-party-like problems in communication.
The Lancet Respiratory Medicine Commission - Respiratory risks from household air pollution in low and middle income countries
Contributors: Stephen B Gordon, Nigel G Bruce, Jonathan Grigg, Patricia L Hibberd, Om P Kurmi, Kin-bong Hubert Lam, Kevin Mortimer, Kwaku Poku Asante, Kalpana Balakrishnan, John Balmes
... A third of the world's population uses solid fuel derived from plant material (biomass) or coal for cooking, heating, or lighting. These fuels are smoky, often used in an open fire or simple stove with incomplete combustion, and result in a large amount of household air pollution when smoke is poorly vented. Air pollution is the biggest environmental cause of death worldwide, with household air pollution accounting for about 3·5–4 million deaths every year. Women and children living in severe poverty have the greatest exposures to household air pollution. In this Commission, we review evidence for the association between household air pollution and respiratory infections, respiratory tract cancers, and chronic lung diseases. Respiratory infections (comprising both upper and lower respiratory tract infections with viruses, bacteria, and mycobacteria) have all been associated with exposure to household air pollution. Respiratory tract cancers, including both nasopharyngeal cancer and lung cancer, are strongly associated with pollution from coal burning and further data are needed about other solid fuels. Chronic lung diseases, including chronic obstructive pulmonary disease and bronchiectasis in women, are associated with solid fuel use for cooking, and the damaging effects of exposure to household air pollution in early life on lung development are yet to be fully described. We also review appropriate ways to measure exposure to household air pollution, as well as study design issues and potential effective interventions to prevent these disease burdens. Measurement of household air pollution needs individual, rather than fixed in place, monitoring because exposure varies by age, gender, location, and household role. Women and children are particularly susceptible to the toxic effects of pollution and are exposed to the highest concentrations. Interventions should target these high-risk groups and be of sufficient quality to make the air clean. 
To make clean energy available to all people is the long-term goal, with an intermediate solution being to make available energy that is clean enough to have a health impact.
Contributors: Marcin Kowalski, Kenneth A. Ellenbogen, Jayanthi N. Koneru
Contributors: Christina A.S. Mumm, Maria C. Urrutia, Mirjam Knörnschild
... Social calls conveying identity yield several advantages in managing social group living. Signalling identity to conspecifics and the perception of the calling individual by receivers allow for appropriate behavioural responses based on experience of previous interactions. Contact calls help maintain group cohesion and often provide individual signatures. Giant otters, endemic to Amazonian rainforests and wetlands, are a highly social and vocally active species. Their family groups consist of a monogamous alpha pair with offspring of different ages, and elder siblings assist in rearing the young. During collective fishing bouts, individuals frequently become separated from their group. Giant otters use two types of cohesion calls. The ‘contact call’ is often uttered when the otters are visually separated, and is then followed by the reunion of group members. The ‘hum’ is produced in close proximity to manage group movements. We predicted giant otters would have individually distinct cohesion calls and be able to discriminate between the cohesion calls of different individuals. We recorded and measured calls from wild and captive individuals and conducted habituation–dishabituation playbacks with two captive groups. Our results provided statistical evidence for a strong individual signature in contact calls but not in hums. Nevertheless, the giant otters were able to distinguish individuals in both cohesion calls tested. We conclude that individual signatures seem to be advantageous in terms of managing group movements. Giant otters might additionally benefit from discriminating individuals within their social group, where kin recognition is insufficient to identify equally related individuals that cooperate in hunting and rearing of the young.
Contributors: Simon W. Townsend, Benjamin D. Charlton, Marta B. Manser
... Formants, the resonance frequencies of the vocal tract, are the key acoustic parameters underlying vowel identity in human speech. However, recent work on nonhuman animal communication systems has shown that formant variation provides potentially important information to receivers about static and dynamic attributes of callers. Meerkats, Suricata suricatta, produce broadband noisy bark vocalizations, lacking a clear fundamental frequency and harmonic structure, when they detect aerial or terrestrial predators. Here we investigated whether formants in meerkat barks have the potential to provide reliable information on caller identity and the predator context (aerial versus terrestrial predator) in which they are delivered. Acoustic analyses of naturally occurring barks and measurements of this species' vocal tract length were used to confirm that the six clear frequency bands below 15 kHz in meerkat barks represent formants. Discriminant function analyses subsequently demonstrated significant interindividual variation in the formant pattern of meerkat barks, suggesting that formants could be used by meerkats to identify conspecifics. In addition, mixed-effects models indicated that the frequency of the first formant was lower in barks produced in aerial versus terrestrial predation contexts. These results add to a growing body of literature on the potential function of formants in nonhuman animal vocal communication systems, and also imply that signalling external and referential information through such resonance frequencies, as in human language, might be more widespread in animals than previously thought.
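The logic of a discriminant function analysis for individual signatures can be sketched as follows: if callers can be classified from their formant patterns at well above chance accuracy, the patterns carry identity information. The formant data below are simulated placeholders (hypothetical callers and frequency values), not the meerkat measurements from the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Simulated formant measurements (Hz): 8 hypothetical callers, 20 barks each,
# 6 formants per bark; each caller has its own characteristic mean pattern.
n_callers, n_barks, n_formants = 8, 20, 6
caller_means = rng.uniform(500, 14000, size=(n_callers, n_formants))
X = (np.repeat(caller_means, n_barks, axis=0)
     + rng.normal(0, 300, size=(n_callers * n_barks, n_formants)))
y = np.repeat(np.arange(n_callers), n_barks)

# Cross-validated classification accuracy; chance level is 1/8 = 12.5%.
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
```

Accuracy far above the 12.5% chance level would indicate a strong individual signature in the simulated formant patterns.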
Contributors: Gene Fellner
... In Broadening our lenses of perception, I address the need to assess students through multiple lenses rather than through the dominant lens of standardized tests. I propose what I call multilectical lenses to provide multidimensional pictures of poor students of color; these highlight student skills and knowledge that tests disregard. Multilectics employs multilevel analysis of sound and images to analyze the gestures and voices of students during classroom activities. In combination with student writing, the data produced by multilectical practice provide a rich foundation for advancing the academic achievement of our most underserved students.
Contributors: Gang Chen, Jody Kreiman, Abeer Alwan
... Laryngeal high-speed videoendoscopy is a state-of-the-art technique to examine physiological vibrational patterns of the vocal folds. With sampling rates of thousands of frames per second, high-speed videoendoscopy produces a large amount of data that is difficult to analyze subjectively. In order to visualize high-speed video in a straightforward and intuitive way, many methods have been proposed to condense the three-dimensional data into a few static images that preserve characteristics of the underlying vocal fold vibratory patterns. In this paper, we propose the “glottaltopogram,” which is based on principal component analysis of changes over time in the brightness of each pixel in consecutive video images. This method reveals the overall synchronization of the vibrational patterns of the vocal folds over the entire laryngeal area. Experimental results showed that this method is effective in visualizing pathological and normal vocal fold vibratory patterns.