Sudden Queen Loss Event in an Africanized Honeybee Colony

Published: 21 May 2024 | Version 1 | DOI: 10.17632/j97khfj656.1


The dataset consists of 23 features extracted from audio recordings of an Africanized honeybee hive in Fortaleza-CE, Brazil. The first feature is the recording date, and the last is the label indicating the queen's presence status. The label can take two values: "QR" for queenright (queen present) or "QL" for queenless (queen absent). The remaining 21 features are extracted directly from the audio signal and fall into three groups: time-domain features (zero-crossing rate (zcr), energy, and energy entropy), spectral features (centroid, spread, entropy, flux, and rolloff), and 13 MFCC coefficients. For further details on the meaning of each feature, please refer to the pyAudioAnalysis library documentation.

The data were collected from daily recordings over a 6-day period, with the queen bee removed from the hive on the last day. Consequently, the QR and QL classes are unbalanced, with QL representing only 1/6 of the data. This situation is common in this type of monitoring, where the hive is expected to remain within normal well-being parameters most of the time. Anomalies such as sudden queen loss are uncommon and therefore represent a smaller portion of the data. The experiment and the data aim to reproduce these conditions for greater fidelity to the real monitoring problem.

Such problems can be addressed with techniques such as anomaly detection, one-class classification, or incremental learning. Techniques for handling unbalanced data in classification problems, such as data augmentation and resampling, can also be employed. Using a one-class SVM (OC-SVM), we achieved 96% accuracy and 99% precision.
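To put the class imbalance in context, the sketch below computes what a trivial classifier that always answers "QR" would score on labels laid out as described (five QR days and one QL day, 900 samples each). The helper names are hypothetical and the label list is synthetic rather than read from the dataset file. The evaluation protocol behind the reported OC-SVM figures is not stated here, so this baseline is only a rough point of comparison.

```python
# Synthetic label layout taken from the description: 5 queenright days and
# 1 queenless day, 900 one-second samples per day (assumed ordering).
SAMPLES_PER_DAY = 900
y_true = ["QR"] * (5 * SAMPLES_PER_DAY) + ["QL"] * SAMPLES_PER_DAY


def accuracy(truth, predicted):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def precision(truth, predicted, positive):
    """Of the samples predicted as `positive`, the fraction that truly are."""
    flagged = [t for t, p in zip(truth, predicted) if p == positive]
    return flagged.count(positive) / len(flagged) if flagged else float("nan")


def recall(truth, predicted, positive):
    """Of the samples that truly are `positive`, the fraction that was found."""
    found = [p for t, p in zip(truth, predicted) if t == positive]
    return found.count(positive) / len(found) if found else float("nan")


# Trivial baseline: never raise a queen-loss alarm.
y_baseline = ["QR"] * len(y_true)

baseline_accuracy = accuracy(y_true, y_baseline)
baseline_qr_precision = precision(y_true, y_baseline, "QR")
baseline_ql_recall = recall(y_true, y_baseline, "QL")

print(f"accuracy={baseline_accuracy:.3f}, "
      f"QR precision={baseline_qr_precision:.3f}, "
      f"QL recall={baseline_ql_recall:.3f}")
```

Because this baseline reaches about 83% accuracy without detecting a single queenless sample, accuracy alone says little on this dataset. Per-class precision and recall on the QL class are more informative.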


Steps to reproduce

1. With technical assistance from a beekeeper, a microphone was positioned inside the observation hive, close to the hive center. The microphone used was a standard smartphone earphone with a P3 plug output. A basic smartphone was used as the recording device, utilizing its native recording app.
2. Approximately four (4) hours of audio were collected per day, always in the afternoon (1 pm to 5 pm), for six (6) days.
3. During the first five days, the colony showed overall well-being and the presence of the queen was confirmed by the beekeeper. On the sixth day, the beekeeper removed the queen early in the morning (so that the manipulation would not influence the audio), and the recording was conducted in the afternoon.
4. After collecting the raw audio, 300 consecutive 3-second windows were selected from each original audio file, reducing each day's recording from four hours to 900 seconds.
5. Using the pyAudioAnalysis library, we extracted mid-term audio features for each audio segment. The parameters were set as follows: mid-term window = 1.0 s, mid-term step = 1.0 s, short-term window = 50 ms, and short-term step = 25 ms. With this configuration, one second of recording corresponds to exactly one sample in the dataset.
6. Only the first 21 features out of all the features available in the library were used: 3 time-domain features, 5 spectral features, and 13 MFCCs.
7. Additionally, the date of each audio segment and the label of the queen's status (present or absent) were added. In total, the dataset contains 5,400 samples.
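The sample and column counts stated in these steps can be cross-checked with a few lines of arithmetic. The sketch below is not the extraction code. It only recomputes the counts from the stated parameters, assuming that non-overlapping 1.0 s mid-term windows tile each 3-second segment exactly. All constant and function names are illustrative.

```python
from fractions import Fraction

# Parameters stated in the steps above (durations in milliseconds to keep
# the arithmetic in exact integers).
DAYS = 6
SEGMENTS_PER_DAY = 300
SEGMENT_MS = 3000
MID_WINDOW_MS = 1000
MID_STEP_MS = 1000


def full_windows(duration_ms, window_ms, step_ms):
    """Count windows of `window_ms` that fit entirely within `duration_ms`."""
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // step_ms + 1


# One dataset sample per mid-term window.
samples_per_segment = full_windows(SEGMENT_MS, MID_WINDOW_MS, MID_STEP_MS)
samples_per_day = SEGMENTS_PER_DAY * samples_per_segment
total_samples = DAYS * samples_per_day

# Only the last day is queenless, so QL covers one day out of six.
ql_fraction = Fraction(samples_per_day, total_samples)

# Columns: date + 3 time-domain + 5 spectral + 13 MFCCs + label.
audio_features = 3 + 5 + 13
total_columns = 1 + audio_features + 1

print(samples_per_segment, samples_per_day, total_samples)
print(ql_fraction, audio_features, total_columns)
```

Under these assumptions the numbers agree with the description: 900 samples per day, 5,400 samples overall, a QL share of 1/6, and 23 columns in total.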


Universidade Federal do Ceara


Audio Recording, Feature Extraction, Brazil, Audio Signal Analysis, Honey Bee


Coordenação de Aperfeiçoamento de Pessoal de Nível Superior


Conselho Nacional de Desenvolvimento Científico e Tecnológico

311845/2022-3 and 140696/2023-7