Multimodal Computer Vision and Acoustic Bird Detection Dataset for Smart Rice Farming Environments

Name: Multimodal Computer Vision and Acoustic Bird Detection Dataset for Smart Rice Farming Environments
Creator: Samson Otieno Ooko
Published: 2026-06-01T08:00:16.551Z
Keywords: Acoustics, Computer Vision

Ooko, Samson Otieno; Mbonimpa, Pacome Simon; Ndashimye, Emmanuel; Twahirwa, Evariste; Busogi, Moise

doi:10.17632/p6m5tmr477.1

Published: 1 June 2026 | Version 1 | DOI: 10.17632/p6m5tmr477.1

Description

This dataset was developed as part of the Birds’ Detector and Repellent System for Large-Scale Smart Farming project. It supports research in smart agriculture, computer vision, acoustic sensing, TinyML, embedded artificial intelligence (AI), and Internet of Things (IoT)-based crop protection systems. The data were collected in rice farming environments and nearby park ecosystems in Kenya and Rwanda using distributed sensing devices: 64MP Raspberry Pi Arducam OwlSight cameras, AudioMoth recorders, MEMS microphones, and embedded edge computing systems. The dataset is multimodal, combining images and audio recordings captured under real agricultural field conditions.
More than 700 environmental images and motion-based image sequences were captured by cameras deployed across rice farms and their surroundings. The image data are intended primarily for motion-aware bird detection, environmental monitoring, and bird-versus-background classification, not for species-level image annotation. They include environmental motion sequences, vegetation-rich scenes, sky-background imagery, long-range agricultural views, and dynamic field scenes captured under varying illumination and weather conditions.
The acoustic data comprise more than 20,000 audio clips, each approximately 2–3 seconds long, collected over six months using AudioMoth devices and MEMS-based acoustic sensors deployed within rice farming areas. The recordings were processed in Audacity and annotated using the BirdNET platform into six classes: common waxbill, red-billed quelea, village weaver, yellow-fronted canary, other birds, and environmental noise. The environmental noise class covers wind interference, rainfall, insect sounds, human activity, and farm equipment noise recorded under natural field conditions.
Image preprocessing involved resizing, normalization, frame differencing, adaptive thresholding, contour extraction, and bounding box generation. Acoustic preprocessing included segmentation, noise filtering, spectrogram generation, and extraction of Mel-Frequency Cepstral Coefficients (MFCCs) as acoustic features. The dataset supports research in precision agriculture, environmental monitoring, edge AI, multimodal machine learning, acoustic classification, embedded AI systems, and intelligent crop protection technologies for resource-constrained deployments.
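
As an illustration of the motion-oriented image steps listed above (frame differencing, thresholding, and bounding box generation), the following is a minimal sketch in dependency-free Python. It is not the authors' implementation: the function names, the fixed threshold `level=25`, and `min_area` are hypothetical, a fixed global threshold stands in for adaptive thresholding, and 4-connected region grouping stands in for contour extraction. Resizing and normalization are omitted, and frames are assumed to be equally sized grayscale grids given as lists of rows.

```python
from collections import deque


def frame_difference(prev, curr):
    """Absolute per-pixel difference between two equally sized grayscale frames."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]


def threshold(diff, level=25):
    """Binary motion mask. Assumption: a fixed global level, not adaptive thresholding."""
    return [[1 if value > level else 0 for value in row] for row in diff]


def bounding_boxes(mask, min_area=2):
    """Group 4-connected foreground pixels and return (x_min, y_min, x_max, y_max) boxes."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(x, y)])
            seen[y][x] = True
            xs, ys = [], []
            while queue:
                cx, cy = queue.popleft()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Discard regions smaller than min_area pixels as likely noise.
            if len(xs) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def detect_motion(prev, curr, level=25, min_area=2):
    """Frame differencing, thresholding, then one box per moving region."""
    return bounding_boxes(threshold(frame_difference(prev, curr), level), min_area)


# Synthetic example: a bright 3x2 patch appears on a uniform background.
background = [[10] * 8 for _ in range(6)]
frame = [row[:] for row in background]
for y in range(2, 4):
    for x in range(3, 6):
        frame[y][x] = 200
print(detect_motion(background, frame))  # [(3, 2, 5, 3)]
```

On real field imagery, the threshold and minimum area would need tuning against wind-driven vegetation motion and illumination changes, which is presumably why the dataset description names adaptive thresholding.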

Steps to reproduce

1. Deploy the data acquisition system in rice farming environments and nearby ecological park areas, using Raspberry Pi-based embedded systems equipped with 64MP Arducam OwlSight cameras, AudioMoth devices, and MEMS microphones.
2. Position the cameras to capture environmental motion activity, long-range field views, sky-background scenes, and vegetation-rich agricultural environments.
3. Configure the cameras for continuous image acquisition under varying environmental conditions, including bright sunlight, cloudy weather, wind disturbance, and motion-intensive scenes.
4. Collect environmental images and motion-based image sequences from the deployed field systems, and store the captured frames in structured directories for preprocessing and annotation.
5. Deploy AudioMoth and MEMS microphone systems across rice fields and surrounding areas to continuously record environmental audio and bird vocalizations. Collect audio clips approximately 2–3 seconds long under real agricultural field conditions.
6. Preprocess the images using resizing, normalization, frame differencing, adaptive thresholding, contour extraction, and bounding box generation to support motion-aware environmental monitoring and bird-versus-background detection.
7. Preprocess the audio in Audacity, applying noise filtering, segmentation, and audio cleaning.
8. Extract acoustic features, including spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs), for machine learning experiments.
9. Annotate the acoustic recordings using the BirdNET platform into six classes: common waxbill, red-billed quelea, village weaver, yellow-fronted canary, other birds, and environmental noise.
10. Train lightweight convolutional neural network (CNN) models and motion-aware computer vision frameworks on the processed image and acoustic datasets.
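
The MFCC extraction named in the steps above can be sketched for a single audio frame as follows. This is a minimal, dependency-free illustration under stated assumptions, not the pipeline used to build the dataset: the function names, the Hann window, the 20-filter mel bank, the 13 coefficients, the unnormalized DCT-II, and the 8 kHz sample rate in the demo are all choices made here for illustration. A practical workflow would more likely rely on an established audio library.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hann-window a frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        real = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(windowed))
        imag = -sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(windowed))
        spectrum.append((real * real + imag * imag) / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / sample_rate) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        low, centre, high = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(low, centre):       # rising edge of the triangle
            row[k] = (k - low) / (centre - low)
        for k in range(centre, high):      # falling edge of the triangle
            row[k] = (high - k) / (high - centre)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: power spectrum, mel filterbank, log, then DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]


# Synthetic example: one 256-sample frame of a 1 kHz tone at an assumed 8 kHz rate.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * i / sample_rate) for i in range(256)]
coeffs = mfcc(tone, sample_rate)
print(len(coeffs))  # 13
```

For a 2–3 second clip, the same function would be applied to successive overlapping frames, giving one coefficient vector per frame that can be stacked into a feature matrix for a lightweight classifier.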

Institutions

College of Science and Technology - University of Rwanda
Kigali, Kigali
Carnegie Mellon University Africa
Kigali, Kigali
