Agarwood Leaf Image Dataset for Pest and Disease Analysis in Real-World Environment

Published: 24 June 2025 | Version 2 | DOI: 10.17632/8f8wtr9zwn.2
Contributors: Nurul Hazlina Zaini, Soon Boon Yu, Mohammad Amiruddin Ruslan, Rosyzie Anna Awg Haji Mohd Apong

Description

This repository is one part of a larger dataset of 5,472 agarwood leaf images in 14 classes, collected and curated at three sites in Brunei Darussalam: Benutan, Bukit Silat, and Batong. It contains 1,071 images in 5 pest classes: Spider and Webs, Scale Insects, Mealybugs, Flea Beetles, and Brown Clumps. Each image was carefully captured to show specific leaf regions as either healthy or diseased, against a complex natural background. For structured model development, the dataset is divided into five folders, one per pest class. The images in each folder can be split into training, validation, and testing subsets at a 70:15:15 ratio, or into training and testing subsets at an 80:20 ratio. The dataset is intended for training and validating deep learning and machine learning algorithms that identify and detect diseases and pests in agarwood leaves. It gives researchers and learners a resource for analysing and improving the health management of agarwood plants through computational models. The remaining 4,401 images cover 9 classes: eight agarwood diseases (Downy mildew, Anthracnose, Black spots, Powdery mildew, Translucent lesions, Brown spots, Mosaic Viruses, and Sooty mold) and one healthy class. They can be accessed at: 10.5281/zenodo.14842100.
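The 70:15:15 and 80:20 splits mentioned above could be produced per class folder along the following lines. This is a minimal sketch using only the Python standard library, not the authors' own script: the function name `stratified_split`, the fixed seed, and the rounding rule are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Assign each image index to a subset while preserving class proportions.

    labels: list of class names, one entry per image.
    ratios: fractions for (train, val, test); pass (0.8, 0.0, 0.2) for 80:20.
    Returns a dict mapping subset name -> sorted list of image indices.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Group image indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable splits
    split = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n = len(indices)
        # Round validation and test sizes; training takes the remainder.
        n_val = round(n * ratios[1])
        n_test = round(n * ratios[2])
        n_train = n - n_val - n_test
        split["train"] += indices[:n_train]
        split["val"] += indices[n_train:n_train + n_val]
        split["test"] += indices[n_train + n_val:]

    return {name: sorted(idxs) for name, idxs in split.items()}
```

The returned indices refer to positions in the `labels` list, so they can be mapped back to file paths by keeping a parallel list of paths in the same order.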

Files

Steps to reproduce

The following steps outline the procedures used to collect, curate, preprocess, and organize the agarwood leaf image dataset for disease and pest classification. Researchers aiming to replicate or extend this work may follow this methodology:

1. Image Collection
This entails four steps:
(a) Tree Selection: Identify and select Aquilaria malaccensis (agarwood) trees from three plantation sites in Brunei Darussalam: Batong, Benutan, and Bukit Silat.
(b) Capture Device: Use a Canon PowerShot G7X Mark III camera to capture high-resolution images in natural lighting conditions.
(c) Leaf Sampling: Capture images of both sides of the leaf, ensuring varied angles and lighting to simulate real-world conditions.
(d) Metadata Logging: Record image context such as location, date, and symptoms observed to support labeling and dataset traceability.

2. Image Labeling
(a) Expert Annotation and Labeling: Consult plant pathologists, botanists, agronomy experts, or specialists in related disciplines, who can inspect the leaves visually or apply scientific validation (for instance, microscopic analysis or lab-confirmed diagnoses) so that pests and diseases are identified appropriately. Documenting these details significantly enhances the dataset's credibility and reliability. Categorize the leaves into 14 classes: 8 disease types, 5 pest damage types, and 1 healthy leaf class.
(b) Manual Verification: Conduct iterative quality checks and resolve any ambiguous or mislabeled images based on expert feedback.

3. Image Selection (Quality Control)
(a) Manual Review: Examine each image and exclude those with blurring or camera motion; underexposure or overexposure; occluded, cut, or incomplete leaves; poor texture; or indistinguishable leaf surfaces.
(b) Retain only high-quality images with clearly visible features.

4. Image Formatting
(a) Resizing: Standardize all images to a single pixel size to ensure compatibility with deep learning models.
(b) Color Space Conversion: Convert all images to the RGB format.
(c) Normalization: Scale pixel values to a [0, 1] range for consistency across the dataset.
(d) File Format: Convert all files to .jpg format for ease of integration into machine learning pipelines.

5. Image Splitting
(a) Stratified Sampling: Organize the dataset into three parts while preserving class distribution: 70% for training, 15% for validation, and 15% for testing.
(b) Store all subsets in clearly named directories: /train, /val, and /test.

6. Directory Structure (Example)
dataset/
├── train/
│   ├── disease1/
│   ├── disease2/
│   └── healthy/
├── val/
│   ├── disease1/
│   └── healthy/
└── test/
    ├── pest1/
    └── healthy/

7. Tools and Environment
(a) Perform the image processing as detailed in the data description paper, which gives the full procedure.
(b) The dataset is now reproduced and ready for application.
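The formatting step (step 4) might look roughly like the sketch below, which uses Pillow and NumPy. It is an illustration under stated assumptions rather than the authors' pipeline: the 224 x 224 target size is a placeholder because the steps do not give the actual resolution, and the helper names `format_image` and `load_normalized` are hypothetical. Because a .jpg file stores 8-bit pixel values, the [0, 1] scaling is applied here when an image is loaded, not when it is saved.

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Assumption: the source does not state the target resolution.
TARGET_SIZE = (224, 224)


def format_image(src_path, dst_path, size=TARGET_SIZE):
    """Convert an image to RGB, resize it to a fixed size, and save it as .jpg."""
    with Image.open(src_path) as img:
        rgb = img.convert("RGB")                   # drops alpha, expands grayscale
        resized = rgb.resize(size, Image.LANCZOS)  # fixed width x height
    dst_path = Path(dst_path).with_suffix(".jpg")
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    resized.save(dst_path, format="JPEG", quality=95)
    return dst_path


def load_normalized(path):
    """Load an image as a float32 array of shape (H, W, 3) scaled to [0, 1]."""
    with Image.open(path) as img:
        pixels = np.asarray(img.convert("RGB"), dtype=np.float32)
    return pixels / 255.0
```

Calling `format_image` once per retained image and `load_normalized` inside the training data loader keeps the stored files compact while still feeding the model values in the [0, 1] range.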

Institutions

  • Universiti Brunei Darussalam

Categories

Artificial Intelligence, Image Processing, Environmental Analysis, Pesticide Toxicology, Disease, Image Acquisition, Data Acquisition, Health Promotion in Environmental Health, Image Classification, Forest Pest, Sustainable Agriculture, Biotic Transformation, Environmental Assessment, Agricultural Plant, Native Plant, Environmental Impact of Agriculture, Environmental Auditing, Organophosphorus Pesticide, Technology Acquisition, Acquisition Research, Agriculture Industry, Precision Agriculture, Plant Health, Data Collection in Agriculture, Community Supported Agriculture, Deep Learning, Computer Vision Algorithms, Agriculture Applications of Pyrolysis, Inorganic Pesticide, Hardwood Plantation, Agriculture, Botanical Pesticide, Microbial Pesticide

Licence