TomatoPGT: A 3D point cloud dataset of tomato plants for segmentation and plant-trait extraction
Description
TomatoPGT is a three-dimensional (3D) point cloud dataset of greenhouse-grown tomato plants designed to support research in semantic segmentation, graph-based plant representation, and plant trait extraction. The dataset comprises three tomato cultivars: Celebrity Hybrid, Beefsteak, and Big Boy Hybrid, each captured at multiple developmental stages using a rotational multi-view imaging system with controlled illumination. For each plant and stage, 60–70 overlapping RGB images were acquired and reconstructed into metrically scaled 3D point clouds using structure-from-motion and multi-view stereo techniques. The dataset provides RGB images, dense colored point clouds, semantic and instance annotations, semantic graph representations encoding plant topology, and tabular phenotypic traits derived from the graph structures. Graph representations describe the structural relationships between plant organs, including stem segments, junctions, stalks, and compound leaves, enabling organ-resolved geometric and architectural analysis.
Steps to reproduce
This section describes how the data products provided in TomatoPGT were generated, from raw images to graph representations and phenotypic traits. The workflow is fully reproducible using the files included in the repository together with standard 3D processing software. Custom tools used during annotation and graph extraction will be released separately.

Step 1: Image Acquisition
Each tomato plant was placed on a motorized 360° turntable inside a controlled photo booth. A consumer RGB camera was mounted on a rotating arm and synchronized with the turntable motion. For each plant and developmental stage, 60–70 overlapping images were captured at fixed angular increments covering a full 360° rotation.

Step 2: 3D Reconstruction
Multi-view RGB images were reconstructed into dense 3D point clouds using structure-from-motion and multi-view stereo in Agisoft Metashape. A printed marker sheet with known physical spacing (0.03 m) was included in each capture to enable metric scaling of the reconstructed point clouds. The resulting point clouds were exported with XYZ coordinates and RGB color information.

Step 3: Point-Cloud Preprocessing
Reconstructed point clouds were filtered to remove background points and noise using standard masking and cleaning procedures within the reconstruction software. The cleaned point clouds form the basis for all subsequent annotation and analysis steps.

Step 4: Semantic and Instance Annotation
Each point cloud was manually annotated at the organ level using a custom interactive point-cloud annotation tool (Cloud-Seg). Points were assigned semantic class identifiers (e.g., stem, junction, stalk, compound leaf) and instance identifiers to distinguish individual organs. Annotation results were saved in plain-text tabular format aligned with the point-cloud coordinates.
Note: The annotation tool is not required to reproduce the dataset, as all annotated files are included. The tool will be released separately to support future annotation and extension of the dataset.

Step 5: Semantic Graph Construction
Annotated point clouds were converted into semantic plant graph representations using a custom graph-extraction pipeline (Cloud-Graph). The resulting graphs encode plant structure as nodes (e.g., root, junctions, organ attachment points) and edges (e.g., stem segments, stalks, leaves), with associated geometric centerline information. Graph data are stored in JSON format and provided directly in the dataset.

Step 6: Phenotypic Trait Extraction
Phenotypic traits were derived from the semantic graph representations. Traits include whole-plant measures (e.g., plant height and canopy extent) and organ-level measures such as internode length, lateral insertion angles, phyllotactic angles, and curvature-related descriptors. Extracted traits are provided as comma-separated value (CSV) files for downstream analysis.
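
The geometric ideas behind Steps 2 and 6 can be illustrated with a short Python sketch. It is not the Cloud-Graph pipeline, and it does not read the dataset's JSON files, whose schema is not described here. The function names (scale_factor, polyline_length, angle_between_deg) and the sample coordinates are hypothetical, and the definitions used (internode length as the length of a centerline polyline, insertion angle as the angle between a stem direction and a lateral direction) are assumptions made for illustration only.

```python
import math


def scale_factor(marker_a, marker_b, known_spacing_m=0.03):
    """Factor that converts reconstruction units to metres.

    marker_a and marker_b are the reconstructed 3D positions of two marker
    points whose true separation is known_spacing_m (0.03 m in Step 2).
    """
    measured = math.dist(marker_a, marker_b)
    if measured == 0:
        raise ValueError("marker points coincide")
    return known_spacing_m / measured


def polyline_length(points):
    """Sum of segment lengths along an ordered list of 3D centerline points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def angle_between_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.hypot(*u)
    norm_v = math.hypot(*v)
    if norm_u == 0 or norm_v == 0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


# Hypothetical sample values, not taken from the dataset.
stem_centerline = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.05), (0.0, 0.0, 0.12)]
stem_direction = (0.0, 0.0, 1.0)
lateral_direction = (1.0, 0.0, 1.0)

internode_length_m = polyline_length(stem_centerline)
insertion_angle_deg = angle_between_deg(stem_direction, lateral_direction)
units_to_metres = scale_factor((0.0, 0.0, 0.0), (0.5, 0.0, 0.0))
```

To use functions like these with the dataset, the centerline coordinates would first have to be read from the provided JSON graphs. The trait values in the CSV files remain the authoritative ones.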
Funders
- United States Department of Agriculture, Government of the United States of America, United States. Grant ID: USDA-NRCS-UAIP-22-NOFO0001178