A field boundary dataset for the Canadian Prairies derived from Sentinel-2 imagery using the Segment Anything Model
Description
Content: This dataset contains delineated agricultural field boundary polygons for the Canadian Prairies, covering Alberta, Saskatchewan, and Manitoba. The boundaries were derived by automated image segmentation of satellite imagery and represent the extents of individual crop fields. Each polygon carries geometric properties and identifiers that link it to its Rural Municipality (RM). The dataset comprises 457 shapefiles containing 656,082 field polygons across the three provinces and is suitable for agricultural monitoring, land management, and regional analysis.
Location: The data cover the major agricultural regions of Alberta, Saskatchewan, and Manitoba, between approximately 49°–55° N latitude and 96°–114° W longitude. The processed dataset is stored and maintained at the authors' institution.
Structure: The dataset is organized as a repository of vector shapefiles with the following hierarchy:
/Field Boundaries/
├── /Alberta/ (field boundary shapefiles for Alberta)
├── /Saskatchewan/ (field boundary shapefiles for Saskatchewan)
└── /Manitoba/ (field boundary shapefiles for Manitoba)
/Metadata/ (metadata files describing the dataset)
Each field boundary shapefile contains polygon geometries in ESRI Shapefile format with the standard component files (.shp, .shx, .dbf, .prj). All shapefiles share a consistent projected coordinate reference system suitable for regional-scale analysis.
Format: Field boundary polygons are provided in ESRI Shapefile format with projection information. Attribute tables are standardized across all shapefiles and include unique field identifiers and basic geometric properties (e.g., area and perimeter). Attribute values are numeric or categorical and are consistent across all files to facilitate integration with other geospatial datasets.
How To Access: All scripts used for image preprocessing, segmentation, post-processing, dataset assembly, and GEE App development are publicly available in the associated GitHub repository: https://github.com/thuanhavan/CSA_Field_Boundary_Segmentation
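As a quick integrity check on the layout described under Structure, the sketch below walks a local copy of the repository and reports any shapefile that lacks one of the four component files listed above. It is an illustration only and is not part of the authors' scripts; the function name and the assumption that file extensions are lowercase are ours.

```python
from pathlib import Path

# Component files the description lists for every shapefile.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj")


def find_incomplete_shapefiles(root):
    """Return {relative .shp path: [missing extensions]} for a dataset copy.

    Assumes lowercase extensions; shapefiles with all four component
    files present are not included in the result.
    """
    root = Path(root)
    incomplete = {}
    for shp in sorted(root.rglob("*.shp")):
        missing = [ext for ext in REQUIRED_EXTENSIONS
                   if not shp.with_suffix(ext).exists()]
        if missing:
            incomplete[shp.relative_to(root).as_posix()] = missing
    return incomplete
```

Calling `find_incomplete_shapefiles("Field Boundaries")` on a complete download should return an empty dictionary; the local folder path here is hypothetical and depends on where the archive was extracted.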
Steps to reproduce
Methodology
Image preprocessing and composite generation: Multispectral Sentinel-2 surface reflectance imagery (2021–2024) was accessed through Google Earth Engine (GEE) and filtered to the Canadian Prairies during the main agricultural growing season (May to September). Images were screened for cloud contamination using the standard quality assurance bands and cloud masking procedures. Sentinel-2 images were grouped into seasonal periods corresponding to key crop phenological stages. For each period, the Red, Green, and Blue bands were composited using per-pixel median values to reduce residual cloud effects and temporal noise. An Agriculture and Agri-Food Canada (AAFC) annual crop mask was then applied to the seasonal RGB composites to exclude non-agricultural land cover, yielding cropland-only RGB images for segmentation.
Field boundary segmentation and post-processing: The seasonal RGB composites were input to the Segment Anything Model (SAM), a foundation vision model for general-purpose image segmentation. SAM was run in an automated workflow, without manual training data or user-defined prompts, at full Sentinel-2 spatial resolution to generate raster masks representing field boundaries. The segmentation masks were filtered with area thresholds to remove non-cropland artifacts and small isolated regions. The raster masks were then converted to vector polygon features using geospatial raster-to-vector conversion tools. Polygon geometries were further processed with ArcPy (ArcGIS Pro 3.5) to enforce topological consistency, including removal of sliver polygons, geometry simplification, and boundary smoothing.
Dataset assembly: Final field boundary polygons were saved in ESRI Shapefile format with associated component files and projection information. Attribute tables were standardized across all shapefiles to include unique field identifiers and geometric properties (area and perimeter). Files were organized into a structured repository by province and Rural Municipality division.
Software and code availability: Data generation used Google Earth Engine for cloud-based satellite data access, preprocessing, and compositing; Python-based geospatial tools for segmentation, vectorization, and post-processing; the Segment Anything Model (SAM) for automated segmentation; and the GDAL and GeoPandas libraries for raster and vector processing. All scripts are publicly available at: https://github.com/thuanhavan/CSA_Field_Boundary_Segmentation
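To illustrate why median compositing suppresses residual cloud effects, the following minimal sketch computes a per-pixel median over a small stack of single-band images. It is a pure-Python illustration under stated assumptions, not the authors' GEE workflow: the function name is hypothetical, images are plain nested lists, and `None` stands in for a pixel removed by cloud masking.

```python
from statistics import median


def median_composite(images):
    """Per-pixel median across a stack of equally sized single-band images.

    Each image is a list of rows; None marks a cloud-masked pixel.
    A pixel masked in every image stays None in the composite.
    """
    if not images:
        raise ValueError("need at least one image")
    n_rows, n_cols = len(images[0]), len(images[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Keep only observations that survived cloud masking.
            valid = [img[r][c] for img in images if img[r][c] is not None]
            row.append(median(valid) if valid else None)
        composite.append(row)
    return composite


# Three acquisitions of a 2x2 tile (made-up scaled reflectance values).
stack = [
    [[1000, 2000], [3000, None]],
    [[1200, None], [9000, None]],  # 9000 mimics an unmasked bright cloud
    [[1100, 2200], [3200, None]],
]
composite = median_composite(stack)
```

In this example the bright outlier at the lower-left pixel does not reach the composite, because the median of the three observations is one of the two clear-sky values. Applying the same function to each of the Red, Green, and Blue bands would give one seasonal RGB composite.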