Multimodal Dataset for Detecting AI-Generated Deceptive Fashion Product Representations on Pinterest

Published: 26 January 2026 | Version 1 | DOI: 10.17632/ydmpkd52kr.1
Contributors:
Tereza Semerádová

Description

This dataset contains a curated collection of fashion gown product listings collected from Pinterest, designed to support research on AI-generated content, deceptive product representations, and multimodal signaling in visually mediated electronic markets. The dataset integrates visual, textual, and contextual features for each listing, enabling joint analysis of image aesthetics, linguistic framing, and engagement-based platform dynamics.

The data were collected from publicly accessible Pinterest pins linking to gown product listings during a defined sampling period. Listings were identified using keyword-based and category-based queries related to formal gowns and fashion apparel, and were restricted to content intended for commercial or promotional purposes. Only publicly visible pins and associated metadata were collected; no private user data were accessed.

Each observation corresponds to a unique product listing and includes one or more associated images, descriptive text, and engagement metrics (e.g., saves, likes). Listings were subsequently labeled as AI-generated deceptive or non-deceptive (human-created or verifiable) based on a multi-stage verification procedure combining visual plausibility assessment, textual grounding analysis, and external validation of product existence or seller credibility. Ambiguous cases were excluded to preserve label reliability.

Data structure and variables

For each listing, the dataset includes:

  • Visual features extracted from product images, capturing aesthetic complexity, texture regularity, and structural consistency relevant to fashion imagery
  • Textual features derived from listing descriptions, including measures of abstraction, affective language, and informational specificity
  • Contextual and control variables, such as price, seller tenure (where available), and basic listing metadata
  • Engagement indicators, reflecting platform-level user interaction with the listing
  • Binary labels indicating whether the listing was classified as AI-generated deceptive or non-deceptive
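
As a rough illustration of how the variable groups listed above could be held in memory, the following Python sketch defines one record per listing and compares an engagement indicator across the two label values. Every field name here (listing_id, visual_complexity, text_specificity, seller_tenure_days, saves, is_ai_deceptive) is a hypothetical placeholder, as are the toy values; the published files may use different names, scales, and encodings.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Listing:
    # All field names are hypothetical; check the dataset files for the real ones.
    listing_id: str
    visual_complexity: float            # stand-in for a visual feature
    text_specificity: float             # stand-in for a textual feature
    price: Optional[float]              # contextual/control variable
    seller_tenure_days: Optional[int]   # only present "where available"
    saves: int                          # engagement indicator
    is_ai_deceptive: bool               # binary label


def mean_saves_by_label(listings: List[Listing]) -> Dict[bool, float]:
    """Return the mean number of saves for each label value."""
    groups: Dict[bool, List[int]] = {}
    for item in listings:
        groups.setdefault(item.is_ai_deceptive, []).append(item.saves)
    return {label: mean(values) for label, values in groups.items()}


# Toy records invented for illustration; they are not drawn from the dataset.
sample = [
    Listing("a", 0.8, 0.2, 49.0, None, 120, True),
    Listing("b", 0.4, 0.7, 180.0, 900, 40, False),
    Listing("c", 0.9, 0.1, 39.0, None, 80, True),
]
summary = mean_saves_by_label(sample)
```

Optional fields are typed as Optional because the description notes that seller tenure is recorded only where available; any real analysis would need to decide how to handle those gaps.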

Institutions

  • Technicka Univerzita v Liberci Ekonomicka Fakulta
    Liberec, Liberec

Categories

Artificial Intelligence, Advertising, Computer in Marketing, Fashion Industry, Economics of Electronic Commerce