A synthetic data set with new product demand and characteristics

Published: 21-07-2020 | Version 1 | DOI: 10.17632/g3v9xcxjgc.1
Robert van Steenbergen,
Martijn Mes


This synthetic data set contains 2000 new products. For each product it provides product characteristics, 18 demand points, and the total demand. The data set was constructed so that product characteristics and demand are linked, with a certain level of noise. It can serve as a benchmark for new product forecasting and for uncertainty estimation of new product demand.


Steps to reproduce

The demand per product over 18 periods is defined by a profile and the total demand. There are three profiles, chosen arbitrarily: one with a 10% exponential increase per time period, one with a 10% exponential decrease per time period, and one stable profile. The total demand is drawn randomly from a Gamma distribution with alpha = 2 (shape) and beta = 150 (rate). The demand at each demand point is the profile multiplied by the total demand. Normally distributed noise with a coefficient of variation of 0.25 is added to each demand point.

The products in the synthetic data set are characterized by a category, brand, color, and price. The names of the ten categories, ten brands, and ten colors are chosen arbitrarily. The category and brand relate to a profile (increase, decrease, or stable) through categorical distributions: in 80% of the cases a category or brand relates to one specific profile, and otherwise it relates randomly to one of the other profiles.

The color and price relate to the demand. To relate the color to the demand, we divided the demand into five equal segments (0-20th, 20-40th, 40-60th, 60-80th, and 80-100th percentile). Colors relate in 80% of the cases to a specific demand segment and otherwise randomly to one of the other segments. The price is a numeric characteristic that is inversely proportional to the demand: price = 2000/demand. We applied noise with a coefficient of variation of 0.5 to this inversely proportional relationship. The true demand segment and true profile are also included in the data set.
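The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' generation script. All function and variable names are hypothetical. The sketch makes the following assumptions, none of which are stated in the description:

- each profile is scaled to sum to 1;
- the decreasing profile has the form 0.9^t;
- products are assigned to profiles uniformly at random;
- demand segments are computed from the noise-free total demand;
- label ids are mapped to their "specific" profile or segment by id modulo the number of classes;
- characteristics are drawn conditional on the true profile or segment.

NumPy's gamma sampler takes a scale parameter, so the stated rate is converted as 1/rate. If the value 150 was meant as a scale, pass rate=1/150 instead.

```python
import numpy as np

N_PERIODS = 18
PROFILES = ("increase", "decrease", "stable")


def make_profiles(n_periods=N_PERIODS, step=0.10):
    """Return a (3, n_periods) array; each row is scaled to sum to 1 (assumption)."""
    t = np.arange(n_periods)
    raw = np.vstack([
        (1.0 + step) ** t,   # 10% exponential increase per period
        (1.0 - step) ** t,   # 10% exponential decrease per period (assumed form)
        np.ones(n_periods),  # stable profile
    ])
    return raw / raw.sum(axis=1, keepdims=True)


def add_noise(values, cv, rng):
    """Add zero-mean normal noise with standard deviation cv * |value|."""
    values = np.asarray(values, dtype=float)
    return values + rng.normal(0.0, cv * np.abs(values))


def linked_class(true_class, n_classes, p_match, rng):
    """Keep the true class with probability p_match, else pick another class uniformly."""
    true_class = np.asarray(true_class)
    offset = rng.integers(1, n_classes, size=true_class.shape)  # 1 .. n_classes - 1
    other = (true_class + offset) % n_classes                   # never equals true_class
    keep = rng.random(true_class.shape) < p_match
    return np.where(keep, true_class, other)


def pick_label(cls, n_classes, n_labels, rng):
    """Pick a label id whose home class (id % n_classes, an assumption) equals cls."""
    counts = (n_labels - 1 - cls) // n_classes + 1  # number of labels per class
    return cls + n_classes * rng.integers(0, counts)


def generate(n_products=2000, shape=2.0, rate=150.0, seed=0):
    rng = np.random.default_rng(seed)

    # True profile per product (assumption: drawn uniformly).
    profile = rng.integers(0, len(PROFILES), size=n_products)

    # Total demand ~ Gamma(shape, rate); NumPy expects scale = 1 / rate.
    total = rng.gamma(shape, 1.0 / rate, size=n_products)

    # Demand points: profile times total demand, plus noise with CV 0.25.
    demand = add_noise(total[:, None] * make_profiles()[profile], 0.25, rng)

    # Five equal demand segments (quintiles of the noise-free total, an assumption).
    edges = np.quantile(total, [0.2, 0.4, 0.6, 0.8])
    segment = np.searchsorted(edges, total, side="right")  # values 0 .. 4

    # Category and brand follow the profile in 80% of cases; color follows the segment.
    category = pick_label(linked_class(profile, 3, 0.8, rng), 3, 10, rng)
    brand = pick_label(linked_class(profile, 3, 0.8, rng), 3, 10, rng)
    color = pick_label(linked_class(segment, 5, 0.8, rng), 5, 10, rng)

    # Price = 2000 / demand, plus noise with CV 0.5 (can go negative; left unclipped).
    price = add_noise(2000.0 / total, 0.5, rng)

    return {
        "profile": profile, "segment": segment, "total": total, "demand": demand,
        "category": category, "brand": brand, "color": color, "price": price,
    }


if __name__ == "__main__":
    data = generate()
    print(data["demand"].shape)
```

Because the noise on the price is normal with a coefficient of variation of 0.5, a small share of generated prices can be negative in this sketch. The description does not say how such cases are handled, so the sketch leaves them unchanged.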