A synthetic data set with new product demand and characteristics

Published: 21 July 2020 | Version 1 | DOI: 10.17632/g3v9xcxjgc.1


This is a synthetic data set with 2000 new products. The data set consists of product characteristics, 18 demand points, and the total demand. The data set has been created in such a way that there are links (with a certain level of noise) between product characteristics and demand. This data set can serve as a benchmark for new product forecasting and uncertainty estimation of new product demand.


Steps to reproduce

The demand per product over the 18 periods is defined by a profile and the total demand. The three profiles were chosen arbitrarily: one with a 10% exponential increase per time period, one with a 10% exponential decrease per time period, and one stable profile. The total demand is drawn at random from a Gamma distribution with alpha = 2 (shape) and beta = 150 (rate). The demand at each demand point is the profile multiplied by the total demand. Normally distributed noise with a coefficient of variation of 0.25 is added to each demand point.

The products in the synthetic data set are characterized by a category, a brand, a color, and a price. The names of the ten categories, ten brands, and ten colors were chosen arbitrarily. The category and brand relate to a profile (increase, decrease, or stable) through categorical distributions: in 80% of the cases a product characteristic relates to one specific profile, and otherwise it relates randomly to one of the other profiles.

The color and price relate to the demand. To relate the color to the demand, we divided the demand into five equal segments (0-20th, 20-40th, 40-60th, 60-80th, and 80-100th percentile). In 80% of the cases a color relates to one specific demand segment, and otherwise it relates randomly to one of the other segments. The price is a numeric characteristic that is inversely proportional to the demand: price = 2000/demand. We applied noise with a coefficient of variation of 0.5 to this inversely proportional relationship. The true demand segment and the true profile are also included in the data set.
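The generation steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code that produced the published data set. The function names (make_profiles, noisy_label, simulate) are hypothetical. The sketch assumes that each profile is normalized to sum to 1, that the decreasing profile is 0.9^t, that products are assigned to profiles uniformly at random, and that "demand" in the price formula means the total demand. NumPy parameterizes the Gamma distribution by scale, so the stated rate is converted with scale = 1/rate. The mapping from profiles and segments to the ten named categories, brands, and colors is left out.

```python
import numpy as np


def make_profiles(n_periods=18, rate=0.10):
    """Return the three demand profiles, each normalized to sum to 1 (assumption)."""
    t = np.arange(n_periods)
    raw = {
        "increase": (1.0 + rate) ** t,  # assumed form of the 10% exponential increase
        "decrease": (1.0 - rate) ** t,  # assumed form of the 10% exponential decrease
        "stable": np.ones(n_periods),
    }
    return {name: p / p.sum() for name, p in raw.items()}


def noisy_label(true_label, n_labels, p_match, rng):
    """Keep the true label with probability p_match, else pick another label uniformly."""
    offset = rng.integers(1, n_labels, size=true_label.shape)  # 1 .. n_labels - 1
    flip = rng.random(true_label.shape) >= p_match
    return np.where(flip, (true_label + offset) % n_labels, true_label)


def simulate(n_products=2000, shape=2.0, rate=150.0, seed=0):
    rng = np.random.default_rng(seed)
    profiles = make_profiles()
    profile_matrix = np.stack(list(profiles.values()))  # shape (3, 18)

    # True profile per product; uniform assignment is an assumption.
    profile = rng.integers(len(profiles), size=n_products)

    # Total demand ~ Gamma(shape, rate); NumPy expects scale = 1 / rate.
    total = rng.gamma(shape, 1.0 / rate, size=n_products)

    # Demand points: profile times total demand, plus normal noise with CV 0.25.
    expected = profile_matrix[profile] * total[:, None]
    demand = rng.normal(expected, 0.25 * expected)

    # Five equal demand segments (quintiles of the total demand), coded 0..4.
    edges = np.quantile(total, [0.2, 0.4, 0.6, 0.8])
    segment = np.searchsorted(edges, total, side="right")

    # Price inversely proportional to demand, with normal noise of CV 0.5.
    base_price = 2000.0 / total
    price = rng.normal(base_price, 0.5 * base_price)

    return {
        "profile": profile,
        "total": total,
        "demand": demand,
        "segment": segment,
        "price": price,
        # Profile or segment that each characteristic points to (80% match).
        "category_profile": noisy_label(profile, 3, 0.8, rng),
        "brand_profile": noisy_label(profile, 3, 0.8, rng),
        "color_segment": noisy_label(segment, 5, 0.8, rng),
    }
```

Calling simulate() returns a dictionary of arrays with one entry per product. Because the noise is normal and unclipped, individual demand points and prices can occasionally be negative in this sketch. The description does not say how such values were handled.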


Uncertainty Analysis, Demand Estimation, New Product Launch, Demand Forecasting