E-Commerce Dataset

Published: 23-11-2020 | Version 1 | DOI: 10.17632/ggbkd8ck3x.1
Debjit Roy, Vishal Bansal


This dataset contains 100,000 records of customer orders from a large e-commerce retailer in South Asia. It has six columns: rec_id, order_id, order_date, shipped_at, prod_sku, and prod_qty. The first two columns, "rec_id" and "order_id", uniquely identify the record and the customer order, respectively. Columns "order_date" and "shipped_at" give the date and time at which the customer placed the order and at which the order shipped from the fulfillment center, respectively. Columns "prod_sku" and "prod_qty" capture the requested SKU (product type or item class) and the quantity of that SKU required to fulfill the customer order. We use this dataset to gain insights into the distribution of customer orders in a typical e-commerce fulfillment center. We classify orders into two categories: single-line and multi-line. A single-line order is a customer order that requires exactly one SKU, while a multi-line order requires more than one SKU. We conclude from this dataset that single-line orders often dominate e-commerce fulfillment requests.
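The single-line versus multi-line classification can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that a multi-line order appears as several records sharing one order_id with different prod_sku values, and the sample rows, identifier values, and timestamp format below are invented for demonstration. The helper names classify_orders and single_line_share are hypothetical.

```python
import csv
import io
from collections import defaultdict


def classify_orders(rows):
    """Label each order_id as 'single-line' or 'multi-line'.

    Assumption: every record carries one SKU, so an order's line count
    is the number of distinct prod_sku values seen for its order_id.
    """
    skus_per_order = defaultdict(set)
    for row in rows:
        skus_per_order[row["order_id"]].add(row["prod_sku"])
    return {
        order_id: "single-line" if len(skus) == 1 else "multi-line"
        for order_id, skus in skus_per_order.items()
    }


def single_line_share(labels):
    """Return the fraction of orders labeled 'single-line'."""
    if not labels:
        return 0.0
    single = sum(1 for label in labels.values() if label == "single-line")
    return single / len(labels)


# Invented sample rows using the six documented column names.
SAMPLE_CSV = """rec_id,order_id,order_date,shipped_at,prod_sku,prod_qty
1,A100,2020-01-01 10:00,2020-01-02 09:00,SKU-1,2
2,A101,2020-01-01 10:05,2020-01-02 09:30,SKU-2,1
3,A101,2020-01-01 10:05,2020-01-02 09:30,SKU-3,1
4,A102,2020-01-01 11:00,2020-01-03 08:00,SKU-1,5
"""

labels = classify_orders(csv.DictReader(io.StringIO(SAMPLE_CSV)))
share = single_line_share(labels)
print(labels)
print(f"single-line share: {share:.2f}")
```

To run this against the real file, replace the in-memory sample with an opened CSV file handle. Note that prod_qty does not affect the classification: an order for five units of one SKU is still single-line.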