Seoul Bike Trip duration prediction
Trip duration is the most fundamental measure in all modes of transportation. Hence, it is crucial to predict the trip-time precisely for the advancement of Intelligent Transport Systems (ITS) and traveller information systems. In order to predict the trip duration, data mining techniques are employed in this paper to predict the trip duration of rental bikes in Seoul Bike sharing system. The prediction is carried out with the combination of Seoul Bike data and weather data. The Data used include trip duration, trip distance, pickup-dropoff latitude and longitude, temperature, precipitation, wind speed, humidity, solar radiation, snowfall, ground temperature and 1-hour average dust concentration. Feature engineering is done to extract additional features from the data. Four statistical models are used to predict the trip duration. (a) Linear regression, (b) Gradient boosting machines, (c) k nearest neighbor and (d) Random Forest(RF). Four performance metrics Root mean squared error, Coefficient of Variance, Mean Absolute Error and Median Absolute Error is used to determine the efficiency of the models. In comparison with the other models, the optimum model RF can explain the variance of 93% in the testing set and 98% (R2) in the training set. The outcome proves that RF is effective to be employed for the prediction of trip duration.