Code and Data for Predicting Large-Scale Systematic Missing Pipe Attributes in Water Distribution Networks
Description
This collection of geospatial data, water network models, and Python code was developed for a pending journal paper with the following abstract:

"Water distribution network (WDN) models are an essential tool used by water utilities for hydraulic analysis. Unfortunately, missing data and insufficient resources often make creating and maintaining these models unfeasible. Existing methods for missing pipe properties like sequential imputation for missing values and reconstruction using graph metrics are designed to accommodate random patterns of missing information and require at least 50% of the values to be known. However, these assumptions about data completeness do not always align with real-world scenarios where large sections of the WDN model have missing data. To address this challenge, this study proposes a data-driven approach for estimating the properties (e.g., diameter) of pipes in a WDN when considering different spatial patterns and degrees of data completeness (i.e., 0% - 90%). Using data from 16 WDNs in Kentucky, this study compares the use of machine learning (ML) to predict pipe diameter using topological, geospatial, and hydraulic features against an existing deterministic approach. Results demonstrate that WDN models completed by the proposed ML approach had comparable hydraulic performance to the true models. Moreover, results suggest that WDN system classification and percentage of missing data significantly influence the ML model and its parametrization. Insights from this study help advance the ability to leverage partial data to create WDN models that can be used for resilience analysis, a capability particularly important for communities without adequate resources."
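The abstract contrasts random missingness, which existing imputation methods assume, with large spatially contiguous gaps. The toy sketch below illustrates why that distinction matters. It is not the authors' ML model or the deterministic baseline they compare against, neither of which is described here. Instead, it uses a plain k-nearest-neighbour vote on pipe-midpoint coordinates as a stand-in predictor. Every name in it (`Pipe`, `random_mask`, `spatial_block_mask`, `knn_impute`, `make_demo_network`) is hypothetical, and the 20-pipe network is synthetic, not drawn from the Kentucky data.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Pipe:
    pipe_id: str
    x: float         # hypothetical pipe-midpoint easting
    y: float         # hypothetical pipe-midpoint northing
    diameter: float  # true diameter (e.g., inches)


def random_mask(pipes: List[Pipe], missing_frac: float, seed: int = 0) -> Set[str]:
    """Hide a uniformly random subset of pipes (scattered missingness)."""
    rng = random.Random(seed)
    k = round(missing_frac * len(pipes))
    return {p.pipe_id for p in rng.sample(pipes, k)}


def spatial_block_mask(pipes: List[Pipe], missing_frac: float) -> Set[str]:
    """Hide one contiguous region: the pipes with the smallest x coordinates."""
    k = round(missing_frac * len(pipes))
    ordered = sorted(pipes, key=lambda p: (p.x, p.pipe_id))
    return {p.pipe_id for p in ordered[:k]}


def knn_impute(pipes: List[Pipe], hidden: Set[str], k: int = 3) -> Dict[str, float]:
    """Predict each hidden diameter by majority vote of the k nearest known pipes."""
    known = [p for p in pipes if p.pipe_id not in hidden]
    if not known:
        raise ValueError("at least one pipe must have a known diameter")
    preds: Dict[str, float] = {}
    for p in pipes:
        if p.pipe_id not in hidden:
            continue
        nearest = sorted(known, key=lambda q: math.hypot(q.x - p.x, q.y - p.y))[:k]
        votes: Dict[float, int] = {}
        for q in nearest:
            votes[q.diameter] = votes.get(q.diameter, 0) + 1
        # Most votes wins; a tie goes to the smaller diameter.
        preds[p.pipe_id] = max(votes, key=lambda d: (votes[d], -d))
    return preds


def accuracy(pipes: List[Pipe], preds: Dict[str, float]) -> float:
    """Fraction of hidden pipes whose predicted diameter equals the true one."""
    if not preds:
        raise ValueError("no hidden pipes to score")
    truth = {p.pipe_id: p.diameter for p in pipes}
    return sum(preds[i] == truth[i] for i in preds) / len(preds)


def make_demo_network() -> List[Pipe]:
    """Toy 20-pipe line: 12-inch mains in the west (x < 5), 6-inch pipes elsewhere."""
    return [Pipe(f"P{i}", float(i), 0.0, 12.0 if i < 5 else 6.0) for i in range(20)]


if __name__ == "__main__":
    net = make_demo_network()
    block = spatial_block_mask(net, 0.25)            # hides P0..P4, all 12-inch mains
    scattered = {"P0", "P4", "P8", "P12", "P16"}      # same count, spread out
    print(accuracy(net, knn_impute(net, block)))      # 0.0
    print(accuracy(net, knn_impute(net, scattered)))  # 1.0
    print(accuracy(net, knn_impute(net, random_mask(net, 0.25, seed=1))))
```

With the same 25% of pipes hidden, the scattered pattern is fully recovered from nearby known pipes. The contiguous block is not, because none of its true 12-inch neighbours remain known. This is the kind of scenario in which purely local imputation breaks down and richer topological, geospatial, and hydraulic features become useful.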
Funding
Laboratory Directed Research and Development