August 2022 • Adriane Chapman, Luca Lauro, Paolo Missier, Riccardo Torlone
Successful data-driven science requires a complex combination of data engineering pipelines and data modelling techniques. Robust and defensible results can be achieved only when every pipeline step that cleans, transforms or otherwise alters data in preparation for modelling can be justified, and its effect on the data explained. The DPDS toolkit presented in this paper is designed to make this process of justification and explanation an integral part of data science practice, adding value while remain…
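To make "explaining the effect of each step on the data" concrete, the following is a minimal sketch of one generic way to do it: run each pipeline step on a copy of the data and record which rows it removed or added and which cells it changed. This is not the DPDS toolkit's API. The names `run_step`, `StepRecord`, `drop_missing_age` and `fill_income`, the `id` key column and the sample rows are all hypothetical, chosen only for illustration, and the sketch assumes every row carries a unique key.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical record of one step's effect (not part of DPDS).
@dataclass
class StepRecord:
    name: str
    removed: set   # keys of rows present before the step but not after
    added: set     # keys of rows present after the step but not before
    changed: dict  # (key, column) -> (old value, new value)

def run_step(name: str, step: Callable, rows: list, key: str = "id"):
    """Apply `step` to copies of `rows` and return (new rows, StepRecord).

    Assumes each row is a dict with a unique value in column `key`.
    """
    before = {r[key]: r for r in rows}
    out = step([dict(r) for r in rows])  # copies, so the input stays intact
    after = {r[key]: r for r in out}
    changed = {}
    for k in before.keys() & after.keys():
        for col in before[k].keys() | after[k].keys():
            # A column absent on one side is treated as None here.
            old, new = before[k].get(col), after[k].get(col)
            if old != new:
                changed[(k, col)] = (old, new)
    record = StepRecord(name, set(before) - set(after),
                        set(after) - set(before), changed)
    return out, record

# Two hypothetical cleaning steps.
def drop_missing_age(rows):
    return [r for r in rows if r["age"] is not None]

def fill_income(rows):
    for r in rows:
        if r["income"] is None:
            r["income"] = 0  # naive imputation, for illustration only
    return rows

data = [
    {"id": 1, "age": 34, "income": None},
    {"id": 2, "age": None, "income": 50},
    {"id": 3, "age": 29, "income": 40},
]

log = []
rows = data
for name, fn in [("drop_missing_age", drop_missing_age),
                 ("fill_income", fill_income)]:
    rows, rec = run_step(name, fn, rows)
    log.append(rec)

for rec in log:
    print(rec.name, "removed:", rec.removed, "changed:", rec.changed)
```

Here the log shows that the first step removed row 2 and the second step changed only the `income` cell of row 1. A real provenance system would capture much richer information (for example, attribute-level derivations between inputs and outputs) and store it for later querying; this sketch only illustrates the idea of attaching an explanation to each step.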