PubMed • Vol 29
Data Programming: Creating Large Training Sets, Quickly.
December 2016 • Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré
Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called <i>data programming</i> in which users express weak supervision strategies or domain heuristics as <i>labeling functions</i>, which are programs that label subse…
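To make the idea of a labeling function concrete, the following is a minimal sketch, assuming a toy text-classification task. Every name in it (`lf_contains_causes`, `lf_contains_negation`, `apply_labeling_functions`, `majority_vote`, and the label constants) is hypothetical and chosen only for illustration. The majority vote used to combine the outputs is a simple stand-in, not a claim about how the paper combines them.

```python
from typing import Callable, List

# Hypothetical label convention for this sketch: 0 means "abstain".
ABSTAIN, NEGATIVE, POSITIVE = 0, -1, 1

LabelingFunction = Callable[[str], int]


def lf_contains_causes(sentence: str) -> int:
    """Heuristic: the word 'causes' suggests a positive example."""
    return POSITIVE if "causes" in sentence.lower() else ABSTAIN


def lf_contains_negation(sentence: str) -> int:
    """Heuristic: a standalone 'not' suggests a negative example."""
    return NEGATIVE if "not" in sentence.lower().split() else ABSTAIN


def apply_labeling_functions(
    sentences: List[str], lfs: List[LabelingFunction]
) -> List[List[int]]:
    """Return one row per sentence and one column per labeling function."""
    return [[lf(s) for lf in lfs] for s in sentences]


def majority_vote(votes: List[int]) -> int:
    """Combine one row of votes; ties and all-abstain rows stay unlabeled."""
    total = sum(votes)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN


sentences = [
    "Smoking causes lung cancer.",
    "Aspirin does not cure colds.",
    "The patient was discharged.",
]
lfs = [lf_contains_causes, lf_contains_negation]

label_matrix = apply_labeling_functions(sentences, lfs)
labels = [majority_vote(row) for row in label_matrix]
```

Each function labels only the examples its heuristic matches and abstains on the rest, so a single function covers just a subset of the data. The third sentence matches neither heuristic and remains unlabeled.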