A Practical Global Provenance System for Responsible Data Handling

PI: Michael Cafarella

Principal Research Scientist, Computer Science & Artificial Intelligence Laboratory
Massachusetts Institute of Technology

Framework component: Data

Many safety problems connected to machine intelligence require answering empirical questions about data handling: to act responsibly and safely, practitioners need information about how data was handled in the past. This provenance-style information is generally not collected and stored as a by-product of data activities, which makes it tedious and expensive to reconstruct later. Consulting it should be part of the daily routine of anyone engaged in machine intelligence work, but today doing so is difficult and rare. Safe machine intelligence work requires a system for collecting and sharing data provenance that can be universally deployed across both tools and organizations. To be universal, the system must be far less expensive and obtrusive than past provenance attempts. It must also operate while respecting the privacy and intellectual property concerns of its users. This project will construct three core software elements to develop an approach to universal data provenance collection, with the goal of facilitating safe and responsible machine intelligence.
