Understanding and Reducing Safety Risks of Learning with Large Pre-Trained Models
PI: Sharon Li
Assistant Professor in Computer Sciences
University of Wisconsin-Madison
Framework component: Algorithm
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, GPT-3, CLIP) that are trained on massive data and are adaptable to a wide range of downstream tasks. It is now common practice in the AI community to adopt pre-trained models for transfer learning rather than training from scratch. Without an appropriate understanding of the safety risks, this development can exacerbate and propagate safety concerns writ large, with profound impacts on society.
In response to these urgent challenges, the overall objective of this project is to understand and reduce the risks of learning with large pre-trained models (Safe L2M). This project seeks to address the research questions that arise in building responsible and ethical AI models: What is the extent of inequity and out-of-distribution risks when performing transfer learning from large pre-trained models? And how can we design learning algorithms to mitigate the potential negative impacts? This project will make the following contributions to operationalizing safe machine intelligence: (1) We propose a novel evaluation framework to comprehensively understand how the two types of risks are propagated through the transfer learning process; (2) Building on this understanding, we then propose new learning algorithms that enhance safety when transferring knowledge from pre-trained models.
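To make the out-of-distribution risk concrete, the sketch below shows one common distance-based scoring approach: features from a frozen pre-trained encoder are summarized by per-class Gaussians, and a test input far from every class is flagged as out-of-distribution. This is a minimal illustration of the general idea, not the project's actual evaluation framework; the function names and the random stand-in features are hypothetical.

```python
# Minimal sketch of distance-based OOD scoring on features from a frozen
# pre-trained encoder (illustrative only; not the project's method).
import numpy as np

def fit_gaussian_stats(features: np.ndarray, labels: np.ndarray):
    """Estimate per-class means and a shared inverse covariance from in-distribution features."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return means, np.linalg.inv(cov)

def mahalanobis_ood_score(x: np.ndarray, means, cov_inv) -> float:
    """Distance to the nearest class; larger values suggest the input is out-of-distribution."""
    dists = [float((x - mu) @ cov_inv @ (x - mu)) for mu in means.values()]
    return min(dists)

# Example with random stand-in features; in practice these would come from a
# frozen encoder such as BERT or CLIP applied to the downstream task's data.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(200, 16))
train_labels = rng.integers(0, 4, size=200)
means, cov_inv = fit_gaussian_stats(train_feats, train_labels)
print(mahalanobis_ood_score(rng.normal(size=16), means, cov_inv))
```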
Key Personnel
Rheeya Uppaal
Graduate Student, Computer Science
University of Wisconsin-Madison
Outcomes and Updates
- Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection
- Study Finds Simple Method Can Improve Safety for Language Models
- Are Vision Transformers Robust to Spurious Correlations?