CASMI Recognizes Research Focused on Preventing, Mitigating AI Harms
Four Papers Accepted to AI Incidents and Best Practices Track at AI Conference
As companies and organizations continue releasing artificial intelligence (AI) systems into the world, researchers are working to ensure these technologies are safe. The Northwestern Center for Advancing Safety of Machine Intelligence (CASMI) is supporting these efforts through the AI Incidents and Best Practices track at the Thirty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-24).
The creators of the AI Incident Database worked with CASMI and IAAI to update the track at the international conference, which is co-located with the annual conference of the Association for the Advancement of Artificial Intelligence (AAAI). The IAAI-24 track emphasizes AI incidents, defined as alleged harm or near-harm events to people, property, or the environment in which an AI system is implicated. The four research papers accepted into the track analyze the factors behind AI incidents and the best practices for preventing or mitigating their recurrence.
“It’s been quite successful,” said Yuhao Chen, IAAI-24 co-chair and University of Waterloo research assistant professor. “Last year, the track was called ‘AI Best Practices, Challenge Problems, Training AI Users,’ and only one paper was accepted because the track covered a wide range of topics. This year, by refining the track with a new name, we saw an improvement in the quality of the accepted papers, which marks a very positive upward trend.”
Kristian Hammond, Bill and Cathy Osborn Professor of Computer Science and director of CASMI, said the purpose of the track is to encourage research in AI safety, a topic which has seen heightened interest in the last year amid the popularity of generative AI models like ChatGPT.
“It's exciting to see that researchers are beginning to understand and embrace the notion that one of the first things they should be thinking about when they think about AI systems that are going to be deployed in the world is how to evaluate them, how to design them, and how to consider them in general with regard to the impact on human life and safety,” Hammond said.
As part of the IAAI-24 AI Incidents and Best Practices track, CASMI sponsored the AI Incidents and Best Practices Paper Award at the conference. The winners, who created the Political Deepfakes Incidents Database, will receive a $1,000 prize plus $1,000 travel support.
Tool to Assess Harms in AI Systems
One of the research papers accepted into the IAAI-24 AI Incidents and Best Practices track is “AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards.” Its authors are Arihant Chadda, IQT Labs data scientist; Sean McGregor, director of the Digital Safety Research Institute (DSRI); Jesse Hostetler, DSRI lead software engineer; and Andrea Brennen, IQT Labs senior vice president and deputy director. Together they built a programmatic tool called an Evaluation Authority, which allows auditors to assess harms in AI systems.
The authors demonstrated how this works using a natural language processing method called Named Entity Recognition (NER), which extracts information such as names, places, and organizations from text. They also showed that the method extends to other tasks, such as image classification in computer vision, in which computers are trained to recognize the contents of images.
“As long as you can write an assessment within the framework outlined in the Evaluation Authority code, you can run any assessment on an entire model product category,” Chadda said. “We really think that this is an extensible platform and one that you could feasibly write any test that you have data for.”
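To make that idea concrete, the following is a minimal, purely illustrative sketch of what an extensible assessment interface in this spirit might look like. The names, types, and structure below are assumptions for illustration, not the authors’ actual Evaluation Authority code.

```python
# Illustrative sketch only: a hypothetical assessment interface in the spirit of
# an "Evaluation Authority." Names and structure are assumptions, not the
# authors' actual code.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class AssessmentResult:
    model_id: str
    assessment_name: str
    score: float   # e.g., F1 for NER, accuracy for image classification
    passed: bool   # whether the model meets the persistent standard


def run_assessment(
    models: Iterable[Callable[[str], list]],    # any models in a product category
    model_ids: Iterable[str],
    dataset: list[tuple[str, list]],            # (input, expected output) pairs
    score_fn: Callable[[list, list], float],    # task-specific metric
    threshold: float,                           # the persistent standard to meet
    assessment_name: str = "ner_entity_f1",     # hypothetical assessment name
) -> list[AssessmentResult]:
    """Apply one written assessment to every model in a product category."""
    results = []
    for model, model_id in zip(models, model_ids):
        scores = [score_fn(model(x), expected) for x, expected in dataset]
        mean_score = sum(scores) / len(scores)
        results.append(AssessmentResult(model_id, assessment_name, mean_score,
                                        passed=mean_score >= threshold))
    return results
```

The point of the sketch is the one Chadda emphasizes: the assessment, not any single model, is the reusable unit, so once a test is written against a fixed interface it can be run across an entire model product category and checked against a persistent standard.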
The Evaluation Authority uses safety data to track whether AI harms could recur, and it can help public and private entities decide which AI models to fine-tune or purchase. Chadda called it an essential piece of a well-functioning AI safety ecosystem. He is also heartened to see other AI safety researchers working to characterize the limitations of these technologies.
“While the capabilities have come so far, there's also a very renewed and distinct interest in how we assure these models. How do we keep the systems safe so that users are protected from harms?” he asked. “Once you have insight, you can address it.”
Chadda compared AI safety to aviation’s “black box” approach, in which critical safety data from crashes is shared across the industry. “If an Airbus plane crashes, they are not permitted to withhold critical safety data from Boeing. With a similar view for the safety of intelligent systems, safety data derived from incidents can prevent the recurrence of harm,” the paper notes.
A Template to Assess AI Risks
Another research paper accepted into the IAAI-24 AI Incidents and Best Practices track proposes a standard for reporting AI risks using a template that assesses AI systems before they are deployed. The work, “AI Risk Profiles: A Standards Proposal for Pre-deployment AI Risk Disclosures,” comes from Eli Sherman, data scientist, and Ian Eisenberg, head of AI governance research, at Credo AI, an AI governance platform.
The paper detailed how the reporting template, called the Risk Profile, can help identify which (if any) risks are possible in the following areas: abuse and misuse, compliance, environmental and societal impact, explainability and transparency, fairness and bias, long-term and existential risk, performance and robustness, privacy, and security.
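As a rough illustration of how such a template could be represented in code (the risk areas come from the paper’s taxonomy, but the structure and field names below are assumptions, not Credo AI’s actual schema), a Risk Profile can be thought of as a structured pre-deployment disclosure keyed by those areas:

```python
# Illustrative sketch only: the risk areas follow the paper's taxonomy; the
# structure and field names are assumptions, not Credo AI's actual schema.
RISK_AREAS = [
    "abuse_and_misuse",
    "compliance",
    "environmental_and_societal_impact",
    "explainability_and_transparency",
    "fairness_and_bias",
    "long_term_and_existential_risk",
    "performance_and_robustness",
    "privacy",
    "security",
]


def blank_risk_profile(system_name: str) -> dict:
    """Create an empty pre-deployment disclosure: for each risk area, record
    whether the risk applies and what built-in mitigations (if any) address it."""
    return {
        "system": system_name,
        "areas": {area: {"risk_present": None, "mitigations": []}
                  for area in RISK_AREAS},
    }
```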
While risk reporting is relevant for all organizations, Eisenberg emphasized that standardizing Risk Profiles is particularly important for the increasingly complicated AI development process, which requires procurement teams and the vendors that supply them to communicate their requirements and capabilities transparently.
“The closer those two can become in terms of what they’re requesting and what they’re offering, the easier it leads to a quickly innovative ecosystem,” Eisenberg said. “What risks do you anticipate? If I’m on a procurement team, I am responsible for them. If you’re a vendor, you have a responsibility to understand risks outside of my organization. I have a responsibility to understand my own risks.”
The research generated AI Risk Profiles for five popular AI systems: Anthropic’s Claude, OpenAI’s Generative Pre-Trained Transformer Application Programming Interfaces (GPT APIs), Microsoft Copilot, GitHub Copilot, and Midjourney. It found that nearly all risks are present for every AI system, though in most cases built-in mitigations address those risks. However, one area that tends to lack built-in mitigations is explainability and transparency.
“There are a number of directions that organizations are moving in to increase explainability and transparency,” Eisenberg said. “People want to know how the model was trained. What were the concerns you had as an organization? Who built the model? What was your decision-making process in identifying risks? I want to know that because it’s subjective. There’s a giant network of trust based in some transparency and regulatory oversight.”
Documenting Attacks on AI Systems
The IAAI-24 AI Incidents and Best Practices track also includes a paper that compiles a small database of AI security incidents. The research paper, “When Your AI Becomes a Target: AI Security Incidents and Best Practices,” analyzed previously reported incidents and surveyed 271 participants from the security, healthcare, finance, and automotive industries to classify 32 AI security incidents.
“We are not sure if all the people we surveyed actually noticed they may have been under attack,” said Kathrin Grosse, lead author of the paper and postdoctoral researcher at the Visual Intelligence for Transportation Laboratory at the Swiss Federal Institute of Technology in Lausanne (EPFL). “On the other hand, it's actually a very good thing that there were not more incidents because, if there were, it would be a much larger problem.”
The contributing authors of the work are Lukas Bieringer, head of policy and grants at QuantPi; Tarek Besold, senior research scientist in the Barcelona office of Sony AI; Battista Biggio, associate professor at the University of Cagliari (Italy); and Alexandre Alahi, assistant professor at EPFL.
The researchers categorized the AI security incidents that participants reported in a questionnaire. Because some answers lacked sufficient information, they were able to categorize only 23 of the 32 incidents. For those, they classified the attacker’s goals according to whether an incident compromised sensitive information (confidentiality), whether the system continued to work as intended (integrity), or whether the system remained available (availability).
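Written as code purely for illustration, that classification scheme amounts to a small taxonomy over the classic confidentiality/integrity/availability triad (the names below are assumptions, not the authors’ survey instrument):

```python
# Illustrative sketch only: the confidentiality/integrity/availability split is
# described in the paper; the enum and record below are assumptions.
from dataclasses import dataclass, field
from enum import Enum


class AttackerGoal(Enum):
    CONFIDENTIALITY = "compromised sensitive information"
    INTEGRITY = "system no longer works as intended"
    AVAILABILITY = "system taken down or made unusable"


@dataclass
class SecurityIncident:
    incident_id: int
    industry: str                             # e.g., security, healthcare, finance, automotive
    goals: list[AttackerGoal] = field(default_factory=list)
    categorized: bool = True                  # 9 of the 32 reports lacked enough detail to categorize
```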
“One of the motivations to categorize the incidents comes from existing databases, which lack categorizing the goals of the attacker or the classification with regards to the threat model,” Bieringer said.
The researchers found that most incidents related to cybersecurity and privacy could have been prevented by adhering to established best practices. Interestingly, they also found that in one case the “attackers” were workers afraid of being replaced by the very AI system they were helping to build.
Continuing to Support AI Safety Research
CASMI’s goal is to help establish AI Incidents and Best Practices as a recurring track at IAAI, although that decision will rest with the technical community and next year’s IAAI chairs.
“It's become clear that these issues are issues that anyone doing work in applied artificial intelligence has to be thinking about,” Hammond said. “IAAI is a great venue for that.”
Editor’s note: CASMI is a collaboration with the UL Research Institutes' Digital Safety Research Institute, building on UL’s long-standing mission to create a safer, more secure, and sustainable future.