Upon gaining access to a top-tier generative Artificial Intelligence (AI) system, I found myself surprised by the revelations I encountered. For those unfamiliar, generative AI encompasses technologies such as GPT-4, the most recent AI system that powers notable chatbots like ChatGPT, Google Bard, Cohere, and DALL-E. Along with my colleagues from Zove Security, I was part of a small, expert team tasked to scrutinize the capabilities and constraints of this system, specifically investigating potential misuse. As the chief cybersecurity researcher for this effort, I am now sharing insights from our findings. This article is the first in a four-part series on our journey toward more trustworthy AI.
Over the last year, we formed a “red team” with the mission of conducting an adversarial test of the new model to identify potential vulnerabilities. Our exploration spanned various fields including aerospace, manufacturing, chemical engineering, and banking. We began by formulating hypotheses within these industries and aimed to use AI to either confirm or refute them. For instance, within aerospace, we speculated that we could potentially craft a faulty part that would pass initial tests but quickly fail under real-world conditions. This presented a dangerous possibility of deliberately defective spare parts infiltrating a supply chain and causing disastrous failures. Unfortunately, our hypothesis proved correct with unsettling ease.
Equally concerning was our success in evading financial fraud detection and behavior-based intrusion detection systems. However, the team was taken aback by the model’s thorough response to a hypothetical cyber-attack on crucial infrastructure.
Similar efforts by other teams leveraging OpenAI’s GPT, revealed the potential for the AI to suggest compounds suitable for nerve agents, effectively creating chemical weapons. By utilizing “plug-ins” to supply the ChatGPT chatbot with recent scientific research and a directory of chemical manufacturers, it was even able to pinpoint a potential manufacturing site for such a compound.
These discoveries highlighted the dichotomy of advanced AI technology. While it holds the power to boost and augment our scientific discoveries, it simultaneously harbors the potential to facilitate dangerous activities within physics, chemistry, and cybersecurity. Recognizing these risks, measures were taken to ensure that such outcomes are avoided when technology is made widely accessible.
Our AI Security team’s probing exercise aimed to alleviate public concerns regarding the deployment of potent AI systems in society. Our duty was not only to test the boundaries of the AI model but also to scrutinize it for potential issues like toxicity, bias, and linguistic prejudice. We assessed the model for a range of possible abuses, including perpetuating misinformation, facilitating plagiarism, and enabling illegal activities such as financial crimes and cyberattacks.
In our approach, we combined professional security analysts and penetration testers with industry experts. Over several months, these interdisciplinary teams formulated and tested hypotheses, aiming to breach current defenses and risk mitigations.
One mutual apprehension within the team was the risk associated with linking such powerful AI models to external knowledge sources via plug-ins. We elected not to take this route, although we understand that real-world adversaries may do so.
Another significant issue we identified involved bias in the model’s responses. We observed instances of gender, racial, and religious biases, along with overt stereotypes concerning marginalized communities.
Over time, we suggested alterations to the model and saw marked improvements in safety within the model’s responses. However, the quest for safety and fairness within AI systems remains a continuous endeavor.
In the grand scheme of technology, generative AI stands as an exciting frontier with its astounding capabilities. Our exploration has shed light on its dual nature – from empowering scientific advancements to potentially enabling harmful activities. Although our journey was filled with surprising discoveries, it reaffirmed the essential need for constant vigilance and innovation in cybersecurity. In a world where AI technologies are continually evolving, the work to ensure their safe, ethical, and fair use is an ongoing commitment. As we conclude this initial post in our series, we hope our insights will foster an understanding of the implications of AI in our society and the need for continued exploration. In the upcoming posts, we will delve deeper into the specific challenges and the corresponding measures we proposed to make AI a more secure and trustworthy tool for the future.
Authors Note:
Original Post: This post has been cross-posted from the original on LinkedIn
Next Posts in Series: AI nightmare on Bank Street
Reference as Behind the AI Curtain: A Journey Toward Trustworthy Artificial Intelligence by Akshay Aggarwal, Zove Security