The Threat of Harm

Illustration by DALL·E of OpenAI. This image was created using the prompts machine learning, biological and chemical synthesis, cyber vibes, spores and molecules, and photorealistic.

AI’s capacity to be harnessed for harm is an unknown – all the more reason to get better governance controls in place before it is too late, argues philosopher Dr Nathaniel Sharadin.

ChatGPT has a hidden prompt that runs before users enter their query. It asks the AI to be kind, ethical and helpful to the user. But there is no guarantee that the information it then provides, or the actions it takes, will be similarly positive. With more large language models (LLMs) similar to ChatGPT coming online, there is a prospect of AIs being manipulated to dangerous ends.

This worries Dr Nathaniel Sharadin, Assistant Professor of Philosophy and a Research Affiliate at the Centre for AI Safety in San Francisco. He was originally concerned about AI’s misalignment with human values, such as being hurtful or racist. He’s less concerned about that now – recent AI models have become better in this regard, he said – but that does not rule out the potential to do damage in future.

“I’m more worried that these systems are much more capable than we really understand. And the capabilities for misuse are pretty stark,” he said. “Biological misuse is the most obvious example. LLMs are great at quickly finding promising drug targets, but they also need to know how to identify and formulate dangerous molecules to avoid harm. It doesn’t mean this is happening now, but it obviously increases the threat of biological or chemical misuse.”

To illustrate what that threat might mean, consider that a Swiss government institute asked drug researchers, who were using a model built for drug discovery, to seek out harmful targets as an exercise in what might happen. In less than six hours, the model found 40,000 potential chemical weapons, including some of the deadliest chemicals around, such as sarin.

Lowering the bar

“It’s important to say with some confidence whether an LLM can produce formulas for novel pathogens. For the record, I don’t think ChatGPT has the capabilities to do this at the moment because I don’t think it has enough of that training data. But it’s very clear that some models can and that would be very dangerous,” he said.

One of the problems is the difficulty in understanding the capabilities of AI models, a question that Dr Sharadin has been exploring. There are two methods – benchmark against existing information or interact and poke around with the AI to test its limits. Both have major limitations for dangerous substances. For example, there is no benchmark for synthesising smallpox, so how can one know whether an AI can do this? And the interaction approach cannot provide systematic evidence of capabilities because every single capability would need to be tested.

“Just because you can’t test it with a benchmark or interaction is not evidence that AI can’t do it,” he said. “Another worry is that you could have LLM assistants in the lab acting like a coach, to explain why something has failed. It lowers the bar of technical know-how for chemical and biological pathogens.”

Despite the bleakness of the situation, and the rapid pace at which AI is developing, Dr Sharadin believes it is still possible to address the lurking threats. While the current polarised world is unlikely to produce treaties as successful as those on biological and chemical weapons, solutions within nations might be easier to reach.

A hard problem

“It’s a hard problem, but the first step should be that companies do not train increasingly large, capable models and release them publicly because this is dual-use technology with a high capacity for misuse,” he said. “Model developers should also remove things like the repository of dangerous molecules from their training data, which they’re not doing. There is no reason why ChatGPT needs to see this in its training.”

China has acted to rein in its companies, but in the US, which is the leader in AI development, there are no controls at all. “With the pace moving so rapidly, I’m worried that even though present-day models are not by themselves extraordinarily dangerous, there is increasing scale for that,” he said.

Meta’s release of the open-source LLM Llama 2 makes matters worse. Unlike ChatGPT, whose architecture and operations are controlled by OpenAI, Llama 2 can be downloaded by any user and fine-tuned to their liking, for example by training it to respond in nefarious ways. All that is needed is English language skills and a computer. Setting aside concerns about new chemicals and weapons, the effects of that capability will likely soon be felt by much of the world.

“It’s certainly going to make a nuisance of itself. More than a billion people vote next year. If you think misinformation on the internet is a pain, wait till you see what can be done with a system that can produce realistic, human-tailored, micro-targeted misinformation in a heartbeat,” he said.

#AI #humanities