Aligned AI

Tools for building fundamentally safe artificial intelligences

Research update

Chatbot Safety with Prompt Evaluator for Chat-GPT

There have been many successful, published attempts by the general public to circumvent the safety guardrails OpenAI has put in place on their remarkable new AI chatbot, ChatGPT.  We propose using a second and fully separate, fine-tuned LLM to evaluate prompts before sending them to ChatGPT.

Read more

Research update

Announcing AcCElerate for goal generalisation

AcCElerate identifies the different features that explain a classification. Even if the labeled training data has a perfect correlation, AcCElerate is capable of distinguishing the two features and training a different classifier for each. AcCElerate can then select the most ambiguous unlabeled images - the ones on which the two classifiers disagree the most - and get a human to select the correct classifier by choosing which category those ambiguous images belong to.

Read more

Research update

Concept extrapolation for hypothesis generation

For a multi-hypothesis neural net, gathering extra unlabeled data does two things simultaneously. It suggests new and better hypotheses - and it gives us an understanding and an interpretation of the previous hypotheses.

Read more

Research update

Happy faces benchmark

The aim of this benchmark is to encourage the design of classifiers that are capable of using multiple different features to classify the same image. The features themselves must be deduced by the classifiers without being specifically labeled, though they may use a large unlabeled dataset on which the features vary. We have constructed a benchmark where the features are very different: facial expressions versus written text.

Read more

Building Aligned AI

Stuart Armstrong, Aligned AI’s chief research officer, talks to the London Futurists about the power of AI, the challenge of alignment, and how to ensure our future is full of human flourishing.

Read more


Rebecca Gorman
Rebecca is an AI alignment researcher, technology hobbyist, leader and designer. She has pursued a lifelong dedication to finding ways of making technology serve users’ true values.  

Co-Founder and CEO

Dr Stuart Armstrong
Previously a Researcher at the University of Oxford’s Future of Humanity Institute, Stuart is a mathematician and philosopher and the originator of the value extrapolation approach to artificial intelligence alignment.

Co-Founder and Chief Research Officer


Dylan Hadfield-Menell
Assistant professor at MIT in Artificial Intelligence, Co-Founder and Chief Scientist of Preamble, Expert in Cooperative Inverse Reinforcement Learning

Research Advisor

Adam Gleave
Adam Gleave is an artificial intelligence PhD candidate at UC Berkeley working with the Center for Human-Compatible AI. His research focuses on adversarial robustness and reward learning, and his work on adversarial policies was featured in the MIT Technology Review and other media outlets.

Research Advisor

Justin Shovelain
Co-founder of Convergence, AI safety advisor to Causal Labs and Lionheart Ventures, AI safety advisor to Causal Labs and Lionheart Ventures

Ethics and Safety Advisor

Dr Anders Sandberg
Fellow, Ethics and Values at Reuben College, Oxford, Senior Researcher, Future of Humanity Institute, Oxford

Information Hazards Policy Advisor

Romesh Ranawana
Serial entrepreneur, AI technologist, programmer and software architect with more than 20 years of deep tech development experience, and a highly experienced technology chief executive. Member of the Board of Management of the University of Colombo School of Computing and founding chairman of the SLASSCOM AI Center of Excellence (AICx). Co-founder of SimCentric Technologies and Co-Founder and CTO of Tengri UAV.

Commercialisation Advisor

Charles Pattison
Charles has 15 years experience working in capital markets, from pricing derivatives to investment in listed or unlisted equities. He currently works at a large Asia-based equity-focused fund.

Finance Advisor