Aligned AI

Tools for building fundamentally safe artificial intelligences

Product announcement

Announcing EquitAI for de-biasing generative AI

Aligned AI has released EquitAI, a tool that can be applied to language models to ensure they create text without gender bias or prejudice.

Read more

Product announcement

Announcing AcCElerate for goal generalisation

AcCElerate identifies the different features that explain a classification. Even if the labeled training data has a perfect correlation, AcCElerate is capable of distinguishing the two features and training a different classifier for each. AcCElerate can then select the most ambiguous unlabeled images - the ones on which the two classifiers disagree the most - and get a human to select the correct classifier by choosing which category those ambiguous images belong to.

Read more

Product announcement

AcCElerate Mitigates Simplicity Bias

Neural nets tend to learn the simplest features that are sufficient for their task. So, for instance, an image classifier that needs to distinguish blond-haired female celebrities without glasses, from dark-haired male celebrities with glasses, might just focus on one simple feature - maybe the glasses - and neglect the others.

Read more

Safety announcement

Chatbot Safety with Prompt Evaluator for Chat-GPT

There have been many successful, published attempts by the general public to circumvent the safety guardrails OpenAI has put in place on their remarkable new AI chatbot, ChatGPT.  We propose using a second and fully separate, fine-tuned LLM to evaluate prompts before sending them to ChatGPT.

Read more

ML update

Concept extrapolation for hypothesis generation

For a multi-hypothesis neural net, gathering extra unlabeled data does two things simultaneously. It suggests new and better hypotheses - and it gives us an understanding and an interpretation of the previous hypotheses.

Read more

Benchmark announcement

Happy faces benchmark

The aim of this benchmark is to encourage the design of classifiers that are capable of using multiple different features to classify the same image. The features themselves must be deduced by the classifiers without being specifically labeled, though they may use a large unlabeled dataset on which the features vary. We have constructed a benchmark where the features are very different: facial expressions versus written text.

Read more


Rebecca Gorman
Rebecca is an AI alignment researcher, technology hobbyist, leader and designer. She has pursued a lifelong dedication to finding ways of making technology serve users’ true values.  

Founder and CEO

Dr Stuart Armstrong
Previously a Researcher at the University of Oxford’s Future of Humanity Institute, Stuart is a mathematician and philosopher and the originator of the value extrapolation approach to artificial intelligence alignment.

Co-Founder and Chief Research Officer

Some of our advisors

Dylan Hadfield-Menell
Assistant professor at MIT in Artificial Intelligence, Co-Founder and Chief Scientist of Preamble, Expert in Cooperative Inverse Reinforcement Learning

Research Advisor

Adam Gleave
Adam Gleave is an artificial intelligence PhD candidate at UC Berkeley working with the Center for Human-Compatible AI. His research focuses on adversarial robustness and reward learning, and his work on adversarial policies was featured in the MIT Technology Review and other media outlets.

Research Advisor

Justin Shovelain
Co-founder of Convergence, AI safety advisor to Causal Labs and Lionheart Ventures, AI safety advisor to Causal Labs and Lionheart Ventures

Ethics and Safety Advisor

Romesh Ranawana
Serial entrepreneur, AI technologist, programmer and software architect with more than 20 years of deep tech development experience, and a highly experienced technology chief executive. Member of the Board of Management of the University of Colombo School of Computing and founding chairman of the SLASSCOM AI Center of Excellence (AICx). Co-founder of SimCentric Technologies and Co-Founder and CTO of Tengri UAV.

Commercialisation Advisor