01 December 2022
What distinguishes Owen Wilson from Beyoncé? There is a whole host of possible features - age, hair color, gender, and so on. But neural nets will generally focus on the easiest feature they can find; in this instance, maybe the glasses. So, instead of an Owen Wilson-vs-Beyoncé classifier, we may end up with a mere eyeglasses detector.
To solve this, Aligned AI developed AcCElerate, which identifies the different features that explain a classification. Even if the labeled training data has a perfect correlation - even if all blonds wear glasses and all non-blonds don’t - AcCElerate is capable of distinguishing the two features and training a different classifier for each.
AcCElerate can then select the most ambiguous unlabeled images - the ones on which the two classifiers disagree the most - and get a human to select the correct classifier by choosing which category those ambiguous images belong to.
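The selection step can be sketched in a few lines. This is a minimal illustration, not AcCElerate's actual implementation: it assumes each of the two classifiers outputs a probability for the same category, and it ranks unlabeled images by the absolute gap between those probabilities. The function name `most_ambiguous` and the example scores are hypothetical.

```python
from typing import List, Sequence


def most_ambiguous(probs_a: Sequence[float],
                   probs_b: Sequence[float],
                   k: int) -> List[int]:
    """Return indices of the k items on which the two classifiers disagree most.

    probs_a[i] and probs_b[i] are each classifier's probability that
    unlabeled item i belongs to the first category (assumed interface).
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both classifiers must score the same items")
    # Disagreement is measured as the absolute gap between the two probabilities.
    gaps = [abs(a - b) for a, b in zip(probs_a, probs_b)]
    # Sort indices by gap, largest first.
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return ranked[:k]


# Hypothetical scores for five unlabeled images.
hair_classifier = [0.95, 0.90, 0.10, 0.05, 0.50]     # classifier keyed on hair color
glasses_classifier = [0.97, 0.08, 0.95, 0.04, 0.55]  # classifier keyed on glasses

to_label = most_ambiguous(hair_classifier, glasses_classifier, k=2)
print(to_label)  # [2, 1]
```

A human would then label the images at the returned indices, and their answers indicate which of the two classifiers matches the intended category.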
In the following five illustrative sets, the diagonal images - top left and bottom right - are those in which the two features are correlated. The off-diagonal images - bottom left and top right - are the most ambiguous ones, where the two classifiers disagree.
Goal misgeneralisation is a problem in artificial intelligence (AI) in which an agent learns a goal in one environment but fails to carry it over correctly to different environments. It arises because the agent has been exposed to only a limited set of scenarios and cannot generalise from them to new ones. As a result, the agent may pursue the wrong goal or behave incorrectly when it encounters a different environment.
CoinRun is the classic example of goal misgeneralisation. When an agent trains on CoinRun, the coin is always on the right. The agent fails to learn 'get the coin and go to the right' as the true objective, and ends up merely learning 'go to the right'.
With AcCElerate built into the agent, an agent trained on the spuriously correlated game can identify that it must 'get the coin' when the coin begins to show up in new locations. It has now generated the two possible rewards it could be following, and is able to achieve them both.
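To see why the two objectives cannot be told apart during training, consider a toy one-dimensional stand-in for the game. This is an illustrative sketch only, not the CoinRun environment or AcCElerate's method: the corridor, the `Episode` type and both reward functions are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    length: int      # corridor cells are numbered 0 .. length - 1
    coin_pos: int    # cell holding the coin
    final_pos: int   # cell where the agent ends the episode


def reward_get_coin(ep: Episode) -> int:
    """Hypothesis 1: reward for ending on the coin."""
    return int(ep.final_pos == ep.coin_pos)


def reward_go_right(ep: Episode) -> int:
    """Hypothesis 2: reward for reaching the right-hand end."""
    return int(ep.final_pos == ep.length - 1)


def rewards_disagree(ep: Episode) -> bool:
    """True when the two hypotheses assign different rewards."""
    return reward_get_coin(ep) != reward_go_right(ep)


# Training-style episodes: the coin always sits at the right-hand end,
# so both hypotheses give identical rewards wherever the agent stops.
training = [Episode(length=10, coin_pos=9, final_pos=p) for p in range(10)]
print(any(rewards_disagree(ep) for ep in training))  # False

# Shifted episode: the coin is mid-corridor and the agent runs to the right end.
shifted = Episode(length=10, coin_pos=4, final_pos=9)
print(rewards_disagree(shifted))  # True
```

Only episodes like the shifted one, where the coin is no longer at the right-hand end, reveal that there were two candidate rewards all along.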
We are now opening the waitlist for our alpha product built on AcCElerate. Enter your email below and we'll let you know when your spot to access AcCElerate Alpha is ready.
Dr. Stuart Armstrong and Rebecca Gorman
with thanks to the following researchers for their contributions: Oliver Daniels-Koch, Jessica Cooper, Brady Pekley, Joe Kwon, Matthew Watkins, Sam Marks, and Patrick Leask