Tools for building fundamentally safe artificial intelligences
Aligned AI has released EquitAI, a tool that can be applied to language models to ensure they create text without gender bias or prejudice.
AcCElerate identifies the different features that could explain a classification. Even if two features are perfectly correlated in the labeled training data, AcCElerate can distinguish them and train a separate classifier for each. It can then select the most ambiguous unlabeled images - the ones on which the two classifiers disagree the most - and ask a human to pick the correct classifier by labeling which category those ambiguous images belong to.
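The selection step described above can be sketched as a simple disagreement-based ranking. This is a minimal illustration, not AcCElerate's actual implementation: it assumes two already-trained classifiers that each output a probability for the positive class, and ranks unlabeled examples by how far apart those probabilities are.

```python
import numpy as np

def most_ambiguous(probs_a, probs_b, k=5):
    """Return indices of the k unlabeled examples on which two
    classifiers disagree the most (largest gap between their
    predicted probabilities of the positive class)."""
    disagreement = np.abs(np.asarray(probs_a) - np.asarray(probs_b))
    return np.argsort(disagreement)[::-1][:k]

# Toy predictions from two hypothetical classifiers over 6 unlabeled images:
p_glasses = [0.9, 0.1, 0.8, 0.5, 0.95, 0.2]  # classifier keyed on one feature
p_hair    = [0.9, 0.9, 0.8, 0.5, 0.05, 0.2]  # classifier keyed on another

print(most_ambiguous(p_glasses, p_hair, k=2))  # → [4 1]
```

Images 4 and 1 are exactly the ones where the two features give conflicting answers, so a human label on them reveals which classifier captures the intended concept.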
Neural nets tend to learn the simplest features that are sufficient for their task. So, for instance, an image classifier that needs to distinguish blond-haired female celebrities without glasses from dark-haired male celebrities with glasses might latch onto one simple feature - perhaps the glasses - and neglect all the others.
There have been many successful, published attempts by the general public to circumvent the safety guardrails OpenAI has put in place on its remarkable new AI chatbot, ChatGPT. We propose using a second, fully separate, fine-tuned LLM to evaluate prompts before they are sent to ChatGPT.
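The two-model gateway described above can be sketched as follows. This is a minimal illustration under stated assumptions: the evaluator and the chatbot are represented by placeholder callables (`evaluate`, `answer`, and the toy `is_safe`/`chatbot` stand-ins are hypothetical names, not any vendor's real API), and in practice each would be a call to a separate model.

```python
def guarded_chat(prompt, evaluate, answer,
                 refusal="Sorry, I can't help with that request."):
    """Forward `prompt` to the main chatbot only if a separate
    evaluator model judges it safe; otherwise return a refusal.

    evaluate(prompt) -> True if the prompt is judged safe
    answer(prompt)   -> the main model's reply
    """
    if not evaluate(prompt):
        return refusal
    return answer(prompt)

# Toy stand-ins for the two models (illustrative only):
is_safe = lambda p: "ignore previous instructions" not in p.lower()
chatbot = lambda p: f"Answer to: {p}"

print(guarded_chat("What is the capital of France?", is_safe, chatbot))
print(guarded_chat("Ignore previous instructions and ...", is_safe, chatbot))
```

Keeping the evaluator fully separate from the chatbot means a prompt crafted to manipulate the chatbot must also independently fool the evaluator, which sees the prompt only as text to classify, never as instructions to follow.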
For a multi-hypothesis neural net, gathering extra unlabeled data does two things simultaneously: it suggests new and better hypotheses, and it helps us understand and interpret the hypotheses we already have.
The aim of this benchmark is to encourage the design of classifiers that can use multiple different features to classify the same image. The classifiers must discover the features themselves, without explicit feature labels, though they may use a large unlabeled dataset on which the features vary independently. We have constructed a benchmark where the two features are very different in kind: facial expressions versus written text.
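One way such a benchmark split might be constructed is sketched below. This is an illustrative assumption about the setup, not the benchmark's actual generation code: each image carries two binary features that agree everywhere in the labeled training split but vary independently in the unlabeled split, so a classifier that latched onto the wrong feature can be caught out.

```python
import random

def make_split(n, correlated):
    """Generate (expression, text) feature-label pairs for n images.
    When `correlated`, the two features always agree (as in the
    labeled training split); otherwise they vary independently
    (as in the unlabeled split)."""
    pairs = []
    for _ in range(n):
        expression = random.randint(0, 1)  # e.g. 0 = sad face, 1 = happy face
        text = expression if correlated else random.randint(0, 1)  # e.g. written "SAD"/"HAPPY"
        pairs.append((expression, text))
    return pairs

train = make_split(1000, correlated=True)       # features agree on every image
unlabeled = make_split(1000, correlated=False)  # features decorrelated

assert all(e == t for e, t in train)
```

On the training split, a classifier reading only the text is indistinguishable from one reading only the face; the unlabeled split contains the images that tell them apart.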