Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake; they're like optical illusions for machines. In this post we'll show how adversarial examples work across different mediums, and will discuss why securing systems against them can be difficult.

At OpenAI, we think adversarial examples are a good aspect of security to work on because they represent a concrete problem in AI safety that can be addressed in the short term, and because fixing them is difficult enough that it requires a serious research effort. (Though we'll need to explore many aspects of machine learning security to achieve our goal of building safe, widely distributed AI.)

To get an idea of what adversarial examples look like, consider this demonstration from Explaining and Harnessing Adversarial Examples: starting with an image of a panda, the attacker adds a small perturbation that has been calculated to make the image be recognized as a gibbon with high confidence.

An adversarial input, overlaid on a typical image, can cause a classifier to miscategorize a panda as a gibbon.
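To make the idea concrete, here is a minimal sketch of the fast gradient sign method (FGSM) described in Explaining and Harnessing Adversarial Examples, written with PyTorch. The `model`, tensor shapes, and the `fgsm_perturb` helper are illustrative assumptions, not code from the post: the method simply nudges every pixel a small amount `epsilon` in the direction that increases the classifier's loss.

```python
# A sketch of the fast gradient sign method (FGSM), assuming `model` is
# any differentiable PyTorch image classifier (e.g. a pretrained ImageNet net).

import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.007):
    """Return an adversarially perturbed copy of `image`.

    image:   tensor of shape (1, C, H, W), values in [0, 1]
    label:   tensor of shape (1,) with the true class index
    epsilon: maximum per-pixel change (L-infinity budget)
    """
    image = image.clone().detach().requires_grad_(True)

    # Loss of the model's prediction against the true label.
    logits = model(image)
    loss = F.cross_entropy(logits, label)

    # Gradient of the loss with respect to the input pixels.
    loss.backward()

    # Step each pixel by epsilon in the direction that increases the loss,
    # then clip back to the valid image range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

With a budget as small as a fraction of a percent per pixel, the perturbed image typically looks identical to a human but can flip the model's prediction, which is exactly what the panda-to-gibbon demonstration shows.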