Moving beyond feature attribution

So far, we have encountered many methods that explain black-box models through feature attribution. However, the feature-based approach has some limitations. First, features are not necessarily user-friendly in terms of interpretability; for example, the importance of a single pixel in an image usually does not convey much meaningful interpretation. Second, the expressiveness of a feature-based explanation is constrained by the number of features.

Today we will read about one interesting concept that goes beyond visualizing key pixels.

Adversarial Explanations

What are adversarial examples? As we have seen so far, neural networks sometimes focus on aspects of an image that have no real relevance to the final classification. Furthermore, small modifications to the input image can cause it to be misclassified. Consider the following scenarios.

  • A self-driving car crashes into another car because it ignores a stop sign. Someone had placed a picture over the sign that, to humans, looks like a stop sign with a little dirt on it, but was designed to be recognized as a parking-prohibition sign by the car's sign-recognition software.

  • A spam detector fails to classify an email as spam. The spam email was designed to resemble a normal email while still deceiving the recipient.

  • A machine-learning-powered scanner checks suitcases for weapons at the airport. A knife was designed to avoid detection by making the system think it is an umbrella.

These examples are known as adversarial examples, and they are a critical threat to the adoption of AI in many industries. Please watch this video to learn more about adversarial examples and the threat they pose.
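To make this concrete, the sketch below outlines the Fast Gradient Sign Method (FGSM), one common way to craft such examples: it nudges each pixel a small step in the direction that increases the model's loss. This is a minimal sketch assuming PyTorch and a recent torchvision; the `fgsm_attack` helper, the epsilon value, and the dummy image and label are illustrative placeholders, not part of the reading.

```python
# Minimal FGSM sketch (assumes PyTorch + torchvision are installed).
import torch
import torch.nn.functional as F
from torchvision import models

def fgsm_attack(model, image, label, epsilon=0.01):
    """Return a copy of `image` perturbed so the model is more likely to misclassify it."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Move each pixel a small step in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()  # keep pixel values in a valid range

# Hypothetical usage with a pretrained classifier and a dummy image/label.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224)   # stand-in for a real, preprocessed image
label = torch.tensor([0])            # stand-in for the true class index
adversarial = fgsm_attack(model, image, label)
```

The resulting perturbation is typically too small to notice by eye, which is exactly what makes the scenarios above plausible.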

Please read more about how adversarial examples are applied to XAI here

Assignment

Start by watching this video giving an overview of XAI and the current state of the art.

Now that you have gained more insight into Responsible and Explainable AI, please summarize your thoughts on the field in an essay of ~1000 words by picking a topic (or method) of interest.

Specifically:

  1. Provide a general introduction to the topic.
  2. Illustrate its application with an example.
  3. Critically discuss its advantages and limitations (as you perceive them).
  4. Upload your essay to GitHub, and remember to cite the resources you used.

Now it's time to sharpen your critical pens. Please watch the following discussion to get some ideas for your paper.

Preparation for tomorrow's DataLab

  • Please ensure that you know how to implement the following XAI methods (a minimal sketch of one of them follows the list).
    • LIME
    • Saliency Maps
      • Vanilla Gradients
      • GradCAM
      • SmoothGrad
      • Integrated Gradients
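As a starting point for the DataLab, here is a minimal sketch of the simplest of these, a vanilla-gradients saliency map, assuming PyTorch and a recent torchvision; the pretrained ResNet-18, the random input tensor, and the target class are illustrative placeholders. SmoothGrad, for reference, simply averages such maps over several noisy copies of the input.

```python
# Minimal vanilla-gradients saliency sketch (assumes PyTorch + torchvision).
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a preprocessed image

# Backpropagate the score of the predicted class to the input pixels.
scores = model(image)
pred = scores.argmax(dim=1).item()
scores[0, pred].backward()

# Saliency: per-pixel maximum absolute gradient across the colour channels.
saliency = image.grad.abs().max(dim=1)[0].squeeze()  # shape (224, 224), ready to plot
```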