The Paper List for My PQE

This is a list of papers that I summarized for my PQE.

Resources for XAI

Paper List

  1. Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.

    The paper addresses the important role of humans in the field of machine learning. It also states that trust is crucial for end-users to make use of the power of a machine learning model. The paper then proposes explanations as a way to build users’ trust.

    The paper also describes situations where the current quantitative evaluation of ML models might be misleading. Problems like data leakage and dataset shift are difficult to detect just by observing raw data and predictions. However, they can be detected efficiently if we provide faithful explanations of the model/prediction.

    On the other hand, in some situations, human priors (even those of non-experts) are valuable for improving ML models. There is frequently a mismatch between computable metrics like accuracy and the metrics we actually care about but cannot compute, like user engagement. In this case, an expert’s prior knowledge may help in choosing a better model.

    A good theoretical contribution of the paper is the formulation of interpretability and local fidelity of an explanation of the model’s prediction (see Section 3).
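
    The local-fidelity idea can be illustrated with a small sketch: sample points around an instance, weight them by proximity, and fit a simple weighted linear surrogate. This is only an illustration under assumptions, not the paper's implementation: `black_box`, `explain_locally`, the Gaussian perturbations, the exponential kernel, and the ridge surrogate (in place of K-LASSO) are all hypothetical choices made here.

```python
import numpy as np

def black_box(X):
    # Hypothetical model to be explained: a nonlinear scoring function.
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2)))

def explain_locally(f, x, n_samples=2000, scale=0.1, kernel_width=0.25,
                    ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate of f around x.

    Returns (feature weights, intercept) of the surrogate.
    """
    rng = np.random.default_rng(seed)
    # Sample perturbed points in the neighbourhood of x.
    Z = x + scale * rng.standard_normal((n_samples, x.size))
    y = f(Z)
    # Proximity weights: samples closer to x matter more.
    d2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-d2 / kernel_width ** 2)
    # Weighted ridge regression on inputs centred at x, plus an intercept.
    A = np.hstack([Z - x, np.ones((n_samples, 1))])
    AtW = A.T * w
    reg = ridge * np.eye(A.shape[1])
    reg[-1, -1] = 0.0  # do not penalise the intercept
    coef = np.linalg.solve(AtW @ A + reg, AtW @ y)
    return coef[:-1], coef[-1]

x0 = np.array([0.5, 0.0])
weights, intercept = explain_locally(black_box, x0)
```

    Around `x0` the surrogate assigns a clearly positive weight to the first feature and a near-zero weight to the second, which matches the local behaviour of the toy function at that point.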


    The specific implementations of this framework are not well argued, and many improvements are possible. (E.g., the 0-1 formulation of explanations is limited, since many other visual methods for explanation exist; the K-LASSO algorithm they proposed seems to be inefficient.)

    Data: Image, Text

    Model: General classification model

    Explanation Type: instance-level and model-level, model-agnostic explanation

    Evaluation: simulated user experiments; user experiments (selecting the best classifier, improving a classifier, comparison).

  2. Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. 2015. Principles of Explanatory Debugging to Personalize Interactive Machine Learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI ’15). ACM, New York, NY, USA.

    This paper presents how explanations can help users build better mental models of an interactive machine learning system so that they can personalize their learning systems more efficiently. This paper was the first to show the effectiveness of explanations in interactive learning systems.

    One important contribution of the paper is its summary of principles for Explanatory Debugging (for interactive learning systems). The basic “Explainability” principle is a good reference for designing algorithms or interfaces for general explainable/interpretable ML systems. However, the sub-principles of “Soundness” and “Completeness”, though necessary for building a faithful mental model, do not seem necessary in the context of explainable ML. It is acceptable to use a simpler model to locally approximate and “explain” the model to be explained.

    Another downside of the paper is that the technique it proposes is only applicable to the Naive Bayes classifier and can only serve as an illustration of the concept.

    Data: Text

    Model: Naive Bayes Classifier

    Explanation Type: instance-level, model-specific explanation

    Evaluation: A complete end-user study with careful discussion

  3. David Martens and Foster Provost. 2014. Explaining data-driven document classifications. MIS Q. 38, 1 (March 2014).

    This paper distinguishes 2 types of explanations: Global Explanations and Instance-level Explanations.

    The paper claims to contribute a new format of explanation for text classifiers: a minimal set of words.
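
    One way to read "a minimal set of words" is as a set of words whose removal flips the predicted class. The sketch below is a greedy search under assumptions, not the paper's algorithm: the linear scorer, `WEIGHTS`, `BIAS`, and `explain_by_removal` are hypothetical, and the pruning step only guarantees that no single removed word can be put back, not global minimality.

```python
# Hypothetical word weights of a linear "is this page about sports?" scorer.
WEIGHTS = {"football": 2.0, "goal": 1.5, "league": 1.0, "recipe": -2.0, "the": 0.0}
BIAS = -1.5

def score(words):
    """Positive-class score for a set of words (binary bag-of-words)."""
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)

def explain_by_removal(words, score_fn, threshold=0.0):
    """Greedily remove words until the predicted class flips.

    Returns the list of removed words, or None if the document is not
    classified as positive or no flip is found.
    """
    remaining = set(words)
    if score_fn(remaining) <= threshold:
        return None
    removed = []
    while remaining and score_fn(remaining) > threshold:
        # Remove the word whose removal lowers the score the most
        # (sorted() makes tie-breaking deterministic).
        best = min(sorted(remaining), key=lambda w: score_fn(remaining - {w}))
        remaining.remove(best)
        removed.append(best)
    if score_fn(remaining) > threshold:
        return None
    # Prune: put back any word that is not needed for the flip.
    for w in list(removed):
        if score_fn(remaining | {w}) <= threshold:
            remaining.add(w)
            removed.remove(w)
    return removed

doc = {"the", "football", "league", "goal"}
explanation = explain_by_removal(doc, score)
```

    Because the search only needs a scoring function, the same loop can be tried on any text classifier, although each step costs one model call per remaining word.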

    Data: Text (Web pages)

    Explanation Type: instance-level, model-agnostic explanation

    Model: General classification model for text

  4. Baehrens, David, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. Journal of Machine Learning Research 11.Jun (2010).

    This paper proposes a gradient-based framework for explaining the decision for each individual data point. It defines the explanation vector as the gradient of the class probability w.r.t. the inputs. The paper also notes that for classifiers that do not output probabilities (e.g. SVM), the probability gradient method does not apply directly. For these cases, it uses a probability-based model g’ to approximate the model g.

    Using the gradient of the prediction with respect to the input at a local point is an intuitive way to explain a model. If the prediction is very sensitive to a feature at the data point, then this feature is important there, since the prediction changes a lot when the feature is perturbed. However, gradient-based methods can easily fail for modern ML models (e.g., the local gradient may mean little for a very noisy, wavy function). There are also cases where the gradient with respect to a feature is small even though the feature contributes a lot to the prediction.
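
    A minimal sketch of a gradient-style explanation vector, assuming a hypothetical logistic model (`W`, `prob_positive`) and a finite-difference estimate instead of an analytic gradient. It also shows the failure case mentioned above: in a saturated region the gradient is tiny even though the first feature drives the prediction.

```python
import numpy as np

W = np.array([4.0, 0.0, -1.0])  # hypothetical weights of a logistic model

def prob_positive(x):
    """P(y = 1 | x) for the hypothetical logistic classifier."""
    return 1.0 / (1.0 + np.exp(-np.dot(W, x)))

def explanation_vector(p, x, h=1e-5):
    """Central finite-difference estimate of the gradient of p at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (p(x + step) - p(x - step)) / (2.0 * h)
    return grad

# Near the decision boundary the gradient highlights feature 0.
near_boundary = explanation_vector(prob_positive, [0.1, 1.0, 0.2])
# Far from the boundary the probability saturates and the gradient vanishes,
# although feature 0 is still what pushes the prediction towards class 1.
saturated = explanation_vector(prob_positive, [3.0, 1.0, 0.2])
```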

    Data: Image (USPS digits), Chemical compound structure data (a vector of counts of 142 molecular substructures)

    Explanation Type: instance-level, model-agnostic explanation

    Model: General classification model

  5. Féraud, Raphael, and Fabrice Clérot. A methodology to explain neural network classification. Neural Networks 15.2 (2002).

    This is the earliest paper I found that explicitly applies the term “explain” to machine learning models. It focuses on the interpretability of one specific kind of model: neural networks. In the early days, explaining an ML model was not that difficult. Given the limited computational power and data at that time, most popular ML models were not complex and had no strong need for explainability.

    This paper presents an intuitive methodology to explain neural networks (multi-layer perceptrons):

    • train a neural net
    • select features based on the net’s predictions/activations
    • train a new neural net on the selected features only
    • cluster the hidden-layer representation of the new net
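
    The four steps above can be sketched end to end on toy data. Everything here is an assumption made for illustration rather than the paper's procedure: the tiny NumPy MLP, the shuffle-based `saliency` score used for feature selection, plain k-means with k=3, and the synthetic continuous data (the paper works with categorical data).

```python
import numpy as np

def train_mlp(X, y, n_hidden=4, epochs=1500, lr=0.5, seed=0):
    """Train a one-hidden-layer tanh MLP (sigmoid output) by gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        err = (p - y) / len(y)                    # d(mean log loss)/d(logit)
        dH = np.outer(err, W2) * (1.0 - H ** 2)   # backprop through tanh
        W2 = W2 - lr * (H.T @ err)
        b2 = b2 - lr * err.sum()
        W1 = W1 - lr * (X.T @ dH)
        b1 = b1 - lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def hidden_layer(params, X):
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)

def predict_proba(params, X):
    _, _, W2, b2 = params
    return 1.0 / (1.0 + np.exp(-(hidden_layer(params, X) @ W2 + b2)))

def saliency(params, X, seed=0):
    """Mean absolute output change when one feature is shuffled across rows."""
    rng = np.random.default_rng(seed)
    base = predict_proba(params, X)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(X[:, j])
        scores.append(np.mean(np.abs(predict_proba(params, Xp) - base)))
    return np.array(scores)

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

# Toy data (an assumption): only features 0 and 1 determine the label.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

net = train_mlp(X, y)                                    # 1. train a net
selected = np.sort(np.argsort(saliency(net, X))[-2:])    # 2. select features
small_net = train_mlp(X[:, selected], y)                 # 3. retrain on them
clusters = kmeans(hidden_layer(small_net, X[:, selected]), k=3)  # 4. cluster
accuracy = np.mean((predict_proba(small_net, X[:, selected]) > 0.5) == y)
```

    The sketch makes the open questions listed below concrete: the choice of `k` and the number of selected features are free parameters that the explanation depends on.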

    A few problems of this method:

    • Why train a new net? If the raw features result in unexplainable clusters, I don’t think a newly trained model can explain the previous one.
    • Problems related to the clustering algorithm. E.g., how should k be chosen, and how good are the clusters?

    Data: categorical data

    Model: vanilla neural network (MLP)

    Explanation Type: Model-level explanation

    Evaluation: Use the clustering of the hidden representation directly for classification and measure the classification accuracy.

  6. Bach, Sebastian, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, and Wojciech Samek. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLOS ONE 10(7) (2015)

    A technique paper that proposes layer-wise relevance propagation (LRP) for explaining deep neural networks (CNNs) that have layer-wise architectures.
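
    To make the propagation idea concrete, here is a minimal sketch of an epsilon-stabilised relevance rule on a tiny dense ReLU network. The network, its weights, and the helper `lrp_dense` are hypothetical, and dense layers stand in for the convolutional layers of an image classifier; with zero biases the relevance is (approximately) conserved from layer to layer.

```python
import numpy as np

def lrp_dense(a, W, b, R_out, eps=1e-6):
    """Redistribute relevance R_out of a dense layer's outputs to its inputs."""
    z = a @ W + b                                   # pre-activations
    z = z + eps * np.where(z >= 0.0, 1.0, -1.0)     # stabiliser against z = 0
    s = R_out / z
    return a * (W @ s)

# Tiny hypothetical 3-4-2 ReLU network with zero biases.
W1 = np.array([[1.0, -1.0, 0.5, 0.0],
               [0.0, 1.0, -1.0, 2.0],
               [0.5, 0.0, 1.0, -1.0]])
b1 = np.zeros(4)
W2 = np.array([[1.0, -1.0],
               [0.5, 0.5],
               [-0.5, 1.0],
               [2.0, 0.0]])
b2 = np.zeros(2)

x = np.array([1.0, -0.5, 2.0])
h = np.maximum(0.0, x @ W1 + b1)   # hidden ReLU activations
out = h @ W2 + b2                  # class scores

# Start from the score of the predicted class and propagate backwards.
target = int(np.argmax(out))
R_out = np.zeros_like(out)
R_out[target] = out[target]
R_hidden = lrp_dense(h, W2, b2, R_out)
R_input = lrp_dense(x, W1, b1, R_hidden)
```

    Each entry of `R_input` is the share of the predicted class score attributed to the corresponding input, and the entries sum to roughly `out[target]`.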

  7. Arras L, Montavon G, Müller KR, Samek W. Explaining recurrent neural network predictions in sentiment analysis. Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2017

    An extension of LRP to RNNs for sentiment analysis.

  8. Cognitive psychology for deep neural networks: a shape bias case study

  9. Network Dissection: Quantifying Interpretability of Deep Visual Representations.

  10. Understanding the Representation and Computation of Multilayer Perceptrons: A Case Study in Speech Recognition

Vis Papers

  1. Liu, M., Shi, J., Li, Z., Li, C., Zhu, J. and Liu, S., 2017. Towards better analysis of deep convolutional neural networks. IEEE transactions on visualization and computer graphics, 23(1), pp.91-100.

    The good:
    • Formulates the problem of analyzing CNNs and the requirements of the system very well.
    • The system is complete and well polished, and the paper argues for its visual design in a reasonable manner.
    • The biclustering-based edge bundling algorithm seems inspiring (it formulates the CNN as a layer-wise bi-cluster graph).
    • Good and abundant case studies demonstrate the usefulness of the design.
    The bad:
    • The system is somewhat over-designed.
      1. The activation matrix could probably be better co-designed with the learned feature matrix.
      2. Biclustering-based edge bundling is not intuitive.
    • The system is hard to rebuild and lacks generalization and scalability. E.g., when the network gets too large, the learned feature matrix becomes vague and hard to understand.

    Further thinking: This paper has already done a lot of work on CNNs, although it is not perfect. Improving the system or designing a new one for CNNs would not be as meaningful as research. Focusing on diagnosing and understanding RNNs seems to be a better idea. The techniques and ideas of this work can serve as a reference for RNNVis.

  2. RNNVis

  3. LSTMVis (Harvard)

  1. Mark W. Craven and Jude W. Shavlik. Extracting tree-structured representations of trained networks. In Proceedings of the 8th International Conference on Neural Information Processing Systems (NIPS) (1995).