
Trends and Trajectories for Explainable, Accountable and

Intelligible Systems: An HCI Research Agenda


ABSTRACT
Advances in artificial intelligence, sensors and big data management have far-reaching
societal impacts. As these systems augment our everyday lives, it becomes increasingly
important for people to understand them and remain in control. We investigate how HCI
researchers can help to develop accountable systems by performing a literature analysis of
289 core papers on explanations and explainable systems, as well as 12,412 citing papers.
Using topic modeling, co-occurrence and network analysis, we mapped the research space
from diverse domains, such as algorithmic accountability, interpretable machine learning,
context-awareness, cognitive psychology, and software learnability. We reveal fading and
burgeoning trends in explainable systems, and identify domains that are closely connected
or mostly isolated. The time is ripe for the HCI community to ensure that the powerful new
autonomous systems have intelligible interfaces built-in. From our results, we propose
several implications and directions for future research towards this goal.

MOTIVATION
The advantages of AI have been extensively discussed in the medical literature. AI can use
sophisticated algorithms to 'learn' features from a large volume of healthcare data and then
use the obtained insights to assist clinical practice. It can also be equipped with learning and
self-correcting abilities to improve its accuracy based on feedback. An AI system can assist
physicians by providing up-to-date medical information from journals, textbooks and clinical
practice to inform proper patient care. In addition, an AI system can help to reduce the
diagnostic and therapeutic errors that are inevitable in human clinical practice. Moreover, an
AI system can extract useful information from a large patient population to assist in making
real-time inferences for health risk alerts and health outcome prediction.

INTRODUCTION
Artificial Intelligence (AI) and Machine Learning (ML) algorithms process sensor data from
our devices and support advanced features of various services that we use every day. With
recent advances in machine learning, digital technology is increasingly integrating
automation through algorithmic decision-making. Yet, there is a fundamental challenge in
balancing these powerful capabilities provided by machine learning with designing
technology that people feel empowered by. To achieve this, people should be able to
understand how the technology may affect them, trust it and feel in control. Indeed, prior
work has identified issues people encounter when this is not the case (e.g., with smart
thermostats and smart homes). Algorithmic decision-making can also affect people when they
are not directly interacting with an interface. Algorithms are used by stakeholders to assist
in decision-making in domains such as urban planning, disease diagnosis, predicting
insurance risk or the risk of committing future crimes, and they may be biased. To address these
problems, machine learning algorithms need to be able to explain how they arrive at their
decisions. There has been increased attention to interpretable, fair, accountable and
transparent algorithms in the AI and ML communities, with examples such as DARPA's
Explainable AI (XAI) initiative and the "human-interpretable machine learning" community.
Recently, the European Union approved a data protection law that includes a "right to
explanation", and the USACM released a statement on algorithmic transparency and
accountability. The time is clearly ripe for researchers to confront the challenge of
designing transparent technology head on.
However, much work in the AI and ML communities tends to suffer from a lack of usability,
practical interpretability and efficacy on real users. Given HCI's focus on technology that
benefits people, we, as a community, should take the lead to ensure that new intelligent
systems are transparent from the ground up. The need for interfaces that allow users "to
better understand underlying computational processes" and give users "the potential to
better control their (the algorithms') actions" has been identified as one of the grand
challenges for HCI researchers. Several researchers are already contributing towards this
goal, e.g. with research on interacting with machine-learning systems, algorithmic fairness
and accountability.
As a first step towards defining an HCI research agenda for explainable systems, this paper
maps the broad landscape around explainable systems research and identifies opportunities
for HCI researchers to help develop autonomous and intelligent systems that are
explainable by design. We make three contributions as a first step towards this agenda:
• Based on a literature analysis of 289 core papers and 12,412 citing papers, we
provide an overview of research from diverse domains relevant to explainable
systems, such as algorithmic accountability, interpretable machine learning, context-
awareness, cognitive psychology, and software learnability.

• We reveal fading and burgeoning trends in explainable systems and identify domains
that are closely connected or mostly isolated.

• We propose several implications and directions for future research in HCI towards
achieving this goal.

RELATED WORK

We grouped prior work into three areas: related landscape articles relevant to
explainable artificial intelligence, work on intelligibility and interpretability in HCI,
and methods to analyze trends in a research topic. For brevity and to foreshadow
the results of our literature analysis, we will highlight only a few key research areas.
Explainable Artificial Intelligence (XAI) Systems

There has been a surge of interest in explainable artificial intelligence (XAI) in recent years,
driven by DARPA's initiative to fund XAI. Historically, there has been occasional interest in
explanations of intelligent systems over the past decades, with expert systems in the 1970s,
Bayesian networks and artificial neural networks in the 1980s, and recommender systems in
the 2000s. The recent successes of AI and machine learning in many highly visible
applications, and the use of increasingly complex and non-transparent algorithms such as
deep learning, call for another wave of interest in the need to better understand these
systems.

The response from the AI and ML communities has been strong, with a wide range of
workshops: Explanation-aware Computing (ExaCt) at ECAI 2008 and 2010-2012 and at the
AAAI Symposia in 2005 and 2007, the Fairness, Accountability, and Transparency (FAT-ML)
workshop at KDD 2014-2017, the ICML 2016 Workshop on Human Interpretability in
Machine Learning (WHI), the NIPS 2016 workshop on Interpretable ML for Complex Systems,
and the IJCAI 2017 Workshop on Explainable AI. Workshops have also been organized at HCI
venues: CHI 2017 Designing for Uncertainty in HCI, CHI 2016 Human-Centred Machine
Learning, and IUI 2018 Explainable Smart Systems.
This has produced a myriad of algorithmic and mathematical methods to explain the inner
workings of machine learning models. However, despite their mathematical rigor, these
works suffer from a lack of usability, practical interpretability and efficacy on real users. For
instance, Lipton argues that there is no clear agreement on what interpretability means, and
provides a taxonomy of both the reasons for interpretability and the methods to achieve it.
Intelligibility and Explainable Systems Research in HCI
The challenges of interaction with intelligent and autonomous systems have been discussed
in the HCI community for decades.
In the late 90s and early 2000s, low-cost sensors and mobile devices drove research in
context-aware computing forward. It also became clear that people needed to be able to
understand what was being sensed and which actions were being taken based on that
information. Researchers argued for systems to provide accounts of their behavior.
This fuelled further work in supporting intelligibility. Researchers proposed tailored
interfaces that explained underlying context-aware rules, or provided textual and visual
explanations for these rules. Other works explored how to design for implicit interaction
with sensing-based systems and help people predict what will happen through feedforward.
Another relevant stream of work in the intelligent user interfaces community explored how
end-users make sense of and control machine-learned programs, working towards
intelligible and debuggable machine-learned programs. The importance of understandability
and predictability has also been recognized in interaction with autonomous vehicles.
HCI researchers have also studied algorithmic transparency and accountability, for example
from a computational journalism perspective that curates a list of newsworthy algorithms
used by the U.S. government. Similarly, efforts have been made in the information
visualization and visual analytics communities to visualize ML algorithms. Unfortunately, the
streams of research on explainable systems in the AI and ML communities and in the HCI
community tend to be relatively isolated, which we demonstrate in our analysis. Therefore,
this work lays out the relevant domains involved in explanations and understanding and sets
out an HCI research agenda for explainable systems.

RESEARCH PROBLEMS
A complex set of issues exists at the intersection of AI development and the application
divide between the Global North and the Global South. Some of the thematic areas include
health and wellbeing, education, and humanitarian crisis mitigation, as well as cross-cutting
themes such as data and infrastructure, law and governance, and algorithms and design,
among others. We are examining the core areas and cross-cutting themes through research,
events, and multi-stakeholder dialogues. The following materials are informed by the
multitude of these efforts, which incorporate perspectives from a wide array of experts in
this emerging field.
The problem of non-neutrality is further compounded by the opaqueness of machine
learning and AI: it’s often difficult or impossible to know why an algorithm has made a
decision. Certain algorithms act as a “black box” where it is impossible to determine how
the output was produced, while the details of other algorithms are considered proprietary
trade secrets by the private companies who develop and market AI systems. Thus, we often
find ourselves in situations in which neither the person using the algorithm nor the person
being categorized by the algorithm can explain why certain determinations are made. A
common claim among companies that use AI is that algorithms are effectively neutral actors
that make decisions in an unbiased manner unlike humans who are affected by their
prejudices. However, as machine-learning algorithms are typically trained to make decisions
based on past data, the application of invisible AI is liable to reinforce existing inequalities.
BASIC MODELS:

SYSTEM DESIGN AND ITS DETAILS
Network Theme
With the citation network, we can identify which communities are connected or isolated
and recognize patterns over time. Next, we need to understand what the key topics in each
community are, how closely coupled communities relate to one another, and why citing
papers cite the core papers. Hence, we divided the network into four sub-networks
containing communities that are close to one another and performed topic modelling on
the abstracts of the core papers, the abstracts of the citing papers, and the paragraphs of
the citing papers where the core papers are referenced. The intuition behind this is that the
topics in an abstract describe the paper in general, whereas the citing paragraph contains
more contextual information that relates the core paper to the citing paper.
Modelling
With the 289 core and 12,412 citing papers identified, we attempted to download the
openly available PDFs where accessible. The downloaded PDFs were then processed using
Grobid to extract the abstracts (of both core and citing papers) as well as the citing
paragraphs (from the citing papers). The extracted text from the citing abstracts and citing
paragraphs was further processed by removing non-alphanumeric characters, stemming
plural forms to singular, and removing stop words. After preprocessing, we had 289 core
abstracts, 6,597 citing abstracts and 10,891 citing paragraphs. Compared to the citation
network, we have roughly 47% fewer citing papers because (a) not all PDFs are freely
available to be retrieved by the automated crawling process and (b) we restrict our analysis
to English-language texts.
elevated amount topics, and see how proximate bunches identify with one another, we
dissect four sub networks from the reference arrange as appeared in Figure 1. Beginning
from the base left quadrant and moving a clockwise way, we see causality and brain
research of clarifications. The upper left quadrant comprises of algorithmic reasonableness,
straightforwardness also, interpretable machine learning. The focal part of the upper right
quadrant comprises of the densest of the sub networks, identifying with shrewd and
encompassing frameworks. At long last, the focal part of the base right quadrant comprises
of associations and programming learn ability and is firmly associated with encompassing
and astute frameworks. For each of the sub networks, we played out the accompanying
advances:
1) Iterative Open Coding of Topics in Core Abstracts: The authors of this paper reviewed the
core abstracts to label the core papers with the identified topics. Our labels were then
combined with the author-provided keywords of the core papers. This final set of keywords
was iteratively refined until a concise set of topic keywords emerged.
2) Topic Modelling of Citing Abstracts: We performed LDA-based topic modelling using
Gibbs sampling on the preprocessed text of the citing abstracts. LDA is a generative
probabilistic model: given a bag-of-words representation of documents, LDA treats each
document as a distribution over topics and each topic as a distribution over words. LDA
requires the number of topics as one of its inputs to learn the topic distributions. While
there are metrics available to help with the selection of the number of topics, they are far
from perfect beyond standard benchmark datasets. To determine the number of topics, we
use the R package ldatuning, which implements several such metrics, as a starting point and
iteratively perform topic modelling until the number of topics gives a good trade-off
between overly broad and overly specific. LDA produces unlabeled topics. To label them, we
generate the top 30 words for each topic and manually name them. Multiple co-authors
discussed the labels to converge on the final agreed names.
3) Topic Modelling of Citing Paragraphs: This is the same as step 2, but for the preprocessed
texts of the citing paragraphs.
4) Topic Network Generation: After the topic modelling and labelling are complete, we have
a set of topics for each of the three sources of text: core abstracts, citing abstracts and citing
paragraphs. We then use the respective trained LDA model for each text and predict up to
the five most likely topics that occur with probability greater than a uniform distribution
over the number of topics. To understand the relationships between these topics, we create
a topic network by building an undirected graph as follows (a code sketch follows the edge
description):
1. Nodes:
Each unique topic from each of the sources of texts is treated as a node. We have three
types of nodes: citing-paragraph topics (green), citing-abstract topics (yellow) and
core-abstract topics (red).
2. Edges:
We have a total of six distinct types of edges, weighted by the frequency of co-occurrence.
• Co-occurrence edges:
There are three types of co-occurrence edges: core abstract to core abstract (red), citing
abstract to citing abstract (yellow), and citing paragraph to citing paragraph (green). These
capture which topics within a set of texts co-occur.
• Network edges:
There are three types of network edges. Relationships between topics in the core and the
citing papers are captured by citing-abstract-to-core-abstract edges (orange), while
citing-paragraph-to-citing-abstract edges (light green) provide context on why a citing paper
cites a core paper.
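The sketch below illustrates this graph construction with networkx, under stated assumptions: doc_topics, source_of and cites are hypothetical placeholders for the predicted topic lists, the text source of each document and the citation links; they are not variables from the original paper.

from itertools import combinations
import networkx as nx

NODE_COLOURS = {"core_abstract": "red",
                "citing_abstract": "yellow",
                "citing_paragraph": "green"}

def build_topic_network(doc_topics, source_of, cites):
    # doc_topics: {doc_id: [topic, ...]} predicted topics per text (placeholder)
    # source_of:  {doc_id: "core_abstract" | "citing_abstract" | "citing_paragraph"}
    # cites:      {citing_doc_id: [core_doc_id, ...]} citation links (placeholder)
    G = nx.Graph()
    for doc_id, topics in doc_topics.items():
        src = source_of[doc_id]
        for t in topics:
            G.add_node((src, t), colour=NODE_COLOURS[src])
        # co-occurrence edges: topics predicted for the same piece of text
        for a, b in combinations(sorted(topics), 2):
            _bump(G, (src, a), (src, b), kind="co-occurrence")
    # network edges: link topics of a citing text to topics of the cited core paper
    for citing_id, core_ids in cites.items():
        for core_id in core_ids:
            for a in doc_topics.get(citing_id, []):
                for b in doc_topics.get(core_id, []):
                    _bump(G, (source_of[citing_id], a), (source_of[core_id], b),
                          kind="network")
    return G

def _bump(G, u, v, kind):
    # add the edge or increase its co-occurrence weight by one
    if G.has_edge(u, v):
        G[u][v]["weight"] += 1
    else:
        G.add_edge(u, v, weight=1, kind=kind)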

RESULT AND ANALYSIS


The usual approach to give an overview of the state of the art and to survey a research topic
is to perform a literature review. Examples in HCI are Jansen et al.'s research agenda for
data physicalization, Chong et al.'s overview of device association, Pierce and Paulos' review
of energy-related HCI, Froehlich et al.'s eco-feedback technology survey, Grosse-
Puppendahl et al.'s capacitive sensing review and Grossman's software learnability survey.
In this paper, we seek to examine trends across multiple academic domains, covering a large
number of papers. Therefore, we use topic modelling, a semi-automated technique, to
perform our literature analysis. Bibliometric analysis has been used previously to
characterize research areas. Most relevant to our work is Liu and colleagues' co-word
analysis to analyze trends and relationships between different concepts for the Ubicomp
and CHI communities. Co-word analysis is an established literature analysis method that has
been used to survey psychology, software engineering and stem cell research. It identifies
clusters of keywords that frequently appear together in papers. For instance, Liu and
colleagues automatically extract keywords from papers to perform a keyword analysis. In
this work, we investigate a considerably larger dataset (> 12,000 research papers) from
numerous domains covering around 100 publication venues, rather than a single
conference. We performed Latent Dirichlet Allocation (LDA) based topic modelling together
with co-occurrence analysis to map out the research space. In summary, while our review is
HCI-centric, it covers a larger number of papers and a broader range of research areas than
traditional literature surveys. We perform a data-driven literature analysis and support
further exploration through visualization.
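As an illustration of the basic counting step that underlies co-word analysis, the sketch below tallies how often pairs of keywords appear together in the same paper; the keyword sets are hypothetical and not taken from the analyzed corpus.

from collections import Counter
from itertools import combinations

# Hypothetical author keywords for three papers.
paper_keywords = [
    {"intelligibility", "context-awareness", "explanation"},
    {"interpretability", "machine learning", "explanation"},
    {"machine learning", "fairness", "accountability"},
]

cooccurrence = Counter()
for keywords in paper_keywords:
    # every unordered pair of keywords in the same paper counts once
    for pair in combinations(sorted(keywords), 2):
        cooccurrence[pair] += 1

# Frequently co-occurring pairs indicate closely related concepts; clustering
# this weighted pair list yields the keyword clusters used in co-word analysis.
print(cooccurrence.most_common(3))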

USAGE
One of the main computational and scientific challenges in the modern age is to extract
useful information from unstructured texts. Topic models are one popular machine-learning
approach that infers the latent topical structure of a collection of documents. Despite their
success - in particular of the most widely used variant called Latent Dirichlet Allocation
(LDA) - and numerous applications in sociology, history, and linguistics, topic models are
known to suffer from severe conceptual and practical problems, e.g. a lack of justification
for the Bayesian priors, discrepancies with statistical properties of real texts, and the
inability to properly choose the number of topics. Here we obtain a fresh view of the
problem of identifying topical structures by relating it to the problem of finding
communities in complex networks. This is achieved by representing text corpora as bipartite
networks of documents and words. By adapting existing community detection methods -
using a stochastic block model (SBM) with non-parametric priors - we obtain a more flexible
and principled framework for topic modelling (e.g., it automatically detects the number of
topics and hierarchically clusters both the words and the documents). The analysis of
artificial and real corpora demonstrates that this SBM approach leads to better topic models
than LDA in terms of statistical model selection. More importantly, this work shows how to
formally relate methods from community detection and topic modelling, opening the
possibility of cross-fertilization between these two fields.
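The following sketch shows the document-word bipartite representation on which this SBM-based approach operates; the toy corpus is illustrative, and the community-detection step itself would be performed with a non-parametric SBM implementation (for example, the nested block models in the graph-tool library), which is only indicated in a comment here.

from collections import Counter
import networkx as nx

# Toy corpus; in practice this would be the preprocessed abstracts/paragraphs.
corpus = {
    "doc1": "explainable systems need usable explanations",
    "doc2": "topic models infer latent structure from document collections",
    "doc3": "usable explanations help people trust intelligent systems",
}

G = nx.Graph()
for doc_id, text in corpus.items():
    G.add_node(doc_id, kind="document")
    for word, count in Counter(text.lower().split()).items():
        G.add_node(word, kind="word")
        # edge weight = number of occurrences of the word in the document
        G.add_edge(doc_id, word, weight=count)

# Fitting a non-parametric stochastic block model to G (e.g., with graph-tool)
# would then group word nodes into "topics" and document nodes into clusters,
# inferring the number of groups rather than fixing it in advance.
print(G.number_of_nodes(), G.number_of_edges())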

METHODS
1. You tell the algorithm how many topics you think there are:
You can either use an informed estimate (e.g. results from a previous analysis), or
simply trial and error. In trying different estimates, you may pick the one that
produces topics at your desired level of interpretability, or the one yielding the
highest statistical certainty (i.e. log likelihood). In our example above, the number of
topics might be inferred just by eyeballing the documents.

2. The algorithm will assign each word to a temporary topic:

Topic assignments are temporary, as they will be updated in Step 3. Temporary
topics are assigned to each word in a semi-random manner (according to a Dirichlet
distribution, to be exact). This also means that if a word appears twice, each
occurrence may be assigned to different topics. Note that in analyzing real
documents, function words (e.g. "the", "and", "my") are removed and not assigned
to any topics.

3. The algorithm will check and update topic assignments:

It loops through each word in each document. For each word, its topic assignment is
updated based on two criteria:
How prevalent is that word across topics?
How prevalent are topics in the document?
To see how these two criteria work, imagine that we are now checking the topic
assignment for the word "fish" in Doc Y:

• How prevalent is that word across topics? Since "fish" words across the two
documents comprise nearly half of the remaining Topic F words but 0% of the
remaining Topic P words, a "fish" word picked at random would more likely be about
Topic F.

• How prevalent are topics in the document? Since the words in Doc Y are assigned to
Topic F and Topic P in a 50-50 ratio, the remaining "fish" word seems equally likely to
be about either topic.

Weighing the conclusions from the two criteria, we would assign the "fish" word of
Doc Y to Topic F. Doc Y might then be a document on what to feed cats.
The process of checking topic assignments is repeated for each word in each
document, cycling through the entire collection of documents multiple times. This
iterative updating is the key feature of LDA that produces a final solution with
coherent topics. A small end-to-end illustration follows.
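Here is a sketch of the choose-K / fit / inspect workflow described above, using the gensim library. Note that gensim's LdaModel uses online variational Bayes rather than the collapsed Gibbs sampler described in the steps, and the toy documents and num_topics=2 are arbitrary choices for illustration.

from gensim import corpora, models

docs = [
    "cats eat fish and fish oil",
    "dogs play fetch in the park",
    "fish swim and cats watch the fish",
    "dogs and cats are popular pets",
]
stopwords = {"the", "and", "in", "are"}                 # drop function words
texts = [[w for w in d.lower().split() if w not in stopwords] for d in docs]

dictionary = corpora.Dictionary(texts)                  # word <-> id mapping
bow = [dictionary.doc2bow(t) for t in texts]            # bag-of-words corpus

lda = models.LdaModel(bow, num_topics=2,                # step 1: pick K
                      id2word=dictionary,
                      passes=50, random_state=0)        # steps 2-3: fit

for topic_id, words in lda.print_topics(num_words=5):   # inspect and label
    print(topic_id, words)

In practice one would re-fit with several values of num_topics and compare the resulting log likelihood or topic interpretability, as described in step 1.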
FLOW CHARTS:

DISCUSSION AND IMPLICATIONS
Based on our analysis of the citation and topic networks, we articulate trends, trajectories
and research opportunities for HCI in explainable systems. We articulate insights based on:
(i) reading and analyzing sample papers from different communities; (ii) closeness of
communities indicated by separation.
Trend: Production Rules to Machine Learning
Interestingly, we found that the earlier methods to explain machine learning models (Classifier
Explainers) were more strongly related to the Expert Systems and Bayesian Networks
clusters, rather than the new cluster on Interpretable ML. This suggests that the new
approaches could (i) be using different techniques and the older methods are obsolete, (ii)
be targeting very different problems, or (iii) have neglected the past. Furthermore, while
rule extraction is an old research area, and quite separated from the recent developments
in iML and FAT, there have been new papers within the past few years, such as methods for
rule extraction from deep neural networks [187, 75]. Therefore, we should reflect on the
past research topics to rediscover methods that could be suitable for current use cases.
Trend: Individual Trust to Systematic Accountability
While research on explainable Intelligent and Ambient Systems (I&A) and Interpretable
Machine Learning (iML) has focused on the need to explain to single end-users to gain their
individual trust, research on fair, accountable and transparent (FAT) algorithms aims to
address macroscopic societal accountability. There is a shift in the perceived demand for
intelligibility from the individual's need for understanding to their need for institutional
trust. This requires understanding requirements arising from social contexts rather than just
from usability or human cognitive psychology. Therefore, it is important to draw insights
from social science too.
Trajectory: Road to Rigorous and Usable Intelligibility
Explainable AI (including FAT and Interpretable Machine Learning) focuses on the
mathematics to transform a complex or "black box" model into a simpler one, or to create
mathematically interpretable models. While these are significant contributions, they tend
to neglect the human side of the explanations and whether they are usable and practical in
real-world situations. Research in HCI on explainable interfaces has demonstrated the value
of explanations to users with classical machine learning models (e.g., linear SVM, naïve
Bayes); these are good starting points for understanding how users use explanations.
However, real-world applications and datasets have challenges which require more
sophisticated models (e.g., missing data, high dimensionality, and unstructured data).
Furthermore, sometimes the reason the system does not work may lie outside the system,
which may require the provision of additional information that is not related to the internal
mechanics of the system. To improve rigor, as we test the effectiveness of explanation
interfaces, we should use real-world data with functional complex models that deal with
these intricacies.
LIMITATIONS
While we have made every effort to ensure broad coverage of papers relevant to
explainable systems, we have yet to analyze other research domains which cover
explanations, such as theories from philosophy and the social sciences, Bayesian and
constraint-based cognitive tutors, and relevance feedback and personalization in
information retrieval. It is possible that the process of manual curation of the initial set of
core explanation papers may have introduced a bias. This could be improved by iterative
citation tracing, where citing papers identified as central to explanation systems could be
added to the core explanation papers, and by backward reference searching (chain
searching) to discover the root explanation papers of the core explanation papers. Such an
iterative process would provide further evidence for gaps or collaborations between the
various communities. Our topic modeling led to many nodes that we filtered out to improve
readability and interpretability. However, this filtered out interesting but less frequent
topics, such as the need for intelligibility in energy-efficient smart homes to explain smart
thermostats or smart laundry agents. Nevertheless, the semi-automated analysis methods
allowed us to capture important trends and relationships across many papers spanning
many domains. We assume that the citing paragraph is relevant to the core paper that it
cites. While this assumption could be questioned (e.g., Marshall reported on shallow
citation practices in CHI papers), the bibliometric analysis method that we employed,
including the use of citation network visualizations, is well-established. Given the scale of
our citation network spanning multiple domains, we also expect that our results will be less
sensitive to noise.

CONCLUSION
Recent advances in machine learning and artificial intelligence have far-reaching impacts on
society at large. While researchers in the ML and AI communities are working on making
their algorithms explainable, their focus is not on usable, practical and effective
transparency that works for and benefits people. Given HCI’s core interest in technology
that empowers people, this is a gap that we as a community can help to address, to ensure
that these new and powerful technologies are designed with intelligibility from the ground
up. From a literature analysis of 12,412 papers citing 289 core papers on explainable
systems, we mapped the research space from diverse domains related to explainable
systems. We revealed fading vs. burgeoning trends and connected vs. isolated domains, and
from this, extracted several implications, future directions and opportunities for HCI
researchers. While this is only a first step, we argue that true progress towards explainable
systems can only be made through interdisciplinary collaborations, where expertise from
different fields (e.g., machine learning, cognitive psychology, human computer interaction)
is combined and concepts and techniques are further developed from multiple perspectives
to move research forward.
CONTRIBUTION

• KHALIL-UR-REHMAN (Page 1 - 2)
• IBRAR-UMAR (Page 3 - 4)
• AIMAN MUNIR (Page 5 - 7)
• HIRA BATOOL (Page 7 - 9)
• NEHA RAJA (Page 10 - 11)

