
Trust and explainability for artificial intelligence

Artificial Intelligence (AI) has regained centre stage in the last few years. Military applications utilizing geospatial datasets and deep learning techniques based on convolutional neural networks (CNNs) have also matured to a deployable state in developed countries. These applications can provide suggestions of tactical importance in real time. They may also be in control of an autonomous platform and therefore execute decisions in real time. In such scenarios, which may involve destruction and loss of life, it is very important that the human operator or analyst has confidence in the suggestion made by the application or the decision taken by it. Deep neural networks are particularly opaque in their internal working, and it is difficult to understand why a particular suggestion or decision has been made. Advances in AI, geospatial technologies and deep CNNs can be adopted by the military only if the algorithm can be trusted and can provide an explanation for its actions.

One of the most glaring examples of lack of trust in an application of AI and geospatial technology is the risk assessment of individuals for committing crimes, used in a number of US states. Many of these states use risk assessment software that has come under scrutiny and has been criticized for racially discriminating against the individuals being assessed. A report by the US-based publication ProPublica [1] found that one of the most widely used programs, COMPAS, mislabels blacks as high risk twice as often as it mislabels whites. COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is based on major theories of criminality, criminal personality, social isolation, etc. In spite of this sound theoretical basis, the software is racially biased. The most probable cause is the underlying data used to train the program. Risk assessment has been carried out in many states since the 1970s, and for many years race, poor neighbourhood, etc. were widely used as high-risk indicators. This practice later became unacceptable and was discontinued in the 1980s. Since the features used included race and race-correlated attributes, AI-based algorithms also trained themselves to be racially biased. The challenge today is to maintain the performance of the algorithm while ignoring race, gender and race-correlated data such as neighbourhood. It is important for such algorithms to be fair, as the risk score is referred to by a judge while deciding on the quantum of bond or punishment. The implications of a wrong decision are very grave; the fairness of the judiciary depends on the fairness and explainability of the algorithm.
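The kind of disparity ProPublica reported can be checked by any third party with access to the tool's predictions and later-observed outcomes. Below is a minimal sketch of such an audit; the records are synthetic and purely illustrative, not real COMPAS data.

```python
def false_positive_rate(records, group):
    """Fraction of people in `group` who did NOT reoffend but were
    nevertheless labelled high risk."""
    negatives = [r for r in records if r["group"] == group and not r["reoffended"]]
    if not negatives:
        return 0.0
    mislabelled = sum(1 for r in negatives if r["high_risk"])
    return mislabelled / len(negatives)

# Synthetic audit records: predicted label plus observed outcome per person.
records = [
    {"group": "A", "high_risk": True,  "reoffended": False},
    {"group": "A", "high_risk": False, "reoffended": False},
    {"group": "A", "high_risk": False, "reoffended": False},
    {"group": "A", "high_risk": False, "reoffended": False},
    {"group": "A", "high_risk": True,  "reoffended": True},
    {"group": "B", "high_risk": True,  "reoffended": False},
    {"group": "B", "high_risk": True,  "reoffended": False},
    {"group": "B", "high_risk": False, "reoffended": False},
    {"group": "B", "high_risk": False, "reoffended": False},
    {"group": "B", "high_risk": True,  "reoffended": True},
]

fpr_a = false_positive_rate(records, "A")
fpr_b = false_positive_rate(records, "B")
print(f"FPR disparity (B vs A): {fpr_b / fpr_a:.1f}x")
```

An audit of this form requires no access to the proprietary model internals, only its outputs, which is precisely what the opaqueness of COMPAS-like tools currently prevents.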

COMPAS is proprietary software, and its algorithm is not disclosed by the designers. Other software packages such as LSI-R, PSA and LARNA are used by various law enforcement authorities, but surprisingly none of them reveals its algorithm or provides any explanation of the risk scores it assigns. This opaqueness has made the algorithms a target of civil rights activists.

Predictive policing [2] is also now being used by many law enforcement agencies. It involves predicting crimes and their locations, likely offenders, perpetrators' identities and potential victims. One of the available software packages is PredPol. Norcross, a small town of about 16,000 people in Georgia, USA, claimed to have observed a 15-30% reduction in crime [3, 4] after adopting PredPol. Concerns of racial profiling were raised by the media in this case too. The software again is proprietary and does not provide any explanation for the predictions it makes, although some human-observable clustering can be seen in the predicted locations of high crime probability [5], which are positively correlated with the locations of previously committed crimes.
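PredPol's actual algorithm is proprietary (it reportedly builds on self-exciting point process models of crime [5]). As a transparent stand-in, the sketch below scores candidate patrol cells by a Gaussian-kernel-weighted count of past incidents, which illustrates why predicted hotspots end up correlated with previous crime locations. All coordinates and the bandwidth are made up for illustration.

```python
import math

def hotspot_scores(past_incidents, cells, bandwidth=1.0):
    """Score each candidate cell by a Gaussian-kernel-weighted count of past
    incidents; higher scores mean closer to more historical crime."""
    scores = {}
    for (cx, cy) in cells:
        total = 0.0
        for (ix, iy) in past_incidents:
            d2 = (cx - ix) ** 2 + (cy - iy) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        scores[(cx, cy)] = total
    return scores

incidents = [(1, 1), (1, 2), (2, 1), (9, 9)]   # historical crime locations
cells = [(1, 1), (5, 5), (9, 9)]               # candidate patrol cells
scores = hotspot_scores(incidents, cells)
print(max(scores, key=scores.get))             # cell nearest the incident cluster
```

Unlike a proprietary black box, every score here can be traced back to the specific past incidents that produced it, which is exactly the auditability the critics of these tools are asking for.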

Visual perception is one field that has come into the limelight due to advancements in deep learning. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been organized annually since 2010, and image classification and object detection scores have been steadily rising: the best score for object detection was 0.37 in 2014 and 0.66 in 2016 [6]. Visual perception has perhaps benefited geospatial applications the most, in tasks like land-use classification, scene segmentation, geologic feature classification, crop yield prediction, surface water estimation and population density estimation. Military applications like satellite image analysis and autonomous surveillance can also benefit from advancement in this field. With little explanation of how these algorithms work, however, they are unlikely to gain the trust of users or military commanders. The cost of misclassification in a military scenario can be enormous; for example, misidentification of a target can be catastrophic.

Autonomous cars, flying drones, underwater vehicles and robots can be expected to be deployed on the future battlefield, for national security, and in relief and rescue operations. They will be thoroughly tested before deployment; however, actual conditions are likely to present unanticipated scenarios that the reinforcement learning modules of these autonomous agents may not have come across earlier. At present, the operator in the field does not understand why a decision has been taken by the autonomous agent.

Attempts have been made to use computers as expert systems in the medical domain since the 1970s. Deep neural networks are now being used to help physicians carry out more accurate diagnoses. However, physicians generally do not accept the advice of a system without explanation [7, 8]. Seeking explanation is a spontaneous and fundamental activity of human understanding. In situations involving decisions of significant consequence, the humans responsible for the operation and upkeep of AI applications will need to fully understand and trust them. Explanations support core human cognitive processes such as induction, learning and conceptual representation [9], and are hence intrinsically necessary for trust.

The five principles considered core to instilling trust and accountability in an algorithm are accuracy, auditability, responsibility, fairness and explainability [10]:

  • Accuracy: AI algorithms do not predict with 100% probability; there will always be the possibility of incorrect classification or inaccurate regression. The application must be able to account for all possible uncertainties and present them in an understandable manner.
  • Auditability: An authorized third party should be able to analyze the behaviour and internal steps of the algorithm to ascertain fallibilities, which may help further improve it.
  • Responsibility: Any application based on AI needs a team or individual who takes ownership of it and, in case of undesired results, is empowered to take corrective action.
  • Fairness: In applications handling public services, like the risk assessment of an individual by law enforcement, the training data may already be biased, and measures need to be taken to avoid such scenarios. In applications concerning military and national security, fairness will also imply 'fair' resource allocation which can be presented to the human decision maker.
  • Explainability: The authorized stakeholders should be able to understand 'how' and 'why' a particular decision has been arrived at by the algorithm. The explanations also need to be presented in a manner understandable by all the stakeholders, not just the technical members of the team.
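The accuracy principle, accounting for uncertainty and presenting it understandably, can be made concrete by reporting the model's confidence alongside its prediction and deferring to a human analyst when confidence is low. A hedged sketch follows; the class labels, logits and 0.8 threshold are illustrative assumptions, not values from any deployed system.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to one."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_deferral(logits, labels, threshold=0.8):
    """Return (label, confidence), or defer to a human analyst when the
    model's confidence falls below `threshold`."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return ("refer to human analyst", probs[best])
    return (labels[best], probs[best])

labels = ["vehicle", "building", "vegetation"]
print(classify_with_deferral([4.0, 0.5, 0.2], labels))   # confident prediction
print(classify_with_deferral([1.2, 1.0, 0.9], labels))   # ambiguous, defers
```

Surfacing a confidence number and an explicit "refer to human" outcome keeps the operator in the loop exactly where the algorithm is least reliable.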

Military decision making reflects the OODA cycle proposed by Col (Retd) John Boyd of the US Air Force [11]. OODA stands for Observe–Orient–Decide–Act, which has become a basic tenet of manoeuvre warfare. Any autonomous agent or AI-based system will also have to fit into the OODA loop. A trustworthy algorithm whose actions can be explained will be accepted easily and quickly by military commanders and can be seamlessly integrated into decision making, making one's own OODA loop faster than the enemy's. For instance, an autonomous agent whose actions can be explained and audited can be deployed in minimum time without worrying about its outcome. Being able to audit the actions of autonomous agents at a later stage is also important, especially in case of undesirable incidents like the Tesla car's fatal crash while in autopilot mode and a car crash caused by Google's autonomous car.

Google autonomous car

If law enforcement applications for predictive policing and risk assessment can be made trustworthy by making them fair, auditable, accurate and explainable, backed by a responsible team of developers, their adoption can increase in the US with public support rather than public criticism. With successful deployment in the US, other countries, perhaps including India, could also adopt AI technology in law enforcement. Effective use of AI technology, with greater public acceptance, can possibly lead to lower crime rates and higher conviction rates.

Medicine is another field that can benefit immensely if physicians can trust AI systems such as computer-assisted diagnosis, which can lead to better healthcare through timely medical intervention. Remote diagnostics can be augmented with such AI applications, helping specialist medical services proliferate in rural areas. Epidemiology can use AI applications to predict the spread of infections and take timely preventive action. Since such actions may involve large-scale public involvement, it is very important that the predictions are accurate, explainable and auditable, and that public health executives trust these applications.

There is a growing interest in algorithms and trust, primarily from the perspective of privacy and fairness. Forums like the 'Data Transparency Lab' and 'Fairness, Accountability and Transparency in Machine Learning (FAT-ML)' have been organizing annual events to make researchers aware of the issues concerning trust in algorithms and to explore computationally rigorous methods to address them [12]. The growth of interest in FAT-ML can be gauged from the fact that 13 papers were submitted to FAT-ML 2016, against only four in 2015 and none in 2014. Earlier work on explaining the recommendations and actions of AI applications was in the fields of medicine [13] and tactical simulation for training [14]. More recently, image classifications have been explained by generating descriptions of the features of the image [15].

Interest in Explainable AI (XAI) will continue to grow as interest in military applications of AI grows. China has been investing aggressively in AI research over the past several years, possibly as part of its 863 Programme (National High-Tech R&D Programme) and Project 985 to promote and develop the Chinese higher education system. As per the SCImago country ranking, China has been publishing more papers than the USA in the AI category of the subject area Computer Science since 2007. In this category China published 87,000 papers between 1996 and 2015, the USA 71,000 and India 14,000 in the same period [16]. The ILSVRC has seen steadily increasing participation by teams from China: in ILSVRC-2015, 25 of 67 teams were from China [17], including four of the five winning teams, and in ILSVRC-2016, 33 of 82 teams were from China, as were all the winning teams [18]. It does not take a lot of imagination to figure out the military potential of the algorithms employed in ILSVRC. With very little modification, these algorithms can be used in surveillance, monitoring and autonomous weapons platforms like UAVs, self-driving land vehicles and robots.

There is tremendous interest in the field of XAI and a lot of work is being done; however, there are also difficult challenges. It is ironic that the problem of explainability in AI can be considered a result of the success of AI. In the early days of AI, with human-readable reasoning [13], it was possible to understand the steps. With the advent of deep neural networks, reinforcement learning, support vector machines, random forests, etc., it became more difficult to understand the internal working of the algorithms. Poor quality of the underlying training data is also responsible for reduced trust in algorithms: if there are human-introduced biases in the data, they will be reflected in the output of the algorithms too. For example, risk assessment of individuals for law enforcement was found to be racially biased [1] because the training data came from a period when racial profiling was in vogue. With the advent of GPUs and ASICs for increased performance of deep learning algorithms, it may become even more difficult to incorporate explainability into the algorithms.
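One model-agnostic way to probe an opaque model, short of full explainability, is occlusion (perturbation) analysis: mask parts of the input and measure how much the score drops. The sketch below uses a toy linear "black box" whose weights are known only so the recovered importances can be sanity-checked; the technique itself applies to any scoring function, including a deep network.

```python
def occlusion_importance(model, features, baseline=0.0):
    """Estimate each feature's importance by replacing it with `baseline`
    and measuring how much the model's score drops."""
    base_score = model(features)
    importance = []
    for i in range(len(features)):
        masked = list(features)
        masked[i] = baseline          # occlude one feature at a time
        importance.append(base_score - model(masked))
    return importance

# Stand-in black box: any scoring function works; a linear model with known
# weights lets us verify that the probe recovers the right importances.
weights = [1.0, -2.0, 3.0]
black_box = lambda f: sum(w * x for w, x in zip(weights, f))

print(occlusion_importance(black_box, [1.0, 1.0, 1.0]))  # recovers [1.0, -2.0, 3.0]
```

For image classifiers the same idea is applied by sliding a grey patch over the image; the resulting sensitivity map shows which regions drove the decision without requiring access to the model's internals.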

More work is certainly required to incorporate explainability into existing algorithms, or algorithms need to be designed with explainability intrinsically built in. This approach may require revisiting reasoning frameworks and incorporating them into learning mechanisms, as in case-based reasoning approaches. In certain cases where automated data collection is possible, the Internet of Things (IoT) can help: IoT sensors can collect more accurate and detailed data, which can be used to produce more accurate models.

DARPA has also shown interest in incorporating explanation into AI-based algorithms and has sought proposals on XAI; the programme is expected to complete by 2021 [19]. Deep neural networks and other AI applications utilizing geospatial datasets and providing geo-intelligence are going to proliferate in almost every domain, be it military, medical, law enforcement, entertainment or scientific research. These applications can be best utilized only when users can fully trust them. While research continues to enhance performance in various fields of AI, it will be prudent to address issues of trust and explainability ab initio. A collaborative and concerted effort will lead to trustworthy and explainable AI applications which will solve a large number of humanity's problems and protect us against internal and external threats.

References:

[1]   Machine Bias; ProPublica; Julia Angwin; 23 May 2016.

[2]   Predictive Policing; RAND; Walter L. Perry et al; 2013.

[3]   The Future of Crime Fighting; The World Post; 28 March 2016.

[4]   https://www.predpol.com/results; accessed 07 December 2016.

[5]   Self-Exciting Point Process Modeling of Crime; G. O. Mohler et al; Journal of the American Statistical Association; March 2011.

[6]   ImageNet Large Scale Visual Recognition Challenge; Olga Russakovsky et al; International Journal of Computer Vision; December 2015.

[7]   An analysis of physician attitudes regarding computer-based clinical consultation systems; Teach RL, Shortliffe EH; Comput Biomed Res; 1981.

[8]   Decision support systems for clinical radiological practice; SM Stivaros et al; The British Journal of Radiology; 2010.

[9]   The structure and function of explanations; Lombrozo, Tania; Trends in Cognitive Sciences; October 2006.

[10]   How to hold algorithms accountable; Nicholas Diakopoulos et al; MIT Technology Review; November 2016.

[11]   A Critique of the Boyd Theory; Major Robert Polk, United States Army; Dec 1999.

[12]   https://www.fatml.org/ accessed 01 December 2016.

[13]   A Model of Inexact Reasoning in Medicine; Edward H. Shortliffe; Mathematical Biosciences; April 1975.

[14]   Agents that Learn to Explain Themselves; W. Lewis Johnson; AAAI Proceedings; 1994.

[15]   Generating Visual Explanations; Lisa Hendricks et al; Computer Vision; September 2016.

[16]   https://www.scimagojr.com/countryrank.php?area=1700&category=1702 accessed 01 December 2016.

[17]   https://image-net.org/challenges/LSVRC/2015/results accessed 01 December 2016.

[18]   https://image-net.org/challenges/LSVRC/2016/results accessed 01 December 2016.

[19]   Explainable Artificial Intelligence (XAI); DARPA-BAA-16-53; Aug 2016.