arXiv:1902.02279v1 [cs.AI] 6 Feb 2019

Coordinación de Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro 1, Santa María Tonantzintla, Puebla, México.

Abstract

We define a Causal Decision Problem as a Decision Problem in which the available actions, the family of uncertain events, and the set of outcomes are related through the variables of a Causal Graphical Model G. We propose a solution criterion based on Pearl's do-calculus and the Expected Utility criterion for rational preferences. Implementing this criterion leads to an on-line decision-making procedure that has been shown to perform similarly to classic Reinforcement Learning algorithms while also allowing a causal model of the environment to be learned. We thus aim to provide theoretical guarantees of the usefulness and optimality of a decision-making procedure based on causal information.

Introduction

Decision making under uncertainty is a fundamental part of intelligent reasoning (Lake et al. 2017), and many real-world applications, such as self-driving cars, rely on decisions made by an autonomous agent. Current decision-making methods rely on associative methods, which find only statistical patterns in data. In contrast, causal knowledge supports planning and counterfactual reasoning as well as interpretability and explainability (Spirtes, Glymour, and Scheines 2000), (Woodward 2005), (Pearl and Mackenzie 2018). In this work we propose a way of incorporating causal information into decision making under uncertainty with rational preferences, such that optimal actions are chosen according to the principle of Maximum Expected Utility, which is the formal criterion for making choices under uncertainty if rationality is assumed (Bernardo 2000).

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Rationality and Expected Utility

Rationality in a decision-making setting is defined axiomatically, in such a way that the preferences of a decision maker are logically consistent. If rational preferences are assumed, then it is known that the coherent criterion for making choices is the maximization of expected utility, either with respect to a known utility function and probability distribution or with respect to a pair of subjective ones (Bernardo 2000), (Gilboa 2009). Rational decision making has been the standard theory in economics, both as a normative and as a descriptive theory of human behavior, and it has been the subject of multiple debates (Tversky and Kahneman 1974), (Kahneman, Slovic, and Tversky 1982). In this work we take a normative view for a rational decision maker who faces an uncertain environment controlled by some unknown causal mechanism.
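For concreteness, the expected utility criterion referred to above can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the original text: $u$ is a utility function over outcomes, $P(c \mid a)$ is the (possibly subjective) probability of outcome $c$ when action $a$ is chosen, and the outcome set $\mathcal{C}$ is assumed finite.

```latex
\[
  \mathbb{E}[u \mid a] = \sum_{c \in \mathcal{C}} u(c)\, P(c \mid a),
  \qquad
  a^{*} \in \arg\max_{a \in \mathcal{A}} \mathbb{E}[u \mid a].
\]
```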

Optimal Policies

Given a Reinforcement Learning problem defined over a Markov Decision Process, a policy is a function from the space of states to the space of actions, interpreted as what an agent should do in a given state (Sutton and Barto 1998). An optimal policy is one that achieves the maximum possible expected reward. Optimal policies can be characterized by the Bellman equations (Puterman 2005), from which finding an optimal policy can be shown to be equivalent to finding the optimal action in the sense of maximum expected utility (Webb 2007).
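As a reminder, one standard form of the Bellman optimality equation for a finite, discounted Markov Decision Process is given below. The symbols are assumptions made for this illustration and do not appear in the original text: $R(s, a)$ is the expected immediate reward, $P(s' \mid s, a)$ the transition probability, and $\gamma \in [0, 1)$ a discount factor.

```latex
\[
  V^{*}(s) = \max_{a \in \mathcal{A}}
    \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big],
  \qquad
  \pi^{*}(s) \in \arg\max_{a \in \mathcal{A}}
    \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big].
\]
```

Read this way, an optimal policy picks in each state an action with maximal expected discounted return, which is the sense in which optimal policies relate to expected utility maximization.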

Related Work

Human use and learning of causal relations has been extensively studied by cognitive scientists. In particular, (Hagmayer and Sloman 2009), (Wellen and Danks 2012), and (Hagmayer and Meder 2013) show that human beings conceive of their actions in their environment as interventions on it, and that they are able to learn, use, and modify previous causal knowledge during a sequential decision process. From the Machine Learning point of view, (Lattimore, Lattimore, and Reid 2016) consider a bandit problem where the actions available to an autonomous agent are interventions on a known causal model. Their work requires the causal model to be known, an assumption later relaxed by (Sen et al. 2017), who treat only a part of the causal model as unknown. Our formulation of a Causal Decision Problem attempts to give a framework in which an agent learns optimal actions when a causal model controls its environment and the agent is aware of this.

Causal Decision Problems

We define a Causal Decision Problem under Uncertainty (CDPU) as a tuple $(\mathcal{A}, \mathcal{E}, \mathcal{C}, G, \preceq)$ where $(\mathcal{A}, \mathcal{E}, \mathcal{C}, \preceq)$ is a classical Decision Problem under Uncertainty (Bernardo 2000), with $\preceq$ the preference relation of the decision maker, and $G$ is a Causal Graphical Model (Sucar 2015) such that the set of available actions $\mathcal{A}$ and the set of outcomes $\mathcal{C}$ are related through the variables of the causal model $G$; i.e., the events in the family $\mathcal{E}$ correspond to variables in $G$. It is assumed that the agent does not know the causal model $G$, which is equivalent to not knowing the probabilities of the events $E \in \mathcal{E}$. The model $G$ is also assumed to remain fixed, to be invariant under interventions (Woodward 2005), and to satisfy the conditions expressed in (Spirtes, Glymour, and Scheines 2000). The variable in $G$ which encodes the consequence of the action taken by the agent will be referred to as the target variable, since it is the variable in which the agent wishes to obtain a desired result. In this way, in a CDPU we have a rational agent who chooses an action $a$ among those available in $\mathcal{A}$; this action then produces some random effect in the environment, which in turn causes a certain consequence, or outcome, $c \in \mathcal{C}$.
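The setting described above can be illustrated with a small simulated environment. This is only a sketch under stated assumptions, not the model used by the authors: the class name `ToyCausalEnvironment`, the chain structure action → X → Y, the binary variables, and all probability values are hypothetical choices made for the example. The agent would only see the actions and the sampled outcome; the probability tables stand in for the unknown model G.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class ToyCausalEnvironment:
    """Hypothetical CDPU environment with chain: action -> X -> Y.

    X is the random effect produced by the action, Y is the binary
    target variable. The tables below play the role of the model G,
    which the agent is assumed not to know.
    """

    p_x_given_do: Dict[str, float]  # P(X = 1 | do(a)) for each action a
    p_y_given_x: Dict[int, float]   # P(Y = 1 | X = x) for x in {0, 1}

    def act(self, action: str, rng: random.Random) -> int:
        """Intervene with `action` and sample the target variable Y."""
        x = 1 if rng.random() < self.p_x_given_do[action] else 0
        return 1 if rng.random() < self.p_y_given_x[x] else 0

    def p_y_do(self, action: str) -> float:
        """Exact P(Y = 1 | do(a)), obtained by summing over X."""
        p_x1 = self.p_x_given_do[action]
        return p_x1 * self.p_y_given_x[1] + (1.0 - p_x1) * self.p_y_given_x[0]


# Illustrative numbers only.
env = ToyCausalEnvironment(
    p_x_given_do={"treat": 0.9, "wait": 0.2},
    p_y_given_x={0: 0.1, 1: 0.8},
)
outcome = env.act("treat", random.Random(0))  # a single sampled consequence
```

With these illustrative numbers, `env.p_y_do("treat")` is 0.9 · 0.8 + 0.1 · 0.1 = 0.73 and `env.p_y_do("wait")` is 0.2 · 0.8 + 0.8 · 0.1 = 0.24, quantities the agent cannot compute directly because it does not know the tables.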

Proposed Solution

Since rationality is assumed, we must seek to maximize the expected utility of the agent in terms of his current knowledge, which is expressed as a (subjective) probability distribution. Given the agent's awareness of a causal mechanism governing the environment, intuition suggests using causal relations to bring about a desirable outcome, as expressed by (Joyce 1999). Consider a Causal Decision Problem as stated above where the target variable, call it $Y$, takes its values in the set $\{0, 1\}$, and without loss of generality assume that 1 is the desired output for the agent. Then he must choose the action $a^{*} \in \mathcal{A}$ such that

$$P(Y = 1 \mid do(a^{*})) \geq P(Y = 1 \mid do(a)) \quad \text{for all } a \in \mathcal{A}.$$

Since the action chosen is the one with the highest probability of producing the desired outcome, it is the action that maximizes the expected utility for the agent. Indeed, if the utility $u$ satisfies $u(1) > u(0)$, the expected utility of an action $a$ is $u(0) + (u(1) - u(0))\, P(Y = 1 \mid do(a))$, which is increasing in $P(Y = 1 \mid do(a))$. If an action $a' \in \mathcal{A}$ yielded a higher expected utility, it would have a higher probability of causing the same desired outcome, which is not possible because of how $a^{*}$ is obtained.

On-line decision making and causal learning

In (Gonzalez-Soto, Sucar, and Escalante 2018) a decision-making procedure was proposed that combines the solution proposed above with Bayesian belief updating, in order to learn an optimal action while acquiring a causal model of the environment and using the current causal knowledge to make choices. It was applied in the simpler case where the decision maker knows the structure of the model $G$. In that work, the decision maker held beliefs about the causal information of the environment, encoded as probability distributions. Those beliefs were used as if they were the true model governing the environment in order to choose an action according to the solution proposed here. Beliefs were updated in a Bayesian way after observing the causally produced outcome, or consequence, of the action chosen by the agent.

The actions learned after a series of decision rounds achieved similar performance (in terms of average reward) to an agent learning with the classical Q-Learning procedure. Experimentally, this indicates that causal information allows an agent to learn an optimal action, in the sense of expected utility, while also learning a causal model of the environment.

Experiments

In (Gonzalez-Soto, Sucar, and Escalante 2018) a test scenario in which a medic tries to learn an optimal treatment for a sick patient was used to show how causal information can effectively guide a decision-making process while also allowing a causal model to be learned. We reproduce here the results obtained in (Gonzalez-Soto, Sucar, and Escalante 2018), which show the average reward of a causal agent that knows the structure of the graphical model and holds beliefs about the parameters of the true causal model. In each decision round, the beliefs of the causal agent are treated as if they were the truth about the causal relations and are used as stated above. A comparison against an agent that simply chooses at random is also shown. In Figure 1 we observe the average reward obtained by the three agents in 100 rounds, where our algorithm slightly outperforms Q-Learning.
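To make the belief-based procedure concrete, here is a minimal sketch of one way such a loop could look. It is not the authors' implementation and rests on several simplifying assumptions: beliefs are placed directly on P(Y = 1 | do(a)) for each action as independent Beta distributions (rather than on the parameters of a structured model G), "using the beliefs as if they were the truth" is realized by drawing one sample from the current beliefs in each round, and the function names and probability values are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Beliefs = Dict[str, List[float]]  # action -> [alpha, beta] of a Beta belief


def choose_action(beliefs: Beliefs, rng: random.Random) -> str:
    """Draw one value of P(Y=1 | do(a)) per action from the current beliefs,
    treat the draws as if they were true, and return the maximizing action."""
    draws = {a: rng.betavariate(ab[0], ab[1]) for a, ab in beliefs.items()}
    return max(draws, key=draws.get)


def update(beliefs: Beliefs, action: str, y: int) -> None:
    """Conjugate Bayesian update of the Beta belief after observing Y = y."""
    if y == 1:
        beliefs[action][0] += 1.0
    else:
        beliefs[action][1] += 1.0


def run(true_p: Dict[str, float], rounds: int, seed: int = 0) -> Tuple[Beliefs, float]:
    """Run the on-line loop; `true_p` is hidden from the decision rule and
    is used only to simulate the outcome caused by the chosen action."""
    rng = random.Random(seed)
    beliefs: Beliefs = {a: [1.0, 1.0] for a in true_p}  # uniform priors
    total_reward = 0
    for _ in range(rounds):
        action = choose_action(beliefs, rng)
        y = 1 if rng.random() < true_p[action] else 0
        update(beliefs, action, y)
        total_reward += y
    return beliefs, total_reward / rounds


# Illustrative values for P(Y = 1 | do(a)); not taken from the experiments.
beliefs, average_reward = run({"treat": 0.73, "wait": 0.24}, rounds=2000)
```

After enough rounds the beliefs concentrate on the better action, so the average reward approaches the larger of the two success probabilities. A random agent or a Q-Learning agent could be run against the same simulated outcomes for comparison.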

Figure 1: Average reward obtained in each round for each agent.

In Figure 2 we observe the average reward obtained by the three agents in 200 rounds. The average reward obtained is very similar for Q-Learning and our algorithm.

Figure 2: Average reward obtained in each round for each agent.

Future Work

Numerical results show that the trajectories of the average rewards of our Causal Agent and of the Q-Learning agent appear to stay together after some number of rounds, while leaving the random agent behind. For this to count as a valid form of concurrent validation, this behavior must persist from a certain point onward. We thus state a conjecture which is yet to be proven:

Conjecture 1. Let $(X_1, X_2, \ldots) \in \mathbb{R}^{\infty}$ be the rewards obtained by a decision-making procedure which is known to converge to the maximum expected utility (or an optimal policy). If $(Y_1, Y_2, \ldots) \in \mathbb{R}^{\infty}$ are the rewards obtained by a decision-making procedure that uses causal information in the way we propose, then for all $\varepsilon > 0$ there exists an $N_{\varepsilon} \in \mathbb{N}$ such that $|X_t - Y_t| < \varepsilon$ for any $t > N_{\varepsilon}$.

Conclusions

We have proposed an optimality criterion for decision making under uncertainty when the environment in which the agent is situated is governed by an unknown causal mechanism. This criterion, when used to build a decision-making procedure, yields performance similar to that of classical algorithms which aim at the same objective, maximizing expected utility. This shows that causal information, and causal-based decision making, is at least as useful as the associative-based case, while also allowing a causal model of the environment to be learned. The experiments shown serve as a form of concurrent validation, where our proposed method is compared to a decision-making method that is known to learn optimal policies (i.e., to maximize expected utility).

Learning a causal model is useful because of the interpretability and explainability that it provides when analyzing why a particular decision was made. For the hypothetical scenario used in the experiments, the causal model allows for further inquiries about the choices made by the medic. As (Pearl and Mackenzie 2018) mention, the three levels of causal learning are observing, intervening, and counterfactual reasoning. Our proposed decision-making method, based on the solution stated above, allows each of the three levels to be used. First, it allows the agent to observe (and learn) the effects of interventions. Second, it allows the agent to intervene. Third, it gives the ability to explain why a particular choice was made in terms of the effects it would produce given a certain level of causal knowledge.

It is left as future work to provide a decision-making procedure for the case in which the causal model is completely unknown to the agent, and to provide theoretical convergence results both to the true causal model and to the optimal action.

References

Bernardo, J. 2000. Bayesian Theory. Wiley Series in Probability and Statistics.
Gilboa, I. 2009. Theory of Decision under Uncertainty. Cambridge University Press.
Gonzalez-Soto, M.; Sucar, L. E.; and Escalante, H. J. 2018. Playing against nature: causal discovery for decision making under uncertainty. arXiv preprint arXiv:1807.01268.
Hagmayer, Y., and Meder, B. 2013. Repeated causal decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition 39(1):33.
Hagmayer, Y., and Sloman, S. A. 2009. Decision makers conceive of their choices as interventions. Journal of Experimental Psychology: General 138(1):22.
Joyce, J. M. 1999. The Foundations of Causal Decision Theory. Cambridge University Press.
Kahneman, D.; Slovic, P.; and Tversky, A. 1982. Judgment under uncertainty. Technical report, Cambridge University Press.
Lake, B. M.; Ullman, T. D.; Tenenbaum, J. B.; and Gershman, S. J. 2017. Building machines that learn and think like people. Behavioral and Brain Sciences 40.
Lattimore, F.; Lattimore, T.; and Reid, M. D. 2016. Causal bandits: Learning good interventions via causal inference. In Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; and Garnett, R., eds., Advances in Neural Information Processing Systems 29. Curran Associates, Inc. 1181–1189.
Pearl, J., and Mackenzie, D. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
Puterman, M. L. 2005. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics.
Sen, R.; Shanmugam, K.; Dimakis, A. G.; and Shakkottai, S. 2017. Identifying best interventions through online importance sampling. In International Conference on Machine Learning, 3057–3066.
Spirtes, P.; Glymour, C. N.; and Scheines, R. 2000. Causation, Prediction, and Search. MIT Press.
Sucar, L. E. 2015. Probabilistic graphical models. Advances in Computer Vision and Pattern Recognition. London: Springer London. doi 10:978–1.
Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.
Tversky, A., and Kahneman, D. 1974. Judgment under uncertainty: Heuristics and biases. Science 185(4157):1124–1131.
Webb, J. N. 2007. Game Theory: Decisions, Interaction and Evolution. Springer Undergraduate Mathematics Series.
Wellen, S., and Danks, D. 2012. Learning causal structure through local prediction-error learning. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 34.
Woodward, J. 2005. Making Things Happen: A Theory of Causal Explanation. Oxford University Press.