Gamification of Neuropsychological Decision Making Experiments
This project is already assigned.
1. Motivation
Reinforcement Learning (RL) is an adaptive process in which an agent uses its past experiences to improve the outcomes of its future decisions. It can be described as a framework in which agents interact with an environment by observing its current state, taking actions, and observing the new states and rewards returned by the environment, with the goal of maximizing the cumulative reward over all actions. If the probabilities of transitioning from one state to another are described by a model that the algorithm makes use of, it is a model-based RL algorithm; when no such model is available, the algorithm is called model-free [1] [2].
Reinforcement Learning can also be described as sequential decision-making under uncertainty with the objective of optimizing reward. Sequential decision-making assumes that the environment is Markov, meaning that the future of the system depends only on the current state and the actions taken next, not on its history. There are different types of sequential decision-making environments:
- In so-called Bandits, actions do not influence future observations and rewards are immediate.
- In Markov Decision Processes and Partially Observable Markov Decision Processes, actions do influence future observations and rewards may be delayed.
In the second case, maximizing reward may mean choosing actions with lower rewards in the short term in order to reach higher rewards later. This may be easier to achieve using a model-based approach.
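To make the distinction concrete, the sketch below shows a tabular Q-learning agent, a standard model-free algorithm [1], acting in a hypothetical two-stage toy environment. The environment, its reward probabilities, and all hyperparameter values are assumptions chosen only to keep the example minimal and runnable, not part of the thesis itself.

```python
import random
from collections import defaultdict

# Hypothetical two-state toy environment, used only to keep the example
# self-contained; it is not the environment studied in this thesis.
def step(state, action):
    if state == 0:
        return 1, 0.0                       # first stage: no reward, move to stage 2
    # second stage: reward probability depends on the chosen action
    reward_prob = 0.7 if action == 0 else 0.3
    reward = 1.0 if random.random() < reward_prob else 0.0
    return 0, reward                        # back to the first stage

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1       # learning rate, discount, exploration rate
q = defaultdict(lambda: [0.0, 0.0])         # Q-table: state -> value of each action

state = 0
for _ in range(10_000):
    # epsilon-greedy action selection; no transition model is used (model-free)
    if random.random() < EPSILON:
        action = random.randrange(2)
    else:
        action = 0 if q[state][0] >= q[state][1] else 1
    next_state, reward = step(state, action)
    # Q-learning update: nudge the estimate toward reward + discounted best next value
    q[state][action] += ALPHA * (reward + GAMMA * max(q[next_state]) - q[state][action])
    state = next_state

print({s: [round(v, 2) for v in vals] for s, vals in q.items()})
```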
In computational neuroscience research, reinforcement learning is used to formalize and understand people’s behavior by collecting data from human participants interacting with RL environments. This makes it possible to investigate whether people are model-based or model-free learners in the specified environment. Furthermore, these techniques make it possible to uncover the strategies and policies adopted by the participants. Experiments in which human participants interact with an RL environment model and try to maximize their cumulative rewards can be designed easily. Typically, the participant is shown a collection of cues representing the current state and the available actions. The participant has to choose one of the displayed actions within a time limit. After choosing, the participant receives a reward, the new state is presented, and this simple cycle repeats. Often the reward is monetary to keep the participant motivated throughout the experiment. Usually the participant has to complete around 200 cycles for the collected data to be statistically relevant for research.
This thesis will introduce and implement a framework for translating such experiments into computer games whose gameplay yields equivalent data of at least equal quality compared to the original experiment. The computer game is designed using rigorous game design methodologies and games engineering techniques to provide an engaging environment for the player, thereby increasing the participants’ compliance with the study. Furthermore, the game design will ensure that the RL elements are translated into game elements, so that the gaming experience results in a learning experience. Because the game is developed with a game engine, it can also be easily disseminated as a web application or Android app, reaching a larger number of potential participants.
2. Related Work
In order to examine the relationship between dopamine levels and model-based vs. model-free control/behavior, Deserno et al. [4] let participants choose between two sequential pairs of cues and then either receive a monetary reward or not.
In their experiment, the participant is shown two visual cues, one on each side of the screen. The participant then has to choose a side within 2 seconds. The choice is displayed for 1.5 seconds, after which a new pair of visual cues is presented, again to be chosen from within 2 seconds. After displaying the second choice for 1.5 seconds, the outcome, either a visual representation of the monetary reward earned or nothing, is displayed for 1.5 seconds. This process is repeated many times. There are two different pairs of cues for the second choice, selected with different probabilities depending on the first choice: the pair associated with the first choice is selected with 70% probability and the other pair with 30% probability. The reward probabilities change over time to encourage continuous updating of action values. Deserno et al. show that participants use a balance of model-based and model-free control, and that higher ventral striatal presynaptic dopamine levels correlate with a bias toward more model-based control.
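The following sketch illustrates the structure of this two-stage task as described above. The cue names and the fixed reward probabilities are placeholders (the original study lets the reward probabilities drift over time), and the `choose` argument stands in for the participant's decision.

```python
import random

FIRST_STAGE = ("A1", "A2")
SECOND_STAGE = {"A1": ("B1", "B2"), "A2": ("C1", "C2")}      # pair associated with each first choice
REWARD_PROB = {"B1": 0.8, "B2": 0.4, "C1": 0.6, "C2": 0.2}   # placeholder values only

def run_trial(choose):
    """Run one trial; `choose` maps a tuple of cues to the selected cue."""
    first = choose(FIRST_STAGE)
    other = "A2" if first == "A1" else "A1"
    # 70% "common" transition to the associated pair, 30% "rare" transition to the other pair
    second_pair = SECOND_STAGE[first if random.random() < 0.7 else other]
    second = choose(second_pair)
    reward = 1 if random.random() < REWARD_PROB[second] else 0
    return first, second, reward

# Example: a participant choosing at random over roughly 200 trials
trials = [run_trial(random.choice) for _ in range(200)]
print("total reward:", sum(r for _, _, r in trials))
```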
This thesis will use their environment model and experimental setup as the starting point for the conversion into a computer game and will try to produce comparable data.
3. Methodology
This section first describes the theoretical foundation of the framework in subsection 3.1 and then goes into more detail on the framework in 3.2. Subsection 3.3 defines the parameters of the replicated experiment and 3.4 explains the game concept derived from those parameters. Subsection 3.5 describes the process of implementing the game and making it engaging, and finally 3.6 lists the results obtained by playtesting the game with a group of participants.
3.1 Mapping between RL elements and Game Elements
The theoretical basis for the conversion between experiments on RL and computer games is a direct mapping between the elements of RL and the elements of computer games.
The environment that responds to the actions taken by the agent with a reward and a new state corresponds to the game system. The environment model is directly used by the game to decide which rewards to give and how to change the state.
The state in RL corresponds to the game state. Everything displayed to the player is a representation of the current state. In particular, the cues/stimuli presented to the experiment participants correspond to certain (visual) features of this representation of the game state.
The human participant in the experiment corresponds to the human player of the game.
The actions the RL agent can take correspond to game mechanics. Every available action has an equivalent mechanic; for example, choosing between two stimuli is equivalent to choosing between two options in the game. For any kind of action the agent can take, a corresponding game mechanic can be found in the Unifying Game Ontology, a collection of all abstract game elements and their relations used in the game design process [3].
The rewards in RL correspond to rewards in games: evaluations given after completing or failing a level, collectibles encountered while playing, or a satisfying/unsatisfying ending to the game. The reward can take many different forms, and games engineering offers many approaches to reward or punish the player in different ways that, in the end, all carry an emotional meaning and are therefore effective.
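In summary, the proposed correspondence can be written down compactly. The sketch below is only an illustrative summary of this subsection, with informal labels rather than a fixed vocabulary or API.

```python
# Informal summary of the proposed RL-to-game mapping (labels are illustrative).
RL_TO_GAME = {
    "environment / environment model": "game system and its internal rules",
    "state":                           "game state and its (visual) representation, incl. cues",
    "agent / participant":             "human player",
    "action":                          "game mechanic, e.g. choosing one of two options",
    "reward":                          "in-game reward: level outcome, collectible, score, ending",
}
```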
3.2 Details of the Framework
To convert an experiment into a corresponding game, the first step is to precisely define the parameters of the experiment.
The environment model has to be represented as a collection of states, actions, and rules specifying which state and action pairs are mapped to which reward and new state. These rules may be probabilistic as long as they guarantee a result when evaluated. This thesis proposes to represent the environment model as the transition graph of a finite automaton, which lends itself to a simple description and clearly defined evaluation rules. The nodes of this graph represent states, and the edges represent actions, probabilities, and rewards.
In addition to the finite automaton description, the experiment parameters contain a mapping between states, actions, and the cues that should be displayed, as well as additional constraints on the human player that have no equivalent in RL, such as a time limit for each decision.
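A possible programmatic representation of such experiment parameters is sketched below: the transition graph of the finite automaton plus the cue mapping and the extra player constraints. All class and field names are assumptions for illustration, not the final data model of the framework.

```python
from dataclasses import dataclass

@dataclass
class Transition:
    action: str         # action taken in the source state
    probability: float  # probability of this outcome given state and action
    reward: float       # reward returned by the environment
    next_state: str     # resulting state

@dataclass
class ExperimentParameters:
    states: list[str]
    transitions: dict[str, list[Transition]]   # source state -> outgoing edges
    cues: dict[tuple[str, str], str]           # (state, action) -> cue to display
    decision_time_limit_s: float = 2.0         # constraint with no RL equivalent

# Minimal example: one first-stage state whose two actions lead to the two
# second-stage states with the 70/30 split used in the replicated experiment.
params = ExperimentParameters(
    states=["stage1", "stage2_a", "stage2_b"],
    transitions={
        "stage1": [
            Transition("left", 0.7, 0.0, "stage2_a"),
            Transition("left", 0.3, 0.0, "stage2_b"),
            Transition("right", 0.7, 0.0, "stage2_b"),
            Transition("right", 0.3, 0.0, "stage2_a"),
        ],
    },
    cues={("stage1", "left"): "cue_left.png", ("stage1", "right"): "cue_right.png"},
)
```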
Given these experiment parameters and the mapping between RL elements and game elements, a game is designed that fits all parameters. Usually, the designed game applies to a whole class of experiments. In particular, the proposed game applies to all experiments that have a similar finite automaton description, use the same cues, and have only one additional constraint of the same type, e.g. a different time limit for the player's decision.
3.3 The Chosen Experiment
This thesis will replicate the experiment used in [4]. However, the reward probabilities changing over time will be omitted to keep the experiment setup simple.
3.4 The concrete game design
Game design following the MDA framework [5] starts with selecting the mechanics. The proposed game has the simple mechanic of choosing one of two options, each represented by a cue, before a timer runs out. If the timer runs out, the player has to restart the game. After the first choice, a second choice is presented using the same mechanic. After the second choice, a reward is presented to the player. This cycle is repeated while the cumulative reward is displayed to the player. From this simple mechanic, the dynamics and aesthetics of the game follow. This thesis proposes a minimalistic visual style so as not to distract from the cues.
The story of the game could be as follows. Some game characters (applicants) want to join a gang. As a test of courage, they have to, one after another, steal some money from the gang’s headquarters and escape without being caught by the gang members. As a reward for a successful escape, the character can keep the money and is allowed to join the gang; otherwise the character is rejected. This results in a feeling of success or failure. During the escape the character is chased by some people, which motivates the time limit for the decisions, and encounters two blockades of two people each who have a symbol on their torso (the cue). The player has to choose which of the two people to try to circumvent, with the first blockade always being overcome and the second one only if the internal environment model dictates a positive reward. After completing one test of courage, the player starts a new run, controlling the next applicant that wants to join the gang.
There is a total score over all runs counting how many applicants were successful, resulting in a cumulative reward the player needs to maximize. Only if the player completes around 200 runs/trials is the produced data statistically relevant. Therefore, different game concepts will be tested in the iterative implementation phase to choose the one that best motivates the player from a games engineering perspective.
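The core gameplay loop implied by this design could look roughly like the engine-agnostic sketch below. Here `present_options` is a hypothetical stand-in for the engine's rendering and input handling, and the reward rule is a placeholder; in the actual game both would be driven by the experiment parameters and the environment model described in 3.2.

```python
import random

DECISION_TIME_LIMIT_S = 2.0
TOTAL_RUNS = 200

def present_options(options, time_limit):
    """Hypothetical helper: show the cues and return the pick, or None on timeout."""
    # ... in the real game, poll the engine's input system until `time_limit` expires ...
    return random.choice(options)      # placeholder so the sketch runs on its own

score = 0
for run in range(TOTAL_RUNS):
    # first blockade: the choice determines which second blockade is likely (70/30)
    first = present_options(("guard_left", "guard_right"), DECISION_TIME_LIMIT_S)
    if first is None:
        continue                       # timer ran out: the applicant is caught
    common = ("sym_B1", "sym_B2") if first == "guard_left" else ("sym_C1", "sym_C2")
    rare = ("sym_C1", "sym_C2") if first == "guard_left" else ("sym_B1", "sym_B2")
    second = present_options(common if random.random() < 0.7 else rare, DECISION_TIME_LIMIT_S)
    if second is None:
        continue
    if random.random() < 0.5:          # placeholder reward rule; the real outcome comes
        score += 1                     # from the environment model / finite automaton
print("successful applicants:", score)
```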
3.5 Implementing the game
The game will be implemented starting with a programmatic representation of the experiment parameters and a proof-of-concept implementation of the game mechanics. This minimum viable product will then be refined iteratively to increase its engagement and emotional impact on the player, for example by telling the story described in subsection 3.4, changing the genre and story, adding animations and visual effects, and adding audio.
3.6 Collecting data by play testing
Once the game is developed, another improvement iteration will be carried out with human playtesters playing the game. These playtesting sessions can be used to collect data from the players and compare it to the results obtained in the original experiment [4].
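For this comparison, each trial could be logged in a simple tabular format. The record fields below are an assumption about what such a comparison would need (choices, reaction times, transition type, and reward), not a finalized schema.

```python
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class TrialRecord:
    trial: int
    first_choice: str
    first_rt_ms: float    # reaction time of the first-stage choice
    transition: str       # "common" or "rare"
    second_choice: str
    second_rt_ms: float
    reward: int           # 1 if rewarded, 0 otherwise

def save_records(records, path="playtest_log.csv"):
    """Write the collected trial records to a CSV file for later analysis."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(TrialRecord)])
        writer.writeheader()
        writer.writerows(asdict(r) for r in records)

save_records([TrialRecord(1, "left", 812.0, "common", "B1", 645.0, 1)])
```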
4. Schedule
Literature Review: Weeks (1 - 3)
- Research on Reinforcement Learning methods and concepts
- Research & Definition of RL elements suitable for the mapping approach
- Research & Classification of Games using RL elements
Conceptualization and Iterative Development: Weeks (4 - 9)
- Brainstorming and prototype development
- In this phase, we will have the prototypes tested as we progress
Thesis writing: Weeks (10 - 12)
- In this phase, all explored concepts will be analysed and synthesised in detail
- Testing will continue during this phase and the results will be added to the write-up
5. References
[1] Sutton, Richard S., and Andrew G. Barto. Introduction to reinforcement learning. Vol. 135. Cambridge: MIT Press, 1998.
[2] Szepesvári, Csaba. “Algorithms for reinforcement learning.” Synthesis lectures on artificial intelligence and machine learning 4.1 (2010): 1-103.
[3] Debus, Michael S. Unifying game ontology: a faceted classification of game elements. IT University of Copenhagen, Pervasive Interaction Technology Lab, 2019.
[4] Deserno, Lorenz, et al. “Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making.” Proceedings of the National Academy of Sciences 112.5 (2015): 1595-1600.
[5] Hunicke, Robin, Marc LeBlanc, and Robert Zubek. “MDA: A formal approach to game design and game research.” Proceedings of the AAAI Workshop on Challenges in Game AI. Vol. 4. No. 1. 2004.
Contact Persons at the University of Würzburg
Prof. Dr. Sebastian von Mammen
Games Engineering, Universität Würzburg
sebastian.von.mammen@uni-wuerzburg.de
Mounsif Chetitah (Primary Contact Person)
Games Engineering, Universität Würzburg
mounsif.chetitah@uni-wuerzburg.de
Prof. Dr. med. Lorenz Deserno
Cognitive and Computational Neuroscience in Developmental Psychiatry, Universitätsklinikum Würzburg
Deserno_L@ukw.de
Maria Waltmann
Cognitive and Computational Neuroscience in Developmental Psychiatry, Universitätsklinikum Würzburg
Waltmann_M@ukw.de