A Q-Learning Algorithm with Continuous State Space

Abstract: In this paper we study a Markov Decision Problem (MDP) with continuous state space and discrete decision variables. We propose an extension of the Q-learning algorithm, introduced by Watkins in 1989 for completely discrete MDPs, to this setting. Our algorithm relies on stochastic approximation and functional estimation, and uses kernels to locally update the Q-functions. We give a convergence proof for this algorithm under the usual assumptions. Finally, we illustrate our algorithm by solving the classical mountain car task with continuous state space.
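As a rough illustration of the scheme sketched in the abstract, the following is a minimal Python sketch of kernel-based Q-learning on the mountain car task. It only follows the general idea stated above (a non-parametric Q-function updated locally with a kernel around each visited state, with stochastic-approximation step sizes); the Gaussian kernel, bandwidth, step-size schedule, exploration rate, and episode counts are assumptions for this sketch, not the choices made in the paper.

```python
import numpy as np

GAMMA = 0.95                 # discount factor (assumed, not from the paper)
H = 0.1                      # kernel bandwidth (assumed)
ACTIONS = (-1.0, 0.0, 1.0)   # push left / coast / push right

def gaussian_kernel(x, centers, h=H):
    """Evaluate K_h(x - c) for every stored center c (Gaussian kernel)."""
    if len(centers) == 0:
        return np.zeros(0)
    d2 = np.sum((np.asarray(centers) - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * h ** 2))

class KernelQ:
    """Q(., a) stored non-parametrically as sum_i w_i * K_h(. - c_i)."""
    def __init__(self):
        self.centers = {a: [] for a in ACTIONS}
        self.weights = {a: [] for a in ACTIONS}

    def value(self, x, a):
        k = gaussian_kernel(x, self.centers[a])
        return float(k @ np.asarray(self.weights[a])) if k.size else 0.0

    def best(self, x):
        vals = [self.value(x, a) for a in ACTIONS]
        return ACTIONS[int(np.argmax(vals))], max(vals)

    def update(self, x, a, target, step):
        # Local update: append one kernel centered at the visited state,
        # moving Q(x, a) toward the TD target in a neighbourhood of x.
        delta = target - self.value(x, a)
        self.centers[a].append(np.asarray(x))
        self.weights[a].append(step * delta)

def mountain_car_step(pos, vel, a):
    """Classical mountain-car dynamics (Sutton & Barto formulation)."""
    vel = np.clip(vel + 0.001 * a - 0.0025 * np.cos(3 * pos), -0.07, 0.07)
    pos = np.clip(pos + vel, -1.2, 0.6)
    if pos == -1.2:
        vel = 0.0
    done = pos >= 0.5
    return pos, vel, (0.0 if done else -1.0), done

q, n = KernelQ(), 0
for episode in range(20):
    pos, vel = -0.5, 0.0
    for t in range(400):
        x = np.array([pos, vel])
        # epsilon-greedy exploration (rate assumed)
        if np.random.rand() < 0.1:
            a = ACTIONS[np.random.randint(3)]
        else:
            a = q.best(x)[0]
        pos, vel, r, done = mountain_car_step(pos, vel, a)
        _, v_next = q.best(np.array([pos, vel]))
        n += 1
        # Robbins-Monro step sizes: sum eps_n diverges, sum eps_n^2 converges
        q.update(x, a, r + GAMMA * (0.0 if done else v_next), 1.0 / n ** 0.6)
        if done:
            break
```

Note that storing one kernel per transition makes evaluation cost grow with the number of visits; a practical implementation would prune or merge nearby centers, but that refinement is beyond this sketch.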
Document type:
Journal article
Optimization Online, 2006

https://hal-ensta.archives-ouvertes.fr/hal-00977539
Contributor: Aurélien Arnoux
Submitted on: Friday, April 11, 2014 - 11:42:50
Last modified on: Wednesday, December 6, 2017 - 16:46:01

Identifiers

  • HAL Id : hal-00977539, version 1

Citation

Kengy Barty, Pierre Girardeau, Jean-Sébastien Roy, Cyrille Strugarek. A Q-Learning Algorithm with Continuous State Space. Optimization Online, 2006. 〈hal-00977539〉
