Presentation | 2009-09-14 Parameter acquisition of an evaluation function for games by reinforcement learning Yasuhiro TAJIMA |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In finite two-person zero-sum perfect-information games, the best move can be found by minimax search on the game tree with an evaluation function. In this paper, we propose a method for acquiring the parameters of an evaluation function by Q-learning. Our method has three variations of the reward on a state transition: (1) the winning rate of random simulations, (2) the winning rate of the output of the UCB1 algorithm, and (3) the winning rate of the UCT algorithm. We then evaluate the effectiveness of our method experimentally. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Q learning / game tree / evaluation function / k-armed bandit problem |
Paper # | COMP2009-28 |
Date of Issue |
Conference Information | |
Committee | COMP |
---|---|
Conference Date | 2009/9/7 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Theoretical Foundations of Computing (COMP) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Parameter acquisition of an evaluation function for games by reinforcement learning |
Sub Title (in English) | |
Keyword(1) | Q learning |
Keyword(2) | game tree |
Keyword(3) | evaluation function |
Keyword(4) | k-armed bandit problem |
1st Author's Name | Yasuhiro TAJIMA |
1st Author's Affiliation | Okayama Prefectural University Faculty of Information Engineering |
Date | 2009-09-14 |
Paper # | COMP2009-28 |
Volume (vol) | vol.109 |
Number (no) | 195 |
Page | pp.- |
#Pages | 6 |
Date of Issue |
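
The English abstract describes Q-learning of evaluation-function parameters with rewards taken from simulation win rates. The following is a minimal sketch of that idea, not the paper's implementation: the toy Nim game, the feature vector, the function names, and the hyperparameters `alpha` and `gamma` are all assumptions made for illustration. It covers reward variation (1), one possible reading of variation (2) in which the reward is the win rate of the move UCB1 played most often, and a textbook Q-learning step for a linear evaluation function. UCT is omitted.

```python
import math
import random

MOVES = (1, 2, 3)  # toy Nim (assumption): take 1-3 stones; taking the last stone wins


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def playout(stones, rng):
    """Play uniformly random moves; return 1 if the player to move at `stones` wins, else 0."""
    to_move_is_root = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1 if to_move_is_root else 0
        to_move_is_root = not to_move_is_root
    return 0  # no stones left: the player to move has already lost


def simulation_win_rate(stones, n_sims, rng):
    """Reward variation (1): win rate of random simulations for the player to move."""
    return sum(playout(stones, rng) for _ in range(n_sims)) / n_sims


def ucb1_win_rate(stones, budget, rng):
    """One reading of variation (2): win rate of the move that UCB1 played most often."""
    moves = legal_moves(stones)
    counts = [0] * len(moves)
    wins = [0] * len(moves)
    for t in range(1, budget + 1):
        if t <= len(moves):
            i = t - 1  # play every arm once before using the UCB1 index
        else:
            i = max(range(len(moves)),
                    key=lambda j: wins[j] / counts[j]
                    + math.sqrt(2 * math.log(t - 1) / counts[j]))
        # After our move the opponent is to move, so our result is 1 minus theirs.
        wins[i] += 1 - playout(stones - moves[i], rng)
        counts[i] += 1
    best = max(range(len(moves)), key=lambda j: counts[j])
    return wins[best] / counts[best]


def features(stones, move):
    """Hypothetical features: bias plus one-hot of (stones left after the move) mod 4."""
    left = stones - move
    return [1.0] + [1.0 if left % 4 == r else 0.0 for r in range(4)]


def q_value(w, stones, move):
    return sum(wi * fi for wi, fi in zip(w, features(stones, move)))


def q_update(w, stones, move, reward, next_stones, alpha=0.1, gamma=0.9):
    """One textbook Q-learning step for a linear evaluation function with weights w."""
    next_q = max((q_value(w, next_stones, m) for m in legal_moves(next_stones)),
                 default=0.0)  # terminal next state contributes no future value
    td_error = reward + gamma * next_q - q_value(w, stones, move)
    return [wi + alpha * td_error * fi for wi, fi in zip(w, features(stones, move))]


rng = random.Random(0)
w = [0.0] * 5
stones, move = 5, 1
# Reward for the transition: our estimated win rate once the opponent is to move.
reward = 1.0 - simulation_win_rate(stones - move, 200, rng)
reply = rng.choice(legal_moves(stones - move))  # opponent modeled as random (assumption)
w = q_update(w, stones, move, reward, next_stones=stones - move - reply)
```

Swapping `simulation_win_rate` for `ucb1_win_rate` when computing `reward` changes only the reward signal and leaves the update rule untouched, which is how the abstract frames its variations. How the paper treats the opponent's turn inside the update is not stated in the abstract, so the random reply here is purely illustrative.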