Chao Huang

Ph.D. Candidate

Reinforcement Learning -- DQN

Paper title: Human-level Control through Deep Reinforcement Learning.

Classification

Model-free, off-policy, value-based; continuous state space, discrete action space.

The goal is to find an optimal policy that maximizes the expected cumulative discounted future reward: $Q^*(s,a)=\max_{\pi}\mathbb{E}[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots \mid s_t=s,\, a_t=a,\, \pi]$. In this paper, a deep neural network (the deep Q-network, DQN) is used to approximate this optimal action-value function.
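As a quick numeric illustration of the discounted return inside the expectation, the following sketch computes $r_t + \gamma r_{t+1} + \gamma^2 r_{t+2}$ for a hypothetical three-step reward sequence (the reward values and $\gamma$ are made up for the example):

```python
# Discounted return for a hypothetical reward sequence (illustrative values)
rewards = [1.0, 0.0, 2.0]  # r_t, r_{t+1}, r_{t+2}
gamma = 0.99               # discount factor

# ret = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2}
ret = sum(gamma**i * r for i, r in enumerate(rewards))
print(ret)  # 1.0 + 0.99*0.0 + 0.9801*2.0 = 2.9602
```

The optimal Q-value is the maximum of this expected return over all policies, conditioned on starting in state $s$ and taking action $a$.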

The Q-learning update minimizes the following loss function: $L(\theta)=\mathbb{E}_{(s,a,r,s') \sim U(D)}\big[\big(r + \gamma \max_{a'}Q(s',a'; \theta') - Q(s,a; \theta)\big)^2\big]$, where $D$ is the experience-replay memory, $U(D)$ denotes uniform sampling from it, and $\theta'$ are the parameters of a target network that is only periodically synced with the online parameters $\theta$.
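A minimal sketch of this loss on a sampled minibatch, using a tabular Q-function in place of the paper's convolutional network (the states, actions, rewards, and table sizes below are all assumed toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.99

# Tabular stand-ins for the online network Q(.; theta) and the
# periodically-synced target network Q(.; theta')
q_online = rng.normal(size=(n_states, n_actions))
q_target = q_online.copy()  # frozen copy, updated only every C steps

# A minibatch of (s, a, r, s') transitions, as if sampled uniformly
# from the replay memory D (hypothetical data)
s  = np.array([0, 1, 2])
a  = np.array([1, 0, 1])
r  = np.array([1.0, 0.0, -1.0])
s2 = np.array([1, 2, 3])

# TD target: r + gamma * max_a' Q(s', a'; theta')
td_target = r + gamma * q_target[s2].max(axis=1)
# TD error: the quantity inside the squared loss
td_error = td_target - q_online[s, a]
# Empirical estimate of L(theta) over the minibatch
loss = np.mean(td_error ** 2)
print(loss)
```

In the actual DQN agent, the gradient of this loss is taken only with respect to $\theta$; the target parameters $\theta'$ are held fixed between syncs, which stabilizes training.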

Experiment environment

Atari 2600 platform.

Metrics

Game score (cumulative in-game reward) per episode.

^_^