Chao Huang

Ph.D. Candidate

Reinforcement Learning -- Dueling DQN

paper title: Dueling Network Architectures for Deep Reinforcement Learning.

classification

model-free, off-policy, value-based, continuous state, discrete action space.

Problem

In some states, it is of paramount importance to know which action to take, but in many other states the choice of action has no repercussion on what happens. However, the proposed method has an identifiability issue.

reasons

The decomposition Q(s, a) = V(s) + A(s, a) is not unique: adding a constant to V and subtracting the same constant from A leaves Q unchanged, so V and A cannot be recovered from Q alone.

problem's results

Because the two streams are unidentifiable from their sum, training the naive sum directly gives poor practical performance.

solution

The proposed network architecture, which we name the dueling architecture, explicitly separates the representation of state values and (state-dependent) action advantages. This dueling network should be understood as a single Q network with two streams that replaces the popular single-stream Q network in existing algorithms. The dueling network automatically produces separate estimates of the state value function and advantage function, without any extra supervision. To address this issue of identifiability, we can force the advantage function estimator to have zero advantage at the chosen action.
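The paper's aggregating module makes the decomposition identifiable by subtracting either the maximum or the mean advantage, so the two formulas below can be read as: "force the best (or average) action to have zero advantage":

```latex
% Max-subtraction aggregation: the chosen (greedy) action has zero advantage.
Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta)
  + \Big( A(s, a; \theta, \alpha) - \max_{a'} A(s, a'; \theta, \alpha) \Big)

% Mean-subtraction alternative, which the paper finds more stable in practice:
Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta)
  + \Big( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \Big)
```

Here \(\theta\) denotes the shared convolutional parameters, and \(\alpha\), \(\beta\) the parameters of the advantage and value streams, respectively.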

benefits analysis

The dueling architecture can learn which states are (or are not) valuable, without having to learn the effect of each action in each state. This is particularly useful in states where the choice of action does not affect the environment in any relevant way.

Experiment environment

Atari 2600 platform.

Metrics

rewards (game scores) from the Atari games.

^_^