Continue report · gwen.works/internshiplogs@4d5217e

gwen.works / internshiplogs

this repo has no description

+5 -1

2 changed files

rapport

context.typ

main.pdf

+5 -1

rapport/context.typ

@@ -183,9 +183,13 @@
 === Gradient descent
 
 
-==== _Deep Q-Network_
+==== _Q-learning_
 
+The expected reward associated with a state and an action, called $Q$ here for "quality" #refneeded, is updated as follows:
 
+$
+Q(S_t, A_t) <- (1 - alpha) underbrace(Q(S_t, A_t), "current value") + alpha ( underbrace(R_(t+1), "reward") + gamma underbrace(max_a Q(S_(t+1), a), "expected reward of the best\naction for the next state") )
+$
 
 ==== _Trust Region Policy Optimization_
 
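The update rule added in this hunk can be illustrated with a minimal tabular sketch. It is not part of the commit: the function name `q_learning_update`, the nested-dictionary Q-table, the state and action labels, and the values chosen for `alpha` and `gamma` are all assumptions made for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action).

    q is a hypothetical Q-table: a dict mapping each state to a dict of
    action -> estimated value. alpha is the learning rate, gamma the
    discount factor.
    """
    # max_a Q(S_{t+1}, a): value of the best action in the next state
    best_next = max(q[next_state].values())
    # (1 - alpha) * current value + alpha * (reward + gamma * best next value)
    q[state][action] = (1 - alpha) * q[state][action] + alpha * (reward + gamma * best_next)
    return q[state][action]


# Toy Q-table with two states and two actions (illustrative values only)
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}

# Taking "right" in s0 yields reward 1.0 and leads to s1
new_value = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1", alpha=0.5, gamma=0.9)
print(new_value)
```

With `alpha = 0.5`, the new estimate is the midpoint between the old value (0.0) and the target `1.0 + 0.9 * 2.0`. A larger `alpha` would move the estimate further toward the target on each step.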

rapport/main.pdf

This is a binary file and will not be displayed.
