   parent:
     type: periodical
 
+ppo:
+  type: article
+  title: Simple Policy Optimization
+  author: Xie, Zhengpeng
+  date: 2024-01
+  url:
+    value: http://arxiv.org/abs/2401.16025v2
+    date: '2025-10-16'
+  serial-number:
+    arxiv: 2401.16025v2
+  abstract: PPO (Proximal Policy Optimization) algorithm has demonstrated excellent
+    performance in many fields, and it is considered as a simple version of TRPO (Trust
+    Region Policy Optimization) algorithm. However, the ratio clipping operation in
+    PPO may not always effectively enforce the trust region constraints, this can
+    be a potential factor affecting the stability of the algorithm. In this paper,
+    we propose Simple Policy Optimization (SPO) algorithm, which introduces a novel
+    clipping method for KL divergence between the old and current policies. Extensive
+    experimental results in Atari 2600 environments indicate that, compared to the
+    mainstream variants of PPO, SPO achieves better sample efficiency, extremely low
+    KL divergence, and higher policy entropy, and is robust to the increase in network
+    depth or complexity. More importantly, SPO maintains the simplicity of an unconstrained
+    first-order algorithm. Code is available at https://github.com/MyRepositories-hub/Simple-Policy-Optimization.
+  parent:
+    type: periodical
+