![Reinforcement Learning](https://wfqqreader-1252317822.image.myqcloud.com/cover/245/34233245/b_34233245.jpg)
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_566.jpg?sign=1739292841-P9WSvUSWJvoJsv0IkkF6wNJU0x3nTl2f-0-ada74199381eee664cb564f802ba2e15)
Figure 4-1 Treasure box
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_569.jpg?sign=1739292841-mGyTwJ1AmvbGuQIYI417ZLigddJzF7u0-0-3054c79377d50058484d4178a6211095)
Figure 4-15 Comparison of the optimal value functions computed by the two methods
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_572.jpg?sign=1739292841-WKRYrQjEZq7IXBK0m4U5ITjzXbjDRIVT-0-ea9d39c846775508846f47945ad1ba77)
Figure 5-1 MC method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_575.jpg?sign=1739292841-so09opaoEMFvLaYSB6pAiSQHAm52mTDy-0-24705e08296e4c0943f05a271d9b94db)
Figure 5-2 DP method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_578.jpg?sign=1739292841-tT7ttUd1p5lr2lg3RMb9ED6vJoQ6xhur-0-31dda88dbfc982f6eddc9f7b1c23e055)
Figure 5-3 TD method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_581.jpg?sign=1739292841-5sEBj27x25BT5FCri4iszdNTOp7KENpj-0-74197c0600ee13447123ddcf4f54de0b)
Figure 5-6 Maze environment
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_584.jpg?sign=1739292841-kIWiL0vdmRceFfND7mHEWyjk83KSLDRn-0-4215723ad0ebf7f43ebf887c7b0c068f)
Figure 5-7 Optimal policy obtained by the Sarsa method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_587.jpg?sign=1739292841-6DuyCtA2QdiE762EosdEgmo4xsqOJCyv-0-f7c5939a4fba26908fceef68f376bcc9)
Figure 6-12 Windy gridworld
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_591.jpg?sign=1739292841-23Kw9WoMn8O10PYjbeLWMQcf2XV6vgyo-0-62aa64b91bbc3fede04aacf9e547d325)
Figure 6-13 Optimal policy obtained by the backward-view Sarsa(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_595.jpg?sign=1739292841-NHpVAgSZot58HHg9nEN6GxJJxT20ndgQ-0-64270a8a42b17756420fd8261677a8f3)
Figure 6-14 Optimal path obtained by the backward-view Sarsa(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_598.jpg?sign=1739292841-ANC4GK8pql1awBw4G266cxzOGowigiKw-0-d24891b838c1fbc3dc21ae45d41eb8c3)
Figure 6-15 Optimal policy obtained by the backward-view Q(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_601.jpg?sign=1739292841-CtGZJvHxsODbISq9WdbqRTLiZjE7CMH0-0-3e71a925786d4d2c160d4a9bb9b40751)
Figure 6-16 Optimal path obtained by the backward-view Q(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_604.jpg?sign=1739292841-EUJbYk2pdAuGxHTr0UkUrpvKnXhNl6LU-0-59ef45a380749bc26783f6cf9ffb0309)
Figure 7-3 Neural network architecture of DQN
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_607.jpg?sign=1739292841-6yIoV2ksz83sa8ECuZXi8ZYiAtCaBB8k-0-bacf4741465a2a31c1ec9e82202a0808)
Figure 7-7 Driving a car
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_610.jpg?sign=1739292841-pqDiRL9uOKFxiAmpCVoAMbCKpvkHBHcu-0-ff143bae4e0abd0156ec302a0659b96c)
Figure 7-10 Flappy Bird
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_613.jpg?sign=1739292841-uiP6zECFOIlKbGIATaqDUkvW1teTnNiQ-0-86b2f11b89c30d52ba60d1d859776448)
Figure 7-11 Removing the game background
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_616.jpg?sign=1739292841-8rgCeWM50RrDXHGV4VRctFN4MzbqMSeh-0-1d107c5922b690dcec76eeb0d80b7490)
Figure 7-13 Grayscale conversion and binarization
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_619.jpg?sign=1739292841-afrRP3DMeeevVHsGMqVmdSomaUaEeh8Y-0-60ebf1b2d31eb8cca7e54a2e452277c9)
Figure 8-4
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_622.jpg?sign=1739292841-BcrrLiCSlfPG1sRj1Nt8HQYuWvCENUh7-0-6b572e45aaaa8e7e8a60866b19e833ef)
Figure 9-1 Asynchronous methods
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_625.jpg?sign=1739292841-so7MTfBuN1Uu5DVx0yuUmY6ViEiYq4L6-0-6ccfef9d27482d34e69c9fd5746f4366)
Figure 13-12 Schematic of the policy network architecture
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_628.jpg?sign=1739292841-o3aYqsKv4ayX6ZaFNQwe3tl435aN748v-0-af5db8f449a60b71a486c9ec248461a5)
Figure 13-13 Schematic of the value network architecture
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P9_631.jpg?sign=1739292841-twrABthETLU7WTmFtrPdS5XuRzIuwaSo-0-94a25c1b5ce9325d6f139971bd4574b2)
Figure 13-16 Overall architecture of AlphaGo
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P10_646.jpg?sign=1739292841-Df4Tosv06yXhHloo2jLl2tXOAUtJNxaT-0-07d84ccff6d51f922b7b021218ce56ef)
Figure 13-17 Online game-playing process
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P10_649.jpg?sign=1739292841-kakp8ZyObJ1x3KsKwFtRIKzfpQ1qBqmf-0-f92e8accd9427608ee77ed5996c6db14)
Figure 13-18 How AlphaGo Zero plays