Mastering the game of Go without human knowledge: AlphaGo Zero

A long-standing goal of artificial intelligence is an algorithm that learns superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, with no human data, guidance, or domain knowledge beyond the game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher-quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.

AlphaGo Zero paper download

nature24270

Original article: https://www.nature.com/nature/journal/v550/n7676/pdf/nature24270.pdf

Mastering the game of Go with deep neural networks and tree search

AlphaGo paper: https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in
challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The
tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were
trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce
an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game
rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also
the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality
move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved
superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
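The abstract describes a network that jointly predicts move selections and the game winner, trained against self-play search results. The paper's training objective combines a mean-squared value error with a policy cross-entropy against the MCTS search probabilities, plus L2 regularisation. A minimal sketch of that loss in plain Python; the function name, argument names, and the example values are assumptions for illustration, not the paper's code:

```python
import math

def alphago_zero_loss(p, v, pi, z, theta=None, c=1e-4):
    """Loss of the form (z - v)^2 - pi · log p + c * ||theta||^2.

    p     : predicted move probabilities (floats summing to 1)
    v     : predicted value of the position, in [-1, 1]
    pi    : MCTS search probabilities used as the policy target
    z     : game outcome from the current player's perspective (+1 win, -1 loss)
    theta : optional flat list of network weights for the L2 term (assumed name)
    c     : L2 regularisation coefficient
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p))
    reg = c * sum(w * w for w in (theta or []))
    return value_loss + policy_loss + reg

# Hypothetical example: uniform prediction over two moves, search target
# concentrated on the first move, and a won game (z = +1).
loss = alphago_zero_loss(p=[0.5, 0.5], v=0.0, pi=[1.0, 0.0], z=1.0)
```

With these numbers the value term contributes 1.0 and the policy term log 2, so the loss is about 1.693; in each self-play iteration the search probabilities pi are sharper than the raw network policy p, which is what drives the policy toward the stronger tree-search behaviour.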
