Programming Diary - Common Algorithms for Machine Learning (19)

Technology 07-01 Source: LearningYard Academy (LearningYard学苑)

Share interests, spread happiness,

broaden your knowledge, and leave something good behind!

Dear reader, this is the new LearningYard Academy.

Today, the editor brings you

"Programming Diary | Common Algorithms for Machine Learning (19)".

Welcome!

What is machine learning

Machine learning is an interdisciplinary field that draws on probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills, and how they can reorganize existing knowledge structures so that their own performance keeps improving. It is the core of artificial intelligence and the fundamental way to make computers intelligent.

Definition of machine learning

(1) Machine learning is a science of artificial intelligence. Its main object of study is artificial intelligence, in particular how to improve the performance of specific algorithms by learning from experience.

(2) Machine learning is the study of computer algorithms that improve automatically through experience.

(3) Machine learning uses data or past experience to optimize the performance criterion of a computer program.
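
Definition (3) can be made concrete with a small sketch. The example below is an illustration only, not something taken from the original article: the toy data (points on the line y = 2x + 1), the choice of mean squared error as the performance criterion, and the names `mse` and `train` are all assumptions. It fits a line by plain gradient descent and records how the criterion changes as the program keeps using the data.

```python
# Hypothetical toy data: five points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mse(w, b):
    """Performance criterion (assumed here): mean squared error of y ~ w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, lr=0.05):
    """Fit w and b by gradient descent, recording the criterion after each step."""
    w, b = 0.0, 0.0
    history = [mse(w, b)]
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(mse(w, b))
    return w, b, history


w, b, history = train()
print(f"w={w:.3f}, b={b:.3f}, first loss={history[0]:.3f}, last loss={history[-1]:.6f}")
```

The program is never told the rule y = 2x + 1. It only sees the data, and the error it is asked to minimize shrinks as training proceeds, which is the sense of "optimizing a performance criterion using data" in the definition.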

Reinforcement Learning

Reinforcement learning (RL), known in Chinese as 强化学习 and also as 再励学习, 评价学习, or 增强学习, is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent that, while interacting with an environment, learns a policy in order to maximize its return or achieve a specific goal.

The common model for reinforcement learning is the standard Markov Decision Process (MDP). Depending on the setting, reinforcement learning can be divided into model-based RL and model-free RL, and into active RL and passive RL. Variants include inverse reinforcement learning, hierarchical reinforcement learning, and reinforcement learning for partially observable systems. The algorithms used to solve reinforcement learning problems fall into two classes: policy search algorithms and value function algorithms. Deep learning models can be used within reinforcement learning, which gives deep reinforcement learning.

Reinforcement learning theory is inspired by behaviorist psychology. It emphasizes online learning and tries to keep a balance between exploration and exploitation. Unlike supervised and unsupervised learning, reinforcement learning does not require any data to be given in advance. Instead, it obtains learning information and updates model parameters by receiving rewards (feedback) from the environment for its actions.

Reinforcement learning problems have been discussed in information theory, game theory, automatic control, and other fields. They have been used to explain equilibria under bounded rationality and to design recommender systems and robot interaction systems. Some sophisticated reinforcement learning algorithms have, to a degree, the general-purpose intelligence needed to solve complex problems, and can reach human-level performance in Go and video games.
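
To make the value function approach, the model-free setting, and the exploration-exploitation balance more concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy rule. Q-learning is one well-known value function algorithm. The article does not name a specific algorithm, so that choice, the five-state chain environment, and all names and hyperparameters (`step`, `choose_action`, `q_learning`, `alpha`, `gamma`, `epsilon`) are assumptions made for illustration.

```python
import random

N_STATES = 5      # hypothetical chain of states 0..4; state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy MDP: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not always go left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
               max_steps=200, seed=0):
    """Model-free learning: only sampled transitions and rewards are used."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

No data set is supplied in advance. The table `q_table` is filled in only from the rewards the environment returns for the agent's actions. In this toy chain the learned greedy policy should move right in every non-terminal state. Setting `epsilon` to 0 removes exploration entirely, which is a simple way to see why the balance between exploration and exploitation matters in larger problems.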