Stanford computer science professor: Many machine learning applications involve human safety and are no trifling matter

2018-06-14 04:01

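Editor's note: This article comes from the 《香侬说》 column produced by ShannonAI (香侬科技) and is republished by 36Kr with permission.

"I think machine learning is still at a prototype stage; it still has some way to go before it becomes a mature engineering discipline," Percy Liang, a professor of computer science at Stanford University, said in a recent interview with ShannonAI. In the interview, Percy Liang also made the following points: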

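  1. Many current applications of machine learning involve human safety and are no trifling matter;

  2. Language is about communicating with people, and this point is missing in the NLP community;

  3. Reproducibility is a huge problem in every field of science, and artificial intelligence is no exception;

  4. Although people might think there is no shortage of data (after all, isn't this the era of big data?), having large amounts of usable data is in fact still a challenge.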

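Percy Liang, an assistant professor of computer science at Stanford University and a member of the Stanford Artificial Intelligence Laboratory, works mainly on natural language processing (dialogue systems, semantic parsing, and related areas) and machine learning theory. A paper he co-authored with his students has just won the ACL 2018 short paper award, and he is also the recipient of the 2016 IJCAI Computers and Thought Award.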

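The SQuAD reading comprehension challenge launched by Percy's team is widely recognized in the field as the standard benchmark for machine reading comprehension. It is also the area's top competition and is often called the ImageNet of machine reading comprehension (ImageNet being the top competition in image recognition). Participants come from research teams across academia and industry worldwide, including Microsoft Research Asia, the Allen Institute, IBM, Salesforce, Facebook, Google, Carnegie Mellon University, and Stanford University. The competition has played an important role in advancing natural language understanding.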

The full interview follows:

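ShannonAI:

SQuAD (The Stanford Question Answering Dataset) has been very successful in advancing machine reading comprehension and question answering. Beyond helping NLP researchers build better reading comprehension systems, do you think this dataset holds other opportunities?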

Figure 1. Sample examples from SQuAD.

Percy:

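Although SQuAD (really, any reading comprehension dataset) is nominally about reading comprehension, I think such datasets can have a broader impact in two ways. First, datasets encourage people to develop new general-purpose models. For example, neural machine translation gave rise to attention-based models, which are now among the most common models in machine learning. Second, models trained on one dataset can be valuable for other tasks. For example, a convolutional neural network trained on ImageNet learns general image features that can be used for all kinds of vision problems. SQuAD has had an impact similar to these two examples, though perhaps not as large. SQuAD has reached its limit, since multiple systems have now exceeded human-level performance on this dataset. But as Robin Jia and I showed in an EMNLP 2017 paper, such systems can easily be fooled by adversarial examples. In a paper at the upcoming ACL 2018, we will release SQuAD 2.0, which contains 50,000 additional questions that look answerable but actually have no answer. Hopefully this new dataset, together with the stream of other recent datasets (for example, RACE and TriviaQA), will help drive the field forward.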

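ShannonAI:

You and your students have done a number of very influential pieces of work on AI safety (Raghunathan et al., ICLR 2018; Steinhardt et al., NIPS 2017). You have also studied the interpretability of neural networks, including Koh et al., the ICML 2017 best paper. Do you think improving the interpretability of deep neural networks helps address AI safety problems? Why?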

Percy:

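So far, the main driver of AI research has been obtaining models that predict more accurately. Recently, however, interpretability and robustness/safety have received more attention, which I think is especially important because many machine learning applications now involve human safety and are no trifling matter, such as autonomous driving and healthcare. However, interpretability and robustness are vague terms, and people do not agree on a single definition. At this point, I think there is still a lot of conceptual work to do to formalize these terms so that people can make measurable progress. We have made some initial progress in formalizing them using influence functions (Koh et al., ICML 2017) and semidefinite relaxations (Raghunathan et al., ICLR 2018), both of which are classic tools from statistics and optimization. I think machine learning is still at a prototype stage; it still has some way to go before it becomes a mature engineering discipline.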

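ShannonAI:

Much of your natural language processing research is closely connected to human language processing (for example, Wang et al., ACL 2016, outstanding paper award: having a machine learn language from scratch through human-computer interaction; He et al., ACL 2017: building symmetric collaborative dialogue agents by learning dynamic knowledge graph embeddings). To what extent do you think understanding human language processing will help us build better machine language processing systems?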

Figure 2. The SHRDLURN language game from Wang et al., ACL 2016. The machine must learn language from scratch by interacting with a human.

Percy:

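Indeed, in much of our work we use crowdsourcing or have models interact directly with humans to learn language, because fundamentally, language is about communicating with people. Sometimes I feel this point is missing in the NLP community. Most current work is on large-scale data tasks: machine translation, question answering, information extraction. That is very different from how humans use language to learn new knowledge and skills and to accomplish tasks. I think the purpose of understanding language is not simply to imitate humans. Rather, if we want to build systems that can interact with humans, those systems fundamentally need to understand how humans think and act, at least at the behavioral level. Communication and language are not just about words, but about the individuals behind the words and their goals.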

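ShannonAI:

As you mention on your website, you are a strong proponent of efficient and reproducible research. You have been developing CodaLab Worksheets, a platform that lets researchers record the complete course of an experiment, from raw data to final results. What do you think are the biggest obstacles to reproducible research in machine learning, and how should we overcome them?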

Figure 3. How CodaLab works. For details, see the official CodaLab website: https://worksheets.codalab.org/ .

Percy:

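Reproducibility is a huge problem in every field of science, and AI is no exception, although I think that as AI researchers we really have no excuse: it is all just running code on data. The field has certainly made great progress in releasing code and data, but code and data are often not enough to reproduce a paper's results, because how the code was run may not have been recorded. By tracking the entire process of how the code was actually executed, CodaLab can guarantee that the final result was produced by that code and data. We have tried to make CodaLab as convenient and easy to use as possible; people can use any programming language, data format, and so on. A challenge remains, however: people are not yet sufficiently motivated to reach this level of reproducibility, even though everyone knows it would be better, because there is a network effect. If everyone used CodaLab to achieve greater reproducibility, building your own models on top of other people's work would be much easier, and research would move much faster. I think it is all just a matter of time.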

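ShannonAI:

Before joining Stanford University, you received your PhD from the University of California, Berkeley, and spent some time as a postdoc at Google. As a machine learning researcher, how has your way of thinking changed over time?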

Percy:

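When I was doing my PhD, I was very fond of the modeling, algorithms, and analysis of machine learning. But I realized that even very strong algorithms have limitations: you look at the mistakes a system makes and realize that, with only a fixed dataset, you may simply be unable to build a fully satisfactory algorithm. Later, at Stanford (partly influenced by my time at Google), I began to think about data and modeling together. Although people might think there is no shortage of data (after all, isn't this the era of big data?), having large amounts of usable data is in fact still a challenge. We have proposed a number of ways of approaching this problem differently (for example, in Wang et al., ACL 2015, we studied how to build a semantic parser by having people paraphrase sentences rather than annotate logical forms). Thinking about data and modeling together broadens the space of possible solutions and lets you be more creative.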

Figure 4. Quickly building a semantic parser by having people paraphrase sentences rather than annotate logical forms (image from Wang et al., ACL 2015).

ShannonAI:

As a machine learning researcher, what do you find most exciting?

Percy:

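Doing research in machine learning lets you think both about the underlying mathematical principles and about how to have a genuinely positive impact on society.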

ShannonAI:

As a machine learning researcher, what do you find most frustrating?

Percy:

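Sometimes you are just exploring in the dark. You see a system's errors, you do something to try to fix them, and nothing improves. In a sense, you only use machine learning when you do not understand the inner mechanism of something because it is too complicated (otherwise you would just write a program directly).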

ShannonAI:

For students just entering the NLP field, how should they develop a taste for research projects?

Percy:

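Learn the fundamentals and read broadly, especially outside NLP and AI; you never know whether ideas from programming languages, linguistics, cognitive science, optimization, or statistics will turn out to be relevant to what you are doing.

It is always easy to be led astray by flashy models and to pile all sorts of ornate, complicated algorithms into your research. You should try to do the opposite: solving a problem with a simple method is more impressive than solving it with a complicated one.

Choose a problem you truly believe in and explore it with passion. You will know it is the one because it will keep you awake at night, thinking about it over and over. Make the problem your own, your private treasure.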

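《香侬说》 is an interview series produced by ShannonAI (香侬科技) devoted to machine learning and natural language processing. The guest for this installment is Percy Liang, professor of computer science at Stanford University.

Percy Liang: assistant professor of computer science at Stanford University and member of the Stanford Artificial Intelligence Laboratory. His main research areas are natural language processing (dialogue systems, semantic parsing, and related areas) and machine learning theory.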

Original English interview transcript:

ShannonAI: SQuAD has been extremely successful in pushing forward the field of reading comprehension and question answering. However, do you see any other opportunities brought by this dataset that could be utilized by NLP researchers besides developing better reading comprehension systems?

Percy: While SQuAD (really, any reading comprehension dataset) is nominally about reading comprehension, I think the impact can be broader in two ways: First, datasets encourage people to develop new general-purpose models. For example, neural machine translation gave rise to attention-based models, which are now ubiquitous in deep learning. Second, models trained on datasets can be of value to other tasks. For example, training CNNs on ImageNet gives rise to generalizable image features that are used for all sorts of vision problems. SQuAD has had some impact along these two lines, but not to the same extent as the two examples listed above. I think SQuAD has reached its limit, as multiple systems have now exceeded human-level performance on this dataset. But as Robin Jia and I showed in a paper from EMNLP 2017, such systems can be easily fooled by adversarial examples, showing that they don't really understand language at a deep level. In an upcoming ACL 2018 short paper, we are releasing SQuAD 2.0, which will contain 50K more questions which look like they have answers but don't actually have an answer. Hopefully, this, along with the flurry of new datasets coming out (e.g., RACE, TriviaQA, etc.) will help drive the progress of the field forward.

ShannonAI: In the past you and your students have done several influential pieces of work on AI safety (Raghunathan et al. ICLR 2018, Steinhardt et al. NIPS 2017). You have also done work on the interpretability of neural networks, including Koh et al. ICML 2017 Best Paper. Do you think enhancing the interpretability of deep neural networks could help address the issues of AI safety? Why or why not?

Percy: The principal driver of AI research thus far has been trying to obtain more accurate models. But more recently, issues of interpretability and robustness/safety have gained more traction, which I think is especially important given that machine learning is making its way into serious applications such as autonomous driving, healthcare, etc. However, interpretability and robustness are vague terms and people don't necessarily agree on their meaning. At this point, I think there's still a lot of conceptual work to be done to formalize these notions so that one can make measurable progress. We have made some initial progress in trying to capture these notions using influence functions (ICML 2017) and semidefinite relaxations (ICLR 2018), which are both classic tools from statistics and optimization. I think machine learning is still in kind of a "prototype" phase; there is still work to be done to evolve it into a mature engineering discipline.

ShannonAI: Many of your research projects have close connections with human language use (e.g., Learning language games through interaction; Learning symmetric collaborative dialogue agents with dynamic knowledge graph embeddings, etc.). To what extent do you think understanding human language processing would inspire us to build better machine language processing?

Percy: We work a lot with crowdsourcing and humans, because fundamentally, language is about communication with humans. Sometimes I feel like this aspect of language is lost in the NLP community, where most work is on large-scale tasks - machine translation, question answering, information extraction. This is very different from how humans use language to learn and accomplish goals. I would say the relevance of understanding language isn't so much about doing so so that we can mimic humans, but rather, if we want to build systems that can interact with humans, these systems fundamentally need to understand how humans think and act, at least at a behavioral level. Communication and language is not just about the words but about the underlying agents and their goals.

ShannonAI: As mentioned on your website, you are a strong proponent of efficient and reproducible research. You have been developing CodaLab Worksheets, a platform that allows researchers to maintain the full provenance of an experiment from raw data to final results. What do you think is the biggest obstacle for reproducible research in machine learning? And how should we address them?

Percy: Reproducibility is a huge problem across all of science, including AI, though I think as AI researchers, we really have no excuse - it's all just about executing code on data. The community is definitely getting better at releasing code and datasets compared to in the past, but often just having the code and data isn't enough to reproduce the results of a paper, because how the code is run might not be documented. CodaLab, by keeping track of the provenance of the actual execution, certifies that the result you got in the end actually is produced by the code and data. We've tried to make CodaLab as unobtrusive as possible - people can use any programming language, dataset format, etc. However, the challenge is that even still, the incentives are not set up properly for people to aim for this level of reproducibility, even though there is a network effect - if everyone were to use it, then it would be so much easier to build on others' work and the pace of research would be vastly accelerated. I think it's a matter of time.

ShannonAI: Before joining Stanford as a faculty member, you got your PhD from Berkeley and have worked at Google as a post-doc. How does your approach as a machine learning researcher change over time?

Percy: When I was in grad school, I was very much into the modeling, algorithms, and analysis of machine learning algorithms. But I realized there's only so much that fancy methods can do - you'd look at the errors that systems make and you realize that it was just impossible to get it right given a fixed dataset. Partly influenced by my time at Google, during my time at Stanford, I've really been thinking about the data-modeling pipeline jointly. Even though one might think there is no shortage of data (after all, isn't this the era of big data?), having large amounts of the right type of data is still a challenge. We've thought about ways of turning a problem on its head (e.g., in ACL 2015, we had a paper showing how to build a semantic parser by having people paraphrase sentences rather than annotate logical forms). Thinking about data and modeling together broadens the design space of solutions and allows you to be more creative.

ShannonAI: What is the most rewarding thing about being a machine learning researcher?

Percy: Machine learning allows you to think both about the underlying mathematical principles and about how to have a real positive impact on society.

ShannonAI: What is the most frustrating thing about being a machine learning researcher?

Percy: Sometimes you're just taking stabs in the dark. You look at the errors of a system, you do something that tries to fix them, and nothing improves. In a certain sense, you use machine learning when you don't understand the underlying phenomena because it's too complicated (or else you would have just written a program directly).

ShannonAI: Do you have any advice for students just entering the field of NLP on developing good taste for research projects?

Percy: Learn the fundamentals and read broadly, especially outside NLP and AI - you never know whether ideas from programming languages, linguistics, cognitive science, optimization, statistics could be relevant to what you're doing.

It's tempting to be carried away by fancy things and try to throw in all the bells and whistles; try to do the opposite. It's more impressive to solve a problem using a simple method rather than solving a problem with a complex method.

Pick a problem that you believe in and pursue it passionately. You'll know because you'll want to think about it all the time. Make it personal.

Copyright © 2018-6 ShannonAI. All rights reserved.

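About the source:

《香侬说》 is an interview series produced by ShannonAI (香侬科技) devoted to machine learning and natural language processing. The questions are written by the "Shannon think tank" (香侬智囊), a group of computer science PhD students from Stanford University, MIT, Carnegie Mellon University, the University of Cambridge, and other well-known universities. The series interviews leading researchers in artificial intelligence and natural language processing at top research institutions (Stanford University, MIT, Carnegie Mellon University, Google, DeepMind, Microsoft Research, OpenAI, and others), as well as rising academic stars who did pioneering work during their PhDs and went directly into faculty positions at top universities. Guests share the inspiration behind their best-known work and their view of the broader direction of their fields.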
