Reposted from the public account npj计算材料学 (npj Computational Materials).

NOMAD Artificial-Intelligence Toolkit

Data-centric science has been identified as the fourth paradigm of scientific research. The novelty it introduces is twofold: first, the creation of large, interconnected databases of scientific data; second, the use of artificial-intelligence (AI) algorithms to study those data and uncover patterns and trends that are difficult for humans to observe directly.

Fig. 1 Home page of the NOMAD Artificial-Intelligence Toolkit.

In the past few years, materials science has made progress on both fronts. The NOMAD Repository & Archive, established in late 2014, is the first Findable, Accessible, Interoperable, and Reusable (FAIR) storage facility for computational materials-science data. It stores the input and output files of more than 50 different atomistic codes, covering over 100 million total-energy calculations. NOMAD converts, normalizes, and characterizes the data through a metadata schema, making them ready for AI analysis.

Fig. 2 Snapshot of the Visualizer in the ‘Querying the Archive and performing Artificial Intelligence modeling’ notebook.

To make NOMAD easier to use, Luigi Sbailò and colleagues from Humboldt University in Berlin have presented the NOMAD AI Toolkit. The toolkit serves three main purposes:
1. Providing an API and libraries for accessing and analyzing the NOMAD Archive data with state-of-the-art (and beyond) AI tools.
2. Providing a set of tutorials with a shallow learning curve, from a hands-on introduction to the mastering of AI techniques.
3. Maintaining a community-driven, growing collection of computational notebooks, each dedicated to an AI-based materials-science publication.

Fig. 3 An example of a high-quality plot that can be produced using the visualizer.

By providing both the annotated data and the scripts for their analysis, the toolkit enables students and scholars worldwide to retrace all the steps that the original researchers followed to reach publication-level results. Its distinguishing feature is that it connects, within the same infrastructure, the data stored in the NOMAD Archive to their AI analysis.

Fig. 4 Graphical input interface for the SISSO training of tetradymite-materials classification.

Moreover, users have all available AI tools and access to the NOMAD data in the same environment, without needing to install anything. This should improve the reproducibility of data-driven materials-science papers and flatten the learning curve for newcomers to the field. The article was recently published in npj Computational Materials 8: 250 (2022).

Fig. 5 Interactive map of tetradymite materials, as produced with the AI-Toolkit visualizer.

The NOMAD Artificial-Intelligence Toolkit: turning materials-science data into knowledge and understanding
Luigi Sbailò, Ádám Fekete, Luca M. Ghiringhelli & Matthias Scheffler

Abstract
We present the
Novel-Materials-Discovery (NOMAD) Artificial-Intelligence (AI) Toolkit, a
web-browser-based infrastructure for the interactive AI-based analysis of
materials-science findable, accessible, interoperable, and reusable (FAIR)
data. The AI Toolkit readily operates on the FAIR data stored in the central
server of the NOMAD Archive, the largest database of materials-science data
worldwide, as well as locally stored, users’ owned data. The NOMAD Oasis, a
local, stand-alone server can be also used to run the AI Toolkit. By using
Jupyter notebooks that run in a web-browser, the NOMAD data can be queried and
accessed; data mining, machine learning, and other AI techniques can be then
applied to analyze them. This infrastructure brings the concept of
reproducibility in materials science to the next level, by allowing researchers
to share not only the data contributing to their scientific publications, but
also all the developed methods and analytics tools. Besides reproducing
published results, users of the NOMAD AI toolkit can modify the Jupyter
notebooks toward their own research work.
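
As an illustration of the querying step described in the abstract, the following minimal Python sketch builds a request body asking for entries that contain a given set of chemical elements. It is not the Toolkit's own code. The endpoint URL, the metadata field `results.material.elements`, the `all` operator, the `required`/`pagination` keys, and the helper names `build_query` and `fetch_entries` are all assumptions made for this example and should be checked against the current NOMAD API documentation.

```python
import json
import urllib.request

# Assumption: public NOMAD v1 endpoint; verify against the NOMAD API docs.
NOMAD_ENTRIES_QUERY_URL = "https://nomad-lab.eu/prod/v1/api/v1/entries/query"


def build_query(elements, page_size=10):
    """Build a JSON-serializable request body for entries containing all `elements`."""
    if not elements:
        raise ValueError("at least one chemical element is required")
    return {
        # Assumed field name and operator: match entries containing every listed element.
        "query": {"results.material.elements": {"all": sorted(set(elements))}},
        # Limit the number of entries returned per page.
        "pagination": {"page_size": int(page_size)},
        # Assumed key for restricting the response to a few metadata fields.
        "required": {
            "include": ["entry_id", "results.material.chemical_formula_reduced"]
        },
    }


def fetch_entries(payload, url=NOMAD_ENTRIES_QUERY_URL, timeout=30):
    """POST the payload and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Bi and Te are chosen only as an example of a tetradymite-type composition.
    body = build_query(["Bi", "Te"], page_size=5)
    print(json.dumps(body, indent=2))
    # Uncomment to contact the server (response layout is also an assumption):
    # for entry in fetch_entries(body).get("data", []):
    #     print(entry.get("entry_id"))
```

Keeping the payload construction separate from the network call means the query can be inspected or adapted inside a notebook before any request is sent. The returned records could then be loaded into a table for the data-mining and machine-learning steps the abstract mentions.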