Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks

Authors: Frederick Choi, Sajjadur Rahman, Hannah Kim, Daz Zhang

Data science workflows are human-centered processes involving on-demand programming and analysis. While programmable and interactive interfaces such as widgets embedded within computational notebooks are suitable for these workflows, they lack robust state management capabilities and do not support user-defined customization of the interactive components. The absence of such capabilities hinders workflow reusability and transparency while limiting the scope of exploration available to end users. In response, we developed MAGNETON, a framework for authoring interactive widgets within computational notebooks that enables transparent, reusable, and customizable data science workflows. The framework enhances existing widgets to support fine-grained interaction history management, reusable states, and user-defined customizations. We conducted three case studies in a real-world knowledge graph construction and serving platform to evaluate the effectiveness of these widgets. Based on the observations, we discuss future implications of employing MAGNETON widgets for general-purpose data science workflows.

Paper link: http://arxiv.org/pdf/2303.13447v1

