Repro Samples Method for Irregular Inference Problems and for Unraveling Machine Learning Blackboxes

Date: 2023-07-14

Guanghua Forum: Celebrities and Entrepreneurs Forum, Session 6560

Topic: Repro Samples Method for Irregular Inference Problems and for Unraveling Machine Learning Blackboxes

Speaker: Professor Min-ge Xie (谢敏革), Rutgers University

Host: Professor Huazhen Lin (林华珍), School of Statistics

Time: July 15, 15:00-16:00

Venue: Conference Room 408, Hongyuan Building (弘远楼)

Organizers: School of Statistics; Office of International Exchange and Cooperation; Office of Scientific Research

About the Speaker

Dr. Min-ge Xie (谢敏革) is a Distinguished Professor in the Department of Statistics at Rutgers, The State University of New Jersey. He is a noted expert in the foundations of statistics and fusion learning. His pioneering and groundbreaking research on confidence distributions has been described as a "grounding process with energy and insight". His other research interests include the foundations of data science, conformal prediction, big data, estimating equations, robust statistics, hierarchical models, and asymptotics. Dr. Xie received his BS degree in mathematics from the University of Science and Technology of China (USTC) and his PhD degree in statistics from the University of Illinois at Urbana-Champaign (UIUC) in 1996, both with high honors. He is the new Editor of The American Statistician (TAS) and the co-founding Editor-in-Chief of The New England Journal of Statistics in Data Science (NEJSDS), the flagship journal of the New England Statistical Society. He has also served on the editorial boards of several other journals, including JASA, Statistical Science, and Science China-Mathematics. Dr. Xie is a fellow of the ASA, a fellow of the IMS, and an elected member of the ISI. He has authored or co-authored more than 100 research articles in statistics, computer science, engineering, and biomedical research.

Abstract

Rapid developments in data science call for new frameworks to tackle highly non-trivial "irregular inference problems", for example those involving discrete or non-numerical parameters and those involving non-numerical data. This talk presents an innovative, wide-reaching, and effective simulation-inspired framework, called the repro samples method, for conducting statistical inference in such irregular problems and beyond. We develop both exact and approximate (asymptotic) theories to support the framework and provide effective computing algorithms for problems in which explicit solutions are not available. The method is general and likelihood-free, and it is particularly effective for irregular inference problems. In particular, for the commonly encountered irregular inference problems that involve both discrete (or non-numerical) and continuous parameters, we propose an effective three-step procedure to make inference for all parameters, and we develop a unique matching scheme that turns the disadvantage of lacking tools to handle discrete or non-numerical parameters into an advantage of improved computational efficiency. The effectiveness of the method is illustrated by solving two open inference problems in statistics: (a) how to construct a confidence set for the unknown number of components in a normal mixture model; and (b) how to construct confidence sets for the unknown true model, the regression coefficients, or both the true model and the coefficients jointly in a high-dimensional regression model. Comparison studies show that the method performs far better than existing attempts. Although the two case studies pertain to traditional statistical models, the method also extends directly to complex machine learning models, such as (ensemble) tree models, neural networks, and graphical models. It provides a new toolbox for developing interpretable AI and unraveling machine learning blackboxes.