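Talk title: Prompt-based learning for analyses of biomedical images and text
Time: 9:00 AM, July 18, 2023
Venue: Room 104, Teaching and Research Building, Nanhu Campus
Organizers: Office of Scientific Research / School of Mathematics and Statistics
Speaker: Dong Xu (许东)
About the speaker: Dong Xu is a Distinguished Professor in the Department of Electrical Engineering and Computer Science at the University of Missouri-Columbia, and a researcher at the Christopher S. Bond Life Sciences Center and the Informatics Institute. He received his PhD from the University of Illinois at Urbana-Champaign in 1995 and then spent two years as a postdoctoral researcher at the US National Cancer Institute. He was a staff scientist at Oak Ridge National Laboratory until 2003, served as Chair of the Department of Computer Science from 2007 to 2016, and served as Director of the Information Technology Program from 2017 to 2020. Over the years, he has worked on single-cell data analysis, protein structure prediction and modeling, protein post-translational modifications, protein localization prediction, computational systems biology, biological information systems, and applications of bioinformatics in humans, microbes, and plants. Since 2012, his research has focused on the interface between bioinformatics and deep learning. He has published more than 400 papers, which have been cited more than 23,000 times, with an h-index of 80 according to Google Scholar. He was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2015 and a Fellow of the American Institute for Medical and Biological Engineering (AIMBE) in 2020.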
Abstract: Foundation models, trained on large-scale image and natural-language data, offer unprecedented opportunities for a wide range of applications. Their potential is further magnified when combined with prompt-based learning, which can achieve state-of-the-art (SOTA) performance even with a small amount of labeled data. This talk focuses on the biomedical applications of two foundation models: ChatGPT and the Segment Anything Model (SAM). As the volume of the literature continues to grow exponentially, manual curation cannot extract the embedded knowledge efficiently. In response, we developed a pathway curation pipeline that combines image understanding and text mining techniques to decipher biological knowledge. This pipeline employs SAM, contrastive learning, and Siamese networks to identify key attributes of pathway entities and their relationships. Integrating ChatGPT's predictions of gene interactions has proven useful in enhancing the extraction of pathway information. To optimize ChatGPT's responses, we applied a novel iterative prompt refinement strategy: the efficacy of each prompt was evaluated using metrics such as F1 score, precision, and recall, and the evaluation results were then fed back into ChatGPT to suggest better prompts. The prompts were further refined using Tree-of-Thought iterations. We also applied prompt-based learning to SAM-based protein identification from cryo-electron microscopy (cryo-EM) images. The outcomes of our studies underscore the potential utility of prompt-based learning for efficient biomedical data analyses and predictions.
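The evaluate-then-refine loop mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the speaker's implementation: the names `score` and `refine`, the `predict` and `propose` callables (stand-ins for calls to a language model), the treatment of gene interactions as undirected pairs, and the greedy keep-the-best rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Metrics = Dict[str, float]


def normalize(pairs: Iterable[Pair]) -> Set[frozenset]:
    """Assumption: interactions are undirected, so (A, B) equals (B, A)."""
    return {frozenset((a.upper(), b.upper())) for a, b in pairs}


def score(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Metrics:
    """Precision, recall, and F1 of predicted pairs against a curated set."""
    pred, ref = normalize(predicted), normalize(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def refine(
    prompt: str,
    predict: Callable[[str], List[Pair]],   # hypothetical: prompt -> predicted pairs
    propose: Callable[[str, Metrics], str],  # hypothetical: prompt + metrics -> new prompt
    reference: Iterable[Pair],
    rounds: int = 3,
) -> Tuple[str, Metrics]:
    """Greedy refinement: keep a candidate prompt only if its F1 improves."""
    reference = list(reference)
    best_prompt, best = prompt, score(predict(prompt), reference)
    for _ in range(rounds):
        # Feed the current prompt and its metrics back to get a new candidate.
        candidate = propose(best_prompt, best)
        result = score(predict(candidate), reference)
        if result["f1"] > best["f1"]:
            best_prompt, best = candidate, result
    return best_prompt, best
```

In practice, `predict` would wrap a model call that returns gene pairs for a given prompt, and `propose` would ask the model for a revised prompt given the previous metrics; both are left abstract here because the announcement does not specify them.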