Programme of the day
Table of Contents
- Invited Talk - “Harnessing the Power of LLMs in Practice: An Introduction to ChatGPT and Beyond”
- Invited Talk - “Healthcare Applications with Large Language Models”
- Research Showcase
Invited Talks
9:00AM - 11:00AM
“Harnessing the Power of LLMs in Practice: An Introduction to ChatGPT and Beyond”
Associate Professor Xia Hu
Rice University
Speaker's Bio
Dr. Xia “Ben” Hu is an Associate Professor in the Department of Computer Science at Rice University and director of the Center for Transforming Data to Knowledge (D2K Lab). Dr. Hu has published over 100 papers in major academic venues, including NeurIPS, ICLR, KDD, WWW, IJCAI and AAAI. AutoKeras, an open-source package developed by his group, has become the most used automated deep learning system on GitHub (with over 8,000 stars and 1,000 forks). His work on deep collaborative filtering, anomaly detection and knowledge graphs has been included in the TensorFlow package, the Apple production system and the Bing production system, respectively. His papers have received several Best Paper (Candidate) awards from venues such as ICML, WWW, WSDM, ICDM, AMIA and INFORMS. He is a recipient of the NSF CAREER Award and the ACM SIGKDD Rising Star Award. His work has been cited more than 18,000 times, with an h-index of 51. He has served as General Co-Chair for WSDM 2020 and ICHI 2023. He is also the founder of AI POW LLC.
Abstract
Recent progress in large language models (LLMs) has produced highly effective models, such as OpenAI's ChatGPT, that have demonstrated exceptional performance on a variety of tasks, including question answering, essay writing, and code generation. This presentation will cover the evolution of LLMs from BERT to ChatGPT and showcase their use cases. Although LLMs are useful for many NLP tasks, one significant concern is the inadvertent disclosure of sensitive information, especially in healthcare, where patient privacy is crucial. To address this concern, we developed a novel framework that uses ChatGPT to generate high-quality synthetic data and then fine-tunes a local, offline model for downstream tasks. Using synthetic data improved downstream task performance, reduced the time and resources required for data collection and labeling, and addressed privacy concerns. Finally, we will discuss the regulation of LLMs, whose use has raised concerns about cheating in education. We will introduce our recent survey on LLM-generated text detection and discuss the opportunities and challenges it presents.
“Healthcare Applications with Large Language Models”
Professor Raymond Ng
University of British Columbia
Speaker's Bio
Raymond Ng is the Canada Research Chair in Data Science and Analytics. He is also the founding Director of the Data Science Institute at the University of British Columbia and an elected Fellow of the Royal Society of Canada. He was named one of the world’s top 75 “Academic Data Science Leaders 2022” by the Chief Data Officer Magazine, which grew out of MIT’s Sloan School of Management. For the past three decades, Ng’s main research area has been data mining, with a specific focus on health informatics and text mining. He has authored over 220 peer-reviewed publications on data clustering, outlier detection, OLAP processing, health informatics and text mining. He is the recipient of two best paper awards: one from the 2001 ACM SIGKDD conference, the premier data mining conference in the world, and one from the 2005 ACM SIGMOD conference, one of the top database conferences worldwide.
Abstract
Unstructured documents often contain embedded structured data. Tables are a popular way to represent valuable structured information in health, finance, and many other domains. However, manually extracting structured information from documents typically takes tremendous time and labor, which motivates a system that automates the process. Once these tables have been extracted, the data can be used for a wide variety of tasks, such as question answering and various “down-stream” analytics. In this talk, we will discuss how to leverage groundbreaking large language models to develop tools for automated table extraction from various types of documents. We will present applications ranging from cancer registry reporting and cancer care to psychiatric hospitalization prediction.
Research Showcase
11:00AM - 12:00PM
Posters, Demos and Networking Tea