IDS Research Seminars (2021)
These are a series of seminars and talks on selected data science topics of interest, led by IDS researchers as well as invited external speakers. The talks were open to current NUS graduate students only and were conducted via Zoom. Below are the seminar details, along with recordings of the talks for those interested in the topics.
Date: 9 Dec 2021
Time: 9:00AM – 10:00AM
Speaker: Assoc. Prof Rebecca Lurie Starr, NUS
Title: Variation in the Outer Circle: Language variation and change amid transnational dialect contact in Singapore
Abstract: Scholarship in language variation and change has investigated the mechanisms and outcomes of dialect contact in a number of settings around the world. In variationist work, contact-induced change is typically argued to stem primarily from prolonged interpersonal contact, rather than from short-term travel or media exposure, as regular interaction is thought to be necessary for the transmission of complex linguistic features. In research in the world Englishes and contact linguistics traditions, however, media exposure is often pointed to as a source of language change; moreover, the notion that speakers adopt new features as a result of short-term travel and media consumption is a commonly circulating discourse in postcolonial societies. This talk considers the impact of three major sources of transnational dialect contact — institutional exonormativity, transnational mobility, and media consumption — on variation and change in the English and Mandarin spoken in Singapore, a multilingual nation in Southeast Asia. The issues and findings surveyed here underscore the utility of applying variationist approaches to world English-speaking communities and other postcolonial settings. Correspondingly, they illustrate how research in such settings may shed new light on basic assumptions in the field of language variation and change.
Speaker Bio: After receiving her PhD from Stanford in Linguistics, Rebecca served as an A.W. Mellon Postdoctoral Fellow at CMU. In 2013, she moved to Singapore to join NUS where she is now an Associate Professor. She works on variation phenomena in English, Irish Gaelic, Mandarin, Cantonese, Japanese and Korean. She is primarily interested in the acquisition of sociolinguistic knowledge among bilingual children and is working on the Singapore Multilingual Corpus to document and investigate the speech of multilinguals in Singapore.
Date: 19 Nov 2021
Time: 10:00AM – 11:00AM
Speaker: Moming Duan, Chongqing University
Title: Personalization Techniques for Federated Learning
Abstract: With the enormous success of deep learning, neural network applications are entering every aspect of our lives. Federated Learning (FL) enables multiple participating devices to collaboratively contribute to a global neural network model while keeping the training data local. Owing to this privacy-preserving property, federated learning is currently one of the most attractive distributed machine learning frameworks. Unlike the centralized training setting, however, the non-IID and imbalanced (statistically heterogeneous) training data of FL are distributed across the federated network, which increases the divergence between the local models and the global model and further degrades performance. It is therefore necessary to develop personalization techniques for federated learning, where the goal is to train a model for each client based on that client's local data and the data of other clients. In this talk, we revisit existing personalization techniques for federated learning. First, we explore the impact of three kinds of data imbalance (size, local, and global) on FL. Then, we present our hierarchical scheme to mitigate the performance degradation caused by statistical heterogeneity. Finally, we show some experimental results from our novel hierarchical federated learning framework.
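As a toy illustration of the ideas above, the sketch below averages size-imbalanced client models FedAvg-style and then personalizes each client's model by interpolating it with the global one. The functions, data, and the interpolation weight `alpha` are all hypothetical simplifications for illustration, not the speaker's hierarchical framework.

```python
# Minimal sketch of FedAvg-style aggregation over size-imbalanced
# (non-IID) clients, plus a simple per-client personalization step.

def fed_avg(local_models, sizes):
    """Weighted average of local model parameters (scalars here)."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_models, sizes)) / total

def personalize(global_model, local_model, alpha=0.5):
    """Interpolate between the shared global model and a client's local fit."""
    return alpha * local_model + (1 - alpha) * global_model

# Each client fits a local "model" (its data mean); the data are non-IID:
client_data = {
    "A": [1.0, 1.2, 0.8],           # small client, low values
    "B": [5.0, 5.5, 4.5, 5.0, 5.2], # large client, high values
}
local_models = {k: sum(v) / len(v) for k, v in client_data.items()}
sizes = [len(v) for v in client_data.values()]

global_model = fed_avg(list(local_models.values()), sizes)
personalized_A = personalize(global_model, local_models["A"])
```

Because client B contributes more samples, the global model is pulled toward B's distribution, which is exactly the divergence that personalization compensates for on client A.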
Speaker Bio: Moming Duan received his B.S. degree from the College of Computer Science, Chongqing University, PR China, in 2017. He is currently working toward a PhD degree at Chongqing University. His research interest is distributed machine learning, with particular interests in personalization, feature selection, and the heterogeneity challenge in federated learning systems. His papers have been published at international conferences and in journals (IEEE TPDS, ICCD, DATE, ISPA).
Date: 1 Nov 2021
Time: 10:00AM – 11:00AM
Speaker: Xu Lin, School of Computing, NUS
Title: Coherent and Captivating Topic Transitions in Knowledge-Grounded Conversations
Abstract: Knowledge-grounded dialogue systems aim to leverage external knowledge to conduct intelligent conversations with their users. However, merely increasing the informativeness of the generated response with new knowledge does not necessarily result in an engaging and coherent conversation. The knowledge selected for a generated response needs to be coherent with the historical context of the ongoing conversation, while at the same time sufficiently diversified so as to further develop the topic and captivate the user. Existing methods fail to capture both of these properties in dialogue topic management because of their limited modelling of dialogue topic transitions. In this report, we first summarize existing research on knowledge-grounded conversations (KGCs) according to the techniques used. Most current approaches to KGCs either ignore topic transition learning or perform limited topic transition management when selecting knowledge given the dialogue context, which can result in capturing spurious correlations between knowledge and context, leading to incoherent and uninspiring generated dialogues. Given this limitation in dialogue flow management, we introduce the Coherent and Captivating Topic Transition (C2T2) method, which selects appropriate target knowledge to generate a response whose topic transition is coherent with the ongoing conversation while enhancing the conversation with adequate topic development. Specifically, C2T2 learns transition features through context-knowledge entailment, temporal knowledge interaction, and knowledge-shifting constraints, and then performs interactive pointer-based knowledge inference by jointly considering the historical context, the transition features, and the relations among the target knowledge candidates.
In experiments on multiple public benchmarks, C2T2 outperformed current knowledge selection methods, achieving significant performance gains even in unseen scenarios and demonstrating that it is able to learn generalized patterns of topic transitions.
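The coherence-versus-development trade-off described above can be sketched with a toy scoring rule. The word-overlap (`jaccard`) scoring below is a hypothetical stand-in for C2T2's learned transition features, meant only to show why knowledge selection must balance coherence against novelty.

```python
# Toy sketch of the coherence-vs-novelty trade-off in knowledge selection.
# Real systems like C2T2 learn these signals; here they are hand-crafted.

def jaccard(a, b):
    """Word-overlap similarity between two strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_knowledge(context, last_knowledge, candidates, beta=0.5):
    """Pick knowledge coherent with the context but not a repeat of the
    previously used knowledge, encouraging topic development."""
    def score(k):
        coherence = jaccard(context, k)
        novelty = 1.0 - jaccard(last_knowledge, k)
        return coherence + beta * novelty
    return max(candidates, key=score)

context = "tell me about the eiffel tower height"
previous = "the eiffel tower is in paris"
candidates = [
    "the eiffel tower is in paris",
    "the eiffel tower is 330 metres tall",
    "bananas are yellow",
]
chosen = select_knowledge(context, previous, candidates)
```

With coherence alone the repeated Paris fact scores well; with novelty alone the banana fact wins; only the combination prefers the on-topic but topic-advancing height fact.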
Speaker Bio: Xu Lin is a second-year PhD student in SoC working on knowledge-grounded dialogue systems. Her research mainly deals with dialogue topic transitions and response generation in conversations with external knowledge. This will be the practice talk for her upcoming QE.
Date: 28 Oct 2021
Time: 10:00AM – 11:00AM
Speaker: Brian Formento, School of Computing, NUS
Title: Character-based Attacks in NLP
Abstract: Current character-level attacks on NLP systems have been less successful than their word-level counterparts. One potential reason for this underperformance is their use of heuristic rules for character insertion or replacement, which do not result in strong adversarial perturbations. We propose two novel score-based black-box character-level attacks, SSTA (special symbols text attacks) and CATE (Character Attacks via Tokenizer Exploitation) to address this limitation. The SSTA algorithm discovers global symbols that can be used to help word-level attacks further deteriorate a model's performance when introduced as padding or by replacing existing punctuation in a sample with the global symbols. On the other hand, CATE introduces an algorithm that finds the optimal context-specific symbol or homoglyph to insert within a word while utilizing score based feedback from the model, therefore breaking the tokenizer and injecting perturbing information in the system. We evaluated SSTA and CATE on a range of NLP tasks using classifiers built on pre-trained deep learning models and found that special symbols such as punctuation can hold adversarial information as well as offering a mechanism to force erroneous behaviour by word-piece tokenizers through an insert operation, therefore improving upon current character level attacks and resulting in CATE achieving state-of-the-art character-level attack success rates. Compared with word-level attacks, our use of punctuation and homoglyphs enables original words to be preserved, allowing the global and context-specific character level attacks of CATE and SSTA to obtain higher semantic similarity than popular word-level attacks. As a result, on its own, CATE is competitive with the TextFooler word-level attack and outperformed it on entailment and question answering tasks. When combined with word-level attacks, CATE and SSTA result in stronger attacks on NLP models.
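The tokenizer-breaking idea behind homoglyph insertion can be illustrated with a minimal sketch. The homoglyph table and `perturb` helper below are hypothetical, and unlike CATE no score-based model feedback is used; the sketch only shows why a visually identical string defeats exact-match vocabulary lookup.

```python
# Toy homoglyph substitution in the spirit of character-level attacks:
# swapping a Latin letter for a visually similar Unicode character keeps
# the text readable to humans but changes the character stream a subword
# tokenizer sees, breaking vocabulary lookup.

# A few Latin -> Cyrillic homoglyphs (hypothetical minimal mapping).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def perturb(word, position):
    """Replace one character with its homoglyph, if one is available."""
    ch = word[position]
    if ch in HOMOGLYPHS:
        return word[:position] + HOMOGLYPHS[ch] + word[position + 1:]
    return word

original = "great"
attacked = perturb(original, 2)  # Latin 'e' -> Cyrillic 'е'
# The two strings render (near-)identically yet compare unequal, so an
# exact-match vocabulary no longer contains the attacked word and a
# word-piece tokenizer must fall back to fragment or unknown tokens.
```

A full attack would query the victim model's scores to choose which position and symbol to perturb; this sketch fixes both by hand.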
Speaker Bio: Brian Formento is a second-year PhD student in SoC working on adversarial attacks on NLP. His research aims to better understand deep learning models by breaking them, in the hope of finding better defence techniques in the future. This will be the practice talk for his upcoming QE.
Date: 7 Oct 2021
Time: 10:00AM – 11:00AM
Speaker: Hongwei Jin, University of Illinois at Chicago
Title: Certified Robustness of Graph Neural Networks (CollabML)
Abstract: Graph neural networks are vulnerable to both feature and topology attacks, and several effective attack and defense algorithms have recently been designed. In this talk, I will first present work on the certified robustness of graph neural networks for classification under both local and global budgets. Our method is based on Lagrangian dualization and the convex envelope, which results in tight approximation bounds that are efficiently computable by dynamic programming. Second, I will talk about graph isometric robustness. To address this issue, we propose measuring the perturbation with the Gromov-Wasserstein (GW) discrepancy based on all-pair shortest paths (APSP) on graphs and building its Fenchel biconjugate to facilitate convex optimization. By developing tractable lower and upper bounds on the GW distance, our certificate and attack algorithms are demonstrated to be effective. Finally, I will briefly discuss some ongoing research on differential privacy for graph embeddings in the federated setting.
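A minimal sketch of the APSP representation that the GW-based perturbation measure builds on: it constructs only the shortest-path matrices that would feed a GW solver, since computing the GW discrepancy itself requires an optimal-transport solver not shown here. The `apsp` helper and the two example graphs are illustrative, not the talk's certificate algorithm.

```python
# Floyd-Warshall all-pair shortest paths (APSP) on an unweighted graph.
# The resulting distance matrices are the structures compared by a
# Gromov-Wasserstein discrepancy when measuring a topology perturbation.

INF = float("inf")

def apsp(n, edges):
    """Return the n x n shortest-path distance matrix of an undirected,
    unweighted graph given as a list of (u, v) edges."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = d[v][u] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Path graph 0-1-2 versus the same graph with one edge (0, 2) added:
d_before = apsp(3, [(0, 1), (1, 2)])
d_after = apsp(3, [(0, 1), (1, 2), (0, 2)])
```

A single added edge changes the APSP matrix (the 0-to-2 distance drops from 2 to 1), which is exactly the kind of structural shift the GW discrepancy quantifies.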
Speaker Bio: Hongwei Jin is a doctoral candidate in computer science at the University of Illinois at Chicago under the guidance of Prof. Xinhua Zhang. His research topics include (but are not limited to) robust graph neural networks, geometric machine learning, convex/non-convex optimization, and distributed machine learning. His work has been published at top-tier venues including NeurIPS, ECML-PKDD, and ICML workshops. He also has diverse experience working with companies and institutes including Samsung Research America, Futurewei Technologies, and the Chinese Academy of Sciences. Before pursuing his Ph.D. degree, he earned an M.S. degree in Applied Mathematics.