Department Events & Announcements
Events

Jan 19
EVENT DETAILS
Abstract
Sampling from high-dimensional probability distributions is a fundamental and challenging problem encountered throughout science and engineering. One of the most popular approaches to tackle such problems is the Markov chain Monte Carlo (MCMC) paradigm. While MCMC algorithms are often simple to implement and widely used in practice, analyzing the fidelity of the generated samples remains a difficult problem.
In this talk, I will describe a new technique called "spectral independence" that my collaborators and I developed over the last couple of years to analyze Markov chains. This technique has allowed us to break longstanding barriers and resolve several decades-old open problems in MCMC theory. Our work has opened up numerous connections with other areas of computer science, mathematics, and statistical physics, leading to dozens of new developments as well as exciting new directions of inquiry. I will then discuss how these connections have allowed us to "unify" nearly all major algorithmic paradigms for approximate counting and sampling. Finally, I will conclude with a wide variety of future directions and open problems at the frontier of this research.
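As a generic illustration of the MCMC paradigm the abstract refers to (not the spectral-independence technique of the talk), here is a minimal Metropolis sampler. The key point is that only an unnormalized log-density is needed: the normalizing constant cancels in the acceptance ratio.

```python
import math
import random

def metropolis(log_density, n_steps, x0=0.0, step=1.0, seed=0):
    """Metropolis algorithm: propose a symmetric random-walk move and
    accept it with probability min(1, p(x')/p(x)); the (often intractable)
    normalizing constant of p cancels in the ratio."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)           # symmetric proposal
        log_ratio = log_density(x_new) - log_density(x)
        if math.log(rng.random()) < log_ratio:     # accept/reject step
            x = x_new
        samples.append(x)
    return samples

# Target: standard normal, given only up to normalization (log p = -x^2/2 + const).
samples = metropolis(lambda x: -0.5 * x * x, n_steps=20000)
```

The chain's empirical mean and variance approach 0 and 1; how fast such chains mix, and how faithful the samples are, is exactly the kind of question the talk's analysis techniques address.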
Biography
Kuikui Liu is a fourth-year PhD student at UW CSE advised by Shayan Oveis Gharan. He also completed his undergraduate studies at the University of Washington in 2017, double majoring in computer science and mathematics. His primary area of expertise is in developing and analyzing algorithms for sampling from high-dimensional probability distributions encountered throughout science and engineering. More broadly, he is interested in algorithmic statistics and the theory of computing. His work has been recognized by a STOC Best Paper Award.
TIME Wednesday, January 19, 2022 at 12:00 PM – 1:00 PM
CONTACT Pamela Villalovoz pmv@northwestern.edu
CALENDAR Department of Computer Science

Jan 24
EVENT DETAILS
Abstract
Traditional complexity and cryptography are primarily concerned with the difference between polynomial and superpolynomial time. A recent branch of complexity called Fine-Grained Complexity (FGC) focuses instead on polynomial differences in running time. Traditional complexity explains the hardness of problems like the Traveling Salesman Problem (NP-hard problems), showing that there is no polynomial-time algorithm if P != NP. However, it offers no answer to whether there exist faster algorithms for problems that already have slow polynomial-time algorithms. FGC has provided us with an explanatory toolbox for why some problems have seen no improvement in their polynomial-time algorithm speed for over 50 years (e.g. [BI15, LWW18]).
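To make the exponential-time baseline concrete (a generic textbook illustration, not part of the speaker's results): brute-force SAT tries all 2^n assignments, and the exponential time hypothesis asserts that for 3-SAT this exponential dependence on n cannot be avoided.

```python
from itertools import product

def brute_force_sat(n_vars, clauses):
    """Exhaustive search over all 2^n truth assignments.
    A clause is a tuple of nonzero ints: i means x_i, -i means NOT x_i.
    Returns a satisfying assignment (tuple of bools) or None."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

# (x1 OR x2 OR NOT x3) AND (NOT x1 OR x3 OR x2) AND (NOT x2 OR NOT x3 OR x1)
formula = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]
assignment = brute_force_sat(3, formula)
```

The loop runs up to 2^n iterations; P != NP only rules out polynomial-time SAT algorithms, while ETH makes the stronger claim that no subexponential-time algorithm exists for 3-SAT.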
The long-term goal of my research agenda is to apply the new tools of fine-grained complexity to the design of cryptographic protocols. A longstanding open problem in cryptography is to design public-key encryption that is secure solely under the assumption P != NP. However, there are barriers to building encryption from P != NP (see, e.g., the discussion in Boaz Barak's 2017 survey). The hypothesis P != NP is equivalent to the statement that there is no polynomial-time algorithm for satisfiability (SAT). A stronger hypothesis, the exponential time hypothesis (ETH), states that 3-SAT requires exponential time [IP01]. The ambition of my research agenda is to design public-key cryptography secure under ETH.
Biography
Andrea Lincoln is a postdoctoral scholar at UC Berkeley. She was formerly an NTT Research Fellow at Simons. She completed her PhD at MIT advised by Virginia Vassilevska Williams. Andrea's thesis was on fine-grained complexity and its applications. She continues to study fine-grained complexity more broadly, but now primarily focuses on average-case fine-grained complexity.
TIME Monday, January 24, 2022 at 12:00 PM – 1:00 PM
CONTACT Pamela Villalovoz pmv@northwestern.edu
CALENDAR Department of Computer Science

Jan 26
EVENT DETAILS
Abstract
A well-known principle in data science (and AI) is garbage in, garbage out: if you feed bad data to your analytical tools, you will get bad results. The only way to get good results is to use good data. Moreover, data science (or AI) = code (or algorithms, models) + data. On one hand, because 99% of research efforts go into the code part, many analytical tools and AI models have become commodities. On the other hand, real-world data is getting bigger and messier, leaving practitioners with a dilemma: it is harder and harder to get good data. The mission of data democratization, which allows anyone to have access to good data, is more important than ever.
In this talk, I will first introduce what data preparation is and how to prepare "absolute good data," where absolute means matching the ground truth. The need for absolute good data is evident, because it can support many data science tasks such as statistical analysis, data visualization, and data mining. However, the pursuit of the ground truth is impossible without enough general and domain knowledge. I will overview two main directions that I have been pushing to make it possible: with human intelligence and with artificial intelligence.
Despite all the effort of preparing absolute good data, it is not enough for AI applications, which typically face unseen test data. In this case, what we need is "relative good data." I will introduce four main directions for preparing relative good data for data-centric AI: (1) not enough training data, (2) misaligned training and test data, (3) imperfect data preparation pipelines, and (4) inconsistent data and labels in the training data. In particular, I will present my recent research on model charging, which discovers relative good data in the wild to handle case (1), and model patching, which handles case (2) when training and test data are misaligned due to noise shift. I will give concrete use cases for the other two directions (3 and 4) and discuss possible ways to address them.
Biography
Dr. Nan Tang is a senior scientist at the Qatar Center for Artificial Intelligence, Qatar Computing Research Institute (QCRI), Qatar. His research interests center around preparing good data for successful data science. Prior to joining QCRI in December 2011, he was a Research Fellow at LFCS (Laboratory for Foundations of Computer Science) at the University of Edinburgh, UK (2010–2011). He was a scientific staff member at CWI (Dutch National Research Center for Mathematics and Computer Science), Amsterdam, Netherlands (2008–2010). He received his PhD from The Chinese University of Hong Kong (2007). He has held visiting positions at MIT, US (07/2017–08/2017) and the University of Waterloo, Canada (03/2007–08/2007). He has been a PC member of many flagship conferences in data science, such as SIGMOD, VLDB, ICDE, KDD, VIS, and CHI. He received the VLDB 2010 Best Paper Award, the SIGMOD 2020 Reproducibility Award, the VLDB 2021 Distinguished Reviewer Award, and four other best paper nominations at VLDB and ICDE.
TIME Wednesday, January 26, 2022 at 12:00 PM – 1:00 PM
CONTACT Pamela Villalovoz pmv@northwestern.edu
CALENDAR Department of Computer Science

Feb 7
EVENT DETAILS
Abstract
Generating data with complex patterns, such as images, audio, and molecular structures, requires fitting very flexible statistical models to the data distribution. Even in the age of deep neural networks, building such models is difficult because they typically require an intractable normalization procedure to represent a probability distribution. To address this challenge, I propose to model the vector field of gradients of the data distribution (known as the score function), which does not require normalization and therefore can take full advantage of the flexibility of deep neural networks. I will show how to (1) estimate the score function from data with principled statistical methods, (2) generate new data using stochastic differential equations and numerical techniques, and even (3) evaluate probabilities as in a traditional statistical model. The resulting method, called score-based generative modeling, achieves record-breaking performance in applications including image synthesis, image compression, text-to-speech generation, time series prediction, and point cloud generation, challenging the long-time dominance of generative adversarial networks (GANs) on many of these tasks. Furthermore, unlike GANs, score-based generative models are suitable for Bayesian reasoning tasks such as solving ill-posed inverse problems, and I have demonstrated their superior performance on examples like sparse-view computed tomography and accelerated magnetic resonance imaging. Finally, I will discuss how score-based generative modeling opens up new opportunities for building machines that can create and understand complex data in various disciplines of science and engineering.
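A minimal sketch of the core idea that the score function alone suffices for sampling: unadjusted Langevin dynamics moves a point along the score plus Gaussian noise, with no normalization constant ever needed. This toy uses a Gaussian target whose score is known in closed form; it illustrates the general principle, not the full learned-score SDE machinery of the talk.

```python
import math
import random

def langevin_sample(score, n_steps=2000, step=0.01, x0=0.0, seed=0):
    """Unadjusted Langevin dynamics:
    x <- x + (step/2) * score(x) + sqrt(step) * z,  z ~ N(0, 1).
    Only the score (gradient of the log density) is required."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x + 0.5 * step * score(x) + math.sqrt(step) * rng.gauss(0.0, 1.0)
    return x

# Target N(3, 1), whose score is -(x - 3); run many independent chains.
samples = [langevin_sample(lambda x: -(x - 3.0), seed=s) for s in range(500)]
```

The chains concentrate around the target mean 3 even though the density itself was never evaluated; score-based generative models replace the closed-form score here with a neural network estimated from data.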
Biography
Yang Song is a final-year Ph.D. student at Stanford University. His research interest is in deep generative models and their applications to inverse problem solving and AI safety. His first-author papers have been recognized with an Outstanding Paper Award at ICLR 2021 and an oral presentation at NeurIPS 2019. He is a recipient of the Apple PhD Fellowship in AI/ML, the J.P. Morgan PhD Fellowship, and the WAIC Rising Star Award.
TIME Monday, February 7, 2022 at 12:00 PM – 1:00 PM
CONTACT Pamela Villalovoz pmv@northwestern.edu
CALENDAR Department of Computer Science

Feb 16
EVENT DETAILS
Abstract
A central goal in modern data science is to design algorithms for statistical inference tasks such as community detection, high-dimensional clustering, sparse PCA, and many others. Ideally these algorithms would be both statistically optimal and computationally efficient. However, it often seems impossible to achieve both these goals simultaneously: for many problems, the optimal statistical procedure involves a brute force search while all known polynomial-time algorithms are statistically suboptimal (requiring more data or higher signal strength than is information-theoretically necessary). In the quest for optimal algorithms, it is therefore important to understand the fundamental statistical limitations of computationally efficient algorithms.
I will discuss an emerging theoretical framework for understanding these questions, based on studying the class of "low-degree polynomial algorithms." This is a powerful class of algorithms that captures the best known polynomial-time algorithms for a wide variety of statistical tasks. This perspective has led to the discovery of many new and improved algorithms, and also many matching lower bounds: we now have tools to prove failure of all low-degree algorithms, which provides concrete evidence for inherent computational hardness of statistical problems. This line of work illustrates that low-degree polynomials provide a unifying framework for understanding the computational complexity of a wide variety of statistical tasks, encompassing hypothesis testing, estimation, and optimization.
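As a hypothetical toy example of what a low-degree polynomial algorithm looks like (not an example from the talk): to test whether i.i.d. samples come from N(0,1) or from N(lambda,1) with a planted mean shift, the degree-1 polynomial statistic sum(x_i)/sqrt(n) already distinguishes the two distributions once lambda*sqrt(n) is large.

```python
import math
import random

def degree_one_test(xs, threshold):
    """A degree-1 polynomial algorithm for a planted mean shift:
    the statistic sum(x_i)/sqrt(n) is approximately N(0, 1) under the
    null and N(lambda * sqrt(n), 1) under the alternative, so a simple
    threshold separates the two hypotheses."""
    n = len(xs)
    return sum(xs) / math.sqrt(n) > threshold

rng = random.Random(0)
n, lam = 400, 0.5
null = [rng.gauss(0.0, 1.0) for _ in range(n)]      # no signal
planted = [rng.gauss(lam, 1.0) for _ in range(n)]   # planted mean shift
```

Here the threshold can be set halfway between 0 and lam*sqrt(n) (10 for these parameters). The low-degree framework studies when some polynomial of bounded degree in the data achieves such separation, and when provably none does.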
Biography
Alex Wein is currently a Postdoctoral Fellow at Georgia Tech, hosted by Santosh Vempala. Previously he was a Simons-Berkeley Research Fellow at UC Berkeley, and before that, a Courant Instructor at NYU. He obtained his PhD from the MIT Department of Mathematics in 2018, supervised by Ankur Moitra. His work is centered on mathematical foundations of data science, aiming to understand optimal algorithms and fundamental limitations for solving high-dimensional inference and optimization problems in a computationally efficient manner.
TIME Wednesday, February 16, 2022 at 12:00 PM – 1:00 PM
CONTACT Pamela Villalovoz pmv@northwestern.edu
CALENDAR Department of Computer Science

Mar 12
EVENT DETAILS
Winter Classes End
TIME Saturday, March 12, 2022
CONTACT Office of the Registrar nuregistrar@northwestern.edu
CALENDAR University Academic Calendar

Mar 19
EVENT DETAILS
Spring Break Begins
TIME Saturday, March 19, 2022
CONTACT Office of the Registrar nuregistrar@northwestern.edu
CALENDAR University Academic Calendar

Mar 28
EVENT DETAILS
Spring Break Ends
TIME Monday, March 28, 2022
CONTACT Office of the Registrar nuregistrar@northwestern.edu
CALENDAR University Academic Calendar

Mar 29
EVENT DETAILS
Spring Classes Begin (8 a.m.)
TIME Tuesday, March 29, 2022
CONTACT Office of the Registrar nuregistrar@northwestern.edu
CALENDAR University Academic Calendar

May 30
EVENT DETAILS
Memorial Day (no classes)
TIME Monday, May 30, 2022
CONTACT Office of the Registrar nuregistrar@northwestern.edu
CALENDAR University Academic Calendar