# NewsSeminars and Tutorials

## Upcoming Events

### OSL User Group Meetings

Check back soon for upcoming dates

## Past Events

### Distinguished Lecture Series

#### Learning Representations Using Causal Invariance

Leon Bottou

Research Scientist, Machine Learning and AI

Thursday, October 24, 2019

**Abstract: **Learning algorithms often capture spurious correlations present in the training data distribution instead of addressing the task of interest. Such spurious correlations occur because the data collection process is subject to uncontrolled confounding biases. Suppose however that we have access to multiple datasets exemplifying the same concept but whose distributions exhibit different biases. Can we learn something that is common across all these distributions, while ignoring the spurious ways in which they differ? This can be achieved by projecting the data into a representation space that satisfy a causal invariance criterion. This idea differs in important ways from previous work on statistical robustness or adversarial objectives. Similar to recent work on invariant feature selection, this is about discovering the actual mechanism underlying the data instead of modeling its superficial statistics.

**Bio: **Léon Bottou received the Diplôme d’Ingénieur de l’École Polytechnique (X84) in 1987, the Magistère de Mathématiques Fondamentales et Appliquées et d’Informatique from École Normale Supérieure in 1988, and a PhD in Computer Science from Université de Paris-Sud in 1991. His research career took him to AT&T Bell Laboratories, AT&T Labs Research, NEC Labs America and Microsoft. He joined Facebook AI Research in 2015. The long-term goal of Bottou’s research is to understand and replicate human-level intelligence. Because this goal requires conceptual advances that cannot be anticipated, Leon’s research has followed many practical and theoretical turns: neural networks applications in the late 1980s, stochastic gradient learning algorithms and statistical properties of learning systems in the early 1990s, computer vision applications with structured outputs in the late 1990s, theory of large scale learning in the 2000s. During the last few years, Bottou’s research aims to clarify the relation between learning and reasoning, with more and more focus on the many aspects of causation (inference, invariance, reasoning, affordance, and intuition.)

#### Systems and Street Smarts in Applied maChine Learning

Yuchen Wu, Senior Staff Engineer, Google

Thursday, October 10, 2019**Abstract: **Optimization and Machine Learning techniques have advanced phenomenally over the past decade. Real-world applications of these techniques, however, require not only the core algorithms, but also involve broader sets of system components, considerations and sometimes street smarts. This talk shares a few things I learned from the trenches building applied ML systems in the industry.

**Bio: **Yuchen Wu is a senior staff software engineer at Google. He works on applied ML in a number of areas including user modeling, recommendation, ads targeting and ranking. Prior to Google, Yuchen studied optimization at Northwestern University and received PhD in 2010.

#### Integer Programming for Learning Directed Acyclic Graphs From Continuous Data

Simge Kucukyavuz

Assoicate Professor, IEMS

**Abstract: **Probabilistic Graphical Models (PGMs) play an important role in modern artificial intelligence. Bayesian networks are PGMs in which the conditional probability relationships among random variables are represented in the form of a Directed Acyclic Graph (DAG); they are popular in genetics, biology, and causal inference. Learning DAGs from data is challenging because the set of possible candidate DAGs scales super-exponentially with the number of nodes. We consider continuous observational data, use the penalized negative log-likelihood score function with both L0 and L1 regularizations, and propose a new mixed-integer quadratic optimization model for solving the learning problem. We show that the proposed formulation has desirable theoretical properties and performs well in practice.

**Bio: **Simge Kucukyavuz is an associate professor in the Industrial Engineering and Management Sciences Department at Northwestern University. She received the 2011 NSF CAREER Award and the 2015 INFORMS Computing Society (ICS) Prize. She is the Chair-elect of ICS and serves on the editorial boards of Mathematics of Operations Research, Mathematical Programming, and Mathematical Programming Computation.

#### Inventory Balancing With Online Learning

Xinshang Wang

Alibaba Research Labs

Thursday, March 7, 2019

**Abstract: **(joint work with Wang Chi Cheung, Will Ma and David Simchi-Levi) We study a general problem of allocating limited resources to heterogeneous customers over time, under model uncertainty. Each type of customer can be serviced using different actions, each of which stochastically consumes some combination of resources, and returns different rewards for the resources consumed. We consider a general model framework, where the resource consumption distribution associated with each (customer type, action) combination is not known, but is consistent and can be learned over time. In addition, the sequence of customer types to arrive over time is arbitrary and completely unknown. We achieve near optimality under both model uncertainty and customer heterogeneity by judiciously synergizing two algorithmic frameworks in the literature: inventory balancing, which "reserves" a portion of each resource for high-reward customer types which could later arrive; and online learning, which shows how to "explore'' the resource consumption distributions of each customer type under different actions. We define an auxiliary problem, which allows for existing competitive ratio and regret bounds to be seamlessly integrated. Furthermore, we show that the performance guarantee generated by our framework is tight, using the special case of the online bipartite matching problem with unknown match probabilities. Finally, we demonstrate the practicality and efficacy of algorithms generated by our framework using a publicly available hotel data set.

**Bio: **Xinshang Wang is a senior algorithm engineer at Alibaba's DAMO Academy. Before joining Alibaba, Xinshang worked as a postdoctoral associate at MIT’s Institute of Data, Systems, and Society. He received his PhD in operations research from Columbia University, and his BS in physics from Peking University. Xinshang's research spans several areas of stochastic and combinatorial optimization for modern service applications, including data-driven optimization under uncertain and dependent demands, and modeling of customer choice behavior for resource allocation problems. Applications of his interest include, but are not limited to, revenue management, healthcare operations and supply-chain management.

#### Avoiding Numerical Issues in Optimization Models

Daniel Espinoza

Senior Developer, Gurobi optimization

Thursday, October 11, 2018

**Abstract:** Models with numerical issues can lead to undesirable results: slow performance, wrong answers or inconsistent behavior. When solving a model with numerical issues, tiny changes in the model or computer can make a big difference in the results. In this talk I will give a quick overview of current MIP technology, describe what numerical issues are, including an overview of Gurobi guidelines, the impact numerical issues can have on your solutions, how to identify numerical issues within a model and finally how to avoid (most of) them by reformulating your model.

**Bio:** Daniel Espinoza holds a Ph.D. in Operations Research from Georgia Institute of Technology. He has published numerous papers in the fields of mathematical programming, computer optimization and operations research. His research interests include Cutting Planes, Large scale optimization, Applications in Electricity, Mine Planning, and other applications of Integer Programming. Prior to joining Gurobi, he was Associate Professor in the Department of Industrial Engineering at the Universidad de Chile.

#### Computational Approaches in Synthetic and Molecular Biology

Samuel D. Perli

Postdoctoral Scholar

Galdstone Institutes,

University of California San Francisco

**Abstract:** Thanks to advanced sequencing technologies, vast amounts of high resolution "omic" data that capture the biological state at unprecedented levels of resolution are now being made available. Cutting-edge computational tools that can operate on this data to analyze, infer and predict biological behavior are thoroughly needed.

I will introduce a few tools that were built to address specific computational challenges in biological systems. We will first discuss an Analog computation based synthetic ratio-meter designed and built in eukaryotic cells. Later, a Digital memory system that can record biologically relevant information in human cells will be presented.

Finally, I will introduce an information theoretic framework for analyzing stem cell differentiation.

**Bio**: Sam Perli is a Postdoctoral Fellow working with Prof. Shinya Yamanaka at Gladstone Institutes and UCSF. Sam's research interests lie at the intersection of Computation and Molecular Biology. Sam is currently working on developing next generation applications of pluripotent stem cells while building computational tools for characterizing pluripotent stem cell development. Sam obtained his PhD at MIT working with Prof. Timothy Lu and won the Dimitris N. Chorafas award for his thesis titled "An Integrated CRISPR-Cas Toolkit for Engineering Human Cells". Prior to his foray into Biology, Sam obtained his Masters degree in Computer Science at MIT, working as a Presidential Fellow and his Bachelors degree in Electrical Engineering from IIT Madras where he was awarded the Institute Silver medal.

#### IEMS/OSL Talk: Visual Computational Sociology: Methods and Challenges

Timnit GebruMicrosoft Research, New York

Tuesday, April 10, 2018

**Abstract: **Targeted socio-economic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years which is costly and labor intensive, data-driven, machine learning driven approaches are cheaper and faster--with the potential ability to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to determine demographic attributes using the detect cars. To facilitate our work, we used a graph based algorithm to collect the largest and most challenging fine-grained dataset reported to date consisting of over 2600 classes of cars comprised of images from Google Street View and other web sources. Our prediction results correlate well with ground truth in come (r=0.82), race, education, voting, sources investigating crime rates, income segregation, per capita carbon emission, and other market research. Data mining based works such as this one can be used for many types of applications--some ethical and others not. I will finally discuss recent work (inspired by my experiences while working on this project), on auditing and exposing the gender and skin tone bias found in commercial gender classification systems. I will end with the concept of an AI datasheet to standardize information for datasets and pre-trained models, to push the field as a whole towards transparency and accountability.

**Biography: **Timnit Gebru works in the Fairness Accountability Transparency and Ethics (FATE) group at Microsoft Research, New York. Prior to joining Microsoft Research, she was a PhD student in the Stanford Artificial Intelligence Laboratory, studying computer vision under Fei-Fei Li. Her main research interest is in data mining large-scale, publicly available images to gain sociological insight, and working on computer vision problems that arise as a result, including fine-grained image recognition, scalable annotation of images, and domain adaptation. She is currently studying the ethical considerations underlying any data mining project, and methods of auditing and mitigating bias in sociotechnical systems. The New York Times, MIT Tech Review and others have recently covered her work. As a cofounder of the group Black in AI, she works to both increase diversity in the field and reduce the negative impacts of racial bias in training data used for human-centric machine learning models.

#### Algorithms for Nonconvex, Nonsmooth Otimization

Frank E. Curtis

Associate Professor, Department of Industrial amd Systems Engineering

Lehigh university

Friday, March 2, 2018

**Abstract: **Intuitive descriptions of algorithms for solving nonconvex, nonsmooth optimization problems are presented. Starting with fund- amentals of subgradient, cutting plane, and gradient sampling methods for solving convex problems, the groundwork is laid for advancing to modern adaptive methods for solving more challenging nonconvex problems. Algorithms are presented from a geometric viewpoint. The tutorial closes with a discussion of issues related to the implementation of adaptive methods in the NonOpt software package, currently under development in collaboration with Andreas Wächter (Northwestern University) Presentation Slides HERE

#### Intern and Postdoc Opportunities for Advancing Science and Engineering at Argonne

Prasanna Balaprakash

Computer Scientist at the Mathematics & Computer Science Division and the Leadershop Computing Facility, Argonne

Stephan Wild

Computational Mathematician and Deputy Division Director at the Mathematics & Computer Science Division, Argonne

**Abstract:** A number of student internships and postdoctoral fellow opportunities exist at Argonne National Laboratory, an open-science lab 30 miles southwest of Northwestern. We will provide an overview of research activities and opportunities spanning Argonne work in the computational, computer, data, and mathematical sciences; artificial intelligence; operations research; and statistics. Relevant activities span Argonne and involve Energy Systems Division; Global Security Sciences Division; Laboratory for Applied Mathematics, Numerical Software, and Statistics; Leadership Computing Facility; and Mathematics & Computer Science Division, as well as the joint Northwestern Argonee Institute of Science and Engineering. This interactive presentation will include opportunities at all education levels.

#### Recent Research in Power System Optimization: Approximation Error Quantification, Feasible Space Computation, & Convex Relaxations

Daniel Molzahn

Computational Engineer

Argonne National Laboratory

**Abstract: **The power flow equations model the relationship between the voltages phasors and power flows in an electric power system. The nonlinearity of the power flow equations results in a variety of algorithmic and theoretical challenges, including non-convex feasible spaces for optimization problems constrained by these equations. This presentation focuses on three recent developments relevant to the power flow equations. By leveraging advances in convex relaxation techniques, this presentation first describes recent progress regarding a method for bounding the worst-case errors resulting from power flow linearizations. Using another new algorithm based on homotopy methods, this presentation next illustrates challenging feasible spaces associated with power system optimization problems. Finally, this presentation describes recent research using convex "moment" relaxations to find the global optima of many power system optimization problems.

**Bio: Dr. Daniel Molzahn **is a computational engineer at Argonne National Laboratory in the Center for Energy, Environmental, and Economic Systems Analysis. Prior to his current position, Daniel was a Dow Sustainability Fellow at the University of Michigan. Daniel received the B.S., M.S. and Ph.D. degrees in Electrical Engineering and the Masters of Public Affairs degree from the University of Wisconsin–Madison, where he was a National Science Foundation Graduate Research Fellow. His research interests are in the development of optimization and control techniques for electric power systems.

#### Automating the analysis and design of large-scale optimization algorithms

Laurent Lessard

Assistant Professor, Electrical and Computer Engineering

University of Wisconsin-Madison

Wednesday, October 11, 2017

**ABSTRACT:** Most complicated optimization problems, in particular those involving a large number of variables, are solved in practice using iterative algorithms. The problem of selecting a suitable algorithm is currently more of an art than a science; a great deal of expertise is required to know which algorithms to try and how to properly tune them. Moreover, there are seldom performance guarantees. In this talk, I will show how the problem of algorithm selection can be approached using tools from robust control theory. By solving simple semidefinite programs (that do not scale with problem size), we can derive robust bounds on convergence rates for popular algorithms such as the gradient method, proximal methods, fast/accelerated methods, and operator-splitting methods such as ADMM. The bounds derived in this manner either match or improve upon the best known bounds from the literature. The bounds also lead to a natural energy dissipation interpretation and an associated Lyapunov function. Finally, our framework can be used to search for algorithms that meet desired performance specifications, thus establishing a principled methodology for designing new algorithms.

**BIOGRAPHY:** Laurent Lessard is an Assistant Professor of Electrical and Computer Engineering at the UW–Madison and faculty member of the Optimization Group at the Wisconsin Institute for Discovery. Laurent received the B.A.Sc. in Engineering Science from the University of Toronto, and received the M.S. and Ph.D. in Aeronautics and Astronautics at Stanford University. After completing his doctoral work, Laurent was an LCCC Postdoc at Lund University in Sweden, and a postdoctoral researcher at the University of California, Berkeley. His research interests include: decentralized control, robust control, optimization, and machine learning.

#### Stein's Identity and Near-Optimal Estimation in High-dimensional Index Models

Krishna Balasubramanian

Department of ORFE, Princeton University

Friday July 7, 2017

**Abstract:** In this talk I will present a method for estimating the parametric components of semi-parametric multiple index models (a.k.a. not-so-deep neural nets) in a high-dimensional non-Gaussian setting. The proposed estimators leverage the score function based first and second-order Stein's identity and do not require Gaussian or elliptical symmetry assumptions made in the literature. Moreover, to handle score functions and response variables that are heavy-tailed, the estimators are con-structed via carefully thresholding their empirical counterparts. On the theoretical front, I will show that the proposed estimator achieves near-optimal statistical rate of convergence in several settings. I will end this talk with a discussion of on-going work on extending similar ideas for near-optimal estimation in 2-layer neural networks under high-dimensional non-Gaussian assumptions.

#### Variance Reduction for Restricted Strong Convexity

Chao Qu

Postdoctoral Candidate, National University of Singapore

Thursday, April 13, 2017

**Abstract: **Computational challenges of statistical estimators and machine learning algorithms are an active area of study thanks to countless applications involving big data. In particular, several variants of stochastic first order methods based on variance reduction, namely SVRG, SDCA and SAGA, have attracted significant attention because of light computation load for each iteration, and fast convergence speed - for strongly convex problems they all converge with a linear rate. Yet, many powerful statistical and machine learning models (e.g., Lasso, group Lasso, nuclear norm regularized matrix completion), are not strongly convex, and sometimes even non-convex (a prominent example being the SCAD regularizer). In this talk, I will argue that a nice statistical property of those problems, namely restricted strong convexity, ensures that linear convergence of variance reduced stochastic first order methods for those statistical models. From a high level, our results re-confirmed a well-observed phenomenon that statistically easy problems tend to be more computationally friendly. We believe this intriguing fact indicates that there are fundamental relationships between statistics and optimization, and understanding such relationships may shed deep insights.

**Bio-Sketch**: Chao Qu is a PhD student in the Department of Mechanical Engineering at National University of Singapore, advised by Huan Xu. His research interests mainly focuses on machine learning, high dimensional statistics and large scale optimization. Prior to National University of Singapore, he received his M.Phil. degree in ECE from Hong Kong University of Science and Technology and B. Eng. degree in Information Engineering from Xi’an Jiao Tong University.

A box lunch will be served- please rsvp to tara.sadera@northwestern.edu

#### Approximate Versions of the Alternating Direction Method of Multipliers

Jonathan Eckstein

Professor, Management Science and Information Systems

Rutgers University

Thursday, November 3, 2016

**Abstract:** The Alternating Direction Method of Multipliers (ADMM) is a decomposition method for convex optimization currently enjoying popularity in the solution of machine learning and image processing problems. This talk reviews the convergence mechanism of the ADMM and presents three new, provably convergent approximate versions in which one or both optimization subproblems arising at each iteration may be solved inexactly using practically testable approximation criteria. computational tests applying these methods on several different problem classes indicate that in cases in which an iterative method is used for at least one of the ADMM subproblems, these variants can significantly reduce total computational effort. We also discuss the possible application of these approximation techniques to the Progressive Hedging (PH) algorithm for stochastic programming, which may be viewed as a particular case of the ADMM.

Click HERE for Professor Eckstein's Bio

#### Optimization Methods for Learning and Big Data

Donald Goldfarb

Alexander and Hermine Avanessians Professor of Industrial Engineering & Operations Research

Columbia University

Thursday, October 20, 2016

**Abstract:** Many problems in machine learning (e.g., logistic regression, support vector machines, neural networks, robust principal component analysis) and signal processing (e.g., face recognition and compressed sensing) are solved by optimization algorithms. In today's age of big data, the size of these problems is often formidable. In this talk, we will discuss some of our work that addresses this challenge using the following approaches: first-order (including proximal and conditional gradient) methods, accelerated variants, stochastic gradient and second-order extensions (including full and limited memory block BFGS updating methods), and alternating direction (multiplier) methods for structured problems.

Click HERE for Professor Goldfarb's Bio

#### Bridging the Gap Between Real-world Decision Making and Mathematics: Multi-objective Optimisation in Action

Matthias Ehrgott

Professor, Head of Department,

Management Sciences

University of Lancaster

Wednesday, October 19, 2016

**Abstract:** In this talk I will first present several studies that illustrate the power of multi-objective optimisation in supporting decision makers facing conflicting goals. These studies are drawn from the fields of transportation (airline operations, traffic modelling), health radiation therapy treatment), and manufacturing (of composite materials). I will illustrate the widely differing mathematical methods used in these applications, but emphasise the common benefits of the multi-objective optimisation approach: The improved understanding of the problem achieved through additional insight into the problem structure; the improved support for decision makers through the ability to properly account for trade-offs between conflicting goals; and last but not least, the considerable benefits that result in terms of quality of decisions and/or improved decision making processes.

In the second part of the talk I will focus on a specific application in the radiotherapy treatment planning, namely the generation of deliverable treatment plans and beam angle optimisation in the presence of multiple objectives. External radiation therapy is one of the major forms of treatment of cancer. Planning a radiation treatment for a patient involves making trade-offs between the main goals of radiotherapy, namely to irradiate the tumour according to some prescription to affect tumour control and to avoid damage to surrounding healthy tissue. This conflict permeates all aspects of treatment planning from the selection of beam angles to the optimisation of fluence maps to the sequencing of the gantry for treatment delivery. In this talk I will first describe a matheuristic approach to incorporate multi-objective fluence optimisation in the beam angle optimisation problem and then describe a column generation approach to the multi-objective fluence map optimisation problem, which achieves a reduction of the number of apertures and total fluence required to deliver a Pareto-optimal treatment.

Click HERE for Professor Ehrgott's Bio

#### In-Network Nonconvex Big Data Optimization

Gesualdo Scutari

Associate Professor of Industrial Engineering

Purdue University

Monday, October 10, 2016

**Abstract:** Nowadays, large-scale systems are ubiquitous. Some examples/applications include wireless communication networks; electricity grid, sensor, and cloud networks; and machine learning and signal processing applications, just to name a few. In many of the above systems, i) data are distributively stored in the network (e.g., clouds, computers, sensors, robots), and ii) it is often impossible to run analytics on central fusion centers, owing to the volume of data, energy constraints, and/or privacy issues. Thus, distributed in-network processing with parallelized multi-processors is preferred. Moreover, many applications of interest lead to large-scale optimization problems with nonconvex, nonseparable objective functions. All this makes the analysis and design of distributed/parallel algorithms over networks a challenging task.

In this talk we will present our ongoing work in this area. More specifically, we consider a large-scale network composed of agents aiming to distributively minimize a (nonconvex) smooth sum-utility function plus a nonsmooth (nonseparable), convex one. The latter is usually employed to enforce some structure in the solution, e.g., sparsity. The agents have access only to their local functions but not the whole objective, and the network is modeled as a directed, time-varying, B- strongly connected graph. We propose a distributed solution method for the above optimization wherein the agents in parallel minimize a convex surrogate of the original nonconvex objective while using a novel broadcast protocol to distribute the computations over the network. We discuss several instances of the general algorithm framework tailored to specific (convex and nonconvex) applications and present some numerical results validating our theoretical findings.

Click **HERE** for Professor Scutari's Bio

** **

#### Seminar: Bayesian Optimization in the Tech Sector: The Parallel Knowledge Gradient Method, and Experiences at Uber, Yelp, and SigOpt

Peter Frazier, Cornell University

Tuesday, May 24

**Seminar Abstract**:

We discuss parallel derivative-free global optimization of expensive-to-evaluate functions, and the use of Bayesian optimization methods for solving these problems, in the context of applications from the tech sector: optimizing e-commerce systems, real-time economic markets, mobile apps, and hyperparameters in machine learning algorithms. We describe optimization as a decision-making task ("where should I sample next?") and study it using decision-theory, by placing a Bayesian prior distribution on the objective function, and choosing the set of points to evaluate next that provide the largest value of the information. Using this approach, we develop a novel algorithm for parallel derivative-free global optimization of expensive functions, called the parallel knowledge gradient (pKG) method, which outperforms previously state-of-the-art Bayesian methods. We conclude with examples of the impact of these methods in the tech sector, from the speakers' experiences working with Yelp, Uber, and the Bayesian Optimization startup company, SigOpt.

#### Tutorial: Coding for Researchers

Wednesday, May 25, 4:00 - 6:00 pm

**Tutorial Abstract:**

In this tutorial, we discuss techniques used in the software engineering industry to develop and manage large software products that are also useful for researchers. These techniques allow developing code more quickly, and result in code that is faster, more reliable, easier to use, and easier for others to understand, use, and extend. This tutorial will cover version control (focusing on git), unit testing, profiling, code optimization, code organization, techniques for managing computationally expensive experiments, and steps you can take to get people in the real world to actually use your ideas. This tutorial is designed for students who are writing software as part of their research, but who have not had prior exposure to techniques used in the software engineering industry. Examples will be given in Python, C, and Matlab.

Bio: Peter I. Frazier is an Associate Professor in the School of Operations Research and Information Engineering at Cornell University, and currently on sabbatical leave at Uber, where he is a Staff Data Scientist and the Data Science Manager for uberPOOL, Uber's carpooling service. His research interest is in sequential decision-making and Bayesian statistics, focusing on Bayesian optimization and optimal learning.

** **

#### Extracting Governing Equations in Chaotic Systems From Highly Corrupted Data

Rachel Ward, University of Texas at Austin

Tuesday, May 10, 2016

Select HERE for Seminar Abstract

Bio: Rachel Ward obtained her BS in Mathematics from University of Texas at Austin, and her PhD in Applied and Computational Mathematics from Princeton University. She is currently an Assistant Professor in the Department of Mathematics and ICES member at University of Texas at Austin.

#### Recent Advancements in Optimization for the Big Data Regime

Sham Kakade, University of Washington

Monday, April 25

#### Multi-View Representation Learning With Canonical Correlation Analysis

Weiran Wang, Toyota Technological Institute at Chicago

Monday, April 18

Canonical correlation analysis (CCA) has been the main workhorse for multi-view feature learning, where we have access to multiple ''views'' of data at training time while only one primary view is available at test time. The idea of CCA is to project the views to a common space such that the projections of different views are maximally correlated.

In the first part of the talk, we compare different nonlinear extensions of CCA, and find that the deep neural network extension of CCA, termed deep CCA (DCCA), has consistently good performance while being computationally efficient for large datasets. We further compare DCCA with deep autoencoder-based approaches, as well as new variants. We find an advantage for correlation-based representation learning, while the best results on most tasks are obtained with a new variant, deep canonically correlated autoencoders (DCCAE).

In the second part of the talk, we study the stochastic optimization of canonical correlation analysis, whose objective is nonconvex and does not decouple over training samples. Although several stochastic optimization algorithms have been recently proposed to solve this problem, no global convergence guarantee was provided by any of them. Based on the alternating least squares formulation of CCA, we propose a globally convergent stochastic algorithm, which solves the resulting least squares problems approximately to sufficient accuracy with state-of-the-art stochastic gradient methods. We provide the overall time complexity of our algorithm which improves upon that of previous work.

This talk includes joint work with Raman Arora (JHU), Jeff Bilmes (UW), Jialei Wang (U Chicago), Nati Srebro (TTIC), and Karen Livescu (TTIC).

Weiran Wang is a postdoctoral scholar at Toyota Technological Institute at Chicago. He obtained the PhD degree from the EECS Department at University of California, Merced in 2013. His research includes algorithms for deep learning, multi-view representation learning, structured output (sequence) prediction, manifold learning, optimization for machine learning, and applications to acoustic modeling for speech recognition.

#### Fast and Simple PCA via Convex Optimization

Dan Garber, Toyota Technological Institutue of Chicago

Tuesday, April 5

Computing the leading eigenvector of a real symmetric matrix is a fundamental problem in numerical linear algebra with numerous important applications, the most notable one probably being principal component analysis. Thus, in face of the ever increasing size of modern datasets, designing faster algorithms for this problem is an important challenge. Unfortunately, this problem is inherently non-convex, which makes it very difficult to adapt to it powerful techniques that were developed for convex problems, such as Nesterov's acceleration and various stochastic gradient methods, that proved highly useful for handling very large scale convex problems.

In this talk I will show that the leading eigenvector problem, though not convex, is reducible to solving a short (i.e. only poly-logarithmic in the important parameters of the problem) sequence of well-conditioned unconstrained convex optimization problems. This in turn, allows us to apply state-of-the-art stochastic gradient methods for solving this sequence of problems. As a result we derive algorithms for computing the leading principal component which are the fastest to date in a wide regime of parameters.

Dan Garber is a Research Assistant Professor at the Toyota Technological Institute at Chicago, a philanthropically endowed academic computer science institute located on the University of Chicago campus. Dan's research interests lie in the intersection of machine learning and continuous optimization, with the main focus being the development of efficient algorithms with novel and provable performance guarantees for basic machine learning, data analysis, decision making and optimization problems. Research highlights include faster projection-free first-order convex optimization algorithms, sublinear-time algorithms for semidefinite programming, and algorithms with stronger guarantees for principal component analysis in a variety of settings including offline (classical) computation, streaming and online.

Dan received both his Ph.D and his M.Sc degrees from the Technion - Israel Institute of Technology, where he worked under the supervision of Prof. Elad Hazan. Before that, Dan completed his bachelor's degree in computer engineering, also in the Technion.

#### Seminar: Advances in Fitting Statistical Models to Huge Datasets

Mark Schmidt, University of British Columbia

Tuesday, March 8, 2016

**Seminar Abstract**:

The talk will contain two parts:

In the first part, I will consider the problem of minimizing a finite sum of smooth functions. This is a ubiquitous computational problem across science and engineering, and includes special cases like least squares and logistic regression. I will describe the stochastic average gradient algorithm which, despite over 60 years of work on stochastic gradient algorithms, is the first method to achieve the low iteration cost of stochastic gradient methods while achieving a linear convergence rate as in deterministic gradient methods that process the entire dataset on every iteration.

In the second part, I will consider the even-more-specialized case where we have a linearly-parameterized model (such as linear least squares or logistic regression). I will talk about how coordinate descent methods, though a terrible idea for minimizing general functions, are theoretically and empirically well-suited to solving such problems. I will also discuss how we can design clever coordinate selection rules, that are much more efficient than the classic cyclic and randomized choices.

#### Tutorial: The Why and How of Releasing Applied-Math Code

In this talk I will argue that researchers should be releasing more code, whether it be implementations of algorithms proposed in their papers or even reference implementations of basic methods from their field. I'll discuss the benefits of this to the person releasing the code as well as to their field and science/engineering as a whole. Then I'll turn to my personal views on the factors that should go into designing a good code: design principles and making code extendable, performance vs. generality, licensing options, and user support/interfaces. Finally, for the specific case of codes related to solving optimization (where continuous or discrete or decision), I will discuss the *real* objective function that all users care about but that is completely ignored in most research on this topic.
Bio: Mark Schmidt has been an assistant professor in the Department of Computer Science at the University of British Columbia since 2014, and a Canada Research Chair since 2016. His research focuses on developing faster algorithms for large-scale machine learning, with an emphasis on methods with provable convergence rates and that can be applied to structured prediction problems. From 2011 through 2013 he worked at the École normale supérieure in Paris on inexact and stochastic convex optimization methods. He finished his M.Sc. in 2005 at the University of Alberta working as part of the Brain Tumor Analysis Project, and his Ph.D. in 2010 at the University of British Columbia working on graphical model structure learning with L1-regularization. He has also worked at Siemens Medical Solutions on heart motion abnormality detection, with Michael Friedlander in the Scientific Computing Laboratory at the University of British Columbia on semi-stochastic optimization methods, and with Anoop Sarkar at Simon Fraser University on large-scale training of natural language models.

#### Robust Bivariate Error Detection in Skewed Data With Application to Historical Radiosonde Winds

Amanda Hering, Colorado School of Mines

Tuesday, February 23

Select HERE for Seminar Abstract

Bio: Amanda Hering graduated from Baylor University with a bachelors degree in mathematics in 1999. She joined the Department of Applied Mathematics and Statistics at The Colorado School of Mines as an Assistant Professor in 2009 after earning her Ph.D. in Statistics from Texas A&M University. She is the CSM Site Director for the Center for Research and Education in Wind, and her research interests are in the areas of spatial and space-time modeling, wind speed forecasting, model validation, and multivariate methods. She is an Associate Editor for four journal, including Technometricsand Environmetrics. Association and past Associate Editor or Editorial Board Member of Technometrics, Journal of Quality Technology, Operations Research, and Naval Research Logistics.

#### Robust Maximum Likelihood Estimation

Omid Nohadani, Northwestern University, Department of Industrial Engineering & Management Sciences

Tuesday, November 17th

In many applications, statistical estimators serve to derive conclusions from data, most prominently in finance, medical decision-making and clinical trials. However, the conclusions are typically dependent on uncertainties in the data. We use robust optimization principles to construct robust maximum likelihood estimators that are immune against data errors. Both types of errors are considered: a) adversarial type, modeled using the notion of uncertainty sets, and b) probabilistic type, modeled by distributions. We provide efficient local and global search algorithms to compute the robust estimators and discuss them in details for the case of multivariate normally distributed data. The estimator performance is demonstrated on two datasets. First, using computer simulations, we demonstrate that the proposed estimators are robust against both types of data uncertainty and provide significantly more accurate estimates, compared to classical estimators which degrade significantly, when errors are encountered. We establish a range of uncertainty sizes, for which robust estimators are superior. Second, we analyze deviations in cancer radiation therapy planning. Uncertainties amongst plans are caused by patients’ individual anatomies and the trial-and-error nature of the process. Robust estimators prove to be result in more reliable decisions when applied to a large set of past treatments.

#### Learning (From) Networks: Fundamental Limits, Algorithms, and Applications

Soheil Feizi, MIT

Friday, November 6th

Network models provide a unifying framework for understanding dependencies among variables in medical, biological, and other sciences. Networks can be used to reveal underlying data structures, infer functional modules, and facilitate experiment design. In practice, however, size, uncertainty and complexity of the underlying associations render these applications challenging.

In this talk, we illustrate the use of spectral, combinatorial, and statistical inference techniques in several significant network science problems. First, we consider the problem of network alignment where the goal is to find a bijective mapping between nodes of two networks to maximize their overlapping edges while minimizing mismatches. To solve this combinatorial problem, we present a new scalable spectral algorithm, and establish its efficiency theoretically and experimentally over several synthetic and real networks. Next, we introduce network maximal correlation (NMC) as an essential measure to capture nonlinear associations in networks. We characterize NMC using geometric properties of Hilbert spaces and illustrate its application in learning network topology when variables have unknown nonlinear dependencies. Finally, we discuss the problem of learning low dimensional structures (such as clusters) in large networks, where we introduce logistic Random Dot Product Graphs, a new class of networks which includes most stochastic block models as well as other low dimensional structures. Using this model, we propose a spectral network clustering algorithm that possesses robust performance under different clustering setups. In all of these problems, we examine underlying fundamental limits and present efficient algorithms for solving them. We also highlight applications of the proposed algorithms to data-driven problems such as functional and regulatory genomics of human diseases, and cancer.

Tuesday, October 27th

Yuan Luo

Assistant Professor, Northwestern University, Division of Health & Biomedical Informatics and Department of Industrial Engineering & Management Sciences

This talk covers our works on applying subgraph mining and factorization algorithms to clinical narrative text, ICU physiologic time series and computational genomics. The common theme of these works aims at building clinical models that improve both prediction accuracy and interpretability, by exploring relational information in each data modality.

This talk focuses on three concrete examples, implicating neurodevelopmentally coregulated exon clusters in phenotypes of Autism Spectrum Disorder (ASD), predicting mortality risk of ICU patients based on their physiologic measurement time series, and identifying subtypes of lymphoma patients based on pathology report text. In each example, we show how to automatically extract relational information into a graph representation and how to collect important subgraphs that are of interest. Depending on the degree of structure in the data format, heavier machinery of factorization models becomes necessary to reliably group important subgraphs. We demonstrate that these methods lead to not only improved performance but also better interpretability in each application.