Martin Wainwright Discusses Big Data Algorithms at CS+X Colloquium

Sketching is a fast way to optimize large data sets

Everyone might be talking about Big Data, but it’s not the data itself that’s interesting. It’s the inferences and insight hidden within the data that make it valuable. But as the amount of data continues to increase exponentially, these insights are harder and harder to find.

Many algorithms cannot handle the exponentially increasing amount of data. “We are experiencing a data deluge,” Martin Wainwright said during Wednesday’s CS+X Colloquium. “Even the fastest algorithms are no longer efficient enough to handle it.”

A professor of statistics and computer science at the University of California at Berkeley, Wainwright discussed challenges and solutions for managing Big Data in his lecture, “Statistics Meets Optimization: Fast Randomized Algorithms for Large Data Sets,” on Wednesday, March 16.

Wainwright demonstrated his work using random dimensionality reduction techniques, or sketching, to solve optimization problems over large data sets quickly. First appearing in the 1980s, sketching removes redundant information and breaks down large data sets into smaller pieces to be solved incrementally. The result is a “good enough” approximate answer or “sketch” of the data stream.
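To make the idea concrete, here is a minimal illustration of sketching applied to a least-squares problem. It is not taken from the lecture: the choice of a Gaussian random projection, the function names sketched_least_squares and demo, and the problem sizes are all assumptions made for the example. The large data matrix is multiplied by a short random matrix, and the much smaller problem that results is solved in place of the original.

```python
import numpy as np


def sketched_least_squares(A, b, sketch_rows, seed=0):
    """Approximately minimize ||Ax - b|| by solving a smaller, sketched problem.

    Illustrative only: uses a Gaussian random projection as the sketch,
    which is one common choice and an assumption of this example.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Random sketching matrix S with shape (sketch_rows, n),
    # scaled so that S.T @ S equals the identity in expectation.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # Solve the reduced problem with sketch_rows rows instead of n.
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sketch


def demo():
    """Compare the residual of the exact and the sketched solutions."""
    rng = np.random.default_rng(1)
    n, d = 5000, 10  # hypothetical sizes: many rows, few columns
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
    x_sketch = sketched_least_squares(A, b, sketch_rows=500)

    cost_full = np.linalg.norm(A @ x_full - b)
    cost_sketch = np.linalg.norm(A @ x_sketch - b)
    return cost_full, cost_sketch


if __name__ == "__main__":
    full, approx = demo()
    print(f"exact residual: {full:.4f}, sketched residual: {approx:.4f}")
```

In this setup the sketched problem has one-tenth as many rows as the original. Its solution cannot beat the exact one on the full data, but its residual is typically only slightly larger, which is the “good enough” trade-off described above.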

Wainwright shared strategies for deciding when to use a sketching algorithm and outlined its fundamental flaws. One example application is Netflix’s recommender system. Wainwright said a sketching algorithm could quickly sort through the enormous base of movies and users to predict which movies people will most enjoy.

“It’s an embarrassingly simple method and can be used in many, many areas,” Wainwright said. “Even though it’s simple, it’s amazingly powerful.”

Hosted by the Department of Electrical Engineering and Computer Science, the CS+X series explores ways in which computer science can augment and be augmented by other disciplines. Northwestern will host Abraham Flaxman, mathematician and computer scientist from the University of Washington Institute for Health Metrics and Evaluation, for the next CS+X colloquium on Monday, March 28.