Downey Contributes to Search Engine for Scientists

Semantic Scholar makes searching through scientific literature faster, easier, and smarter

When it comes to scientific journal articles, the world is suffering from a data deluge. About 5,000 new articles are published every single day, adding up to nearly 2 million per year. For the scientists who slog through this mass of information to find past literature about their research subjects, it’s often more than the brain can handle.

This past Monday, the Allen Institute for Artificial Intelligence launched Semantic Scholar, a new search engine aimed at combatting information overload. It offers smarter searching and easier-to-digest results. Northwestern Engineering’s Douglas Downey delivered some of the algorithms that make this new search engine function.

“Scientists spend a lot of time searching through past literature to see what’s been done and how to build on that work,” said Downey, associate professor of electrical engineering and computer science. “It’s hard to keep up with the number of papers that are published every day, so scientists need specialized tools — more powerful tools than the typical web searcher.”

Currently, researchers use online tools, such as Google Scholar and PubMed, to search through past literature. But Semantic Scholar goes beyond the basic search to offer more innovative features. Using the algorithm that Downey built with PhD students Chandra Sekhar Bhagavatula and Thanapon Noraset, it extracts the most important keywords and phrases from scientific papers.

Semantic Scholar uses machine reading to crawl the web, finding PDFs of open-access scientific papers and extracting both text and figures. The search engine allows the reader to scan Downey’s keyword-generated summaries, along with abstracts, figures, tables, and citation counts, without downloading the actual PDF of the paper.

“Semantic Scholar gives people five or so key phrases that offer a succinct summary of the paper,” Downey said. “So they can understand the subject at a glance without investing the time it takes to read the whole paper.”
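
To give a rough sense of what key phrase extraction involves, here is a minimal sketch in Python. It is not the algorithm Downey's group built, which the article does not describe; it substitutes a standard TF-IDF ranking, which favors phrases that are frequent in one paper but rare across the collection. The function names, the tiny stopword list, and the toy texts are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is",
             "of", "on", "the", "to", "we", "with"}


def candidate_phrases(text, max_len=2):
    """Return lowercase n-grams (1..max_len words) that contain no stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if not any(w in STOPWORDS for w in gram):
                phrases.append(" ".join(gram))
    return phrases


def top_key_phrases(doc, corpus, k=5):
    """Rank the candidate phrases of `doc` by TF-IDF computed against `corpus`."""
    doc_counts = Counter(candidate_phrases(doc))
    corpus_sets = [set(candidate_phrases(d)) for d in corpus]
    n_docs = len(corpus_sets)
    scores = {}
    for phrase, tf in doc_counts.items():
        # Document frequency: how many corpus texts contain the phrase.
        df = sum(1 for s in corpus_sets if phrase in s)
        # Smoothed inverse document frequency, so unseen phrases do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[phrase] = tf * idf
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


if __name__ == "__main__":
    paper = "Neural parsing of sentences. Neural parsing improves accuracy."
    collection = [
        paper,
        "Image classification with neural networks",
        "Sorting algorithms are fast",
    ]
    print(top_key_phrases(paper, collection, k=5))
```

The default of five phrases mirrors the "five or so" that Downey mentions. A system working at the scale of millions of papers would need far more than this sketch, including better candidate selection and a much larger reference collection.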

Semantic Scholar also identifies influential citations and references, making it easier to find the most useful papers, and offers advanced filtering options that narrow results to the most relevant ones. It aggregates all of the work done on one particular subject, suggests relevant conferences on that topic, and creates timelines showing how the subject’s popularity has changed.

For now, the free tool can only search through three million open-access journal articles about computer science. Semantic Scholar will add neuroscience papers in 2016 and continue to expand to include all open-access scientific literature available online.

“The key phrase extraction task that we focused on is a well-studied task,” Downey said. “But it’s rarely used on such a big scale, which makes it more challenging. This has been a fun and valuable experience, and it’s satisfying to see thousands of people use our work.”