Student Research | Chocolate Chips and Fish Sauce: A Network Analysis and Visualization in Ingredient Pairings

This excerpt is taken from an MSiA student research blog posting. Each month, students in our program submit original extracurricular research as part of our blog competition. The winner(s) are published to the MSiA Student Research Blog, our program website, and receive a chance to attend an analytics conference of their choice. Visit our blog to see more.

Accompany visualization found at https://finnqiao.github.io/recipe_universe_d3/

Introduction

In 2014, a food truck emblazoned with “IBM Cognitive Cooking” arrived at South by Southwest ready to serve up algorithmically generated street fare. Along with IBM’s Jeopardy AI, they were the early applications that would help ease AI into the public sphere.

Figure 1: IBM Cognitive Cooking at South by Southwest

The recipes from Cognitive Cooking was then codified into the cookbook “Cognitive Cooking with Chef Watson” and a website where users could get recipe inspiration for chosen recipes and cuisines. While the site was active, it helped inspire home chefs all over the world to experiment with unexpected ingredient and flavor combinations. Chef Watson was based on text analysis Bon Appetit recipes and an understanding of chemical flavor compounds.

Figure 2: Example of Cognitive Cooking recipe

Despite being an effective demonstration of what an everyday AI application would look like, the website was discontinued in 2018 due to a lack of traction and commercial use cases. Furthermore, the user interface was unintuitive and needlessly complex, rendering the service a novelty rather than a necessity for home cooks.

While IBM Watson focused on the novelty factor by heavily featuring the intricacies of chemical flavor compounds in their algorithms, such a presentation isn’t necessarily immediately explainable to an everyday home cook. I thus wanted to apply natural language processing techniques and more intuitive metrics such as pairing counts across recipes to present the connections between ingredients in a more intuitive, interactive, and design-forward manner.

Ultimately, the goal of such an analysis and visualization would be to identify clusters of ingredients based on the base metric of ingredient pair counts across recipes. By tweaking the algorithms for this frequency metric to weight rare and common ingredients differently, lesser known connections between ingredients would be useful in thinking about how to pair lesser seen ingredients.

For example, chocolate is often added in stews and moles to thicken up the sauce and add a depth of flavor and bitterness. Fish sauce is often the “secret ingredient” for red meat stews because of how well the umami flavor brings out the flavor of red meat. As a result, chocolate and fish sauce do pair very well together in a certain subset of red meat stew dishes and it would be interesting to parse out such oft-overlooked ingredient pairings.

Figure 3: Ox tail stew made with the above-mentioned chocolate, fish sauce combination
Figure 4: Ox tail stew made with the above-mentioned chocolate, fish sauce combination

With this in mind, I set out to visualize the connections between these ingredients across the most popular recipe sites: Food Network, Allrecipes, and Epicurious. Conveniently, there was already an open source python script to scrape these recipes at https://github.com/rtlee9/recipe-box. A quick scrape returned 124,647 recipes.

Network analysis was chosen as the primary technical approach for this problem as it was the most effective in capturing the prominence of both individual ingredients and ingredient pairs. The strength of each ingredient pair is also captured by the well understood concept of edge weights that will be touched on below.

With the scraped recipe data, NLP analysis was first conducted to clean up the strings and identify ingredients in each entry. A list of all possible ingredient pairings and their counts was then created in order to form the network edge and weights. Network analysis was then performed to generate the visualization and edge weights are defined by a basic count of pairings and a custom Ingredient Frequency – Inverse Recipe Frequency metric.

Read More

View media coverage of our news story at the following link: https://sites.northwestern.edu/msia/2019/05/21/chocolate-chips-and-fish-sauce-a-network-analysis-and-visualization-in-ingredient-pairings/