
Validating Generative AI-Based Social Science

HCI+D hosted a symposium to examine how generative AI is reshaping the study of human behavior

At the Feb. 20 symposium “Validating Generative AI-Based Social Sciences,” the Northwestern University Center for Human-Computer Interaction + Design (HCI+D)—a collaboration between Northwestern Engineering and Northwestern’s School of Communication—convened computer scientists, economists, social scientists, and statisticians to examine how generative AI is reshaping the study of human behavior.

Symposium participants, who included faculty, industry researchers, and graduate students, explored the types of tests, methods, and systems necessary to ensure that social and behavioral data generated by large language models (LLMs) remains credible, replicable, and actionable.

“Some of the most important questions about generative AI sit at the boundaries between disciplines,” said Darren Gergle, codirector of HCI+D, Bao Family Professor in Human-Computer Interaction at the School of Communication, and codirector of graduate studies for Northwestern’s Technology and Social Behavior PhD program. “Our goal was to help build a research community around its use in social science. These questions are far too consequential to be left to any one individual, lab, or institution.”

Codirectors of HCI+D (from left) Elizabeth Gerber, Bryan Pardo, and Darren Gergle welcomed computer scientists, economists, social scientists, and statisticians to the Feb. 20 symposium. Photo courtesy of the Center for Human-Computer Interaction + Design

Christopher Schuh, dean of the McCormick School of Engineering, and Jessica Hullman, Ginni Rometty Professor of Computer Science, welcomed the guests. Supported by Northwestern’s Cognitive Science Program, the symposium was structured around three lightning talk and Q&A sessions followed by large-group and breakout discussions.

“There’s a lot of excitement over using language models to supplement human data in fields like psychology or marketing, but little broad consensus about how to do this in a statistically valid way,” said Hullman, who is also a faculty fellow at Northwestern’s Institute for Policy Research. “So we brought together a group of researchers who are thinking very deliberatively about what is needed.”

Event organizers (from left): Jessica Hullman, Yingdan Lu, and Aaron Shaw. Hullman co-organized the event with Yingdan Lu, assistant professor of communication studies, and Aaron Shaw, associate professor of communication studies.

“As generative AI becomes part of how we conduct social science, we have to revisit the foundations of our methods. The symposium created space for interdisciplinary dialogue about what rigorous and replicable AI-driven research might look like,” said Gergle.


In the first session, David Broska (Stanford University), Serina Chang (University of California, Berkeley), and Kristina Gligorić (Johns Hopkins University) discussed human-grounded evaluation methods for AI model simulations. Key topics included how researchers can account for the bias and uncertainty introduced when survey responses are predicted by LLMs rather than observed, and which experimental designs can validly combine human subjects with AI-powered synthetic survey respondents to better approximate real response behavior.

During the second lightning talk session, Eli Ben-Michael (Carnegie Mellon University), Patryk Perkowski (Yeshiva University), Austin van Loon (Massachusetts Institute of Technology), and Angelina Wang (Cornell Tech and Cornell University) examined how to preserve causal validity when AI enters the experimental pipeline. They discussed the potential of LLMs to enable unbiased causal estimates under some conditions but also addressed critical ethical concerns around potential representational harms.

The final session included lightning talks by John Horton (MIT Sloan School of Management), Jonne Kamphorst (Sciences Po), Austin Kozlowski (University of Chicago), and Michael Spadafore (Ford Motor Company). The speakers discussed topics related to the broader set of potential applications of generative AI-driven research, ranging from the internal semantic structure of LLMs to how simulated social agents perform in behavioral research and consumer insight markets.

“The workshop generated a lot of excitement among participants for its focused look at critical validation questions, and we hope to keep the conversation going through follow-up events,” said Hullman.