Anna Leshinskaya

Program: AI Safety Science

Institution: UC Irvine

Location: USA

Dr. Leshinskaya is an Assistant Professor in the Department of Cognitive Sciences and a Fellow at the Center for Neurobiology of Learning & Memory at UC Irvine, as well as an affiliated researcher with the AI Objectives Institute. She directs the Relational Cognition laboratory. She earned a PhD in Cognitive Psychology from Harvard University in 2015. Her background is in human cognitive neuroscience and computational cognitive science, and she is broadly interested in the computational principles that underlie the learning and representation of concepts and relations, and in how these are supported by both artificial and natural neural networks. On this project, she will be supported by Professor Seth Lazar (Philosophy, ANU), a pioneer of computational philosophy and the ethics of AI, and Professor Alice Oh (Computing, KAIST), a leading NLP researcher with deep and broad experience in evaluating ethical dimensions of language models.

Leshinskaya’s project proposes a mechanistic cognitive evaluation framework for AI safety. It analyzes the cognitive processes behind AI and human decision-making in moral scenarios, using an automated annotation tool and Bayesian program learning to create interpretable meta-models.