Hello World
I’m Hannah Rose Kirk.

keywords = {
    Large Language Models;
    Online Safety;
    Bias Mitigation;
    Statistics;
    China AI;
    Oxford Internet Institute;
    NYU;
    Oxford AI Society;
    Cambridge University;
    Peking University;
    Sci-Fi Books;
    Sushi;
    Documentaries;
    Emoji 😺😸;
    Cycling;
}

print MySummary

I research large language models @ the University of Oxford, and I'm currently a visiting academic @ New York University.

My current research centres on human-and-model-in-the-loop feedback and data-centric alignment of AI. I am passionate about the societal impact of AI systems as they scale across model capabilities, domains and human populations.

My published work spans computational linguistics, economics, ethics and sociology, addressing a broad range of issues such as alignment, bias, fairness and hate speech from a multidisciplinary perspective. Alongside academia, I regularly collaborate with industry and policymakers.

Education

.class GetDegrees

2021 - 2025

Oxford Internet Institute, University of Oxford

DPhil in Social Data Science
Fully-funded scholarship
Supervised by Dr Scott A. Hale & Dr Bertie Vidgen
2020 - 2021

Oxford Internet Institute, University of Oxford

MSc in Social Data Science
Distinction, 77%
Awarded the Oxford Internet Institute Thesis Prize for best graduate dissertation
2018 - 2020

Yenching Academy, Peking University

MA in China Studies and Economics
GPA: 3.99, Rank: 2/99
2015 - 2018

Trinity College, University of Cambridge

BA in Economics
Double First Class Honours
Awarded the Roger Dennis Prize for best undergraduate dissertation

Positions

.class AddExperience

Sept 2023 - Present

New York University

Visiting Academic in Data Science
Collaborating on human-AI coordination and LLM alignment with Professor He & Professor Bowman
Aug 2023 - Present

OpenAI

Red-Teamer + Consultant
Improving the safety of OpenAI models (DALL-E & GPT-4)
Feb 2023 - Present

Google

External Student Researcher
Co-hosting an adversarial challenge to identify unsafe failure modes in text-to-image models
Sept 2021 - Sept 2023

The Alan Turing Institute

Data Scientist in Online Safety
Monitoring and detecting harmful language
Sept 2021 - July 2023

Rewire Online

Research Scientist
Implementing NLP solutions for online safety
Oct 2020 - Present

Oxford Artificial Intelligence Society

Research Labs Manager
Leading student research projects on AI bias
Sept 2019 - Sept 2020

The Berggruen Institute, China Center

Research Scholar
Linking Chinese philosophy to AI and privacy

Grants

.class Find$$$

2023 - 2024

Microsoft Accelerating Foundation Models Research Programme

Project title: “GRIFFIN: Collecting Granular, Representative and Individualised Feedback for Inclusive Alignment of LLMs”
2022 - 2024

MetaAI Dynabench Grant

Project title: “Optimizing feedback between humans-and-model-in-the-loop”
2020 - 2024

Economic and Social Research Council

PhD scholarship, Digital Social Science Pathway

(Selected) Publications

return Output

December 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

SOLAR @ NeurIPS 2023
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
December 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

EMNLP 2023
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale
November 2023

XSTest: A test suite for identifying exaggerated safety behaviours in large language models

arXiv
Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy
November 2023

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

arXiv
Bertie Vidgen, Hannah Rose Kirk, Rebecca Qian, Nino Scherrer, Anand Kannappan, Scott A. Hale, Paul Röttger

Auditing large language models: a three-layered approach

AI & Ethics
Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi
March 2023

SemEval-2023 Task 10: Explainable Detection of Online Sexism

SemEval @ ACL 2023
Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger
January 2023

VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

NeurIPS 2023
Siobhan Mackenzie Hall, Fernanda Gonçalves Abrantes, Hanwen Zhu, Grace Sodunke, Aleksandar Shtedritski, Hannah Rose Kirk
November 2022

Handling and Presenting Harmful Text in NLP Research

EMNLP 2022
Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen, Leon Derczynski
September 2022

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

AACL 2022
Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain
September 2022

Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate

NAACL 2022
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Tristan Thrush & Scott A. Hale
August 2022

Tracking abuse on Twitter against football players in the 2021-22 Premier League season

Policy Report
Bertie Vidgen, Yi-Ling Chung, Pica Johansson, Hannah Rose Kirk, Angus Williams, Scott A. Hale, Helen Margetts, Paul Röttger, Laila Sprejer
May 2022

Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements

GeBNLP @ NAACL 2022
Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Rose Kirk
December 2021

Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models

NeurIPS 2021
Hannah Rose Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski & Yuki M. Asano
August 2021

Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

WOAH @ ACL 2021
Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski & Yuki M. Asano
August 2020

The Nuances of Confucianism in Technology Policy: an Inquiry into the Interaction Between Cultural and Political Systems in Chinese Digital Ethics

International Journal of Politics, Culture, and Society
Hannah Rose Kirk, Kangkyu Lee & Carlisle Micallef

In the News

display Headlines