JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
Bias Mitigation for Language Models with Task Arithmetic Approach
Daiki SHIRAFUJI, Makoto TAKENAKA, Tatsuhiko SAITO, Yasutomo KIMURA
RESEARCH REPORT / TECHNICAL REPORT

2024, Volume 2024, Issue AGI-028, Pages 06-

Abstract

As language models have become more widely used in recent years, the social biases and stereotypes embedded in those models have become increasingly problematic, since they are potentially reflected in model outputs. To address this, inspired by the task arithmetic approach, we propose the "Bias Vector" method for mitigating biases in language models without any human-created debiased data. Our approach consists of three main steps: (1) training a pre-trained LM on biased data with masked language modeling; (2) constructing the Bias Vector as the difference between the weights of the biased LM and those of the pre-trained LM; and (3) debiasing the pre-trained LM by subtracting the Bias Vector from its weights. We evaluate the Bias Vector method on SEAT across three LMs and confirm an average improvement of 0.177 points. We also show that the method does not degrade LM performance on downstream tasks in the GLUE benchmark. Additionally, we examine the impact of scaling factors, which regulate the norm of Bias Vectors, on SEAT effect sizes, and conduct a comprehensive evaluation of our debiased LMs across both the SEAT and GLUE benchmarks. Warning: This paper includes examples that could be considered discriminatory.
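The three steps above follow the task-arithmetic pattern of adding or subtracting weight deltas. A minimal sketch of steps (2) and (3) is given below, using NumPy arrays in place of real model checkpoints; the weight names, values, and the scaling-factor parameter `lam` are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical weights; in practice these would be full LM state dicts.
pretrained = {"layer.weight": np.array([[0.5, -0.2], [0.1, 0.4]])}
# Assumed result of step (1): fine-tuning the pre-trained LM on biased data with MLM.
biased = {"layer.weight": np.array([[0.7, -0.1], [0.2, 0.3]])}

def make_bias_vector(biased, pretrained):
    # Step (2): Bias Vector = biased weights minus pre-trained weights.
    return {k: biased[k] - pretrained[k] for k in pretrained}

def debias(pretrained, bias_vector, lam=1.0):
    # Step (3): subtract the Bias Vector, scaled by lam (the scaling factor
    # whose effect on SEAT the abstract says the authors examine).
    return {k: pretrained[k] - lam * bias_vector[k] for k in pretrained}

bv = make_bias_vector(biased, pretrained)
debiased = debias(pretrained, bv, lam=1.0)
print(debiased["layer.weight"])
```

With `lam=1.0` the debiased weights equal `2 * pretrained - biased` elementwise; smaller `lam` values shrink the norm of the subtracted Bias Vector.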

© 2024 Authors