JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
Bias Mitigation for Language Models with Task Arithmetic Approach
Daiki SHIRAFUJI, Makoto TAKENAKA, Tatsuhiko SAITO, Yasutomo KIMURA
RESEARCH REPORT / TECHNICAL REPORT

2024, Volume 2024, Issue AGI-028, Pages 06-

Abstract

As language models have become more widely used in recent years, the social biases and stereotypes embedded in those models have become increasingly problematic, since they are potentially reflected in model outputs. To address this, inspired by the task arithmetic approach, we propose the "Bias Vector" method for mitigating biases in language models without any human-created debiased data. Our approach consists of three main steps: (1) training a pre-trained LM on biased data with masked language modeling; (2) constructing the Bias Vector as the difference between the weights of the biased LM and those of the pre-trained LM; and (3) debiasing the pre-trained LM by subtracting the Bias Vector from its weights. We evaluate the Bias Vector method on SEAT across three LMs and confirm an average improvement of 0.177 points. We also show that the method does not degrade LM performance on downstream tasks in the GLUE benchmark. Additionally, we examine the impact of scaling factors, which regulate the norm of Bias Vectors, on SEAT effect sizes, and conduct a comprehensive evaluation of our debiased LMs across both the SEAT and GLUE benchmarks. Warning: This paper includes examples that could be considered discriminatory.
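The three steps above follow the task-arithmetic pattern of adding or subtracting weight deltas. A minimal sketch of steps (2) and (3) is given below, using NumPy arrays in place of real model checkpoints; the weight names, values, and the scaling-factor parameter `lam` are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical weights; in practice these would be full LM state dicts.
pretrained = {"layer.weight": np.array([[0.5, -0.2], [0.1, 0.4]])}
# Assumed result of step (1): fine-tuning the pre-trained LM on biased data with MLM.
biased = {"layer.weight": np.array([[0.7, -0.1], [0.2, 0.3]])}

def make_bias_vector(biased, pretrained):
    # Step (2): Bias Vector = biased weights minus pre-trained weights.
    return {k: biased[k] - pretrained[k] for k in pretrained}

def debias(pretrained, bias_vector, lam=1.0):
    # Step (3): subtract the Bias Vector, scaled by lam (the scaling factor
    # whose effect on SEAT the abstract says the authors examine).
    return {k: pretrained[k] - lam * bias_vector[k] for k in pretrained}

bv = make_bias_vector(biased, pretrained)
debiased = debias(pretrained, bv, lam=1.0)
print(debiased["layer.weight"])
```

With `lam=1.0` the debiased weights equal `2 * pretrained - biased` elementwise; smaller `lam` values shrink the norm of the subtracted Bias Vector.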

© 2024 Authors