
Machine learning (ML) models for language translation can be skewed by societal biases reflected in their training data. One such bias, gender bias, often becomes more apparent when translating between a gender-specific language and one that is less so. For instance, Google Translate historically translated the Turkish equivalent of “He/she is a doctor” into the masculine form, and the Turkish equivalent of “He/she is a nurse” into the feminine form.

In line with Google’s AI Principles, which emphasize the importance of not creating or reinforcing unfair biases, in December 2018 we announced gender-specific translations. This feature in Google Translate provides options for both feminine and masculine translations when translating queries that are gender-neutral in the source language. For this work, we developed a three-step approach, which involved detecting gender-neutral queries, generating gender-specific translations and checking for accuracy. We used this approach to enable gender-specific translations for phrases and sentences in Turkish-to-English and have now expanded this approach for English-to-Spanish translations, the most popular language pair in Google Translate.

Left: Early example of the translation of a gender-neutral English phrase to a gender-specific Spanish counterpart. In this case, only a biased example is given. Right: The new Translate provides both a feminine and a masculine translation option.

But as this approach was applied to more languages, it became apparent that there were issues in scaling. Specifically, generating masculine and feminine translations independently using a neural machine translation (NMT) system resulted in low recall, failing to show gender-specific translations for up to 40% of eligible queries, because the two translations often weren’t exactly equivalent apart from gender-related phenomena. Additionally, building a classifier to detect gender-neutrality for each source language was data intensive.
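To make the low-recall problem concrete, here is a minimal toy sketch in Python. It is not the production system: the tiny lexicon `GENDERED_PAIRS` and the helper names `differs_only_in_gender` and `recall` are hypothetical, chosen only for illustration. The sketch accepts a masculine/feminine pair only when the two translations match token for token except for known gendered word pairs, which is one simple way to picture why independently generated translations often fail the check.

```python
# Toy illustration only. The lexicon and function names are hypothetical
# and are not taken from Google Translate.
GENDERED_PAIRS = {
    ("él", "ella"),
    ("doctor", "doctora"),
    ("enfermero", "enfermera"),
}


def differs_only_in_gender(masculine: str, feminine: str) -> bool:
    """Return True if the translations match token for token, except
    where a (masculine, feminine) pair from the lexicon appears."""
    m_tokens = masculine.lower().split()
    f_tokens = feminine.lower().split()
    if len(m_tokens) != len(f_tokens):
        return False
    saw_gendered_difference = False
    for m_tok, f_tok in zip(m_tokens, f_tokens):
        if m_tok == f_tok:
            continue
        if (m_tok, f_tok) in GENDERED_PAIRS:
            saw_gendered_difference = True
        else:
            # A difference unrelated to gender: the pair is rejected.
            return False
    return saw_gendered_difference


def recall(candidates: list[tuple[str, str]]) -> float:
    """Fraction of eligible candidate pairs that pass the check."""
    shown = sum(differs_only_in_gender(m, f) for m, f in candidates)
    return shown / len(candidates)


candidates = [
    ("él es doctor", "ella es doctora"),   # differs only in gender
    ("él es médico", "ella es doctora"),   # also differs in word choice
]
print(recall(candidates))
```

In the second pair, the two outputs chose different words for "doctor", so the pair is rejected even though each translation is acceptable on its own. Rejections of this kind are what lower recall when the two translations are produced independently.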


Posted by Melvin Johnson, Senior Software Engineer, Google Research
