LLMs have a strong bias against use of African American English

BlackEco@lemmy.blackeco.com to Technology@lemmy.world – 1 points –
arstechnica.com

Makes sense. AAVE is mostly a spoken thing, and LLMs are mostly trained on the corpus of written text on the internet and in books. It's pretty rare for people to write in an AAVE style in those contexts.

Except it has no difficulty reading and understanding AAVE, because people use it online frequently...

Like, the article makes that abundantly clear, but everyone commenting just read the headline and assumed what it meant was it couldn't understand it...

I never said it can't understand it. I am agreeing with the notion that it has a bias against using it.

African Americans have a weak bias against writing in African American English -> Colleges have weak bias against accepting African Americans as graduate students -> Academic texts have strong bias for text written by graduate students -> LLM training data has bias for academic texts -> LLMs have a strong bias for writing like training data.

The error occurs upstream a bit, don't point at the coders.

Writing in AAVE is silly, just like someone from the Deep South including a Southern drawl in their writing would be, or someone from Boston spelling “car keys” as “kha kees”.

So

African Americans have a weak bias against writing in African American English -> Colleges have weak bias against accepting African Americans as graduate students

Is a bit of a jump. Someone writing in AAVE probably wouldn’t get accepted to college, because the written word is supposed to transcend dialects and follow a set of rules to be universally understandable.

They can't possibly encounter much of it in training material... Of course they're not going to like it.

What?

It trains off social media, and even white kids use AAVE online. And kids make the most social media comments.

A lot of times when someone posts a text screenshot and everyone talks about how kids talk crazy, it's just a patois of AAVE mixed in with "regular" English.

It should be able to "read" it fine.

The bias part (as clearly stated in the article...) is when you ask an LLM to describe the person who would phrase something in AAVE, and the LLM replies back with stereotypes about Black people.

So it can read and interpret it fine, it just has a bias against people who talk like that.

LLMs don’t have a bias against anyone; it’s literally just data. And those models are by and large fed with traditionally grammatically correct data. They don’t understand dialects. You’re looking soooooo hard for something to be offended over.

If you're going to revive a 3+ day old thread...

At least read the article first so you have a clue what other people were talking about

So for those that didn't read the article, it basically explains how LLMs attach negative connotations to AAE. When asked to associate words with phrases written in AAE, the models used words like "aggressive". When given a normal English phrase and the same phrase in AAE, and then asked what jobs would suit the person who wrote it, the LLM suggested low-income jobs for the AAE statement and broader options for the normal English one.

It's a serious problem because people that naturally write in AAE are most likely getting worse results. It stems mostly from old racist newspaper articles and similar things.

It's a serious problem because people that naturally write in AAE are most likely getting worse results

Person using LLM built on grammatical rules of the English language has subpar results when operating outside of those rules. More at 6.

Is this the new term for Ebonics, and is Ebonics now offensive or inappropriate?

Essentially, yes. Ebonics isn’t inherently offensive or inappropriate, as far as I can tell, but it has connotations that are not attached to AAE. Linguists avoid the term today, and modern uses of it tend to be derogatory.

Source

Because there is no such thing as "African American English". There is proper English and then there is slang.

Never heard of AAVE? There isn’t one immutable version of English.

What you call "proper English" (or "proper" any other language) is merely an arbitrary construct. It is not set in stone.

That applies to all levels of a language, by the way, not just vocabulary ("slang").

It's bad enough that the Americans are too stupid to use the proper one, so we have to have two.

But people talking incorrectly is not a reason to write like that. Unless it's a character speaking or whatever.