ChatGPT was hailed as one of the biggest technological innovations of 2022 after its release last November. The powerful artificial intelligence (AI) chatbot can generate text on almost any topic, from a Shakespearean sonnet reimagined in the style of Megan Thee Stallion to complex mathematical theorems explained in language that a 5-year-old can understand. Within a week of its launch, it had attracted more than a million users.
ChatGPT’s developer, OpenAI, is currently in talks with investors to raise funds at a valuation of $29 billion, including a potential $10 billion investment from Microsoft. That would make OpenAI, founded in San Francisco in 2015 to build super-intelligent machines, one of the world’s most valuable AI companies.
But the success story is not the work of Silicon Valley geniuses alone. In its quest to make ChatGPT less toxic, OpenAI used outsourced Kenyan workers earning less than $2 an hour, as an investigation by Time revealed.
This work was vital to OpenAI. ChatGPT’s predecessor, GPT-3, had already shown an exceptional ability to string sentences together. But it was a hard sell, because the app was also prone to violent, sexist and racist language. That is because the AI had been trained on hundreds of billions of words scraped from the internet, the largest repository of human language.
This massive training dataset is the reason for GPT-3’s impressive linguistic abilities, but also perhaps its greatest curse. Parts of the internet are rife with toxic and biased material, and there was no easy way to purge those parts from the training data. Even a team of hundreds of people would need decades to sift through such a vast body of text by hand. Only by building an additional AI-based safety mechanism was OpenAI able to limit this harm and produce a chatbot suitable for everyday use.
To build this safety system, OpenAI took a cue from social networks such as Facebook, which had already shown that it is possible to build a separate AI capable of detecting toxic language, such as hate speech, and helping to remove it from their platforms.
The premise was simple: give an AI labeled examples of violence, hate speech, and sexual assault, and it would learn to detect these forms of toxicity. The detector would be built into ChatGPT to check whether it was reproducing the toxicity of its training data, and to filter that out before it reached the user. It could also help scrub toxic text from the training datasets of future AI models.
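As a rough illustration of that premise only, the idea of "learn from labeled examples, then screen the chatbot's output" can be sketched in a few lines of Python. This is not OpenAI's system, whose design the reporting does not describe. The tiny naive Bayes classifier, the mild placeholder sentences, and the names `train`, `toxic_log_odds` and `filter_output` are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical, deliberately mild placeholder data: (text, label) pairs.
LABELED = [
    ("i will hurt you", "toxic"),
    ("you are worthless and i hate you", "toxic"),
    ("have a nice day", "ok"),
    ("the weather is lovely today", "ok"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count words per label; this is the 'learning from labels' step."""
    word_counts = {"toxic": Counter(), "ok": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["toxic"]) | set(word_counts["ok"])
    return word_counts, doc_counts, vocab

def toxic_log_odds(model, text):
    """Naive Bayes log-odds that text is toxic (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_toxic = sum(word_counts["toxic"].values())
    total_ok = sum(word_counts["ok"].values())
    score = math.log(doc_counts["toxic"] / doc_counts["ok"])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_toxic = (word_counts["toxic"][word] + 1) / (total_toxic + len(vocab))
        p_ok = (word_counts["ok"][word] + 1) / (total_ok + len(vocab))
        score += math.log(p_toxic / p_ok)
    return score

def filter_output(model, candidate, fallback="[withheld]"):
    """Screen a candidate reply before it would reach the user."""
    return fallback if toxic_log_odds(model, candidate) > 0 else candidate

model = train(LABELED)
print(filter_output(model, "i hate you"))         # flagged, replaced by fallback
print(filter_output(model, "have a lovely day"))  # passes through unchanged
```

The same scoring function could in principle be run over a training corpus to drop flagged passages, which is the second use the paragraph above mentions. A real detector would need far more labeled data than this toy does, which is where the human labelers come in.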
To obtain these labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, starting in November 2021. Much of this text was pulled from the darkest corners of the internet. Some of it described in graphic detail situations such as child sexual abuse, zoophilia, murder, suicide, torture, self-harm and incest.
OpenAI’s outsourcing partner in Kenya is Sama, a San Francisco-based company that hires workers in Kenya, Uganda and India to label data for Silicon Valley clients such as Google, Meta and Microsoft. Sama positions itself as an “ethical AI company” and claims to have helped more than 50,000 people escape poverty.
Data labelers hired by Sama on behalf of OpenAI were paid between $1.32 and $2 per hour, depending on experience and performance. For its article, Time reviewed hundreds of pages of internal Sama and OpenAI documents, including workers’ payslips, and interviewed four Sama employees who worked on the project. They all spoke on condition of anonymity out of fear for their livelihoods.
The story of the workers who made ChatGPT possible provides insight into the working conditions in this little-known part of the AI industry, which nevertheless plays an important role in the quest to make AI systems safe for the public to use.
In a statement, an OpenAI spokesperson confirmed that Sama employees in Kenya contributed to the development of a toxic content detection tool that was later built into ChatGPT. The statement also said the work contributed to efforts to remove toxic data from the training datasets of tools such as ChatGPT.
One Sama employee tasked with reading and tagging text for OpenAI told Time he suffered from a mental breakdown after reading a graphic account of a man having sex with a dog in front of a young child.
The traumatic nature of the work eventually led to Sama canceling all of its work for OpenAI in February 2022, eight months ahead of schedule.
Documents reviewed by Time show that in late 2021 OpenAI signed three contracts with Sama, worth about $200,000 in total, to label textual descriptions of sexual assault, hate speech, and violence. About three dozen employees were divided into three teams, one for each topic.
Three workers told Time that they were expected to read and label between 150 and 250 passages of text in a nine-hour shift. These passages could range from about 100 words to more than 1,000. All four workers said that the work caused them psychological trauma.
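To put those quotas in perspective, a quick back-of-the-envelope calculation shows how little time each passage would get. The sketch below assumes, purely for illustration, that the whole nine-hour shift is spent labeling with no breaks, so the real time per passage would be lower still. The function name `minutes_per_passage` is made up for the example.

```python
SHIFT_HOURS = 9                    # shift length reported by the workers
SHIFT_MINUTES = SHIFT_HOURS * 60   # 540 minutes

def minutes_per_passage(passages_per_shift):
    """Average time available per passage, assuming no breaks at all."""
    return SHIFT_MINUTES / passages_per_shift

for quota in (150, 250):           # range reported by the workers
    print(f"{quota} passages per shift -> {minutes_per_passage(quota):.2f} minutes each")
# 150 passages per shift -> 3.60 minutes each
# 250 passages per shift -> 2.16 minutes each
```

At the upper end of the range, that is barely two minutes to read and label a passage that may run to more than 1,000 words.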
Although they were entitled to attend sessions with “wellness” counselors, all four said these sessions were unhelpful and rare because of the heavy pressure to be more productive at work. Two of them said they were only allowed to attend group sessions, and one said their requests for one-on-one meetings with counselors were repeatedly denied by Sama management.
In a statement, a Sama spokesperson said it was “wrong” that workers only had access to group sessions. They were entitled to both individual and group sessions with “professionally trained and licensed psychotherapists,” the spokesperson said.
The agents, the most junior data labelers who made up the majority of the three teams, were paid a base salary of 21,000 Kenyan shillings ($170) a month, according to three Sama employees. They also received monthly bonuses of about $70 due to the nature of their work, as well as commission for meeting key performance indicators such as accuracy and speed.
An agent working nine-hour shifts could expect to take home at least $1.32 an hour after taxes, rising to as much as $1.44 an hour if they exceeded all their targets.
Quality analysts, more senior labelers whose job was to check the agents’ work, could earn up to $2 an hour if they met all their targets.
In a statement, a Sama spokesperson said workers were asked to label 70 passages of text per nine-hour shift, not up to 250, and that workers could earn between $1.46 and $3.74 an hour after taxes.
An OpenAI representative said in a statement that the company did not set any performance targets, and that Sama was responsible for managing pay and ensuring the mental health of employees.
In February 2022, the relationship between Sama and OpenAI briefly deepened, only to fall apart. That month, Sama began pilot work on a separate project for OpenAI: collecting sexual and violent images, some of them illegal under US law, and delivering them to OpenAI. The image-labeling work appears to be unrelated to ChatGPT.
In a statement, an OpenAI spokesperson did not specify the purpose of the images the company sought from Sama, but said that labeling harmful images was a “necessary step” in making its AI tools safer.
Within weeks, Sama had canceled all of its work for OpenAI, eight months ahead of schedule. The outsourcing company said in a statement that its agreement to collect images for OpenAI did not include any reference to illegal content, and that it was only after the work had begun that OpenAI sent “additional instructions” referring to “certain illegal categories.”
Because the contracts were terminated early, both OpenAI and Sama said the $200,000 they had previously agreed upon was not paid in full. OpenAI said the contracts were worth “about $150,000 over the life of the partnership.”
Sama employees say their managers gave them another reason for terminating the contracts. On February 14, Time published an article titled “Inside Facebook’s African Sweatshop.” The investigation detailed how Sama employed content moderators for Facebook to review images and videos of executions, rape and child abuse for as little as $1.50 an hour.
Four Sama employees said they were told the investigation was the reason for the company’s decision to stop working with OpenAI.
On January 10 of this year, Sama went a step further, announcing that it was canceling all of its remaining work with sensitive content. The firm said it would not renew its $3.9 million content moderation contract with Facebook, resulting in the loss of about 200 jobs in Nairobi.
For all the innovation behind chatbots and other artificial intelligence systems, they still depend on people to label their data. That work can damage the mental health of those who do it, and it is often carried out by low-paid workers in developing countries. This raises many questions about the ethics of AI and of the companies that develop it.