AI is learning from humans, many humans

Namita Pradhan sat at a desk in downtown Bhubaneswar, India, about 40 miles from the Bay of Bengal, staring at a video recorded in a hospital on the other side of the world.

>> Cade Metz, The New York Times
Published : 18 August 2019, 05:41 PM
Updated : 18 August 2019, 05:57 PM

The video showed the inside of someone’s colon. Pradhan was looking for polyps, small growths in the large intestine that could lead to cancer. When she found one — they look a bit like a slimy, angry pimple — she marked it with her computer mouse and keyboard, drawing a digital circle around the tiny bulge.

She was not trained as a doctor, but she was helping to teach an artificial intelligence system that could eventually do the work of a doctor.

Pradhan was one of dozens of young Indian women and men lined up at desks on the fourth floor of a small office building. They were trained to annotate all kinds of digital images, pinpointing everything from stop signs and pedestrians in street scenes to factories and oil tankers in satellite photos.

AI, most people in the tech industry would tell you, is the future of their industry, and it is improving fast thanks to something called machine learning. But tech executives rarely discuss the labour-intensive process that goes into its creation. AI is learning from humans. Lots and lots of humans.

Before an AI system can learn, someone has to label the data supplied to it. Humans, for example, must pinpoint the polyps. The work is vital to the creation of artificial intelligence like self-driving cars, surveillance systems and automated health care.

Tech companies keep quiet about this work. And they face growing concerns from privacy activists over the large amounts of personal data they are storing and sharing with outside businesses.

Earlier this year, I negotiated a look behind the curtain that Silicon Valley’s wizards rarely grant. I made a meandering trip across India and stopped at a facility across the street from the Superdome in downtown New Orleans. In all, I visited five offices where people are doing the endlessly repetitive work needed to teach AI systems, all run by a company called iMerit.

There were intestine surveyors like Pradhan and specialists in telling a good cough from a bad cough. There were language specialists and street scene identifiers. What is a pedestrian? Is that a double yellow line or a dotted white line? One day, a robotic car will need to know the difference.

Glenda Hernandez works at her desk in the office of iMerit, a technology services company, in New Orleans, May 6, 2019. Tech executives rarely discuss the labour-intensive process that goes into the creation of artificial intelligence, which is learning from thousands of office workers around the world. (Bryan Tarnowski/The New York Times).

What I saw didn’t look much like the future — or at least the automated one you might imagine. The offices could have been call centres or payment processing centres. One was a timeworn former apartment building in the middle of a low-income residential neighbourhood in western Kolkata that teemed with pedestrians, auto rickshaws and street vendors.

In facilities like the one I visited in Bhubaneswar and in other cities in India, China, Nepal, the Philippines, East Africa and the United States, tens of thousands of office workers are punching a clock while they teach the machines.

Tens of thousands more workers, independent contractors usually working in their homes, also annotate data through crowdsourcing services like Amazon Mechanical Turk, which lets anyone distribute digital tasks to independent workers in the United States and other countries. The workers earn a few pennies for each label.

Based in India, iMerit labels data for many of the biggest names in the technology and automobile industries. It declined to name these clients publicly, citing confidentiality agreements. But it recently revealed that its more than 2,000 workers in nine offices around the world are contributing to an online data-labelling service from Amazon called SageMaker Ground Truth. Previously, it listed Microsoft as a client.

One day, who knows when, artificial intelligence could hollow out the job market. But for now, it is generating relatively low-paying jobs. The market for data labelling passed $500 million in 2018 and it will reach $1.2 billion by 2023, according to the research firm Cognilytica. This kind of work, the study showed, accounted for 80% of the time spent building AI technology.

Is the work exploitative? It depends on where you live and what you’re working on. In India, it is a ticket to the middle class. In New Orleans, it’s a decent enough job. For someone working as an independent contractor, it is often a dead end.

There are skills that must be learned — like spotting signs of a disease in a video or medical scan or keeping a steady hand when drawing a digital lasso around the image of a car or a tree. In some cases, when the task involves medical videos, pornography or violent images, the work turns grisly.

“When you first see these things, it is deeply disturbing. You don’t want to go back to the work. You might not go back to the work,” said Kristy Milland, who spent years doing data-labelling work on Amazon Mechanical Turk and has become a labour activist on behalf of workers on the service.

Employees at iMerit’s technology centre in Kolkata, India, Jan. 30, 2019. (Rebecca Conway/The New York Times).

“But for those of us who cannot afford to not go back to the work, you just do it,” Milland said.

AI researchers hope they can build systems that can learn from smaller amounts of data. But for the foreseeable future, human labour is essential.

“This is an expanding world, hidden beneath the technology,” said Mary Gray, an anthropologist at Microsoft and the co-author of the book “Ghost Work,” which explores the data labelling market. “It is hard to take humans out of the loop.”

THE CITY OF TEMPLES

Bhubaneswar is called the City of Temples. Ancient Hindu shrines rise over roadside markets at the southwestern end of the city — giant towers of stacked stone that date to the first millennium. In the city centre, many streets are unpaved. Cows and feral dogs meander among the mopeds, cars and trucks.

The city — population: 830,000 — is also a rapidly growing hub for online labour. About a 15-minute drive from the temples, on a (paved) road near the city centre, a white, four-story building sits behind a stone wall. Inside, there are three rooms filled with long rows of desks, each with its own widescreen computer display. This was where Namita Pradhan spent her days labelling videos when I met her.

Over the course of what was a typical eight-hour day, the shy 24-year-old watched about a dozen colonoscopy videos, constantly reversing the video for a closer look at individual frames.

Every so often, she would find what she was looking for. She would lasso it with a digital “bounding box.” She drew hundreds of these bounding boxes, labelling the polyps and other signs of illness, like blood clots and inflammation.

Her client, a company in the United States that iMerit is not allowed to name, will eventually feed her work into an AI system so it can learn to identify medical conditions on its own. The colon owner is not necessarily aware the video exists. Pradhan doesn’t know where the images came from. Neither does iMerit.

Pradhan learned the task during seven days of online video calls with a nonpractising doctor, based in Oakland, California, who helps train workers at many iMerit offices. But some question whether experienced doctors and medical students should do this labelling themselves.

This work requires people “who have a medical background, and the relevant knowledge in anatomy and pathology,” said Dr George Shih, a radiologist at Weill Cornell Medicine and NewYork-Presbyterian and the co-founder of the startup MD.ai, which helps organisations build artificial intelligence for health care.

Prasenjit Baidya, left, and his wife Barnali Paik, employees of iMerit, a technology services company, at Prasenjit’s family home in Baidhyahat village in the state of West Bengal in India, Feb. 1, 2019. (Rebecca Conway/The New York Times).

When we chatted about her work, Pradhan called it “quite interesting,” but tiring. As for the graphic nature of the videos? “It was disgusting at first, but then you get used to it.”

Pradhan and her fellow labellers earn between $150 and $200 a month, which pulls in between $800 and $1,000 of revenue for iMerit, according to one company executive.

By US standards, Pradhan’s salary is indecently low. But for her and many others in these offices, it is about an average salary for a data-entry job.

TEDIOUS WORK. BUT IT PAYS FOR AN APARTMENT.

Prasenjit Baidya grew up on a farm about 30 miles from Kolkata, the largest city in West Bengal, on the east coast of India. His parents and extended family still live in his childhood home, a cluster of brick buildings built at the turn of the 19th century. They grow rice and sunflowers in the surrounding fields and dry the seeds on rugs spread across the rooftops.

He was the first in his family to get a college education, which included a computer class. But the class didn’t teach him all that much. The room offered only one computer for every 25 students. He learned his computer skills after college, when he enrolled in a training course run by a nonprofit called Anudip. It was recommended by a friend, and it cost the equivalent of $5 a month.

Anudip runs English and computer courses across India, training about 22,000 people a year. It feeds students directly into iMerit, which its founders set up as a sister operation in 2013. Through Anudip, Baidya landed a job at an iMerit office in Kolkata, and so did his wife, Barnali Paik, who grew up in a nearby village.

Over the last six years, iMerit has hired more than 1,600 students from Anudip. It now employs about 2,500 people in total. More than 80% come from families with incomes below $150 a month.

Founded in 2012 and still a private company, iMerit has its employees perform digital tasks like transcribing audio files or identifying objects in photos. Businesses across the globe pay the company to use its workers, and increasingly, they assist work on artificial intelligence.

Kristy Milland, a former data labeller for Amazon Mechanical Turk, with her collection of “squishies” in her at-home workspace in Toronto, May 1, 2019. (Arden Wray/The New York Times).

“We want to bring people from low-income backgrounds into technology — and technology jobs,” said Radha Basu, who founded Anudip and iMerit with her husband, Dipak, after long careers in Silicon Valley with the tech giants Cisco Systems and HP.

LISTENING TO PEOPLE COUGH

A few weeks after my trip to India, I took an Uber through downtown New Orleans. About 18 months ago, iMerit moved into one of the buildings across the street from the Superdome.

A major American tech company needed a way of labelling data for a Spanish-language version of its home digital assistant. So it sent the data to the new iMerit office in New Orleans.

After Hurricane Katrina in 2005, hundreds of construction workers and their families moved into New Orleans to help rebuild the city. Many stayed. A number of Spanish speakers came with that new workforce, and the company began hiring them.

The office has expanded into other work, serving businesses that want to keep their data within the United States. Some projects must remain stateside, for legal and security purposes.

Glenda Hernandez, 42, who was born in Guatemala, said she missed her old work on the digital assistant project. She loved to read. She reviewed books online for big publishing companies so she could get free copies, and she relished the opportunity of getting paid to read in Spanish.

“That was my baby,” she said of the project.

She was less interested in image tagging or in projects like the one that involved annotating recordings of people coughing, a way to build AI that identifies symptoms of illness over the phone.

“Listening to coughs all day is kind of disgusting,” she said.

The work is easily misunderstood, said Gray, the Microsoft anthropologist. Listening to people cough all day may be disgusting, but that is also how doctors spend their days. “We don’t think of that as drudgery,” she said.

Hernandez’s work is intended to help doctors do their jobs or maybe, one day, replace them. She takes pride in that. Moments after complaining about the project, she pointed to her colleagues across the office.

“We were the cough masters,” she said.

‘IT WAS ENOUGH TO LIVE ON THEN. IT WOULDN’T BE NOW.’

Oscar Cabezas works at his desk in the office of iMerit, a technology services company, in New Orleans, May 6, 2019. (Bryan Tarnowski/The New York Times).

In 2005, Kristy Milland signed up for her first job on Amazon Mechanical Turk. She was 26, and living in Toronto with her husband, who managed a local warehouse. Mechanical Turk was a way of making a little extra money.

The first project was for Amazon itself. Three photos of a storefront would pop up on her laptop, and she would choose the one that showed the front door. Amazon was building an online service similar to Google Street View, and the company needed help picking the best photos.

She made 3 cents for each click, or about 18 cents a minute. In 2010, her husband lost his job, and “MTurk” became a full-time gig. For two years, she worked six or seven days a week, sometimes as much as 17 hours a day. She made about $50,000 a year.

“It was enough to live on then. It wouldn’t be now,” Milland said.

The work at that time didn’t really involve AI. For another project, she would pull information out of mortgage documents or retype names and addresses from photos of business cards, sometimes for as little as a dollar an hour.

Around 2010, she started labelling for AI projects. Milland tagged all sorts of data, like gory images that showed up on Twitter (which helps build AI that can help remove gory images from the social network) or aerial footage likely taken somewhere in the Middle East (presumably for AI that the military and its partners are building to identify drone targets).

Projects from US tech giants, Milland said, typically paid more than the average job — about $15 an hour. But the job didn’t come with health care or paid vacation, and the work could be mind-numbing — or downright disturbing. She called it “horrifically exploitative.” Amazon declined to comment.

Since 2012, Milland, now 40, has been part of an organisation called TurkerNation, which aims to improve conditions for thousands of people who do this work. In April, after 14 years on the service, she quit.

She is in law school, and her husband makes $600 less than they pay in rent each month, which does not include utilities. So, she said, they are preparing to go into debt. But she will not go back to labelling data.

“This is a dystopian future,” she said. “And I am done.”

c.2019 New York Times News Service