Chrome’s new AI feature solves one of the web’s eternal problems

To help blind and low-vision users, Google is using machine learning to generate descriptions for millions of images. The internet—and social media in particular—is dominated by images. But not everyone can see them.
To experience the internet the way that most people do, blind and low-vision users often rely on screen readers or Braille displays. But these devices depend on website creators remembering to create what’s called “alternative text,” or alt text—an attribute on the image’s HTML tag that provides a description of what’s in the image.
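For context, alt text is nothing exotic: it is a plain attribute on the image element, and a screen reader announces its contents in place of the picture. The sketch below is illustrative rather than drawn from the article; it shows what the attribute looks like and one way a page could be scanned for images that lack it entirely.

```typescript
// Alt text lives on the image element itself, e.g.:
//   <img src="market.jpg" alt="Fruit laid out on a market table">
// A screen reader announces the alt string instead of the picture; when the
// attribute is missing, users hear "image", "unlabeled graphic", or the file name.

// Minimal audit (illustrative only): list every image on the current page
// that has no alt attribute at all. Note that alt="" is deliberately empty
// and marks an image as decorative, so it is not flagged here.
function findUnlabeledImages(doc: Document): HTMLImageElement[] {
  return Array.from(doc.querySelectorAll("img")).filter(
    (img) => !img.hasAttribute("alt")
  );
}

const unlabeled = findUnlabeledImages(document);
console.log(`${unlabeled.length} images on this page have no alt text`);
```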
However, while many big websites do include alt text (and more will have to, now that the Supreme Court has let stand a ruling that the Americans with Disabilities Act applies to online spaces as well as physical ones), smaller ones often don’t. And alt text doesn’t always appear on social media, where images and memes spread faster than labeling practices can keep up.
Help is on the way courtesy of the Chrome accessibility team at Google. Today the company is announcing a new Chrome feature that takes advantage of Google’s considerable image recognition prowess to algorithmically generate alt text descriptions of images.
“The unfortunate state right now is that there are still millions and millions of unlabeled images across the web,” says Laura Allen, a senior program manager on the Chrome accessibility team, who herself has low vision. “When you’re navigating with a screen reader or a Braille display, when you get to one of those images, you’ll actually just basically hear ‘image’ or ‘unlabeled graphic,’ or my favorite, a super long string of numbers which is the file name, which is just totally irrelevant.”
Using the same tech that lets you search for images of swimming pools on Google Photos, Chrome can now auto-generate descriptions of what an image depicts. For instance, a screen reader might come across an image of bananas, coconuts, and pineapples laid out on a table and tell the user: “Appears to be fruits and vegetables at the market.” Another image of a dog lying down with a tennis ball between its paws might get translated to: “Appears to be dog catches something.” The tool can also read out words within an image, such as on a packing slip or a sign. In that case, the descriptor will start with “appears to say.”
“We always add contextualization—something like ‘appears to be’ or ‘appears to say’—so users are never confused about the fact that these descriptions are coming from a computer,” says Dominic Mazzoni, the tech lead for the Chrome & Chrome OS Accessibility team at Google.
The translations aren’t perfect, which is why the Chrome team decided to err on the side of caution: if the algorithm isn’t confident about what an image shows, it won’t try to label it at all.
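Google hasn’t published the internals, but the behavior described above can be sketched in a few lines. In the hypothetical snippet below, the names, types, and confidence threshold are all assumptions made for illustration; only the two ideas come from the article: descriptions are prefixed so users know a computer wrote them, and low-confidence guesses are dropped rather than risked.

```typescript
// Hypothetical sketch of the behavior described above, not Google's actual code.
// A description is attached only when the classifier is confident enough, and it
// is always prefixed so users know it was machine-generated.
type ImageAnnotation =
  | { kind: "label"; text: string; confidence: number }   // what the image depicts
  | { kind: "ocr"; text: string; confidence: number };    // text found in the image

const MIN_CONFIDENCE = 0.8; // assumed cutoff; the real threshold isn't public

function describeImage(annotation: ImageAnnotation): string | null {
  // Err on the side of saying nothing rather than saying something wrong.
  if (annotation.confidence < MIN_CONFIDENCE) {
    return null; // screen reader falls back to "image" or "unlabeled graphic"
  }
  const prefix = annotation.kind === "ocr" ? "Appears to say" : "Appears to be";
  return `${prefix}: ${annotation.text}`;
}

// A confident label gets a hedged description...
console.log(describeImage({ kind: "label", text: "fruits and vegetables at the market", confidence: 0.92 }));
// ...while a low-confidence guess is skipped entirely (prints null).
console.log(describeImage({ kind: "ocr", text: "packing slip", confidence: 0.4 }));
```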
So far, the tool has labeled more than 10 million images over a few months of testing. It’s being rolled out slowly, and Chrome is promoting it specifically to people who use screen readers to encourage them to try it out. Those users also control how much they want to use it: they can turn it on for a single web page or leave it on all the time. For now it’s available only for sites in English, with more languages coming soon.
These descriptions also won’t be shared with web admins or developers, since Mazzoni says that any human looking at a picture would be able to generate a better description. But it’s a useful tool for the millions upon millions of photos that aren’t yet labeled (including—full disclosure—ones on this site).