Research in machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron (formerly Deep Science), aims to collect some of the most relevant recent discoveries and papers – particularly in, but not limited to, artificial intelligence – and explain why they matter.

This week in AI, a new study reveals how bias, a common problem in artificial intelligence systems, can start with the instructions given to the people hired to annotate the data from which AI systems learn to make predictions. The co-authors found that annotators pick up on patterns in the instructions, which condition them to contribute annotations that then become over-represented in the data, biasing the AI system toward those annotations.

Many AI systems today “learn” to make sense of images, videos, text, and audio from examples that have been labeled by annotators. The labels allow the systems to extrapolate the relationships between the examples (e.g., the link between the caption “kitchen sink” and a photo of a kitchen sink) to data the systems have not seen before (e.g., photos of kitchen sinks that were not included in the data used to “teach” the model).
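As a toy illustration of this label-and-extrapolate setup (not code from any of the studies discussed here), a small text classifier can be trained on a handful of annotator-labeled captions and then applied to a caption it has never seen:

```python
# Toy illustration of supervised labeling: annotators supply labels,
# and the trained model extrapolates to captions it has never seen.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Annotator-labeled examples: caption text -> object category (made-up data).
captions = ["a kitchen sink full of dishes", "stainless steel kitchen sink",
            "a dog running on the beach", "small dog asleep on a couch"]
labels   = ["sink", "sink", "dog", "dog"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(captions, labels)

# An unseen caption: the model generalizes from the labeled examples.
print(model.predict(["a white porcelain sink in a kitchen"]))  # -> ['sink']
```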

This works remarkably well. But annotation is an imperfect approach – annotators bring biases to the table that can bleed into the trained system. For example, studies show that the average annotator is more likely to label phrases in African-American Vernacular English (AAVE), the informal grammar used by some Black Americans, as toxic, leading AI toxicity detectors trained on those labels to see AAVE as disproportionately toxic.

As it turns out, annotators’ predispositions may not be solely to blame for the bias in training labels. In a preprint study from Arizona State University and the Allen Institute for AI, researchers investigated whether a source of bias could lie in the instructions written by dataset creators to serve as guides for annotators. Such instructions typically include a short description of the task (e.g., “Label all birds in these photos”) along with a few examples.

Image credits: Parmar et al.

The researchers looked at 14 different benchmark datasets used to measure the performance of natural language processing systems – AI systems that can classify, summarize, translate, and otherwise analyze or manipulate text. Studying the task instructions given to the annotators who worked on the datasets, they found evidence that the instructions influenced the annotators to follow specific patterns, which then propagated to the datasets. For example, over half of the annotations in Quoref, a dataset designed to test the ability of AI systems to understand when two or more expressions refer to the same person (or thing), start with the phrase “What is the name”, a phrase present in a third of the instructions for the dataset.
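To make the idea concrete, here is a hypothetical sketch of how such pattern reuse might be measured – counting how many annotations echo a phrase that appears verbatim in the task instructions. The phrases and annotations below are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch of measuring instruction bias: how many annotations
# reuse a phrase that appears in the task instructions? (All data made up.)
instruction_phrases = ["what's the name", "what is the name"]

annotations = [
    "What is the name of the character who wins the race?",
    "What's the name of the city where the author grew up?",
    "Who does the pronoun 'she' refer to in paragraph two?",
]

def reuses_instruction_phrase(annotation: str) -> bool:
    text = annotation.lower()
    return any(phrase in text for phrase in instruction_phrases)

reused = sum(reuses_instruction_phrase(a) for a in annotations)
print(f"{reused}/{len(annotations)} annotations echo an instruction phrase")
```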

The phenomenon, which the researchers call “instruction bias”, is particularly worrying because it suggests that systems trained on biased instruction/annotation data may not perform as well as originally thought. Indeed, the co-authors found that instruction bias overestimates the performance of systems and that these systems often fail to generalize beyond the patterns in the instructions.

The silver lining is that large systems, such as OpenAI’s GPT-3, were generally found to be less sensitive to instruction bias. But the study serves as a reminder that AI systems, like people, are prone to developing biases from sources that are not always obvious. The challenge is identifying these sources and mitigating their downstream impact.

In a less sobering paper, scientists from Switzerland concluded that facial recognition systems are not easily fooled by realistic AI-edited faces. “Morphing attacks”, as they are called, involve using AI to modify the photo on an ID card, passport, or other form of identity document in order to bypass security systems. The co-authors created “morphs” using AI (Nvidia’s StyleGAN 2) and tested them against four state-of-the-art facial recognition systems. They argue that the morphs do not pose a significant threat, despite their true-to-life appearance.

Elsewhere in computer vision, Meta researchers have developed an AI “assistant” that can remember the characteristics of a room, including the location and context of objects, in order to answer questions. Detailed in a preprint paper, the work is likely part of Meta’s Project Nazare, an initiative to develop augmented reality glasses that use AI to analyze their surroundings.

Meta egocentric AI

Image credits: Meta

The researchers’ system, which is designed to be used on any body-worn device equipped with a camera, analyzes footage to build “semantically rich and efficient scene memories” that “encode spatio-temporal information about objects.” The system remembers where objects are and when they appeared in the video, and it also stores in its memory answers to questions the user might ask about those objects. For example, asked “Where did you last see my keys?”, the system can indicate that the keys were on a side table in the living room that morning.
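A minimal sketch of that kind of spatio-temporal “scene memory” might look like the following – a store of object sightings with locations and timestamps that can be queried for the most recent one. The data structure and query are assumptions for illustration, not Meta’s actual implementation:

```python
# Illustrative scene memory: object sightings with location and timestamp,
# queryable for the most recent sighting of an object. (Not Meta's code.)
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Sighting:
    obj: str
    location: str
    timestamp: datetime

class SceneMemory:
    """Stores object sightings so they can be recalled later."""
    def __init__(self) -> None:
        self.sightings: List[Sighting] = []

    def record(self, obj: str, location: str, timestamp: datetime) -> None:
        self.sightings.append(Sighting(obj, location, timestamp))

    def last_seen(self, obj: str) -> Optional[Sighting]:
        matches = [s for s in self.sightings if s.obj == obj]
        return max(matches, key=lambda s: s.timestamp) if matches else None

memory = SceneMemory()
memory.record("keys", "kitchen counter", datetime(2022, 5, 13, 19, 0))
memory.record("keys", "side table, living room", datetime(2022, 5, 14, 8, 30))

hit = memory.last_seen("keys")
if hit:
    print(f"Last seen on the {hit.location} at {hit.timestamp}")
```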

Meta, which is reportedly planning to launch fully featured AR glasses in 2024, telegraphed its plans for “egocentric” AI last October with the launch of Ego4D, a long-term research project into AI with “egocentric perception”. The company said at the time that the goal was to teach AI systems, among other tasks, to understand social cues, how the actions of an AR device wearer can affect their surroundings, and how hands interact with objects.

From language and augmented reality to physical phenomena: an AI model from MIT has proved useful in studying waves – how and when they break. Although it may sound a bit arcane, wave models are needed both for building structures in and near the water and for modeling how the ocean interacts with the atmosphere in climate models.

Image credits: MIT

Waves are usually simulated roughly by a set of equations, but the researchers trained a machine learning model on hundreds of wave instances in a 40-foot tank of water filled with sensors. By observing the waves and making predictions based on empirical evidence, then comparing those predictions to the theoretical models, the AI helped show where the models fell short.
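Conceptually, the comparison might resemble the sketch below: fit a data-driven model to synthetic, made-up wave-tank measurements and compare its error against a simplified theoretical rule of thumb. This is purely illustrative and not the MIT team’s actual code or data:

```python
# Illustrative sketch (not MIT's actual setup): compare a learned model's
# error against a simplified theoretical prediction on synthetic tank data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in for tank measurements: features = wave height (m) and
# period (s), target = measured breaking depth (all values invented).
X = rng.uniform([0.1, 1.0], [0.5, 5.0], size=(200, 2))
y_true = 1.2 * X[:, 0] + 0.05 * X[:, 1] ** 2 + rng.normal(0, 0.02, 200)

# A simplified "theoretical" rule of thumb: depth ~ height / 0.78.
y_theory = X[:, 0] / 0.78

# Train on the first 150 waves, evaluate on the remaining 50.
learned = RandomForestRegressor(random_state=0).fit(X[:150], y_true[:150])
y_ml = learned.predict(X[150:])

print("theory MSE :", mean_squared_error(y_true[150:], y_theory[150:]))
print("learned MSE:", mean_squared_error(y_true[150:], y_ml))
```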

A startup is being born out of research at EPFL, where Thibault Asselborn’s doctoral thesis on handwriting analysis has turned into a full-fledged educational app. Using algorithms he developed, the app (called School Rebound) can identify habits and corrective measures with just 30 seconds of a child writing on an iPad with a stylus. These are presented to the child in the form of games that help them write more clearly by reinforcing good habits.

“Our scientific model and rigor are important, and are what set us apart from other existing applications,” Asselborn said in a press release. “We have received letters from teachers who have seen their students improve by leaps and bounds. Some students even come in an hour early to practice.”

Image credits: Duke University

Another new finding in elementary schools relates to identifying hearing problems during routine screenings. These screenings, which some readers may remember, often use a device called a tympanometer, which must be operated by trained audiologists. If one is not available, say in an isolated school district, children with hearing problems may never get the help they need in time.

Samantha Robler and Susan Emmett of Duke decided to build a tympanometer that essentially operates itself, sending data to a smartphone app where it is interpreted by an AI model. Anything worrying is flagged, and the child can be referred for additional screening. It is not a replacement for an expert, but it is far better than nothing and may help identify hearing problems much earlier in places without the proper resources.
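A hypothetical sketch of that kind of automated flagging is shown below; the field names and thresholds are invented for illustration and are not the Duke team’s actual model:

```python
# Hypothetical flagging of tympanometer readings sent to a phone app.
# Thresholds and field names are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Tympanogram:
    child_id: str
    peak_pressure_dapa: float   # middle-ear pressure at peak compliance (daPa)
    peak_compliance_ml: float   # peak admittance/compliance (mL)

def needs_followup(reading: Tympanogram) -> bool:
    # Very rough screening rule: flag flat traces or strongly negative pressure.
    return reading.peak_compliance_ml < 0.2 or reading.peak_pressure_dapa < -150

readings = [
    Tympanogram("A", peak_pressure_dapa=-20, peak_compliance_ml=0.6),
    Tympanogram("B", peak_pressure_dapa=-220, peak_compliance_ml=0.1),
]

for r in readings:
    status = "refer for additional screening" if needs_followup(r) else "pass"
    print(r.child_id, status)
```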

