With this in mind, scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) created “VISTA 2.0,” a data-driven simulation engine where vehicles can learn to drive in the real world and recover from near-crash scenarios. What’s more, all of the code is being released open-source to the public.
“Today, only companies have software like the type of simulation environments and capabilities of VISTA 2.0, and this software is proprietary. With this release, the research community will have access to a powerful new tool for accelerating the research and development of adaptive robust control for autonomous driving,” says the senior author of a paper about the research, MIT Professor and CSAIL Director Daniela Rus.
VISTA is a data-driven, photorealistic autonomous driving simulator. It can simulate not only live video, but also LiDAR data and event cameras, and can also include other simulated vehicles to model complex driving situations. VISTA is open source, and the code is publicly available.
VISTA 2.0, which builds on the team’s previous model, VISTA, is fundamentally different from existing AV simulators because it is data-driven. That means it’s created and photo-realistically rendered from real-world data — thus enabling a direct transfer to reality. While the initial iteration only supported single-lane vehicle following with a single camera sensor, achieving high-fidelity data-driven simulation required rethinking the fundamentals of how different sensors and behavioral interactions can be synthesized.
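The core move in this kind of data-driven simulation is tracking where a virtual vehicle sits relative to the closest recorded frame, since that offset determines how the recorded sensor data must be re-rendered. A minimal sketch of that idea in Python, reduced to a planar SE(2) pose; the function name and the simplification to (x, y, yaw) are ours for illustration, not VISTA 2.0's actual code:

```python
import numpy as np

def relative_pose(recorded, virtual):
    """Pose (forward, lateral, heading) of a virtual vehicle expressed in
    the frame of the recorded vehicle. Each input pose is (x, y, yaw) in
    world coordinates; the returned offset is what a data-driven renderer
    would use to warp the recorded sensor data to the new viewpoint."""
    dx = virtual[0] - recorded[0]
    dy = virtual[1] - recorded[1]
    # Rotate the world-frame displacement by -yaw into the recorded
    # vehicle's local frame.
    c, s = np.cos(-recorded[2]), np.sin(-recorded[2])
    forward = c * dx - s * dy
    lateral = s * dx + c * dy
    dyaw = virtual[2] - recorded[2]
    return forward, lateral, dyaw
```

For example, a virtual car one meter further along a road that points due "north" (yaw of 90 degrees) comes out as a purely forward offset in the recorded vehicle's frame.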
Enter VISTA 2.0: a data-driven system that can simulate complex sensor types and massive interactive scenarios and intersections at scale. Using much less data than previous models, the team was able to train autonomous vehicles that could be significantly more robust than those trained on large amounts of real-world data.
“This is a huge leap in data-driven simulation capabilities for autonomous vehicles, as well as increasing scale and the ability to handle greater driving complexity,” said Alexander Amini, a CSAIL PhD student and co-author of two new papers on the work, along with fellow PhD student Tsun-Hsuan Wang. “VISTA 2.0 demonstrates the ability to simulate sensor data far beyond 2D RGB cameras, but also extremely high-dimensional 3D lidars with millions of points, irregularly timed event-based cameras, and even interactive and dynamic scenarios with other vehicles.”
The team of scientists was able to scale the complexity of interactive driving tasks for things like overtaking, following and negotiating, including multi-agent scenarios in highly photorealistic environments.
Since most of our data is (thankfully) just ordinary, everyday driving, training AI models for autonomous vehicles involves hard-to-come-by fodder: the many varieties of edge cases and strange, dangerous scenarios. Logically, we can't just crash into other cars in the real world to teach a neural network how not to crash into other cars.
Recently, there has been a shift from more classical, human-designed simulation environments to ones built from real-world data. The latter have tremendous photorealism, but the former can easily model virtual cameras and lidars. With this paradigm shift, a key question arose: Can the richness and complexity of all the sensors that autonomous vehicles need, such as lidar and event-based cameras, which are more scarce, be accurately synthesized?
Lidar sensor data is much harder to interpret in a data-driven world – you're effectively trying to generate brand-new 3D point clouds with millions of points from only sparse views of the scene. To synthesize 3D lidar point clouds, the researchers took the data the car collected, projected it into the 3D space derived from the lidar readings, and then let a new virtual vehicle drive around locally relative to where the original vehicle was. Finally, they projected all of this sensory information back into the field of view of the new virtual vehicle using neural networks.
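The geometric half of that pipeline, re-expressing recorded 3D points in the virtual vehicle's frame and culling those outside the virtual sensor's view, can be sketched with NumPy. This is a simplified illustration under our own assumptions (planar pose, a simple range-and-bearing cull); the neural-network densification step described above is not shown, and none of these names come from VISTA 2.0's codebase:

```python
import numpy as np

def reproject_points(points_world, new_pose, fov_deg=120.0, max_range=80.0):
    """Express world-frame lidar points (N x 3 array of x, y, z) in a new
    vehicle frame given by new_pose = (x, y, yaw), keeping only points
    inside the virtual sensor's field of view and range."""
    x, y, yaw = new_pose
    # World -> vehicle: translate to the new origin, then rotate by -yaw.
    shifted = points_world[:, :2] - np.array([x, y])
    c, s = np.cos(-yaw), np.sin(-yaw)
    rot = np.array([[c, -s], [s, c]])
    local_xy = shifted @ rot.T
    local = np.column_stack([local_xy, points_world[:, 2]])
    # Cull points the virtual sensor could not see.
    rng = np.linalg.norm(local_xy, axis=1)
    bearing = np.degrees(np.abs(np.arctan2(local[:, 1], local[:, 0])))
    keep = (rng < max_range) & (bearing < fov_deg / 2)
    return local[keep]
```

A point directly ahead of the virtual vehicle survives the cull unchanged, while a point behind it falls outside the forward field of view and is dropped.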
Along with the simulation of event-based cameras, which operate at rates of thousands of events per second and higher, the simulator was capable of not only simulating this multimodal information, but doing so in real time. This makes it possible not only to train neural networks offline, but also to test the car online in augmented-reality settings for safe evaluation. “Whether multisensory simulation at this scale of complexity and photorealism is possible in the realm of data-driven simulation has been largely an open question,” says Amini.
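Event cameras fire asynchronous per-pixel events whenever the log intensity at a pixel changes by more than a contrast threshold, rather than capturing whole frames. A minimal frame-difference sketch of that principle, with a threshold value we chose for illustration (real event simulation, including VISTA 2.0's, interpolates between frames to assign precise timestamps, which is omitted here):

```python
import numpy as np

def events_from_frames(prev_frame, next_frame, threshold=0.2, eps=1e-6):
    """Emit (row, col, polarity) tuples wherever the log-intensity change
    between two frames crosses the contrast threshold, mimicking how an
    event camera responds to brightness changes."""
    log_prev = np.log(prev_frame.astype(np.float64) + eps)
    log_next = np.log(next_frame.astype(np.float64) + eps)
    diff = log_next - log_prev
    rows, cols = np.nonzero(np.abs(diff) >= threshold)
    polarity = np.sign(diff[rows, cols]).astype(int)
    return list(zip(rows.tolist(), cols.tolist(), polarity.tolist()))
```

A pixel that doubles in brightness yields a positive event, one that halves yields a negative event, and unchanged pixels yield nothing, which is why event streams are so sparse and fast.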
With this, the driving school becomes a party. In the simulation, you can move around, use different types of controllers, simulate different types of events, create interactive scenarios, and simply drop in brand-new vehicles that weren't even in the original data. The team tested lane following, lane turning, car following, and more dangerous maneuvers like static and dynamic overtaking (seeing obstacles and moving around to avoid a collision). With multi-agent simulation, both real and simulated agents interact, and new agents can be dropped into the scene and controlled in any way.
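A scenario like car following comes down to injecting a controlled virtual agent and stepping it alongside the ego vehicle. The sketch below shows the flavor with a toy proportional-derivative gap controller; the `Agent` class, the gains, and every other name here are our illustrative assumptions, not VISTA 2.0's actual scenario API:

```python
from dataclasses import dataclass

@dataclass
class Agent:
    position: float  # longitudinal position along the lane (meters)
    speed: float     # meters per second

def step(leader: Agent, follower: Agent, dt: float,
         desired_gap: float = 10.0, kp: float = 0.5, kd: float = 1.0) -> None:
    """One simulation tick: the follower runs a simple PD car-following
    controller to hold a desired gap behind the leader."""
    gap = leader.position - follower.position
    accel = kp * (gap - desired_gap) + kd * (leader.speed - follower.speed)
    follower.speed = max(0.0, follower.speed + accel * dt)
    follower.position += follower.speed * dt
    leader.position += leader.speed * dt  # leader cruises at constant speed

# A virtual leader dropped into the scene plus an ego follower: the two
# never appeared together in any recorded drive, which is the essence of
# an interactive scenario.
leader = Agent(position=30.0, speed=10.0)
ego = Agent(position=0.0, speed=12.0)
for _ in range(200):  # 20 seconds of simulated driving at dt = 0.1 s
    step(leader, ego, dt=0.1)
```

After the loop the follower settles to the leader's speed at roughly the desired gap; swapping in a learned policy for `step`'s hand-tuned controller is exactly the kind of substitution the simulator enables.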
Taking their full-scale car out into the “wild” – a.k.a. Devens, Massachusetts – the team saw immediate transferability of results, with both failures and successes. They were also able to demonstrate the magic word of self-driving car models: “robust.” They showed that AVs trained entirely in VISTA 2.0 were so robust in the real world that they could handle that elusive tail of challenging failure cases.
One guardrail humans rely on that can't yet be simulated is human emotion. It's the friendly wave, nod, or acknowledging flash of a blinker: exactly the type of nuance the team wants to implement in future work.
“The central algorithm of this research is how we can take a dataset and build a fully synthetic world for learning and autonomy,” Amini says. “It’s a platform that I believe can one day expand into many different axes in robotics. Not just autonomous driving, but many areas that rely on vision and complex behavior. We are excited to release VISTA 2.0 to help the community collect their own datasets and transform them into virtual worlds where they can directly simulate their own virtual autonomous vehicles, drive around these virtual terrains, train autonomous vehicles in these worlds and then can directly transfer them to full-sized, true self-driving cars.”
Reference: “VISTA 2.0: An Open, Data-Driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles” by Alexander Amini, Tsun-Hsuan Wang, Igor Gilitschenski, Wilko Schwarting, Zhijian Liu, Song Han, Sertac Karaman, and Daniela Rus, 23 November 2021, arXiv (Computer Science > Robotics).
Amini and Wang co-authored the paper with Zhijian Liu of MIT CSAIL; Igor Gilitschenski, assistant professor of computer science at the University of Toronto; Wilko Schwarting, AI researcher and MIT CSAIL PhD ’20; Song Han, associate professor in MIT’s Department of Electrical Engineering and Computer Science; Sertac Karaman, associate professor of aeronautics and astronautics at MIT; and Daniela Rus, MIT professor and director of CSAIL. The researchers presented the work at the IEEE International Conference on Robotics and Automation (ICRA) in Philadelphia.
This work was supported by the National Science Foundation and the Toyota Research Institute. The team acknowledges the support of NVIDIA with the donation of Drive AGX Pegasus.