Ideal Motors Launches End-to-End + Vision Language Model Program for Smart Driving

TapTechNews July 5th news, at tonight's Ideal Smart Driving Summer Launch Event, Ideal Motors announced the official launch of its End-to-End + Vision Language Model Early Bird Program, claiming the system makes the car drive more intelligently and more like a human driver.


TapTechNews summarizes the main information as follows:

Ideal Motors claims the end-to-end model's advantages lie in two aspects, efficient transmission and efficient computation. As a single integrated model, it passes information internally rather than across module boundaries, which raises the performance ceiling and makes the actions and decisions of the whole system feel more human-like to users. The integrated model also completes inference on the GPU in a single pass, so end-to-end latency is lower: users perceive hand-eye coordination, with the vehicle responding to events promptly.
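To make the single-pass claim concrete, here is a minimal sketch in PyTorch (not Ideal Motors' actual network; all names and shapes are illustrative) of an integrated model where camera input flows to a planned trajectory in one forward call, with no intermediate results crossing module or process boundaries:

```python
# Hypothetical integrated end-to-end driving model: perception and planning
# live inside one nn.Module, so inference is a single GPU forward pass.
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    def __init__(self, feat_dim=256, horizon=10):
        super().__init__()
        self.horizon = horizon
        self.backbone = nn.Sequential(          # perception: image -> features
            nn.Conv2d(3, 32, 5, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.planner = nn.Sequential(           # planning: features -> trajectory
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, horizon * 2),   # (x, y) waypoints
        )

    def forward(self, frames):                  # one pass, camera in, trajectory out
        return self.planner(self.backbone(frames)).view(-1, self.horizon, 2)

model = EndToEndDriver().eval()
with torch.no_grad():                           # single inference call
    traj = model(torch.randn(1, 3, 256, 448))   # dummy front-camera frame
print(traj.shape)                               # torch.Size([1, 10, 2])
```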

The integrated model is end-to-end trainable and fully data-driven. According to the company, the most noticeable effect for users is that OTA updates arrive faster and faster.
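A minimal sketch of what end-to-end trainability means in practice, under the assumption (mine, not the company's) that the model is trained to imitate human driving trajectories: a single loss on the final output back-propagates through every layer, so improvement is driven by data rather than hand-tuned rules:

```python
# Stand-in training step, not Ideal Motors' pipeline: one trajectory loss
# trains the whole stack at once, which is what "fully data-driven" implies.
import torch
import torch.nn as nn

model = nn.Sequential(                        # stand-in for the whole driving stack
    nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU(),
    nn.Linear(128, 10 * 2),                   # 10 (x, y) trajectory waypoints
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

frames = torch.randn(8, 3, 64, 64)            # dummy camera frames
expert = torch.randn(8, 20)                   # dummy human-driving waypoints, flattened

loss = nn.functional.mse_loss(model(frames), expert)  # single end-to-end loss
opt.zero_grad()
loss.backward()                               # gradient reaches every layer
opt.step()
```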


As for the vision language model, its overall architecture is a single unified Transformer. The Prompt text is encoded by the Tokenizer, while the images from the 120-degree and 30-degree front-view cameras, together with the navigation map information, are encoded as visual input. An image-text alignment module aligns the two modalities, and the combined sequence is handed to the VLM for autoregressive inference. The VLM's output, which covers environment understanding, driving decisions, and driving trajectories, is passed to System 1 to control the vehicle.
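The description maps naturally onto a standard multimodal pipeline. Below is a hypothetical sketch (all module names, sizes, and the toy decoding loop are my assumptions, not Ideal Motors' code) of prompt tokens and encoded camera/nav-map features being aligned into one sequence and decoded autoregressively:

```python
# Toy multimodal pipeline: tokenize the prompt, encode vision, align, decode.
import torch
import torch.nn as nn

D = 256  # shared embedding width (illustrative)

tokenizer = nn.Embedding(10000, D)           # stands in for the text Tokenizer
vision_enc = nn.Linear(512, D)               # stands in for camera/nav-map encoders
align = nn.Linear(D, D)                      # image-text (modality) alignment module
vlm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D, nhead=8, batch_first=True), num_layers=2)
head = nn.Linear(D, 10000)                   # next-token prediction head

prompt_ids = torch.randint(0, 10000, (1, 16))  # encoded Prompt text
cam_120 = torch.randn(1, 32, 512)              # 120-degree front camera features
cam_30 = torch.randn(1, 32, 512)               # 30-degree front camera features
nav_map = torch.randn(1, 8, 512)               # navigation map features

visual = align(vision_enc(torch.cat([cam_120, cam_30, nav_map], dim=1)))
seq = torch.cat([visual, tokenizer(prompt_ids)], dim=1)  # one multimodal sequence

out = seq
with torch.no_grad():
    for _ in range(4):                                    # toy autoregressive loop:
        next_tok = head(vlm(out))[:, -1:].argmax(-1)      # decode one token at a time
        out = torch.cat([out, tokenizer(next_tok)], dim=1)
# The decoded tokens would carry scene understanding, a driving decision,
# and a trajectory, which System 1 consumes to control the vehicle.
```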

The company highlighted three points in the system's overall design: a streaming video encoder that can cache longer spans of temporal visual information; a memory module that caches multi-frame historical information and addresses the inference latency of long contexts; and a smart-driving Prompt question bank, with which System 2 can reason about the current driving environment and give System 1 sensible driving suggestions, while System 1 can invoke different Prompt questions in different scenarios to actively ask System 2 for help.
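A hypothetical sketch of two of these ideas (names and logic are illustrative assumptions, not Ideal Motors' implementation): a streaming encoder whose memory caches per-frame features so each step only encodes the newest frame, and a Prompt question bank through which the fast System 1 asks the slower System 2 for advice in hard scenarios:

```python
# Toy illustration of the streaming/memory cache and the Prompt question bank.
from collections import deque

class StreamingEncoder:
    """Caches encoded frames so each step only encodes the newest frame."""
    def __init__(self, max_frames=16):
        self.memory = deque(maxlen=max_frames)   # memory module: multi-frame history

    def push(self, frame):
        self.memory.append(self.encode(frame))   # encode only the new frame
        return list(self.memory)                 # long temporal context, no re-encoding

    @staticmethod
    def encode(frame):
        return sum(frame) / len(frame)           # toy per-frame feature

PROMPT_BANK = {                                  # smart-driving Prompt question bank
    "construction": "Is the lane ahead closed by the construction zone?",
    "intersection": "Is it safe to make an unprotected left turn now?",
}

def system2_answer(prompt, features):
    """Slow System 2 (the VLM): reasons over cached history and advises."""
    return f"proceed cautiously ({prompt!r}, {len(features)} frames of context)"

def system1_step(scenario, features):
    """Fast System 1: drives normally, but asks System 2 in hard scenarios."""
    if scenario in PROMPT_BANK:                  # invoke a scenario-specific prompt
        return f"follow advice: {system2_answer(PROMPT_BANK[scenario], features)}"
    return "keep lane"

enc = StreamingEncoder()
for t in range(20):                              # streaming camera frames
    feats = enc.push([t, t + 1, t + 2])
print(system1_step("construction", feats))
```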

