TapTechNews, July 5 — At tonight's Ideal Smart Driving Summer Launch Event, Ideal Motors announced the official launch of its End-to-End + Vision Language Model early bird program, claiming the system makes the car smarter and more human-like.
TapTechNews summarizes the main points as follows:
Ideal Motors says the end-to-end model's advantages lie in two areas: efficient transmission and efficient computation. Because it is a single integrated model, information is transmitted entirely within the model, giving the system a higher performance ceiling, and the actions and decisions users experience are more human-like. The integrated model also completes inference on the GPU in a single pass, so end-to-end latency is lower: users perceive "eyes and hands" working in harmony, with the vehicle responding promptly.
The integrated model is also end-to-end trainable and fully data-driven. According to the company, the most noticeable effect for users is that OTA updates arrive faster and faster.
On the vision language model side, the overall architecture is a single unified Transformer. The Prompt text is encoded by the Tokenizer, while the images from the 120-degree and 30-degree front-view cameras, together with navigation-map information, are encoded as visual input; the two modalities are aligned by an image-text alignment module and then handed to the VLM for autoregressive inference. The VLM's output, covering environmental understanding, driving decisions, and driving trajectories, is passed to System 1 to control the vehicle.
The company highlights three points in the system's overall design: a streaming video encoder that can cache longer spans of temporal visual information; a memory module that caches multi-frame historical information to mitigate long-context inference latency; and a smart-driving Prompt question bank, with which System 2 can reason about the current driving environment and give System 1 sound driving suggestions, while System 1 can invoke different Prompt questions in different scenarios to actively ask System 2 for help.
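Two of these highlights, the bounded multi-frame memory and the scenario-keyed prompt bank, can be illustrated with a minimal sketch. The capacity, scenario names, and prompt texts are all hypothetical:

```python
from collections import deque

class MemoryModule:
    """Cache the last `capacity` encoded frames so that long-context
    inference can reuse cached history instead of re-encoding it
    (illustrating the latency-mitigation idea, not the real module)."""
    def __init__(self, capacity: int = 8):
        self.frames = deque(maxlen=capacity)  # oldest frames drop off

    def push(self, encoded_frame: str) -> None:
        self.frames.append(encoded_frame)

    def context(self) -> list:
        return list(self.frames)

# Hypothetical smart-driving Prompt question bank, keyed by scenario.
PROMPT_BANK = {
    "construction_zone": "Is the lane ahead blocked by cones?",
    "bus_lane": "May I enter this bus lane at the current time?",
}

def system1_ask_system2(scenario: str, memory: MemoryModule) -> str:
    """System 1 invokes the scenario-specific prompt and hands it,
    with the cached visual context, to System 2 for help."""
    prompt = PROMPT_BANK.get(scenario, "Describe the current driving scene.")
    return f"{prompt} [context: {len(memory.context())} cached frames]"
```

A `deque` with `maxlen` gives the memory module's key property for free: old frames are evicted automatically, so the cached context stays bounded no matter how long the drive runs.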