Multimodal Large Language Models

Attention re-alignment in multimodal large language models via intermediate-layer guidance

Multimodal large language models (MLLMs) have achieved impressive performance in understanding and describing visual content, setting new state-of-the-art results on a variety of visual question ...

Nature

Evaluating multimodal commercial and open-source large language models for dynamical astronomy: a benchmark study of resonant behavior classification

Machine learning has been used in astronomy for many years. Classical methods such as k-nearest neighbors, decision trees, random forests, or gradient boosting have helped classify images, detect ...

Frontiers

Multimodal World Models, Embodiment, and Cognitive Amplification

Multimodal models and world models are emerging as promising frameworks for extending language-based AI beyond text, towards ...

TechCrunch

Microsoft takes on AI rivals with three new foundational models

Microsoft AI, the tech giant’s research lab, announced the release of three foundational AI models on Thursday that can generate text, voice, and images. The release signals Microsoft’s continued push ...

12d

TwelveLabs’ video AI finds new use cases on AWS Marketplace

TwelveLabs' Danny Nicolopoulos talks to theCUBE about how the company's video AI tools have found a wider range of use cases ...

China Daily Global Edition

Baidu advances AI with new Kunlun chips and Ernie 5.0 model

Robin Li, co-founder, chairman and CEO of Baidu Inc, delivers a speech at Baidu World 2025. [Photo provided to ...

25d

Google's new open source Gemma 4 12B analyzes audio, video — and runs entirely locally on a typical 16GB enterprise laptop

For enterprise leaders aiming to decentralize their AI workloads, Gemma 4 12B offers a rare combination of edge-friendly efficiency and frontier-class reasoning.
