
After ChatGPT, Microsoft working on AI model that takes images as cues

By IANS | Updated: March 3, 2023 18:50 IST


New Delhi, March 3: As the war over artificial intelligence (AI) chatbots heats up, Microsoft has unveiled Kosmos-1, a new AI model that can respond to visual cues, or images, as well as to text prompts or messages.

The multimodal large language model (MLLM) can take on an array of new tasks, including image captioning and visual question answering.

Kosmos-1 could pave the way for the next stage beyond ChatGPT's text-only prompts.

"A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context and follow instructions," said Microsoft's AI researchers in a paper.

The paper suggests that multimodal perception, or knowledge acquisition and "grounding" in the real world, is needed to move beyond ChatGPT-like capabilities to artificial general intelligence (AGI), reports ZDNet.

"More importantly, unlocking multimodal input greatly widens the applications of language models to more high-value areas, such as multimodal machine learning, document intelligence, and robotics," the paper read.

The goal is to align perception with LLMs, so that the models are able to see and talk.

Experimental results showed that Kosmos-1 achieves impressive performance on language understanding and generation, including when it is fed document images directly.

It also showed good results on perception-language tasks, including multimodal dialogue, image captioning and visual question answering, as well as on vision tasks such as image recognition with descriptions (specifying classification via text instructions).

Disclaimer: This post has been auto-published from an agency feed without any modifications to the text and has not been reviewed by an editor
