A new study shows that AI language models such as GPT-4 and PaLM struggle to grasp sensory-rich concepts like "flower" because they are trained mainly on text, without direct sensory or motor experience.
Researchers compared how humans and AI represent more than 4,000 words and found that the models handle abstract terms well but fail to capture the full richness of words tied to smell, touch, or taste.
The study highlights that language alone cannot fully represent human experience. Models trained on both images and text performed better on visual concepts, suggesting that adding sensory data could improve AI understanding. This gap may affect how naturally AI interacts with humans, underscoring the need for multimodal AI development.
Published in Nature Human Behaviour, the research points to the limits of current AI and the importance of embodied learning for future progress.