The Power of Vision in Multimodal AI

Description

Recent advances in AI, especially generative AI and Large Language Models (LLMs), are bringing multimodal capabilities to many sectors: models that process text, images, video, and sound. In computer vision, this shift toward multimodal foundation models marks a departure from task-specific models and broadens the scope of machine perception. In GFT's visual inspection and computer vision area, we are exploring and applying these multimodal capabilities to improve machine understanding of the world. This talk showcases the transformative potential of these technologies.

Session 🗣 Intermediate ⭐⭐ Track: AI, ML, Bigdata, Python

Tags: computer vision, AI, deep learning, multimodality
