Recent advances in AI, especially generative AI and Large Language Models (LLMs), are bringing multimodal capabilities to many sectors, with models that process text, images, video, and audio. In computer vision, this shift towards multimodal and foundation models marks a departure from task-specific models and broadens the scope of machine perception. In GFT's visual inspection and computer vision area, we are exploring and applying these multimodal capabilities to improve machine understanding of the world and to showcase the transformative potential of these technologies.
Session 🗣 Intermediate ⭐⭐ Track: AI, ML, Bigdata, Python
computer vision
AI
deep learning
multimodality