Why Enterprise AI Needs More Than LLMs: The Case for Spatial and Visual Intelligence

Nov 11, 2024

The current enterprise AI conversation is dominated by large language models, and for understandable reasons. LLMs have demonstrated remarkable capabilities at language-based tasks that are directly relevant to many enterprise workflows: document analysis, knowledge retrieval, text generation, code assistance, and conversational interfaces. The commercial success of LLM-based products has created strong incentives to frame enterprise AI strategy primarily in terms of language model adoption and customization. But this framing leaves out two categories of AI capability that are essential for enterprises operating in physical environments: spatial intelligence and visual intelligence.

Most enterprises are not purely information-processing organizations. They make physical products, maintain physical infrastructure, operate in physical facilities, manage physical fleets, and interact with customers who exist in physical spaces. The value chains of manufacturing, logistics, infrastructure, healthcare, retail, construction, and dozens of other sectors depend on understanding and managing physical reality, not just information about physical reality. Language models are excellent tools for reasoning about information. They are not the right tools for understanding the visual appearance of production defects, the geometric structure of infrastructure assets, the spatial layout of warehouse operations, or the physical behavior of products in deployment.

Visual intelligence provides AI systems with the ability to understand and analyze the visual world in ways relevant to enterprise operations. This means not just classifying images or detecting objects in isolation, but understanding the operational significance of what is seen: which surface conditions indicate imminent failure, what visual patterns signal quality deviations, how worker movements and equipment interactions can be analyzed to improve safety and efficiency. Visual intelligence in this operational sense requires training data that reflects the specific visual characteristics of the enterprise's actual operating environment, not generic image recognition benchmarks.

Spatial intelligence adds a further dimension: the ability to reason about how things are positioned and related in three-dimensional physical space, how spatial configurations affect operational outcomes, and how operations should be adjusted in response to spatial information. A logistics AI that understands the spatial layout of a warehouse and can reason about optimal picking paths, storage configurations, and traffic flow is qualitatively more useful than a system that processes text descriptions of warehouse operations. Likewise, a construction AI that understands the spatial relationships among building components and can reason in 3D about construction sequence, interference, and quality is more useful for construction workflows than a document management AI.
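To make the warehouse example concrete, here is a minimal sketch of one small piece of spatial reasoning: computing walking distance between two locations when racks block the direct route. The grid layout, the `WAREHOUSE` name, and the `shortest_path_length` function are illustrative assumptions, not part of any particular product, and a real system would work from surveyed floor plans or 3D scans rather than a hand-typed grid.

```python
from collections import deque

# Hypothetical floor plan: "." is walkable aisle, "#" is a storage rack.
WAREHOUSE = [
    "......",
    ".##.#.",
    ".##.#.",
    "......",
]

def shortest_path_length(grid, start, goal):
    """Return the number of steps on the shortest walkable route, or None.

    Uses breadth-first search over 4-connected grid cells, so the result
    reflects aisle geometry rather than straight-line distance.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None  # goal is not reachable from start

# Two cells in the same row, two columns apart, separated by a rack.
print(shortest_path_length(WAREHOUSE, (1, 3), (1, 5)))
```

The two cells in the usage line are only two columns apart, yet the walkable route has to detour around the rack between them. A text description such as "bin A is near bin B" would not capture that difference, which is the kind of gap spatial reasoning is meant to close.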

The case for investing in spatial and visual intelligence alongside LLMs is not that LLMs are inadequate. It is that different types of enterprise operations require different types of AI capability, and a complete enterprise AI strategy must include the full range of capabilities that enterprise operations actually demand. Enterprises that invest exclusively in LLM infrastructure will find that they have powerful tools for language-based tasks and significant gaps for the operational tasks that require seeing, measuring, and reasoning about the physical world.

Building enterprise-grade spatial and visual AI requires investment in training data that reflects real operational environments, which often means synthetic data generated from simulation environments calibrated to specific facility, product, and operational characteristics. It requires evaluation frameworks that measure performance on operational tasks rather than generic benchmarks. And it requires integration with operational systems, sensor networks, and spatial data infrastructure so that AI insights can flow into operational decisions. This investment is more complex than deploying a pre-trained LLM with a retrieval layer, but it addresses a fundamentally different, complementary set of enterprise intelligence requirements that language capability alone cannot meet.
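As a rough illustration of what measuring performance on operational tasks can mean, the sketch below compares two hypothetical defect detectors that score identically on plain accuracy but differ sharply once errors are weighted by their operational consequences. The labels, predictions, and cost figures are invented for the example, and the function names are assumptions rather than an established evaluation API.

```python
def accuracy(labels, preds):
    """Fraction of items classified correctly (a generic benchmark metric)."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)

def operational_cost(labels, preds, miss_cost, false_alarm_cost):
    """Total cost of errors, weighting missed defects and false alarms separately."""
    cost = 0.0
    for label, pred in zip(labels, preds):
        if label and not pred:
            cost += miss_cost          # defect shipped undetected
        elif pred and not label:
            cost += false_alarm_cost   # good part pulled for re-inspection
    return cost

# Hypothetical inspection run: 1 = defective, 0 = good.
labels  = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses one defect
model_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # raises one false alarm

# Assumed costs: a missed defect is far more expensive than a re-inspection.
for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, accuracy(labels, preds),
          operational_cost(labels, preds, miss_cost=500, false_alarm_cost=20))
```

Both models make exactly one mistake in ten, so a generic accuracy score cannot separate them. Under the assumed costs, the model that misses a defect is far more expensive to operate than the one that raises a false alarm, which is why the cost weights need to come from the enterprise's own operations rather than from a benchmark.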