Semiconductors Designed for Edge AI

By: Charlene Wan, VP of Branding, Marketing, and Investor Relations, Ambiq


In an era dominated by interconnected technologies, integrating Artificial Intelligence (AI) into battery-powered devices such as wearables and sensors is revolutionizing how we perceive and interact with smart devices. This integration, often called Edge AI, is making intelligence everywhere possible, and semiconductors form the essential bridge between the two technologies. However, AI is complex and power-intensive, especially on Edge devices. To deliver a truly profound user experience, semiconductor companies will need to rethink how they optimize their chips for Edge AI applications.

For many years, AI has been restricted to the Cloud, requiring large data servers to process data and drawing a constant supply of energy. In traditional cloud-based AI systems, the heavy computational workloads are offloaded to data centers, where powerful processors, such as GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units), handle the AI inferencing.

But a shift to make AI more readily accessible, and to offload some of these power-intensive workloads from servers and energy grids, has created a market for Edge AI and AI chips. To enable these chips, semiconductor manufacturers must strike a balance between performance, energy efficiency, and size constraints. Condensing a cloud's worth of intelligence into a 1 mm³ chip with roughly 2,000X less available memory, a 3,000X lower cost, and a 100,000X smaller power budget has challenged the semiconductor industry. Yet uncovering this holy grail of semiconductor design extends the possibilities of AI, to the benefit of the end consumer.

Benefits of Edge AI

This departure from traditional cloud-based approaches brings intelligence closer to the data source, providing many advantages, including enhanced security, faster processing, better user experience, and reduced dependency on the Cloud.

Enhanced Privacy and Security

Implementing AI on local Edge devices empowers users with greater control over their data. Processing sensitive information locally alleviates privacy concerns and reduces the likelihood of exposing data to external networks. By moving AI onto local devices, organizations responsible for safeguarding their customers' personally identifiable information (PII) can better protect it.

On the devices, this data isn’t exposed to the servers of cloud service providers and other third parties, ensuring better compliance with local and international regulations around data protection. This approach mitigates the risks associated with data breaches and unauthorized access, providing a robust solution for applications such as digital health, which require heightened security.   

Edge AI built for digital health applications can analyze data in real time, ensuring that sensitive information remains private while still enabling advanced diagnostics and personalized care. This combination of privacy, speed, and security makes Edge AI an essential technology for the future of secure, data-driven applications.

Reduced Latency

Performing AI on Edge devices can also avoid the latency issues seen with cloud-based AI. Language models like GPT-3, the system on which OpenAI built ChatGPT, require tremendous compute power to run. After ChatGPT's launch, OpenAI had to activate traffic management strategies, such as a queuing system and slowed queries, to handle the surge in demand. This incident highlights how compute power is becoming a bottleneck that limits the advancement of AI models.

Performing AI locally can significantly reduce latency, leading to faster processing times. Applications benefit from quicker response times, enabling real-time decision-making, especially in critical scenarios such as healthcare or smart home devices. Decentralized processing also means that insights are generated in real time, with less latency than if the device had to send data to the Cloud for processing and wait for a response.
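A back-of-envelope comparison makes the latency advantage concrete. All figures below are illustrative assumptions, not measurements of any particular system:

```python
# Rough latency comparison: on-device inference vs. a cloud round trip.
# All numbers are hypothetical, chosen only to illustrate the trade-off.

def cloud_latency_ms(network_rtt_ms, server_infer_ms, queue_ms):
    """Total time to send data to the Cloud, wait, and receive a result."""
    return network_rtt_ms + server_infer_ms + queue_ms

def edge_latency_ms(local_infer_ms):
    """On-device inference has no network or queuing component."""
    return local_infer_ms

cloud = cloud_latency_ms(network_rtt_ms=80, server_infer_ms=5, queue_ms=20)
edge = edge_latency_ms(local_infer_ms=30)

# The Edge device's processor is slower, yet it still responds sooner
# because it skips the network and queue entirely.
print(f"cloud: {cloud} ms, edge: {edge} ms")
```

Even when the local processor takes several times longer per inference than a data-center GPU, removing the network round trip can leave the Edge path ahead.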

Enhanced User Experience

Reduced latency and improved processing speed provide a seamless and responsive user experience. Real-time feedback becomes possible, resulting in higher user satisfaction and engagement. Endpoint AI can process and analyze user data at the Edge to create personalized experiences without relying on centralized servers.

This leads to more responsive and tailored services, such as personalized recommendations and content delivery for cases like shopping lists, fitness apps, and meal recommendations. This type of personalized AI can more effectively captivate users and increase their engagement with content and experiences that resonate with their interests.

Reduced Dependence on the Cloud

While the Cloud has enormous power for collecting and processing data, it is susceptible to threats like hacking and outages, and it may not be available in areas with limited internet access. Reducing reliance on the Cloud with Edge AI improves device performance and shields devices from external threats. It also increases resilience in scenarios with limited internet connectivity, which is particularly significant for applications in remote areas or environments with intermittent network access.

Designing Semiconductors for Edge AI

With all the benefits Edge AI offers, how does the semiconductor industry get there? To perform AI at the Edge, manufacturers must design semiconductors for energy efficiency, greater computational density through transistor design, and optimized memory utilization.

A key design focus for AI chips is reducing power consumption. Edge devices, such as smartwatches, sensors, and asset trackers, are often limited to a small battery as a power source. As a result, power utilization in semiconductors must be highly energy efficient. Techniques like voltage scaling, dynamic power management, tuning individual transistors on a chip, and optimized memory architectures are used to minimize power usage without sacrificing computational capabilities.
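The effect of dynamic power management can be sketched with a simple duty-cycle model. The currents and battery capacity below are hypothetical placeholders, not figures for any real part:

```python
# Sketch of a duty-cycled power budget for a battery-powered Edge device.
# Active/sleep currents and battery capacity are illustrative assumptions.

def average_current_ma(active_ma, sleep_ma, duty_cycle):
    """Weighted average current given the fraction of time spent active."""
    return active_ma * duty_cycle + sleep_ma * (1 - duty_cycle)

def battery_life_hours(capacity_mah, avg_ma):
    """Hours of operation from a battery of the given capacity."""
    return capacity_mah / avg_ma

# Example: 5 mA while running inference, 5 µA asleep, active 1% of the time.
avg = average_current_ma(active_ma=5.0, sleep_ma=0.005, duty_cycle=0.01)
life = battery_life_hours(capacity_mah=200, avg_ma=avg)
print(f"average draw: {avg:.5f} mA, battery life: {life:.0f} h")
```

Because the device sleeps 99% of the time in this sketch, the average draw is dominated by the sleep current, which is why techniques that shave both active and sleep power matter so much at the Edge.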

AI models built on deep neural networks need significant computational power packed into the limited space a semiconductor provides. This can be achieved by using smaller process nodes, measured in nanometers (nm), which allow more transistors to be placed on a single chip, boosting computation and performance. Leading fabricators such as TSMC already produce chips on 5nm and even 3nm processes. Efficient memory usage within this constrained design is equally critical for computation at the Edge.

AI requires large amounts of data to be stored and accessed quickly, so optimizing memory bandwidth is a key focus in AI chip design. To address this, semiconductor manufacturers use specialized memory hierarchies that include on-chip SRAM (Static Random Access Memory) and NVM (Non-Volatile Memory) to optimize data access speeds while minimizing power consumption.
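A toy cost model shows why this hierarchy pays off. The per-access energy figures below are hypothetical; the only assumption that matters is that on-chip SRAM accesses cost far less energy than NVM accesses:

```python
# Toy memory-hierarchy model: average energy per access as a function of
# how often data is found in on-chip SRAM. Energy figures are illustrative
# assumptions, not datasheet values.

def avg_access_energy_pj(sram_hit_rate, sram_pj=1.0, nvm_pj=50.0):
    """Average energy per memory access, weighted by where data is found."""
    return sram_hit_rate * sram_pj + (1 - sram_hit_rate) * nvm_pj

# Keeping frequently used model weights in SRAM pays off quickly:
print(avg_access_energy_pj(0.50))  # 25.5 pJ per access
print(avg_access_energy_pj(0.95))  # ~3.45 pJ per access
```

This is why chip designers spend so much effort keeping a model's working set resident on-chip: raising the SRAM hit rate from 50% to 95% cuts average access energy by roughly 7x in this sketch.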

Trends that help with these requirements include the development of System-on-Chip (SoC) designs, which integrate multiple components, such as processors, memory, and AI accelerators, onto a single chip. SoCs are highly efficient and compact, making them ideal for Edge AI applications. For instance, companies like Ambiq leverage ultra-low-power semiconductor SoCs to create chips that enable AI at the lowest power possible.

Are NPUs Important for Edge AI?

Neural processing units (NPUs) have been introduced to accelerate AI compute. Still, given the nature of Edge AI, an NPU can add overhead beyond what a manufacturer actually requires for its applications.

Always-on Edge AI workloads are defined by their constraints: they operate on locally collected data and must fit a memory, compute, and power envelope. Edge AI is meant to operate on data collected by local sensors. Typical data sources are multivariate time series from biometric, inertial, vibration, and environmental sensors, along with audio and vision data. The types of data available to draw insights from inform the types of neural architectures relevant to Edge AI and largely dictate its performance and memory needs.

AI workloads need memory (capacity and bandwidth) and compute capacity, which must be balanced to avoid bottlenecks. NPUs accelerate compute without adding memory; some Edge AI workloads benefit from this, but most do not. Examples where an NPU may be more useful include real-time complex audio processing and real-time video analytics.
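A roofline-style check makes this balance concrete: an NPU raises peak compute but not memory bandwidth, so it only helps workloads that are limited by compute. The throughput and bandwidth figures below are hypothetical:

```python
# Roofline-style check of whether a workload is compute-bound or
# memory-bound. An NPU only helps on the compute-bound side, since it
# adds math throughput but no memory bandwidth. Figures are illustrative.

def bound(ops, bytes_moved, peak_gops, bandwidth_gbps):
    """Return which resource limits the workload."""
    compute_time = ops / (peak_gops * 1e9)
    memory_time = bytes_moved / (bandwidth_gbps * 1e9)
    return "compute-bound" if compute_time > memory_time else "memory-bound"

# A small sensor model that streams its weights from memory each inference:
print(bound(ops=2e6, bytes_moved=1e6, peak_gops=10, bandwidth_gbps=1))
# A heavier vision model doing far more math per byte fetched:
print(bound(ops=5e9, bytes_moved=2e6, peak_gops=10, bandwidth_gbps=1))
```

In this sketch, the small always-on sensor model spends more time moving data than computing, so an NPU would sit idle waiting on memory; only the heavier vision workload would use the extra compute.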

Another argument often used to justify NPUs is the concept of 'race to sleep.' In battery-powered environments, the traditional way of saving power is to stay in sleep mode as long as possible. However, advances in microcontroller power efficiency are making the race to sleep less relevant, less compelling, or even unnecessary.
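The trade-off can be sketched numerically. All power figures below are hypothetical, chosen only to show how a sufficiently efficient always-on core can undercut the race-to-sleep strategy:

```python
# Energy comparison behind "race to sleep": burst at high power and then
# sleep, versus running a very efficient core continuously. All numbers
# are illustrative assumptions, not measurements.

def energy_mj(active_mw, active_s, sleep_mw, sleep_s):
    """Total energy in millijoules over one duty period."""
    return active_mw * active_s + sleep_mw * sleep_s

period = 10.0  # seconds per unit of work

# Race to sleep: burst at 50 mW for 0.5 s, sleep at 0.01 mW for the rest.
race = energy_mj(50, 0.5, 0.01, period - 0.5)
# Efficient core: 2 mW for the full period, never needing to sleep.
slow = energy_mj(2, period, 0.0, 0.0)

print(f"race-to-sleep: {race:.3f} mJ, efficient always-on core: {slow:.1f} mJ")
```

With these placeholder numbers the efficient always-on core uses less energy than racing to sleep, which is the article's point: once active power drops far enough, the case for an NPU built around bursty compute weakens.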

Finally, Large Language Models (LLMs) such as ChatGPT may make a case for an NPU, but they again fall outside the definition of Edge AI. These are massive computations, requiring the largest compute platforms ever built, with compute needs in the petaflop range; they are not a practical consideration for Edge devices.

This does not mean that always-on Edge AI isn't capable of deep insights, only that LLMs are not the best approach for these constrained devices. For limited domains like health analytics, semantic embedding models or distilled foundation models are a much better alternative and yield similar experiences. Because these models are not real-time, they can be implemented without NPUs.

The Future of Edge AI and Semiconductors

Despite the challenges of making Edge AI possible, ongoing advancements in hardware design and optimization techniques are steadily overcoming these obstacles. Technologies such as Edge computing and energy-efficient processors pave the way for more efficient on-device AI implementations.

The benefits of Edge computing are reshaping the technological landscape, especially as manufacturers rely on semiconductors to meet this demand. Offering improved privacy, security, speed, and user experiences, while designing for power, memory, and computation, will help embed AI into our daily lives.