Revolutionizing Mobile AI: How NetsPresso® Turbocharges Semantic Segmentation Models for Real-Time Performance
Introduction
In the world of artificial intelligence, semantic segmentation stands out as a crucial task. A semantic segmentation model assigns a class label to every pixel, making it possible to detect specific target objects within images or videos. In this article, our focus narrows down to a rather familiar target - "people." The applications of person segmentation are diverse, ranging from background removal during video conferencing and video calls to photo editing.
However, here's the catch. Semantic segmentation models demand a substantial amount of computational power. Why? Because they must meticulously classify every single pixel within an input image. For instance, they need to determine whether a given pixel belongs to the background or represents a person. This high computational complexity poses a significant challenge, especially when real-time background removal is a necessity, as in the case of video conferencing or video calls.
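To make the per-pixel cost concrete, here is a minimal NumPy sketch of what a segmentation head ultimately does: every pixel gets its own class decision, so the work scales with the image resolution. The array shapes and values below are illustrative toys, not PIDNet's actual outputs.

```python
import numpy as np

# Toy logits for a 2-class (background vs. person) segmentation head:
# shape (num_classes, height, width). A real model would emit these
# for every pixel of a full-resolution frame.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 4))

# Per-pixel classification: each pixel takes the class with the
# highest score (0 = background, 1 = person).
mask = np.argmax(logits, axis=0)  # shape (4, 4)

# Every one of the H * W pixels is classified individually, which is
# why the compute cost grows directly with input resolution.
num_pixels = mask.size
print(num_pixels)  # 16 pixels, each with its own label
```

At video-call resolutions this becomes hundreds of thousands of per-pixel decisions per frame, which is the computational burden the rest of this article works to reduce.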
Test Platform
Our journey into the world of mobile optimization begins with a specific set of tools and hardware:
Operating System: Android
AI Model: PIDNet (Xu et al., CVPR 2023)
AI Model Format: ONNX
Target Hardware: Galaxy S23 (SM-S911**)
Additionally, we have made our project available on GitHub, making it accessible to fellow developers and AI enthusiasts. You can explore our work at this GitHub repository. We're proud to acknowledge the contributions of YoonJae Yang and Hyungjun Lee to this project.
Scenario Using NetsPresso®
Now, let's delve into a critical scenario where we explore how to optimize segmentation models for mobile devices using NetsPresso®.
Initial Challenge
We embarked on our journey using PIDNet, a cutting-edge model known for its efficiency in segmentation tasks. However, despite its prowess, we encountered a significant obstacle: a throughput of just 1.86 FPS (frames per second). To put it simply, each frame took more than half a second to process - a noticeable lag that isn't acceptable for real-time applications.
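A quick back-of-the-envelope conversion shows how far the reported 1.86 FPS is from real-time. The 30 FPS budget below is a common assumption for smooth video calls, not a figure from the original measurements.

```python
# Convert the measured frame rate into per-frame latency and compare it
# against a typical 30 FPS real-time budget (an assumed target, not a
# number from the original experiments).
fps = 1.86
latency_ms = 1000.0 / fps            # time to process one frame
budget_ms = 1000.0 / 30              # ~33 ms available per frame at 30 FPS

print(round(latency_ms))             # ~538 ms per frame
print(round(latency_ms / budget_ms)) # ~16x over the real-time budget
```

In other words, the uncompressed model is roughly an order of magnitude too slow for live background removal, which motivates the compression experiments that follow.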
The Power of Compression
To address this challenge, we turned to NetsPresso® and harnessed its structured pruning capabilities. Our objective was clear: compress the model without sacrificing performance. We experimented with two levels of compression - 40% and 50%.
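To illustrate the idea behind structured pruning, here is a generic NumPy sketch that removes whole output channels from a convolution weight tensor, ranking channels by L2 norm. This is a simplified illustration of channel pruning in general; NetsPresso's actual pruning criterion and implementation are not described in this article and may differ.

```python
import numpy as np

# Illustrative structured (channel) pruning: drop entire output channels
# of a conv weight, ranked by L2 norm. The tensor sizes are toys; the
# ranking criterion is a common choice, assumed here for illustration.
rng = np.random.default_rng(42)
weight = rng.normal(size=(10, 3, 3, 3))  # (out_ch, in_ch, kH, kW)

ratio = 0.4  # prune 40% of channels, matching the first experiment
norms = np.linalg.norm(weight.reshape(weight.shape[0], -1), axis=1)
n_keep = int(round(weight.shape[0] * (1 - ratio)))
keep = np.sort(np.argsort(norms)[-n_keep:])  # strongest channels survive

pruned = weight[keep]
print(pruned.shape)  # (6, 3, 3, 3): 6 of the original 10 channels remain
```

Because whole channels disappear, the pruned layer is a smaller dense convolution that runs faster on ordinary mobile hardware - unlike unstructured (per-weight) pruning, which needs sparse kernels to pay off.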
40% Compression
Our first experiment was a revelation. By compressing the model by 40%, we not only managed to maintain nearly the same accuracy (measured by mIoU) but also achieved a significant reduction in latency. We compared the original model's performance with the 40% compressed model, and the results were impressive.
Table 1: Performance Comparison - Original vs. 40% Compressed Model
The 40% compressed model significantly reduced the time taken to produce results compared to the original model.
50% Compression
Encouraged by the success of our first experiment, we pushed the boundaries further by compressing the model by 50%. This time, there was some loss in accuracy, but the gains in latency were noteworthy.
Even at 50% compression, we significantly reduced latency while accepting only a slight dip in accuracy.
Model Performance
Table 2: Summary of the performance of all PIDNet models discussed