Bridging the Edge-Cloud Barrier for Real-time Advanced Vision Analytics

Yiding Wang (HKUST), Weiyan Wang (HKUST), Junxue Zhang (HKUST), Junchen Jiang (University of Chicago), Kai Chen (HKUST)

Abstract

Advanced vision analytics plays a key role in a plethora of real-world applications. Unfortunately, many of these applications fail to leverage the abundant compute resources in cloud services, because they require both high computing power and high-quality video input, but the (wireless) network connections between visual sensors (cameras) and the cloud/edge servers do not always provide sufficient and stable bandwidth to stream high-fidelity video data in real time. This paper presents CloudSeg, an edge-to-cloud framework for advanced vision analytics that co-designs the cloud-side inference with real-time video streaming to achieve both low latency and high inference accuracy. The core idea is to send the video stream in low resolution, but recover the high-resolution frames from the low-resolution stream via a super-resolution procedure tailored for the actual analytics tasks. In essence, CloudSeg trades additional cloud-side computation (super-resolution) for significantly reduced network bandwidth. Our initial evaluation shows that, compared to previous work, CloudSeg can reduce bandwidth consumption by ∼6.8× with a negligible drop in accuracy.

1 Introduction

Recent years have seen an explosive growth of real-world vision-based applications, primarily driven by advances in traditionally challenging vision tasks, e.g. multiple object detection [21, 24], semantic segmentation [14, 29], instance segmentation [8, 25], and panoptic segmentation [12, 13]. To obtain adequate inference accuracy, these tasks often require both high computation power and high-resolution images (or video streams). This, however, poses a fundamental challenge to real-time vision-based applications. On the one hand, many video analytics tasks have been optimized for cloud environments (e.g. [10, 28]). This seems to suggest one should send data via the bandwidth-limited connection to the cloud, in the hope that the sophisticated cloud-side model can still extract enough information from the limited data. This hope, unfortunately, turns out to be illusory for advanced vision analytics tasks: while reducing video resolution (or frame rate) does save bandwidth, it nevertheless inflicts a non-trivial drop in inference accuracy [4, 27]. On the other hand, some real-time advanced vision applications, e.g. autonomous driving, put expensive hardware accelerators [15] on edge devices to perform local inference. However, this approach does not make much economic sense when future applications require large-scale deployment, e.g. fleets of delivery vehicles [23].

In this paper, we present CloudSeg, an edge-to-cloud video analytics framework that optimizes for both high accuracy and low latency. CloudSeg lowers the quality at which the video is sent to the cloud, but then runs a super-resolution (SR) procedure at the cloud server to reconstruct high-quality videos before executing the actual video analytics (video segmentation, object detection, etc.). This approach is in the same spirit as prior applications of SR, where high-quality images are needed but only low-quality images are available [7]. What is new is our finding that it can strike a desirable balance between accuracy and latency in the edge-to-cloud analytics setting. Essentially, running SR uses much less cloud resource and causes less delay than the actual inference, yet it can restore the video quality so that the analytics task achieves the same accuracy as if the video were streamed in high quality.

That said, we found that current SR models do not always perform as well as expected. This is because traditional SR models seek to retain pixel-level details (i.e., minimizing visual quality loss), which does not always retain the information needed by vision analytics. A notable example of this mismatch is the recovery of small details such as distant pedestrians. Traditional SR models, trained to uniformly recover all pixels to meet a given target quality, may fail to recover enough detail for small objects compared with large ones, making small objects hard to identify or segment. However, these small objects are just as crucial as large objects to the accuracy of vision tasks and the practicality of applications such as autonomous driving.

To address this limitation of SR, we train our SR model in such a way that it reduces both the visual quality loss and the

[Figure 1: CloudSeg framework overview. Client side: camera, downsampling, frame selection, and adaptive controller; the low-res video is streamed to the server, which runs super-resolution and the advanced vision analytics model.]

accuracy loss of the analytics task. Given an existing SR model, which is essentially a deep neural network (DNN), we use an additional training process to fine-tune the weights of the SR model to minimize the accuracy loss of the super-resolved frames on the cloud-side analytics model, as shown in Figure 2. To this end, the fine-tuning process uses the difference in inference accuracy between the original frames and the super-resolved frames as the loss function (§3.1).

We further integrate CloudSeg with analytics models that use the popular pyramid structure [16, 24, 29] to reduce unnecessary downsampling overhead by reusing low-resolution data (§3.2). In addition, we adaptively select useful frames for instance-level tasks with a 2-level frame selector to further reduce overhead while keeping good trackability. Finally, to cope with bandwidth fluctuations, inspired by prior work [27], we adapt the video resolution and frame rate to the available bandwidth (§3.3). Our preliminary results show that CloudSeg can on average save ∼6.8× bandwidth compared to a recently proposed baseline [27] while achieving the same inference accuracy.

2 Background

2.1 Requirements of advanced vision analytics

This work considers advanced vision analytics tasks that require low latency and high inference accuracy. For example, for autonomous driving and multiple object detection applications, small and distant objects still matter, so high-resolution input is necessary; for autonomous driving and robotics applications, high-frame-rate input is essential to ensure trackability, because scenes generally change fast and real-time interaction requires low latency.

To achieve desirable accuracy, these advanced vision analytics tasks need to run highly complex models, increasingly in the form of deep neural networks (DNNs), on expensive hardware (GPUs) as well as on high-resolution inputs. For example, the state-of-the-art real-time object detection model SSD [17] runs on 300×300 inputs at 59 FPS (frames per second), while the real-time accurate semantic segmentation model ICNet [29] runs at 27 FPS on 2048×1024 inputs, both on an Nvidia Titan X.

2.2 Video streaming for vision analytics

In many real-time video analytics applications, however, it is fundamentally challenging to colocate expensive compute resources with high-fidelity video data, considering scalability and cost. With more edge devices deployed in geographically distributed locations, how to collect their video streams in the cloud for analytics without using too much bandwidth has attracted much attention.

The conventional wisdom has been that an edge device should compress its video, via pixel-level (spatial) downsampling and frame-level (temporal) downsampling, while ensuring that sufficient information is retained, so that the cloud server can still run the vision analytics model on the downsampled video and produce highly accurate inference as if the video were not compressed. Specifically, AWStream [27] learns a Pareto-optimal policy and adaptively selects a data-rate degradation strategy to meet the accuracy and bandwidth trade-off over the wide-area network for video object detection. FilterForward [3] filters relevant video frames on the edge with small neural networks to save bandwidth, in the same spirit as prior filter-based frameworks [4, 11, 20].

As we will see in §4.2, while this approach [27] works to some extent, it ultimately imposes a hard trade-off: at some point, when the frame rate needs to be kept high for advanced applications, more aggressive video downsampling always inflicts a non-trivial drop in accuracy. As a result, it cannot be directly applied to serve advanced vision analytics.

2.3 Super-resolution for vision analytics

Our solution is based on recent advances in super-resolution (SR) techniques. Ideally, an SR model can reconstruct a high-resolution scene from a low-resolution one, by inferring details based only on information in the low-resolution input. Recently, DNN-based SR models have significantly improved SR performance [2, 9]. Prior work has shown that SR is a promising approach to improving video streaming quality [26] and boosting vision analytics accuracy [7] when only low-resolution videos are available.

Our work differs from the prior work in two important aspects. First, we show that by applying SR to the downsampled video, the resulting reconstructed high-resolution video can usually produce almost the same accuracy as if the video had not been downsampled. Although this result is not surprising, it suggests that SR can serve as architectural “glue” between the video encoding stack (for saving bandwidth) and the video analytics (for maximizing accuracy). Second, through experiments, we also shed light on the limitations of current SR models, which are tailored to retain visual information rather than to maximize analytics accuracy. We therefore present a new way of training SR models such that the resulting model maximizes both the post-SR visual quality and the analytics accuracy.
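To make the idea concrete, here is a minimal PyTorch-style sketch of such analytics-aware training; the module names are placeholders, and the exact loss composition (an L1 pixel term plus a KL term between the two inference outputs) is our assumption, not the paper's stated formulation:

    # Sketch of analytics-aware SR fine-tuning (hypothetical names; the loss
    # form is an assumption -- the paper only states that the difference
    # between inference on original and super-resolved frames is penalized).
    import torch
    import torch.nn.functional as F

    def fine_tune_step(sr_model, seg_model, lr_frame, hr_frame, optimizer, alpha=0.5):
        # The analytics model stays frozen; the optimizer holds only
        # sr_model's parameters.
        seg_model.eval()
        for p in seg_model.parameters():
            p.requires_grad_(False)

        sr_frame = sr_model(lr_frame)              # reconstruct HR from LR

        # Visual-quality term: standard pixel-level reconstruction loss.
        pixel_loss = F.l1_loss(sr_frame, hr_frame)

        # Accuracy term: make inference on the SR frame match inference on HR.
        with torch.no_grad():
            hr_logits = seg_model(hr_frame)        # reference output
        sr_logits = seg_model(sr_frame)            # gradients reach sr_model
        task_loss = F.kl_div(F.log_softmax(sr_logits, dim=1),
                             F.softmax(hr_logits, dim=1),
                             reduction="batchmean")

        loss = alpha * pixel_loss + (1 - alpha) * task_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()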

Edge-side 2-level frame selection CloudSeg unifies the frame selection processes required by both the video streaming framework and the vision model. Originally, the video streaming framework skips stale frames to save bandwidth and retain trackability in instance-level tasks [8, 25], while in fast-inference vision models [14, 22, 30], key frame feature propagation reduces the computation load by running heavy inference only on key frames. CloudSeg conducts a 2-level frame selection only once, on the edge side; this saves computation overhead on the server, and the frame selection on the edge is more accurate because it uses the criteria of the vision task itself.

We define the frames that are necessary to stream as useful frames; key frames can then be seen as the most useful frames. Intuitively, when the scene is changing rapidly, useful and key frames are more concentrated than when the scene is stable, so the criterion for frame selection is the pixel deviation of the task output (e.g. the segmentation map) of the current frame from that of the previous key frame. Previous work [14] devises a small, fast neural network that takes the differences between the low-level features of the current frame and the previous key frame as input, and predicts the deviation of the segmentation maps to select key frames. If the predicted deviation exceeds a pre-defined threshold, the current frame is set as a key frame, instead of selecting key frames at fixed intervals or with simple heuristics.

[Figure 3: Adaptive 2-level frame selection. The estimated deviation of each frame is compared against two thresholds, classifying it as a skipped frame, useful frame, or key frame.]

CloudSeg learns CV wisdom. We adapt this filtering method to our 2-level frame selector and deploy it on the edge device, where it works in parallel with super-resolution (§3.1). As Figure 3 shows, the two thresholds target different frames: the higher one identifies key frames, the lower one identifies useful frames, and the remaining stale frames are not streamed to the server. Both thresholds are set by the adaptive controller of §3.3, so they can be updated according to network conditions and application requirements. Useful frames and tagged key frames are streamed to the server and are compatible with the key frame feature propagation structure. For an instance-level model without a key frame scheme, the selector falls back to a single-level useful-frame filter to save bandwidth.
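A minimal sketch of this 2-level selection logic; the deviation value is assumed to come from the small prediction network of [14], and the threshold values from the adaptive controller (§3.3):

    # Edge-side 2-level frame selection (sketch; names are hypothetical).
    def select_frame(deviation, key_thresh, useful_thresh):
        """Classify a frame by its predicted output deviation from the last
        key frame; key_thresh > useful_thresh, both set by the controller."""
        if deviation >= key_thresh:
            return "key"      # streamed and tagged; becomes the new key frame
        if deviation >= useful_thresh:
            return "useful"   # streamed; server propagates key-frame features
        return "skipped"      # stale frame, not streamed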

Low-resolution data reusing In parallel with super-resolution, if the cloud-side vision model uses a pyramid structure, CloudSeg processes the received low-resolution data into a set of suitable resolutions and feeds them to the model, thus reducing the overhead of repeated super-resolving and downsampling. The pyramid structure [16] lets the vision model process high-resolution input together with several lower resolutions for fast inference while preserving accuracy [24, 29]. Here we take ICNet [29] as an example in our refined pipeline. ICNet builds an inference path that employs information in the low-resolution frames along with details from the high-resolution frames to achieve both low latency and high accuracy. For example, ICNet downsamples the 2048×1024 (HR) input by 2× (MR) and 4× (LR) respectively to feed the pyramid network. For such a pyramid structure, a naive server-side workflow would let the SR model upsample the LR input by 4× to HR, then let ICNet downsample HR back to MR and LR to run inference with its multiple branches. This naive pipeline introduces repeated computation and data quality loss. CloudSeg refines this pipeline by reusing the LR data: it applies the most suitable super-resolution and downsampling policy, then directly feeds the LR and post-SR frames to ICNet without the unnecessary downsampling, as illustrated in Figure 1.
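A minimal sketch of this refined pipeline, assuming a 4× SR model and ICNet-style HR/MR/LR branches (`sr_model` is a placeholder for the actual SR network):

    # Refined server-side pipeline for a pyramid model (sketch): the received
    # 512x256 LR frame feeds the LR branch directly; only the MR input is
    # derived from the super-resolved HR frame.
    import torch.nn.functional as F

    def pyramid_inputs(lr_frame, sr_model):
        hr_frame = sr_model(lr_frame)               # 4x SR: 512x256 -> 2048x1024
        mr_frame = F.interpolate(hr_frame, scale_factor=0.5,
                                 mode="bilinear", align_corners=False)  # 2x branch
        return hr_frame, mr_frame, lr_frame         # LR branch reuses the input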

3.3 Adaptive bitrate controlling

While SR handles the latency/accuracy trade-off well in general (as shown in §4), it may fail in certain extreme cases, such as those caused by scene variance, e.g. light and weather changes, or glitches (worst cases) of SR. The blue line in Figure 4 shows the inference accuracy (mIoU) on a 30-second clip (experiment setting in §4). The minimum accuracy (≤0.6) is unacceptable for real-world applications, even though the average is not that bad. This problem can be addressed by streaming a higher-resolution video to the backend model, or even bypassing SR, as the red dashed line shows.

[Figure 4: Variance of the inference performance with SR. Per-frame mIoU with SR vs. a fixed higher-resolution stream, together with their averages.]

To that end, we adopt an adaptive bitrate controller, similar to prior work [27], to handle the variance of network conditions, real-world scene changes, and performance drops of SR. Basically, it gathers network information from the transport layer, e.g. bandwidth and network latency, as well as application performance from the application layer, e.g. inference accuracy and computation time. Through offline/online profiling and training, we can learn a model and find a suitable knob policy, including the downsampling rate, frame rate, and frame-selection thresholds, with little overhead.
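A minimal sketch of how such a profiled controller might pick its knobs; the profile table below is hypothetical (entries loosely based on the numbers in §4), whereas the real controller learns these profiles through offline/online profiling:

    # Adaptive knob selection from an offline-learned profile (sketch).
    PROFILES = [
        # (bandwidth_kbps, downsample, fps, key_thresh, useful_thresh, est_miou)
        (10000, 1, 17, 0.15, 0.05, 0.67),   # no degradation, bypass SR
        (5100,  2, 17, 0.15, 0.05, 0.66),   # hypothetical mid profile
        (750,   4, 17, 0.15, 0.05, 0.649),  # CloudSeg default (4x + SR-FT)
    ]

    def pick_policy(available_kbps, min_miou):
        feasible = [p for p in PROFILES
                    if p[0] <= available_kbps and p[5] >= min_miou]
        if not feasible:
            return min(PROFILES, key=lambda p: p[0])  # degrade as far as possible
        return min(feasible, key=lambda p: p[0])      # cheapest acceptable knobs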

4 Preliminary results

We implement a prototype of CloudSeg and conduct experiments on the Cityscapes [5] dataset. We use the semantic segmentation model ICNet [29] as our cloud-side vision model. Preliminary results show that CloudSeg can achieve real-time advanced vision analytics over the cloud with low bandwidth consumption and negligible accuracy loss.

4.1 Analytics-aware super-resolution

We compare the similarity criteria (PSNR, SSIM) and the inference accuracy criterion (mIoU) of a semantic segmentation task using the SR model with and without analytics-aware fine-tuning. HR is the 2048×1024 frame. We obtain the LR frame by resizing HR to 512×256 with bilinear interpolation, the default resize algorithm of TensorFlow [1]; this reduces the video size by 13.3×. We then upsample LR to the original resolution with three methods: bilinear, content-aware SR, and analytics-aware SR (SR-FT). The standard inference model ICNet is trained on the Cityscapes [5] training set, and mIoU is tested on the validation set. The mIoU of HR matches the performance claimed in the ICNet repository^1. PSNR and SSIM are both calculated over the RGB channels, so the exact values differ from the original paper, which calculates them over the luminance channel. Our fine-tuned SR model achieves better inference accuracy than the vanilla SR model: it improves the reconstruction of small details, e.g. sharper edges of people in the distance, which are important for the target advanced vision applications.

Metrics     Bilinear   SR      SR-FT   HR
PSNR (dB)   31.00      35.21   35.44   —
SSIM        0.936      0.970   0.968   —
mIoU        0.582      0.633   0.649   0.

Table 1: Performance of different upsampling methods

^1 https://github.com/hszhao/ICNet
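As noted above, the PSNR values in Table 1 are computed over all RGB channels rather than the luminance channel. A minimal sketch of RGB-channel PSNR (the standard definition, not code from the paper):

    # PSNR over all RGB channels of 8-bit frames (sketch); luminance-only
    # PSNR, as in the original SR papers, would give different values.
    import numpy as np

    def psnr_rgb(ref, test, max_val=255.0):
        mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
        return 10.0 * np.log10(max_val ** 2 / mse)  # in dB; higher is better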

4.2 Bandwidth consumption

Cityscapes [5] dataset videos are 2048×1024 at 17 FPS, consisting of 8-bit RGB frames. Following the state-of-the-art streaming analytics framework AWStream [27], videos are encoded in H.264. In this setting, the original 2048×1024 video consumes 10 Mbps of bandwidth. With the SR method introduced in CloudSeg, a video can be adaptively downsampled by different factors. Here we downsample the video by 4× to 512×256. It consumes 750 kbps of bandwidth, 13.3× less than the original high-resolution video.

We further compare the bandwidth consumption of CloudSeg with AWStream. Note that for the pixel-level semantic segmentation task here, we stream all the frames, and frames are degraded only in resolution. To achieve the same accuracy as CloudSeg, AWStream can only downsample the video to 1440×720. It consumes 5.1 Mbps of bandwidth, 6.8× more than ours, as shown in Figure 5.
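As a quick sanity check, the reported ratios follow directly from the numbers in this subsection:

    # Bandwidth ratios (values copied from this subsection).
    original_kbps = 10000  # 2048x1024 H.264 stream, no degradation
    cloudseg_kbps = 750    # 512x256 (4x per-dimension downsampling)
    awstream_kbps = 5100   # 1440x720, needed by AWStream for equal accuracy

    print(original_kbps / cloudseg_kbps)  # ~13.3x reduction vs. the original
    print(awstream_kbps / cloudseg_kbps)  # ~6.8x: CloudSeg's saving vs. AWStream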

[Figure 5: Bandwidth consumption to achieve comparable accuracy. Bars compare bandwidth consumption (kbps, 0–10000) and accuracy (mIoU) for no degradation (2048×1024), AWStream (1440×720), our framework (512×256), and bilinear (512×256).]

4.3 Inference latency

Besides the network latency, which is greatly reduced by our SR-based low-resolution streaming, the other major latency comes from the SR and vision model inference on the cloud server. We test the average inference time of super-resolving Cityscapes frames from 512×256 to 2048×1024 and of semantic segmentation (ICNet) on a single Nvidia V100 GPU. The results are shown in Table 2. The pipeline of SR and semantic segmentation works at 23.5 FPS. Considering that the framework overhead (e.g. image loading, client-side processing) takes a rather small fraction, CloudSeg can run in real time.

Model                   Time (ms)   Frame rate (FPS)
Super-Resolution        6.2         161.3
Semantic Segmentation   36.3        27.5
Total                   42.5        23.5

Table 2: Inference time per frame
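The frame rates in Table 2 follow directly from the per-frame times; a quick check (values from the table):

    # Sanity check of Table 2: stage latencies in ms and the implied FPS.
    sr_ms, seg_ms = 6.2, 36.3
    total_ms = sr_ms + seg_ms                # 42.5 ms end-to-end per frame
    for ms in (sr_ms, seg_ms, total_ms):
        print(round(1000.0 / ms, 1))         # 161.3, 27.5, 23.5 FPS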

[14] Yule Li, Jianping Shi, and Dahua Lin. Low-latency video semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5997–6005, 2018.

[15] Shih-Chieh Lin, Yunqi Zhang, Chang-Hong Hsu, Matt Skach, Md E Haque, Lingjia Tang, and Jason Mars. The architectural implications of autonomous driving: Constraints and acceleration. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 751–766. ACM, 2018.

[16] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.

[17] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.

[18] Simone Meyer, Abdelaziz Djelouah, Brian McWilliams, Alexander Sorkine-Hornung, Markus Gross, and Christopher Schroers. PhaseNet for video frame interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 498–507, 2018.

[19] Simon Niklaus and Feng Liu. Context-aware synthesis for video frame interpolation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1701–1710, 2018.

[20] Chrisma Pakha, Aakanksha Chowdhery, and Junchen Jiang. Reinventing video streaming for distributed vision analytics. In 10th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 18), 2018.

[21] Vit Ruzicka and Franz Franchetti. Fast and accurate object detection in high resolution 4K and 8K video using GPUs. In 2018 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–7. IEEE, 2018.

[22] Evan Shelhamer, Kate Rakelly, Judy Hoffman, and Trevor Darrell. Clockwork convnets for video semantic segmentation. In European Conference on Computer Vision, pages 852–868. Springer, 2016.

[23] Matt Simon and Arielle Pardes. The prime challenges for Scout, Amazon's new delivery robot. Wired.

[24] Bharat Singh, Mahyar Najibi, and Larry S Davis. SNIPER: Efficient multi-scale training. In Advances in Neural Information Processing Systems, pages 9333–9343, 2018.

[25] Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, and Raquel Urtasun. MultiNet: Real-time joint semantic reasoning for autonomous driving. In 2018 IEEE Intelligent Vehicles Symposium (IV), pages 1013–1020. IEEE, 2018.

[26] Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, and Dongsu Han. Neural adaptive content-aware internet video delivery. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 645–661, 2018.

[27] Ben Zhang, Xin Jin, Sylvia Ratnasamy, John Wawrzynek, and Edward A Lee. AWStream: Adaptive wide-area streaming analytics. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pages 236–252. ACM, 2018.

[28] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J Freedman. Live video analytics at scale with approximation and delay-tolerance. In NSDI, volume 9, page 1, 2017.

[29] Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, and Jiaya Jia. ICNet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 405–420, 2018.

[30] Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, and Yichen Wei. Deep feature flow for video recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2349–2358, 2017.