






EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Mingxing Tan¹  Quoc V. Le¹

¹Google Research, Brain Team, Mountain View, CA. Correspondence to: Mingxing Tan <tanmingxing@google.com>.
Abstract

Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
1. Introduction

Scaling up ConvNets is widely used to achieve better accuracy. For example, ResNet (He et al., 2016) can be scaled up from ResNet-18 to ResNet-200 by using more layers; recently, GPipe (Huang et al., 2018) achieved 84.3% ImageNet top-1 accuracy by scaling up a baseline model to be four times larger. However, the process of scaling up ConvNets
Preprint, to appear in ICML 2019.
[Figure 1 plot: number of parameters (millions) vs. ImageNet top-1 accuracy (%) for ResNet, DenseNet, Inception-v2, Inception-ResNet-v2, ResNeXt-101, Xception, NASNet-A, AmoebaNet-A/C, SENet, and EfficientNet-B0 through B7. Inset table:]

Model                            Top-1 Acc.   #Params
ResNet-152 (He et al., 2016)     77.8%        60M
EfficientNet-B1                  78.8%        7.8M
ResNeXt-101 (Xie et al., 2017)   80.9%        84M
EfficientNet-B3                  81.1%        12M
SENet (Hu et al., 2018)          82.7%        146M
NASNet-A (Zoph et al., 2018)     82.7%        89M
EfficientNet-B4                  82.6%        19M
GPipe (Huang et al., 2018)       84.3%        556M
EfficientNet-B7†                 84.4%        66M
†Not plotted
Figure 1. Model Size vs. ImageNet Accuracy. All numbers are for single-crop, single-model. Our EfficientNets significantly outperform other ConvNets. In particular, EfficientNet-B7 achieves new state-of-the-art 84.4% top-1 accuracy while being 8.4x smaller and 6.1x faster than GPipe. EfficientNet-B1 is 7.6x smaller and 5.7x faster than ResNet-152. Details are in Tables 2 and 4.
has never been well understood, and there are currently many ways to do it. The most common way is to scale up ConvNets by their depth (He et al., 2016) or width (Zagoruyko & Komodakis, 2016). Another less common, but increasingly popular, method is to scale up models by image resolution (Huang et al., 2018). In previous work, it is common to scale only one of the three dimensions – depth, width, or image size. Though it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency.

In this paper, we want to study and rethink the process of scaling up ConvNets. In particular, we investigate the central question: is there a principled method to scale up ConvNets that can achieve better accuracy and efficiency? Our empirical study shows that it is critical to balance all dimensions of network width/depth/resolution, and surprisingly such balance can be achieved by simply scaling each of them with a constant ratio. Based on this observation, we propose a simple yet effective compound scaling method. Unlike conventional practice that arbitrarily scales these factors, our method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients.
Figure 2. Model Scaling. (a) is a baseline network example; (b)-(d) are conventional scaling methods that each increase only one dimension of network width, depth, or resolution. (e) is our proposed compound scaling method that uniformly scales all three dimensions with a fixed ratio.
For example, if we want to use 2^N times more computational resources, then we can simply increase the network depth by α^N, width by β^N, and image size by γ^N, where α, β, γ are constant coefficients determined by a small grid search on the original small model. Figure 2 illustrates the difference between our scaling method and conventional methods.
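To make the arithmetic concrete, here is a minimal sketch (ours, not the paper's released code; the function name, baseline numbers, and coefficient values are illustrative assumptions) of how such constant coefficients scale a baseline configuration:

```python
import math

def compound_scale(base_depth, base_width, base_resolution, alpha, beta, gamma, n):
    """Scale a baseline configuration for roughly 2^n times more compute:
    depth grows by alpha^n, width (channels) by beta^n, and input
    resolution by gamma^n, mirroring the scheme described above."""
    return (
        int(math.ceil(base_depth * alpha ** n)),        # number of layers
        int(math.ceil(base_width * beta ** n)),         # number of channels
        int(math.ceil(base_resolution * gamma ** n)),   # input image side length
    )

# Example values only; here alpha * beta^2 * gamma^2 is close to 2.
depth, width, resolution = compound_scale(
    base_depth=18, base_width=32, base_resolution=224,
    alpha=1.2, beta=1.1, gamma=1.15, n=2)
print(depth, width, resolution)  # -> 26 39 297
```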
Intuitively, the compound scaling method makes sense because if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns in the bigger image. In fact, previous theoretical (Raghu et al., 2017; Lu et al., 2018) and empirical results (Zagoruyko & Komodakis, 2016) both show that there is a certain relationship between network width and depth, but to the best of our knowledge, we are the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution.
We demonstrate that our scaling method works well on existing MobileNets (Howard et al., 2017; Sandler et al., 2018) and ResNet (He et al., 2016). Notably, the effectiveness of model scaling heavily depends on the baseline network; to go even further, we use neural architecture search (Zoph & Le, 2017; Tan et al., 2019) to develop a new baseline network, and scale it up to obtain a family of models, called EfficientNets. Figure 1 summarizes the ImageNet performance, where our EfficientNets significantly outperform other ConvNets. In particular, our EfficientNet-B7 surpasses the best existing GPipe accuracy (Huang et al., 2018), while using 8.4x fewer parameters and running 6.1x faster on inference. Compared to the widely used ResNet (He et al., 2016), our EfficientNet-B4 improves the top-1 accuracy from 76.3% for ResNet-50 to 82.6% with similar FLOPS. Besides ImageNet, EfficientNets also transfer well and achieve state-of-the-art accuracy on 5 out of 8 widely used datasets, while
reducing parameters by up to 21x compared to existing ConvNets.
2. Related Work

ConvNet Accuracy: Since AlexNet (Krizhevsky et al., 2012) won the 2012 ImageNet competition, ConvNets have become increasingly accurate by going bigger.
ConvNet Efficiency: Deep ConvNets are often over-parameterized. Model compression (Han et al., 2016; He et al., 2018; Yang et al., 2018) is a common way to reduce model size by trading accuracy for efficiency. As mobile phones become ubiquitous, it is also common to hand-craft efficient mobile-size ConvNets, such as SqueezeNets (Iandola et al., 2016; Gholami et al., 2018), MobileNets (Howard et al., 2017; Sandler et al., 2018), and ShuffleNets (Zhang et al., 2018; Ma et al., 2018). Recently, neural architecture search has become increasingly popular for designing efficient mobile-size ConvNets (Tan et al., 2019).
[Figure 3 plots: three panels of ImageNet top-1 accuracy (%) vs. FLOPS (billions) when scaling the baseline with different width coefficients w (up to w=5.0), depth coefficients d (up to d=8.0), and resolution coefficients r (up to r=2.5).]
Figure 3. Scaling Up a Baseline Model with Different Network Width (w), Depth (d), and Resolution (r) Coefficients. Bigger networks with larger width, depth, or resolution tend to achieve higher accuracy, but the accuracy gain quickly saturates after reaching 80%, demonstrating the limitation of single-dimension scaling. The baseline network is described in Table 1.
Width (w): Scaling network width is commonly used for small size models (Howard et al., 2017; Sandler et al., 2018; Tan et al., 2019)². As discussed in (Zagoruyko & Komodakis, 2016), wider networks tend to be able to capture more fine-grained features and are easier to train. However, extremely wide but shallow networks tend to have difficulties in capturing higher level features. Our empirical results in Figure 3 (left) show that the accuracy quickly saturates when networks become much wider with larger w.
Resolution (r): With higher resolution input images, ConvNets can potentially capture more fine-grained patterns. Starting from 224x224 in early ConvNets, modern ConvNets tend to use 299x299 (Szegedy et al., 2016) or 331x331 (Zoph et al., 2018) for better accuracy. Recently, GPipe (Huang et al., 2018) achieves state-of-the-art ImageNet accuracy with 480x480 resolution. Higher resolutions, such as 600x600, are also widely used in object detection ConvNets (He et al., 2017; Lin et al., 2017). Figure 3 (right) shows the results of scaling network resolutions: higher resolutions indeed improve accuracy, but the accuracy gain diminishes for very high resolutions (r = 1.0 denotes resolution 224x224 and r = 2.5 denotes resolution 560x560).
The above analyses lead us to the first observation:
Observation 1 – Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accu- racy gain diminishes for bigger models.
3.3. Compound Scaling
We empirically observe that different scaling dimensions are not independent. Intuitively, for higher resolution images, we should increase network depth, such that the larger receptive fields can help capture similar features that include more pixels in bigger images. Correspondingly, we should also increase network width when resolution is higher, in order to capture more fine-grained patterns with more pixels in high resolution images.
²In some literature, scaling the number of channels is called the "depth multiplier", which means the same as our width coefficient w.
[Figure 4 plot: ImageNet top-1 accuracy (%) vs. FLOPS (billions) for width scaling under four baseline settings, ranging from (d=1.0, r=1.0) to (d=2.0, r=1.3).]
Figure 4. Scaling Network Width for Different Baseline Networks. Each dot on a line denotes a model with a different width coefficient (w). All baseline networks are from Table 1. The first baseline network (d=1.0, r=1.0) has 18 convolutional layers with resolution 224x224, while the last baseline (d=2.0, r=1.3) has 36 layers with resolution 299x299.
These intuitions suggest that we need to coordinate and balance different scaling dimensions rather than apply conventional single-dimension scaling. To validate our intuitions, we compare width scaling under different network depths and resolutions, as shown in Figure 4.
Observation 2 – In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.
In fact, a few prior works (Zoph et al., 2018; Real et al., 2019) have already tried to arbitrarily balance network width and depth, but they all require tedious manual tuning.
In this paper, we propose a new compound scaling method, which uses a compound coefficient φ to uniformly scale network width, depth, and resolution in a principled way:
depth:      d = α^φ
width:      w = β^φ
resolution: r = γ^φ
s.t.  α · β² · γ² ≈ 2,   α ≥ 1, β ≥ 1, γ ≥ 1        (3)
where α, β, γ are constants that can be determined by a small grid search. Intuitively, φ is a user-specified coefficient that controls how many more resources are available for model scaling, while α, β, γ specify how to assign these extra resources to network width, depth, and resolution respectively. Notably, the FLOPS of a regular convolution op is proportional to d, w², r², i.e., doubling network depth will double FLOPS, but doubling network width or resolution will increase FLOPS by four times. Since convolution ops usually dominate the computation cost in ConvNets, scaling a ConvNet with Equation 3 will approximately increase total FLOPS by (α · β² · γ²)^φ. In this paper, we constrain α · β² · γ² ≈ 2 so that for any new φ, the total FLOPS will approximately³ increase by 2^φ.
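As a quick check of this FLOPS argument, the following sketch (our illustration, not from the paper; the coefficient values are merely examples satisfying the constraint) computes the multiplier (α · β² · γ²)^φ and compares it with 2^φ:

```python
def flops_multiplier(alpha, beta, gamma, phi):
    """Total FLOPS grow roughly as (alpha * beta^2 * gamma^2)^phi:
    depth scales FLOPS linearly, while width and resolution each scale
    it quadratically for a regular convolution op."""
    return (alpha * beta ** 2 * gamma ** 2) ** phi

alpha, beta, gamma = 1.2, 1.1, 1.15   # example values with alpha * beta^2 * gamma^2 ~= 1.92
for phi in (1, 2, 3):
    print(phi, round(flops_multiplier(alpha, beta, gamma, phi), 2), 2 ** phi)
    # prints roughly: 1 1.92 2 / 2 3.69 4 / 3 7.08 8
```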
4. EfficientNet Architecture

Since model scaling does not change the layer operators F̂i in the baseline network, having a good baseline network is also critical. We will evaluate our scaling method using existing ConvNets, but in order to better demonstrate the effectiveness of our scaling method, we have also developed a new mobile-size baseline, called EfficientNet.
Inspired by (Tan et al., 2019), we develop our baseline network by leveraging a multi-objective neural architecture search that optimizes both accuracy and FLOPS. Specifically, we use the same search space as (Tan et al., 2019), and use ACC(m) × [FLOPS(m)/T]^w as the optimization goal, where ACC(m) and FLOPS(m) denote the accuracy and FLOPS of model m, T is the target FLOPS, and w = -0.07 is a hyperparameter controlling the trade-off between accuracy and FLOPS. Unlike (Tan et al., 2019; Cai et al., 2019), here we optimize FLOPS rather than latency since we are not targeting any specific hardware device. Our search produces an efficient network, which we name EfficientNet-B0. Since we use the same search space as (Tan et al., 2019), the architecture is similar to MnasNet.
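As an illustration of that objective (a sketch, not the authors' search code; only T = 400M FLOPS and w = -0.07 come from the text, and the function name is an assumption):

```python
TARGET_FLOPS = 400e6   # T: the FLOPS target mentioned in the text
TRADEOFF_W = -0.07     # w: accuracy/FLOPS trade-off exponent

def search_reward(accuracy, flops, target=TARGET_FLOPS, w=TRADEOFF_W):
    """Multi-objective reward ACC(m) x [FLOPS(m)/T]^w.
    With a negative w, models above the FLOPS target are penalized and
    cheaper models receive a mild bonus."""
    return accuracy * (flops / target) ** w

# A 76%-accurate model at exactly 400M FLOPS scores 0.760; the same
# accuracy at 800M FLOPS scores about 0.760 * 2 ** -0.07 ~= 0.724.
print(round(search_reward(0.76, 400e6), 3), round(search_reward(0.76, 800e6), 3))
```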
³FLOPS may differ from the theoretical value due to rounding.
Table 1. EfficientNet-B0 baseline network – each row describes a stage i with L̂i layers, input resolution ⟨Ĥi, Ŵi⟩, and output channels Ĉi. Notation is adopted from Equation 2.

Stage i   Operator F̂i              Resolution Ĥi × Ŵi   #Channels Ĉi   #Layers L̂i
1         Conv3x3                   224 × 224            32             1
2         MBConv1, k3x3             112 × 112            16             1
3         MBConv6, k3x3             112 × 112             24             2
4         MBConv6, k5x5             56 × 56              40             2
5         MBConv6, k3x3             28 × 28              80             3
6         MBConv6, k5x5             28 × 28              112            3
7         MBConv6, k5x5             14 × 14              192            4
8         MBConv6, k3x3             7 × 7                320            1
9         Conv1x1 & Pooling & FC    7 × 7                1280           1
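The stage layout in Table 1 can be written down as a small configuration structure; the sketch below (field and variable names are ours, not the released implementation) simply transcribes the table:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    operator: str     # F_i: block type
    resolution: int   # H_i = W_i: input spatial size
    channels: int     # C_i: output channels
    layers: int       # L_i: number of repeated layers

# EfficientNet-B0 stages, transcribed from Table 1.
EFFICIENTNET_B0 = [
    Stage("Conv3x3",               224, 32,   1),
    Stage("MBConv1, k3x3",         112, 16,   1),
    Stage("MBConv6, k3x3",         112, 24,   2),
    Stage("MBConv6, k5x5",          56, 40,   2),
    Stage("MBConv6, k3x3",          28, 80,   3),
    Stage("MBConv6, k5x5",          28, 112,  3),
    Stage("MBConv6, k5x5",          14, 192,  4),
    Stage("MBConv6, k3x3",           7, 320,  1),
    Stage("Conv1x1 & Pooling & FC",  7, 1280, 1),
]
```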
Our EfficientNet-B0 is slightly bigger than MnasNet due to the larger FLOPS target (our FLOPS target is 400M). Table 1 shows the architecture of EfficientNet-B0. Its main building block is the mobile inverted bottleneck MBConv (Sandler et al., 2018; Tan et al., 2019), to which we also add squeeze-and-excitation optimization (Hu et al., 2018).

Starting from the baseline EfficientNet-B0, we apply our compound scaling method to scale it up in two steps:

STEP 1: we first fix φ = 1, assuming twice more resources are available, and do a small grid search of α, β, γ based on Equation 3. In particular, we find the best values for EfficientNet-B0 are α = 1.2, β = 1.1, γ = 1.15, under the constraint α · β² · γ² ≈ 2.

STEP 2: we then fix α, β, γ as constants and scale up the baseline network with different φ using Equation 3, to obtain EfficientNet-B1 through B7.
Notably, it is possible to achieve even better performance by searching for α, β, γ directly around a large model, but the search cost becomes prohibitively more expensive on larger models. Our method solves this issue by only doing the search once on the small baseline network (step 1), and then using the same scaling coefficients for all other models (step 2).
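A minimal sketch of the step-1 grid search described above (our own illustration; the candidate grid, tolerance, and the evaluate callback are assumptions, not details given in the paper):

```python
import itertools

def grid_search_coefficients(evaluate, step=0.05, tol=0.05):
    """Step 1: with phi = 1 (about 2x resources), try (alpha, beta, gamma)
    combinations that satisfy alpha * beta^2 * gamma^2 ~= 2, and keep the
    triple whose scaled baseline reaches the best accuracy.

    `evaluate(alpha, beta, gamma)` is assumed to train the scaled baseline
    and return its validation accuracy."""
    candidates = [round(1.0 + i * step, 2) for i in range(11)]   # 1.00 .. 1.50
    best, best_acc = None, -1.0
    for alpha, beta, gamma in itertools.product(candidates, repeat=3):
        if abs(alpha * beta ** 2 * gamma ** 2 - 2.0) > tol:
            continue                                  # enforce the FLOPS constraint
        acc = evaluate(alpha, beta, gamma)            # accuracy of the scaled baseline
        if acc > best_acc:
            best, best_acc = (alpha, beta, gamma), acc
    return best

# Step 2 then fixes the returned (alpha, beta, gamma) and scales the baseline
# with larger phi to obtain the B1..B7 family.
```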
5. Experiments

In this section, we will first evaluate our scaling method on existing ConvNets and then on the newly proposed EfficientNets.
5.1. Scaling Up MobileNets and ResNets

As a proof of concept, we first apply our scaling method to the widely used MobileNets (Howard et al., 2017; Sandler et al., 2018) and ResNet (He et al., 2016). Table 3 shows the ImageNet results of scaling them in different ways. Compared to single-dimension scaling methods, our compound scaling method improves the accuracy on all these models, suggesting the effectiveness of our proposed scaling method for general existing ConvNets.
Table 5. EfficientNet Performance Results on Transfer Learning Datasets. Our scaled EfficientNet models achieve new state-of-the-art accuracy for 5 out of 8 datasets, with 9.6x fewer parameters on average.
Comparison to best publicly available results:
Dataset            Model          Acc.    #Param   Our Model         Acc.    #Param (ratio)
CIFAR-10           NASNet-A       98.0%   85M      EfficientNet-B0   98.1%   4M (21x)
CIFAR-100          NASNet-A       87.5%   85M      EfficientNet-B0   88.1%   4M (21x)
Birdsnap           Inception-v4   81.8%   41M      EfficientNet-B5   82.0%   28M (1.5x)
Stanford Cars      Inception-v4   93.4%   41M      EfficientNet-B3   93.6%   10M (4.1x)
Flowers            Inception-v4   98.5%   41M      EfficientNet-B5   98.5%   28M (1.5x)
FGVC Aircraft      Inception-v4   90.9%   41M      EfficientNet-B3   90.7%   10M (4.1x)
Oxford-IIIT Pets   ResNet-152     94.5%   58M      EfficientNet-B4   94.8%   17M (5.6x)
Food-101           Inception-v4   90.8%   41M      EfficientNet-B4   91.5%   17M (2.4x)
Geo-Mean                                                                     (4.7x)

Comparison to best reported results:
Dataset            Model    Acc.    #Param   Our Model         Acc.    #Param (ratio)
CIFAR-10           †GPipe   99.0%   556M     EfficientNet-B7   98.9%   64M (8.7x)
CIFAR-100          GPipe    91.3%   556M     EfficientNet-B7   91.7%   64M (8.7x)
Birdsnap           GPipe    83.6%   556M     EfficientNet-B7   84.3%   64M (8.7x)
Stanford Cars      ‡DAT     94.8%   -        EfficientNet-B7   94.7%   -
Flowers            DAT      97.7%   -        EfficientNet-B7   98.8%   -
FGVC Aircraft      DAT      92.9%   -        EfficientNet-B7   92.9%   -
Oxford-IIIT Pets   GPipe    95.9%   556M     EfficientNet-B6   95.4%   41M (14x)
Food-101           GPipe    93.0%   556M     EfficientNet-B7   93.0%   64M (8.7x)
Geo-Mean                                                               (9.6x)

†GPipe (Huang et al., 2018) trains giant models with a specialized pipeline parallelism library. ‡DAT denotes domain adaptive transfer learning (Ngiam et al., 2018). Here we only compare ImageNet-based transfer learning results. Transfer accuracy and #params for NASNet (Zoph et al., 2018), Inception-v4 (Szegedy et al., 2017), and ResNet-152 (He et al., 2016) are from (Kornblith et al., 2019).
[Figure 6 plots: eight panels of transfer accuracy (%) vs. number of parameters (millions, log scale) on CIFAR-10, CIFAR-100, Birdsnap, Stanford Cars, Flowers, FGVC Aircraft, Oxford-IIIT Pets, and Food-101, comparing DenseNet, GPipe, Inception-ResNet-v2, Inception-v3/v4, ResNet, NASNet-A, and EfficientNet.]
Figure 6. Model Parameters vs. Transfer Learning Accuracy – All models are pretrained on ImageNet and finetuned on new datasets.
5.2. ImageNet Results for EfficientNet

We train our EfficientNet models on ImageNet with weight decay 1e-5 and an initial learning rate of 0.256 that decays by 0.97 every 2.4 epochs. We also use the swish activation (Ramachandran et al., 2018; Elfwing et al., 2018), a fixed AutoAugment policy (Cubuk et al., 2019), and stochastic depth (Huang et al., 2016) with drop connect ratio 0.3. Since bigger models need more regularization, we linearly increase the dropout (Srivastava et al., 2014) ratio from 0.2 for EfficientNet-B0 to 0.5 for EfficientNet-B7.
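For reference, the hyperparameters stated above can be collected into a single configuration; the sketch below only restates values given in the text (the dictionary layout and the per-model dropout interpolation helper are our own assumptions):

```python
TRAIN_CONFIG = {
    "weight_decay": 1e-5,
    "initial_learning_rate": 0.256,
    "lr_decay_factor": 0.97,    # multiply the learning rate by 0.97...
    "lr_decay_epochs": 2.4,     # ...every 2.4 epochs
    "activation": "swish",
    "autoaugment": "fixed policy",
    "stochastic_depth_drop_connect": 0.3,
}

def dropout_rate(model_index, low=0.2, high=0.5, num_models=8):
    """Linearly interpolate dropout from 0.2 (B0) to 0.5 (B7)."""
    return low + (high - low) * model_index / (num_models - 1)

print(round(dropout_rate(0), 2), round(dropout_rate(7), 2))  # 0.2 for B0, 0.5 for B7
```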
Table 2 shows the performance of all EfficientNet models that are scaled from the same baseline EfficientNet-B0. Our EfficientNet models generally use an order of magnitude fewer parameters and FLOPS than other ConvNets with similar accuracy. In particular, our EfficientNet-B7 achieves 84.4% top-1 / 97.1% top-5 accuracy with 66M parameters and 37B FLOPS, being more accurate but 8.4x smaller than the previous best GPipe (Huang et al., 2018).
Figure 1 and Figure 5 illustrate the parameters-accuracy and FLOPS-accuracy curves for representative ConvNets, where our scaled EfficientNet models achieve better accuracy with far fewer parameters and FLOPS than other ConvNets. Notably, our EfficientNet models are not only small, but also computationally cheaper. For example, our EfficientNet-B3 achieves higher accuracy than ResNeXt-101 (Xie et al., 2017) using 18x fewer FLOPS. To validate the computational cost, we have also measured the inference latency for a few representative ConvNets on a real CPU, as shown in Table 4, where we report average latency over 20 runs. Our EfficientNet-B1 runs 5.7x faster than the widely used ResNet-152 (He et al., 2016), while EfficientNet-B7 runs about 6.1x faster than GPipe (Huang et al., 2018), suggesting our EfficientNets are indeed fast on real hardware.
[Figure 7 panels: class activation maps for two ImageNet images ("bakeshop" and "maze"), with columns for the original image, the baseline model, deeper (d=4), wider (w=2), higher resolution (r=2), and compound scaling.]
Figure 7. Class Activation Map (CAM) (Zhou et al., 2016) for Different Models in Table 7 - Our compound scaling method allows the scaled model (last column) to focus on more relevant regions with more object details. Model details are in Table 7.
Table 6. Transfer Learning Datasets.
Dataset                                    Train Size   Test Size   #Classes
CIFAR-10 (Krizhevsky & Hinton, 2009)       50,000       10,000      10
CIFAR-100 (Krizhevsky & Hinton, 2009)      50,000       10,000      100
Birdsnap (Berg et al., 2014)               47,386       2,443       500
Stanford Cars (Krause et al., 2013)        8,144        8,041       196
Flowers (Nilsback & Zisserman, 2008)       2,040        6,149       102
FGVC Aircraft (Maji et al., 2013)          6,667        3,333       100
Oxford-IIIT Pets (Parkhi et al., 2012)     3,680        3,369       37
Food-101 (Bossard et al., 2014)            75,750       25,250      101
5.3. Transfer Learning Results for EfficientNet
We have also evaluated our EfficientNet on a list of commonly used transfer learning datasets, as shown in Table 6.
Table 5 shows the transfer learning performance: (1) Compared to publicly available models, such as NASNet-A (Zoph et al., 2018) and Inception-v4 (Szegedy et al., 2017), our EfficientNet models achieve better accuracy with 4.7x average (up to 21x) parameter reduction. (2) Compared to state-of-the-art models, including DAT (Ngiam et al., 2018) that dynamically synthesizes training data and GPipe (Huang et al., 2018) that is trained with specialized pipeline parallelism, our EfficientNet models still surpass their accuracy on 5 out of 8 datasets, while using 9.6x fewer parameters.
Figure 6 compares the accuracy-parameters curve for a variety of models. In general, our EfficientNets consistently achieve better accuracy with an order of magnitude fewer parameters than existing models, including ResNet (He et al., 2016), DenseNet (Huang et al., 2017), Inception (Szegedy et al., 2017), and NASNet (Zoph et al., 2018).
To disentangle the contribution of our proposed scaling method from the EfficientNet architecture, Figure 8 compares the ImageNet performance of different scaling methods applied to the same EfficientNet-B0 baseline network.
[Figure 8 plot: ImageNet top-1 accuracy (%) vs. FLOPS (billions) for EfficientNet-B0 scaled by width, by depth, by resolution, and by compound scaling.]
Figure 8. Scaling Up EfficientNet-B0 with Different Methods.
Table 7. Scaled Models Used in Figure 7.
Model                                    FLOPS   Top-1 Acc.
Baseline model (EfficientNet-B0)         0.4B    76.3%
Scale model by depth (d=4)               1.8B    79.0%
Scale model by width (w=2)               1.8B    78.9%
Scale model by resolution (r=2)          1.9B    79.1%
Compound scale (d=1.4, w=1.2, r=1.3)     1.8B    81.1%
In general, all scaling methods improve accuracy at the cost of more FLOPS, but our compound scaling method further improves accuracy, by up to 2.5%, over the single-dimension scaling methods, suggesting the importance of our proposed compound scaling.

In order to further understand why our compound scaling method is better than the others, Figure 7 compares the class activation maps (Zhou et al., 2016) for a few representative models with different scaling methods. All these models are scaled from the same baseline, and their statistics are shown in Table 7. Images are randomly picked from the ImageNet validation set. As shown in the figure, the model with compound scaling tends to focus on more relevant regions with more object details, while the other models either lack object details or fail to capture all objects in the images.
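For readers unfamiliar with CAM, the computation cited above (Zhou et al., 2016) reduces to weighting the last convolutional feature maps by the classifier weights of the chosen class; a generic NumPy sketch (not the paper's visualization code, and the array shapes are assumptions) looks like this:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_index):
    """CAM (Zhou et al., 2016): sum the last conv feature maps weighted by
    the final linear layer's weights for the chosen class.

    feature_maps: (H, W, C) activations before global average pooling.
    fc_weights:   (C, num_classes) weights of the final linear layer.
    """
    weights = fc_weights[:, class_index]                          # (C,)
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))    # (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()           # normalize to [0, 1] for display
    return cam

# Random stand-ins for real activations and classifier weights.
cam = class_activation_map(np.random.rand(7, 7, 320), np.random.rand(320, 1000), 42)
print(cam.shape)  # (7, 7) heat map, usually upsampled to the image size
```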
References

Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. Feature pyramid networks for object detection. CVPR, 2017.
Liu, C., Zoph, B., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., and Murphy, K. Progressive neural architecture search. ECCV, 2018.
Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. The expressive power of neural networks: A view from the width. NeurIPS, 2018.
Ma, N., Zhang, X., Zheng, H.-T., and Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. ECCV, 2018.
Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and van der Maaten, L. Exploring the limits of weakly supervised pretraining. arXiv preprint arXiv:1805.00932, 2018.
Maji, S., Rahtu, E., Kannala, J., Blaschko, M., and Vedaldi, A. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151, 2013.
Ngiam, J., Peng, D., Vasudevan, V., Kornblith, S., Le, Q. V., and Pang, R. Domain adaptive transfer learning with specialist models. arXiv preprint arXiv:1811.07056, 2018.

Nilsback, M.-E. and Zisserman, A. Automated flower classification over a large number of classes. ICVGIP, pp. 722–729, 2008.
Parkhi, O. M., Vedaldi, A., Zisserman, A., and Jawahar, C. Cats and dogs. CVPR, pp. 3498–3505, 2012.
Raghu, M., Poole, B., Kleinberg, J., Ganguli, S., and Sohl-Dickstein, J. On the expressive power of deep neural networks. ICML, 2017.

Ramachandran, P., Zoph, B., and Le, Q. V. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2018.

Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. Regularized evolution for image classifier architecture search. AAAI, 2019.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. CVPR, 2018.
Sharir, O. and Shashua, A. On the expressive power of overlapping architectures of deep learning. ICLR, 2018.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. Going deeper with convolutions. CVPR, pp. 1–9, 2015.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the inception architecture for computer vision. CVPR, pp. 2818–2826, 2016.
Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A. Inception-v4, inception-resnet and the impact of residual connections on learning. AAAI, 4:12, 2017.
Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q. V. MnasNet: Platform-aware neural architecture search for mobile. CVPR, 2019.
Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. Aggregated residual transformations for deep neural networks. CVPR, pp. 5987–5995, 2017.

Yang, T.-J., Howard, A., Chen, B., Zhang, X., Go, A., Sze, V., and Adam, H. Netadapt: Platform-aware neural network adaptation for mobile applications. ECCV, 2018.
Zagoruyko, S. and Komodakis, N. Wide residual networks. BMVC, 2016.
Zhang, X., Li, Z., Loy, C. C., and Lin, D. Polynet: A pursuit of structural diversity in very deep networks. CVPR, pp. 3900–3908, 2017.
Zhang, X., Zhou, X., Lin, M., and Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. CVPR, 2018.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. Learning deep features for discriminative localization. CVPR, pp. 2921–2929, 2016.
Zoph, B. and Le, Q. V. Neural architecture search with reinforcement learning. ICLR, 2017.
Zoph, B., Vasudevan, V., Shlens, J., and Le, Q. V. Learning transferable architectures for scalable image recognition. CVPR, 2018.