Surrogate-Assisted Evolutionary DL Using E2E Random Forest-based Performance Predictor: Paper Reading Notes
Surrogate-Assisted Evolutionary Deep Learning Using an End-to-End Random Forest-based Performance Predictor
Paper link: https://ieeexplore.ieee.org/document/8744404
INTRODUCTION
The proposed performance predictor shows promising performance in terms of classification accuracy and consumed computational resources when compared with 18 state-of-the-art peer competitors, demonstrated by integrating it into an existing EDL algorithm as a case study.
Unfortunately, designing the architecture with the best performance for the investigated data requires extensive expertise in both CNNs and the data domain [7], which is not necessarily held by the interested users. Designing the best CNN architecture for the given data can be seen as an optimization problem, which can be mathematically formulated by (1):

$$\hat{\lambda} = \arg\max_{\lambda \in \Lambda} \mathcal{L}(\mathcal{A}_{\lambda}, \mathcal{D}_{\text{train}}, \mathcal{D}_{\text{test}}) \tag{1}$$

where $\lambda$ refers to the parameters related to the architectures of CNNs, such as the number of convolutional layers and the configurations of pooling layers; $\Lambda$ refers to the parameter space; $\mathcal{A}_{\lambda}$ denotes the CNN algorithm $\mathcal{A}$ adopting the architecture parameters $\lambda$; and $\mathcal{L}$ measures the performance of $\mathcal{A}_{\lambda}$ on the test data after it has been trained on the training data. Generally, $\lambda$ takes discrete values.
Our goal in this paper is to present an effective and efficient end-to-end performance predictor (named E2EPP for short) based on random forests.
LITERATURE REVIEW
In this section, EDL, AE-CNN, and random forests are first introduced as the base algorithms in Subsection II-A, which helps in understanding the details of the proposed performance predictor.
Background
The CNN architectures generated by AE-CNN are composed of DenseNet Blocks (DBs), ResNet Blocks (RBs), and Pooling Blocks (PBs). Each DB or RB is composed of multiple DenseNet Units (DUs) or ResNet Units (RUs), respectively, while a PB consists of only one pooling layer. Each DU, RU, or PB differs in terms of its parameter settings.
- The parameters of an RU are the sizes of its input and output (denoted by in and out, respectively).
- The parameters of a DU are the same as those of an RU, plus an increasing factor (denoted by k).
- The only parameter of a PB is the pooling type (i.e., max or mean pooling), because its other parameters are all set to fixed values.
- Because a DB/RB is composed of multiple DUs/RUs, the parameters of a DB/RB are the corresponding parameters of its DUs/RUs plus the number (denoted by amount) of DUs/RUs in the DB/RB.
- In addition, as the DBs, RBs, and PBs compose a CNN in a particular order, an extra parameter (denoted by id) is used to represent each block's position in the CNN.
Obviously, a CNN generated by AE-CNN is composed of sequential blocks, which may be RBs, DBs, or PBs. When the block is an RB, the parameters are id, amount, in, and out; when the block is a DB, the parameters are id, amount, k, in, and out; when the block is a PB, the only parameter is the pooling type (denoted by type). Note that the in of the current RB/DB should equal the out of its previous RB/DB in order to construct a valid CNN.
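A minimal sketch of this block-based representation and the in/out chaining constraint; the `Block` class, `is_valid` helper, and field names are illustrative, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One block of an AE-CNN-style architecture (illustrative)."""
    id: int                # position of the block in the CNN
    kind: str              # 'RB', 'DB', or 'PB'
    amount: int = 0        # number of RUs/DUs (RB/DB only)
    k: int = 0             # increasing factor (DB only)
    in_ch: int = 0         # input size, 'in' (RB/DB only)
    out_ch: int = 0        # output size, 'out' (RB/DB only)
    pool_type: str = ''    # 'MAX' or 'MEAN' (PB only)

def is_valid(cnn):
    """Check the chaining constraint: each RB/DB's `in` must equal
    the `out` of the previous RB/DB."""
    conv = [b for b in cnn if b.kind in ('RB', 'DB')]
    return all(cur.in_ch == prev.out_ch for prev, cur in zip(conv, conv[1:]))

cnn = [Block(0, 'RB', amount=3, in_ch=32, out_ch=64),
       Block(1, 'PB', pool_type='MAX'),
       Block(2, 'DB', amount=4, k=12, in_ch=64, out_ch=128)]
assert is_valid(cnn)
```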
EDL
A typical EDL follows the standard evolutionary loop: tournament selection, crossover and mutation, and environmental selection.
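A generic sketch of such a loop, assuming user-supplied `fitness`, `crossover`, and `mutate` functions; all names are placeholders, not the paper's implementation:

```python
import random

def edl_loop(init_population, fitness, crossover, mutate,
             generations=20, tournament_k=2):
    """Generic EDL loop: tournament selection, crossover and
    mutation, then environmental selection (illustrative sketch)."""
    population = init_population
    for _ in range(generations):
        offspring = []
        while len(offspring) < len(population):
            # Tournament selection: best of k randomly drawn individuals.
            p1 = max(random.sample(population, tournament_k), key=fitness)
            p2 = max(random.sample(population, tournament_k), key=fitness)
            offspring.append(mutate(crossover(p1, p2)))
        # Environmental selection: keep the best individuals
        # among parents and offspring.
        population = sorted(population + offspring,
                            key=fitness, reverse=True)[:len(population)]
    return max(population, key=fitness)
```

In a surrogate-assisted EDL, the expensive `fitness` call (training a CNN from scratch) is replaced by the performance predictor.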
AE-CNN
Random Forest
As the performance predictor is part of an EDL in this research, an EDL method must be specified before the performance predictor is detailed. In this work, the AE-CNN algorithm developed by the authors is selected as the representative EDL, mainly because: 1) it shows promising performance among existing EDLs [48], and 2) the source code of AE-CNN is publicly available. Note that the proposed performance predictor is applicable to any existing EDL.
The AE-CNN algorithm [48] is an automatic EDL algorithm based on the building blocks of the state-of-the-art ResNet [49] and DenseNet [50].
Related Work (pros and cons of FBO and Peephole)
The existing performance predictors can be classified into two different categories: performance predictors based on the learning curve and end-to-end performance predictors, both of which are based on the training-predicting computational paradigm.
The Peephole algorithm uses a number of CNN architectures, together with the corresponding performances achieved by training those CNNs for T epochs, as the training samples to train a long short-term memory (LSTM) neural network [33]. The trained neural network directly predicts the performance of a new CNN based on its architecture, which is called the end-to-end mechanism because the input end is the raw data while the output end is the classification accuracy. Because the architecture cannot be directly used as the input data of the neural network, the Peephole algorithm employs the word vector technique to map the CNN architecture to numerical values.
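A rough sketch of this idea, assuming PyTorch and a hypothetical token vocabulary; it illustrates the embedding-plus-LSTM regression concept, not Peephole's actual code:

```python
import torch
import torch.nn as nn

class ArchRegressor(nn.Module):
    """Maps a tokenized layer sequence to a predicted accuracy
    (a sketch of the end-to-end idea, not Peephole itself)."""
    def __init__(self, vocab_size, embed_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # 'word vectors' for layer tokens
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, tokens):                    # tokens: (batch, seq_len) int64
        emb = self.embed(tokens)
        _, (h, _) = self.lstm(emb)
        return torch.sigmoid(self.head(h[-1])).squeeze(-1)  # accuracy in [0, 1]

# Hypothetical vocabulary of layer descriptors.
vocab = {'<pad>': 0, 'conv3x3-64': 1, 'conv3x3-128': 2, 'maxpool': 3}
model = ArchRegressor(vocab_size=len(vocab))
seq = torch.tensor([[1, 3, 2, 3]])    # one encoded architecture
pred_acc = model(seq)                 # untrained prediction, shape (1,)
```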
The major advantage of the FBO algorithm lies in its trained-CNN-free nature, i.e., it does not need any trained CNNs in advance. Because training a CNN is time-consuming, taking from several days to weeks, the FBO algorithm is efficient. However, it is not effective when the learning curve is not smooth, because the curve fitting works under the assumption that the curve is smooth. In recent deep learning applications, the learning curve is usually not smooth because a schedule of learning rates is commonly used; once the learning rate changes, the learning curve acquires a non-smooth segment.

Another limitation of the FBO algorithm is its non-end-to-end nature (i.e., in predicting the performance of each CNN, a part of the training data regarding this CNN must be collected for training the predictor), which requires considerably more manual work in use.
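A minimal sketch of the curve-fitting idea behind FBO-style predictors, with an illustrative saturating model and made-up accuracy values:

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating(t, a, b, c):
    """A smooth saturating model for an accuracy-vs-epoch curve."""
    return a - b * np.exp(-c * t)

# Hypothetical partial learning curve: accuracy over the first
# 10 of 100 epochs.
epochs = np.arange(1, 11)
acc = np.array([0.42, 0.55, 0.63, 0.68, 0.72,
                0.74, 0.76, 0.775, 0.785, 0.79])

params, _ = curve_fit(saturating, epochs, acc, p0=(0.9, 0.5, 0.3))
final_acc = saturating(100, *params)   # extrapolated accuracy at epoch 100
print(f"predicted final accuracy: {final_acc:.3f}")
```

If the learning rate is dropped mid-training, the observed curve gets a kink that such a smooth model cannot capture, which is exactly the failure mode described above.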
Owing to its end-to-end nature, the Peephole algorithm is more convenient to use. However, the major drawback of Peephole lies in its requirement for a large number of training samples, which makes the computational cost of collecting training samples exceed that of running the EDL without any performance predictor. For example, Peephole used over 8,000 fully trained CNN architectures as the training data, whereas EDLs generally achieve promising performance by evaluating only hundreds of individuals. If we had enough computational resources to evaluate 8,000 CNN architectures, we would not need to develop the performance predictor in the first place. This limitation is largely caused by the adopted regression model, i.e., a neural network-based algorithm, which typically relies heavily on a large amount of labelled training data.
PROPOSED ALGORITHM
As shown in Fig. 1, the proposed framework is composed of three blocks, i.e., data collection, E2EPP, and EDL. The proposed E2EPP performance predictor is part of the EDL.
Firstly, a set of training data is collected for training the random forest-based predictor, where the collection is achieved by performing the corresponding EDL without using E2EPP. Each data sample is composed of the CNN architecture and the corresponding classification accuracy that is obtained by training the CNN from scratch.
Secondly, those architectures are encoded into discrete codes (shown in Subsection III-A) for building the random forest-based predictor pool of a large number (say K) of regression trees (CARTs) [51] (shown in Subsection III-B). During each generation of the EDL, a newly generated CNN architecture is encoded as the input to the random forest, and its performance is then predicted using an adaptive combination of CARTs from the predictor pool (shown in Subsection III-C). When the EDL terminates, the CNN architecture with the best predicted performance is output. Note that there is no further CNN training during the optimization process.
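A minimal sketch of such a CART pool, assuming scikit-learn and synthetic data; the adaptive combination is simplified here to averaging a fixed subset, so this is not the paper's exact rule:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical training set: rows are discrete architecture codes,
# targets are classification accuracies of fully trained CNNs.
X = rng.integers(0, 10, size=(200, 18)).astype(float)
y = rng.uniform(0.6, 0.95, size=200)

# Build a pool of K CARTs, each trained on a bootstrap sample
# with a random subset of features (random-forest style).
K, n_feat = 1000, X.shape[1]
pool = []
for _ in range(K):
    rows = rng.integers(0, len(X), len(X))
    cols = rng.choice(n_feat, size=n_feat // 2, replace=False)
    tree = DecisionTreeRegressor(max_depth=6).fit(X[rows][:, cols], y[rows])
    pool.append((tree, cols))

def predict(x, pool, m=100):
    """Average m trees from the pool; E2EPP instead selects the
    combination adaptively (simplified here)."""
    return float(np.mean([t.predict(x[cols].reshape(1, -1))[0]
                          for t, cols in pool[:m]]))

new_arch = rng.integers(0, 10, size=18).astype(float)
print(predict(new_arch, pool))
```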
Encoding
Based on the description shown in Subsection II-A2, the collected training data are summarized as below (a small encoding sketch follows the list):
- RBs and DBs: Each generated CNN architecture is composed of at most four RBs and four DBs; the number of output channels of each block varies within [32, 512], which is set based on the conventions of state-of-the-art CNNs [50], [54].
- PBs: Each generated CNN architecture contains at most four pooling layers. There are two types of PBs: MAX and MEAN.
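Under these constraints, a possible fixed-length discrete encoding, zero-padding absent blocks so every architecture maps to a vector of the same length, might look like the following; all field names are illustrative:

```python
# Slots for up to 8 conv blocks (4 RBs + 4 DBs, each encoded as
# kind/id/amount/k/in/out) and 4 PBs; absent blocks become zeros.
KIND = {'RB': 1, 'DB': 2}
POOL = {'MAX': 1, 'MEAN': 2}

def encode(blocks, max_conv=8, max_pool=4):
    conv = [b for b in blocks if b['kind'] in KIND]
    pools = [b for b in blocks if b['kind'] == 'PB']
    code = []
    for i in range(max_conv):
        b = conv[i] if i < len(conv) else {}
        code += [KIND.get(b.get('kind'), 0), b.get('id', 0),
                 b.get('amount', 0), b.get('k', 0),
                 b.get('in', 0), b.get('out', 0)]
    code += [POOL[pools[i]['type']] if i < len(pools) else 0
             for i in range(max_pool)]
    return code

cnn = [{'kind': 'RB', 'id': 0, 'amount': 3, 'in': 32, 'out': 64},
       {'kind': 'PB', 'id': 1, 'type': 'MAX'},
       {'kind': 'DB', 'id': 2, 'amount': 4, 'k': 12, 'in': 64, 'out': 128}]
print(encode(cnn))   # 8*6 + 4 = 52 integers
```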