DL4J hello world
Background: I had previously tried training models in TensorFlow and saving them as pb files for Spark to consume, but the performance was too slow, so I started looking for a way to run deep learning directly on Spark. After weighing SparkNet against DL4J, I chose DL4J.
Following the official quickstart at https://deeplearning4j.org/cn/quickstart, I first worked through an example.
Step 1: clone the examples repository locally
F:\spark project\dl4j-examples>git clone https://github.com/deeplearning4j/dl4j-examples.git
Cloning into 'dl4j-examples'...
remote: Enumerating objects: 201, done.
remote: Counting objects: 100% (201/201), done.
remote: Compressing objects: 100% (133/133), done.
error: RPC failed; curl 56 OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 10054
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
The failure above was fixed as suggested in https://stackoverflow.com/questions/21277806/fatal-early-eof-fatal-index-pack-failed: turn off compression and do a shallow clone, which is far more tolerant of a flaky connection:
F:\spark project\dl4j-examples>
F:\spark project\dl4j-examples>git config --global core.compression 0
F:\spark project\dl4j-examples>git clone --depth 1 https://github.com/deeplearning4j/dl4j-examples.git
Cloning into 'dl4j-examples'...
remote: Enumerating objects: 768, done.
remote: Counting objects: 100% (768/768), done.
remote: Compressing objects: 100% (547/547), done.
remote: Total 768 (delta 161), reused 491 (delta 97), pack-reused 0
Receiving objects: 100% (768/768), 22.94 MiB | 165.00 KiB/s, done.
Resolving deltas: 100% (161/161), done.
That felt like enough, since the examples had downloaded, but I went ahead and ran the remaining suggested commands anyway:
F:\spark project\dl4j-examples>git fetch --unshallow
fatal: not a git repository (or any of the parent directories): .git
F:\spark project\dl4j-examples>git fetch --depth=2147483647
fatal: not a git repository (or any of the parent directories): .git
F:\spark project\dl4j-examples>git init
Initialized empty Git repository in F:/spark project/dl4j-examples/.git/
F:\spark project\dl4j-examples>git fetch --unshallow
fatal: --unshallow on a complete repository does not make sense
F:\spark project\dl4j-examples>git fetch --depth=2147483647
F:\spark project\dl4j-examples>git pull --all
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.
git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
git branch --set-upstream-to=<remote>/<branch> master
(In hindsight, the "not a git repository" errors above occurred because I ran these commands in the parent folder instead of inside the cloned dl4j-examples directory; running git fetch --unshallow inside the clone is what actually restores the full history.)
Step 2: build the examples (takes roughly half an hour and needs over 5 GB of disk space):
mvn clean install
I didn't install Maven separately; instead I took a shortcut and ran mvn clean install from another project in IDEA via Execute Maven Goal, with the working directory set to F:\spark project\dl4j-examples\dl4j-examples.
Step 3: pick an example and run it,
e.g. org.deeplearning4j.examples.feedforward.classification.MLPClassifierLinear.
Running it from IDEA failed with:
Error running 'AuthServer': Command line is too long. Shorten command line for AuthServer or also for Application default configuration.
Fix:
Edit .idea\workspace.xml in the project, find the <component name="PropertiesComponent"> tag, and add a line <property name="dynamic.classpath" value="true" /> inside it.
With that change the example ran successfully.
Step 4: run it in my own Spark project.
First add the dependencies (all DL4J/ND4J artifacts should be kept on the same version, here 1.0.0-beta5):
<dependency>
    <groupId>org.datavec</groupId>
    <artifactId>datavec-api</artifactId>
    <version>1.0.0-beta5</version>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-beta5</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta5</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>dl4j-spark_2.11</artifactId>
    <version>1.0.0-beta5</version>
</dependency>
<dependency>
    <groupId>com.beust</groupId>
    <artifactId>jcommander</artifactId>
    <version>1.27</version>
</dependency>
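Before wiring DL4J into the Spark job, it's worth confirming that the nd4j-native-platform backend actually loads on the machine. A minimal smoke test (the class name is my own):

import org.nd4j.linalg.factory.Nd4j

object Nd4jSmokeTest {
  def main(args: Array[String]): Unit = {
    // Build a 2x2 matrix on the native (CPU) backend pulled in by nd4j-native-platform
    val m = Nd4j.create(Array(Array(1.0, 2.0), Array(3.0, 4.0)))
    // If the BLAS backend loaded correctly, this prints the matrix product
    println(m.mmul(m))
  }
}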
I then copied org.deeplearning4j.legacyExamples.mlp.MnistMLPExample and made some small changes, ending up with:
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
import org.deeplearning4j.eval.Evaluation
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.DenseLayer
import org.deeplearning4j.nn.conf.layers.OutputLayer
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.learning.config.Nesterovs
import org.nd4j.linalg.lossfunctions.LossFunctions
import java.util
object MnistMLPExample {
  val batchSizePerWorker = 16
  val numEpochs = 2

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf
    System.setProperty("hadoop.home.dir", "D:\\火狐下载\\hadoop-common-2.2.0-bin-master")
    sparkConf.setMaster("local[*]")
    sparkConf.setAppName("DL4J Spark MLP Example")
    val sc = new JavaSparkContext(sparkConf)
    sc.setLogLevel("WARN")

    //Load the data into memory then parallelize
    //This isn't a good approach in general - but is simple to use for this example
    //The second MnistDataSetIterator argument selects the split: true = train, false = test
    val iterTrain = new MnistDataSetIterator(batchSizePerWorker, true, 12345)
    val iterTest = new MnistDataSetIterator(batchSizePerWorker, false, 12345)
    val trainDataList = new util.ArrayList[DataSet]
    val testDataList = new util.ArrayList[DataSet]
    while (iterTrain.hasNext) trainDataList.add(iterTrain.next)
    while (iterTest.hasNext) testDataList.add(iterTest.next)
    val trainData = sc.parallelize(trainDataList)
    val testData = sc.parallelize(testDataList)
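    // sc.parallelize turns the in-memory lists into JavaRDD[DataSet]s;
    // the Spark training below consumes the data as a distributed collection of minibatches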
    //Create network configuration and conduct network training
    val conf = new NeuralNetConfiguration.Builder()
      .seed(12345)
      .activation(Activation.LEAKYRELU)
      .weightInit(WeightInit.XAVIER)
      .updater(new Nesterovs(0.1)) // To configure: .updater(Nesterovs.builder().momentum(0.9).build())
      .l2(1e-4)
      .list()
      .layer(new DenseLayer.Builder().nIn(28 * 28).nOut(500).build())
      .layer(new DenseLayer.Builder().nOut(100).build())
      .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation(Activation.SOFTMAX).nOut(10).build())
      .build()
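    // The result is a 784 -> 500 -> 100 -> 10 MLP; nIn is omitted for the later
    // layers and inferred by DL4J from the previous layer's nOut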
    //Configuration for Spark training: see http://deeplearning4j.org/spark for explanation of these configuration options
    val tm = new ParameterAveragingTrainingMaster.Builder(batchSizePerWorker) //Each DataSet object in the RDD contains batchSizePerWorker (16) examples
      .averagingFrequency(5)
      .workerPrefetchNumBatches(2) //Async prefetching: 2 minibatches per worker
      .batchSizePerWorker(batchSizePerWorker)
      .build()
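    // With parameter averaging, each worker fits its data partition for
    // averagingFrequency minibatches, after which the parameters are averaged
    // across workers and redistributed before training continues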
    //Create the Spark network
    val sparkNet = new SparkDl4jMultiLayer(sc, conf, tm)
    //Execute training:
    for (i <- 0 until numEpochs) {
      sparkNet.fit(trainData)
      println(s"Completed Epoch $i")
    }

    //Perform evaluation (distributed)
    //doEvaluation was a work-around for a 0.9.1 bug (see https://deeplearning4j.org/releasenotes);
    //on 1.0.0-beta5, sparkNet.evaluate(testData) should work as well
    val evaluation = sparkNet.doEvaluation(testData, 64, new Evaluation(10))(0)
    println("***** Evaluation *****")
    println(evaluation.stats)

    //Delete the temp training files, now that we are done with them
    tm.deleteTempFiles(sc)
    println("***** Example Complete *****")
  }
}
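Since the whole point was to train on Spark and reuse the model afterwards, note that the trained network can also be persisted to disk. A minimal sketch using DL4J's ModelSerializer (the file path is a placeholder of my own):

import org.deeplearning4j.util.ModelSerializer

// Extract the underlying MultiLayerNetwork from the Spark wrapper and save it;
// the final argument (true) also saves the updater state so training can resume later
val net = sparkNet.getNetwork
ModelSerializer.writeModel(net, "F:/spark project/mnist-mlp.zip", true)

// Load it back later for local inference
val restored = ModelSerializer.restoreMultiLayerNetwork("F:/spark project/mnist-mlp.zip")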
The run output was as follows (note: these numbers come from my first run, in which the test iterator was also built with train = true, so the evaluation below actually covers the 60,000 training images; the corrected iterator above evaluates on the 10,000-image test set instead):
......
15:03:14,619 INFO ~ Completed Epoch 1
15:03:20,498 INFO ~ ***** Evaluation *****
15:03:20,502 INFO ~
========================Evaluation Metrics========================
# of classes: 10
Accuracy: 0.9608
Precision: 0.9605
Recall: 0.9607
F1 Score: 0.9605
Precision, recall & F1: macro-averaged (equally weighted avg. of 10 classes)
=========================Confusion Matrix=========================
    0    1    2    3    4    5    6    7    8    9
---------------------------------------------------
 5807    0   10    3   11   17   24    3   40    8 | 0 = 0
    1 6583   50   17    9    8    1    7   54   12 | 1 = 1
   24   11 5759   24   27   11   19   31   47    5 | 2 = 2
   14   16   99 5717    3  113    8   29   96   36 | 3 = 3
    6   14   25    2 5615    1   31    5   16  127 | 4 = 4
   28    8   14   65   15 5180   39    8   43   21 | 5 = 5
   29    8   10    0   21   49 5775    0   26    0 | 6 = 6
   11   24   77   13   35    3    2 6006   14   80 | 7 = 7
   18   39   24   56   14   45   21    9 5600   25 | 8 = 8
   22   12    5   43  115   28    2   71   43 5608 | 9 = 9
Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
15:03:20,502 INFO ~ Attempting to delete temporary directory: /tmp/hadoop-lenovo/dl4j/1572245786579_-2c362db/0/
15:03:22,346 INFO ~ Deleted temporary directory: /tmp/hadoop-lenovo/dl4j/1572245786579_-2c362db/0/
15:03:22,347 INFO ~ ***** Example Complete *****
DeepLearning4J environment setup and smoke testing are done; next I'll dig into the details of DL4J...