HYPERPARAMETER TUNING WITH RAY TUNE
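Hyperparameter tuning can make the difference between an average model and a highly accurate one. Often simple things, such as choosing a different learning rate or changing a network layer size, can have a dramatic impact on your model's performance.
Fortunately, there are tools that help with finding the best combination of parameters. Ray Tune is an industry standard tool for distributed hyperparameter tuning. Ray Tune includes the latest hyperparameter search algorithms, integrates with TensorBoard and other analysis libraries, and natively supports distributed training through Ray's distributed machine learning engine.
In this tutorial, we will show you how to integrate Ray Tune into your PyTorch training workflow. We will extend this tutorial from the PyTorch documentation for training a CIFAR10 image classifier.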
As you will see, we only need to add some slight modifications. In particular, we need to
- wrap data loading and training in functions,
- make some network parameters configurable,
- add checkpointing (optional),
- and define the search space for the model tuning.
To run this tutorial, please make sure the following packages are installed:
- ray[tune]: Distributed hyperparameter tuning library
- torchvision: For the data transformers
Setup / Imports
Let's start with the imports:
from functools import partial
import numpy as np
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler
Data loaders
We wrap the data loaders in their own function and pass a global data directory. This way we can share a data directory between different trials.
def load_data(data_dir="./data"):
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    trainset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=transform)

    testset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)

    return trainset, testset
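To make the effect of the Normalize step in load_data concrete, here is a minimal pure-Python sketch. It assumes the per-channel rule output = (input - mean) / std described in the torchvision documentation; the helper name normalize_value is hypothetical and only illustrates the arithmetic, it is not part of torchvision.

```python
def normalize_value(x, mean=0.5, std=0.5):
    """Apply (x - mean) / std to a single pixel value (hypothetical helper)."""
    return (x - mean) / std


# ToTensor() scales pixel values to [0, 1]; with mean = std = 0.5
# the normalized values therefore span [-1, 1].
samples = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize_value(x) for x in samples]
print(normalized)
```

With mean and standard deviation both set to 0.5, the inputs are simply recentred around zero, which is a common choice for small convolutional networks.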
Configurable neural network
We can only tune those parameters that are configurable. In this example, we can specify the layer sizes of the fully connected layers:
class Net(nn.Module):
    def __init__(self, l1=120, l2=84):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, l1)
        self.fc2 = nn.Linear(l1, l2)
        self.fc3 = nn.Linear(l2, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
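The input size 16 * 5 * 5 of the first fully connected layer follows from the convolution and pooling layers in front of it. The sketch below traces the spatial size of a 32x32 CIFAR10 image through the network using the standard output-size formula for unpadded windows; conv_output_size is a hypothetical helper written for this illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                          # CIFAR10 images are 32x32
size = conv_output_size(size, kernel=5)            # conv1, 5x5 kernel
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling
size = conv_output_size(size, kernel=5)            # conv2, 5x5 kernel
size = conv_output_size(size, kernel=2, stride=2)  # 2x2 max pooling

channels = 16                                      # output channels of conv2
flattened = channels * size * size
print(size, flattened)
```

Only l1 and l2 are free to vary in the search; the flattened size is fixed by the convolutional part of the network.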
The train function
Now it gets interesting, because we introduce some changes to the example from the PyTorch documentation.
We wrap the training script in a function train_cifar(config, checkpoint_dir=None, data_dir=None).
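The config parameter receives the hyperparameters we would like to train with. The checkpoint_dir parameter is used to restore checkpoints. The data_dir parameter specifies the directory where we load and store the data, so multiple runs can share the same data source.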
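net = Net(config["l1"], config["l2"])

# Note: `optimizer` must already exist at this point; in the full
# training function it is created before the checkpoint is restored.
if checkpoint_dir:
    model_state, optimizer_state = torch.load(
        os.path.join(checkpoint_dir, "checkpoint"))
    net.load_state_dict(model_state)
    optimizer.load_state_dict(optimizer_state)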
The learning rate of the optimizer is made configurable, too:
optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
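We also split the training data into a training and a validation subset. We thus train on 80% of the data and calculate the validation loss on the remaining 20%. The batch sizes with which we iterate through the training and validation sets are configurable as well.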
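The 80/20 split can be pictured with a small pure-Python sketch. It assumes the 50,000 images of the CIFAR10 training set and uses the standard library's random module instead of torch.utils.data.random_split; split_indices is a hypothetical helper for illustration only.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into train and validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


# The CIFAR10 training set contains 50,000 images.
train_idx, val_idx = split_indices(50000)
print(len(train_idx), len(val_idx))
```

The two index lists are disjoint, so the validation loss is computed only on images the model did not train on.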
Adding (multi) GPU support with DataParallel
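Image classification benefits largely from GPUs. Luckily, we can continue to use PyTorch's abstractions in Ray Tune. Thus, we can wrap our model in nn.DataParallel to support data parallel training on multiple GPUs: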
device = "cpu"
if torch.cuda.is_available():
    device = "cuda:0"
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)
net.to(device)
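By using a device variable we make sure that training also works when we have no GPUs available. PyTorch requires us to send our data to the GPU memory explicitly, like this: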
for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
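The code now supports training on CPUs, on a single GPU, and on multiple GPUs. Notably, Ray also supports fractional GPUs, so we can share GPUs among trials as long as the model still fits in GPU memory. We'll come back to that later.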
Communicating with Ray Tune
The most interesting part is the communication with Ray Tune:
with tune.checkpoint_dir(epoch) as checkpoint_dir:
    path = os.path.join(checkpoint_dir, "checkpoint")
    torch.save((net.state_dict(), optimizer.state_dict()), path)

tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
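Here we first save a checkpoint and then report some metrics back to Ray Tune. Specifically, we send the validation loss and accuracy back to Ray Tune. Ray Tune can then use these metrics to decide which hyperparameter configuration leads to the best results. These metrics can also be used to stop badly performing trials early, in order to avoid wasting resources on those trials.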
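The two reported values are plain averages: the summed batch losses divided by the number of validation batches, and the number of correct predictions divided by the number of samples. A minimal pure-Python sketch of that bookkeeping follows; summarize_validation and the toy batches are hypothetical and stand in for the tensors used in the real training loop.

```python
def summarize_validation(batches):
    """Aggregate (batch_loss, predicted_labels, true_labels) tuples into metrics."""
    val_loss, val_steps, correct, total = 0.0, 0, 0, 0
    for batch_loss, predicted, labels in batches:
        val_loss += batch_loss
        val_steps += 1
        total += len(labels)
        correct += sum(p == t for p, t in zip(predicted, labels))
    return {"loss": val_loss / val_steps, "accuracy": correct / total}


# Toy data for illustration only.
toy_batches = [
    (2.0, [3, 1, 0, 7], [3, 1, 2, 7]),  # 3 of 4 predictions correct
    (1.0, [5, 5, 9, 4], [5, 6, 9, 8]),  # 2 of 4 predictions correct
]
metrics = summarize_validation(toy_batches)
print(metrics)
```

The resulting dictionary has the same two keys, loss and accuracy, that the tutorial passes to tune.report.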
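Saving the checkpoint is optional; however, it is necessary if we want to use advanced schedulers such as Population Based Training. In addition, by saving the checkpoint we can later load the trained models and validate them on a test set.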
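The save-then-restore pattern can be tried without PyTorch. The sketch below is an assumption-laden stand-in: it uses pickle in place of torch.save and torch.load, plain dictionaries in place of real state dicts, and hypothetical helpers save_checkpoint and load_checkpoint. It only shows that the (model state, optimizer state) tuple written to a file named "checkpoint" comes back unchanged.

```python
import os
import pickle
import tempfile


def save_checkpoint(checkpoint_dir, model_state, optimizer_state):
    """Write both states as one tuple, mirroring the tutorial's file layout."""
    path = os.path.join(checkpoint_dir, "checkpoint")
    with open(path, "wb") as f:
        pickle.dump((model_state, optimizer_state), f)
    return path


def load_checkpoint(checkpoint_dir):
    """Read the (model_state, optimizer_state) tuple back from disk."""
    with open(os.path.join(checkpoint_dir, "checkpoint"), "rb") as f:
        return pickle.load(f)


# Placeholder dictionaries standing in for real state dicts.
saved_model = {"fc1.weight": [0.1, 0.2]}
saved_optimizer = {"lr": 0.01, "momentum": 0.9}

with tempfile.TemporaryDirectory() as checkpoint_dir:
    save_checkpoint(checkpoint_dir, saved_model, saved_optimizer)
    model_state, optimizer_state = load_checkpoint(checkpoint_dir)

print(model_state, optimizer_state)
```

Storing the optimizer state next to the model state matters when training resumes, because SGD with momentum keeps internal buffers between steps.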
Full training function
The full code example looks like this:
def train_cifar(config, checkpoint_dir=None, data_dir=None):
    net = Net(config["l1"], config["l2"])

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    if checkpoint_dir:
        model_state, optimizer_state = torch.load(
            os.path.join(checkpoint_dir, "checkpoint"))
        net.load_state_dict(model_state)
        optimizer.load_state_dict(optimizer_state)

    trainset, testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs])

    trainloader = torch.utils.data.DataLoader(
        train_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)
    valloader = torch.utils.data.DataLoader(
        val_subset,
        batch_size=int(config["batch_size"]),
        shuffle=True,
        num_workers=8)

    for epoch in range(10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print("[%d, %5d] loss: %.3f" % (epoch + 1, i + 1,
                                                running_loss / epoch_steps))
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        with tune.checkpoint_dir(epoch) as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint")
            torch.save((net.state_dict(), optimizer.state_dict()), path)

        tune.report(loss=(val_loss / val_steps), accuracy=correct / total)
    print("Finished Training")
As you can see, most of the code is adapted directly from the original example.
Test set accuracy
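Commonly, the performance of a machine learning model is tested on a hold-out test set with data that has not been used for training the model. We also wrap this in a function: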
def test_accuracy(net, device="cpu"):
    trainset, testset = load_data()

    testloader = torch.utils.data.DataLoader(
        testset, batch_size=4, shuffle=False, num_workers=2)

    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    return correct / total
The function also expects a device parameter, so we can do the test set validation on a GPU.
Configuring the search space
Lastly, we need to define Ray Tune's search space. Here is an example:
config = {
    "l1": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "l2": tune.sample_from(lambda _: 2**np.random.randint(2, 9)),
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([2, 4, 8, 16])
}
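The tune.sample_from() function makes it possible to define your own sampling methods to obtain hyperparameters. In this example, the l1 and l2 parameters should be powers of 2 between 4 and 256, so either 4, 8, 16, 32, 64, 128, or 256. The lr (learning rate) is sampled log-uniformly between 0.0001 and 0.1, which means that its logarithm is uniformly distributed over that range. Lastly, the batch size is a choice between 2, 4, 8, and 16.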
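To get a feel for these distributions without installing Ray, the sketch below draws configurations with the standard library only. It is an approximation written for illustration, not Ray Tune's sampler: sample_config is a hypothetical helper, and the log-uniform draw is modelled as the exponential of a uniform draw between the logarithms of the bounds.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration that mimics the tutorial's search space."""
    return {
        # randrange(2, 9) yields exponents 2..8, i.e. 4..256
        "l1": 2 ** rng.randrange(2, 9),
        "l2": 2 ** rng.randrange(2, 9),
        # log-uniform: uniform in log space, then exponentiate
        "lr": math.exp(rng.uniform(math.log(1e-4), math.log(1e-1))),
        "batch_size": rng.choice([2, 4, 8, 16]),
    }


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(1000)]
print(configs[0])
```

Because the learning rate is uniform in log space, each decade (for example 0.0001 to 0.001) is sampled about as often as any other, which is usually what is wanted for a parameter spanning several orders of magnitude.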
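At each trial, Ray Tune will now randomly sample a combination of parameters from these search spaces. It will then train a number of models in parallel and find the best performing one among these. We also use the ASHAScheduler, which terminates badly performing trials early.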
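Early termination happens at fixed checkpoints in training, often called rungs. As a simplified sketch (not Ray Tune's actual implementation), the rungs can be pictured as the grace period multiplied repeatedly by the reduction factor until the maximum number of iterations is exceeded. The helper asha_milestones is hypothetical; the arguments in the example call are the values the tutorial's main function passes to ASHAScheduler.

```python
def asha_milestones(max_t, grace_period=1, reduction_factor=2):
    """List the iterations at which trials are compared (simplified sketch)."""
    milestones = []
    t = grace_period
    while t <= max_t:
        milestones.append(t)
        t *= reduction_factor
    return milestones


rungs = asha_milestones(max_t=10, grace_period=1, reduction_factor=2)
print(rungs)
```

These rungs are consistent with the "Iter 1.000", "Iter 2.000", "Iter 4.000", and "Iter 8.000" entries in the Bracket line of the output log; at each of them, trials whose reported loss falls behind the others may be stopped.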
We wrap the train_cifar function with functools.partial to set the constant data_dir parameter. We can also tell Ray Tune what resources should be available for each trial:
gpus_per_trial = 2
# ...
result = tune.run(
    partial(train_cifar, data_dir=data_dir),
    resources_per_trial={"cpu": 8, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    scheduler=scheduler,
    progress_reporter=reporter,
    checkpoint_at_end=True)
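The functools.partial wrapping in the call above can be tried in isolation. In the sketch below, train_stub is a hypothetical stand-in for train_cifar and the directory path is made up; the point is only that partial fixes data_dir in advance, so the resulting callable can be invoked with the config alone.

```python
from functools import partial


def train_stub(config, checkpoint_dir=None, data_dir=None):
    """Hypothetical stand-in for train_cifar; returns what it received."""
    return {"lr": config["lr"], "data_dir": data_dir}


# Bind the constant argument once; the path is an illustrative placeholder.
trainable = partial(train_stub, data_dir="/tmp/cifar_data")

# The caller now only needs to supply the per-trial config.
result = trainable({"lr": 0.01})
print(result)
```

This keeps the per-trial hyperparameters in config and everything that stays constant across trials outside of it.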
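You can specify the number of CPUs, which are then available, for example, to increase the num_workers of the PyTorch DataLoader instances. The selected number of GPUs is made visible to PyTorch in each trial. Trials do not have access to GPUs that haven't been requested for them, so you don't have to worry about two trials using the same set of resources.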
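Here we can also specify fractional GPUs, so something like gpus_per_trial=0.5 is completely valid. The trials will then share GPUs among each other. You just have to make sure that the models still fit in the GPU memory.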
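As a rough back-of-the-envelope check, the number of trials that can run at the same time is limited by whichever resource runs out first. The sketch below is a simplified estimate under that assumption; it ignores memory and any other scheduling detail of Ray, and max_concurrent_trials is a hypothetical helper. The machine size of 32 CPUs and 2 GPUs is taken from the "Resources requested" line in the output log.

```python
import math


def max_concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Estimate how many trials fit on the machine at once (simplified)."""
    limits = []
    if cpus_per_trial > 0:
        limits.append(math.floor(total_cpus / cpus_per_trial))
    if gpus_per_trial > 0:
        limits.append(math.floor(total_gpus / gpus_per_trial))
    # With no resource requirement at all there is no limit to report.
    return min(limits) if limits else None


whole_gpus = max_concurrent_trials(32, 2, cpus_per_trial=8, gpus_per_trial=2)
half_gpus = max_concurrent_trials(32, 2, cpus_per_trial=8, gpus_per_trial=0.5)
cpu_only = max_concurrent_trials(32, 2, cpus_per_trial=2, gpus_per_trial=0)
print(whole_gpus, half_gpus, cpu_only)
```

Under these assumptions, requesting two full GPUs per trial allows one trial at a time, while gpus_per_trial=0.5 lets four trials share the two GPUs.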
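After training the models, we will find the best performing one and load the trained network from the checkpoint file. We then obtain the test set accuracy and report everything by printing.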
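Selecting the best trial by its last reported loss can be pictured with a small pure-Python sketch. The trial records and the helper best_trial_by_last_result are hypothetical; they only mimic the idea of picking the minimum loss among each trial's final result and are not Ray Tune's implementation.

```python
def best_trial_by_last_result(trials, metric="loss", mode="min"):
    """Pick the trial whose final reported metric is best (illustrative only)."""
    pick = min if mode == "min" else max
    return pick(trials, key=lambda trial: trial["results"][-1][metric])


# Made-up results for illustration; each trial reports once per epoch.
trials = [
    {"name": "trial_a", "results": [{"loss": 1.9}, {"loss": 1.8}]},
    {"name": "trial_b", "results": [{"loss": 1.5}, {"loss": 1.4}]},
    {"name": "trial_c", "results": [{"loss": 1.3}, {"loss": 2.3}]},
]

best = best_trial_by_last_result(trials)
print(best["name"], best["results"][-1]["loss"])
```

Note that trial_c reached the lowest loss at one point but ended worse; judging by the last result rewards trials that are still good when training stops.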
The full main function looks like this:
def main(num_samples=10, max_num_epochs=10, gpus_per_trial=2):
    data_dir = os.path.abspath("./data")
    load_data(data_dir)
    config = {
        "l1": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "l2": tune.sample_from(lambda _: 2 ** np.random.randint(2, 9)),
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([2, 4, 8, 16])
    }
    scheduler = ASHAScheduler(
        metric="loss",
        mode="min",
        max_t=max_num_epochs,
        grace_period=1,
        reduction_factor=2)
    reporter = CLIReporter(
        # parameter_columns=["l1", "l2", "lr", "batch_size"],
        metric_columns=["loss", "accuracy", "training_iteration"])
    result = tune.run(
        partial(train_cifar, data_dir=data_dir),
        resources_per_trial={"cpu": 2, "gpu": gpus_per_trial},
        config=config,
        num_samples=num_samples,
        scheduler=scheduler,
        progress_reporter=reporter)

    best_trial = result.get_best_trial("loss", "min", "last")
    print("Best trial config: {}".format(best_trial.config))
    print("Best trial final validation loss: {}".format(
        best_trial.last_result["loss"]))
    print("Best trial final validation accuracy: {}".format(
        best_trial.last_result["accuracy"]))

    best_trained_model = Net(best_trial.config["l1"], best_trial.config["l2"])
    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if gpus_per_trial > 1:
            best_trained_model = nn.DataParallel(best_trained_model)
    best_trained_model.to(device)

    best_checkpoint_dir = best_trial.checkpoint.value
    model_state, optimizer_state = torch.load(os.path.join(
        best_checkpoint_dir, "checkpoint"))
    best_trained_model.load_state_dict(model_state)

    test_acc = test_accuracy(best_trained_model, device)
    print("Best trial test set accuracy: {}".format(test_acc))


if __name__ == "__main__":
    # You can change the number of GPUs per trial here:
    main(num_samples=10, max_num_epochs=10, gpus_per_trial=0)
Output:
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz
Extracting /var/lib/jenkins/workspace/beginner_source/data/cifar-10-python.tar.gz to /var/lib/jenkins/workspace/beginner_source/data
Files already downloaded and verified
== Status ==
Memory usage on this node: 4.0/240.1 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (9 PENDING, 1 RUNNING)
+---------------------+----------+-------+--------------+------+------+-------------+
| Trial name          | status   | loc   |   batch_size |   l1 |   l2 |          lr |
|---------------------+----------+-------+--------------+------+------+-------------|
| DEFAULT_77a44_00000 | RUNNING  |       |            4 |    8 |  128 | 0.0210161   |
| DEFAULT_77a44_00001 | PENDING  |       |            2 |  256 |  128 | 0.000461678 |
| DEFAULT_77a44_00002 | PENDING  |       |            8 |   32 |   16 | 0.0131231   |
| DEFAULT_77a44_00003 | PENDING  |       |            4 |    4 |  128 | 0.00551547  |
| DEFAULT_77a44_00004 | PENDING  |       |            2 |  256 |  256 | 0.0647615   |
| DEFAULT_77a44_00005 | PENDING  |       |            4 |    4 |  128 | 0.0421917   |
| DEFAULT_77a44_00006 | PENDING  |       |            2 |    8 |    8 | 0.000359613 |
| DEFAULT_77a44_00007 | PENDING  |       |            4 |  128 |   16 | 0.00202898  |
| DEFAULT_77a44_00008 | PENDING  |       |            2 |    4 |    8 | 0.000162963 |
| DEFAULT_77a44_00009 | PENDING  |       |            2 |   32 |  256 | 0.000134494 |
+---------------------+----------+-------+--------------+------+------+-------------+
(pid=1164) Files already downloaded and verified
(pid=1145) Files already downloaded and verified
(pid=1104) Files already downloaded and verified
(pid=1119) Files already downloaded and verified
(pid=1140) Files already downloaded and verified
(pid=1118) Files already downloaded and verified
(pid=1098) Files already downloaded and verified
(pid=1101) Files already downloaded and verified
(pid=1165) Files already downloaded and verified
(pid=1126) Files already downloaded and verified
(pid=1164) Files already downloaded and verified
(pid=1098) Files already downloaded and verified
(pid=1145) Files already downloaded and verified
(pid=1104) Files already downloaded and verified
(pid=1119) Files already downloaded and verified
(pid=1140) Files already downloaded and verified
(pid=1118) Files already downloaded and verified
(pid=1101) Files already downloaded and verified
(pid=1165) Files already downloaded and verified
(pid=1126) Files already downloaded and verified
(pid=1126) [1,  2000] loss: 2.295
(pid=1101) [1,  2000] loss: 2.310
(pid=1165) [1,  2000] loss: 2.193
(pid=1119) [1,  2000] loss: 2.302
(pid=1145) [1,  2000] loss: 2.296
(pid=1118) [1,  2000] loss: 2.326
(pid=1104) [1,  2000] loss: 2.303
(pid=1098) [1,  2000] loss: 2.083
(pid=1164) [1,  2000] loss: 1.995
(pid=1140) [1,  2000] loss: 2.377
(pid=1126) [1,  4000] loss: 1.078
(pid=1101) [1,  4000] loss: 1.149
(pid=1119) [1,  4000] loss: 1.149
(pid=1165) [1,  4000] loss: 1.020
(pid=1118) [1,  4000] loss: 1.161
(pid=1104) [1,  4000] loss: 1.157
(pid=1145) [1,  4000] loss: 1.052
(pid=1098) [1,  4000] loss: 0.883
(pid=1164) [1,  4000] loss: 0.927
(pid=1140) [1,  4000] loss: 1.186
(pid=1126) [1,  6000] loss: 0.684
(pid=1101) [1,  6000] loss: 0.760
(pid=1119) [1,  6000] loss: 0.758
(pid=1165) [1,  6000] loss: 0.660
(pid=1118) [1,  6000] loss: 0.775
(pid=1104) [1,  6000] loss: 0.770
(pid=1145) [1,  6000] loss: 0.624
(pid=1098) [1,  6000] loss:
0.542 Result for DEFAULT_77a44_00002: accuracy: 0.2841 date: 2020-10-09_19-56-48 done: false experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432 experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 1.881975656604767 node_ip: 172.17.0.2 pid: 1164 should_checkpoint: true time_since_restore: 41.3854501247406 time_this_iter_s: 41.3854501247406 time_total_s: 41.3854501247406 timestamp: 1602273408 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00002 == Status == Memory usage on this node: 8.8/240.1 GiB Using AsyncHyperBand: num_stopped=0 Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -1.881975656604767 Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (10 RUNNING) +---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | RUNNING | | 4 | 8 | 128 | 0.0210161 | | | | | DEFAULT_77a44_00001 | RUNNING | | 2 | 256 | 128 | 0.000461678 | | | | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.88198 | 0.2841 | 1 | | DEFAULT_77a44_00003 | RUNNING | | 4 | 4 | 128 | 0.00551547 | | | | | DEFAULT_77a44_00004 | RUNNING | | 2 | 256 | 256 | 0.0647615 | | | | | DEFAULT_77a44_00005 | RUNNING | | 4 | 4 | 128 | 0.0421917 | | | | | DEFAULT_77a44_00006 | RUNNING | | 2 | 8 | 8 | 0.000359613 | | | | | DEFAULT_77a44_00007 | RUNNING | | 4 | 128 | 16 | 0.00202898 | | | | | DEFAULT_77a44_00008 | RUNNING | | 2 | 4 | 8 | 0.000162963 | | | | | DEFAULT_77a44_00009 | RUNNING | | 2 | 32 | 256 | 0.000134494 | | | | 
+---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1126)[0m [1, 8000] loss: 0.499 [2m[36m(pid=1101)[0m [1, 8000] loss: 0.559 [2m[36m(pid=1119)[0m [1, 8000] loss: 0.552 [2m[36m(pid=1165)[0m [1, 8000] loss: 0.488 [2m[36m(pid=1118)[0m [1, 8000] loss: 0.581 [2m[36m(pid=1104)[0m [1, 8000] loss: 0.579 [2m[36m(pid=1145)[0m [1, 8000] loss: 0.448 [2m[36m(pid=1098)[0m [1, 8000] loss: 0.389 [2m[36m(pid=1140)[0m [1, 6000] loss: 0.793 [2m[36m(pid=1164)[0m [2, 2000] loss: 1.870 [2m[36m(pid=1101)[0m [1, 10000] loss: 0.435 [2m[36m(pid=1126)[0m [1, 10000] loss: 0.386 [2m[36m(pid=1119)[0m [1, 10000] loss: 0.427 [2m[36m(pid=1165)[0m [1, 10000] loss: 0.390 [2m[36m(pid=1118)[0m [1, 10000] loss: 0.465 [2m[36m(pid=1104)[0m [1, 10000] loss: 0.462 [2m[36m(pid=1145)[0m [1, 10000] loss: 0.341 [2m[36m(pid=1098)[0m [1, 10000] loss: 0.302 [2m[36m(pid=1101)[0m [1, 12000] loss: 0.353 [2m[36m(pid=1126)[0m [1, 12000] loss: 0.311 [2m[36m(pid=1119)[0m [1, 12000] loss: 0.345 [2m[36m(pid=1164)[0m [2, 4000] loss: 0.938 [2m[36m(pid=1140)[0m [1, 8000] loss: 0.594 Result for DEFAULT_77a44_00003: accuracy: 0.2563 date: 2020-10-09_19-57-13 done: true experiment_id: 5c01db6fb7974f6087f128418068ab25 experiment_tag: 3_batch_size=4,l1=4,l2=128,lr=0.0055155 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 1.9565512576580049 node_ip: 172.17.0.2 pid: 1165 should_checkpoint: true time_since_restore: 65.84106469154358 time_this_iter_s: 65.84106469154358 time_total_s: 65.84106469154358 timestamp: 1602273433 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00003 == Status == Memory usage on this node: 8.9/240.1 GiB Using AsyncHyperBand: num_stopped=1 Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: -1.919263457131386 Resources requested: 20/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of 
trials: 10 (10 RUNNING) +---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | RUNNING | | 4 | 8 | 128 | 0.0210161 | | | | | DEFAULT_77a44_00001 | RUNNING | | 2 | 256 | 128 | 0.000461678 | | | | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.88198 | 0.2841 | 1 | | DEFAULT_77a44_00003 | RUNNING | 172.17.0.2:1165 | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | RUNNING | | 2 | 256 | 256 | 0.0647615 | | | | | DEFAULT_77a44_00005 | RUNNING | | 4 | 4 | 128 | 0.0421917 | | | | | DEFAULT_77a44_00006 | RUNNING | | 2 | 8 | 8 | 0.000359613 | | | | | DEFAULT_77a44_00007 | RUNNING | | 4 | 128 | 16 | 0.00202898 | | | | | DEFAULT_77a44_00008 | RUNNING | | 2 | 4 | 8 | 0.000162963 | | | | | DEFAULT_77a44_00009 | RUNNING | | 2 | 32 | 256 | 0.000134494 | | | | +---------------------+----------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ Result for DEFAULT_77a44_00005: accuracy: 0.0986 date: 2020-10-09_19-57-13 done: true experiment_id: 8d41531f8ac84a2fa81eb0d04bb4809a experiment_tag: 5_batch_size=4,l1=4,l2=128,lr=0.042192 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 2.3523551787376404 node_ip: 172.17.0.2 pid: 1118 should_checkpoint: true time_since_restore: 66.13440608978271 time_this_iter_s: 66.13440608978271 time_total_s: 66.13440608978271 timestamp: 1602273433 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00005 Result for DEFAULT_77a44_00000: accuracy: 0.1073 date: 2020-10-09_19-57-13 done: true experiment_id: 71350ebb3b9b4c2ca892c43094b6e672 experiment_tag: 
0_batch_size=4,l1=8,l2=128,lr=0.021016 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 2.306087596511841 node_ip: 172.17.0.2 pid: 1104 should_checkpoint: true time_since_restore: 66.43020415306091 time_this_iter_s: 66.43020415306091 time_total_s: 66.43020415306091 timestamp: 1602273433 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00000 Result for DEFAULT_77a44_00007: accuracy: 0.4484 date: 2020-10-09_19-57-14 done: false experiment_id: 1e0a3b1304eb470898956b381db607e6 experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 1.505290996646881 node_ip: 172.17.0.2 pid: 1098 should_checkpoint: true time_since_restore: 67.45768523216248 time_this_iter_s: 67.45768523216248 time_total_s: 67.45768523216248 timestamp: 1602273434 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00007 [2m[36m(pid=1145)[0m [1, 12000] loss: 0.270 [2m[36m(pid=1126)[0m [1, 14000] loss: 0.260 [2m[36m(pid=1101)[0m [1, 14000] loss: 0.301 [2m[36m(pid=1119)[0m [1, 14000] loss: 0.288 Result for DEFAULT_77a44_00002: accuracy: 0.2704 date: 2020-10-09_19-57-21 done: false experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432 experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123 hostname: 234fef3cc6b0 iterations_since_restore: 2 loss: 1.9036258604049683 node_ip: 172.17.0.2 pid: 1164 should_checkpoint: true time_since_restore: 74.83478355407715 time_this_iter_s: 33.44933342933655 time_total_s: 74.83478355407715 timestamp: 1602273441 timesteps_since_restore: 0 training_iteration: 2 trial_id: 77a44_00002 == Status == Memory usage on this node: 7.3/240.1 GiB Using AsyncHyperBand: num_stopped=3 Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.9565512576580049 Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (7 RUNNING, 3 TERMINATED) 
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | | 2 | 256 | 128 | 0.000461678 | | | | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.90363 | 0.2704 | 2 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | RUNNING | | 2 | 256 | 256 | 0.0647615 | | | | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | | 2 | 8 | 8 | 0.000359613 | | | | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.50529 | 0.4484 | 1 | | DEFAULT_77a44_00008 | RUNNING | | 2 | 4 | 8 | 0.000162963 | | | | | DEFAULT_77a44_00009 | RUNNING | | 2 | 32 | 256 | 0.000134494 | | | | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1098)[0m [2, 2000] loss: 1.427 [2m[36m(pid=1145)[0m [1, 14000] loss: 0.227 [2m[36m(pid=1140)[0m [1, 10000] loss: 0.476 [2m[36m(pid=1101)[0m [1, 16000] loss: 0.260 [2m[36m(pid=1126)[0m [1, 16000] loss: 0.223 [2m[36m(pid=1119)[0m [1, 16000] loss: 0.245 [2m[36m(pid=1164)[0m [3, 2000] loss: 1.876 [2m[36m(pid=1098)[0m [2, 4000] loss: 0.711 [2m[36m(pid=1145)[0m [1, 16000] loss: 0.196 [2m[36m(pid=1101)[0m [1, 18000] loss: 0.226 [2m[36m(pid=1126)[0m [1, 18000] loss: 0.194 [2m[36m(pid=1119)[0m [1, 18000] loss: 0.216 [2m[36m(pid=1140)[0m [1, 12000] loss: 0.396 [2m[36m(pid=1098)[0m [2, 6000] loss: 0.462 [2m[36m(pid=1164)[0m [3, 4000] loss: 
0.927 [2m[36m(pid=1126)[0m [1, 20000] loss: 0.171 [2m[36m(pid=1101)[0m [1, 20000] loss: 0.200 [2m[36m(pid=1145)[0m [1, 18000] loss: 0.170 [2m[36m(pid=1119)[0m [1, 20000] loss: 0.188 [2m[36m(pid=1098)[0m [2, 8000] loss: 0.345 Result for DEFAULT_77a44_00002: accuracy: 0.3206 date: 2020-10-09_19-57-52 done: false experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432 experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123 hostname: 234fef3cc6b0 iterations_since_restore: 3 loss: 1.9260577551841735 node_ip: 172.17.0.2 pid: 1164 should_checkpoint: true time_since_restore: 105.59961199760437 time_this_iter_s: 30.76482844352722 time_total_s: 105.59961199760437 timestamp: 1602273472 timesteps_since_restore: 0 training_iteration: 3 trial_id: 77a44_00002 == Status == Memory usage on this node: 7.3/240.1 GiB Using AsyncHyperBand: num_stopped=3 Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.9565512576580049 Resources requested: 14/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (7 RUNNING, 3 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | | 2 | 256 | 128 | 0.000461678 | | | | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.92606 | 0.3206 | 3 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | RUNNING | | 2 | 256 | 256 | 0.0647615 | | | | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 
0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | | 2 | 8 | 8 | 0.000359613 | | | | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.50529 | 0.4484 | 1 | | DEFAULT_77a44_00008 | RUNNING | | 2 | 4 | 8 | 0.000162963 | | | | | DEFAULT_77a44_00009 | RUNNING | | 2 | 32 | 256 | 0.000134494 | | | | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1145)[0m [1, 20000] loss: 0.148 [2m[36m(pid=1140)[0m [1, 14000] loss: 0.339 Result for DEFAULT_77a44_00008: accuracy: 0.1883 date: 2020-10-09_19-57-56 done: true experiment_id: 528c728f0abd4dde8df53627aa7b3cc9 experiment_tag: 8_batch_size=2,l1=4,l2=8,lr=0.00016296 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 1.984449322938919 node_ip: 172.17.0.2 pid: 1101 should_checkpoint: true time_since_restore: 109.06154918670654 time_this_iter_s: 109.06154918670654 time_total_s: 109.06154918670654 timestamp: 1602273476 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00008 Result for DEFAULT_77a44_00006: accuracy: 0.3722 date: 2020-10-09_19-57-56 done: false experiment_id: 696157fc029f42e781f0779431a5902f experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 1.6620629720330238 node_ip: 172.17.0.2 pid: 1126 should_checkpoint: true time_since_restore: 109.24619793891907 time_this_iter_s: 109.24619793891907 time_total_s: 109.24619793891907 timestamp: 1602273476 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00006 Result for DEFAULT_77a44_00009: accuracy: 0.3066 date: 2020-10-09_19-57-58 done: false experiment_id: 448a03d8183b48e4a732b9974760de96 experiment_tag: 9_batch_size=2,l1=32,l2=256,lr=0.00013449 hostname: 234fef3cc6b0 iterations_since_restore: 1 loss: 1.8606878761410712 node_ip: 172.17.0.2 pid: 1119 should_checkpoint: true time_since_restore: 111.55251812934875 time_this_iter_s: 
111.55251812934875 time_total_s: 111.55251812934875 timestamp: 1602273478 timesteps_since_restore: 0 training_iteration: 1 trial_id: 77a44_00009 == Status == Memory usage on this node: 6.8/240.1 GiB Using AsyncHyperBand: num_stopped=4 Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.919263457131386 Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (6 RUNNING, 4 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | | 2 | 256 | 128 | 0.000461678 | | | | | DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.92606 | 0.3206 | 3 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | RUNNING | | 2 | 256 | 256 | 0.0647615 | | | | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.50529 | 0.4484 | 1 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1098)[0m [2, 10000] 
loss: 0.275
(pid=1164) [4, 2000] loss: 1.842
(pid=1126) [2, 2000] loss: 1.660
Result for DEFAULT_77a44_00001:
  accuracy: 0.4374
  date: 2020-10-09_19-58-05
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 1.5289554242562502
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 118.45757269859314
  time_this_iter_s: 118.45757269859314
  time_total_s: 118.45757269859314
  timestamp: 1602273485
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00001
== Status ==
Memory usage on this node: 6.8/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: -1.9036258604049683 | Iter 1.000: -1.881975656604767
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.92606 | 0.3206 | 3 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | RUNNING | | 2 | 256 | 256 | 0.0647615 | | | |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.50529 | 0.4484 | 1 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1119) [2, 2000] loss: 1.796
Result for DEFAULT_77a44_00007:
  accuracy: 0.5087
  date: 2020-10-09_19-58-08
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 2
  loss: 1.3934748243197799
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 121.18621754646301
  time_this_iter_s: 53.72853231430054
  time_total_s: 121.18621754646301
  timestamp: 1602273488
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: 77a44_00007
(pid=1140) [1, 16000] loss: 0.298
(pid=1126) [2, 4000] loss: 0.801
(pid=1164) [4, 4000] loss: 0.914
(pid=1145) [2, 2000] loss: 1.454
(pid=1119) [2, 4000] loss: 0.886
(pid=1098) [3, 2000] loss: 1.292
(pid=1126) [2, 6000] loss: 0.528
(pid=1140) [1, 18000] loss: 0.264
Result for DEFAULT_77a44_00002:
  accuracy: 0.3437
  date: 2020-10-09_19-58-23
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 4
  loss: 1.8035019870758056
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 136.0801386833191
  time_this_iter_s: 30.48052668571472
  time_total_s: 136.0801386833191
  timestamp: 1602273503
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 6.8/240.1 GiB
Using AsyncHyperBand: num_stopped=4
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.881975656604767
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.8035 | 0.3437 | 4 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | RUNNING | | 2 | 256 | 256 | 0.0647615 | | | |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.39347 | 0.5087 | 2 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1145) [2, 4000] loss: 0.730
(pid=1119) [2, 6000] loss: 0.570
(pid=1098) [3, 4000] loss: 0.647
(pid=1126) [2, 8000] loss: 0.389
(pid=1119) [2, 8000] loss: 0.417
(pid=1145) [2, 6000] loss: 0.476
(pid=1164) [5, 2000] loss: 1.852
(pid=1098) [3, 6000] loss: 0.428
(pid=1126) [2, 10000] loss: 0.306
(pid=1140) [1, 20000] loss: 0.237
(pid=1119) [2, 10000] loss: 0.326
(pid=1145) [2, 8000] loss: 0.349
(pid=1126) [2, 12000] loss: 0.255
(pid=1164) [5, 4000] loss: 0.934
(pid=1098) [3, 8000] loss: 0.325
Result for DEFAULT_77a44_00004:
  accuracy: 0.1024
  date: 2020-10-09_19-58-49
  done: true
  experiment_id: 2ca91983c1654f39a11db9cdd1e47f10
  experiment_tag: 4_batch_size=2,l1=256,l2=256,lr=0.064762
  hostname: 234fef3cc6b0
  iterations_since_restore: 1
  loss: 2.346003741002083
  node_ip: 172.17.0.2
  pid: 1140
  should_checkpoint: true
  time_since_restore: 161.9359531402588
  time_this_iter_s: 161.9359531402588
  time_total_s: 161.9359531402588
  timestamp: 1602273529
  timesteps_since_restore: 0
  training_iteration: 1
  trial_id: 77a44_00004
== Status ==
Memory usage on this node: 6.8/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.919263457131386
Resources requested: 12/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (6 RUNNING, 4 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.8035 | 0.3437 | 4 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | RUNNING | 172.17.0.2:1140 | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.39347 | 0.5087 | 2 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1119) [2, 12000] loss: 0.271
(pid=1145) [2, 10000] loss: 0.276
(pid=1126) [2, 14000] loss: 0.213
Result for DEFAULT_77a44_00002:
  accuracy: 0.3035
  date: 2020-10-09_19-58-53
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 5
  loss: 1.8839821341514587
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 166.10145020484924
  time_this_iter_s: 30.02131152153015
  time_total_s: 166.10145020484924
  timestamp: 1602273533
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: 77a44_00002
(pid=1098) [3, 10000] loss: 0.254
(pid=1119) [2, 14000] loss: 0.228
(pid=1145) [2, 12000] loss: 0.230
(pid=1126) [2, 16000] loss: 0.187
Result for DEFAULT_77a44_00007:
  accuracy: 0.5319
  date: 2020-10-09_19-59-00
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 3
  loss: 1.3139552696928383
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 173.1586651802063
  time_this_iter_s: 51.972447633743286
  time_total_s: 173.1586651802063
  timestamp: 1602273540
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: 77a44_00007
== Status ==
Memory usage on this node: 6.3/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.919263457131386
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.88398 | 0.3035 | 5 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31396 | 0.5319 | 3 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1164) [6, 2000] loss: 1.907
(pid=1119) [2, 16000] loss: 0.198
(pid=1145) [2, 14000] loss: 0.192
(pid=1126) [2, 18000] loss: 0.166
(pid=1098) [4, 2000] loss: 1.200
(pid=1119) [2, 18000] loss: 0.177
(pid=1164) [6, 4000] loss: 0.960
(pid=1126) [2, 20000] loss: 0.148
(pid=1145) [2, 16000] loss: 0.164
(pid=1098) [4, 4000] loss: 0.599
(pid=1119) [2, 20000] loss: 0.152
Result for DEFAULT_77a44_00002:
  accuracy: 0.2862
  date: 2020-10-09_19-59-22
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 6
  loss: 1.9193087907791138
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 195.79263925552368
  time_this_iter_s: 29.69118905067444
  time_total_s: 195.79263925552368
  timestamp: 1602273562
  timesteps_since_restore: 0
  training_iteration: 6
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 6.3/240.1 GiB
Using AsyncHyperBand: num_stopped=5
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.6485503423623742 | Iter 1.000: -1.919263457131386
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.91931 | 0.2862 | 6 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.66206 | 0.3722 | 1 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31396 | 0.5319 | 3 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.86069 | 0.3066 | 1 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1145) [2, 18000] loss: 0.147
Result for DEFAULT_77a44_00006:
  accuracy: 0.4589
  date: 2020-10-09_19-59-27
  done: false
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore: 2
  loss: 1.448237135411054
  node_ip: 172.17.0.2
  pid: 1126
  should_checkpoint: true
  time_since_restore: 199.99908256530762
  time_this_iter_s: 90.75288462638855
  time_total_s: 199.99908256530762
  timestamp: 1602273567
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: 77a44_00006
(pid=1098) [4, 6000] loss: 0.403
Result for DEFAULT_77a44_00009:
  accuracy: 0.4358
  date: 2020-10-09_19-59-33
  done: true
  experiment_id: 448a03d8183b48e4a732b9974760de96
  experiment_tag: 9_batch_size=2,l1=32,l2=256,lr=0.00013449
  hostname: 234fef3cc6b0
  iterations_since_restore: 2
  loss: 1.5461469007849693
  node_ip: 172.17.0.2
  pid: 1119
  should_checkpoint: true
  time_since_restore: 206.13924598693848
  time_this_iter_s: 94.58672785758972
  time_total_s: 206.13924598693848
  timestamp: 1602273573
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: 77a44_00009
== Status ==
Memory usage on this node: 6.3/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.4971920180980116 | Iter 1.000: -1.919263457131386
Resources requested: 10/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (5 RUNNING, 5 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.52896 | 0.4374 | 1 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.91931 | 0.2862 | 6 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.44824 | 0.4589 | 2 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31396 | 0.5319 | 3 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | RUNNING | 172.17.0.2:1119 | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1145) [2, 20000] loss: 0.130
(pid=1164) [7, 2000] loss: 1.967
(pid=1126) [3, 2000] loss: 1.454
(pid=1098) [4, 8000] loss: 0.310
(pid=1126) [3, 4000] loss: 0.715
(pid=1164) [7, 4000] loss: 0.997
(pid=1098) [4, 10000] loss: 0.248
Result for DEFAULT_77a44_00001:
  accuracy: 0.5459
  date: 2020-10-09_19-59-44
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 2
  loss: 1.2801105223743245
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 217.948983669281
  time_this_iter_s: 99.49141097068787
  time_total_s: 217.948983669281
  timestamp: 1602273584
  timesteps_since_restore: 0
  training_iteration: 2
  trial_id: 77a44_00001
== Status ==
Memory usage on this node: 5.8/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: None | Iter 4.000: -1.8035019870758056 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.28011 | 0.5459 | 2 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.91931 | 0.2862 | 6 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.44824 | 0.4589 | 2 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31396 | 0.5319 | 3 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1126) [3, 6000] loss: 0.488
Result for DEFAULT_77a44_00007:
  accuracy: 0.5309
  date: 2020-10-09_19-59-50
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 4
  loss: 1.3358730784237385
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 223.8010766506195
  time_this_iter_s: 50.64241147041321
  time_total_s: 223.8010766506195
  timestamp: 1602273590
  timesteps_since_restore: 0
  training_iteration: 4
  trial_id: 77a44_00007
== Status ==
Memory usage on this node: 5.8/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: None | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.28011 | 0.5459 | 2 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 1.91931 | 0.2862 | 6 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.44824 | 0.4589 | 2 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.33587 | 0.5309 | 4 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_77a44_00002:
  accuracy: 0.2505
  date: 2020-10-09_19-59-52
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 7
  loss: 2.00418664560318
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 225.23884892463684
  time_this_iter_s: 29.44620966911316
  time_total_s: 225.23884892463684
  timestamp: 1602273592
  timesteps_since_restore: 0
  training_iteration: 7
  trial_id: 77a44_00002
(pid=1145) [3, 2000] loss: 1.219
(pid=1126) [3, 8000] loss: 0.356
(pid=1098) [5, 2000] loss: 1.144
(pid=1145) [3, 4000] loss: 0.632
(pid=1164) [8, 2000] loss: 1.980
(pid=1126) [3, 10000] loss: 0.283
(pid=1098) [5, 4000] loss: 0.566
(pid=1145) [3, 6000] loss: 0.410
(pid=1164) [8, 4000] loss: 1.014
(pid=1126) [3, 12000] loss: 0.236
(pid=1098) [5, 6000] loss: 0.390
(pid=1145) [3, 8000] loss: 0.304
(pid=1126) [3, 14000] loss: 0.198
Result for DEFAULT_77a44_00002:
  accuracy: 0.2253
  date: 2020-10-09_20-00-21
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 8
  loss: 2.1314156931877135
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 254.41000890731812
  time_this_iter_s: 29.171159982681274
  time_total_s: 254.41000890731812
  timestamp: 1602273621
  timesteps_since_restore: 0
  training_iteration: 8
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.28011 | 0.5459 | 2 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 2.13142 | 0.2253 | 8 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.44824 | 0.4589 | 2 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.33587 | 0.5309 | 4 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1098) [5, 8000] loss: 0.297
(pid=1145) [3, 10000] loss: 0.245
(pid=1126) [3, 16000] loss: 0.173
(pid=1164) [9, 2000] loss: 2.112
(pid=1098) [5, 10000] loss: 0.235
(pid=1145) [3, 12000] loss: 0.203
(pid=1126) [3, 18000] loss: 0.154
Result for DEFAULT_77a44_00007:
  accuracy: 0.5628
  date: 2020-10-09_20-00-40
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 5
  loss: 1.2729537689715624
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 273.7186484336853
  time_this_iter_s: 49.917571783065796
  time_total_s: 273.7186484336853
  timestamp: 1602273640
  timesteps_since_restore: 0
  training_iteration: 5
  trial_id: 77a44_00007
== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.28011 | 0.5459 | 2 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 2.13142 | 0.2253 | 8 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.44824 | 0.4589 | 2 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.27295 | 0.5628 | 5 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1164) [9, 4000] loss: 1.053
(pid=1126) [3, 20000] loss: 0.141
(pid=1145) [3, 14000] loss: 0.170
(pid=1098) [6, 2000] loss: 1.095
Result for DEFAULT_77a44_00002:
  accuracy: 0.17
  date: 2020-10-09_20-00-51
  done: false
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 9
  loss: 2.1584741218566896
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 284.08941316604614
  time_this_iter_s: 29.679404258728027
  time_total_s: 284.08941316604614
  timestamp: 1602273651
  timesteps_since_restore: 0
  training_iteration: 9
  trial_id: 77a44_00002
== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.28011 | 0.5459 | 2 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 2.15847 | 0.17 | 9 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.44824 | 0.4589 | 2 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.27295 | 0.5628 | 5 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1145) [3, 16000] loss: 0.149
Result for DEFAULT_77a44_00006:
  accuracy: 0.4727
  date: 2020-10-09_20-00-55
  done: false
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore: 3
  loss: 1.4226891365654766
  node_ip: 172.17.0.2
  pid: 1126
  should_checkpoint: true
  time_since_restore: 287.9995017051697
  time_this_iter_s: 88.00041913986206
  time_total_s: 287.9995017051697
  timestamp: 1602273655
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: 77a44_00006
(pid=1098) [6, 4000] loss: 0.556
(pid=1145) [3, 18000] loss: 0.136
(pid=1164) [10, 2000] loss: 2.212
(pid=1126) [4, 2000] loss: 1.392
(pid=1098) [6, 6000] loss: 0.376
(pid=1145) [3, 20000] loss: 0.114
(pid=1126) [4, 4000] loss: 0.679
(pid=1164) [10, 4000] loss: 1.133
(pid=1098) [6, 8000] loss: 0.279
(pid=1126) [4, 6000] loss: 0.458
Result for DEFAULT_77a44_00001:
  accuracy: 0.5798
  date: 2020-10-09_20-01-21
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 3
  loss: 1.1820625860116911
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 314.0342721939087
  time_this_iter_s: 96.08528852462769
  time_total_s: 314.0342721939087
  timestamp: 1602273681
  timesteps_since_restore: 0
  training_iteration: 3
  trial_id: 77a44_00001
== Status ==
Memory usage on this node: 5.7/240.1 GiB
Using AsyncHyperBand: num_stopped=6
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 8/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (4 RUNNING, 6 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.18206 | 0.5798 | 3 |
| DEFAULT_77a44_00002 | RUNNING | 172.17.0.2:1164 | 8 | 32 | 16 | 0.0131231 | 2.15847 | 0.17 | 9 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.42269 | 0.4727 | 3 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.27295 | 0.5628 | 5 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_77a44_00002:
  accuracy: 0.1292
  date: 2020-10-09_20-01-21
  done: true
  experiment_id: 2cf1c1fc6eaf4ed5961e07d3ec779432
  experiment_tag: 2_batch_size=8,l1=32,l2=16,lr=0.013123
  hostname: 234fef3cc6b0
  iterations_since_restore: 10
  loss: 2.2377114813804626
  node_ip: 172.17.0.2
  pid: 1164
  should_checkpoint: true
  time_since_restore: 314.6153542995453
  time_this_iter_s: 30.525941133499146
  time_total_s: 314.6153542995453
  timestamp: 1602273681
  timesteps_since_restore: 0
  training_iteration: 10
  trial_id: 77a44_00002
(pid=1098) [6, 10000] loss: 0.232
(pid=1126) [4, 8000] loss: 0.342
(pid=1145) [4, 2000] loss: 1.100
Result for DEFAULT_77a44_00007:
  accuracy: 0.5459
  date: 2020-10-09_20-01-30
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 6
  loss: 1.3732997598737477
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 323.68818259239197
  time_this_iter_s: 49.969534158706665
  time_total_s: 323.68818259239197
  timestamp: 1602273690
  timesteps_since_restore: 0
  training_iteration: 6
  trial_id: 77a44_00007
== Status ==
Memory usage on this node: 5.2/240.1 GiB
Using AsyncHyperBand: num_stopped=7
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (3 RUNNING, 7 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.18206 | 0.5798 | 3 |
| DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.42269 | 0.4727 | 3 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.3733 | 0.5459 | 6 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
(pid=1126) [4, 10000] loss: 0.271
(pid=1145) [4, 4000] loss: 0.556
(pid=1098) [7, 2000] loss: 1.034
(pid=1126) [4, 12000] loss: 0.229
(pid=1145) [4, 6000] loss: 0.364
(pid=1126) [4, 14000] loss: 0.196
(pid=1098) [7, 4000] loss: 0.541
(pid=1145) [4, 8000] loss: 0.274
(pid=1126) [4, 16000] loss: 0.169
(pid=1098) [7, 6000] loss: 0.368
(pid=1145) [4, 10000] loss: 0.215
(pid=1126) [4, 18000] loss: 0.150
(pid=1098) [7, 8000] loss: 0.273
(pid=1126) [4, 20000] loss: 0.135
(pid=1145) [4, 12000] loss: 0.182
(pid=1098) [7, 10000] loss: 0.217
(pid=1145) [4, 14000] loss: 0.158
Result for DEFAULT_77a44_00007:
  accuracy: 0.576
  date: 2020-10-09_20-02-19
  done: false
  experiment_id: 1e0a3b1304eb470898956b381db607e6
  experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029
  hostname: 234fef3cc6b0
  iterations_since_restore: 7
  loss: 1.24756854121387
  node_ip: 172.17.0.2
  pid: 1098
  should_checkpoint: true
  time_since_restore: 372.3224792480469
  time_this_iter_s: 48.63429665565491
  time_total_s: 372.3224792480469
  timestamp: 1602273739
  timesteps_since_restore: 0
  training_iteration: 7
  trial_id: 77a44_00007
== Status ==
Memory usage on this node: 5.1/240.1 GiB
Using AsyncHyperBand: num_stopped=7
Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.569687532749772 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (3 RUNNING, 7 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 |
| DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.18206 | 0.5798 | 3 |
| DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 |
| DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 |
| DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 |
| DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 |
| DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.42269 | 0.4727 | 3 |
| DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.24757 | 0.576 | 7 |
| DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 |
| DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
Result for DEFAULT_77a44_00006:
  accuracy: 0.4961
  date: 2020-10-09_20-02-20
  done: false
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore:
4 loss: 1.3667119354642927 node_ip: 172.17.0.2 pid: 1126 should_checkpoint: true time_since_restore: 373.32873916625977 time_this_iter_s: 85.32923746109009 time_total_s: 373.32873916625977 timestamp: 1602273740 timesteps_since_restore: 0 training_iteration: 4 trial_id: 77a44_00006 [2m[36m(pid=1145)[0m [4, 16000] loss: 0.134 [2m[36m(pid=1126)[0m [5, 2000] loss: 1.317 [2m[36m(pid=1098)[0m [8, 2000] loss: 1.013 [2m[36m(pid=1145)[0m [4, 18000] loss: 0.120 [2m[36m(pid=1126)[0m [5, 4000] loss: 0.660 [2m[36m(pid=1098)[0m [8, 4000] loss: 0.521 [2m[36m(pid=1126)[0m [5, 6000] loss: 0.438 [2m[36m(pid=1145)[0m [4, 20000] loss: 0.108 [2m[36m(pid=1098)[0m [8, 6000] loss: 0.350 [2m[36m(pid=1126)[0m [5, 8000] loss: 0.331 [2m[36m(pid=1098)[0m [8, 8000] loss: 0.267 Result for DEFAULT_77a44_00001: accuracy: 0.6009 date: 2020-10-09_20-02-54 done: false experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6 experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168 hostname: 234fef3cc6b0 iterations_since_restore: 4 loss: 1.1593985119301593 node_ip: 172.17.0.2 pid: 1145 should_checkpoint: true time_since_restore: 407.62501096725464 time_this_iter_s: 93.59073877334595 time_total_s: 407.62501096725464 timestamp: 1602273774 timesteps_since_restore: 0 training_iteration: 4 trial_id: 77a44_00001 == Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -2.1314156931877135 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | 
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.36671 | 0.4961 | 4 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.24757 | 0.576 | 7 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1126)[0m [5, 10000] loss: 0.271 [2m[36m(pid=1098)[0m [8, 10000] loss: 0.218 [2m[36m(pid=1145)[0m [5, 2000] loss: 0.967 [2m[36m(pid=1126)[0m [5, 12000] loss: 0.221 Result for DEFAULT_77a44_00007: accuracy: 0.5664 date: 2020-10-09_20-03-08 done: false experiment_id: 1e0a3b1304eb470898956b381db607e6 experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029 hostname: 234fef3cc6b0 iterations_since_restore: 8 loss: 1.3161735702279955 node_ip: 172.17.0.2 pid: 1098 should_checkpoint: true time_since_restore: 421.1367325782776 time_this_iter_s: 48.81425333023071 time_total_s: 421.1367325782776 timestamp: 1602273788 timesteps_since_restore: 0 training_iteration: 8 trial_id: 77a44_00007 == Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: 
num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.36671 | 0.4961 | 4 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31617 | 0.5664 | 8 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1145)[0m [5, 4000] loss: 0.496 [2m[36m(pid=1126)[0m [5, 14000] loss: 0.186 [2m[36m(pid=1098)[0m [9, 2000] loss: 0.986 [2m[36m(pid=1145)[0m [5, 6000] loss: 0.332 [2m[36m(pid=1126)[0m [5, 
16000] loss: 0.164 [2m[36m(pid=1098)[0m [9, 4000] loss: 0.503 [2m[36m(pid=1126)[0m [5, 18000] loss: 0.144 [2m[36m(pid=1145)[0m [5, 8000] loss: 0.243 [2m[36m(pid=1098)[0m [9, 6000] loss: 0.342 [2m[36m(pid=1126)[0m [5, 20000] loss: 0.129 [2m[36m(pid=1145)[0m [5, 10000] loss: 0.204 [2m[36m(pid=1098)[0m [9, 8000] loss: 0.266 [2m[36m(pid=1145)[0m [5, 12000] loss: 0.167 Result for DEFAULT_77a44_00006: accuracy: 0.5285 date: 2020-10-09_20-03-45 done: false experiment_id: 696157fc029f42e781f0779431a5902f experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961 hostname: 234fef3cc6b0 iterations_since_restore: 5 loss: 1.2945664445526899 node_ip: 172.17.0.2 pid: 1126 should_checkpoint: true time_since_restore: 458.353075504303 time_this_iter_s: 85.02433633804321 time_total_s: 458.353075504303 timestamp: 1602273825 timesteps_since_restore: 0 training_iteration: 5 trial_id: 77a44_00006 == Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED 
| | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.31617 | 0.5664 | 8 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1098)[0m [9, 10000] loss: 0.213 [2m[36m(pid=1145)[0m [5, 14000] loss: 0.144 [2m[36m(pid=1126)[0m [6, 2000] loss: 1.270 Result for DEFAULT_77a44_00007: accuracy: 0.5803 date: 2020-10-09_20-03-56 done: false experiment_id: 1e0a3b1304eb470898956b381db607e6 experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029 hostname: 234fef3cc6b0 iterations_since_restore: 9 loss: 1.3147958470012993 node_ip: 172.17.0.2 pid: 1098 should_checkpoint: true time_since_restore: 469.72292470932007 time_this_iter_s: 48.58619213104248 time_total_s: 469.72292470932007 timestamp: 1602273836 timesteps_since_restore: 0 training_iteration: 9 trial_id: 77a44_00007 == Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | 
batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.1594 | 0.6009 | 4 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.3148 | 0.5803 | 9 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1126)[0m [6, 4000] loss: 0.624 [2m[36m(pid=1145)[0m [5, 16000] loss: 0.127 [2m[36m(pid=1098)[0m [10, 2000] loss: 0.949 [2m[36m(pid=1126)[0m [6, 6000] loss: 0.430 [2m[36m(pid=1145)[0m [5, 18000] loss: 0.112 [2m[36m(pid=1098)[0m [10, 4000] loss: 0.502 [2m[36m(pid=1126)[0m [6, 8000] loss: 0.323 [2m[36m(pid=1145)[0m [5, 20000] loss: 0.099 [2m[36m(pid=1098)[0m [10, 6000] loss: 0.346 [2m[36m(pid=1126)[0m [6, 10000] loss: 0.258 Result for DEFAULT_77a44_00001: accuracy: 0.6221 date: 2020-10-09_20-04-28 done: false experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6 experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168 hostname: 234fef3cc6b0 iterations_since_restore: 5 loss: 1.0875221006242093 node_ip: 
172.17.0.2 pid: 1145 should_checkpoint: true time_since_restore: 501.5412850379944 time_this_iter_s: 93.91627407073975 time_total_s: 501.5412850379944 timestamp: 1602273868 timesteps_since_restore: 0 training_iteration: 5 trial_id: 77a44_00001 == Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=7 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.08752 | 0.6221 | 5 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.3148 | 0.5803 | 9 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | 
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1126)[0m [6, 12000] loss: 0.211 [2m[36m(pid=1098)[0m [10, 8000] loss: 0.253 [2m[36m(pid=1145)[0m [6, 2000] loss: 0.827 [2m[36m(pid=1126)[0m [6, 14000] loss: 0.177 [2m[36m(pid=1098)[0m [10, 10000] loss: 0.210 [2m[36m(pid=1145)[0m [6, 4000] loss: 0.448 [2m[36m(pid=1126)[0m [6, 16000] loss: 0.160 Result for DEFAULT_77a44_00007: accuracy: 0.5713 date: 2020-10-09_20-04-45 done: true experiment_id: 1e0a3b1304eb470898956b381db607e6 experiment_tag: 7_batch_size=4,l1=128,l2=16,lr=0.002029 hostname: 234fef3cc6b0 iterations_since_restore: 10 loss: 1.2877456236266531 node_ip: 172.17.0.2 pid: 1098 should_checkpoint: true time_since_restore: 518.6297419071198 time_this_iter_s: 48.90681719779968 time_total_s: 518.6297419071198 timestamp: 1602273885 timesteps_since_restore: 0 training_iteration: 10 trial_id: 77a44_00007 == Status == Memory usage on this node: 5.1/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 6/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (3 RUNNING, 7 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.08752 | 0.6221 | 5 | | DEFAULT_77a44_00002 | TERMINATED 
| | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.29457 | 0.5285 | 5 | | DEFAULT_77a44_00007 | RUNNING | 172.17.0.2:1098 | 4 | 128 | 16 | 0.00202898 | 1.28775 | 0.5713 | 10 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1126)[0m [6, 18000] loss: 0.143 [2m[36m(pid=1145)[0m [6, 6000] loss: 0.297 [2m[36m(pid=1126)[0m [6, 20000] loss: 0.127 [2m[36m(pid=1145)[0m [6, 8000] loss: 0.235 [2m[36m(pid=1145)[0m [6, 10000] loss: 0.184 Result for DEFAULT_77a44_00006: accuracy: 0.5484 date: 2020-10-09_20-05-10 done: false experiment_id: 696157fc029f42e781f0779431a5902f experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961 hostname: 234fef3cc6b0 iterations_since_restore: 6 loss: 1.2631257870631292 node_ip: 172.17.0.2 pid: 1126 should_checkpoint: true time_since_restore: 543.5542225837708 time_this_iter_s: 85.20114707946777 time_total_s: 543.5542225837708 timestamp: 1602273910 timesteps_since_restore: 0 training_iteration: 6 trial_id: 77a44_00006 == Status == Memory usage on this node: 4.6/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (2 RUNNING, 8 TERMINATED) 
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.08752 | 0.6221 | 5 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.26313 | 0.5484 | 6 | | DEFAULT_77a44_00007 | TERMINATED | | 4 | 128 | 16 | 0.00202898 | 1.28775 | 0.5713 | 10 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1145)[0m [6, 12000] loss: 0.157 [2m[36m(pid=1126)[0m [7, 2000] loss: 1.256 [2m[36m(pid=1126)[0m [7, 4000] loss: 0.631 [2m[36m(pid=1145)[0m [6, 14000] loss: 0.131 [2m[36m(pid=1126)[0m [7, 6000] loss: 0.407 [2m[36m(pid=1145)[0m [6, 16000] loss: 0.121 [2m[36m(pid=1126)[0m [7, 8000] loss: 0.311 [2m[36m(pid=1145)[0m [6, 18000] loss: 0.101 [2m[36m(pid=1126)[0m [7, 10000] loss: 0.243 [2m[36m(pid=1145)[0m [6, 20000] loss: 0.094 [2m[36m(pid=1126)[0m [7, 12000] loss: 0.203 Result for DEFAULT_77a44_00001: accuracy: 0.61 date: 2020-10-09_20-06-01 
done: false experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6 experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168 hostname: 234fef3cc6b0 iterations_since_restore: 6 loss: 1.1592615005358762 node_ip: 172.17.0.2 pid: 1145 should_checkpoint: true time_since_restore: 594.7056727409363 time_this_iter_s: 93.1643877029419 time_total_s: 594.7056727409363 timestamp: 1602273961 timesteps_since_restore: 0 training_iteration: 6 trial_id: 77a44_00001 == Status == Memory usage on this node: 4.6/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (2 RUNNING, 8 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.15926 | 0.61 | 6 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.26313 | 0.5484 | 6 | | DEFAULT_77a44_00007 | TERMINATED | | 4 | 128 | 16 | 0.00202898 | 1.28775 | 0.5713 | 10 | | 
DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1126)[0m [7, 14000] loss: 0.176 [2m[36m(pid=1126)[0m [7, 16000] loss: 0.156 [2m[36m(pid=1145)[0m [7, 2000] loss: 0.802 [2m[36m(pid=1126)[0m [7, 18000] loss: 0.141 [2m[36m(pid=1145)[0m [7, 4000] loss: 0.393 [2m[36m(pid=1126)[0m [7, 20000] loss: 0.123 [2m[36m(pid=1145)[0m [7, 6000] loss: 0.282 Result for DEFAULT_77a44_00006: accuracy: 0.5369 date: 2020-10-09_20-06-34 done: false experiment_id: 696157fc029f42e781f0779431a5902f experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961 hostname: 234fef3cc6b0 iterations_since_restore: 7 loss: 1.2813393794611097 node_ip: 172.17.0.2 pid: 1126 should_checkpoint: true time_since_restore: 627.4627993106842 time_this_iter_s: 83.90857672691345 time_total_s: 627.4627993106842 timestamp: 1602273994 timesteps_since_restore: 0 training_iteration: 7 trial_id: 77a44_00006 == Status == Memory usage on this node: 4.6/240.1 GiB Using AsyncHyperBand: num_stopped=8 Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386 Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects Result logdir: /var/lib/jenkins/ray_results/DEFAULT Number of trials: 10 (2 RUNNING, 8 TERMINATED) +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ | Trial name | status | loc | batch_size | l1 | l2 | lr | loss | accuracy | training_iteration | |---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------| | DEFAULT_77a44_00000 | TERMINATED | | 4 | 8 | 128 | 
0.0210161 | 2.30609 | 0.1073 | 1 | | DEFAULT_77a44_00001 | RUNNING | 172.17.0.2:1145 | 2 | 256 | 128 | 0.000461678 | 1.15926 | 0.61 | 6 | | DEFAULT_77a44_00002 | TERMINATED | | 8 | 32 | 16 | 0.0131231 | 2.23771 | 0.1292 | 10 | | DEFAULT_77a44_00003 | TERMINATED | | 4 | 4 | 128 | 0.00551547 | 1.95655 | 0.2563 | 1 | | DEFAULT_77a44_00004 | TERMINATED | | 2 | 256 | 256 | 0.0647615 | 2.346 | 0.1024 | 1 | | DEFAULT_77a44_00005 | TERMINATED | | 4 | 4 | 128 | 0.0421917 | 2.35236 | 0.0986 | 1 | | DEFAULT_77a44_00006 | RUNNING | 172.17.0.2:1126 | 2 | 8 | 8 | 0.000359613 | 1.28134 | 0.5369 | 7 | | DEFAULT_77a44_00007 | TERMINATED | | 4 | 128 | 16 | 0.00202898 | 1.28775 | 0.5713 | 10 | | DEFAULT_77a44_00008 | TERMINATED | | 2 | 4 | 8 | 0.000162963 | 1.98445 | 0.1883 | 1 | | DEFAULT_77a44_00009 | TERMINATED | | 2 | 32 | 256 | 0.000134494 | 1.54615 | 0.4358 | 2 | +---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+ [2m[36m(pid=1145)[0m [7, 8000] loss: 0.206 [2m[36m(pid=1126)[0m [8, 2000] loss: 1.200 [2m[36m(pid=1145)[0m [7, 10000] loss: 0.171 [2m[36m(pid=1126)[0m [8, 4000] loss: 0.602 [2m[36m(pid=1145)[0m [7, 12000] loss: 0.138 [2m[36m(pid=1126)[0m [8, 6000] loss: 0.407 [2m[36m(pid=1145)[0m [7, 14000] loss: 0.121 [2m[36m(pid=1126)[0m [8, 8000] loss: 0.296 [2m[36m(pid=1145)[0m [7, 16000] loss: 0.109 [2m[36m(pid=1126)[0m [8, 10000] loss: 0.247 [2m[36m(pid=1145)[0m [7, 18000] loss: 0.098 [2m[36m(pid=1126)[0m [8, 12000] loss: 0.205 [2m[36m(pid=1145)[0m [7, 20000] loss: 0.086 [2m[36m(pid=1126)[0m [8, 14000] loss: 0.175 [2m[36m(pid=1126)[0m [8, 16000] loss: 0.152 Result for DEFAULT_77a44_00001: accuracy: 0.6115 date: 2020-10-09_20-07-35 done: false experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6 experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168 hostname: 234fef3cc6b0 iterations_since_restore: 7 loss: 1.1567747425308288 node_ip: 172.17.0.2 pid: 1145 should_checkpoint: true 
  time_since_restore: 687.9970579147339
  time_this_iter_s: 93.29138517379761
  time_total_s: 687.9970579147339
  timestamp: 1602274055
  timesteps_since_restore: 0
  training_iteration: 7
  trial_id: 77a44_00001

== Status ==
Memory usage on this node: 4.6/240.1 GiB
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 8.000: -1.7237946317078545 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.15677 |     0.6115 |                    7 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.28134 |     0.5369 |                    7 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1126) [8, 18000] loss: 0.136
(pid=1145) [8, 2000] loss: 0.721
(pid=1126) [8, 20000] loss: 0.122
(pid=1145) [8, 4000] loss: 0.373
Result for DEFAULT_77a44_00006:
  accuracy: 0.5222
  date: 2020-10-09_20-07-58
  done: false
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore: 8
  loss: 1.3225798389766366
  node_ip: 172.17.0.2
  pid: 1126
  should_checkpoint: true
  time_since_restore: 711.2751452922821
  time_this_iter_s: 83.8123459815979
  time_total_s: 711.2751452922821
  timestamp: 1602274078
  timesteps_since_restore: 0
  training_iteration: 8
  trial_id: 77a44_00006

== Status ==
Memory usage on this node: 4.6/240.1 GiB
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 8.000: -1.3225798389766366 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.15677 |     0.6115 |                    7 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.32258 |     0.5222 |                    8 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1145) [8, 6000] loss: 0.246
(pid=1126) [9, 2000] loss: 1.150
(pid=1145) [8, 8000] loss: 0.191
(pid=1126) [9, 4000] loss: 0.587
(pid=1145) [8, 10000] loss: 0.153
(pid=1126) [9, 6000] loss: 0.383
(pid=1145) [8, 12000] loss: 0.128
(pid=1126) [9, 8000] loss: 0.297
(pid=1145) [8, 14000] loss: 0.116
(pid=1126) [9, 10000] loss: 0.239
(pid=1145) [8, 16000] loss: 0.098
(pid=1126) [9, 12000] loss: 0.200
(pid=1145) [8, 18000] loss: 0.093
(pid=1126) [9, 14000] loss: 0.173
(pid=1126) [9, 16000] loss: 0.155
(pid=1145) [8, 20000] loss: 0.083
(pid=1126) [9, 18000] loss: 0.135
Result for DEFAULT_77a44_00001:
  accuracy: 0.6234
  date: 2020-10-09_20-09-07
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 8
  loss: 1.1474703996328957
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 780.5215935707092
  time_this_iter_s: 92.52453565597534
  time_total_s: 780.5215935707092
  timestamp: 1602274147
  timesteps_since_restore: 0
  training_iteration: 8
  trial_id: 77a44_00001

== Status ==
Memory usage on this node: 4.6/240.1 GiB
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 8.000: -1.3193767046023162 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.14747 |     0.6234 |                    8 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.32258 |     0.5222 |                    8 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1126) [9, 20000] loss: 0.122
(pid=1145) [9, 2000] loss: 0.652
Result for DEFAULT_77a44_00006:
  accuracy: 0.5382
  date: 2020-10-09_20-09-21
  done: false
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore: 9
  loss: 1.2859820882213302
  node_ip: 172.17.0.2
  pid: 1126
  should_checkpoint: true
  time_since_restore: 794.5377962589264
  time_this_iter_s: 83.26265096664429
  time_total_s: 794.5377962589264
  timestamp: 1602274161
  timesteps_since_restore: 0
  training_iteration: 9
  trial_id: 77a44_00006

== Status ==
Memory usage on this node: 4.6/240.1 GiB
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 8.000: -1.3193767046023162 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.14747 |     0.6234 |                    8 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.28598 |     0.5382 |                    9 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1145) [9, 4000] loss: 0.336
(pid=1126) [10, 2000] loss: 1.142
(pid=1145) [9, 6000] loss: 0.233
(pid=1126) [10, 4000] loss: 0.570
(pid=1145) [9, 8000] loss: 0.178
(pid=1126) [10, 6000] loss: 0.395
(pid=1145) [9, 10000] loss: 0.143
(pid=1126) [10, 8000] loss: 0.299
(pid=1145) [9, 12000] loss: 0.118
(pid=1126) [10, 10000] loss: 0.228
(pid=1145) [9, 14000] loss: 0.104
(pid=1126) [10, 12000] loss: 0.196
(pid=1145) [9, 16000] loss: 0.093
(pid=1126) [10, 14000] loss: 0.169
(pid=1126) [10, 16000] loss: 0.151
(pid=1145) [9, 18000] loss: 0.083
(pid=1126) [10, 18000] loss: 0.132
(pid=1145) [9, 20000] loss: 0.078
(pid=1126) [10, 20000] loss: 0.118
Result for DEFAULT_77a44_00001:
  accuracy: 0.6124
  date: 2020-10-09_20-10-40
  done: false
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 9
  loss: 1.2186276267750566
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 873.050055027008
  time_this_iter_s: 92.52846145629883
  time_total_s: 873.050055027008
  timestamp: 1602274240
  timesteps_since_restore: 0
  training_iteration: 9
  trial_id: 77a44_00001

== Status ==
Memory usage on this node: 4.6/240.1 GiB
Using AsyncHyperBand: num_stopped=8
Bracket: Iter 8.000: -1.3193767046023162 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.21863 |     0.6124 |                    9 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.28598 |     0.5382 |                    9 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

Result for DEFAULT_77a44_00006:
  accuracy: 0.5454
  date: 2020-10-09_20-10-45
  done: true
  experiment_id: 696157fc029f42e781f0779431a5902f
  experiment_tag: 6_batch_size=2,l1=8,l2=8,lr=0.00035961
  hostname: 234fef3cc6b0
  iterations_since_restore: 10
  loss: 1.290222985061258
  node_ip: 172.17.0.2
  pid: 1126
  should_checkpoint: true
  time_since_restore: 878.2885060310364
  time_this_iter_s: 83.75070977210999
  time_total_s: 878.2885060310364
  timestamp: 1602274245
  timesteps_since_restore: 0
  training_iteration: 10
  trial_id: 77a44_00006

== Status ==
Memory usage on this node: 4.6/240.1 GiB
Using AsyncHyperBand: num_stopped=9
Bracket: Iter 8.000: -1.3193767046023162 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 4/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (2 RUNNING, 8 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.21863 |     0.6124 |                    9 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | RUNNING    | 172.17.0.2:1126 |            2 |    8 |    8 | 0.000359613 | 1.29022 |     0.5454 |                   10 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

(pid=1145) [10, 2000] loss: 0.564
(pid=1145) [10, 4000] loss: 0.304
(pid=1145) [10, 6000] loss: 0.210
(pid=1145) [10, 8000] loss: 0.165
(pid=1145) [10, 10000] loss: 0.132
(pid=1145) [10, 12000] loss: 0.107
(pid=1145) [10, 14000] loss: 0.096
(pid=1145) [10, 16000] loss: 0.089
(pid=1145) [10, 18000] loss: 0.082
(pid=1145) [10, 20000] loss: 0.071
Result for DEFAULT_77a44_00001:
  accuracy: 0.6152
  date: 2020-10-09_20-12-10
  done: true
  experiment_id: f3958015aa1f4ab2a11c7e4fc8b68da6
  experiment_tag: 1_batch_size=2,l1=256,l2=128,lr=0.00046168
  hostname: 234fef3cc6b0
  iterations_since_restore: 10
  loss: 1.3026221742785826
  node_ip: 172.17.0.2
  pid: 1145
  should_checkpoint: true
  time_since_restore: 963.3746852874756
  time_this_iter_s: 90.32463026046753
  time_total_s: 963.3746852874756
  timestamp: 1602274330
  timesteps_since_restore: 0
  training_iteration: 10
  trial_id: 77a44_00001

== Status ==
Memory usage on this node: 4.1/240.1 GiB
Using AsyncHyperBand: num_stopped=10
Bracket: Iter 8.000: -1.3193767046023162 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 2/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (1 RUNNING, 9 TERMINATED)
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc             |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |                 |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | RUNNING    | 172.17.0.2:1145 |            2 |  256 |  128 | 0.000461678 | 1.30262 |     0.6152 |                   10 |
| DEFAULT_77a44_00002 | TERMINATED |                 |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |                 |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |                 |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |                 |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | TERMINATED |                 |            2 |    8 |    8 | 0.000359613 | 1.29022 |     0.5454 |                   10 |
| DEFAULT_77a44_00007 | TERMINATED |                 |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |                 |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |                 |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-----------------+--------------+------+------+-------------+---------+------------+----------------------+

== Status ==
Memory usage on this node: 4.1/240.1 GiB
Using AsyncHyperBand: num_stopped=10
Bracket: Iter 8.000: -1.3193767046023162 | Iter 4.000: -1.3512925069440156 | Iter 2.000: -1.448237135411054 | Iter 1.000: -1.919263457131386
Resources requested: 0/32 CPUs, 0/2 GPUs, 0.0/157.76 GiB heap, 0.0/49.41 GiB objects
Result logdir: /var/lib/jenkins/ray_results/DEFAULT
Number of trials: 10 (10 TERMINATED)
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+
| Trial name          | status     | loc   |   batch_size |   l1 |   l2 |          lr |    loss |   accuracy |   training_iteration |
|---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------|
| DEFAULT_77a44_00000 | TERMINATED |       |            4 |    8 |  128 | 0.0210161   | 2.30609 |     0.1073 |                    1 |
| DEFAULT_77a44_00001 | TERMINATED |       |            2 |  256 |  128 | 0.000461678 | 1.30262 |     0.6152 |                   10 |
| DEFAULT_77a44_00002 | TERMINATED |       |            8 |   32 |   16 | 0.0131231   | 2.23771 |     0.1292 |                   10 |
| DEFAULT_77a44_00003 | TERMINATED |       |            4 |    4 |  128 | 0.00551547  | 1.95655 |     0.2563 |                    1 |
| DEFAULT_77a44_00004 | TERMINATED |       |            2 |  256 |  256 | 0.0647615   | 2.346   |     0.1024 |                    1 |
| DEFAULT_77a44_00005 | TERMINATED |       |            4 |    4 |  128 | 0.0421917   | 2.35236 |     0.0986 |                    1 |
| DEFAULT_77a44_00006 | TERMINATED |       |            2 |    8 |    8 | 0.000359613 | 1.29022 |     0.5454 |                   10 |
| DEFAULT_77a44_00007 | TERMINATED |       |            4 |  128 |   16 | 0.00202898  | 1.28775 |     0.5713 |                   10 |
| DEFAULT_77a44_00008 | TERMINATED |       |            2 |    4 |    8 | 0.000162963 | 1.98445 |     0.1883 |                    1 |
| DEFAULT_77a44_00009 | TERMINATED |       |            2 |   32 |  256 | 0.000134494 | 1.54615 |     0.4358 |                    2 |
+---------------------+------------+-------+--------------+------+------+-------------+---------+------------+----------------------+

Best trial config: {'l1': 128, 'l2': 16, 'lr': 0.0020289809406172947, 'batch_size': 4}
Best trial final validation loss: 1.2877456236266531
Best trial final validation accuracy:
If you run the code, the output may look like the example above.
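The "Best trial" lines at the end of the log follow from the final status table: the trial with the lowest last reported validation loss is selected. The sketch below reproduces that selection in plain Python, using the loss and accuracy columns copied from that table. The helper name `best_trial` and the dictionary layout are assumptions made for this illustration and are not part of Ray Tune.

```python
# Last reported loss and accuracy per trial, copied from the final status table.
final_results = {
    "77a44_00000": {"loss": 2.30609, "accuracy": 0.1073},
    "77a44_00001": {"loss": 1.30262, "accuracy": 0.6152},
    "77a44_00002": {"loss": 2.23771, "accuracy": 0.1292},
    "77a44_00003": {"loss": 1.95655, "accuracy": 0.2563},
    "77a44_00004": {"loss": 2.346, "accuracy": 0.1024},
    "77a44_00005": {"loss": 2.35236, "accuracy": 0.0986},
    "77a44_00006": {"loss": 1.29022, "accuracy": 0.5454},
    "77a44_00007": {"loss": 1.28775, "accuracy": 0.5713},
    "77a44_00008": {"loss": 1.98445, "accuracy": 0.1883},
    "77a44_00009": {"loss": 1.54615, "accuracy": 0.4358},
}


def best_trial(results, metric="loss", mode="min"):
    """Hypothetical helper: return (trial_id, metrics) of the best trial for `metric`."""
    pick = min if mode == "min" else max
    return pick(results.items(), key=lambda item: item[1][metric])


trial_id, metrics = best_trial(final_results)
print(trial_id, metrics)
```

Selecting by accuracy instead, with `best_trial(final_results, "accuracy", "max")`, picks trial 77a44_00001 rather than 77a44_00007, so the choice of metric matters.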
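Most trials were stopped early to avoid wasting resources. The best-performing trial reached a validation accuracy of about 58%, which could be confirmed on the test set.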
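To give an intuition for why so many trials end after one or two iterations, here is a minimal sketch of an asynchronous successive-halving rule. It is a simplified illustration, not Ray Tune's actual `AsyncHyperBand` implementation: the class name `SimpleAsha`, its parameters, and the cutoff rule (keep roughly the best `1 / reduction_factor` of the losses seen so far at a rung) are assumptions for this example. The defaults are chosen so that the rungs come out as 1, 2, 4 and 8, the iterations listed on the `Bracket` line of the status output. The losses fed in at the end are made up.

```python
class SimpleAsha:
    """Toy asynchronous successive halving for a metric that should be minimized."""

    def __init__(self, grace_period=1, reduction_factor=2, max_t=10):
        # Rungs are the iterations at which a trial can be stopped.
        self.reduction_factor = reduction_factor
        self.rungs = []
        t = grace_period
        while t < max_t:
            self.rungs.append(t)
            t *= reduction_factor
        # Losses reported so far at each rung, by any trial.
        self.recorded = {rung: [] for rung in self.rungs}

    def should_stop(self, iteration, loss):
        """Record `loss` if `iteration` is a rung and decide whether to stop the trial."""
        if iteration not in self.recorded:
            return False  # Not a rung: always continue.
        seen = self.recorded[iteration]
        seen.append(loss)
        # Decide with the results available right now (asynchronous):
        # keep roughly the best 1/reduction_factor of them.
        keep = max(1, len(seen) // self.reduction_factor)
        cutoff = sorted(seen)[keep - 1]
        return loss > cutoff


asha = SimpleAsha()
# Made-up losses reported at iteration 1 by four trials, in arrival order.
reports = [("a", 1.9), ("b", 2.3), ("c", 1.5), ("d", 2.0)]
decisions = [asha.should_stop(1, loss) for _, loss in reports]
for (name, loss), stop in zip(reports, decisions):
    print(name, loss, "stop" if stop else "continue")
```

Because each decision uses only the results reported so far, the first trial to reach a rung is never stopped there, while later trials have to beat the current cutoff to continue.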
That's it! You can now tune the hyperparameters of your PyTorch models.