SGE faild提交作业,属性不是内存值
我未能提交具有mem属性的作业。由于我是新手,谷歌两天后,我在这里寻求帮助。任何建议将不胜感激!SGE faild提交作业,属性不是内存值
以下是我所做的:
\ 1。提交我的稿子:
qsub -S /bin/bash -A assembly -pe threads 16 -l mem=2GB -cwd -N "pBcR_correct_asm" -j y -o /dev/null runCorrection.sh
Unable to run job: unknown resource "mem".
Exiting.
\ 2。考虑到我已经将“h”替换为“主机”,根据SGE unknown resource "nodes"解决了我的问题,我将“m”替换为“mem”,并且它不起作用。
\ 3.在google之后,我知道“h” 为在 “/选择/ gridengine/util的/资源/森特里/ 主机名” 中定义的快捷方式,并且可以与 “的qconf -sc” 予以确认:
qconf -sc
#name shortcut type relop requestable consumable default urgency
#----------------------------------------------------------------------------------------
arch a RESTRING == YES NO NONE 0
calendar c RESTRING == YES NO NONE 0
cpu cpu DOUBLE >= YES NO 0 0
display_win_gui dwg BOOL == YES NO 0 0
h_core h_core MEMORY <= YES NO 0 0
h_cpu h_cpu TIME <= YES NO 0:0:0 0
h_data h_data MEMORY <= YES NO 0 0
h_fsize h_fsize MEMORY <= YES NO 0 0
h_rss h_rss MEMORY <= YES NO 0 0
h_rt h_rt TIME <= YES NO 0:0:0 0
h_stack h_stack MEMORY <= YES NO 0 0
h_vmem h_vmem MEMORY <= YES NO 0 0
hostname h HOST == YES NO NONE 0
load_avg la DOUBLE >= NO NO 0 0
load_long ll DOUBLE >= NO NO 0 0
load_medium lm DOUBLE >= NO NO 0 0
load_short ls DOUBLE >= NO NO 0 0
m_core core INT <= YES NO 0 0
m_socket socket INT <= YES NO 0 0
m_topology topo RESTRING == YES NO NONE 0
m_topology_inuse utopo RESTRING == YES NO NONE 0
mem_free mf MEMORY <= YES NO 0 0
mem_total mt MEMORY <= YES NO 0 0
mem_used mu MEMORY >= YES NO 0 0
\ 4因此我代替。 “mt”改为“mem”,但是它抱怨属性问题,根据上面的输出,似乎mem_total与之前的“hostname”几乎相同,那么我认为jsv在通过SGE指南后可能会出现问题,但我找不到任何脚本包含“无法运行作业:属性......” ,其中“/ opt/gridengine/util/resources/jsv”的负责人。我想我必须配置一些文件,但是这些文件是什么,我该怎么办?
qsub -S /bin/bash -A assembly -pe threads 16 -l mt=2GB -cwd -N "pBcR_correct_asm" -j y -o test.out runCorrection.sh
Unable to run job: attribute "mem_total" is not a memory value.
Exiting.
你可能想要的是h_vmem
。至少这是我总是用来指定我想要的工作请求的内存的属性。
参见:
http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html?pathrev=V62u5_TAG
具体来说,
The resource limit parameters s_vmem and h_vmem are imple-
mented by Sun Grid Engine as a job limit. They impose a
limit on the amount of combined virtual memory consumed by
all the processes in the job. If h_vmem is exceeded by a job
running in the queue, it is aborted via a SIGKILL signal
(see kill(1)). If s_vmem is exceeded, the job is sent a
SIGXCPU signal which can be caught by the job. If you wish
to allow a job to be "warned" so it can exit gracefully
before it is killed then you should set the s_vmem limit to
a lower value than h_vmem. For parallel processes, the
limit is applied per slot which means that the limit is mul-
tiplied by the number of slots being used by the job before
being applied.
此外,您可能需要设置为消耗使用qconf
。
@Vince!
非常感谢您的回复。
最后我解决了我的问题,通过使用“h_vmem = 2g”(“2GB”会给出错误),但我不知道在哪里可以找到如何设计复杂值(MEMORY)的值。
以下信息现在是不必要的。
我已经读过你给的网站,并将h_vmem和s_vmeme的属性复杂地配置为“可使用”,但它不起作用。我想我必须配置当前“无”队列的“complex_value”。但是,我无法打开可能会告诉我如何配置的网络http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_types.html?pathrev=V62u5_TAG。我有权配置队列吗?我是否也必须配置主机?
任何建议将不胜感激!
Fowllowing是我做过的:
\ 1。将h_vmem和s_vmem的消耗品属性更改为“YES”:
qconf -sc
#name shortcut type relop requestable consumable default urgency
#----------------------------------------------------------------------------------------
arch a RESTRING == YES NO NONE 0
calendar c RESTRING == YES NO NONE 0
cpu cpu DOUBLE >= YES NO 0 0
display_win_gui dwg BOOL == YES NO 0 0
h_core h_core MEMORY <= YES NO 0 0
h_cpu h_cpu TIME <= YES NO 0:0:0 0
h_data h_data MEMORY <= YES NO 0 0
h_fsize h_fsize MEMORY <= YES NO 0 0
h_rss h_rss MEMORY <= YES NO 0 0
h_rt h_rt TIME <= YES NO 0:0:0 0
h_stack h_stack MEMORY <= YES NO 0 0
h_vmem h_vmem MEMORY <= YES YES 0 0
hostname h HOST == YES NO NONE 0
load_avg la DOUBLE >= NO NO 0 0
load_long ll DOUBLE >= NO NO 0 0
load_medium lm DOUBLE >= NO NO 0 0
load_short ls DOUBLE >= NO NO 0 0
m_core core INT <= YES NO 0 0
m_socket socket INT <= YES NO 0 0
m_topology topo RESTRING == YES NO NONE 0
m_topology_inuse utopo RESTRING == YES NO NONE 0
mem_free mf MEMORY <= YES NO 0 0
mem_total mt MEMORY <= YES NO 0 0
mem_used mu MEMORY >= YES NO 0 0
min_cpu_interval mci TIME <= NO NO 0:0:0 0
np_load_avg nla DOUBLE >= NO NO 0 0
np_load_long nll DOUBLE >= NO NO 0 0
np_load_medium nlm DOUBLE >= NO NO 0 0
np_load_short nls DOUBLE >= NO NO 0 0
num_proc p INT == YES NO 0 0
qname q RESTRING == YES NO NONE 0
rerun re BOOL == NO NO 0 0
s_core s_core MEMORY <= YES NO 0 0
s_cpu s_cpu TIME <= YES NO 0:0:0 0
s_data s_data MEMORY <= YES NO 0 0
s_fsize s_fsize MEMORY <= YES NO 0 0
s_rss s_rss MEMORY <= YES NO 0 0
s_rt s_rt TIME <= YES NO 0:0:0 0
s_stack s_stack MEMORY <= YES NO 0 0
s_vmem s_vmem MEMORY <= YES YES 0 0
seq_no seq INT == NO NO 0 0
slots s INT <= YES YES 1 1000
swap_free sf MEMORY <= YES NO 0 0
swap_rate sr MEMORY >= YES NO 0 0
swap_rsvd srsv MEMORY >= YES NO 0 0
swap_total st MEMORY <= YES NO 0 0
swap_used su MEMORY >= YES NO 0 0
tmpdir tmp RESTRING == NO NO NONE 0
virtual_free vf MEMORY <= YES NO 0 0
virtual_total vt MEMORY <= YES NO 0 0
virtual_used vu MEMORY >= YES NO 0 0
# >#< starts a comment but comments are not saved across edits --------
\ 2。将我的作业提交到smp队列。q,它抱怨同样的问题:
qsub -S /bin/bash -A assembly -q smp.q -pe newPe 16 -l h_vmem=2GB -cwd -N "pBcR_correct_asm" -j y -o runCorrection.sh
Unable to run job: attribute "h_vmem" is not a memory value.
Exiting.
\ 3。 smp.q.的信息我认为“complex_values”应该改变,“h_vmem”可以保持不变:
qconf -sq smp.q
qname smp.q
hostlist @smp.q
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make newPe
rerun FALSE
slots 160
tmpdir /tmp
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
\ 4。 @ smp.q中的主机信息:
qconf -sconf smp03.local
#smp03.local:
mailer /bin/mail
xterm /usr/bin/X11/xterm
execd_spool_dir /opt/gridengine/default/spool
\ 5。全球信息。我在这里添加h_vmem和s_vmem吗?
qconf -sconf
#global:
execd_spool_dir /opt/gridengine/default/spool
mailer /bin/mail
xterm /usr/bin/X11/xterm
load_sensor none
prolog none
epilog none
shell_start_mode posix_compliant
login_shells sh,ksh,csh,tcsh
min_uid 0
min_gid 0
user_lists none
xuser_lists none
projects none
xprojects none
enforce_project false
enforce_user auto
load_report_time 00:00:40
max_unheard 00:05:00
reschedule_unknown 00:00:00
loglevel log_warning
administrator_mail none
set_token_cmd none
pag_cmd none
token_extend_time none
shepherd_cmd none
qmaster_params none
execd_params ENABLE_ADDGRP_KILL=TRUE H_MEMORYLOCKED=infinity
reporting_params accounting=true reporting=true \
flush_time=00:00:15 joblog=true sharelog=00:00:00
finished_jobs 100
gid_range 20000-20100
qlogin_command builtin
qlogin_daemon builtin
rlogin_command builtin
rlogin_daemon builtin
rsh_command builtin
rsh_daemon builtin
max_aj_instances 2000
max_aj_tasks 75000
max_u_jobs 0
max_jobs 0
max_advance_reservations 0
auto_user_oticket 0
auto_user_fshare 0
auto_user_default_project none
auto_user_delete_time 86400
delegated_file_staging false
reprioritize 0
jsv_url none
jsv_allowed_mod ac,h,i,e,o,j,M,N,p,w
我想我知道我为什么失败。它似乎是h_vmem不配置全局,那就是我必须“qconf -mconf global”并添加“h_vmem 1024M”。但是当管理员不在时我无法测试它。如果有效,我会在这里发布解决方案。 – lam138138