Every repetition just means that your last attempt was not good enough, and that the next one should be better.
-- abysw
I have written about this many times before; see Yuan-SW-F(abysw) – 博客园 (cnblogs.com)
Job scheduling system configuration summary (unverified version) – Yuan-SW-F(abysw) – 博客园 (cnblogs.com)
Head node: ###########################################################
yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel libtool -y
wget https://src.fedoraproject.org/lookaside/pkgs/torque/torque-6.1.1.1.tar.gz/sha512/74ff683f56d04a4d08774896c9f9875c68aa2cacfe6c1c8c65246da52396443d3f7497bc8a6a1f06d357f52c65153fc9db00692f514ac30279e4c765547d98c0/torque-6.1.1.1.tar.gz
tar -xzf torque-6.1.1.1.tar.gz
cd torque-6.1.1.1
./configure
make
make install
###### The steps below are unchanged from the earlier post #####
7. Check the node information
cat /proc/cpuinfo # used later for the node configuration
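The np value used later in the nodes file is usually taken from the number of logical CPUs reported here. A minimal sketch for counting them follows; count_cpus is a hypothetical helper name, not part of Torque, and whether np should match logical or physical cores is a site decision.

```shell
#!/bin/sh
# count_cpus: hypothetical helper, not part of Torque.
# Counts "processor" entries in a cpuinfo-style file (default: /proc/cpuinfo).
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Print a line in the format used by server_priv/nodes: "<hostname> np=<count>".
echo "$(uname -n) np=$(count_cpus)"
```

The printed line can be pasted into /var/spool/torque/server_priv/nodes in step 12 after checking the count.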
8. Change the hostname
cat /etc/sysconfig/network
# Created by anaconda
*****9
9. Set the system time zone
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
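10. Set up SSH public keys
ssh-keygen -t rsa # just press Enter three times
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys # do this on every node
Merge the public keys of all nodes so that every node can log in to every other node without a password.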
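The merge itself can be sketched as below. This assumes the id_rsa.pub of every node has already been copied into one directory (one file per node); merge_keys and the ./pubkeys directory are hypothetical names used only for illustration.

```shell
#!/bin/sh
# merge_keys: hypothetical helper. Appends every *.pub file found in
# directory $1 to the authorized_keys file $2, then removes duplicate lines.
merge_keys() {
    keydir=$1
    auth=$2
    touch "$auth"
    cat "$keydir"/*.pub >> "$auth"
    sort -u -o "$auth" "$auth"
    chmod 600 "$auth"   # sshd rejects key files with loose permissions
}

# Example (assumes ./pubkeys holds one id_rsa.pub copy per node):
# merge_keys ./pubkeys ~/.ssh/authorized_keys
```

The merged authorized_keys file then has to be copied back to every node. For a handful of nodes, running ssh-copy-id once per target host achieves the same result.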
11. Configure the compute nodes
Copy the files to the compute nodes:
cp /chenlab/abyss/torque-6.1.1.1/contrib/init.d/{pbs_mom,trqauthd} /chenlab/abyss/torque-6.1.1.1/contrib/{torque-package-clients-linux-x86_64.sh,torque-package-mom-linux-x86_64.sh} ~
Run the following on each node individually:
cd /root
./torque-package-clients-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
12. Configure several files
cat /var/spool/torque/server_name
*****9
cat /var/spool/torque/server_priv/nodes
*****9 np=12
*****8 np=122
cat /etc/hosts
192.168.*****9 *****9
192.168.*****8 *****8
cat /var/spool/torque/mom_priv/config
$pbsserver *****9
$logevent 255
vi /etc/profile
Add the following variables:
TORQUE=/usr/local
MAUI=/usr/local
if [ "$(id -u)" -eq 0 ]; then
    PATH=$TORQUE/bin:$TORQUE/sbin:$MAUI/sbin:$MAUI/bin:$PATH
else
    PATH=$TORQUE/bin:$MAUI/bin:$PATH
fi
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/sbin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/bin:$PATH
export PATH=/usr/local/sbin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
### The two copies of these three export lines are redundant; one is enough
cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/local/lib
/sbin/ldconfig /etc/ld.so.conf
make packages #
cp pbs_mom pbs_sched pbs_server trqauthd /etc/init.d/ #
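A node name that appears in server_priv/nodes but not in /etc/hosts is an easy mistake to make in this step. The following is a small consistency check, offered as a sketch only; missing_hosts is a hypothetical helper name, and it assumes the node name is the first field of each nodes line.

```shell
#!/bin/sh
# missing_hosts: hypothetical helper. Prints every node name listed in the
# nodes file ($1) that has no matching name in the hosts file ($2).
missing_hosts() {
    nodes_file=$1
    hosts_file=$2
    # The first field of each non-empty, non-comment line is the node name.
    awk 'NF && $1 !~ /^#/ {print $1}' "$nodes_file" |
    while read -r node; do
        # In the hosts file, names are the fields after the IP address.
        awk -v n="$node" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file" || echo "$node"
    done
}

# Example, using the paths from this step:
# missing_hosts /var/spool/torque/server_priv/nodes /etc/hosts
```

No output means every node in the nodes file can be resolved through /etc/hosts.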
13. Add a user account
useradd usr_name
torque.setup usr_name
If PBS processes are already running, find them with ps -e | grep pbs, kill them, and then retry.
14. Start PBS
for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i start;done
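After starting the services it helps to confirm that all four daemons are actually alive. A minimal sketch, assuming the process names match the service names used above; is_running is a hypothetical helper name.

```shell
#!/bin/sh
# is_running: hypothetical helper. Succeeds if a process whose command name
# is exactly $1 appears in the process list.
is_running() {
    ps -e -o comm= | grep -qx "$1"
}

# Report the state of each daemon started in this step.
for d in pbs_server pbs_sched pbs_mom trqauthd; do
    if is_running "$d"; then
        echo "$d: running"
    else
        echo "$d: NOT running"
    fi
done
```

On a compute node only pbs_mom and trqauthd are expected to be running.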
15. If the nodes data has been lost, add it again
vi /var/spool/torque/server_priv/nodes
cat /var/spool/torque/mom_priv/config
$pbsserver *****9
$logevent 255
=========================================================
16. Restart PBS
for i in pbs_server pbs_sched pbs_mom trqauthd; do sudo service $i restart;done
=========================================================
17. Submit a job
$ cat zsleep.sh
while [ 1 ];do
echo `date` >> zzz.test
sleep 5s
done
$ qsub zsleep.sh
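The test job above runs forever and relies on the queue defaults. A bounded variant with explicit #PBS directives might look like the sketch below; the job name, resource request, walltime and loop count are illustrative values, not taken from the original setup.

```shell
#!/bin/sh
# Write a job script with #PBS directives. The job name, resource request
# and walltime are illustrative assumptions.
cat > zsleep_job.sh <<'EOF'
#!/bin/sh
#PBS -N zsleep
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:10:00
#PBS -j oe

# PBS_O_WORKDIR is set by Torque to the directory qsub was run from.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Append a timestamp every 5 seconds, 12 times (about one minute).
i=0
while [ "$i" -lt 12 ]; do
    date >> zzz.test
    sleep 5
    i=$((i + 1))
done
EOF

# Submit it on the head node with:
# qsub zsleep_job.sh
```

Because the loop ends on its own, the job leaves the queue without needing qdel, which makes it easier to tell a healthy scheduler from a stuck one.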
18. Switch to a queue name of your own
https://www.cnblogs.com/abysw/p/14448811.html
qmgr -c 'print server'
qmgr -c "c q abyss"
qmgr -c "s q abyss queue_type=Execution"
qmgr -c "s q abyss enabled=true"
qmgr -c "s s default_queue=abyss"
qmgr -c "s q abyss started=true"
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.nodes = 1"
qmgr -c 'print server'
qmgr -c 'd q batch'
qmgr -c 'print server'
qmgr -c "s q abyss resources_default.walltime = 1000:00:00"
qmgr -c 'print server'
qmgr -c "create queue new queue_type=execution"
qmgr -c "set queue new started=true"
qmgr -c "set queue new enabled=true"
qmgr -c "set queue new resources_default.nodes=1"
qmgr -c "set queue new resources_default.walltime=3600"
qmgr -c "set queue new acl_users=fuyuan"
qmgr -c "set queue new acl_user_enable=true"
qmgr -c "set queue new max_user_queuable=30"
qmgr -c "set queue new max_user_run=5"
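Since the same handful of qmgr commands is repeated for every queue, they can be generated from a queue name. The sketch below only prints the commands so they can be reviewed first; queue_cmds is a hypothetical helper name, and it covers only the basic attributes used above.

```shell
#!/bin/sh
# queue_cmds: hypothetical helper. Prints (does not run) the qmgr commands
# that create an execution queue named $1 with default walltime $2.
queue_cmds() {
    q=$1
    walltime=$2
    for c in \
        "create queue $q queue_type=execution" \
        "set queue $q resources_default.nodes=1" \
        "set queue $q resources_default.walltime=$walltime" \
        "set queue $q enabled=true" \
        "set queue $q started=true"
    do
        printf 'qmgr -c "%s"\n' "$c"
    done
}

# Dry run: print the commands for the queue used in this step.
queue_cmds abyss 1000:00:00
```

Once the output looks right, it can be piped to sh on the head node to apply it.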
19. If a compute node shows as down, disable the firewall
yum install iptables-services
$ cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
SELINUX=disabled # add this line
service iptables stop # stop the firewall
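Instead of appending a second SELINUX= line by hand, the existing line can be rewritten in place. This is a slightly different approach from the one above, shown only as a sketch; set_selinux_disabled is a hypothetical helper name.

```shell
#!/bin/sh
# set_selinux_disabled: hypothetical helper. Rewrites the existing SELINUX=
# line in the config file $1 to SELINUX=disabled, keeping a .bak copy.
# SELINUXTYPE= is left untouched because the pattern requires "SELINUX=".
set_selinux_disabled() {
    cfg=$1
    cp "$cfg" "$cfg.bak" &&
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg.bak" > "$cfg"
}

# Example (as root; the change takes effect after a reboot):
# set_selinux_disabled /etc/selinux/config
```

Keeping the .bak copy makes it easy to restore the enforcing setting once the node problem has been tracked down.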
==========================
The nodes that were in the down state are now free
==========================
Reboot
===========================
$ pbsnodes -a
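The full pbsnodes -a listing is long, so a one-line-per-node summary is handy when checking whether nodes came back as free. A minimal sketch, assuming the usual layout in which the node name starts at column one and its attributes are indented as "state = ..."; node_states is a hypothetical helper name.

```shell
#!/bin/sh
# node_states: hypothetical helper. Reads "pbsnodes -a" style text on stdin
# and prints one "name state" pair per node.
node_states() {
    awk '
        # A non-empty line that does not start with a blank is a node name.
        NF && substr($0, 1, 1) != " " && substr($0, 1, 1) != "\t" { node = $1; next }
        # Indented attribute line of the form: state = free
        $1 == "state" && $2 == "=" { print node, $3 }
    '
}

# Example:
# pbsnodes -a | node_states
```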