k8s operations notes
Version: 1.9.0
Limiting GPUs per namespace
[gpu-namespace]# cat compute-resources2.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    #pods: "4"
    #requests.cpu: "1"
    #requests.memory: 1Gi
    #limits.cpu: "2"
    #limits.memory: 2Gi
kubectl create -f compute-resources2.yaml
kubectl get quota
kubectl describe quota compute-resources
kubectl delete quota compute-resources
Create the namespace first, then add the quota to that namespace. Here the quota is added to the default namespace.
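The quota file above leaves every entry under hard commented out, so on its own it restricts nothing. A minimal sketch of a quota that actually caps GPUs in a namespace could look like the following. It assumes a cluster version that supports quota on extended resources through the requests.nvidia.com/gpu key (1.10 or later, as far as I know); the function name gpu_quota_manifest and the namespace gpu-team are made up for illustration.

```shell
# Print a ResourceQuota manifest that caps the number of GPUs requested in a namespace.
# Usage: gpu_quota_manifest <namespace> <gpu-count>
# Assumption: the cluster accepts the requests.nvidia.com/gpu quota key.
gpu_quota_manifest() {
  ns="$1"
  gpus="$2"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: ${ns}
spec:
  hard:
    requests.nvidia.com/gpu: "${gpus}"
EOF
}

# Preview the manifest for a hypothetical namespace.
gpu_quota_manifest gpu-team 2
```

To apply it, the output can be piped to kubectl, for example gpu_quota_manifest default 1 | kubectl create -f -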
- The Docker container lacks basic commands such as vi
echo "nameserver 192.168.1.254" > /etc/resolv.conf
apt-get update
apt install net-tools # ifconfig
apt install iputils-ping # ping
apt install vim # vi (the package is named vim, not vi)
Starting a GPU job
Warning FailedScheduling 3s (x7 over 34s) default-scheduler 0/3 nodes are available: 1 PodToleratesNodeTaints, 3 Insufficient nvidia.com/gpu.
- Adjusting the replica count
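kubectl scale deploy/httpd-app --replicas=1
Note: kubectl scale works on Deployments, ReplicaSets, ReplicationControllers and StatefulSets. A DaemonSet such as kube-flannel-ds runs one pod per eligible node and has no replica count to scale.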
- Starting a container on a specific node
Add the field nodeName: xxxx to the pod spec
e.g.:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nfs:5000/tensorflow/tensorflow:nightly"
    #resources:
    #  limits:
    #    nvidia.com/gpu: 1 # requesting 1 GPU
  nodeName: tensorflow1
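- Cordoning and uncordoning a node
kubectl cordon {hostname} # cordon: mark the node unschedulable
kubectl uncordon {hostname} # uncordon: make the node schedulable again
- Creating and deleting an application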
kubectl run httpd-app --image=httpd --replicas=2
kubectl get all --all-namespaces
kubectl get deployments
Delete the deployment
kubectl delete deployment xxxxx
kubectl delete deploy/httpd-app
Verify
[k8s_images]# kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP           NODE
httpd-app-5fbccd7c6c-5sx5z   1/1       Running   0          25m       10.244.2.2   tensorflow0
httpd-app-5fbccd7c6c-87jvp   1/1       Running   0          17m       10.244.2.3   tensorflow0
[k8s_images]# curl 10.244.2.2
<html><body><h1>It works!</h1></body></html>
[k8s_images]# curl 10.244.2.3
<html><body><h1>It works!</h1></body></html>
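The check above curls each pod IP by hand. A small sketch that pulls the IP column out of the kubectl get pods -o wide output, so the same check can be looped, might look like this. The helper name pod_ips is hypothetical, and it assumes the 1.9-style column order shown above, where IP is the sixth column.

```shell
# Print the IP column from `kubectl get pods -o wide` output read on stdin.
# Assumes the column order NAME READY STATUS RESTARTS AGE IP NODE; the header row is skipped.
pod_ips() {
  awk 'NR > 1 && NF >= 6 { print $6 }'
}

# Sample input copied from the output above.
pod_ips <<'EOF'
NAME                         READY     STATUS    RESTARTS   AGE       IP           NODE
httpd-app-5fbccd7c6c-5sx5z   1/1       Running   0          25m       10.244.2.2   tensorflow0
httpd-app-5fbccd7c6c-87jvp   1/1       Running   0          17m       10.244.2.3   tensorflow0
EOF

# On a live cluster the helper could be used like this:
#   kubectl get pods -o wide | pod_ips | while read -r ip; do curl -s "http://$ip"; done
```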
- Cannot access the RESTful API
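1.5 exposed the API on the unencrypted port 8080 (4194 is the kubelet's cAdvisor port, not the API server). 1.9 uses the encrypted port 6443, which needs extra setup before it can be accessed.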
On the master, run curl "https://localhost:6443/healthz" -k
-k skips certificate verification
kubectl get clusterrole/cluster-admin -o yaml
Edit basic_auth_file (each line has the form password,user,uid)
vi /etc/kubernetes/pki/basic_auth_file
admin,admin,2
vi /etc/kubernetes/manifests/kube-apiserver.yaml
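Add - --basic_auth_file=/etc/kubernetes/pki/basic_auth_file
Note: in this setup the flag had to be written with underscores (basic_auth_file). The hyphenated form found online did not work here, even though the official documentation spells the flag --basic-auth-file.
The change takes effect automatically, because the kubelet restarts the apiserver static pod when the manifest changes.
basic_auth_file probably has to live under /etc/kubernetes/pki/ because the apiserver container mounts that path. This is only a guess and has not been tested.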
Access
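With the admin,admin,2 entry configured above, a request can authenticate using HTTP basic auth. The sketch below turns one basic_auth_file line into the matching Authorization header. The helper name basic_auth_header is hypothetical, and whether the request is then authorized still depends on the RBAC setup.

```shell
# Build an HTTP Basic Authorization header from one basic_auth_file line.
# The line format is password,user,uid, so "admin,admin,2" means user admin with password admin.
basic_auth_header() {
  line="$1"
  password="$(printf '%s' "$line" | cut -d, -f1)"
  user="$(printf '%s' "$line" | cut -d, -f2)"
  # Basic auth encodes "user:password" in base64.
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$user" "$password" | base64)"
}

basic_auth_header "admin,admin,2"

# On the master the header could then be passed to curl, for example:
#   curl -k -H "$(basic_auth_header 'admin,admin,2')" "https://localhost:6443/api"
```

The shorter form curl -k -u admin:admin "https://localhost:6443/api" sends the same header.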
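Permission issues, related reading:
API documentation
Fixing the Kubernetes 1.6.4 Dashboard that cannot be accessed
Learning the new features of Kubernetes 1.6: RBAC authorization
Installing and configuring the Kubernetes dashboard 1.8.0 WebUI
User "system:anonymous" cannot get path "/"
Official RBAC documentation
Deploying heapster on Kubernetes 1.8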
- Dashboard access fails
The endpoints look normal
[k8s_images]# kubectl get endpoints --all-namespaces
NAMESPACE     NAME                      ENDPOINTS                     AGE
default       kubernetes                192.168.1.138:6443            18h
kube-system   kube-controller-manager   <none>                        18h
kube-system   kube-dns                  172.17.0.2:53,172.17.0.2:53   18h
kube-system   kube-scheduler            <none>                        18h
kube-system   kubernetes-dashboard      172.17.0.3:8443               29m
[k8s_images]# curl "https://172.17.0.3:8443" -k
<!doctype html> <html ng-app="kubernetesDashboard"> <head> <meta charset="utf-8"> <title ng-controller="kdTitle as $ctrl" ng-bind="$ctrl.title()"></title> <link rel="icon" type="image/png" href="assets/images/kubernetes-logo.png"> <meta name="viewport" content="width=device-width"> <link rel="stylesheet" href="static/vendor.93db0a0d.css"> <link rel="stylesheet" href="static/app.ffb1366f.css"> </head> <body ng-controller="kdMain as $ctrl"> <!--[if lt IE 10]>
<p class="browsehappy">You are using an <strong>outdated</strong> browser.
Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your
experience.</p>
<![endif]--> <kd-login layout="column" layout-fill ng-if="$ctrl.isLoginState()"> </kd-login> <kd-chrome layout="column" layout-fill ng-if="!$ctrl.isLoginState()"> </kd-chrome> <script src="static/vendor.9a600e6f.js"></script> <script src="api/appConfig.json"></script> <script src="static/app.fe2776ce.js"></script> </body> </html>
The service details are as follows
[k8s_images]# kubectl describe svc/kubernetes-dashboard -n kube-system
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: <none>
Selector: k8s-app=kubernetes-dashboard
Type: NodePort
IP: 10.100.2.162
Port: <unset> 443/TCP
TargetPort: 8443/TCP
NodePort: <unset> 32666/TCP
Endpoints: 172.17.0.3:8443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
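curl "https://10.100.2.162:443" -k works
On the master, curl "https://localhost:32666" -k does not work
Tagging images: use a custom tag from now on. With the latest tag, every start checks the registry online and pulls the image again.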
docker tag httpd:latest httpd:20180322
Relationship between the ports
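Using the values from the service description above, the three ports relate as follows: port (443) is where the ClusterIP 10.100.2.162 listens, targetPort (8443) is the container port on the pod, and nodePort (32666) is opened on every node and forwards to the service. A sketch of an equivalent Service manifest, printed by a hypothetical helper named dashboard_service_manifest:

```shell
# Print a NodePort Service manifest matching the `kubectl describe` output above.
# All values are taken from that output; only the helper name is made up.
dashboard_service_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard
  ports:
  - port: 443          # ClusterIP port: https://10.100.2.162:443
    targetPort: 8443   # container port on the pod: https://172.17.0.3:8443
    nodePort: 32666    # port opened on every node: https://<node-ip>:32666
EOF
}

dashboard_service_manifest
```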
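- Dashboard does not open over HTTPS
ERR_SSL_SERVER_CERT_BAD_FORMAT
Neither IE nor Google Chrome can open it
Use Firefox instead: accept the prompt to add a security exception, and the page opens.
- Dashboard no longer displays anything
This is a permission issue. For how to configure permissions, see the separate article "Implementing RBAC multi-tenant permission control".
Edit the object directly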
kubectl -n kube-system edit service kubernetes-dashboard
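GPU resource control
There was a bug here; it has been fixed
Upgrade plan
For GPUs only a requests value exists; there is no limits value
k8s 1.10.0 installation
Setting the GPU count to 0 has no effect; this is a bug
The current workaround is to set the environment variable NVIDIA_VISIBLE_DEVICES to an empty value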
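The workaround can be sketched as a pod spec that sets the variable explicitly. This assumes the nodes run the NVIDIA container runtime, which reads NVIDIA_VISIBLE_DEVICES to decide which GPUs a container sees. The pod name cpu-only-job and the helper name no_gpu_pod_manifest are made up for illustration; the image is the one used in the earlier example.

```shell
# Print a Pod manifest that sets NVIDIA_VISIBLE_DEVICES to an empty value,
# so that a container requesting no GPUs is not handed any by the runtime.
no_gpu_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cpu-only-job   # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
  - name: cpu-only-job
    image: "nfs:5000/tensorflow/tensorflow:nightly"
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: ""        # empty value: the workaround described above
EOF
}

no_gpu_pod_manifest
```

To try it, the output can be piped to kubectl create -f -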