OpenStack QoS by Linux TC

1. tc Traffic Control

TC (Traffic Control) is the Linux kernel's traffic-control facility. It shapes traffic mainly by attaching a queueing discipline on the egress (output) side of a network device.

To implement QoS for a Nova instance, limit the instance's outgoing (egress) traffic on the Linux bridge port qvbXXXX and its incoming (ingress) traffic on the OVS bridge port qvoXXXX, and add a filter so that traffic inside the tenant's internal network is exempt from the QoS limit.
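For orientation, the OVS hybrid-plug wiring that the following sections inspect looks roughly like this (a sketch; the concrete device names come from the example instance below):

  instance eth0 -> tapXXXXXXXX-XX -> qbrXXXXXXXX-XX (Linux bridge)
                -> qvbXXXXXXXX-XX <--veth--> qvoXXXXXXXX-XX -> br-int (OVS)

  egress limit  : root qdisc on the qvb port (traffic leaving the instance)
  ingress limit : root qdisc on the qvo port (traffic going to the instance)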

2. get metadata for TC rule

root@node-4:~# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.3 LTS"
2.1 show instance info
root@node-1:~# nova show 95985dfe-8356-4bb6-8ec7-46730d1b4c41

+--------------------------------------+----------------------------------------------------------+
| Property                             | Value                                                    |
+--------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                    | AUTO                                                     |
| OS-EXT-AZ:availability_zone          | nova                                                     |
| OS-EXT-SRV-ATTR:host                 | node-4.domain.tld                                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | node-4.domain.tld                                        |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000039                                        |
| OS-EXT-STS:power_state               | 1                                                        |
| OS-EXT-STS:task_state                | -                                                        |
| OS-EXT-STS:vm_state                  | active                                                   |
| OS-SRV-USG:launched_at               | 2015-10-19T02:22:12.000000                               |
| OS-SRV-USG:terminated_at             | -                                                        |
| accessIPv4                           |                                                          |
| accessIPv6                           |                                                          |
| config_drive                         |                                                          |
| created                              | 2015-10-19T02:21:50Z                                     |
| flavor                               | m1.micro (0eefecd0-97e4-4516-809e-90f85a9a03f3)          |
| hostId                               | c3d0268a383dcc25cedf8853096e472d36a34b91908f299c6634c176 |
| id                                   | 95985dfe-8356-4bb6-8ec7-46730d1b4c41                     |
| image                                | TestVM (82f4c727-5b18-427f-9094-0f92eed5607e)            |
| key_name                             | -                                                        |
| metadata                             | {}                                                       |
| name                                 | tc-test-by-zhanghui                                      |
| net04 network                        | 192.168.111.14                                           |
| os-extended-volumes:volumes_attached | []                                                       |
| progress                             | 0                                                        |
| security_groups                      | default                                                  |
| status                               | ACTIVE                                                   |
| tenant_id                            | ba7a2c788ab849e0a583cf54bec0a1f4                         |
| updated                              | 2015-10-19T02:22:12Z                                     |
| user_id                              | e36dfa5518fe42bfa753c59674b6ed59                         |
+--------------------------------------+----------------------------------------------------------+
2.2 list instance port
root@node-1:~# nova interface-list 95985dfe-8356-4bb6-8ec7-46730d1b4c41

+------------+--------------------------------------+--------------------------------------+----------------+-------------------+
| Port State | Port ID                              | Net ID                               | IP addresses   | MAC Addr          |
+------------+--------------------------------------+--------------------------------------+----------------+-------------------+
| ACTIVE     | fc618310-b936-4247-a5e4-2389a9d8c50e | de2df80a-5f69-4630-85ac-4081ce70de98 | 192.168.111.14 | fa:16:3e:5e:99:0c |
+------------+--------------------------------------+--------------------------------------+----------------+-------------------+
2.3 show port info
root@node-1:~# neutron port-show fc618310-b936-4247-a5e4-2389a9d8c50e

+-----------------------+---------------------------------------------------------------------------------------+
| Field                 | Value                                                                                 |
+-----------------------+---------------------------------------------------------------------------------------+
| admin_state_up        | True                                                                                  |
| allowed_address_pairs |                                                                                       |
| binding:host_id       | node-4.domain.tld                                                                     |
| binding:profile       | {}                                                                                    |
| binding:vif_details   | {"port_filter": true, "ovs_hybrid_plug": true}                                        |
| binding:vif_type      | ovs                                                                                   |
| binding:vnic_type     | normal                                                                                |
| device_id             | 95985dfe-8356-4bb6-8ec7-46730d1b4c41                                                  |
| device_owner          | compute:nova                                                                          |
| extra_dhcp_opts       |                                                                                       |
| fixed_ips             | {"subnet_id": "d75307b5-137f-4fd4-8c30-17974a8489a3", "ip_address": "192.168.111.14"} |
| id                    | fc618310-b936-4247-a5e4-2389a9d8c50e                                                  |
| mac_address           | fa:16:3e:5e:99:0c                                                                     |
| name                  |                                                                                       |
| network_id            | de2df80a-5f69-4630-85ac-4081ce70de98                                                  |
| security_groups       | 344e5239-44cd-4b14-9cf1-d0c3bd2f27ed                                                  |
| status                | ACTIVE                                                                                |
| tenant_id             | ba7a2c788ab849e0a583cf54bec0a1f4                                                      |
+-----------------------+---------------------------------------------------------------------------------------+

root@node-1:~# neutron subnet-show d75307b5-137f-4fd4-8c30-17974a8489a3
+-------------------+------------------------------------------------------+
| Field             | Value                                                |
+-------------------+------------------------------------------------------+
| allocation_pools  | {"start": "192.168.111.2", "end": "192.168.111.254"} |
| cidr              | 192.168.111.0/24                                     |
| dns_nameservers   | 114.114.114.114                                      |
|                   | 8.8.8.8                                              |
| enable_dhcp       | True                                                 |
| gateway_ip        | 192.168.111.1                                        |
| host_routes       |                                                      |
| id                | d75307b5-137f-4fd4-8c30-17974a8489a3                 |
| ip_version        | 4                                                    |
| ipv6_address_mode |                                                      |
| ipv6_ra_mode      |                                                      |
| name              | net04__subnet                                        |
| network_id        | de2df80a-5f69-4630-85ac-4081ce70de98                 |
| subnetpool_id     |                                                      |
| tenant_id         | ba7a2c788ab849e0a583cf54bec0a1f4                     |
+-------------------+------------------------------------------------------+
2.4 dump instance xml by virsh
root@node-4:~# virsh dumpxml instance-00000039

<interface type='bridge'>
  <mac address='fa:16:3e:5e:99:0c'/>
  <source bridge='qbrfc618310-b9'/>
  <target dev='tapfc618310-b9'/>
  <model type='virtio'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
2.5 show linux bridge
root@node-4:~# brctl show
bridge name     bridge id               STP enabled     interfaces
br-aux          8000.e41d2d0f1091       no              bond0
                                                        p_e52381cd-0
br-fw-admin             8000.3863bb3350c8       no              eth2
br-mgmt         8000.e41d2d0f1091       no              bond0.101
br-storage              8000.e41d2d0f1091       no              bond0.103

qbrfc618310-b9          8000.26d598f1427a       no              qvbfc618310-b9
                                                        tapfc618310-b9
2.6 show ovs bridge
root@node-4:~# ovs-vsctl show   
ea7c0ff3-687a-4e5d-8e0c-787a3bf57206
    Bridge br-int
        fail_mode: secure
        Port "qvofc618310-b9"
            tag: 6
            Interface "qvofc618310-b9"
        Port int-br-prv
            Interface int-br-prv
                type: patch
                options: {peer=phy-br-prv}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-prv
        Port phy-br-prv
            Interface phy-br-prv
                type: patch
                options: {peer=int-br-prv}
        Port br-prv
            Interface br-prv
                type: internal
        Port "p_e52381cd-0"
            Interface "p_e52381cd-0"
                type: internal
    ovs_version: "2.3.1"

3. create tc rule

Note: tc rules are lost after the host reboots!!!

linux bridge port = qvb + first 11 chars of the neutron port ID (e.g. qvbfc618310-b9)
ovs bridge port   = qvo + first 11 chars of the neutron port ID (e.g. qvofc618310-b9)
3.1 create
# Linux bridge port: limit the instance's outgoing (egress) bandwidth
tc qdisc add dev <LinuxBridge Port> root handle 1: htb default 100

tc class add dev <LinuxBridge Port> parent 1: classid 1:100 htb rate <Bandwidth>mbit ceil <Bandwidth*2>mbit burst <Bandwidth*10>mbit
tc qdisc add dev <LinuxBridge Port> parent 1:100 sfq perturb 10

tc class add dev <LinuxBridge Port> parent 1: classid 1:1 htb rate 10gbit
tc qdisc add dev <LinuxBridge Port> parent 1:1 sfq perturb 10

tc filter add dev <LinuxBridge Port> protocol ip parent 1: prio 1 u32 match ip dst <Subnet CIDR> flowid 1:1

# OVS bridge port: limit the instance's incoming (ingress) bandwidth
tc qdisc add dev <OVSBridge Port> root handle 1: htb default 100

tc class add dev <OVSBridge Port> parent 1: classid 1:1 htb rate 10gbit
tc qdisc add dev <OVSBridge Port> parent 1:1 sfq perturb 10

tc class add dev <OVSBridge Port> parent 1: classid 1:100 htb rate <Bandwidth>mbit ceil <Bandwidth*2>mbit burst <Bandwidth*10>mbit
tc qdisc add dev <OVSBridge Port> parent 1:100 sfq perturb 10

tc filter add dev <OVSBridge Port> protocol ip parent 1: prio 1 u32 match ip src <Subnet CIDR> flowid 1:1
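As a concrete illustration (values are only an example: the ports of the test instance above, its subnet 192.168.111.0/24, and the 5 Mbit limit used in the test below), the placeholders would be filled in like this:

tc qdisc add dev qvbfc618310-b9 root handle 1: htb default 100
tc class add dev qvbfc618310-b9 parent 1: classid 1:100 htb rate 5mbit ceil 10mbit burst 50mbit
tc qdisc add dev qvbfc618310-b9 parent 1:100 sfq perturb 10
tc class add dev qvbfc618310-b9 parent 1: classid 1:1 htb rate 10gbit
tc qdisc add dev qvbfc618310-b9 parent 1:1 sfq perturb 10
tc filter add dev qvbfc618310-b9 protocol ip parent 1: prio 1 u32 match ip dst 192.168.111.0/24 flowid 1:1

tc qdisc add dev qvofc618310-b9 root handle 1: htb default 100
tc class add dev qvofc618310-b9 parent 1: classid 1:1 htb rate 10gbit
tc qdisc add dev qvofc618310-b9 parent 1:1 sfq perturb 10
tc class add dev qvofc618310-b9 parent 1: classid 1:100 htb rate 5mbit ceil 10mbit burst 50mbit
tc qdisc add dev qvofc618310-b9 parent 1:100 sfq perturb 10
tc filter add dev qvofc618310-b9 protocol ip parent 1: prio 1 u32 match ip src 192.168.111.0/24 flowid 1:1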
3.2 update
tc class change dev <LinuxBridge Port> parent 1: classid 1:100 htb rate <New Bandwidth>mbit ceil <New Bandwidth * 2>mbit burst <New Bandwidth * 10>mbit 
tc class change dev <OVSBridge Port> parent 1: classid 1:100 htb rate <New Bandwidth>mbit ceil <New Bandwidth * 2>mbit burst <New Bandwidth * 10>mbit 
3.3 delete
tc qdisc del dev <LinuxBridge Port> root
tc qdisc del dev <OVSBridge Port> root
3.4 show
tc -s qdisc show dev <Port>
tc -s class show dev <Port>
tc -s filter show dev <Port>
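
Since, as noted above, tc rules do not survive a host reboot, one simple way to restore them (a sketch, assuming you keep the per-port tc commands in a script such as /etc/tc-qos.sh, a path chosen here only for illustration) is to re-run that script at boot:

# /etc/rc.local (or an equivalent init/systemd hook)
if [ -x /etc/tc-qos.sh ]; then
    /etc/tc-qos.sh
fi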

4. test

# no bandwidth limit
100%[============================================================>] 610,271,232 91.3MB/s   in 6.3s

# 5Mbit limit
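
The numbers above come from a plain wget download into the instance; a minimal way to repeat the check before and after applying the rules (the URL is a placeholder, and iperf between the instance and another host works just as well) is:

# run inside the instance; URL is a placeholder
wget -O /dev/null http://<some-external-server>/<large-file>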

5. listen to openstack notification bus to create tc rule

#!/usr/bin/env python
#-*- coding=utf-8 -*-

# python qos_agent.py > /dev/null 2>&1 &

import datetime
import logging
import requests
import subprocess
from kombu.mixins import ConsumerMixin
from kombu.log import get_logger
from kombu import Queue, Exchange


###### eonboard config ######
EONBOARD_API_URL = "10.6.13.82:8000"

###### log config ######
LOG = get_logger(__name__)
LOG.setLevel(logging.INFO)
f_handler = logging.FileHandler('/var/log/nova/qos_agent.log')
f_handler.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(levelname)s: %(message)s')
f_handler.setFormatter(formatter)
LOG.addHandler(f_handler)


###### rabbit config ######
HOST = "14.14.15.6"
PORT = 5673
USER = "nova"
PASSWD = "CavteiuV"
CONN_STR = "amqp://%s:%s@%s:%s//" % (USER, PASSWD, HOST, PORT)
TASK_QUEUE = [
    Queue("qos.nova.info",
          Exchange('nova', 'topic', durable=False),
          durable=False, routing_key='notifications.info'),
    Queue("qos.nova.error",
          Exchange('nova', 'topic', durable=False),
          durable=False, routing_key='notifications.error'),

    Queue('qos.neutron.info',
          Exchange('neutron', 'topic', durable=False),
          durable=False, routing_key='notifications.info'),
    Queue('qos.neutron.error',
          Exchange('neutron', 'topic', durable=False),
          durable=False, routing_key='notifications.error'),
]


###### tc template ######

CLEAN_RULE = ["tc qdisc del dev %(linux_bridge_port)s root",
              "tc qdisc del dev %(ovs_port)s root"]

# each non-empty line is executed as one shell command on the compute node
CREATE_LINUX_BRIDGE_RULE = """
tc qdisc add dev %(linux_bridge_port)s root handle 1: htb default 100
tc class add dev %(linux_bridge_port)s parent 1: classid 1:100 htb rate %(bandwidth)dmbit ceil %(bandwidth_2)dmbit burst %(bandwidth_10)dmbit
tc qdisc add dev %(linux_bridge_port)s parent 1:100 sfq perturb 10
tc class add dev %(linux_bridge_port)s parent 1: classid 1:1 htb rate 10gbit
tc qdisc add dev %(linux_bridge_port)s parent 1:1 sfq perturb 10
tc filter add dev %(linux_bridge_port)s protocol ip parent 1: prio 1 u32 match ip dst %(subnet_cidr)s flowid 1:1
"""


CREATE_OVS_PORT_RULE = """
tc qdisc add dev %(ovs_port)s root handle 1: htb default 100
tc class add dev %(ovs_port)s parent 1: classid 1:1 htb rate 10gbit
tc qdisc add dev %(ovs_port)s parent 1:1 sfq perturb 10
tc class add dev %(ovs_port)s parent 1: classid 1:100 htb rate %(bandwidth)dmbit ceil %(bandwidth_2)dmbit burst %(bandwidth_10)dmbit
tc qdisc add dev %(ovs_port)s parent 1:100 sfq perturb 10
tc filter add dev %(ovs_port)s protocol ip parent 1: prio 1 u32 match ip src %(subnet_cidr)s flowid 1:1
"""


def get_instance_args_by_payload(payload):
    args = None
    if "instance_id" in payload:
        args = payload["instance_id"]
    if "floatingip" in payload:
        args = payload["floatingip"].get("floating_ip_address", None)
        floating_port = payload["floatingip"].get("port_id", None)
        if not floating_port:
            args = None
    if args:
        resp = requests.get("http://%(eonboard_api_url)s/api/instances/%(args)s/detail/" % {
            "eonboard_api_url": EONBOARD_API_URL, 'args': args})
        if resp.status_code == 200:
            return resp.json()

    return None

def make_sure_tc_qos_exist(payload):
    if not payload:
        LOG.info("Create tc rule, but payload is null.")
        return

    if isinstance(payload, list):
        payload = payload[0]
    instance = get_instance_args_by_payload(payload)
    if not instance:
        LOG.info("get instance by payload is None. payload:[%s]", payload)
        return
    # tc devices are named after the first 11 chars of the neutron port id
    port_11 = instance["port"][0:11]
    prepare_args = {"linux_bridge_port": "qvb%s" % port_11,
                    "ovs_port": "qvo%s" % port_11,
                    "subnet_cidr": instance["network_info"]["address"],
                    "bandwidth": instance["bandwidth"] * 1,
                    "bandwidth_2": instance["bandwidth"] * 2,
                    "bandwidth_10": instance["bandwidth"] * 10,
                    }

    linux_bridge_port_rule = CREATE_LINUX_BRIDGE_RULE % prepare_args
    ovs_bridge_port_rule = CREATE_OVS_PORT_RULE % prepare_args

    #print linux_bridge_port_rule
    #print ovs_bridge_port_rule

    cmd_list = []
    for cmd in linux_bridge_port_rule.split("\n"):
        if len(cmd) > 0:
            cmd_list.append(cmd)

    for cmd in ovs_bridge_port_rule.split("\n"):
        if len(cmd) > 0:
            cmd_list.append(cmd)

    # drop any existing root qdisc first; ignore errors if none exists yet
    for cmd in CLEAN_RULE:
        cmd = cmd % prepare_args
        try:
            subprocess.call(['ssh', instance["host"], cmd])
        except Exception:
            pass

    ret = subprocess.call(['ssh', instance["host"], " && ".join(cmd_list)])
    if ret == 0:
        LOG.info("[Instance:%s] qos execute succeed. \ncmd: %s", instance['uuid'], cmd_list)
    else:
        LOG.error("[Instance:%s] cmd: %s", instance['uuid'], cmd_list)
        LOG.error("[Instance:%s] qos execute failed.", instance['uuid'])


MESSAGE_PROCESS = {
    "compute.instance.create.end": make_sure_tc_qos_exist,
    "compute.instance.power_on.end": make_sure_tc_qos_exist,
    "floatingip.update.end": make_sure_tc_qos_exist,
}


class Worker(ConsumerMixin):

    def __init__(self, connection):
        self.connection = connection

    def get_consumers(self, Consumer, channel):
        return [Consumer(queues=TASK_QUEUE,
                         accept=['json'],
                         callbacks=[self.process_message])]

    def process_message(self, body, message):
        try:
            event_type = body.get('event_type', None)
            if event_type in MESSAGE_PROCESS:
                MESSAGE_PROCESS[event_type](body.get('payload', None))
            else:
                LOG.warn("Ignore event_type [%s]", event_type)
        except Exception:
            LOG.exception("Process message exception")
        message.ack()


if __name__ == '__main__':
    from kombu import Connection
    from kombu.utils.debug import setup_logging
    setup_logging(loglevel='DEBUG', loggers=[''])
    LOG.info(CONN_STR)
    with Connection(CONN_STR) as conn:
        try:
            LOG.info("#################GO###################")
            worker = Worker(conn)
            worker.run()
        except KeyboardInterrupt:
            LOG.info('bye bye')

##prepare to install

# on all nodes
sudo useradd -d /home/ceph -m ceph
sudo passwd ceph
echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
sudo chmod 0440 /etc/sudoers.d/ceph

##admin-node (as ceph and root)

ssh-keygen
vi ~/.ssh/config
chmod 600 ~/.ssh/config

[ceph@rdo-manager my-cluster]$ cat ~/.ssh/config 
Host rdo-manager
HostName 192.168.11.100
User ceph
Port 22

Host rdo-compute1
HostName 192.168.11.101
User ceph
Port 22

ssh-copy-id ceph@rdo-manager  
ssh-copy-id ceph@rdo-compute1

ceph-deploy install host-name

Here ceph-deploy installs wget and configures the Ceph yum repository. Since we want both the RDO and Ceph repositories to come from a local mirror, the following environment variables make the repo's baseurl point at the local mirror:

export CEPH_DEPLOY_REPO_URL="http://your-mirrors/rpm-emperor/el6"
export CEPH_DEPLOY_GPG_URL="http://your-mirrors/rpm-emperor/gpg.key"

qemu-img -h | grep 'rbd'    # verify that qemu-img was built with rbd support

###config ceph cluster

[root@eayun-admin ceph]# ceph-deploy disk list eayun-compute1

[ceph_deploy.cli][INFO  ] Invoked (1.3.5): /usr/bin/ceph-deploy disk list eayun-compute1
[eayun-compute1][DEBUG ] connected to host: eayun-compute1 
[eayun-compute1][DEBUG ] detect platform information from remote host
[eayun-compute1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] Listing disks on eayun-compute1...
[eayun-compute1][INFO  ] Running command: ceph-disk list
[eayun-compute1][DEBUG ] /dev/sda :
[eayun-compute1][DEBUG ]  /dev/sda1 other, ext4, mounted on /
[eayun-compute1][DEBUG ]  /dev/sda2 other, swap
[eayun-compute1][DEBUG ] /dev/sdb other, unknown
[eayun-compute1][DEBUG ] /dev/sdc other, unknown

[root@eayun-admin ceph]# ceph-deploy disk zap eayun-compute1:/dev/sdb

[ceph_deploy.cli][INFO  ] Invoked (1.3.5): /usr/bin/ceph-deploy disk zap eayun-compute1:/dev/sdb
[ceph_deploy.osd][DEBUG ] zapping /dev/sdb on eayun-compute1
[eayun-compute1][DEBUG ] connected to host: eayun-compute1 
[eayun-compute1][DEBUG ] detect platform information from remote host
[eayun-compute1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[eayun-compute1][DEBUG ] zeroing last few blocks of device
[eayun-compute1][INFO  ] Running command: sgdisk --zap-all --clear --mbrtogpt -- /dev/sdb
[eayun-compute1][DEBUG ] Creating new GPT entries.
[eayun-compute1][DEBUG ] GPT data structures destroyed! You may now partition the disk using fdisk or
[eayun-compute1][DEBUG ] other utilities.
[eayun-compute1][DEBUG ] The operation has completed successfully.

[root@eayun-admin ceph]# ceph-deploy disk prepare eayun-compute1:/dev/sdb

[ceph_deploy.cli][INFO  ] Invoked (1.3.5): /usr/bin/ceph-deploy disk prepare eayun-compute1:/dev/sdb
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks eayun-compute1:/dev/sdb:
[eayun-compute1][DEBUG ] connected to host: eayun-compute1 
[eayun-compute1][DEBUG ] detect platform information from remote host
[eayun-compute1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] Deploying osd to eayun-compute1
[eayun-compute1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[eayun-compute1][WARNIN] osd keyring does not exist yet, creating one
[eayun-compute1][DEBUG ] create a keyring file
[eayun-compute1][INFO  ] Running command: udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host eayun-compute1 disk /dev/sdb journal None activate False
[eayun-compute1][INFO  ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdb
[eayun-compute1][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdb
[eayun-compute1][DEBUG ] Information: Moved requested sector from 34 to 2048 in
[eayun-compute1][DEBUG ] order to align on 2048-sector boundaries.
[eayun-compute1][DEBUG ] The operation has completed successfully.
[eayun-compute1][DEBUG ] Information: Moved requested sector from 10485761 to 10487808 in
[eayun-compute1][DEBUG ] order to align on 2048-sector boundaries.
[eayun-compute1][DEBUG ] The operation has completed successfully.
[eayun-compute1][DEBUG ] meta-data=/dev/sdb1              isize=2048   agcount=4, agsize=196543 blks
[eayun-compute1][DEBUG ]          =                       sectsz=512   attr=2, projid32bit=0
[eayun-compute1][DEBUG ] data     =                       bsize=4096   blocks=786171, imaxpct=25
[eayun-compute1][DEBUG ]          =                       sunit=0      swidth=0 blks
[eayun-compute1][DEBUG ] naming   =version 2              bsize=4096   ascii-ci=0
[eayun-compute1][DEBUG ] log      =internal log           bsize=4096   blocks=2560, version=2
[eayun-compute1][DEBUG ]          =                       sectsz=512   sunit=0 blks, lazy-count=1
[eayun-compute1][DEBUG ] realtime =none                   extsz=4096   blocks=0, rtextents=0
[eayun-compute1][DEBUG ] The operation has completed successfully.
[ceph_deploy.osd][DEBUG ] Host eayun-compute1 is now ready for osd use.

[root@eayun-admin ceph]# ceph-deploy disk list eayun-compute1

[ceph_deploy.cli][INFO  ] Invoked (1.3.5): /usr/bin/ceph-deploy disk list eayun-compute1
[eayun-compute1][DEBUG ] connected to host: eayun-compute1 
[eayun-compute1][DEBUG ] detect platform information from remote host
[eayun-compute1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] Listing disks on eayun-compute1...
[eayun-compute1][INFO  ] Running command: ceph-disk list
[eayun-compute1][DEBUG ] /dev/sda :
[eayun-compute1][DEBUG ]  /dev/sda1 other, ext4, mounted on /
[eayun-compute1][DEBUG ]  /dev/sda2 other, swap
[eayun-compute1][DEBUG ] /dev/sdb :
[eayun-compute1][DEBUG ]  /dev/sdb1 ceph data, active, cluster ceph, osd.0, journal /dev/sdb2
[eayun-compute1][DEBUG ]  /dev/sdb2 ceph journal, for /dev/sdb1
[eayun-compute1][DEBUG ] /dev/sdc :
[eayun-compute1][DEBUG ]  /dev/sdc1 ceph data, active, cluster ceph, osd.1, journal /dev/sdc2
[eayun-compute1][DEBUG ]  /dev/sdc2 ceph journal, for /dev/sdc1

[root@eayun-admin ceph]# ceph-deploy disk activate eayun-compute1:/dev/sdb

[ceph_deploy.cli][INFO  ] Invoked (1.3.5): /usr/bin/ceph-deploy disk activate eayun-compute1:/dev/sdb
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks eayun-compute1:/dev/sdb:
[eayun-compute1][DEBUG ] connected to host: eayun-compute1 
[eayun-compute1][DEBUG ] detect platform information from remote host
[eayun-compute1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] activating host eayun-compute1 disk /dev/sdb
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[eayun-compute1][INFO  ] Running command: ceph-disk-activate --mark-init sysvinit --mount /dev/sdb

[root@eayun-admin ceph]# ceph osd lspools

0 data,1 metadata,2 rbd,

[root@eayun-admin ceph]# ceph status

cluster cfb2142b-32e2-41d2-9244-9c56bedd7846
 health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean
 monmap e1: 1 mons at {eayun-admin=192.168.11.210:6789/0}, election epoch 1, quorum 0 eayun-admin
 osdmap e8: 2 osds: 2 up, 2 in
  pgmap v15: 192 pgs, 3 pools, 0 bytes data, 0 objects
        68372 kB used, 6055 MB / 6121 MB avail
             192 active+degraded ...

Continue by adding 4 more OSDs, then:

[root@eayun-admin ceph]# ceph status

cluster cfb2142b-32e2-41d2-9244-9c56bedd7846
 health HEALTH_OK
 monmap e1: 1 mons at {eayun-admin=192.168.11.210:6789/0}, election epoch 1, quorum 0 eayun-admin
 osdmap e22: 6 osds: 6 up, 6 in
  pgmap v53: 192 pgs, 3 pools, 8 bytes data, 1 objects
        201 MB used, 18164 MB / 18365 MB avail
             192 active+clean

[root@eayun-admin ~(keystone_admin)]# ceph auth list

installed auth entries:

osd.0
        key: AQCy3DRTQF7TJBAAolYze6AHAhKb/aiYsP7i8Q==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.1
        key: AQDj3TRT6J9bKxAAWZHGH7ATaeqPv9Iw4hT3ag==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.2
        key: AQBk3zRTaFTHCBAAWb5M/2GaUbWg9sEBeeMsEQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        key: AQCa3zRTyDI6IBAAWT28xyblxjJitBmxkcFhiA==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.4
        key: AQCk3zRTkKkFIBAAuu5Qbjv8IZt5sL9ro/mEzw==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.5
        key: AQCu3zRTsF5XIRAASdgzc8BfaRyLvc+bCH6rEQ==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQBeOTRTcFaRLxAA48EyD2xpaq+jHfaAj1fdQg==
        caps: [mds] allow
        caps: [mon] allow *
        caps: [osd] allow *
client.bootstrap-mds
        key: AQBfOTRToBYGJRAAw4PCsuWhj8HmrIjBE46k8g==
        caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
        key: AQBfOTRTSJXQBhAABp3lUFkyxer7LYRNisDR/A==
        caps: [mon] allow profile bootstrap-osd

next step to config ceph for openstack glance and cinder

[root@eayun-admin ceph]# ceph osd pool create volumes 1000

pool 'volumes' created

[root@eayun-admin ceph]# ceph osd pool create images 1000

pool 'images' created

[root@eayun-admin ceph]# ceph auth get-or-create client.volumes mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'

[client.volumes]
    key = AQCF6DRTOGGrEBAAnanwo1Qs5pONEGB2lCe49Q==

[root@eayun-admin ceph]# ceph auth get-or-create client.images mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'

[client.images]
    key = AQCU6DRTUH2GOBAA0fkbK3ANNqq9es/F0NXSyQ==

[root@eayun-admin ceph(keystone_admin)]# ll

-rw-r--r-- 1 root root    72 Mar 27 14:45 ceph.bootstrap-mds.keyring
-rw-r--r-- 1 root root    72 Mar 27 14:45 ceph.bootstrap-osd.keyring
-rw------- 1 root root    64 Mar 27 14:46 ceph.client.admin.keyring
-rw-r--r-- 1 root root    64 Mar 28 03:54 ceph.client.images.keyring
-rw-r--r-- 1 root root    65 Mar 28 06:08 ceph.client.volumes.keyring
-rw-r--r-- 1 root root   343 Mar 28 02:42 ceph.conf
-rw-r--r-- 1 root root 41044 Mar 28 02:34 ceph.log
-rw-r--r-- 1 root root    73 Mar 27 14:31 ceph.mon.keyring
-rwxr-xr-x 1 root root    92 Dec 20 22:47 rbdmap

copy the ceph keyring to the compute node (this step must be repeated on every compute node!)

[root@eayun-admin ceph]# ceph auth get-key client.volumes | ssh eayun-compute1 tee client.volumes.key
AQCF6DRTOGGrEBAAnanwo1Qs5pONEGB2lCe49Q==

[root@eayun-admin ~]# uuidgen    # this uuid is also used by cinder.conf:rbd_secret_uuid!!!

a01a8859-8d0d-48de-8d03-d6f40cc40646

[root@eayun-admin ceph]# ssh eayun-compute1

[root@eayun-compute1 ~]# cat > secret.xml <<EOF  
> <secret ephemeral='no' private='no'>
>     <uuid>a01a8859-8d0d-48de-8d03-d6f40cc40646</uuid>
>     <usage type='ceph'>  
>         <name>client.volumes secret</name>
>     </usage>
> </secret>
> EOF

[root@eayun-compute1 ~]# virsh secret-define secret.xml

Secret c0bca24d-4648-500f-9590-0f934ad13572 created

[root@eayun-compute1 ~]# virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.volumes.key) && rm client.volumes.key secret.xml

Secret value set

rm: remove regular file `client.volumes.key'? y
rm: remove regular file `secret.xml'? y

[root@eayun-compute1 ~]# virsh secret-list

UUID                                 Usage
-----------------------------------------------------------
c0bca24d-4648-500f-9590-0f934ad13572 Unused

config glance to use ceph backend.

[root@eayun-admin ceph]# touch /etc/ceph/ceph.client.images.keyring

[client.images]
    key = AQCU6DRTUH2GOBAA0fkbK3ANNqq9es/F0NXSyQ==

Verify: rbd --id user-name ls pool-name

[root@eayun-admin ceph]# rbd --id images ls images

rbd: pool images doesn't contain rbd images

[root@eayun-admin ceph]# vi /etc/glance/glance-api.conf

default_store=rbd
rbd_store_user=images
rbd_store_pool=images

[root@eayun-admin ceph]# /etc/init.d/openstack-glance-api restart

Stopping openstack-glance-api:                             [  OK  ]
Starting openstack-glance-api:                             [  OK  ]

[root@eayun-admin ceph]# /etc/init.d/openstack-glance-registry restart

Stopping openstack-glance-registry:                        [  OK  ]
Starting openstack-glance-registry:                        [  OK  ]

Upload an image, then:

[root@eayun-admin ceph]# rbd --id images ls images

2bfbc891-b185-41a0-a373-655b5870babb

[root@eayun-admin ~(keystone_admin)]# glance image-list

+--------------------------------------+------------------------------+-------------+------------------+----------+--------+
| ID                                   | Name                         | Disk Format | Container Format | Size     | Status |
+--------------------------------------+------------------------------+-------------+------------------+----------+--------+
| 2bfbc891-b185-41a0-a373-655b5870babb | cirros-0.3.1-x86_64-disk.img | qcow2       | bare             | 13147648 | active |
+--------------------------------------+------------------------------+-------------+------------------+----------+--------+

#### config cinder to use ceph backend

[root@eayun-admin ~(keystone_admin)]# vi /etc/cinder/cinder.conf

volume_driver=cinder.volume.drivers.rbd.RBDDriver
backup_ceph_conf=/etc/ceph/ceph.conf
rbd_pool=volumes
glance_api_version=2
rbd_user=volumes
rbd_secret_uuid={uuid of secret}

[root@eayun-admin ~(keystone_admin)]# /etc/init.d/openstack-cinder-api restart
[root@eayun-admin ~(keystone_admin)]# /etc/init.d/openstack-cinder-scheduler restart
[root@eayun-admin ~(keystone_admin)]# /etc/init.d/openstack-cinder-volume restart

[root@eayun-admin ceph]# rbd --id volumes ls volumes

rbd: pool volumes doesn't contain rbd images

[root@eayun-admin ~(keystone_admin)]# cinder create --display-name cinder-ceph-vol1 --display-description "first cinder volume on ceph backend" 1

+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2014-03-28T05:25:47.475149      |
| display_description | first cinder volume on ceph backend  |
|     display_name    |           cinder-ceph-vol1           |
|          id         | a85c780a-9003-4bff-8271-5e200c9cad5e |
|       metadata      |                  {}                  |
|         size        |                  1                   |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+

[root@eayun-admin cinder(keystone_admin)]# cinder list

+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 59d0b231-f5a3-45d5-98a2-007319f65529 | available |  ceph-vol1   |  1   |     None    |  false   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+

[root@eayun-admin cinder(keystone_admin)]# rbd --id volumes ls volumes

volume-59d0b231-f5a3-45d5-98a2-007319f65529

DEBUG

[root@eayun-compute3 11008f92-d7bf-42d3-ac2c-acc2a54ffe9e]# virsh start instance-00000003

error: Failed to start domain instance-00000003
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/0
qemu-kvm: -drive file=rbd:volumes/volume-59d0b231-f5a3-45d5-98a2-007319f65529:id=libvirt:key=AQCF6DRTOGGrEBAAnanwo1Qs5pONEGB2lCe49Q==:auth_supported=cephx\;none:mon_host=192.168.11.210\:6789,if=none,id=drive-ide0-0-1: error connecting
qemu-kvm: -drive file=rbd:volumes/volume-59d0b231-f5a3-45d5-98a2-007319f65529:id=libvirt:key=AQCF6DRTOGGrEBAAnanwo1Qs5pONEGB2lCe49Q==:auth_supported=cephx\;none:mon_host=192.168.11.210\:6789,if=none,id=drive-ide0-0-1: could not open disk image rbd:volumes/volume-59d0b231-f5a3-45d5-98a2-007319f65529:id=libvirt:key=AQCF6DRTOGGrEBAAnanwo1Qs5pONEGB2lCe49Q==:auth_supported=cephx\;none:mon_host=192.168.11.210\:6789: Operation not permitted
//ceph.log
2014-03-28 09:09:26.488788 7ffcd6eeb700  0 cephx server client.libvirt: couldn't find entity name: client.libvirt
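
The qemu command line above connects to the cluster as id=libvirt, while only client.volumes and client.images were created earlier, which is exactly the entity the cephx error in ceph.log complains about. A sketch of one way out (assuming you keep the libvirt secret bound to a client.libvirt user instead of reusing client.volumes) is to create that user with the same caps as client.volumes:

ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images'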

###Ceph operations

####About Pools

To show a pool's utilization statistics, execute:

[root@eayun-admin ceph]# rados df

pool name       category                 KB      objects       clones     degraded      unfound           rd        rd KB           wr        wr KB
data            -                          1            1            0            0           0           17           13            4            3
images          -                          0            0            0            0           0            0            0            0            0
metadata        -                          0            0            0            0           0            0            0            0            0
rbd             -                          0            0            0            0           0            0            0            0            0
volumes         -                          0            0            0            0           0            0            0            0            0
  total used          235868            1
  total avail       18570796
  total space       18806664  

[root@eayun-admin ceph]# ceph osd pool get images size

size: 3

[root@eayun-admin ceph]# ceph osd pool get images pg_num

pg_num: 1000

[root@eayun-admin ceph]# ceph osd pool get images pgp_num

pgp_num: 1000

[root@eayun-admin ceph]# ceph osd pool set images pgp_num 99

set pool 4 pgp_num to 99

[root@eayun-admin ceph]# ceph osd pool set images pg_num 99

specified pg_num 99 <= current 1000

A pool's pg_num cannot be shrunk!

[root@eayun-admin ceph]# ceph osd pool get images pg_num

pg_num: 1000

Delete a pool. yes, i really really mean it :^)

[root@eayun-admin ceph]# ceph osd pool delete images

Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool images.  If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

[root@eayun-admin ceph]# ceph osd pool delete images images --yes-i-really-really-mean-it

pool 'images' deleted

Pool IDs behave like an auto-increment primary key: after a pool is deleted and re-created, the ID keeps increasing.

[root@eayun-admin ceph]# ceph osd lspools

0 data,1 metadata,2 rbd,5 images,6 volumes,

[1]Shrink LVM File System
[2]Grow LVM File System


####Shrink the partition space####

Initial disk layout: the /var/lib/nova/instances partition is mounted on an LV. Check the disk usage and LV information.

root@glos-manager:~# df -h
Filesystem                                   Size  Used Avail Use% Mounted on
/dev/mapper/cinder--volumes-system--root     3.7G  990M  2.6G  28% /
udev                                         993M  4.0K  993M   1% /dev
tmpfs                                        401M  352K  401M   1% /run
none                                         5.0M     0  5.0M   0% /run/lock
none                                        1002M     0 1002M   0% /run/shm
/dev/sda1                                    184M   31M  145M  18% /boot
/dev/sda3                                    939M   18M  875M   2% /home
/dev/sda2                                    939M  524M  368M  59% /usr
/dev/mapper/cinder--volumes-nova--instances  2.2G   68M  2.1G   4% /var/lib/nova/instances

root@glos-manager:~# lvdisplay
  --- Logical volume ---
  LV Name                /dev/cinder-volumes/nova-instances
  VG Name                cinder-volumes
  LV UUID                IQCf3L-Zebu-CfrD-H5ct-qlg6-DnTI-SaQWef
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.22 GiB
  Current LE             568
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Name                /dev/cinder-volumes/system-root
  VG Name                cinder-volumes
  LV UUID                XVAGEN-Orq2-OnRI-FXvY-G3yd-dRiE-JdDlmw
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3.73 GiB
  Current LE             954
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

Shrink the LV.

umount /var/lib/nova/instances
e2fsck -f /dev/mapper/cinder--volumes-nova--instances
resize2fs /dev/mapper/cinder--volumes-nova--instances 1500M
lvresize -L 1.5G /dev/cinder-volumes/nova-instances
mount /dev/mapper/cinder--volumes-nova--instances /var/lib/nova/instances

Check the LV information and disk usage after shrinking.

root@glos-manager:/# lvdisplay
  --- Logical volume ---
  LV Name                /dev/cinder-volumes/nova-instances
  VG Name                cinder-volumes
  LV UUID                IQCf3L-Zebu-CfrD-H5ct-qlg6-DnTI-SaQWef
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                1.50 GiB
  Current LE             384
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Name                /dev/cinder-volumes/system-root
  VG Name                cinder-volumes
  LV UUID                XVAGEN-Orq2-OnRI-FXvY-G3yd-dRiE-JdDlmw
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3.73 GiB
  Current LE             954
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

root@glos-manager:/# mount /dev/mapper/cinder--volumes-nova--instances /var/lib/nova/instances
root@glos-manager:/# df -h
Filesystem                                   Size  Used Avail Use% Mounted on
/dev/mapper/cinder--volumes-system--root     3.7G  990M  2.6G  28% /
udev                                         993M  4.0K  993M   1% /dev
tmpfs                                        401M  352K  401M   1% /run
none                                         5.0M     0  5.0M   0% /run/lock
none                                        1002M     0 1002M   0% /run/shm
/dev/sda1                                    184M   31M  145M  18% /boot
/dev/sda3                                    939M   18M  875M   2% /home
/dev/sda2                                    939M  524M  368M  59% /usr
/dev/mapper/cinder--volumes-nova--instances  1.5G   68M  1.4G   5% /var/lib/nova/instances

####Grow the partition space####

Check the disk usage and LV information.

root@glos-manager:/# lvdisplay
  --- Logical volume ---
  LV Name                /dev/cinder-volumes/nova-instances
  VG Name                cinder-volumes
  LV UUID                IQCf3L-Zebu-CfrD-H5ct-qlg6-DnTI-SaQWef
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                1.50 GiB
  Current LE             384
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Name                /dev/cinder-volumes/system-root
  VG Name                cinder-volumes
  LV UUID                XVAGEN-Orq2-OnRI-FXvY-G3yd-dRiE-JdDlmw
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3.73 GiB
  Current LE             954
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

root@glos-manager:/# df -kh
Filesystem                                   Size  Used Avail Use% Mounted on
/dev/mapper/cinder--volumes-system--root     3.7G  990M  2.6G  28% /
udev                                         993M  4.0K  993M   1% /dev
tmpfs                                        401M  352K  401M   1% /run
none                                         5.0M     0  5.0M   0% /run/lock
none                                        1002M     0 1002M   0% /run/shm
/dev/sda1                                    184M   31M  145M  18% /boot
/dev/sda3                                    939M   18M  875M   2% /home
/dev/sda2                                    939M  524M  368M  59% /usr
/dev/mapper/cinder--volumes-nova--instances  1.5G   68M  1.4G   5% /var/lib/nova/instances

Check the volume group's space usage.

root@glos-manager:/# vgdisplay cinder-volumes
  --- Volume group ---
  VG Name               cinder-volumes
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               5.95 GiB
  PE Size               4.00 MiB
  Total PE              1522
  Alloc PE / Size       1338 / 5.23 GiB
  Free  PE / Size       184 / 736.00 MiB
  VG UUID               EIKfKf-mIvG-Sabt-JlV4-19Ot-XqSK-3dEXE0

There are still 700+ MB free; allocate 500 MB of it to /var/lib/nova/instances.

root@glos-manager:/# lvresize -L +500MB /dev/cinder-volumes/nova-instances
  Extending logical volume nova-instances to 1.99 GiB
  Logical volume nova-instances successfully resized

Check the LV information.

root@glos-manager:/# lvdisplay
  --- Logical volume ---
  LV Name                /dev/cinder-volumes/nova-instances
  VG Name                cinder-volumes
  LV UUID                IQCf3L-Zebu-CfrD-H5ct-qlg6-DnTI-SaQWef
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                1.99 GiB
  Current LE             509
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Name                /dev/cinder-volumes/system-root
  VG Name                cinder-volumes
  LV UUID                XVAGEN-Orq2-OnRI-FXvY-G3yd-dRiE-JdDlmw
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                3.73 GiB
  Current LE             954
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1

Check the disk usage.

root@glos-manager:/# df -h
Filesystem                                   Size  Used Avail Use% Mounted on
/dev/mapper/cinder--volumes-system--root     3.7G  990M  2.6G  28% /
udev                                         993M  4.0K  993M   1% /dev
tmpfs                                        401M  352K  401M   1% /run
none                                         5.0M     0  5.0M   0% /run/lock
none                                        1002M     0 1002M   0% /run/shm
/dev/sda1                                    184M   31M  145M  18% /boot
/dev/sda3                                    939M   18M  875M   2% /home
/dev/sda2                                    939M  524M  368M  59% /usr
/dev/mapper/cinder--volumes-nova--instances  1.5G   68M  1.4G   5% /var/lib/nova/instances

The instances filesystem still shows the original 1.5G, so the filesystem itself also needs to be grown.

root@glos-manager:/# resize2fs -p /dev/mapper/cinder--volumes-nova--instances
resize2fs 1.42 (29-Nov-2011)
Filesystem at /dev/mapper/cinder--volumes-nova--instances is mounted on /var/lib/nova/instances; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/mapper/cinder--volumes-nova--instances to 521216 (4k) blocks.
The filesystem on /dev/mapper/cinder--volumes-nova--instances is now 521216 blocks long.

Check the result after the resize.

root@glos-manager:/# df -h
Filesystem                                   Size  Used Avail Use% Mounted on
/dev/mapper/cinder--volumes-system--root     3.7G  990M  2.6G  28% /
udev                                         993M  4.0K  993M   1% /dev
tmpfs                                        401M  352K  401M   1% /run
none                                         5.0M     0  5.0M   0% /run/lock
none                                        1002M     0 1002M   0% /run/shm
/dev/sda1                                    184M   31M  145M  18% /boot
/dev/sda3                                    939M   18M  875M   2% /home
/dev/sda2                                    939M  524M  368M  59% /usr
/dev/mapper/cinder--volumes-nova--instances  2.0G   68M  1.8G   4% /var/lib/nova/instances

References:
[LVM Resizing Guide ]

[1]Check User Password


####Check User Password####

import passlib.hash

CONF.crypt_strength = 4000

def trunc_password(password):
    """Truncate passwords to the MAX_PASSWORD_LENGTH."""
    if len(password) > MAX_PASSWORD_LENGTH:
        return password[:MAX_PASSWORD_LENGTH]
    else:
        return password


def hash_password(password):
    """Hash a password. Hard."""
    password_utf8 = password.encode('utf-8')
    if passlib.hash.sha512_crypt.identify(password_utf8):
        return password_utf8
    h = passlib.hash.sha512_crypt.encrypt(password_utf8,
                                          rounds=CONF.crypt_strength)
    return h


def check_password(password, hashed):
    """Check that a plaintext password matches hashed.

    hashpw returns the salt value concatenated with the actual hash value.
    It extracts the actual salt if this value is then passed as the salt.

    """
    if password is None:
        return False
    password_utf8 = password.encode('utf-8')
    return passlib.hash.sha512_crypt.verify(password_utf8, hashed)

####Looping Call####

A colleague recently got the OpenStack stress tests running in Jenkins. The setup uses Tempest from the OpenStack project and loops the test cases 24*7. One morning the colleague found many VMs in the Error state; a look at nova-scheduler showed that no host was available because of insufficient resources, leading to the conclusion that OpenStack leaks resources. Let's walk through OpenStack's resource scheduling and monitoring mechanisms, answer the question, and confirm whether there really is a leak.

First, look at two log excerpts: /var/log/nova/nova-compute.log and /var/log/nova/nova-scheduler.log.

/var/log/nova/nova-compute.log (host: trystack-compute3)

nova.compute.resource_tracker [-] Hypervisor: free ram (MB): 23353
nova.compute.resource_tracker [-] Hypervisor: free disk (GB): 4271
nova.compute.resource_tracker [-] Hypervisor: free VCPUs: -43
nova.compute.resource_tracker [-] Free ram (MB): -46683
nova.compute.resource_tracker [-] Free disk (GB): -7332
nova.compute.resource_tracker [-] Free VCPUS: -90
nova.compute.resource_tracker [-] Compute_service record updated for trystack-compute3

/var/log/nova/nova-scheduler.log (host: trystack-manager)

nova.openstack.common.rpc.amqp [-] received method 'update_service_capabilities'
nova.scheduler.host_manager [-] Received compute service update from trystack-compute2.
nova.scheduler.host_manager [-] Received compute service update from trystack-compute2.
nova.scheduler.host_manager [-] Received compute service update from trystack-compute2.

In OpenStack's resource-monitoring design, each host is responsible for persisting its own resource usage to the database in near real time, and when a VM is launched the scheduler routes it to the most suitable host. So how do nova-compute and nova-scheduler interact, and how does nova-scheduler learn about each host's resources?

update available resource

First, look at nova-compute's update_available_resource periodic task (default: 60 s) and what it does.

  1. update_available_resource runs periodically; it is a method of compute/resource_tracker.py: ResourceTracker.
  2. The logic of resource_tracker.update_available_resource:
    • resources = libvirt_driver.get_available_resource(): the host's real resource figures, not yet reduced by the flavors of the instances running on it.
    • the nova-compute log prints the hypervisor resource information
    • purge_expired_claims (not examined in detail here)
    • fetch all not-deleted instances on the current host
    • subtract each instance's flavor resources (memory/disk) from resources
    • persist resources to MySQL

report driver status

Next, look at _report_driver_status (every 60 s), which is what feeds nova-scheduler, and what it does.

  1. Check the time interval.
  2. capabilities = libvirt_driver.get_host_stats (the host's real resources plus some system parameters)
  3. update self.last_capabilities = capabilities
  4. Another periodic task fires and sends the capabilities to nova-scheduler.
  5. The nova-scheduler service receives this RPC call and caches the host's information.
  6. When a VM is launched, nova-scheduler takes all the cached host capabilities and refreshes them with the compute_node records in MySQL.
  7. Only at this point does nova-scheduler finally tie back to update_available_resource.

All in all, this logic is quite convoluted: why add an extra RPC call instead of just using MySQL, when in the end it still depends on the compute_node information in the DB?

#/nova/compute/manager.py: ComputeManager --> SchedulerDependentManager
@manager.periodic_task
def _report_driver_status(self, context):
    curr_time = time.time()
    if curr_time - self._last_host_check > FLAGS.host_state_interval:
        self._last_host_check = curr_time
        LOG.info(_("Updating host status"))
        # This will grab info about the host and queue it
        # to be sent to the Schedulers.
        capabilities = self.driver.get_host_stats(refresh=True)
        capabilities['host_ip'] = FLAGS.my_ip
        self.update_service_capabilities(capabilities)

#/nova/manager.py: SchedulerDependentManager
@periodic_task
def _publish_service_capabilities(self, context):
    """Pass data back to the scheduler at a periodic interval."""
    if self.last_capabilities:
        LOG.debug(_('Notifying Schedulers of capabilities ...'))
        self.scheduler_rpcapi.update_service_capabilities(context,
            self.service_name, self.host, self.last_capabilities)

#/nova/scheduler/host_manager.py: HostManager
def update_service_capabilities(self, service_name, host, capabilities):
    """Update the per-service capabilities based on this notification."""
    LOG.debug(_("Received %(service_name)s service update from "
                "%(host)s.") % locals())
    service_caps = self.service_states.get(host, {})
    # Copy the capabilities, so we don't modify the original dict
    capab_copy = dict(capabilities)
    capab_copy["timestamp"] = timeutils.utcnow()  # Reported time
    service_caps[service_name] = capab_copy
    self.service_states[host] = service_caps

def get_all_host_states(self, context, topic):
    """Returns a dict of all the hosts the HostManager
    knows about. Also, each of the consumable resources in HostState
    are pre-populated and adjusted based on data in the db.

    For example:
    {'192.168.1.100': HostState(), ...}

    Note: this can be very slow with a lot of instances.
    InstanceType table isn't required since a copy is stored
    with the instance (in case the InstanceType changed since the
    instance was created)."""

    if topic != 'compute':
        raise NotImplementedError(_(
            "host_manager only implemented for 'compute'"))

    # Get resource usage across the available compute nodes:
    compute_nodes = db.compute_node_get_all(context)
    for compute in compute_nodes:
        service = compute['service']
        if not service:
            LOG.warn(_("No service for compute ID %s") % compute['id'])
            continue
        host = service['host']
        capabilities = self.service_states.get(host, None)
        host_state = self.host_state_map.get(host)
        if host_state:
            host_state.update_capabilities(topic, capabilities,
                                           dict(service.iteritems()))
        else:
            host_state = self.host_state_cls(host, topic,
                                             capabilities=capabilities,
                                             service=dict(service.iteritems()))
            self.host_state_map[host] = host_state
        host_state.update_from_compute_node(compute)

    return self.host_state_map

#/nova/compute/manager.py: ComputeManager
@manager.periodic_task
def update_available_resource(self, context):
    #self.resource_tracker = resource_tracker.ResourceTracker(host, driver)
    self.resource_tracker.update_available_resource(context)

#/nova/compute/resource_tracker.py: ResourceTracker
@utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
def update_available_resource(self, context):
    resources = self.driver.get_available_resource()
    #self._verify_resources(resources)
    #self._report_hypervisor_resource_view(resources)
    self._purge_expired_claims()
    instances = db.instance_get_all_by_host(context, self.host)
    self._update_usage_from_instances(resources, instances)
    #self._report_final_resource_view(resources)
    self._sync_compute_node(context, resources)

def _sync_compute_node(self, context, resources):
    """Create or update the compute node DB record"""
    if not self.compute_node:
        # we need a copy of the ComputeNode record:
        service = self._get_service(context)
        if not service:
            # no service record, disable resource
            return

        compute_node_ref = service['compute_node']
        if compute_node_ref:
            self.compute_node = compute_node_ref[0]

    if not self.compute_node:
        resources['service_id'] = service['id']
        self._create(context, resources)
        LOG.info(_('Compute_service record created for %s ') % self.host)
    else:
        self._update(context, resources, prune_stats=True)
        LOG.info(_('Compute_service record updated for %s ') % self.host)

def _get_service(self, context):
    try:
        return db.service_get_all_compute_by_host(context,
                                                  self.host)[0]
    except exception.NotFound:
        LOG.warn(_("No service record for host %s"), self.host)

def _create(self, context, values):
    """Create the compute node in the DB"""
    # initialize load stats from existing instances:
    compute_node = db.compute_node_create(context, values)
    self.compute_node = dict(compute_node)

def _update(self, context, values, prune_stats=False):
    """Persist the compute node updates to the DB"""
    compute_node = db.compute_node_update(context,
                                          self.compute_node['id'],
                                          values, prune_stats)
    self.compute_node = dict(compute_node)


def _update_usage_from_instances(self, resources, instances):
"""Calculate resource usage based on instance utilization. This is
different than the hypervisor's view as it will account for all
instances assigned to the local compute host, even if they are not
currently powered on.
"""

self.tracked_instances.clear()

# purge old stats
self.stats.clear()

# set some intiial values, reserve room for host/hypervisor:
resources['local_gb_used'] = FLAGS.reserved_host_disk_mb / 1024
resources['memory_mb_used'] = FLAGS.reserved_host_memory_mb
resources['vcpus_used'] = 0
resources['free_ram_mb'] = (resources['memory_mb'] -
resources['memory_mb_used'])
resources['free_disk_gb'] = (resources['local_gb'] -
resources['local_gb_used'])
resources['current_workload'] = 0
resources['running_vms'] = 0

for instance in instances:
self._update_usage_from_instance(resources, instance)
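
To make the bookkeeping above concrete, a small worked sketch with hypothetical numbers (not taken from the node in this article): reserved_host_memory_mb=512, reserved_host_disk_mb=4096 and two running instances of 2048 MB RAM / 20 GB root disk each:

# Hypothetical figures, only to illustrate the accounting above.
memory_mb, local_gb = 32096, 500          # totals reported by the driver
reserved_host_memory_mb = 512             # FLAGS.reserved_host_memory_mb
reserved_host_disk_mb = 4096              # FLAGS.reserved_host_disk_mb

memory_mb_used = reserved_host_memory_mb
local_gb_used = reserved_host_disk_mb // 1024
for inst in [{'memory_mb': 2048, 'root_gb': 20, 'ephemeral_gb': 0}] * 2:
    memory_mb_used += inst['memory_mb']
    local_gb_used += inst['root_gb'] + inst['ephemeral_gb']

print(memory_mb - memory_mb_used)   # free_ram_mb  = 32096 - 4608 = 27488
print(local_gb - local_gb_used)     # free_disk_gb = 500 - 44 = 456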

def _update_usage_from_instance(self, resources, instance):
"""Update usage for a single instance."""

uuid = instance['uuid']
is_new_instance = uuid not in self.tracked_instances
is_deleted_instance = instance['vm_state'] == vm_states.DELETED

if is_new_instance:
self.tracked_instances[uuid] = 1
sign = 1

if instance['vm_state'] == vm_states.DELETED:
self.tracked_instances.pop(uuid)
sign = -1

self.stats.update_stats_for_instance(instance)

# if it's a new or deleted instance:
if is_new_instance or is_deleted_instance:
# new instance, update compute node resource usage:
resources['memory_mb_used'] += sign * instance['memory_mb']
resources['local_gb_used'] += sign * instance['root_gb']
resources['local_gb_used'] += sign * instance['ephemeral_gb']

# free ram and disk may be negative, depending on policy:
resources['free_ram_mb'] = (resources['memory_mb'] -
resources['memory_mb_used'])
resources['free_disk_gb'] = (resources['local_gb'] -
resources['local_gb_used'])

resources['running_vms'] = self.stats.num_instances
resources['vcpus_used'] = self.stats.num_vcpus_used
resources['current_workload'] = self.stats.calculate_workload()
resources['stats'] = self.stats

"""libvirt.driver.get_available_resource""""

def get_available_resource(self):
"""Retrieve resource info.

This method is called as a periodic task and is used only
in live migration currently.

:returns: dictionary containing resource info
libvirt api examples:
>>> import libvirt
>>> conn = libvirt.openReadOnly("qemu:///system")
>>> conn.getInfo()
['x86_64', 24049, 8, 1197, 2, 1, 4, 1]
>>> conn.getVersion()
1003001
>>> conn.getType()
'QEMU'
>>> conn.getHostname()
'trystack-manager'
>>> conn.listDomainsID()
[4, 5, 1, 27, 10, 3, 2]
>>> dom = conn.lookupByID(4)
>>> dom.name()
'instance-00000031'
>>> int(os.path.getsize('/var/lib/nova/instances/instance-00000031/disk'))
35651584
>>> os.system("qemu-img info /var/lib/nova/instances/instance-00000031/disk")
image: /var/lib/nova/instances/instance-00000031/disk
file format: qcow2
virtual size: 24M (25165824 bytes)
disk size: 15M
cluster_size: 2097152
0
>>> dom
<libvirt.virDomain instance at 0x7ffdc5bca170>
>>> xml = dom.XMLDesc(0)
>>> from lxml import etree
>>> doc = etree.fromstring(xml)
>>> doc.findall('.//devices/disk/driver')
[<Element driver at 0x2004f00>, <Element driver at 0x2004e60>]
>>> driver_nodes = doc.findall('.//devices/disk/driver')
>>> path_nodes = doc.findall('.//devices/disk/source')
>>> disk_nodes = doc.findall('.//devices/disk')
>>> enumerate(path_nodes)
<enumerate object at 0x2004d20>
>>> list(path_nodes)
[<Element source at 0x2004fa0>]
>>> l = enumerate(path_nodes)
>>> l.next()
(0, <Element source at 0x2004fa0>)
>>> l = enumerate(path_nodes)
>>> i, ele = l.next()
>>> disk_nodes[i].get('type')
'file'
>>> ele.get('file')
'/var/lib/nova/instances/instance-00000031/disk'
>>> driver_nodes[i].get('type')
'qcow2'
>>> from nova.virt import images
>>> d = images.qemu_img_info(ele.get('file'))
>>> d
{'file format': 'qcow2', 'image': '/var/lib/nova/instances/instance-00000031/disk', 'disk size': '15M', 'virtual size': '24M (25165824 bytes)', 'cluster_size': '2097152'}
>>> d.get('backing file')
>>>
"""
dic = {'vcpus': self.get_vcpu_total(), #multiprocessing.cpu_count()
'vcpus_used': self.get_vcpu_used(), #host instance vcpu
'memory_mb': self.get_memory_mb_total(), #conn.getInfo()[1]
'memory_mb_used': self.get_memory_mb_used(), #total - free (buffers/cache)
'local_gb': self.get_local_gb_total(), #total HDD $instance_path
'local_gb_used': self.get_local_gb_used(), #used HDD $instance_path

'hypervisor_type': self.get_hypervisor_type(), #conn.get_type()=QEMU
'hypervisor_version': self.get_hypervisor_version(), #conn.getVersion()=1003001
'hypervisor_hostname': self.get_hypervisor_hostname(),#conn.getHostname()=trystack-manager
'cpu_info': self.get_cpu_info(), #cpu architecture
'disk_available_least': self.get_disk_available_least()}
return dic
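
The libvirt calls behind several of these fields are the same ones shown in the docstring above; a minimal read-only sketch (assuming a local qemu:///system hypervisor) that pulls the raw numbers directly:

import libvirt                                   # python-libvirt bindings

conn = libvirt.openReadOnly("qemu:///system")    # read-only access is enough here
arch, mem_mb, cpus = conn.getInfo()[:3]          # memory_mb / vcpus come from here
print(conn.getType(), conn.getVersion())         # hypervisor_type / hypervisor_version
print(conn.getHostname())                        # hypervisor_hostname
print("memory_mb=%s vcpus=%s arch=%s" % (mem_mb, cpus, arch))
conn.close()
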

#########################
def get_disk_available_least(self):
"""
Return disk available least size.

The size of available disk, when block_migration command given
disk_over_commit param is FALSE.

The size that deducted real nstance disk size from the total size
of the virtual disk of all instances.

"""
# available size of the disk
dk_sz_gb = self.get_local_gb_total() - self.get_local_gb_used()

# Disk size that all instance uses : virtual_size - disk_size
instances_name = self.list_instances()
instances_sz = 0
for i_name in instances_name:
try:
disk_infos = jsonutils.loads(
self.get_instance_disk_info(i_name))
for info in disk_infos:
i_vt_sz = int(info['virt_disk_size'])
i_dk_sz = int(info['disk_size'])
instances_sz += i_vt_sz - i_dk_sz
except OSError as e:
if e.errno == errno.ENOENT:
LOG.error(_("Getting disk size of
%(i_name)s: %(e)s") %
locals())
else:
raise
except exception.InstanceNotFound:
# Instance was deleted during the check so ignore it
pass
# NOTE(gtt116): give other tasks a chance to run.
greenthread.sleep(0)
# Disk available least size
available_least_size = dk_sz_gb * (1024 ** 3) - instances_sz
return (available_least_size / 1024 / 1024 / 1024)
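
Plugging in the qemu-img numbers from the docstring earlier (virtual size 25165824 bytes, on-disk size about 15 MB) and an assumed 100 GB of free space (the free-space figure is made up purely for illustration), the arithmetic works out like this:

# Worked example of the "available least" arithmetic above.
free_gb = 100                          # assumed get_local_gb_total() - get_local_gb_used()
virt_disk_size = 25165824              # virtual size from qemu-img info (bytes)
disk_size = 15 * 1024 * 1024           # ~15M on-disk size from qemu-img info (bytes)

instances_sz = virt_disk_size - disk_size          # room this qcow2 can still grow into
available_least = free_gb * (1024 ** 3) - instances_sz
print(available_least // (1024 ** 3))  # ~99 GB usable without over-commit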


def get_memory_mb_used(self):
"""
Get the used memory size (MB) of the physical computer.

:returns: the total usage of memory (MB).

"""

if sys.platform.upper() not in ['LINUX2', 'LINUX3']:
return 0

m = open('/proc/meminfo').read().split()
idx1 = m.index('MemFree:')
idx2 = m.index('Buffers:')
idx3 = m.index('Cached:')
if FLAGS.libvirt_type == 'xen':
used = 0
for domain_id in self.list_instance_ids():
# skip dom0
dom_mem = int(self._conn.lookupByID(domain_id).info()[2])
if domain_id != 0:
used += dom_mem
else:
# the memory reported by dom0 is greater than what
# it is actually using
used += (dom_mem -
(int(m[idx1 + 1]) +
int(m[idx2 + 1]) +
int(m[idx3 + 1])))
# Convert it to MB
return used / 1024
else:
avail = (int(m[idx1 + 1]) + int(m[idx2 + 1]) + int(m[idx3 + 1]))
# Convert it to MB
return self.get_memory_mb_total() - avail / 1024
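
A standalone sketch of the non-xen branch above (Linux only; it simply re-reads /proc/meminfo the same way, where values are in kB):

def memory_mb_used(total_mb):
    """Used memory in MB: total minus (MemFree + Buffers + Cached)."""
    fields = open('/proc/meminfo').read().split()
    avail_kb = sum(int(fields[fields.index(k) + 1])
                   for k in ('MemFree:', 'Buffers:', 'Cached:'))
    return total_mb - avail_kb // 1024

# e.g. with conn.getInfo()[1] == 24049 (the hypervisor from the docstring above):
# print(memory_mb_used(24049))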


def get_fs_info(path):
"""
Get free/used/total space info for a filesystem

:param path: Any dirent on the filesystem
:returns: A dict containing:

:free: How much space is free (in bytes)
:used: How much space is used (in bytes)
:total: How big the filesystem is (in bytes)
"""
hddinfo = os.statvfs(path)
total = hddinfo.f_frsize * hddinfo.f_blocks
free = hddinfo.f_frsize * hddinfo.f_bavail
used = hddinfo.f_frsize * (hddinfo.f_blocks - hddinfo.f_bfree)
return {'total': total,
'free': free,
'used': used}
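
get_fs_info() is plain os.statvfs arithmetic, so it is easy to try by hand; a quick usage sketch (the instances path is just an example, any local directory works):

import os

st = os.statvfs('/var/lib/nova/instances')   # any local directory works for a test
gb = 1024 ** 3
print("total %d GB, free %d GB, used %d GB" % (
    st.f_frsize * st.f_blocks // gb,
    st.f_frsize * st.f_bavail // gb,
    st.f_frsize * (st.f_blocks - st.f_bfree) // gb))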


def get_local_gb_used(self):
"""
Get the used HDD size (GB) of the physical computer.

:returns:
The total usage of HDD(GB).
Note that this value shows a partition where
NOVA-INST-DIR/instances mounts.

"""

stats = get_fs_info(FLAGS.instances_path)
return stats['used'] / (1024 ** 3)


def get_local_gb_total():
"""
Get the total hdd size(GB) of physical computer.

:returns:
The total amount of HDD(GB).
Note that this value shows a partition where
NOVA-INST-DIR/instances mounts.

"""

stats = get_fs_info(FLAGS.instances_path)
return stats['total'] / (1024 ** 3)

def get_vcpu_total():
"""
Get vcpu number of physical computer.

:returns: the number of CPU cores.

"""

# On certain platforms, this will raise a NotImplementedError.
try:
return multiprocessing.cpu_count()
except NotImplementedError:
LOG.warn(_("Cannot get the number of cpu, because this "
"function is not implemented for this platform. "
"This error can be safely ignored for now."))
return 0
def get_memory_mb_total(self):
"""
Get the total memory size(MB) of physical computer.

:returns: the total amount of memory(MB).

"""

return self._conn.getInfo()[1]

def get_vcpu_used(self):
"""
Get vcpu usage number of physical computer.

:returns: the total number of vcpus currently in use.

"""

total = 0
for dom_id in self.list_instance_ids():
dom = self._conn.lookupByID(dom_id)
vcpus = dom.vcpus()
if vcpus is None:
# dom.vcpus is not implemented for lxc, but returning 0 for
# a used count is hardly useful for something measuring usage
total += 1
else:
total += len(vcpus[1])
# NOTE(gtt116): give other tasks a chance to run.
greenthread.sleep(0)
return total
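
In the python-libvirt bindings, vcpus() returns a pair of lists (per-vcpu info tuples and per-vcpu CPU pinning maps), which is why len(vcpus[1]) counts the vcpus; a small read-only sketch of the same loop, assuming a local qemu:///system connection:

import libvirt

conn = libvirt.openReadOnly("qemu:///system")
total = 0
for dom_id in conn.listDomainsID():
    dom = conn.lookupByID(dom_id)
    vcpus = dom.vcpus()          # ([per-vcpu info tuples], [per-vcpu cpu pinning maps])
    total += len(vcpus[1])
print("vcpus in use: %d" % total)
conn.close()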