Wednesday, February 18, 2015

Debugging OpenStack: no piece of cake!

So I have been playing with OpenStack for a few days, and as cool as it is, I have to admit it's a nightmare to debug.

So how do you start?

1) Find out all the major player nodes (in my case: the controller, compute and Cinder nodes).
2) Go to each node and look in /var/log/.
3) Find out which logs were written to when your task failed.
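Step 3 can be partly automated. The following is a minimal sketch, assuming a `find` that supports `-mmin` (GNU and BSD find both do): it lists files under a log directory that were modified in the last few minutes, which narrows down which services reacted to the failed task. The function name `recent_logs` and its defaults (15 minutes, /var/log) are my own choices, not anything OpenStack provides.

```shell
#!/bin/sh
# recent_logs (hypothetical helper): list files under a directory that were
# modified within the last N minutes, sorted by path.
# Usage: recent_logs [minutes] [directory]
recent_logs() {
    minutes=${1:-15}      # how far back to look (default: 15 minutes)
    dir=${2:-/var/log}    # where to look (default: /var/log)
    # -mmin -N matches files modified less than N minutes ago;
    # errors from unreadable subdirectories are discarded.
    find "$dir" -type f -mmin "-$minutes" 2>/dev/null | sort
}
```

Run something like `recent_logs 10` on each node right after the task fails, then `tail` the files it prints.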

For example:

I spun up a VM and it was taking forever to come up.
I logged into my compute node and found this:

2015-02-18 10:07:26.586 3215 TRACE nova.compute.manager [instance: 0d28a0ad-93ce-43f5-b984-3c33f614861c] RemoteError: Remote error: OperationalError (OperationalError) (1048, "Column 'instance_uuid' cannot be null") 'UPDATE instance_extra SET updated_at=%s, instance_uuid=%s WHERE instance_extra.id = %s' (datetime.datetime(2015, 2, 18, 16, 7, 26, 576147), None, 152339L)
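Since the stack trace lines are tagged with `TRACE` and carry the instance UUID, you can pull every related line out of the logs instead of scrolling through them. A minimal sketch, assuming the Ubuntu packages' default log directory `/var/log/nova` (adjust if your deployment logs elsewhere); `find_traces` is a hypothetical helper name.

```shell
#!/bin/sh
# find_traces (hypothetical helper): print every TRACE line that mentions a
# given string (e.g. an instance UUID) from all files under a log directory.
# Each matching line is prefixed with the file it came from.
# Usage: find_traces <uuid-or-text> [log directory]
find_traces() {
    needle=$1
    logdir=${2:-/var/log/nova}   # assumed default location on Ubuntu
    # -r: recurse into logdir; -F: treat the needle as a fixed string.
    # The second grep keeps only lines with the TRACE level marker.
    grep -r -F -- "$needle" "$logdir" | grep -F ' TRACE '
}
```

On the compute node this would be `find_traces 0d28a0ad-93ce-43f5-b984-3c33f614861c`.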


Now, a friend pointed out that this could be caused by a version mismatch.

So I logged into my compute node and checked the nova-compute version:

root@njain-compute:~# dpkg -l nova-compute
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version                      Architecture Description
+++-=================-============================-============-=======================================
ii  nova-compute      1:2014.2.1-0ubuntu1~cloud0   all          OpenStack Compute - compute node base

Then I logged into my controller node and checked the versions there:

root@control-\:~# dpkg -l |grep nova 
ii  nova-api                            1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute - API frontend
ii  nova-cert                           1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute - certificate management
ii  nova-common                         1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute - common files
ii  nova-conductor                      1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute - conductor service
ii  nova-consoleauth                    1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute - Console Authenticator
ii  nova-novncproxy                     1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute - NoVNC proxy
ii  nova-scheduler                      1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute - virtual machine scheduler
ii  python-nova                         1:2014.2.1-0ubuntu1~cloud0            all          OpenStack Compute Python libraries
ii  python-novaclient                   1:2.19.0-0ubuntu1~cloud0              all          client library for OpenStack Compute API

Then I logged into my Cinder node and checked its versions:

root@cinder-controller-mel01-1:~# dpkg -l|grep cinder
ii  cinder-api               1:2014.2-0ubuntu1~cloud0   all   Cinder storage service - API server
ii  cinder-common            1:2014.2-0ubuntu1~cloud0   all   Cinder storage service - common files
ii  cinder-scheduler         1:2014.2-0ubuntu1~cloud0   all   Cinder storage service - Scheduler server
ii  python-cinder            1:2014.2-0ubuntu1~cloud0   all   Cinder Python libraries
ii  python-cinderclient      1:1.1.0-0ubuntu1~cloud0    all   python bindings to the OpenStack Volume API


And as you can see, Cinder is out of sync: its packages are at 1:2014.2 while Nova is at 1:2014.2.1, so its version needs to match 1:2014.2.1.
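Eyeballing `dpkg -l` output across several nodes is error-prone. Here is a minimal sketch for comparing the upstream part of two Debian package versions, i.e. ignoring the epoch (`1:`) and the Ubuntu revision (`-0ubuntu1~cloud0`). The helper names are hypothetical; `installed_version` relies on `dpkg-query -W -f='${Version}'`, which prints the installed version of a package on Debian/Ubuntu.

```shell
#!/bin/sh
# installed_version (hypothetical helper): print the installed version of a
# package, e.g. "1:2014.2.1-0ubuntu1~cloud0". Requires dpkg-query (Debian/Ubuntu).
installed_version() {
    dpkg-query -W -f='${Version}' "$1"
}

# upstream_version (hypothetical helper): strip the epoch ("1:") and the
# Debian revision (everything after the last "-") from a Debian version string.
upstream_version() {
    v=${1#*:}                 # drop epoch, if any
    printf '%s\n' "${v%-*}"   # drop Debian revision, if any
}

# same_release (hypothetical helper): report whether two Debian versions share
# the same upstream version. Returns 0 on a match, 1 on a mismatch.
same_release() {
    a=$(upstream_version "$1")
    b=$(upstream_version "$2")
    if [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "MISMATCH: $a vs $b"
        return 1
    fi
}
```

For the versions above, `same_release "1:2014.2.1-0ubuntu1~cloud0" "1:2014.2-0ubuntu1~cloud0"` prints `MISMATCH: 2014.2.1 vs 2014.2`. On a live node you could pass `"$(installed_version nova-common)"` as one argument; the version from the other node still has to be copied over or fetched (e.g. via ssh). If you need to know which side is older rather than just whether they differ, `dpkg --compare-versions A lt B` does a proper Debian version ordering.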


Well, that's one way to debug (mainly because I had been told a version mismatch might be the issue), though I wish debugging OpenStack were easier.
People often say reinstalling/rebuilding is better than debugging OpenStack, and Google is not yet abuzz with many clues... here's hoping!