Thursday, 3 September 2015

Snapshot still being created on a VM for hours but never ever completes? - Here's what to do.

If you are creating a snapshot, or your backup product has tried to create one, and many hours later it is STILL trying to create it, then you don't have many options.

Once you have tried the usual safe options (cancelling the task, stopping the backup job and so on), you are left with the KILL option.  And like the word kill, it is kind of final.  You will have downtime.  It only lasts as long as the VM takes to power back up, but we all know those workloads are variable these days, so the downtime is a pain.

You will need


  • PuTTY client
  • The name of the host that the VM is on
  • The display name of the VM
  • Downtime
  • Console open on the VM that you are going to KILL


Method

  • Go to the security profile on that host and enable SSH
  • Connect via SSH (PuTTY) to that host
  • Type in

    ps | grep vmx | grep <name of VM>
  • This will then display the vmx processes for the VM in question

The first column shows the different processes that are tied to the parent process.  In this case, the parent process is 21592627 and it is that process that we need to kill.  Remember - this has downtime.
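As a rough sketch of picking out that parent ID without reading it off the screen by eye, the snippet below runs awk over a made-up sample of the output. The sample lines, the VM name myvm, the helper name parent_ids and the assumption that the parent ID sits in the second column are all mine for illustration, so check the real output on your host before you kill anything.

```shell
# Hypothetical sample of what `ps | grep vmx | grep <name of VM>` might print.
# ASSUMPTION: column 1 is the individual process ID and column 2 is the
# parent ID; the exact layout can differ between ESXi versions.
sample='21592631 21592627 vmx-vthread-5:myvm
21592632 21592627 vmx-mks:myvm
21592633 21592627 vmx-svga:myvm'

# Print each distinct value found in the second column.
parent_ids() {
    awk '{ print $2 }' | sort -u
}

printf '%s\n' "$sample" | parent_ids
```

If more than one number comes back, the grep has probably matched more than one VM, so stop and narrow the name down before going any further.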

Just type in

kill <parent process number>


Now go to the console of the VM and you can monitor what is going on.  You should get an error message about the vmx file and the VM should restart.  Within a few seconds you should then see the snapshot job cancel!  Yay!


Tuesday, 1 September 2015

vCenter Server appliance using up lots of disk space and how to tidy it up

I had a problem where the vCenter Server appliance that we use ended up with very full disks.  First of all, and as you may have gleaned from previous blog posts, I got around this by extending disks and migrating data around.

But this just keeps the wolf from the door!  I logged a call with VMware and they referred me to this article, which is great in that it stops the issue from getting any worse, but it doesn't tidy up the woe that has been left behind.

So what do you do?

Maybe do a clone of your vCenter appliance for an immediate backup.

Find out which host is running your vCenter appliance and then connect directly to that host with a vSphere client, as you are about to bring your vCenter environment down.

Now log in to the console of your vCenter appliance through that direct connection.

Type in

service vmware-vpxd stop

Then  

service vmware-vpostgres stop


Type in the following command to make a copy of the original postgresql.conf file (all one entry)

cp /storage/db/vpostgres/postgresql.conf /storage/db/vpostgres/postgresql.conf.orig

So now we need to edit postgresql.conf.  The VMware article says to use a text editor.  I used vi, which is on the appliance, and I have included instructions for vi as I come from a Windows background.  Whilst I have come to enjoy vi's quirks, when there are big issues going on, you don't want to faff about!

vi /storage/db/vpostgres/postgresql.conf

Page down and, using the cursor keys, scroll down to line 312, where you will find the #log_min_messages = warning line that needs editing.

Press the insert key (or i) and now you are editing the config file!

Delete the hash at the start of the line so that the setting will be read, and then delete the word warning and replace it with the word error, so that the line reads log_min_messages = error.



Now press the escape key and you are no longer editing the file.

Press colon and then w - you'll see this in the bottom left of the screen

:w

Pressing enter saves the file because you have just written the file (hence the w) 

Press colon again and then q and then press enter

:q

This will allow you to quit (hence the q....)
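If you would rather not open vi at all, the same edit can be sketched with sed. This is only a sketch: the function name set_log_level_error is made up, and it assumes the line in your file really does start with #log_min_messages and is set to warning. The demo at the bottom runs against a scratch file with sample content rather than the real config.

```shell
# Uncomment log_min_messages and change warning to error in the given file.
# ASSUMPTION: the line looks like "#log_min_messages = warning", possibly
# with different spacing; anything after it on the line is left alone.
set_log_level_error() {
    conf="$1"
    # Write the edited text to a temporary file next to the original.
    sed 's/^#[[:space:]]*log_min_messages[[:space:]]*=[[:space:]]*warning/log_min_messages = error/' \
        "$conf" > "$conf.new" || return 1
    # Overwrite in place (rather than mv) so owner and permissions stay put.
    cat "$conf.new" > "$conf" && rm "$conf.new"
}

# Try it on a scratch file first (hypothetical sample content).
scratch=$(mktemp)
printf '%s\n' '#log_min_messages = warning' > "$scratch"
set_log_level_error "$scratch"
grep log_min_messages "$scratch"
```

On the appliance the call would be set_log_level_error /storage/db/vpostgres/postgresql.conf, run after you have made the .orig copy so that there is something to fall back on.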

But now we need to delete the old excess data - change to the directory where all the excess data lives

cd /storage/db/vpostgres/pg_log

Then run this gem of a command that I found, which deletes EVERYTHING in that directory except for the 5 most recent files.

rm `ls -t | awk 'NR>5'`

On my British keyboard within the VM console, to get a pipe, | , I need to press Shift and the ~ key over by the main enter key.

If you have run out of space drastically, you will then have a lot of data to delete.  So this command can actually take a while to run.  Monitor it on your graphs as you do have a direct connection to the host that is running your vCenter appliance after all.
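If deleting blind makes you nervous, here is a minimal sketch of the same idea wrapped in a function and tried out on a scratch directory first. The function name prune_keep_newest and the dummy file names are made up for illustration, and it assumes the file names contain no newlines.

```shell
# Delete everything in directory $1 except the $2 most recently modified files.
# ASSUMPTION: file names contain no newlines (true of typical log names).
prune_keep_newest() {
    dir="$1"
    keep="$2"
    (
        cd "$dir" || exit 1
        # ls -t lists newest first; awk drops the first $keep names.
        ls -t | awk -v n="$keep" 'NR > n' | while IFS= read -r f; do
            rm -- "$f"
        done
    )
}

# Try it on a scratch directory holding eight dummy logs, oldest first.
demo=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    touch -t "20150901000$i" "$demo/dummy-$i.log"
done

prune_keep_newest "$demo" 5
ls "$demo"
```

Once you are happy with what it leaves behind, the real call would be prune_keep_newest /storage/db/vpostgres/pg_log 5. Removing the files one at a time also sidesteps any limit on command-line length if the directory holds a very large number of files.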

Now you just need to get vCenter back up and running.

service vmware-vpostgres start

and then

service vmware-vpxd start