My.Sys.Admin: May 2009

Tuesday 26 May 2009

Things not to do on an ILO

Unless you need to take down the server in a hurry, do not choose the CTRL-ALT-DEL function button on the iLO: it will reboot the server.

Understanding the EMC syminq command and what it tells you about devices

Here is some syminq command output:

eudt0206:root / > ./var/tmp/SYMCLI.depot/SYMCLI/SYMCLI/SYMCLI/V6.4.0/bin/syminq

Device Product Device
---------------------------- --------------------------- ---------------------
Name Type Vendor ID Rev Ser Num Cap (KB)
---------------------------- --------------------------- ---------------------
/dev/rdsk/c0t0d0 COMPAQ BF1468A4CC HPB5 3KN2TWTK 143374744
/dev/rdsk/c0t0d0s1 COMPAQ BF1468A4CC HPB5 3KN2TWTK 143374744
/dev/rdsk/c0t0d0s2 COMPAQ BF1468A4CC HPB5 3KN2TWTK 143374744
/dev/rdsk/c0t0d0s3 COMPAQ BF1468A4CC HPB5 3KN2TWTK 143374744
/dev/rdsk/c0t1d0 COMPAQ BF1468A4CC HPB5 3KN2VX2A 143374744
/dev/rdsk/c2t0d0 COMPAQ BF1468A4CC HPB5 3KN2THV5 143374744
/dev/rdsk/c2t0d0s1 COMPAQ BF1468A4CC HPB5 3KN2THV5 143374744
/dev/rdsk/c2t0d0s2 COMPAQ BF1468A4CC HPB5 3KN2THV5 143374744
/dev/rdsk/c2t0d0s3 COMPAQ BF1468A4CC HPB5 3KN2THV5 143374744
/dev/rdsk/c2t1d0 COMPAQ BF1468A4CC HPB5 3KN2STSA 143374744
/dev/rdsk/c3t2d0 Optiarc DVD RW AD-5* 1.31 Nov20,2 N/A
/dev/rdsk/c4t0d0 GK EMC SYMMETRIX 5771 0400040000 2880
/dev/rdsk/c4t9d4 M(4) EMC SYMMETRIX 5771 040071D000 35692800
/dev/rdsk/c4t9d5 M(4) EMC SYMMETRIX 5771 0400721000 35692800
/dev/rdsk/c4t9d6 M(4) EMC SYMMETRIX 5771 0400725000 35692800
/dev/rdsk/c4t9d7 M(4) EMC SYMMETRIX 5771 0400729000 35692800
/dev/rdsk/c4t13d7 M(4) EMC SYMMETRIX 5771 040072D000 35692800
/dev/rdsk/c7t0d0 GK EMC SYMMETRIX 5771 0400040000 2880
/dev/rdsk/c7t9d4 M(4) EMC SYMMETRIX 5771 040071D000 35692800
/dev/rdsk/c7t9d5 M(4) EMC SYMMETRIX 5771 0400721000 35692800
/dev/rdsk/c7t9d6 M(4) EMC SYMMETRIX 5771 0400725000 35692800
/dev/rdsk/c7t9d7 M(4) EMC SYMMETRIX 5771 0400729000 35692800
/dev/rdsk/c7t13d7 M(4) EMC SYMMETRIX 5771 040072D000 35692800
/dev/vx/rdmp/c0t0d0s2 COMPAQ BF1468A4CC HPB5 3KN2TWTK 143374744
/dev/vx/rdmp/c0t1d0 COMPAQ BF1468A4CC HPB5 3KN2VX2A 143374744
/dev/vx/rdmp/c2t0d0s2 COMPAQ BF1468A4CC HPB5 3KN2THV5 143374744
/dev/vx/rdmp/c2t1d0 COMPAQ BF1468A4CC HPB5 3KN2STSA 143374744
/dev/vx/rdmp/c4t9d4 M(4) EMC SYMMETRIX 5771 040071D000 35692800
/dev/vx/rdmp/c4t9d5 M(4) EMC SYMMETRIX 5771 0400721000 35692800
/dev/vx/rdmp/c4t9d6 M(4) EMC SYMMETRIX 5771 0400725000 35692800
/dev/vx/rdmp/c4t9d7 M(4) EMC SYMMETRIX 5771 0400729000 35692800
/dev/vx/rdmp/c4t13d7 M(4) EMC SYMMETRIX 5771 040072D000 35692800

Let us break it down further. Here we have a total of 5 EMC Symmetrix disks plus a gatekeeper, each visible through two paths (controllers c4 and c7), i.e. multipathed:

/dev/rdsk/c4t0d0 GK EMC SYMMETRIX 5771 0400040000 2880
/dev/rdsk/c4t9d4 M(4) EMC SYMMETRIX 5771 040071D000 35692800
/dev/rdsk/c4t9d5 M(4) EMC SYMMETRIX 5771 0400721000 35692800
/dev/rdsk/c4t9d6 M(4) EMC SYMMETRIX 5771 0400725000 35692800
/dev/rdsk/c4t9d7 M(4) EMC SYMMETRIX 5771 0400729000 35692800
/dev/rdsk/c4t13d7 M(4) EMC SYMMETRIX 5771 040072D000 35692800
/dev/rdsk/c7t0d0 GK EMC SYMMETRIX 5771 0400040000 2880
/dev/rdsk/c7t9d4 M(4) EMC SYMMETRIX 5771 040071D000 35692800
/dev/rdsk/c7t9d5 M(4) EMC SYMMETRIX 5771 0400721000 35692800
/dev/rdsk/c7t9d6 M(4) EMC SYMMETRIX 5771 0400725000 35692800
/dev/rdsk/c7t9d7 M(4) EMC SYMMETRIX 5771 0400729000 35692800
/dev/rdsk/c7t13d7 M(4) EMC SYMMETRIX 5771 040072D000 35692800

The same disks are also in use under VxVM:

/dev/vx/rdmp/c4t9d4 M(4) EMC SYMMETRIX 5771 040071D000 35692800
/dev/vx/rdmp/c4t9d5 M(4) EMC SYMMETRIX 5771 0400721000 35692800
/dev/vx/rdmp/c4t9d6 M(4) EMC SYMMETRIX 5771 0400725000 35692800
/dev/vx/rdmp/c4t9d7 M(4) EMC SYMMETRIX 5771 0400729000 35692800
/dev/vx/rdmp/c4t13d7 M(4) EMC SYMMETRIX 5771 040072D000 35692800

To find the device ID, look at the 6th column (Ser Num):

/dev/vx/rdmp/c4t9d4 M(4) EMC SYMMETRIX 5771 040071D000 35692800

040071D000 breaks down as:
04 is the last 2 digits of the serial number of the Symmetrix array/frame
0071D is the Symmetrix device ID number
000 is always 000 here and carries no meaning for identifying the device

Also, GK stands for Gatekeeper device. The M(4) type is usually read as a metadevice with 4 members rather than a 4-way mirror, which fits the device IDs above stepping by 4 (0071D, 00721, 00725, 00729, 0072D).
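The breakdown of the Ser Num field can be sketched in a few lines of Python. This is only an illustration of the 2 + 5 + 3 split described in these notes; the function and key names are made up here, and the split is assumed to hold for 10-character values like the ones in this listing.

```python
def split_symm_serial(ser_num):
    """Split a 10-character syminq 'Ser Num' value into its three parts."""
    if len(ser_num) != 10:
        raise ValueError(f"expected 10 characters, got {len(ser_num)}: {ser_num!r}")
    return {
        "array_suffix": ser_num[:2],  # last 2 digits of the array serial number
        "device_id": ser_num[2:7],    # Symmetrix device ID (hex)
        "trailer": ser_num[7:],       # 000 on every row of the listing
    }

for serial in ("040071D000", "0400721000", "0400040000"):
    print(serial, split_symm_serial(serial))
```

Feeding it the sixth field of each EMC row gives the device ID to quote to the storage team.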

VxVM - Show total used or occupied disk space in a particular disk group

Since the basic unit of allocation is the subdisk, use vxprint and total up
the lengths of all the subdisks in a particular disk group, i.e.

vxprint -g diskgroup_name -s
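A small sketch of the totalling step, assuming the default vxprint record layout in which LENGTH is the fifth field of each sd line. The sample output below is made up for illustration and is not from a real system; the function name is hypothetical too.

```python
# Hypothetical `vxprint -g diskgroup_name -s` output, for illustration only.
SAMPLE = """\
TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
sd disk01-01    vol01-01     ENABLED  20965120 0        -        -       -
sd disk02-01    vol02-01     ENABLED  4194304  0        -        -       -
"""

def used_sectors(vxprint_output):
    """Add up the LENGTH field (assumed to be field 5) of every sd record."""
    total = 0
    for line in vxprint_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == "sd":
            total += int(fields[4])
    return total

sectors = used_sectors(SAMPLE)
# Lengths are in 512-byte sectors, so halve them to get kilobytes.
print(f"{sectors} sectors used = {sectors // 2} KB")
```

On a live host the same function could be given the captured output of the vxprint command instead of the sample string.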

vxdg free

The Veritas Volume Manager (VxVM) provides logical volume management capabilities across a variety of platforms. As you create new volumes, it is often helpful to know how much free space is available. You can find free space using two methods. The first method utilizes vxdg’s “free” option:

$ vxdg -g oradg free

GROUP DISK DEVICE TAG OFFSET LENGTH FLAGS
oradg c3t20d1 c3t20d1s2 c3t20d1 104848640 1536 -
oradg c3t20d3 c3t20d3s2 c3t20d3 104848640 1536 -
oradg c3t20d5 c3t20d5s2 c3t20d5 104848640 1536 -
oradg c3t20d7 c3t20d7s2 c3t20d7 104848640 1536 -
oradg c3t20d9 c3t20d9s2 c3t20d9 104848640 1536 -

The “LENGTH” column displays the number of 512-byte blocks free on each disk in the disk group “oradg”. To calculate the size of a volume, use vxprint and look for the volume's length, which is in sectors; convert to kilobytes by dividing by 2. The “OFFSET” column shows where each free region starts on the disk, so the free space itself is the LENGTH value, not the offset.
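As a worked example, the free space in the oradg listing can be totalled like this. It is a minimal sketch that assumes LENGTH is the sixth whitespace-separated field, as in the output shown; the function name is illustrative.

```python
VXDG_FREE = """\
GROUP DISK DEVICE TAG OFFSET LENGTH FLAGS
oradg c3t20d1 c3t20d1s2 c3t20d1 104848640 1536 -
oradg c3t20d3 c3t20d3s2 c3t20d3 104848640 1536 -
oradg c3t20d5 c3t20d5s2 c3t20d5 104848640 1536 -
oradg c3t20d7 c3t20d7s2 c3t20d7 104848640 1536 -
oradg c3t20d9 c3t20d9s2 c3t20d9 104848640 1536 -
"""

def free_sectors(vxdg_free_output):
    """Sum the LENGTH column (512-byte sectors) over all free regions."""
    total = 0
    for line in vxdg_free_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "GROUP":
            continue  # skip blank lines and the header
        total += int(fields[5])
    return total

sectors = free_sectors(VXDG_FREE)
print(f"free: {sectors} sectors = {sectors // 2} KB")
```

Five regions of 1536 sectors each come to 7680 sectors, or 3840 KB, so this disk group is effectively full.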

Friday 22 May 2009

sftp logging on linux

try
/var/log/secure (Red Hat family)
or
/var/log/auth.log (Debian family)

Linux printing

Setting up network printers in linux

Make sure the printer name resolves (use nslookup and also do a ping test) – if that is the case you can assume that the initial printer config has been set up and the printer is on the network (ie the initial jetadmin setup)

Confirm that there is /etc/printcap present

Su to root

Run the following;

redhat-config-printer

or

system-config-printer

This opens the printer configuration GUI.

You can then send a test print and run

lpq

to confirm that the queue is active.

Also check that the CUPS daemon is running with ps -ef | grep cupsd

Wednesday 20 May 2009

If you get a VCS service group offline and faulted

Got an issue where the hastatus summary showed the service group as OFFLINE|FAULTED:

-- SYSTEM STATE
-- System State Frozen

A eupr0001 RUNNING 0
A eupr0002 RUNNING 0

-- GROUP STATE
-- Group System Probed AutoDisabled State

B commonsg eupr0001 Y N ONLINE
B commonsg eupr0002 Y N ONLINE
B nsr01_sg eupr0001 Y N OFFLINE|FAULTED
B nsr01_sg eupr0002 Y N ONLINE

-- RESOURCES FAILED
-- Group Type Resource System

C nsr01_sg Application NetWorker eup

So I tried the following:

hares -display

For each resource that is faulted, run:

hares -clear resource-name -sys faulted-system

so in this case

hares -clear NetWorker -sys eupr0001

If all of these clear, run hastatus -summary and make sure the faults are gone. You should then see FAULTED removed, leaving just the plain OFFLINE or ONLINE status:

-------------------------------------------------------------------------
nsr01_proxy eupr0001 ONLINE
nsr01_proxy eupr0002 ONLINE
^Ceupr0001> hastatus -sum

-- SYSTEM STATE
-- System State Frozen

A eupr0001 RUNNING 0
A eupr0002 RUNNING 0

-- GROUP STATE
-- Group System Probed AutoDisabled State

B commonsg eupr0001 Y N ONLINE
B commonsg eupr0002 Y N ONLINE
B nsr01_sg eupr0001 Y N OFFLINE
B nsr01_sg eupr0002 Y N ONLINE
eupr0001>
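When several resources are faulted, the clear commands can be generated from the summary rather than typed by hand. The sketch below assumes the "C" rows of the RESOURCES FAILED section have the five fields shown in the output above (marker, group, type, resource, system); the sample text is trimmed, and the full system name eupr0001 is taken from the hares -clear example rather than from the truncated row.

```python
HASTATUS = """\
-- GROUP STATE
-- Group System Probed AutoDisabled State
B commonsg eupr0001 Y N ONLINE
B nsr01_sg eupr0001 Y N OFFLINE|FAULTED
B nsr01_sg eupr0002 Y N ONLINE
-- RESOURCES FAILED
-- Group Type Resource System
C nsr01_sg Application NetWorker eupr0001
"""

def clear_commands(summary):
    """Build one `hares -clear` command per failed-resource ("C") row."""
    commands = []
    for line in summary.splitlines():
        fields = line.split()
        # Assumed layout: C <group> <type> <resource> <system>
        if len(fields) == 5 and fields[0] == "C":
            resource, system = fields[3], fields[4]
            commands.append(f"hares -clear {resource} -sys {system}")
    return commands

for command in clear_commands(HASTATUS):
    print(command)
```

The script only prints the commands; reviewing them before running anything against the cluster is the safer habit.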

If some don't clear you MAY be able to clear them at the group level. Only do this as a last resort:

hagrp -disableresources groupname
hagrp -flush group -sys sysname
hagrp -enableresources groupname

To get a group to go online:

hagrp -online group -sys desired-system

However, if this was not the issue (because despite the flushes, clears and onlines, VCS was reporting no problem), it could be something else, such as licenses. I ran nsrwatch on the server and could see an issue with the NetWorker licenses:

Server: ukprbklas001-mn.emea.abnamro-net.com Wed May 20 19:34:30 2009

Up since: Mon Jul 7 18:43:09 2008 Version: NetWorker nw_7_3_2_jumbo.Build.394 Base enabler disabled
Saves: 0 session(s) Recovers: 0 session(s)
Device type volume
/tmp/fd1 adv_file D.001
1/_AF_readonly adv_file D.001.RO read only
/tmp/fd2 adv_file W.001
2/_AF_readonly adv_file W.001.RO read only
/tmp/fd3 adv_file ukprbklas001.001
3/_AF_readonly adv_file ukprbklas001.001.RO read only

Sessions:

Messages:
Sat 00:13:00 registration info event: Server is disabled: Install base enabler
Sun 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Sun 00:13:00 registration info event: Server is disabled: Install base enabler
Mon 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Mon 00:13:00 registration info event: Server is disabled: Install base enabler
Tue 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Tue 00:13:00 registration info event: Server is disabled: Install base enabler
Wed 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Wed 00:13:00 registration info event: Server is disabled: Install base enabler
Thu 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Thu 00:13:00 registration info event: Server is disabled: Install base enabler
Fri 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Fri 00:13:00 registration info event: Server is disabled: Install base enabler
Sat 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Sat 00:13:00 registration info event: Server is disabled: Install base enabler
Sun 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Sun 00:13:00 registration info event: Server is disabled: Install base enabler
Mon 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Mon 00:13:00 registration info event: Server is disabled: Install base enabler
Tue 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Tue 00:13:00 registration info event: Server is disabled: Install base enabler
Wed 00:13:00 registration info: License enabler #none (NetWorker/10 Eval) has expired.
Wed 00:13:00 registration info event: Server is disabled: Install base enabler

Pending:
Mon 18:43:08 registration info: Server is disabled: Install base enabler

Tuesday 19 May 2009

VxVm various

Veritas Volume Manager

Tidbits

by sebastin on Apr.27, 2009, under Veritas Volume Manager

rootdg is a mandatory requirement in older versions of VxVM. It is optional in newer versions, since 4.0.

sliced type is used for the rootdg. Slices 3 and 4 are created to hold separate private and public region partitions, with all other slices (apart from slice 2) zeroed out.
simple uses slice 3 to hold both the public and private regions, and all remaining slices (apart from slice 2) are zeroed out.
cdsdisk (Cross-platform Data Sharing format) is for cross-platform data migration. Slice 7 holds the private and public regions and all other slices are zeroed out. This format type is not suitable for a root disk.
none type is unformatted. This cannot be set as a valid format.

When you move data from an older version to a newer one, and you use ivc-snap or metromirror technology to replicate data on a regular basis, upgrading Veritas from 3.5 to 5.0 may cause problems when you try to keep disk layouts compatible.

This can possibly be fixed by passing the -T version option to the vxdg init command.

If you want to force -T 90 on VxVM 5.0MP1, one of the following disk initialisations might be required first, to force the simple format:

vxdisksetup -i disk_name format=sliced|simple|cdsdisk
vxdisk -f init disk_name type=auto format=sliced|simple|cdsdisk

vxdg -T 90 init new_dg_name disk_name

The private region length is 32 MB by default. The maximum possible size is 524288 blocks (256 MB at 512 bytes per block) in version 5.0.
The public region length is the size of the disk minus the private area of the disk.


VxVM 5.0+ - How to import a disk group of cloned disks on the same host as the originals

by sebastin on Feb.24, 2009, under Veritas Volume Manager

Description

Cloned disks are created typically by hardware specific copying, such as
the Hitachi ‘Shadow Image’ feature, or EMC’s TimeFinder, or any of the
so-called “in-the-box data replication” features of some SUN arrays.

Prior to the 5.0 version of Veritas Volume Manager (VxVM), those
replicated (cloned) disks were mapped to a secondary host and the
diskgroup using those disks had to be imported there. This restriction
was because the disk’s VxVM private region contained identical information
as the original disk, causing VxVM confusion as to which was the original
and which was the replicated disk.

However, beginning with VxVM 5.0, additional features have been added
to allow VxVM to easily identify which disk(s) are originals and which
are clones, as well as commands to easily import a diskgroup made up of
cloned disks, therefore allowing these operations to take place even on
the same host as the original diskgroup.

This tutorial shows a simplified example of importing a diskgroup made up
of cloned disks on the same host as the original diskgroup.

Steps to Follow

We start with a simple 2-disk diskgroup called ‘ckdg’.

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 auto:sliced rootdisk rootdg online
c1t41d2s2 auto:sliced ckdg01 ckdg online
c1t41d3s2 auto:sliced ckdg02 ckdg online

We use some sort of cloning operation to create two new copies of these
two disks after deporting the diskgroup (to ensure consistency), and then
make those disks available to this host. Once those two new disks are
seen by Solaris,

# vxdctl enable

will bring them into the vxvm configuration. We can now see them using:

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 auto:sliced rootdisk rootdg online
c1t41d2s2 auto:sliced - - online
c1t41d3s2 auto:sliced - - online
c1t41d4s2 auto:sliced - - online udid_mismatch
c1t41d5s2 auto:sliced - - online udid_mismatch

The ‘udid_mismatch’ indicates that vxvm knows this disk is a copy of some
other disk. The ‘udid_mismatch’ flag is further documented in the
‘vxdisk(1M)’ manpage.

So here, you have a choice as to how to proceed. You must remember that
VxVM will never allow you to have two diskgroups with the same name
imported at the same time. So, if you want both of these diskgroups (the
original and the cloned) to be imported simultaneously, you will have to
change the name of one of them.

For this example, we will leave the original diskgroup named ‘ckdg’,
and change the cloned diskgroup to ‘newckdg’.

First, import the original diskgroup:

# vxdg import ckdg

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 auto:sliced rootdisk rootdg online
c1t41d2s2 auto:sliced ckdg01 ckdg online
c1t41d3s2 auto:sliced ckdg02 ckdg online
c1t41d4s2 auto:sliced - - online udid_mismatch
c1t41d5s2 auto:sliced - - online udid_mismatch

The command to import the diskgroup using the cloned disks while renaming the
diskgroup is:

# vxdg -n newckdg -o useclonedev=on -o updateid import ckdg

The ‘useclonedev’ flag instructs vxvm to use ONLY cloned disks, not the
originals. You must use the ‘updateid’ flag as well, because we need to
validate these disks, and make sure that they are no longer mismatched
with regards to their UDID. This will update the UDID stored in the
disk’s vxvm private region to the actual disk’s UDID. Finally, the ‘-n’
flag specifies the new diskgroup name.

What we’re left with now is BOTH diskgroups imported simultaneously:

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 auto:sliced rootdisk rootdg online
c1t41d2s2 auto:sliced ckdg01 ckdg online
c1t41d3s2 auto:sliced ckdg02 ckdg online
c1t41d4s2 auto:sliced ckdg01 newckdg online clone_disk
c1t41d5s2 auto:sliced ckdg02 newckdg online clone_disk

Whether or not you choose to leave that ‘clone_disk’ flag turned on is up
to you. Since at this point, the ‘newckdg’ diskgroup is a full-fledged
diskgroup, there’s really no need to leave those flags on, so the commands

# vxdisk set c1t41d4s2 clone=off
# vxdisk set c1t41d5s2 clone=off

will turn off that flag, leaving us with:

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 auto:sliced rootdisk rootdg online
c1t41d2s2 auto:sliced ckdg01 ckdg online
c1t41d3s2 auto:sliced ckdg02 ckdg online
c1t41d4s2 auto:sliced ckdg01 newckdg online
c1t41d5s2 auto:sliced ckdg02 newckdg online
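With more than a couple of cloned disks, the flag-clearing commands can be generated from the listing. This is a sketch only: it assumes the five-column ‘vxdisk list’ layout shown above, with any flags following the word in the STATUS column, and the function name is made up for the example.

```python
VXDISK_LIST = """\
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 auto:sliced rootdisk rootdg online
c1t41d2s2 auto:sliced ckdg01 ckdg online
c1t41d3s2 auto:sliced ckdg02 ckdg online
c1t41d4s2 auto:sliced ckdg01 newckdg online clone_disk
c1t41d5s2 auto:sliced ckdg02 newckdg online clone_disk
"""

def devices_with_flag(listing, flag):
    """Return the DEVICE names whose STATUS fields include the given flag."""
    found = []
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # STATUS is everything from the fifth field on, e.g. "online clone_disk"
        if len(fields) >= 5 and flag in fields[4:]:
            found.append(fields[0])
    return found

for device in devices_with_flag(VXDISK_LIST, "clone_disk"):
    print(f"vxdisk set {device} clone=off")
```

The same helper can look for udid_mismatch before the import, to confirm which devices VxVM regards as copies.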


How to get VxVM to recognize that a hardware RAID LUN has been grown

by sebastin on Feb.24, 2009, under Veritas Volume Manager

For VxVM 4.0 and above

Beginning with VxVM 4.0, the vxdisk(1M) command has a new option
( resize ) that is provided to support dynamic LUN expansion
(DLE). This command option is available only if a Storage Foundation
license has been installed; the normal VxVM license key is not enough to
unlock this new functionality.

DLE should only be performed on arrays that preserve data. VxVM makes no
attempt to verify the validity of pre-existing data on the LUN, so the
user must validate with the array’s vendor whether or not the array
preserves the data already on the LUN when the LUN is grown.

This ‘vxdisk resize’ command updates the VTOC of the disk automatically.
The user does NOT need to run the ‘format’ utility to change the length of
partition 2 of the disk. In fact, the user doesn’t have to run ‘format’
at all!

Also, DLE can be done on-the-fly, even when there are VxVM volumes on
that disk/LUN, and while these volumes are mounted. There is a
requirement that there be at least one other disk in the same diskgroup,
because during the resize operation, the disk is temporarily/quietly
removed from the disk group, and it is not possible to remove the last
disk from a disk group.

Here is an example procedure illustrating the usage and results of this
command:

We will start with disk c1t1d0s2, which is currently 20965120 sectors
(approx 10GB) in size, and the volume (vol01) on that disk is the entire
size of the disk:

# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 auto:none - - online invalid
c0t1d0s2 auto:sliced - - online
c1t0d0s2 auto:sliced - - online
c1t1d0s2 auto:cdsdisk disk01 newdg online
c1t2d0s2 auto:cdsdisk disk02 newdg online
c1t3d0s2 auto:cdsdisk disk03 newdg online

# vxprint -ht
dm disk01 c1t1d0s2 auto 2048 20965120 -
dm disk02 c1t2d0s2 auto 2048 428530240 -
dm disk03 c1t3d0s2 auto 2048 428530240 -

v vol01 - ENABLED ACTIVE 20965120 SELECT - fsgen
pl vol01-01 vol01 ENABLED ACTIVE 20965120 CONCAT - RW
sd disk01-01 vol01-01 disk01 0 20965120 0 c1t1d0 ENA

# prtvtoc /dev/rdsk/c1t1d0s2
* Partition Tag Flags Sector Count Sector Mount Directory
2 5 01 0 20967424 20967423
7 15 01 0 20967424 20967423

The first step is to actually grow the LUN on the array according to the
documentation for your array. For this example, let’s assume the LUN
was grown to approximately 20GB (i.e., doubled in size).
Nothing needs to be done on the host before performing this step.
The disk can remain in the diskgroup, and the volumes can remain mounted.

After the LUN has been grown on the array, nothing on the host will appear
different; the ‘format’ command and the ‘prtvtoc’ command will both show
the old (10GB) size, as will the ‘vxprint -ht’ command.

To get Solaris and VxVM to recognize the new disk size, we simply have
to run the command

# vxdisk resize c1t1d0s2

This command queries the LUN to determine the new size, and then updates
the disk’s VTOC as well as the data structures in the VxVM private region
on the disk to reflect the new size. There is no need to run any other
command. This command typically takes less than a few seconds to complete,
since there is no data to move.

At this point, we can rerun the commands we ran before and see the
differences (the ‘vxdisk list’ output will remain the same):

# vxprint -ht
dm disk01 c1t1d0s2 auto 2048 41936640 -
dm disk02 c1t2d0s2 auto 2048 428530240 -
dm disk03 c1t3d0s2 auto 2048 428530240 -

v vol01 - ENABLED ACTIVE 20965120 SELECT - fsgen
pl vol01-01 vol01 ENABLED ACTIVE 20965120 CONCAT - RW
sd disk01-01 vol01-01 disk01 0 20965120 0 c1t1d0 ENA

# prtvtoc /dev/rdsk/c1t1d0s2
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
2 5 01 0 41938944 41938943
7 15 01 0 41938944 41938943

We can see in these outputs that the VTOC of the disk is now showing its
new (20GB) size (41938944 sectors) and the ‘vxprint -ht’ is now showing
the disk ‘disk01’ with a larger public region (41936640 sectors). Of
course, the volume ‘vol01’ has NOT been grown - that part is left to the
administrator to use the ‘vxresize’ command, if that is desired.
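The sector counts are easier to sanity-check once converted. The short sketch below assumes 512-byte sectors and uses the public region lengths from the two ‘vxprint -ht’ outputs; the helper name is illustrative.

```python
SECTOR_BYTES = 512  # assumed sector size

def sectors_to_gib(sectors):
    """Convert a count of 512-byte sectors to GiB."""
    return sectors * SECTOR_BYTES / 2**30

old_public = 20965120  # disk01 public region before the resize
new_public = 41936640  # disk01 public region after the resize
gained = new_public - old_public

print(f"before: {sectors_to_gib(old_public):.2f} GiB")
print(f"after:  {sectors_to_gib(new_public):.2f} GiB")
print(f"gained: {gained} sectors = {sectors_to_gib(gained):.2f} GiB")
```

The gained figure is the space that a later ‘vxresize’ of vol01 could draw on, since the volume itself is still 20965120 sectors long.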


Move a single volume to another diskgroup

by sebastin on Jan.27, 2009, under Veritas Volume Manager

1) Add a new mirror (a third plex) to the volume on a new free disk

# vxprint -ht app_vol
Disk group: rootdg

v app_vol - ENABLED ACTIVE 8388608 SELECT - fsgen
pl app_vol-01 app_vol ENABLED ACTIVE 8395200 CONCAT - RW
sd rootdisk-04 app_vol-01 rootdisk 41955647 8395200 0 c0t0d0 ENA
pl app_vol-02 app_vol ENABLED ACTIVE 8395200 CONCAT - RW
sd mirrdisk-04 app_vol-02 mirrdisk 41955648 8395200 0 c2t0d0 ENA

# vxdisksetup -if c3t1d0
# vxdg -g rootdg adddisk c3t1d0
# vxassist mirror app_vol alloc=c3t1d0

2) When the mirror has finished syncing, save the volume's configuration records and edit them:
# vxprint -hmQqrL -g rootdg app_vol > /tmp/kasper
# vi /tmp/kasper
In the file, remove all references to plexes app_vol-01 and app_vol-02, keeping only the records for volume app_vol and plex app_vol-03. Then rename app_vol to app_voltransfer and app_vol-03 to app_voltransfer-03.

3) Dissociate and remove the new plex app_vol-03:
# vxplex dis app_vol-03
# vxedit -rf rm app_vol-03

4) Create new framework for the volume (dg, disk …)
# vxdg -g rootdg rmdisk c3t1d0
# vxdg init appvoldg c3t1d0

5) create new volume on new dg
# vxmake -g appvoldg -d /tmp/kasper
# vxvol start app_voltransfer

# vxprint -g appvoldg -ht

dg appvoldg default default 107000 1170153481.1251.omis379

dm c3t1d0 c3t1d0s2 sliced 9919 143328960 -

v app_voltransfer - ENABLED ACTIVE 8388608 SELECT - fsgen
pl app_voltransfer-03 app_voltransfer ENABLED ACTIVE 8395200 CONCAT - RW
sd c3t1d0-01 app_voltransfer-03 c3t1d0 0 8395200 0 c3t1d0 ENA

6) mount new fs:
mount -F vxfs /dev/vx/dsk/appvoldg/app_voltransfer /apptransfer

7) check old and new fs:
# df -k |grep app
/dev/vx/dsk/rootdg/app_vol 4194304 2806182 1301407 69% /app
/dev/vx/dsk/appvoldg/app_voltransfer 4194304 2806182 1301407 69% /apptransfer


Running ‘vxdisk -g updateudid’ on an imported disk

by sebastin on Jan.27, 2009, under Solaris, Veritas Volume Manager

Document ID: 293237

Running ‘vxdisk -g updateudid’ on an imported disk group disk rendered the disk group unimportable.
Details:
The workaround is not to run ‘vxdisk -g updateudid’ on a disk that is part of an imported disk group. Deport the associated disk group first (if it is imported), and then clear the udid_mismatch flag. Please note that the “udid_mismatch” notation is merely a status flag and not an indication of a problem condition.

Do not run the vxdisk command with the usage:

“vxdisk -g diskgroup updateudid dm_name”
The “dm_name” is the name of the disk as it was named within the disk group and listed by the “vxdisk list” command under the DISK column.

Instead, run the following command after deporting the disk group to clear the udid_mismatch flag on a disk:

“vxdisk updateudid da_name”
The “da_name” is the name of the disk as listed by the “vxdisk list” command under the DEVICE column.

Importing EMC BCV devices:

The following procedure can be used to import a cloned disk (BCV device) from an EMC Symmetrix array.

To import an EMC BCV device

1. Verify that the cloned disk, EMC0_27, is in the “error udid_mismatch” state:

# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
EMC0_1 auto:cdsdisk EMC0_1 mydg online
EMC0_27 auto - - error udid_mismatch

In this example, the device EMC0_27 is a clone of EMC0_1.

2. Split the BCV device that corresponds to EMC0_27 from the disk group mydg:

# /usr/symcli/bin/symmir -g mydg split DEV001

3. Update the information that VxVM holds about the device:

# vxdisk scandisks

4. Check that the cloned disk is now in the “online udid_mismatch” state:

# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
EMC0_1 auto:cdsdisk EMC0_1 mydg online
EMC0_27 auto:cdsdisk - - online udid_mismatch

5. Import the cloned disk into the new disk group newdg, and update the disk’s UDID:

# vxdg -n newdg -o useclonedev=on -o updateid import mydg

6. Check that the state of the cloned disk is now shown as “online clone_disk”:

# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
EMC0_1 auto:cdsdisk EMC0_1 mydg online
EMC0_27 auto:cdsdisk EMC0_1 newdg online clone_disk


How to recover from a serial split brain

by sebastin on Jan.27, 2009, under Solaris, Veritas Volume Manager

Document ID: 269233

How to recover from a serial split brain
Exact Error Message
VxVM vxdg ERROR V-5-1-10127 associating disk-media with :
Serial Split Brain detected. Run vxsplitlines

Details:
Background:
The Serial Split Brain condition arises because VERITAS Volume Manager ™ increments the serial ID in the disk media record of each imported disk in all the disk group configurations on those disks. A new serial (SSB) ID has been included as part of the new disk group version=110 in Volume Manager 4 to assist with recovery of the disk group from this condition. The value that is stored in the configuration database represents the serial ID that the disk group expects a disk to have. The serial ID that is stored in a disk’s private region is considered to be its actual value.
If some disks went missing from the disk group (due to physical disconnection or power failure) and those disks were imported by another host, the serial IDs for the disks in their copies of the configuration database, and also in each disk’s private region, are updated separately on that host. When the disks are subsequently reimported into the original shared disk group, the actual serial IDs on the disks do not agree with the expected values from the configuration copies on other disks in the disk group.
The disk group cannot be reimported because the databases do not agree on the actual and expected serial IDs. You must choose which configuration database to use. This is a true serial split brain condition, which Volume Manager cannot correct automatically. In this case, the disk group import fails, and the vxdg utility outputs error messages similar to the following before exiting:
VxVM vxconfigd NOTICE V-5-0-33 Split Brain. da id is 0.1, while dm id is 0.0 for DM
VxVM vxdg ERROR V-5-1-587 Disk group : import failed: Serial Split Brain detected. Run vxsplitlines
The import does not succeed even if you specify the -f flag to vxdg.
Although it is usually possible to resolve this conflict by choosing the version of the configuration database with the highest valued configuration ID (shown as config_tid in the output from the vxprivutil dumpconfig ), this may not be the correct thing to do in all circumstances.
To resolve conflicting configuration information, you must decide which disk contains the correct version of the disk group configuration database. To assist you in doing this, you can run the vxsplitlines command to show the actual serial ID on each disk in the disk group and the serial ID that was expected from the configuration database. For each disk, the command also shows the vxdg command that you must run to select the configuration database copy on that disk as being the definitive copy to use for importing the disk group.
The following example shows the result of JBOD losing access to one of the four disks in the disk group:
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
- - d1 dgD280silo1 failed was:c2t1d0s2

# vxreattach -c c2t1d0s2
dgD280silo1 d1

# vxreattach -br c2t1d0s2
VxVM vxdg ERROR V-5-1-10127 associating disk-media d1 with c2t1d0s2:
Serial Split Brain detected. Run vxsplitlines

# vxsplitlines -g dgD280silo1

VxVM vxsplitlines NOTICE V-5-2-2708 There are 1 pools.
The Following are the disks in each pool. Each disk in the same pool
has config copies that are similar.
VxVM vxsplitlines INFO V-5-2-2707 Pool 0.
c2t1d0s2 d1

To see the configuration copy from this disk issue /etc/vx/diag.d/vxprivutil dumpconfig /dev/vx/dmp/c2t1d0s2
To import the diskgroup with config copy from this disk use the following command;

/usr/sbin/vxdg -o selectcp=1092974296.21.gopal import dgD280silo1

The following are the disks whose serial split brain (SSB) IDs don’t match in this configuration copy:
d2

At this stage, you need to gain confidence before running the recommended command, by generating the following outputs:
In this example, the disk group split so that one disk (d1) appears to be on one side of the split. You can specify the -c option to vxsplitlines to print detailed information about each of the disk IDs from the configuration copy on a disk specified by its disk access name:

# vxsplitlines -g dgD280silo1 -c c2t3d0s2

VxVM vxsplitlines INFO V-5-2-2701 DANAME(DMNAME) || Actual SSB || Expected SSB
VxVM vxsplitlines INFO V-5-2-2700 c2t1d0s2( d1 ) || 0.0 || 0.0 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c2t2d0s2( d2 ) || 0.1 || 0.0 ssb ids don’t match
VxVM vxsplitlines INFO V-5-2-2700 c2t3d0s2( d3 ) || 0.1 || 0.0 ssb ids don’t match
VxVM vxsplitlines INFO V-5-2-2700 c2t9d0s2( d4 ) || 0.1 || 0.0 ssb ids don’t match
VxVM vxsplitlines INFO V-5-2-2706
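The comparison that ‘vxsplitlines -c’ prints can be pulled into a small table for a quick check of which disks disagree. This is a sketch under the assumption that each data row carries message ID V-5-2-2700 and the “DANAME( DMNAME ) || actual || expected” layout shown above; the regular expression and function name are illustrative.

```python
import re

SPLITLINES = """\
VxVM vxsplitlines INFO V-5-2-2701 DANAME(DMNAME) || Actual SSB || Expected SSB
VxVM vxsplitlines INFO V-5-2-2700 c2t1d0s2( d1 ) || 0.0 || 0.0 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c2t2d0s2( d2 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c2t3d0s2( d3 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c2t9d0s2( d4 ) || 0.1 || 0.0 ssb ids don't match
"""

# daname( dmname ) || actual || expected
ROW = re.compile(r"V-5-2-2700\s+(\S+?)\(\s*(\S+)\s*\)\s*\|\|\s*(\S+)\s*\|\|\s*(\S+)")

def parse_ssb(text):
    """Return (daname, dmname, actual, expected, ids_match) for each disk row."""
    rows = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            da, dm, actual, expected = match.groups()
            rows.append((da, dm, actual, expected, actual == expected))
    return rows

for da, dm, actual, expected, ok in parse_ssb(SPLITLINES):
    print(f"{da} ({dm}): actual={actual} expected={expected} {'OK' if ok else 'MISMATCH'}")
```

The match flag is recomputed from the two IDs rather than read from the trailing text, so the wording of the message does not matter.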

This output can be verified by using vxdisk list on each disk. A summary is shown below:

# vxdisk list c2t1d0s2
Device: c2t1d0s2
disk: name= id=1092974296.21.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.0

# vxdisk list c2t2d0s2
Device: c2t2d0s2
disk: name=d2 id=1092974302.22.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.1

# vxdisk list c2t3d0s2
Device: c2t3d0s2
disk: name=d3 id=1092974311.23.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.1

# vxdisk list c2t9d0s2
Device: c2t9d0s2
disk: name=d4 id=1092974318.24.gopal
group: name=dgD280silo1 id=1095738111.20.gopal
ssb: actual_seqno=0.1

Note that though some disks SSB IDs might match that does not necessarily mean that those disks’ config copies have all the changes. From some other configuration copies, those disks’ SSB IDs might not match. To see the configuration from this disk, run
/etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t3d0s2 > dumpconfig_c2t3d0s2

If the other disks in the disk group were not imported on another host, Volume Manager resolves the conflicting values of the serial IDs by using the version of the configuration database from the disk with the greatest value for the updated ID (shown as update_tid in the output from /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/).

In this example, looking through the dumpconfig output, there are the following update_tid and ssbid values:

                 dumpconfig c2t3d0s2   dumpconfig c2t9d0s2
config:tid       0.1058                0.1059
dm d1
  update_tid     0.1038                0.1059
  ssbid          0.0                   0.0
dm d2
  update_tid     0.1038                0.1038
  ssbid          0.0                   0.0
dm d3
  update_tid     0.1053                0.1053
  ssbid          0.0                   0.0
dm d4
  update_tid     0.1053                0.1059
  ssbid          0.0                   0.1
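To see at a glance where the two configuration copies disagree, the update_tid values can be compared record by record. The sketch below hard-codes the values from the two dumpconfig outputs; treating an ID such as 0.1059 as a pair of integers is an assumption made for the comparison, and the helper names are hypothetical.

```python
# update_tid per disk-media record, copied from the two dumpconfig outputs.
COPY_C2T3D0S2 = {"d1": "0.1038", "d2": "0.1038", "d3": "0.1053", "d4": "0.1053"}
COPY_C2T9D0S2 = {"d1": "0.1059", "d2": "0.1038", "d3": "0.1053", "d4": "0.1059"}

def tid_key(tid):
    """Turn '0.1059' into (0, 1059) so IDs compare numerically (an assumption)."""
    high, low = tid.split(".")
    return (int(high), int(low))

def differing_records(copy_a, copy_b):
    """Names of dm records whose update_tid differs between two config copies."""
    return sorted(dm for dm in copy_a if tid_key(copy_a[dm]) != tid_key(copy_b[dm]))

for dm in differing_records(COPY_C2T3D0S2, COPY_C2T9D0S2):
    a, b = COPY_C2T3D0S2[dm], COPY_C2T9D0S2[dm]
    newer = "c2t9d0s2" if tid_key(b) > tid_key(a) else "c2t3d0s2"
    print(f"dm {dm}: {a} vs {b} (higher on {newer})")
```

This only highlights the differences. As the note says earlier, the copy with the highest ID is not automatically the right one to import from.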

Use the dumpconfig output from each disk to decide which configuration copy to use. A saved copy can be viewed by running the command:

# cat dumpconfig_c2t3d0s2 | vxprint -D - -ht

Before deciding on which option to use for import, ensure the disk group is currently in a valid deport state:

# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk - (dgD280silo1) online
c2t3d0s2 auto:cdsdisk - (dgD280silo1) online
c2t9d0s2 auto:cdsdisk - (dgD280silo1) online

At this stage, your knowledge of how the serial split brain condition came about may be a little clearer and you should have chosen a configuration from one disk to be used to import the disk group. In this example, the following command imports the disk group using the configuration copy from d2:
# /usr/sbin/vxdg -o selectcp=1092974302.22.gopal import dgD280silo1
Once the disk group has been imported, Volume Manager resets the serial IDs to 0 for the imported disks. The actual and expected serial IDs for any disks in the disk group that are not imported at this time remain unchanged.
# vxprint -htg dgD280silo1
dg dgD280silo1 default default 26000 1095738111.20.gopal
dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -

v SNAP-vol_db2silo1.1 - DISABLED ACTIVE 1024000 SELECT - fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 DISABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - DISABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl DISABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA

v orgvol - DISABLED ACTIVE 1024000 SELECT - fsgen
pl orgvol-01 orgvol DISABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA

# vxrecover -g dgD280silo1 -sb

# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol

UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/dgD280silo1/orgvol is corrupted. needs checking

# fsck -F vxfs /dev/vx/rdsk/dgD280silo1/orgvol
log replay in progress
replay complete - marking super-block as CLEAN

# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol

# df /orgvol

/orgvol (/dev/vx/dsk/dgD280silo1/orgvol): 1019102 blocks 127386 files

# vxdisk -o alldgs list

DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk d1 dgD280silo1 online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online

# vxprint -htg dgD280silo1

dg dgD280silo1 default default 26000 1095738111.20.gopal

dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -

v SNAP-vol_db2silo1.1 - ENABLED ACTIVE 1024000 SELECT SNAP-vol_db2silo1.1-01 fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 ENABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - ENABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl ENABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA

v orgvol - ENABLED ACTIVE 1024000 SELECT orgvol-01 fsgen
pl orgvol-01 orgvol ENABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA


VxVM References

by sebastin on Jan.19, 2009, under Solaris, Veritas Volume Manager


adding luns to JNIC 5.3.x non-gui method

by sebastin on Sep.28, 2008, under Solaris, Veritas Volume Manager

In a cluster configuration, find the master node using
#vxdctl -c mode
Run the following commands on the master node.
#vxdmpadm listctlr all ## list the controllers and their status; cross-check against the /kernel/drv/jnic146x.conf entries.
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
====================================================
c1 Disk ENABLED Disk
c2 SAN_VC ENABLED SAN_VC0
c3 SAN_VC ENABLED SAN_VC0

Disable the first path, c2:
#vxdmpadm disable ctlr=c2
Confirm the controller status.

The jnic146x_update_drv command may be available only with the 5.3.x drivers, under /opt/JNIC146X/.
/opt/JNIC146X/jnic146x_busy shows the driver status.
/opt/JNIC146X/jnic146x_update_drv -u -a # -u updates the driver and performs LUN rediscovery; -a applies to all instances.
Check your messages file for updates, then enable the controller:
#vxdmpadm enable ctlr=c2 # confirm the controller status, then do the same for the other controllers
#devfsadm
#format ## to label all the new LUNs
#vxdctl enable
#vxdisk list ## shows the new LUNs with error status
Label the LUNs in Veritas:
#vxdisksetup -i cNtNdNs2 ## for all the new LUNs
Add the LUNs to the disk group:
#vxdg -g "disk_group" adddisk racdgsvc32=c2t1d32s2
Repeat for all LUNs.
Resize the requested volumes:
#vxresize -g "disk_group" vol_name +20g # add 20 GB to vol_name
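
When many LUNs are added at once, it can help to print the per-LUN commands first and review them before running anything. The sketch below is a dry run only: plan_luns is a hypothetical helper, the disk group name racdg is made up, and the naming rule (a prefix plus the LUN number taken from the device name) is an assumption modelled on the racdgsvc32=c2t1d32s2 example.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the per-LUN commands.
# Usage: plan_luns <disk_group> <name_prefix> <device>...
plan_luns() {
    dg=$1; prefix=$2; shift 2
    for dev in "$@"; do
        # LUN number = the digits after "d" in cNtNdN[sN] (assumed convention)
        lun=$(printf '%s\n' "$dev" | sed -e 's/^c[0-9]*t[0-9]*d\([0-9]*\).*$/\1/')
        echo "vxdisksetup -i $dev"
        echo "vxdg -g $dg adddisk $prefix$lun=$dev"
    done
}

plan_luns racdg racdgsvc c2t1d32s2 c2t1d33s2
```

Nothing is executed; the output is meant to be read and then pasted or piped to a shell once it looks right.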


Off-Host Backup Processing with Veritas FlashSnap

by sebastin on Apr.29, 2008, under Solaris, Veritas Volume Manager

Borislav Stoichkov

Backup times and the resources associated with them are becoming more and more important in the evolving model of 24/7 application development and content management. Developers all over the world collaborate on the same projects and access the same resources that must be 100% available during the business hours of their respective time zones. This gives systems administrators very little room to completely satisfy their customers — the developers.

Source code and content repositories contain hundreds of projects and millions of small files that require considerable amounts of time and system resources to back up. Also, data protection is a top priority that presents system and backup engineers with the question of how to effectively ensure data protection and availability in case of a disaster and, at the same time, minimize the duration and resource overhead of the process.

The Problem

My organization was faced with these very issues. One of our high-profile customers was using Interwoven’s flagship product TeamSite installed on a Sun Solaris 8 server. Interwoven TeamSite’s key features are content management and services, code and media versioning, collaboration and parallel development, branching of projects, transactional content deployment, etc. Developers all over the world were using the system’s resources throughout the day for mission-critical tasks. As the number of projects and branches increased, so did the number of files and the amount of data that needed to be backed up and protected.

Suddenly, the application was managing millions of small files and hundreds of branches and projects. Backup times were reaching 7-8 hours, with the bottleneck caused by the sheer number of files and volume of data. Complaints were received that during the backup window the application as well as the system were becoming difficult to use and that system performance was becoming unacceptable. The customer requested a solution that would be as cheap as possible and would not require a change in their development and content management model.

From a storage perspective, the server had two internal mirrored drives for the operating system file systems under Veritas Volume Manager control. An external Fibre Channel array was attached to the machine presenting a single LUN on which the Interwoven TeamSite application was installed. The LUN had a 143-GB Veritas file system and was under Veritas Volume Manager control as well.

The idea for the solution was to take a snapshot of the application file system and use that snapshot for a backup to tape on another host. Thus, the backup window could be extended as much as needed without affecting the performance or usability of the server. File system snapshots, however, do not allow off-host processing. Given that Veritas Volume Manager was already installed and active on the machine, using its built-in volume snapshot features seemed natural. The only problems remaining were to break off the volume with the snapshot in a manner that was supported by the vendor and did not present any risks, and to minimize the time needed for the reverse operation — syncing the snapped off mirror without mirroring a 143-GB file system from scratch, which is a long and tedious process.

Implementing FlashSnap

The resolutions to both problems are found in the Veritas FlashSnap product. FlashSnap is a license key-enabled option of the Veritas Foundation Suite solutions. The license enables the use of the FastResync and Dynamic Split and Join features of Veritas Volume Manager. With FastResync enabled on a volume, Volume Manager uses a map to keep track of which blocks are updated in the volume and in the snapshot. In time the data on the original volume changes, and the data on the snapshot volume becomes outdated.

The presence of a FastResync map ensures that in an operation where the snapshot is resynchronized with the primary volume only the modified blocks (dirty blocks) are applied. Full mirror synchronization is no longer necessary. The map is persistent across reboots because it is stored in a data change object (DCO) log volume associated with the original volume. Dynamic Split and Join allow for the volume snapshots to be placed into a separate disk group, which can be deported and imported on another host for off-host processing. The only requirement is for the disks to be visible to the designated host. At a later stage, the disk group can be re-imported on the original host and joined with the original disk group or, if necessary, with a different one.

For the implementation, additional storage was required on the storage array equal to the original amount of 143 GB. The added storage was configured into a new LUN. A new low-end server running Sun Solaris 8 (host2) was attached to the array as well. The newly added LUN (LUN1) was presented to both hosts, while the original LUN (LUN0) was only made visible on the original host (host1).

DCO Logging

Persistent FastResync requires a DCO log to be associated with the original volume. That option has been available only since Veritas Volume Manager 3.2 and disk group version 90, so the volume management software was upgraded to the latest version. The existing disk group was upgraded to the latest version as well. The FlashSnap license obtained from Veritas was installed on both hosts. For verification that the newly added license is functional, the following command can be issued:

# vxdctl license
All features are available:
........
FastResync
DGSJ
A small problem arose from the fact that there was no room for a DCO log on LUN0 because all of its space was allocated for the application volume. Luckily, the file system on it was VxFS, and it was possible for the volume and the file system to be shrunk:

host1# vxresize -F vxfs -g DG1 Vol01 -20M
With that fixed, a DCO (data change object) log volume was associated with the original volume:

host1# vxprint -g DG1
.............
dm DG101 c4t9d0s2 - 286676992 - - - -

v Vol01 fsgen ENABLED 286636032 - ACTIVE - -
pl Vol01-01 Vol01 ENABLED 286656000 - ACTIVE - -
sd DG101-01 Vol01-01 ENABLED 286656000 0 - - -
host1# vxassist -g DG1 addlog Vol01 logtype=dco dcologlen=1056 ndcolog=1 DG101
host1# vxprint -g DG1
..............
dm DG101 c4t9d0s2 - 286676992 - - - -

v Vol01 fsgen ENABLED 286636032 - ACTIVE - -
pl Vol01-01 Vol01 ENABLED 286656000 - ACTIVE - -
sd DG101-01 Vol01-01 ENABLED 286656000 0 - - -
dc Vol01_dco Vol01 - - - - - -
v Vol01_dcl gen ENABLED 1056 - ACTIVE - -
pl Vol01_dcl-01 Vol01_dcl ENABLED 1056 - ACTIVE - -
sd DG101-02 Vol01_dcl-01 ENABLED 1056 0 - - -
The length of the DCO log determines the level at which changes are tracked. A longer DCO log will trigger more in-depth tracking and will require less time for the snapshot to resynchronize. Increasing the log too much may cause performance overhead on the system. The default number of plexes in the mirrored DCO log volume is 2. It is recommended that the number of DCO log plexes configured equal the number of data plexes in the volume — in our case, one. The default size for a DCO plex is 133 blocks. A different number can be specified, but it must be a number from 33 up to 2112 blocks in multiples of 33. If the snapshot volumes are to be moved to a different disk group, the administrator must ensure that the disks containing the DCO plexes can accompany them.
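
The size rule for a user-specified dcologlen is easy to check before calling vxassist. This is a small sketch of that rule only (a multiple of 33 blocks, from 33 up to 2112); valid_dcologlen is a hypothetical helper and does not query VxVM.

```shell
#!/bin/sh
# Hypothetical check for a user-specified dcologlen value:
# a whole number of blocks, a multiple of 33, between 33 and 2112.
valid_dcologlen() {
    case $1 in
        ''|0*|*[!0-9]*) return 1 ;;   # reject empty, zero-led or non-numeric input
    esac
    [ "$1" -ge 33 ] && [ "$1" -le 2112 ] && [ $(( $1 % 33 )) -eq 0 ]
}

for len in 1056 1000 2112 2145; do
    if valid_dcologlen "$len"; then
        echo "$len: ok"
    else
        echo "$len: not a multiple of 33 in the range 33..2112"
    fi
done
```

The value 1056 used earlier passes (33 x 32), as does the maximum 2112 (33 x 64); 1000 is not a multiple of 33, and 2145 is a multiple but exceeds the maximum.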

Establishing a Snapshot Mirror

The next step is to enable persistent FastResync on the volume, so that subsequent re-mirroring operations take considerably less time than the establishment of a full mirror and are applied from the DCO log:

host1# vxvol -g DG1 set fastresync=on Vol01
host1# vxprint -g DG1 -m Vol01 | grep fastresync
fastresync=on
The addition of LUN1 to the DG1 disk group as disk DG102 completes our preparation phase, so now we are ready to establish our snapshot:

host1# vxassist -g DG1 snapstart Vol01 alloc=DG102
This operation will establish a mirror of volume Vol01 and will add an additional DCO log object that will be in a DISABLED and DCOSNP state for use by the snapshot. The snapstart process takes a considerable amount of time, because it is a full mirror creation. The vxassist command will block until the snapshot mirror is complete. It can be placed in the background by using the -b argument to vxassist. During the snapstart phase, disk group DG1 will look like this:

v Vol01 fsgen ENABLED 286636032 - ACTIVE ATT1 -
pl Vol01-01 Vol01 ENABLED 286656000 - ACTIVE - -
sd DG101-01 Vol01-01 ENABLED 286656000 0 - - -
pl Vol01-02 Vol01 ENABLED 286656000 - SNAPATT ATT -
sd DG102-01 Vol01-02 ENABLED 286656000 0 - - -
dc Vol01_dco Vol01 - - - - - -
v Vol01_dcl gen ENABLED 1056 - ACTIVE - -
pl Vol01_dcl-01 Vol01_dcl ENABLED 1056 - ACTIVE - -
sd DG101-02 Vol01_dcl-01 ENABLED 1056 0 - - -
pl Vol01_dcl-02 Vol01_dcl DISABLED 1056 - DCOSNP - -
sd DG102-02 Vol01_dcl-02 ENABLED 1056 0 - - -
Once the mirror is established, the plex on disk DG102 will be in a SNAPDONE state ready to be separated from the original volume. If the snapshot is attempted before the snapshot plex is in a SNAPDONE state, the command will fail. If snapstart is placed in the background with the -b switch, the vxassist snapwait command will wait until the snapstart command is done and can be used in scripts to ensure that no other commands are issued before the completion of snapstart:

v Vol01 fsgen ENABLED 286636032 - ACTIVE - -
pl Vol01-01 Vol01 ENABLED 286656000 - ACTIVE - -
sd DG101-01 Vol01-01 ENABLED 286656000 0 - - -
pl Vol01-02 Vol01 ENABLED 286656000 - SNAPDONE - -
sd DG102-01 Vol01-02 ENABLED 286656000 0 - - -
dc Vol01_dco Vol01 - - - - - -
v Vol01_dcl gen ENABLED 1056 - ACTIVE - -
pl Vol01_dcl-01 Vol01_dcl ENABLED 1056 - ACTIVE - -
sd DG101-02 Vol01_dcl-01 ENABLED 1056 0 - - -
pl Vol01_dcl-02 Vol01_dcl DISABLED 1056 - DCOSNP - -
sd DG102-02 Vol01_dcl-02 ENABLED 1056 0 - - -
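
Where vxassist snapwait is not convenient (for example in a monitoring check), the plex state can be read from the listing itself. This sketch assumes the nine-column vxprint -g layout shown above, with the state in the seventh column; plex_state is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the STATE column of one plex.
# Reads "vxprint -g <dg>" style output on stdin; usage: plex_state <plex>
plex_state() {
    awk -v plex="$1" '$1 == "pl" && $2 == plex { print $7 }'
}

# Sample data copied from the listing above.
sample='v Vol01 fsgen ENABLED 286636032 - ACTIVE - -
pl Vol01-01 Vol01 ENABLED 286656000 - ACTIVE - -
pl Vol01-02 Vol01 ENABLED 286656000 - SNAPDONE - -'

printf '%s\n' "$sample" | plex_state Vol01-02
```

A script could compare the result with SNAPDONE before going on to the snapshot step.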
To execute the actual snapshot:

host1# vxassist -g DG1 snapshot Vol01 SNAP-Vol01
host1# vxprint -g DG1

v SNAP-Vol01 fsgen ENABLED 286636032 - ACTIVE - -
pl Vol01-02 SNAP-Vol01 ENABLED 286656000 - ACTIVE - -
sd DG102-01 Vol01-02 ENABLED 286656000 0 - - -
dc SNAP-Vol01_dco SNAP-Vol01 - - - - - -
v SNAP-Vol01_dcl gen ENABLED 1056 - ACTIVE - -
pl Vol01_dcl-02 SNAP-Vol01_dcl ENABLED 1056 - ACTIVE - -
sd DG102-02 Vol01_dcl-02 ENABLED 1056 0 - - -
sp Vol01_snp SNAP-Vol01 - - - - - -

v Vol01 fsgen ENABLED 286636032 - ACTIVE - -
pl Vol01-01 Vol01 ENABLED 286656000 - ACTIVE - -
sd DG101-01 Vol01-01 ENABLED 286656000 0 - - -
dc Vol01_dco Vol01 - - - - - -
v Vol01_dcl gen ENABLED 1056 - ACTIVE - -
pl Vol01_dcl-01 Vol01_dcl ENABLED 1056 - ACTIVE - -
sd DG101-02 Vol01_dcl-01 ENABLED 1056 0 - - -
sp SNAP-Vol01_snp Vol01 - - - - - -
Now the disk group can be split so that the disk containing the snapshot volume is placed in a different group:

host1# vxdg split DG1 SNAPDG1 SNAP-Vol01
The new disk group SNAPDG1 containing SNAP-Vol01 and its DCO log volume can be deported and imported on the alternate host:

host1# vxdg deport SNAPDG1
host2# vxdg import SNAPDG1
Following the split, the snapshot volume is disabled. The following commands can be used to recover and start the volume:

host2# vxrecover -g SNAPDG1 -m SNAP-Vol01
host2# vxvol -g SNAPDG1 start SNAP-Vol01
A consistency check can be performed on the volume’s file system, and it can be mounted for backup processing or any other type of data manipulation:

host2# fsck -F vxfs /dev/vx/rdsk/SNAPDG1/SNAP-Vol01
host2# mount -F vxfs /dev/vx/dsk/SNAPDG1/SNAP-Vol01 /data
Before the backup window kicks in, or in case the snapshot needs to be refreshed, the file system can be unmounted and the volume deported and imported again on the original host:

host2# umount /data
host2# vxvol -g SNAPDG1 stop SNAP-Vol01
host2# vxdg deport SNAPDG1
host1# vxdg import SNAPDG1
Now the disk(s) in disk group SNAPDG1 can be joined into disk group DG1:

host1# vxdg join SNAPDG1 DG1
host1# vxrecover -g DG1 -m SNAP-Vol01
Once the snapshot volume is back into its original disk group, we can perform the snapback operation:

host1# vxassist -g DG1 snapback SNAP-Vol01
In some cases when there is data corruption on the original volume, the data on the snapshot volume can be used for the synchronization. This is achieved by using the resyncfromreplica argument to the vxassist -o option with snapback. The operation will not take long to execute at all. If performed within hours of the first snapshot, the process may take less than a minute depending on the amount of file system changes. In our environment, a snapback process that is executed approximately 24 hours after the previous one takes no longer than 12 minutes. Effectively, we have decreased the time it takes to back up the application from hours to less than 15 minutes from the developers’ and system users’ point of view.

Automating the Process

The last challenge in this project was to automate the process so that it would occur transparently on a daily basis before the backup window with no manual intervention required, as well as ensure that anything that went wrong with the process would be caught in a timely manner and resolved before the actual backup to tape. Remote syslog logging and log file scraping had been implemented in the environment for a while, and this gave us the option to log all errors to a remote syslog server. The string used to log errors was submitted to the monitoring department and was added to the list of strings that triggered an alert with the control center. The alert automatically generated a trouble ticket and dispatched it to an administrator. The whole process needed to be synchronized on both servers.

After some debate, we chose a solution utilizing SSH with public key authentication. The password-less OpenSSH private and public keys were generated on host1, and the public key was imported into the authorized_keys file for root on host2. Root logins through SSH were allowed on host2, and logins via SSH using the public key generated on host1 were only allowed from that server. Another aspect to the solution was that the same shell script, vxfsnap, would be used on both sides with a switch instructing it to execute in local or remote mode.

The vxfsnap Script

The vxfsnap script (see Listing 1) accepts the following arguments: original disk group, original volume name, name for the snapshot volume, name for the snapshot disk group, hostname/IP of the host processing the snapshot, and mount point for the snapshot volume. It has four modes of operation:

deport — Unmounts the file system and deports the snapshot disk group.
join — Imports the snapshot disk group and joins it into the target disk group executing a snapback.
snap — Performs the snapshot and deports the disk group.
import — Imports the snapshot disk group and mounts the file system.
Another optional switch can be used to freeze the Interwoven TeamSite application for the duration of the snapback, improving the consistency of the data used for the backup. This shell script was designed with re-usability in mind so that it can be implemented with little or no effort in a similar solution. It can be executed on one of the hosts, and it can control the process from a central location.
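
To make the four modes concrete, here is a dry-run sketch that only prints the commands each mode would issue. It is not the author's vxfsnap script: the option letters follow the examples in this article, the commands are the ones shown in the earlier sections, and everything else (the function name vxfsnap_plan, the localhost default, ignoring the -r and -f switches) is an assumption.

```shell
#!/bin/sh
# Dry-run sketch only: prints the commands for each mode, runs nothing.
# Not the author's vxfsnap; vxfsnap_plan is a hypothetical name.
vxfsnap_plan() {
    dg= vol= sdg= svol= mnt= host=localhost mode=
    OPTIND=1
    while getopts 'g:v:G:V:m:h:e:rf' opt; do
        case $opt in
            g) dg=$OPTARG ;;      # original disk group
            v) vol=$OPTARG ;;     # original volume
            G) sdg=$OPTARG ;;     # snapshot disk group
            V) svol=$OPTARG ;;    # snapshot volume
            m) mnt=$OPTARG ;;     # mount point
            h) host=$OPTARG ;;    # host processing the snapshot
            e) mode=$OPTARG ;;    # mode of operation
            r|f) ;;               # remote/freeze switches: ignored in this sketch
            *) return 2 ;;
        esac
    done
    case $mode in
        deport)
            echo "$host# umount $mnt"
            echo "$host# vxvol -g $sdg stop $svol"
            echo "$host# vxdg deport $sdg" ;;
        join)
            echo "$host# vxdg import $sdg"
            echo "$host# vxdg join $sdg $dg"
            echo "$host# vxassist -g $dg snapback $svol" ;;
        snap)
            echo "$host# vxassist -g $dg snapshot $vol $svol"
            echo "$host# vxdg split $dg $sdg $svol"
            echo "$host# vxdg deport $sdg" ;;
        import)
            echo "$host# vxdg import $sdg"
            echo "$host# vxrecover -g $sdg -m $svol"
            echo "$host# vxvol -g $sdg start $svol"
            echo "$host# mount -F vxfs /dev/vx/dsk/$sdg/$svol $mnt" ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2 ;;
    esac
}

vxfsnap_plan -r -h host2 -G SNAPDG1 -V SNAP-Vol01 -m /data -e deport
```

A real implementation would also need error handling and logging around every step, since the whole point of the automation is to catch failures before the backup window.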

This command:

host1# vxfsnap -r -h host2 -G SNAPDG1 -V SNAP-Vol01 -m /data -e deport
would unmount the /data file system on host2 and deport the SNAPDG1 disk group. This can be followed by:

host1# vxfsnap -g DG1 -v Vol01 -G SNAPDG1 -V SNAP-Vol01 -e join -f
to import the SNAPDG1 disk group on host1 and perform every step up to, but not including, the snapshot itself: the join and the snapback, with the Interwoven TeamSite backing store frozen first and unfrozen once the snapback is complete. This is followed by:

host1# vxfsnap -g DG1 -v Vol01 -G SNAPDG1 -V SNAP-Vol01 -e snap
Snap mode will take the snapshot. The separated volume will be split off into a new disk group that remains deported ready for other interested hosts to import. Finally, we can make use of the data on host2:

host1# vxfsnap -r -h host2 -G SNAPDG1 -V SNAP-Vol01 -m /data -e import
The vxfsnap script utility also can be used in other scripts that can be executed as cron jobs shortly before the backup window:

#!/bin/bash

DG=DG1
VOL=Vol01
SDG=SNAPDG1
SVOL=SNAP-Vol01
MNT="/backup"
RHOST="host2"

FLASHSNAP="/usr/local/vxfsnap/vxfsnap"

$FLASHSNAP -r -h $RHOST -G $SDG -V $SVOL -m $MNT -e deport && \
$FLASHSNAP -g $DG -v $VOL -G $SDG -V $SVOL -e join -f && \
$FLASHSNAP -g $DG -v $VOL -G $SDG -V $SVOL -e snap && \
$FLASHSNAP -r -h $RHOST -G $SDG -V $SVOL -m $MNT -e import
With the Veritas FlashSnap solution in place, the file system containing the Interwoven Teamsite application was added to the exclude list for the backup client software. Rebooting the server used for backup processing can potentially break the configuration, because the cron job requires that the file system be mounted on host2. This can be solved with a startup script that checks whether the designated disk group is recognized as local at boot time and mounts the volume under a specified mount point, or by adding the file system in the /etc/vfstab configuration file and expecting a failure if the disk group or volume are unavailable.

Conclusions

This solution achieved a little more than it was designed to. Effectively, the copy of the data used for backup to tape is available all the time in case a file, directory, or the whole volume becomes corrupted and needs to be restored. Recovering from any of the mentioned disasters is a simple process that takes minutes, requires no special backup infrastructure resources, and adds further to the value of the solution.

Veritas FlashSnap is a technology that can help both users and administrators in their quest to better utilize the resources of a system. It can be used in simple scenarios with machines directly attached to the storage media or in more complex configurations in Storage Area Networks as a host-controlled solution. It can also be used with a number of applications for point-in-time backup copies at the volume level that can be used for anything from off-host backup processing to disaster recovery.

Borislav Stoichkov has an MS degree in Computer Science with a focus on cryptography as well as certifications from Sun Microsystems and Red Hat. He has been engineering and implementing solutions and managing Linux and Solaris systems in large enterprise environments for the past 5 years. Interests include secure communication and data storage, high performance computing. Currently, he works as a Unix consultant in the Washington DC area and can be reached at: borislav.stoichkov@meanstream.org.


Database Migrations the VxVM Way

by sebastin on Apr.29, 2008, under Solaris, Veritas Volume Manager

Migration Methods

The term database migration can mean a variety of things. It can refer to the movement from one database to another where the data is moved between the databases, such as moving from an Informix IDS version 7.31 to a new IDS version 9.40 database using Informix dbexport/dbimport utilities. Or, it can refer to the movement of the database to an entirely new platform, such as moving from Solaris 2.6 on SPARC to Linux 2.6 or Windows Server 2003 on Intel x86. It can refer to the movement of the database from one server to another, such as moving from a Sun Enterprise E450 to a Sun Fire 4800 system. Or, it can simply mean moving the database from one disk array to another.

The operative word here is “move”. An in-place upgrade of a database, from one version to another, is not considered a database migration. For example, upgrading an Informix IDS version 7.31 to IDS version 9.30 would not be considered a database migration.

Database migrations are initiated for a variety of reasons. Sometimes they are done for increased performance. If database loads or nightly database refreshes are taking too long, then a new server with more or faster CPUs may help. Sometimes they are done for data reorganization. Perhaps disk hot spots are leading to poor performance. Sometimes migrations are done as part of server consolidation projects, where entire departments are asked to move their databases to a single server. Often, it’s just a simple matter of economics. The original environment may have become too costly due to high maintenance costs, or the new environment may offer lower software licensing costs.

A number of database migration methods can be employed. The DBA can unload the data to a transportable file format and recreate the database from scratch on the destination system. If the new environment supports the database’s data files, the files can be archived and copied to the target system using a slew of Unix utilities (e.g., tar, cpio, pax, etc.). If the database data files are stored on raw devices, the Unix dd command can be used to pipe the data to the target system. If the raw devices are managed by a logical volume manager (LVM), such as Veritas Volume Manager (VxVM), then the data may be mirrored to new devices on a new array and then physically moved to the target system. I’ll demonstrate this last method, using Veritas Volume Manager, to quickly and reliably migrate a database.

VxVM Migration Prerequisites

Veritas Volume Manager, available for most Unix platforms, has become the de facto LVM in many shops because of its advanced features and standardized command set across platforms. To provide a brief overview, VxVM allows the creation of volumes, which are logical devices that appear to the operating system as a type of hard disk or disk partition (i.e., a virtual disk). Volumes can be constructed from one disk to many disks supporting various RAID levels (RAID-0, RAID-1, RAID-5, etc.) as well as simple disk concatenation. The advantages of volumes include increased storage capacity beyond a single disk, various degrees of data protection, increased read/write performance, and ease of storage management, to name a few.

Successful database migrations with VxVM require careful planning and preparation, but the reward is well worth the effort. Before the migration can begin, the DBA must determine the feasibility of the migration, since not all migrations can be performed with Veritas Volume Manager. There are several prerequisites for performing a migration with VxVM.

First, the database should be built from raw device volumes. If it isn’t, forget about using VxVM for the migration. Instead, use any of the supplied database utilities or one of the aforementioned Unix archive/copy utilities. Second, does the target database support the original database’s data files? If the migration is to a minor version upgrade of the database on the same platform, then this is most likely the case. However, if the migration is to a new major version of the database, then the DBA may need to consult with the database vendor first. In any event, if the new version of the database doesn’t directly support the database files, VxVM may still be used for the migration and an in-place upgrade on the migrated database can be performed. Unfortunately, a VxVM database migration to a new platform, such as from Solaris to Windows, or to a new database platform, such as from Informix to Oracle, is probably not possible.

If you’ve satisfied the VxVM database migration prerequisites, then a database migration the VxVM way might just be for you.

Setting the Stage

So, you’ve outgrown your current server and decided to purchase a new, more powerful replacement. Your current server is hosting a multi-terabyte database on an old disk array and you need additional disk space for growth, so you’ve decided to purchase a new disk array as well. By purchasing both a new server and new storage, you’ve set the stage for performing a database migration. By performing the migration to a new server, you can continue to host the database on the old server while copying/moving the data to a new, more powerful server with more storage and without much downtime.

You’ve also consulted with the DBA, and he has satisfied all of the VxVM migration prerequisites and has a game plan. The DBA is going to stick with the same major version of the database software, but with a minor upgrade. You’ve decided that the new server will be running a new version of the operating system, and the DBA has confirmed that the database software is supported. The plan is to copy the database to the new server and bring it online with the same name, allowing for two copies of the database to exist at once. This will make migrating to the new database easier and transparent to the users.

Mirroring the Volumes

With some careful planning, you’ve attached the new disk array to the old server and configured the storage for use by the operating system. Because the new disk array has more capacity than the old array, disk space will not be an issue. To copy the data from the old array to the new, you must add a new disk (LUN) to each of the original VxVM disk groups, from the new array. Because the new LUNs are much larger, you should initialize the LUNs, soon to be christened VM disks by VxVM, with a large VxVM private region using the vxdisksetup command:

vxdisksetup -i <device> privlen=8192
example: vxdisksetup -i c3t8d0 privlen=8192
The default private region length is 2048 (sectors), which I think is too small for today’s larger capacity disks. By increasing the private region, VxVM can keep track of more objects (e.g., you can create more volumes without worrying about running into a volume limit). After initializing the disks, add the disks to the VxVM disk groups with the vxdg command:

vxdg -g <disk group> adddisk <disk name>=<device>
example: vxdg -g idsdg1 adddisk d2_lun1=c3t8d0
Be sure to add enough new disks to each of the original disk groups to allow the volumes to be mirrored to the new disks. If the volumes to be mirrored are simple volumes, you can use the vxmirror command:

vxmirror -g <disk group> <old disk> <new disk>
example: vxmirror -g idsdg1 s1_lun1 d2_lun1
The vxmirror command will mirror every volume in the disk group from the old VM disk to the new VM disk. Perform this operation for all of the disk groups until all the volumes have been mirrored from the old array to the new. If your volumes are complex (e.g., VxVM RAID-0, RAID-5, etc.), use vxassist or vxmake to create the mirrors instead.

Breaking the Mirrors

When all of the volumes have been successfully mirrored, the next step is to split or “break” them into two. It’s a good idea to get the DBA involved to schedule a database outage and shut down the database before breaking the mirrors. You don’t want changes to be made to the database after you have broken the mirrors. You could break the mirrors while the databases are online, but then you would have to keep track of the changes and apply them manually later.

Breaking the mirrors is a cumbersome process because you need to run the vxplex command for each of the mirrored volumes:

vxplex -g <disk group> dis <plex>
example: vxplex -g idsdg1 dis pdf11282004-02
I wrote a ksh function to automate this process. You can copy and paste this into your own script. I don’t like to automate VxVM tasks too much, because there are many things that can go wrong if you let a script take full control:

function make_vols {
    # Disk group whose mirrors are to be broken
    dg=idsdg1
    # Flat file that collects the volume metadata
    metadata=/var/tmp/$dg.config

    # Plexes that have a subdisk on the new VM disk (d2_lun1)
    plexes=$(vxprint -g $dg -mte '"d2_lun1" in (sd_disk)'|grep pl_name|awk -F= '{print $2}')
    for plex in $plexes; do
        echo "Disassociating $plex from disk group $dg"
        #vxplex -g $dg dis $plex
        # Strip the plex suffix (e.g. -02) and append _d2 for the new volume name
        volume_tmp=$(echo $plex|sed -e 's/-0[0-9]*$//')
        volume=$(echo $volume_tmp"_d2")
        echo "Creating new volume $volume using plex $plex"
        #vxmake -g $dg -U gen vol $volume plex=$plex
        echo "Extracting volume $volume metadata and appending it to $metadata"
        #vxprint -hmvpsQqr -g $dg $volume >> $metadata
        echo " "
    done
}
Set the dg variable to the disk group with the mirrors you want to break. The “d2_lun1” reference in the function is the name of the new VM disk you added to the disk group (from the new array). Change this value to your own VM disk. I’ve commented out the VxVM commands to protect you from accidentally running this function without understanding what’s really going on. Since every VxVM environment is different, it’s difficult to write scripts that will work in every situation. I recommend using this function as a template for your own script. Note that the function not only breaks the mirrors, but also creates new volumes from the disassociated plexes on the new disk (a “_d2” suffix is appended to the volume names to avoid conflicting with the existing volumes). It will become apparent later why we needed to create new volumes in the first place. Also, the function extracts all of the VxVM volume metadata to a flat file, which will be used later. Run the function for each of your disk groups, until all of the mirrors have been broken.
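To see what the name handling in make_vols does without touching VxVM, the sed step can be tried on its own. This is a small sketch only; plex_to_new_volume is a hypothetical helper name, and the plex name is the one from the vxplex example.

```shell
# Derive the temporary volume name from a plex name the way make_vols does:
# strip the trailing "-0N" plex suffix, then append "_d2".
# plex_to_new_volume is a made-up name for illustration.
plex_to_new_volume() {
    echo "$1" | sed -e 's/-0[0-9]*$//' -e 's/$/_d2/'
}

plex_to_new_volume pdf11282004-02    # prints pdf11282004_d2
```

Note that the pattern only strips suffixes that begin with "-0", so a plex named vol-10 would come out as vol-10_d2. Check your plex names before relying on it.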

There is a caveat with extracting the metadata. I’ve noticed that the permissions on the volumes do not get preserved. I’ll present a ksh function to correct this problem.

Deporting the Disk Groups

When all of the mirrors have been successfully split, the next step is to delete the newly created volumes. Don’t be alarmed — we’ll restore the volumes later from the metadata we extracted earlier. This step is necessary, because we must create new disk groups in which to store our new volumes. We can then export the new disk groups to our new server, leaving the old disk groups untouched.

Use this ksh function to remove the volumes:

function remove_vols {
    dg=idsdg1
    metadata=/var/tmp/$dg.config

    volumes=$(grep "^vol " $metadata|awk '{print $2}')
    for volume in $volumes; do
        echo "Removing volume $volume from $dg"
        #vxedit -g $dg -r rm $volume
        echo " "
    done
}
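Since remove_vols deletes whatever volume names it finds in the metadata file, a quick sanity check beforehand seems worthwhile. The sketch below is my own addition, not part of the original procedure: check_saved_volumes is a hypothetical helper that assumes the file contains "vol name" record lines (as the grep above does) and that every new volume carries the "_d2" suffix.

```shell
# Report any volume in the saved metadata whose name lacks the expected suffix.
# Returns 0 when every volume carries the suffix, 1 otherwise.
# check_saved_volumes is a made-up helper name.
check_saved_volumes() {
    file=$1
    suffix=${2:-_d2}
    bad=$(grep '^vol ' "$file" | awk '{print $2}' | grep -v -- "${suffix}\$" || true)
    if [ -n "$bad" ]; then
        echo "volumes without $suffix suffix:" >&2
        echo "$bad" >&2
        return 1
    fi
    return 0
}

# Possible use before uncommenting the vxedit line:
#   check_saved_volumes /var/tmp/idsdg1.config _d2 && remove_vols
```

If the check fails, the metadata file probably contains one of the original volumes and should be inspected by hand.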
Once you’ve removed all of the new volumes, remove the new VM disks too:

vxdg -g disk_group rmdisk disk_name
example: vxdg -g idsdg1 rmdisk d2_lun1
Now you are ready to create new disk groups from the VM disks that you just removed:

vxdg init new_disk_group disk_name=device
example: vxdg init idsdg1_d2 d2_lun2=c3t8d0
The new disk groups should have a name similar to the old one. Once you’ve created the new disk groups (there should be the same number of new groups as old), restore into each new disk group the volume metadata that was extracted earlier for the new volumes:

vxmake -g new_disk_group -d metadata_file
example: vxmake -g idsdg1_d2 -d /var/tmp/idsdg1.config
Once you restore the metadata, all of the volumes that you originally removed from the old disk group will be restored like magic to the new disk group. Do this for each new disk group. After you’ve created the new disk groups with the restored volumes, all that’s left is to deport the new disk groups to the new server:

vxdg -n original_name deport new_disk_group
example: vxdg -n idsdg1 deport idsdg1_d2
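If there are several disk groups to deport, the command for each one can be generated from the new group name. This is a sketch only, assuming the new groups were named by appending "_d2" to the original name as in the example; print_deport_cmd is a hypothetical helper and it prints the command instead of running it.

```shell
# Print (do not run) the deport command that renames a "_d2" disk group
# back to its original name. print_deport_cmd is a made-up helper name.
print_deport_cmd() {
    new_dg=$1
    orig_dg=$(echo "$new_dg" | sed -e 's/_d2$//')
    if [ "$orig_dg" = "$new_dg" ]; then
        echo "$new_dg does not end in _d2, skipping" >&2
        return 1
    fi
    echo "vxdg -n $orig_dg deport $new_dg"
}

print_deport_cmd idsdg1_d2    # prints: vxdg -n idsdg1 deport idsdg1_d2
```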
Use the -n option to rename the disk groups back to the original disk group name during the deport operation. It’s convenient to perform the disk group renaming during the deport so that, when you import the disk groups on the new server, the disk group names are the same as on the old server. Deport all of the new disk groups.

Importing the Disk Groups

Once all of the new disk groups have been successfully deported, disconnect the new array from the old server and attach it to the new server. You’ll have to go through the motions of making the disk array visible on the new server. Don’t fret about the integrity of the data. It’s safely stored on the VxVM disks. Don’t worry about device renumbering either (e.g., the cxtxdx name changing on the new server), because VxVM tracks disks by the information stored in the private region on the disk and not by the operating system device name.

Once you feel confident that all of the LUNs on the disk array are accounted for and visible to the operating system, import the disk groups:

vxdg import disk_group
example: vxdg import idsdg1
If you like, you can also use the menu-driven vxdiskadm command to import disk groups (menu item 8: Enable access to (import) a disk group). It conveniently lists all the disk groups that can be imported. Import all of the formerly deported disk groups, using the original name of the disk group on the old server (remember we deported the disk groups with the rename option). Once all of the disk groups have been imported, don’t forget to rename the volumes back to their original names. If you used the make_vols function, it appended a “_d2” (or whatever value you chose) to the end of the new volume names. Use this ksh function to rename the volumes:

function rename_vols {
    dg=idsdg1
    volumes=$(vxprint -g $dg -mte v_kstate|grep "^vol "|awk '{print $2}')
    for volume in $volumes; do
        new_volume=$(echo $volume|sed -e 's/_d2$//')
        echo "Renaming volume $volume to $new_volume in disk group $dg"
        #vxedit -g $dg rename $volume $new_volume
        echo " "
    done
}
Modify this function if you used something other than “_d2” for the new volumes. Do this for all of the disk groups. Before starting the volumes, make sure that the permissions on the volumes are correct. I’ve noticed that VxVM is not consistent in restoring the owner and group id from the metadata. This is critical for a database because the volumes must be owned by the database id (e.g., Informix or Oracle). Use this ksh function to correct the problem:

function fix_vols {
    dg=idsdg1
    volumes=$(vxprint -g $dg -mte v_kstate|grep "^vol "|awk '{print $2}')
    for volume in $volumes; do
        echo "Changing ownership and mode on $volume in disk group $dg"
        #vxedit -g $dg set mode=0660 user=informix group=informix $volume
        echo " "
    done
}
Set “mode=”, “user=”, and “group=” to the correct values. Double-check that the permissions/ownerships on the volumes match those of the old server before starting the volumes:

vxrecover -g disk_group -Esb
example: vxrecover -g idsdg1 -Esb
When volumes have all been started, it’s again time to get the DBA involved. If the database uses symbolic links to the raw device volumes, rather than referencing the devices directly, you will need to recreate the symbolic links. For example, for the following Informix IDS root dbspace (rootdbs), the volume /ids_prod1/rootdbs is really a symbolic link to the raw volume:

# onstat -d|grep "/rootdbs"
2ac06928 1 1 0 1000000 997130 PO- /ids_prod1/rootdbs

# ls -la /ids_prod1/rootdbs
lrwxrwxrwx 1 informix informix 33 Jun 10 2003 /ids_prod1/rootdbs -> /dev/vx/rdsk/idsdg1/rootdbs_prod1
The easiest way to recreate the symbolic links is to tar up the links on the old server and copy and extract the tarball to the new. Once the links have been created on the new server, make sure that they point to the correct volumes. They should, because we used the same disk group names as the old server during the disk group imports, and we renamed the volumes back to their original names too. If the database does use symbolic links, the links must be recreated exactly. VxVM preserves the device names and consistently stores the devices (volumes) in the /dev/vx directory (even across platforms). If you copied the symbolic links correctly from the old server to the new, the links will point to the right volumes.
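One way to make sure the copied links point at real volumes is to look for links whose targets do not exist on the new server. The following is a minimal sketch; find_dangling_links is a made-up helper name, and the directory in the usage comment is taken from the onstat example.

```shell
# List symbolic links under a directory whose targets do not exist.
# find_dangling_links is a hypothetical helper name.
find_dangling_links() {
    dir=$1
    find "$dir" -type l | while read -r link; do
        # -e follows the link, so the test fails when the target is missing
        [ -e "$link" ] || echo "$link"
    done
}

# Possible use on the new server:
#   find_dangling_links /ids_prod1
```

No output means every link resolves. It does not prove the link points at the intended volume, so a spot check with ls -la is still a good idea.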

Once the symbolic links have been verified as correct, the DBA can install the database software and copy over any configuration files needed from the old server to bring the database online. Once the database is online and the DBA is satisfied with the results, you can put another feather in your cap and call it a day.

Conclusion

It has come to my attention that Veritas Volume Manager for Unix (versions 3.2 and above) includes several new features that automatically perform some of the VxVM functions/commands I presented. Specifically, the vxdg command has been enhanced to allow you to move, join, and split disk groups. These enhancements will allow you to more easily perform a database migration, but they do require an extra license. It’s probably worth it to purchase the license, but it doesn’t hurt to know the sordid details anyway.

The database migration method I presented using Veritas Volume Manager is one that I have successfully used several times in the past. It may not be as efficient for migrating smaller databases, since there are many steps to perform, but it is well worth the effort if you have a very large database to migrate. The methods presented can be applied to other types of data migration, not just databases. Perhaps you will find some new uses and pass them along.

References

Rockwood, Ben. The Cuddletech Veritas Volume Manager Series: Advanced Veritas Theory. August 10, 2002. http://www.cuddletech.com/veritas/advx/index.html (March 28, 2004)

Veritas Software Corporation. Veritas Volume Manager 3.1.1 Administrator’s Guide. February 2001.

Veritas Software Corporation. How to move a disk between disk groups: TechNote ID: 182857. October 9, 2002. http://seer.support.veritas.com/docs/182857.htm (March 28, 2004)

Monday 18 May 2009

samba on linux error - cifs vfs error on or on cifs_get_inode_Info in lookup

If you see the following error on linux after a change to the samba shares:

cifs vfs error on 0xxxxfff or on cifs_get_inode_Info in lookup

then check that the entries in /etc/fstab are correct, then umount the share and mount it again.

You may have to do a umount -f in some instances.
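To check the fstab entries quickly, the cifs lines can be pulled out with awk. This is just a sketch; list_cifs_mounts is a made-up helper name, and it assumes the usual fstab layout with the filesystem type in the third field.

```shell
# Print device and mount point for every cifs entry in an fstab-style file,
# skipping comment lines. list_cifs_mounts is a hypothetical helper name.
list_cifs_mounts() {
    awk '$1 !~ /^#/ && $3 == "cifs" { print $1, $2 }' "${1:-/etc/fstab}"
}

# Possible use:
#   list_cifs_mounts /etc/fstab
```

Each mount point it prints is a candidate for the umount and mount described above.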

Nfs - Stale File Handle issues with samba share

if you get either of the following sorts of errors when trying to access a samba share:

NFS - STALE FILE HANDLE, or a permission denied,

then just umount and remount the share, and that could do the trick:

eufratt0015 # ls -lrt
ls: ATT-ATT: Permission denied
ls: QUID-2-RAT: Permission denied
total 28
drwxr-xr-x 2 wesdfesvc1 websss 4096 Jan 3 2006 sasdas
drwxrwxr-x 10 wdasdc1 websss 8192 Feb 20 2006 ATadasdsave
drwxrwxr-x 2 weasdasbssvc1 websss 4096 Feb 21 2006 ATdasdgr
eufratt0015 # umount QUID-2-RAT
eufratt0015 # mount QUID-2-RAT
eudt0055 # df -k
...........................
//10.1.218.122/QUID-2-RAT$
9446188 6251188 3195000 67% /app/att/QUID-2-RAT
# cd QUID-2-RAT
# pwd
/app/att/QUID-2-RAT
# ls
ATT-EnApplication2.zip ATT-EnApplication.zip

Friday 15 May 2009

list samba shares - smbclient

europat0015 # smbclient -L //10.1.213.112/ -U ATT-ATT
Password:
Domain=[BLONDIE8] OS=[Windows Server 2003 3790 Service Pack 1] Server=[Windows Server 2003 5.2]

Sharename Type Comment
--------- ---- -------
RAT-2-QUID$ Disk Directory for RAT to copy files to for processing by QUID
XXX Disk
XX Disk
C$ Disk Default share
X Disk
MICK Disk
ExcaKibar Disk
STANLEY Disk
IPC$ IPC Remote IPC
Code Disk
ADMIN$ Disk Remote Admin
ADV Disk Test ADV Share for Rob Aldridge ref AC
D$ Disk Default share
QED-2-RAT$ Disk Transfer to RAT share
SysAppl Disk
session request to 10.1.218.122 failed (Called name not present)
session request to 10 failed (Called name not present)
Domain=[BLONDIE8] OS=[Windows Server 2003 3790 Service Pack 1] Server=[Windows Server 2003 5.2]

Server Comment
--------- -------

Workgroup Master
--------- -------

samba and linux - changing the ip address of a mount point aka changing / remapping a samba share (windows share on linux host)

lots of pitfalls in this one, but essentially check the following;

1) the /etc/fstab, as this will give you lots of clues;
more /etc/fstab
................................
//10.3.128.132/ATT-2-QED$ /app/att/ATT-2-QUID cifs credentials=/etc/samba/cred_att_win,uid=wbsssssss1,gid=wbssssssphere,directio 0 0
//10.3.128.132/QED-2-ATT$ /app/att/QED-2-RAT cifs credentials=/etc/samba/cred_att_win,uid=wbsssssss1,gid=wbssssssphere,directio 0 0

Okay, some useful things there: it already tells you the IP of the remote windows share to mount. It also tells you there is a credentials file, so when you come to mount the amended share you can either point the mount at this file or read the username and password from it to make the command execute (more later)

2) check the smb.conf to clarify the mount

# Remote mount for ATT WebSphere read only
[websphere]
comment = ATT WebSphere Read Only Share
path = /app/WebSphere
valid users = gbwhoa
public = no
writable = no
printable = no

3) Please remember to umount the existing mount (i.e. in this case /app/att/ATT-2-QUID), or else you will get all sorts of pain. For example, trying to run the mount or smbmount command will return

Could not resolve mount point /app/WebSphere/att/ATT-2-QED

or trying to find the dir in the /app/att area will return

ls: /app/att/ATT-2-QUID: Host is down
ls: /app/att/QED-2-RAT: Host is down

and trying to mkdir QED-2-RAT under /app/att will get
eudt0055 # mkdir ATT-2-QED
mkdir: cannot create directory `ATT-2-QUID': File exists

so remember - umount the dir!

eudt0055 # umount /app/att/ATT-2-QUID
eudt0055 # smbmount //10.3.218.132/ATT-2-QUID$ /app/att/ATT-2-QUID -o username=ATT-ATT,password=xxxxxxxx
eudt0055 # umount QED-2-RAT
eudt0055 # smbmount //10.3.218.132/QED-2-RAT$ /app/att/QED-2-RAT -o username=ATT-ATT,password=xxxxxxxx

make the change in fstab as well and then if you get the following;

eudt0055 # df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/cciss/c0d0p1 10321032 5040564 4756192 52% /
/dev/cciss/c0d0p2 6192608 160148 5717892 3% /home
none 2015508 0 2015508 0% /dev/shm
/dev/cciss/c0d0p3 6192608 348416 5529624 6% /var
/dev/cciss/c0d0p6 43175112 39522916 1458996 97% /app
df: `/app/WebSphere/att/ATT-2-QED': Permission denied
df: `/app/WebSphere/att/QED-2-ATT': Permission denied
eudt0055 #

umount the mount point and then mount it again

You can also have a credentials file under /etc/samba that contains the username and password of the user the share belongs to. The file can be called whatever you want; just vi the file and put the username and password in like this:

username=user
password=pass

and then in the fstab make explicit;

//10.3.128.132/ATT-2-QED$ /app/att/ATT-2-QUID cifs credentials=/etc/samba/cred_att_win,uid=wbsssssss1,gid=wbssssssphere,directio 0 0
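The fstab change itself can be done with sed against a copy of the file, so the result can be checked before it replaces the real thing. This is only a sketch: rewrite_share_ip is a made-up helper name, and the two addresses in the usage comment are simply the ones that appear in the fstab and smbmount examples above.

```shell
# Write a copy of an fstab-style file with the server IP at the start of
# each share entry changed from old_ip to new_ip. The source file is left
# untouched. rewrite_share_ip is a hypothetical helper name.
rewrite_share_ip() {
    old_ip=$1 new_ip=$2 src=$3 dst=$4
    # Escape the dots so they match literally in the sed pattern.
    old_re=$(echo "$old_ip" | sed 's/\./\\./g')
    sed "s|^//$old_re/|//$new_ip/|" "$src" > "$dst"
}

# Possible use:
#   rewrite_share_ip 10.3.128.132 10.3.218.132 /etc/fstab /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

Only lines that start with //old_ip/ are changed, so local filesystems and other shares pass through as they are.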

samba and linux CLI - status, stop and start

/etc/rc.d/init.d/smb stop
/etc/rc.d/init.d/smb start
/etc/rc.d/init.d/smb restart
Two daemons are started:
- smbd : shares the files
- nmbd : explores the LAN. Every 15 minutes it browses the LAN in order to update the NetBIOS name list.

To check whether they are running:
/etc/rc.d/init.d/smb status

how to lookup an SMTP address

eust0202:root /root > nslookup -type=MX emea.aknark-net.com
Using /etc/hosts on: eust0202

looking up FILES
Trying DNS
emea.aknark-net.com
origin = nldc-00002.emea.aknark-net.com
mail addr = admin.emea.aknark-net.com
serial = 162022095
refresh = 900 (15M)
retry = 600 (10M)
expire = 86400 (1D)
minimum ttl = 3600 (1H)
eudt0202:root /root >

Thursday 14 May 2009

example of an fsck of a vxfs filesystem

remember to umount the fs first!!!

tudr0006> fsck -y -F vxfs -o full,nolog /dev/vx/rdsk/bbk02/data01
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
OK to clear log? (ynq)y
set state to CLEAN? (ynq)y
tudr0006> fsck -y -F vxfs /dev/vx/rdsk/bbk02/data01
file system is clean - log replay is not required
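Since it is easy to forget the umount, a small check before the fsck can help. The sketch below is my own addition: is_mounted is a made-up helper name, the mount table file is passed in as an argument (/etc/mnttab on Solaris, /proc/mounts on Linux), and it assumes the usual VxVM naming where the raw device lives under rdsk and the mounted block device under dsk.

```shell
# Return 0 if the block device matching a raw VxVM device is listed in a
# mount table file, 1 otherwise. is_mounted is a hypothetical helper name.
is_mounted() {
    # fsck takes the raw device (/dev/vx/rdsk/...); mounts use /dev/vx/dsk/...
    blk_dev=$(echo "$1" | sed 's|/rdsk/|/dsk/|')
    while read -r dev rest; do
        [ "$dev" = "$blk_dev" ] && return 0
    done < "$2"
    return 1
}

# Possible use:
#   if is_mounted /dev/vx/rdsk/bbk02/data01 /etc/mnttab; then
#       echo "still mounted, umount it first"
#   fi
```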

running fsck on a vxfs filesystem

This section describes the use of fsck with VxFS filesystems.

The VxFS filesystem uses a tracking feature called intent logging to record pending changes to the filesystem structure. These changes are recorded in an "intent log". The VxFS fsck utility typically runs an intent log replay, which scans the intent log and completes or nullifies any pending updates. With VxFS, it is seldom necessary to run a full filesystem check.

NOTE: By default, the VxFS-specific fsck only runs an intent log replay, and does not perform a full filesystem check. The full filesystem check is provided, however, to handle cases where a filesystem is damaged due to I/O failure.
The following is the VxFS-specific format of the fsck command:

fsck [-F vxfs] -m special
fsck [-F vxfs] -n|N special
fsck [-F vxfs] [-o nolog,full] [-y|Y] special

The options are as follows:

-m
Check for mountability. To determine if a filesystem can be mounted, the -m option is specified.

-n
This option runs a full filesystem check. Specifying the -n option generates a report on errors but does not make any repairs in conjunction with those errors. Intent log replay is not performed. This form of the command can be run safely on mounted filesystems on which there might have been damage. If filesystem damage is suspected, a full fsck should be run to determine the extent of filesystem damage. (A full filesystem check is specified with the -o full option.)

-N
Synonym for -n

-y
Answer yes to all questions. This option has two ramifications. First, after an intent log replay is run, if the filesystem is not in a mountable state, a full filesystem check is initiated automatically. Second, yes is automatically answered to all questions posed by the full filesystem check.

-Y
Synonym for -y.

-o nolog
Inhibit log replay. An intent log replay is not performed. This option can be used to disable log replay, if the intent log is physically damaged.

-o full
Force full filesystem check. By default, only a log replay is run.
A full filesystem check is normally interactive, meaning the system administrator is prompted before any corrective actions are taken. This is not true if the -y or -n options are used. When these options are used, fsck automatically answers yes or no to prompts, instead of waiting for input.
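As a memory aid, the option combinations above can be wrapped in a small function that prints the matching command line. This is a sketch only: vxfs_fsck_cmd and its mode names (mountable, report, replay, full) are labels I made up, and it uses nothing beyond the options in the synopsis above.

```shell
# Print the VxFS fsck invocation for a given mode, using only the option
# combinations from the synopsis. vxfs_fsck_cmd and the mode names are
# hypothetical labels, not part of fsck.
vxfs_fsck_cmd() {
    mode=$1 dev=$2
    case $mode in
        mountable) echo "fsck -F vxfs -m $dev" ;;        # check for mountability only
        report)    echo "fsck -F vxfs -n $dev" ;;        # report errors, no repairs
        replay)    echo "fsck -F vxfs $dev" ;;           # default: intent log replay
        full)      echo "fsck -F vxfs -o full $dev" ;;   # force a full check
        *)         echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

vxfs_fsck_cmd replay /dev/vx/rdsk/bbk02/data01
```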

When a system needs an FSCK and it drops to single user mode

You will see

Type control-d to proceed with normal startup,
or give root password for system maintenance:

so give the root pw...then you will see

single-user privilege assigned to /dev/console.
Entering System Maintenance Mode

then you will be in maintenance mode. Then you can run the fsck as directed by the OS as it is coming up.

Wednesday 13 May 2009

ALOMs are great

because if you have a hung console, you can reboot from the ALOM with the following commands;

poweroff

poweron

poweroff

Use the poweroff command to power off the host server to standby mode. If the server is already powered off, this command has no effect. However, ALOM is still available when the server is powered off, since ALOM uses the server's standby power. Some environmental information is not available when the server is in standby mode.

To Use the poweroff Command

Note - You must have reset/power (r) level user permission to use this command. See userperm for information on setting user permissions.

At the sc> prompt, type the following command:

sc> poweroff option(s)

Where option(s) is the desired option(s), if any.

If you type the poweroff command without any options, the command initiates a graceful shutdown of the Solaris Operating System, similar to one of the Solaris commands shutdown, init, or uadmin.

It can take up to 65 seconds for the poweroff command to completely shut down the system. This is because ALOM attempts to wait for a graceful shutdown to complete before the system is powered off.

Note - After the poweroff command shuts down the system, ALOM issues the following message:

SC Alert: Host system has shut down.

Wait until you see this message before powering the system back on.

Command Options

The poweroff command uses the following options. You can use these two options together. See Entering Command Options.

TABLE 5-7 poweroff Command Options

Option  Description
-f      Forces an immediate shutdown regardless of the state of the host. If the Solaris Operating System shutdown fails for any reason, use this option to force the system to be powered off immediately. This command works like the Solaris Operating System command halt; that is, it does not perform a graceful shutdown of the system or synchronize the file systems.
-y      Instructs ALOM to proceed without prompting the following confirmation question: Are you sure you want to power off the system?

Related Information

ALOM Shell Commands
bootmode
poweron

poweron

Use the poweron command to power on the server. If the host server's keyswitch, operation mode switch or rotary switch is in the Locked position, or if the server is already powered on, this command has no effect.

To Use the poweron Command

Note - You must have reset/power (r) level user permission to use this command. See userperm for information on setting user permissions.

At the sc> prompt, type the following command:

sc> poweron [-c] [fru]

Note - If you have just used the poweroff command to power off the host server, ALOM issues the following message:

SC Alert: Host system has shut down.

Wait until you see the message before powering the system back on.

To turn on power to a specific FRU (field-replaceable unit) in the server, type the following command:

sc> poweron fru

Where fru is the name of the FRU you want to power on.

For example, to turn power on to Power Supply 0, type:

sc> poweron PS0

Command Options

The poweron command uses two options:

-c - Goes immediately to the Solaris OS console upon completion.
fru - Powers on the specified FRU, (for example, you can use this command when a power supply is replaced in the host server.) ALOM supports the following FRUs. Note that some servers have fewer than four power supplies, so refer to your system documentation before executing these commands to verify that you are powering on the proper power supply for your server.

TABLE 5-8 poweron FRU Values

Value  Description
PS0    Powers on Power Supply 0 in the host server.
PS1    Powers on Power Supply 1 in the host server.
PS2    Powers on Power Supply 2 in the host server.
PS3    Powers on Power Supply 3 in the host server.

ALOM & console / console & ALOM

you get to the console through a terminal server (generally at least, if you are working remotely this is the only way you are going to do it)

you can get to the ALOM from the console

to do these things type the following (the command is under each header; don't type the header itself, i.e. don't type "console to ALOM"!);

console to ALOM
#.

ALOM to console
console

when you think you have a hung console session

you are on a solaris box, logged in on the console, and you can't get any response

do a

#.

that should sort it out

cdpinfo and cdpinfo script

Display Cisco Discovery Protocol packet information via tcpdump or snoop. This information includes the name of the network switch the network interface is connected to, plus the port number, vlan, and duplex of the port.

#!/bin/ksh
#
# C D P I N F O . K S H
#
# Script to get the CDP packet from a Cisco switch
#

cdpinfo() {
nic=$1

#
# e1000 driver, special case

IFS="$OLDIFS"

if [ "$(echo "$nic" | grep "e1000")" ]
then
driver=e1000g
instance=`echo "$nic" | sed 's/e1000g//g'`
else
# now work out the speed/duplex/link, etc
driver=`echo $nic | sed 's/\([^0-9]*\)[0-9]*/\1/'`
instance=`echo $nic | sed 's/[^0-9]*\([0-9]*\)/\1/'`
fi

if [ "$driver" != "le" -a "$driver" != "qe" ]; then
if [ "$driver" = "ge" ]; then
advnegname="adv_1000autoneg_cap"
lpnegname="lp_1000autoneg_cap"
else
advnegname="adv_autoneg_cap"
lpnegname="lp_autoneg_cap"
fi

if [ "$driver" != "dmfe" -a "$driver" != "bge" -a "$driver" != e1000g -a "$driver" != nge ]; then
# set the instance
/usr/sbin/ndd -set /dev/$driver instance $instance

# dmfe/bge/e1000/nge cards don't support the link provider
# (lp_*_cap) variables, ce uses kstat
if [ "$driver" = "ce" ]; then
lpneg=`kstat -p -m $driver -i $instance -s lp_cap_autoneg | awk '{print $NF}'`
else
lpneg=`/usr/sbin/ndd /dev/$driver $lpnegname`
fi
case $lpneg in
0) lpneg=none ;;
1) lpneg=auto ;;
esac
else
# use the individual instance /dev entries
olddriver=$driver
driver="$driver$instance"

# set to unknown as per above
lpneg=`/usr/sbin/ndd /dev/$driver $lpnegname`
case $lpneg in
0) lpneg=none ;;
1) lpneg=auto ;;
esac
fi

advneg=`/usr/sbin/ndd /dev/$driver $advnegname`
case $advneg in
0) advneg=none ;;
1) advneg=auto ;;
esac

if [ "$driver" = "ce" ]; then
linkstatus=`kstat -p -m $driver -i $instance -s link_up | awk '{print $NF}'`
else
linkstatus=`/usr/sbin/ndd /dev/$driver link_status`
fi
case $linkstatus in
0) linkstatus=down ;;
1) linkstatus=up ;;
esac

if [ "$driver" = "ce" ]; then
linkspeed=`kstat -p -m $driver -i $instance -s link_speed | awk '{print $NF}'`
else
linkspeed=`/usr/sbin/ndd /dev/$driver link_speed`
fi
case $linkspeed in
0) linkspeed=10 ;;
1) linkspeed=100 ;;
esac

if [ "$driver" = "ce" ]; then
linkmode=`kstat -p -m $driver -i $instance -s link_duplex | awk '{print $NF}'`
elif [ "$olddriver" = "bge" ]; then
linkmode=`kstat -p -m $olddriver -i $instance -s duplex | awk '{print $NF}'`
else
linkmode=`/usr/sbin/ndd /dev/$driver link_mode`
fi
case $linkmode in
0) linkmode=HDX ;;
1) linkmode=FDX ;;
2) linkmode=FDX ;;
full) linkmode=FDX ;;
esac
else
advneg="????"
lpneg="????"
linkstatus="????"
linkspeed=10
linkmode=HDX
fi

# this runs the snoop and extracts the hexdump of the packet,
# splits the 4 digit strings into 2 digit strings and makes the
# hex upper case
snoopcriteria="ether 01:00:0c:cc:cc:cc and ether[20:2] = 0x2000"
switchid="??????????"; port="?/??"; duplex="???"; vlan="????";

snoop -x 0 -c 1 -d $nic $snoopcriteria > /tmp/.cdpinfo.ksh.$$ 2>&1 &

# get the pids, kick off the alarm and wait for the command to exit
cmdpid=$!
(sleep 65; kill $cmdpid > /dev/null 2>&1) &
alarmpid=$!
wait $cmdpid > /dev/null 2>&1
kill $alarmpid > /dev/null 2>&1

# otherwise, go ahead and parse the cdp packet
set -A rawpacket ` \
grep '^[ ][ ]*[0-9][0-9]*' /tmp/.cdpinfo.ksh.$$ | \
cut -c8-46 | \
sed 's/\([0-9a-f][0-9a-f]\)\([0-9a-f][0-9a-f]\)/\1 \2/g' | \
tr '[a-f]' '[A-F]'`

typeset -i arraylength=${#rawpacket[*]}
typeset -i i=0 id=0 len=0 subi=0 boolean=0 num=0

# grab the CDP version
typeset -i cdpver=${rawpacket[$(echo "ibase=16; 16" | bc)]}

# skip to location 1A in the packet
i=$(echo "ibase=16; 1A"|bc)
while [ $i -lt $arraylength ]; do
id=$(echo "ibase=16; ${rawpacket[$i]}*100+${rawpacket[$i+1]}" | bc)
len=$(echo "ibase=16; ${rawpacket[$i+2]}*100+${rawpacket[$i+3]}" | bc)
string=""

case $id in
1) subi=4
while [ $subi -lt $len ]; do
char=$(echo "ibase=16; obase=8; ${rawpacket[$i+$subi]}" | bc)
string="${string}\0${char}"
subi=$subi+1
done

switchid=$(print $string)
switchid=$(echo $switchid | sed 's/[^(]*[(]\([^)]*\)[)]/\1/' | sed s'/GigabitEthernet//g')
;;
3) subi=4
while [ $subi -lt $len ]; do
char=$(echo "ibase=16; obase=8; ${rawpacket[$i+$subi]}" | bc)
string="${string}\0${char}"
subi=$subi+1
done

port=$(print $string | sed s'/GigabitEthernet//g')
;;
10) subi=4; num=0
while [ $subi -lt $len ]; do
num=$(echo "ibase=16; $num*100+${rawpacket[$i+$subi]}" | bc)
subi=$subi+1
done

vlan=$num
;;
11) boolean=$(echo "ibase=16; ${rawpacket[$i+4]}" | bc)
if [ $boolean == 0 ]; then
duplex="HDX"
else
duplex="FDX"
fi
esac
i=$i+$len
done

# cdp version 1 doesn't tell you the duplex
if [ $cdpver = 1 ]; then
duplex="???"
fi

# remove the tempfile
rm -f /tmp/.cdpinfo.ksh.$$
}

#
# MAIN
#

# print header
echo "NIC IP_ADDR SWITCHID PORT VLAN LINK SPEED SWITCH_N/D HOST_N/D" | awk '{printf("%-8s %-13s %-34s %-8s %-8s %-8s %-10s %-13s %-13s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9)}'
echo "----------------------------------------------------------------------------------------------------------------------"
echo

#
# The old version assumes it needs to be plumbed up

#if [ $# = 0 ]; then
# nics=`ifconfig -a|grep '^[a-z][a-z]*[0-9]' \
# | grep -v lo0 \
# | egrep -v ':[0-9]: ' \
# | grep UP \
# | cut -d: -f1`
#else
# nics=$@
#fi

NIC_DATA=$(/usr/bin/kstat -p -c net | egrep -v "lpfc|streams|lo0|dls|ip|ipsec|tcp|udp" | egrep "link_up|duplex" | grep -v link_duplex | awk -F: '{ printf("%s%s %s\n",$1,$2,$4)}' | sort -u | sed -e s'/link_up//g' -e s'/duplex//g')

OLDIFS="$IFS"

IFS='
'
# now, cycle through the nics
for LINE in `echo "$NIC_DATA"`
do
NIC=`echo "$LINE" | awk '{print($1)}'`
LINK=`echo "$LINE" | awk '{print($2)}'`

#echo "NIC = $NIC"
#echo "LINK = $LINK"
if [ "$LINK" != "1" ] && [ "$LINK" != "full" ]
then
echo "$NIC LINK_DOWN" | awk '{printf("%-8s %-13s\n", $1, $2)}'
continue
else
cdpinfo ${NIC}
IFCONFIG=`ifconfig ${NIC} 2> /dev/null | grep inet | awk '{print($2)}'`
if [ ! "$IFCONFIG" ]
then
IFCONFIG="NOT_PLUMBED"
fi
echo "$NIC $IFCONFIG ${switchid} ${port} ${vlan} ${linkstatus} ${linkspeed} ${lpneg}/${duplex} ${advneg}/${linkmode}" | awk '{printf("%-8s %-13s %-34s %-8s %-8s %-8s %-10s %-13s %-13s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9)}'
fi

done
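The heart of the script is the loop that walks the CDP packet as type-length-value records: a 2-byte type, a 2-byte length that includes the 4 header bytes, then the value. The sketch below shows that walk on its own, away from snoop. It is not the script's code: parse_tlvs and the sample bytes are made up, it is written in plain sh rather than ksh, and it uses printf to convert hex where the script uses bc with ibase=16. The type numbers and names are the ones the script handles (1, 3, 10, 11).

```shell
# Walk a space-separated list of hex bytes as CDP type-length-value records.
# parse_tlvs is a hypothetical helper; it prints one line per record.
parse_tlvs() {
    set -- $1                       # split the byte string into $1 $2 ...
    while [ $# -ge 4 ]; do
        id=$(printf '%d' "0x$1$2")  # 2-byte type
        len=$(printf '%d' "0x$3$4") # 2-byte length, header included
        # A length below 4 or beyond the remaining bytes means a bad packet.
        if [ "$len" -lt 4 ] || [ "$len" -gt $# ]; then
            echo "malformed TLV: id=$id len=$len" >&2
            return 1
        fi
        case $id in
            1)  name=device-id ;;
            3)  name=port-id ;;
            10) name=native-vlan ;;
            11) name=duplex ;;
            *)  name=other ;;
        esac
        echo "id=$id len=$len ($name)"
        shift "$len"                # jump to the next record
    done
}

# Invented sample: a device-id record ("sw") followed by a port-id record ("1").
parse_tlvs "00 01 00 06 73 77 00 03 00 05 31"
```

The length check is the part worth copying back into the script if you adapt it. A record with a zero length would otherwise leave the loop stuck on the same offset.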