My.Sys.Admin: April 2009

Tuesday, 28 April 2009

What does 'old' in the IDLE column of 'who' output mean?

old = idle over 24 hours

Kill inactive users

run who -Hup

look for the PID of the user's session in the PID column

kill -9 that PID
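A rough sketch of picking the 'old' sessions out automatically. It assumes the Linux who -u layout, where the PID is the column right after IDLE; list_old_sessions is just a name made up for this note, and the sample lines are modelled on the sessions shown further down, not captured output.

```shell
#!/bin/sh
# list_old_sessions (hypothetical helper): read `who -u` style output on
# stdin and print "user tty pid" for every session whose IDLE column is "old".
list_old_sessions() {
    awk '{
        # The login time takes two or three fields depending on locale, so
        # search for the literal "old" marker and take the field after it.
        for (i = 3; i < NF; i++)
            if ($i == "old" && $(i + 1) ~ /^[0-9]+$/) { print $1, $2, $(i + 1); break }
    }'
}

# Illustrative input only (assumed layout, not real output from a server).
list_old_sessions <<'EOF'
raam     pts/0        May 10 10:45  old        18076 (wfc-main.wfcorp.net)
raam     pts/1        May 11 15:26   .          1390 (pool-151-199-29-190.bos.east.verizon.net)
EOF
```

On a live box this would be run as who -u | list_old_sessions, and the PIDs checked by eye before killing anything.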

Kill Inactive and Idle Linux Users


Every once in a while the SSH connection to my Linux server will die and I’ll be left with a dead user. Here’s how I discover the inactive session using the w command:

15:26:26 up 13 days, 23:47, 2 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
raam pts/0 wfc-main.wfcorp. Mon10 2days 0.04s 0.04s -bash
raam pts/1 pool-151-199-29- 15:26 0.00s 0.02s 0.01s w
You can easily tell there’s an idle user by glancing at the IDLE column; the user in the first row has been idle for 2 days. The bottom line is that you need to kill the parent process created when the idle user logged in. There are several ways of doing that, and here I’ll show you a couple of my favorites.

Here is how I discover the parent process using the pstree -p command:

├─screen(29380)───bash(29381)───naim(29384)
├─scsi_eh_0(903)
├─sshd(1997)─┬─sshd(32093)─┬─sshd(32095)
│ │ └─sshd(32097)───bash(32098)─┬─mutt(32229)
│ │ └─screen(32266)
│ └─sshd(1390)─┬─sshd(1392)
│ └─sshd(1394)───bash(1395)───pstree(1484)
├─syslogd(1937)
└─usb-storage(904)
We need to find the parent PID for the dead user’s session and issue sudo kill -1 on it. The -1 option sends SIGHUP (hangup), which is a cleaner way of killing processes; some programs, such as mutt, will end cleanly if you kill them with -1. I can see in the tree where I’m running the pstree command, so I just trace back from there until I find a process (branch) common to both sessions; this happens to be sshd(1997).

You can see there are two branches at that point: one for my current session and one for the idle session (I know this because I’m the only user logged into this Linux server and because I know I should only have one active session). So I simply kill the sshd(32093) process and the idle user disappears.
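The "try -1 first" idea can be wrapped up so that -9 is only used as a last resort. This is a minimal sketch, not something from the original write-up: hup_then_kill and the 5 second default grace period are assumptions of mine.

```shell
#!/bin/sh
# hup_then_kill (hypothetical helper): send SIGHUP, wait a little, and only
# fall back to SIGKILL if the process is still around afterwards.
hup_then_kill() {
    pid=$1
    grace=${2:-5}    # seconds to wait after the hangup (assumed default)
    kill -1 "$pid" 2>/dev/null || return 1    # no such process, or not permitted
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # gone: the hangup was enough
        sleep 1
        grace=$((grace - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -9 "$pid" 2>/dev/null || :    # last resort; ignore a race with exit
}

# Example (as root or via sudo), using the idle session's sshd from the tree:
#   hup_then_kill 32093
```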

Of course, if you’re on a system with multiple users, or you’re logged into the box with multiple connections, searching through a huge tree of processes trying to figure out which is which will not be fun. Here’s another way of doing it: looking at the output from the w command above, we can see that the idle user’s TTY is pts/0, so now all we need is the PID for the parent process. We can find that by running who -a | grep raam:

raam + pts/0 May 10 10:45 . 18076 (wfc-main.wfcorp.net)
raam + pts/1 May 11 15:26 . 1390 (pool-151-199-29-190.bos.east.verizon.net)
Here we can see that 18076 is the PID for the parent process of pts/0, so once we issue kill -1 18076 that idle session will be gone!
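If you do this often, the lookup can be scripted. The sketch below assumes the who -a layout shown above, where the PID is the last all-digit column on the line; pid_for_tty is a hypothetical name, not a standard tool.

```shell
#!/bin/sh
# pid_for_tty (hypothetical helper): read `who -a` output on stdin and print
# the PID recorded for the given terminal, e.g. pts/0.
pid_for_tty() {
    awk -v tty="$1" '{
        for (i = 1; i <= NF; i++)
            if ($i == tty) {
                # Scan backwards: the PID is the last purely numeric field.
                for (j = NF; j > i; j--)
                    if ($j ~ /^[0-9]+$/) { print $j; next }
            }
    }'
}

# The two lines from the post, fed in as sample input.
pid_for_tty pts/0 <<'EOF'
raam + pts/0 May 10 10:45 . 18076 (wfc-main.wfcorp.net)
raam + pts/1 May 11 15:26 . 1390 (pool-151-199-29-190.bos.east.verizon.net)
EOF
```

On a live system that would be who -a | pid_for_tty pts/0, followed by kill -1 on the number it prints once you have checked it is the right session.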

Friday, 24 April 2009

what is emcgrab?

emcgrab is an explorer-like tool

collects emc config

it is usually in an emcgrab folder (if installed)

there is an .sh file

run it and it will ask you for a load of info like call id, your email etc

once that is done, it creates a tar and gzip file

the local pam module and how to reset user account lockout

Sometimes you'll get a call saying that a user is locked out of their account because they entered their password incorrectly too many times. Even though these are NIS or LDAP accounts, they still pass through the local PAM module, which can lock an account after too many failed access attempts. This applies only to Linux; Sun has no module like this.

command to check if user is locked

/sbin/pam_tally --user gbuser

to reset

/sbin/pam_tally --user gbuser --reset

Vxvm and Emc - failed device troubleshooting and fix

If you have a suspected failed emc disk under vxvm

try a df -k (see if any I/O errors come back)
run a vxdisk list (see if any failed devices are shown)
if failed devices are shown, the following commands are useful for digging further;

# /opt/emc/SYMCLI/V6.4.0/bin/syminq
Device Product Device
---------------------------------------- --------------------------- ---------------------
Name Type Vendor ID Rev Ser Num Cap (KB)
---------------------------------------- --------------------------- ---------------------
.........................................truncated........................................................................................
/dev/vx/rdmp/emcpower0s2 DGC RAID 5 0219 03000016 35651584
/dev/vx/rdmp/emcpower1s2 DGC RAID 5 0219 00000015 35651584
/dev/vx/rdmp/emcpower2s2 DGC RAID 5 0219 02000015 35651584

# vxprint -g dgamlap99 -htA
Disk group: dgamlap99
......................................................................
dg dgamlap99 default default 52000 1103536135.1260.eudt0040
dm amlap99dm01 - - - - NODEVICE
dm amlap99dm02 emcpower2s2 auto 1791 71294976 -
dm amlap99dm03 emcpower1s2 auto 1791 71294976 -
...............................................................
v mantas - DISABLED ACTIVE 178233344 SELECT - fsgen
pl mantas-01 mantas DISABLED NODEVICE 178233344 CONCAT - RW
sd amlap99dm01-01 mantas-01 amlap99dm01 0 71294976 0 - NDEV
sd amlap99dm02-03 mantas-01 amlap99dm02 14680064 56614912 71294976 emcpower2 ENA
sd amlap99dm03-02 mantas-01 amlap99dm03 20971520 50323456 127909888 emcpower1 ENA

# vxdisk list | grep amlap99dm (this assumes that you know the disk group etc after running a vxdisk list)
emcpower1s2 auto:sliced amlap99dm03 dgamlap99 online
emcpower2s2 auto:sliced amlap99dm02 dgamlap99 online
- - amlap99dm01 dgamlap99 failed was:emcpower0s2
#
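A small sketch for pulling just the failed disks out of that listing. It assumes the five-column vxdisk list layout shown above (DEVICE, TYPE, DISK, GROUP, STATUS); failed_disks is a made-up helper name.

```shell
#!/bin/sh
# failed_disks (hypothetical helper): read `vxdisk list` output on stdin and
# print "disk group old-device" for every entry whose status is "failed".
failed_disks() {
    awk '$1 == "-" && $5 == "failed" {
        was = $NF
        sub(/^was:/, "", was)    # strip the "was:" prefix from the last field
        print $3, $4, was
    }'
}

# The listing from above, fed in as sample input.
failed_disks <<'EOF'
emcpower1s2 auto:sliced amlap99dm03 dgamlap99 online
emcpower2s2 auto:sliced amlap99dm02 dgamlap99 online
- - amlap99dm01 dgamlap99 failed was:emcpower0s2
EOF
```

On the host itself this would be vxdisk list | failed_disks.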

# /etc/powermt display dev=all
Pseudo name=emcpower1a
CLARiiON ID=CK200033400328 [EUDT0040]
Logical device ID=600601602E030E00CA0FF19CCEE6D811 [LUN 0]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016010601796d2s0 SP A0 active alive 0 3
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016910601796d2s0 SP B1 active alive 0 0
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016110601796d2s0 SP A1 active alive 0 3
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016810601796d2s0 SP B0 active alive 0 0

Pseudo name=emcpower2a
CLARiiON ID=CK200033400328 [EUDT0040]
Logical device ID=600601602E030E00CC0FF19CCEE6D811 [LUN 2]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP B
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016010601796d1s0 SP A0 active alive 0 3
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016910601796d1s0 SP B1 active alive 0 0
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016110601796d1s0 SP A1 active alive 0 3
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016810601796d1s0 SP B0 active alive 0 0

Pseudo name=emcpower0a
CLARiiON ID=CK200033400328 [EUDT0040]
Logical device ID=600601602E030E00CD0FF19CCEE6D811 [LUN 3]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016010601796d0s0 SP A0 active alive 0 3
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016910601796d0s0 SP B1 active alive 0 0
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016110601796d0s0 SP A1 active alive 0 3
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016810601796d0s0 SP B0 active alive 0 0
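With a lot of LUNs the powermt output gets long, so here is a sketch that lists only the paths with a non-zero Errors count. It assumes the column layout shown above, with Errors as the last column of each path row; paths_with_errors is a hypothetical name. A non-zero count is only a hint that a path has had trouble, not proof that it is dead.

```shell
#!/bin/sh
# paths_with_errors (hypothetical helper): read `powermt display dev=all`
# output on stdin and print "pseudo-device path errors" for each path row
# whose Errors column is greater than zero.
paths_with_errors() {
    awk '
        /^Pseudo name=/ { sub(/^Pseudo name=/, ""); dev = $0; next }
        # Path rows start with a numeric HW path id and end with the error count.
        NF >= 9 && $1 ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ && $NF > 0 { print dev, $3, $NF }
    '
}

# Two rows from the emcpower1a block above, fed in as sample input.
paths_with_errors <<'EOF'
Pseudo name=emcpower1a
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016010601796d2s0 SP A0 active alive 0 3
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016910601796d2s0 SP B1 active alive 0 0
EOF
```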

the fix could be the following;

Veritas - how to fix failed device

# vxprint -g dgamlap99 -ht
:
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
:
dm amlap99dm01 - - - - NODEVICE
dm amlap99dm02 emcpower2s2 auto 1791 71294976 -
:
v mantas - DISABLED ACTIVE 178233344 SELECT - fsgen
pl mantas-01 mantas DISABLED NODEVICE 178233344 CONCAT - RW
sd amlap99dm01-01 mantas-01 amlap99dm01 0 71294976 0 - NDEV
sd amlap99dm02-03 mantas-01 amlap99dm02 14680064 56614912 71294976 emcpower2 ENA

# vxdisk -o alldgs list | grep amlap99dm
emcpower1s2 auto:sliced amlap99dm03 dgamlap99 online
emcpower2s2 auto:sliced amlap99dm02 dgamlap99 online
- - amlap99dm01 dgamlap99 failed was:emcpower0s2

# ./syminq
Device Product Device
---------------------------------------- --------------------------- ---------------------
Name Type Vendor ID Rev Ser Num Cap (KB)
---------------------------------------- --------------------------- ---------------------
:
/dev/vx/rdmp/emcpower0s2 DGC RAID 5 0219 03000016 35651584
/dev/vx/rdmp/emcpower1s2 DGC RAID 5 0219 00000015 35651584
/dev/vx/rdmp/emcpower2s2 DGC RAID 5 0219 02000015 35651584

# /etc/powermt display dev=all
:
Pseudo name=emcpower0a
CLARiiON ID=CK200033400328 [EUDT0040]
Logical device ID=600601602E030E00CD0FF19CCEE6D811 [LUN 3]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A
==============================================================================
---------------- Host --------------- - Stor - -- I/O Path - -- Stats ---
### HW Path I/O Paths Interf. Mode State Q-IOs Errors
==============================================================================
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016010601796d0s0 SP A0 active alive 0 1
3072 pci@1c,600000/lpfc@1/fp@0,0 c0t5006016910601796d0s0 SP B1 active alive 0 0
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016110601796d0s0 SP A1 active alive 0 1
3074 pci@1d,700000/lpfc@1/fp@0,0 c1t5006016810601796d0s0 SP B0 active alive 0 0

# vxdmpadm listctlr all
CTLR-NAME ENCLR-TYPE STATE ENCLR-NAME
=====================================================
emcp EMC_CLARiiON ENABLED EMC_CLARiiON0
c3 Disk ENABLED Disk

# vxdmpadm getsubpaths ctlr=emcp
NAME STATE[A] PATH-TYPE[M] DMPNODENAME ENCLR-TYPE ENCLR-NAME ATTRS
================================================================================
emcpower0c ENABLED(A) - emcpower0s2 EMC_CLARiiON EMC_CLARiiON0 -
emcpower1c ENABLED(A) - emcpower1s2 EMC_CLARiiON EMC_CLARiiON0 -
emcpower2c ENABLED(A) - emcpower2s2 EMC_CLARiiON EMC_CLARiiON0 -

# vxdisk -g dgamlap99 check emcpower0s2
emcpower0s2: Okay

# /etc/vx/bin/vxreattach

# vxmend -g dgamlap99 fix stale mantas-01

# vxmend -g dgamlap99 fix clean mantas-01

# vxvol -g dgamlap99 startall

# umount /dev/vx/dsk/dgamlap99/mantas

# /opt/VRTS/bin/fsck /dev/vx/rdsk/dgamlap99/mantas

# mount /dev/vx/dsk/dgamlap99/mantas /mantas

Friday, 10 April 2009

Identifying a Simple Raid / CONCAT on vxvm

Here we have 2 instances of simple RAID (i.e. CONCAT), where the disks are all lumped together to make one big virtual disk. In these examples we see just one plex with several subdisks under it. A quick way to check that it is simple RAID is to add up the subdisk lengths: if the total tallies with the length of the volume, all the available disks are just being concatenated into one big virtual disk;

v APP - ENABLED ACTIVE 142748172 SELECT - fsgen
pl APP-01 APP ENABLED ACTIVE 142748172 CONCAT - RW
sd PWDAML02-01 APP-01 PWDAML02 0 71374086 0 sdc ENA
sd PWDAML03-01 APP-01 PWDAML03 0 71374086 71374086 sdd ENA

71374086 x 2 = 142748172

Also, we know it is concat because the LAYOUT column says so on the PLEX line;

pl APP-01 APP ENABLED ACTIVE 142748172 CONCAT
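The size check can be done with awk rather than by hand. This is a sketch only, assuming the vxprint -ht column order shown in the headers further down (plex length in column 6 of pl lines, subdisk length in column 6 of sd lines); check_concat is a made-up name.

```shell
#!/bin/sh
# check_concat (hypothetical helper): read `vxprint -ht` output on stdin and,
# for each plex, compare its length with the sum of its subdisk lengths.
check_concat() {
    awk '
        $1 == "pl" { order[++n] = $2; len[$2] = $6; layout[$2] = $7 }
        $1 == "sd" { sum[$3] += $6 }    # column 3 of an sd line is its plex
        END {
            for (i = 1; i <= n; i++) {
                p = order[i]
                verdict = (sum[p] == len[p]) ? "sum-matches" : "sum-differs"
                printf "%s %s plex=%d subdisks=%d %s\n", p, layout[p], len[p], sum[p], verdict
            }
        }
    '
}

# The APP volume from above, fed in as sample input.
check_concat <<'EOF'
v APP - ENABLED ACTIVE 142748172 SELECT - fsgen
pl APP-01 APP ENABLED ACTIVE 142748172 CONCAT - RW
sd PWDAML02-01 APP-01 PWDAML02 0 71374086 0 sdc ENA
sd PWDAML03-01 APP-01 PWDAML03 0 71374086 71374086 sdd ENA
EOF
```

On a live host it would be fed from vxprint -g with the disk group name and -ht. The uppercase header lines (PL NAME ..., SD NAME ...) are skipped because the match on the first column is case-sensitive.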

Anyway, here are a couple of examples;

Disk group: dgdaml01

DG NAME NCONFIG NLOG MINORS GROUP-ID
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
V NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO

dg dgdaml01 default default 0 1146651645.1063.eupr0085

dm PWDAML02 sdc sliced 2330 71374086 -
dm PWDAML03 sdd sliced 2330 71374086 -

v APP - ENABLED ACTIVE 142748172 SELECT - fsgen
pl APP-01 APP ENABLED ACTIVE 142748172 CONCAT - RW
sd PWDAML02-01 APP-01 PWDAML02 0 71374086 0 sdc ENA
sd PWDAML03-01 APP-01 PWDAML03 0 71374086 71374086 sdd ENA

Disk group: dgdaml01

DG NAME NCONFIG NLOG MINORS GROUP-ID
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
RV NAME RLINK_CNT KSTATE STATE PRIMARY DATAVOLS SRL
RL NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
V NAME RVG KSTATE STATE LENGTH READPOL PREFPLEX UTYPE
PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE
SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE
SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE
DC NAME PARENTVOL LOGVOL
SP NAME SNAPVOL DCO

dg dgdaml01 default default 0 1118057845.1098.eupr0045

dm PWDAML01 sdc sliced 2074 71374086 -
dm PWDAML02 sdd sliced 2330 71374086 -
dm PWDAML03 sde sliced 2330 71374086 -
dm PWDAML04 sdf sliced 2330 71374086 -
dm PWDAML05 sdg sliced 2330 71374086 -

v amlapp01 - ENABLED ACTIVE 356870430 SELECT - fsgen
pl amlapp01-01 amlapp01 ENABLED ACTIVE 356870430 CONCAT - RW
sd PWDAML01-01 amlapp01-01 PWDAML01 0 71374086 0 sdc ENA
sd PWDAML02-01 amlapp01-01 PWDAML02 0 71374086 71374086 sdd ENA
sd PWDAML03-01 amlapp01-01 PWDAML03 0 71374086 142748172 sde ENA
sd PWDAML04-01 amlapp01-01 PWDAML04 0 71374086 214122258 sdf ENA
sd PWDAML05-01 amlapp01-01 PWDAML05 0 71374086 285496344 sdg ENA

Thursday, 9 April 2009

How to restart ssh daemon on linux

Obviously, if sshd is not running you will have to get in through the iLO / terminal server and then do the following;

cd /etc/rc.d/init.d/

./sshd start