WORKBOOK

THE LOG BOOK | DAILY | WEEKLY | MONTHLY | YEARLY
SOMETIMES | USERS | "LOOK AT IT" COMMANDS

I’ve spent 12 years trying to decipher HP-UX manuals and struggling through reams of computer jargon just to do my job. In those 12 years I’ve managed to pick up a few tricks to make that job a lot simpler and easier. You probably won’t find most of this information in the manuals, and some of it may even seem to run contrary to what you’ve been taught by trainers who never actually worked outside of a classroom. The following is a brief overview of the responsibilities of a busy System Administrator from my point of view and based on my experience.

Each computer should have one, and only one, person responsible for its health and well-being - that person is known as the computer’s System Administrator or SA. No one can or should do anything on or to the computer without the express consent of its SA.

The SA is responsible for security of the machine and any peripherals attached to it. It should be kept clean and dust free and in excellent repair. Two of the biggest areas of concern will be the day to day handling of users and their problems and the ever present backups.

The SA is also responsible for the condition of the operating system: providing updates when applicable and providing patches between updates. The best way to keep current with security issues and patches is to subscribe to the Hewlett-Packard Support Line Mail Service. Send an email to support@us-external.hp.com. Don’t put anything in the subject. Put subscribe help in the body.

HP’s WEB page - The HP Electronic Support Center is at: http://us-support.external.hp.com/. Certain areas of the site have to be subscribed to and you need a ‘handle’ and/or an active support contract to get in. The rest of the site is good to browse through.

The work done to keep the operating system in good working order carries forward to all the other installed software. Provide updates and/or patches and/or scripts as necessary.

In the SA’s absence (scheduled and non-scheduled) the SA will appoint another team member to ‘stand in’ and assume all of their responsibilities.


THE LOG BOOK
The System Administrator will keep a log book, one for each of their machines, containing any and all information necessary to carry on the operation of the computer in their absence. The book should have, at least, the following sections:
Service Calls
Patches
Configuration
Software
Crons and Scripts
System Recovery Information
Forgettable Commands

Is the LOG BOOK important? If you have never kept one before you may not think so. But I assure you the effort is well worth it. After you use it for a while I guarantee it will be one of the first things you will want to set up for every one of your systems.


Service Calls -
The purpose of this section is to keep a running commentary (diary) on the computer under your control. Use it for problem resolution, updates (both software and hardware), new peripherals, replacements, etc. Write it all out, in English not computereeze. The more complete the record the easier it will be to track problems and solutions. I have found it invaluable in working with vendors who can conveniently forget or conveniently remember that they did something that they really didn’t (attached is a sample page).
A brief description of the problem, replacement, new peripheral, etc.
When you called - date and time
Who was called - PRC/HP or a vendor?
Stuck on hold? For how long?
The name of the person who will be helping with the problem
Reference # - it’s a good idea to highlight this number so you can find it easily
This person’s ideas of what’s wrong and their solution
Will they have a tech call? When?
Will they call you back? When?
Are they turning you over to someone else?
When was the work performed?
Who did it?
When did they show up?
When did they leave?
What did they do?
Were there new or replacement parts?
What was the old serial # and the new one?
Personal comments
Next problem
Next situation

This is just normal, common sense advice, nothing earth-shaking, but, if you keep accurate records you will be able to spot trouble before it happens and know who to call and who to stay away from. You will also have a running record of what’s been added to your computer and what’s been removed, and why.


Patches -
There are two schools of thought when it comes to patches. The first is - "if it ain’t broke don’t fix it". Don’t add patches just because you can. Add them if and when needed. The other school says "it must be broke if HP made a patch to fix it". Me? I look at the patch; if it relates to something I'm using, I install it. I don’t ‘mass install’ patches. For example - I don't install Journaled File System patches on ‘trusted systems’ since I can’t use JFS on a trusted system (currently).

Regardless of which one of the above schools you subscribe to, keep an up-to-date, hard copy list of the patches currently on your system. The output of each of the next two commands should be sent to a file [filename], edited so you can use it, and put in the book.

    cat /var/adm/sw/patch/PATCH.log > [filename]
    -- for a list of currently installed patches and their install dates. This file should list exactly the same patches as the next command.
    swlist -l fileset > [filename]
    -- to see a list of all software on the computer along with a brief description of the patch. This file should list exactly the same patches as the above command. You’ll find the patch section about halfway through 16 or more pages of installed software.

Use swlist -l fileset | grep [patch name] to check whether a particular patch exists.
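The cross-check described above (both saved files should name exactly the same patches) can be sketched as a pair of small shell functions. This is only a sketch under assumptions: the function names extract_patches and compare_patch_lists are made up for illustration, and it assumes patch names follow the usual HP-UX pattern of PH, two letters, an underscore and digits (PHCO_12345, for example).

```shell
#!/bin/sh
# Sketch only: the function names are illustrative, not HP-supplied tools.

# extract_patches FILE
# Print each distinct patch ID (PHxx_nnnnn) found anywhere in FILE, sorted.
extract_patches() {
    tr -cs 'A-Za-z0-9_' '\n' < "$1" | grep -E '^PH[A-Z][A-Z]_[0-9]+$' | sort -u
}

# compare_patch_lists FILE1 FILE2
# Print the patch IDs that appear in only one of the two files.
# Column 1: only in FILE1. Column 2 (tab-indented): only in FILE2.
compare_patch_lists() {
    cp_a=$(mktemp) && cp_b=$(mktemp) || return 1
    extract_patches "$1" > "$cp_a"
    extract_patches "$2" > "$cp_b"
    comm -3 "$cp_a" "$cp_b"
    rm -f "$cp_a" "$cp_b"
}
```

Run compare_patch_lists against the two files you saved for the book. No output means the two lists agree; anything printed is a patch that one list knows about and the other does not.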

I also don’t keep tar and gz files once they’re used. I can always get them again if I need them. However, if a client is uncomfortable with their removal, back them up to tape.

Check to see that you have only one copy of the patch information on your disk, not two or three. They take up a fair amount of storage space. You’ll find the patches (by default) in /var/adm/sw/products and /var/adm/sw/patch.


Configuration -
The purpose of this section is to give you a place where you can quickly find things like the amount of memory in the computer, its serial number, address and subnet mask on the network, name, gateway, and any other information that you may need from time to time.

You should also keep in this section a drawing showing the physical layout of the computer as you know it to be, and change it, correct it or add to it as the system grows and/or changes. It doesn’t have to be colorful or beautiful, just accurate.

This is a good section in which to keep the original (or, at least, copies of the original) contracts, purchase orders, delivery slips, etc. Add to it the paperwork for peripherals, additions and expansions. Don’t throw away paperwork from changed or replaced items - just mark them as removed or replaced, and note when, by whom and by what.

It’s also a good idea to keep hard copies of current:

ioscan -kf – for a list of all drivers and hardware paths.
bdf – for a list of mounted file systems.
/etc/fstab – for a list of information about those file systems.


Software -
This is the section which should house copies of software licenses and agreements, customer contracts, customer needs, etc.

If you don’t have access to this paperwork, write down where it is kept and who has it. Where are the installation tapes, CDs or disks, and who has them?

Store a hard copy list of software currently on the computer. Update it when things change.

swlist -l fileset -- for a list of all software on the computer


Crons and Scripts -
It is very important to keep a hard copy print out of crons. Don’t forget to also list:

who they are for
what they are
when they execute
what they do
why they were created
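A print-out of every crontab on the machine can be produced with a short loop. The sketch below assumes the crontabs live one file per user in a single directory (the document names /var/spool/cron/crontabs) and that you are running as root so the files are readable; the function name dump_crontabs is made up for illustration.

```shell
#!/bin/sh
# Sketch only: dump_crontabs is an illustrative name, not a system command.

# dump_crontabs DIR
# Print every crontab file in DIR, each under a header naming its owner.
dump_crontabs() {
    for dc_file in "$1"/*; do
        [ -f "$dc_file" ] || continue
        echo "===== crontab for user: $(basename "$dc_file") ====="
        cat "$dc_file"
        echo
    done
}
```

Something like dump_crontabs /var/spool/cron/crontabs > [filename] gives you one file to print. The who, what, when and why still have to be written in by hand.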

SSO Montgomery has provided a cron for its system accounting. This is a rather extensive cron file and can be found in /var/spool/cron/crontabs/root. Information on its purpose and use can be found in the SSO USAM Software Users Manual.

It is also very important to keep a hard copy print out of any scripts that you write, or that are written for you, for use on your computer. Don’t forget to also list:

where they are located
who they are for
when they execute and why
what they do
why they were created


System Recovery Information -
Every one of these sections is important but I think this is the most important - keep a hard copy of the output from all of the following commands:

/sbin/vgdisplay -v /dev/[volume group] - for each volume group.
/sbin/lvdisplay -v /dev/[volume group]/[logical volume] - for each logical volume.
/sbin/lvlnboot -v - boot definitions for all bootable volumes.
/etc/fstab - static information about all file systems.
/usr/bin/bdf - information about mounted file systems. Keep it current.
/usr/sbin/swapinfo - information about your swap areas.
/sbin/pvdisplay -v /dev/c?t?d? - for each physical volume.
/sbin/ioscan -fnC disk - for information on all hard drives and arrays.
/sbin/ioscan -kf - for information on all peripheral devices.

Also keep printed information from any and all software packages that use logical volumes for their data storage. Many databases support the use of raw logical volumes. You must have a list of the logical volumes used by the databases and any raw volumes (so that they aren’t overwritten) - print it out.

Keep each of the above printouts current. As changes are made to volume groups, logical volumes and/or other peripheral devices make sure the printouts reflect the changes.
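Keeping those printouts current is easier if capturing them is a single step. A minimal sketch follows, assuming you want one text file per command with a header saying what was run and when; the function name save_output and the directory in the usage comments are hypothetical.

```shell
#!/bin/sh
# Sketch only: save_output and the example directory are illustrative.

# save_output DIR LABEL COMMAND [ARGS...]
# Run COMMAND and store its output (stdout and stderr) in DIR/LABEL.txt,
# preceded by two header lines recording the command and the time.
save_output() {
    so_dir=$1
    so_label=$2
    shift 2
    mkdir -p "$so_dir" || return 1
    {
        echo "# command: $*"
        echo "# taken:   $(date '+%Y-%m-%d %H:%M')"
        "$@" 2>&1
    } > "$so_dir/$so_label.txt"
}

# Example use with commands from the list above (directory is hypothetical):
#   save_output /var/tmp/logbook bdf      /usr/bin/bdf
#   save_output /var/tmp/logbook lvlnboot /sbin/lvlnboot -v
#   save_output /var/tmp/logbook ioscan   /sbin/ioscan -kf
```

Print the resulting files and put them in the book; the date in the header tells you at a glance how stale a page is.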


Forgettable Commands -
I also keep a section on commands that I don’t use every day or can’t remember (this section gets bigger every day). I would like to think that I can remember what a certain command is for a given situation but I find that in reality I cannot. So I write it down. For example -

To find files over 2MB and list them with the path, size, etc. -

find / -size +2000000c -exec ll {} \;

To send it to a file that you can read later, add >ecs1 (or any file name you wish) to the above command just before you execute it -

find / -size +2000000c -exec ll {} \; >ecs1

To look at the ecs1 file and sort it on the third column -

cat ecs1 | sort -k3 > ecs2 - the new sorted file will be ecs2.
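The find and sort steps can also be combined in one pipeline. The sketch below is a variation, not the commands above: it sorts on the size column (the fifth field of a long listing) rather than the third, uses ls -l in place of ll, and the function name big_files is made up.

```shell
#!/bin/sh
# Sketch only: big_files is an illustrative name.

# big_files DIR BYTES
# Long-list every regular file under DIR larger than BYTES bytes,
# biggest first (field 5 of "ls -l" output is the size in bytes).
big_files() {
    find "$1" -type f -size +"$2"c -exec ls -l {} \; | sort -k5,5nr
}
```

For example, big_files / 2000000 > ecs1 leaves the sorted list in ecs1.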

To find a user with UID 215 -

find -fsonly hfs -acl opt -a (! - user 215)_ -a -acl

To find the user with UID 215 and change the ownership of all his or her files to the user with UID 315 -

find / -fsonly hfs -user 215 -exec /bin/chown 315 {} \;

OK, so if you can remember things like this - forget writing it down, you don’t need this section. Me? … well, I have to write it down.

As well as commands that I tend to forget, I also keep a hard copy of the most current /etc/group and /etc/passwd so that I always know what users and groups exist on the computer.
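When all you have is a UID (as in the find examples above), the passwd file will tell you whose it is. A minimal sketch, assuming the standard colon-separated passwd layout with the login name in field 1 and the UID in field 3; the function name user_for_uid is made up.

```shell
#!/bin/sh
# Sketch only: user_for_uid is an illustrative name.

# user_for_uid UID [PASSWD_FILE]
# Print the login name(s) that own UID. PASSWD_FILE defaults to /etc/passwd.
user_for_uid() {
    awk -F: -v uid="$1" '$3 == uid { print $1 }' "${2:-/etc/passwd}"
}
```

user_for_uid 215 prints the login name, or nothing if no account has that UID. More than one name means the UID is shared, which is worth a note in the book.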


DAILY

Quick Eyeball Check | bdf | Core File Information | tmp | Backup | SAM Logs | Array Check

Quick Eyeball Check - Open all the doors on the T520 and arrays - check to see that all lights are green and control panels show no problems.
There are some solid and flashing lights which are ‘normal’
Service Processor - flashing green - OK
0 0 - flashing red numerals - OK
High Voltage Power Supply - flashing yellow - OK
Console/Lan SCSI I/O card - flashing yellow - OK
Do a quick visual check on all cables for wear and/or abuse. Visually check all connections for looseness.


bdf -
As root, run bdf. Check to see if all the file systems that should be mounted are, and that none of the file systems are full or getting that way.

It is a good idea to keep a record of bdf output, concentrating on file system sizes. Try to get a feel for how fast the data is growing and on which file systems. This will give you an idea of when:

file systems need to be enlarged
new storage areas (hard disks and/or arrays) are needed
files need to be trimmed
old data needs to be removed or archived
It will also give you an idea of how much storage each of your clients is now using and will allow you to make a fairly accurate guess as to how much they will be using in the future.
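The "full or getting that way" check can be automated by reading the bdf output with awk. This is a sketch under assumptions: it expects the usual bdf layout (a header line, then device, kbytes, used, avail, %used and mount point, with a long device name sometimes pushing the numbers onto the next line), and the function name full_filesystems is made up.

```shell
#!/bin/sh
# Sketch only: full_filesystems is an illustrative name.

# full_filesystems LIMIT
# Read bdf-style output on standard input and print "mount-point percent"
# for every file system that is at or above LIMIT percent used.
full_filesystems() {
    awk -v limit="$1" '
        NR == 1 { next }        # skip the header line
        NF == 1 { next }        # long device name; the figures follow on the next line
        {
            pct = $(NF - 1)     # %used is always the second-to-last field
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $NF, pct
        }'
}
```

Use it as bdf | full_filesystems 90. Saving the dated bdf output itself, day after day, is still what gives you the growth history.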


Core File Information -
Core files are important for locating system errors. Currently the only ones who can effectively use core files are HP themselves. They will normally call into the system, and run check programs against the core files to find problems.

Unless you are saving the core files to solve a problem they should be deleted.

If you are using them for problem solving, you need to make sure that the core and crash dumps run to completion. If the dump area is too small the dump will not complete and HP will not be able to find the answers to the problem in the file.

Provide an area on one of the drives, or arrays, as large as your system memory plus 50%. Not only must the system dump the entire contents of its memory into that area, it must also have some extra room to compress the files it creates.

As root, remove ALL core and crash dumps unless instructed not to by HP or PRC. Standard memory (core) dumps will normally go to /var/adm/crash by default. Diagnostic memory dumps (if Hewlett-Packard Diagnostics is running) will go to /tmp/syscore.

A typical command to remove all core files more than 24 hours old -

find / -name "core" -type f -mtime +0 -exec rm -f {} \; >/dev/null

Modify this as necessary to find and remove by size or date.


tmp -
/tmp is exactly what it says - it is a temporary holding place. Things that wind up here are temporary in nature and can (and should) be removed on a regular basis. A good rule of thumb is to remove anything that has been there for 14 days. You can write a little script and put it in cron to do it automatically for you.
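The "little script" for cron might look like the sketch below. The assumptions are mine: it removes regular files only, it leaves any directory named syscore alone (HP diagnostics use /tmp/syscore), and the function name clean_tmp is made up. Note that find's -mtime +14 matches files last modified 15 or more days ago.

```shell
#!/bin/sh
# Sketch only: clean_tmp is an illustrative name.

# clean_tmp DIR [DAYS]
# Remove regular files under DIR that were last modified more than DAYS
# days ago (default 14), without descending into any "syscore" directory.
clean_tmp() {
    ct_dir=$1
    ct_days=${2:-14}
    find "$ct_dir" \( -type d -name syscore -prune \) -o \
        \( -type f -mtime +"$ct_days" -exec rm -f {} \; \)
}
```

Try it first with -print in place of the -exec clause to see what would go. Once you trust it, a root crontab entry that calls clean_tmp /tmp every night does the job for you.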

Users have a tendency to store little working projects in /tmp. They should be encouraged to set up an area in their /home directory. They should also be encouraged not to make it a dumping place for tons of little programs and junk they seldom if ever use. Keep their area neat, clean and small.

Remove all sub-directories except /tmp/syscore - certain HP diagnostics will use this directory.

Remember to check all /tmp sub-directories on the computer. Usually each program has its own /tmp directory.

The /tmp subdirectory can be found on most systems at these locations:

/.sso/.gateway/tmp
/.sso/.gateway/tmp/tmp
/.sso/.lib/tmp
/.sso/.lib/tmp/tmp
/.sso/.opsys/tmp
/.sso/.opsys/tmp/tmp
/.sso/.oracle/tmp
/home/ftp/pub/patch/tmp
/tmp
/usr/tmp
/var/adm/sw/patch/tmp
/var/tmp
/var/opt/pd/tmp
/var/spool/cron/tmp


Backup -
Different clients require different backup strategies and these should be worked out with them at contract time.

There are standard practices for backing up any and all data stored on the computer. One of the most common practices is to backup incrementally every evening and do a full backup sometime on the weekend.

Incremental means to back up only those files which have changed during the day. This saves tape and time initially but becomes a very time consuming and exacting task when restoring lost or corrupted files.

Full means backup everything whether it has changed or not.

Every evening means at a time when the system can be taken down to single user mode and it will not affect any clients.

Weekends means the same as every evening except the downtime is longer because you have more to back up.
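One common way to decide what goes into an incremental is a marker file: touch it when a backup starts, and next time pick up everything newer than it. A minimal sketch, assuming you feed the resulting list to whatever backup program you use; the function name changed_since and the marker path in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch only: changed_since and the marker location are illustrative.

# changed_since MARKER DIR
# List regular files under DIR modified more recently than the file MARKER.
changed_since() {
    find "$2" -type f -newer "$1" -print
}

# Typical sequence (paths are hypothetical):
#   touch /var/adm/backup.next                     # taken before the backup starts
#   changed_since /var/adm/backup.marker /home     # the list to back up
#   mv /var/adm/backup.next /var/adm/backup.marker # only after a good backup
```

Taking the new marker before the backup starts means a file changed while the tape is running is caught again next time instead of being missed.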

Create a cron to do it for you.

Remember that DDS tapes have a finite lifespan - about 50 uses is the maximum.


SAM logs -
SAM keeps track of a lot of information and stores it in ‘log’ files in many, many places on the hard drive. The information in these files is important and should be ‘checked out’ on a regular basis. These files, if allowed to grow unchecked, will take over the hard drive and fill their respective file systems to capacity.

SAM provides a way to view the log files, and keep them in control by giving you a quick and easy way to either delete them or ‘trim’ them to default sizes.

As root, - sam - Routine tasks - System Log Files -

check to find errors, intrusions, etc.
trim as necessary


Array Check -
Arrays and their configuration are discussed later on in this workbook.

As root, run arraydsp -i. This will give you the array numbers you will need for the arraydsp -v command. You can also find the array numbers from the menu on each of the arrays.

Then run arraydsp -v [array number] on every array to do a complete check on all of the components of the array.

Look for "good" displayed by each hard drive. If the word good is missing from any of the drives that drive needs to be replaced.


WEEKLY


Backup -
I probably should just copy the backup part from the daily section above - and maybe I will. The only difference is that this backup will take longer because it is a full backup and it should be done at the most remote part of the weekend when you think nobody in their right mind should be awake enough to be on a computer. I will try to give more specific information on backups later in this epistle. However, for now ................

Different clients require different backup strategies and these should be worked out with them at contract time.

There are standard practices for backing up any and all data stored on the computer. One of the most common practices is to backup incrementally every evening and do a full backup sometime on the weekend.

Incremental means to back up only those files which have changed during the day. This saves tape and time initially but becomes a very time consuming and exacting task when restoring lost or corrupted files.

Full means backup everything whether it has changed or not.

Every evening means at a time when the system can be taken down to single user mode and it will not affect any clients.

Weekends means the same as every evening except the downtime is longer because you have more to back up.

Create a cron to do it for you.

Remember that tapes have a finite life - about 50 uses is the maximum.
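Nobody remembers how many times a tape has been through the drive, so let a file count for you. The sketch below keeps one "label count" line per tape in a plain text file and warns at the limit; the function name record_tape_use, the file location and the label format (no spaces) are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: record_tape_use and the count file are illustrative.

# record_tape_use COUNTFILE LABEL [MAX]
# Add one use to LABEL in COUNTFILE (creating the entry if needed), print
# "LABEL count", and print a warning once count reaches MAX (default 50).
record_tape_use() {
    rt_file=$1
    rt_label=$2
    rt_max=${3:-50}
    touch "$rt_file" || return 1
    rt_tmp=$(mktemp) || return 1
    awk -v label="$rt_label" '
        $1 == label { $2 = $2 + 1; found = 1 }
        { print }
        END { if (!found) print label, 1 }' "$rt_file" > "$rt_tmp" &&
        mv "$rt_tmp" "$rt_file"
    rt_count=$(awk -v label="$rt_label" '$1 == label { print $2 }' "$rt_file")
    echo "$rt_label $rt_count"
    if [ "$rt_count" -ge "$rt_max" ]; then
        echo "WARNING: tape $rt_label has reached $rt_count uses; retire it"
    fi
}
```

Call it from the backup cron with the label of the tape that is in the drive, and write the same label on the tape itself.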

Where are these backup tapes stored?

Who is responsible for making sure the tapes are safe and that they are in some kind of arrangement that makes them easy to find? For instance, are they stored by client/date or by machine/date, or just thrown in a box somewhere?

Is the documentation on each backup tape complete enough to ensure that everyone knows what it is, where it is from, and what kind of program did the backup - tar, sam, cpio, etc.?


MONTHLY
This is easy - outside of all the other work you have to do on a daily basis - at least once a month clean the dust from the door grills on the arrays, JBOD, power modules and fan units. Big deal you can do it in your sleep, right? Just don’t forget to do it.


YEARLY
As if you had nothing else to do, on a yearly basis you need to find out the status of the contracts that concern your computer. Are the service contracts for hardware and software current and renewed?

And, make sure the batteries in the SCSI I/O cards are in good shape. They need to be replaced every other year.


SOMETIMES
Backups are wonderful things and they save the end user much time and aggravation if they are done correctly and when they are supposed to be done. But there are those times when, no matter how careful you are and how much planning and thought you put into the backup procedure, somebody is going to lose something that came and went between backups. Just for the heck of it, on the spur of the moment and without taking the computer down to single-user level, I’ll pop a tape in the drive and back up an area I think gets the most traffic. I write the date and time on the label and store it with the rest. I can’t tell you how many times that single tape has had the needed files on it. Magic stuff.


USERS
Users hold a special place in the heart of every System Administrator. And it’s real close to the relationship between System and Network Administrators. Yet without these people SysAdmins wouldn’t have a job. Grin and bear it.

Think of how boring the job would be without the intensely insidious random interruptions, lost files, unresponsive monitors, modems and CPU’s, printers that won’t print and plotters that won’t plot.


Setup -


Passwords -


Maintenance -


Home -


"LOOK AT IT" COMMANDS


Arrays | Crons | Disks | File Systems | Logical Volumes | Oracle | Processes
Run Levels | Software | Swap | Users | Volume Groups


Arrays -
ps -ef | grep arraymon - checks to see if the ARMServer array server daemon is running. If it is not, array commands will not work.
/sbin/init.d/hparray start - will start the ARMServer daemon.
/opt/hparray/arraydsp -i - lists array id (serial) numbers.
arraydsp -v [array_id] - gives a lengthy description of the array, fans, drives, power units, LUNs, etc.
/usr/sbin/hpC2400/arrayscan - is an older array command that will list the disk arrays attached to the computer and their SCSI addresses.
arraylog -e [array_id] - displays the contents of the disk array controller event log.
arraylog -u [array_id] - displays the contents of the disk array controller usage log.


Crons -
more /var/spool/cron/crontabs/root - lists all crons, dates, times, etc. that currently affect the system.
crontab -l - lists all crons, dates, times, etc. that currently affect the system.


Disks -
ioscan -fnC disk - lists all devices attached to the computer which are known as disks. There will be entries for these disks, usually at 0/52 with target numbers of 6, 9, 11, and 14. 0/52.6.0 (/dev/dsk/c3t6d0) is the default boot disk.
diskinfo -v /dev/rdsk/[c3t6d0] - gives a verbose description of the disk. Size, manufacturer, blocks, etc. Note that you must use the raw (character) device file to obtain this information.
du - and its associated switches, a, b, r, s, t, and x, give a cryptic list of files and/or file systems and the amount of disk space they currently occupy.
lssf /dev/dsk/[c3t1d0] - lists information about a device special file.
more /etc/lvmpvg - lists physical volume group information file.
pvdisplay -v /dev/dsk/[c3t1d0] - gives a lengthy description of the disk in its relationship as part of a Volume Group. If the disk is not part of a Volume Group this command produces no results. There is also a rather long list of the PEs and all of the associated Logical Volumes and their LEs.


File Systems -
bdf - gives a list of the currently mounted file systems, size, usage and mount point.
more /etc/fstab - lists the file systems available on the computer, mount points, type of file system, etc. It does not tell whether or not a particular file system is mounted.
more /etc/checklist - yields the same results as the fstab command above.
more /etc/mnttab - table of devices mounted by the mount command.
mount - lists all mounted file systems, mount points, and when they were mounted.
quot -F hfs [file system] - gives the file system name, mount point, file owner name and number of files owned.
df - displays the number of free 512-byte blocks and free inodes available for all mounted file systems.


Logical Volumes -
lvdisplay -v /dev/[Volume Group]/[Logical Volume] - gives a complete listing of information concerning the Logical Volume including what Volume Group it belongs to, size and number of PEs and LEs, a listing of Physical Volume PEs and their relationship to the Logical Volume’s LEs.


Oracle -
ps -ef | grep ora - checks the running processes to see if there is a set of daemons running for each entry in /etc/oratab - if not, the database with no daemons is not working.
cat /etc/oratab - gives a list of Oracle databases on the computer. Those that will be started at boot-up time will have a Y next to them.
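Picking the boot-time databases out of oratab can be done with one awk line. A minimal sketch, assuming the usual oratab layout of SID:ORACLE_HOME:Y-or-N with # starting a comment; the function name autostart_sids is made up.

```shell
#!/bin/sh
# Sketch only: autostart_sids is an illustrative name.

# autostart_sids [ORATAB]
# Print the SID of every database flagged Y (start at boot) in ORATAB,
# which defaults to /etc/oratab. Comment and malformed lines are skipped.
autostart_sids() {
    awk -F: '/^[ \t]*#/ || NF < 3 { next } $3 == "Y" { print $1 }' "${1:-/etc/oratab}"
}
```

Each SID it prints should then show up in the ps -ef | grep ora output; a SID with no processes behind it is the one to go look at.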


Processes -
ps -ef - lists all running processes
kill [process number] - for a gentle stop of the process
kill -9 [process number] - to drop it dead in its tracks


Run Level -
more /etc/inittab - is the script for the boot init process. It is the script that the computer uses to start the various processes at boot time. The first line usually contains the default run level of the computer.
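If all you want from inittab is the default run level, you don't have to page through the whole file. A minimal sketch, assuming the standard id:run-level:action:process layout in which the default is the entry whose action is initdefault; the function name default_runlevel is made up.

```shell
#!/bin/sh
# Sketch only: default_runlevel is an illustrative name.

# default_runlevel [INITTAB]
# Print the run level from the first "initdefault" entry in INITTAB,
# which defaults to /etc/inittab. Comment lines are ignored.
default_runlevel() {
    awk -F: '!/^#/ && $3 == "initdefault" { print $2; exit }' "${1:-/etc/inittab}"
}
```

Compare what it prints with who -r, which shows the run level the system is actually in right now.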


Software -
swlist - lists all software products installed on the computer.
swlist -l bundle - displays information about a software product (bundle) installed on the computer.


Swap -
swapinfo - lists the currently available swap areas and their respective sizes.

See also swap


System -
top - displays and updates information about the top ‘processor hog’ processes.


Users -
w - lists the current time, how long the system has been up in days, hours and minutes, how many users are currently on the computer, when they logged in, etc.
who - lists who is currently logged onto the computer, when they logged on and what terminal they are using.
man who - shows you the manual page for the who command. Who is a very useful command with many switches and variations. For instance who -b displays the last system boot date and time and who -a lists everything.
whoami - returns your current login name.
uptime - returns, among other things how many users are on the system and how long the system has been 'up'.


Volume Groups -
vgdisplay -v [Volume Group] - gives a complete listing of information concerning the Volume Group including the Logical Volumes it contains, what PVs are being used, size and number of PEs, etc.
strings /etc/lvmtab - lists all Volume Groups and their respective Physical Volumes.

Prepared by: Everette Smith, Impact Innovations Government Group, Inc.

