Computer & Technology Related: Post here for help and discussion of computing and related technology. Internet, TVs, phones, consoles, computers, tablets and any other gadgets.

UNIX help please

Old 02 April 2003, 11:02 AM
  #1  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

Our mainframe at work keeps freezing up. When it happens, it will respond eventually, but usually only after a good few minutes. It normally ends with us having to reboot it, as we aren't overly clued-up on Unix.

I've read that it may be caused by the print daemon using excessive CPU cycles when a printer goes offline.

Any ideas what it could be, or how to fix it?

Cheers
Andy
Old 02 April 2003, 11:43 AM
  #2  
boxst
Scooby Regular
 
Join Date: Nov 1998
Posts: 11,905

Hello

If you can actually get a telnet session up, using the "top" command will show you the processes that are currently using the CPU.

This might give you a starting point.
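Where top isn't installed, sorting ps output by CPU usage gives a rough one-off snapshot of the busiest processes. A minimal sketch, assuming the system's ps supports the POSIX "-o" field list (check the man page on your flavour); top_cpu is a hypothetical helper name, not a standard command:

```shell
#!/bin/sh
# top_cpu (hypothetical helper): print the N busiest processes by %CPU,
# a rough stand-in for "top" on systems that don't ship it.
# Assumes ps supports the POSIX -o option. Giving every field an empty
# header ("pcpu=" etc.) suppresses the header line, so it can't end up
# in the sort.
top_cpu() {
    ps -e -o pcpu= -o pid= -o args= | sort -rn | head -n "${1:-10}"
}

top_cpu 5
```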

Steve.
Old 02 April 2003, 11:47 AM
  #3  
Gedi
Scooby Regular
 
Join Date: Jan 2003
Posts: 932

There is no way of telling just from reading the post. When it locks, see if you can work out which processes are and aren't working correctly.

If you're not clued up on Unix, then it's probable you're gonna have to get somebody in to take a look at it.

Here are a few analysis commands if you do wanna attack it yourself:

vmstat is a program that allows one to get a quick look at statistics about the memory, CPU, and disk subsystems.

mpstat is a program that is available on Solaris and Linux that allows one to see statistics about processor utilization.

sar, sa, lastcomm, and last all allow one to examine historical data as well as more recent events on the system.

ps, which stands for "process status", is used to show processes executing on the system and information about them.

lsof, or list open files, shows all the files that the operating system currently has open.

readelf displays the ELF (Executable and Linkable Format) headers of a binary file in detail.

ldd lists the shared object libraries an executable depends on, as recorded in its ELF headers.
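Not every flavour ships all of these; later in this thread top and vmstat both turn out to be missing on UnixWare 7.1. A quick way to see which ones are installed, as a sketch assuming a POSIX shell with the "command -v" builtin (check_tools is a hypothetical name):

```shell
#!/bin/sh
# check_tools (hypothetical helper): report whether each named command
# can be found on the PATH, using the POSIX "command -v" builtin.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "present: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

check_tools vmstat mpstat sar sa lastcomm last ps lsof readelf ldd
```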

There are literally thousands of commands and thousands of things that could be going wrong. However, you do seem to have a serious problem in the OS. We're talking about real operating systems here, not Micro$haft Windows... heh. They are not supposed to crash like Windows does. I think it's a built-in feature in Windows... lol.

*spelling corrections*


[Edited by Gedi - 4/2/2003 12:50:41 PM]
Old 02 April 2003, 11:47 AM
  #4  
DrEvil
Scooby Regular
 
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384

The top command isn't standard on all Unix flavours; it's usually installed specifically by the system admins, as it's freely distributed third-party software.

Could you tell us what sort of system it is?

i.e. Sun, HP, SGI, IBM, Fujitsu-Siemens?

Old 02 April 2003, 01:11 PM
  #5  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

It doesn't have top on it. I once downloaded it and tried to install it, but the installer kept falling over.

It's SCO UnixWare 7.1.

I'm quite good with Linux, but things you'd expect to be there aren't on this Unix.

The trouble with all the suggested commands is that we can't really run them when the problem arises. You can't telnet in (it says "session forked"), and if you were already logged in, the session doesn't respond to commands.

Typing them into a console on the server doesn't fare much better: it takes a few minutes to actually execute a command.

I suspect the version of UnixWare on the server is out of date, but it's on there and I've just gotta deal with that.

I really feel it has something to do with the printer daemon/spooler.

Andy
Old 02 April 2003, 01:17 PM
  #6  
stevem2k
Scooby Regular
 
Join Date: Sep 2001
Location: Kingston ( Surrey, not Jamaica )
Posts: 4,670

Run vmstat under nohup and send its output to a file. That way you'll have a 'history' of the subsystems next time it goes t1ts up.
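Something like "nohup vmstat 60 > /var/tmp/vmstat.log &" does what the post describes, but classic vmstat output carries no timestamps, and vmstat may not be installed at all. The sketch below wraps any command in a loop that date-stamps each sample, so it can log "ps -ef" or "sar" instead. The script name, log path and intervals are illustrative assumptions, and it needs a POSIX-compatible shell (on an older system you may have to run it with ksh rather than sh):

```shell
#!/bin/sh
# snaplog.sh (hypothetical name): append a timestamped snapshot of a
# command's output to LOG every INTERVAL seconds, COUNT times.
# Usage: nohup sh snaplog.sh LOG INTERVAL COUNT command [args...] &
#   e.g. nohup sh snaplog.sh /var/tmp/ps.log 60 1440 ps -ef > /dev/null 2>&1 &
snaplog() {
    log=$1 interval=$2 count=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "=== $(date) ===" >> "$log"
        "$@" >> "$log" 2>&1
        i=$((i + 1))
        # don't sleep after the final snapshot
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Only start logging when called with enough arguments, so the file can
# also be sourced just to define the function.
if [ "$#" -ge 4 ]; then
    snaplog "$@"
fi
```

Each snapshot starts with a "=== date ===" line, so after the next freeze the last few entries before the reboot show what was running.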

Steve
Old 02 April 2003, 01:25 PM
  #7  
ragnarock2
Scooby Regular
 
Join Date: Jan 2003
Posts: 502

Why not disable the print daemon and see if it still goes belly up? (I know it's a pain if people need prints straight away, but it will test your theory and give you an angle to attack from!!!)
Old 02 April 2003, 01:45 PM
  #8  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

vmstat doesn't seem to exist on our system.

Disabling the print daemon is not really an option due to the amount of jobs being processed and the intermittency of the fault.

Andy
Old 02 April 2003, 02:03 PM
  #9  
DrEvil
Scooby Regular
 
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384

Does 'sar' exist on the system? If yes, set that up.

Alternatively, you can get some info on the processes running with:

ps -aux
(I'm assuming BSD format commands on SCO)

You can see how much CPU time each process uses.
Old 02 April 2003, 02:20 PM
  #10  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

As a Linux person, I'd tried ps -axu, but it doesn't understand the x part, and u means "by userid", so that's no use either. I wouldn't say the information given by -a was helpful, as it doesn't even give usernames etc.

sar does exist, but when the system dies, sar just tells us what we already knew: the CPU is 0% idle.

I'm thinking it might be the "Shivaport Atom" boxes that a couple of our dot-matrix printers are hosted off. I think they're a kind of terminal server. Unfortunately, as the printers to blame appear to be on these, I can't just edit their interface script and put in a sleep command.

Cheers

Andy
Old 02 April 2003, 02:51 PM
  #11  
DrEvil
Scooby Regular
 
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384

Right, well based on that you are picking up a SysV version of ps.

Is there something like /usr/ucb/ps or /usr/ucb/bin/ps? If yes, then you should be able to run ps -aux.
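A small sketch that tries the BSD-compatible ps in the locations suggested above and falls back to the SysV "ps -ef" listing otherwise. Whether either /usr/ucb path exists on a given SysV system is an assumption to check; any_ps is a hypothetical helper name:

```shell
#!/bin/sh
# any_ps (hypothetical helper): use a BSD-style ps if one is installed
# under /usr/ucb, otherwise fall back to the SysV full listing.
any_ps() {
    for p in /usr/ucb/ps /usr/ucb/bin/ps; do
        if [ -x "$p" ]; then
            "$p" -aux    # BSD-style output includes %CPU and %MEM columns
            return
        fi
    done
    ps -ef               # SysV full listing: UID, PID, PPID, C, STIME, TTY, TIME, CMD
}

any_ps | head -n 5
```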

You might try changing the (u)limit settings for the root account; it may have an impact on what the process in question can allocate resource-wise...

Sorry, I've never used SCO unix.

Rgds, Alex
Old 02 April 2003, 02:53 PM
  #12  
DrEvil
Scooby Regular
 
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384

Ignore the ulimit bit... I misread your post...

Hmmm... perhaps lower the priority of the print daemon with 'nice' (or 'renice' if it's already running)... read the man page carefully though...
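nice sets the priority of a command as it is started; for a daemon that is already running, renice is the usual tool. A sketch, assuming the print scheduler is the SVR4 lpsched (check with "ps -ef" what yours is actually called; some ps implementations report the command name as a full path), that ps supports "-o pid= -o comm=", and that renice accepts the POSIX "-n increment -p pid" form. The helper names are hypothetical, and you will need to be root to renice a root-owned daemon:

```shell
#!/bin/sh
# pids_of (hypothetical helper): print the PIDs of processes whose
# command name is exactly $1.
pids_of() {
    ps -e -o pid= -o comm= | awk -v name="$1" '$2 == name { print $1 }'
}

# renice_daemon (hypothetical helper): add INCREMENT (default 10) to the
# nice value of every running process named $1. A higher nice value
# means a lower scheduling priority.
renice_daemon() {
    for pid in $(pids_of "$1"); do
        renice -n "${2:-10}" -p "$pid"
    done
}

# Example, as root: renice_daemon lpsched 10
```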
Old 02 April 2003, 08:03 PM
  #13  
orbv
Scooby Regular
 
Join Date: Apr 2001
Location: Hants
Posts: 1,103

On BSD Unix it's 'ps aux', not 'ps -aux'.
Old 02 April 2003, 08:37 PM
  #14  
warbs
Scooby Regular
 
Join Date: Apr 2002
Posts: 303

Maybe it's a memory problem, if you're getting "session forked" style errors.

Or it could be that /var is full. When you log in, space is required under /var for things like wtmp/utmp. If the filesystem is full, this entry can't be created, so some UNIX processes, like login, will fail. That wouldn't explain existing sessions breaking, though.

The printer system will also use space under /var to spool stuff. If it's screwing up, either not deleting transient files or creating huge ones (say a user tries to print a binary document), then /var fills up, everything hangs, you reset the machine, and magically /var is clean again until it happens again.

Use:
"df" to check filesystem capacity,
"du -s" to check which directory in a filesystem is taking up the space.
If it's a memory problem, then use "ps" to identify large processes:
"ps aux" (some versions of ps switch to BSD mode when the options are given without a leading "-").
If it's like Solaris, then something like "ps -e -o vsz,pid,args | sort -n" will list the processes in order of the amount of virtual memory they are using. Any huge processes will be obvious. You can then use "ps -f -p <PID>" to get more information about a particular process.
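The df and du checks suggested above can be scripted so they are quick to run from a console that is barely responding. A sketch, assuming a df that accepts the POSIX "-P" option (older SysV df output is laid out differently, so check yours); the helper names and the 90% default threshold are hypothetical choices:

```shell
#!/bin/sh
# full_filesystems (hypothetical helper): list mounted filesystems whose
# usage is at or above LIMIT percent (default 90). With -P, df prints one
# line per filesystem, with the capacity ("42%") in column 5 and the
# mount point in column 6.
full_filesystems() {
    df -P | awk -v limit="${1:-90}" 'NR > 1 {
        pct = $5; sub(/%/, "", pct)
        if (pct + 0 >= limit + 0) print $6 ": " $5
    }'
}

# biggest_dirs (hypothetical helper): show the N largest entries under DIR
# (default /var), largest last. du -s sizes are in blocks whose unit
# varies between systems (often 512 bytes on SysV, 1 KB on Linux).
biggest_dirs() {
    du -s "${1:-/var}"/* 2>/dev/null | sort -n | tail -n "${2:-10}"
}

full_filesystems 90
```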

Finally, what's on the console when the system freezes? Can you log in as root on the console when it's hung? Is the console switched on? Some UNIX systems hang when the console buffer fills if there is no console there.

A bit vague, I know, but it's a long time since I logged into a SCO system. Are the man pages installed?

Cheers

Chris
Old 02 April 2003, 09:36 PM
  #15  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

The /var idea sounds like it could be it. I'm not at work at the moment, but will check when I get in in the morning.

The server boots into a GUI, in which we normally have 4 console sessions open, 2 on each desktop (for various tasks). When the system goes **** up, any command typed into these is majorly delayed: "type a command, press Enter, wait 5 mins" sort of thing...

I'll check out the size of the /var partition tomorrow and report back.

Cheers,
Andy
Old 03 April 2003, 11:55 AM
  #16  
DrEvil
Scooby Regular
 
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384

orbv - that's as may be, but when running the BSD-compatible commands on SysV (Solaris 2.x, for example), you can use 'ps -aux'.
Old 03 April 2003, 12:10 PM
  #17  
Rich B
Scooby Regular
 
Join Date: May 2002
Posts: 119

Also try Ctrl-Alt-F1 or Alt-F1 (can't quite remember which); that will take you out of the GUI to the proper console, where you may be able to see any error messages.

Each function key will give you a different session, F1 being the console and F12 taking you back to the GUI.

Rich
Old 03 April 2003, 12:46 PM
  #18  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

The /var partition is only at 1% usage, so perhaps it's not that.

Next time it locks up I will try the function keys as above.

When it's a bit quieter over here, we'll probably go round, take each printer offline in turn for a few mins and submit a job to it, then see if that causes the problem. At least then we'd know which printer it is!!
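Before walking round the printers, the LP system's own status commands can show which printers and queued jobs the spooler knows about. A sketch, assuming SVR4-style "lpstat -p" output where each printer line begins "printer NAME" (worth checking on your system); printer_names is a hypothetical helper:

```shell
#!/bin/sh
# printer_names (hypothetical helper): read "lpstat -p" output on stdin
# and print just the printer names, e.g. from a line such as
#   printer dotm1 is idle.  enabled since ...
printer_names() {
    awk '$1 == "printer" { print $2 }'
}

# Example: show each printer's queued jobs in turn.
#   for p in $(lpstat -p | printer_names); do
#       echo "== $p =="
#       lpstat -o "$p"
#   done
```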


Andy
Old 03 April 2003, 04:14 PM
  #19  
ragnarock2
Scooby Regular
 
Join Date: Jan 2003
Posts: 502

Remember, the /var filesystem may be so small that a large print job will fill it up, so run df on it when the system goes to ****!
Old 03 April 2003, 05:17 PM
  #20  
DrEvil
Scooby Regular
 
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384

Perhaps reconfigure the server with a separate filesystem for /var/spool (if feasible)...
Old 03 April 2003, 06:46 PM
  #21  
Sheepsplitter
Scooby Regular
 
Join Date: Nov 2001
Posts: 1,072

Check you haven't run out of swap space.
The system could be thrashing!
Old 03 April 2003, 08:09 PM
  #22  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

Sheepsplitter:

As I say, I'm not clued up on UNIX. How do I do that?


Andy
Old 03 April 2003, 08:49 PM
  #23  
warbs
Scooby Regular
 
Join Date: Apr 2002
Posts: 303

How often is it hanging?

If it's often, don't use the X sessions on the console; just leave the console at a normal text login prompt.

According to Google, to get the console session active you press Ctrl-SysReq and then H to bring up the home (console) session.

This way, if the operating system detects a problem (a full filesystem, a network problem, a memory deficit, etc.), you should see some information on the console when the error occurs.

If it hangs and there's no message - either you're looking at the wrong screen or it's a more subtle problem.

Have you looked in the system logs to see if anything is there? According to Google, the system log is /usr/adm/syslog; if not, try looking in /var/adm or /var/adm/log.
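A sketch for checking those locations in one go: it prints the tail of any path that is a file and lists the newest entries of any path that is a directory. The paths come from the post above and are not verified for UnixWare; show_logs is a hypothetical helper name, and an older tail may want "tail -50" rather than "tail -n 50":

```shell
#!/bin/sh
# show_logs (hypothetical helper): for each PATH after LINES, print the
# last LINES lines if it is a file, or the ten newest entries if it is
# a directory. Paths that don't exist are skipped.
show_logs() {
    lines=$1
    shift
    for p in "$@"; do
        if [ -f "$p" ]; then
            echo "=== $p ==="
            tail -n "$lines" "$p"
        elif [ -d "$p" ]; then
            echo "=== newest entries in $p ==="
            ls -lt "$p" | head -n 11    # first line of ls -l is "total N"
        fi
    done
}

# Example: show_logs 50 /usr/adm/syslog /var/adm /var/adm/log
```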

Cheers

Chris



Old 03 April 2003, 09:01 PM
  #24  
Sheepsplitter
Scooby Regular
 
Join Date: Nov 2001
Posts: 1,072

See if you can use sar, vmstat or top to see how much free swap space you have.
If you have sar, run it in the background and note whether free swap dips just as you have the problem.
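A sketch that uses whichever swap-reporting command the system has: "swap -l" on SVR4-derived systems (UnixWare should have it, but check the man page), "lsps -a" on AIX, and "free" on Linux. For the background history, SVR4 sar's "-r" option reports free memory and free swap, so something like "nohup sar -r 60 1440 > /var/tmp/sar_r.log 2>&1 &" would sample once a minute for a day; confirm the flags locally. show_swap is a hypothetical helper name:

```shell
#!/bin/sh
# show_swap (hypothetical helper): report swap usage with whichever
# tool this Unix flavour provides; returns 1 if none is found.
show_swap() {
    if command -v swap >/dev/null 2>&1; then
        swap -l        # SVR4-derived: one line per swap area, with free blocks
    elif command -v lsps >/dev/null 2>&1; then
        lsps -a        # AIX paging spaces
    elif command -v free >/dev/null 2>&1; then
        free           # Linux: memory and swap totals
    else
        echo "no known swap tool found" >&2
        return 1
    fi
}

# Example: show_swap
```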
Old 03 April 2003, 09:55 PM
  #25  
SiDHEaD
Scooby Regular
Thread Starter
 
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196

It hung this morning; the last time was 2 days ago. Unfortunately I was in a meeting at the time, so I couldn't try anything suggested. As usual, our mainframe guy just reset it.

My boss and I are not at all happy about resetting it all the time, and if anything it's beginning to look like a bit of a joke to the users.

I will be sure to try it all next time.

Andy
Old 03 April 2003, 10:08 PM
  #26  
ragnarock2
Scooby Regular
 
Join Date: Jan 2003
Posts: 502

On AIX you can use lsps -a to view the swap space. Try looking in the man pages for lsps, or just type lsps: if the command exists, it may give you the correct syntax.
All times are GMT +1.