UNIX help please
#1
Scooby Regular
Thread Starter
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196
Our mainframe at work keeps freezing up. When it happens, it does respond eventually, but usually only after a good few minutes. We normally end up having to reboot it, as we aren't overly clued-up on Unix.
I've read that it may be caused by the print daemon using excessive CPU cycles when a printer goes offline.
Any ideas what it could be or how to fix it?
Cheers
Andy
#2
Hello
If you can actually get a telnet session up, using the "top" command will show you the processes that are currently using the CPU.
This might give you a starting point.
Steve.
#3
There is no way of telling from just reading the post. See if you can work out which processes are and aren't working correctly when it locks up.
If you're not clued up on Unix, then it's probable that you are gonna have to get somebody in to take a look at it.
Here are a few analysis commands if you do wanna attack it yourself:
vmstat is a program that allows one to get a quick look at statistics about the memory, CPU, and disk subsystems.
mpstat is a program that is available on Solaris and Linux that allows one to see statistics about processor utilization.
sar, sa, lastcomm, and last all allow one to examine historical data as well as more recent events on the system.
ps, which stands for "process status", is used to show processes executing on the system and information about them.
lsof, or list open files, shows all the files that the operating system currently has open.
readelf displays in detail the ELF (Executable and Linkable Format) headers of a binary file.
ldd reads the ELF headers of an executable and shows which shared object libraries it depends on.
There are literally thousands of commands and thousands of things that could be going wrong. However, you do seem to have a serious problem in the OS. We're talking about real operating systems here, not Micro$haft Windows....heh. They are not supposed to crash like Windows does. I think it's a built-in feature in Windows..lol.
*spelling corrections*
[Edited by Gedi - 4/2/2003 12:50:41 PM]
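As a rough sketch of how the commands listed above could be tried in one go: the script below runs each tool only if it is installed, since not every Unix ships all of them. The helper names run_if_present and snapshot are made up for illustration, and the arguments each tool accepts may differ on your system.

```shell
#!/bin/sh
# Run a diagnostic command only if its tool is installed, and label the output.
# (run_if_present and snapshot are hypothetical helper names.)
run_if_present() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $*"
        "$@" 2>&1
    else
        echo "== $tool: not installed, skipping"
    fi
}

# One timestamped snapshot of CPU, memory and process state.
snapshot() {
    date
    run_if_present vmstat 1 2
    run_if_present sar -u 1 2
    run_if_present ps -ef
}

# Usage (assumed log location): snapshot >> /tmp/snapshot.log
```

Running it while the box is healthy gives a baseline to compare against the next time it starts to slow down.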
#4
Scooby Regular
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384
top isn't a standard Unix utility on all Unix flavours. It is usually installed separately by the system admins, as it is third-party software rather than part of the base system.
Could you tell us what sort of system it is?
e.g. Sun, HP, SGI, IBM, Fujitsu-Siemens?
#5
Scooby Regular
Thread Starter
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196
It doesn't have top on it. I once downloaded it and tried to install it, but the installer kept falling over.
It's SCO UnixWare 7.1.
I'm quite good with Linux, but the things you expect to be there aren't on Unix.
The trouble with all the commands suggested is that we can't really execute them when the problem arises. You can't telnet in (it says session forked), and if you were already logged in, the session doesn't respond to commands.
Typing them into a console on the server doesn't fare much better, with it taking a few minutes to actually execute a command.
I suspect the version of UnixWare on the server is out of date, but it's on there and I've just gotta deal with that.
I really feel it has something to do with the printer daemon/spooler.
Andy
#7
Why not disable the print daemon, and see if it still goes belly up? (I know it's a pain if people need prints straight away etc., but it will prove or disprove your theory, and give you an angle to attack from!!!)
#8
Scooby Regular
Thread Starter
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196
vmstat doesn't seem to exist on our system.
Disabling the print daemon is not really an option, due to the number of jobs being processed and the intermittency of the fault.
Andy
#9
Scooby Regular
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384
Does 'sar' exist on the system? If yes, set that up.
Alternatively, you can get some info on the processes running with:
ps -aux
(I'm assuming BSD format commands on SCO)
You can see how much CPU time each process uses..
#10
Scooby Regular
Thread Starter
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196
As a Linux person, I'd tried ps -axu, but it doesn't understand the x part, and u means "by userid", so that's no use either. I wouldn't say the information given by -a was helpful, as it doesn't even give usernames etc.
sar does exist, but when the system dies, sar just tells us what we already knew: the CPU is 0% free.
I'm thinking it might be the "Shivaport Atom" boxes, which we host a couple of the dot matrix printers off. I think they are a kind of terminal server. Unfortunately, as it appears that the printers to blame are on these boxes, I can't just edit their interface script and put in a sleep command.
Cheers
Andy
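One way round the problem of only seeing the state at the moment of the hang is to record the process list at regular intervals, so there is something to look back at from just before it. A minimal sketch, assuming a SysV ps that accepts -ef; the function name log_snapshots and the log path in the usage line are hypothetical.

```shell
#!/bin/sh
# Append COUNT timestamped process listings to LOGFILE, INTERVAL seconds apart.
# (log_snapshots is a hypothetical helper name.)
log_snapshots() {
    count=$1
    interval=$2
    logfile=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            echo "===== $(date)"
            ps -ef
        } >> "$logfile"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage (assumed path): one sample a minute for a day, in the background
# log_snapshots 1440 60 /tmp/ps-history.log &
```

After the next reboot, the last few entries in the log show which processes were piling up CPU time before the machine stopped responding. The log grows with every sample, so it needs clearing out now and then.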
#11
Scooby Regular
Join Date: Oct 2000
Location: Surrey, UK
Posts: 8,384
Right, well based on that you are picking up a SysV version of ps.
Is there something like /usr/ucb/ps or /usr/ucb/bin/ps? If yes, then you should be able to run ps -aux with it.
You might try changing the (u)limit settings for the root account. It may have an impact on what the process in question can allocate resource-wise...
Sorry, I've never used SCO Unix.
Rgds, Alex
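For a SysV ps, a rough equivalent of the BSD listing is sketched below. It assumes the local ps accepts -e and the POSIX -o option, which is not confirmed for UnixWare; cpu_seconds and busiest are made-up helper names. cpu_seconds converts the TIME column to seconds so that the output can be sorted by CPU time.

```shell
#!/bin/sh
# Filter: replace a leading TIME field ([[dd-]hh:]mm:ss) with total seconds.
# (cpu_seconds and busiest are hypothetical helper names.)
cpu_seconds() {
    awk '{
        t = $1; days = 0
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        $1 = days * 86400 + secs
        print
    }'
}

# The N processes (default 10) that have used the most CPU time, busiest first.
# Columns: CPU seconds, PID, owner, command line.
busiest() {
    ps -e -o time= -o pid= -o user= -o args= | cpu_seconds | sort -rn | head -n "${1:-10}"
}
```

If -o is not supported, plain ps -ef still shows the owner, PID and accumulated CPU time of every process, which is more than -a gives.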
#14
Maybe a memory problem, if you're getting "session forked" style errors.
Or it could be that /var is full. When you log in, space is required under /var for things like wtmp/utmp. If the filesystem is full then this entry can't be created, so some UNIX processes (like login) will fail. This wouldn't explain existing sessions breaking, though.
The printer system will also use space under /var to spool stuff. If it's screwing up and either not deleting transient files or creating huge ones (say a user tries to print a binary document or something), then /var fills up, everything hangs, you reset the machine and magically /var is clean again until the next time.
Use:
"df" to check the filesystems' capacity,
"du -s" to check which directory in a filesystem is taking space.
If it's a memory problem then use "ps" to identify large processes.
"ps aux" (some ps commands don't use a - to force BSD mode).
If it's like Solaris then something like: "ps -e -o vsz,pid,args | sort -n" will list the processes in order of the amount of virtual memory they are using. Any huge processes will be obvious. You can then use "ps -f -p <PID>" to get more information about the process.
Finally, what's on the console when the system freezes? Can you log in as root on the console when it's hung? Is the console switched on? Some UNIX systems hang when the console buffer fills if there is no console there.
A bit vague I know, but it's a long time since I logged into a SCO system. Are the man pages installed?
Cheers
Chris
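The df and du checks could be scripted roughly as follows. This is a sketch that assumes df -k prints a percentage column and puts the mount point in the last field, as it does on Solaris and Linux; over_limit is a hypothetical helper name.

```shell
#!/bin/sh
# Filter for df output: print mount point and usage of every filesystem
# that is at or above LIMIT percent full (default 90).
# (over_limit is a hypothetical helper name.)
over_limit() {
    awk -v limit="${1:-90}" 'NR > 1 {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+%$/ && $i + 0 >= limit)
                print $NF, $i
    }'
}

# Usage:
#   df -k | over_limit 90                     # filesystems at least 90% full
#   du -sk /var/* | sort -rn | head -n 5      # biggest directories under /var, in KB
```

Run from cron every few minutes with the output appended to a file, this would show whether /var fills up shortly before a hang even if it looks empty again after the reboot.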
#15
Scooby Regular
Thread Starter
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196
The /var idea sounds like it could be it. I'm not at work at the moment, but will check when I get in in the morning.
The server boots into a GUI, in which we normally have 4 console sessions open, 2 on each desktop (for various tasks). Any command issued in these when the system goes **** up is majorly delayed. "Type a command, press enter, wait 5 mins" sort of thing...
I'll check out the size of the /var partition tomorrow and report back.
Cheers,
Andy
#17
Also try pressing Ctrl-Alt-F1 or Alt-F1 (can't quite remember which). That will take you out of the GUI to the proper console, where you may be able to see any error messages.
Each function key will give you a different session, F1 being the console and F12 taking you back to the GUI.
Rich
#18
Scooby Regular
Thread Starter
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196
The /var partition only has 1% space usage, so perhaps it's not that.
Next time it locks up I will try the function keys as above.
When it's a bit quieter over here, we'll probably go round and take each printer offline in turn for a few mins, submit a job to it, and see if that causes the problem. At least then we'd know which printer it is!!
Andy
#23
How often is it hanging?
If it's often, don't use the X sessions on the console. Just leave the console so that you can see a normal text login prompt.
According to Google, to get the console session active, press Ctrl-SysReq and then H to bring up the home (console) session.
This way, if the operating system is detecting a problem (like a full filesystem, network problem, memory deficit, etc.), then you should see some information on the console when the error occurs.
If it hangs and there's no message, either you're looking at the wrong screen or it's a more subtle problem.
Have you looked in the system logs to see if anything is there? According to Google the system log is at /usr/adm/syslog. If not, try looking in /var/adm or /var/adm/log.
Cheers
Chris
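To check the log locations mentioned above without knowing which of them exist on the box, something like this sketch could be used. tail_existing is a made-up helper name, and the paths in the usage line are simply the ones from the post, not verified against UnixWare.

```shell
#!/bin/sh
# Print the last 20 lines of each named file that exists, skipping the rest.
# (tail_existing is a hypothetical helper name.)
tail_existing() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "== $f"
            tail -n 20 "$f"    # some older systems only accept: tail -20
        fi
    done
}

# Usage, with the candidate locations from the post:
# tail_existing /usr/adm/syslog /var/adm/* /var/adm/log/*
```

Running it straight after a reboot should show whatever was logged just before the machine was reset.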
#25
Scooby Regular
Thread Starter
Join Date: Apr 2002
Location: Birmingham
Posts: 9,196
It hung this morning; the last time was 2 days ago. Unfortunately I was in a meeting at the time, so I was unable to try anything suggested. As usual our mainframe guy just reset it.
My boss and I are not at all happy about resetting it all the time, and if anything it's beginning to look like a bit of a joke to the users.
I will be sure to try it all next time.
Andy