
IOC problem




Hi,

An incident affecting the target system occurred on Sunday morning, around
5:40 am. The people on shift tried to move the target to one of the
prescribed positions and noticed that one of the actuators had gone to its
limit of motion while none of the others had been activated.
The IOC was in a hung state when they tried to move the target. It had not
crashed and was still alive, but it was acting at a very slow pace. If one
clicks an action button while the IOC is in this state, it may take one
minute, or five, or more, but it eventually carries out the action. When it
carries out a target motion in this state, it activates a relay, which in
turn activates an actuator, and the target starts to move. But because the
IOC is next to dead, it reads back the position only after a minute or a
few minutes, and by then the target may well be at the limit of motion.
This is what happened.
When the IOC hangs, everything freezes on the target computer screen, and
it looks like the people on shift didn't pay much attention to this. A
green heartbeat light with no beat is not a working state; it is a coma. I
rebooted the IOC, and it eventually recovered and moved the target as it
was supposed to.
The software expert is working on this. Her initial diagnosis is that the
network driver that communicates with the IOC uses a different clocking
from the IOC's, a clocking system with a built-in microdelay that might be
responsible for the hangups. She doesn't understand exactly how this
happens. This IOC is the first of its kind that she has used; it is a
PowerPC IOC. She doesn't have this problem with any of the old-type IOCs
she has been using. The advantage of this IOC, she says, is that it is
very fast compared to the old types. Switching to the old-type IOC means
rewriting the network driver and testing it, a long haul. Another solution
is to reboot the IOC whenever this happens, with the understanding that it
may happen quite frequently.
On Sunday it happened again just four hours later, around 10 am. Mike Seely
told me that previous running experience in Hall C is that an IOC reboot
happens once every 24 h. Since rebooting the IOC requires beam off
according to our procedures, relying on this solution alone would be
anywhere from annoying to counterproductive.
The software expert is working on an improvement to the target motion so
that an actuator won't move to the limit of motion if the IOC hangs during
a motion, or if an operator who isn't paying close attention to the state
of the IOC triggers a target motion while the IOC is hanging. She is also
implementing, at our request, a watchdog alarm for the IOC heartbeat. It
also seems that these hangups won't trigger the HPH to go to maximum power
(1000 W).
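
To make the watchdog idea concrete, here is a minimal sketch in Python of
how a stalled heartbeat could be detected. It is only an illustration, not
the expert's implementation: the class name, the timeout value, and the way
the heartbeat counter is read are all assumptions, and the real alarm would
take its reading from however the control system exposes the heartbeat.

```python
import time
from typing import Callable, Optional


class HeartbeatWatchdog:
    """Flag a hang when a heartbeat counter stops changing.

    Hypothetical helper: the caller is assumed to poll the IOC heartbeat
    value periodically and pass each reading to update().
    """

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s          # allowed time without a beat
        self._clock = clock                 # injectable clock, for testing
        self._last_value: Optional[int] = None
        self._last_change = clock()         # time of the last observed beat

    def update(self, value: int) -> bool:
        """Feed the latest reading; return True if the IOC looks hung."""
        now = self._clock()
        if value != self._last_value:
            # The counter moved, so the IOC is alive: restart the timer.
            self._last_value = value
            self._last_change = now
        # Hung if the counter has been frozen longer than the timeout.
        return (now - self._last_change) > self.timeout_s
```

The same check could gate target motion: if update() returns True, a
motion request would be refused instead of being queued on a hung IOC.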


Silviu