
by admin last modified Apr 11, 2007 03:23 PM


Troubleshooting the MVME cpu boards


MUSR DAQ troubleshooting

Introduction

Each beamline (M15, M20, M9B) has two MVME cpu boards in use for TD-MUSR. An MVME162 runs the CAMP slow controls, and an MVME2305 PowerPC board runs the VME Histogram Memory and reads out the VMEIO and the VME TDC (clock). These modules all reside in the same VME crate. Both boards require their DAQ Linux host (midm15, midm20 or midm9b) to be up and accessible over the Ethernet. The MVME2305 boots from this computer. The MVME162 keeps the files it needs to boot up on an internal RAM disk, but still requires the DAQ computer to access CAMP files.

VME access fails

The data acquisition program reports a failure due to VME access.

Check that the VME crate is powered up.

The VME crate is located in the blue racks in the electronics area adjacent to the counting rooms.

Ping the boards from the Linux box to see if they are up.

For example, for M15 (e.g. from midm15):
> ping m15hmvw
> ping m15vw
Substitute your beamline (m20, m9b) for "m15".
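The two pings above can be scripted for any beamline. The following Python sketch is an illustration only: it assumes a Linux `ping` that accepts `-c` (packet count) and `-W` (reply timeout in seconds), the helper names `board_hosts`, `is_reachable` and `check_beamline` are hypothetical, and the host-name pattern (`<beamline>hmvw`, `<beamline>vw`) is extrapolated from the M15 example rather than taken from a host list.

```python
import subprocess
import sys


def board_hosts(beamline):
    """Network names of the two MVME boards for a beamline (m15, m20 or m9b).

    Assumption: the names follow the M15 example (m15hmvw, m15vw).
    """
    beamline = beamline.lower()
    if beamline not in ("m15", "m20", "m9b"):
        raise ValueError(f"unknown beamline: {beamline}")
    return [f"{beamline}hmvw", f"{beamline}vw"]


def is_reachable(host, timeout_s=2):
    """Send a single ping (Linux iputils syntax); True if the host replied."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_beamline(beamline):
    """Return {hostname: reachable} for both boards of a beamline."""
    return {host: is_reachable(host) for host in board_hosts(beamline)}


# Only ping when a beamline is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for host, up in check_beamline(sys.argv[1]).items():
        print(f"{host}: {'up' if up else 'NOT responding'}")
```

Saved under any file name (e.g. `check_boards.py`) and run as `python3 check_boards.py m20` on the DAQ host, it should print one line per board.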

Check the MVME console display

Bring up a console display to view the console messages of the MVME board.

If either MVME board is booted up, pressing carriage return will result in the prompt "->", and typing "i" at the prompt will result in a list of tasks.

If the MVME2305 is booted up, there may also be a v680 display active. If there is no active console display, or the board does not respond to carriage return, reboot the MVME board. Note that rebooting the MVME2305 will cause any data in the Histogram Memory to be lost.

To Reboot MVME cpu(s)

If the MUSR User Interface is running, it may ask whether you want to reboot the MVME162 (CAMP) or MVME2305 (PPC) boards. If so, answer yes. Otherwise, reboot the MVME board(s) in one of the following ways:

  • If the board IS responding (i.e. you have a "->" prompt on the console display), type "reboot" at the prompt:
-> reboot
  • If the board is NOT responding, do one of the following:
    • Type Ctrl-X on the console display.
    • Press the reset button labelled "RST" or "RESET" on the front panel of either MVME cpu board in the VME crate.
    • Cycle the power on the VME crate.
The last two options (hard reboots) will reboot both VME boards.

Note that the MVME162 boots up very quickly, while the MVME2305 (PPC) can take several minutes. You should see a number of messages appear on the console, including a large VxWorks banner.
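Because the MVME2305 can take several minutes to come back, it may be convenient to poll it instead of retrying by hand. This is a minimal sketch assuming the same Linux `ping` options; `wait_for_board` and `ping_once` are hypothetical names, and the 600 s default timeout is an arbitrary choice, not a documented limit. A ping reply only shows that the board's network interface is up, and the board may answer before VxWorks has finished starting, so still watch the console for the banner.

```python
import subprocess
import time


def ping_once(host):
    """Send one ping with a 2 s reply timeout (Linux iputils syntax)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_board(host, timeout_s=600, interval_s=10, ping=ping_once):
    """Poll `host` until it answers ping or `timeout_s` seconds have elapsed.

    Returns the number of seconds waited, or None if the board never answered.
    The `ping` argument can be replaced by any callable taking a host name.
    """
    start = time.monotonic()
    while True:
        if ping(host):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

For example, `wait_for_board("m15hmvw")` returns the seconds waited, or `None` if the board has not answered after ten minutes.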

Linux access programs

The MUSR User Interface will detect whether the MVME162 (CAMP) and MVME2305 (PPC) boards are up and running. If not, it will write error messages and may attempt to reboot the MVME board(s).
If the boards appear to be up and running, the CAMP/CAMAC board (MVME162) should be accessible through the camp program running under Linux on the host (midm15, midm9b or midm20):
$ camp
Rebooting the MVME2305 (PPC) will start the DAQ frontend program automatically. To check on the frontend, click on Extras > VME Utility in the MUSR User Interface.

MVME board will not boot up

Nothing at all on the console display

If you see nothing at all on the console display after rebooting, check whether both boards are failing to respond to "ping". If the boards respond to "ping", the consoles may be disconnected or the serial display programs (seyon/minicom) may not be working properly; if they do not respond, the VME crate may be switched off. If a board responds to "ping", telnet in from an xterm, e.g.
Tue> telnet m9bvw
This is an alternative way of getting to the console(s). Unlike the console connection (a direct serial line), however, it depends on the network being up.
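The checks above amount to a small decision table on the ping results. This Python sketch maps the ping result of each board to a suggested next step; the function name is hypothetical, and the advice for the cases where only one board answers is drawn from the network notes on this page rather than from a tested procedure.

```python
def diagnose_blank_console(hm_board_up, camp_board_up):
    """Suggest a next step when a console shows nothing after a reboot.

    hm_board_up:   ping result for the MVME2305 (HM/TDC) board
    camp_board_up: ping result for the MVME162 (CAMP) board
    """
    if hm_board_up and camp_board_up:
        return ("Both boards answer ping: check the console cables and the "
                "serial display program (seyon/minicom), or telnet to the board.")
    if not hm_board_up and not camp_board_up:
        return ("Neither board answers ping: check that the VME crate is "
                "powered and that the DAQ host and network are up.")
    if camp_board_up:
        # MVME162 on Thin Ethernet is fine; the MVME2305 relies on a converter.
        return ("Only the CAMP board answers ping: try power-cycling the "
                "100baseT conversion box used by the MVME2305.")
    return ("Only the MVME2305 answers ping: check the MVME162 console and "
            "its Thin Ethernet connection.")
```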


General troubleshooting with MVME DAQ boards

Error messages on the console display

You may see a list of the boot-up parameters, followed by
Attaching network interface dc0 ... done
Attaching network interface lo0 ... done
Loading...
Error loading file : Error number ...
Can't load boot file

[VxWorks Boot]:
This indicates a problem with the network connection. If the output does not get this far, it may indicate a problem with the boot-up parameters. Note that for the MVME2305 (HM/TDC) board there can be delays of several minutes even when the network is working.

Note that the MVME162 (CAMP) board boots up much faster because of its internal RAM disk. It does not need the network to access its boot-up or startup files, only to access various CAMP initialization files.

There may also be problems with the startup file, in which case you would see the large VxWorks banner (indicating that the VxWorks system has been loaded) followed by "file not found" messages. Check that the boot-up parameters are correct.
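The symptoms described above can be checked against console output captured from seyon/minicom. The sketch below is a rough heuristic, not a parser for the real boot ROM output: it assumes the large banner contains the string `VxWorks` (passed as the hypothetical `banner_marker` parameter), and the category names it returns are invented for illustration.

```python
def classify_boot_log(log, banner_marker="VxWorks"):
    """Roughly classify MVME console output captured after a reboot.

    Returns one of: "no_output", "network", "startup_file", "booted",
    "boot_params" or "in_progress".
    """
    lines = [line.rstrip() for line in log.splitlines()]
    if not any(lines):
        return "no_output"
    # "Can't load boot file" before the [VxWorks Boot]: prompt means the
    # boot ROM could not fetch the image over the network.
    if any("Can't load boot file" in line for line in lines):
        return "network"
    # Banner = system loaded; the "[VxWorks Boot]:" prompt is not the banner.
    banner_at = next(
        (i for i, line in enumerate(lines)
         if banner_marker in line
         and not line.lstrip().startswith("[VxWorks Boot]")),
        None,
    )
    if banner_at is not None:
        if any("not found" in line.lower() for line in lines[banner_at + 1:]):
            return "startup_file"
        return "booted"
    # Never reached "Loading...": suspect the boot-up parameters.
    if not any(line.startswith("Loading") for line in lines):
        return "boot_params"
    # Loading started but nothing more yet: the MVME2305 can take minutes.
    return "in_progress"
```

Fed the error sequence shown above (ending in "Can't load boot file" and the boot prompt), it returns `"network"`.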

Check the host is up and running

Check that the host machine is up and accessible over the network. If the host has just been rebooted, wait until it comes up, then reboot the MVME boards.

Check the network is responding

Test your network connections using "ping".

NOTE: For MUSR DAQ systems, occasionally the MVME162 (CAMP) board is accessible over the network but the MVME2305 is not. The MVME162 board runs on Thin Ethernet cable, whereas the MVME2305 runs on 100baseT and needs an extra conversion box. Power-cycling this box may solve the problem.

MVME boot-up parameters

Check that the boot-up parameters are correct. If they have become corrupted, re-enter them and reboot.

Problems with the Startup file

There may be a problem with the startup filename. In this case, a message will appear after a delay, followed by the console prompt. Check that the startup filename is correct. Change the boot-up parameters to remove the startup file and reboot.
Then, e.g. for CAMP boards, enter either
< /ram/stcamplx.m9b
       to boot from the RAM disk

or

cd "musr/vw"
< stcamp_lx.m9b
       to boot from the Linux host

Startup file fails, no console prompt

If the startup file fails, the board can go into limbo, i.e. the console prompt cannot be regained by any means. The procedure to recover from this is detailed here. It is recommended that you get help from a DAQ expert before trying it. After recovering from this problem, always load the startup file by hand (see above) for debugging.

Corrupt image on the RAM disk (CAMP MVME162 only)

One or more of the object modules on the RAM disk have been known to become corrupt and cause a failure while the startup file is being executed. This often causes the "board in limbo" problem above. If the startup file fails after loading a module from the RAM disk (CAMP MVME162 only), alter the startup file to load another copy of that module from the host to diagnose the problem. If necessary, update the MVME162 RAM disk with one or more files.

Board failure

If you need to use a spare board, get help from a DAQ expert, and refer to how to update the MVME162 RAM disk.