Copyright (c) Hyperion Entertainment and contributors.

Difference between revisions of "Troubleshooting Your Software"

From AmigaOS Documentation Wiki
Jump to navigation Jump to search
m (→‎Use debugging tools: Added internal and external links for GDB, debug kernel, db101 and memguard.)
Line 135: Line 135:
 
=== Use debugging tools ===
 
=== Use debugging tools ===
   
A variety of debugging tools are available to help locate faulty code. Some of these are source level and other debuggers, crash interceptors, vital watchdog and memory invalidation tools like ''MemGuard''.
+
A variety of debugging tools are available to help locate faulty code. Some of these are source level and other debuggers (e.g. [[GDB_for_Beginners|GDB]] or [http://os4depot.net/index.php?function=showfile&file=development/debug/db101.lha db101]), crash interceptors, vital watchdog and memory invalidation tools like [http://os4depot.net/index.php?function=showfile&file=development/debug/memguard.lha ''MemGuard''].
  +
  +
AmigaOS4's [[Debug Kernel]] is probably the first step of practical debugging.
   
 
== A Final Word About Testing ==
 
== A Final Word About Testing ==

Revision as of 17:42, 18 February 2013

Troubleshooting Your Software

Many Amiga programming errors have classic symptoms. This guide will help you to eliminate or avoid these problems in your software.

Typical Problems

Audio: Corrupted Samples

On Classic Amiga hardware, the bit data for audio samples must be in Chip RAM. Check your compiler manual for directives or flags which will place your audio sample data in Chip RAM. Or dynamically allocate Chip RAM and copy or load the audio sample there.

Character Input/Output Problems

RAWKEY users must be aware that RAWKEY codes can be different letters or symbols on national keyboards. If you need to use RAWKEY, run the codes through RawKeyConvert() (see the Intuition Mouse and Keyboard) to get proper translation to correct ASCII codes. Improper display or processing of high-ASCII international characters can be caused by incorrect tolower()/toupper(), or by sign extension of character values when switched on or assigned into larger size variables. Use unsigned variables such as UBYTE (not char) for strings and characters whenever possible. Internationally correct string functions are provided in the utility.library.

CLI Error Message Problems

Improper error messages are caused by calling exit(n) with an invalid or missing return value n. Assembler programmers using startup code should jump to the startup code's _exit with a valid return value on the stack. Programs without startup code should return with a valid value in D0. Valid return values such as RETURN_OK, RETURN_WARN, RETURN_FAIL are defined in <dos/dos.h> and <dos/dos.i>. Values outside of these ranges (-1 for instance) can cause invalid CLI error messages such as "not an object module". Useful hint - if your program is called from a script, your valid return value can be conditionally branched on in the script (i.e., call program, then perform actions based on IF WARN or IF NOT WARN). RETURN_FAIL will cause the script to stop if a normal FAILAT value is being used in script.

CLI Won't Close on RUN

A CLI can't close if a program has a Lock() on the CLI input or output stream ("*"). If your program is "RUN >NIL:" from a CLI, that CLI should be able to close unless your code or your compiler's startup code explicitly opens "*".

Crashes and Memory Corruption

Memory corruption, address errors, and illegal instruction errors are generally caused by use of an uninitialized, incorrectly initialized, or already freed/closed pointer or memory. You may be using the pointer directly, or it may be one that you placed (or forgot to place) in a structure passed to system calls. Or you may be overwriting one of your arrays, or accidentally modifying or incrementing a pointer later used in a free/close. Be sure to test the return of all open/allocation type functions before using the result, and only close/free things that you successfully opened/allocated. Use watchdog/torture utilities such as MemGuard to catch use of uninitialized pointers or freed memory, and other memory misuse problems. Use the debugging tool TNT to get additional debugging information instead of a Software Error requester. You may also be overflowing your stack - your compiler's stack checking option may be able to catch this. Cut stack usage by dynamically allocating large structures, buffers, and arrays which are currently defined inside your functions.

Corruption or crashes can also be caused by passing wrong or missing arguments to a system call (for example SetAPen(3) or SetAPen(win,3), instead of SetAPen(rp,3)). C programmers should use function prototypes to catch such errors. If using short integers be sure to explicitly type long constants as long (e.g., 42L). (For example, with short ints, 1 << 17 may become zero). If corruption is occurring during exit, use printf() (or kprintf(), etc.) with Delay(n) to slow down your cleanup and broadcast each step. See <exec/alerts.h> for Amiga-specific alert numbers. Also see "Crashes - After Exit" below.

After Exit

If this only happens when you start your program from Workbench, then you are probably UnLocking() one of the WBStartup message wa_Locks, or UnLocking() the Lock() returned from an initial CurrentDir() call. If you CurrentDir(), save the lock returned initially, and CurrentDir() back to it before you exit. Only UnLock() locks that you created.

If you are crashing from both Workbench and CLI, and you are only crashing after exit, then you are probably either freeing/closing something twice, or freeing/closing something your did not actually allocate/open, or you may be leaving an outstanding device I/O request or other wakeup request. You must abort and WaitIO() any outstanding I/O requests before you free things and exit (see the Autodocs for your device, and for Exec AbortIO() and WaitIO()). Similar problems can be caused by deleting a subtask that might be in a WaitTOF(). Only delete subtasks when you are sure they are in a safe state such as Wait(0L).

Subtasks, Interrupts

If part of your code runs on a different stack or the system stack, you must turn off compiler stack-checking options. If part of your code is called directly by the system or by other tasks, you must use long code/long data or use special compiler flags or options to assure that the correct base registers are set up for your subtask or interrupt code.

Window Related

Be careful not to CloseWindow() a window during a while(msg=GetMsg(...)) loop on that window's port (next GetMsg() would be on freed pointer). Also, use ModifyIDCMP(NULL) with care, especially if using one port with multiple windows. Be sure to ClearMenuStrip() any menus before closing a window, and do not free items such as dynamically allocated gadgets and menus while they are attached to a window. Do not reference an IntuiMessage's IAddress field as a structure pointer of any kind before determining it is a structure pointer (this depends on the Class of the IntuiMessage). If a crash or problem only occurs when opening a window after extended use of your program, check to make sure that your program is properly freeing up signals allocated indirectly by CreatePort(), OpenWindow() or ModifyIDCMP().

Workbench Only

If you are crashing near the first DOS call, either your stack is too small or your startup code did not GetMsg() the WBStartup message from the process message port. If your program crashes during execution or during your exit procedure only when started from Workbench, and your startup opens no stdio window or "NIL:" file handles for WB programs, then make sure you are not writing anything to stdout (printf(), etc.) when started from WB (argc==0). See also "Crashes - After Exit".

Device-related Problems

Device-related problems may caused by: improperly initialized port or I/O request structures (use CreatePort() and CreateExtIO()); use of a too-small I/O request (see the device's <.h> files and Autodocs for information on the required type of I/O request); re-use of an I/O request before it has returned from the device (use the debugging tool IO_Torture to catch this); failure to abort and wait for an outstanding device request before exiting; waiting on a signal/port/message allocated by a different task.

Disk Icon Won't Go Away

This occurs when a program leaves a Lock() on one or more of a disk's files or directories. A memory loss of exactly 24 bytes is usually Lock() which has not been UnLocked().

DOS-related Problems

In general, any dos.library function which fills in a structure for you (for example, Examine()), requires that the structure be longword aligned. In most cases, the only way to insure longword alignment in C is to dynamically allocate the structure. Unless documented otherwise, dos.library functions may only be called from a process, not from a task. Also note that a process's pr_MsgPort is intended for the exclusive use of dos.library. (The port may be used to receive a WbStartup message as long as the message is GetMsg()'d from the port before DOS is used.

Fails only on machines with Fast RAM

Data and buffers which will be accessed directly by the custom chips must be in Chip RAM. This includes bitplanes (use OpenScreen() or AllocRaster()), audio samples, trackdisk buffers, and the graphic image data for sprites, pointers, bobs, images, gadgets, etc. Use compiler or linker flags to force Chip RAM loading of any initialized data needing to be in Chip RAM, or dynamically allocate Chip RAM and copy any initialization data there.

Graphics: Corrupted Images

On classic Amiga hardware, the bit data for graphic images such as sprites, pointers, bobs, and gadgets must be in Chip RAM. Check your compiler manual for directives or flags which will place your graphic image data in Chip RAM. Or dynamically allocate Chip RAM and copy them there.

Hangs

One Program Only

Program hangs are generally caused by Wait()ing on the wrong signal bits, on the wrong port, on the wrong message, or on some other event that will never occur. This can occur if the event you are waiting on is not coming, or if one task tries to Wait(), WaitPort(), or WaitIO() on a signal, port, or window that was created by a different task. Both WaitIO() and WaitPort() can call Wait(), and you cannot Wait() on another task's signals. Hangs can also be caused by verify deadlocks. Be sure to turn off all Intuition verify messages (such as MENUVERIFY) before calling AutoRequest() or doing disk access.

Whole System

This is generally caused by a Disable() without a corresponding Enable(). It can also be caused by memory corruption, especially corruption of low memory. See "Crashes and Memory Corruption".

Memory Loss

First determine that your program is actually causing a memory loss. It is important to boot with a standard Workbench because a number of third party items such as some background utilities, shells, and network handlers dynamically allocate and free pieces of memory. Open a Shell for memory checking, and a Shell or Workbench drawer for starting your program. Arrange windows so that all are accessible, and so that no window rearrangement will be needed to run your program.

In the Shell, type Avail FLUSH<RET> several times (2.0 option). This will flush all non-open disk-loaded fonts, devices, etc., from memory. Note the amount of free memory. Now without rearranging any windows, start your program and use all of your program features. Exit your program, wait a few seconds, then type Avail FLUSH<RET> several times. Note the amount of free memory. If this matches the first value you noted, your program is fine, and is not causing a memory loss.

If memory was actually lost, and your program can be run from CLI or Workbench, then try the above procedure with both methods of starting your program. Note that under 2.0, there will be a slight permanent (until reboot) memory usage of about 672 bytes when the audio.device or narrator.device is first opened. See "Memory Loss - CLI Only" and "Memory Loss - WB Only" if appropriate. If you lose memory from both WB and CLI, then check all of the open/alloc/get/create/lock type calls in your code, and make sure that there is a matching close/free/delete/unlock type call for each of them (note - there are a few system calls that have or require no corresponding free - check the Autodocs). Generally, the close/free/delete/unlock calls should be in opposite order of the allocations.

If you are losing a fixed small amount of memory, look for a structure of that size in the Structure Offsets listing in the SDK. For example, a loss of exactly 24 bytes is probably a Lock() which has not been UnLocked(). If you are using ScrollRaster(), be aware that ScrollRaster() left or right in a Superbitmap window with no TmpRas will lose memory under 1.3 (workaround - attach a TmpRas). If you lose much more memory when started from Workbench, make sure your program is not using Exit(n). This would bypass startup code cleanups and prevent a Workbench-loaded program from being unloaded. Use exit(n) instead.

CLI Only

Make sure you are testing in a standard environment. Some third-party shells dynamically allocate history buffers, or cause other memory fluctuations. Also, if your program executes different code when started from CLI, check that code and its cleanup. And check your startup.asm if you wrote your own.

CTRL-C Exit Only

You have Amiga-specific resources opened or allocated and you have not disabled your compiler's automatic Ctrl-C handling (causing all of your program cleanups to be skipped). Disable the compiler’s Ctrl-C handling and handle Ctrl-C (SIGBREAKF_CTRL_C) yourself.

During Execution

A continuing memory loss during execution can be caused by failure to keep up with voluminous IDCMP messages such as MOUSEMOVE messages. Intuition cannot re-use IDCMP message blocks until you ReplyMsg() them. If your window's allotted message blocks are all in use, new sets will be allocated and not freed till the window is closed. Continuing memory losses can also be caused by a program loop containing an allocation-type call without a corresponding free.

Workbench Only

Commonly, this is caused by a failure of your code to unload after you exit. Make sure that your code is being linked with a standard correct startup module, and do not use the Exit(n) function to exit your program. This function will bypass your startup code's cleanup, including its ReplyMsg() of the WBStartup message (which would signal Workbench to unload your program from memory). You should exit via either exit(n) where n is a valid DOS error code such as RETURN_OK (<dos/libraries.h>), or via final "}" or return. Assembler programmers using startup code can JMP to _exit with a long return value on stack, or use the RTS instruction.

Menu Problems

A flickering menu is caused by leaving a pixel or more space between menu subitems when designing your menu. Crashing after browsing a menu (looking at menu without selecting any items) is caused by not properly handling MENUNULL select messages. Multiple selection not working is caused by not handling NextSelect properly. See the Intuition Menus chapter.

Out-of-Sync Response to Input

Caused by failing to handle all received signals or all possible messages after a Wait() or WaitPort() call. More than one event or message may have caused your program to awakened. Check the signals returned by Wait() and act on every one that is set. At ports which may have more than one message (for instance, a window's IDCMP port), you must handle the messages in a while(msg=GetMsg(...)) loop.

Performance Loss in Other Processes

This is often caused by a one program doing one or more of the following: busy waiting or polling; running at a higher priority; doing lengthy Forbids(), Disables(), or interrupts.

Windows

Borders Flicker after Resize

Set the NOCAREREFESH flag. Even SMART_REFRESH windows may generate refresh events if there is a sizing gadget. If you don't have specific code to handle this, you must set the NOCAREREFRESH flag. If you do have refresh code, be sure to use the Begin()/EndRefresh() calls. Failure to do one or the other will leave Intuition in an intermediate state, and slow down operation for all windows on the screen.

Visual Problems

Many visual problems in windows can be caused by improper font specification or improper setting of gadget flags. See "Release 4 Compatibility" for detailed information on common problems.

General Debugging Techniques

Narrow the search

Use methodical testing procedures, and debugging messages if necessary, to locate the problem area. Low level code can be debugged using IExec->DebugPrintf() serial messages. Check the initial values, allocation, use, and freeing of all pointers and structures used in the problem area. Check that all of your system and internal function calls pass correct initialized arguments, and that all possible error returns are checked for and handled.

Isolate the problem

If errors cannot be found, simplify your code to the smallest possible example that still functions. Often you will find that this smallest example will not have the problem. If so, add back the other features of your code until the problem reappears, then debug that section.

Use debugging tools

A variety of debugging tools are available to help locate faulty code. Some of these are source level and other debuggers (e.g. GDB or db101), crash interceptors, vital watchdog and memory invalidation tools like MemGuard.

AmigaOS4's Debug Kernel is probably the first step of practical debugging.

A Final Word About Testing

Test your program with memory watchdog and invalidation tools on a wide variety of systems and configurations. Programs with coding errors may appear to work properly on one or more configurations, but may fail or cause fatal problems on another. Test all of your program functions on every machine.

Test all error and abort code. A program with missing error checks or unsafe cleanup might work fine when all of the items it opens or allocates are available, but may fail fatally when an error or problem is encountered. Try your code with missing files, filenames with spaces, incorrect filenames, cancelled requesters, Ctrl-C, missing libraries or devices, low memory, missing hardware, etc.

Test all of your text input functions with high-ASCII characters (such as the character produced by pressing Alt-F then "A"). Note that RAWKEY codes can be different keyboard characters on national keyboards (higher levels of keyboard input are automatically translated to the proper characters).

Write good code. Test it. Then make it great.