Oracle Intelligent Agent User's Guide
Release 9.2.0.2

Part Number A96676-02

B
Troubleshooting

This chapter covers generic troubleshooting strategies in the event your Intelligent Agent does not function properly. The following topics are discussed:

Troubleshooting the Intelligent Agent

Under most circumstances, the Intelligent Agent itself requires very little in the way of configuration. In order to function properly, however, the Agent must be able to communicate with the managing host and managed services. If you are familiar with Oracle and your operating system, using the following abbreviated checklists will likely solve problems that can interfere with Agent operation.


Important:

Because the Agent is continuously being improved from one release to the next, it is strongly recommended that you upgrade to the latest Agent available for your particular server release. Oftentimes, this will resolve problems you may encounter with earlier versions of the Agent. In general, the lowest acceptable version of the Agent should be compatible with the highest version of the software being monitored.


Quick Checks

The following checklists cover the areas most likely to affect Agent operation. The checklists are divided according to the two most common platforms on which the Agent runs: Windows NT and UNIX. They are abbreviated and assume knowledge of Oracle, the operating system, and related communication protocols. Specific troubleshooting procedures are covered in detail later in this chapter.

Quick Checks for the Windows NT Agent

If you are running an Agent on a Windows NT system, use the following checklist.

  1. Make sure the Agent service is up by checking the OracleAgent service in your control panel. If the Agent did not start up, use any of the hints listed below.
  2. Check for messages written to the NT Event Viewer (under Administrative Tools) since this is where the NT Agent writes any problems associated with startup.
  3. Check if snmp_ro.ora, snmp_rw.ora, and services.ora are created by the Agent on startup. snmp_ro.ora and snmp_rw.ora are in the ORACLE_HOME\network\admin directory, and services.ora is in the ORACLE_HOME\network\agent directory.

    Compare the services listed with the services which are available on the machine. Please refer to Appendix A, "Agent Configuration Files" for valid sample files.

    If services are missing, check the following files for inconsistency or corruption:

    • listener.ora
    • tnsnames.ora
  4. If you have upgraded the database software and one of your machines is having problems with the generated snmp_ro.ora, snmp_rw.ora or services.ora file, follow the instructions below:
    1. Run catsnmp.sql under the INTERNAL or SYS account (NOT the dbsnmp account). Normally the catsnmp.sql script is run from catalog.sql upon database creation but since this is an upgrade, you may not have run this script yet. If the necessary scripts have not been run, the dbsnmp account is not created.


      Important:

      When installing a 9i Agent on a machine running a pre-9i database, you must re-run a version specific copy of catsnmp.sql which is located in the ORACLE_HOME/sysman/admin directory of the 9i Enterprise Manager client install. For example, if you are running Oracle 8.1.7 on a machine and then install a 9i Agent, you must re-run the catsnmp_8i.sql script after installing the new Agent. This operation must be performed for each pre-9i database serviced by this Agent.

      Do not run the 9i version of the catsnmp.sql script against pre-9i databases.


    2. If you have more than one SID or older SIDs referenced in the oratab file, run catsnmp.sql against each of the databases.
    3. The snmp_ro.ora file is a read-only file, which means that all changes to the file will be overwritten each time the Agent is started. You can make changes (if needed) to the snmp_rw.ora file.

    If you are trying to do backups, you must run backupts.sql with the dbsnmp/dbsnmp account.


    Warning:

    Do not modify the Tcl scripts (job and events scripts written in Tool Command Language) that come with the Agent. If you want to submit a job different from the ones that are predefined with the Agent, use the TCL Job where you are allowed to pass in arbitrary scripts and have the Agent execute them.


  5. Check that you do not have a system path set to external drives.

    The Agent is a service and runs by default as SYSTEM. It also needs DLLs from the ORACLE_HOME/BIN directory. If you need mapped drives in your path, you MUST NOT set them in the SYSTEM path.

    To set your own path:

    1. Move mapped drive paths out of SYSTEM path variables and into your own.
    2. Reboot to "unset" the systems path.
  6. If you still do not know why the Agent did not start, trace the Agent.
    1. Set the following variables in snmp_rw.ora:

      dbsnmp.trace_level=admin (or 16 if you want maximum information)

      dbsnmp.trace_directory=<any directory in which the Oracle user has write privileges>

      dbsnmp.trace_file=<name of the trace output file>

      dbsnmp.trace_unique=true/false (When TRUE, this parameter generates a unique trace file each time.)

    2. Restart the Agent.
    3. Check the log files located in the oracle_home/network/log directory.

      DBSNMP.LOG should show general Agent problems.

      DBSNMP.NOHUP should show any errors related to the Agent's "watchdog" dbsnmpwd process.

      NMICONF.LOG should show problems with auto-discovery.

  7. Ensure that the DNS Host entry is set to the node name in the listener.ora and tnsnames.ora files.
    1. Click Start -> Settings -> Control Panel -> Network -> Protocols -> TCP/IP Properties.
    2. Check the DNS Host entry.
  8. Check if you have TCP/IP installed. TCP/IP is a requirement.
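
As a rough illustration of step 6, the following Python sketch renders the four tracing parameters as lines that could be appended to snmp_rw.ora. The helper name render_trace_settings and the example directory and file name are hypothetical; only the parameter names come from the checklist above.

```python
def render_trace_settings(level, directory, trace_file="agent", unique=True):
    """Return snmp_rw.ora lines for the dbsnmp tracing parameters.

    The parameter names come from the checklist; the argument values are
    whatever the caller supplies (the defaults here are placeholders).
    """
    return [
        f"dbsnmp.trace_level={level}",
        f"dbsnmp.trace_directory={directory}",
        f"dbsnmp.trace_file={trace_file}",
        f"dbsnmp.trace_unique={'true' if unique else 'false'}",
    ]


if __name__ == "__main__":
    # Hypothetical directory; use any directory the Agent account can write to.
    for line in render_trace_settings("16", r"C:\temp\agent_trace"):
        print(line)
```

The output is plain text, so it can be reviewed before it is added to snmp_rw.ora and the Agent is restarted.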

Quick Checks for UNIX Agents

If you are running an Agent on a UNIX system, use the following checklist.

  1. Make sure requisite system requirements are met:
    • TCP/IP is configured correctly
    • Valid ORATAB file for your Oracle environment.
  2. Check the Agent's status. Enter the command:
    agentctl status
    
    

    Alternatively, you can check to see if the Intelligent Agent is running by entering the following command:

    ps -eaf | grep dbsnmp
    
    
    

    These checks should show that a "dbsnmp" process is running and/or "dbsnmpwd" watchdog script is running.

  3. Check the ORACLE_HOME/network/log/dbsnmp*.log files for errors. For discovery errors, check nmiconf.log in the same directory.
  4. Check that the Oracle user has write permissions to the following directories:
    • ORACLE_HOME/network/log
    • ORACLE_HOME/network/trace
    • ORACLE_HOME/network/agent
    • ORACLE_HOME/network/agent/jobout
  5. Check snmp_ro.ora, snmp_rw.ora, and services.ora for the entries created by the Agent. snmp_ro.ora and snmp_rw.ora are in the ORACLE_HOME/network/admin directory, and services.ora is in the ORACLE_HOME/network/agent directory. Alternatively, you can check the directory pointed to by the TNS_ADMIN environment variable.

    Compare the services listed with the services which are available on the machine. Please refer to Appendix A, "Agent Configuration Files" for valid sample files.

    If services are missing, check the following files for inconsistency or corruption:

    • listener.ora
    • tnsnames.ora
    • oratab
  6. If you have upgraded the database software and one of your machines is having problems with the generated snmp_ro.ora, snmp_rw.ora or services.ora file, follow the instructions below:
    1. Run catsnmp.sql under the INTERNAL or SYS account (NOT the dbsnmp account). Normally the catsnmp.sql script is run from catalog.sql upon database creation but since this is an upgrade, you may not have run this script yet. If the necessary scripts have not been run, the dbsnmp account is not created.


      Important:

      When installing a 9i Agent on a machine running a pre-9i database, you must re-run a version specific copy of catsnmp.sql which is located in the ORACLE_HOME/sysman/admin directory of the 9i Enterprise Manager client install. For example, if you are running Oracle 8.1.7 on a machine and then install a 9i Agent, you must re-run the catsnmp_8i.sql script after installing the new Agent. This operation must be performed for each pre-9i database serviced by this Agent.

      Do not run the 9i version of the catsnmp.sql script against pre-9i databases.


    2. If you have more than one SID or older SIDs referenced in the oratab file, run catsnmp.sql against each of the databases.
    3. The snmp_ro.ora file is a read-only file, which means that all changes to the file will be overwritten each time the Agent is started. You can make changes (if needed) to the snmp_rw.ora file.

    If you are trying to do backups, you must run backupts.sql with the dbsnmp/dbsnmp account.


    Warning:

    Do not modify the Tcl scripts (job and events scripts written in Tool Command Language) that come with the Agent. If you want to submit a job different from the ones that are predefined with the Agent, use the TCL Job where you are allowed to pass in arbitrary scripts and have the Agent execute them.


  7. If you have problems running the Intelligent Agent control utility (agentctl), set tracing for agentctl as follows:
    • agentctl.trace_level=admin
    • agentctl.trace_directory
    • agentctl.trace_file
  8. If you still do not know why the Agent did not start, trace the Agent by setting the following variables in snmp_rw.ora and then re-start the Agent.
    • dbsnmp.trace_level=admin (or 16 if you want more information)
    • dbsnmp.trace_directory=<any directory which the Oracle user can write to>
    • dbsnmp.trace_file=agent
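
The write-permission check in step 4 can be sketched in Python as follows. This is a minimal illustration, not an Oracle utility: the function name unwritable_agent_dirs is hypothetical, and the result is only meaningful when the script runs as the same operating system user that runs the Agent.

```python
import os

# Directories from step 4, relative to ORACLE_HOME.
AGENT_DIRS = (
    "network/log",
    "network/trace",
    "network/agent",
    "network/agent/jobout",
)


def unwritable_agent_dirs(oracle_home):
    """Return the checklist directories that are missing or not writable."""
    problems = []
    for rel in AGENT_DIRS:
        path = os.path.join(oracle_home, *rel.split("/"))
        # A directory passes only if it exists and the current user may write to it.
        if not (os.path.isdir(path) and os.access(path, os.W_OK)):
            problems.append(path)
    return problems


if __name__ == "__main__":
    home = os.environ.get("ORACLE_HOME", "")
    for path in unwritable_agent_dirs(home):
        print("not writable or missing:", path)
```

An empty result means all four directories exist and are writable for the user running the script.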

Additional Checks

If after going through the quick checks your Intelligent Agent still is not functioning correctly, use the following section to cover other areas of Agent operation that are less probable causes of Agent operating problems. In addition, many of the steps in the checklists are covered in greater detail for those users who may be less familiar with Oracle and/or the operating system on which the Agent is running. The following questions are covered in this section:

Is TCP/IP configured and running correctly?

One of the most common problems that prevents the Agent from starting is TCP/IP configuration. To check whether your TCP/IP setup is configured correctly, issue the following commands at the command line:

Correcting TCP/IP configuration problems

  1. (Windows NT) Edit the WINNT\system32\drivers\etc\hosts and lmhosts files.

    If these files have never been used, only sample files will exist in the directory. Either rename or copy the .sam files to just the file name with no extension.

    (UNIX) Log in as root and edit the /etc/hosts file.

    NIS/DNS setups may require modification at the NIS/DNS level to correct TCP/IP configuration problems.

  2. Verify that the IP address and host information for each system are correct.

    Example: (Windows NT)

    HOSTS file: 
            122.111.111.111   myHost
    
    LMHOSTS file: 
            122.111.111.111   myHost  #PRE
    

    Note:

    You can also verify this information through the Windows NT Control Panel -> Network property sheet.


  3. Delete the $ORACLE_HOME\network\agent\*.q and services.ora files.


    Note:

    The *.q files are binary files containing information about Agent state and current jobs and events. Do not delete these files without first removing all jobs and events registered against this Agent. Because the hostname of the machine on which the Agent is running is used to encrypt the 'Q' files, these files cannot be copied from one system to another.


  4. Delete the $ORACLE_HOME\network\admin\snmp_ro.ora and
        $ORACLE_HOME\network\admin\snmp_rw.ora files. 
    
    
  5. Restart the Agent.
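
To make the verification in step 2 concrete, here is a small Python sketch that reads hosts-style text and reports which IP address each name maps to. The function name parse_hosts is hypothetical, and the sketch treats everything after a # as a comment, which means LMHOSTS keywords such as #PRE are simply ignored.

```python
def parse_hosts(text):
    """Map each host name or alias in hosts-style text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        # Drop comments (and LMHOSTS keywords such as #PRE) and surrounding blanks.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first address seen for a name, compared case-insensitively.
            mapping.setdefault(name.lower(), address)
    return mapping


if __name__ == "__main__":
    sample = "122.111.111.111   myHost  #PRE\n"
    print(parse_hosts(sample))
```

Comparing the address returned for the Agent's host name with the address the machine actually uses is one way to spot a stale entry before deleting the generated files and restarting the Agent.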

Do the DNS Name and the Computer Name Match? (Windows NT)

Before Release 8.0.4 of the Agent, the NT Agent required the DNS Hostname and the Computer Name to be identical. These parameters can be checked/changed from the following Windows NT Control Panel property sheets.

To verify the computer name:

To verify the DNS Name:

Are the Oracle Net Configuration files correct?

In addition to proper network configuration, which allows nodes in your network to communicate, components of your Oracle environment must also be able to communicate with each other. Oracle Net provides the session and data communication medium between client machines and Oracle servers, or between Oracle servers. For this reason, proper Oracle Net configuration is a prerequisite for Agent communication. This section covers the most common problems that can occur when Agent communication fails.

Oracle Net configuration files are found in either the $TNS_ADMIN location or the $ORACLE_HOME/network/admin directory (both UNIX and NT).

Primary configuration files are:

See Appendix A, "Agent Configuration Files" for information and examples of the above files.

TNS_ADMIN variable usage during Agent Discovery

(UNIX)

All versions of the UNIX discovery script allow the use of the TNS_ADMIN variable to locate input files (listener.ora and tnsnames.ora). Only Agent versions 7.3.4 and above correctly write the output files (snmp_ro.ora and snmp_rw.ora) into TNS_ADMIN, if set.

(Windows NT)

Beginning with version 8.0.5, the discovery script also reads the TNS_ADMIN value from the NT Registry.

The Agent also uses the TNS alias information found in the listener.ora file. The Agent does so even within an Oracle Names environment. This behavior is intentional, since an Oracle Names server may be temporarily unavailable and the Agent needs to be able to resolve names at all times. Check the following to make sure the local translation of the TNS alias takes place:

  1. Verify the listener.ora file has at least one TCP entry configured for the target.

    Do not activate the listener on port 1748, since the Agent listens on this port. (This is the reason you can use TNSPING against the Agent; TNSPING cannot differentiate between a listener and an Agent.)

  2. Ensure that the DNS Host entry is set to the node name in the listener.ora and tnsnames.ora files.
    1. From the Windows NT menu bar, click Start -> Settings -> Control Panel
    2. Double-click on the Network icon
    3. Click on the Protocols tab
    4. Select TCP/IP Protocol and click Properties.
    5. Check the DNS Host entry.

Is Oracle Net functioning properly?

If your Oracle Net configuration is correct and you are still unable to contact the Agent, the next step is to determine whether services in your Oracle Net network can be reached. You can use the TNSPING utility on each database you want to access by entering the following at the command prompt:

tnsping <network service name>

If you can connect successfully from a client to a server (or from a server to a server) using TNSPING, the command will return an estimate of the round trip time (in milliseconds) it takes to reach the Oracle Net service. This indicates Oracle Net is functioning properly.

Pinging the Intelligent Agent

If you want to see if an Agent is up, do this:

$ tnsping "(address=(protocol=tcp)(host=<<hostname>>)(port=1748))"

Testing the Connections to the Agent

You can also use the TNSPING utility to test connections to the Agent:

$ tnsping "(address=(protocol=tcp)(host=<<hostname>>)(port=1748))"

If there is a successful connection, you will see a message similar to the following:

Attempting to contact (address=(protocol=tcp)(host=<<hostname>>)(port=1748)) 
OK (750 msec)

If a host which is not in the invited_nodes list is trying to contact the node, the following error will be shown in the output:

Attempting to contact (address=(protocol=tcp)(host=<<hostname>>)(port=1748)) 
TNS-12547: TNS: lost contact
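
The two outcomes shown above can be told apart mechanically. The following Python sketch builds the address string used in the examples and classifies captured TNSPING output. The function names are hypothetical, and the patterns are based only on the sample messages in this section, so other output formats fall through to "unknown".

```python
import re


def agent_address(hostname, port=1748):
    """Build the address descriptor used to ping an Agent."""
    return f"(address=(protocol=tcp)(host={hostname})(port={port}))"


def classify_tnsping(output):
    """Return ('ok', msec), ('error', 'TNS-nnnnn'), or ('unknown', None)."""
    ok = re.search(r"^OK \((\d+) msec\)", output, re.MULTILINE)
    if ok:
        return ("ok", int(ok.group(1)))
    err = re.search(r"^(TNS-\d+):", output, re.MULTILINE)
    if err:
        return ("error", err.group(1))
    return ("unknown", None)


if __name__ == "__main__":
    sample = "Attempting to contact " + agent_address("myHost") + "\nOK (750 msec)\n"
    print(classify_tnsping(sample))
```

The sketch only parses text. Running TNSPING itself and capturing its output is left to whatever mechanism suits your environment.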

Did the Agent startup successfully?

To check whether the Agent process is running, issue the following command:

agentctl status

If the Agent did not start up, use any of the hints listed in the following table:

Table B-1 Troubleshooting an Agent that Will Not Start

UNIX

  • Check the $ORACLE_HOME/network/log/dbsnmp*.log file for errors.
  • Check the $ORACLE_HOME/network/log/nmiconf.log file for errors.
  • Check that the Oracle user has write permissions to the $ORACLE_HOME/network/log directory.
  • Check snmp_ro.ora, snmp_rw.ora, and services.ora for the entries created by the Agent. The snmp_ro.ora and snmp_rw.ora files are located in the $ORACLE_HOME/network/admin directory, and services.ora is in the $ORACLE_HOME/network/agent directory.
  • Compare the services listed with the services which are available on the machine. See Appendix A for valid sample files. If services are missing, check the following files for inconsistency or corruption:
    • listener.ora
    • tnsnames.ora
    • oratab
  • Check if you have TCP/IP installed. TCP/IP is a requirement. See Is TCP/IP configured and running correctly?
  • If you still do not know why the Agent did not start, turn on tracing. (See Tracing the Intelligent Agent.)

Windows NT

  • Check for messages written to the NT Event Viewer (under Administrative Tools), since this is where the NT Agent writes any problems associated with startup.
  • Check the $ORACLE_HOME/network/log/nmiconf.log file for errors.
  • Check the properties of the Agent Service to verify the OS account used by the Agent (default is 'System'). Check that the Agent user has write permissions to the $ORACLE_HOME/network/log directory.
  • Check if snmp_ro.ora, snmp_rw.ora, and services.ora are created by the Agent on startup. The snmp_ro.ora and snmp_rw.ora files are located in the $ORACLE_HOME\network\admin directory, and services.ora is located in the $ORACLE_HOME\network\agent directory.
  • Compare the services listed with the services which are available on the machine. See Appendix A for valid sample files. If services are missing, check the following files for inconsistency or corruption:
    • listener.ora
    • tnsnames.ora
  • Check if you have TCP/IP installed. TCP/IP is a requirement. See Is TCP/IP configured and running correctly?
  • Check that you DO NOT have a system path variable containing external drives. The Agent is a service and runs by default as SYSTEM. It also needs DLLs from the $ORACLE_HOME/bin directory. If you need external mapped drives in your path, you MUST NOT set them in the SYSTEM path. To set your own path:
    1. Move external mapped drive paths out of the system path variable and into your own.
    2. Reboot to "unset" the system path.
  • Check the $ORACLE_HOME/network/log/AGENTSRVC.log file. This file will show startup errors that occurred when the Agent service was started.
  • If you still do not know why the Agent did not start, turn on tracing. For more information on setting up Agent tracing, see "Tracing the 9i Agent".

For both UNIX and Windows NT systems check:

$ORACLE_HOME/network/log/dbsnmp.nohup

Did the Agent connect to ALL instances on its node?

To test whether an Agent can connect to the database(s) it monitors on a given node, try connecting to each database with the following connect string:

dbsnmp/dbsnmp@address_list 

You must perform this test on the node where the Agent resides.

Is the Agent running with the correct permissions? (UNIX)

To verify whether the Agent has the correct user permissions, see Installing the Intelligent Agent on page 2-2 .

Does the OS user exist and does it have the correct permissions? (Windows NT)

An OS user needs to be specified for the node and must have the following permissions:

Are there errors?

(Windows NT) Check the NT EVENT VIEWER -> APPLICATIONS -> LOG for any errors starting the DBSNMP process.

(Windows NT and UNIX) Check the $ORACLE_HOME/network/log/nmiconf.log file for discovery errors.

For both UNIX and Windows NT systems check the following file for additional errors:

$ORACLE_HOME/network/log/dbsnmp.nohup 

Intelligent Agent Error Messages and Resolutions

The following error messages and resolutions are categorized by operating system. Situations that apply to all systems are listed under "Generic Agent."

Generic Agent

'Failed to authenticate user' error when running a job

In order for the Agent to execute jobs on a managed node, the following conditions must be met:

'Login denied', 'Invalid username/password' messages in trace files

This usually happens if you have a database earlier than release 7.3.3 on the machine. From release 7.3.3 onwards, a script called CATSNMP.SQL is included in the CATALOG.SQL dictionary script. This script is responsible for creating the DBSNMP user the Agent needs to connect. Older databases do not have this script.

Verify if the user 'DBSNMP' exists. If not, run the catsnmp.sql script.


Important:

When installing a 9i Agent on a machine running a pre-9i database, you must re-run a version specific copy of catsnmp.sql which is located in the ORACLE_HOME/sysman/admin directory of the 9i Enterprise Manager client install. For example, if you are running Oracle 8.1.7 on a machine and then install a 9i Agent, you must re-run the catsnmp_8i.sql script after installing the new Agent. This operation must be performed for each pre-9i database serviced by this Agent.

Do not run the 9i version of the catsnmp.sql script against pre-9i databases.


'ORACLE_HOME does not exist' when starting the Agent

This message comes from the discovery script, nmiconf.tcl. Make sure you have $ORACLE_HOME environment variable set to the ORACLE_HOME of the Agent and re-start the Agent.

The Agent is only finding one database on a certain node

If you have more than one database on a single node, then you need to make sure that each instance has a unique database name by specifying one of the following:

No snmp_ro.ora and snmp_rw.ora are generated.

This error can occur if the Agent cannot write to $ORACLE_HOME\network\admin. Refer to the $ORACLE_HOME\network\log\nmiconf.log for errors. For more information on Agent startup problems, see "Did the Agent startup successfully?".

Not all services are discovered.

Check the services.ora file to determine which services have been discovered.

All the database services the Agent finds on a machine must be defined in the relevant SQL*Net/Oracle Net configuration files. If the services are not defined, service discovery will fail and, in the worst case, the Agent will hang or return errors.

Windows NT: Beginning with version 8.0.4, the Agent searches for service names that begin with 'OracleService' or 'OracleService<SID>'. Every entry beginning with 'OracleService' is considered to be a database running on this machine. Every SID encountered by the Agent must be defined in the relevant SQL*Net/Oracle Net files.

UNIX: The oratab file is used to determine which SIDs are present. For 7.3.3 Agents and earlier, discovery fails if it encounters a SID that is not accurate (like in a Developer 2000 environment). To work around this problem, the environment variable $ORATAB can be used to access an alternate oratab file which contains only the databases you wish the Agent to see.

For the remaining databases, check the oratab file and the SQL*Net/Oracle Net files to see that these files exist and that all definitions are present. Make sure that all of the databases are listed in the listener.ora file. For more information, see "Are the Oracle Net Configuration files correct?" and "Is Oracle Net functioning properly?".
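
When preparing an alternate oratab file for the $ORATAB workaround, it can help to list the SIDs an oratab file declares. The Python sketch below does this for oratab-style text. It assumes the usual colon-separated SID:ORACLE_HOME:Y|N entry layout with # comments; the function name oratab_entries and the sample paths are hypothetical.

```python
def oratab_entries(text):
    """Return (sid, oracle_home) pairs from oratab-style text."""
    entries = []
    for raw in text.splitlines():
        # Ignore comments and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(":")
        # A usable entry needs at least a non-empty SID and a home directory.
        if len(fields) >= 2 and fields[0]:
            entries.append((fields[0], fields[1]))
    return entries


if __name__ == "__main__":
    sample = (
        "# sample oratab\n"
        "PROD:/u01/app/oracle/product/9.2.0:Y\n"
        "TEST:/u01/app/oracle/product/8.1.7:N\n"
    )
    for sid, home in oratab_entries(sample):
        print(sid, home)
```

Each SID reported this way should also have a matching definition in the SQL*Net/Oracle Net files; any SID without one is a candidate for removal from the alternate oratab file.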

'Invalid service name' while registering a job or event. (Agent-specific error)

OR

'File operation error' while registering a job or event. (OS-specific error)

These errors are usually seen when the services on the Enterprise Manager Console and the services discovered by the Agent are out of sync. For example, if you have an event registered against TESTDB and someone changes the name of the database to PRODDB, the Agent and the Console are out of sync.

To fix this, start by removing all job and event registrations from this service and dropping the node where the services exist from the console. Rediscover the node from the console using the auto-discovery wizard.

NOTE: With 7.3.2, the aliases are case sensitive.

If you have an NT Agent, please refer to 'Invalid service name' while registering a job or event.

'Oralogin failed in orlon'

You may receive this error while executing a TCL script using the oratcl verb oralogon. "Oralogin failed in orlon" means that the connect string is wrong or that, for some reason, the account used cannot log on to the database. To debug this error, turn on Tcl tracing.

ORA-1017 'Invalid username/password'

Invalid username/password errors may occur when starting up the Agent on UNIX systems from an X-terminal. This problem can occur because the Data Gatherer (pre-9i) cannot connect to the Capacity Planner repository to upload collected data. This message will repeat every couple of minutes.

NT Agent

For any NT Operating System Error when starting the Agent

See Oracle Intelligent Agent - Windows Event Log Messages for information on Windows NT-Agent error message cross-referencing.

In order to debug the Agent after you have received an OS error, follow these steps:

'Failed to connect to Agent' error.
(Jobs that remain in submitted status)

There are in fact two hostname definitions on NT. One is the NetBios name, used for NT's internal Named Pipes protocol, which is always installed. The other is the TCP/IP hostname, which is configurable only when you install TCP/IP on NT.

To find the NT NetBios hostname:

To find the TCP/IP hostname:

On an NT server, you can 'ping' the two names, even if they are configured differently. Other clients, however, only 'ping' real TCP/IP hostnames. If the Agent is using local IPC connections, it uses Named Pipes, thus it uses the NetBios name. All external connections will use the TCP/IP name.

A mismatch in these names leads to 'unable to contact Agent', or forever pending jobs in the console. Therefore, make sure that the NetBios and the TCP/IP hostname are identical.

Receive the error failed -> 'output from job lost' while running job.

The Windows NT user that you created for the Agent (see Agent Configuration, Configuration Guide) needs read/write permissions to the $ORACLE_HOME\network\agent directory (and the TEMP directory, for some applications) and read permissions to the SYSTEM32 directory.

Verify that the NT user has these permissions.

UNIX Agent

Discovery fails with no services at all

First check that all of the SQL*Net/Oracle Net files are present and correctly defined. You can then debug discovery by editing your oratab file so that it contains only a valid SID with a listener running. After you get this working, you can add the remaining entries back to the oratab file one at a time to see which entry is causing the problem.

Check the $ORACLE_HOME/network/log/nmiconf.log files for errors.

NMS-0308 : 'Failed to listen on address : another Agent may be running'.

There are two possible causes for this error:

  1. If two Agents are installed on a machine, in two different ORACLE_HOME directories, you see this message when you try to start the second Agent. This is because both Agents try to listen on the same default port, 1748.

    Run only one Agent on a machine.

  2. Port 1748, where the Agent listens, is being used by another process, or has not been released by a dead process that was formerly using it (unfortunately, a common problem on SUN).

To confirm that the port is being used by another process:

  1. Use this command in UNIX
    netstat -a | grep 1748 
    
    

    If any line of the output ends in "LISTENING", the port is in use.

  2. If both of the following are true:
    • netstat -a | grep 1748 returns a line ending in "LISTENING"
    • agentctl status agent returns "The db subagent is not started."

    then stop the leftover processes and restart the Agent:
    • ps -ef | grep dbsnmp
    • kill -9 ______ (fill in process numbers)
    • Restart the Agent with agentctl start agent

If the Agent still fails to start, go through the steps again, but before restarting the Agent, do this.

This will re-start the Agent and remove all of the job and event queues (.q files) it was using in the past.

If all else fails, re-booting the machine will free up the port.
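
As an alternative to reading netstat output, the port check can be sketched in Python by trying to bind a listener to the port. This is only an illustration under stated assumptions: the function name port_is_free is hypothetical, the check must run on the Agent's own machine, and a successful bind only shows that the port was free at that instant.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to the port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # Binding fails with OSError when another process holds the port.
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    agent_port = 1748  # default Agent port named in this section
    state = "free" if port_is_free(agent_port) else "in use"
    print(f"port {agent_port} is {state}")
```

If the port is reported as in use while agentctl reports that the Agent is not started, the leftover dbsnmp processes described above are the likely holders.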

NMS-001 while starting the Agent (SNMP environments only)

This message indicates that the SNMP Master Agent (the process on UNIX that controls the SNMP protocol) could not be contacted. By default the Agent listens and works over SQL*Net or Oracle Net, but the Agent can also work over SNMP on UNIX systems.

This message can safely be ignored unless you are trying to communicate with a Master Agent.

NMS-00207 Agent xxxx user account is locked for database yyyy

Events registered with the Agent for monitoring a 9i version of the database will not work because the database account is locked.

Under these conditions, an Enterprise Manager database up-down event will always indicate that the database is down. The Agent's log file dbsnmp.log will contain a NMS-00207 error message indicating the dbsnmp user account for the database is locked.

To resolve this problem, you must log into the database and perform the following:

  1. Unlock the "dbsnmp" account by running the SQL statement:
    ALTER USER dbsnmp ACCOUNT UNLOCK;
    
  2. Reset the password for the dbsnmp account by running the SQL statement:
    ALTER USER dbsnmp IDENTIFIED BY <password>;
    
  3. Add the reset password to the Agent configuration file snmp_rw.ora as follows:
    SNMP.CONNECT.<service_name>.PASSWORD=<password>
    

    where service_name is the name of the seed database as discovered by the Agent in snmp_ro.ora/snmp_rw.ora.

  4. Stop and start the Agent using agentctl.

Run the catsnmp.sql script for that database with either the SYS or INTERNAL accounts.

NMS-205 Failure to connect to database name with username/password string

The 'dbsnmp' user could not be located.

Run the catsnmp.sql script for that database with either the SYS or INTERNAL accounts.

NMS-351 Encryption key supplied is not the one used to encrypt the file

This happens if there are mismatches between the IDs in the '*.q' files in the $ORACLE_HOME/network/agent directory. This condition can be caused by the following

Delete all the '*.q' files in the $ORACLE_HOME/network/agent directory. Rebuild your repository. Restart the Agent.

Tracing the 9i Agent

Tracing and logging of the Intelligent Agent allows tracking of all communication between the Intelligent Agent and Management Server(s) as well as Agent startup and discovery information. To turn on tracing for the 9i Intelligent Agent, you will need to modify the Agent's snmp_rw.ora file. This file is normally in the $ORACLE_HOME\network\admin directory. The snmp_rw.ora is created the first time the Agent process is started. If the file is not created and you need to trace the startup process, manually create a text file and add the necessary tracing parameters to the file.

The log file, $ORACLE_HOME/network/log/dbsnmp.log, is written by the Agent on every startup, even if tracing is not turned on. It contains the name and version of the Agent and the name and location of the Agent's configuration files. If tracing is turned on, it also contains problems encountered with the database and listener connections.

The log file, $ORACLE_HOME/network/log/nmiconf.log, is created upon first start up of the Agent and appended upon subsequent Agent startups. The auto discovery is done by the Tcl script, nmiconf.tcl (hence, the log file name). This file is written to only during startup.

$ORACLE_HOME/agentbin/ORATCLSH is a special-purpose TCL shell that supports all standard TCL verbs (supported in TCL82) plus a large subset (not all) of the ORATCL verbs supported by the Intelligent Agent. ORATCLSH is not a general purpose utility and may only be used in combination with the Intelligent Agent as it depends on files and data structures maintained by the Agent.

There is no documentation of ORATCLSH as it has never been part of the supported feature set of the Intelligent Agent. It is provided strictly as a debugging tool to help Oracle customers and developers in developing Enterprise Manager job and event scripts. The executable ORATCLSH is provided for debugging your TCL scripts. Before executing ORATCLSH, set the environment variable TCL_LIBRARY to point to $ORACLE_HOME/network/agent/tcl, the location of the init.tcl file.

By default the following log files are created under the Agent's ORACLE_HOME/network/log directory:

Setting various tracing and logging parameters in the snmp_rw.ora file allows you to monitor the following areas:

The following tables organize the tracing and logging parameters according to their respective functional areas.

Parameters Used to Trace the Agent (dbsnmp) Process

Table B-2 Parameters Used to Trace the Agent (dbsnmp) Process
Parameter Description

dbsnmp.trace_level = <OFF|USER|ADMIN|nn>

This parameter is used to turn tracing on and specify the amount of data that is collected (trace level). A trace level of 16 provides the most detailed level of information. This parameter must be in lower case. Trace levels are identical to those used by SQL*Net or Oracle Net.

dbsnmp.trace_file= <filename>

The default trace file is dbsnmp.trc. If you want the trace to be written to a different file, add the dbsnmp.trace_file parameter.

dbsnmp.trace_directory=<directory>

The default trace directory is $ORACLE_HOME/network/trace. If you want to change the location of the trace file, add the dbsnmp.trace_directory parameter.

dbsnmp.trace_filecnt=<integer>

This parameter defines the maximum number of trace files to be generated.

dbsnmp.trace_filesize=<integer in kilobytes>

This parameter defines the maximum size of a trace file in kilobytes.

dbsnmp.trace_unique={true/false}

This parameter generates a unique trace file each time if set to true.

dbsnmp.trace_timestamp={true/false}

This parameter determines whether to put a timestamp before each line of trace.

dbsnmp.log_directory=<directory>

The default log directory is $ORACLE_HOME\network\log. If you want to change the location, you can add the dbsnmp.log_directory parameter. However, the Agent must have write privileges to the directory that you specify.

dbsnmp.log_file=<filename>

The default log file is called dbsnmp.log. If you want to change the name, you can add the dbsnmp.log_file parameter. This log is written by the Agent on every startup, even if tracing is not turned on. It contains the name and version of the Agent and the name and location of the Agent's configuration files. If tracing is turned on, it also contains problems encountered with the database and listener connections.

dbsnmp.log_unique={true/false}

This parameter will cause the Agent to create a unique log file each time the Agent is started.

Table B-3 Parameters Used to Trace the Job System
Parameter Description

dbsnmpj.trace_level={OFF|USER|ADMIN|nn}

This parameter is used to turn tracing on for the job system and also specifies the amount of data that is collected (trace level). A trace level of 16 provides the most detailed level of information. This parameter must be in lower case.

dbsnmpj.trace_directory=<directory>

The default trace directory is $ORACLE_HOME/network/trace. If you want to change the location of the trace file, add the dbsnmpj.trace_directory parameter.

dbsnmpj.trace_file=<filename>

The default trace file is dbsnmpj.trc. If you want the trace to be written to a different file, add the dbsnmpj.trace_file parameter.

dbsnmpj.trace_filecnt=<integer>

This parameter defines the maximum number of trace files to be generated.

dbsnmpj.trace_filesize=<integer in kilobytes>

This parameter defines the maximum size of a trace file in kilobytes.

dbsnmpj.log_directory=<directory>

The default log directory is $ORACLE_HOME\network\log. If you want to change the location, you can add the dbsnmpj.log_directory parameter. However, the Agent must have write privileges to the directory that you specify.

dbsnmpj.log_file=<filename>

The default log file is called dbsnmpj.log.

dbsnmpj.log_unique={true/false}

This parameter will cause the Agent to create a unique log file each time a job is run.

Table B-4 Parameters Used to Trace Agent Startup
Parameter Description

agentctl.trace_level={OFF|USER|ADMIN|nn}

This parameter is used to turn the tracing on for the agentctl utility (starting the agent). If a problem is occurring, Oracle Support will normally need a trace at level 16.

agentctl.trace_file=<filename>

The default trace file is agentctl.trc. If you want the trace to be written to a different file, add the agentctl.trace_file parameter.

agentctl.trace_directory=<directory>

The default trace directory is $ORACLE_HOME/network/trace. If you want to change the location of the trace file, add the agentctl.trace_directory parameter.

agentctl.trace_filecnt=<integer>

This parameter defines the maximum number of trace files to be generated.

agentctl.trace_filesize=<integer in kilobytes>

This parameter defines the maximum size of a trace file in kilobytes.

agentctl.trace_unique={true/false}

This parameter generates a unique trace file each time if set to true.

agentctl.trace_timestamp={true/false}

This parameter determines whether to put a timestamp before each line of trace.
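
As an illustration of how the parameters in the tables above fit together, the following minimal Python sketch renders a few tracing lines suitable for pasting into snmp_rw.ora. The helper name render_trace_params is hypothetical and not part of any Oracle tool. The sketch assumes numeric trace levels run from 0 to 16 (the tables only state that 16 is the most detailed) and emits parameter names in lower case, as the tables require.

```python
# Hypothetical helper (not an Oracle utility): builds snmp_rw.ora tracing lines.
KNOWN_PREFIXES = ("dbsnmp", "dbsnmpj", "agentctl")  # prefixes used in Tables B-2 to B-4
NAMED_LEVELS = ("OFF", "USER", "ADMIN")


def render_trace_params(prefix, level, directory=None, trace_file=None):
    """Return a list of snmp_rw.ora lines that configure tracing for one component."""
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unknown component prefix: {prefix!r}")
    if isinstance(level, int):
        # Assumption: numeric levels are 0..16, with 16 the most detailed.
        if not 0 <= level <= 16:
            raise ValueError("numeric trace level is assumed to be in 0..16")
        level_text = str(level)
    elif level in NAMED_LEVELS:
        level_text = level
    else:
        raise ValueError(f"unsupported trace level: {level!r}")
    lines = [f"{prefix}.trace_level={level_text}"]
    if directory is not None:
        lines.append(f"{prefix}.trace_directory={directory}")
    if trace_file is not None:
        lines.append(f"{prefix}.trace_file={trace_file}")
    return lines
```

Calling render_trace_params("dbsnmp", 16, directory="/tmp/trace") yields the two lines dbsnmp.trace_level=16 and dbsnmp.trace_directory=/tmp/trace. The Agent must still be restarted for the settings to take effect.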

Tracing the Data Collection Services

Because the data collection service (formerly the Data Gatherer) functionality has been integrated into the 9i Intelligent Agent, data collection-based tracing can be turned on using one of the following procedures (according to platform).

On UNIX:

  1. Set the VP_DEBUG environment variable to one:
    >setenv VP_DEBUG 1
    
    
  2. Start the agent:
    > agentctl start agent
    
    

    Any collection activity will be logged in $ORACLE_HOME/network/log/dbsnmp.nohup.

On Windows NT (from a DOS window):

  1. Set the VP_DEBUG environment variable to one:
    >set VP_DEBUG=1
    
    
  2. Start the Agent and redirect the output to a text file:
    > dbsnmp -agent_name Oracleora920Agent > stdout.log 2> odg.log
    

On Windows 2000 (from a DOS window):

  1. Set the VP_DEBUG environment variable to one:
    >set VP_DEBUG=1
    
    
  2. Start the Agent and redirect the output to a text file:
    > dbsnmp -agentname Oracleora920Agent > stdout.log 2> odg.log
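
The procedures above all follow the same pattern: set VP_DEBUG to 1 in the environment, start the process, and capture its output. The following Python sketch shows that pattern in a platform-neutral way. The function name run_with_vp_debug is hypothetical, and the command to run is supplied by the caller; nothing here is an Oracle-provided interface.

```python
import os
import subprocess


def run_with_vp_debug(argv, stdout_path, stderr_path):
    """Run argv with VP_DEBUG=1, writing stdout and stderr to separate files."""
    env = dict(os.environ)   # copy, so the caller's environment is left untouched
    env["VP_DEBUG"] = "1"
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(argv, env=env, stdout=out, stderr=err)
    return completed.returncode
```

For example, passing the dbsnmp command line shown above as a list of arguments, together with "stdout.log" and "odg.log", would reproduce the redirection performed in the DOS window.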
    

Multiple ORACLE_HOMEs

If there are multiple ORACLE_HOMEs on the same machine, perform the following.

  1. Navigate to the specific ORACLE_HOME/bin directory of the Agent.
  2. Set the VP_DEBUG environment variable to one.
    On UNIX:
    > setenv VP_DEBUG 1
    On Windows:
    e:/920/bin/ > set VP_DEBUG=1
    
    
  3. Start the Agent
    > agentctl start agent
    

Tracing the Event System (tracing Tcl)

You can also turn on Event tracing by setting the dbsnmp.trace_level parameter in the snmp_rw.ora file to a level greater than or equal to one (dbsnmp.trace_level >= 1). You must shut down and re-start the Agent for these parameters to take effect. Tcl tracing creates a file, oratcl.trc, in the ORACLE_HOME/network/trace directory. Every time an event is triggered, an entry is added to the oratcl.trc file.

Interpreting Intelligent Agent Startup Errors on Windows NT/2000

When the Oracle Intelligent Agent service fails, or fails to start, with a pre-determined error code, it calls the Windows ReportEvent function to write an entry to the event log. Windows NT then passes the parameters to the event-logging service, which in turn uses the information to write a log record to the event log. Other errors are reported to the nmi.log and nmiconf.log files.

When the Windows NT event viewer application starts it uses the OpenEventLog function to open the event log for an event source. The event viewer can then use the ReadEventLog function to read event records from the log. ReadEventLog returns a buffer containing an EVENTLOGRECORD structure and additional information that describes a logged event.

Oracle Intelligent Agent - Windows Event Log Messages

When the Intelligent Agent service fails to start, the Windows Service Manager will return the underlying error code, but because it is not able to interpret the Oracle Event Message it incorrectly returns the Windows NT Win32 message text. The correct message, however, will appear in the NT event log.

The following table defines the events that the Intelligent Agent displays in the Windows NT Event viewer and the associated Win32 error text:

Table B-5 MS Windows-Intelligent Agent Error Message Translation
ID Windows NT/2000 Message Agent Description

1

Incorrect Function

OracleAgent failed to register its Service Control Handler

2

The system cannot find the file specified

OracleAgent failed to report its status to the Service Control Manager

3

The system cannot find the path specified

OracleAgent failed to create a thread synchronization object.

4

The system cannot open the file

OracleAgent failed to create a thread.

5

Access is denied

OracleAgent failed to allocate memory.

6

The handle is invalid

OracleAgent failed to get the encryption key.

7

The storage control blocks were destroyed

OracleAgent failed to run the auto-discovery script nmiconf.tcl. Look in nmiconf.log for more information.

The Intelligent Agent was unable to run the Auto Discovery process. This can be either due to invalid configuration files, a problem with the TCP/IP layer, or an error in the TCL libraries. Verify the SQL*Net or Oracle Net configuration files, and make sure that you can make normal loopback connections via SQL*Net or Oracle Net from the server to the databases.

If the problem persists, reinstall the Intelligent Agent software.

8

Not enough storage is available to process this command

OracleAgent failed to initialize Oracle CORE library

9

The storage control block address is invalid

OracleAgent failed to initialize Oracle NLS library.

The NLS environment and/or registry variables are not correct. Check the system for variables beginning with 'NLS'. Try to make a connection to a database using a standard tool like SQL*Plus or SQL*Worksheet, to verify the NLS client settings. If the problem persists, reinstall the Required Support Files software to correct this error.

10

The environment is incorrect

OracleAgent failed to initialize Oracle SQL*Net or Oracle Net library:%1.

The SQL*Net or Oracle Net environment and/or registry variables are not correct. Try to make SQL*Net or Oracle Net connections using a standard tool, like SQL*Plus or SQL*Plus Worksheet. If the problem persists, reinstall the SQL*Net or Oracle Net client (and server if a listener is running as well) software to correct this error.

11

An attempt was made to load a program with an incorrect format

OracleAgent failed to initialize DES encryption

12

The access code is invalid

OracleAgent failed to initialize Oracle Remote Operations Library

14

Not enough storage is available to complete this operation

OracleAgent failed to create file dbsnmp.ver

15

The system cannot find the drive specified

OracleAgent failed to create/read its queue files.

The hostname of the machine is encrypted in the agent '.q' files. If this hostname found in the files by the agent does not match the name of the machine the agent is running on, this error is generated. Remove the agent '.q' files and restart the agent.

16

The directory cannot be removed

OracleAgent failed to create job scheduling symbol table

17

The system cannot move the file to a different disk drive

OracleAgent failed to initialize its connection cache.

There is an incompatibility between the TCP hostname and the NetBios hostname of the machine. With the Network applet from the Control Panel, synchronize the names of the machine in the protocol properties dialogs.

18

There are no more files

OracleAgent failed to sign on to SNMP.

19

The media is write protected

OracleAgent failed to read SNMP indexes from parameter file.

20

The system cannot find the device specified

OracleAgent failed to connect to the database.

Using the information from the SQL*Net or Oracle Net configuration files, the agent was not capable of making a connection to the desired databases. Most likely a protocol error, like TCP/IP conflicts or bad TCP port being referenced in the configuration files. Check the agent log files for the specific connection errors.

21

The drive is not ready

OracleAgent failed to build SNMP cache.

22

The device does not recognize the command.

OracleAgent failed to build MIB.

There are files in the $ORACLE_HOME\network\agent\mib directory which do not comply to the MIB specifications. Verify all files in that directory. Remove all non-MIB files, or corrupt MIB files.

23

Data error (cyclic redundancy check)

OracleAgent failed to register MIB row.

24

The program issued a command but the command length is incorrect

OracleAgent failed to restart its communication thread.

25

The drive cannot locate a specific area or track on the disk

OracleAgent failed to listen on designated port. Another OracleAgent may already be started.

There was an internal error using the encryption key used in the '.Q' files. This can either be a TCP/IP conflict on the machine, or a lack of system resources on the machine preventing a proper initialization of the Intelligent Agent routines. Check the system resources, and verify if the TCP/IP setup is correct on the machine.

The following errors are generated when the Windows NT program managing the services can no longer control, or loses control over, the service.

1067

The process terminated unexpectedly

Most likely a corruption in the DLLs, a problem with system settings, etc.

Check for Dr. Watson logs, errors in the event viewer and messages in the log files to get more details on the exact condition.

2140

An internal Windows NT error occurred

The services program received conflicting information from the service. Most likely, an abnormal termination of the Intelligent Agent, due to conflicting system information discovered. More information about the abort of the Intelligent Agent can be found in the discovery log files.

2186

The service is not responding to the control function

The Intelligent Agent failed to report its status to the SERVICES program monitoring the services.

If you cannot stop the Agent service, use the 'KILL' command from the command line to stop the agent process. The 'KILL' command is part of the optional MS Resource Kit, and not a standard NT tool.
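
Because the Service Manager shows the generic Win32 text rather than the Agent's own message, a small lookup keyed on the error ID can save a trip to the table. The following Python sketch simply transcribes the Agent descriptions from Table B-5; the names AGENT_EVENT_MESSAGES and agent_meaning are hypothetical and exist only for this illustration.

```python
# Agent descriptions transcribed from Table B-5 (ID 13 is not listed in the table).
AGENT_EVENT_MESSAGES = {
    1: "OracleAgent failed to register its Service Control Handler",
    2: "OracleAgent failed to report its status to the Service Control Manager",
    3: "OracleAgent failed to create a thread synchronization object",
    4: "OracleAgent failed to create a thread",
    5: "OracleAgent failed to allocate memory",
    6: "OracleAgent failed to get the encryption key",
    7: "OracleAgent failed to run auto-discovery script nmiconf.tcl",
    8: "OracleAgent failed to initialize Oracle CORE library",
    9: "OracleAgent failed to initialize Oracle NLS library",
    10: "OracleAgent failed to initialize Oracle SQL*Net or Oracle Net library",
    11: "OracleAgent failed to initialize DES encryption",
    12: "OracleAgent failed to initialize Oracle Remote Operations Library",
    14: "OracleAgent failed to create file dbsnmp.ver",
    15: "OracleAgent failed to create/read its queue files",
    16: "OracleAgent failed to create job scheduling symbol table",
    17: "OracleAgent failed to initialize its connection cache",
    18: "OracleAgent failed to sign on to SNMP",
    19: "OracleAgent failed to read SNMP indexes from parameter file",
    20: "OracleAgent failed to connect to the database",
    21: "OracleAgent failed to build SNMP cache",
    22: "OracleAgent failed to build MIB",
    23: "OracleAgent failed to register MIB row",
    24: "OracleAgent failed to restart its communication thread",
    25: "OracleAgent failed to listen on designated port",
}


def agent_meaning(error_id):
    """Translate a service start error ID into the Agent's own description."""
    return AGENT_EVENT_MESSAGES.get(
        error_id, "no Agent-specific message listed in Table B-5"
    )
```

For instance, a start failure reported as "Access is denied" (ID 5) corresponds to agent_meaning(5), which returns the memory allocation message rather than a permissions problem.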

Typical Intelligent Agent Configuration Issues on UNIX

The Intelligent Agent software is delivered with the RDBMS server software. However, this does not mean that the Intelligent Agent software must be installed together with the database. It is quite possible to install the Agent alone, in a dedicated "$ORACLE_HOME", separate from the rest of the Oracle software.

The only thing the Intelligent Agent needs is SQL*Net or Oracle Net, to make connections to the databases it needs to monitor, and to be able to communicate with the rest of the Enterprise Manager framework. The SQL*Net or Oracle Net product, and the underlying 'Common Libraries' are products that are installed automatically whenever the Agent is installed on UNIX.

It is not necessary to have a SQL*Net listener running in the same "$ORACLE_HOME" as the Agent, and it also does not have to be of the same base version as the Agent.

Linking the Intelligent Agent


Important:

Be sure to shut down the Agent BEFORE relinking the software.


As soon as the software is installed from the installation media, the Intelligent Agent is relinked using the current system libraries present on the system.

If a change is made to the UNIX kernel, or the system libraries, it is advised to relink the Intelligent Agent, using the following commands:

$ cd $ORACLE_HOME/network/lib
$ make -f ins_oemagent.mk install

This will create two new executables: dbsnmp and oratclsh, both created in the "$ORACLE_HOME/bin" directory. If a version of these executables already exists, the old ones will be renamed to "dbsnmp0" and "oratclsh0". As soon as the new software has been tested, these safety copies can be removed.

Relinking database software, or making changes to the installed SQL*Net protocol drivers, requires a shutdown of all databases of that version!

As soon as the agent is relinked however, either the "root.sh" file needs to be executed again, or a manual intervention is needed to adjust the dbsnmp executable.

If the "root.sh" file is meanwhile adjusted or overwritten by a newer file, the following commands have to be executed manually, as the 'root' user:

$ cd $ORACLE_HOME/bin
$ chmod 6751 dbsnmp 
$ chown root dbsnmp

These steps are essential for the proper working of the agent. If the agent does not have the 'setuid' permissions (given by the chmod '6751' command), or is not owned by root ('chown root' command), the discovery can fail, and jobs will not get executed properly.

Also, if the agent has already been started on the machine, some of its files are owned by 'root'. After a relink that returns ownership of the executable to the Oracle owner, the agent can then fail to start or can update the wrong files.
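
The ownership and permission requirements described above can be checked programmatically. The following Python sketch compares a file's mode and owner against the values set by the chmod 6751 and chown root commands. The function name dbsnmp_permission_problems is hypothetical; it takes the st_mode and st_uid fields that os.stat() returns, so it can be exercised without root access.

```python
import stat

EXPECTED_MODE = 0o6751  # setuid + setgid, then rwxr-x--x, as set by 'chmod 6751'


def dbsnmp_permission_problems(st_mode, st_uid):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    mode = stat.S_IMODE(st_mode)  # permission bits, including setuid/setgid
    if mode != EXPECTED_MODE:
        problems.append(f"mode is {mode:o}, expected {EXPECTED_MODE:o}")
    if not st_mode & stat.S_ISUID:
        problems.append("setuid bit is not set")
    if st_uid != 0:  # uid 0 is root
        problems.append("file is not owned by root")
    return problems

# Typical use (path is an example):
#   info = os.stat(os.path.join(oracle_home, "bin", "dbsnmp"))
#   print(dbsnmp_permission_problems(info.st_mode, info.st_uid))
```

An empty result only confirms the file attributes; it does not prove that discovery or job execution will succeed.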

Running the Intelligent Agent

Whenever a program runs on UNIX with 'root' permissions, the environment variable "LD_LIBRARY_PATH" is not read, for security reasons. This means that all shared libraries linked dynamically to the executable must be referenced using their absolute location on the disk.

You can check the linked shared libraries using the 'ldd' command. For example:

$ ldd dbsnmp 

To avoid problems with the shared libraries, there are three options:

If an error is generated when the agent starts, perform the following:

  1. Relink the agent:
    $ make -f ins_oemagent.mk install
    
    
  2. Change the owner and permissions of the executable:
    $ chmod 6751 dbsnmp
    $ chown root dbsnmp
    

Understanding and Troubleshooting the Data Gatherer


Important:

This section only applies to pre-9i Intelligent Agents. The data collection services provided by the Data Gatherer are now integrated with the 9.x Intelligent Agent.


The Oracle Data Gatherer is a daemon process which manages the collection of performance statistics from the Oracle database and from the host operating system for use by Enterprise Manager tools, such as the Oracle Performance Manager and the Oracle Capacity Planner. As mentioned above, this functionality is now an integral part of 9.x Agent. This section only applies to older versions of the Agent/Data Gatherer.

Data Gatherer Configuration

The Data Gatherer collects the following types of data:

You may not be able to collect operating system data if the Data Gatherer is not available for that particular operating system. To collect operating system data the Data Gatherer must be installed on the same host as the OS. It may not be possible to install and configure the Oracle Data Gatherer on a particular host if both of the following requirements are not met:

Using the Data Gatherer when Installed on Server Host

This is the assumed configuration for using the Oracle Data Gatherer. The Data Gatherer is installed separately as part of the database server installation. In 8.0.5 or higher, the Data Gatherer is installed along with the Intelligent Agent (but not an integral part of the Agent, as is the case for 9.x versions of the Agent). The Data Gatherer can be used to monitor database statistics for any database on that host, and also can be used to monitor OS statistics for the host itself.

Using the Data Gatherer when Installed on Client Host

It is possible to install the Data Gatherer on the same host where the client will be run from, provided the client is Windows NT 4.0 (this is not supported on Windows 95 or Windows 98). In this configuration, you will be able to monitor the database statistics for remote databases, but will not be able to monitor the operating system statistics on the remote host.

Using the Data Gatherer on an Alternative Port

The Data Gatherer and clients are installed and run assuming the use of a well-known (IANA registered) port (1808), which is used for communication between clients and the Data Gatherer server. If you wish to use an alternate port, you may do so; however, this configuration is not supported.

Using Multiple Oracle Homes

It is possible to install the Oracle Data Gatherer in an environment with multiple Oracle homes; however, there are two issues to keep in mind if you attempt to do this:

Data Gatherer Recovery Capabilities

Data Gatherer Restart at Host Restart

The Data Gatherer is configured to save the state of all current historical collections, such that when it restarts it will create recovery threads to restart these collections from the state files. If the Data Gatherer is configured to start automatically when the system reboots, then collections should be able to continue.

Collecting Historical Data from a Cycling Database

If the database from which data is being collected is cycled (for example, shut down each evening for backups), the Data Gatherer is designed to continue collecting data from the database when it restarts. The Data Gatherer attempts to reconnect to the database at the specified collection interval until it becomes available.

Miscellaneous Data Gatherer Issues

This status check will result in one of two messages:

The Oracle Data Gatherer is running!

-OR-

The Oracle Data Gatherer is not running

Clean Starting the Intelligent Agent

Clean starting the Agent involves clearing all existing job and event definitions. This should only be necessary when the Enterprise Manager environment needs to be reinitialized, or upon specific request from Customer Support. Actions will need to be performed from both the Console and the Agent node.

To clean start the Agent:

  1. From the Enterprise Manager Console:
    • Remove all jobs and events registered against the machine the agent is running on.
    • As soon as all events and jobs have been removed, shut down the Console.
  2. Stop the Intelligent Agent currently running on the desired target.

    On UNIX, issue the following command:

    $ agentctl stop agent
    

    After the stop command has been issued, use the 'ps' command to verify that the 'dbsnmp' processes have been stopped. If the Intelligent Agent cannot be stopped, use the 'ps' command to obtain the process ID's of the dbsnmp processes, and use the 'kill -9' command to terminate these processes.

    On Windows NT, use the Control Panel / Services applet to turn the Agent service off, or issue the command line option:

    C:\> agentctl stop agent 
    

    When the agent is stopped, use the TaskManager to verify that the 'dbsnmp' process has been stopped.

  3. Go to the "$ORACLE_HOME/network/agent" directory and delete the following files:
    • SERVICES.ORA: File containing all the services the agent has discovered. This file will be recreated the next time the agent starts.
    • All files with the '.q' extension: These files are the binary representation of all registered and running jobs and events. The agent creates new 'q' files on startup when they do not exist.

    Information about the host on which the agent is running is also stored in these files. If the name or the IP address of the agent machine changes, these files need to be recreated.

    • All files with the '.inp' extension: These are the parameter files for registered jobs.
    • All files with the '.jou' extension: These are journal files with information the agent will use to execute jobs.
    • All files with the 'out' prefix: These are temporary output files that are generated when jobs are being executed. These files should never remain in this directory after a job finishes.
    • All files with the 'tcl' prefix: These are Tcl template scripts, including the specific commands, specified in the job sent to the agent.
    • DBSNMP.VER : This is a text file with the agent version information. This file will also be recreated when the agent starts again.
  4. Go to the "$ORACLE_HOME/network/admin" directory. In this directory, remove the following files:
    • SNMP_RO.ORA : The read-only information the agent has discovered of the services on the machine.

      This file should never be edited by a user. During startup, this file is read if it exists, and then recreated again with the new discovery information.

    • SNMP_RW.ORA : The read-write information the agent has discovered of the services on the machine.

      Some of the information in this file can be edited by an administrator to provide more info about the discovered services.

      Upon Agent startup, this file is read if it exists, and after the discovery written again. Information is only added to this file.

  5. Rename or remove the existing log files the agent has already created. This allows you to track any log messages since clean starting the Agent.

    The following files, located in the "$ORACLE_HOME/network/log" directory, are used during startup of the agent:

    • NMICONF.LOG : Discovery log files
    • NMI.LOG : SQL*Net connection errors (Windows NT only)
    • DBSNMP.LOG : Communication and working thread logging, if logging is enabled

    You should also remove all files from $ORACLE_HOME/network/agent/library directory.

  6. Create an empty SNMP_RO.ORA file and add the following line.
    snmp.visibleservices = ()
    
  7. If tracing is needed to debug a certain situation, a new "SNMP_RW.ORA" can be created, with the debug parameters:
    dbsnmp.trace_level=16
    dbsnmp.trace_unique=true
    
    
    
  8. Restart the Agent.

    After the agent has started, verify the "SERVICES.ORA" file first. If this file contains all the services on the machine, then check the "SNMP_RO.ORA" and "SNMP_RW.ORA" files. Discovery problems can be found in the file "NMICONF.LOG".


    Important:

    Deleting the '.q' files erases all state information from the Agent, i.e., the Intelligent Agent has no information about the registered jobs and events.

    Removing the '.q' files while jobs and events are submitted against the agent will result in synchronization errors between the framework and the agent. Therefore, use this method only on specific request by Customer Support when there is an explicit need for re-initializing the Intelligent Agent.

    Events/jobs are internally identified by a sequence number. When a 'q' file is created, the sequence is reset. On submission of an event, the Management Server detects that the internal ID it gets back is less than any other that it has previously registered for the node, and triggers the message indicating there is a skew between the systems. The only option is to remove all the events and jobs and reregister them. As soon as the Management Server detects a mismatch of IDs, the Agent will be marked as 'corrupted', and all jobs or events you wish to send to this agent will fail with the error:

    VNI-4040: The agent on node is not in sync with the Management Server
    

  9. Delete the contents of the $ORACLE_HOME/network/agent/reco directory.
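
Before deleting anything in step 3 above, it can help to list which files in the agent directory actually match the names and patterns given there. The following Python sketch performs such a dry run and deletes nothing. The function name clean_start_candidates is hypothetical. Matching is done case-insensitively as an assumption, since the list above writes the file names in upper case while UNIX installations commonly use lower case, and subdirectories are skipped so that a directory whose name starts with 'tcl' is not reported.

```python
import fnmatch
import os

# Names and patterns taken from step 3 of the clean start procedure.
CLEAN_START_PATTERNS = (
    "services.ora", "*.q", "*.inp", "*.jou", "out*", "tcl*", "dbsnmp.ver",
)


def clean_start_candidates(agent_dir):
    """Return the sorted names of regular files in agent_dir that step 3 would remove."""
    matches = []
    for name in sorted(os.listdir(agent_dir)):
        if not os.path.isfile(os.path.join(agent_dir, name)):
            continue  # only regular files are candidates; leave directories alone
        lowered = name.lower()
        if any(fnmatch.fnmatchcase(lowered, pattern) for pattern in CLEAN_START_PATTERNS):
            matches.append(name)
    return matches
```

Reviewing the returned list against the procedure before removing the files by hand keeps the destructive step under the administrator's control.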

Diagnosing Intelligent Agent Discovery Errors on UNIX

The discovery process on UNIX involves the following actions:

Files Used During Discovery on UNIX

listener.ora : Definitions of incoming SQL*Net connections

1 file per $ORACLE_HOME

Located in one of the following locations (using this order searching for it):

nmiconf.log : Intelligent Agent discovery warnings/errors

1 per Intelligent Agent

Located in $ORACLE_HOME/network/log

nmiconf.lst : List of additional third-party discovery scripts to run

1 per Intelligent Agent

Located in $ORACLE_HOME/network/agent/config

nmiconf.tcl : Intelligent Agent discovery script

1 per Intelligent Agent

Located in $ORACLE_HOME/network/agent/config

oratab : File with all databases present on the machine

Only 1 per machine

Located in either /etc or /var/opt/oracle

services.ora : File with all service definitions the agent found

1 per Intelligent Agent

Located in $ORACLE_HOME/network/agent

snmp_ro.ora : File with all read-only service information

1 per Intelligent Agent

Located in $TNS_ADMIN or $ORACLE_HOME/network/admin

snmp_rw.ora : File with all updateable service information

1 per Intelligent Agent

Located in $TNS_ADMIN or $ORACLE_HOME/network/admin

sqlnet.ora : File with SQL*Net specific parameters

1 file per $ORACLE_HOME

Located in either (using this order searching for it):

tnsnames.ora : File with the TNS aliases to connect to databases

1 file per $ORACLE_HOME

Located in either (using this order searching for it):

Diagnosing the Discovery Problem on UNIX

Phase 1: Checking the ORATAB file

The ORATAB file is located in either the /etc, or the /var/opt/oracle directory.

You should consult your OS-specific documentation to see which directory is used to store the configuration files.

Things to look at:

Phase 2: Checking the SQLNET.ORA file

For every ORACLE_HOME on the system, the Agent looks for the SQL*Net or Oracle Net files. It needs the SQLNET.ORA and LISTENER.ORA files first, to obtain the database service definitions. Sometimes, in the case of missing information, the TNSNAMES.ORA file is also required.

The Agent searches for the SQL*Net files in the following order:

  1. First, it checks the environment in search for a TNS_ADMIN variable. If one is found, this directory is used for the retrieval of the SQLNET information.


    Note:

    The Intelligent Agent uses its own environment. It does not perform a login (running the profile/login scripts), nor does it run the 'oraenv' script to get the information for a specific ORACLE_HOME. If certain TNS_ADMIN values are enforced in login scripts, or in the 'oraenv' script which are different from the one in the Agent environment, they are not used by the Intelligent Agent!

    If you have more than one ORACLE_HOME on the system, and a TNS_ADMIN variable is specified, 'duplicate definition' warnings will be logged in the NMICONF.LOG file starting from the second ORACLE_HOME. This occurs because the Agent finds the same SQL*Net files for that home, which will naturally contain identical information. These warnings can be ignored.


  2. If a TNS_ADMIN environment variable is not specified, the default OS specific configuration location is searched for SQL*Net files. This can either be the '/etc' directory or '/var/opt/oracle', depending on the UNIX flavor you are working on.


    Note:

    If the Agent finds files in the OS specific location, and there are several ORACLE_HOMEs on the system, 'duplicate definition' warnings will be logged in the NMICONF.LOG file.


  3. Finally, if nothing is found, the default $ORACLE_HOME/network/admin directory is searched for the necessary files.


    Note:

    If SQL*Net files are not found in a particular $ORACLE_HOME, the Agent skips to the next home and searches for the information there. If, after the scan of all homes is complete, information about a specific SID found in the ORATAB file has not been found, a warning is logged in the NMICONF.LOG file saying the SID will be skipped. This is a common warning when the Intelligent Agent is installed in a separate ORACLE_HOME: during the installation of the Agent, the install routines adjust the ORATAB file, but there is no actual database with the SID name given during the installation.


    Once the SQL*Net configuration directory is established, the actual reading of the information can begin.

    Only one parameter is read from the SQLNET.ORA file: The names.default_domain parameter.
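
The three-step search order described in this phase can be summarized in a short Python sketch. The function name sqlnet_config_dir and the file_exists parameter are hypothetical and only serve the illustration. Two details are assumptions rather than documented behavior: that both OS-specific directories are probed in the order shown, and that a directory counts as a match when it holds a sqlnet.ora or listener.ora file.

```python
import os
import posixpath

OS_CONFIG_DIRS = ("/etc", "/var/opt/oracle")  # OS-specific locations named above
NET_FILES = ("sqlnet.ora", "listener.ora")     # files the Agent needs first


def sqlnet_config_dir(oracle_home, env, file_exists=os.path.isfile):
    """Pick the SQL*Net configuration directory for one ORACLE_HOME."""
    # 1. A TNS_ADMIN value in the Agent's own environment wins outright.
    tns_admin = env.get("TNS_ADMIN")
    if tns_admin:
        return tns_admin
    # 2. Otherwise, look for SQL*Net files in the OS-specific location.
    for directory in OS_CONFIG_DIRS:
        if any(file_exists(posixpath.join(directory, name)) for name in NET_FILES):
            return directory
    # 3. Finally, fall back to the default directory under the ORACLE_HOME.
    return posixpath.join(oracle_home, "network", "admin")
```

Note that env should be the environment the Agent process actually runs with, not that of a login shell, for the reason given in the first note above.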

Phase 3: Checking the LISTENER.ORA file.

Using the same SQL*Net configuration directory, the information from the listener.ora file is read.

This contains two parts:

The end result is a list of listeners and, for each listener, the list of SIDs that the listener serves. Every SID in these lists has, in turn, a list of the details needed for that database service.

Phase 4: Verifying the Information

As soon as all the files are parsed and processed, and all services are found, the Agent verifies that all the information is present and valid.

Diagnosing Agent Discovery Errors on Windows NT

The discovery process on Windows NT involves the following actions:

Files Used During Discovery

listener.ora : File with definitions of incoming SQL*Net connections

1 file per $ORACLE_HOME

Located in either (in order according to search priority):

nmiconf.log : File with Intelligent Agent discovery warnings/errors

1 per Intelligent Agent

Located in:

nmiconf.lst : List of additional third-party discovery scripts to run

1 per Intelligent Agent

Located in:

nmiconf.tcl : Intelligent Agent discovery script

1 per Intelligent Agent

Located in:

services.ora : File with all service definitions the agent found

1 per Intelligent Agent

Located in:

snmp_ro.ora : File with all read-only service information

1 per Intelligent Agent

Located in either (in order according to search priority):

snmp_rw.ora : File with all updateable service information

1 per Intelligent Agent

Located in either (in order according to search priority):

sqlnet.ora : File with SQL*Net specific parameters

1 file per $ORACLE_HOME

Located in either:

tnsnames.ora : File with the TNS aliases to connect to databases

1 file per $ORACLE_HOME

Located in either (ordered according to search priority):

Diagnosing the Discovery Problem on NT

Phase 1: Scanning the registry

The registry is scanned for database services. For each 'OracleService' NT service found, a potential database service entry is created, and the corresponding ORACLE_HOME is determined.
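The matching step can be sketched as follows. This Python example assumes that database services follow the naming pattern OracleService followed by the SID, and it works on a plain list of service names instead of reading the registry; the function name candidate_sids and the sample service names are hypothetical.

```python
PREFIX = "OracleService"  # assumed naming pattern: OracleService<SID>

def candidate_sids(service_names):
    """Return the SIDs implied by service names that start with PREFIX."""
    sids = []
    for name in service_names:
        # Require at least one character after the prefix so that a bare
        # "OracleService" does not yield an empty SID.
        if name.startswith(PREFIX) and len(name) > len(PREFIX):
            sids.append(name[len(PREFIX):])
    return sids

# Sample service names (assumed for illustration).
services = ["OracleServiceORCL", "OracleAgent", "OracleServiceTEST", "Spooler"]
print(candidate_sids(services))
```

Each SID returned this way is only a potential database service; determining its ORACLE_HOME and validating it is left to the later phases.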

Points to note here:

on UNIX.

Scanning the registry generates two lists:

Phase 2: Scanning the SQLNET.ORA file

For every ORACLE_HOME on the system, the Agent looks for the SQL*Net/Oracle Net files. It needs the SQLNET.ORA and LISTENER.ORA files first, to get the database service definitions. Sometimes, in case of missing information, the TNSNAMES.ORA file is also required.

The Agent looks for the SQL*Net files in this order:

  1. The Agent checks the environment for a TNS_ADMIN variable. If one is found, the directory it names is used to retrieve the SQL*Net/Oracle Net information.
  2. Next, the registry is checked for a TNS_ADMIN variable in the 'HOMEx' key of the ORACLE_HOME the Agent is verifying. If one is found there, it is used.
  3. Finally, if nothing is found, the default $ORACLE_HOME\net80\admin (Versions 8.0.X only) or $ORACLE_HOME\network\admin directory is searched for the necessary files.


    Note:

    If no SQL*Net/Oracle Net files are found in a particular $ORACLE_HOME, the Agent skips to the next home and searches for the information there. If, after all ORACLE_HOMEs have been scanned, no information has been found for a SID listed in the registry, a warning is logged in the NMICONF.LOG file stating that the SID will be skipped.


Once the SQL*Net/Oracle Net configuration directory is established, the actual reading of the information can begin.
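The three-step search order above can be sketched in Python as shown below. This is an illustration under stated assumptions, not the Agent's implementation: the function name find_net_admin_dir is hypothetical, the registry values of the 'HOMEx' key are passed in as a plain dictionary instead of being read from the Windows registry, and the is_80 flag stands in for however the Agent detects an 8.0.X home.

```python
import ntpath
import os

def find_net_admin_dir(oracle_home, env=None, home_registry=None, is_80=False):
    """Return the directory to search for SQL*Net/Oracle Net files.

    env           -- environment mapping (defaults to os.environ)
    home_registry -- values of the 'HOMEx' key for this ORACLE_HOME,
                     supplied by the caller as a dict (assumption)
    is_80         -- True for an 8.0.X home, which uses net80\\admin
    """
    env = os.environ if env is None else env
    home_registry = home_registry or {}

    # Step 1: TNS_ADMIN in the environment.
    if env.get("TNS_ADMIN"):
        return env["TNS_ADMIN"]

    # Step 2: TNS_ADMIN in the HOMEx registry key of this ORACLE_HOME.
    if home_registry.get("TNS_ADMIN"):
        return home_registry["TNS_ADMIN"]

    # Step 3: the default directory under the ORACLE_HOME.
    subdir = "net80" if is_80 else "network"
    return ntpath.join(oracle_home, subdir, "admin")

# Sample paths only.
print(find_net_admin_dir("C:\\orant", env={}, home_registry={}))
print(find_net_admin_dir("C:\\orant", env={"TNS_ADMIN": "D:\\net"}))
```

The sketch stops at choosing a directory; it does not check whether the files are actually present, which is the case the preceding note covers.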

Only one parameter is read from the SQLNET.ORA file: the names.default_domain parameter.

Phase 3: Checking the LISTENER.ORA file

Using the same SQL*Net/Oracle Net configuration directory, the information from the listener.ora file is read.

This contains two parts:

The end result here is a list of listeners and, for each listener, the list of SIDs that the listener serves. Each SID in those lists, in turn, has a list of the details needed for that database service.

Phase 4: Verifying the information

As soon as all the files are parsed and processed, and all services are found, the Agent verifies that all the information is present and valid.