2 ODM Programming

This chapter discusses two major topics:

The requirements for compiling and executing ODM programs
How to perform common data mining tasks using Oracle9i Data Mining (ODM).

For an example of ODM basic usage, see Chapter 3.

This chapter provides an overview of the steps required to perform basic ODM tasks. For detailed examples of how to perform these tasks, see the ODM sample programs. The ODM sample programs are distributed with the ODM documentation. For an overview of the ODM sample programs, see Appendix A.

This chapter does not include a detailed description of any of the ODM API classes and methods. For detailed information about the ODM API, see the ODM Javadoc in the directory $ORACLE_HOME/dm/doc on any system where ODM is installed.

2.1 Compiling and Executing ODM Programs

ODM depends on the following Oracle9i Java Archive (.jar) files:

$ORACLE_HOME/jdbc/lib/classes12.jar
$ORACLE_HOME/lib/xmlparserv2.jar
$ORACLE_HOME/rdbms/jlib/jmscommon.jar
$ORACLE_HOME/rdbms/jlib/aqapi.jar
$ORACLE_HOME/rdbms/jlib/xsu12.jar
$ORACLE_HOME/dm/lib/odmapi.jar

These files must be in your CLASSPATH to compile and execute ODM programs.
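
The jar list above can be assembled into a CLASSPATH value programmatically. The following is a minimal sketch, not part of ODM: the class OdmClasspath and its methods are hypothetical names introduced for illustration, and the Oracle home directory is assumed to be passed in as a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of ODM): builds a CLASSPATH value
// from the jar files listed in this section.
final class OdmClasspath {
    // Paths relative to ORACLE_HOME, copied from the list above.
    static final String[] REQUIRED_JARS = {
        "jdbc/lib/classes12.jar",
        "lib/xmlparserv2.jar",
        "rdbms/jlib/jmscommon.jar",
        "rdbms/jlib/aqapi.jar",
        "rdbms/jlib/xsu12.jar",
        "dm/lib/odmapi.jar"
    };

    // Returns the full path of every required jar under oracleHome.
    static List<String> requiredJars(String oracleHome) {
        List<String> jars = new ArrayList<>();
        for (String relative : REQUIRED_JARS) {
            jars.add(oracleHome + "/" + relative);
        }
        return jars;
    }

    // Joins the jar paths with the platform separator
    // (":" on UNIX, ";" on Windows).
    static String buildClasspath(String oracleHome, String separator) {
        return String.join(separator, requiredJars(oracleHome));
    }
}
```

For example, OdmClasspath.buildClasspath(System.getenv("ORACLE_HOME"), java.io.File.pathSeparator) would produce a value suitable for the -classpath option of javac and java, assuming ORACLE_HOME is set in the environment.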

If you use a database character set that is not US7ASCII, WE8DEC, WE8ISO8859P1, or UTF8, you must also include the following in your CLASSPATH:

$ORACLE_HOME/jdbc/lib/nls_charset12.zip

If you do not include nls_charset12.zip in your CLASSPATH, an ODM program will fail with the following error:

oracle.jms.AQjmsException: Non supported character set:oracle-character-set-178
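
The character set rule above can be expressed as a simple check. This is a sketch only: CharsetCheck is a hypothetical helper, not an ODM class, and it assumes you already know the name of the database character set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of ODM): decides whether
// nls_charset12.zip must be added to the CLASSPATH.
final class CharsetCheck {
    // Character sets named in this section as not requiring nls_charset12.zip.
    private static final Set<String> NO_EXTRA_ZIP = new HashSet<>(
        Arrays.asList("US7ASCII", "WE8DEC", "WE8ISO8859P1", "UTF8"));

    // Returns true when the given database character set is not one of
    // the four listed above, so nls_charset12.zip is required.
    static boolean needsNlsCharsetZip(String dbCharset) {
        return !NO_EXTRA_ZIP.contains(dbCharset.toUpperCase(Locale.ROOT));
    }
}
```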

2.2 Using ODM to Perform Mining Tasks

This section describes the steps required to perform several common data mining tasks using ODM.

All work in ODM is done using MiningTask objects.

2.2.1 Build a Model

This section summarizes the steps required to build a model.

Preprocess the input data, as required.
Discretize (bin) the input data. (This step is optional; ODM algorithms can automatically bin input data.)
Construct and store a MiningFunctionSettings object.
Construct and store a MiningBuildTask object.
After successful construction of the build task object, call a store method to store the object in the data mining server.
Call the execute method; the execute method queues the work for asynchronous execution and returns a task identifier to the caller.
Periodically call the getCurrentStatus method to get the status of the task. Alternatively, use the waitForCompletion method to wait until all asynchronous activity for the task completes.

After successful completion of the task, a build results object exists.
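
The store, execute, and poll steps above follow a common asynchronous pattern, sketched below. The types SketchStatus, SketchBuildTask, and BuildFlowSketch are stand-ins invented for this illustration; they are not the ODM classes, and the real method signatures (see the ODM Javadoc) take additional arguments. The status values are also assumptions.

```java
// Stand-in types for illustration only; these are NOT the ODM API classes.
enum SketchStatus { QUEUED, EXECUTING, SUCCESS, ERROR }

interface SketchBuildTask {
    void store();                     // store the task object in the server
    String execute();                 // queue the work and return a task identifier
    SketchStatus getCurrentStatus();  // report the current status of the task
}

final class BuildFlowSketch {
    // A task is finished once it has either succeeded or failed.
    static boolean isFinished(SketchStatus status) {
        return status == SketchStatus.SUCCESS || status == SketchStatus.ERROR;
    }

    // Stores and executes the task, then polls until it finishes or
    // maxPolls is reached. Returns the last status observed.
    static SketchStatus storeExecuteAndPoll(SketchBuildTask task,
                                            long pollMillis, int maxPolls) {
        task.store();
        task.execute();
        SketchStatus status = task.getCurrentStatus();
        for (int i = 0; i < maxPolls && !isFinished(status); i++) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt
                return status;
            }
            status = task.getCurrentStatus();
        }
        return status;
    }
}
```

Bounding the number of polls is a design choice of this sketch; a blocking call such as waitForCompletion removes the need for the loop.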

The following sample programs illustrate building ODM models:

Sample_AdaptiveBayesNetworkBuild.java
Sample_NaiveBayesBuild.java
Sample_AssociationRulesBuild.java
Sample_ClusteringBuild.java

2.2.2 Perform Tasks in Sequence

Data mining tasks are usually performed in sequence. The following sequence of tasks is typical:

Collect and preprocess data
Build a model
Test the model
Calculate lift
Apply the model

To implement a sequence of dependent task executions, you may periodically check the asynchronous task execution status using the getCurrentStatus method or block for completion using the waitForCompletion method. You can then perform the dependent task after completion of the previous task.

For example, follow these steps to perform the build, test, and compute lift sequence:

Perform the build task as described in Section 2.2.1 above.
After successful completion of the build task, start the test task by calling the execute method on a MiningTestTask object. Either periodically check the status of the test operation or block until the task completes.
After successful completion of the test task, execute the compute lift task by calling the execute method on a MiningComputeLiftTask object.
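
The build, test, and compute lift sequence can be sketched as a chain of steps in which each step runs only if the previous one succeeded. TaskSequence is a hypothetical class written for this illustration. Each step stands in for "execute a task and wait for it to complete" and reports success as a boolean; no ODM classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Illustration only (not the ODM API): runs dependent steps in order.
final class TaskSequence {
    private final List<String> names = new ArrayList<>();
    private final List<BooleanSupplier> steps = new ArrayList<>();

    // Appends a named step; the supplier returns true on success.
    TaskSequence then(String name, BooleanSupplier step) {
        names.add(name);
        steps.add(step);
        return this;
    }

    // Runs the steps in order and stops at the first failure.
    // Returns the names of the steps that completed successfully.
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            if (!steps.get(i).getAsBoolean()) {
                break;  // a dependent task must not start after a failure
            }
            completed.add(names.get(i));
        }
        return completed;
    }
}
```

A caller would register the three steps with then("build", ...), then("test", ...), and then("lift", ...), where each supplier wraps the execute and wait logic for the corresponding task.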

2.2.3 Find the Best Model

Model Seeker builds multiple models; it then evaluates and compares the models to find a "best" model.

Follow these steps to use Model Seeker:

Create a single ModelSeekerTask (MST) instance to hold the information needed to specify the models to build. The required information is defined in subclasses of the MiningFunctionSettings (MFS) and MiningAlgorithmSettings (MAS) classes.

You can specify a combination of as many instances of the following as desired:
- NaiveBayesAlgorithmSettings
- CombinationNaiveBayesSettings
- AdaptiveBayesNetworkAlgorithmSettings
- CombinationAdaptiveBayesNetSettings
(You cannot specify clustering models or Association Rules models.)
Call the Model Seeker Task execute method. The method returns once the task is queued for asynchronous execution.
Periodically call the getCurrentStatus method to get the status of the task, using the task name. Alternatively, use the waitForCompletion method to wait until all asynchronous activity for the required work completes.
When the model seeker task completes, use the getResults method to view the summary information and the best model. Model Seeker discards all models that it builds except the best one.

The sample program Sample_ModelSeeker.java illustrates how to use Model Seeker.
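
The idea of keeping only the best of several candidate models can be sketched as follows. This section does not describe how Model Seeker scores models, so the single numeric score used here (higher is better) is an assumption made for illustration, and CandidateModel and BestModelSketch are hypothetical classes, not part of ODM.

```java
import java.util.List;

// Hypothetical record of one built model and its evaluation score.
final class CandidateModel {
    final String name;
    final double score;  // assumption: a single score, higher is better

    CandidateModel(String name, double score) {
        this.name = name;
        this.score = score;
    }
}

final class BestModelSketch {
    // Returns the candidate with the highest score; on ties, the
    // earliest candidate in the list wins.
    static CandidateModel selectBest(List<CandidateModel> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate models");
        }
        CandidateModel best = candidates.get(0);
        for (CandidateModel candidate : candidates) {
            if (candidate.score > best.score) {
                best = candidate;
            }
        }
        return best;
    }
}
```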

2.2.4 Find and Use the Most Important Attributes

Models based on data sets with a large number of attributes can have very long build times. To minimize build time, you can use ODM Attribute Importance to identify the critical attributes and then build a model using these attributes only.

Identify the most important attributes by building an Attribute Importance model as follows:

Create a Physical Data Specification for the input data set.
Discretize the data if required.
Create and store mining settings for the attribute importance.
Build the Attribute Importance model.
Access the model and retrieve the attributes by threshold.

The sample program Sample_AttributeImportanceBuild.java illustrates how to build an attribute importance model.
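
Retrieving attributes by threshold amounts to keeping every attribute whose importance value meets a cutoff. The sketch below shows that filtering step on plain Java collections. It assumes importance is a number where higher means more important; AttributeThreshold is a hypothetical class, not the ODM model interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only (not the ODM API): filters attributes by importance.
final class AttributeThreshold {
    // Returns the names of attributes whose importance is at least
    // threshold, in the iteration order of the input map.
    static List<String> byThreshold(Map<String, Double> importance,
                                    double threshold) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Double> entry : importance.entrySet()) {
            if (entry.getValue() >= threshold) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```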

After identifying the important attributes, build a model using the selected attributes as follows:

Access the model and retrieve the attributes by threshold or by rank.
Modify the Data Usage Specification by calling the adjustAttributeUsage method defined on MiningFunctionSettings. Only the attributes returned by Attribute Importance will be active for model building.
Build a model using the new Logical Data Specification and Data Usage Specification.

The sample program Sample_AttributeImportanceUsage.java illustrates how to build a model using the important attributes.
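
The effect of selecting attributes by rank and then restricting usage to them can be sketched with plain collections. AttributeUsageSketch is a hypothetical class; it does not call adjustAttributeUsage or any other ODM method, and the active/inactive flags are a simplified stand-in for a Data Usage Specification.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustration only (not the ODM API).
final class AttributeUsageSketch {
    // Returns the topN attribute names ordered by descending importance.
    static List<String> byRank(Map<String, Double> importance, int topN) {
        return importance.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(topN)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    // Marks each attribute active only if it appears in selected.
    static Map<String, Boolean> activeFlags(Collection<String> allAttributes,
                                            Collection<String> selected) {
        Set<String> keep = new HashSet<>(selected);
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (String name : allAttributes) {
            flags.put(name, keep.contains(name));
        }
        return flags;
    }
}
```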

2.2.5 Apply a Model to New Data

You make predictions by applying a model to new data, that is, by scoring the data.

Any table that you score (apply a model to) must have the same format as the table used to build the model. If you build a model using a table that is in transactional format, any table that you apply that model to must be in transactional format. Similarly, if the table used to build the model was in nontransactional format, any table to which you apply the model must be in nontransactional format.

Note that you can score a single record, which must also be in the same format as the table used to build the model.
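
The format rule above can be enforced with a guard before scoring. This is a minimal sketch: TableFormat and ApplyFormatCheck are hypothetical names, and it assumes the application records which format was used when the model was built.

```java
// Illustration only (not the ODM API): the two table formats named above.
enum TableFormat { TRANSACTIONAL, NONTRANSACTIONAL }

final class ApplyFormatCheck {
    // Throws if the table (or single record) to be scored is not in the
    // same format as the table used to build the model.
    static void requireSameFormat(TableFormat buildFormat,
                                  TableFormat applyFormat) {
        if (buildFormat != applyFormat) {
            throw new IllegalArgumentException(
                "Model was built on " + buildFormat
                + " data but apply input is " + applyFormat);
        }
    }
}
```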
