3 ODM Basic Usage

This chapter contains complete examples of using ODM to build a model and then score new data using that model. These examples illustrate the steps that are required in all code that uses ODM. The following two sample programs are discussed in this chapter: Sample_NaiveBayesBuild_short.java, which builds a Naive Bayes model, and Sample_NaiveBayesApply_short.java, which applies that model to new data.

The complete code for these examples is included in the ODM sample programs that are installed when ODM is installed. For an overview of the ODM sample programs, see Appendix A. For detailed information about compiling and linking these programs, see Section A.6.

The data that the sample programs use is included with the sample programs: Sample_NaiveBayesBuild_short.java uses CENSUS_2D_BUILD_UNBINNED and Sample_NaiveBayesApply_short.java uses CENSUS_2D_APPLY_UNBINNED. For more information about these tables, see Section A.4. The data used by the sample programs is installed in the ODM_MTR schema.

This chapter does not include a detailed description of any of the ODM API classes and methods. For detailed information about the ODM API, see the ODM Javadoc in the directory $ORACLE_HOME/dm/doc on any system where ODM is installed.

The sample programs have a number of steps in common. Common steps are repeated for simplicity of reference.

These short sample programs use data tables that are used by the other ODM sample programs. The short sample program that builds a model uses the table CENSUS_2D_BUILD_UNBINNED; the short sample program that applies the model uses CENSUS_2D_APPLY_UNBINNED. For more information about these tables, see Section A.4.

Note that these short sample programs do not use the property files that the other ODM sample programs use.


Note:

If you execute Sample_NaiveBayesBuild.java and then execute Sample_NaiveBayesBuild_short.java or Sample_NaiveBayesApply_short.java, you must change the buildtablename to a new name. Otherwise, you get a unique constraint error because the model name, MFS name, and Mining Task name are identical in both programs.


3.1 Using the Short Sample Programs

The short sample programs must be compiled and then executed in the proper order; you must execute Sample_NaiveBayesBuild_short.java (which builds the model) before you execute Sample_NaiveBayesApply_short.java (which applies the model to data).

You can compile and execute these programs in several ways; the methods are described in Section A.6.

Note that the short sample programs do not use property files.

3.2 Building a Model

This section describes the steps that must be performed by any program that builds an ODM model.

The sample program Sample_NaiveBayesBuild_short.java is a complete executable program that illustrates these required steps. The data for the sample program is CENSUS_2D_BUILD_UNBINNED.

3.2.1 Before Building an ODM Model

Before you build an ODM model, ODM must be installed on your system. You need to know the URL of the database where the ODM Data Mining Server resides, the user name, and the password. (Ask the person who installed the program what the user name and password are.)

Before you execute an ODM program, the ODM Monitor must be running.

Before you build a model, you must identify the data to be used during model building. The data must reside in a table in an Oracle9i database. You should clean the data as necessary; for example, you may want to treat missing values and deal with outliers, that is, extreme values that are either errors or values that may skew the binning. The table that contains the data can be in either transactional or nontransactional form. The ODM sample programs include data tables to use for model building.

Before you build a model, you must also know which data mining function you wish to perform; for example, you may wish to create a classification model. You may specify which algorithm to use or let ODM decide which algorithm to use. The sample programs described in this chapter build and apply a Naive Bayes model.

3.2.2 Main Steps in ODM Model Building

For ODM to build a model, ODM must know which Data Mining Server to connect to, where the build data resides and how it is organized, and which mining function and algorithm to use.

The following steps provide this information:

  1. Connect to the DMS (data mining server).
  2. Create a PhysicalDataSpecification object for the build data.
  3. Create a MiningFunctionSettings object (in this case, a ClassificationFunctionSettings object with no supplemental attributes).
  4. Build the model.

The steps are illustrated below with code for building a Naive Bayes model.

3.2.3 Connect to the Data Mining Server

Before building a model, it is necessary to create an instance of DataMiningServer. This instance is used as a proxy to create connections to a Data Mining Server (DMS). The instance also maintains the connection. The DMS is the server-side, in-database component that performs the actual data mining operations within ODM. The DMS also provides a metadata repository consisting of mining input objects and result objects, along with the namespaces within which these objects are stored and retrieved.

//Create an instance of the DMS server.
//The mining server DB_URL, user_name, and password for your installation
//need to be specified
dms=new DataMiningServer("DB_URL", "user_name", "password");

//get the actual connection
dmsConnection = dms.login();

3.2.4 Describe the Build Data

Before ODM can use data to build a model, it must know where the data is and how the data is organized. This is done through a PhysicalDataSpecification instance where you indicate whether the data is in nontransactional or transactional format and describe the roles the various data columns play.

3.2.4.1 Location Access Data for Build Data

Before you create a PhysicalDataSpecification instance, you must provide information about the location of the build data. This is accomplished using a LocationAccessData object.

//Create a LocationAccessData using the table_name
//(CENSUS_2D_BUILD_UNBINNED) and schema_name for your installation
LocationAccessData lad =
     new LocationAccessData("CENSUS_2D_BUILD_UNBINNED", "schema_name");

Next, create the actual PhysicalDataSpecification instance.

3.2.4.2 Physical Data Specification for Nontransactional Build Data

If the data is in nontransactional format, all the information needed to build a PhysicalDataSpecification is contained in the LocationAccessData object.

//Create the actual PhysicalDataSpecification for a 
//NonTransactionalDataSpecification object since the 
//data set is nontransactional
PhysicalDataSpecification m_PhysicalDataSpecification =
                  new NonTransactionalDataSpecification(lad);

3.2.4.3 Physical Data Specification for Transactional Build Data

If the data is in transactional format, you must specify the role that the various data columns play.

//Create the actual PhysicalDataSpecification for a transactional 
//data case
PhysicalDataSpecification m_PhysicalDataSpecification = 
        new TransactionalDataSpecification(
                "CASE_ID",     //column name for sequence id
                "ATTRIBUTES",  //column name for attribute name
                "VALUES",      //column name for value
                lad);

3.2.5 Create the MiningFunctionSettings Object

The MiningFunctionSettings (MFS) object tells the DMS the type of model to build, the function of the model, and the algorithm to use.

ODM supports several mining functions; the sample programs in this chapter use the classification function.

The MFS allows a user to specify the type of result desired without having to specify a particular algorithm. If an algorithm is not specified, the underlying DMS selects the algorithm based on user-provided parameters.

3.2.5.1 Specify the Default Algorithm for Classification

To build a model for classification using ODM's default classification algorithm, use a ClassificationFunctionSettings object with a null MiningAlgorithmSettings for the MFS. An easy way to create a ClassificationFunctionSettings object is to use the create method, as illustrated below. In this case, it is necessary to indicate the name of the target attribute, the type of the target attribute, and whether the data has been prepared (binned) by the user. Unprepared data will automatically be binned by ODM.

//Specify "class" as the target attribute name, categorical for the target
//attribute type, and set the DataPreparationStatus to unprepared. 
//Automatic binning will be applied in this case.
ClassificationFunctionSettings m_ClassificationFunctionSettings =
         ClassificationFunctionSettings.create(
                  dmsConnection,
                  null,
                  m_PhysicalDataSpecification,
                  "class",
                  AttributeType.categorical,
                  DataPreparationStatus.getInstance("unprepared"));

3.2.5.2 Specify the Naive Bayes Algorithm

If a particular algorithm is to be used, the information about the algorithm is captured in a MiningAlgorithmSettings instance. For example, if you want to build a model for classification using the Naive Bayes algorithm, first create a NaiveBayesSettings instance to specify settings for the Naive Bayes algorithm. Two settings are available: singleton threshold and pairwise threshold. Then create a ClassificationFunctionSettings instance for the build operation.

//Create the Naive Bayes algorithm settings by setting the thresholds 
//to 0.01. 
NaiveBayesSettings algorithmSetting = new NaiveBayesSettings(0.01f, 0.01f);

//Create the actual ClassificationFunctionSettings using
//algorithmSetting for MiningAlgorithmSettings. Specify "class" as 
//the target attribute name, "categorical" for the target attribute
//type, and set the DataPreparationStatus to "unprepared".
//Automatic binning will be applied in this case.
ClassificationFunctionSettings m_ClassificationFunctionSettings =
          ClassificationFunctionSettings.create(
                 dmsConnection,
                 algorithmSetting,
                 m_PhysicalDataSpecification,
                 "class",
                 AttributeType.categorical,
                 DataPreparationStatus.getInstance("unprepared"));

3.2.5.3 Validate the Mining Function Settings for Build

Because MiningFunctionSettings objects are complex objects, it is good practice to validate whether they were correctly created before starting the actual build task. If the MiningFunctionSettings object is a valid one, it should be persisted in the DMS for later use. This is illustrated below for the ClassificationFunctionSettings in our example.

//Validate and store the ClassificationFunctionSettings object
//with the name "Sample_NB_MFS".
m_ClassificationFunctionSettings.validate();
m_ClassificationFunctionSettings.store(dmsConnection, "Sample_NB_MFS");

3.2.6 Build the Model

Now that all the required information for building the model has been captured in an instance of PhysicalDataSpecification and MiningFunctionSettings, the last step needed is to decide whether the model should be built synchronously or asynchronously.

If you are calling ODM from an application, the design of the calling application may determine whether to build the model synchronously or asynchronously. Also, if the data used to build the model is large, it may take a significant amount of time to build the model; in such a case, you will probably want to build the model asynchronously.

3.2.6.1 Build the Model Synchronously

For a synchronous build, use the static MiningModel.build method. Note that this method is deprecated for ODM release 2.

//Build the model using the MFS named "Sample_NB_MFS" and store the
//model under the name "Sample_NB_Model".
MiningModel.build(
     dmsConnection,
     lad,
     m_PhysicalDataSpecification,
     "Sample_NB_MFS",
     "Sample_NB_Model");

3.2.6.2 Build the Model Asynchronously

For an asynchronous build, create an instance of MiningTask. A mining task can be persisted in the DMS using the store method and executed at any time; however, it can be executed only once. Once the task is executing, query the current status information of a task by calling the getCurrentStatus method. This call returns a MiningTaskStatus object, which provides more details about the state. You can get the complete status history of a task by calling the getStatusHistory method.

//Create a Naive Bayes build task and execute it.
//The MiningFunctionSettings name (for example, "Sample_NB_MFS") and 
//the ModelName (for example, "Sample_NB_Model") need to be specified.
MiningBuildTask task = 
     new MiningBuildTask(
          m_PhysicalDataSpecification,
          "Sample__NB_MFS",
          "Sample_NB_Model");

 //Store the task under the name "Sample_NB_Build_Task"
task.store(dmsConnection, "Sample_NB_Build_Task");

//Execute the task
task.execute(dmsConnection);
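
The short sample program stops after executing the task, but a stored task can be monitored with the status methods described above. The following lines are a sketch only, not part of the shipped sample program; they assume that getCurrentStatus takes the connection and the task name, as described above, and that the returned MiningTaskStatus object prints a readable summary. See the ODM Javadoc for the exact signatures.

//Sketch only (not in the sample program): query the status of the
//stored build task. The argument list follows the description above;
//see the ODM Javadoc for the exact signature.
MiningTaskStatus status =
     MiningTask.getCurrentStatus(dmsConnection, "Sample_NB_Build_Task");
System.out.println("Build task status: " + status);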

After the MiningModel.build or the task.execute call successfully completes, the model will be stored using the name that you specified (in this case, Sample_NB_Model) in the DMS.

3.3 Scoring Data Using a Model

After you've created a model, you can apply it to new data to make predictions; the process is referred to as "scoring data."

ODM can be used to score multiple records specified in a single database table or to score a single record. This section describes scoring multiple records.

The sample program Sample_NaiveBayesApply_short.java is a complete executable program that illustrates these required steps. The data for this sample program is CENSUS_2D_APPLY_UNBINNED. Note that this sample program does not use a property file.

3.3.1 Before Scoring Data

Before scoring an ODM model, you must have built an ODM model. This implies that ODM is installed on your system, and that you know the location of the database, the user name, and the password.

Before executing an ODM program, the ODM Monitor must be running.

Before you score data, the data must reside in a table in an Oracle9i database. The data to score must be compatible with the build data that you used when you built the model. You should clean the data to be scored in the same way that you cleaned the build data. If the build data for the model was not binned, the data to score must also be unbinned.

The table that contains the data to score can be in either transactional or nontransactional form.

3.3.2 Main Steps in ODM Scoring

For ODM to score data using a model, ODM must know which Data Mining Server to connect to, where the input data resides and how it is organized, which model to apply, and where and in what format to store the results.

The following steps provide this information:

  1. Connect to the DMS (data mining server).
  2. Create a PhysicalDataSpecification object for the input data (the data that you want to score).
  3. Create a LocationAccessData object for the input and output data.
  4. Create a MiningApplyOutput object for the output data.
  5. Score the data.

The steps above are illustrated in this section with code for scoring a Naive Bayes model.

3.3.3 Connect to the Data Mining Server

Before scoring data, it is necessary to create an instance of DataMiningServer. This instance is used as a proxy to create connections to a Data Mining Server (DMS). The instance also maintains the connection. The DMS is the server-side, in-database component that performs the actual data mining operations within ODM. The DMS also provides a metadata repository consisting of mining input objects and result objects, along with the namespaces within which these objects are stored and retrieved.

//Create an instance of the DMS server.
//The mining server DB_URL, user_name, and password for your installation
//need to be specified.
dms=new DataMiningServer("DB_URL", "user_name", "password");

//get the actual connection
dmsConnection = dms.login();

3.3.4 Describe the Input Data

Before ODM can apply a model to data, it must know the physical layout of the data. This is done through a PhysicalDataSpecification instance where you indicate whether the data is in nontransactional or transactional format and describe the roles the various data columns play.

3.3.4.1 Location Access Data for Apply Input

Before you create a PhysicalDataSpecification instance, you must provide information about the location of the input data. This is accomplished using a LocationAccessData object.

//Create a LocationAccessData using the table_name
//(CENSUS_2D_APPLY_UNBINNED) and the schema_name for your installation
LocationAccessData lad =
    new LocationAccessData("CENSUS_2D_APPLY_UNBINNED", "schema_name)";

Next, create the PhysicalDataSpecification instance.

3.3.4.2 Physical Data Specification for Nontransactional Input Data

If the data is in nontransactional format, all the information needed to build a PhysicalDataSpecification is contained in the LocationAccessData object.

//Create the actual PhysicalDataSpecification for a 
//NonTransactionalDataSpecification object since the 
//data set is nontransactional
PhysicalDataSpecification m_PhysicalDataSpecification =
                new NonTransactionalDataSpecification(lad);

3.3.4.3 Physical Data Specification for Transactional Input Data

If the data is in transactional format, you must specify the role that the various data columns play.

//Create the actual PhysicalDataSpecification for transactional 
//data case
PhysicalDataSpecification m_PhysicalDataSpecification = 
        new TransactionalDataSpecification(
                "CASE_ID",     //column name for sequence id
                "ATTRIBUTES",  //column name for attribute name
                "VALUES",      //column name for value
                lad);

3.3.5 Describe the Output Data

Before scoring the input data, the DMS needs to know where to store the output of the scoring.

3.3.5.1 Location Access Data for Apply Output

Create a LocationAccessData object specifying where to store the apply output. The following code specifies writing to the output table CENSUS_NB_APPLY_RESULT.

// LocationAccessData for output table to store the apply results.
LocationAccessData ladOutput =
     new LocationAccessData("CENSUS_NB_APPLY_RESULT", "output_schema_name");

3.3.6 Specify the Format of the Apply Output

The DMS also needs to know the content of the scoring output. This information is captured in a MiningApplyOutput (MAO) object. An instance of MiningApplyOutput specifies the data (columns) to be included in the apply output table that is created as the result of an apply operation. The columns in the apply output table are described by a combination of ApplyContentItem objects. These columns can be either from the input table or generated by the scoring task (for example, prediction and probability). The following steps are involved in creating a MiningApplyOutput object:

  1. Create an empty MiningApplyOutput object.
  2. Create an ApplyContentItem object describing the generated columns to be included in the output and add it to the MiningApplyOutput object.
  3. Create ApplyContentItem objects describing columns from the input table to be included in the output and add them to the MiningApplyOutput object.
  4. Validate the MiningApplyOutput object that you created.

3.3.6.1 Create an Empty Mining Apply Output Object

Create an empty MiningApplyOutput object as follows:

// Create MiningApplyOutput object
MiningApplyOutput m_MiningApplyOutput = new MiningApplyOutput();

3.3.6.2 Specify the Generated Columns in the Apply Output

There are two options for generated columns, described by the following ApplyContentItem subclasses: ApplyMultipleScoringItem and ApplyTargetProbabilityItem.

For the current example, let's use an ApplyTargetProbabilityItem instance. Before creating an instance of ApplyTargetProbabilityItem, it is necessary to specify the names and the data types of the prediction, probability, and rank columns for the output. This is done through Attribute objects.

// Create Attribute objects that specify the names and data 
// types of the prediction, probability, and rank columns for the 
// output.
Attribute predictionAttribute = 
     new Attribute("myprediction", DataType.stringType);
Attribute probabilityAttribute = 
     new Attribute("myprobability", DataType.stringType);
Attribute rankAttr = 
     new Attribute("myrank", DataType.stringType);

// Create the ApplyTargetProbabilityItem instance
ApplyTargetProbabilityItem aTargetAttrItem = 
     new ApplyTargetProbabilityItem(predictionAttribute, probabilityAttribute,
          rankAttr);

An ApplyTargetProbabilityItem instance contains a set of target values whose prediction and probability appear in the apply output table, regardless of their ranks. A target value is represented as a Category, and it must be one of the target values in the target attribute used when building the model to be applied. This step is not necessary in the ApplyMultipleScoringItem case.

// Create Category objects to represent the target values
// to be included in the apply output table. In this example
// two target values are specified.
Category target_category = new Category("positive_class", "0",
     DataType.getInstance("int"));
Category target_category1 = new Category("positive_class", "1",
     DataType.getInstance("int"));

// Add the target values to the ApplyTargetProbabilityItem
// instance
aTargetAttrItem.addTarget(target_category);
aTargetAttrItem.addTarget(target_category1);

// Add the ApplyTargetProbabilityItem to the MiningApplyOutput
// object
m_MiningApplyOutput.addItem(aTargetAttrItem);

3.3.6.3 Specify the Input Columns to be Included in Output

The input table columns to be included in the apply output are described by ApplySourceAttributeItem instances. Each instance maps a column in the input table to a column in the output table. These columns are described by a source Attribute and a destination Attribute.

// In this example, attribute "PERSON_ID" from the source table 
// will be returned in the column "ID" in the output table.
// This specification is captured by the
// m_ApplySourceAttributeItem object.
MiningAttribute sourceAttribute = new MiningAttribute(
                 "PERSON_ID", 
                  DataType.intType,
                  AttributeType.notApplicable,
                  false,
                  false);

Attribute destinationAttribute = new Attribute(
                "ID",
                DataType.intType);

ApplySourceAttributeItem m_ApplySourceAttributeItem =
             new ApplySourceAttributeItem(
             sourceAttribute,
             destinationAttribute);

// Add the ApplySourceAttributeItem object 
// to the MiningApplyOutput object
m_MiningApplyOutput.addItem(m_ApplySourceAttributeItem);

Note that the source mining attribute could have been taken from the logical data of the model's function settings.

3.3.6.4 Validate the Mining Apply Output Object

Because MiningApplyOutput objects are complex objects, it is a good practice to validate that they were correctly created before you do the actual scoring. This is illustrated below for the MiningApplyOutput in our example.

// Validate the MiningApplyOutput
m_MiningApplyOutput.validate();

3.3.7 Apply the Model

Now that all the required information for scoring the model has been captured in instances of PhysicalDataSpecification, LocationAccessData, and MiningApplyOutput, the last step is to decide whether the model should be applied synchronously or asynchronously.

If you are calling ODM from an application, the design of the calling application may determine whether to apply the model synchronously or asynchronously. Also, if the input data has many cases, the apply operation may require a significant amount of time; in such a case, you will probably want to apply the model asynchronously.

3.3.7.1 Apply the Model Synchronously

For synchronous apply, use the static SupervisedModel.apply method. Note that this method is deprecated for ODM release 2.

// Synchronous Apply
// Score the data using the model named "Sample_NB_Model" and 
// store the results in "Sample_NB_APPLY_RESULT"
SupervisedModel.apply(
              dmsConnection,
              lad,
              m_PhysicalDataSpecification,
              "Sample_NB_Model",
              m_MiningApplyOutput,
              ladOutput,
              "Sample_NB_APPLY_RESULT");

3.3.7.2 Apply the Model Asynchronously

For asynchronous apply, it is necessary to create an instance of MiningTask. A mining task can be persisted in the DMS using the store(dmsConn, taskName) method and executed at any time; such a task can be executed only once. The current status information of a task can be queried by calling the getCurrentStatus(dmsConn, taskName) method. This returns a MiningTaskStatus object, which provides more details about the state. You can get the complete status history of a task by calling the getStatusHistory(dmsConn, taskName) method.

// Asynchronous Apply
// Create a Naive Bayes apply task and execute it.
// The result name (for example, "Sample_NB_APPLY_RESULT") and the 
// model name (for example, "Sample_NB_Model") need to be specified.
MiningApplyTask task = new MiningApplyTask(
                        m_PhysicalDataSpecification, 
                        "Sample_NB_Model", 
                        m_MiningApplyOutput,
                        ladOutput,
                        "Sample_NB_APPLY_RESULT");

// Store the task under the name "Sample_NB_APPLY_Task"
task.store(dmsConnection, "Sample_NB_APPLY_Task");

// Execute the task
task.execute(dmsConnection);
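
As with the build task, you can monitor the stored apply task with the status methods described above. The following lines are a sketch only, not part of the shipped sample program; the argument list follows the description above, and the array return type for getStatusHistory is an assumption. Consult the ODM Javadoc for the exact signature and return type.

// Sketch only (not in the sample program): retrieve the status history
// of the stored apply task. The array return type is an assumption;
// see the ODM Javadoc for the actual declaration.
MiningTaskStatus[] history =
     MiningTask.getStatusHistory(dmsConnection, "Sample_NB_APPLY_Task");
for (int i = 0; i < history.length; i++)
     System.out.println(history[i]);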