Friday, April 15, 2005

Mining Model Viewer & DMX (for Association Rule Mining Model)

One of an interesting Mining Model Viewer is the “Microsoft Mining Content Viewer” (fig. 1). This gives results from a mining model in a tabular form which is easy to understand and very helpful while writing DMX queries.



Click to zoom-in



Figure 1

To retrieve the whole table using DMX query:

Select * From Customers.Content

Each row stores information about a node. The NODE_TYPE column gives the type of node. Value 1 means the node stores information about the Model, 7 indicates ItemSet, and 8 Association Rule. There are other types of node but I am talking in context of Association Rule Mining Model.

To get all the ItemSets:

Select * From Customers.Content Where NODE_TYPE = 7

To retrieve all the Association Rule:

Select * From Customers.Content Where NODE_TYPE = 8


Click to zoom-in


Figure 2


While working on a Mining Model from C# code, it won’t be efficient to parse the NODE_DESCRIPTION to retrieve the attribute name and attribute value for an ItemSet. If you expand the NODE_DISTRIBUTION column (fig. 2) then you could see the attribute name, attribute value, and related statistics for a node in a tabular form. From C# code if you execute

Select * From Customers.Content Where NODE_NAME = 100

reader[“NODE_DISTRIBUTION”] will only return a string with value "NODE_DISTRIBUTION". The DMX query to get the Attribute name and value in a tabular form would be something like this:

SELECT FLATTENED
(SELECT Attribute_Name, Attribute_Value, [Support], [Probability], [Variance] FROM node_distribution ) AS d
FROM Customers.CONTENT
WHERE [Node_Name] = 675

OUTPUT:


What about Assocaition Rule? It also faces the same problem as above.

Assocaition rule for ex:

Indiana Jones and the Last Crusade = Existing, Gender = Male -> Indiana Jones and the Raiders of the Lost Ark = Existing

is divided in to two parts; 1) right hand side (rhs), the one after the arrow and 2) left hand side (lfs), the one before the arrow.

NODE_DISTRIBUTION column holds the rhs for an association rule. You could run a DMX query similar to the one shown above to get a flat list of NODE_DISTRIBUTION. Getting the lhs is bit tricky. If you retrieve the NODE_RULE column of a node that has NODE_TYPE = 8 (i.e., association rule node), the return value is an XML string that looks something like this:

<AssocRule
support="122" confidence="0.813333" length="3"
LHS_ID="177" rule="Indiana Jones and the Last Crusade =
Existing, Gender = Male -> Indiana Jones and the Raiders of the Lost Ark =
Existing" />



The value of the attribute LHS_ID (in this case 177) is the NODE_NAME for the ItemSet that forms the lhs of this rule. So to get the LHS you will first have to get the LHS_ID attribute value from this XML and then get the flattened NODE_DISTRIBUTION for the node pointed by that value.