"A meek endeavor to the triumph" by Sampath Jayarathna

Wednesday, December 30, 2015

[Weka] Attribute Selection/Ranking using Relief Algorithm

The following code snippet shows how to rank the attributes (features) of a data set before using it in a classification application. I will be using the standard Weka 3.7.13 and the sample data file "weather.numeric.arff" from the data folder of your Weka installation. I assume you know how to set up the weka.jar file in your development environment.

In Weka, "attribute" means the same thing as "feature".

This is the header of the sample data file (the data rows are omitted),
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
To perform attribute selection, three elements are required. The first is a search method and the second is an evaluation method. Both need to be instantiated and set on the container class AttributeSelection. The third element is the data. So the general framework for setting up attribute selection looks like this:

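 import java.io.BufferedReader;
 import java.io.FileReader;
 import weka.attributeSelection.AttributeSelection;
 import weka.attributeSelection.Ranker;
 import weka.attributeSelection.ReliefFAttributeEval;
 import weka.core.Instances;

 // both methods below go inside your own class

 public static void main(String[] args) throws Exception {
         // load data from <working dir>/data/weather.numeric.arff
         String workingDirectory = System.getProperty("user.dir");
         String fs = System.getProperty("file.separator");
         String wekadatafile = workingDirectory + fs + "data" + fs + "weather.numeric.arff";
         BufferedReader datafile = new BufferedReader(new FileReader(wekadatafile));
         Instances data = new Instances(datafile);
         datafile.close();

         // use the last attribute (play) as the class if none is set
         if (data.classIndex() == -1)
                 data.setClassIndex(data.numAttributes() - 1);
         useLowLevel(data, wekadatafile);
 }

 /**
  * uses the low level approach
  */
 protected static void useLowLevel(Instances data, String datafile) throws Exception {
         System.out.println("\n3. Low-level");
         AttributeSelection attsel = new AttributeSelection();
         Ranker search = new Ranker();
         ReliefFAttributeEval evals = new ReliefFAttributeEval();
         // register the evaluator and the search method, then run the selection
         attsel.setEvaluator(evals);
         attsel.setSearch(search);
         attsel.SelectAttributes(data);
         // un-comment here to display the results from the ranking
         // System.out.println(attsel.toResultsString());
         // expand the ranked attributes so you can find the index, name and weight of the features
         double[][] ranked = attsel.rankedAttributes();
         System.out.println("ranked attributes!!!\n");
         for (int i = 0; i < ranked.length; i++) {
                 // ranked[i][0] is the attribute index, ranked[i][1] is its weight
                 int index = (int) ranked[i][0];
                 System.out.println(" Feature:" + data.attribute(index).name() + " weight:" + ranked[i][1]);
         }
 }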

This is the output,
3. Low-level
ranked attributes!!!
Feature:outlook weight:0.0548
Feature:humidity weight:0.0113
Feature:windy weight:-0.0024
Feature:temperature weight:-0.0314
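
The negative weights for windy and temperature come from the way Relief scores a feature. For each instance it looks up the nearest neighbour of the same class (the "hit") and the nearest neighbour of a different class (the "miss"). A feature gains weight when it differs on the miss and loses weight when it differs on the hit. Below is a minimal sketch of that basic update so you can see where the numbers come from. It is not Weka's ReliefFAttributeEval, which among other things averages over several nearest neighbours and handles nominal attributes. The class name ReliefSketch, the toy data, the Manhattan distance and the assumption that features are already scaled to [0, 1] are all mine, for illustration only.

```java
/** Sketch of basic Relief for numeric features in [0, 1] (illustrative, not Weka's ReliefF). */
final class ReliefSketch {

    /** Manhattan distance between two instances. */
    static double distance(double[] a, double[] b) {
        double d = 0.0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    /** Index of the closest other instance with the same class (hit) or a different class (miss). */
    static int nearest(double[][] x, int[] y, int i, boolean sameClass) {
        int best = -1;
        for (int j = 0; j < x.length; j++) {
            if (j == i || (y[j] == y[i]) != sameClass) continue;
            if (best < 0 || distance(x[i], x[j]) < distance(x[i], x[best])) best = j;
        }
        return best;
    }

    /**
     * One pass over every instance (assumption: no random sampling, and each
     * class has at least two instances so that a hit and a miss always exist).
     */
    static double[] weights(double[][] x, int[] y) {
        int m = x.length, n = x[0].length;
        double[] w = new double[n];
        for (int i = 0; i < m; i++) {
            int hit = nearest(x, y, i, true);
            int miss = nearest(x, y, i, false);
            for (int a = 0; a < n; a++) {
                // reward separation from the miss, penalise separation from the hit
                w[a] += (Math.abs(x[i][a] - x[miss][a]) - Math.abs(x[i][a] - x[hit][a])) / m;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // toy data: feature 0 separates the two classes, feature 1 is noise
        double[][] x = {{0.0, 0.0}, {0.1, 1.0}, {1.0, 0.1}, {0.9, 0.9}};
        int[] y = {0, 0, 1, 1};
        double[] w = weights(x, y);
        System.out.println("weight[0]=" + w[0] + " weight[1]=" + w[1]);
    }
}
```

On this toy data the separating feature ends up with a positive weight and the noisy one with a negative weight, which is the same pattern as outlook versus temperature in the output above.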

The overall setup for attribute selection is clear and intuitive. What's less obvious is that search methods come in two kinds, ranking and subset search, and evaluation methods correspondingly come as individual (single-attribute) evaluators and subset evaluators. A ranking search can't be used together with a subset evaluator, and vice versa.

If you are using a subset evaluation method like CfsSubsetEval, then you need to use a subset search method like GreedyStepwise:
    //CfsSubsetEval eval = new CfsSubsetEval();
    //GreedyStepwise greedySearch = new GreedyStepwise();
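
To see why a subset evaluator needs a subset search, here is a small plain-Java sketch (no Weka required) of a greedy forward search driven by a CFS-style merit, k * avg(r_cf) / sqrt(k + k(k-1) * avg(r_ff)), where r_cf is the feature-class correlation and r_ff the feature-feature correlation. The score belongs to the whole set, so there is nothing for a Ranker to sort. The class name SubsetSearchSketch and the correlation numbers are made up for illustration, and the real CfsSubsetEval and GreedyStepwise have more options than this.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: greedy forward subset search with a CFS-style subset merit (illustrative only). */
final class SubsetSearchSketch {

    /** Merit of a whole subset: high class correlation, low redundancy between features. */
    static double merit(List<Integer> subset, double[] classCorr, double[][] featCorr) {
        int k = subset.size();
        if (k == 0) return 0.0;
        double rcf = 0.0, rff = 0.0;
        for (int a : subset) rcf += classCorr[a];
        rcf /= k;
        int pairs = 0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                rff += featCorr[subset.get(i)][subset.get(j)];
                pairs++;
            }
        }
        if (pairs > 0) rff /= pairs;
        return k * rcf / Math.sqrt(k + k * (k - 1) * rff);
    }

    /** Adds the attribute that most improves the merit; stops when nothing improves it. */
    static List<Integer> greedyForward(double[] classCorr, double[][] featCorr) {
        List<Integer> selected = new ArrayList<>();
        double best = 0.0;
        boolean improved = true;
        while (improved) {
            improved = false;
            int bestAttr = -1;
            for (int a = 0; a < classCorr.length; a++) {
                if (selected.contains(a)) continue;
                selected.add(a);                        // try the subset with a added
                double m = merit(selected, classCorr, featCorr);
                selected.remove(selected.size() - 1);   // undo the trial
                if (m > best) { best = m; bestAttr = a; improved = true; }
            }
            if (bestAttr >= 0) selected.add(bestAttr);
        }
        return selected;
    }

    public static void main(String[] args) {
        // hypothetical numbers: attributes 0 and 1 are both predictive but nearly duplicates
        double[] classCorr = {0.60, 0.55, 0.50};
        double[][] featCorr = {{1.00, 0.95, 0.05}, {0.95, 1.00, 0.05}, {0.05, 0.05, 1.00}};
        System.out.println("selected subset: " + greedyForward(classCorr, featCorr));
    }
}
```

With these numbers a ranking by class correlation alone would put attributes 0 and 1 on top, while the subset search skips attribute 1 because it is redundant with attribute 0.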

Subset Search Methods:
1. BestFirst
2. GreedyStepwise
3. FCBFSearch (ASU)

Subset Evaluation Methods:
1. CfsSubsetEval
2. SymmetricalUncertAttributeSetEval (ASU)

Individual Search Methods:
1. Ranker

Individual Evaluation Methods:
1. CorrelationAttributeEval
2. GainRatioAttributeEval
3. InfoGainAttributeEval
4. OneRAttributeEval
5. PrincipalComponents (used with a Ranker search to perform PCA and data transformation)
6. ReliefFAttributeEval
7. SymmetricalUncertAttributeEval
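
All of the individual evaluators above share one idea: each attribute gets a score on its own, and the Ranker only sorts those scores. As a rough plain-Java illustration (no Weka required), here is the information gain calculation that an evaluator like InfoGainAttributeEval is built on, applied to the two nominal attributes of the weather data. The class name InfoGainSketch is mine, and the yes/no counts are typed in by hand from the 14 rows of the weather file as I know it, so check them against your own copy.

```java
/** Sketch: information gain of one nominal attribute with respect to a yes/no class. */
final class InfoGainSketch {

    /** Entropy in bits of a class distribution given as counts. */
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;                 // 0 * log(0) is treated as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** counts[v] = {yes, no} class counts for attribute value v. */
    static double infoGain(int[][] counts) {
        int yes = 0, no = 0;
        for (int[] c : counts) { yes += c[0]; no += c[1]; }
        double remainder = 0.0;
        for (int[] c : counts) {
            remainder += (double) (c[0] + c[1]) / (yes + no) * entropy(c[0], c[1]);
        }
        return entropy(yes, no) - remainder;
    }

    public static void main(String[] args) {
        // {yes, no} counts of play per attribute value (hand-copied, see note above)
        int[][] outlook = {{2, 3}, {4, 0}, {3, 2}};   // sunny, overcast, rainy
        int[][] windy = {{6, 2}, {3, 3}};             // FALSE, TRUE
        System.out.println("outlook gain: " + infoGain(outlook));
        System.out.println("windy gain: " + infoGain(windy));
    }
}
```

Each attribute is scored without looking at any other attribute, which is why these evaluators pair with Ranker and not with a subset search.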
