"A meek endeavor to the triumph" by Sampath Jayarathna

Wednesday, December 30, 2015

[Weka] Attribute Selection/Ranking using Relief Algorithm

The following code snippet shows how to rank the attributes (features) of a data set before using them in a classification application. I will be using the standard Weka 3.7.13 and the sample data file "weather.numeric.arff" from the data folder of your Weka installation. I assume you know how to set up the weka.jar file in your development environment.

In Weka, an attribute means the same thing as a feature.

This is the content of the sample data file:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
To perform attribute selection, three elements are required: a search method, an evaluation method, and the data. The search method and the evaluator are both instantiated and registered with the container class AttributeSelection, which is then run on the data. So the general framework for setting up attribute selection looks like this:

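 // imports needed by the snippets in this post
 import java.io.BufferedReader;
 import java.io.FileReader;
 import weka.attributeSelection.AttributeSelection;
 import weka.attributeSelection.Ranker;
 import weka.attributeSelection.ReliefFAttributeEval;
 import weka.core.Instances;

 public static void main(String[] args) throws Exception {
         // load data from <working dir>/data/weather.numeric.arff
         String workingDirectory = System.getProperty("user.dir");
         String fs = System.getProperty("file.separator");
         String wekadatafile = workingDirectory + fs + "data" + fs + "weather.numeric.arff";
         BufferedReader datafile = new BufferedReader(new FileReader(wekadatafile));
         Instances data = new Instances(datafile);
         datafile.close();

         // use the last attribute ("play") as the class if none is set
         if (data.classIndex() == -1)
                 data.setClassIndex(data.numAttributes() - 1);
         useLowLevel(data, wekadatafile);
  }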

 /**
   * uses the low level approach
   */
  protected static void useLowLevel(Instances data, String datafile) throws Exception {
         System.out.println("\n3. Low-level");
         AttributeSelection attsel = new AttributeSelection();
         Ranker search = new Ranker();
         ReliefFAttributeEval evals = new ReliefFAttributeEval();
         attsel.setRanking(true);
         attsel.setEvaluator(evals);
         attsel.setSearch(search);
         attsel.SelectAttributes(data);
         // un-comment here to display the results from the ranking
        //System.out.println(attsel.toResultsString());
 
         // expand the ranked attributes so you can find the index, name and weight of the features;
         // each row is {attribute index, weight}
         double[][] ranked = attsel.rankedAttributes();
         System.out.println("ranked attributes!!!\n");
         for (int i = 0; i < ranked.length; i++) {
          int index = (int) ranked[i][0];
          System.out.println(" Feature:" + data.attribute(index).name() + " weight:" + ranked[i][1]);
         }
  }

Output
3. Low-level
ranked attributes!!!
Feature:outlook weight:0.0548
Feature:humidity weight:0.0113
Feature:windy weight:-0.0024
Feature:temperature weight:-0.0314
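
Once you have the ranking, a common next step is to keep only the attributes whose weight is above some cut-off. Here is a minimal sketch in plain Java (no Weka needed) that does this on an array laid out like the one returned by rankedAttributes(), with the attribute index in column 0 and the weight in column 1. The class name RankedAttributeFilter, the method namesAbove and the threshold of 0.0 are my own choices for illustration; the numbers are simply copied from the output above.

```java
import java.util.ArrayList;
import java.util.List;

public class RankedAttributeFilter {

    /**
     * Returns the names of the attributes whose weight is strictly above
     * the threshold, in ranking order. Each row of ranked is
     * {attribute index, weight}.
     */
    public static List<String> namesAbove(double[][] ranked, String[] attributeNames, double threshold) {
        List<String> kept = new ArrayList<>();
        for (double[] row : ranked) {
            int index = (int) row[0];   // the index is stored as a double
            double weight = row[1];
            if (weight > threshold) {
                kept.add(attributeNames[index]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Attribute names in ARFF order (the class attribute "play" is left out).
        String[] names = {"outlook", "temperature", "humidity", "windy"};
        // Index/weight pairs copied from the output above.
        double[][] ranked = {
            {0, 0.0548}, {2, 0.0113}, {3, -0.0024}, {1, -0.0314}
        };
        System.out.println(namesAbove(ranked, names, 0.0)); // prints [outlook, humidity]
    }
}
```

If you would rather let Weka do the filtering, the Ranker search has setThreshold and setNumToSelect options that should serve the same purpose.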

The overall setup for attribute selection is clear and intuitive. What's not so obvious is that search methods come in two kinds, ranking and subset search, and evaluation methods correspondingly come in two kinds, individual attribute evaluation and subset evaluation. A ranking search can't be used together with a subset evaluator, and vice versa.

If you are using a subset evaluation method like CfsSubsetEval, then you need to use a subset search method like GreedyStepwise.
    //CfsSubsetEval eval = new CfsSubsetEval();
    //GreedyStepwise greedySearch = new GreedyStepwise();
    //greedySearch.setSearchBackwards(true);
    //attsel.setEvaluator(eval);
    //attsel.setSearch(greedySearch);

Subset Search Methods:
1. BestFirst
2. GreedyStepwise
3. FCBFSearch (ASU)

Subset Evaluation Methods:
1. CfsSubsetEval
2. SymmetricalUncertAttributeSetEval (ASU)

Individual Search Methods:
1. Ranker

Individual Evaluation Methods:
1. CorrelationAttributeEval
2. GainRatioAttributeEval
3. InfoGainAttributeEval
4. OneRAttributeEval
5. PrincipalComponents (used with a Ranker search to perform PCA and data transform)
6. ReliefFAttributeEval
7. SymmetricalUncertAttributeEval
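
To give a rough feel for what an individual evaluator such as ReliefFAttributeEval is scoring, here is a simplified sketch of the basic Relief idea in plain Java. For every instance it finds the nearest neighbour of the same class (the "hit") and the nearest neighbour of a different class (the "miss"); an attribute gains weight when it differs on the miss and loses weight when it differs on the hit. This is not Weka's implementation: the class SimpleRelief, its method names and the toy data are made up for illustration, it handles numeric attributes only, and it uses a single neighbour, so it will not reproduce the weights shown earlier in this post.

```java
public class SimpleRelief {

    /** Returns one weight per feature; x holds numeric features, y the class labels. */
    public static double[] weights(double[][] x, int[] y) {
        int n = x.length, d = x[0].length;
        // Per-feature range, used to scale differences into [0, 1].
        double[] range = new double[d];
        for (int a = 0; a < d; a++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : x) {
                min = Math.min(min, row[a]);
                max = Math.max(max, row[a]);
            }
            range[a] = (max > min) ? max - min : 1.0;
        }
        double[] w = new double[d];
        for (int i = 0; i < n; i++) {
            int hit = nearest(x, y, range, i, true);
            int miss = nearest(x, y, range, i, false);
            if (hit < 0 || miss < 0) continue; // no usable neighbour for this instance
            for (int a = 0; a < d; a++) {
                w[a] -= diff(x[i][a], x[hit][a], range[a]) / n;   // differing from the hit is bad
                w[a] += diff(x[i][a], x[miss][a], range[a]) / n;  // differing from the miss is good
            }
        }
        return w;
    }

    private static double diff(double u, double v, double range) {
        return Math.abs(u - v) / range;
    }

    /** Index of the closest other instance with the same (or a different) class, or -1. */
    private static int nearest(double[][] x, int[] y, double[] range, int i, boolean sameClass) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < x.length; j++) {
            if (j == i || (y[j] == y[i]) != sameClass) continue;
            double dist = 0;
            for (int a = 0; a < range.length; a++) {
                dist += diff(x[i][a], x[j][a], range[a]);
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: feature 0 separates the two classes, feature 1 does not.
        double[][] x = {{0.0, 0.0}, {0.1, 1.0}, {1.0, 0.0}, {0.9, 1.0}};
        int[] y = {0, 0, 1, 1};
        double[] w = weights(x, y);
        System.out.println("feature 0: " + w[0] + ", feature 1: " + w[1]);
    }
}
```

On the toy data, feature 0 ends up with a positive weight and feature 1 with a negative one, which is the same reading you would give the ranking output: a negative weight suggests the attribute varies more among same-class neighbours than between classes.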
