The following code snippet shows how to rank the attributes (features) of a data set before using them in a classification application. I will be using the standard Weka 3.7.13 and the sample data file "weather.numeric.arff" from the data folder of your Weka installation. I assume you know how to set up the weka.jar file in your development environment.
Attribute means the same thing as feature in Weka.
This is the content of the sample data file:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
To perform attribute selection, three elements are required. The first is a search method and the second is an evaluation method. Both need to be instantiated and set on the container class AttributeSelection. The third element is the data. So the general framework of setting up attribute selection is like this:
import java.io.BufferedReader;
import java.io.FileReader;

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.Ranker;
import weka.attributeSelection.ReliefFAttributeEval;
import weka.core.Instances;

// the class name is arbitrary
public class AttributeRanking {

    public static void main(String[] args) throws Exception {
        // load data from <working directory>/data/weather.numeric.arff
        String workingDirectory = System.getProperty("user.dir");
        String fs = System.getProperty("file.separator");
        String wekadatafile = workingDirectory + fs + "data" + fs + "weather.numeric.arff";
        Instances data;
        try (BufferedReader datafile = new BufferedReader(new FileReader(wekadatafile))) {
            data = new Instances(datafile);
        }
        // use the last attribute (play) as the class if none is set
        if (data.classIndex() == -1)
            data.setClassIndex(data.numAttributes() - 1);
        useLowLevel(data, wekadatafile);
    }

    /**
     * uses the low level approach
     */
    protected static void useLowLevel(Instances data, String datafile) throws Exception {
        System.out.println("\n3. Low-level");
        AttributeSelection attsel = new AttributeSelection();
        Ranker search = new Ranker();
        ReliefFAttributeEval evals = new ReliefFAttributeEval();
        attsel.setRanking(true);
        attsel.setEvaluator(evals);
        attsel.setSearch(search);
        attsel.SelectAttributes(data);
        // un-comment here to display the results from the ranking
        //System.out.println(attsel.toResultsString());
        // expand the ranked attributes so you can find the index, name and weight of the features
        // each row of ranked is {attribute index, weight}, best attribute first
        double[][] ranked = attsel.rankedAttributes();
        System.out.println("ranked attributes!!!\n");
        for (int i = 0; i < ranked.length; i++) {
            int index = (int) ranked[i][0];
            System.out.println(" Feature:" + data.attribute(index).name() + " weight:" + ranked[i][1]);
        }
    }
}
Output
3. Low-level
ranked attributes!!!
Feature:outlook weight:0.0548
Feature:humidity weight:0.0113
Feature:windy weight:-0.0024
Feature:temperature weight:-0.0314
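To make the layout of the ranked array more concrete, here is a small sketch that needs no Weka on the classpath. It hard-codes the four rows from the output above as {attribute index, weight} pairs and shows how the index in column 0 maps back to an attribute name. The class name RankedFeatureReport, the helper names, and the cut-off of 0.0 are my own assumptions for illustration only; they are not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class RankedFeatureReport {

    // Attribute names in ARFF declaration order (the class attribute "play" is left out).
    static final String[] NAMES = {"outlook", "temperature", "humidity", "windy"};

    // Hard-coded copy of the ranking printed above; each row is {attribute index, weight}.
    static final double[][] RANKED = {
        {0, 0.0548}, {2, 0.0113}, {3, -0.0024}, {1, -0.0314}
    };

    /** Formats one ranked row as "name (weight)" with four decimal places. */
    static String describe(double[] row, String[] names) {
        int index = (int) row[0]; // column 0 holds the attribute index
        return String.format(Locale.ROOT, "%s (%.4f)", names[index], row[1]);
    }

    /** Returns the names of attributes whose weight is above the threshold, in rank order. */
    static List<String> keepAbove(double[][] ranked, String[] names, double threshold) {
        List<String> kept = new ArrayList<>();
        for (double[] row : ranked) {
            if (row[1] > threshold) {
                kept.add(names[(int) row[0]]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        for (double[] row : RANKED) {
            System.out.println(describe(row, NAMES));
        }
        // 0.0 is an arbitrary illustrative cut-off, not a recommended value
        System.out.println("kept: " + keepAbove(RANKED, NAMES, 0.0));
    }
}
```

With the real program you would pass attsel.rankedAttributes() and look names up through data.attribute(index).name() instead of the hard-coded arrays.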
The overall setup for attribute selection is clear and intuitive. What's not so obvious is that search methods include ranking and sub-setting methods, and correspondingly, evaluation methods have individual evaluation and subset evaluation. Ranking search can't be used together with a subset evaluator, and vice versa.
If you use a subset evaluation method such as CfsSubsetEval, you need to pair it with a subset search method such as GreedyStepwise.
//CfsSubsetEval eval = new CfsSubsetEval();
//GreedyStepwise greedySearch = new GreedyStepwise();
//greedySearch.setSearchBackwards(true);
//attsel.setEvaluator(eval);
//attsel.setSearch(greedySearch);
Subset Search Methods:
1. BestFirst
2. GreedyStepwise
3. FCBFSearch (ASU)
Subset Evaluation Methods:
1. CfsSubsetEval
2. SymmetricalUncertAttributeSetEval (ASU)
Individual Search Methods:
1. Ranker
Individual Evaluation Methods:
1. CorrelationAttributeEval
2. GainRatioAttributeEval
3. InfoGainAttributeEval
4. OneRAttributeEval
5. PrincipalComponents (used with a Ranker search to perform PCA and data transformation)
6. ReliefFAttributeEval
7. SymmetricalUncertAttributeEval
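The pairing rule behind these lists can be sketched as a simple lookup. This is only an illustration of the rule, written in plain Java with the class names stored as strings; it is not how Weka itself checks the combination, and the class SelectionPairing, the enum Kind and the method compatible are hypothetical names of my own. Only a few entries from the lists above are included.

```java
import java.util.Map;

class SelectionPairing {

    // INDIVIDUAL = ranks single attributes, SUBSET = scores whole attribute subsets
    enum Kind { INDIVIDUAL, SUBSET }

    // Search methods and the kind of evaluator each one is listed with above.
    static final Map<String, Kind> SEARCHES = Map.of(
        "Ranker", Kind.INDIVIDUAL,
        "BestFirst", Kind.SUBSET,
        "GreedyStepwise", Kind.SUBSET);

    // A few evaluators taken from the lists above.
    static final Map<String, Kind> EVALUATORS = Map.of(
        "ReliefFAttributeEval", Kind.INDIVIDUAL,
        "InfoGainAttributeEval", Kind.INDIVIDUAL,
        "GainRatioAttributeEval", Kind.INDIVIDUAL,
        "CfsSubsetEval", Kind.SUBSET);

    /** Returns true when the search method and the evaluator are of the same kind. */
    static boolean compatible(String search, String evaluator) {
        Kind searchKind = SEARCHES.get(search);
        Kind evalKind = EVALUATORS.get(evaluator);
        if (searchKind == null || evalKind == null) {
            throw new IllegalArgumentException("unknown name: " + search + " / " + evaluator);
        }
        return searchKind == evalKind;
    }

    public static void main(String[] args) {
        System.out.println(compatible("Ranker", "ReliefFAttributeEval"));   // same kind
        System.out.println(compatible("GreedyStepwise", "CfsSubsetEval"));  // same kind
        System.out.println(compatible("Ranker", "CfsSubsetEval"));          // mixed kinds
    }
}
```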