Data Class Reference

#include <Data.h>


Public Member Functions

 Data ()
 ~Data ()
void read (std::string fName)
void discretizeEqInt (int bins, Data *combine)
void discretizeEqFreq (int bins, Data *combine=NULL)
bool subsample (unsigned int desClass, float per)
unsigned int microsample (unsigned int amount)
void normalizeAttribute (int attIndex)
Data * clone ()
bool cover (Rule *rule)
int compareListItems (ListItem l1, ListItem l2)
void calcLift ()
void calcPDPFEst (unsigned int LOC)
void calcProbSupt ()
float getLift ()
int getTotLOC ()
std::vector< int > getLOCs ()
std::vector< std::vector< InstanceElement * > * > getInstanceSet ()
unsigned int getNumAtts ()
unsigned int getNumClasses ()
unsigned int getClassIndex (std::vector< InstanceElement * > *instance)
unsigned int getNumAttVals (std::string att)
std::string getAttName (int index)
unsigned int getAttIndex (std::string name)
unsigned int getAttValIndex (std::string attName, std::string valName)
std::string getAttValName (std::string att, int index)
std::string getClassName (int index)
std::vector< int > getClassFreqs ()
const std::vector< std::vector< int * > > * getFrequencyTable ()
void printAttributes ()
void printDataSet (std::ostream &stream)
void printClassDist ()
void printInstance (int inst)
void printFrequencyTable (std::ostream &stream)

Protected Member Functions

void processAttribute (std::string line)
void processInstance (std::string line)
std::string preprocessString (std::string line)
int find (std::string att, std::vector< std::string > &l)


Detailed Description

This class represents a data file. It provides the structures needed to retrieve information from the data easily.

Constructor & Destructor Documentation

Data::Data (  ) 

Empty Constructor.

Data::~Data (  ) 

Destructor.


Member Function Documentation

void Data::read ( std::string  fName  ) 

Reads in the training ARFF file and creates the Data instance.

Parameters:
fName The name of the file to read in.

void Data::discretizeEqInt ( int  bins,
Data * combine 
)

This method will discretize the attributes that are continuous using an equal interval discretization method.

Parameters:
bins The number of bins to use.
combine Another data set to combine with this one in the discretization.
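Equal-interval discretization can be sketched as follows. This is an illustrative sketch, not the Data implementation: the helper name equalIntervalBins and the use of a plain std::vector<double> for one attribute's values are assumptions. Each value is mapped to one of bins equal-width intervals spanning the minimum to the maximum value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Data): maps each continuous value to a bin
// index in [0, bins) using equal-width intervals between the minimum and maximum.
std::vector<int> equalIntervalBins(const std::vector<double>& values, int bins) {
    assert(bins > 0);
    std::vector<int> result(values.size(), 0);
    if (values.empty()) return result;
    auto mm = std::minmax_element(values.begin(), values.end());
    double lo = *mm.first;
    double width = (*mm.second - lo) / bins;
    if (width == 0.0) return result;  // all values equal: everything stays in bin 0
    for (std::size_t i = 0; i < values.size(); ++i) {
        int b = static_cast<int>((values[i] - lo) / width);
        result[i] = std::min(b, bins - 1);  // the maximum belongs to the last bin
    }
    return result;
}
```

If combine is meant to give both data sets the same cut points, one plausible approach (an assumption) is to compute the minimum and maximum over the pooled values of both sets before binning either one.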

void Data::discretizeEqFreq ( int  bins,
Data * combine = NULL 
)

This method will discretize the attributes that are continuous using an equal frequency discretization method.

Parameters:
bins The number of bins to use.
combine Another data set to combine with this one in the discretization. If this is NULL, it is ignored.
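The equal-frequency idea differs from equal-interval binning in that bin boundaries follow the sorted order of the values, so each bin receives roughly the same number of values. The sketch below is a simple rank-based version under assumed names (equalFrequencyBins is hypothetical); it may split tied values across adjacent bins, which the real method may or may not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: assigns each value a bin in [0, bins) so that every bin
// holds (as nearly as possible) the same number of values.
std::vector<int> equalFrequencyBins(const std::vector<double>& values, int bins) {
    assert(bins > 0);
    std::size_t n = values.size();
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., n - 1
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });
    std::vector<int> result(n, 0);
    for (std::size_t rank = 0; rank < n; ++rank) {
        // rank < n, so rank * bins / n always lies in [0, bins).
        result[order[rank]] = static_cast<int>(rank * bins / n);
    }
    return result;
}
```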

bool Data::subsample ( unsigned int  desClass,
float  per 
)

This method will subsample the data. That is, it removes instances that are not of the desired class until the desired class makes up the requested percentage of the entire data set.

Parameters:
desClass The index of the desired class.
per The desired percent. If this is smaller than the desired class's current percentage of the data set, this method does nothing.
Returns:
true if the set has been altered, false otherwise.
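The arithmetic behind subsampling can be sketched as follows: with d instances of the desired class and o others, keeping at most floor(d * (100 - per) / per) others brings the desired class up to per percent. This is a hypothetical helper, assuming per is expressed as a percentage between 0 and 100 (the documentation does not say whether it is a percentage or a fraction); which particular instances get removed is not modeled.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: how many instances NOT of the desired class must be
// removed so the desired class makes up at least `percent` percent of the set.
// Returns 0 when the current share already meets the target, mirroring the
// "does nothing" case of Data::subsample. ASSUMES percent is in (0, 100].
unsigned int othersToRemove(unsigned int desired, unsigned int others, float percent) {
    assert(percent > 0.0f && percent <= 100.0f);
    double total = static_cast<double>(desired) + others;
    if (total == 0.0 || 100.0 * desired / total >= percent) return 0;
    // Largest k with desired / (desired + k) >= percent / 100.
    unsigned int keep = static_cast<unsigned int>(
        std::floor(desired * (100.0 - percent) / percent));
    if (keep >= others) return 0;  // guard against floating-point edge cases
    return others - keep;
}
```

For example, with 10 desired and 90 other instances and a 25 percent target, 30 others may stay, so 60 are removed.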

unsigned int Data::microsample ( unsigned int  amount  ) 

This method will microsample the data. It removes instances so that all classes are equally represented, with the number of instances of each class equal to amount.

Parameters:
amount The number of each class to be left in the data set.
Returns:
The actual number of each class left in the data set. If amount > size( class ), then it will only remove instances from the other classes.
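One reading of the return value is that every class is cut down to min(amount, size of the smallest class), so a class smaller than amount loses nothing while the others shrink to its size. The sketch below computes that target and the per-class removals under exactly that assumption; the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch, ASSUMING all classes end up the same size:
// min(amount, size of the smallest class).
unsigned int microsampleTarget(const std::vector<unsigned int>& classCounts,
                               unsigned int amount) {
    assert(!classCounts.empty());
    unsigned int smallest = *std::min_element(classCounts.begin(), classCounts.end());
    return std::min(amount, smallest);
}

// Number of instances to remove from each class to reach that common size.
std::vector<unsigned int> removalsPerClass(const std::vector<unsigned int>& classCounts,
                                           unsigned int amount) {
    unsigned int target = microsampleTarget(classCounts, amount);
    std::vector<unsigned int> removals;
    for (unsigned int count : classCounts) removals.push_back(count - target);
    return removals;
}
```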

void Data::normalizeAttribute ( int  attIndex  ) 

This method will normalize an attribute so that each value is between 0 and 1 and the greatest value is equal to 1.

Parameters:
attIndex The index of the attribute to normalize.
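A normalization meeting the documented constraints can be sketched with min-max scaling. The documentation does not say whether normalizeAttribute subtracts the minimum or simply divides by the maximum; min-max scaling is one consistent choice, and minMaxNormalize is a hypothetical name operating on a plain vector of values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical min-max normalization: the smallest value becomes 0 and the
// largest becomes 1. If all values are equal, each becomes 1 so that the
// greatest value is still 1.
void minMaxNormalize(std::vector<double>& values) {
    if (values.empty()) return;
    auto mm = std::minmax_element(values.begin(), values.end());
    double lo = *mm.first;
    double range = *mm.second - lo;  // read both extremes before modifying
    for (double& v : values) {
        v = (range == 0.0) ? 1.0 : (v - lo) / range;
    }
}
```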

Data * Data::clone (  ) 

Creates a copy of the Data with the attributes and instance information.

Returns:
The copied Data.

bool Data::cover ( Rule *  rule  ) 

This method will remove all instances of data from the data set that are covered by a given rule.

Parameters:
rule The rule to check coverage.
Returns:
true if the set was altered, false otherwise.
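Removing covered instances maps naturally onto the erase-remove idiom. The Rule class is not documented here, so the sketch assumes a hypothetical SimpleRule: a conjunction of (attribute index, required value index) conditions over instances stored as vectors of discrete value indices. Only the "return true if altered" contract comes from the documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types: an instance is a vector of discrete value indices, and a
// rule is a conjunction of (attribute index, required value index) conditions.
using Instance = std::vector<int>;
using Condition = std::pair<std::size_t, int>;
using SimpleRule = std::vector<Condition>;

// True when the instance satisfies every condition of the rule.
bool covers(const SimpleRule& rule, const Instance& inst) {
    for (const Condition& c : rule) {
        assert(c.first < inst.size());
        if (inst[c.first] != c.second) return false;
    }
    return true;
}

// Removes every covered instance; returns true if the set changed.
bool removeCovered(std::vector<Instance>& data, const SimpleRule& rule) {
    auto newEnd = std::remove_if(data.begin(), data.end(),
                                 [&](const Instance& inst) { return covers(rule, inst); });
    bool altered = newEnd != data.end();
    data.erase(newEnd, data.end());
    return altered;
}
```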

int Data::compareListItems ( ListItem  l1,
ListItem  l2 
)

Compares two ListItems.

Parameters:
l1 The first ListItem.
l2 The second ListItem.
Returns:
0 if l1 = l2, -1 if l1 < l2, or 1 if l1 > l2.
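A three-way comparator with this contract can drive std::sort through a small adapter. The fields of the real ListItem are not documented, so the struct below, with a single score member, is a hypothetical stand-in; only the 0 / -1 / 1 contract comes from the documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for ListItem, ordered by a single score.
struct ListItem {
    double score;
};

// Three-way comparison following the documented contract.
int compareListItems(const ListItem& l1, const ListItem& l2) {
    if (l1.score < l2.score) return -1;
    if (l1.score > l2.score) return 1;
    return 0;
}

// std::sort needs a strict weak ordering; "compare(a, b) > 0" gives one that
// sorts in descending order (highest score first).
void sortDescending(std::vector<ListItem>& items) {
    std::sort(items.begin(), items.end(), [](const ListItem& a, const ListItem& b) {
        return compareListItems(a, b) > 0;
    });
}
```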

void Data::calcLift (  ) 

Calculates the base lift of the data.

void Data::calcPDPFEst ( unsigned int  LOC  ) 

Calculates the base information needed for Effort scoring.

Parameters:
LOC The index of the attribute that holds the lines of code.

void Data::calcProbSupt (  ) 

Calculates the frequency counts of each attribute-value pair. Assumes all data is discrete. Assumes there are exactly two ordered classes, with the second class being "best".
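Counting best and rest occurrences per attribute-value pair can be sketched as below. The instance layout (one discrete value index per attribute, plus a separate class index of 0 for rest and 1 for best) and the helper name countBestRest are assumptions; the [0] = rest, [1] = best order mirrors the layout documented for getFrequencyTable().

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// counts[att][val][0] is the rest count, counts[att][val][1] the best count.
using Counts = std::vector<std::vector<std::array<int, 2>>>;

// Hypothetical layout: instances[i][a] is the value index of attribute a,
// classOf[i] is 0 (rest) or 1 (best), numValsPerAtt[a] is attribute a's arity.
Counts countBestRest(const std::vector<std::vector<int>>& instances,
                     const std::vector<int>& classOf,
                     const std::vector<int>& numValsPerAtt) {
    assert(instances.size() == classOf.size());
    Counts counts(numValsPerAtt.size());
    for (std::size_t a = 0; a < numValsPerAtt.size(); ++a)
        counts[a].assign(numValsPerAtt[a], std::array<int, 2>{{0, 0}});
    for (std::size_t i = 0; i < instances.size(); ++i) {
        assert(classOf[i] == 0 || classOf[i] == 1);  // exactly two classes
        for (std::size_t a = 0; a < numValsPerAtt.size(); ++a)
            ++counts[a][instances[i][a]][classOf[i]];
    }
    return counts;
}
```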

float Data::getLift (  ) 

Gets the base lift of the data.

Returns:
The base lift.

int Data::getTotLOC (  ) 

Gets the total lines of code in this data instance.

Returns:
The total lines of code.

vector< int > Data::getLOCs (  ) 

Gets the lines of code per instance.

Returns:
A vector containing the lines of code per instance.

vector< vector< InstanceElement * > * > Data::getInstanceSet (  ) 

Gets the instance set.

Returns:
The instance set.

unsigned int Data::getNumAtts (  ) 

Gets the number of attributes.

Returns:
The number of attributes.

unsigned int Data::getNumClasses (  ) 

Gets the number of class values.

Returns:
The number of class values.

unsigned int Data::getClassIndex ( std::vector< InstanceElement * > *  instance  ) 

Gets the class index for a given instance.

Parameters:
instance An instance of data.
Returns:
The class index.

unsigned int Data::getNumAttVals ( std::string  att  ) 

Gets the number of values for a given attribute.

Parameters:
att The attribute.
Returns:
The number of values for att.

string Data::getAttName ( int  index  ) 

Gets the attribute name of the index'th attribute.

Parameters:
index The index of the attribute whose name to return.
Returns:
The name of the attribute at index.

unsigned int Data::getAttIndex ( std::string  name  ) 

Gets the index of the attribute whose name matches name.

Parameters:
name The name of the attribute to find the index of.
Returns:
The index if found, number of attributes + 1 otherwise.
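Because the return type is unsigned, a missing attribute cannot be signaled with -1; callers have to compare against the "number of attributes + 1" sentinel, or more simply test whether the result is below the number of attributes. The sketch uses a hypothetical stand-in, attIndexOf, over a plain list of names rather than a Data object.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Data::getAttIndex: position of `name`, or
// names.size() + 1 when it is absent.
unsigned int attIndexOf(const std::vector<std::string>& names, const std::string& name) {
    for (unsigned int i = 0; i < names.size(); ++i)
        if (names[i] == name) return i;
    return static_cast<unsigned int>(names.size()) + 1;
}

// Any result >= the number of attributes means "not found".
bool hasAttribute(const std::vector<std::string>& names, const std::string& name) {
    return attIndexOf(names, name) < names.size();
}
```

The same check applies to getAttValIndex, whose sentinel is the number of attribute values + 1.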

unsigned int Data::getAttValIndex ( std::string  attName,
std::string  valName 
)

Gets the index of the value of attribute attName whose name matches valName.

Parameters:
attName The name of the attribute.
valName The name of the attribute value to match.
Returns:
The index if found, number of attribute values + 1 otherwise.

std::string Data::getAttValName ( std::string  att,
int  index 
)

Gets the name of the attribute value at the index'th value.

Parameters:
att The name of the attribute.
index The index of the value to get.
Returns:
The name of the attribute value at the index.

string Data::getClassName ( int  index  ) 

Gets the class name at the index'th location.

Parameters:
index The index of the class to get.
Returns:
The class name in string form.

vector< int > Data::getClassFreqs (  ) 

Gets the class frequency vector.

Returns:
The class frequency vector.

const vector< vector< int * > > * Data::getFrequencyTable (  ) 

Gets the frequency count table used for the best^2 / (best + rest) score.

Returns:
A jagged table indexed first by attribute and then by attribute value. Each entry points to a two-element array whose first element is the rest count and whose second element is the best count of that attribute-value pair.
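Scoring every attribute-value pair with best^2 / (best + rest) from a table of this shape can be sketched as below. The helper names are hypothetical; the only details taken from the documentation are the table type and the [0] = rest, [1] = best order of each two-element array.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// best^2 / (best + rest) for one attribute-value pair; counts[0] = rest, counts[1] = best.
double bestRestScore(const int* counts) {
    int rest = counts[0];
    int best = counts[1];
    if (best + rest == 0) return 0.0;  // value never observed
    return static_cast<double>(best) * best / (best + rest);
}

// Returns the (attribute, value) indices of the highest-scoring pair.
std::pair<std::size_t, std::size_t> bestPair(const std::vector<std::vector<int*>>& table) {
    std::pair<std::size_t, std::size_t> arg{0, 0};
    double top = -1.0;
    for (std::size_t a = 0; a < table.size(); ++a)
        for (std::size_t v = 0; v < table[a].size(); ++v) {
            double s = bestRestScore(table[a][v]);
            if (s > top) { top = s; arg = {a, v}; }
        }
    return arg;
}
```

With a Data object, the call would presumably look like bestPair(*data.getFrequencyTable()), after the table has been filled by calcProbSupt().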

void Data::printAttributes (  ) 

This method will print the attributes.

void Data::printDataSet ( std::ostream &  stream  ) 

This method will print the data set.

Parameters:
stream The stream to print to.

void Data::printClassDist (  ) 

This method prints the class names and frequencies.

void Data::printInstance ( int  inst  ) 

This method will print one instance of the data set.

Parameters:
inst The instance number to print.

void Data::printFrequencyTable ( std::ostream &  stream  ) 

This method will print all of the attribute value best and rest frequencies.

Parameters:
stream The stream to print to.

void Data::processAttribute ( std::string  line  )  [protected]

Processes a string of text and converts it into a new attribute, storing its name and values in the mAtts and mAttVals lists.

Parameters:
line The line of text to process.
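Parsing an ARFF attribute declaration into a name and a list of nominal values can be sketched as follows. This is not the Data implementation: ParsedAttribute and parseAttributeLine are hypothetical, and the sketch assumes the line is already lowercased, that a space separates the name from the value list, and that names are unquoted.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of parsing one ARFF attribute declaration.
struct ParsedAttribute {
    std::string name;
    std::vector<std::string> values;  // empty for numeric/real attributes
};

// Parses lines such as "@attribute outlook {sunny, overcast, rainy}".
ParsedAttribute parseAttributeLine(const std::string& line) {
    std::istringstream in(line);
    std::string keyword;
    ParsedAttribute att;
    in >> keyword >> att.name;  // "@attribute", then the attribute name
    std::size_t open = line.find('{');
    if (open == std::string::npos) return att;  // e.g. "real": no nominal values
    std::size_t close = line.find('}', open);
    if (close == std::string::npos) return att;
    std::istringstream vals(line.substr(open + 1, close - open - 1));
    std::string item;
    while (std::getline(vals, item, ',')) {
        std::size_t b = item.find_first_not_of(" \t");
        std::size_t e = item.find_last_not_of(" \t");
        if (b != std::string::npos) att.values.push_back(item.substr(b, e - b + 1));
    }
    return att;
}
```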

void Data::processInstance ( std::string  line  )  [protected]

Processes a string of text and converts it into a new instance of the data set. Inserts that instance into the mInstances list.

Parameters:
line The line of text to convert.
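A likely first step in converting an ARFF data line into an instance is splitting it on commas and trimming each token. The sketch below shows only that step, under a hypothetical name; mapping each token to an attribute value index (and handling ARFF's "?" for missing values) would follow and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated data line and trims spaces/tabs.
std::vector<std::string> splitInstanceLine(const std::string& line) {
    std::vector<std::string> tokens;
    std::istringstream in(line);
    std::string item;
    while (std::getline(in, item, ',')) {
        std::size_t b = item.find_first_not_of(" \t");
        std::size_t e = item.find_last_not_of(" \t");
        tokens.push_back(b == std::string::npos ? std::string() : item.substr(b, e - b + 1));
    }
    return tokens;
}
```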

std::string Data::preprocessString ( std::string  line  )  [protected]

Removes any trailing whitespace from a line and converts every letter to lowercase. This allows for easier matching in later stages of the program.

Parameters:
line A line of text.
Returns:
A processed line of text.
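The documented preprocessing can be sketched in a few lines. The function name preprocess is hypothetical; treating '\r' and '\n' as trailing whitespace is a choice made here, useful because reading a file with CRLF line endings via std::getline leaves a '\r' at the end of each line. Leading whitespace is kept, as the documentation only mentions trailing whitespace.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Strips trailing spaces, tabs, and line-ending characters, then lowercases.
std::string preprocess(std::string line) {
    std::size_t end = line.find_last_not_of(" \t\r\n");
    line.erase(end == std::string::npos ? 0 : end + 1);
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return line;
}
```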

int Data::find ( std::string  att,
std::vector< std::string > &  l 
) [protected]

Attempts to find a string in a list of strings.

Parameters:
att The string to find.
l The list to search.
Returns:
The index of att in l. If it is not found, returns -1.
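The documented contract for find is a linear search that returns -1 on failure, which std::find expresses directly. This sketch uses a hypothetical free function, findIndex, and takes the list by const reference, whereas the documented member takes a non-const reference.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Index of att in l, or -1 if it is not present.
int findIndex(const std::string& att, const std::vector<std::string>& l) {
    auto it = std::find(l.begin(), l.end(), att);
    return it == l.end() ? -1 : static_cast<int>(it - l.begin());
}
```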



Generated on Wed Feb 20 13:52:40 2008 for Which by doxygen 1.5.5