Data Class Reference
#include <Data.h>
Detailed Description
This class represents a data file. It provides the structures needed to easily extract information from the data.
Constructor & Destructor Documentation
Member Function Documentation
void Data::read ( std::string fName )
Reads in the training ARFF file and creates the Data instance.
- Parameters:
| fName | The name of the file to read in. |
- Returns:
- The file stream.
void Data::discretizeEqInt ( int bins, Data * combine )
This method will discretize the continuous attributes using an equal-interval discretization method.
- Parameters:
| bins | The number of bins to use. |
| combine | Another data set to combine with this one in the discretization. |
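A minimal sketch of equal-interval binning on one attribute, assuming the values are plain doubles. `equalIntervalBins` is a hypothetical free function, not the member itself; to mirror `combine`, one would compute the minimum and maximum over both data sets' values before binning.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of Data): maps each continuous value to a bin
// index in [0, bins). Every bin spans (max - min) / bins; the maximum value is
// clamped into the last bin.
std::vector<int> equalIntervalBins(const std::vector<double>& values, int bins) {
    std::vector<int> result;
    if (values.empty() || bins <= 0) return result;
    auto mm = std::minmax_element(values.begin(), values.end());
    const double lo = *mm.first;
    const double width = (*mm.second - lo) / bins;
    result.reserve(values.size());
    for (double v : values) {
        // A constant attribute (width == 0) puts every value in bin 0.
        int b = width > 0 ? static_cast<int>((v - lo) / width) : 0;
        result.push_back(std::min(b, bins - 1));
    }
    return result;
}
```

For the values 0 through 10 and 5 bins, each bin is 2 wide, so 10 lands in bin 5 and is clamped into bin 4.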
void Data::discretizeEqFreq ( int bins, Data * combine = NULL )
This method will discretize the continuous attributes using an equal-frequency discretization method.
- Parameters:
| bins | The number of bins to use. |
| combine | Another data set to combine with this one in the discretization. If this is NULL, it is ignored. |
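A minimal sketch of equal-frequency binning, assuming the attribute's values are doubles: values are ranked and each bin receives roughly `values.size() / bins` of them. `equalFrequencyBins` is a hypothetical name; how the real method breaks ties (equal values may straddle a bin boundary here) is not specified above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: assigns bin = rank * bins / n, so bins are (nearly) equal in size.
std::vector<int> equalFrequencyBins(const std::vector<double>& values, int bins) {
    std::vector<int> result(values.size(), 0);
    if (values.empty() || bins <= 0) return result;
    // Sort instance positions by value instead of sorting the values themselves.
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&values](std::size_t a, std::size_t b) { return values[a] < values[b]; });
    for (std::size_t rank = 0; rank < order.size(); ++rank) {
        result[order[rank]] =
            static_cast<int>(rank * static_cast<std::size_t>(bins) / order.size());
    }
    return result;
}
```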
bool Data::subsample ( unsigned int desClass, float per )
This method will subsample the data; that is, it removes instances that are not of the desired class until the desired class makes up the requested percentage of the entire data set.
- Parameters:
| desClass | The index of the desired class. |
| per | The desired percentage. If this is smaller than the desired class's current share of the data, this method does nothing. |
- Returns:
- true if the set has been altered, false otherwise.
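A minimal sketch of the subsampling loop, assuming each instance is reduced to its class index and that `per` is a fraction in (0, 1]; whether the real method takes a fraction or a 0 to 100 percentage, and which instances it removes first, is not stated. `subsampleLabels` is hypothetical and removes from the back.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: removes non-desired instances (scanning from the back)
// until desired / total >= per. Returns true if anything was removed.
bool subsampleLabels(std::vector<unsigned>& labels, unsigned desClass, float per) {
    std::size_t desired = 0;
    for (unsigned c : labels) {
        if (c == desClass) ++desired;
    }
    if (desired == 0) return false;  // the target share can never be reached
    bool altered = false;
    for (std::size_t i = labels.size(); i-- > 0;) {
        if (static_cast<double>(desired) / labels.size() >= per) break;
        if (labels[i] != desClass) {
            labels.erase(labels.begin() + static_cast<long>(i));
            altered = true;
        }
    }
    return altered;
}
```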
unsigned int Data::microsample ( unsigned int amount )
This method will microsample the data. Afterwards, all classes are equally represented, and the number of instances of each class equals amount.
- Parameters:
| amount | The number of instances of each class to leave in the data set. |
- Returns:
- The actual number of instances of each class left in the data set. If amount > size( class ), then it will only remove instances from the other classes.
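A minimal sketch of one reading of the contract above, assuming each instance is reduced to its class index: the common per-class count is `min(amount, size of the smallest class)`, so a class smaller than `amount` is left intact while the others are trimmed down to it. `microsampleLabels` is hypothetical and keeps the first instances of each class; the real selection order is not documented.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical sketch: trims every class to the same count and returns that count.
unsigned microsampleLabels(std::vector<unsigned>& labels, unsigned amount) {
    std::map<unsigned, unsigned> counts;
    for (unsigned c : labels) ++counts[c];
    if (counts.empty()) return 0;
    unsigned target = amount;
    for (const auto& kv : counts) target = std::min(target, kv.second);
    std::map<unsigned, unsigned> kept;  // instances kept so far, per class
    std::vector<unsigned> out;
    for (unsigned c : labels) {
        if (kept[c] < target) {
            out.push_back(c);
            ++kept[c];
        }
    }
    labels.swap(out);
    return target;
}
```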
void Data::normalizeAttribute ( int attIndex )
This method will normalize an attribute so that each value is between 0 and 1 and the greatest value is equal to 1.
- Parameters:
| attIndex | The index of the attribute to normalize. |
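A minimal sketch, assuming min-max scaling; the description above only fixes the range and that the greatest value maps to 1, so dividing by the maximum would also fit for non-negative data. `normalizeValues` is a hypothetical helper working on one attribute's values.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch (min-max scaling is an assumption): the smallest value
// becomes 0 and the greatest becomes 1. A constant attribute maps to all zeros.
void normalizeValues(std::vector<double>& values) {
    if (values.empty()) return;
    auto mm = std::minmax_element(values.begin(), values.end());
    // Copy min and range before the loop overwrites the elements mm points at.
    const double lo = *mm.first;
    const double range = *mm.second - lo;
    for (double& v : values) v = range > 0 ? (v - lo) / range : 0.0;
}
```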
Creates a copy of the Data with the attributes and instance information.
- Returns:
- The copied Data.
bool Data::cover ( Rule * rule )
This method will remove all instances from the data set that are covered by a given rule.
- Parameters:
| rule | The rule to check coverage against. |
- Returns:
- true if the set was altered, false otherwise.
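A minimal sketch of coverage removal with the erase-remove idiom. The `Rule` class is not documented here, so `RuleSketch` is a hypothetical stand-in: a map from attribute index to the set of accepted (discretized) value indices, where an instance is covered when every constrained attribute takes an accepted value.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <vector>

using RuleSketch = std::map<unsigned, std::set<int>>;  // hypothetical Rule stand-in
using Instance = std::vector<int>;                      // value index per attribute

bool covers(const RuleSketch& rule, const Instance& inst) {
    for (const auto& kv : rule) {
        if (kv.first >= inst.size() || kv.second.count(inst[kv.first]) == 0) return false;
    }
    return true;
}

// Removes every covered instance; returns true if the data set changed.
bool coverSketch(std::vector<Instance>& data, const RuleSketch& rule) {
    auto newEnd = std::remove_if(data.begin(), data.end(),
                                 [&rule](const Instance& inst) { return covers(rule, inst); });
    const bool altered = newEnd != data.end();
    data.erase(newEnd, data.end());
    return altered;
}
```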
int Data::compareListItems ( ListItem l1, ListItem l2 )
Compares two ListItems.
- Parameters:
| l1 | The first ListItem. |
| l2 | The second ListItem. |
- Returns:
- 0 if l1 = l2, -1 if l1 < l2, or 1 if l1 > l2.
Calculates the base lift of the data.
void Data::calcPDPFEst ( unsigned int LOC )
Calculates the base information needed for effort scoring.
- Parameters:
| LOC | The attribute that holds the lines of code. |
void Data::calcProbSupt ( )
Calculates the frequency counts of each attribute-value pair. Assumes that all data is discrete and that there are only 2 ordered classes (the best class is the second one).
Gets the base lift of the data.
- Returns:
- The base lift.
Gets the total lines of code in this data instance.
- Returns:
- The total lines of code.
vector< int > Data::getLOCs ( )
Gets the lines of code per instance.
- Returns:
- A vector containing the lines of code per instance.
vector< vector< InstanceElement * > * > Data::getInstanceSet ( )
Gets the instance set.
- Returns:
- The instance set.
unsigned int Data::getNumAtts ( )
Gets the number of attributes.
- Returns:
- The number of attributes.
unsigned int Data::getNumClasses ( )
Gets the number of class values.
- Returns:
- The number of class values.
unsigned int Data::getClassIndex ( std::vector< InstanceElement * > * instance )
Gets the class index for a given instance.
- Parameters:
| instance | The instance whose class index is returned. |
- Returns:
- The class index.
unsigned int Data::getNumAttVals ( std::string att )
Gets the number of values for a given attribute.
- Parameters:
| att | The name of the attribute. |
- Returns:
- The number of values for att.
string Data::getAttName ( int index )
Gets the name of the index'th attribute.
- Parameters:
| index | The index of the attribute whose name is returned. |
- Returns:
- The name of the attribute at index.
unsigned int Data::getAttIndex ( std::string name )
Gets the index of the attribute whose name matches the given string.
- Parameters:
| name | The name of the attribute to find the index of. |
- Returns:
- The index if found, number of attributes + 1 otherwise.
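A minimal sketch of the lookup contract, assuming attribute names are held in a `std::vector<std::string>`. `getAttIndexSketch` is hypothetical; the point is the sentinel: a miss returns the number of attributes + 1, so callers should test `index >= numAtts` rather than compare against -1.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: linear search returning attNames.size() + 1 on a miss.
unsigned getAttIndexSketch(const std::vector<std::string>& attNames, const std::string& name) {
    for (unsigned i = 0; i < attNames.size(); ++i) {
        if (attNames[i] == name) return i;
    }
    return static_cast<unsigned>(attNames.size()) + 1;
}
```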
unsigned int Data::getAttValIndex ( std::string attName, std::string valName )
Gets the index of the attribute value whose name matches the given string.
- Parameters:
| attName | The name of the attribute. |
| valName | The name of the attribute value to match. |
- Returns:
- The index if found, number of attribute values + 1 otherwise.
std::string Data::getAttValName ( std::string att, int index )
Gets the name of the index'th value of the attribute.
- Parameters:
| att | The name of the attribute. |
| index | The index of the value to get. |
- Returns:
- The name of the attribute value at the index.
string Data::getClassName ( int index )
Gets the class name at the index'th location.
- Parameters:
| index | The index of the class to get. |
- Returns:
- The class name in string form.
vector< int > Data::getClassFreqs ( )
Gets the class frequency vector.
- Returns:
- The class frequency vector.
const vector< vector< int * > > * Data::getFrequencyTable ( )
Gets the frequency count table used for scoring with best^2/(best+rest).
- Returns:
- A jagged array in which each two-dimensional access (attribute, value) yields a length-two array: the first element is the rest count and the second element is the best count of that attribute-value pair.
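A minimal sketch of building a table with the layout described above and scoring one entry with best^2/(best+rest). Assumptions: `std::array<int, 2>` stands in for the `int *` pairs, instances are already discretized with the class index (0 = rest, 1 = best) as the last element, and the score uses raw counts; an implementation may instead normalize best and rest by the class sizes. `countFrequencies` and `bestRestScore` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// table[att][val] = {rest count, best count}
using FreqTable = std::vector<std::vector<std::array<int, 2>>>;

FreqTable countFrequencies(const std::vector<std::vector<int>>& instances,
                           const std::vector<int>& numVals) {
    FreqTable table;
    for (int n : numVals)
        table.emplace_back(static_cast<std::size_t>(n), std::array<int, 2>{{0, 0}});
    for (const auto& inst : instances) {
        const int cls = inst.back();  // 0 = rest, 1 = best (two ordered classes)
        for (std::size_t a = 0; a < numVals.size(); ++a) ++table[a][inst[a]][cls];
    }
    return table;
}

// best^2 / (best + rest); 0 when the pair never occurs.
double bestRestScore(const std::array<int, 2>& counts) {
    const double rest = counts[0], best = counts[1];
    return best + rest > 0 ? best * best / (best + rest) : 0.0;
}
```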
void Data::printAttributes ( )
This method will print the attributes.
void Data::printDataSet ( std::ostream & stream )
This method will print the data set to the given stream.
- Parameters:
| stream | The stream to print to. |
void Data::printClassDist ( )
This method prints the class names and frequencies.
void Data::printInstance ( int inst )
This method will print one instance of the data set.
- Parameters:
| inst | The instance number to print. |
void Data::printFrequencyTable ( std::ostream & stream )
This method will print the best and rest frequencies of every attribute-value pair.
- Parameters:
| stream | The stream to print to. |
void Data::processAttribute ( std::string line ) [protected]
Processes a line of text and converts it into a new attribute, adding its name and values to the mAtts and mAttVals lists.
- Parameters:
| line | The line of text to process. |
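A minimal sketch of parsing one ARFF attribute declaration, assuming the line has already been lower-cased, that the name contains no spaces or quotes, and that the name is separated from its type by whitespace. `parseAttributeLine` is hypothetical; a nominal attribute yields its listed values, while a numeric one yields an empty list.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: "@attribute outlook {sunny, overcast, rainy}" -> name + values.
bool parseAttributeLine(const std::string& line, std::string& name,
                        std::vector<std::string>& values) {
    std::istringstream in(line);
    std::string keyword;
    if (!(in >> keyword >> name) || keyword != "@attribute") return false;
    values.clear();
    const std::size_t open = line.find('{');
    const std::size_t close = line.rfind('}');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return true;  // e.g. "numeric" or "real": no value list
    std::istringstream list(line.substr(open + 1, close - open - 1));
    std::string item;
    while (std::getline(list, item, ',')) {
        // Trim spaces/tabs around each listed value.
        const std::size_t b = item.find_first_not_of(" \t");
        const std::size_t e = item.find_last_not_of(" \t");
        if (b != std::string::npos) values.push_back(item.substr(b, e - b + 1));
    }
    return true;
}
```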
void Data::processInstance ( std::string line ) [protected]
Processes a line of text and converts it into a new instance of the data set, then inserts that instance into the mInstances list.
- Parameters:
| line | The line of text to convert. |
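A minimal sketch of the first step of instance processing: splitting one ARFF data line into its comma-separated fields. `splitInstanceLine` is hypothetical; converting each field into an `InstanceElement` is left out because that class is not documented here, and quoted values containing commas are not handled.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: "sunny, 85 ,no" -> {"sunny", "85", "no"}.
std::vector<std::string> splitInstanceLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) {
        // Trim surrounding spaces/tabs; an all-blank field becomes "".
        const std::size_t b = field.find_first_not_of(" \t");
        const std::size_t e = field.find_last_not_of(" \t");
        fields.push_back(b == std::string::npos ? "" : field.substr(b, e - b + 1));
    }
    return fields;
}
```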
std::string Data::preprocessString ( std::string line ) [protected]
Removes any trailing whitespace from a line and makes every letter lower case. This allows for easier matching in later stages of the program.
- Parameters:
| line | The line of text to process. |
- Returns:
- A processed line of text.
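A minimal sketch of the described preprocessing as a free function; `preprocessStringSketch` is a hypothetical name, and the exact whitespace set (spaces, tabs, CR, LF) is an assumption. Leading whitespace is kept, since only trailing whitespace is mentioned.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical sketch: strip trailing whitespace, then lower-case every character.
std::string preprocessStringSketch(std::string line) {
    const std::string::size_type end = line.find_last_not_of(" \t\r\n");
    line.erase(end == std::string::npos ? 0 : end + 1);
    // Cast through unsigned char so std::tolower is well-defined for all bytes.
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return line;
}
```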
int Data::find ( std::string att, std::vector< std::string > & l ) [protected]
Attempts to find a string in a list of strings.
- Parameters:
| att | The string to find. |
| l | The list to search. |
- Returns:
- The index of att in l. If it is not found, returns -1.
The documentation for this class was generated from the following files:
- My Documents/Zach/School/Research/Which/which/Data.h
- My Documents/Zach/School/Research/Which/which/Data.cpp