uk.co.agena.minerva.util.model
Class DataSet

java.lang.Object
  extended by uk.co.agena.minerva.util.model.DataSet
All Implemented Interfaces:
java.lang.Cloneable, Identifiable, Writable

public class DataSet
extends java.lang.Object
implements Identifiable, java.lang.Cloneable, Writable

A DataSet contains a set of DataPoints (i.e. numerical values). A DataSet will typically be associated with an ExtendedNode or an Observation and will contain the marginal probability of each ExtendedState or the likelihood distribution of the evidence to be entered.

For example, there may exist an ExtendedNode called "Supplier Quality" with three ExtendedStates, "Low", "Medium" and "High". After compilation of the containing ExtendedBN, the marginals could be exported as a DataSet with the following DataPoint values: {0.3, 0.4, 0.3}.

A DataSet could also be used to encapsulate, for example, that "Supplier Quality" has been observed to be "High". The associated Observation object would contain a DataSet with the following DataPoint values: {0, 0, 1} (where the last value would correspond to the "High" ExtendedState).
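
The two uses described above can be sketched with plain arrays standing in for DataPoint values. This is a minimal illustration only: the class EvidenceSketch and the helper hardEvidence are hypothetical names and are not part of the DataSet API.

```java
import java.util.Arrays;

/** Stand-in sketch: plain double arrays play the role of DataPoint values. */
final class EvidenceSketch {

    /** Builds a hard-evidence vector: 1 for the observed state, 0 elsewhere. */
    static double[] hardEvidence(String[] states, String observed) {
        double[] values = new double[states.length];
        for (int i = 0; i < states.length; i++) {
            if (states[i].equals(observed)) {
                values[i] = 1.0;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String[] states = {"Low", "Medium", "High"};
        // Marginals as they might be exported after compilation (values from the text).
        double[] marginals = {0.3, 0.4, 0.3};
        // Observation that "Supplier Quality" is "High".
        double[] observedHigh = hardEvidence(states, "High");
        System.out.println(Arrays.toString(marginals));
        System.out.println(Arrays.toString(observedHigh));
    }
}
```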


Field Summary
static int LIMIT_RULE_NUMERIC_VALUE
           
static int LIMIT_RULE_STATENUMBER
           
static double version
           
 
Fields inherited from interface uk.co.agena.minerva.util.model.Writable
FIELD_SEPARATOR
 
Constructor Summary
DataSet()
          Default constructor.
DataSet(NameDescription name, int connObjectId)
          Creates a DataSet with the specified name and connected object ID.
DataSet(NameDescription name, int connObjectId, java.util.List dataPoints)
          Creates a DataSet with the specified name, connected object ID and DataPoints.
 
Method Summary
 DataPoint addAbsoluteDataPoint(double value)
          Adds a new absolute (i.e. real-valued) DataPoint to the DataSet.
 void addDataPoint(DataPoint dataPoint)
          Adds the supplied DataPoint to the DataSet.
 void addDataPoint(DataPoint dataPoint, int orderPos)
          Adds the supplied DataPoint to the DataSet at the specified order position.
 DataPoint addIntervalDataPoint(double lower, double upper)
          Adds a new interval data point to the DataSet.
 DataPoint addLabelledDataPoint(java.lang.String label)
          Adds a new labelled DataPoint to the DataSet.
 void clearDataPoints()
          Clears all DataPoints out of the DataSet and reinitialises the DataSet with an empty List.
 java.lang.Object clone()
          Creates a copy of this DataSet.
 boolean containsXNoZeroDataPoints(int nonZeroDataPoints)
          Returns true if the data set contains a number of non-zero data points equal to the specified argument.
 void convertLabelsToIntegerFormat()
          Iterates over the data points in the data set and converts their labels to a form appropriate for IntegerInterval nodes.
 void convertLabelsToScientificFormat(java.lang.String numberMask)
          Iterates over the data points in the data set. If they are interval data points, new labels are constructed from their range values, with the number mask applied to the numbers before they are used in the label. If the supplied number mask is null or empty (""), "0.00E0" is used.
static DataSet createDataSet(java.lang.String[] dataPointNames)
          Creates a data set containing one data point for each of the strings passed as an argument, with each data point named after its string.
 boolean doDataPointsHaveIntervalLabels()
          This checks the first DataPoint to see if its label contains an interval.
 double[] getAsDoubles()
          Returns the DataSet's DataPoints as a simple array of doubles.
 int getConnObjectId()
          Returns the connected object ID of the DataSet.
 DataPoint getDataPointAtOrderPosition(int orderPos)
          Returns the DataPoint at the specified order position in the DataSet.
 java.lang.String[] getDataPointLabels()
          Returns an array of the labels of all DataPoints.
 java.util.List getDataPoints()
          Returns the DataPoints associated with the DataSet.
 DataPoint getDataPointWithConnObjectID(int id)
          Returns the first data point that has the connObjectID attribute value equal to the one specified.
 DataPoint getHighestDataPoint()
          This method will return the highest data point in the dataset.
 DataPoint getHighestDataPointAdjustedForProbailityMass()
          This method will return the highest data point adjusted for probability mass.
 int getId()
          Returns the unique ID of the DataSet.
 IntervalDataPoint getIntervalDataPointWithHighestMidPoint()
          Returns the interval data point with the highest mid point (derived from the bounding values). If there are no interval data points in the data set, null is returned.
 IntervalDataPoint getIntervalDataPointWithHighestRangeValue()
          Returns the interval data point with the highest range value (highest upper bound). If there are no interval data points in the data set, null is returned.
 IntervalDataPoint getIntervalDataPointWithLowestMidPoint()
          Returns the interval data point with the lowest mid point (derived from the bounding values). If there are no interval data points in the data set, null is returned.
 IntervalDataPoint getIntervalDataPointWithLowestRangeValue()
          Returns the interval data point with the lowest range value (lowest lower bound). If there are no interval data points in the data set, null is returned.
 DataPoint getLowestDataPoint()
          Returns the DataPoint in the DataSet that has the lowest value.
 DataPoint getLowestDataPointAdjustedForProbailityMass()
          This method will return the lowest data point adjusted for probability mass.
 NameDescription getName()
          Returns the name of the DataSet.
 double getTotal(int fromPoint, int toPoint)
          Returns the total value of all data points between the specified points. A value of -1 for fromPoint or toPoint defaults to the beginning or the end of the data set respectively, so getTotal(-1, -1) returns the total of the whole data set.
 double getTotalAdjustedForProbabilityMass(int fromPoint, int toPoint)
          This method will return a total of the data point values between the specified points.
 double getVersion()
          Returns the version of the class.
 void normalise()
          Normalises this DataSet so that the values of all the DataPoints add up to one.
 void pad(double with, int noOfElements, java.lang.String label, int connObjId)
          Creates multiple identical DataPoints and adds them to this DataSet.
 int read(java.util.List strings, int currentLineNumber)
           
 DataPoint removeDataPointAtOrderPosition(int orderPos)
          Removes the DataPoint at the specified order position in the DataSet.
 void setConnObjectId(int connObjectId)
          Sets the connected object ID of the DataSet to the ID specified.
 void setDataPoints(java.util.List dataPoints)
          Assigns a List of DataPoints to the DataSet.
 void setId(int id)
          Sets the unique ID of the DataSet to the ID specified.
 void setName(NameDescription name)
          Sets the name of the DataSet to the name specified.
 void setVersion(double version)
          Sets the version number of the class.
 int size()
           
 java.lang.String toHTMLString(boolean includeHTMLTag, boolean includeTitle, java.lang.String formatterMask)
          This method will render the data set to an HTML string.
 java.lang.String toString()
          Returns a String representation of the DataSet.
 java.lang.String toString(boolean includeTitle, java.lang.String formatterMask)
          This method will format the data set to a string, with line breaks between data points.
 java.util.List write()
          Writes the object in question to a List of Strings.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

version

public static double version

LIMIT_RULE_STATENUMBER

public static final int LIMIT_RULE_STATENUMBER
See Also:
Constant Field Values

LIMIT_RULE_NUMERIC_VALUE

public static final int LIMIT_RULE_NUMERIC_VALUE
See Also:
Constant Field Values
Constructor Detail

DataSet

public DataSet()
Default constructor.


DataSet

public DataSet(NameDescription name,
               int connObjectId)
Creates a DataSet with the specified name and connected object ID.

Parameters:
name - the name of the DataSet
connObjectId - the ID of the connected object

DataSet

public DataSet(NameDescription name,
               int connObjectId,
               java.util.List dataPoints)
Creates a DataSet with the specified name, connected object ID and DataPoints.

Parameters:
name - the name of the DataSet
connObjectId - the ID of the connected object
dataPoints - the DataPoints in the DataSet
Method Detail

getConnObjectId

public int getConnObjectId()
Returns the connected object ID of the DataSet.

Returns:
the connected object ID of the DataSet

setConnObjectId

public void setConnObjectId(int connObjectId)
Sets the connected object ID of the DataSet to the ID specified.

Parameters:
connObjectId - the new connected object ID for the DataSet

getVersion

public double getVersion()
Description copied from interface: Writable
Returns the version of the class. Used to ensure backward compatibility.

Specified by:
getVersion in interface Writable
Returns:
the version number

setVersion

public void setVersion(double version)
Description copied from interface: Writable
Sets the version number of the class. Used to ensure backward compatibility.

Specified by:
setVersion in interface Writable
Parameters:
version - the version number

getId

public int getId()
Returns the unique ID of the DataSet.

Specified by:
getId in interface Identifiable
Returns:
the unique ID of the DataSet

setId

public void setId(int id)
Sets the unique ID of the DataSet to the ID specified.

Parameters:
id - the new ID for the DataSet

getName

public NameDescription getName()
Returns the name of the DataSet.

Returns:
the name of the DataSet

setName

public void setName(NameDescription name)
Sets the name of the DataSet to the name specified.

Parameters:
name - the new name for the DataSet

getDataPoints

public java.util.List getDataPoints()
Returns the DataPoints associated with the DataSet.

Returns:
a List of DataPoints

setDataPoints

public void setDataPoints(java.util.List dataPoints)
Assigns a List of DataPoints to the DataSet.

Parameters:
dataPoints - the List of DataPoints

addDataPoint

public void addDataPoint(DataPoint dataPoint)
Adds the supplied DataPoint to the DataSet.

Parameters:
dataPoint - the DataPoint to be added

addIntervalDataPoint

public DataPoint addIntervalDataPoint(double lower,
                                      double upper)
Adds a new interval data point to the DataSet.

Parameters:
lower - the lower bound of the DataPoint
upper - the upper bound of the DataPoint
Returns:
the newly created DataPoint

addAbsoluteDataPoint

public DataPoint addAbsoluteDataPoint(double value)
Adds a new absolute (i.e. real-valued) DataPoint to the DataSet.

Parameters:
value - the numerical value of the DataPoint
Returns:
the newly created DataPoint

addLabelledDataPoint

public DataPoint addLabelledDataPoint(java.lang.String label)
Adds a new labelled DataPoint to the DataSet.

Parameters:
label - the label of the DataPoint
Returns:
the newly created DataPoint

getAsDoubles

public double[] getAsDoubles()
Returns the DataSet's DataPoints as a simple array of doubles. This is a convenience method to make the processing of large DataSets more efficient.

Returns:
an array containing just the double value of each DataPoint in the DataSet

normalise

public void normalise()
Normalises this DataSet so that the values of all the DataPoints add up to one.
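
As a rough illustration of what normalisation means here, the sketch below divides each value by the sum of all values. It works on a plain array and returns a new array, whereas normalise() is declared void and operates on the DataSet itself. The class name and the handling of an all-zero input are assumptions, not documented behaviour.

```java
/** Stand-in sketch of normalisation over plain double values. */
final class NormaliseSketch {

    /** Returns a copy of the values scaled so that they add up to one. */
    static double[] normalise(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        if (total == 0.0) {
            // Assumption: an all-zero input is returned unchanged to avoid dividing by zero.
            return values.clone();
        }
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] / total;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] normalised = normalise(new double[] {3.0, 4.0, 3.0});
        for (double v : normalised) {
            System.out.println(v);
        }
    }
}
```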


addDataPoint

public void addDataPoint(DataPoint dataPoint,
                         int orderPos)
Adds the supplied DataPoint to the DataSet at the specified order position.

Parameters:
dataPoint - the DataPoint to be added
orderPos - the position in the List where the DataPoint will be added

getDataPointAtOrderPosition

public DataPoint getDataPointAtOrderPosition(int orderPos)
                                      throws MinervaIndexException
Returns the DataPoint at the specified order position in the DataSet.

Parameters:
orderPos - the order position of the required DataPoint in the DataSet
Returns:
the DataPoint at the specified order position
Throws:
MinervaIndexException - if there is no DataPoint at the specified order position

getDataPointWithConnObjectID

public DataPoint getDataPointWithConnObjectID(int id)
Returns the first data point whose connObjectID attribute equals the specified value. If there is no such data point, null is returned.

Parameters:
id - the connObjectID value to match
Returns:
the first data point that has this connObjectID (or null if one does not exist)

removeDataPointAtOrderPosition

public DataPoint removeDataPointAtOrderPosition(int orderPos)
                                         throws MinervaIndexException
Removes the DataPoint at the specified order position in the DataSet.

Parameters:
orderPos - the order position of the required DataPoint in the DataSet
Returns:
the DataPoint that has been removed
Throws:
MinervaIndexException - if there is no DataPoint at the specified order position

pad

public void pad(double with,
                int noOfElements,
                java.lang.String label,
                int connObjId)
Creates multiple identical DataPoints and adds them to this DataSet.

Parameters:
with - the value that will be given to each DataPoint
noOfElements - the number of DataPoints to be created
label - the label that will be given to each DataPoint
connObjId - the connected object ID that will be given to each DataPoint

getHighestDataPoint

public DataPoint getHighestDataPoint()
Returns the highest data point in the dataset. The search can be limited to a subset of the data points using four arguments (boolean limitToPartOfDataSet, int limitMethod, double lb, double ub), which the signature shown here does not declare. The first argument defines whether the search should be limited to a subset; if it is false, the remaining arguments have no effect. If it is true, the last two arguments define the lower and upper bounds that restrict the search, and the second argument defines how those bounds are interpreted:

LIMIT_RULE_STATENUMBER - the lb and ub arguments are data point numbers, e.g. states 1 to 4. The bounds are inclusive, so a lower bound of 1 includes data point 1.
LIMIT_RULE_NUMERIC_VALUE - the lb and ub arguments are real values. In the case of basic datasets, each data point is considered to have a range of 1.

Returns:
the highest DataPoint

getHighestDataPointAdjustedForProbailityMass

public DataPoint getHighestDataPointAdjustedForProbailityMass()
This method will return the highest data point adjusted for probability mass. The probability mass is equal to the range of the data point divided by its value. This calculation can only be performed if the data points are IntervalDataPoints; if they are not, the data point with the highest value is returned.

Returns:
the highest DataPoint adjusted for probability mass

getLowestDataPointAdjustedForProbailityMass

public DataPoint getLowestDataPointAdjustedForProbailityMass()
This method will return the lowest data point adjusted for probability mass. The probability mass is equal to the range of the data point divided by its value. This calculation can only be performed if the data points are IntervalDataPoints; if they are not, the data point with the highest value is returned.

Returns:
the lowest DataPoint adjusted for probability mass

getLowestDataPoint

public DataPoint getLowestDataPoint()
Returns the DataPoint in the DataSet that has the lowest value.

Returns:
the lowest DataPoint

getIntervalDataPointWithLowestRangeValue

public IntervalDataPoint getIntervalDataPointWithLowestRangeValue()
Returns the interval data point with the lowest range value (lowest lower bound). If there are no interval data points in the data set, null is returned.

Returns:
the IntervalDataPoint with the lowest range value, or null if there is none

getIntervalDataPointWithLowestMidPoint

public IntervalDataPoint getIntervalDataPointWithLowestMidPoint()
Returns the interval data point with the lowest mid point (derived from the bounding values). If there are no interval data points in the data set, null is returned.

Returns:
the IntervalDataPoint with the lowest mid point, or null if there is none

getIntervalDataPointWithHighestMidPoint

public IntervalDataPoint getIntervalDataPointWithHighestMidPoint()
Returns the interval data point with the highest mid point (derived from the bounding values). If there are no interval data points in the data set, null is returned.

Returns:
the IntervalDataPoint with the highest mid point, or null if there is none

getIntervalDataPointWithHighestRangeValue

public IntervalDataPoint getIntervalDataPointWithHighestRangeValue()
Returns the interval data point with the highest range value (highest upper bound). If there are no interval data points in the data set, null is returned.

Returns:
the IntervalDataPoint with the highest range value, or null if there is none

getTotal

public double getTotal(int fromPoint,
                       int toPoint)
Returns the total value of all data points between the specified points. A value of -1 for fromPoint or toPoint defaults to the beginning or the end of the data set respectively, so getTotal(-1, -1) returns the total of the whole data set.

Returns:
the total value of the data points between the specified points
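
The -1 defaulting rule can be sketched over a plain array as follows. The zero-based indexing and the inclusive upper bound are assumptions made for this illustration; the documentation above does not state how getTotal treats either. TotalSketch and total are hypothetical names.

```java
/** Stand-in sketch of a range total with -1 meaning "use the default bound". */
final class TotalSketch {

    /**
     * Sums values[from..to]. Assumes zero-based, inclusive bounds;
     * -1 for fromPoint means the first element, -1 for toPoint means the last.
     */
    static double total(double[] values, int fromPoint, int toPoint) {
        int from = (fromPoint == -1) ? 0 : fromPoint;
        int to = (toPoint == -1) ? values.length - 1 : toPoint;
        double sum = 0.0;
        for (int i = from; i <= to; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = {0.25, 0.5, 0.25};
        System.out.println(total(values, -1, -1)); // whole data set
        System.out.println(total(values, 1, -1));  // from index 1 to the end
    }
}
```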

getTotalAdjustedForProbabilityMass

public double getTotalAdjustedForProbabilityMass(int fromPoint,
                                                 int toPoint)
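This method will return a total of the data point values between the specified points. The data point values will be adjusted for their probability mass.

Parameters:
fromPoint - the point at which the total starts
toPoint - the point at which the total ends
Returns:
the total of the data point values between the specified points, adjusted for probability mass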

clearDataPoints

public void clearDataPoints()
Clears all DataPoints out of the DataSet and reinitialises the DataSet with an empty List.


doDataPointsHaveIntervalLabels

public boolean doDataPointsHaveIntervalLabels()
This checks the first DataPoint to see if its label contains an interval. It is assumed that if the first one does, then all will.

Returns:
true if the label contains an interval, false otherwise

toString

public java.lang.String toString()
Returns a String representation of the DataSet. This contains the name of the DataSet along with all of the DataPoints.

Overrides:
toString in class java.lang.Object
Returns:
the String representation of the DataSet

size

public int size()

clone

public java.lang.Object clone()
Creates a copy of this DataSet. The copy is "deep" i.e. all DataPoints in the DataSet are cloned too.

Overrides:
clone in class java.lang.Object
Returns:
a copy of this DataSet
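
The distinction drawn above between a deep and a shallow copy can be illustrated with a small stand-in. Point is a hypothetical mutable class playing the role of a DataPoint; the sketch shows only the general technique and makes no claim about how clone() is implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in sketch of a deep copy of a list of mutable points. */
final class DeepCopySketch {

    /** Hypothetical stand-in for a mutable DataPoint. */
    static final class Point {
        double value;

        Point(double value) {
            this.value = value;
        }

        Point copy() {
            return new Point(value);
        }
    }

    /** Copies the list and every element in it, so the copy shares no state. */
    static List<Point> deepCopy(List<Point> points) {
        List<Point> copy = new ArrayList<>(points.size());
        for (Point p : points) {
            copy.add(p.copy());
        }
        return copy;
    }

    public static void main(String[] args) {
        List<Point> original = new ArrayList<>();
        original.add(new Point(0.3));
        List<Point> copy = deepCopy(original);
        copy.get(0).value = 0.9; // changing the copy leaves the original untouched
        System.out.println(original.get(0).value);
    }
}
```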

write

public java.util.List write()
                     throws MinervaReadWriteException
Description copied from interface: Writable
Writes the object in question to a List of Strings.

Specified by:
write in interface Writable
Returns:
the List to which the Strings have been written
Throws:
MinervaReadWriteException - if there

read

public int read(java.util.List strings,
                int currentLineNumber)
         throws MinervaReadWriteException
Specified by:
read in interface Writable
Throws:
MinervaReadWriteException

getDataPointLabels

public java.lang.String[] getDataPointLabels()
                                      throws MinervaIndexException
Returns an array of the labels of all DataPoints.

Returns:
the array of DataPoint labels
Throws:
MinervaIndexException

convertLabelsToScientificFormat

public void convertLabelsToScientificFormat(java.lang.String numberMask)
This method will iterate over the data points in the data set. If they are interval data points, it will construct new labels for the data points using their range values. The number mask will be applied to the numbers before they are used in the label. If the supplied number mask is null or empty (""), then "0.00E0" is used.

Parameters:
numberMask - the number format mask applied to the range values
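
To show what a mask such as "0.00E0" produces, the sketch below formats the two bounds of an interval with java.text.DecimalFormat. That this method uses DecimalFormat internally, and the " - " separator between the bounds, are assumptions made for illustration; ScientificLabelSketch and intervalLabel are hypothetical names.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Stand-in sketch of building an interval label in scientific notation. */
final class ScientificLabelSketch {

    /** Formats both bounds with the mask, falling back to "0.00E0" when none is given. */
    static String intervalLabel(double lower, double upper, String numberMask) {
        String mask = (numberMask == null || numberMask.isEmpty()) ? "0.00E0" : numberMask;
        // Fixed symbols keep the decimal separator independent of the default locale.
        DecimalFormat format = new DecimalFormat(mask, DecimalFormatSymbols.getInstance(Locale.ROOT));
        // Assumption: the two bounds are joined with " - ".
        return format.format(lower) + " - " + format.format(upper);
    }

    public static void main(String[] args) {
        System.out.println(intervalLabel(1000.0, 25000.0, null)); // prints "1.00E3 - 2.50E4"
    }
}
```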

convertLabelsToIntegerFormat

public void convertLabelsToIntegerFormat()
This method will iterate over the data points in the data set. It will convert labels to be appropriate for IntegerInterval nodes.

containsXNoZeroDataPoints

public boolean containsXNoZeroDataPoints(int nonZeroDataPoints)
This method returns a boolean that is true if the data set contains a number of non-zero data points equal to the specified argument. For example, if the argument is 1 and the data set contains 6 data points, only 1 of which has a non-zero value, then the method returns true; otherwise it returns false.

Parameters:
nonZeroDataPoints - the number of non-zero data points to test for
Returns:
true if the data set contains exactly the specified number of non-zero data points, false otherwise
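
The check described above amounts to counting the non-zero values and comparing the count with the argument. A minimal sketch over a plain array follows; the class and method names are hypothetical.

```java
/** Stand-in sketch of the non-zero count check. */
final class NonZeroCountSketch {

    /** Returns true if exactly `expected` of the values are non-zero. */
    static boolean containsExactlyNonZero(double[] values, int expected) {
        int count = 0;
        for (double v : values) {
            if (v != 0.0) {
                count++;
            }
        }
        return count == expected;
    }

    public static void main(String[] args) {
        // Six data points, only one of which is non-zero.
        double[] hardEvidence = {0.0, 0.0, 1.0, 0.0, 0.0, 0.0};
        System.out.println(containsExactlyNonZero(hardEvidence, 1));
    }
}
```

A check of this kind with an argument of 1 is one way to recognise a hard observation such as {0, 0, 1}.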

toHTMLString

public java.lang.String toHTMLString(boolean includeHTMLTag,
                                     boolean includeTitle,
                                     java.lang.String formatterMask)
This method will render the data set to an HTML string. Each data point will occupy its own line and be formatted :

Parameters:
includeHTMLTag - determines whether the string should begin and end with HTML tags
includeTitle - determines whether the data set title should be included at the top
Returns:
the HTML String representation of the DataSet

toString

public java.lang.String toString(boolean includeTitle,
                                 java.lang.String formatterMask)
This method will format the data set to a string, with line breaks between data points.

Parameters:
includeTitle - determines whether the data set title should be included at the top
Returns:
the formatted String representation of the DataSet

createDataSet

public static DataSet createDataSet(java.lang.String[] dataPointNames)
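This method creates a data set containing one data point for each of the strings passed as an argument, with each data point named after its string. All the values in the data set are equal to 0.

Parameters:
dataPointNames - the names of the data points to create
Returns:
the newly created DataSet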


Copyright © 2006 Agena Ltd. All Rights Reserved.