derdava package#
Subpackages#
Submodules#
derdava.coalition_probability module#
- class derdava.coalition_probability.CoalitionProbability[source]#
Bases: ABC
Represents the probability distribution that each coalition remains as the support set after data deletion.
- class derdava.coalition_probability.IndependentCoalitionProbability(staying_probabilities)[source]#
Bases: CoalitionProbability
Each data source has independent staying probability.
- __init__(staying_probabilities)[source]#
Creates an IndependentCoalitionProbability.
- Parameters:
staying_probabilities – A dictionary { int: float } representing the independent staying probability of each data source.
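When deletions are independent, the probability that exactly a given coalition remains has a closed form: the product of the staying probabilities of its members and the leaving probabilities of everyone else. A minimal sketch of that computation (the function name and structure are illustrative, not the package's internals):

```python
from itertools import combinations

def coalition_probability(coalition: set, staying_probabilities: dict) -> float:
    """P(exactly `coalition` stays), assuming each source leaves independently."""
    p = 1.0
    for i, stay in staying_probabilities.items():
        p *= stay if i in coalition else (1.0 - stay)
    return p

probs = {0: 0.9, 1: 0.5, 2: 0.5}
# The probabilities over all 2^3 coalitions form a distribution (sum to 1).
total = sum(coalition_probability(set(c), probs)
            for r in range(4) for c in combinations(probs, r))
```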
- class derdava.coalition_probability.RandomCoalitionProbability(support: tuple)[source]#
Bases: CoalitionProbability
Randomly creates the joint probability.
- __init__(support: tuple)[source]#
Creates a RandomCoalitionProbability.
- Parameters:
support – A tuple containing the indices of data sources in the support set.
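One simple way to realise a random joint distribution over all subsets of the support is to draw a positive weight per coalition and normalise. This is a sketch under that assumption; the package's actual sampling scheme is not documented here:

```python
import random
from itertools import combinations

def random_coalition_probability(support: tuple, seed: int = 0) -> dict:
    """Assign a random probability to every subset of `support`,
    with the weights normalised to sum to 1."""
    rng = random.Random(seed)
    subsets = [frozenset(c) for r in range(len(support) + 1)
               for c in combinations(support, r)]
    weights = [rng.random() for _ in subsets]
    total = sum(weights)
    return {s: w / total for s, w in zip(subsets, weights)}

dist = random_coalition_probability((0, 1, 2))
```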
derdava.data_source module#
- derdava.data_source.add_classification_noise(y: ndarray, noise_level: float = 0.2)[source]#
Adds noise to the classification labels by replacing each chosen label with one picked at random from the remaining label set.
- Parameters:
y – Labels of target dataset.
noise_level – Amount of noise to be added (default: 0.2).
- Returns:
None.
- Raises:
ValueError – If noise_level is not in the range [0, 1].
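The behaviour described above can be sketched as follows (an illustrative in-place implementation, not the package's own code; the function name is hypothetical):

```python
import numpy as np

def add_noise_sketch(y: np.ndarray, noise_level: float = 0.2, seed: int = 0) -> None:
    """Flip a `noise_level` fraction of labels in place; each flipped label is
    drawn uniformly from the remaining label set (never its original value)."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in the range [0, 1]")
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    # Pick distinct indices to corrupt, then resample each label.
    idx = rng.choice(len(y), size=int(noise_level * len(y)), replace=False)
    for i in idx:
        y[i] = rng.choice(labels[labels != y[i]])

y = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0])
y_original = y.copy()
add_noise_sketch(y, noise_level=0.3)  # corrupts 3 of the 10 labels
```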
- derdava.data_source.generate_random_data_sources(X: ndarray, y: ndarray, num_of_data_sources: int = 10)[source]#
Randomly splits a given dataset into a specified number of data sources.
- Parameters:
X – Feature set of the given dataset.
y – Label set of the given dataset.
num_of_data_sources – Number of data sources to be generated (default: 10).
- Returns:
A dictionary containing mappings between data source indices and their data (X, y).
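A random split of this kind can be sketched with a shuffled index partition (illustrative only; the package's splitting strategy may differ, e.g. in how it balances sizes):

```python
import numpy as np

def random_split_sketch(X: np.ndarray, y: np.ndarray,
                        num_of_data_sources: int = 10, seed: int = 0) -> dict:
    """Randomly partition (X, y) into roughly equal data sources,
    keyed by data source index."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    parts = np.array_split(idx, num_of_data_sources)
    return {i: (X[p], y[p]) for i, p in enumerate(parts)}

X = np.arange(20).reshape(10, 2)  # row i is [2i, 2i + 1]
y = np.arange(10)
sources = random_split_sketch(X, y, num_of_data_sources=3)
```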
derdava.dataset module#
- derdava.dataset.load_dataset(name: str)[source]#
Loads a built-in dataset.
- Parameters:
name – One of 'cpu', 'credit card', 'diabetes', 'flower', 'mnist', 'phoneme', 'pol', 'wind'.
- Returns:
A tuple (X, y) containing the features and labels of the loaded dataset. X and y are NumPy arrays.
- Raises:
ValueError – If name is not one of the above names.
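The loader's internals are not shown here; this sketch only illustrates the documented name validation (VALID_NAMES mirrors the list above, and the function name is hypothetical):

```python
VALID_NAMES = ('cpu', 'credit card', 'diabetes', 'flower',
               'mnist', 'phoneme', 'pol', 'wind')

def validate_dataset_name(name: str) -> str:
    """Raise ValueError for names outside the documented list."""
    if name not in VALID_NAMES:
        raise ValueError(f"unknown dataset {name!r}; expected one of {VALID_NAMES}")
    return name
```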
derdava.model_utility module#
- class derdava.model_utility.IClassificationModel(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Bases: ModelUtilityFunction
Represents a model utility function based on a classification model and accuracy scores.
- __init__(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Constructs an IClassificationModel.
- Parameters:
model – A machine learning model from this module used for training.
data_sources – A dictionary containing mappings between data source indices and their data (X, y).
X_test – Features of the testing (or validating) set.
y_test – Labels of the testing (or validating) set.
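A model utility function of this kind maps a coalition of data sources to the test accuracy of a model trained on their pooled data. A self-contained sketch, with a toy majority-class "model" standing in for a real classifier (derdava's internals may differ; `classification_utility` is a hypothetical name):

```python
from collections import Counter
import numpy as np

def classification_utility(coalition, data_sources, X_test, y_test):
    """v(S): accuracy on (X_test, y_test) of a toy majority-class model
    trained on the pooled labels of the sources in `coalition`."""
    if not coalition:
        return 0.0  # empty coalition: no training data
    y_train = np.concatenate([data_sources[i][1] for i in coalition])
    majority = Counter(y_train.tolist()).most_common(1)[0][0]
    y_pred = np.full(len(y_test), majority)
    return float((y_pred == y_test).mean())

data_sources = {0: (np.zeros((4, 1)), np.array([1, 1, 1, 0])),
                1: (np.zeros((4, 1)), np.array([0, 0, 0, 1]))}
X_test = np.zeros((4, 1))
y_test = np.array([1, 1, 0, 0])
```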
- class derdava.model_utility.ICoalitionalValue(dic: dict)[source]#
Bases: ModelUtilityFunction
Stores every coalition and its utility.
- class derdava.model_utility.ISymmetricClassificationModel(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Bases: ModelUtilityFunction
Assumes all data sources are identical, so the utility of a coalition depends only on its size.
- __init__(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Constructs an ISymmetricClassificationModel.
- Parameters:
model – A machine learning model from this module used for training.
data_sources – A dictionary containing mappings between data source indices and their data (X, y).
X_test – Features of the testing (or validating) set.
y_test – Labels of the testing (or validating) set.
- class derdava.model_utility.ModelUtilityFunction[source]#
Bases: ABC
Base class used to represent model utility function \(v: \mathcal{P}(D) \to \mathbb{R}\).
- derdava.model_utility.model_gaussian_nb()#
Represents a Gaussian Naïve Bayes classifier.
- derdava.model_utility.model_knn()#
Represents a \(k\)-Nearest Neighbours classifier.
- derdava.model_utility.model_linear_svm()#
Represents a linear Support Vector Machine classifier.
- derdava.model_utility.model_logistic_regression()#
Represents a Logistic Regression classifier.
- derdava.model_utility.model_ridge_classifier()#
Represents a Ridge classifier.
derdava.sampling module#
- derdava.sampling.check_gelman_rubin(statistics: dict, tolerance: float)[source]#
Checks whether all Gelman-Rubin statistics have converged.
- Parameters:
statistics – A dictionary of Gelman-Rubin statistics for all data sources.
tolerance – Convergence threshold: a Gelman-Rubin statistic is considered converged if it lies between tolerance and 1 / tolerance.
- Returns:
A tuple containing two elements: (1) whether all Gelman-Rubin statistics have converged; (2) number of data sources that have not converged.
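The check described above reduces to counting statistics that fall outside the interval [tolerance, 1 / tolerance]. A minimal sketch (the function name is hypothetical):

```python
def check_convergence_sketch(statistics: dict, tolerance: float):
    """Return (all_converged, num_not_converged): a statistic counts as
    converged when tolerance <= R-hat <= 1 / tolerance."""
    not_converged = sum(1 for r in statistics.values()
                        if not (tolerance <= r <= 1.0 / tolerance))
    return not_converged == 0, not_converged

stats = {0: 1.001, 1: 0.999, 2: 1.2}
```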
- derdava.sampling.cvar(values: dict, lower_tail: float = 0.6, reverse: bool = False)[source]#
Returns the C-CVaR (Coalitional Conditional Value-at-Risk) value of a given discrete random variable (default C-CVaR\(^-\)).
- Parameters:
values – A dictionary that contains mappings between values and probability of the random variable.
lower_tail – The percentage of the lower tail (\(\alpha\)) (default: 0.6).
reverse – Whether to compute C-CVaR\(^+\) (upper tail) instead (default: False).
- Returns:
The computed C-CVaR value.
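For a discrete random variable, C-CVaR\(^-\) is the probability-weighted average of the lowest-valued \(\alpha\) mass of the distribution (and C-CVaR\(^+\) the highest). A sketch under that reading; tie handling at the \(\alpha\) boundary may differ from the package:

```python
def cvar_sketch(values: dict, lower_tail: float = 0.6, reverse: bool = False) -> float:
    """CVaR of a discrete distribution {value: probability}: average the
    lowest (or, if reverse, highest) `lower_tail` probability mass."""
    items = sorted(values.items(), reverse=reverse)  # ascending for CVaR-, descending for CVaR+
    remaining, acc = lower_tail, 0.0
    for v, p in items:
        take = min(p, remaining)   # consume at most the tail mass still needed
        acc += v * take
        remaining -= take
        if remaining <= 0:
            break
    return acc / lower_tail
```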
- derdava.sampling.gelman_rubin(samples: dict, m_chains: int)[source]#
Computes the Gelman-Rubin statistic using the given samples. Referenced from https://www.imperial.ac.uk/media/imperial-college/research-centres-and-groups/astrophysics/public/icic/data-analysis-workshop/2018/Convergence-Tests.pdf.
- Parameters:
samples – A dictionary containing samples generated for each data source.
m_chains – Number of Markov chains to be run in parallel.
- Returns:
A dictionary containing Gelman-Rubin statistics of each data source.
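For a single data source, the Gelman-Rubin statistic compares between-chain and within-chain variance, as in the Imperial College notes linked above; it tends to 1 as the chains converge. A sketch of that computation for one quantity (derdava applies it per data source; the function name is illustrative):

```python
import numpy as np

def gelman_rubin_sketch(chains: np.ndarray) -> float:
    """R-hat for `chains` of shape (m, n): m parallel chains, n samples each."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 1000))   # well-mixed chains -> R-hat near 1
r_hat = gelman_rubin_sketch(chains)
```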