derdava package#

Submodules#

derdava.coalition_probability module#

class derdava.coalition_probability.CoalitionProbability[source]#

Bases: ABC

Represents the probability distribution over which coalition remains as the support set after data deletion.

abstract get_probability(coalition: tuple)[source]#

Returns the staying probability of the given coalition.

Parameters:

coalition – A tuple of integers representing the coalition to be queried.

abstract simulate()[source]#

Randomly generates a coalition according to the joint probability.

class derdava.coalition_probability.IndependentCoalitionProbability(staying_probabilities)[source]#

Bases: CoalitionProbability

Each data source has an independent staying probability.

__init__(staying_probabilities)[source]#

Creates an IndependentCoalitionProbability.

Parameters:

staying_probabilities – A dictionary { int: float } representing the independent staying probability of each data source.

get_probability(coalition: tuple)[source]#

Returns the staying probability of the given coalition.

Parameters:

coalition – A tuple of integers representing the coalition to be queried.

Returns:

Staying probability of the given coalition.

simulate()[source]#

Randomly generates a coalition according to the joint probability.

Returns:

The simulated coalition.
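
A minimal usage sketch (the data source indices and probability values below are illustrative):

from derdava.coalition_probability import IndependentCoalitionProbability

# Three data sources with independent staying probabilities (illustrative values).
coalition_probability = IndependentCoalitionProbability({0: 0.9, 1: 0.75, 2: 0.5})

# Probability that exactly the coalition (0, 1) remains after deletions.
p = coalition_probability.get_probability((0, 1))

# Draw one coalition according to the induced joint distribution.
remaining = coalition_probability.simulate()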

class derdava.coalition_probability.RandomCoalitionProbability(support: tuple)[source]#

Bases: CoalitionProbability

Randomly generates the joint probability distribution over coalitions of the support set.

__init__(support: tuple)[source]#

Creates a RandomCoalitionProbability.

Parameters:

support – A tuple containing the indices of data sources in the support set.

get_probability(coalition: tuple)[source]#

Returns the staying probability of the given coalition.

Parameters:

coalition – A tuple of integers representing the coalition to be queried.

Returns:

Staying probability of the given coalition.

simulate()[source]#

Randomly generates a coalition according to the joint probability.

Returns:

The simulated coalition.
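
A minimal usage sketch, assuming the support set contains data sources 0, 1 and 2:

from derdava.coalition_probability import RandomCoalitionProbability

# Randomly generate a joint distribution over all coalitions of the support set.
coalition_probability = RandomCoalitionProbability(support=(0, 1, 2))
p = coalition_probability.get_probability((0, 2))
remaining = coalition_probability.simulate()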

derdava.data_source module#

derdava.data_source.add_classification_noise(y: ndarray, noise_level: float = 0.2)[source]#

Adds noise to the classification labels by randomly replacing a label with one chosen from the remaining label set.

Parameters:
  • y – Labels of the target dataset.

  • noise_level – Amount of noise to add, in the range [0, 1] (default: 0.2).

Returns:

None.

Raises:

ValueError – If noise_level is not in the range [0, 1].
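
A minimal sketch; since the documented return value is None, the label array is assumed to be modified in place:

import numpy as np
from derdava.data_source import add_classification_noise

y = np.array([0, 1, 1, 0, 2, 2, 1, 0])

# Perturb roughly 20% of the labels, each replaced by another label from the label set.
add_classification_noise(y, noise_level=0.2)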

derdava.data_source.generate_random_data_sources(X: ndarray, y: ndarray, num_of_data_sources: int = 10)[source]#

Randomly splits a given dataset into a specified number of data sources.

Parameters:
  • X – Feature set of the given dataset.

  • y – Label set of the given dataset.

  • num_of_data_sources – Number of data sources to be generated (default: 10).

Returns:

A dictionary containing mappings between data source indices and their data (X, y).
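
A minimal sketch using synthetic data; indexing from 0 and the (X, y) tuple layout of each entry are assumptions based on the description above:

import numpy as np
from derdava.data_source import generate_random_data_sources

X = np.random.rand(100, 5)              # 100 samples, 5 features (synthetic)
y = np.random.randint(0, 2, size=100)   # binary labels (synthetic)

# Randomly split the dataset into 10 data sources.
data_sources = generate_random_data_sources(X, y, num_of_data_sources=10)

# Each entry maps a data source index to its share of the data.
X_0, y_0 = data_sources[0]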

derdava.dataset module#

derdava.dataset.load_dataset(name: str)[source]#

Loads a built-in dataset.

Parameters:

name – One of 'cpu', 'credit card', 'diabetes', 'flower', 'mnist', 'phoneme', 'pol', 'wind'.

Returns:

A tuple (X, y) containing the features and labels of the loaded dataset. X and y are NumPy arrays.

Raises:

ValueError – If name is not one of the above names.
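
For example:

from derdava.dataset import load_dataset

# Load one of the built-in datasets; X and y are NumPy arrays.
X, y = load_dataset('diabetes')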

derdava.model_utility module#

class derdava.model_utility.IClassificationModel(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#

Bases: ModelUtilityFunction

Represents a model utility function based on a classification model and accuracy scores.

__init__(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#

Constructs an IClassificationModel.

Parameters:
  • model – A machine learning model from this module used for training.

  • data_sources – A dictionary containing mappings between data source indices and their data (X, y).

  • X_test – Features of the testing (or validating) set.

  • y_test – Labels of the testing (or validating) set.

get_utility(coalition: tuple)[source]#

Returns the utility of a given coalition.

Parameters:

coalition – A tuple containing the indices of data sources in the coalition.

Returns:

Utility of the coalition.
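
A minimal end-to-end sketch. The dataset name and the train/test split are illustrative, and passing model_logistic_regression directly (rather than some object it returns) is an assumption about how the model helpers in this module are used:

from sklearn.model_selection import train_test_split
from derdava.data_source import generate_random_data_sources
from derdava.dataset import load_dataset
from derdava.model_utility import IClassificationModel, model_logistic_regression

X, y = load_dataset('phoneme')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Split the training data into 5 data sources.
data_sources = generate_random_data_sources(X_train, y_train, num_of_data_sources=5)

# Utility of a coalition = accuracy on (X_test, y_test) of the model trained on that coalition's data.
v = IClassificationModel(model_logistic_regression, data_sources, X_test, y_test)

# Utility of the coalition formed by data sources 0, 1 and 3.
u = v.get_utility((0, 1, 3))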

class derdava.model_utility.ICoalitionalValue(dic: dict)[source]#

Bases: ModelUtilityFunction

Represents a model utility function defined by explicitly storing the utility of every coalition.

__init__(dic: dict)[source]#

Constructs an ICoalitionalValue model utility function.

Parameters:

dic – A dictionary containing mappings between coalitions and their utilities.

get_utility(coalition: tuple)[source]#

Returns the utility of a given coalition.

Parameters:

coalition – A tuple containing the indices of data sources in the coalition.

Returns:

Utility of the coalition.
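
A minimal sketch with two data sources; the utility values (and the inclusion of the empty coalition) are illustrative:

from derdava.model_utility import ICoalitionalValue

# Explicit utility of every coalition of data sources 0 and 1 (illustrative values).
utilities = {
    (): 0.0,
    (0,): 0.55,
    (1,): 0.60,
    (0, 1): 0.80,
}

v = ICoalitionalValue(utilities)
u = v.get_utility((0, 1))   # 0.80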

class derdava.model_utility.ISymmetricClassificationModel(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#

Bases: ModelUtilityFunction

Represents a model utility function that assumes all data sources are identical, so the utility depends only on the coalition size.

__init__(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#

Constructs an ISymmetricClassificationModel.

Parameters:
  • model – A machine learning model from this module used for training.

  • data_sources – A dictionary containing mappings between data source indices and their data (X, y).

  • X_test – Features of the testing (or validating) set.

  • y_test – Labels of the testing (or validating) set.

get_utility(coalition: tuple)[source]#

Returns the utility of a given coalition.

Parameters:

coalition – A tuple containing the indices of data sources in the coalition.

Returns:

Utility of the coalition.

class derdava.model_utility.ModelUtilityFunction[source]#

Bases: ABC

Base class used to represent model utility function \(v: \mathcal{P}(D) \to \mathbb{R}\).

abstract get_utility(coalition: tuple)[source]#

Returns the utility of a given coalition.

Parameters:

coalition – A tuple containing the indices of data sources in the coalition.

Returns:

Utility of the coalition.
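
A custom utility function can be defined by subclassing ModelUtilityFunction and implementing get_utility. The CardinalityUtility class below is a hypothetical toy example whose utility is proportional to coalition size:

from derdava.model_utility import ModelUtilityFunction

class CardinalityUtility(ModelUtilityFunction):
    """Toy utility function: utility grows linearly with coalition size."""

    def __init__(self, num_of_data_sources: int):
        self.num_of_data_sources = num_of_data_sources

    def get_utility(self, coalition: tuple):
        return len(coalition) / self.num_of_data_sources

v = CardinalityUtility(10)
u = v.get_utility((0, 3, 7))   # 0.3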

derdava.model_utility.model_gaussian_nb()#

Represents a Gaussian Naïve Bayes classifier.

derdava.model_utility.model_knn()#

Represents a \(k\)-Nearest Neighbours classifier.

derdava.model_utility.model_linear_svm()#

Represents a linear Support Vector Machine classifier.

derdava.model_utility.model_logistic_regression()#

Represents a Logistic Regression classifier.

derdava.model_utility.model_ridge_classifier()#

Represents a Ridge classifier.

derdava.sampling module#

derdava.sampling.check_gelman_rubin(statistics: dict, tolerance: float)[source]#

Checks whether all Gelman-Rubin statistics have converged.

Parameters:
  • statistics – A dictionary of Gelman-Rubin statistics for all data sources.

  • tolerance – Convergence threshold; a Gelman-Rubin statistic is considered to have converged if it lies between tolerance and 1 / tolerance.

Returns:

A tuple containing two elements: (1) whether all Gelman-Rubin statistics have converged; (2) number of data sources that have not converged.
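
A minimal sketch with hand-specified statistics; the interpretation that values inside [tolerance, 1 / tolerance] count as converged follows the parameter description above:

from derdava.sampling import check_gelman_rubin

# Gelman-Rubin statistics for three data sources (illustrative values close to 1).
statistics = {0: 1.002, 1: 0.997, 2: 1.080}

# With tolerance 0.95, a statistic converges if it lies within [0.95, 1 / 0.95].
all_converged, num_not_converged = check_gelman_rubin(statistics, tolerance=0.95)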

derdava.sampling.cvar(values: dict, lower_tail: float = 0.6, reverse: bool = False)[source]#

Returns the C-CVaR (Coalitional Conditional Value-at-Risk) value of a given discrete random variable (default C-CVaR\(^-\)).

Parameters:
  • values – A dictionary mapping each value of the random variable to its probability.

  • lower_tail – The fraction of the lower tail (\(\alpha\)) (default: 0.6).

  • reverse – Whether to compute C-CVaR\(^+\) (upper tail) instead (default False).

Returns:

The computed C-CVaR value.
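
A minimal sketch on an illustrative four-point distribution:

from derdava.sampling import cvar

# Discrete random variable: value -> probability (illustrative).
values = {0.2: 0.25, 0.5: 0.25, 0.8: 0.25, 1.0: 0.25}

# C-CVaR^- over the lower 60% tail (the default behaviour).
lower = cvar(values, lower_tail=0.6)

# C-CVaR^+ over the upper tail instead.
upper = cvar(values, lower_tail=0.6, reverse=True)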

derdava.sampling.gelman_rubin(samples: dict, m_chains: int)[source]#

Computes the Gelman-Rubin statistic from the given samples. Reference: https://www.imperial.ac.uk/media/imperial-college/research-centres-and-groups/astrophysics/public/icic/data-analysis-workshop/2018/Convergence-Tests.pdf.

Parameters:
  • samples – A dictionary containing samples generated for each data source.

  • m_chains – Number of Markov chains run in parallel.

Returns:

A dictionary containing Gelman-Rubin statistics of each data source.

derdava.sampling.zot_sampling()[source]#

Returns a sampled state (one of \(\{0, 1, 2\}\)) for a data source.

Returns:

One of 0, 1, or 2.