derdava package#
Subpackages#
Submodules#
derdava.coalition_probability module#
- class derdava.coalition_probability.CoalitionProbability[source]#
Bases: ABC
Represents the probability distribution that each coalition remains as the support set after data deletion.
- class derdava.coalition_probability.IndependentCoalitionProbability(staying_probabilities)[source]#
Bases: CoalitionProbability
Each data source has independent staying probability.
- __init__(staying_probabilities)[source]#
Creates an IndependentCoalitionProbability.
- Parameters:
staying_probabilities – A dictionary { int: float } representing the independent staying probability of each data source.
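When deletions are independent, the probability that exactly a given coalition remains has a closed form: the product of the staying probabilities of its members and the leaving probabilities of everyone else. A minimal sketch of that computation (the function name and structure are illustrative, not the package's internals):

```python
from itertools import combinations

def coalition_probability(coalition: set, staying_probabilities: dict) -> float:
    """P(exactly `coalition` stays), assuming each source leaves independently."""
    p = 1.0
    for i, stay in staying_probabilities.items():
        p *= stay if i in coalition else (1.0 - stay)
    return p

probs = {0: 0.9, 1: 0.5, 2: 0.5}
# The probabilities over all 2^3 coalitions form a distribution (sum to 1).
total = sum(coalition_probability(set(c), probs)
            for r in range(4) for c in combinations(probs, r))
```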
- class derdava.coalition_probability.RandomCoalitionProbability(support: tuple)[source]#
Bases: CoalitionProbability
Randomly creates the joint probability.
- __init__(support: tuple)[source]#
Creates a RandomCoalitionProbability.
- Parameters:
support – A tuple containing the indices of data sources in the support set.
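One simple way to realise a random joint distribution over all subsets of the support is to draw a positive weight per coalition and normalise. This is a sketch under that assumption; the package's actual sampling scheme is not documented here:

```python
import random
from itertools import combinations

def random_coalition_probability(support: tuple, seed: int = 0) -> dict:
    """Assign a random probability to every subset of `support`,
    with the weights normalised to sum to 1."""
    rng = random.Random(seed)
    subsets = [frozenset(c) for r in range(len(support) + 1)
               for c in combinations(support, r)]
    weights = [rng.random() for _ in subsets]
    total = sum(weights)
    return {s: w / total for s, w in zip(subsets, weights)}

dist = random_coalition_probability((0, 1, 2))
```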
derdava.data_source module#
- derdava.data_source.add_classification_noise(y: ndarray, noise_level: float = 0.2)[source]#
Adds noise to the classification labels by replacing each chosen label with one picked at random from the remaining label set.
- Parameters:
y – Labels of target dataset.
noise_level – Amount of noise to be added (default: 0.2).
- Returns:
None.
- Raises:
ValueError – If noise_level is not in the range [0, 1].
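The behaviour described above can be sketched as follows (an illustrative in-place implementation, not the package's own code; the function name is hypothetical):

```python
import numpy as np

def add_noise_sketch(y: np.ndarray, noise_level: float = 0.2, seed: int = 0) -> None:
    """Flip a `noise_level` fraction of labels in place; each flipped label is
    drawn uniformly from the remaining label set (never its original value)."""
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in the range [0, 1]")
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    # Pick distinct indices to corrupt, then resample each label.
    idx = rng.choice(len(y), size=int(noise_level * len(y)), replace=False)
    for i in idx:
        y[i] = rng.choice(labels[labels != y[i]])

y = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0])
y_original = y.copy()
add_noise_sketch(y, noise_level=0.3)  # corrupts 3 of the 10 labels
```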
- derdava.data_source.generate_random_data_sources(X: ndarray, y: ndarray, num_of_data_sources: int = 10)[source]#
Randomly splits a given dataset into a specified number of data sources.
- Parameters:
X – Feature set of the given dataset.
y – Label set of the given dataset.
num_of_data_sources – Number of data sources to be generated (default: 10).
- Returns:
A dictionary containing mappings between data source indices and their data (X, y).
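A random split of this kind can be sketched with a shuffled index partition (illustrative only; the package's splitting strategy may differ, e.g. in how it balances sizes):

```python
import numpy as np

def random_split_sketch(X: np.ndarray, y: np.ndarray,
                        num_of_data_sources: int = 10, seed: int = 0) -> dict:
    """Randomly partition (X, y) into roughly equal data sources,
    keyed by data source index."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    parts = np.array_split(idx, num_of_data_sources)
    return {i: (X[p], y[p]) for i, p in enumerate(parts)}

X = np.arange(20).reshape(10, 2)  # row i is [2i, 2i + 1]
y = np.arange(10)
sources = random_split_sketch(X, y, num_of_data_sources=3)
```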
derdava.dataset module#
- derdava.dataset.load_dataset(name: str)[source]#
Loads a built-in dataset.
- Parameters:
name – One of 'cpu', 'credit card', 'diabetes', 'flower', 'mnist', 'phoneme', 'pol', 'wind'.
- Returns:
A tuple (X, y) containing the features and labels of the loaded dataset. X and y are NumPy arrays.
- Raises:
ValueError – If name is not one of the above names.
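The loader's internals are not shown here; this sketch only illustrates the documented name validation (VALID_NAMES mirrors the list above, and the function name is hypothetical):

```python
VALID_NAMES = ('cpu', 'credit card', 'diabetes', 'flower',
               'mnist', 'phoneme', 'pol', 'wind')

def validate_dataset_name(name: str) -> str:
    """Raise ValueError for names outside the documented list."""
    if name not in VALID_NAMES:
        raise ValueError(f"unknown dataset {name!r}; expected one of {VALID_NAMES}")
    return name
```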
derdava.model_utility module#
- class derdava.model_utility.IClassificationModel(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Bases: ModelUtilityFunction
Represents a model utility function based on a classification model and accuracy scores.
- __init__(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Constructs an IClassificationModel.
- Parameters:
model – A machine learning model from this module used for training.
data_sources – A dictionary containing mappings between data source indices and their data (X, y).
X_test – Features of the testing (or validating) set.
y_test – Labels of the testing (or validating) set.
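A model utility function of this kind maps a coalition of data sources to the test accuracy of a model trained on their pooled data. A self-contained sketch, with a toy majority-class "model" standing in for a real classifier (derdava's internals may differ; `classification_utility` is a hypothetical name):

```python
from collections import Counter
import numpy as np

def classification_utility(coalition, data_sources, X_test, y_test):
    """v(S): accuracy on (X_test, y_test) of a toy majority-class model
    trained on the pooled labels of the sources in `coalition`."""
    if not coalition:
        return 0.0  # empty coalition: no training data
    y_train = np.concatenate([data_sources[i][1] for i in coalition])
    majority = Counter(y_train.tolist()).most_common(1)[0][0]
    y_pred = np.full(len(y_test), majority)
    return float((y_pred == y_test).mean())

data_sources = {0: (np.zeros((4, 1)), np.array([1, 1, 1, 0])),
                1: (np.zeros((4, 1)), np.array([0, 0, 0, 1]))}
X_test = np.zeros((4, 1))
y_test = np.array([1, 1, 0, 0])
```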
- class derdava.model_utility.ICoalitionalValue(dic: dict)[source]#
Bases: ModelUtilityFunction
Stores every coalition and its utility.
- class derdava.model_utility.ISymmetricClassificationModel(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Bases: ModelUtilityFunction
Assumes all data sources are identical, so the utility of a coalition depends only on its size.
- __init__(model, data_sources: dict, X_test: ndarray, y_test: ndarray)[source]#
Constructs an ISymmetricClassificationModel.
- Parameters:
model – A machine learning model from this module used for training.
data_sources – A dictionary containing mappings between data source indices and their data (X, y).
X_test – Features of the testing (or validating) set.
y_test – Labels of the testing (or validating) set.
- class derdava.model_utility.ModelUtilityFunction[source]#
Bases: ABC
Base class used to represent model utility function \(v: \mathcal{P}(D) \to \mathbb{R}\).
- derdava.model_utility.model_gaussian_nb()#
Represents a Gaussian Naïve Bayes classifier.
- derdava.model_utility.model_knn()#
Represents a \(k\)-Nearest Neighbours classifier.
- derdava.model_utility.model_linear_svm()#
Represents a linear Support Vector Machine classifier.
- derdava.model_utility.model_logistic_regression()#
Represents a Logistic Regression classifier.
- derdava.model_utility.model_ridge_classifier()#
Represents a Ridge classifier.
derdava.sampling module#
- derdava.sampling.check_gelman_rubin(statistics: dict, tolerance: float)[source]#
Checks whether all Gelman-Rubin statistics have converged.
- Parameters:
statistics – A dictionary of Gelman-Rubin statistics for all data sources.
tolerance – Convergence threshold: a Gelman-Rubin statistic is considered converged if it lies between tolerance and 1 / tolerance.
- Returns:
A tuple containing two elements: (1) whether all Gelman-Rubin statistics have converged; (2) number of data sources that have not converged.
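The check described above reduces to counting statistics that fall outside the interval [tolerance, 1 / tolerance]. A minimal sketch (the function name is hypothetical):

```python
def check_convergence_sketch(statistics: dict, tolerance: float):
    """Return (all_converged, num_not_converged): a statistic counts as
    converged when tolerance <= R-hat <= 1 / tolerance."""
    not_converged = sum(1 for r in statistics.values()
                        if not (tolerance <= r <= 1.0 / tolerance))
    return not_converged == 0, not_converged

stats = {0: 1.001, 1: 0.999, 2: 1.2}
```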
- derdava.sampling.cvar(values: dict, lower_tail: float = 0.6, reverse: bool = False)[source]#
Returns the C-CVaR (Coalitional Conditional Value-at-Risk) value of a given discrete random variable (default C-CVaR\(^-\)).
- Parameters:
values – A dictionary that contains mappings between values and probability of the random variable.
lower_tail – The percentage of the lower tail (\(\alpha\)) (default: 0.6).
reverse – Whether to compute C-CVaR\(^+\) (upper tail) instead (default: False).
- Returns:
The computed C-CVaR value.
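For a discrete random variable, C-CVaR\(^-\) is the probability-weighted average of the lowest-valued \(\alpha\) mass of the distribution (and C-CVaR\(^+\) the highest). A sketch under that reading; tie handling at the \(\alpha\) boundary may differ from the package:

```python
def cvar_sketch(values: dict, lower_tail: float = 0.6, reverse: bool = False) -> float:
    """CVaR of a discrete distribution {value: probability}: average the
    lowest (or, if reverse, highest) `lower_tail` probability mass."""
    items = sorted(values.items(), reverse=reverse)  # ascending for CVaR-, descending for CVaR+
    remaining, acc = lower_tail, 0.0
    for v, p in items:
        take = min(p, remaining)   # consume at most the tail mass still needed
        acc += v * take
        remaining -= take
        if remaining <= 0:
            break
    return acc / lower_tail
```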
- derdava.sampling.gelman_rubin(samples: dict, m_chains: int)[source]#
Computes the Gelman-Rubin statistic using the given samples. Referenced from https://www.imperial.ac.uk/media/imperial-college/research-centres-and-groups/astrophysics/public/icic/data-analysis-workshop/2018/Convergence-Tests.pdf.
- Parameters:
samples – A dictionary containing samples generated for each data source.
m_chains – Number of Markov chains to be run in parallel.
- Returns:
A dictionary containing Gelman-Rubin statistics of each data source.
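For a single data source, the Gelman-Rubin statistic compares between-chain and within-chain variance, as in the Imperial College notes linked above; it tends to 1 as the chains converge. A sketch of that computation for one quantity (derdava applies it per data source; the function name is illustrative):

```python
import numpy as np

def gelman_rubin_sketch(chains: np.ndarray) -> float:
    """R-hat for `chains` of shape (m, n): m parallel chains, n samples each."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 1000))   # well-mixed chains -> R-hat near 1
r_hat = gelman_rubin_sketch(chains)
```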