Documentation
- class suftware.DensityEstimator(data, grid=None, grid_spacing=None, num_grid_points=None, bounding_box=None, alpha=3, periodic=False, num_posterior_samples=100, compute_K_coeff=True, t_start=None, max_t_step=1.0, tolerance=1e-06, resolution=0.1, sample_only_at_l_star=False, max_log_evidence_ratio_drop=20, evaluation_method_for_Z='Lap', num_samples_for_Z=1000, seed=None, print_t=False)
Estimates a 1D probability density from sampled data.
- Parameters:
- data: (set, list, or np.array of numbers)
An array of data from which the probability density will be estimated. Infinite or NaN values will be discarded.
- grid: (1D np.array)
An array of evenly spaced grid points on which the probability density will be estimated. Default value is None, in which case the grid is set automatically.
- grid_spacing: (float > 0)
The distance at which to space neighboring grid points. Default value is None, in which case this spacing is set automatically.
- num_grid_points: (int)
The number of grid points to draw within the data domain. Restricted to 2*alpha <= num_grid_points <= 1000. Default value is None, in which case the number of grid points is chosen automatically.
- bounding_box: ([float, float])
The boundaries of the data domain, within which the probability density will be estimated. Default value is None, in which case the bounding box is set automatically to encompass all of the data.
- alpha: (int)
The order of derivative constrained in the definition of smoothness. Restricted to 1 <= alpha <= 4. Default value is 3.
- periodic: (bool)
Whether or not to impose periodic boundary conditions on the estimated probability density. Default False, in which case no boundary conditions are imposed.
- num_posterior_samples: (int >= 0)
Number of samples to draw from the Bayesian posterior. Restricted to 0 <= num_posterior_samples <= MAX_NUM_POSTERIOR_SAMPLES.
- compute_K_coeff: (bool)
Whether to compute the K coefficient (Kinney, 2015, PRE, Eq. 29), the sign of which tests the validity of the MaxEnt hypothesis on the data provided.
- max_t_step: (float > 0)
Upper bound on the amount by which the parameter t in the DEFT algorithm is incremented when tracing the MAP curve. Default value is 1.0.
- tolerance: (float > 0)
Sets the convergence criterion for the corrector algorithm used in tracing the MAP curve.
- resolution: (float > 0)
The maximum geodesic distance allowed for neighboring points on the MAP curve.
- sample_only_at_l_star: (bool)
Specifies whether to hold l fixed when sampling from the Bayesian posterior. If True, samples are drawn only at the optimal length scale l*; if False, l is allowed to vary.
- max_log_evidence_ratio_drop: (float > 0)
If set, MAP curve tracing will terminate prematurely when max_log_evidence - current_log_evidence > max_log_evidence_ratio_drop.
- evaluation_method_for_Z: (string)
Method used to evaluate the partition function Z. Possible values: 'Lap' (Laplace approximation, the default), 'Lap+Imp' (Laplace approximation plus importance sampling), and 'Lap+Fey' (Laplace approximation plus Feynman diagrams).
- num_samples_for_Z: (int >= 0)
Number of posterior samples to use when evaluating the partition function Z. Only has an effect when evaluation_method_for_Z = 'Lap+Imp'.
- seed: (int)
Seed provided to the random number generator before density estimation commences. For development purposes only.
- print_t: (bool)
Whether to print the values of t while tracing the MAP curve. For development purposes only.
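The grid-related parameters above (bounding_box, num_grid_points, grid_spacing, alpha) constrain one another. The following is a minimal sketch of that relationship, not suftware's own code: `make_grid` is a hypothetical helper, and it assumes that grid points sit at the centers of equal-width bins spanning the bounding box, which matches the description of the histogram attribute but is not confirmed here as the library's exact rule.

```python
def make_grid(bounding_box, num_grid_points, alpha=3):
    """Return (grid, grid_spacing) for evenly spaced grid points.

    Hypothetical helper for illustration only; not part of suftware.
    """
    lower, upper = bounding_box
    if not lower < upper:
        raise ValueError("bounding_box must satisfy lower < upper")
    # Documented restriction on the number of grid points.
    if not 2 * alpha <= num_grid_points <= 1000:
        raise ValueError("need 2*alpha <= num_grid_points <= 1000")
    # Assumption: grid points are the centers of equal-width bins.
    grid_spacing = (upper - lower) / num_grid_points
    grid = [lower + (i + 0.5) * grid_spacing for i in range(num_grid_points)]
    return grid, grid_spacing


grid, h = make_grid(bounding_box=(0.0, 10.0), num_grid_points=20, alpha=3)
```

Under this assumption, fixing any two of bounding_box, num_grid_points, and grid_spacing determines the third.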
- Attributes:
- grid:
The grid points at which the probability density was estimated. (1D np.array)
- grid_spacing:
The distance between neighboring grid points. (float > 0)
- num_grid_points:
The number of grid points used. (int)
- bounding_box:
The boundaries of the data domain within which the probability density was estimated. ([float, float])
- histogram:
A histogram of the data using grid for the centers of each bin. (1D np.array)
- values:
The values of the optimal (i.e., MAP) density at each grid point. (1D np.array)
- sample_values:
The values of the posterior sampled densities at each grid point. The first index specifies grid points, the second posterior samples. (2D np.array)
- sample_weights:
The importance weights corresponding to each posterior sample. (1D np.array)
- K_coeff:
The value of the K coefficient (Kinney, 2015, Eq. 29). (float)
- ells:
The smoothness length scales at which the MAP curve was computed. (np.array)
- log_Es:
The log evidence ratio values (Kinney, 2015, Eq. 27) at each length scale along the MAP curve. (np.array)
- max_log_E:
The log evidence ratio at the optimal length scale. (float)
- runtime:
The amount of time (in seconds) taken to execute.
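The attributes ells, log_Es, and max_log_E, together with the max_log_evidence_ratio_drop parameter, describe a search for the length scale with the highest log evidence ratio that stops early once the evidence has fallen far enough below its running maximum. The sketch below illustrates only that stopping rule on made-up numbers. `trace_until_drop` is a hypothetical function, the values are invented for illustration, and the real algorithm traces the MAP curve rather than scanning a precomputed list.

```python
def trace_until_drop(log_evidences, max_drop=20.0):
    """Scan log evidence values in order and stop early on a large drop.

    Hypothetical illustration of the documented rule
    max_log_evidence - current_log_evidence > max_log_evidence_ratio_drop.
    Returns (index_of_best_value, number_of_values_visited).
    """
    best_index = 0
    visited = 0
    for i, log_e in enumerate(log_evidences):
        visited += 1
        # Track the running maximum of the log evidence ratio.
        if log_e > log_evidences[best_index]:
            best_index = i
        # Terminate once the current value has dropped too far below it.
        if log_evidences[best_index] - log_e > max_drop:
            break
    return best_index, visited


# Invented values, for illustration only.
ells = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25]
log_Es = [1.0, 6.0, 9.0, 7.5, -15.0, -40.0]

best, visited = trace_until_drop(log_Es, max_drop=20.0)
ell_star = ells[best]      # length scale with the highest log evidence ratio
max_log_E = log_Es[best]   # log evidence ratio at that length scale
```

Here the scan stops at the fifth value, because it lies more than 20 below the running maximum, so the last length scale is never visited.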
Methods
- evaluate(x): Evaluate the optimal (i.e. MAP) density at the supplied value(s) of x.
- evaluate_samples(x[, resample]): Evaluate sampled densities at specified locations.
- get_stats([use_weights, show_samples]): Computes summary statistics for the estimated density.
- plot([ax, save_as, resample, figsize, ...]): Plot the MAP density, the posterior sampled densities, and the data histogram.
- __init__(data, grid=None, grid_spacing=None, num_grid_points=None, bounding_box=None, alpha=3, periodic=False, num_posterior_samples=100, compute_K_coeff=True, t_start=None, max_t_step=1.0, tolerance=1e-06, resolution=0.1, sample_only_at_l_star=False, max_log_evidence_ratio_drop=20, evaluation_method_for_Z='Lap', num_samples_for_Z=1000, seed=None, print_t=False)
- plot(ax=None, save_as=None, resample=True, figsize=(4, 4), fontsize=12, title='', xlabel='', tight_layout=False, show_now=True, show_map=True, map_color='blue', map_linewidth=2, map_alpha=1, num_posterior_samples=None, posterior_color='dodgerblue', posterior_linewidth=1, posterior_alpha=0.2, show_histogram=True, histogram_color='orange', histogram_alpha=1, show_maxent=False, maxent_color='maroon', maxent_linewidth=1, maxent_alpha=1)
Plot the MAP density, the posterior sampled densities, and the data histogram.
- Parameters:
- ax: (plt.Axes)
A matplotlib axes object on which to draw. If None, one will be created.
- save_as: (str)
Name of file to save plot to. File type is determined by file extension.
- resample: (bool)
If True, sampled densities will be plotted only after importance resampling.
- figsize: ([float, float])
Figure size as (width, height) in inches.
- fontsize: (float)
Size of font to use in plot annotation.
- title: (str)
Plot title.
- xlabel: (str)
Plot xlabel.
- tight_layout: (bool)
Whether to call plt.tight_layout() after rendering graphics.
- show_now: (bool)
Whether to show the plot immediately by calling plt.show().
- show_map: (bool)
Whether to show the MAP density.
- map_color: (color spec)
MAP density color.
- map_linewidth: (float)
MAP density linewidth.
- map_alpha: (float)
MAP density opacity (between 0 and 1).
- num_posterior_samples: (int)
Number of posterior samples to display. If this is greater than the number of posterior samples taken, all of the samples taken will be shown.
- posterior_color: (color spec)
Sampled density color.
- posterior_linewidth: (float)
Sampled density linewidth.
- posterior_alpha: (float)
Sampled density opacity (between 0 and 1).
- show_histogram: (bool)
Whether to show the (normalized) data histogram.
- histogram_color: (color spec)
Face color of the data histogram.
- histogram_alpha: (float)
Data histogram opacity (between 0 and 1).
- show_maxent: (bool)
Whether to show the MaxEnt density estimate.
- maxent_color: (color spec)
Line color of the MaxEnt density estimate.
- maxent_linewidth: (float)
Line width of the MaxEnt density estimate.
- maxent_alpha: (float)
MaxEnt opacity (between 0 and 1).
- Returns:
- None.
- evaluate(x)
Evaluate the optimal (i.e. MAP) density at the supplied value(s) of x.
- Parameters:
- x: (number or list-like collection of numbers)
The locations in the data domain at which to evaluate the MAP density.
- Returns:
- A float or 1D np.array representing the values of the MAP density at the specified locations.
- evaluate_samples(x, resample=True)
Evaluate sampled densities at specified locations.
- Parameters:
- x: (number or list-like collection of numbers)
The locations in the data domain at which to evaluate sampled density.
- resample: (bool)
Whether to use importance resampling. If False, the values returned come from the original samples (obtained using a Laplace-approximated posterior). If True, the samples are resampled to account for the deviation between the true Bayesian posterior and its Laplace approximation.
- Returns:
- A 1D np.array (if x is a number) or a 2D np.array (if x is list-like), representing the values of the posterior sampled densities at the specified locations. The first index corresponds to values in x, the second to sampled densities.
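Importance resampling, as referred to by the resample parameter, can be sketched generically as drawing sample indices with probability proportional to their importance weights. The example below is an illustration of that idea using only the standard library, not suftware's implementation. `importance_resample` is a hypothetical function, and nested lists stand in for the 2D np.array, keeping the documented layout in which the first index runs over locations and the second over sampled densities.

```python
import random


def importance_resample(sample_values, sample_weights, num_draws, seed=None):
    """Redraw sampled densities in proportion to their importance weights.

    Hypothetical illustration; sample_values[i][j] is the value of
    sampled density j at location i.
    """
    rng = random.Random(seed)
    num_samples = len(sample_weights)
    # Draw sample indices with replacement, weighted by importance.
    chosen = rng.choices(range(num_samples), weights=sample_weights, k=num_draws)
    # Keep the (location, sample) layout, selecting the chosen columns.
    return [[row[j] for j in chosen] for row in sample_values]


# Two locations, three sampled densities; values invented for illustration.
sample_values = [
    [0.10, 0.20, 0.30],
    [0.40, 0.50, 0.60],
]
# All of the weight sits on the third sample, so only it can be drawn.
sample_weights = [0.0, 0.0, 1.0]

resampled = importance_resample(sample_values, sample_weights, num_draws=4, seed=0)
```

With more evenly spread weights, the resampled columns would mix several of the original samples, with heavily weighted samples appearing more often.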
- get_stats(use_weights=True, show_samples=False)
Computes summary statistics for the estimated density.
- Parameters:
- show_samples: (bool)
If True, summary stats are computed for each posterior sample. If False, summary stats are returned for the “star” estimate, the histogram, and the maxent estimate, along with the mean and RMSD values of these stats across posterior samples.
- use_weights: (bool)
If True, mean and RMSD are computed using importance weights.
- Returns:
- df: (pd.DataFrame)
A pandas data frame listing summary statistics for the estimated probability densities. These summary statistics include “entropy” (in bits), “mean”, “variance”, “skewness”, and “kurtosis”. If show_samples = False, results will be shown for the best estimate, as well as mean and RMSD values across all samples. If show_samples = True, results will be shown for each sample. A column showing sample weights will also be included.
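To make the listed statistics concrete, here is a minimal sketch of how entropy (in bits), mean, variance, skewness, and kurtosis can be computed for a density tabulated on an evenly spaced grid. It is an illustration under stated assumptions, not the code behind get_stats: `density_stats` is a hypothetical function, integrals are approximated by Riemann sums, and kurtosis is taken to be the plain fourth standardized moment (not excess kurtosis), which may differ from the library's convention.

```python
import math


def density_stats(grid, values):
    """Summary statistics of a density sampled on an evenly spaced grid.

    Hypothetical illustration; integrals are approximated by Riemann sums.
    """
    h = grid[1] - grid[0]
    # Normalize so that the density sums to 1 over the grid.
    total = sum(values) * h
    p = [v / total for v in values]

    mean = sum(x * q for x, q in zip(grid, p)) * h
    variance = sum((x - mean) ** 2 * q for x, q in zip(grid, p)) * h
    std = math.sqrt(variance)
    skewness = sum(((x - mean) / std) ** 3 * q for x, q in zip(grid, p)) * h
    # Assumption: non-excess kurtosis (a Gaussian would give 3).
    kurtosis = sum(((x - mean) / std) ** 4 * q for x, q in zip(grid, p)) * h
    # Differential entropy in bits; zero-density points contribute nothing.
    entropy = -sum(q * math.log2(q) for q in p if q > 0) * h

    return {
        "entropy": entropy,
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Uniform density on [0, 1], tabulated at 1000 bin centers.
num_points = 1000
grid = [(i + 0.5) / num_points for i in range(num_points)]
values = [1.0] * num_points

stats = density_stats(grid, values)
```

For this uniform density the results are close to the analytic values: mean 1/2, variance 1/12, zero skewness, kurtosis 9/5, and zero entropy in bits.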
- class suftware.ExampleDataset(dataset='old_faithful_eruption_times')
Provides an interface to example data provided with the SUFTware package.
- Parameters:
- dataset: (str)
Name of dataset to load. Run sw.list_example_datasets() to see which datasets are available.
- Attributes:
- data: (np.array)
An array containing sampled data.
- details: (np.array, optional)
Optional attribute containing meta-information about the dataset.
- __init__(dataset='old_faithful_eruption_times')
- suftware.demo(example='real_data')
Performs a demonstration of suftware.
- Parameters:
- example: (str)
A string specifying which demo to run. Must be 'real_data' or 'simulated_data'.