Friday, July 19, 2024

Bayesian neural networks (BNNs) - Perplexity

Bayesian neural networks (BNNs) are an extension of standard neural networks that incorporate Bayesian inference to account for uncertainty in the model parameters and predictions. Here are the key aspects of BNNs:


1. Probabilistic approach: Instead of having fixed weights and biases, BNNs treat these parameters as random variables with probability distributions.


2. Prior distributions: Before training, prior probability distributions are assigned to the model parameters, representing initial beliefs about their values.


3. Posterior inference: During training, Bayesian inference is used to update the prior distributions to posterior distributions based on the observed data.


4. Uncertainty quantification: BNNs provide uncertainty estimates for both model parameters and predictions, allowing for more robust decision-making.


5. Prediction process: When making predictions, BNNs sample from the posterior distributions of weights and biases, effectively considering many possible neural networks.


6. Output distributions: Instead of point estimates, BNNs output probability distributions for predictions, capturing both aleatoric (data) and epistemic (model) uncertainty.


7. Overfitting prevention: The Bayesian approach naturally penalizes overly complex models, helping to prevent overfitting without explicit regularization.


8. Challenges: BNNs are computationally more intensive than standard neural networks and often require approximation methods for tractability.


9. Applications: BNNs are particularly useful in scenarios where quantifying uncertainty is crucial, such as in medical diagnosis, autonomous systems, and scientific research.


10. Implementation: Various methods exist for implementing BNNs, including variational inference, Markov Chain Monte Carlo (MCMC) sampling, and Monte Carlo dropout (a sketch of the last follows below).


By incorporating uncertainty into neural networks, BNNs provide a more comprehensive framework for modeling complex systems and making reliable predictions in the face of limited or noisy data.
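To make the implementation routes in point 10 concrete, here is a minimal Monte Carlo dropout sketch using Keras. The architecture, dropout rate, and the `mc_predict` helper are illustrative assumptions, not a canonical recipe:

```python
import numpy as np
import tensorflow as tf

# Build a small regressor whose dropout layers stay active at inference time.
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dropout(0.1)(x, training=True)   # dropout stays on at prediction time
x = tf.keras.layers.Dense(64, activation="relu")(x)
x = tf.keras.layers.Dropout(0.1)(x, training=True)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# After model.fit(...), repeated stochastic forward passes give a sample of
# predictions; their spread is a rough epistemic uncertainty estimate.
def mc_predict(model, X, n_samples=50):
    preds = np.stack([model(X).numpy() for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)
```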




When evaluating multiple Bayesian neural network models to determine the best fit, there are several approaches and metrics you can use:


1. Marginal Likelihood / Model Evidence:

The marginal likelihood, also known as model evidence, is a key metric for Bayesian model comparison. It naturally balances model complexity and fit to data.


- Higher marginal likelihood indicates better model fit.

- Can be approximated using methods like variational inference or MCMC.
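As a toy illustration of what the marginal likelihood is, the snippet below estimates $p(D) = \int p(D \mid w)\, p(w)\, dw$ by simple Monte Carlo over prior samples, for a one-parameter Gaussian model. This brute-force approach is only feasible in tiny models, and the data and prior here are placeholders:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

# Toy Monte Carlo estimate of the log evidence: log p(D) ~ log mean_s p(D | w_s),
# with w_s drawn from the prior. All numbers are illustrative placeholders.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.5, size=20)          # "observed" data
w_prior = rng.normal(size=10_000)            # w_s ~ p(w) = N(0, 1)
log_lik = norm.logpdf(data[None, :], loc=w_prior[:, None], scale=1.0).sum(axis=1)
log_evidence = logsumexp(log_lik) - np.log(len(w_prior))
print(f"estimated log p(D): {log_evidence:.2f}")
```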


2. Information Criteria:

These metrics penalize model complexity to avoid overfitting.


- Bayesian Information Criterion (BIC)

- Deviance Information Criterion (DIC)

- Widely Applicable Information Criterion (WAIC)


Lower values indicate better models.
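For instance, BIC can be computed directly from a fitted model's maximized log-likelihood; the function below is a minimal sketch whose inputs are assumed to come from your own fitting code:

```python
import numpy as np

def bic(log_likelihood, n_params, n_data):
    # BIC = k * ln(n) - 2 * ln(L_hat); lower values indicate a better model.
    return n_params * np.log(n_data) - 2.0 * log_likelihood
```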


3. Predictive Performance:

Evaluate models on held-out test data using metrics like:


- Mean Squared Error (MSE) or Root Mean Squared Error (RMSE) for regression

- Accuracy, F1-score, or AUC-ROC for classification

- Log-likelihood on test data
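A hedged sketch of computing two of these metrics for a BNN regressor; the arrays below are placeholders standing in for held-out targets and the model's predictive mean and standard deviation:

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import mean_squared_error

# Placeholder data standing in for a real held-out set and BNN outputs.
rng = np.random.default_rng(0)
y_test = rng.normal(size=100)
y_pred_mean = y_test + rng.normal(scale=0.1, size=100)
y_pred_std = np.full(100, 0.1)

rmse = np.sqrt(mean_squared_error(y_test, y_pred_mean))
# Average test log-likelihood under a Gaussian predictive distribution.
test_loglik = norm.logpdf(y_test, loc=y_pred_mean, scale=y_pred_std).mean()
print(f"RMSE: {rmse:.3f}, mean test log-likelihood: {test_loglik:.3f}")
```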


4. Posterior Predictive Checks:

Compare simulated data from the posterior predictive distribution to observed data to assess model fit.
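One common form of this check is a posterior predictive p-value for a summary statistic; a minimal sketch follows, with placeholder arrays in place of real posterior predictive draws:

```python
import numpy as np

# y_obs stands in for the observed data; y_rep stands in for replicated
# datasets drawn from the posterior predictive distribution.
rng = np.random.default_rng(0)
y_obs = rng.normal(size=200)
y_rep = rng.normal(size=(1000, 200))

stat = np.std  # test statistic of interest, e.g. the spread of the data
p_value = (stat(y_rep, axis=1) >= stat(y_obs)).mean()
print(f"posterior predictive p-value: {p_value:.2f}")  # values near 0 or 1 signal misfit
```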


5. Coverage of Uncertainty Estimates:

For Bayesian models, evaluate the quality of uncertainty estimates:


- Calibration plots

- Expected Calibration Error (ECE)

- Proper scoring rules like log-loss or Brier score
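Expected Calibration Error can be computed with a simple binning scheme; the function below is a sketch for binary classifiers (the number of bins and the use of positive-class probabilities are assumptions):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    # probs: predicted probability of the positive class; labels: 0/1 targets.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            conf = probs[mask].mean()    # mean predicted probability in the bin
            acc = labels[mask].mean()    # empirical frequency in the bin
            ece += mask.mean() * abs(acc - conf)
    return ece
```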


6. Bayes Factors:

Compare the relative evidence for two models by computing the ratio of their marginal likelihoods.


7. Posterior Model Probabilities:

When you have a set of models with prior probabilities, you can compute posterior probabilities for each model.
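Given (approximate) log marginal likelihoods for each model, both Bayes factors and posterior model probabilities fall out directly; the evidence values below are placeholders:

```python
import numpy as np

# Placeholder log evidences for three candidate models and a uniform model prior.
log_evidence = np.array([-1052.3, -1060.7, -1055.1])
log_prior = np.log(np.full(3, 1.0 / 3.0))

# Bayes factor comparing models 1 and 2 (ratio of marginal likelihoods).
bf_12 = np.exp(log_evidence[0] - log_evidence[1])

# Posterior model probabilities, normalized in log space for numerical stability.
log_post = log_evidence + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()
print(f"Bayes factor BF12: {bf_12:.1f}, posterior probabilities: {post.round(3)}")
```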


8. Cross-Validation:

Use techniques like k-fold cross-validation to assess model performance and generalization.
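A minimal k-fold scaffold with scikit-learn; `train_and_eval` is a hypothetical stand-in for fitting a BNN on the training fold and returning a score on the test fold:

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X, y, train_and_eval, n_splits=5):
    # train_and_eval(X_tr, y_tr, X_te, y_te) -> score is a hypothetical callback.
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    return np.array([train_and_eval(X[tr], y[tr], X[te], y[te])
                     for tr, te in kf.split(X)])
```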


9. Kernelized Stein Discrepancy (KSD):

Measures the discrepancy between the approximate posterior and the true posterior without requiring samples from the true posterior.


10. Maximum Mean Discrepancy (MMD):

Compares the distribution of predictions from the approximate posterior to those from a reference posterior.
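MMD is straightforward to estimate from two sets of prediction samples; below is a hedged sketch of the (biased) squared-MMD estimator with an RBF kernel, where the bandwidth choice is an assumption:

```python
import numpy as np

def mmd2_rbf(X, Y, bandwidth=1.0):
    # Biased estimator of squared MMD between samples X (n, d) and Y (m, d).
    def k(A, B):
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```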


When using these methods, consider:


- The specific goals of your modeling task

- Computational feasibility, especially for large models

- Interpretability of the results

- Consistency across different evaluation metrics


It's often beneficial to use multiple evaluation criteria to get a comprehensive view of model performance and fit. The choice of metrics should align with your specific research or application goals.




Bayesian Neural Networks (BNNs) provide a posterior distribution over the network's weights, which allows them to quantify uncertainty in predictions. The process involves several key steps:

  1. Prior Distribution:

    • Before observing any data, a prior distribution $p(w)$ is assigned to the weights $w$ of the neural network. This prior reflects any prior knowledge or assumptions about the weights.
  2. Likelihood:

    • Given a dataset $D = \{(X_i, Y_i)\}_{i=1}^N$, the likelihood function $p(Y \mid X, w)$ describes the probability of the observed data $Y$ given the inputs $X$ and the weights $w$. For regression tasks, this is often modeled as $Y_i = f(X_i; w) + \epsilon_i$, where $f$ is the neural network and $\epsilon_i$ is the noise term.
  3. Bayes' Theorem:

    • The posterior distribution over the weights $p(w \mid X, Y)$ is obtained by combining the prior distribution and the likelihood using Bayes' theorem: $p(w \mid X, Y) \propto p(Y \mid X, w) \cdot p(w)$
    • This posterior distribution represents the updated beliefs about the weights after observing the data.
  4. Approximation Methods:

    • The exact posterior distribution is often intractable due to the high dimensionality and complexity of neural networks. Therefore, various approximation methods are used to estimate the posterior:
      • Markov Chain Monte Carlo (MCMC): Methods like Hamiltonian Monte Carlo (HMC) generate samples from the posterior distribution by simulating a Markov chain.
      • Stochastic Gradient MCMC (SGMCMC): Variants like Stochastic Gradient Langevin Dynamics (SGLD) and Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) use stochastic gradients to scale MCMC methods to large datasets.
      • Variational Inference: Methods like Bayes by Backprop approximate the posterior by optimizing a simpler distribution (e.g., Gaussian) to minimize the Kullback-Leibler divergence with the true posterior.
      • Monte Carlo Dropout: Dropout is used at both training and inference time to approximate a Bayesian posterior.
      • Ensemble Methods: Multiple neural networks are trained from different random initializations, and their combined predictions approximate the posterior predictive distribution.
  5. Prediction:

    • To make predictions, the posterior distribution over weights is used to compute the predictive distribution for a new input $x$: $p(y \mid x, X, Y) = \int p(y \mid x, w)\, p(w \mid X, Y)\, dw$
    • This integral is often approximated by averaging predictions from multiple samples of the weights drawn from the posterior distribution, as in the sketch below.
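A minimal numpy sketch of that Monte Carlo approximation; the toy one-layer network and the fake posterior samples are placeholders for real inference output:

```python
import numpy as np

# Toy "network" and fake posterior weight samples, purely for illustration.
def network(x, w):
    return np.tanh(x @ w["W"]) @ w["v"]

rng = np.random.default_rng(0)
posterior_samples = [{"W": rng.normal(size=(3, 8)), "v": rng.normal(size=8)}
                     for _ in range(100)]
x_new = rng.normal(size=(5, 3))  # a batch of new inputs

# One forward pass per sampled weight vector w ~ p(w | X, Y), then average.
preds = np.array([network(x_new, w) for w in posterior_samples])  # shape (100, 5)
pred_mean, pred_std = preds.mean(axis=0), preds.std(axis=0)
```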

In summary, BNNs provide a posterior distribution over weights by combining prior knowledge with observed data through Bayes' theorem. Due to the complexity of neural networks, this posterior is typically approximated using methods like MCMC, variational inference, and ensemble techniques, enabling the quantification of uncertainty in predictions.
