A digital notebook where I keep my notes organized so they're easier to find in the future.
What is a model?
A model is a declarative representation of our understanding of the world. It's a representation
within a computer that captures our understanding of what these variables are and how they interact with
each other. Declarative means that the representation stands on its own, so we can make sense
of it apart from any algorithm that we might choose to apply to it.
This means that the same model can be used in the context of one algorithm that answers one kind
of question, or by other algorithms that answer different kinds of questions, answer the same question
more efficiently, or make different trade-offs between accuracy and computational cost.
Also, we can separate the construction of the model from the algorithms that are used to reason with it.
We can construct methodologies that elicit these models from a human expert, learn them from
historical data using statistical machine learning techniques, or combine the two.
What is probabilistic?
Uncertainty. Uncertainty comes in many forms and for many
different reasons such as:
- Partial knowledge of state of the world
- Noisy observations
- Phenomena not covered by our model
- Inherently stochastic (randomly determined) phenomena
Probability theory is a framework that allows us to deal with uncertainty in ways that are principled
and that bring to bear important and valuable tools:
- Declarative representation with clear semantics
- Powerful reasoning patterns
- Established learning methods
What is graphical?
Probabilistic graphical models are a synthesis between ideas from probability theory and statistics
and ideas from computer science.
In order to capture probability distributions over spaces involving a large number of factors, we
need probability distributions over what are called random variables.
We need to represent the world through these variables each of which captures some facet of the world.
Our goal is to capture our uncertainty about the possible states of the world in terms of their probability
distribution or what's called a joint distribution over the possible assignments to the set of random
variables.
An example of a graphical model is the Bayesian network.
It uses a directed graph as the intrinsic (native) representation. In this case, the random variables
are represented by nodes in the graph, and the edges in the graph represent the probabilistic connections
between those random variables in a way that is very formal.
Another example of a graphical model is the Markov network, which uses an undirected graph.
Graphical representation is:
- intuitive and compact data structure
- efficient reasoning using general-purpose algorithms
- sparse parameterization: feasible elicitation, learning from data - in both cases a reduction in
the number of parameters is very valuable
Representation:
- directed and undirected
- temporal and plate models
Inference (reasoning):
- exact and approximate
- decision making
Learning:
- parameters and structure
- with and without complete data
Joint distribution in the student example:
- variable 1: intelligence (I) - it has 2 values: low, high
- variable 2: difficulty (D) - it has 2 values: easy, hard
- variable 3: grade (G) - it has 3 values: g1, g2, g3
P(I,D,G) has 2 x 2 x 3 = 12 entries (probabilities)
We also count independent parameters: parameters whose values are not completely determined by the
values of the other parameters.
All the probabilities have to sum to 1, so if you tell me eleven out of the twelve, I know what the
twelfth is; the number of independent parameters is therefore eleven.
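A minimal sketch in Python of such a joint table, checking both facts above (the 12 entries and the
11 independent parameters); the probability values are made up for illustration:

```python
from itertools import product

# Hypothetical joint distribution P(I, D, G) stored as a flat table:
# one probability per assignment, 2 * 2 * 3 = 12 entries in total.
vals_I = ["low", "high"]
vals_D = ["easy", "hard"]
vals_G = ["g1", "g2", "g3"]

# Made-up numbers; any 12 non-negative values summing to 1 would do.
probs = [0.126, 0.168, 0.126, 0.009, 0.045, 0.126,
         0.252, 0.0224, 0.0056, 0.060, 0.036, 0.024]

joint = dict(zip(product(vals_I, vals_D, vals_G), probs))

assert len(joint) == 12                     # 2 x 2 x 3 assignments
assert abs(sum(joint.values()) - 1) < 1e-9  # probabilities sum to 1
# Because the entries must sum to 1, any 11 of them determine the 12th:
# the table has 12 - 1 = 11 independent parameters.
```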
Conditioning: Reduction
- if we condition on the value of one variable, we keep only the entries of the table that are
consistent with that value and eliminate the rest. This operation is called reduction.
The result is an unnormalized measure, which means it doesn't sum to 1, so we need to normalize this
distribution - divide by the total so the entries sum to 1.
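Continuing the sketch above (reusing the hypothetical `joint` table), reduction followed by
normalization might look like this:

```python
# Condition on G = g1: keep only the entries consistent with the evidence.
reduced = {(i, d, g): p for (i, d, g), p in joint.items() if g == "g1"}

# The reduced table is an unnormalized measure: it no longer sums to 1.
z = sum(reduced.values())  # normalizing constant

# Dividing by z gives the conditional distribution P(I, D | G = g1).
conditional = {assignment: p / z for assignment, p in reduced.items()}
assert abs(sum(conditional.values()) - 1) < 1e-9
```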
Marginalization
- marginalize G out of P(I,D,G) to get P(I,D): for every (I,D) pair, sum over all values of G
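And summing G out of the same hypothetical `joint` table:

```python
# Marginalize G out of P(I, D, G): for each (I, D) pair,
# sum the probabilities over all values of G, yielding P(I, D).
marginal = {}
for (i, d, g), p in joint.items():
    marginal[(i, d)] = marginal.get((i, d), 0.0) + p

assert abs(sum(marginal.values()) - 1) < 1e-9  # still sums to 1
```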
Factor:
This is a function or a table that takes arguments (a set of variables) and, just like any function,
gives us a value for every assignment.
The set of variables is called the scope of the factor.
A joint distribution is a factor: for every combination of values of I, D and G, it gives me a number.
Conditional Probability Distribution (CPD)
It gives us the conditional probability of the variable G given I and D - P(G | I,D).
This means for every combination of values to the variables I and D, we have a probability distribution over G.
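As a sketch, a CPD can be stored as one distribution over G per (I,D) context; the values below follow
the student example as I noted it from the lecture, so treat them as illustrative:

```python
# P(G | I, D) as one distribution over (g1, g2, g3) per (I, D) context.
# Values follow the lecture's student example as noted here; illustrative.
cpd_G = {
    ("high", "easy"): [0.90, 0.08, 0.02],
    ("high", "hard"): [0.50, 0.30, 0.20],
    ("low",  "easy"): [0.30, 0.40, 0.30],
    ("low",  "hard"): [0.05, 0.25, 0.70],
}

# Unlike a general factor, a CPD is locally normalized:
# each row (each conditioning context) sums to 1.
for row in cpd_G.values():
    assert abs(sum(row) - 1) < 1e-9
```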
Factor Product
Factor Marginalization
Factor Reduction
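A minimal sketch of these three operations (my own illustrative implementation, not the course's),
representing a factor as a scope plus a dict from assignment tuples to values:

```python
from itertools import product


def factor_product(scope1, f1, scope2, f2):
    """Multiply two factors; the result's scope is the union of both scopes."""
    scope = list(dict.fromkeys(scope1 + scope2))  # ordered union
    # Recover each variable's possible values from the input tables.
    vals = {v: [] for v in scope}
    for table, sc in ((f1, scope1), (f2, scope2)):
        for assignment in table:
            for var, val in zip(sc, assignment):
                if val not in vals[var]:
                    vals[var].append(val)
    result = {}
    for assignment in product(*(vals[v] for v in scope)):
        a = dict(zip(scope, assignment))
        result[assignment] = (f1[tuple(a[v] for v in scope1)]
                              * f2[tuple(a[v] for v in scope2)])
    return scope, result


def factor_marginalize(scope, f, var):
    """Sum out `var`, shrinking the scope by one variable."""
    i = scope.index(var)
    out = {}
    for assignment, value in f.items():
        key = assignment[:i] + assignment[i + 1:]
        out[key] = out.get(key, 0.0) + value
    return scope[:i] + scope[i + 1:], out


def factor_reduce(scope, f, var, val):
    """Keep only the entries consistent with var = val (unnormalized)."""
    i = scope.index(var)
    return scope, {a: v for a, v in f.items() if a[i] == val}


# Example, reusing the `joint` table from the earlier sketch:
scope, f = factor_reduce(["I", "D", "G"], joint, "G", "g1")
scope, f = factor_marginalize(scope, f, "D")  # scope is now ["I", "G"]
```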
Why factors?
- It turns out that factors are the fundamental building block for defining distributions in
high-dimensional spaces: we define an exponentially large probability distribution by multiplying
factors together.
- Also, the same set of basic operations that we use to define probability distributions in these
high-dimensional spaces is what we use to manipulate them, which gives us a set of basic inference
algorithms.
Test:
1 - 0.16 0.45 0.60
2 - 42 85 96 30 ?
3 - A & B - correct
4 - 108 135 79 141 - correct
b
3
0.05 0.14
The student example
- Grade
- course Difficulty
- student Intelligence
- student Sat
- reference Letter
G depends on D and I
S depends on I
L depends on G
The model is a representation of how we believe the world works.
CPD
Chain rule for Bayesian Networks
P(D,I,G,S,L) = P(D) P(I) P(G | I,D) P(S | I) P(L | G), so for one full assignment:
P(d0, i1, g3, s1, l1) = P(d0) * P(i1) * P(g3 | i1,d0) * P(l1 | g3) * P(s1 | i1) = 0.6 * 0.3 * 0.02 * 0.01 * 0.8
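A quick check of this product in Python; the value readings (d0 = easy, i1 = high intelligence,
s1 = high SAT, l1 = strong letter) are my assumptions about the notation:

```python
# Chain rule for the student network: one CPD factor per node given its
# parents, multiplied together. Numbers copied from the note above.
p = (0.6      # P(d0)
     * 0.3    # P(i1)
     * 0.02   # P(g3 | i1, d0)
     * 0.01   # P(l1 | g3)
     * 0.8)   # P(s1 | i1)
print(p)  # ~2.88e-05
```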
A Bayesian network is a directed acyclic graph (DAG) G whose nodes represent the random variables
X1, ..., Xn. Acyclic means no cycles: following the edges, you can never get back to where you started.
For each node in the graph, we have a CPD that conditions the node on its parents, e.g. P(G | I,D).
The BN represents a joint distribution via the chain rule for Bayesian networks.
BN is a legal distribution: P >= 0.
P is a product of factors (CPDs), CPDs are non-negative, and if you multiply a bunch of non-negative
factors, you get a non-negative factor.
BN is a legal distribution: the sum of P over all assignments is 1.
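A sketch that verifies both properties numerically for the student network; the CPD values are the
ones I recorded from the lecture, so treat them as assumptions:

```python
from itertools import product

# CPDs for the student network (values as recorded from the lecture;
# illustrative). Index 0/1 for d0/d1, i0/i1, s0/s1, l0/l1; 0/1/2 for g1/g2/g3.
P_D = [0.6, 0.4]
P_I = [0.7, 0.3]
P_G = {(0, 0): [0.30, 0.40, 0.30],  # P(G | i0, d0)
       (0, 1): [0.05, 0.25, 0.70],  # P(G | i0, d1)
       (1, 0): [0.90, 0.08, 0.02],  # P(G | i1, d0)
       (1, 1): [0.50, 0.30, 0.20]}  # P(G | i1, d1)
P_S = {0: [0.95, 0.05], 1: [0.20, 0.80]}                   # P(S | I)
P_L = {0: [0.10, 0.90], 1: [0.40, 0.60], 2: [0.99, 0.01]}  # P(L | G)

# Each term is a CPD entry, hence non-negative, so the product is >= 0.
# Summing the product over every assignment gives exactly 1.
total = sum(P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]
            for d, i, g, s, l in product(range(2), range(2), range(3),
                                         range(2), range(2)))
assert abs(total - 1.0) < 1e-9
```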
The very first example of a Bayesian network -> genetic inheritance:
Genotype: AA, AB, AO, BO, BB, OO
Phenotype: A, B, AB, O
Reasoning patterns:
- causal reasoning (top to bottom)
- evidential reasoning (bottom to top)
- intercausal reasoning (2 causes of a single effect)
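For example, evidential reasoning can be computed directly from the CPDs in the previous sketch:
observing a weak grade g3 lowers our belief that the student has high intelligence.

```python
# Evidential reasoning with the CPDs above: observe G = g3, ask about I.
# P(I | g3) is proportional to sum over d of P(d) * P(I) * P(g3 | I, d).
g3 = 2  # index of grade g3
unnorm = [sum(P_D[d] * P_I[i] * P_G[(i, d)][g3] for d in range(2))
          for i in range(2)]
posterior_i1 = unnorm[1] / sum(unnorm)

print(P_I[1], round(posterior_i1, 3))  # prior 0.3 drops to about 0.079
```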
Flow of probabilistic influence
When can X influence Y?
Influence means that conditioning on X changes beliefs about Y.
- X -> Y / yes - causal
- X <- Y / yes - evidential
- X -> W -> Y / yes
- X <- W <- Y / yes
- X <- W -> Y / yes
- X -> W <- Y / no - v-structure
To activate a v-structure X -> W <- Y, W or one of its descendants must be observed.
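A small sketch encoding these rules: given one triple X - W - Y of a trail and the set of observed
variables, decide whether influence can flow through W (function and argument names are mine):

```python
def triple_active(into_w_from_x, into_w_from_y, w, descendants, observed):
    """Can influence flow through W in one trail segment X - W - Y?

    into_w_from_x / into_w_from_y: True if that edge points INTO W;
    both True is the v-structure X -> W <- Y.
    descendants: set of W's descendants; observed: set of observed variables.
    """
    if into_w_from_x and into_w_from_y:
        # V-structure: active only if W or one of its descendants is observed.
        return w in observed or bool(descendants & observed)
    # Chain or common cause: active only if W itself is NOT observed.
    return w not in observed


# Student network, v-structure I -> G <- D with L a descendant of G:
assert not triple_active(True, True, "G", {"L"}, observed=set())  # blocked
assert triple_active(True, True, "G", {"L"}, observed={"L"})      # activated
# Chain I -> G -> L is active unless G is observed:
assert triple_active(True, False, "G", {"L"}, observed=set())
assert not triple_active(True, False, "G", {"L"}, observed={"G"})
```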