py-bbn
End of Life: This version of py-bbn is no longer maintained. For a new version please go here.

py-bbn is a Python implementation of probabilistic and causal inference in Bayesian Belief Networks using exact inference algorithms [CGH97, Cow98, HD99, Kol09, Mur12].
You may install py-bbn from PyPI.
pip install pybbn
If you like py-bbn, you might be interested in our next-generation products.
Rocket Vector is a CausalAI platform in the cloud.

Autonosis is a GenAI + CausalAI capable platform.

pyspark-bbn is a scalable, massively parallel processing (MPP) framework for learning the structures and parameters of Bayesian Belief Networks (BBNs) using Apache Spark.

Please contact us at info@rocketvector.io. Let’s reach for success!
Probabilistic Inference
The probabilistic inference algorithm used by py-bbn is an exact inference algorithm. Let’s go through an example on how to conduct exact inference.
Huang Graph
Below is the code to create the Huang Graph [HD99]. The typical procedure is as follows.
create a Bayesian Belief Network (BBN)
create a junction tree from the graph
assert evidence
print out the marginal probabilities
Huang Bayesian Belief Network structure.
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

# create the nodes
a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.5, 0.5])
b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.5, 0.5, 0.4, 0.6])
c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8])
d = BbnNode(Variable(3, 'd', ['on', 'off']), [0.9, 0.1, 0.5, 0.5])
e = BbnNode(Variable(4, 'e', ['on', 'off']), [0.3, 0.7, 0.6, 0.4])
f = BbnNode(Variable(5, 'f', ['on', 'off']), [0.01, 0.99, 0.01, 0.99, 0.01, 0.99, 0.99, 0.01])
g = BbnNode(Variable(6, 'g', ['on', 'off']), [0.8, 0.2, 0.1, 0.9])
h = BbnNode(Variable(7, 'h', ['on', 'off']), [0.05, 0.95, 0.95, 0.05, 0.95, 0.05, 0.95, 0.05])

# create the network structure
bbn = Bbn() \
    .add_node(a) \
    .add_node(b) \
    .add_node(c) \
    .add_node(d) \
    .add_node(e) \
    .add_node(f) \
    .add_node(g) \
    .add_node(h) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED)) \
    .add_edge(Edge(a, c, EdgeType.DIRECTED)) \
    .add_edge(Edge(b, d, EdgeType.DIRECTED)) \
    .add_edge(Edge(c, e, EdgeType.DIRECTED)) \
    .add_edge(Edge(d, f, EdgeType.DIRECTED)) \
    .add_edge(Edge(e, f, EdgeType.DIRECTED)) \
    .add_edge(Edge(c, g, EdgeType.DIRECTED)) \
    .add_edge(Edge(e, h, EdgeType.DIRECTED)) \
    .add_edge(Edge(g, h, EdgeType.DIRECTED))

# convert the BBN to a join tree
join_tree = InferenceController.apply(bbn)

# insert an observation evidence
ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('a')) \
    .with_evidence('on', 1.0) \
    .build()
join_tree.set_observation(ev)

# print the posterior probabilities
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')
A Bayesian Belief Network (BBN) is defined as a pair (G, P), where
G is a directed acyclic graph (DAG),
P is a joint probability distribution, and
G satisfies the Markov condition (each node is conditionally independent of its non-descendants given its parents).
Ideally, the API should force the user to define G and P separately. Instead, this API has you define each node together with its local probability model (conditional probability table) first and the structure afterwards, which introduces a bit of cognitive friction. But this approach seems a bit more concise, no?
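To make the pair (G, P) concrete, here is a minimal pure-Python sketch (it does not use py-bbn) for the single edge a -> b of the Huang graph. It assumes the flat CPT list is laid out with one row per parent state and the child states varying fastest; the helper name joint is hypothetical and exists only for illustration.

```python
from itertools import product

# CPTs copied from the Huang graph example: a has no parents, b has parent a.
# Assumed layout of the flat list: one row per parent state, child states vary fastest.
values = ['on', 'off']
p_a = [0.5, 0.5]
p_b_given_a = [0.5, 0.5, 0.4, 0.6]


def joint(a, b):
    """P(a, b) = P(a) * P(b | a), the factorization implied by the edge a -> b."""
    i, j = values.index(a), values.index(b)
    return p_a[i] * p_b_given_a[i * len(values) + j]


# the factorized joint sums to one over all states
total = sum(joint(a, b) for a, b in product(values, values))

# marginal of b, then the posterior of a after observing b = 'on' (Bayes' rule)
p_b_on = sum(joint(a, 'on') for a in values)
p_a_on_given_b_on = joint('on', 'on') / p_b_on

print(f'sum={total:.5f}, P(b=on)={p_b_on:.5f}, P(a=on|b=on)={p_a_on_given_b_on:.5f}')
```

Enumeration like this only scales to tiny networks; the junction tree built by InferenceController is what makes exact inference practical on larger graphs.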
Updating Conditional Probability Tables
Sometimes, you may want to preserve the join tree structure and just update the conditional probability tables (CPTs). Here’s how to do so.
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import EdgeType, Edge
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

# you have built a BBN
a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))

# you have built a junction tree from the BBN
# let's call this "original" junction tree the left-hand side (lhs) junction tree
lhs_jt = InferenceController.apply(bbn)

# you may just update the CPTs with the original junction tree structure
# the algorithm to find/build the junction tree is avoided
# the CPTs are updated
rhs_jt = InferenceController.reapply(lhs_jt, {0: [0.3, 0.7], 1: [0.2, 0.8, 0.8, 0.2]})

# let's print out the marginal probabilities and see how things changed
# print the marginal probabilities for the lhs junction tree
print('lhs probabilities')
for node, posteriors in lhs_jt.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

# print the marginal probabilities for the rhs junction tree
print('rhs probabilities')
for node, posteriors in rhs_jt.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')
Note that we use InferenceController.reapply(...) to apply the new CPTs to a previous junction tree and that we get a new junction tree as output.
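As a sanity check on what the two junction trees should report, the marginal of b can be worked out by hand from the old and new CPTs. The sketch below is plain Python rather than py-bbn; marginal_b is a hypothetical helper, and it assumes the flat CPT for b lists one row per state of a.

```python
def marginal_b(p_a, p_b_given_a):
    """Marginal of the child b: P(b) = sum over a of P(a) * P(b | a)."""
    n_a = len(p_a)
    n_b = len(p_b_given_a) // n_a
    return [
        sum(p_a[i] * p_b_given_a[i * n_b + j] for i in range(n_a))
        for j in range(n_b)
    ]


# CPTs of the original (lhs) network and the updated (rhs) network
lhs = marginal_b([0.2, 0.8], [0.1, 0.9, 0.9, 0.1])
rhs = marginal_b([0.3, 0.7], [0.2, 0.8, 0.8, 0.2])

print('lhs b:', [round(p, 5) for p in lhs])
print('rhs b:', [round(p, 5) for p in rhs])
```

With no evidence asserted, the posteriors printed for b from lhs_jt and rhs_jt should agree with these hand-computed values.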
Gaussian Inference
Inference on a Gaussian Bayesian Network (GBN) is accomplished through updating the means and covariance matrix incrementally [CGH97]. The following GBN comes from [Cow98].
Cowell GBN structure.
The variables come from the following Gaussian distributions.
\(Y = \mathcal{N}(0, 1)\)
\(X = \mathcal{N}(Y, 1)\)
\(Z = \mathcal{N}(X, 1)\)
Below is a code sample of how we can perform inference on this GBN.
import numpy as np

from pybbn.gaussian.inference import GaussianInference


def get_cowell_data():
    """
    Gets Cowell data.

    :return: Data and headers.
    """
    n = 10000
    Y = np.random.normal(0, 1, n)
    X = np.random.normal(Y, 1, n)
    Z = np.random.normal(X, 1, n)

    D = np.vstack([Y, X, Z]).T
    return D, ['Y', 'X', 'Z']


# assume we have data and headers (variable names per column)
# X is the data (rows are observations, columns are variables)
# H is just a list of variable names
X, H = get_cowell_data()

# then we can compute the means and covariance matrix easily
M = X.mean(axis=0)
E = np.cov(X.T)

# the means and covariance matrix are all we need for gaussian inference
# notice how we keep `g` around?
# we'll use `g` over and over to do inference with evidence/observations
g = GaussianInference(H, M, E)
# {'Y': (0.00967, 0.98414), 'X': (0.01836, 2.02482), 'Z': (0.02373, 3.00646)}
print(g.P)

# we can make a single observation with do_inference()
g1 = g.do_inference('X', 1.5)
# {'X': (1.5, 0), 'Y': (0.76331, 0.49519), 'Z': (1.51893, 1.00406)}
print(g1.P)

# we can make multiple observations with do_inferences()
g2 = g.do_inferences([('Z', 1.5), ('X', 2.0)])
# {'Z': (1.5, 0), 'X': (2.0, 0), 'Y': (1.00770, 0.49509)}
print(g2.P)
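The numbers in the comments above can be cross-checked against the textbook formula for conditioning one Gaussian variable on another: the conditional mean is mu_t + (s_to / s_oo)(x - mu_o) and the conditional variance is s_tt - s_to^2 / s_oo. The sketch below applies that formula to the population moments implied by the generating process; it is not a description of how GaussianInference is implemented internally, and the names c and condition are hypothetical.

```python
# Population means and covariances implied by Y ~ N(0, 1), X = Y + noise,
# Z = X + noise (unit-variance noise). Sample estimates will be close to these.
mean = {'Y': 0.0, 'X': 0.0, 'Z': 0.0}
cov = {
    ('Y', 'Y'): 1.0, ('X', 'X'): 2.0, ('Z', 'Z'): 3.0,
    ('Y', 'X'): 1.0, ('Y', 'Z'): 1.0, ('X', 'Z'): 2.0,
}


def c(u, v):
    """Symmetric covariance lookup."""
    return cov[(u, v)] if (u, v) in cov else cov[(v, u)]


def condition(target, observed, value):
    """Mean and variance of `target` given `observed` == value."""
    gain = c(target, observed) / c(observed, observed)
    m = mean[target] + gain * (value - mean[observed])
    v = c(target, target) - gain * c(target, observed)
    return m, v


y_mean, y_var = condition('Y', 'X', 1.5)
z_mean, z_var = condition('Z', 'X', 1.5)
print(f'Y | X=1.5: mean={y_mean:.3f}, var={y_var:.3f}')
print(f'Z | X=1.5: mean={z_mean:.3f}, var={z_var:.3f}')
```

These population values sit close to the sampled estimates shown in the comments for g1.P; the small gaps are sampling noise.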
Causal Inference
Average Causal Effect
Here’s how you may estimate the Average Causal Effect (ACE) using Pearl’s do-operator [Pea88, Pea00, Pea16, Pea18].
In this example, we want to estimate the ACE of drug on recovery, where recovery is true.
Z (gender) confounds X (drug) and Y (recovery).
from pybbn.causality.ace import Ace
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

# create a BBN
gender_probs = [0.49, 0.51]
drug_probs = [0.23323615160349853, 0.7667638483965015,
              0.7563025210084033, 0.24369747899159663]
recovery_probs = [0.31000000000000005, 0.69,
                  0.27, 0.73,
                  0.13, 0.87,
                  0.06999999999999995, 0.93]

X = BbnNode(Variable(1, 'drug', ['false', 'true']), drug_probs)
Y = BbnNode(Variable(2, 'recovery', ['false', 'true']), recovery_probs)
Z = BbnNode(Variable(0, 'gender', ['female', 'male']), gender_probs)

bbn = Bbn() \
    .add_node(X) \
    .add_node(Y) \
    .add_node(Z) \
    .add_edge(Edge(Z, X, EdgeType.DIRECTED)) \
    .add_edge(Edge(Z, Y, EdgeType.DIRECTED)) \
    .add_edge(Edge(X, Y, EdgeType.DIRECTED))

# compute the ACE
ace = Ace(bbn)
results = ace.get_ace('drug', 'recovery', 'true')
t = results['true']
f = results['false']
average_causal_impact = t - f
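For intuition, the same quantity can be derived by hand with the back-door adjustment formula, P(y | do(x)) = sum over z of P(y | x, z) P(z), which applies here because gender is the only confounder. The sketch below is plain Python and makes one assumption that is not stated in the text: that recovery_probs is laid out with gender varying slowest, then drug, then recovery, following the order in which the edges into recovery were added. It is a hand calculation under that assumption, not a description of what Ace does internally.

```python
# P(gender)
p_gender = {'female': 0.49, 'male': 0.51}

# P(recovery = true | gender, drug), read off recovery_probs under the
# assumed layout (gender slowest, then drug, then recovery)
p_recovery_true = {
    ('female', 'false'): 0.69, ('female', 'true'): 0.73,
    ('male', 'false'): 0.87, ('male', 'true'): 0.93,
}


def p_do(drug):
    """P(recovery = true | do(drug)) via back-door adjustment on gender."""
    return sum(p_gender[g] * p_recovery_true[(g, drug)] for g in p_gender)


ace_by_hand = p_do('true') - p_do('false')
print(f'do(true)={p_do("true"):.4f}, do(false)={p_do("false"):.4f}, ACE={ace_by_hand:.4f}')
```

Note that the adjustment averages over P(gender), not P(gender | drug); that is the difference between intervening on drug and merely observing it.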
Serialization/Deserialization
We all need a way to save (serialize) and load (deserialize) our Bayesian Belief Networks (BBNs) and join trees (JTs). Here’s how to do so. Note that the serde (serialization/deserialization) features simply write to JSON or CSV files and load back from such files. The code takes care of the serde process.
Serializing a BBN
JSON Serialization Format
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

# create graph
a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))

# serialize
Bbn.to_json(bbn, 'simple-bbn.json')
You will get a file simple-bbn.json written out with the following content.
{
  "nodes": {
    "0": {
      "probs": [
        0.2,
        0.8
      ],
      "variable": {
        "id": 0,
        "name": "a",
        "values": [
          "t",
          "f"
        ]
      }
    },
    "1": {
      "probs": [
        0.1,
        0.9,
        0.9,
        0.1
      ],
      "variable": {
        "id": 1,
        "name": "b",
        "values": [
          "t",
          "f"
        ]
      }
    }
  },
  "edges": [
    {
      "pa": 0,
      "ch": 1
    }
  ]
}
CSV Serialization Format
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

# create graph
a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))

# serialize
Bbn.to_csv(bbn, 'simple-bbn.csv')
You will get a file simple-bbn.csv written out with the following content.
0,a,t,f,|,0.2,0.8
1,b,t,f,|,0.1,0.9,0.9,0.1
0,1,directed
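Reading the CSV layout from the sample, a node line appears to hold the id, the name, the values, a | separator and then the flattened probabilities, while an edge line holds the parent id, the child id and the edge type. The sketch below parses that layout with plain Python to make the format explicit. parse_bbn_csv is a hypothetical helper, not part of py-bbn, and it assumes names and values contain no commas; use Bbn.from_csv for real work.

```python
def parse_bbn_csv(text):
    """Split py-bbn style CSV text into node records and edge records."""
    nodes, edges = {}, []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(',')]
        if '|' in fields:
            # node line: id, name, value..., |, prob...
            bar = fields.index('|')
            nodes[int(fields[0])] = {
                'name': fields[1],
                'values': fields[2:bar],
                'probs': [float(p) for p in fields[bar + 1:]],
            }
        else:
            # edge line: parent id, child id, edge type
            edges.append((int(fields[0]), int(fields[1]), fields[2]))
    return nodes, edges


csv_text = """
0,a,t,f,|,0.2,0.8
1,b,t,f,|,0.1,0.9,0.9,0.1
0,1,directed
"""

nodes, edges = parse_bbn_csv(csv_text)
print(nodes)
print(edges)
```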
Deserializing a BBN
JSON Deserialization Format
from pybbn.graph.dag import Bbn

# deserialize
bbn = Bbn.from_json('simple-bbn.json')
CSV Deserialization Format
from pybbn.graph.dag import Bbn

# deserialize
bbn = Bbn.from_csv('simple-bbn.csv')
Join Tree Serde
A join tree may also be serialized and deserialized. Only JSON format is supported for now.
Serializing a Join Tree
import json

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import EdgeType, Edge
from pybbn.graph.jointree import JoinTree
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))
jt = InferenceController.apply(bbn)

with open('simple-join-tree.json', 'w') as f:
    d = JoinTree.to_dict(jt, bbn)
    j = json.dumps(d, sort_keys=True, indent=2)
    f.write(j)
You will get a file simple-join-tree.json written out with the following content.
{
  "bbn_nodes": {
    "0": {
      "probs": [
        0.2,
        0.8
      ],
      "variable": {
        "id": 0,
        "name": "a",
        "values": [
          "t",
          "f"
        ]
      }
    },
    "1": {
      "probs": [
        0.1,
        0.9,
        0.9,
        0.1
      ],
      "variable": {
        "id": 1,
        "name": "b",
        "values": [
          "t",
          "f"
        ]
      }
    }
  },
  "jt": {
    "edges": [],
    "nodes": {
      "0-1": {
        "node_ids": [
          0,
          1
        ],
        "type": "clique"
      }
    },
    "parent_info": {
      "0": [],
      "1": [
        0
      ]
    }
  }
}
Deserializing a Join Tree
import json

from pybbn.graph.jointree import JoinTree
from pybbn.pptc.inferencecontroller import InferenceController

with open('simple-join-tree.json', 'r') as f:
    j = f.read()
    d = json.loads(j)
    jt = JoinTree.from_dict(d)
    jt = InferenceController.apply_from_serde(jt)
Generating Bayesian Belief Networks
Let’s generate some Bayesian Belief Networks (BBNs). The algorithms are taken from Random Generation of Bayesian Networks [IC02]. There are two types of BBNs you may generate.
singly-connected
multi-connected
A singly-connected BBN is one where, ignoring the direction of the edges, there is at most one path between any two nodes.
A multi-connected BBN is one that is not singly-connected.
Singly-connected network structure.
Multi-connected network structure. There are two paths between C and F: (C, D, F) and (C, E, F).
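The definition above translates directly into a check: a network is singly-connected exactly when its undirected skeleton contains no cycle. A minimal sketch using a union-find structure follows. is_singly_connected is a hypothetical helper rather than a py-bbn function, and it assumes nodes are numbered 0 to n_nodes - 1 with each edge listed once.

```python
def is_singly_connected(n_nodes, edges):
    """True if, ignoring edge direction, no two nodes are joined by more than one path."""
    parent = list(range(n_nodes))

    def find(x):
        # follow parent pointers to the component root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # u and v were already connected, so this edge adds a second path
            return False
        parent[ru] = rv
    return True


# a small tree-shaped skeleton versus a diamond like (C, D, F) and (C, E, F)
polytree = [(0, 1), (0, 2), (1, 3)]
diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]

print(is_singly_connected(4, polytree))
print(is_singly_connected(4, diamond))
```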
Singly-Connected
The key method to use here is generate_singly_bbn.
import numpy as np

from pybbn.generator.bbngenerator import generate_singly_bbn, convert_for_exact_inference, convert_for_drawing

# very important to set the seed for reproducible results
np.random.seed(37)

# this method generates the graph, g, and probabilities, p
# note we are generating a singly-connected graph
g, p = generate_singly_bbn(5, max_iter=5)

# you have to convert g and p to a BBN
bbn = convert_for_exact_inference(g, p)

# you can convert the BBN to a nx graph for visualization
nx_graph = convert_for_drawing(bbn)
Multi-Connected
The key method to use here is generate_multi_bbn.
import numpy as np

from pybbn.generator.bbngenerator import generate_multi_bbn, convert_for_exact_inference, convert_for_drawing

# very important to set the seed for reproducible results
np.random.seed(37)

# this method generates the graph, g, and probabilities, p
# note we are generating a multi-connected graph
g, p = generate_multi_bbn(5, max_iter=5)

# you have to convert g and p to a BBN
bbn = convert_for_exact_inference(g, p)

# you can convert the BBN to a nx graph for visualization
nx_graph = convert_for_drawing(bbn)
Direct Generation
In the case where you do NOT need a reference to the BBN objects, use the API’s convenience method to generate and serialize the BBN directly to file.
import numpy as np

from pybbn.generator.bbngenerator import generate_bbn_to_file

# set the seed for reproducibility
np.random.seed(37)

# generate a singly-connected BBN
generate_bbn_to_file(n=10, file_path='singly-bbn.csv', bbn_type='singly', max_alpha=10)

# generate a multi-connected BBN
generate_bbn_to_file(n=10, file_path='multi-bbn.csv', bbn_type='multi', max_alpha=10)
Here’s the output for singly-bbn.csv.
0,0,state0,state1,|,0.5495149877004699,0.4504850122995299
1,1,state0,state1,|,0.35835359558290997,0.64164640441709,0.8660444980250707,0.13395550197492936
2,2,state0,state1,|,0.5828348518985648,0.4171651481014352,0.6352808281847757,0.3647191718152243
3,3,state0,state1,|,0.43155247482552955,0.5684475251744704,0.05744110250902426,0.9425588974909757,0.44585399607259946,0.5541460039274007,0.286749915005319,0.713250084994681
4,4,state0,state1,|,0.3190576398549361,0.6809423601450639,0.011424133320075755,0.9885758666799241
5,5,state0,state1,|,0.48207371043602226,0.5179262895639779,0.07147107402394111,0.9285289259760588
6,6,state0,state1,|,0.2076134466833406,0.7923865533166594,0.44542849473036455,0.5545715052696354
7,7,state0,state1,|,0.757560101942848,0.242439898057152
8,8,state0,state1,|,0.1906328058926942,0.8093671941073058,0.2814000588799281,0.7185999411200719
9,9,state0,state1,|,0.7854793106243432,0.2145206893756569,0.12392098364527641,0.8760790163547235
0,1,directed
1,2,directed
2,3,directed
3,4,directed
3,8,directed
5,6,directed
5,3,directed
7,5,directed
8,9,directed
Here’s the output for multi-bbn.csv.
0,0,state0,state1,|,0.680874572938313,0.319125427061687
1,1,state0,state1,|,0.7617263477727293,0.23827365222727065,0.3117227721913154,0.6882772278086846
2,2,state0,state1,|,0.12614472921860395,0.8738552707813961,0.7070911105993563,0.29290888940064375
3,3,state0,state1,|,0.4055587320025024,0.5944412679974977,0.9624106996627307,0.037589300337269156
4,4,state0,state1,|,0.31986562609614827,0.6801343739038517,0.022365118374575416,0.9776348816254246
5,5,state0,state1,|,0.77366174354673,0.2263382564532701,0.8579513677510221,0.1420486322489778,0.3183725110598738,0.6816274889401261,0.04262514631905535,0.9573748536809447
6,6,state0,state1,|,0.05830032685169777,0.9416996731483022,0.5840685338695271,0.41593146613047294,0.7078930065265004,0.29210699347349944,0.490562272424676,0.509437727575324
7,7,state0,state1,|,0.7569425298012309,0.243057470198769,0.6536654079476188,0.3463345920523811,0.6299885487124776,0.3700114512875224,0.4929042112083024,0.5070957887916976
8,8,state0,state1,|,0.3295640257593744,0.6704359742406256,0.9098731919901998,0.09012680800980029
9,9,state0,state1,|,0.7804943261233692,0.21950567387663072,0.43963638923803844,0.5603636107619615,0.03168532379450399,0.968314676205496,0.7189237718440259,0.28107622815597405,0.356320337335263,0.643679662664737,0.8089559692517324,0.19104403074826756,0.520364955519572,0.47963504448042804,0.3989706528653481,0.601029347134652
0,1,directed
0,9,directed
0,5,directed
1,2,directed
2,3,directed
3,4,directed
4,5,directed
4,6,directed
4,7,directed
5,6,directed
6,7,directed
6,9,directed
7,8,directed
8,9,directed
Sampling Data
Sampling data from a BBN is possible. The algorithm uses logic sampling with rejection [Hen88].
Simple Sampling
This code demonstrates simple sampling.
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.sampling.sampling import LogicSampler

a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.5, 0.5])
b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.5, 0.5, 0.4, 0.6])
c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8])

bbn = Bbn() \
    .add_node(a) \
    .add_node(b) \
    .add_node(c) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED)) \
    .add_edge(Edge(b, c, EdgeType.DIRECTED))

sampler = LogicSampler(bbn)
samples = sampler.get_samples(n_samples=10000, seed=37)
Sampling with Rejection
This code demonstrates sampling with evidence asserted. During each round of sampling, if a sampled value does not match the evidence, the entire sample is discarded.
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.sampling.sampling import LogicSampler

a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.5, 0.5])
b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.5, 0.5, 0.4, 0.6])
c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8])

bbn = Bbn() \
    .add_node(a) \
    .add_node(b) \
    .add_node(c) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED)) \
    .add_edge(Edge(b, c, EdgeType.DIRECTED))

sampler = LogicSampler(bbn)
samples = sampler.get_samples(evidence={0: 'on'}, n_samples=10000, seed=37)
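To show what logic sampling with rejection does conceptually, here is a small standard-library sketch for the same chain a -> b -> c. It samples each node in topological order from its CPT row and throws away any sample that disagrees with the evidence. The function names are hypothetical and the code is an illustration of the idea, not LogicSampler's actual implementation; its random stream will not match py-bbn's even with the same seed.

```python
import random

values = ['on', 'off']
p_a = [0.5, 0.5]
p_b_given_a = {'on': [0.5, 0.5], 'off': [0.4, 0.6]}
p_c_given_b = {'on': [0.7, 0.3], 'off': [0.2, 0.8]}


def sample_once(rng):
    """Draw one joint sample by visiting the nodes in topological order."""
    a = rng.choices(values, weights=p_a)[0]
    b = rng.choices(values, weights=p_b_given_a[a])[0]
    c = rng.choices(values, weights=p_c_given_b[b])[0]
    return {'a': a, 'b': b, 'c': c}


def sample_with_rejection(n_draws, evidence, seed=37):
    """Keep only the draws that agree with every evidence entry."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        s = sample_once(rng)
        if all(s[name] == value for name, value in evidence.items()):
            kept.append(s)
    return kept


samples = sample_with_rejection(10000, {'a': 'on'})
frac_b_on = sum(s['b'] == 'on' for s in samples) / len(samples)
print(len(samples), round(frac_b_on, 3))
```

Because rejected draws are discarded, fewer than n_draws samples are returned in this sketch; the rarer the evidence, the more draws are wasted.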
Create BBN with structure and data
If you know the BBN structure and have data, you can create a BBN using the structure and learn the parameters from the data. For now, the parameters are simply estimated from the raw counts (a non-Bayesian estimate). The method to use is Factory.from_data().
[1]:
import pandas as pd

from pybbn.graph.factory import Factory

df = pd.read_csv('./data/data-from-structure.csv')
structure = {
    'a': [],
    'b': ['a'],
    'c': ['b']
}
bbn = Factory.from_data(structure, df)
As usual, after you acquire a BBN, you can perform inference using an InferenceController.
[2]:
from pybbn.pptc.inferencecontroller import InferenceController

join_tree = InferenceController.apply(bbn)
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')
b : off=0.55020, on=0.44980
c : off=0.57210, on=0.42790
a : off=0.49850, on=0.50150
[3]:
import networkx as nx
n, d = bbn.to_nx_graph()
nx.draw(n, with_labels=True, labels=d, node_color='r', alpha=0.5)

Exact Inference with Widgets
Here, we show a very simple example of how to observe the marginal posterior probabilities of each node given the state of one. We will use the Huang graph [HD99].
Simulate data
[1]:
%matplotlib inline
from pybbn.graph.dag import BbnUtil
from pybbn.graph.jointree import EvidenceBuilder, EvidenceType
from pybbn.pptc.inferencecontroller import InferenceController
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from collections import namedtuple

np.random.seed(37)
plt.style.use('ggplot')

Marginal = namedtuple('Marginal', 'name, s')

def potential_to_series(potential):
    # convert a single-variable potential into a Series indexed by state
    vals = []
    index = []
    for pe in potential.entries:
        # each entry holds one variable's state; take that state as the label
        state = list(pe.entries.values())[0]
        vals.append(pe.value)
        index.append(state)
    return pd.Series(vals, index=index)

def get_marginals(join_tree):
    # collect one (name, Series) pair per BBN node
    data = []
    for node in join_tree.get_bbn_nodes():
        name = node.variable.name
        s = potential_to_series(join_tree.get_bbn_potential(node))
        t = Marginal(name, s)
        data.append(t)
    return data

# get the pre-defined huang graph
bbn = BbnUtil.get_huang_graph()

# convert the BBN to a join tree
join_tree = InferenceController.apply(bbn)
Visualize
[2]:
import math
from ipywidgets import interact

@interact(a=[('unobserved', -1), ('off', 0), ('on', 1)])
def f(a=-1):
    n_cols = 4
    n_rows = math.ceil(len(bbn.get_nodes()) / n_cols)

    if a == -1:
        join_tree.unobserve_all()
        marginals = get_marginals(join_tree)
    else:
        v = 'on' if a == 1 else 'off'
        ev = EvidenceBuilder() \
            .with_node(join_tree.get_bbn_node_by_name('a')) \
            .with_evidence(v, 1.0) \
            .build()
        join_tree.unobserve_all()
        join_tree.set_observation(ev)
        marginals = get_marginals(join_tree)

    marginals = sorted(marginals, key=lambda tup: tup[0])

    fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, 5), sharey=True)
    for m, ax in zip(marginals, np.ravel(axes)):
        m.s.plot(kind='bar', legend=False, ax=ax)
        ax.set_title(m.name)
        ax.set_ylim([0.0, 1.0])
        ax.set_xlabel('')
    plt.tight_layout()
Multivariate Gaussian Inference with Widgets
This notebook shows how to do multivariate Gaussian inference with widgets. We allow one variable to change and visualize how the distributions of the others change. We will be using the Cowell graph [Cow98].
Simulate data
[1]:
%matplotlib inline
import numpy as np
from pybbn.gaussian.inference import GaussianInference
import matplotlib.pyplot as plt

np.random.seed(37)
plt.style.use('ggplot')
plt.rcParams['axes.grid'] = False

def get_cowell_data():
    n = 10000
    Y = np.random.normal(0, 1, n)
    X = np.random.normal(Y, 1, n)
    Z = np.random.normal(X, 1, n)

    D = np.vstack([Y, X, Z]).T
    return D, ['Y', 'X', 'Z']

def get_mvn():
    X, H = get_cowell_data()
    M = X.mean(axis=0)
    E = np.cov(X.T)
    g = GaussianInference(H, M, E)
    return g

g = get_mvn()
[2]:
import pandas as pd
pd.DataFrame(g.marginals)
[2]:
|   | name | mean      | var      |
|---|------|-----------|----------|
| 0 | Y    | -0.001723 | 0.990700 |
| 1 | X    | 0.007448  | 2.016406 |
| 2 | Z    | 0.002459  | 3.033838 |
Visualize
[3]:
from ipywidgets import interact

samples1 = g.sample_marginals(size=10000)

@interact(x=(-5, 5, 1))
def f(x=None):
    if x is not None:
        gg = g.do_inference('X', x)
    else:
        gg = g

    samples2 = gg.sample_marginals(size=5000)

    fig, axes = plt.subplots(1, 3, figsize=(15, 3), sharey=False)
    axes = np.ravel(axes)
    kind = 'hist'
    alpha = 0.15

    for (name, s2), ax in zip(samples2.items(), axes):
        if name == 'X':
            ax2 = ax.twinx()
            _ = samples1[name].plot(kind=kind, ax=ax2, color='blue', alpha=alpha)
            _ = ax.axvline(x=x, color='red')
            _ = ax2.set_ylabel('')
        else:
            ax2 = ax.twinx()
            _ = samples1[name].plot(kind=kind, ax=ax, color='blue', alpha=alpha)
            _ = s2.plot(kind=kind, ax=ax)
            _ = s2.plot(kind='kde', ax=ax2, color='green')
            _ = ax2.set_ylabel('')
        _ = ax.set_title(f'{name}')
        _ = ax.set_ylabel('')
    plt.tight_layout()
Bibliography
[CGH97] E. Castillo, J.M. Gutierrez, and A.S. Hadi. Expert Systems and Probabilistic Network Models. Springer, 1997.
[Cow98] R.G. Cowell. Advanced inference in Bayesian networks. In M.I. Jordan, editor, Learning in Graphical Models. A Bradford Book, 1998.
[Hen88] M. Henrion. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. Uncertainty in Artificial Intelligence, 1988.
[HD99] C. Huang and A. Darwiche. Inference in belief networks: a procedural guide. International Journal of Approximate Reasoning, 1999.
[IC02] J.S. Ide and F.G. Cozman. Random generation of Bayesian networks. Advances in Artificial Intelligence, 2002.
[Kol09] D. Koller. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[Mur12] K.P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
[Pea88] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[Pea00] J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.
[Pea16] J. Pearl. Causal Inference in Statistics - A Primer. Wiley, 2016.
[Pea18] J. Pearl. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.
Edge Ordering
Edge ordering makes a difference to the join tree and potentials produced. Let’s take the BBN network structure below, where all nodes are binary with the values on and off.
a -> c <- b
Note how c has 2 parents, a and b. The potential (or conditional probability table, CPT) for c is specified as a list of probabilities as follows.
[0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4]
Let’s say that this list of probabilities represents the CPT below.
| | | c=on | c=off |
|-------|-------|------|-------|
| a=on | b=on | 0.7 | 0.3 |
| a=on | b=off | 0.2 | 0.8 |
| a=off | b=on | 0.6 | 0.4 |
| a=off | b=off | 0.6 | 0.4 |
When we define a BBN structure (be it programmatically in Python code or declaratively in JSON), we should define and add the edge a -> c to the graph before the edge b -> c. Below is the code where we do so.
[1]:
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

def get_bbn1():
    a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.2, 0.8])
    b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.8, 0.2])
    c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4])

    bbn = Bbn() \
        .add_node(a) \
        .add_node(b) \
        .add_node(c) \
        .add_edge(Edge(a, c, EdgeType.DIRECTED)) \
        .add_edge(Edge(b, c, EdgeType.DIRECTED))
    return bbn
When we add the edge b -> c to the network structure before a -> c, the induced CPT for c will be as follows. This second CPT for c is not at all the same as the first one!
| | | c=on | c=off |
|-------|-------|------|-------|
| b=on | a=on | 0.7 | 0.3 |
| b=on | a=off | 0.2 | 0.8 |
| b=off | a=on | 0.6 | 0.4 |
| b=off | a=off | 0.6 | 0.4 |
Here is the code for creating a BBN where we add b -> c before a -> c.
[2]:
def get_bbn2():
    a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.2, 0.8])
    b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.8, 0.2])
    c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4])

    bbn = Bbn() \
        .add_node(a) \
        .add_node(b) \
        .add_node(c) \
        .add_edge(Edge(b, c, EdgeType.DIRECTED)) \
        .add_edge(Edge(a, c, EdgeType.DIRECTED))
    return bbn
Although the network structures are the same in both cases (regardless of the order in which we add the edges), the induced parameters are NOT: they are sensitive to the order in which the edges are added. Now, let’s compare the posteriors of these 2 BBNs.
[3]:
from pybbn.pptc.inferencecontroller import InferenceController

b1 = get_bbn1()
b2 = get_bbn2()

j1 = InferenceController.apply(b1)
j2 = InferenceController.apply(b2)
Here are the posteriors for the first BBN. Note that the id-to-name mappings defined above are as follows.
0: a
1: b
2: c
Keep an eye on id 2.
[4]:
for node in j1.get_bbn_nodes():
    potential = j1.get_bbn_potential(node)
    print(potential)
    print('-' * 10)
1=on|0.80000
1=off|0.20000
----------
2=on|0.60000
2=off|0.40000
----------
0=on|0.20000
0=off|0.80000
----------
Here are the posteriors for the second BBN.
[5]:
for node in j2.get_bbn_nodes():
    potential = j2.get_bbn_potential(node)
    print(potential)
    print('-' * 10)
1=on|0.80000
1=off|0.20000
----------
2=on|0.36000
2=off|0.64000
----------
0=on|0.20000
0=off|0.80000
----------
For now, there is no workaround for this issue of logically identical BBN specifications producing different potentials as a result of edge insertion order. Just make sure you are aware of it and add edges in the order your CPT rows assume.
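The two marginals for c printed above (0.60000 and 0.36000) can be reproduced by hand, which makes the role of parent order explicit. The sketch below is plain Python; marginal_c_on is a hypothetical helper, and it assumes that rows of the flat CPT list follow the parent order with the first parent varying slowest, as in the two CPT tables shown earlier.

```python
from itertools import product

# priors for the two parents and the flat CPT list for c, as in the example
prior = {'a': [0.2, 0.8], 'b': [0.8, 0.2]}  # index 0 = on, index 1 = off
c_probs = [0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4]


def marginal_c_on(parent_order):
    """P(c = on) when the CPT rows are interpreted in the given parent order."""
    total = 0.0
    # the first parent in parent_order varies slowest across the rows
    for row, combo in enumerate(product(range(2), repeat=len(parent_order))):
        weight = 1.0
        for parent, state_idx in zip(parent_order, combo):
            weight *= prior[parent][state_idx]
        # entry 2 * row of the flat list is P(c = on | this parent combination)
        total += weight * c_probs[2 * row]
    return total


print(f"a before b: {marginal_c_on(['a', 'b']):.5f}")
print(f"b before a: {marginal_c_on(['b', 'a']):.5f}")
```

The same eight numbers yield two different distributions for c purely because the rows are matched to different parent combinations.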
Simple Example
Let’s say you have a DAG with 5 variables: \(X_1, X_2, X_3, X_4, X_5\) and the structure represented by an edge list is as follows.
\(X_1 \rightarrow X_5\)
\(X_2 \rightarrow X_5\)
\(X_3 \rightarrow X_5\)
\(X_4 \rightarrow X_5\)
The domain (set of possible values) of each variable is as follows.
\(X_1 \in \{v_1, v_2\}\)
\(X_2 \in \{v_1, v_2, v_3\}\)
\(X_3 \in \{v_1, v_2\}\)
\(X_4 \in \{v_1, v_2, v_3, v_4, v_5\}\)
\(X_5 \in \{v_1, v_2, v_3\}\)
The question is, how do we build the parameters for \(X_5\)?
Let’s create some fake data.
[63]:
import pandas as pd
import numpy as np
import random

np.random.seed(37)
random.seed(37)

def get_data(n_values, n_samples):
    return [f'v{v}' for v in np.random.randint(0, n_values, n_samples)]

N = 1_000

df = pd.DataFrame({
    'x1': get_data(2, N),
    'x2': get_data(3, N),
    'x3': get_data(2, N),
    'x4': get_data(5, N),
    'x5': get_data(3, N)
})
df.shape
[63]:
(1000, 5)
[64]:
df.head()
[64]:
|   | x1 | x2 | x3 | x4 | x5 |
|---|----|----|----|----|----|
| 0 | v1 | v2 | v0 | v4 | v2 |
| 1 | v1 | v0 | v1 | v3 | v0 |
| 2 | v0 | v1 | v1 | v2 | v2 |
| 3 | v1 | v2 | v0 | v1 | v2 |
| 4 | v0 | v2 | v1 | v1 | v2 |
Now, let’s verify the domains.
[65]:
def get_domains(df):
    return {c: sorted(list(df[c].unique())) for c in df.columns}

domains = get_domains(df)
domains
[65]:
{'x1': ['v0', 'v1'],
'x2': ['v0', 'v1', 'v2'],
'x3': ['v0', 'v1'],
'x4': ['v0', 'v1', 'v2', 'v3', 'v4'],
'x5': ['v0', 'v1', 'v2']}
You want to create a conditional probability table (CPT) for \(X_5\) that looks like the following. Note that we simply show the shape of the CPT here and not the actual values. We will compute the actual values in a bit.
[66]:
import itertools

pas = [c for c in domains if c != 'x5']
cpt = (domains[pa] for pa in pas)
cpt = itertools.product(*cpt)
cpt = pd.DataFrame(cpt, columns=pas) \
    .assign(**{v: 0.0 for v in domains['x5']}) \
    .set_index(pas)
cpt
[66]:
x1 | x2 | x3 | x4 | v0 | v1 | v2 |
---|---|---|---|---|---|---|
v0 | v0 | v0 | v0 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v0 | v1 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v0 | v2 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v0 | v3 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v0 | v4 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v1 | v0 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v1 | v1 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v1 | v2 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v1 | v3 | 0.0 | 0.0 | 0.0 |
v0 | v0 | v1 | v4 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v0 | v0 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v0 | v1 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v0 | v2 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v0 | v3 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v0 | v4 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v1 | v0 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v1 | v1 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v1 | v2 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v1 | v3 | 0.0 | 0.0 | 0.0 |
v0 | v1 | v1 | v4 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v0 | v0 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v0 | v1 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v0 | v2 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v0 | v3 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v0 | v4 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v1 | v0 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v1 | v1 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v1 | v2 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v1 | v3 | 0.0 | 0.0 | 0.0 |
v0 | v2 | v1 | v4 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v0 | v0 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v0 | v1 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v0 | v2 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v0 | v3 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v0 | v4 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v1 | v0 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v1 | v1 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v1 | v2 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v1 | v3 | 0.0 | 0.0 | 0.0 |
v1 | v0 | v1 | v4 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v0 | v0 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v0 | v1 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v0 | v2 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v0 | v3 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v0 | v4 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v1 | v0 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v1 | v1 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v1 | v2 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v1 | v3 | 0.0 | 0.0 | 0.0 |
v1 | v1 | v1 | v4 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v0 | v0 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v0 | v1 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v0 | v2 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v0 | v3 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v0 | v4 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v1 | v0 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v1 | v1 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v1 | v2 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v1 | v3 | 0.0 | 0.0 | 0.0 |
v1 | v2 | v1 | v4 | 0.0 | 0.0 | 0.0 |
Ok, so how do we create the CPT for \(X_5\) given the data? Here’s some code to demonstrate how to build the CPT.
[68]:
def get_cond_probs(q, d, pa_vals, ch, df):
    def safe_divide(num, den):
        # a parent combination that never occurs in the data has den == 0
        try:
            return num / den
        except ZeroDivisionError:
            return 0.0
    p_b = df.query(q).shape[0]
    cp_pa = {f'x{i+1}': p for i, p in enumerate(pa_vals)}
    cp_ch = {v: safe_divide(df.query(f'{q} and {ch}=="{v}"').shape[0], p_b) for v in d}
    cp = {**cp_pa, **cp_ch}
    return cp
pa_values = list(itertools.product(*(domains[pa] for pa in pas)))
queries = map(lambda tup: [f'x{i+1}=="{v}"' for i, v in enumerate(tup)], pa_values)
queries = map(lambda arr: ' and '.join(arr), queries)
queries = list(queries)
cpt = pd.DataFrame((get_cond_probs(q, domains['x5'], pa_vals, 'x5', df) for q, pa_vals in zip(queries, pa_values))) \
    .set_index(['x1', 'x2', 'x3', 'x4'])
cpt
[68]:
x1 | x2 | x3 | x4 | v0 | v1 | v2 |
---|---|---|---|---|---|---|
v0 | v0 | v0 | v0 | 0.466667 | 0.266667 | 0.266667 |
v0 | v0 | v0 | v1 | 0.285714 | 0.428571 | 0.285714 |
v0 | v0 | v0 | v2 | 0.583333 | 0.166667 | 0.250000 |
v0 | v0 | v0 | v3 | 0.538462 | 0.307692 | 0.153846 |
v0 | v0 | v0 | v4 | 0.400000 | 0.000000 | 0.600000 |
v0 | v0 | v1 | v0 | 0.533333 | 0.333333 | 0.133333 |
v0 | v0 | v1 | v1 | 0.411765 | 0.117647 | 0.470588 |
v0 | v0 | v1 | v2 | 0.357143 | 0.357143 | 0.285714 |
v0 | v0 | v1 | v3 | 0.434783 | 0.173913 | 0.391304 |
v0 | v0 | v1 | v4 | 0.368421 | 0.421053 | 0.210526 |
v0 | v1 | v0 | v0 | 0.375000 | 0.375000 | 0.250000 |
v0 | v1 | v0 | v1 | 0.266667 | 0.400000 | 0.333333 |
v0 | v1 | v0 | v2 | 0.312500 | 0.250000 | 0.437500 |
v0 | v1 | v0 | v3 | 0.400000 | 0.200000 | 0.400000 |
v0 | v1 | v0 | v4 | 0.190476 | 0.428571 | 0.380952 |
v0 | v1 | v1 | v0 | 0.157895 | 0.368421 | 0.473684 |
v0 | v1 | v1 | v1 | 0.307692 | 0.384615 | 0.307692 |
v0 | v1 | v1 | v2 | 0.388889 | 0.333333 | 0.277778 |
v0 | v1 | v1 | v3 | 0.250000 | 0.375000 | 0.375000 |
v0 | v1 | v1 | v4 | 0.200000 | 0.250000 | 0.550000 |
v0 | v2 | v0 | v0 | 0.333333 | 0.166667 | 0.500000 |
v0 | v2 | v0 | v1 | 0.142857 | 0.428571 | 0.428571 |
v0 | v2 | v0 | v2 | 0.320000 | 0.440000 | 0.240000 |
v0 | v2 | v0 | v3 | 0.285714 | 0.428571 | 0.285714 |
v0 | v2 | v0 | v4 | 0.500000 | 0.250000 | 0.250000 |
v0 | v2 | v1 | v0 | 0.476190 | 0.333333 | 0.190476 |
v0 | v2 | v1 | v1 | 0.333333 | 0.333333 | 0.333333 |
v0 | v2 | v1 | v2 | 0.307692 | 0.076923 | 0.615385 |
v0 | v2 | v1 | v3 | 0.312500 | 0.375000 | 0.312500 |
v0 | v2 | v1 | v4 | 0.562500 | 0.187500 | 0.250000 |
v1 | v0 | v0 | v0 | 0.333333 | 0.285714 | 0.380952 |
v1 | v0 | v0 | v1 | 0.222222 | 0.388889 | 0.388889 |
v1 | v0 | v0 | v2 | 0.235294 | 0.411765 | 0.352941 |
v1 | v0 | v0 | v3 | 0.333333 | 0.333333 | 0.333333 |
v1 | v0 | v0 | v4 | 0.470588 | 0.235294 | 0.294118 |
v1 | v0 | v1 | v0 | 0.368421 | 0.315789 | 0.315789 |
v1 | v0 | v1 | v1 | 0.461538 | 0.192308 | 0.346154 |
v1 | v0 | v1 | v2 | 0.263158 | 0.315789 | 0.421053 |
v1 | v0 | v1 | v3 | 0.411765 | 0.235294 | 0.352941 |
v1 | v0 | v1 | v4 | 0.315789 | 0.368421 | 0.315789 |
v1 | v1 | v0 | v0 | 0.304348 | 0.260870 | 0.434783 |
v1 | v1 | v0 | v1 | 0.454545 | 0.454545 | 0.090909 |
v1 | v1 | v0 | v2 | 0.300000 | 0.350000 | 0.350000 |
v1 | v1 | v0 | v3 | 0.454545 | 0.181818 | 0.363636 |
v1 | v1 | v0 | v4 | 0.214286 | 0.428571 | 0.357143 |
v1 | v1 | v1 | v0 | 0.285714 | 0.357143 | 0.357143 |
v1 | v1 | v1 | v1 | 0.235294 | 0.411765 | 0.352941 |
v1 | v1 | v1 | v2 | 0.315789 | 0.315789 | 0.368421 |
v1 | v1 | v1 | v3 | 0.363636 | 0.227273 | 0.409091 |
v1 | v1 | v1 | v4 | 0.222222 | 0.333333 | 0.444444 |
v1 | v2 | v0 | v0 | 0.272727 | 0.181818 | 0.545455 |
v1 | v2 | v0 | v1 | 0.190476 | 0.523810 | 0.285714 |
v1 | v2 | v0 | v2 | 0.391304 | 0.260870 | 0.347826 |
v1 | v2 | v0 | v3 | 0.428571 | 0.214286 | 0.357143 |
v1 | v2 | v0 | v4 | 0.333333 | 0.166667 | 0.500000 |
v1 | v2 | v1 | v0 | 0.235294 | 0.352941 | 0.411765 |
v1 | v2 | v1 | v1 | 0.166667 | 0.500000 | 0.333333 |
v1 | v2 | v1 | v2 | 0.300000 | 0.200000 | 0.500000 |
v1 | v2 | v1 | v3 | 0.263158 | 0.368421 | 0.368421 |
v1 | v2 | v1 | v4 | 0.400000 | 0.333333 | 0.266667 |
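The counting behind this table can also be sketched without pandas. The snippet below is only an illustration, not the notebook's code: the function name `estimate_cpt` and the toy records are made up for the example. It estimates \(P(\text{child} \mid \text{parents})\) as the number of records matching both the parent values and the child value, divided by the number of records matching the parent values, and falls back to 0.0 when a parent combination never occurs.

```python
import itertools
from collections import Counter


def estimate_cpt(records, parents, child, domains):
    """Estimate P(child | parents) from a list of dict records by counting."""
    pa_counts = Counter(tuple(r[p] for p in parents) for r in records)
    joint_counts = Counter(
        (tuple(r[p] for p in parents), r[child]) for r in records
    )
    cpt = {}
    # One row per parent combination, in the same order as itertools.product.
    for pa_vals in itertools.product(*(domains[p] for p in parents)):
        n_pa = pa_counts[pa_vals]
        cpt[pa_vals] = [
            joint_counts[(pa_vals, v)] / n_pa if n_pa > 0 else 0.0
            for v in domains[child]
        ]
    return cpt


# Hypothetical toy data: one parent 'x1' and one child 'x5'.
toy_domains = {'x1': ['v0', 'v1'], 'x5': ['v0', 'v1']}
toy_records = [
    {'x1': 'v0', 'x5': 'v0'},
    {'x1': 'v0', 'x5': 'v0'},
    {'x1': 'v0', 'x5': 'v1'},
    {'x1': 'v0', 'x5': 'v1'},
]
toy_cpt = estimate_cpt(toy_records, ['x1'], 'x5', toy_domains)
print(toy_cpt)
```

Note that a parent combination absent from the data produces a row of zeros, which is not a valid probability distribution.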
Finally, we can flatten the CPT as follows.
[71]:
cpt_x5 = np.ravel(cpt).tolist()
cpt_x5
[71]:
[0.4666666666666667,
0.26666666666666666,
0.26666666666666666,
0.2857142857142857,
0.42857142857142855,
0.2857142857142857,
0.5833333333333334,
0.16666666666666666,
0.25,
0.5384615384615384,
0.3076923076923077,
0.15384615384615385,
0.4,
0.0,
0.6,
0.5333333333333333,
0.3333333333333333,
0.13333333333333333,
0.4117647058823529,
0.11764705882352941,
0.47058823529411764,
0.35714285714285715,
0.35714285714285715,
0.2857142857142857,
0.43478260869565216,
0.17391304347826086,
0.391304347826087,
0.3684210526315789,
0.42105263157894735,
0.21052631578947367,
0.375,
0.375,
0.25,
0.26666666666666666,
0.4,
0.3333333333333333,
0.3125,
0.25,
0.4375,
0.4,
0.2,
0.4,
0.19047619047619047,
0.42857142857142855,
0.38095238095238093,
0.15789473684210525,
0.3684210526315789,
0.47368421052631576,
0.3076923076923077,
0.38461538461538464,
0.3076923076923077,
0.3888888888888889,
0.3333333333333333,
0.2777777777777778,
0.25,
0.375,
0.375,
0.2,
0.25,
0.55,
0.3333333333333333,
0.16666666666666666,
0.5,
0.14285714285714285,
0.42857142857142855,
0.42857142857142855,
0.32,
0.44,
0.24,
0.2857142857142857,
0.42857142857142855,
0.2857142857142857,
0.5,
0.25,
0.25,
0.47619047619047616,
0.3333333333333333,
0.19047619047619047,
0.3333333333333333,
0.3333333333333333,
0.3333333333333333,
0.3076923076923077,
0.07692307692307693,
0.6153846153846154,
0.3125,
0.375,
0.3125,
0.5625,
0.1875,
0.25,
0.3333333333333333,
0.2857142857142857,
0.38095238095238093,
0.2222222222222222,
0.3888888888888889,
0.3888888888888889,
0.23529411764705882,
0.4117647058823529,
0.35294117647058826,
0.3333333333333333,
0.3333333333333333,
0.3333333333333333,
0.47058823529411764,
0.23529411764705882,
0.29411764705882354,
0.3684210526315789,
0.3157894736842105,
0.3157894736842105,
0.46153846153846156,
0.19230769230769232,
0.34615384615384615,
0.2631578947368421,
0.3157894736842105,
0.42105263157894735,
0.4117647058823529,
0.23529411764705882,
0.35294117647058826,
0.3157894736842105,
0.3684210526315789,
0.3157894736842105,
0.30434782608695654,
0.2608695652173913,
0.43478260869565216,
0.45454545454545453,
0.45454545454545453,
0.09090909090909091,
0.3,
0.35,
0.35,
0.45454545454545453,
0.18181818181818182,
0.36363636363636365,
0.21428571428571427,
0.42857142857142855,
0.35714285714285715,
0.2857142857142857,
0.35714285714285715,
0.35714285714285715,
0.23529411764705882,
0.4117647058823529,
0.35294117647058826,
0.3157894736842105,
0.3157894736842105,
0.3684210526315789,
0.36363636363636365,
0.22727272727272727,
0.4090909090909091,
0.2222222222222222,
0.3333333333333333,
0.4444444444444444,
0.2727272727272727,
0.18181818181818182,
0.5454545454545454,
0.19047619047619047,
0.5238095238095238,
0.2857142857142857,
0.391304347826087,
0.2608695652173913,
0.34782608695652173,
0.42857142857142855,
0.21428571428571427,
0.35714285714285715,
0.3333333333333333,
0.16666666666666666,
0.5,
0.23529411764705882,
0.35294117647058826,
0.4117647058823529,
0.16666666666666666,
0.5,
0.3333333333333333,
0.3,
0.2,
0.5,
0.2631578947368421,
0.3684210526315789,
0.3684210526315789,
0.4,
0.3333333333333333,
0.26666666666666666]
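The flattened list is in row-major order: rows follow the `itertools.product` ordering of the parent values, and each row lists the probabilities of the child values. The sketch below shows how a position in the flat list can be computed from a parent assignment, assuming that ordering; the helper name `flat_index` is hypothetical and not part of py-bbn.

```python
# Domains taken from the example: x1, x2, x3, x4 are the parents of x5.
parent_domains = [
    ['v0', 'v1'],                      # x1
    ['v0', 'v1', 'v2'],                # x2
    ['v0', 'v1'],                      # x3
    ['v0', 'v1', 'v2', 'v3', 'v4'],    # x4
]
child_domain = ['v0', 'v1', 'v2']      # x5


def flat_index(parent_values, child_value):
    """Position of P(child_value | parent_values) in the row-major flat list."""
    row = 0
    for domain, value in zip(parent_domains, parent_values):
        row = row * len(domain) + domain.index(value)
    return row * len(child_domain) + child_domain.index(child_value)


first = flat_index(['v0', 'v0', 'v0', 'v0'], 'v0')
zero_entry = flat_index(['v0', 'v0', 'v0', 'v4'], 'v1')
last = flat_index(['v1', 'v2', 'v1', 'v4'], 'v2')
print(first, zero_entry, last)  # 0 13 179
```

Index 13 is the 0.0 entry in the list above, which matches the table row with x1, x2 and x3 at v0 and x4 at v4.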
For completeness, let’s compute the probabilities for the parents.
[80]:
def get_prob(n, d, df):
    N = df.shape[0]
    p = {v: df.query(f'{n}=="{v}"').shape[0] / N for v in d}
    return list(p.values())
cpt_x1 = get_prob('x1', domains['x1'], df)
cpt_x2 = get_prob('x2', domains['x2'], df)
cpt_x3 = get_prob('x3', domains['x3'], df)
cpt_x4 = get_prob('x4', domains['x4'], df)
[81]:
cpt_x1
[81]:
[0.489, 0.511]
[82]:
cpt_x2
[82]:
[0.34, 0.341, 0.319]
[83]:
cpt_x3
[83]:
[0.477, 0.523]
[84]:
cpt_x4
[84]:
[0.209, 0.197, 0.206, 0.191, 0.197]
Let’s build the BBN and join tree.
[93]:
x1 = BbnNode(Variable(0, 'x1', domains['x1']), cpt_x1)
x2 = BbnNode(Variable(1, 'x2', domains['x2']), cpt_x2)
x3 = BbnNode(Variable(2, 'x3', domains['x3']), cpt_x3)
x4 = BbnNode(Variable(3, 'x4', domains['x4']), cpt_x4)
x5 = BbnNode(Variable(4, 'x5', domains['x5']), cpt_x5)
bbn = Bbn() \
.add_node(x1) \
.add_node(x2) \
.add_node(x3) \
.add_node(x4) \
.add_node(x5) \
.add_edge(Edge(x1, x5, EdgeType.DIRECTED)) \
.add_edge(Edge(x2, x5, EdgeType.DIRECTED)) \
.add_edge(Edge(x3, x5, EdgeType.DIRECTED)) \
.add_edge(Edge(x4, x5, EdgeType.DIRECTED))
join_tree = InferenceController.apply(bbn)
Here are the marginal probabilities before any evidence is asserted.
[94]:
for node in join_tree.get_bbn_nodes():
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
1|x2|v0,v1,v2
1=v0|0.34000
1=v1|0.34100
1=v2|0.31900
---------------
2|x3|v0,v1
2=v0|0.47700
2=v1|0.52300
---------------
3|x4|v0,v1,v2,v3,v4
3=v0|0.20900
3=v1|0.19700
3=v2|0.20600
3=v3|0.19100
3=v4|0.19700
---------------
4|x5|v0,v1,v2
4=v0|0.33850
4=v1|0.30794
4=v2|0.35356
---------------
0|x1|v0,v1
0=v0|0.48900
0=v1|0.51100
---------------
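With no evidence, the marginal of a child whose parents are independent root nodes is the sum, over every parent combination, of the parents' probabilities times the child's conditional probability. The sketch below works this out for a hypothetical toy network (two binary parents `a` and `b` and a child `c`); the numbers are invented for illustration and are unrelated to the data above.

```python
import itertools

# Hypothetical toy numbers: two independent root parents and one child.
p_a = {'on': 0.5, 'off': 0.5}
p_b = {'on': 0.25, 'off': 0.75}
# P(c | a, b), keyed by (a, b); each row sums to 1.
p_c_given_ab = {
    ('on', 'on'): {'on': 0.9, 'off': 0.1},
    ('on', 'off'): {'on': 0.5, 'off': 0.5},
    ('off', 'on'): {'on': 0.5, 'off': 0.5},
    ('off', 'off'): {'on': 0.1, 'off': 0.9},
}

marginal_c = {'on': 0.0, 'off': 0.0}
for a, b in itertools.product(p_a, p_b):
    weight = p_a[a] * p_b[b]           # P(a) * P(b)
    for c, p in p_c_given_ab[(a, b)].items():
        marginal_c[c] += weight * p    # accumulate P(a) * P(b) * P(c | a, b)
print(marginal_c)
```

For these toy numbers the marginal comes out to about 0.4 for `on` and 0.6 for `off`.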
Let’s assert evidence and observe the posteriors.
[98]:
from pybbn.graph.jointree import EvidenceBuilder
ev1 = EvidenceBuilder() \
.with_node(join_tree.get_bbn_node_by_name('x1')) \
.with_evidence('v0', 1.0) \
.build()
join_tree.unobserve_all()
join_tree.update_evidences([ev1])
for node in join_tree.get_bbn_nodes():
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
1|x2|v0,v1,v2
1=v0|0.34000
1=v1|0.34100
1=v2|0.31900
---------------
2|x3|v0,v1
2=v0|0.47700
2=v1|0.52300
---------------
3|x4|v0,v1,v2,v3,v4
3=v0|0.20900
3=v1|0.19700
3=v2|0.20600
3=v3|0.19100
3=v4|0.19700
---------------
4|x5|v0,v1,v2
4=v0|0.36050
4=v1|0.29824
4=v2|0.34126
---------------
0|x1|v0,v1
0=v0|1.00000
0=v1|0.00000
---------------
Let’s assert evidence on several nodes at once. With every parent set to v0, the posterior for \(X_5\) should equal the CPT row for that parent combination.
[99]:
ev1 = EvidenceBuilder() \
.with_node(join_tree.get_bbn_node_by_name('x1')) \
.with_evidence('v0', 1.0) \
.build()
ev2 = EvidenceBuilder() \
.with_node(join_tree.get_bbn_node_by_name('x2')) \
.with_evidence('v0', 1.0) \
.build()
ev3 = EvidenceBuilder() \
.with_node(join_tree.get_bbn_node_by_name('x3')) \
.with_evidence('v0', 1.0) \
.build()
ev4 = EvidenceBuilder() \
.with_node(join_tree.get_bbn_node_by_name('x4')) \
.with_evidence('v0', 1.0) \
.build()
join_tree.unobserve_all()
join_tree.update_evidences([ev1, ev2, ev3, ev4])
for node in join_tree.get_bbn_nodes():
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
1|x2|v0,v1,v2
1=v0|1.00000
1=v1|0.00000
1=v2|0.00000
---------------
2|x3|v0,v1
2=v0|1.00000
2=v1|0.00000
---------------
3|x4|v0,v1,v2,v3,v4
3=v0|1.00000
3=v1|0.00000
3=v2|0.00000
3=v3|0.00000
3=v4|0.00000
---------------
4|x5|v0,v1,v2
4=v0|0.46667
4=v1|0.26667
4=v2|0.26667
---------------
0|x1|v0,v1
0=v0|1.00000
0=v1|0.00000
---------------
py-bbn
Subpackages
Graph
Variable
Variable.
- class pybbn.graph.variable.Variable(id, name, values)
Bases:
object
A variable.
- __init__(id, name, values)
Ctor.
- Parameters:
id – Numeric identifier. e.g. 0
name – Name. e.g. ‘a’
values – Array of values. e.g. [‘on’, ‘off’]
- to_dict()
Gets a JSON serializable dictionary representation.
- Returns:
Dictionary.
Node
Nodes. There are many types: nodes, cliques, belief network nodes and separation sets.
- class pybbn.graph.node.BbnNode(variable, probs)
Bases:
Node
A BBN node.
- get_weight()
Gets the weight, which is the number of values.
- Returns:
Weight.
- to_dict()
Gets a JSON serializable dictionary representation.
- Returns:
Dictionary.
- class pybbn.graph.node.Clique(nodes)
Bases:
Node
A clique.
- contains(id)
Checks if this clique contains the specified ID.
- Parameters:
id – Numeric id.
- Returns:
A boolean indicating if the specified id exists in this clique.
- get_node_ids()
Gets the node IDs in this clique.
- Returns:
An array of numeric ids of the nodes in this clique.
- get_sep_set(that)
Creates a separation-set from this node and the one passed in. The separation-set is composed of the intersection of the two cliques. If this node has [0, 1, 2] and the node passed in has [1, 2, 3], then the separation set will be [1, 2].
- Parameters:
that – Clique.
- Returns:
Separation-set.
- get_sid()
Gets the string ID of this clique.
- Returns:
String ID composed of the sorted corresponding variables in each node.
- get_weight()
Gets the weight of this clique; the weight is the product of the weights of the nodes in this clique.
- Returns:
Weight.
- intersects(that)
Gets intersection information.
- Parameters:
that – Clique.
- Returns:
Tuple where the first item is a boolean indicating if there is any intersection, the second item is the IDs in this clique, the third item is the IDs of that clique, and the last item is the IDs common to both cliques.
- is_marked()
Checks if this clique is marked.
- Returns:
A boolean indicating if the clique is marked.
- is_superset(that)
Checks if this clique is a superset of that clique.
- Parameters:
that – Clique.
- Returns:
A boolean indicating if this clique is a superset of the clique passed in.
- mark()
Marks this clique.
- unmark()
Unmarks this clique.
- class pybbn.graph.node.Node(id)
Bases:
object
A node.
- add_metadata(k, v)
Adds metadata.
- Parameters:
k – Key. Typically a string value.
v – Value. Any object.
- class pybbn.graph.node.SepSet(left, right, lhs=None, rhs=None, intersection=None)
Bases:
Clique
Separation-set.
- property cost
Gets the cost.
- Returns:
The cost.
- get_cost()
The cost is the sum of the weights of the cliques connected to this separation-set.
- Returns:
Cost.
- get_mass()
The mass is the number of nodes in this separation-set.
- Returns:
Mass.
- property is_empty
Checks if the cliques in this separation set have an empty intersection.
- Returns:
A boolean indicating if there is no intersection.
- property mass
Gets the mass.
- Returns:
The mass.
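The quantities described above (clique weight, separation-set contents, mass and cost) can be illustrated with plain Python. This is a sketch based only on the descriptions in this section, not py-bbn's implementation; the cliques and the helper `clique_weight` are hypothetical.

```python
from math import prod

# Hypothetical cliques: node id -> number of values of that node.
left = {0: 2, 1: 3, 2: 2}
right = {1: 3, 2: 2, 3: 5}


def clique_weight(clique):
    """Product of the node weights (number of values per node)."""
    return prod(clique.values())


shared_ids = sorted(set(left) & set(right))          # separation-set node ids
mass = len(shared_ids)                               # nodes in the separation-set
cost = clique_weight(left) + clique_weight(right)    # sum of the clique weights
print(shared_ids, mass, cost)  # [1, 2] 2 42
```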
Edge
Edges. There are two main types: undirected and directed. However, many other types exist as well.
- class pybbn.graph.edge.Edge(i, j, type)
Bases:
object
Edge.
- __init__(i, j, type)
Ctor.
- Parameters:
i – Node.
j – Node.
type – Edge type.
- property key
Key used for map.
- Returns:
Key.
- class pybbn.graph.edge.EdgeType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Edge type.
- DIRECTED = 2
- UNDIRECTED = 1
- class pybbn.graph.edge.JtEdge(sep_set)
Bases:
Edge
Junction tree edge. This is basically a hyper-edge.
- __init__(sep_set)
Ctor.
- Parameters:
sep_set – Separation set.
- get_lhs_edge()
Gets a JtEdge. e.g. left – sep_set.
- Returns:
JtEdge.
- get_rhs_edge()
Gets a JtEdge. e.g. right – sep_set.
- Returns:
JtEdge.
Graph
Basic graphs.
- class pybbn.graph.graph.Graph
Bases:
object
Graph.
- __init__()
Ctor.
- add_edge(edge)
Adds an edge.
- Parameters:
edge – Edge.
- Returns:
This graph.
- add_node(node)
Adds a node.
- Parameters:
node – Node.
- Returns:
This graph.
- edge_exists(id1, id2)
Checks if the specified edge id1 – id2 exists.
- Parameters:
id1 – Node id.
id2 – Node id.
- Returns:
A boolean indicating if the specified edge exists.
- get_edges()
Gets all the edges.
- Returns:
List of edges.
- get_neighbors(id)
Gets the neighbors of the specified node.
- Parameters:
id – Node id.
- Returns:
Set of neighbors of the specified node.
- get_node(id)
Gets the node associated with the specified id.
- Parameters:
id – Node id.
- Returns:
Node.
- get_nodes()
Gets all the nodes.
- Returns:
List of nodes.
- remove_node(id)
Removes a node from the graph.
- Parameters:
id – Node id.
Directed Acyclic Graph
Directed acyclic graphs.
- class pybbn.graph.dag.Bbn
Bases:
Dag
BBN.
- __init__()
Ctor.
- static from_csv(path)
Converts the BBN in CSV format to a BBN.
- Parameters:
path – Path to CSV file.
- Returns:
BBN.
- static from_dict(d)
Creates a BBN from a dictionary (deserialized JSON).
- Parameters:
d – Dictionary.
- Returns:
BBN.
- static from_json(path)
Deserializes BBN from JSON.
- Parameters:
path – Path.
- Returns:
BBN.
- get_parents_ordered(id)
Gets the parent IDs of the specified node, sorted.
- Parameters:
id – ID of node.
- Returns:
List of parent IDs sorted.
- static to_csv(bbn, path)
Converts the specified BBN to CSV format.
- Parameters:
bbn – BBN.
path – Path to file.
- Returns:
None.
- static to_dict(bbn)
Gets a JSON serializable dictionary representation.
- Parameters:
bbn – BBN.
- Returns:
Dictionary.
- static to_dne(bbn, bnet_name='network')
- static to_json(bbn, path)
Serializes BBN to JSON.
- Parameters:
bbn – BBN.
path – Path.
- Returns:
None.
- class pybbn.graph.dag.BbnUtil
Bases:
object
BBN utility.
- static get_huang_graph()
Gets the Huang reference BBN graph.
- Returns:
BBN.
- static get_simple()
Gets a simple BBN graph.
- Returns:
BBN.
- class pybbn.graph.dag.Dag
Bases:
Graph
Directed acyclic graph.
- __init__()
Ctor.
- edge_exists(id1, id2)
Checks if a directed edge exists between the specified ids. e.g. id1 -> id2
- Parameters:
id1 – Node id.
id2 – Node id.
- Returns:
A boolean indicating if a directed edge id1 -> id2 exists.
- get_children(node_id)
Gets the children IDs of the specified node.
- Parameters:
node_id – Node id.
- Returns:
Array of children ids.
- get_i2n()
Gets a map of node identifiers to names.
- Returns:
Dictionary.
- get_n2i()
Gets a map of node names to identifiers.
- Returns:
Dictionary.
- get_parents(id)
Gets the parent IDs of the specified node.
- Parameters:
id – Node id.
- Returns:
Array of parent ids.
- to_nx_graph()
Converts this DAG to a NX DiGraph for visualization.
- Returns:
A tuple, where the first item is the NX DiGraph and the second item is the node labels.
Partially Directed Acyclic Graph
Partially directed acyclic graphs.
- class pybbn.graph.pdag.PathDetector(graph, start, stop)
Bases:
object
Detects path between two nodes.
- __init__(graph, start, stop)
Ctor.
- Parameters:
graph – Pdag.
start – Start node id.
stop – Stop node id.
- exists()
Checks if a path exists.
- Returns:
True if a path exists, otherwise, false.
- class pybbn.graph.pdag.Pdag
Bases:
Graph
Partially directed acyclic graph.
- __init__()
Ctor.
- directed_edge_exists(id1, id2)
Checks if the specified edge id1 -> id2 exists.
- Parameters:
id1 – Node id.
id2 – Node id.
- Returns:
A boolean indicating if the edge exists.
- edge_exists(id1, id2)
Checks if the specified edge id1 – id2 exists.
- Parameters:
id1 – Node id.
id2 – Node id.
- Returns:
A boolean indicating if the edge exists.
- get_out_nodes(id)
Gets all the out nodes for the node with the specified id. Out nodes are all connected nodes that are not parents (do not have a directed arc into the specified node).
- Parameters:
id – Node id.
- Returns:
Array of out node ids.
- get_parents(id)
Gets the parents of the specified node id.
- Parameters:
id – Node id.
- Returns:
Array of parent ids.
Join Tree
Join trees or junction trees.
- class pybbn.graph.jointree.ChangeType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Change type.
- NONE = 1
- RETRACTION = 3
- UPDATE = 2
- class pybbn.graph.jointree.Evidence(node, type)
Bases:
object
Evidence.
- __init__(node, type)
Ctor.
- Parameters:
node – BBN node.
type – EvidenceType.
- add_value(value, likelihood)
Adds a value.
- Parameters:
value – Value.
likelihood – Likelihood.
- Returns:
This evidence.
- compare(potentials)
Compares this evidence with previous ones.
- Parameters:
potentials – Map of potentials.
- Returns:
The ChangeType from the comparison.
- validate()
Validates this evidence.
virtual evidence: each likelihood must be in the range [0, 1].
finding evidence: all likelihoods must be exactly 1.0 or 0.0.
observation evidence: exactly one likelihood is 1.0 and all others must be 0.0.
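The three validation rules can be written down as a small check. This is a sketch of the rules as stated above, not py-bbn's code: the function `is_valid_evidence` and the string labels for the evidence kinds are assumptions made for the example.

```python
def is_valid_evidence(kind, likelihoods):
    """Check likelihoods against the rule for 'virtual', 'finding' or 'observation'.

    likelihoods maps each value of the node to its likelihood.
    """
    values = list(likelihoods.values())
    if kind == 'virtual':
        # each likelihood must lie in [0, 1]
        return all(0.0 <= v <= 1.0 for v in values)
    if kind == 'finding':
        # every likelihood must be exactly 0.0 or 1.0
        return all(v in (0.0, 1.0) for v in values)
    if kind == 'observation':
        # exactly one likelihood is 1.0 and all others are 0.0
        return values.count(1.0) == 1 and all(v in (0.0, 1.0) for v in values)
    raise ValueError(f'unknown evidence kind: {kind}')


print(is_valid_evidence('virtual', {'on': 0.3, 'off': 0.9}))      # True
print(is_valid_evidence('finding', {'on': 1.0, 'off': 1.0}))      # True
print(is_valid_evidence('observation', {'on': 1.0, 'off': 1.0}))  # False
```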
- class pybbn.graph.jointree.EvidenceBuilder
Bases:
object
Evidence builder.
- __init__()
Ctor.
- build()
Builds an evidence.
- Returns:
Evidence.
- with_evidence(val, likelihood)
Adds evidence.
- Parameters:
val – Value.
likelihood – Likelihood.
- Returns:
Builder.
- with_node(node)
Adds a BBN node.
- Parameters:
node – BBN node.
- Returns:
Builder.
- with_type(type)
Adds the EvidenceType.
- Parameters:
type – EvidenceType.
- Returns:
Builder.
- class pybbn.graph.jointree.EvidenceType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Evidence type.
- FINDING = 2
- OBSERVATION = 3
- UNOBSERVE = 4
- VIRTUAL = 1
- class pybbn.graph.jointree.JoinTree
Bases:
Ug
Join tree.
- __init__()
Ctor.
- add_edge(edge)
Adds a JtEdge.
- Parameters:
edge – JtEdge.
- Returns:
This join tree.
- add_potential(clique, potential)
Adds a potential associated with the specified clique.
- Parameters:
clique – Clique.
potential – Potential.
- Returns:
This join tree.
- find_cliques_with_node_and_parents(id)
Finds all cliques in this junction tree having the specified node and its parents.
- Parameters:
id – Node id.
- Returns:
Array of cliques.
- static from_dict(d)
Converts a dictionary to a junction tree.
- Parameters:
d – Dictionary.
- Returns:
Junction tree.
- get_bbn_node(id)
Gets the BBN node associated with the specified id.
- Parameters:
id – Node id.
- Returns:
BBN node or None if no such node exists.
- get_bbn_node_and_parents()
Gets a map of nodes and their parents.
- Returns:
Map. Keys are node ID and values are list of nodes.
- get_bbn_node_by_name(name)
Gets the BBN node associated with the specified name.
- Parameters:
name – Node name.
- Returns:
BBN node or None if no such node exists.
- get_bbn_nodes()
Gets all the BBN nodes in this junction tree.
- Returns:
List of BBN nodes.
- get_bbn_potential(node)
Gets the potential associated with the specified BBN node.
- Parameters:
node – BBN node.
- Returns:
Potential.
- get_change_type(evidences)
Gets the change type associated with the specified list of evidences.
- Parameters:
evidences – List of evidences.
- Returns:
ChangeType.
- get_cliques()
Gets all the cliques in this junction tree.
- Returns:
Array of cliques.
- get_evidence(node, value)
Gets the evidence associated with the specified BBN node and value.
- Parameters:
node – BBN node.
value – Value.
- Returns:
Potential (the evidence).
- get_flattened_edges()
Gets all the edges “flattened” out. Since separation-sets are really hyper-edges, this method breaks separation-sets into two edges.
- Returns:
Array of edges.
- get_posteriors()
Gets the posterior for all nodes.
- Returns:
Map. Keys are node names; values are map of node values to posterior probabilities.
- get_sep_sets()
Gets all the separation sets in this junction tree.
- Returns:
Array of separation sets.
- get_unobserved_evidence(node)
Gets the unobserved evidences associated with the specified node.
- Parameters:
node – BBN node.
- Returns:
Evidence.
- set_listener(listener)
Sets the listener.
- Parameters:
listener – JoinTreeListener.
- set_observation(evidence)
Sets a single observation.
- Parameters:
evidence – Evidence.
- Returns:
This join tree.
- static to_dict(jt, bbn)
Converts a junction tree to a serializable dictionary.
- Parameters:
jt – Junction tree.
bbn – BBN.
- Returns:
Dictionary.
- unmark_cliques()
Unmarks the cliques.
- unobserve(nodes)
Unobserves a list of nodes.
- Parameters:
nodes – List of nodes.
- Returns:
This join tree.
- unobserve_all()
Unobserves all BBN nodes.
- Returns:
This join tree.
- update_bbn_cpts(cpts)
Updates the CPTs of the BBN nodes.
- Parameters:
cpts – Dictionary of CPTs. Keys are ids of BBN node and values are new CPTs.
- Returns:
None
- update_evidences(evidences)
Updates this join tree with the list of specified evidence.
- Parameters:
evidences – List of evidences.
- Returns:
This join tree.
- class pybbn.graph.jointree.JoinTreeListener
Bases:
object
Interface-like class used for listening to a join tree.
- evidence_retracted(join_tree)
Evidence is retracted.
- Parameters:
join_tree – Join tree.
- evidence_updated(join_tree)
Evidence is updated.
- Parameters:
join_tree – Join tree.
- class pybbn.graph.jointree.PathDetector(graph, start, stop)
Bases:
object
Detects path between two nodes.
- __init__(graph, start, stop)
Ctor.
- Parameters:
graph – Join tree.
start – Start node id.
stop – Stop node id.
- exists()
Checks if a path exists.
- Returns:
True if a path exists, otherwise, false.
Factory
Factories.
- class pybbn.graph.factory.Factory
Bases:
object
Factory to convert other API BBNs into py-bbn.
- static from_data(structure, df)
Creates a BBN.
- Parameters:
structure – A dictionary where keys are names of children and values are list of parent names.
df – A dataframe.
- Returns:
BBN.
- static from_libpgm_discrete_dictionary(d)
Converts a libpgm discrete network as specified by a dictionary into a py-bbn one. Look at https://pythonhosted.org/libpgm/unittestdict.html.
- Parameters:
d – A dictionary representing a libpgm discrete network.
- Returns:
py-bbn BBN.
- static from_libpgm_discrete_json(j)
Converts a libpgm discrete network as specified by a JSON string into a py-bbn one. Look at https://pythonhosted.org/libpgm/unittestdict.html.
- Parameters:
j – String representing JSON.
- Returns:
py-bbn BBN.
- static from_libpgm_discrete_object(bn)
Converts a libpgm discrete network object into a py-bbn one.
- Parameters:
bn – libpgm discrete BBN.
- Returns:
py-bbn BBN.
Potential
Potentials.
- class pybbn.graph.potential.Potential
Bases:
object
Potential.
- __init__()
Ctor.
- add_entry(entry)
Adds a PotentialEntry.
- Parameters:
entry – PotentialEntry.
- Returns:
This potential.
- get_matching_entries(entry)
Gets all potential entries matching the specified entry.
- Parameters:
entry – PotentialEntry.
- Returns:
Array of matching potential entries.
- static to_dict(potentials)
Converts potential to dictionary for easy validation.
- Parameters:
potentials – Potential.
- Returns:
Dictionary representation. Keys are entries and values are probabilities.
- class pybbn.graph.potential.PotentialEntry
Bases:
object
Potential entry.
- __init__()
Ctor.
- add(k, v)
Adds a node id and its value.
- Parameters:
k – Node id.
v – Value.
- Returns:
This potential entry.
- duplicate()
Duplicates this entry.
- Returns:
PotentialEntry.
- get_entry_keys()
Gets entry keys sorted.
- Returns:
List of tuples. The first element of each tuple is the id of the variable and the second element is the value of the variable.
- get_kv()
Gets key-value pair that may be used for storage in dictionary.
- Returns:
Key-value pair.
- matches(that)
Checks if this potential entry matches the specified one. The entries match when every key in the entry passed in, together with its associated value, matches this one.
- Parameters:
that – PotentialEntry.
- Returns:
A boolean indicating if this entry matches the specified one.
- class pybbn.graph.potential.PotentialUtil
Bases:
object
Potential util.
- static divide(numerator, denominator)
Divides two potentials.
- Parameters:
numerator – Potential.
denominator – Potential.
- Returns:
Potential.
- static get_cartesian_product(lists)
Gets the cartesian product of a list of lists of values. For example, if the list is
[ [‘on’, ‘off’], [‘on’, ‘off’] ]
then the result will be a list of the following
[ ‘on’, ‘on’]
[ ‘on’, ‘off’ ]
[ ‘off’, ‘on’ ]
[ ‘off’, ‘off’ ]
- Parameters:
lists – List of list of values.
- Returns:
Cartesian product of values.
- static get_potential(node, parents)
Gets the potential associated with the specified node and its parents.
- Parameters:
node – BBN node.
parents – Parents of the BBN node (that themselves are also BBN nodes).
- Returns:
Potential.
- static get_potential_from_nodes(nodes)
Gets a potential from a list of BBN nodes.
- Parameters:
nodes – Array of BBN nodes.
- Returns:
Potential.
- static is_zero(d)
Checks if the specified value is 0.0.
- Parameters:
d – Value.
- Returns:
A boolean indicating if the value is zero.
- static marginalize_for(join_tree, clique, nodes)
Marginalizes the specified clique’s potential over the specified nodes.
- Parameters:
join_tree – Join tree.
clique – Clique.
nodes – List of BBN nodes.
- Returns:
Potential.
- static merge(node, parents)
Merges the nodes into one array.
- Parameters:
node – BBN node.
parents – BBN parent nodes.
- Returns:
Array of BBN nodes.
- static multiply(bigger, smaller)
Multiplies two potentials. Order matters.
- Parameters:
bigger – Bigger potential.
smaller – Smaller potential.
- static normalize(potential)
Normalizes the potential (makes sure the values sum to 1.0).
- Parameters:
potential – Potential.
- Returns:
Potential.
- static pass_single_message(join_tree, x, s, y)
Single message pass from x – s – y (from x to s to y).
- Parameters:
join_tree – Join tree.
x – Clique.
s – Separation-set.
y – Clique.
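Two of the operations described above, the Cartesian product of value lists and marginalization of a potential, can be sketched with the standard library. The dictionary layout used for the potential and the helper `marginalize` are assumptions made for illustration, and the sketch assumes that the listed nodes are the ones kept while all others are summed out; it does not use py-bbn's classes.

```python
import itertools
from collections import defaultdict

# Cartesian product of a list of lists of values.
lists = [['on', 'off'], ['on', 'off']]
product_rows = [list(combo) for combo in itertools.product(*lists)]
print(product_rows)

# Hypothetical clique potential over nodes 0 and 1: keys are tuples of
# (node id, value) pairs and values are probabilities.
clique_potential = {
    ((0, 'on'), (1, 'on')): 0.25,
    ((0, 'on'), (1, 'off')): 0.25,
    ((0, 'off'), (1, 'on')): 0.125,
    ((0, 'off'), (1, 'off')): 0.375,
}


def marginalize(potential, keep_ids):
    """Sum out every node whose id is not in keep_ids."""
    result = defaultdict(float)
    for entry, prob in potential.items():
        key = tuple((i, v) for i, v in entry if i in keep_ids)
        result[key] += prob
    return dict(result)


marginal = marginalize(clique_potential, {1})
print(marginal)
```

Here the marginal over node 1 is 0.375 for `on` and 0.625 for `off`.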
Utilities
Utilities to make life easier.
Junction Tree Algorithm
Inference Control
Used in controlling exact inference.
- class pybbn.pptc.inferencecontroller.InferenceController
Bases:
JoinTreeListener
Inference controller.
- static apply(bbn)
Sets up the specified BBN for probability propagation in tree clusters (PPTC).
- Parameters:
bbn – BBN graph.
- Returns:
Join tree.
- static apply_from_serde(join_tree)
Applies propagation to a join tree from a deserialized join tree.
- Parameters:
join_tree – Join tree.
- Returns:
Join tree (the same one passed in).
- evidence_retracted(join_tree)
Evidence is retracted.
- Parameters:
join_tree – Join tree.
- evidence_updated(join_tree)
Evidence is updated.
- Parameters:
join_tree – Join tree.
- static reapply(join_tree, cpts)
Reapply propagation to join tree with new CPTs. The join tree structure is kept but the BBN node CPTs are updated. A new instance/copy of the join tree will be returned.
- Parameters:
join_tree – Join tree.
cpts – Dictionary of new CPTs. Keys are ids of nodes and values are new CPTs.
- Returns:
Join tree.
Potential Initialization
Used to initialize potentials.
Moralization
Moralization of a directed acyclic graph.
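As a rough illustration of what moralization does, the sketch below connects ("marries") every pair of parents that share a child and drops edge directions. It is a generic rendering of the technique, not py-bbn's moralizer; the function `moralize` and the parent-map input format are assumptions.

```python
from itertools import combinations


def moralize(parents):
    """Return the undirected edges of the moral graph of a DAG.

    parents maps each node to the list of its parents.
    """
    edges = set()
    for child, pas in parents.items():
        # Keep every parent-child link, dropping its direction.
        for pa in pas:
            edges.add(frozenset((pa, child)))
        # Marry the parents: connect every pair that shares this child.
        for u, v in combinations(pas, 2):
            edges.add(frozenset((u, v)))
    return edges


# Structure from the data example: x1..x4 are all parents of x5.
dag = {'x1': [], 'x2': [], 'x3': [], 'x4': [], 'x5': ['x1', 'x2', 'x3', 'x4']}
moral_edges = moralize(dag)
print(len(moral_edges))  # 10: four parent-child links plus six parent pairs
```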
Triangulation
Triangulates a moralized graph.
- class pybbn.pptc.triangulator.NodeClique(node, neighbors, weight, edges)
Bases:
object
Node clique.
- __init__(node, neighbors, weight, edges)
Ctor.
- Parameters:
node – BBN node.
neighbors – BBN nodes (neighbors).
weight – Weight.
edges – Edges.
- get_bbn_nodes()
Gets all the BBN nodes in this node clique.
- Returns:
Array of BBN nodes.
- class pybbn.pptc.triangulator.Triangulator
Bases:
object
Triangulator. Triangulates an undirected moralized graph and produces cliques in the process.
- static duplicate(g)
Duplicates an undirected graph.
- Parameters:
g – Undirected graph.
- Returns:
Undirected graph.
- static generate_cliques(m)
Generates a list of node cliques.
- Parameters:
m – Graph.
- Returns:
List of NodeCliques.
- static get_edges_to_add(n, m)
Gets edges to add.
- Parameters:
n – BBN node.
m – Graph.
- Returns:
Array of edges.
- static get_weight(n, m)
Gets the weight of a BBN node. The weight of a node is the product of its own weight and the weights of all its neighbors.
- Parameters:
n – BBN node.
m – Graph.
- Returns:
Weight.
- static is_subset(cliques, clique)
Checks if the specified clique is a subset of the specified list of cliques.
- Parameters:
cliques – List of cliques.
clique – Clique.
- Returns:
A boolean indicating if the clique is a subset.
- static select_node(m)
Selects a clique from the specified graph. Cliques are sorted by number of edges, weight, and id (asc).
- Parameters:
m – Graph.
- Returns:
Clique.
- static triangulate(m)
Triangulates the specified moralized graph.
- Parameters:
m – Moralized undirected graph.
- Returns:
Array of cliques.
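The node-selection criteria described above can be sketched as a sort key. The example assumes that "number of edges" means the number of edges that would have to be added between a node's neighbors, and that a node's own weight is its number of values; the graph, the value counts and the helper names are hypothetical, and this is not py-bbn's triangulator.

```python
from itertools import combinations
from math import prod

# Hypothetical moral graph: node id -> set of neighbor ids, plus values per node.
neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
n_values = {0: 2, 1: 3, 2: 2, 3: 5}


def edges_to_add(node):
    """Pairs of neighbors of node that are not yet connected to each other."""
    return [(u, v) for u, v in combinations(sorted(neighbors[node]), 2)
            if v not in neighbors[u]]


def weight(node):
    """Product of the node's number of values with those of its neighbors."""
    return n_values[node] * prod(n_values[n] for n in neighbors[node])


# Sort by edges to add, then weight, then id (all ascending).
ranked = sorted(neighbors, key=lambda n: (len(edges_to_add(n)), weight(n), n))
print(ranked)  # [3, 0, 1, 2]
```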
Transformation
Transforms the cliques found from triangulation into a junction tree.
- class pybbn.pptc.transformer.Transformer
Bases:
object
Transformer. Transforms a list of cliques into a join tree.
- static get_sep_sets(cliques)
Gets all pair-wise separation-sets.
- Parameters:
cliques – Array of cliques.
- Returns:
Array of separation sets sorted descendingly by mass followed by cost (asc) and id (asc).
- static transform(cliques)
Transforms the cliques into a join tree.
- Parameters:
cliques – List of cliques.
- Returns:
Join tree.
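The ordering of separation-sets described above (mass descending, then cost ascending, then id ascending) maps directly onto a composite sort key. The records below are hypothetical and only illustrate the ordering.

```python
# Hypothetical separation-sets described by id, mass and cost.
sep_sets = [
    {'id': 'a', 'mass': 1, 'cost': 10},
    {'id': 'b', 'mass': 2, 'cost': 30},
    {'id': 'c', 'mass': 2, 'cost': 12},
    {'id': 'd', 'mass': 2, 'cost': 12},
]

# Negate the mass so that larger masses come first; cost and id stay ascending.
ordered = sorted(sep_sets, key=lambda s: (-s['mass'], s['cost'], s['id']))
order_ids = [s['id'] for s in ordered]
print(order_ids)  # ['c', 'd', 'b', 'a']
```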
Initialization
Initializes a junction tree.
- class pybbn.pptc.initializer.Initializer
Bases:
object
Initializes the join tree.
- static get_clique(node, join_tree)
Gets the parent clique associated with the specified BBN node.
- Parameters:
node – BBN node.
join_tree – Join tree.
- Returns:
Parent clique.
- static initialize(join_tree)
Starts the initialization.
- Parameters:
join_tree – Join tree.
- Returns:
Join tree.
Propagation
Propagates evidences in a junction tree.
- class pybbn.pptc.propagator.Propagator
Bases:
object
Evidence propagator.
- static collect_evidence(join_tree, start)
Collects evidence.
- Parameters:
join_tree – Join tree.
start – Start clique.
- static distribute_evidence(join_tree, start)
Distributes evidence.
- Parameters:
join_tree – Join tree.
start – Start clique.
- static propagate(join_tree)
Propagates evidence.
- Parameters:
join_tree – Join tree.
- Returns:
Join tree.
Evidence Distribution
Distributes evidence.
- class pybbn.pptc.evidencedistributor.EvidenceDistributor(join_tree, start_clique)
Bases:
object
Evidence distributor. Passes messages using breadth-first search (BFS). Messages are passed outward from the start clique to the most remote cliques.
- __init__(join_tree, start_clique)
Ctor.
- Parameters:
join_tree – Join tree.
start_clique – Start clique.
- start()
Starts the evidence distribution.
Evidence Collection
Collects evidence.
- class pybbn.pptc.evidencecollector.EvidenceCollector(join_tree, start_clique)
Bases:
object
Evidence collector. Passes messages using depth-first search (DFS). Messages are passed from the most remote cliques back to the start clique.
- __init__(join_tree, start_clique)
Ctor.
- Parameters:
join_tree – Join tree.
start_clique – Start clique.
- start()
Starts the evidence collection.
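The two traversal orders can be sketched on a toy tree. This example only records which clique would send a message to which; it assumes a plain adjacency map between clique names and leaves out separation sets and potentials entirely. The function names are hypothetical.

```python
from collections import deque

# Hypothetical join tree as an adjacency map between clique names.
tree = {'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A'], 'D': ['B']}


def collect_order(tree, start):
    """DFS: a clique sends to its parent only after its own subtree is done."""
    messages = []

    def visit(clique, parent):
        for nbr in tree[clique]:
            if nbr != parent:
                visit(nbr, clique)
                messages.append((nbr, clique))  # child -> parent

    visit(start, None)
    return messages


def distribute_order(tree, start):
    """BFS: messages flow from the start clique outward, level by level."""
    messages, seen, queue = [], {start}, deque([start])
    while queue:
        clique = queue.popleft()
        for nbr in tree[clique]:
            if nbr not in seen:
                seen.add(nbr)
                messages.append((clique, nbr))  # parent -> child
                queue.append(nbr)
    return messages


collected = collect_order(tree, 'A')
distributed = distribute_order(tree, 'A')
```

Collection reaches the start clique last, and distribution leaves it first, so running collection followed by distribution passes exactly one message in each direction along every edge.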
Sampling
Use this module for sampling.
- class pybbn.sampling.sampling.LogicSampler(bbn)
Bases:
object
Logic sampling with rejection.
- __init__(bbn)
Ctor.
- Parameters:
bbn – BBN.
- get_samples(evidence={}, n_samples=100, seed=37)
Gets the samples.
- Parameters:
evidence – Evidence. Dictionary. Keys are ids and values are node values.
n_samples – Number of samples.
seed – Seed (default=37).
- Returns:
Samples.
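Logic sampling with rejection can be sketched in a few lines: draw every node in topological order from its conditional distribution, then discard any sample that disagrees with the evidence. The example below is a standalone illustration on a hypothetical two-node network. It keys evidence by node name rather than id, and it keeps drawing until `n_samples` samples are accepted, which may differ from how `LogicSampler` counts samples.

```python
import random

# Hypothetical two-node network a -> b with binary values.
p_a = {'on': 0.5, 'off': 0.5}
p_b_given_a = {'on': {'on': 0.9, 'off': 0.1}, 'off': {'on': 0.2, 'off': 0.8}}


def draw(dist, rng):
    """Draw one value from a {value: probability} dictionary."""
    r, cumulative = rng.random(), 0.0
    for value, p in dist.items():
        cumulative += p
        if r < cumulative:
            return value
    return value  # guard against floating-point round-off


def get_samples(evidence=None, n_samples=100, seed=37):
    """Forward-sample in topological order; reject samples that contradict the evidence."""
    evidence = evidence or {}
    rng = random.Random(seed)
    samples = []
    while len(samples) < n_samples:
        sample = {'a': draw(p_a, rng)}
        sample['b'] = draw(p_b_given_a[sample['a']], rng)
        if all(sample[k] == v for k, v in evidence.items()):
            samples.append(sample)
    return samples


samples = get_samples(evidence={'b': 'on'}, n_samples=200)
share_a_on = sum(s['a'] == 'on' for s in samples) / len(samples)
```

Here `share_a_on` estimates P(a = on | b = on), whose exact value for these tables is 0.45 / 0.55.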
- class pybbn.sampling.sampling.SortableNode(node_id, parent_ids)
Bases:
object
Sortable node.
- __init__(node_id, parent_ids)
Ctor.
- Parameters:
node_id – Node ID.
parent_ids – List of parent IDs.
- class pybbn.sampling.sampling.Table(node, parents=[])
Bases:
object
Table associating parent instantiations with cumulative distributions of node values.
- __init__(node, parents=[])
Ctor.
- Parameters:
node – BBN node.
parents – List of parent BBN nodes.
- get_value(prob, sample=None)
Gets the value associated with the specified probability.
- Parameters:
prob – Probability.
sample – Dictionary of variable-value sampled so far.
- Returns:
Value.
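The lookup described above amounts to inverting a cumulative distribution. The sketch below assumes the table is keyed by a tuple of parent values and that `sample` holds the values drawn so far; the `cpt`, `parents`, and `cumulative` names are hypothetical and do not mirror the internals of `Table`.

```python
from bisect import bisect_right
from itertools import accumulate

values = ['on', 'off']
# Hypothetical CPT for a node b with one parent a: P(b | a).
cpt = {('on',): [0.9, 0.1], ('off',): [0.2, 0.8]}
parents = ['a']

# Precompute a cumulative distribution for every parent instantiation.
cumulative = {key: list(accumulate(probs)) for key, probs in cpt.items()}


def get_value(prob, sample):
    """Map a uniform draw `prob` to a value, given the parent values sampled so far."""
    key = tuple(sample[p] for p in parents)
    index = bisect_right(cumulative[key], prob)
    return values[min(index, len(values) - 1)]


v1 = get_value(0.5, {'a': 'on'})
v2 = get_value(0.5, {'a': 'off'})
v3 = get_value(0.95, {'a': 'on'})
```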
- has_parents()
Checks if the node associated with this table has parents.
- Returns:
Boolean.
Generator
Use this package to create realistic Bayesian belief networks.
- pybbn.generator.bbngenerator.convert_for_drawing(bbn)
Converts a BBN to a networkx graph for drawing.
- Parameters:
bbn – BBN.
- Returns:
Directed acyclic graph.
- pybbn.generator.bbngenerator.convert_for_exact_inference(g, p)
Converts the graph and parameters to a BBN.
- Parameters:
g – Directed acyclic graph (DAG in the form of networkx).
p – Parameters.
- Returns:
BBN.
- pybbn.generator.bbngenerator.generate_bbn_to_file(n, file_path, bbn_type='singly', max_iter=10, max_values=2, max_alpha=10)
Generates a BBN and saves it to a file.
- Parameters:
n – Number of nodes.
file_path – File path. JSON and CSV supported. Export will be determined by path extension.
bbn_type – Type: singly or multi.
max_iter – Maximum iterations.
max_values – Maximum values.
max_alpha – Maximum alpha.
- Returns:
None.
- pybbn.generator.bbngenerator.generate_multi_bbn(n, max_iter=10, max_values=2, max_alpha=10)
Generates structure and parameters for a multi-connected BBN.
- Parameters:
n – Number of nodes.
max_iter – Maximum iterations.
max_values – Maximum values per node.
max_alpha – Maximum alpha per value (hyperparameters).
- Returns:
A tuple of structure and parameters.
- pybbn.generator.bbngenerator.generate_singly_bbn(n, max_iter=10, max_values=2, max_alpha=10)
Generates structure and parameters for a singly-connected BBN.
- Parameters:
n – Number of nodes.
max_iter – Maximum iterations.
max_values – Maximum values per node.
max_alpha – Maximum alpha per value (hyperparameters).
- Returns:
A tuple of structure and parameters.
- pybbn.generator.bbngenerator.to_json(g, params, pretty=False)
Serializes the graph to JSON.
- Parameters:
g – Graph.
params – Parameters.
pretty – Pretty-print serialization flag.
- Returns:
None.
Causality
Average Causal Effect
Use this package to compute the Average Causal Effect.
Gaussian Package
Inference
Use this module to do inference in Gaussian Bayesian Belief Networks.
- class pybbn.gaussian.inference.GaussianInference(H, M, E, meta={})
Bases:
object
Gaussian inference.
- property P
Gets the univariate parameters of each variable.
- Returns:
Dictionary. Keys are variable names. Values are tuples of (mean, variance).
- __init__(H, M, E, meta={})
Ctor.
- Parameters:
H – Headers.
M – Means.
E – Covariance matrix.
meta – Dictionary storing observations.
- do_inference(name, observation)
Performs inference. Simply calls the do_inferences method.
- Parameters:
name – Name of variable.
observation – Observation value.
- Returns:
GaussianInference.
- do_inferences(observations)
Performs inference.
Denote the following.
\(z\) as the variable observed
\(y\) as the set of other variables
- \(\mu\) as the vector of means
\(\mu_z\) as the partitioned \(\mu\) of length \(|z|\)
\(\mu_y\) as the partitioned \(\mu\) of length \(|y|\)
- \(\Sigma\) as the covariance matrix
\(\Sigma_{yz}\) as the partitioned \(\Sigma\) of \(|y|\) rows and \(|z|\) columns
\(\Sigma_{zz}\) as the partitioned \(\Sigma\) of \(|z|\) rows and \(|z|\) columns
\(\Sigma_{yy}\) as the partitioned \(\Sigma\) of \(|y|\) rows and \(|y|\) columns
If we observe evidence \(z_e\), then the new means \(\mu_y^{*}\) and covariance matrix \(\Sigma_y^{*}\) corresponding to \(y\) are computed as follows.
\(\mu_y^{*} = \mu_y + \Sigma_{yz} \Sigma_{zz}^{-1} (z_e - \mu_z)\)
\(\Sigma_y^{*} = \Sigma_{yy} - \Sigma_{yz} \Sigma_{zz}^{-1} \Sigma_{yz}^{T}\)
- Parameters:
observations – List of observations. Each observation is a tuple (name, value).
- Returns:
GaussianInference.
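As a worked example of Gaussian conditioning, the sketch below applies the standard identities \(\mu_y^{*} = \mu_y + \Sigma_{yz} \Sigma_{zz}^{-1} (z_e - \mu_z)\) and \(\Sigma_y^{*} = \Sigma_{yy} - \Sigma_{yz} \Sigma_{zz}^{-1} \Sigma_{yz}^{T}\) to a hypothetical three-variable model. It assumes a single observed variable, so \(\Sigma_{zz}\) is a scalar and its inverse is a division. It does not call `GaussianInference`, and all names and numbers are made up for illustration.

```python
# Hypothetical model over (y1, y2, z), where z is the observed variable.
mu_y = [1.0, 2.0]
mu_z = 0.0
sigma_yy = [[2.0, 0.5], [0.5, 1.0]]
sigma_yz = [1.0, 0.5]  # Cov(y_i, z)
sigma_zz = 2.0         # Var(z)


def condition(z_e):
    """Mean vector and covariance matrix of y after observing z = z_e."""
    n = len(mu_y)
    gain = [s / sigma_zz for s in sigma_yz]  # Sigma_yz * Sigma_zz^-1
    mean = [m + g * (z_e - mu_z) for m, g in zip(mu_y, gain)]
    cov = [[sigma_yy[i][j] - gain[i] * sigma_yz[j] for j in range(n)] for i in range(n)]
    return mean, cov


mean, cov = condition(2.0)
```

Observing z above its mean pulls both means upward because both covariances with z are positive, and every variance shrinks regardless of the observed value.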
- property marginals
Gets the marginals.
- Returns:
List of dictionaries. Each element has name, mean and variance.
- sample_marginals(size=1000)
Samples data from the marginals.
- Parameters:
size – Number of samples.
- Returns:
Dictionary with keys as names and values as pandas series (sampled data).
Copyright
Software
Copyright 2017 Jee Vang
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Art
Copyright 2020 Daytchia Vang
Citation
@misc{vang_2017,
title={PyBBN},
url={https://github.com/vangj/py-bbn/},
author={Vang, Jee},
year={2017},
month={Jan}}