py-bbn

End of Life: This version of py-bbn is no longer maintained. For a new version please go here.


py-bbn is a Python implementation of probabilistic and causal inference in Bayesian Belief Networks using exact inference algorithms [CGH97, Cow98, HD99, Kol09, Mur12].

You may install py-bbn from pypi.

pip install pybbn

If you like py-bbn, you might be interested in our next-generation products.

Rocket Vector is a CausalAI platform in the cloud.


Autonosis is a platform with GenAI and CausalAI capabilities.


pyspark-bbn is a scalable, massively parallel processing (MPP) framework for learning the structures and parameters of Bayesian Belief Networks (BBNs) using Apache Spark.


Please contact us at info@rocketvector.io. Let’s reach for success!

Probabilistic Inference

The probabilistic inference algorithm used by py-bbn is an exact inference algorithm. Let’s go through an example of how to conduct exact inference.

Huang Graph

Below is the code to create the Huang graph [HD99]. Note the typical procedure, which is as follows.

  • create a Bayesian Belief Network (BBN)

  • create a junction tree from the graph

  • assert evidence

  • print out the marginal probabilities

digraph { node [fixedsize=true, width=0.3, shape=circle, fontname="Helvetica-Outline", color=crimson, style=filled] A -> B A -> C B -> D C -> E D -> F E -> F C -> G E -> H G -> H }

Huang Bayesian Belief Network structure.

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

# create the nodes
a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.5, 0.5])
b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.5, 0.5, 0.4, 0.6])
c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8])
d = BbnNode(Variable(3, 'd', ['on', 'off']), [0.9, 0.1, 0.5, 0.5])
e = BbnNode(Variable(4, 'e', ['on', 'off']), [0.3, 0.7, 0.6, 0.4])
f = BbnNode(Variable(5, 'f', ['on', 'off']), [0.01, 0.99, 0.01, 0.99, 0.01, 0.99, 0.99, 0.01])
g = BbnNode(Variable(6, 'g', ['on', 'off']), [0.8, 0.2, 0.1, 0.9])
h = BbnNode(Variable(7, 'h', ['on', 'off']), [0.05, 0.95, 0.95, 0.05, 0.95, 0.05, 0.95, 0.05])

# create the network structure
bbn = Bbn() \
    .add_node(a) \
    .add_node(b) \
    .add_node(c) \
    .add_node(d) \
    .add_node(e) \
    .add_node(f) \
    .add_node(g) \
    .add_node(h) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED)) \
    .add_edge(Edge(a, c, EdgeType.DIRECTED)) \
    .add_edge(Edge(b, d, EdgeType.DIRECTED)) \
    .add_edge(Edge(c, e, EdgeType.DIRECTED)) \
    .add_edge(Edge(d, f, EdgeType.DIRECTED)) \
    .add_edge(Edge(e, f, EdgeType.DIRECTED)) \
    .add_edge(Edge(c, g, EdgeType.DIRECTED)) \
    .add_edge(Edge(e, h, EdgeType.DIRECTED)) \
    .add_edge(Edge(g, h, EdgeType.DIRECTED))

# convert the BBN to a join tree
join_tree = InferenceController.apply(bbn)

# insert an observation evidence
ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('a')) \
    .with_evidence('on', 1.0) \
    .build()
join_tree.set_observation(ev)

# print the posterior probabilities
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

A Bayesian Belief Network (BBN) is defined as a pair, (G, P), where

  • G is a directed acyclic graph (DAG)

  • P is a joint probability distribution

  • and G satisfies the Markov Condition (each node is conditionally independent of its non-descendants given its parents)

Ideally, the API should force the user to define G and P separately. However, there is a bit of cognitive friction with this API, since we define the nodes together with their local probability models (conditional probability tables) first, and the structure afterwards. But this approach seems a bit more concise, no?

Updating Conditional Probability Tables

Sometimes, you may want to preserve the join tree structure and just update the conditional probability tables (CPTs). Here’s how to do so.

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import EdgeType, Edge
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

# you have built a BBN
a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))

# you have built a junction tree from the BBN
# let's call this "original" junction tree the left-hand side (lhs) junction tree
lhs_jt = InferenceController.apply(bbn)

# you may just update the CPTs with the original junction tree structure
# the algorithm to find/build the junction tree is avoided
# the CPTs are updated
rhs_jt = InferenceController.reapply(lhs_jt, {0: [0.3, 0.7], 1: [0.2, 0.8, 0.8, 0.2]})

# let's print out the marginal probabilities and see how things changed
# print the marginal probabilities for the lhs junction tree
print('lhs probabilities')
# print the posterior probabilities
for node, posteriors in lhs_jt.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

# print the marginal probabilities for the rhs junction tree
print('rhs probabilities')
for node, posteriors in rhs_jt.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')

Note that we use InferenceController.reapply(...) to apply the new CPTs to a previous junction tree, and that we get a new junction tree as the output.
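
Since the output of reapply(...) is an ordinary junction tree, you can assert evidence on it and read its posteriors as before. Below is a minimal sketch continuing the example above; the evidence value 't' is just illustrative.

from pybbn.graph.jointree import EvidenceBuilder

# assert evidence on the updated junction tree and print its posteriors
ev = EvidenceBuilder() \
    .with_node(rhs_jt.get_bbn_node_by_name('a')) \
    .with_evidence('t', 1.0) \
    .build()
rhs_jt.set_observation(ev)

for node, posteriors in rhs_jt.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')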

Gaussian Inference

Inference on a Gaussian Bayesian Network (GBN) is accomplished through updating the means and covariance matrix incrementally [CGH97]. The following GBN comes from [Cow98].

digraph { node [fixedsize=true, width=0.3, shape=circle, fontname="Helvetica-Outline", color=crimson, style=filled] Y -> X X -> Z }

Cowell GBN structure.

The variables come from the following Gaussian distributions.

  • \(Y = \mathcal{N}(0, 1)\)

  • \(X = \mathcal{N}(Y, 1)\)

  • \(Z = \mathcal{N}(X, 1)\)
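
For reference, the incremental updates performed here are the standard formulas for conditioning a multivariate Gaussian. Splitting the variables into an unobserved block 1 and an observed block 2 with observed values \(x_2\), the updated mean and covariance are

\(\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)\)

\(\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}\)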

Below is a code sample of how we can perform inference on this GBN.

import numpy as np

from pybbn.gaussian.inference import GaussianInference


def get_cowell_data():
    """
    Gets Cowell data.

    :return: Data and headers.
    """
    n = 10000
    Y = np.random.normal(0, 1, n)
    X = np.random.normal(Y, 1, n)
    Z = np.random.normal(X, 1, n)

    D = np.vstack([Y, X, Z]).T
    return D, ['Y', 'X', 'Z']


# assume we have data and headers (variable names per column)
# X is the data (rows are observations, columns are variables)
# H is just a list of variable names
X, H = get_cowell_data()

# then we can compute the means and covariance matrix easily
M = X.mean(axis=0)
E = np.cov(X.T)

# the means and covariance matrix are all we need for gaussian inference
# notice how we keep `g` around?
# we'll use `g` over and over to do inference with evidence/observations
g = GaussianInference(H, M, E)
# {'Y': (0.00967, 0.98414), 'X': (0.01836, 2.02482), 'Z': (0.02373, 3.00646)}
print(g.P)

# we can make a single observation with do_inference()
g1 = g.do_inference('X', 1.5)
# {'X': (1.5, 0), 'Y': (0.76331, 0.49519), 'Z': (1.51893, 1.00406)}
print(g1.P)

# we can make multiple observations with do_inferences()
g2 = g.do_inferences([('Z', 1.5), ('X', 2.0)])
# {'Z': (1.5, 0), 'X': (2.0, 0), 'Y': (1.00770, 0.49509)}
print(g2.P)

Causal Inference

Average Causal Effect

Here’s how you may estimate the Average Causal Effect (ACE) using Pearl’s do-operator [Pea88, Pea00, Pea16, Pea18]. In this example, we want to estimate the ACE of drug on recovery where recovery is true.

digraph { node [fixedsize=true, width=0.3, shape=circle, fontname="Helvetica-Outline", color=crimson, style=filled] Z -> X Z -> Y X -> Y }

Z is confounding X and Y.
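
In this graph, Z satisfies the backdoor criterion for the effect of X (drug) on Y (recovery), so the do-operator reduces to the standard adjustment formula

\(P(Y = y \mid do(X = x)) = \sum_z P(Y = y \mid X = x, Z = z) P(Z = z)\)

and the ACE is the difference of this quantity evaluated at drug=true and drug=false (for recovery=true).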

from pybbn.causality.ace import Ace
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

# create a BBN
gender_probs = [0.49, 0.51]
drug_probs = [0.23323615160349853, 0.7667638483965015,
              0.7563025210084033, 0.24369747899159663]
recovery_probs = [0.31000000000000005, 0.69,
                  0.27, 0.73,
                  0.13, 0.87,
                  0.06999999999999995, 0.93]

X = BbnNode(Variable(1, 'drug', ['false', 'true']), drug_probs)
Y = BbnNode(Variable(2, 'recovery', ['false', 'true']), recovery_probs)
Z = BbnNode(Variable(0, 'gender', ['female', 'male']), gender_probs)

bbn = Bbn() \
    .add_node(X) \
    .add_node(Y) \
    .add_node(Z) \
    .add_edge(Edge(Z, X, EdgeType.DIRECTED)) \
    .add_edge(Edge(Z, Y, EdgeType.DIRECTED)) \
    .add_edge(Edge(X, Y, EdgeType.DIRECTED))

# compute the ACE
ace = Ace(bbn)
results = ace.get_ace('drug', 'recovery', 'true')
t = results['true']
f = results['false']
average_causal_impact = t - f

Serialization/Deserialization

We all need a way to save (serialize) and load (deserialize) our Bayesian Belief Networks (BBNs) and join trees (JTs). Here’s how to do so. Note that the serde (serialization/deserialization) features simply write to JSON or CSV formats and load back from such files. The code takes care of the serde process.

Serializing a BBN

JSON Serialization Format

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

# create graph
a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))

# serialize
Bbn.to_json(bbn, 'simple-bbn.json')

You will get a file simple-bbn.json written out with the following content.

{
  "nodes": {
    "0": {
      "probs": [
        0.2,
        0.8
      ],
      "variable": {
        "id": 0,
        "name": "a",
        "values": [
          "t",
          "f"
        ]
      }
    },
    "1": {
      "probs": [
        0.1,
        0.9,
        0.9,
        0.1
      ],
      "variable": {
        "id": 1,
        "name": "b",
        "values": [
          "t",
          "f"
        ]
      }
    }
  },
  "edges": [
    {
      "pa": 0,
      "ch": 1
    }
  ]
}

CSV Serialization Format

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

# create graph
a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))

# serialize
Bbn.to_csv(bbn, 'simple-bbn.csv')

You will get a file simple-bbn.csv written out with the following content.

0,a,t,f,|,0.2,0.8
1,b,t,f,|,0.1,0.9,0.9,0.1
0,1,directed

Deserializing a BBN

JSON Deserialization Format

from pybbn.graph.dag import Bbn

# deserialize
bbn = Bbn.from_json('simple-bbn.json')

CSV Deserialization Format

from pybbn.graph.dag import Bbn

# deserialize
bbn = Bbn.from_csv('simple-bbn.csv')

Join Tree Serde

A join tree may also be serialized and deserialized. Only the JSON format is supported for now.

Serializing a Join Tree

import json

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import EdgeType, Edge
from pybbn.graph.jointree import JoinTree
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

a = BbnNode(Variable(0, 'a', ['t', 'f']), [0.2, 0.8])
b = BbnNode(Variable(1, 'b', ['t', 'f']), [0.1, 0.9, 0.9, 0.1])
bbn = Bbn().add_node(a).add_node(b) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED))
jt = InferenceController.apply(bbn)

with open('simple-join-tree.json', 'w') as f:
    d = JoinTree.to_dict(jt, bbn)
    j = json.dumps(d, sort_keys=True, indent=2)
    f.write(j)

You will get a file simple-join-tree.json written out with the following content.

{
  "bbn_nodes": {
    "0": {
      "probs": [
        0.2,
        0.8
      ],
      "variable": {
        "id": 0,
        "name": "a",
        "values": [
          "t",
          "f"
        ]
      }
    },
    "1": {
      "probs": [
        0.1,
        0.9,
        0.9,
        0.1
      ],
      "variable": {
        "id": 1,
        "name": "b",
        "values": [
          "t",
          "f"
        ]
      }
    }
  },
  "jt": {
    "edges": [],
    "nodes": {
      "0-1": {
        "node_ids": [
          0,
          1
        ],
        "type": "clique"
      }
    },
    "parent_info": {
      "0": [],
      "1": [
        0
      ]
    }
  }
}

Deserializing a Join Tree

import json

from pybbn.graph.jointree import JoinTree
from pybbn.pptc.inferencecontroller import InferenceController

with open('simple-join-tree.json', 'r') as f:
    j = f.read()
    d = json.loads(j)
    jt = JoinTree.from_dict(d)
    jt = InferenceController.apply_from_serde(jt)

Generating Bayesian Belief Networks

Let’s generate some Bayesian Belief Networks (BBNs). The algorithms are taken from Random Generation of Bayesian Networks [IC02]. There are two types of BBNs you may generate.

  • singly-connected

  • multi-connected

A singly-connected BBN is one where, ignoring the direction of the edges, there is at most one path between any two nodes. A multi-connected BBN is one that is not singly-connected.

digraph { node [fixedsize=true, width=0.3, shape=circle, fontname="Helvetica-Outline", color=crimson, style=filled] A -> C B -> C C -> D C -> E E -> F }

Singly-connected network structure.

digraph { node [fixedsize=true, width=0.3, shape=circle, fontname="Helvetica-Outline", color=crimson, style=filled] A -> C B -> C C -> D C -> E D -> F E -> F }

Multi-connected network structure. There are two paths between C and F: (C, D, F) and (C, E, F).
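
If you want to check which kind of structure you have, one simple test (not part of py-bbn) is whether the undirected skeleton of the DAG contains a cycle. Below is a minimal sketch using networkx, with the edge lists taken from the two figures above.

import networkx as nx

def is_singly_connected(edges):
    # ignore edge directions; a singly-connected structure has a forest as its skeleton
    skeleton = nx.Graph()
    skeleton.add_edges_from(edges)
    return nx.is_forest(skeleton)

singly = [('A', 'C'), ('B', 'C'), ('C', 'D'), ('C', 'E'), ('E', 'F')]
multi = [('A', 'C'), ('B', 'C'), ('C', 'D'), ('C', 'E'), ('D', 'F'), ('E', 'F')]

print(is_singly_connected(singly))  # True
print(is_singly_connected(multi))   # False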

Singly-Connected

The key method to use here is generate_singly_bbn.

import numpy as np

from pybbn.generator.bbngenerator import generate_singly_bbn, convert_for_exact_inference, convert_for_drawing

# very important to set the seed for reproducible results
np.random.seed(37)

# this method generates the graph, g, and probabilities, p
# note we are generating a singly-connected graph
g, p = generate_singly_bbn(5, max_iter=5)

# you have to convert g and p to a BBN
bbn = convert_for_exact_inference(g, p)

# you can convert the BBN to a nx graph for visualization
nx_graph = convert_for_drawing(bbn)

Multi-Connected

The key method to use here is generate_multi_bbn.

import numpy as np

from pybbn.generator.bbngenerator import generate_multi_bbn, convert_for_exact_inference, convert_for_drawing

# very important to set the seed for reproducible results
np.random.seed(37)

# this method generates the graph, g, and probabilities, p
# note we are generating a multi-connected graph
g, p = generate_multi_bbn(5, max_iter=5)

# you have to convert g and p to a BBN
bbn = convert_for_exact_inference(g, p)

# you can convert the BBN to a nx graph for visualization
nx_graph = convert_for_drawing(bbn)

Direct Generation

In the case where you do NOT need a reference to the BBN objects, use the API’s convenience method to generate and serialize the BBN directly to file.

import numpy as np

from pybbn.generator.bbngenerator import generate_bbn_to_file

# set the seed for reproducibility
np.random.seed(37)

# generate a singly-connected BBN
generate_bbn_to_file(n=10, file_path='singly-bbn.csv', bbn_type='singly', max_alpha=10)

# generate a multi-connected BBN
generate_bbn_to_file(n=10, file_path='multi-bbn.csv', bbn_type='multi', max_alpha=10)

Here’s the output for singly-bbn.csv.

0,0,state0,state1,|,0.5495149877004699,0.4504850122995299
1,1,state0,state1,|,0.35835359558290997,0.64164640441709,0.8660444980250707,0.13395550197492936
2,2,state0,state1,|,0.5828348518985648,0.4171651481014352,0.6352808281847757,0.3647191718152243
3,3,state0,state1,|,0.43155247482552955,0.5684475251744704,0.05744110250902426,0.9425588974909757,0.44585399607259946,0.5541460039274007,0.286749915005319,0.713250084994681
4,4,state0,state1,|,0.3190576398549361,0.6809423601450639,0.011424133320075755,0.9885758666799241
5,5,state0,state1,|,0.48207371043602226,0.5179262895639779,0.07147107402394111,0.9285289259760588
6,6,state0,state1,|,0.2076134466833406,0.7923865533166594,0.44542849473036455,0.5545715052696354
7,7,state0,state1,|,0.757560101942848,0.242439898057152
8,8,state0,state1,|,0.1906328058926942,0.8093671941073058,0.2814000588799281,0.7185999411200719
9,9,state0,state1,|,0.7854793106243432,0.2145206893756569,0.12392098364527641,0.8760790163547235
0,1,directed
1,2,directed
2,3,directed
3,4,directed
3,8,directed
5,6,directed
5,3,directed
7,5,directed
8,9,directed

Here’s the output for multi-bbn.csv.

0,0,state0,state1,|,0.680874572938313,0.319125427061687
1,1,state0,state1,|,0.7617263477727293,0.23827365222727065,0.3117227721913154,0.6882772278086846
2,2,state0,state1,|,0.12614472921860395,0.8738552707813961,0.7070911105993563,0.29290888940064375
3,3,state0,state1,|,0.4055587320025024,0.5944412679974977,0.9624106996627307,0.037589300337269156
4,4,state0,state1,|,0.31986562609614827,0.6801343739038517,0.022365118374575416,0.9776348816254246
5,5,state0,state1,|,0.77366174354673,0.2263382564532701,0.8579513677510221,0.1420486322489778,0.3183725110598738,0.6816274889401261,0.04262514631905535,0.9573748536809447
6,6,state0,state1,|,0.05830032685169777,0.9416996731483022,0.5840685338695271,0.41593146613047294,0.7078930065265004,0.29210699347349944,0.490562272424676,0.509437727575324
7,7,state0,state1,|,0.7569425298012309,0.243057470198769,0.6536654079476188,0.3463345920523811,0.6299885487124776,0.3700114512875224,0.4929042112083024,0.5070957887916976
8,8,state0,state1,|,0.3295640257593744,0.6704359742406256,0.9098731919901998,0.09012680800980029
9,9,state0,state1,|,0.7804943261233692,0.21950567387663072,0.43963638923803844,0.5603636107619615,0.03168532379450399,0.968314676205496,0.7189237718440259,0.28107622815597405,0.356320337335263,0.643679662664737,0.8089559692517324,0.19104403074826756,0.520364955519572,0.47963504448042804,0.3989706528653481,0.601029347134652
0,1,directed
0,9,directed
0,5,directed
1,2,directed
2,3,directed
3,4,directed
4,5,directed
4,6,directed
4,7,directed
5,6,directed
6,7,directed
6,9,directed
7,8,directed
8,9,directed
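
These generated files follow the same CSV layout shown in the serialization section (node rows, then edge rows), so they can be loaded back with Bbn.from_csv and used for inference. A short sketch follows.

from pybbn.graph.dag import Bbn
from pybbn.pptc.inferencecontroller import InferenceController

# load the generated BBN and build a junction tree from it
bbn = Bbn.from_csv('singly-bbn.csv')
join_tree = InferenceController.apply(bbn)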

Sampling Data

Sampling data from a BBN is possible. The algorithm uses logic sampling with rejection [Hen88].

Simple Sampling

This code demonstrates simple sampling.

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.sampling.sampling import LogicSampler

a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.5, 0.5])
b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.5, 0.5, 0.4, 0.6])
c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8])

bbn = Bbn() \
    .add_node(a) \
    .add_node(b) \
    .add_node(c) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED)) \
    .add_edge(Edge(b, c, EdgeType.DIRECTED))

sampler = LogicSampler(bbn)
samples = sampler.get_samples(n_samples=10000, seed=37)
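
The draws are plain Python records, so they are easy to inspect. The sketch below estimates the marginal of node 0 ('a') from the samples; it assumes each sample is a dictionary keyed by node id, which is worth verifying against your installed version.

import pandas as pd

# assuming each sample is a dict keyed by node id, e.g. {0: 'on', 1: 'off', 2: 'off'}
samples_df = pd.DataFrame(samples)

# empirical marginal of node 0 ('a'); should be close to 0.5/0.5
print(samples_df[0].value_counts(normalize=True))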

Sampling with Rejection

This code demonstrates sampling with evidence asserted. During each round of sampling, if the sample value generated does not match the evidence, the entire sample is discarded.

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.sampling.sampling import LogicSampler

a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.5, 0.5])
b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.5, 0.5, 0.4, 0.6])
c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8])

bbn = Bbn() \
    .add_node(a) \
    .add_node(b) \
    .add_node(c) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED)) \
    .add_edge(Edge(b, c, EdgeType.DIRECTED))

sampler = LogicSampler(bbn)
samples = sampler.get_samples(evidence={0: 'on'}, n_samples=10000, seed=37)
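
To make the rejection step concrete, here is a standalone sketch (independent of py-bbn) of logic sampling with rejection for the small a -> b -> c network above: nodes are sampled in topological order, and a draw is discarded whenever a node with evidence does not come out at its observed value.

import random

random.seed(37)

# CPTs for a -> b -> c, matching the example above: probability of 'on' given the parent's state
P_A_ON = 0.5
P_B_ON = {'on': 0.5, 'off': 0.4}  # P(b=on | a)
P_C_ON = {'on': 0.7, 'off': 0.2}  # P(c=on | b)

def draw(p_on):
    return 'on' if random.random() < p_on else 'off'

def logic_sample(evidence, n_samples):
    kept = []
    while len(kept) < n_samples:
        a = draw(P_A_ON)
        b = draw(P_B_ON[a])
        c = draw(P_C_ON[b])
        sample = {'a': a, 'b': b, 'c': c}
        # rejection: throw away the whole sample if it contradicts the evidence
        if all(sample[k] == v for k, v in evidence.items()):
            kept.append(sample)
    return kept

samples = logic_sample({'a': 'on'}, 10000)
print(sum(1 for s in samples if s['b'] == 'on') / len(samples))  # approximately P(b=on | a=on) = 0.5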

Create BBN with structure and data

If you know the BBN structure and have data, you can create a BBN using the structure and learn the parameters from the data. For now, the parameters are simply estimated from the raw counts (not Bayesian). The method to use is Factory.from_data().

[1]:
import pandas as pd
from pybbn.graph.factory import Factory

df = pd.read_csv('./data/data-from-structure.csv')
structure = {
    'a': [],
    'b': ['a'],
    'c': ['b']
}

bbn = Factory.from_data(structure, df)
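
If you do not have data-from-structure.csv at hand, you can build a compatible DataFrame yourself. The sketch below is an assumption about what such data might look like (it is not the file shipped with py-bbn): it simulates an a -> b -> c chain of binary on/off variables and feeds it to Factory.from_data with the same structure dictionary.

import numpy as np
import pandas as pd

np.random.seed(37)
n = 10_000

# simulate a -> b -> c with binary on/off variables
a = np.random.choice(['on', 'off'], p=[0.5, 0.5], size=n)
b = np.array([np.random.choice(['on', 'off'], p=[0.5, 0.5] if v == 'on' else [0.4, 0.6]) for v in a])
c = np.array([np.random.choice(['on', 'off'], p=[0.7, 0.3] if v == 'on' else [0.2, 0.8]) for v in b])

df = pd.DataFrame({'a': a, 'b': b, 'c': c})
bbn = Factory.from_data(structure, df)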

As usual, after you acquire a BBN, you can perform inference using an InferenceController.

[2]:
from pybbn.pptc.inferencecontroller import InferenceController

join_tree = InferenceController.apply(bbn)

for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')
b : off=0.55020, on=0.44980
c : off=0.57210, on=0.42790
a : off=0.49850, on=0.50150
[3]:
import networkx as nx

n, d = bbn.to_nx_graph()
nx.draw(n, with_labels=True, labels=d, node_color='r', alpha=0.5)
The BBN structure drawn with networkx.

Exact Inference with Widgets

Here, we show a very simple example of how to observe the marginal posterior probabilities of each node given the state of one of them. We will use the Huang graph [HD99].

Simulate data

[1]:
%matplotlib inline
from pybbn.graph.dag import BbnUtil
from pybbn.graph.jointree import EvidenceBuilder, EvidenceType
from pybbn.pptc.inferencecontroller import InferenceController
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from collections import namedtuple

np.random.seed(37)
plt.style.use('ggplot')
Marginal = namedtuple('Marginal', 'name, s')

def potential_to_series(p):
    vals = []
    index = []

    for pe in p.entries:
        try:
            v = pe.entries.values()[0]
        except:
            v = list(pe.entries.values())[0]
        p = pe.value

        vals.append(p)
        index.append(v)

    return pd.Series(vals, index=index)

def get_marginals(join_tree):
    data = []
    for node in join_tree.get_bbn_nodes():
        name = node.variable.name
        s = potential_to_series(join_tree.get_bbn_potential(node))
        t = Marginal(name, s)
        data.append(t)
    return data

# get the pre-defined huang graph
bbn = BbnUtil.get_huang_graph()

# convert the BBN to a join tree
join_tree = InferenceController.apply(bbn)

Visualize

[2]:
import math
from ipywidgets import interact

@interact(a=[('unobserved', -1), ('off', 0), ('on', 1)])
def f(a=-1):
    n_cols = 4
    n_rows = math.ceil(len(bbn.get_nodes()) / n_cols)

    if a == -1:
        join_tree.unobserve_all()
        marginals = get_marginals(join_tree)
    else:
        v = 'on' if a == 1 else 'off'
        ev = EvidenceBuilder() \
            .with_node(join_tree.get_bbn_node_by_name('a')) \
            .with_evidence(v, 1.0) \
            .build()
        join_tree.unobserve_all()
        join_tree.set_observation(ev)
        marginals = get_marginals(join_tree)

    marginals = sorted(marginals, key=lambda tup: tup[0])

    fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, 5), sharey=True)

    for m, ax in zip(marginals, np.ravel(axes)):
        m.s.plot(kind='bar', legend=False, ax=ax)
        ax.set_title(m.name)
        ax.set_ylim([0.0, 1.0])
        ax.set_xlabel('')

    plt.tight_layout()

Multivariate Gaussian Inference with Widgets

This notebook shows how to do multivariate Gaussian inference with widgets. We allow one variable to change and visualize how the distributions of the others change. We will be using the Cowell graph [Cow98].

Simulate data

[1]:
%matplotlib inline
import numpy as np
from pybbn.gaussian.inference import GaussianInference
import matplotlib.pyplot as plt

np.random.seed(37)
plt.style.use('ggplot')
plt.rcParams['axes.grid'] = False

def get_cowell_data():
    n = 10000
    Y = np.random.normal(0, 1, n)
    X = np.random.normal(Y, 1, n)
    Z = np.random.normal(X, 1, n)

    D = np.vstack([Y, X, Z]).T
    return D, ['Y', 'X', 'Z']

def get_mvn():
    X, H = get_cowell_data()

    M = X.mean(axis=0)
    E = np.cov(X.T)

    g = GaussianInference(H, M, E)
    return g

g = get_mvn()
[2]:
import pandas as pd

pd.DataFrame(g.marginals)
[2]:
name mean var
0 Y -0.001723 0.990700
1 X 0.007448 2.016406
2 Z 0.002459 3.033838

Visualize

[3]:
from ipywidgets import interact

samples1 = g.sample_marginals(size=10000)

@interact(x=(-5, 5, 1))
def f(x=None):
    if x is not None:
        gg = g.do_inference('X', x)
    else:
        gg = g

    samples2 = gg.sample_marginals(size=5000)

    fig, axes = plt.subplots(1, 3, figsize=(15, 3), sharey=False)
    axes = np.ravel(axes)

    kind = 'hist'
    alpha = 0.15
    for (name, s2), ax in zip(samples2.items(), axes):
        if name == 'X':
            ax2 = ax.twinx()
            _ = samples1[name].plot(kind=kind, ax=ax2, color='blue', alpha=alpha)
            _ = ax.axvline(x=x, color='red')
            _ = ax2.set_ylabel('')
        else:
            ax2 = ax.twinx()
            _ = samples1[name].plot(kind=kind, ax=ax, color='blue', alpha=alpha)
            _ = s2.plot(kind=kind, ax=ax)
            _ = s2.plot(kind='kde', ax=ax2, color='green')
            _ = ax2.set_ylabel('')

        _ = ax.set_title(f'{name}')
        _ = ax.set_ylabel('')

    plt.tight_layout()

Bibliography

[CGH97]

E. Castillo, J.M. Gutierrez, and A.S. Hadi. Expert Systems and Probabilistic Network Models. Springer, 1997.

[Cow98]

R.G. Cowell. Advanced inference in Bayesian networks. In M.I. Jordan, editor, Learning in Graphical Models. A Bradford Book, 1998.

[Hen88]

M. Henrion. Propagating uncertainty in Bayesian networks by probabilistic logic sampling. Uncertainty in Artificial Intelligence, 1988.

[HD99]

C. Huang and A. Darwiche. Inference in belief networks: a procedural guide. International Journal of Approximate Reasoning, 1999.

[IC02]

J.S. Ide and F.G. Cozman. Random generation of Bayesian networks. Advances in Artificial Intelligence, 2002.

[Kol09]

D. Koller. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[Mur12]

K.P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.

[Pea88]

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[Pea00]

J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.

[Pea16]

J. Pearl. Causal Inference in Statistics - A Primer. Wiley, 2016.

[Pea18]

J. Pearl. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.

Edge Ordering

Edge ordering makes a difference in the join tree and potentials produced. Let’s take the BBN network structure below, where all nodes are binary with the values on and off.

a -> c <- b

Note how c has 2 parents, a and b. The potential (or conditional probability table, CPT) for c is specified as a list of probabilities as follows.

[0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4]

Let’s say that this list of probabilities represents the CPT below.

|       |       | c=on | c=off |
|-------|-------|------|-------|
| a=on  | b=on  | 0.7  | 0.3   |
| a=on  | b=off | 0.2  | 0.8   |
| a=off | b=on  | 0.6  | 0.4   |
| a=off | b=off | 0.6  | 0.4   |
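
To make this mapping explicit, the flat list is read row by row over the parent value combinations (in the order the parents were added: a first, then b), with the child's values varying fastest. Here is a small sketch of that unpacking (it does not use py-bbn internals); the printed rows reproduce the table above.

import itertools

probs = [0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4]
parents = {'a': ['on', 'off'], 'b': ['on', 'off']}
child_values = ['on', 'off']

# each parent combination consumes len(child_values) entries from the flat list
for i, combo in enumerate(itertools.product(*parents.values())):
    start = i * len(child_values)
    row = dict(zip(child_values, probs[start:start + len(child_values)]))
    print(dict(zip(parents, combo)), row)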

When we define a BBN structure (be it programmatically in code/Python or declaratively in JSON), we should define and add the edge a -> c to the graph before the edge b -> c. Below is the code where we do so.

[1]:
from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable

def get_bbn1():
    a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.2, 0.8])
    b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.8, 0.2])
    c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4])

    bbn = Bbn() \
        .add_node(a) \
        .add_node(b) \
        .add_node(c) \
        .add_edge(Edge(a, c, EdgeType.DIRECTED)) \
        .add_edge(Edge(b, c, EdgeType.DIRECTED))

    return bbn

When we add the edge b -> c to the network structure before a -> c, then the induced CPT for c will be as follows. This second CPT for c is not the same as the first one!

|       |       | c=on | c=off |
|-------|-------|------|-------|
| b=on  | a=on  | 0.7  | 0.3   |
| b=on  | a=off | 0.2  | 0.8   |
| b=off | a=on  | 0.6  | 0.4   |
| b=off | a=off | 0.6  | 0.4   |

Here is the code for creating a BBN where we add b -> c before a -> c.

[2]:
def get_bbn2():
    a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.2, 0.8])
    b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.8, 0.2])
    c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8, 0.6, 0.4, 0.6, 0.4])

    bbn = Bbn() \
        .add_node(a) \
        .add_node(b) \
        .add_node(c) \
        .add_edge(Edge(b, c, EdgeType.DIRECTED)) \
        .add_edge(Edge(a, c, EdgeType.DIRECTED))

    return bbn

Although the networks are the same in both cases (regardless of the order in which we add the edges), the induced parameters are NOT; they are sensitive to the order in which the edges are added. Now, let’s compare the posteriors of these 2 BBNs.

[3]:
from pybbn.pptc.inferencecontroller import InferenceController

b1 = get_bbn1()
b2 = get_bbn2()

j1 = InferenceController.apply(b1)
j2 = InferenceController.apply(b2)

Here are the posteriors for the first BBN. Note that the id-to-name mapping, as defined above, is as follows.

  • 0: a

  • 1: b

  • 2: c

So keep an eye on id 2.

[4]:
for node in j1.get_bbn_nodes():
    potential = j1.get_bbn_potential(node)
    print(potential)
    print('-' * 10)
1=on|0.80000
1=off|0.20000
----------
2=on|0.60000
2=off|0.40000
----------
0=on|0.20000
0=off|0.80000
----------

Here are the posteriors for the second BBN.

[5]:
for node in j2.get_bbn_nodes():
    potential = j2.get_bbn_potential(node)
    print(potential)
    print('-' * 10)
1=on|0.80000
1=off|0.20000
----------
2=on|0.36000
2=off|0.64000
----------
0=on|0.20000
0=off|0.80000
----------

For now, there is no workaround for this issue of logically identical BBN specifications producing different potentials as a result of edge insertion order. Just make sure you are aware of it and careful.

Simple Example

Let’s say you have a DAG with 5 variables: \(X_1, X_2, X_3, X_4, X_5\) and the structure represented by an edge list is as follows.

  • \(X_1 \rightarrow X_5\)

  • \(X_2 \rightarrow X_5\)

  • \(X_3 \rightarrow X_5\)

  • \(X_4 \rightarrow X_5\)

The domain (set of values) of each variable is as follows.

  • \(X_1 \in \{v_1, v_2\}\)

  • \(X_2 \in \{v_1, v_2, v_3\}\)

  • \(X_3 \in \{v_1, v_2\}\)

  • \(X_4 \in \{v_1, v_2, v_3, v_4, v_5\}\)

  • \(X_5 \in \{v_1, v_2, v_3\}\)

The question is, how do we build the parameters for \(X_5\)?

Let’s create some fake data.

[63]:
import pandas as pd
import numpy as np
import random

np.random.seed(37)
random.seed(37)

def get_data(n_values, n_samples):
    return [f'v{v}' for v in np.random.randint(0, n_values, n_samples)]

N = 1_000
df = pd.DataFrame({
    'x1': get_data(2, N),
    'x2': get_data(3, N),
    'x3': get_data(2, N),
    'x4': get_data(5, N),
    'x5': get_data(3, N)
})

df.shape
[63]:
(1000, 5)
[64]:
df.head()
[64]:
x1 x2 x3 x4 x5
0 v1 v2 v0 v4 v2
1 v1 v0 v1 v3 v0
2 v0 v1 v1 v2 v2
3 v1 v2 v0 v1 v2
4 v0 v2 v1 v1 v2

Now, let’s verify the domains.

[65]:
def get_domains(df):
    return {c: sorted(list(df[c].unique())) for c in df.columns}

domains = get_domains(df)
domains
[65]:
{'x1': ['v0', 'v1'],
 'x2': ['v0', 'v1', 'v2'],
 'x3': ['v0', 'v1'],
 'x4': ['v0', 'v1', 'v2', 'v3', 'v4'],
 'x5': ['v0', 'v1', 'v2']}

You want to create a conditional probability table (CPT) for \(X_5\) that looks like the following. Note that we simply show you the CPT shape and not the actual values. We will go into computing the actual values in a bit.

[66]:
import itertools

pas = [c for c in domains if c != 'x5']

cpt = (domains[pa] for pa in pas)
cpt = itertools.product(*cpt)
cpt = pd.DataFrame(cpt, columns=pas) \
    .assign(**{v: 0.0 for v in domains['x5']}) \
    .set_index(pas)
cpt
[66]:
v0 v1 v2
x1 x2 x3 x4
v0 v0 v0 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v2 v0 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 v0 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v2 v0 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0
v1 v0 0.0 0.0 0.0
v1 0.0 0.0 0.0
v2 0.0 0.0 0.0
v3 0.0 0.0 0.0
v4 0.0 0.0 0.0

Ok, so how do we create the CPT for \(X_5\) given the data? Here’s some code to demonstrate how to build the CPT.

[68]:
def get_cond_probs(q, d, pas, ch, df):
    def safe_divide(num, den):
        try:
            return num / den
        except ZeroDivisionError:
            return 0.0

    p_b = df.query(q).shape[0]

    cp_pa = {f'x{i+1}': p for i, p in enumerate(pas)}
    cp_ch = {v: safe_divide(df.query(f'{q} and {ch}=="{v}"').shape[0], p_b) for v in d}
    cp = {**cp_pa, **cp_ch}

    return cp

pa_values = list(itertools.product(*(domains[pa] for pa in pas)))

queries = map(lambda tup: [f'x{i+1}=="{v}"' for i, v in enumerate(tup)], pa_values)
queries = map(lambda arr: ' and '.join(arr), queries)
queries = list(queries)

cpt = pd.DataFrame((get_cond_probs(q, domains['x5'], pas, 'x5', df) for q, pas in zip(queries, pa_values))) \
    .set_index(['x1', 'x2', 'x3', 'x4'])
cpt
[68]:
v0 v1 v2
x1 x2 x3 x4
v0 v0 v0 v0 0.466667 0.266667 0.266667
v1 0.285714 0.428571 0.285714
v2 0.583333 0.166667 0.250000
v3 0.538462 0.307692 0.153846
v4 0.400000 0.000000 0.600000
v1 v0 0.533333 0.333333 0.133333
v1 0.411765 0.117647 0.470588
v2 0.357143 0.357143 0.285714
v3 0.434783 0.173913 0.391304
v4 0.368421 0.421053 0.210526
v1 v0 v0 0.375000 0.375000 0.250000
v1 0.266667 0.400000 0.333333
v2 0.312500 0.250000 0.437500
v3 0.400000 0.200000 0.400000
v4 0.190476 0.428571 0.380952
v1 v0 0.157895 0.368421 0.473684
v1 0.307692 0.384615 0.307692
v2 0.388889 0.333333 0.277778
v3 0.250000 0.375000 0.375000
v4 0.200000 0.250000 0.550000
v2 v0 v0 0.333333 0.166667 0.500000
v1 0.142857 0.428571 0.428571
v2 0.320000 0.440000 0.240000
v3 0.285714 0.428571 0.285714
v4 0.500000 0.250000 0.250000
v1 v0 0.476190 0.333333 0.190476
v1 0.333333 0.333333 0.333333
v2 0.307692 0.076923 0.615385
v3 0.312500 0.375000 0.312500
v4 0.562500 0.187500 0.250000
v1 v0 v0 v0 0.333333 0.285714 0.380952
v1 0.222222 0.388889 0.388889
v2 0.235294 0.411765 0.352941
v3 0.333333 0.333333 0.333333
v4 0.470588 0.235294 0.294118
v1 v0 0.368421 0.315789 0.315789
v1 0.461538 0.192308 0.346154
v2 0.263158 0.315789 0.421053
v3 0.411765 0.235294 0.352941
v4 0.315789 0.368421 0.315789
v1 v0 v0 0.304348 0.260870 0.434783
v1 0.454545 0.454545 0.090909
v2 0.300000 0.350000 0.350000
v3 0.454545 0.181818 0.363636
v4 0.214286 0.428571 0.357143
v1 v0 0.285714 0.357143 0.357143
v1 0.235294 0.411765 0.352941
v2 0.315789 0.315789 0.368421
v3 0.363636 0.227273 0.409091
v4 0.222222 0.333333 0.444444
v2 v0 v0 0.272727 0.181818 0.545455
v1 0.190476 0.523810 0.285714
v2 0.391304 0.260870 0.347826
v3 0.428571 0.214286 0.357143
v4 0.333333 0.166667 0.500000
v1 v0 0.235294 0.352941 0.411765
v1 0.166667 0.500000 0.333333
v2 0.300000 0.200000 0.500000
v3 0.263158 0.368421 0.368421
v4 0.400000 0.333333 0.266667

Finally, we can flatten the CPT as follows.

[71]:
cpt_x5 = np.ravel(cpt).tolist()
cpt_x5
[71]:
[0.4666666666666667,
 0.26666666666666666,
 0.26666666666666666,
 0.2857142857142857,
 0.42857142857142855,
 0.2857142857142857,
 0.5833333333333334,
 0.16666666666666666,
 0.25,
 0.5384615384615384,
 0.3076923076923077,
 0.15384615384615385,
 0.4,
 0.0,
 0.6,
 0.5333333333333333,
 0.3333333333333333,
 0.13333333333333333,
 0.4117647058823529,
 0.11764705882352941,
 0.47058823529411764,
 0.35714285714285715,
 0.35714285714285715,
 0.2857142857142857,
 0.43478260869565216,
 0.17391304347826086,
 0.391304347826087,
 0.3684210526315789,
 0.42105263157894735,
 0.21052631578947367,
 0.375,
 0.375,
 0.25,
 0.26666666666666666,
 0.4,
 0.3333333333333333,
 0.3125,
 0.25,
 0.4375,
 0.4,
 0.2,
 0.4,
 0.19047619047619047,
 0.42857142857142855,
 0.38095238095238093,
 0.15789473684210525,
 0.3684210526315789,
 0.47368421052631576,
 0.3076923076923077,
 0.38461538461538464,
 0.3076923076923077,
 0.3888888888888889,
 0.3333333333333333,
 0.2777777777777778,
 0.25,
 0.375,
 0.375,
 0.2,
 0.25,
 0.55,
 0.3333333333333333,
 0.16666666666666666,
 0.5,
 0.14285714285714285,
 0.42857142857142855,
 0.42857142857142855,
 0.32,
 0.44,
 0.24,
 0.2857142857142857,
 0.42857142857142855,
 0.2857142857142857,
 0.5,
 0.25,
 0.25,
 0.47619047619047616,
 0.3333333333333333,
 0.19047619047619047,
 0.3333333333333333,
 0.3333333333333333,
 0.3333333333333333,
 0.3076923076923077,
 0.07692307692307693,
 0.6153846153846154,
 0.3125,
 0.375,
 0.3125,
 0.5625,
 0.1875,
 0.25,
 0.3333333333333333,
 0.2857142857142857,
 0.38095238095238093,
 0.2222222222222222,
 0.3888888888888889,
 0.3888888888888889,
 0.23529411764705882,
 0.4117647058823529,
 0.35294117647058826,
 0.3333333333333333,
 0.3333333333333333,
 0.3333333333333333,
 0.47058823529411764,
 0.23529411764705882,
 0.29411764705882354,
 0.3684210526315789,
 0.3157894736842105,
 0.3157894736842105,
 0.46153846153846156,
 0.19230769230769232,
 0.34615384615384615,
 0.2631578947368421,
 0.3157894736842105,
 0.42105263157894735,
 0.4117647058823529,
 0.23529411764705882,
 0.35294117647058826,
 0.3157894736842105,
 0.3684210526315789,
 0.3157894736842105,
 0.30434782608695654,
 0.2608695652173913,
 0.43478260869565216,
 0.45454545454545453,
 0.45454545454545453,
 0.09090909090909091,
 0.3,
 0.35,
 0.35,
 0.45454545454545453,
 0.18181818181818182,
 0.36363636363636365,
 0.21428571428571427,
 0.42857142857142855,
 0.35714285714285715,
 0.2857142857142857,
 0.35714285714285715,
 0.35714285714285715,
 0.23529411764705882,
 0.4117647058823529,
 0.35294117647058826,
 0.3157894736842105,
 0.3157894736842105,
 0.3684210526315789,
 0.36363636363636365,
 0.22727272727272727,
 0.4090909090909091,
 0.2222222222222222,
 0.3333333333333333,
 0.4444444444444444,
 0.2727272727272727,
 0.18181818181818182,
 0.5454545454545454,
 0.19047619047619047,
 0.5238095238095238,
 0.2857142857142857,
 0.391304347826087,
 0.2608695652173913,
 0.34782608695652173,
 0.42857142857142855,
 0.21428571428571427,
 0.35714285714285715,
 0.3333333333333333,
 0.16666666666666666,
 0.5,
 0.23529411764705882,
 0.35294117647058826,
 0.4117647058823529,
 0.16666666666666666,
 0.5,
 0.3333333333333333,
 0.3,
 0.2,
 0.5,
 0.2631578947368421,
 0.3684210526315789,
 0.3684210526315789,
 0.4,
 0.3333333333333333,
 0.26666666666666666]

For completeness, let’s compute the probabilities for the parents.

[80]:
def get_prob(n, d, df):
    N = df.shape[0]
    p = {v: df.query(f'{n}=="{v}"').shape[0] / N for v in d}
    return list(p.values())

cpt_x1 = get_prob('x1', domains['x1'], df)
cpt_x2 = get_prob('x2', domains['x2'], df)
cpt_x3 = get_prob('x3', domains['x3'], df)
cpt_x4 = get_prob('x4', domains['x4'], df)
[81]:
cpt_x1
[81]:
[0.489, 0.511]
[82]:
cpt_x2
[82]:
[0.34, 0.341, 0.319]
[83]:
cpt_x3
[83]:
[0.477, 0.523]
[84]:
cpt_x4
[84]:
[0.209, 0.197, 0.206, 0.191, 0.197]

Let’s build the BBN and join tree.

[93]:
x1 = BbnNode(Variable(0, 'x1', domains['x1']), cpt_x1)
x2 = BbnNode(Variable(1, 'x2', domains['x2']), cpt_x2)
x3 = BbnNode(Variable(2, 'x3', domains['x3']), cpt_x3)
x4 = BbnNode(Variable(3, 'x4', domains['x4']), cpt_x4)
x5 = BbnNode(Variable(4, 'x5', domains['x5']), cpt_x5)

bbn = Bbn() \
    .add_node(x1) \
    .add_node(x2) \
    .add_node(x3) \
    .add_node(x4) \
    .add_node(x5) \
    .add_edge(Edge(x1, x5, EdgeType.DIRECTED)) \
    .add_edge(Edge(x2, x5, EdgeType.DIRECTED)) \
    .add_edge(Edge(x3, x5, EdgeType.DIRECTED)) \
    .add_edge(Edge(x4, x5, EdgeType.DIRECTED))

join_tree = InferenceController.apply(bbn)

Here are the posteriors.

[94]:
for node in join_tree.get_bbn_nodes():
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
1|x2|v0,v1,v2
1=v0|0.34000
1=v1|0.34100
1=v2|0.31900
---------------
2|x3|v0,v1
2=v0|0.47700
2=v1|0.52300
---------------
3|x4|v0,v1,v2,v3,v4
3=v0|0.20900
3=v1|0.19700
3=v2|0.20600
3=v3|0.19100
3=v4|0.19700
---------------
4|x5|v0,v1,v2
4=v0|0.33850
4=v1|0.30794
4=v2|0.35356
---------------
0|x1|v0,v1
0=v0|0.48900
0=v1|0.51100
---------------

Let’s assert evidence and observe the posteriors.

[98]:
from pybbn.graph.jointree import EvidenceBuilder

ev1 = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('x1')) \
    .with_evidence('v0', 1.0) \
    .build()

join_tree.unobserve_all()
join_tree.update_evidences([ev1])

for node in join_tree.get_bbn_nodes():
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
1|x2|v0,v1,v2
1=v0|0.34000
1=v1|0.34100
1=v2|0.31900
---------------
2|x3|v0,v1
2=v0|0.47700
2=v1|0.52300
---------------
3|x4|v0,v1,v2,v3,v4
3=v0|0.20900
3=v1|0.19700
3=v2|0.20600
3=v3|0.19100
3=v4|0.19700
---------------
4|x5|v0,v1,v2
4=v0|0.36050
4=v1|0.29824
4=v2|0.34126
---------------
0|x1|v0,v1
0=v0|1.00000
0=v1|0.00000
---------------

Let’s assert multiple evidences. The posterior for \(X_5\) should be the same as the CPT when all parents are set to v0.

[99]:
ev1 = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('x1')) \
    .with_evidence('v0', 1.0) \
    .build()
ev2 = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('x2')) \
    .with_evidence('v0', 1.0) \
    .build()
ev3 = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('x3')) \
    .with_evidence('v0', 1.0) \
    .build()
ev4 = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('x4')) \
    .with_evidence('v0', 1.0) \
    .build()

join_tree.unobserve_all()
join_tree.update_evidences([ev1, ev2, ev3, ev4])

for node in join_tree.get_bbn_nodes():
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
    print('-' * 15)
1|x2|v0,v1,v2
1=v0|1.00000
1=v1|0.00000
1=v2|0.00000
---------------
2|x3|v0,v1
2=v0|1.00000
2=v1|0.00000
---------------
3|x4|v0,v1,v2,v3,v4
3=v0|1.00000
3=v1|0.00000
3=v2|0.00000
3=v3|0.00000
3=v4|0.00000
---------------
4|x5|v0,v1,v2
4=v0|0.46667
4=v1|0.26667
4=v2|0.26667
---------------
0|x1|v0,v1
0=v0|1.00000
0=v1|0.00000
---------------

py-bbn

Subpackages

Graph

Variable

Variable.

class pybbn.graph.variable.Variable(id, name, values)

Bases: object

A variable.

__init__(id, name, values)

Ctor.

Parameters:
  • id – Numeric identifier. e.g. 0

  • name – Name. e.g. ‘a’

  • values – Array of values. e.g. [‘on’, ‘off’]

to_dict()

Gets a JSON serializable dictionary representation.

Returns:

Dictionary.

Node

Nodes. There are many types: nodes, cliques, belief network nodes and separation sets.

class pybbn.graph.node.BbnNode(variable, probs)

Bases: Node

A BBN node.

get_weight()

Gets the weight, which is the number of values.

Returns:

Weight.

to_dict()

Gets a JSON serializable dictionary representation.

Returns:

Dictionary.

class pybbn.graph.node.Clique(nodes)

Bases: Node

A clique.

contains(id)

Checks if this clique contains the specified ID.

Parameters:

id – Numeric id.

Returns:

A boolean indicating if the specified id exists in this clique.

get_node_ids()

Gets the node IDs in this clique.

Returns:

An array of numeric ids of the nodes in this clique.

get_sep_set(that)

Creates a separation-set from this node and the one passed in. The separation-set is composed of the intersection of the two cliques. If this node has [0, 1, 2] and the node passed in has [1, 2, 3], then the separation set will be [1, 2].

Parameters:

that – Clique.

Returns:

Separation-set.

get_sid()

Gets the string ID of this clique.

Returns:

String ID composed of the sorted corresponding variables in each node.

get_weight()

Gets the weight of this clique; the weight is product of the weights of the nodes in this clique.

Returns:

Weight.

intersects(that)

Gets intersection information.

Parameters:

that – Clique.

Returns:

Tuple where first item is a boolean indicating if there is any intersection, second item are the IDs in this clique, third item are the IDs of that clique and last item are IDs common to both Cliques.

is_marked()

Checks if this clique is marked.

Returns:

A boolean indicating if the clique is marked.

is_superset(that)

Checks if this clique is a superset of that clique.

Parameters:

that – Clique.

Returns:

A boolean indicating if this clique is a superset of the clique passed in.

mark()

Marks this clique.

unmark()

Unmarks this clique.

class pybbn.graph.node.Node(id)

Bases: object

A node.

add_metadata(k, v)

Adds metadata.

Parameters:
  • k – Key. Typically a string value.

  • v – Value. Any object.

class pybbn.graph.node.SepSet(left, right, lhs=None, rhs=None, intersection=None)

Bases: Clique

Separation-set.

property cost

Gets the cost.

Returns:

The cost.

get_cost()

The cost is the sum of the weights of the cliques connected to this separation-set.

Returns:

Cost.

get_mass()

The mass is the number of nodes in this separation-set.

Returns:

Mass.

property is_empty

Checks if the cliques in this separation set have an empty intersection.

Returns:

A boolean indicating if there is no intersection.

property mass

Gets the mass.

Returns:

The mass.

Edge

Edges. There are two main types: undirected and directed. However, many other types exist as well.

class pybbn.graph.edge.Edge(i, j, type)

Bases: object

Edge.

__init__(i, j, type)

Ctor.

Parameters:
  • i – Node.

  • j – Node.

  • type – Edge type.

property key

Key used for map.

Returns:

Key.

class pybbn.graph.edge.EdgeType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Edge type.

DIRECTED = 2
UNDIRECTED = 1
class pybbn.graph.edge.JtEdge(sep_set)

Bases: Edge

Junction tree edge. This is basically a hyper-edge.

__init__(sep_set)

Ctor.

Parameters:

sep_set – Separation set.

get_lhs_edge()

Gets a JtEdge. e.g. left – sep_set.

Returns:

JtEdge.

get_rhs_edge()

Gets a JtEdge. e.g. right – sep_set.

Returns:

JtEdge.

class pybbn.graph.edge.SepSetEdge(i, j)

Bases: Edge

Separation set.

__init__(i, j)

Ctor.

Parameters:
  • i – Node.

  • j – Node.

Graph

Basic graphs.

class pybbn.graph.graph.Graph

Bases: object

Graph.

__init__()

Ctor.

add_edge(edge)

Adds an edge.

Parameters:

edge – Edge.

Returns:

This graph.

add_node(node)

Adds a node.

Parameters:

node – Node.

Returns:

This graph.

edge_exists(id1, id2)

Checks if the specified edge id1 – id2 exists.

Parameters:
  • id1 – Node id.

  • id2 – Node id.

Returns:

A boolean indicating if the specified edge exists.

get_edges()

Gets all the edges.

Returns:

List of edges.

get_neighbors(id)

Gets the neighbors of the specified node.

Parameters:

id – Node id.

Returns:

Set of neighbors of the specified node.

get_node(id)

Gets the node associated with the specified id.

Parameters:

id – Node id.

Returns:

Node.

get_nodes()

Gets all the nodes.

Returns:

List of nodes.

remove_node(id)

Removes a node from the graph.

Parameters:

id – Node id.

class pybbn.graph.graph.Ug

Bases: Graph

Undirected graph.

__init__()

Ctor.

Directed Acyclic Graph

Directed acyclic graphs.

class pybbn.graph.dag.Bbn

Bases: Dag

BBN.

__init__()

Ctor.

static from_csv(path)

Converts a BBN in CSV format to a BBN.

Parameters:

path – Path to CSV file.

Returns:

BBN.

static from_dict(d)

Creates a BBN from a dictionary (deserialized JSON).

Parameters:

d – Dictionary.

Returns:

BBN.

static from_json(path)

Deserializes BBN from JSON.

Parameters:

path – Path.

Returns:

BBN.

get_parents_ordered(id)

Gets the parent IDs of the specified node, ordered.

Parameters:

id – ID of node.

Returns:

List of parent IDs sorted.

static to_csv(bbn, path)

Converts the specified BBN to CSV format.

Parameters:
  • bbn – BBN.

  • path – Path to file.

Returns:

None.

static to_dict(bbn)

Gets a JSON serializable dictionary representation.

Parameters:

bbn – BBN.

Returns:

Dictionary.

static to_dne(bbn, bnet_name='network')
static to_json(bbn, path)

Serializes BBN to JSON.

Parameters:
  • bbn – BBN.

  • path – Path.

Returns:

None.

class pybbn.graph.dag.BbnUtil

Bases: object

BBN utility.

static get_huang_graph()

Gets the Huang reference BBN graph.

Returns:

BBN.

static get_simple()

Gets a simple BBN graph.

Returns:

BBN.

class pybbn.graph.dag.Dag

Bases: Graph

Directed acyclic graph.

__init__()

Ctor.

edge_exists(id1, id2)

Checks if a directed edge exists between the specified id. e.g. id1 -> id2

Parameters:
  • id1 – Node id.

  • id2 – Node id.

Returns:

A boolean indicating if a directed edge id1 -> id2 exists.

get_children(node_id)

Gets the children IDs of the specified node.

Parameters:

node_id – Node id.

Returns:

Array of children ids.

get_i2n()

Gets a map of node identifiers to names.

Returns:

Dictionary.

get_n2i()

Gets a map of node names to identifiers.

Returns:

Dictionary.

get_parents(id)

Gets the parent IDs of the specified node.

Parameters:

id – Node id.

Returns:

Array of parent ids.

to_nx_graph()

Converts this DAG to a NX DiGraph for visualization.

Returns:

A tuple, where the first item is the NX DiGraph and the second items are the node labels.

class pybbn.graph.dag.PathDetector(graph, start, stop)

Bases: object

Detects path between two nodes.

__init__(graph, start, stop)

Ctor.

Parameters:
  • graph – DAG.

  • start – Start node id.

  • stop – Stop node id.

exists()

Checks if a path exists.

Returns:

True if a path exists, otherwise, false.

Partially Directed Acyclic Graph

Partially directed acyclic graphs.

class pybbn.graph.pdag.PathDetector(graph, start, stop)

Bases: object

Detects path between two nodes.

__init__(graph, start, stop)

Ctor.

Parameters:
  • graph – Pdag.

  • start – Start node id.

  • stop – Stop node id.

exists()

Checks if a path exists.

Returns:

True if a path exists, otherwise, false.

class pybbn.graph.pdag.Pdag

Bases: Graph

Partially directed acyclic graph.

__init__()

Ctor.

directed_edge_exists(id1, id2)

Checks if the specified edge id1 -> id2 exists.

Parameters:
  • id1 – Node id.

  • id2 – Node id.

Returns:

A boolean indicating if the edge exists.

edge_exists(id1, id2)

Checks if the specified edge id1 – id2 exists.

Parameters:
  • id1 – Node id.

  • id2 – Node id.

Returns:

A boolean indicating if the edge exists.

get_out_nodes(id)

Gets all the out nodes for the node with the specified id. Out nodes are all connected nodes that are not parents (do not have a directed arc into the specified node).

Parameters:

id – Node id.

Returns:

Array of out node ids.

get_parents(id)

Gets the parents of the specified node id.

Parameters:

id – Node id.

Returns:

Array of parent ids.

Join Tree

Join trees or junction trees.

class pybbn.graph.jointree.ChangeType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Change type.

NONE = 1
RETRACTION = 3
UPDATE = 2
class pybbn.graph.jointree.Evidence(node, type)

Bases: object

Evidence.

__init__(node, type)

Ctor.

Parameters:
  • node – BBN node.

  • type – EvidenceType.

add_value(value, likelihood)

Adds a value.

Parameters:
  • value – Value.

  • likelihood – Likelihood.

Returns:

This evidence.

compare(potentials)

Compares this evidence with previous ones.

Parameters:

potentials – Map of potentials.

Returns:

The ChangeType from the comparison.

validate()

Validates this evidence.

  • virtual evidence: each likelihood must be in the range [0, 1].

  • finding evidence: all likelihoods must be exactly 1.0 or 0.0.

  • observation evidence: exactly one likelihood is 1.0 and all others must be 0.0.
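
For example, a virtual (soft) evidence can be built by setting the EvidenceType explicitly. The sketch below is illustrative: it assumes a join_tree built as in the earlier examples, and the likelihood values are arbitrary.

from pybbn.graph.jointree import EvidenceBuilder, EvidenceType

# virtual evidence on node 'a': each likelihood must be in [0, 1]
ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('a')) \
    .with_evidence('on', 0.8) \
    .with_evidence('off', 0.2) \
    .with_type(EvidenceType.VIRTUAL) \
    .build()
join_tree.update_evidences([ev])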

class pybbn.graph.jointree.EvidenceBuilder

Bases: object

Evidence builder.

__init__()

Ctor.

build()

Builds an evidence.

Returns:

Evidence.

with_evidence(val, likelihood)

Adds evidence.

Parameters:
  • val – Value.

  • likelihood – Likelihood.

Returns:

Builder.

with_node(node)

Adds a BBN node.

Parameters:

node – BBN node.

Returns:

Builder.

with_type(type)

Adds the EvidenceType.

Parameters:

type – EvidenceType.

Returns:

Builder.

class pybbn.graph.jointree.EvidenceType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Evidence type.

FINDING = 2
OBSERVATION = 3
UNOBSERVE = 4
VIRTUAL = 1
class pybbn.graph.jointree.JoinTree

Bases: Ug

Join tree.

__init__()

Ctor.

add_edge(edge)

Adds an JtEdge.

Parameters:

edge – JtEdge.

Returns:

This join tree.

add_potential(clique, potential)

Adds a potential associated with the specified clique.

Parameters:
  • clique – Clique.

  • potential – Potential.

Returns:

This join tree.

find_cliques_with_node_and_parents(id)

Finds all cliques in this junction tree having the specified node and its parents.

Parameters:

id – Node id.

Returns:

Array of cliques.

static from_dict(d)

Converts a dictionary to a junction tree.

Parameters:

d – Dictionary.

Returns:

Junction tree.

get_bbn_node(id)

Gets the BBN node associated with the specified id.

Parameters:

id – Node id.

Returns:

BBN node or None if no such node exists.

get_bbn_node_and_parents()

Gets a map of nodes and their parents.

Returns:

Map. Keys are node ID and values are list of nodes.

get_bbn_node_by_name(name)

Gets the BBN node associated with the specified name.

Parameters:

name – Node name.

Returns:

BBN node or None if no such node exists.

get_bbn_nodes()

Gets all the BBN nodes in this junction tree.

Returns:

List of BBN nodes.

get_bbn_potential(node)

Gets the potential associated with the specified BBN node.

Parameters:

node – BBN node.

Returns:

Potential.

get_change_type(evidences)

Gets the change type associated with the specified list of evidences.

Parameters:

evidences – List of evidences.

Returns:

ChangeType.

get_cliques()

Gets all the cliques in this junction tree.

Returns:

Array of cliques.

get_evidence(node, value)

Gets the evidence associated with the specified BBN node and value.

Parameters:
  • node – BBN node.

  • value – Value.

Returns:

Potential (the evidence).

get_flattened_edges()

Gets all the edges “flattened” out. Since separation-sets are really hyper-edges, this method breaks separation-sets into two edges.

Returns:

Array of edges.

get_posteriors()

Gets the posterior for all nodes.

Returns:

Map. Keys are node names; values are map of node values to posterior probabilities.

get_sep_sets()

Gets all the separation sets in this junction tree.

Returns:

Array of separation sets.

get_unobserved_evidence(node)

Gets the unobserved evidences associated with the specified node.

Parameters:

node – BBN node.

Returns:

Evidence.

set_listener(listener)

Sets the listener.

Parameters:

listener – JoinTreeListener.

set_observation(evidence)

Sets a single observation.

Parameters:

evidence – Evidence.

Returns:

This join tree.

static to_dict(jt, bbn)

Converts a junction tree to a serializable dictionary.

Parameters:
  • jt – Junction tree.

  • bbn – BBN.

Returns:

Dictionary.
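
A sketch of round-tripping a join tree through a dictionary (join-tree.json is a hypothetical file name; apply_from_serde, documented later, restores the structures needed for propagation):

    import json

    from pybbn.graph.jointree import JoinTree
    from pybbn.pptc.inferencecontroller import InferenceController

    # serialize the join tree (and its BBN) to JSON
    with open('join-tree.json', 'w') as f:
        f.write(json.dumps(JoinTree.to_dict(join_tree, bbn)))

    # deserialize and make the join tree usable for inference again
    with open('join-tree.json', 'r') as f:
        jt = JoinTree.from_dict(json.loads(f.read()))
        jt = InferenceController.apply_from_serde(jt)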

unmark_cliques()

Unmarks the cliques.

unobserve(nodes)

Unobserves a list of nodes.

Parameters:

nodes – List of nodes.

Returns:

This join tree.

unobserve_all()

Unobserves all BBN nodes.

Returns:

This join tree.
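
A sketch of retracting evidence (assuming evidence was previously set on a node named 'a'):

    # retract evidence from a single node
    join_tree.unobserve([join_tree.get_bbn_node_by_name('a')])

    # retract evidence from every node
    join_tree.unobserve_all()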

update_bbn_cpts(cpts)

Updates the CPTs of the BBN nodes.

Parameters:

cpts – Dictionary of CPTs. Keys are ids of BBN nodes and values are new CPTs.

Returns:

None

update_evidences(evidences)

Updates this join tree with the specified list of evidences.

Parameters:

evidences – List of evidences.

Returns:

This join tree.

class pybbn.graph.jointree.JoinTreeListener

Bases: object

Interface-like class used for listening to a join tree.

evidence_retracted(join_tree)

Evidence is retracted.

Parameters:

join_tree – Join tree.

evidence_updated(join_tree)

Evidence is updated.

Parameters:

join_tree – Join tree.

class pybbn.graph.jointree.PathDetector(graph, start, stop)

Bases: object

Detects path between two nodes.

__init__(graph, start, stop)

Ctor.

Parameters:
  • graph – Join tree.

  • start – Start node id.

  • stop – Stop node id.

exists()

Checks if a path exists.

Returns:

True if a path exists, otherwise, false.

Factory

Factories.

class pybbn.graph.factory.Factory

Bases: object

Factory to convert BBNs from other APIs into py-bbn.

static from_data(structure, df)

Creates a BBN.

Parameters:
  • structure – A dictionary where keys are names of children and values are list of parent names.

  • df – A dataframe.

Returns:

BBN.
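
A sketch of creating a BBN from data (the column names and toy data are hypothetical; structure maps each child to its list of parents and df holds discrete values):

    import pandas as pd

    from pybbn.graph.factory import Factory

    # a -> b -> c, expressed as child: [parents]
    structure = {
        'a': [],
        'b': ['a'],
        'c': ['b']
    }

    # toy data with one column per node
    df = pd.DataFrame({
        'a': ['on', 'on', 'on', 'off', 'off', 'off'],
        'b': ['on', 'on', 'off', 'off', 'on', 'off'],
        'c': ['on', 'off', 'off', 'on', 'on', 'off']
    })

    bbn = Factory.from_data(structure, df)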

static from_libpgm_discrete_dictionary(d)

Converts a libpgm discrete network as specified by a dictionary into a py-bbn one. Look at https://pythonhosted.org/libpgm/unittestdict.html.

Parameters:

d – A dictionary representing a libpgm discrete network.

Returns:

py-bbn BBN.

static from_libpgm_discrete_json(j)

Converts a libpgm discrete network as specified by a JSON string into a py-bbn one. Look at https://pythonhosted.org/libpgm/unittestdict.html.

Parameters:

j – String representing JSON.

Returns:

py-bbn BBN.

static from_libpgm_discrete_object(bn)

Converts a libpgm discrete network object into a py-bbn one.

Parameters:

bn – libpgm discrete BBN.

Returns:

py-bbn BBN.

Potential

Potentials.

class pybbn.graph.potential.Potential

Bases: object

Potential.

__init__()

Ctor.

add_entry(entry)

Adds a PotentialEntry.

Parameters:

entry – PotentialEntry.

Returns:

This potential.

get_matching_entries(entry)

Gets all potential entries matching the specified entry.

Parameters:

entry – PotentialEntry.

Returns:

Array of matching potential entries.

static to_dict(potentials)

Converts potentials to a dictionary for easy validation.

Parameters:

potentials – Potential.

Returns:

Dictionary representation. Keys are entries and values are probabilities.
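
A minimal sketch of building a potential by hand over two hypothetical binary variables with ids 0 and 1 (PotentialEntry is documented next):

    from pybbn.graph.potential import Potential, PotentialEntry

    # one entry for the combination (node 0 = 'on', node 1 = 'off')
    entry = PotentialEntry().add(0, 'on').add(1, 'off')

    potential = Potential().add_entry(entry)

    # entries that agree with node 0 = 'on'
    query = PotentialEntry().add(0, 'on')
    matching = potential.get_matching_entries(query)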

class pybbn.graph.potential.PotentialEntry

Bases: object

Potential entry.

__init__()

Ctor.

add(k, v)

Adds a node id and its value.

Parameters:
  • k – Node id.

  • v – Value.

Returns:

This potential entry.

duplicate()

Duplicates this entry.

Returns:

PotentialEntry.

get_entry_keys()

Gets entry keys sorted.

Returns:

List of tuples. The first element of each tuple is the variable id and the second is the variable value.

get_kv()

Gets key-value pair that may be used for storage in dictionary.

Returns:

Key-value pair.

matches(that)

Checks if this potential entry matches the specified one. A match occurs when all the keys and their associated values in the specified entry match those in this entry.

Parameters:

that – PotentialEntry.

Returns:

A boolean indicating if the entries match.

class pybbn.graph.potential.PotentialUtil

Bases: object

Potential util.

static divide(numerator, denominator)

Divides two potentials.

Parameters:
  • numerator – Potential.

  • denominator – Potential.

Returns:

Potential.

static get_cartesian_product(lists)

Gets the cartesian product of a list of lists of values. For example, if the input is

  • [['on', 'off'], ['on', 'off']]

then the result will be a list of the following

  • ['on', 'on']

  • ['on', 'off']

  • ['off', 'on']

  • ['off', 'off']

Parameters:

lists – List of list of values.

Returns:

Cartesian product of values.
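
For instance, the example above corresponds to a call like the following (a sketch):

    from pybbn.graph.potential import PotentialUtil

    # four combinations: (on, on), (on, off), (off, on), (off, off)
    products = PotentialUtil.get_cartesian_product([['on', 'off'], ['on', 'off']])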

static get_potential(node, parents)

Gets the potential associated with the specified node and its parents.

Parameters:
  • node – BBN node.

  • parents – Parents of the BBN node (that themselves are also BBN nodes).

Returns:

Potential.

static get_potential_from_nodes(nodes)

Gets a potential from a list of BBN nodes.

Parameters:

nodes – Array of BBN nodes.

Returns:

Potential.

static is_zero(d)

Checks if the specified value is 0.0.

Parameters:

d – Value.

Returns:

A boolean indicating if the value is zero.

static marginalize_for(join_tree, clique, nodes)

Marginalizes the specified clique’s potential over the specified nodes.

Parameters:
  • join_tree – Join tree.

  • clique – Clique.

  • nodes – List of BBN nodes.

Returns:

Potential.

static merge(node, parents)

Merges the nodes into one array.

Parameters:
  • node – BBN node.

  • parents – BBN parent nodes.

Returns:

Array of BBN nodes.

static multiply(bigger, smaller)

Multiplies two potentials. Order matters.

Parameters:
  • bigger – Bigger potential.

  • smaller – Smaller potential.

static normalize(potential)

Normalizes the potential (makes sure its entries sum to 1.0).

Parameters:

potential – Potential.

Returns:

Potential.

static pass_single_message(join_tree, x, s, y)

Single message pass from x – s – y (from x to s to y).

Parameters:
  • join_tree – Join tree.

  • x – Clique.

  • s – Separation-set.

  • y – Clique.

Utilities

Utilities to make life easier.

class pybbn.graph.util.IdUtil

Bases: object

ID util.

static hash_string(s)

Hashes the string.

Parameters:

s – String.

Returns:

Hash value.

Junction Tree Algorithm

Inference Control

Used in controlling exact inference.

class pybbn.pptc.inferencecontroller.InferenceController

Bases: JoinTreeListener

Inference controller.

static apply(bbn)

Sets up the specified BBN for probability propagation in tree clusters (PPTC).

Parameters:

bbn – BBN graph.

Returns:

Join tree.

static apply_from_serde(join_tree)

Applies propagation to a deserialized join tree.

Parameters:

join_tree – Join tree.

Returns:

Join tree (the same one passed in).

evidence_retracted(join_tree)

Evidence is retracted.

Parameters:

join_tree – Join tree.

evidence_updated(join_tree)

Evidence is updated.

Parameters:

join_tree – Join tree.

static reapply(join_tree, cpts)

Reapplies propagation to the join tree with new CPTs. The join tree structure is kept, but the BBN node CPTs are updated. A new instance/copy of the join tree will be returned.

Parameters:
  • join_tree – Join tree.

  • cpts – Dictionary of new CPTs. Keys are ids of nodes and values are new CPTs.

Returns:

Join tree.
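
A sketch of swapping in new CPTs without rebuilding the join tree (the CPT values below are hypothetical; keys are the ids of the BBN nodes being updated):

    from pybbn.pptc.inferencecontroller import InferenceController

    # original join tree
    lhs = InferenceController.apply(bbn)

    # same structure, new CPTs for nodes 0 and 1; a new copy is returned
    rhs = InferenceController.reapply(lhs, {
        0: [0.3, 0.7],
        1: [0.2, 0.8, 0.8, 0.2]
    })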

Potential Initialization

Used to initialize potentials.

class pybbn.pptc.potentialinitializer.PotentialInitializer

Bases: object

Potential initializer.

static init(bbn)

Initializes the BBN potentials.

Parameters:

bbn – BBN graph.

static reinit(jt)

Reinitializes the potentials of the BBN nodes in the join tree.

Parameters:

jt – Join tree.

Returns:

None.

Moralization

Moralization of a directed acyclic graph.

class pybbn.pptc.moralizer.Moralizer

Bases: object

Graph moralizer for a DAG.

static moralize(dag)

Moralizes a DAG.

Parameters:

dag – DAG.

Returns:

Moralized (undirected) graph.

Triangulation

Triangulates a moralized graph.

class pybbn.pptc.triangulator.NodeClique(node, neighbors, weight, edges)

Bases: object

Node clique.

__init__(node, neighbors, weight, edges)

Ctor.

Parameters:
  • node – BBN node.

  • neighbors – BBN nodes (neighbors).

  • weight – Weight.

  • edges – Edges.

get_bbn_nodes()

Gets all the BBN nodes in this node clique.

Returns:

Array of BBN nodes.

class pybbn.pptc.triangulator.Triangulator

Bases: object

Triangulator. Triangulates an undirected moralized graph and produces cliques in the process.

static duplicate(g)

Duplicates an undirected graph.

Parameters:

g – Undirected graph.

Returns:

Undirected graph.

static generate_cliques(m)

Generates a list of node cliques.

Parameters:

m – Graph.

Returns:

List of NodeCliques.

static get_edges_to_add(n, m)

Gets edges to add.

Parameters:
  • n – BBN node.

  • m – Graph.

Returns:

Array of edges.

static get_weight(n, m)

Gets the weight of a BBN node. The weight of a node is the product of its own weight and the weights of all its neighbors.

Parameters:
  • n – BBN node.

  • m – Graph.

Returns:

Weight.

static is_subset(cliques, clique)

Checks if the specified clique is a subset of the specified list of cliques.

Parameters:
  • cliques – List of cliques.

  • clique – Clique.

Returns:

A boolean indicating if the clique is a subset.

static select_node(m)

Selects a clique from the specified graph. Cliques are sorted by number of edges, weight, and id (asc).

Parameters:

m – Graph.

Returns:

Clique.

static triangulate(m)

Triangulates the specified moralized graph.

Parameters:

m – Moralized undirected graph.

Returns:

Array of cliques.

Transformation

Transforms the cliques found from triangulation into a junction tree.

class pybbn.pptc.transformer.Transformer

Bases: object

Transformer. Transforms a list of cliques into a join tree.

static get_sep_sets(cliques)

Gets all pair-wise separation-sets.

Parameters:

cliques – Array of cliques.

Returns:

Array of separation sets, sorted by mass (descending), then cost (ascending), then id (ascending).

static transform(cliques)

Transforms the cliques into a join tree.

Parameters:

cliques – List of cliques.

Returns:

Join tree.

Initialization

Initializes a junction tree.

class pybbn.pptc.initializer.Initializer

Bases: object

Initializes the join tree.

static get_clique(node, join_tree)

Gets the parent clique associated with the specified BBN node.

Parameters:
  • node – BBN node.

  • join_tree – Join tree.

Returns:

Parent clique.

static initialize(join_tree)

Starts the initialization.

Parameters:

join_tree – Join tree.

Returns:

Join tree.

Propagation

Propagates evidences in a junction tree.

class pybbn.pptc.propagator.Propagator

Bases: object

Evidence propagator.

static collect_evidence(join_tree, start)

Collects evidence.

Parameters:
  • join_tree – Join tree.

  • start – Start clique.

static distribute_evidence(join_tree, start)

Distributes evidence.

Parameters:
  • join_tree – Join tree.

  • start – Start clique.

static propagate(join_tree)

Propagates evidence.

Parameters:

join_tree – Join tree.

Returns:

Join tree.
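
The classes above can be chained by hand to approximate what InferenceController.apply does internally; a rough sketch (assuming bbn is a Bbn as in the earlier example):

    from pybbn.pptc.initializer import Initializer
    from pybbn.pptc.moralizer import Moralizer
    from pybbn.pptc.potentialinitializer import PotentialInitializer
    from pybbn.pptc.propagator import Propagator
    from pybbn.pptc.transformer import Transformer
    from pybbn.pptc.triangulator import Triangulator

    PotentialInitializer.init(bbn)              # initialize the BBN potentials
    ug = Moralizer.moralize(bbn)                # moralize the DAG
    cliques = Triangulator.triangulate(ug)      # triangulate and collect cliques
    join_tree = Transformer.transform(cliques)  # build the join tree
    Initializer.initialize(join_tree)           # initialize join tree potentials
    Propagator.propagate(join_tree)             # collect and distribute evidence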

Evidence Distribution

Distributes evidences.

class pybbn.pptc.evidencedistributor.EvidenceDistributor(join_tree, start_clique)

Bases: object

Evidence distributor. Passes messages using breadth-first-search (BFS). Messages are passed from the start clique to the far remote cliques.

__init__(join_tree, start_clique)

Ctor.

Parameters:
  • join_tree – Join tree.

  • start_clique – Start clique.

start()

Starts the evidence distribution.

Evidence Collection

Collects evidences.

class pybbn.pptc.evidencecollector.EvidenceCollector(join_tree, start_clique)

Bases: object

Evidence collector. Passes messages using depth-first-search (DFS). Messages are passed from the far remote cliques back to the start clique.

__init__(join_tree, start_clique)

Ctor.

Parameters:
  • join_tree – Join tree.

  • start_clique – Start clique.

start()

Starts the evidence collection.

Sampling

Use this module for sampling.

class pybbn.sampling.sampling.LogicSampler(bbn)

Bases: object

Logic sampling with rejection.

__init__(bbn)

Ctor.

Parameters:

bbn – BBN.

get_samples(evidence={}, n_samples=100, seed=37)

Gets the samples.

Parameters:
  • evidence – Evidence. Dictionary. Keys are ids and values are node values.

  • n_samples – Number of samples.

  • seed – Seed (default=37).

Returns:

Samples.
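
A sketch of sampling (assuming bbn is the BBN from the earlier example; evidence keys are node ids):

    from pybbn.sampling.sampling import LogicSampler

    sampler = LogicSampler(bbn)

    # 10,000 samples with node 0 clamped to 'on'
    samples = sampler.get_samples(evidence={0: 'on'}, n_samples=10000, seed=37)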

class pybbn.sampling.sampling.SortableNode(node_id, parent_ids)

Bases: object

Sortable node.

__init__(node_id, parent_ids)

Ctor.

Parameters:
  • node_id – Node ID.

  • parent_ids – List of parent IDs.

class pybbn.sampling.sampling.Table(node, parents=[])

Bases: object

Table associating parent instantiations with cumulative distributions of node values.

__init__(node, parents=[])

Ctor.

Parameters:
  • node – BBN node.

  • parents – List of parent BBN nodes.

get_value(prob, sample=None)

Gets the value associated with the specified probability.

Parameters:
  • prob – Probability.

  • sample – Dictionary of variable-value pairs sampled so far.

Returns:

Value.

has_parents()

Checks if the node associated with this table has parents.

Returns:

Boolean.

Generator

Use this package to generate realistic Bayesian Belief Networks.

pybbn.generator.bbngenerator.convert_for_drawing(bbn)

Converts a BBN to a networkx graph for drawing.

Parameters:

bbn – BBN.

Returns:

Directed acyclic graph.

pybbn.generator.bbngenerator.convert_for_exact_inference(g, p)

Converts the graph and parameters to a BBN.

Parameters:
  • g – Directed acyclic graph (DAG in the form of networkx).

  • p – Parameters.

Returns:

BBN.

pybbn.generator.bbngenerator.generate_bbn_to_file(n, file_path, bbn_type='singly', max_iter=10, max_values=2, max_alpha=10)

Generates a BBN and saves it to a file.

Parameters:
  • n – Number of nodes.

  • file_path – File path. JSON and CSV are supported; the export format is determined by the path extension.

  • bbn_type – Type: singly or multi.

  • max_iter – Maximum iterations.

  • max_values – Maximum values.

  • max_alpha – Maximum alpha.

Returns:

None.

pybbn.generator.bbngenerator.generate_multi_bbn(n, max_iter=10, max_values=2, max_alpha=10)

Generates structure and parameters for a multi-connected BBN.

Parameters:
  • n – Number of nodes.

  • max_iter – Maximum iterations.

  • max_values – Maximum values per node.

  • max_alpha – Maximum alpha per value (hyperparameters).

Returns:

A tuple of structure and parameters.

pybbn.generator.bbngenerator.generate_singly_bbn(n, max_iter=10, max_values=2, max_alpha=10)

Generates structure and parameters for a singly-connected BBN.

Parameters:
  • n – Number of nodes.

  • max_iter – Maximum iterations.

  • max_values – Maximum values per node.

  • max_alpha – Maximum alpha per value (hyperparameters).

Returns:

A tuple of structure and parameters.
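
A sketch of generating a random BBN and converting it for exact inference:

    from pybbn.generator.bbngenerator import convert_for_exact_inference, generate_singly_bbn
    from pybbn.pptc.inferencecontroller import InferenceController

    # structure (g) and parameters (p) for a 10-node singly-connected BBN
    g, p = generate_singly_bbn(10, max_iter=50)

    bbn = convert_for_exact_inference(g, p)
    join_tree = InferenceController.apply(bbn)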

pybbn.generator.bbngenerator.to_json(g, params, pretty=False)

Serializes the graph to JSON.

Parameters:
  • g – Graph.

  • params – Parameters.

  • pretty – Pretty-print serialization flag.

Returns:

None.

Causality

Average Causal Effect

Use this package to compute the Average Causal Effect.

class pybbn.causality.ace.Ace(bbn)

Bases: object

Estimates average causal effect (ACE).

__init__(bbn)

Ctor.

Parameters:

bbn – Bayesian belief network.

get_ace(x, y, y_val)

Computes the ACE of X on Y.

Parameters:
  • x – X name.

  • y – Y name.

  • y_val – Y value.

Returns:

Dictionary of ACE over X values.
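
A sketch of estimating the ACE (the node names 'drug' and 'recovery' and the value 'true' are hypothetical; the BBN must contain nodes with those names):

    from pybbn.causality.ace import Ace

    ace = Ace(bbn)

    # ACE of 'drug' on P(recovery = true), one entry per value of 'drug'
    results = ace.get_ace('drug', 'recovery', 'true')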

Gaussian Package

Inference

Use this module to do inference in Gaussian Bayesian Belief Networks.

class pybbn.gaussian.inference.GaussianInference(H, M, E, meta={})

Bases: object

Gaussian inference.

property P

Gets the univariate parameters of each variable.

Returns:

Dictionary. Keys are variable names. Values are tuples of (mean, variance).

__init__(H, M, E, meta={})

Ctor.

Parameters:
  • H – Headers.

  • M – Means.

  • E – Covariance matrix.

  • meta – Dictionary storing observations.

do_inference(name, observation)

Performs inference. Simply calls the do_inferences method.

Parameters:
  • name – Name of variable.

  • observation – Observation value.

Returns:

GaussianInference.

do_inferences(observations)

Performs inference.

Denote the following.

  • \(z\) as the set of observed variables

  • \(y\) as the set of other (unobserved) variables

  • \(\mu\) as the vector of means
    • \(\mu_z\) as the partition of \(\mu\) of length \(|z|\)

    • \(\mu_y\) as the partition of \(\mu\) of length \(|y|\)

  • \(\Sigma\) as the covariance matrix
    • \(\Sigma_{yz}\) as the partition of \(\Sigma\) with \(|y|\) rows and \(|z|\) columns

    • \(\Sigma_{zz}\) as the partition of \(\Sigma\) with \(|z|\) rows and \(|z|\) columns

    • \(\Sigma_{yy}\) as the partition of \(\Sigma\) with \(|y|\) rows and \(|y|\) columns

If we observe evidence \(z_e\), then the new means \(\mu_y^{*}\) and covariance matrix \(\Sigma_y^{*}\) corresponding to \(y\) are computed as follows.

  • \(\mu_y^{*} = \mu_y + \Sigma_{yz} \Sigma_{zz}^{-1} (z_e - \mu_z)\)

  • \(\Sigma_y^{*} = \Sigma_{yy} - \Sigma_{yz} \Sigma_{zz}^{-1} \Sigma_{yz}^{T}\)

Parameters:

observations – List of observations. Each observation is a tuple of (name, value).

Returns:

GaussianInference.

property marginals

Gets the marginals.

Returns:

List of dictionaries. Each element has a name, mean, and variance.

sample_marginals(size=1000)

Samples data from the marginals.

Parameters:

size – Number of samples.

Returns:

Dictionary with keys as names and values as pandas series (sampled data).
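
A sketch of Gaussian inference on synthetic data (the variable names and coefficients are hypothetical; H, M, and E are estimated from a matrix X whose columns match the headers):

    import numpy as np

    from pybbn.gaussian.inference import GaussianInference

    np.random.seed(37)

    # synthetic data: b depends on a, c depends on b
    N = 1000
    a = np.random.normal(0.0, 1.0, N)
    b = 2.0 * a + np.random.normal(0.0, 1.0, N)
    c = 1.5 * b + np.random.normal(0.0, 1.0, N)
    X = np.column_stack([a, b, c])

    H = ['a', 'b', 'c']     # headers
    M = X.mean(axis=0)      # means
    E = np.cov(X.T)         # covariance matrix

    g = GaussianInference(H, M, E)

    # condition on a = 1.0 and inspect the updated univariate parameters
    g_cond = g.do_inference('a', 1.0)
    print(g_cond.P)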

Citation

@misc{vang_2017,
  title={PyBBN},
  url={https://github.com/vangj/py-bbn/},
  author={Vang, Jee},
  year={2017},
  month={Jan}
}

Author

Jee Vang, Ph.D.

  • Patreon: support is appreciated

  • GitHub: sponsorship will help us change the world for the better

Help