Supplementary Exercises*

Warning

This page contains solutions! We recommend attempting each problem before peeking.

1. New neighbors

A family with two primary-school children just moved in next door. You have not seen the children yet. Consider two scenarios:

  1. This morning you introduced yourself to the neighbor and asked whether he has any boys. He said: Yes. What is the probability that one child is a girl, assuming he is telling the truth?

  2. This morning you saw one of the children, a boy. What is the probability that one child is a girl?

In other words, you are either told that there is a boy in the family (1) or you saw that there is a boy (2). Why then do the probabilities in (1) and (2) differ? Assume that the probability of a boy or a girl is \(\frac{1}{2}\), and that the genders of the two children are independent.
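One way to see the difference is to enumerate the sample space directly. The sketch below assumes that, in scenario 2, the child you happened to see is equally likely to be either of the two children; the names `families`, `told`, and `seen` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All equally likely (older, younger) gender pairs: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Scenario 1: we only learn that at least one child is a boy.
told = [f for f in families if "B" in f]
p_told = Fraction(sum("G" in f for f in told), len(told))

# Scenario 2: we see one child chosen at random, and it is a boy.
# Enumerate (family, index of the child seen) pairs, all equally likely.
seen = [(f, i) for f in families for i in range(2) if f[i] == "B"]
p_seen = Fraction(sum("G" in f for f, _ in seen), len(seen))

print(p_told, p_seen)  # 2/3 1/2
```

Seeing a particular child is stronger evidence for the boy-boy family than merely hearing that a boy exists, which is why the second probability is smaller.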

2. New neighbors with three children

You just got new neighbors, a family with three elementary school children. You have not seen the children yet. Consider two scenarios:

  1. This morning you met the neighbor and asked whether he has any boys. He answers: Yes. What is the probability that exactly one child is a girl, assuming your neighbor is telling the truth?

  2. This morning you saw one of the children, a boy. What is the probability that exactly one child is a girl?

In other words, you are either told that there is a boy in the family (1) or you saw that there is a boy (2). Why then do the probabilities in (1) and (2) differ? Find the probabilities and explain. Assume that the probability of a boy or a girl is \(\frac{1}{2}\), and that the genders of the three children are independent.
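A possible check by enumeration, assuming that in scenario 2 each of the three children is equally likely to be the one you saw. Repeating a family once per boy is a simple way to weight it by the chance of seeing a boy; the helper `exactly_one_girl` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=3))  # 8 equally likely families


def exactly_one_girl(family):
    return family.count("G") == 1


# Scenario 1: condition on "at least one boy".
told = [f for f in families if "B" in f]
p_told = Fraction(sum(map(exactly_one_girl, told)), len(told))

# Scenario 2: condition on "a randomly seen child is a boy".
# Each family appears once for every boy that could have been seen.
seen = [f for f in families for child in f if child == "B"]
p_seen = Fraction(sum(map(exactly_one_girl, seen)), len(seen))

print(p_told, p_seen)  # 3/7 1/2
```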

3. Queen of spades revisited

From the standard deck of 52 cards, the 2 of diamonds, 3 of spades, 4 of clubs, and 5 of hearts are removed. From the remaining cards, one card is selected at random.

  1. Are the events A (the selected card is a queen) and B (the selected card is a spade) independent?

  2. What is the answer to (1) if both red suits are eliminated from the standard deck, so that prior to selecting a card the deck contains only spades and clubs?
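Independence can be checked by comparing \(P(A \cap B)\) with \(P(A)P(B)\) on an explicit deck. The sketch below assumes that part 2 starts from the full 52-card deck (not the 48-card one); `is_independent` is an illustrative helper.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]


def is_independent(deck):
    """Check whether 'queen' and 'spade' are independent for a uniform draw."""
    n = len(deck)
    p_queen = Fraction(sum(r == "Q" for r, s in deck), n)
    p_spade = Fraction(sum(s == "spades" for r, s in deck), n)
    p_both = Fraction(sum((r, s) == ("Q", "spades") for r, s in deck), n)
    return p_both == p_queen * p_spade


full_deck = list(product(RANKS, SUITS))
removed = {("2", "diamonds"), ("3", "spades"), ("4", "clubs"), ("5", "hearts")}
deck_48 = [c for c in full_deck if c not in removed]
black_26 = [c for c in full_deck if c[1] in ("spades", "clubs")]

print(is_independent(deck_48))   # True
print(is_independent(black_26))  # True
```

Both decks keep the same number of cards in every remaining suit and the same number of queens per suit, which is what preserves independence here.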

4. Multiple Choice

A student answers a multiple choice examination question that has 4 possible answers. Suppose that the probability that the student knows the answer to a question is 0.80 and the probability that the student guesses is 0.20. If the student guesses, then in 40% of cases the student is able to eliminate two choices as wrong and guess between the remaining two, so the probability of a correct answer is \(\frac{1}{2}\). In 30% of cases the student is able to eliminate one choice as wrong and guess among the remaining three, so the probability of a correct answer is \(\frac{1}{3}\). For the remaining 30% of cases, the student is not able to eliminate any choice and the probability of a correct answer is \(\frac{1}{4}\).

  1. What is the probability that the randomly selected question is answered correctly?

  2. If it is answered correctly, what is the probability that the student really knew the correct answer?
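A minimal sketch of the total-probability and Bayes steps, assuming that a student who knows the answer always answers correctly and that a guess among three remaining choices is correct with probability \(\frac{1}{3}\). Variable names are illustrative.

```python
from fractions import Fraction as F

p_know = F(8, 10)
p_guess = F(2, 10)

# (share of guessed questions, P(correct | that case))
guess_cases = [(F(4, 10), F(1, 2)), (F(3, 10), F(1, 3)), (F(3, 10), F(1, 4))]
p_correct_given_guess = sum(w * p for w, p in guess_cases)

# Law of total probability, then Bayes' rule.
p_correct = p_know * 1 + p_guess * p_correct_given_guess
p_know_given_correct = p_know / p_correct

print(p_correct, p_know_given_correct)  # 7/8 32/35
```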

5. Classifier

In a machine learning classification procedure, the items are classified as 1 or 0. Based on a training sample of size 120 in which there are 65 1’s and 55 0’s, the classifier predicts 70 1’s and 50 0’s. Out of 70 items predicted by the classifier as 1, 52 are correctly classified.

  1. What are the sensitivity and specificity of the classifier?

  2. From the population of items where the proportion of 0-labels is 95% (and 1-labels 5%), an item is selected at random. What is the probability that the item is of label 1 if the classifier says it was?
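The confusion-matrix cells follow from the counts in the problem, and part 2 is Bayes' rule with the new prevalence. This is a sketch that treats label 1 as the "positive" class, which the problem does not state explicitly.

```python
actual_pos, actual_neg = 65, 55
pred_pos = 70
tp = 52                 # predicted 1 and truly 1
fp = pred_pos - tp      # predicted 1 but truly 0
fn = actual_pos - tp    # predicted 0 but truly 1
tn = actual_neg - fp    # predicted 0 and truly 0

sensitivity = tp / actual_pos
specificity = tn / actual_neg

# Part 2: positive predictive value when only 5% of items carry label 1.
prevalence = 0.05
ppv = sensitivity * prevalence / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
print(round(sensitivity, 4), round(specificity, 4), round(ppv, 4))
```

With such a low prevalence, most items flagged as 1 are false positives even though the sensitivity is 0.8.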

6. Alzheimer’s

A medical research team wished to evaluate a proposed screening test for Alzheimer’s disease. The test was given to a random sample of 450 patients with Alzheimer’s disease and to an independent sample of 500 subjects without symptoms of the disease. The two samples were drawn from the population of subjects who are 65 years old or older. The results are as follows:

| Test Result | Diagnosed Alzheimer’s, \(D\) | No Alzheimer’s Symptoms, \(D^c\) | Total |
| --- | --- | --- | --- |
| Positive Test, \(T\) | 436 | 5 | 441 |
| Negative Test, \(T^c\) | 14 | 495 | 509 |
| Total | 450 | 500 | 950 |

The probability of \(D\) (prevalence) is the rate of the disease in the relevant population (\(\ge 65\) y.o.), and it is estimated to be 11.3% (Evans and others [1990]).

  1. Using the numbers from the table, estimate \(P(T | D)\) and \(P(T^c | D^c)\). Interpret these probabilities in terms of the problem, one sentence for each.

  2. Find \(P(D | T)\) (positive predictive value) using Bayes’ formula. You cannot find \(P(D | T)\) using information from the table only; you need external information, such as the prevalence.
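As a sketch of part 2, reading the quoted prevalence as \(P(D) = 0.113\) and estimating \(P(T | D) = 436/450\) and \(P(T | D^c) = 5/500\) from the table, Bayes’ formula gives approximately

\[
P(D | T) = \frac{P(T | D)\,P(D)}{P(T | D)\,P(D) + P(T | D^c)\,P(D^c)}
= \frac{\frac{436}{450}(0.113)}{\frac{436}{450}(0.113) + \frac{5}{500}(0.887)} \approx 0.925 .
\]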

7. Mysterious Transfer

Of two bags, one contains four white balls and three black balls and the other contains three white balls and five black balls. One ball is randomly selected from the first bag and placed unseen in the second bag.

  1. What is the probability that a ball now drawn from the second bag will be black?

  2. If the second ball is black, what is the probability that a black ball was transferred?
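Conditioning on the color of the transferred ball gives both answers. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction as F

# First bag: 4 white, 3 black.
p_black_moved = F(3, 7)
p_white_moved = F(4, 7)

# Second bag starts with 3 white, 5 black and holds 9 balls after the transfer.
p_black_given_black_moved = F(6, 9)
p_black_given_white_moved = F(5, 9)

# Part 1: law of total probability.
p_black = (p_black_moved * p_black_given_black_moved
           + p_white_moved * p_black_given_white_moved)

# Part 2: Bayes' rule.
p_black_moved_given_black = p_black_moved * p_black_given_black_moved / p_black

print(p_black, p_black_moved_given_black)  # 38/63 9/19
```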

8. NIR and Raman in Parkinson’s

In a study by Schipper et al. (2008), 53 subjects, 21 with mild or moderate stages of Parkinson’s disease and 32 age-matched controls, had whole blood samples analyzed using the near-infrared (NIR) spectroscopy and Raman spectroscopy methods. The data showed that the two independent biospectroscopy measurement techniques yielded similar and consistent results. In differentiating Parkinson’s disease patients from the control group, Raman spectroscopy resulted in eight false positives and four false negatives. NIR spectroscopy resulted in four false positives and five false negatives.

For both methods, find the Positive Predictive Value, that is, the probability that a person who tested positive and was randomly selected from the same age group in the general population has the disease if no other clinical information is available. Use the assumption that the population prevalence is well estimated by the information in the table (see Vidakovic [2017] p. 137).

9. Twins

Dizygotic (fraternal) twins have the same probability of each gender as in overall births, which is approximately 51% male, 49% female. Monozygotic (identical) twins must be of the same gender. Among all twin pregnancies, about 1/3 are monozygotic. Find the probability of two girls in

  1. Monozygotic pregnancy (assuming that monozygotic twins have the same probability of each gender as in overall births).

  2. Dizygotic pregnancy.

  3. Dizygotic pregnancy given that we know that the gender of the babies is the same.

  4. Probability of dizygotic pregnancy given that we know that the gender of the babies is the same.

If Mary is expecting twins, but no information about the type of pregnancy is available, what is the probability that the babies are

  5. Two girls.

  6. Of the same gender.

  7. Find the probability that Mary’s pregnancy is dizygotic if it is known that the babies are two girls. Retain four decimal places in your calculations.

Hints: (2) genders are independent; (3) since \(A\) is a subset of \(B\), \(A \cap B = A\) and \(P(A | B) = P(A)/P(B)\); (4, 5, 6) total probability; (7) Bayes’ rule.
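The hints translate into a few lines of arithmetic. This is a sketch under the stated assumptions (49% girls, one third of twin pregnancies monozygotic, independent genders for dizygotic twins); all variable names are illustrative.

```python
p_girl, p_boy = 0.49, 0.51
p_mz, p_dz = 1 / 3, 2 / 3

gg_given_mz = p_girl                      # identical twins share one gender
gg_given_dz = p_girl ** 2                 # independent genders
same_given_dz = p_girl ** 2 + p_boy ** 2  # two girls or two boys

# Two girls is a subset of "same gender", so P(A | B) = P(A) / P(B).
gg_given_dz_same = gg_given_dz / same_given_dz

# Total probability over the type of pregnancy.
p_same = p_mz * 1.0 + p_dz * same_given_dz
p_gg = p_mz * gg_given_mz + p_dz * gg_given_dz

# Bayes' rule.
dz_given_same = p_dz * same_given_dz / p_same
dz_given_gg = p_dz * gg_given_dz / p_gg

for name, value in [
    ("P(GG | MZ)", gg_given_mz), ("P(GG | DZ)", gg_given_dz),
    ("P(GG | DZ, same)", gg_given_dz_same), ("P(DZ | same)", dz_given_same),
    ("P(GG)", p_gg), ("P(same)", p_same), ("P(DZ | GG)", dz_given_gg),
]:
    print(f"{name} = {value:.4f}")
```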

10. Hexi

There is a 10% chance that purebred German shepherd Hexi is a carrier of canine hemophilia A. If she is a carrier, there is a 50−50 chance that she will pass the hemophiliac gene to a puppy. Hexi has two male puppies, and both tested free of hemophilia. What is the probability that Hexi is a carrier, given this information about her puppies?
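A minimal Bayes' rule sketch. It assumes that the puppies inherit the gene independently, that a male puppy who inherits it would test positive, and that puppies of a non-carrier are always free of the disease; `carrier_posterior` is a hypothetical helper.

```python
from fractions import Fraction


def carrier_posterior(prior, n_healthy_males):
    """P(carrier | n male puppies all test free of hemophilia)."""
    # Assumption: each male puppy of a carrier is free of the disease
    # with probability 1/2, independently of the others.
    like_carrier = Fraction(1, 2) ** n_healthy_males
    like_noncarrier = Fraction(1)
    numerator = prior * like_carrier
    return numerator / (numerator + (1 - prior) * like_noncarrier)


print(carrier_posterior(Fraction(1, 10), 2))  # 1/37
```

Each additional healthy male puppy halves the likelihood under the carrier hypothesis, so the posterior keeps dropping below the 10% prior.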

11. Playing Dice at the Casino

You play the following game in a casino. You roll a pair of fair dice and then the croupier rolls a pair of fair dice as well. If the sum on the croupier’s dice is greater than or equal to the sum on yours, the casino wins. If the sum on your dice is strictly larger than the croupier’s, you win.

  1. What is the probability that you win?

  2. If it is known that you won, what is the probability that the croupier obtained a sum of 9?

  3. What is the croupier’s expected sum if you won? Note that without any information on winners, the expected sum for both you and the croupier is 7.

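import random

import numpy as np

# Part 3: croupier's expected sum given that you won.
# counts[i] is the number of the 36 * 36 equally likely outcomes in which
# the croupier rolls sums[i] and you roll strictly more (575 in total).
sums = np.arange(2, 13)
counts = np.array([35, 66, 90, 104, 105, 90, 50, 24, 9, 2, 0])
prob = counts / counts.sum()
print("E[croupier's sum | you won] =", np.dot(sums, prob))  # 5.454

# Part 1: Monte Carlo estimate of the probability that you win.
n = 1_000_000
won = 0
for _ in range(n):
    you = random.randint(1, 6) + random.randint(1, 6)
    croupier = random.randint(1, 6) + random.randint(1, 6)
    if croupier < you:
        won += 1
print("Approx Prob of Winning =", won / n)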
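The simulation can be cross-checked by exact enumeration of the \(36 \times 36\) equally likely outcomes. The sketch below is an alternative to the code above, not part of the original solution; it also reproduces the winning counts that sum to 575.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Number of ways to roll each sum with two fair dice.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Counts (out of 36 * 36) of (your sum, croupier's sum) pairs where you win.
win_counts = {
    (y, c): ways[y] * ways[c] for y in ways for c in ways if y > c
}
total_wins = sum(win_counts.values())

p_win = Fraction(total_wins, 36 * 36)
p_croupier_9 = Fraction(
    sum(n for (y, c), n in win_counts.items() if c == 9), total_wins
)
expected_croupier = Fraction(
    sum(c * n for (y, c), n in win_counts.items()), total_wins
)
print(p_win, p_croupier_9, expected_croupier)  # 575/1296 24/575 3136/575
```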

12. Redundant Wiring

In the circuit shown in the figure below, the electricity is to move from point A to point B.

circuit diagram

The four independently working elements in the circuit are operational (and the current goes through) with the probabilities given in the table:

| Element | \(e_1\) | \(e_2\) | \(e_3\) | \(e_4\) |
| --- | --- | --- | --- | --- |
| Operational with prob. | 0.5 | 0.2 | 0.3 | 0.8 |

If an element fails, no current passes through it.

  1. Is it possible to save on the wire that connects the elements without affecting the functionality of the network? Explain which part of wiring can be removed.

  2. Find the probability that the electricity will flow from A to B.

13. Cross-linked System

Each of the five components in a cross-linked system is operational in a time interval \([0, T]\) with probability 0.6. The components are independent. Let \(E_i\) denote the event that the \(i\)-th component is operational at time \(T\) and \(E_i^c\) the event that it is not. Denote by \(A\) the event that the system is operational at time \(T\).

circuit diagram

  1. Find the probabilities of events \(H_1 = E_2^c E_3^c\), \(H_2 = E_2^c E_3\), \(H_3 = E_2 E_3^c\), and \(H_4 = E_2 E_3\). Do these four probabilities sum up to 1?

  2. What is the probability of the system being operational if \(H_1\) is true; that is, what is \(P(A | H_1)\)? Find also \(P(A | H_2)\), \(P(A | H_3)\) and \(P(A | H_4)\).

  3. Using the results in (1) and (2), find \(P(A)\).

p = 0.6
q = 1 - p

# P(H_i)
H_1 = q**2
H_2 = q * p
H_3 = H_2
H_4 = p**2

# part a
print(H_1, H_2, H_3, H_4)
print(sum([H_1, H_2, H_3, H_4]))

# part b
# P(A|H_i)
# if neither E_2 nor E_3 is working, there's no viable path
p_ah1 = 0

# if E_2 isn't working, then only the E_3 -> E_4/E_5 path is possible
p_ah2 = 1 - q**2

# if E_3 isn't working, only the top path will work
p_ah3 = p

# If E_3 and E_2 are assumed working, there is one guaranteed path for P(A) on the crosslink
p_ah4 = 1

# part c
p_A = p_ah1 * H_1 + p_ah2 * H_2 + p_ah3 * H_3 + p_ah4 * H_4
print(p_A)  # .7056

Note

The following Bayes Network problems may be solved in many ways, including PyMC or other PPLs. OpenBUGS solutions are available at the old class website. PPLs will not be required or allowed on the midterm, nor do we emphasize these types of problems because they can be so time-consuming.

14. Sprinkler Bayes Net

Suppose that a sprinkler (\(S\)) or rain (\(R\)) can make the grass in your yard wet (\(W\)). The probability that the sprinkler was on depends on whether the day was cloudy (\(C\)). The probability of rain also depends on whether the day was cloudy. The DAG for events \(C\), \(S\), \(R\), and \(W\) is shown below:

DAG

The conditional probabilities of the nodes are given in the following tables:

\[\begin{split} \begin{array}{c|c} C^c & C \\ 0.5 & 0.5 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} S^c & S & \text{Condition} \\ 0.50 & 0.50 & C^c \\ 0.90 & 0.10 & C \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} R^c & R & \text{Condition} \\ 0.80 & 0.20 & C^c \\ 0.20 & 0.80 & C \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} W^c & W & \text{Condition} \\ 1 & 0 & S^c R^c \\ 0.10 & 0.90 & S^c R \\ 0.10 & 0.90 & S R^c \\ 0.01 & 0.99 & S R \\ \end{array} \end{split}\]

Approximate the probabilities:

  1. \(P(C | W)\)

  2. \(P(S | W^c)\)

  3. \(P(C | R, W^c)\)

# W = Wet, C = Cloudy, S = Sprinkler, R = Rain
#'n' represents 'not'
p_W_C_S_R = 0.5 * 0.1 * 0.8 * 0.99
p_W_C_S_Rn = 0.5 * 0.1 * 0.2 * 0.9
p_W_C_Sn_R = 0.5 * 0.9 * 0.8 * 0.9
p_W_C_Sn_Rn = 0.5 * 0.9 * 0.2 * 0

p_W_Cn_S_R = 0.5 * 0.5 * 0.2 * 0.99
p_W_Cn_S_Rn = 0.5 * 0.5 * 0.8 * 0.9
p_W_Cn_Sn_R = 0.5 * 0.5 * 0.2 * 0.9
p_W_Cn_Sn_Rn = 0.5 * 0.5 * 0.8 * 0

# joint probabilities P(C, W) and P(C^c, W), summing over S and R
p_C_and_W = p_W_C_S_R + p_W_C_S_Rn + p_W_C_Sn_R + p_W_C_Sn_Rn
p_Cn_and_W = p_W_Cn_S_R + p_W_Cn_S_Rn + p_W_Cn_Sn_R + p_W_Cn_Sn_Rn

# P(C|W) = P(C, W) / P(W)
print(p_C_and_W / (p_C_and_W + p_Cn_and_W))
# output = 0.5757997218358832
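The remaining two queries can be handled the same way by summing the joint distribution over all 16 configurations. The following is a sketch of exact inference by enumeration; the helpers `bern`, `joint`, and `prob` are hypothetical names, and the tables are copied from the problem statement.

```python
from itertools import product

P_C = {1: 0.5, 0: 0.5}
P_S_GIVEN_C = {1: 0.10, 0: 0.50}  # P(S = 1 | C)
P_R_GIVEN_C = {1: 0.80, 0: 0.20}  # P(R = 1 | C)
P_W_GIVEN_SR = {(0, 0): 0.0, (0, 1): 0.90, (1, 0): 0.90, (1, 1): 0.99}


def bern(p, x):
    """Probability of outcome x for a binary variable with P(x = 1) = p."""
    return p if x == 1 else 1 - p


def joint(c, s, r, w):
    return (P_C[c] * bern(P_S_GIVEN_C[c], s) * bern(P_R_GIVEN_C[c], r)
            * bern(P_W_GIVEN_SR[(s, r)], w))


def prob(query, evidence):
    """P(query | evidence); both are dicts keyed by 'C', 'S', 'R', 'W'."""
    num = den = 0.0
    for c, s, r, w in product((0, 1), repeat=4):
        state = {"C": c, "S": s, "R": r, "W": w}
        if all(state[k] == v for k, v in evidence.items()):
            p = joint(c, s, r, w)
            den += p
            if all(state[k] == v for k, v in query.items()):
                num += p
    return num / den


print(round(prob({"C": 1}, {"W": 1}), 4))          # 0.5758
print(round(prob({"S": 1}, {"W": 0}), 4))          # 0.0621
print(round(prob({"C": 1}, {"R": 1, "W": 0}), 4))  # 0.8687
```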

15. Simple Diagnostic Bayes Network

Incidences of diseases \(A\) and \(B\) depend on the exposure (\(E\)). Disease \(A\) is additionally influenced by risk factors (\(R\)). Both diseases lead to symptoms (\(S\)). Results of the test for disease \(B\) (\(T_B\)) are affected also by disease \(A\). A positive test will be denoted as \(T_B = 1\), a negative one as \(T_B = 0\). The Bayes Network and needed conditional probabilities are shown in this figure:

graph

The probabilities of nodes are as follows:

\[\begin{split} \begin{array}{c|c|c} E & 0 & 1 \\ & 0.8 & 0.2 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} R & 0 & 1 \\ & 0.7 & 0.3 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} A & 0 & 1 \\ E^c & 0.9 & 0.1 \\ E & 0.5 & 0.5 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} B & 0 & 1 \\ E^c R^c & 0.95 & 0.05 \\ E^c R & 0.8 & 0.2 \\ E R^c & 0.7 & 0.3 \\ E R & 0.25 & 0.75 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} S & 0 & 1 \\ A^c B^c & 0.99 & 0.01 \\ A^c B & 0.5 & 0.5 \\ A B^c & 0.8 & 0.2 \\ A B & 0.4 & 0.6 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} T_B & 0 & 1 \\ A^c B^c & 0.99 & 0.01 \\ A^c B & 0.2 & 0.8 \\ A B^c & 0.95 & 0.05 \\ A B & 0.15 & 0.85 \\ \end{array} \end{split}\]
  1. What is the probability of disease \(B\), if disease \(A\) is not present, but symptoms \(S\) are present?

  2. What is the probability of exposure \(E\), if symptoms \(S\) are present and the test for \(B\) is positive?

import pymc as pm
import numpy as np
import pytensor
import pytensor.tensor as pt


# PyMC solution based on this notebook:
# https://gist.github.com/Dekermanjian/35aa0340c673cb6cc7f9fbd95e8e764d

table_A = pytensor.shared(np.array([[0.9, 0.1], [0.5, 0.5]]))
table_B = pytensor.shared(
    np.array([[[0.95, 0.05], [0.8, 0.2]], [[0.7, 0.3], [0.25, 0.75]]])
)
table_S = pytensor.shared(
    np.array([[[0.99, 0.01], [0.5, 0.5]], [[0.8, 0.2], [0.4, 0.6]]])
)
table_T_B = pytensor.shared(
    np.array([[[0.99, 0.01], [0.2, 0.8]], [[0.95, 0.05], [0.15, 0.85]]])
)


def lookup_A(E):
    return table_A[E]


def lookup_B(E, R):
    return table_B[E, R]


def lookup_S(A, B):
    return table_S[A, B]


def lookup_T_B(A, B):
    return table_T_B[A, B]


with pm.Model() as m:
    E = pm.Categorical("E", p=np.array([0.8, 0.2]))
    R = pm.Categorical("R", p=np.array([0.7, 0.3]))

    A = pm.Categorical("A", p=lookup_A(E))
    B = pm.Categorical("B", p=lookup_B(E, R))
    S = pm.Categorical("S", p=lookup_S(A, B))
    T_B = pm.Categorical("T_B", p=lookup_T_B(A, B))

    prior_trace = pm.sample_prior_predictive(100000)

prior_trace_dict = {
    "E": prior_trace.prior["E"],
    "R": prior_trace.prior["R"],
    "A": prior_trace.prior["A"],
    "B": prior_trace.prior["B"],
    "S": prior_trace.prior["S"],
    "T_B": prior_trace.prior["T_B"],
}


def conditional_probabilities(trace, var, conditions):
    subset = np.all(
        [trace[k].values == v for k, v in conditions.items()], axis=0
    )
    prob = np.mean(trace[var].values[subset])
    return prob


P_B_given_A0_S1 = conditional_probabilities(
    prior_trace_dict, "B", {"A": 0, "S": 1}
)
P_E_given_S1_T_B1 = conditional_probabilities(
    prior_trace_dict, "E", {"S": 1, "T_B": 1}
)

# print(f"P(B|A=0, S=1) = {P_B_given_A0_S1}")  # ~ 0.8899
# print(f"P(E|S=1, T_B=1) = {P_E_given_S1_T_B1}")  # ~ 0.5571

from myst_nb import glue

glue("glued_txt_15_1", f"P(B|A=0, S=1) = {round(P_B_given_A0_S1, 4)}")
glue("glued_txt_15_2", f"P(E|S=1, T_B=1) = {round(P_E_given_S1_T_B1, 4)}")
Sampling: [A, B, E, R, S, T_B]
'P(B|A=0, S=1) = 0.8841'
'P(E|S=1, T_B=1) = 0.5635'
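Because the network is small, the sampled estimates can be compared against exact values obtained by building the full joint table. This is a sketch using `np.einsum` with the same conditional tables as above; it is an alternative check, not the PyMC approach itself, and the array names are illustrative.

```python
import numpy as np

p_E = np.array([0.8, 0.2])
p_R = np.array([0.7, 0.3])
t_A = np.array([[0.9, 0.1], [0.5, 0.5]])                              # [E, A]
t_B = np.array([[[0.95, 0.05], [0.8, 0.2]],
                [[0.7, 0.3], [0.25, 0.75]]])                          # [E, R, B]
t_S = np.array([[[0.99, 0.01], [0.5, 0.5]],
                [[0.8, 0.2], [0.4, 0.6]]])                            # [A, B, S]
t_T = np.array([[[0.99, 0.01], [0.2, 0.8]],
                [[0.95, 0.05], [0.15, 0.85]]])                        # [A, B, T_B]

# joint[e, r, a, b, s, t] = P(E, R, A, B, S, T_B)
joint = np.einsum("e,r,ea,erb,abs,abt->erabst", p_E, p_R, t_A, t_B, t_S, t_T)

# P(B = 1 | A = 0, S = 1)
p_B_given_A0_S1 = joint[:, :, 0, 1, 1, :].sum() / joint[:, :, 0, :, 1, :].sum()

# P(E = 1 | S = 1, T_B = 1)
p_E_given_S1_T1 = joint[1, :, :, :, 1, 1].sum() / joint[:, :, :, :, 1, 1].sum()

print(round(p_B_given_A0_S1, 4), round(p_E_given_S1_T1, 4))
```

The exact values should land close to the sampled ones; the gap between runs of the sampler reflects Monte Carlo error, since only a few thousand of the 100,000 draws satisfy each conditioning event.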

16. Smart Alarm

Your house has a “smart” alarm system that warns you against burglary with a long sound. The house is located in a seismically active area and the alarm system will emit a short sound if set off by an earthquake. The alarm can sound either way by error, or not sound even in the case of earthquake or burglary. You have two neighbors, Mary and John, who do not know each other. If they hear the alarm they call you, but this is not guaranteed. The likelihood of their call depends on the type of sound, if any. They also call you from time to time just to chat. In the tables below, \(A = 0\), \(1\), and \(2\) denote no sound, a short sound, and a long sound, respectively, and \(M = 1\) and \(J = 1\) denote a call from Mary and from John. The probabilities are:

\[\begin{split} \begin{array}{c|c|c} E & 0 & 1 \\ & 0.998 & 0.002 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} B & 0 & 1 \\ & 0.999 & 0.001 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c|c} A & 0 & 1 & 2 \\ E^c B^c & 0.998 & 0.001 & 0.001 \\ E^c B & 0.01 & 0.12 & 0.87 \\ E B^c & 0.01 & 0.75 & 0.24 \\ E B & 0.008 & 0.092 & 0.9 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} M & 0 & 1 \\ A = 0 & 0.99 & 0.01 \\ A = 1 & 0.4 & 0.6 \\ A = 2 & 0.3 & 0.7 \\ \end{array} \end{split}\]
\[\begin{split} \begin{array}{c|c|c} J & 0 & 1 \\ A = 0 & 0.95 & 0.05 \\ A = 1 & 0.1 & 0.9 \\ A = 2 & 0.05 & 0.95 \\ \end{array} \end{split}\]

Write BUGS code to approximate conditional probabilities of nodes given the evidence.

  1. Find the probability of a burglary if a short sound was emitted by the alarm;

  2. Find the probability of Mary calling you if John called you;

  3. Find the probability of a short sound if John called you and there was no earthquake;

  4. Find the probability of any sound if there was no burglary and Mary did not call you;

  5. Find the probability of an earthquake if a long sound was emitted by the alarm.
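The exercise asks for BUGS code, but approximations from any sampler can be checked against exact values computed from the full joint table. The sketch below is such a check in NumPy, not a BUGS solution. It assumes, based on the alarm table, that \(A = 1\) codes the short sound and \(A = 2\) the long sound; array names are illustrative.

```python
import numpy as np

p_E = np.array([0.998, 0.002])
p_B = np.array([0.999, 0.001])
# t_A[e, b, a]: alarm state given earthquake and burglary
t_A = np.array([
    [[0.998, 0.001, 0.001], [0.01, 0.12, 0.87]],
    [[0.01, 0.75, 0.24], [0.008, 0.092, 0.9]],
])
t_M = np.array([[0.99, 0.01], [0.4, 0.6], [0.3, 0.7]])    # t_M[a, m]
t_J = np.array([[0.95, 0.05], [0.1, 0.9], [0.05, 0.95]])  # t_J[a, j]

# joint[e, b, a, m, j] = P(E, B, A, M, J)
joint = np.einsum("e,b,eba,am,aj->ebamj", p_E, p_B, t_A, t_M, t_J)

SHORT, LONG = 1, 2  # assumed coding of the alarm states

p1 = joint[:, 1, SHORT].sum() / joint[:, :, SHORT].sum()      # P(B | short)
p2 = joint[..., 1, 1].sum() / joint[..., 1].sum()             # P(M | J)
p3 = joint[0, :, SHORT, :, 1].sum() / joint[0, ..., 1].sum()  # P(short | J, E^c)
p4 = joint[:, 0, 1:, 0].sum() / joint[:, 0, :, 0].sum()       # P(sound | B^c, M^c)
p5 = joint[1, :, LONG].sum() / joint[:, :, LONG].sum()        # P(E | long)

for i, p in enumerate([p1, p2, p3, p4, p5], start=1):
    print(f"({i}) {p:.4f}")
```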