# init repo notebook
!git clone https://github.com/rramosp/ppdl.git > /dev/null 2> /dev/null
!mv -n ppdl/content/init.py ppdl/content/local . 2> /dev/null
!pip install -r ppdl/content/requirements.txt > /dev/null

Lab 02.02.1: Marginals, Conditionals, and Joints

## Ignore this cell
!pip install ppdl==0.1.5 rlxmoocapi==0.1.0 --quiet
import inspect
from rlxmoocapi import submit, session
from ppdl.samplers import FluSeasonContactSampler

course_id = "ppdl.v1"
endpoint = "https://m5knaekxo6.execute-api.us-west-2.amazonaws.com/dev-v0001/rlxmooc"
lab = "L02.02.01"

Log in with your username and password:

session.LoginSequence(
    endpoint=endpoint,
    course_id=course_id,
    lab_id=lab,
    varname="student"
    );

Base libraries:

import pandas as pd
import numpy as np

Task 1: Marginal Probabilities

Consider the following probabilistic graphical model that represents the relationship between three random variables:

https://raw.githubusercontent.com/rramosp/ppdl/main/content/local/imgs/flu_season_contact.png
  • \(\mathcal{X}_1 \in \{"winter", "spring", "summer", "autumn"\}\): the current season.

  • \(\mathcal{X}_2 \in \{0, 1\}\): whether a person had contact with someone who had flu in the last week.

  • \(\mathcal{Y} \in \{0, 1\}\): whether a person has flu.

Consider the following historical data:

sampler = FluSeasonContactSampler()
df = sampler(n_samples=1000, seed=42)
df

CHALLENGE: Implement a function that returns the marginal probability of a specific value for any of the three variables in the dataset:

def marginal(data, variable, value):
    """
    Implement this function to calculate the marginal probability of a variable taking a given value.

    Parameters
    ----------
    data: DataFrame
        The dataframe containing the data.
    variable: str
        The variable to calculate the marginal probability of.
    value: str | int
        The value of the variable to calculate the marginal probability of.

    Returns
    -------
    prob : float
        The marginal probability that the variable takes the given value.
    """
    ...
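
As an illustration of the idea, an empirical marginal can be read as a relative frequency: the share of rows in which a column equals a given value. The sketch below uses a small hand-made table, not the output of `FluSeasonContactSampler`; the names `toy` and `empirical_marginal` are hypothetical, and only the column names `x_1`, `x_2`, and `y` are taken from the test calls in this lab. It is one possible approach, not necessarily the one the grader expects.

```python
import pandas as pd

# Hypothetical toy data (an assumption for illustration only).
toy = pd.DataFrame({
    "x_1": ["winter", "winter", "summer", "spring", "winter"],
    "x_2": [1, 0, 0, 1, 1],
    "y":   [1, 0, 0, 0, 1],
})

def empirical_marginal(data, variable, value):
    """Relative frequency of rows where `variable` equals `value`."""
    # The comparison yields a boolean Series; its mean is the fraction of True rows.
    return float((data[variable] == value).mean())

p_winter = empirical_marginal(toy, "x_1", "winter")  # 3 of 5 rows match
p_flu = empirical_marginal(toy, "y", 1)              # 2 of 5 rows match
```

Taking the mean of a boolean mask avoids counting and dividing by hand, and it works the same way for string and integer columns.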

Test your code. The following calls should return these values:

> marginal(df, "y", 1)
0.34

> marginal(df, "x_1", "winter")
0.271

> marginal(df, "x_2", 0)
0.482

marginal(df, "y", 1)
marginal(df, "x_1", "winter")
marginal(df, "x_2", 0)
student.submit_task(namespace=globals(), task_id="T1");

Task 2: Conditional Probabilities

Using the same graphical model, consider the following historical data:

df = sampler(n_samples=1000, seed=0)
df

CHALLENGE: Implement the conditional function, so that it returns the conditional probability \(P(\mathcal{Y}|\mathcal{X}_1, \mathcal{X}_2)\).

def conditional(data, x_1_value, x_2_value, y_value):
    """
    Implement this function to calculate the conditional probability of y given x_1 and x_2.

    Parameters
    ----------
    data: DataFrame
        The dataframe containing the data.
    x_1_value: str
        The season to condition on.
    x_2_value: int
        The contact indicator to condition on (0 or 1).
    y_value: int
        The flu indicator whose probability is calculated (0 or 1).

    Returns
    -------
    prob : float
        The conditional probability of y_value given x_1_value and x_2_value.
    """
    ...
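
One way to think about an empirical conditional probability is as a marginal computed on a restricted table: first keep only the rows that match the conditioning values, then measure how often the target value appears among them. The sketch below shows this on a hand-made table; `toy` and `empirical_conditional` are hypothetical names, the data is invented for illustration, and returning NaN for an unobserved condition is a design assumption rather than something the lab specifies.

```python
import pandas as pd

# Hypothetical toy data (an assumption for illustration only).
toy = pd.DataFrame({
    "x_1": ["winter", "winter", "winter", "spring", "spring", "winter"],
    "x_2": [1, 1, 1, 0, 0, 0],
    "y":   [1, 1, 0, 0, 1, 0],
})

def empirical_conditional(data, x_1_value, x_2_value, y_value):
    """Estimate P(y = y_value | x_1 = x_1_value, x_2 = x_2_value) from counts."""
    # Keep only the rows that match the conditioning event.
    given = data[(data["x_1"] == x_1_value) & (data["x_2"] == x_2_value)]
    if given.empty:
        # The conditioning event was never observed, so the estimate is undefined.
        return float("nan")
    # Fraction of the matching rows in which y takes the requested value.
    return float((given["y"] == y_value).mean())

p = empirical_conditional(toy, "winter", 1, 1)  # 2 of the 3 (winter, contact) rows
```

Because the probabilities over `y` are computed within the same filtered rows, the estimates for `y_value=0` and `y_value=1` sum to one for any observed condition.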

Test your code. The following calls should return these values:

> conditional(df, "spring", 0, 1)
0.1417

> conditional(df, "winter", 1, 1)
0.6590

> conditional(df, "autumn", 1, 0)
0.4919

# probability of flu given season is spring and contact is 0
conditional(df, "spring", 0, 1) 
# probability of flu given season is winter and contact is 1
conditional(df, "winter", 1, 1)
# probability of no flu given season is autumn and contact is 1
conditional(df, "autumn", 1, 0)
student.submit_task(namespace=globals(), task_id="T2");

Task 3: Joint Probabilities

Using the same graphical model, consider the following historical data:

df = sampler(n_samples=1000, seed=20)
df

CHALLENGE: Implement the joint function so that it returns the joint probability \(P(\mathcal{Y}, \mathcal{X}_1, \mathcal{X}_2)\) for any combination of input values. You must use the functions from Tasks 1 and 2, deducing the required expression from the graphical model.

def joint(data, x_1_value, x_2_value, y_value):
    """
    Implement this function to calculate the joint probability of x_1, x_2, and y.

    Parameters
    ----------
    data: DataFrame
        The dataframe containing the data.
    x_1_value: str
        The season.
    x_2_value: int
        Whether the person had contact with someone with flu (0 or 1).
    y_value: int
        Whether the person has flu (0 or 1).

    Returns
    -------
    prob : float
        The joint probability of the three given values.
    """
    ...
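
As a starting point for the deduction, the chain rule of probability holds for any three variables, whatever the graph looks like:

\[
P(\mathcal{Y}, \mathcal{X}_1, \mathcal{X}_2) = P(\mathcal{Y} \mid \mathcal{X}_1, \mathcal{X}_2)\, P(\mathcal{X}_1, \mathcal{X}_2)
\]

How \(P(\mathcal{X}_1, \mathcal{X}_2)\) simplifies depends on the edges in the graphical model. Under the assumption (to be checked against the figure, which is not reproduced in this text) that there is no edge between season and contact and both are parents of \(\mathcal{Y}\), the two are marginally independent and the expression becomes:

\[
P(\mathcal{Y}, \mathcal{X}_1, \mathcal{X}_2) = P(\mathcal{Y} \mid \mathcal{X}_1, \mathcal{X}_2)\, P(\mathcal{X}_1)\, P(\mathcal{X}_2)
\]

In that case every factor on the right-hand side is either a conditional of the kind computed in Task 2 or a marginal of the kind computed in Task 1.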

Test your code. The following calls should return these values:

> joint(df, "spring", 1, 0)
0.0923

> joint(df, "winter", 1, 1)
0.0745

> joint(df, "autumn", 1, 0)
0.0681

# probability of no flu and spring and contact
joint(df, "spring", 1, 0)
# probability of flu and winter and contact
joint(df, "winter", 1, 1)
# probability of no flu and autumn and contact
joint(df, "autumn", 1, 0)
student.submit_task(namespace=globals(), task_id="T3");