---
title: "tidydice"
author: "Roland Krasser"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{tidydice}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

### Introduction

A basic understanding of probability and statistics is crucial for understanding data and discovering meaningful patterns. A great way to teach probability and statistics is to start with an experiment, like rolling a dice or flipping a coin.

This package simulates rolling a dice and flipping a coin. Each experiment generates a tibble. Dice rolls and coin flips are simulated using sample(). The properties of the dice, like the number of sides, can be changed. A coin flip is simulated using a two-sided dice. Experiments can be combined with the pipe operator.

tidydice package on GitHub: [https://github.com/rolkra/tidydice](https://github.com/rolkra/tidydice)

### Setup

As the tidydice functions fit well into the tidyverse, we load the dplyr package. For quick visualisations we use the explore package. To create more flexible graphics, use ggplot2.

```{r message=FALSE, warning=FALSE}
library(tidydice)
library(dplyr)
library(explore)
```

### Roll a dice

#### Example

6 dice are rolled 3 times using roll_dice(). The result of the dice experiment is visualised using plot_dice().

```{r fig.height=3, fig.width=6}
set.seed(123)
roll_dice(times = 6, rounds = 3) %>%
  plot_dice()
```

#### Roll a dice once

The output of roll_dice() is a tibble. Each row represents a dice roll. Without parameters, a dice is rolled once. You can use plot_dice() to visualise the result.

```{R echo=TRUE}
set.seed(123)
roll_dice()
```

```{R echo=TRUE, fig.width=6, fig.height=1}
set.seed(123)
roll_dice() %>%
  plot_dice()
```

By default, success is defined as result = 6, while result = 1..5 is not a success. In this case the result is 2, so it is not a success.
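
The introduction notes that rolls are simulated using sample(). As a rough illustration of that idea, the following base R sketch draws results and flags successes. It is not the package's implementation: the helper name simulate_rolls and the data frame layout are assumptions made for this example only.

```r
# Minimal base R sketch of one dice experiment (not the package's actual code).
# `simulate_rolls` is a hypothetical helper name used only for illustration.
simulate_rolls <- function(times = 1, sides = 6, success = 6, prob = NULL) {
  # sample() draws `times` results with replacement from 1..sides;
  # prob = NULL means every side is equally likely
  result <- sample(seq_len(sides), size = times, replace = TRUE, prob = prob)
  data.frame(
    nr = seq_len(times),            # running number of the roll
    result = result,                # side that came up
    success = result %in% success   # TRUE when the result counts as a success
  )
}

set.seed(123)
rolls <- simulate_rolls(times = 4)
rolls
sum(rolls$success)  # number of successes in these 4 rolls
```

Passing a vector to success (for example c(2, 6)) or a weight vector to prob changes the definition of success or the fairness of the dice in this sketch.
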

#### Define success

If we define both result = 2 and result = 6 as success, a result of 2 is treated as a success.

```{R echo=TRUE}
set.seed(123)
roll_dice(success = c(2,6))
```

#### Unfair dice

By default, the dice is fair, so every result (1..6) has the same probability. You can change this with the prob parameter.

```{R echo=TRUE}
set.seed(123)
roll_dice(prob = c(0,0,0,0,0,1))
```

In this case we created a dice that always gets result = 6 (with 100% probability).

#### Untypical dice

By default, the dice has 6 sides. You can change this with the sides parameter. Here we use a dice with 12 sides, so result can now take a value between 1 and 12. But result = 6 is still the default success.

```{R echo=TRUE}
set.seed(123)
roll_dice(sides = 12)
```

#### Roll a dice 4 times

```{R echo=TRUE}
set.seed(123)
roll_dice(times = 4)
```

```{R echo=TRUE, fig.width=6, fig.height=1}
set.seed(123)
roll_dice(times = 4) %>%
  plot_dice()
```

We get 1 success.

#### Define rounds

```{R echo=TRUE}
set.seed(123)
roll_dice(times = 4, rounds = 2)
```

```{R echo=TRUE, fig.width=6, fig.height=2}
set.seed(123)
roll_dice(times = 4, rounds = 2) %>%
  plot_dice()
```

Rolling the dice 4 times is repeated. In the first round we got 1 success, in the second round 2 successes.

#### Use agg

A convenient way to aggregate the result is to use the agg parameter. Now we get one line per round.

```{R echo=TRUE}
set.seed(123)
roll_dice(times = 4, rounds = 2, agg = TRUE)
```

You can also aggregate by hand using dplyr.

```{R echo=TRUE}
set.seed(123)
roll_dice(times = 4, rounds = 2) %>%
  group_by(experiment, round) %>%
  summarise(times = n(),
            success = sum(success))
```

#### Visualise result

You can use any package or tool you like to visualise the result. In this example we use the explore package.

```{R echo=TRUE, fig.height=4, fig.width=7}
set.seed(123)
roll_dice(times = 100) %>%
  explore(result, title = "Rolling a dice 100x")
```

In 15% of the cases we got a six.
This is close to the expected value of 100/6 = 16.67%.

If we increase the times parameter to 10000, the results are more balanced.

```{R echo=TRUE, fig.height=4, fig.width=7}
set.seed(123)
roll_dice(times = 10000) %>%
  explore(result, title = "Rolling a dice 10000x")
```

#### Visualise success

If we repeat the experiment of rolling a dice 100x with rounds = 100, we get a distribution with a peak at about 17 (16.67 is the expected value).

```{R echo=TRUE, fig.height=4, fig.width=7}
set.seed(123)
roll_dice(times = 100, rounds = 100, agg = TRUE) %>%
  explore(success,
          title = "Rolling 100 dice 100x",
          auto_scale = FALSE)
```

If we increase rounds from 100 to 10000, we get a more symmetric shape. We see that success below 5 and success above 30 are very unlikely.

```{R echo=TRUE, fig.height=4, fig.width=7}
set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>%
  explore(success,
          title = "Rolling 100 dice 10000x",
          auto_scale = FALSE)
```

This shape is already very close to the binomial distribution.

```{R echo=TRUE, fig.height=4, fig.width=7}
binom_dice(times = 100) %>%
  plot_binom(title = "Binomial distribution, rolling 100 dice")
```

```{R echo=TRUE}
set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>%
  mutate(check = ifelse(success < 5 | success > 30, 1, 0)) %>%
  count(check)
```

In only 4 of 10000 cases (0.04%) success is below 5 or above 30, so the probability of getting this result is very low. We can check that with the binomial distribution too:

```{R echo=TRUE}
binom_dice(times = 100) %>%
  filter(success < 5 | success > 30)
```

```{R echo=TRUE}
binom_dice(times = 100) %>%
  filter(success < 5 | success > 30) %>%
  summarise(check_pct = sum(pct))
```

The probability of getting this result is 0.04% (based on the binomial distribution).

#### Combine experiments

Let's add an experiment where you have 10 extra dice. The shape of the distribution changes.

```{R echo=TRUE, fig.height=4, fig.width=7}
set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>%
  roll_dice(times = 110, rounds = 10000, agg = TRUE) %>%
  explore(success,
          target = experiment,
          title = "Rolling a dice 100/110x",
          auto_scale = FALSE)
```

You can add as many experiments as you like (as long as they generate the same data structure). Adding an experiment with times = 150 will generate a lower but wider shape.

```{R echo=TRUE, fig.height=4, fig.width=7}
set.seed(123)
roll_dice(times = 100, rounds = 10000, agg = TRUE) %>%
  roll_dice(times = 110, rounds = 10000, agg = TRUE) %>%
  roll_dice(times = 150, rounds = 10000, agg = TRUE) %>%
  explore(success,
          target = experiment,
          title = "Rolling a dice 100/110/150x",
          auto_scale = FALSE)
```

#### Binomial distribution

When rolling a dice 100x, getting between 10 and 23 successes has a probability of over 94%.

```{R echo=TRUE, fig.height=4, fig.width=7}
binom_dice(times = 100) %>%
  plot_binom(highlight = c(10:23))
```

### Flip a coin

Internally the package handles coins as dice with only two sides. By default, success is defined as result = 2, while result = 1 is not a success.

#### Flip a coin 10x

```{R echo=TRUE}
set.seed(123)
flip_coin(times = 10)
```

In this case the results are 6x 2 and 4x 1. We can use the describe() function of the explore package to get a good overview.

```{R echo=TRUE}
set.seed(123)
flip_coin(times = 10) %>%
  describe(success)
```

Or just use the agg parameter:

```{R echo=TRUE}
set.seed(123)
flip_coin(times = 10, agg = TRUE)
```

#### Define rounds

The rounds parameter can be used as in roll_dice().

```{R echo=TRUE}
set.seed(123)
flip_coin(times = 10, rounds = 4, agg = TRUE)
```

#### Visualise result

```{R echo=TRUE, fig.height=4, fig.width=7}
set.seed(123)
flip_coin(times = 10, rounds = 4) %>%
  plot_coin()
```

#### Combine experiments

```{R echo=TRUE}
set.seed(123)
flip_coin(times = 10, agg = TRUE) %>%
  flip_coin(times = 15, agg = TRUE)
```

#### Binomial Distribution

```{R echo=TRUE}
binom_coin(times = 10)
```

```{R echo=TRUE, fig.height=4, fig.width=7}
binom_coin(times = 10) %>%
  plot_binom(title = "Binomial distribution,\n10 coin flips")
```

### Dice design

#### Default settings

```{r fig.height=1, fig.width=6}
set.seed(123)
roll_dice(times = 6) %>%
  plot_dice()
```

#### Change color

```{r fig.height=1, fig.width=6}
set.seed(123)
roll_dice(times = 6) %>%
  plot_dice(fill = "black", line_color = "white", point_color = "white")
```

```{r fig.height=1, fig.width=6}
set.seed(123)
roll_dice(times = 6) %>%
  plot_dice(fill = "lightblue", fill_success = "gold")
```

```{r fig.height=1, fig.width=6}
set.seed(123)
roll_dice(times = 6) %>%
  plot_dice(fill = "darkgrey",
            fill_success = "darkblue",
            line_color = "white",
            point_color = "white")
```

#### Dice type

```{r fig.height=1, fig.width=6}
set.seed(123)
roll_dice(times = 6) %>%
  plot_dice(detailed = TRUE)
```

```{r fig.height=1, fig.width=6}
set.seed(123)
roll_dice(times = 6) %>%
  plot_dice(detailed = FALSE)
```

#### Limits

plot_dice() is limited to 1 experiment with a maximum of 10 times x 10 rounds.

```{r fig.height=6, fig.width=6}
set.seed(123)
roll_dice(times = 10, rounds = 10) %>%
  plot_dice(detailed = FALSE, fill_success = "gold")
```

#### Force

You can force a result using force_dice() and force_coin().

```{r fig.height=1, fig.width=6}
force_dice(1:6) %>%
  plot_dice()
```

```{r fig.height=1, fig.width=6}
force_dice(rep(6, times = 6)) %>%
  plot_dice()
```

We can combine two forced dice rolls using the pipe operator and the round parameter.

```{r fig.height=1, fig.width=6}
force_dice(rep(5, times = 3), round = 1) %>%
  force_dice(rep(6, times = 3), round = 2)
```

```{r fig.height=1, fig.width=6}
set.seed(123)
force_dice(rep(6, times = 3)) %>%
  roll_dice(times = 3)
```

In the first experiment we get a 6 three times (forced), but none in the second experiment.

### Using Dice Formula

If you want to do more complex dice rolls, use ```roll_dice_formula()```

```{r}
# roll 1 dice with 6 sides
roll_dice_formula(dice_formula = "1d6", seed = 123)
```

```{r}
roll_dice_formula(
  dice_formula = "4d6", # 4 dice with 6 sides
  success = 15:24,      # success is defined as sum between 15 and 24
  seed = 123            # random seed to make it reproducible
)
```

```{r}
# roll 4 dice with 6 sides
roll_dice_formula(
  dice_formula = "4d6", # 4 dice with 6 sides
  rounds = 10,          # repeat 10 times
  success = 15:24,      # success is defined as sum between 15 and 24
  seed = 123            # random seed to make it reproducible
)
```

```{r}
roll_dice_formula(
  dice_formula = "4d6e3", # 4 dice with 6 sides, explode on a 3
  rounds = 5,             # repeat 5 times
  success = 15:24,        # success is defined as sum between 15 and 24
  seed = 123              # random seed to make it reproducible
)
```

```{r fig.height=3, fig.width=6}
roll_dice_formula(
  dice_formula = "4d6+1d10", # 4 dice with 6 sides + 1 dice with 10 sides
  rounds = 1000) %>%         # repeat 1000 times
  explore_bar(result, numeric = TRUE) # visualise result
```

Other examples for dice_formula:

- ```1d6``` = roll one 6-sided dice
- ```1d8``` = roll one 8-sided dice
- ```1d12``` = roll one 12-sided dice
- ```2d6``` = roll two 6-sided dice
- ```1d6e6``` = roll one 6-sided dice, explode dice on a 6
- ```3d6kh2``` = roll three 6-sided dice, keep highest 2 rolls
- ```3d6kl2``` = roll three 6-sided dice, keep lowest 2 rolls
- ```4d6kh3e6``` = roll four 6-sided dice, keep highest 3 rolls, but explode on a 6
- ```1d20+4``` = roll one 20-sided dice, and add 4
- ```1d4+1d6``` = roll one 4-sided dice and one 6-sided dice, and sum the results
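
To make the notation more concrete, here is a minimal base R sketch of what a few of these formulas compute. It is not how roll_dice_formula() parses or evaluates a formula: roll_sum and roll_keep_highest are hypothetical helpers, and the keep-highest behaviour is assumed from the descriptions in the list above. Exploding dice are left out because their exact rules are not described here.

```r
# Base R sketch of some dice formulas (assumed semantics, not the package's parser).
# `roll_sum` and `roll_keep_highest` are hypothetical helper names.

# "NdS": roll n dice with `sides` sides and sum the results
roll_sum <- function(n, sides) {
  sum(sample(seq_len(sides), size = n, replace = TRUE))
}

# "NdSkhK": roll n dice, keep the `keep` highest results and sum them
roll_keep_highest <- function(n, sides, keep) {
  rolls <- sample(seq_len(sides), size = n, replace = TRUE)
  sum(sort(rolls, decreasing = TRUE)[seq_len(keep)])
}

set.seed(123)
roll_sum(4, 6)                     # like "4d6"
roll_sum(4, 6) + roll_sum(1, 10)   # like "4d6+1d10"
roll_keep_highest(3, 6, 2)         # like "3d6kh2"
roll_sum(1, 20) + 4                # like "1d20+4"
```

Wrapping one of these calls in replicate() gives repeated rounds, similar in spirit to the rounds parameter.
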