  • add_tailor() specifies post-processing steps to apply via a tailor.

  • remove_tailor() removes the tailor as well as any downstream objects that might get created after the tailor is used for post-processing, such as the fitted tailor.

  • update_tailor() first removes the tailor, then replaces it with the new tailor.

Usage

add_tailor(x, tailor, prop = NULL, method = NULL, ...)

remove_tailor(x)

update_tailor(x, tailor, ...)

Arguments

x

A workflow

tailor

A tailor created using tailor::tailor(). The tailor should not have been trained already with tailor::fit(); workflows will handle training internally.

prop

The proportion of the data in fit.workflow() that should be used to train the preprocessor and model, with the remainder held back to estimate the postprocessor. Only relevant for postprocessors that require estimation; see the Data Usage section below to learn more. Defaults to 2/3.

method

The method with which to split the data in fit.workflow(), as a character vector. Only relevant for postprocessors that require estimation, and not required when resampling the workflow with tune. If fit.workflow(data) arose as training(split_object), this argument can usually be supplied as class(split_object). Defaults to "mc_split", which randomly samples fit.workflow(data) into two sets, similarly to rsample::initial_split(). See the Data Usage section below to learn more; a sketch of supplying prop and method together follows these argument descriptions.

...

Not used.
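As a rough sketch of supplying prop and method together (here, tailor::adjust_probability_calibration() stands in for a postprocessor that requires estimation, and d is a placeholder for a data frame with a factor outcome class; neither is defined on this page):

library(tailor)
library(workflows)
library(magrittr)

# a postprocessor that requires estimation: probability calibration
tlr <- tailor("binary") %>%
  adjust_probability_calibration("logistic")

wf <- workflow() %>%
  add_formula(class ~ .) %>%
  add_model(parsnip::logistic_reg()) %>%
  # at fit() time, train the preprocessor and model on 3/4 of the rows,
  # holding back 1/4 (sampled at random) to estimate the calibrator
  add_tailor(tlr, prop = 3/4, method = "mc_split")

# fit(wf, data = d)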

Value

x, updated with either a new or removed tailor postprocessor.

Data Usage

While preprocessors and models are trained on data in the usual sense, postprocessors are trained on predictions on data. When a workflow is fitted, the user supplies training data with the data argument. When workflows don't contain a postprocessor that requires training, they can use all of the supplied data to train the preprocessor and model. However, in the case where a postprocessor must be trained as well, training the preprocessor and model on all of the supplied data would leave nothing to train the postprocessor with. If that were the case, workflows would need to predict() from the preprocessor and model on the same data that they were trained on, with the postprocessor then training on those predictions.

Predictions on data that a model was trained on likely follow different distributions than predictions on unseen data. Thus, workflows must split the supplied data into two training sets, where the first is used to train the preprocessor and model and the second is passed to that trained preprocessor and model to generate predictions, which then form the training data for the postprocessor.

The arguments prop and method parameterize how that data is split up. prop determines the proportion of rows in fit.workflow(data) that are allotted to training the preprocessor and model, while the rest are used to train the postprocessor. method determines how that split occurs; since fit.workflow() just takes in a data frame, the function doesn't have any information on how that dataset came to be. For example, data could have been created as:

split <- rsample::initial_split(some_other_data)
data <- rsample::training(split)

...in which case it's okay to randomly allot some rows of data to train the preprocessor and model and the rest to train the postprocessor. However, data could also have arisen as:

boots <- rsample::bootstraps(some_other_data)
split <- rsample::get_rsplit(boots, 1)
data <- rsample::assessment(split)

In this case, some of the rows in data will be duplicated. Randomly allotting some of them to train the preprocessor and model and others to train the postprocessor would thus likely result in the same rows appearing in both datasets, so the preprocessor and model would generate predictions on rows they've seen before. Similar problems could arise with other resampling schemes, like time-based splits. The method argument ensures that data is allotted properly (and is handled internally by the tune package when resampling workflows).
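For instance, if data arose from rsample as in the first snippet above, the split's class can be passed to method so that workflows allots rows the same way (a sketch reusing the tlr placeholder from the Arguments section; a complete workflow would also contain a preprocessor and model):

split <- rsample::initial_split(some_other_data)
data <- rsample::training(split)

# `class(split)` includes "mc_split", matching how `data` was created
wf <- workflow() %>%
  add_tailor(tlr, method = class(split))

# fit(wf, data = data)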

Examples

library(tailor)
library(magrittr)

tailor <- tailor("binary")
tailor_1 <- adjust_probability_threshold(tailor, .1)

workflow <- workflow() %>%
  add_tailor(tailor_1)

workflow
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
#> Postprocessor: tailor
#> 
#> ── Postprocessor ─────────────────────────────────────────────────────────
#> 
#> ── tailor ────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#> 
#> • Adjust probability threshold to 0.1.

remove_tailor(workflow)
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None

update_tailor(workflow, adjust_probability_threshold(tailor, .2))
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
#> Postprocessor: tailor
#> 
#> ── Postprocessor ─────────────────────────────────────────────────────────
#> 
#> ── tailor ────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#> 
#> • Adjust probability threshold to 0.2.