add_tailor() specifies post-processing steps to apply through the usage of a tailor.

remove_tailor() removes the tailor, as well as any downstream objects that might get created after the tailor is used for post-processing, such as the fitted tailor.

update_tailor() first removes the tailor, then replaces the previous tailor with the new one.
Usage
add_tailor(x, tailor, prop = NULL, method = NULL, ...)
remove_tailor(x)
update_tailor(x, tailor, ...)
Arguments
- x
A workflow

- tailor
A tailor created using tailor::tailor(). The tailor should not have been trained already with tailor::fit(); workflows will handle training internally.

- prop
The proportion of the data in fit.workflow() that should be held back specifically for estimating the postprocessor. Only relevant for postprocessors that require estimation; see the Data Usage section below to learn more. Defaults to 2/3.

- method
The method with which to split the data in fit.workflow(), as a character vector. Only relevant for postprocessors that require estimation, and not required when resampling the workflow with tune. If fit.workflow(data) arose as training(split_object), this argument can usually be supplied as class(split_object). Defaults to "mc_split", which randomly samples fit.workflow(data) into two sets, similarly to rsample::initial_split(). See the Data Usage section below to learn more.

- ...
Not used.
Data Usage
While preprocessors and models are trained on data in the usual sense,
postprocessors are trained on predictions on data. When a workflow
is fitted, the user supplies training data with the data argument.
When workflows don't contain a postprocessor that requires training,
they can use all of the supplied data to train the preprocessor and model.
However, in the case where a postprocessor must be trained as well,
training the preprocessor and model on all of data would leave no data
to train the postprocessor with; workflows would then need to predict()
from the preprocessor and model on the same data that they were trained on,
with the postprocessor training on those predictions. Predictions on data
that a model was trained on likely follow different distributions than
predictions on unseen data; thus, workflows must split up the supplied data
into two training sets, where the first is used to train the preprocessor
and model, and the second is passed to that trained preprocessor and model
to generate predictions, which then form the training data for the
postprocessor.
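Conceptually, that internal split behaves like rsample::initial_split(). The following is an illustrative sketch, not workflows' internal code; mtcars stands in for the data passed to fit.workflow():

```r
library(rsample)

# randomly allot `prop` of the rows to training the preprocessor and model...
split <- initial_split(mtcars, prop = 2/3)
model_data <- training(split)

# ...and reserve the remainder: the fitted model's predictions on these
# unseen rows become the training data for the postprocessor
calibration_data <- testing(split)
```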
The arguments prop and method parameterize how that data is split up.
prop determines the proportion of rows in fit.workflow(data) that are
allotted to training the preprocessor and model, while the rest are used to
train the postprocessor. method determines how that split occurs; since
fit.workflow() just takes in a data frame, the function doesn't have
any information on how that dataset came to be. For example, data could
have been created as:
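One such origin (a hedged sketch of the elided example; some_other_data is a placeholder name, matching the example that follows):

```r
split <- rsample::initial_split(some_other_data)
data <- rsample::training(split)
```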
...in which case it's okay to randomly allot some rows of data to train the
preprocessor and model and the rest to train the postprocessor. However,
data could also have arisen as:
boots <- rsample::bootstraps(some_other_data)
split <- rsample::get_rsplit(boots, 1)
data <- rsample::assessment(split)
In this case, some of the rows in data will be duplicated. Thus, randomly
allotting some of them to train the preprocessor and model and others to train
the postprocessor would likely result in the same rows appearing in both
datasets, with the preprocessor and model then generating predictions on
rows they've seen before. Similarly problematic situations could arise in the
context of other resampling schemes, like time-based splits.
The method
argument ensures that data is allotted properly (and is
internally handled by the tune package when resampling workflows).
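To sketch how method might be supplied by hand, assuming the bootstrap scenario above (wf and tailor_obj are placeholder objects, not from the original examples):

```r
# class(split) for an rsplit drawn from rsample::bootstraps() includes
# "boot_split", letting workflows allot rows in a resample-aware way
# rather than purely at random
wf <- workflow() %>%
  add_tailor(tailor_obj, prop = 2/3, method = class(split))
```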
Examples
library(tailor)
library(magrittr)
tailor <- tailor("binary")
tailor_1 <- adjust_probability_threshold(tailor, .1)
workflow <- workflow() %>%
add_tailor(tailor_1)
workflow
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
#> Postprocessor: tailor
#>
#> ── Postprocessor ─────────────────────────────────────────────────────────
#>
#> ── tailor ────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#>
#> • Adjust probability threshold to 0.1.
remove_tailor(workflow)
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
update_tailor(workflow, adjust_probability_threshold(tailor, .2))
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
#> Postprocessor: tailor
#>
#> ── Postprocessor ─────────────────────────────────────────────────────────
#>
#> ── tailor ────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#>
#> • Adjust probability threshold to 0.2.