- add_tailor() specifies post-processing steps to apply through the usage of a tailor.
- remove_tailor() removes the tailor as well as any downstream objects that might get created after the tailor is used for post-processing, such as the fitted tailor.
- update_tailor() first removes the tailor, then replaces the previous tailor with the new one.
Arguments
- x
A workflow
- tailor
A tailor created using tailor::tailor(). The tailor should not have been trained already with tailor::fit(); workflows will handle training internally.
- ...
Not used.
Data Usage
While preprocessors and models are trained on data in the usual sense,
postprocessors are trained on predictions on data. When a workflow
is fitted, the user typically supplies training data with the data argument.
When workflows don't contain a postprocessor that requires training,
users can pass all of the available data to the data argument to train the
preprocessor and model. However, in the case where a postprocessor must be
trained as well, allotting all of the available data to the data argument
to train the preprocessor and model would leave no data
to train the postprocessor with. If that were the case, workflows
would need to predict() from the preprocessor and model on the same data
that they were trained on, with the postprocessor then training on those
predictions. Predictions on data that a model was trained on likely follow
different distributions than predictions on unseen data; thus, workflows must
split up the supplied data into two training sets, where the first is used to
train the preprocessor and model and the second, called the "calibration set,"
is passed to the trained preprocessor and model to generate predictions,
which then form the training data for the postprocessor.
When fitting a workflow with a postprocessor that requires training
(i.e. one that returns TRUE in .workflow_includes_calibration(workflow)), users
must pass two data arguments: the usual fit.workflow(data) will be used
to train the preprocessor and model, while fit.workflow(calibration) will
be used to train the postprocessor.
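A minimal sketch of such a two-argument fit, assuming a version of workflows whose fit() method accepts calibration, with parsnip, tailor, and probably (which tailor uses for calibration) installed. The simulated data, the 200/100 split, and the choice of adjust_numeric_calibration(method = "linear") as the postprocessor that requires training are illustrative assumptions, not part of this page's own examples.

```r
library(workflows)
library(parsnip)
library(tailor)
library(magrittr)

# Simulated regression data (hypothetical, for illustration only)
set.seed(1)
n <- 300
sim <- data.frame(x = runif(n))
sim$y <- 2 * sim$x + rnorm(n, sd = 0.3)

# Split once, up front: the first 200 rows train the preprocessor and
# model, the remaining 100 rows form the calibration set
in_model <- seq_len(n) <= 200
model_data <- sim[in_model, ]
calibration_data <- sim[!in_model, ]

# A postprocessor that needs training: numeric calibration
post <- tailor() %>%
  adjust_numeric_calibration(method = "linear")

wf <- workflow() %>%
  add_formula(y ~ x) %>%
  add_model(linear_reg()) %>%
  add_tailor(post)

# `data` trains the preprocessor and model;
# `calibration` trains the postprocessor on predictions for unseen rows
wf_fit <- fit(wf, data = model_data, calibration = calibration_data)

preds <- predict(wf_fit, new_data = data.frame(x = c(0.25, 0.75)))
preds
```

Because the two sets here are disjoint slices of data with no repeated rows, the model never predicts on rows it was trained on when the calibration predictions are generated.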
In some situations, randomly splitting fit.workflow(data) (with
rsample::initial_split(), for example) is sufficient to prevent data
leakage. However, fit.workflow(data) could also have arisen as:
boots <- rsample::bootstraps(some_other_data)
split <- rsample::get_rsplit(boots, 1)
data <- rsample::analysis(split)
In this case, some of the rows in data will be duplicated. Thus, randomly
allotting some of them to train the preprocessor and model and others to train
the postprocessor would likely result in the same rows appearing in both
datasets, resulting in the preprocessor and model generating predictions on
rows they've seen before. Similarly problematic situations could arise with
other resampling schemes, like time-based splits.
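The duplication problem can be made concrete with a small sketch. It uses mtcars in place of some_other_data and a hypothetical 75/25 random split of the bootstrap analysis set; both are assumptions chosen only for illustration.

```r
library(rsample)

set.seed(1)
boots <- bootstraps(mtcars, times = 1)
split <- get_rsplit(boots, 1)

# Indices of the original rows that make up the analysis set
rows <- as.integer(split, data = "analysis")

# Bootstrap sampling draws with replacement, so some indices repeat
n_total <- length(rows)
n_unique <- length(unique(rows))

# Naive random split of the analysis set (hypothetical 75/25 proportion)
to_model <- sample(n_total, size = floor(0.75 * n_total))
model_rows <- rows[to_model]
calibration_rows <- rows[-to_model]

# Original rows that ended up in both sets; any entry here is a row the
# model would be asked to predict on after having been trained on it
shared_rows <- intersect(model_rows, calibration_rows)
length(shared_rows)
```

Whenever shared_rows is non-empty, the naive split has leaked training rows into the data used to train the postprocessor.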
In general, use the rsample::inner_split() function to prevent data
leakage when resampling; when workflows with postprocessors that require
training are passed to the tune package, this is handled internally.
Examples
library(workflows)
library(tailor)
library(magrittr)
tailor <- tailor()
tailor_1 <- adjust_probability_threshold(tailor, .1)
workflow <- workflow() %>%
add_tailor(tailor_1)
workflow
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
#> Postprocessor: tailor
#>
#> ── Postprocessor ─────────────────────────────────────────────────────────
#>
#> ── tailor ────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#>
#> • Adjust probability threshold to 0.1.
#> NA
#> NA
#> NA
remove_tailor(workflow)
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
update_tailor(workflow, adjust_probability_threshold(tailor, .2))
#> ══ Workflow ══════════════════════════════════════════════════════════════
#> Preprocessor: None
#> Model: None
#> Postprocessor: tailor
#>
#> ── Postprocessor ─────────────────────────────────────────────────────────
#>
#> ── tailor ────────────────────────────────────────────────────────────────
#> A binary postprocessor with 1 adjustment:
#>
#> • Adjust probability threshold to 0.2.
#> NA
#> NA
#> NA