API Reference
- class recipies.ingredients.Ingredients(data: DataFrame | DataFrame = None, copy: bool = None, roles: dict = None, check_roles: bool = True, backend: Backend = None)
Bases: object
Wrapper around either a polars.DataFrame or a pandas.DataFrame to store column roles (e.g., predictor).
Due to the workings of polars, we no longer subclass pl.DataFrame, but instead store the DataFrame as an attribute.
See also: pandas.DataFrame
- __init__(data: DataFrame | DataFrame = None, copy: bool = None, roles: dict = None, check_roles: bool = True, backend: Backend = None)
- property columns
- to_df(output_format=None) → DataFrame
Return the underlying DataFrame.
- Returns:
Self as DataFrame.
- add_role(column: str, new_role: str)
Adds an additional role for a column that already has roles.
- update_role(column: str, new_role: str, old_role: str = None)
Adds a new role for a column without roles or changes an existing role to a different one.
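The difference between the two methods can be illustrated with a toy role table. This is a minimal sketch, not the library's implementation: the `roles` dict, both helper functions, and the error handling are assumptions made purely for illustration.

```python
# Toy role table: column name -> list of roles (hypothetical structure).
roles: dict[str, list[str]] = {"age": ["predictor"], "label": []}


def add_role(roles, column, new_role) -> None:
    """Append a role to a column that already has at least one role."""
    if not roles.get(column):
        raise RuntimeError(f"{column} has no roles yet; use update_role instead")
    if new_role not in roles[column]:
        roles[column].append(new_role)


def update_role(roles, column, new_role, old_role=None) -> None:
    """Set the role of a role-less column, or swap one existing role."""
    current = roles.setdefault(column, [])
    if not current:
        current.append(new_role)
    elif old_role is None and len(current) == 1:
        current[0] = new_role
    elif old_role in current:
        current[current.index(old_role)] = new_role
    else:
        raise ValueError(f"{old_role!r} is not a role of {column}")


add_role(roles, "age", "baseline")       # age keeps 'predictor', gains 'baseline'
update_role(roles, "label", "outcome")   # label had no role yet
update_role(roles, "age", "group", old_role="baseline")  # swap one role
```

In this sketch, adding never removes an existing role, while updating either fills an empty slot or replaces exactly one role.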
- class recipies.recipe.Recipe(data: Ingredients | DataFrame | DataFrame, outcomes: str | list[str] = None, predictors: str | list[str] = None, groups: str | list[str] = None, sequences: str | list[str] = None, backend: Backend = None)
Bases: object
Recipe for preprocessing data.
A Recipe object combines a pandas-like Ingredients object with one or more sklearn-inspired transformation Steps to turn it into a model-ready input.
- Parameters:
- data: Ingredients | DataFrame | DataFrame
Data to be preprocessed.
- outcomes: str | list[str] = None
Names of columns in data that should be assigned the ‘outcome’ role.
- predictors: str | list[str] = None
Names of columns in data that should be assigned the ‘predictor’ role.
- groups: str | list[str] = None
Names of columns in data that should be assigned the ‘group’ role.
- sequences: str | list[str] = None
Names of columns in data that should be assigned the ‘sequence’ role.
- __init__(data: Ingredients | DataFrame | DataFrame, outcomes: str | list[str] = None, predictors: str | list[str] = None, groups: str | list[str] = None, sequences: str | list[str] = None, backend: Backend = None)
- roles = None
- columns = None
- add_roles(vars: str | list[str], new_role: str = 'predictor') → Recipe
Adds an additional role for one or more columns of the Recipe’s Ingredients.
See also: Ingredients.add_role()
- Returns:
self
- update_roles(vars: str | list[str], new_role: str = 'predictor', old_role: str = None) → Recipe
Adds a new role for one or more columns of the Recipe’s Ingredients without roles or changes an existing role to a different one.
See also: Ingredients.update_role()
- Returns:
self
- prep(data: DataFrame | DataFrame | Ingredients = None, refit: bool = False) → DataFrame | DataFrame
Fits and transforms, in other words preps, the data.
- Parameters:
- data: DataFrame | DataFrame | Ingredients = None
Data to fit and transform. Defaults to None.
- refit: bool = False
Whether to refit on the data. Defaults to False.
- Returns:
Transformed data.
- bake(data: DataFrame | DataFrame | Ingredients = None) → DataFrame | DataFrame
Transforms, or bakes, the data if it has been prepped.
- Parameters:
- data: DataFrame | DataFrame | Ingredients = None
Data to transform. Defaults to None.
- Returns:
Transformed data.
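The split between prep (fit, then transform) and bake (transform only) can be sketched with a toy stand-in. `ToyRecipe`, its single mean-centering step, and the exact meaning given to `refit` here are assumptions for illustration; the real Recipe operates on DataFrames and a list of Steps.

```python
from statistics import mean


class ToyRecipe:
    """Hypothetical stand-in for a recipe with one mean-centering step."""

    def __init__(self):
        self.center = None  # statistic learned during prep

    def prep(self, data, refit=False):
        # Fit only if nothing was learned yet, or if a refit is requested.
        if self.center is None or refit:
            self.center = mean(data)
        return self.bake(data)

    def bake(self, data):
        if self.center is None:
            raise RuntimeError("call prep() before bake()")
        # Transform only: reuse the statistic learned during prep.
        return [x - self.center for x in data]


rec = ToyRecipe()
train = rec.prep([1.0, 2.0, 3.0])  # learns the center from the training data
test = rec.bake([4.0, 6.0])        # reuses that center, learns nothing new
```

Keeping the fitted state inside the recipe is what allows held-out data to be transformed with statistics from the training data only.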
- class recipies.step.Step(sel: Selector = all predictors, supported_backends: list[Backend] = [Backend.POLARS, Backend.PANDAS])
Bases: object
This class represents a step in a recipe.
Steps are transformations to be executed on selected columns of a DataFrame. They fit a transformer to the selected columns and afterwards transform the data with the fitted transformer.
- Parameters:
- sel
Object that holds information about the selected columns.
- columns
List with the names of the selected columns.
- __init__(sel: Selector = all predictors, supported_backends: list[Backend] = [Backend.POLARS, Backend.PANDAS])
- property trained: bool
- property group: bool
- fit(data: Ingredients)
This function fits the transformer to the data.
- Parameters:
- data: Ingredients
The DataFrame to fit to.
- abstractmethod do_fit(data: Ingredients)
- transform(data: Ingredients) → Ingredients
This function transforms the data with the fitted transformer.
- Parameters:
- data: Ingredients
The DataFrame to transform.
- Returns:
The transformed DataFrame.
- fit_transform(data: Ingredients) → Ingredients
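The relationship between fit, the abstract do_fit hook, transform, and fit_transform can be sketched as a small template-method pattern. `ToyStep` and `ToyMinShift` are hypothetical classes working on plain lists; how the real Step tracks its trained state is an assumption here.

```python
from abc import ABC, abstractmethod


class ToyStep(ABC):
    """Hypothetical base class mirroring the fit/transform split."""

    def __init__(self):
        self._trained = False

    @property
    def trained(self) -> bool:
        return self._trained

    def fit(self, data):
        self.do_fit(data)  # subclass-specific fitting logic
        self._trained = True

    @abstractmethod
    def do_fit(self, data): ...

    @abstractmethod
    def transform(self, data): ...

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)


class ToyMinShift(ToyStep):
    """Shifts values so that the minimum seen during fitting maps to zero."""

    def do_fit(self, data):
        self.minimum = min(data)

    def transform(self, data):
        return [x - self.minimum for x in data]


step = ToyMinShift()
shifted = step.fit_transform([3, 5, 4])
```

A concrete step then only has to supply do_fit and transform; the bookkeeping stays in the base class.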
- class recipies.step.StepImputeFill(sel=all predictors, value=None, strategy=None, limit=None)
Bases: Step
For Pandas: uses pandas’ internal fillna function to replace missing values. See pandas.DataFrame.fillna for a description of the arguments.
- __init__(sel=all predictors, value=None, strategy=None, limit=None)
- class recipies.step.StepImputeFastZeroFill(sel=all predictors)
Bases: Step
Quick variant of pandas’ internal fillna(value=0) for grouped dataframes.
- __init__(sel=all predictors)
- class recipies.step.StepImputeFastForwardFill(sel=all predictors)
Bases: Step
Quick variant of pandas’ internal fillna(method='ffill') for grouped dataframes.
Note: this variant does not allow for setting a limit.
- __init__(sel=all predictors)
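What a grouped forward fill without a limit does can be shown on plain lists. This is a minimal sketch of the idea, not the step's implementation; `grouped_ffill` is a hypothetical helper and the rows are assumed to be ordered within each group.

```python
def grouped_ffill(groups, values):
    """Forward-fill None within each group; rows are assumed ordered per group."""
    last_seen = {}
    filled = []
    for g, v in zip(groups, values):
        if v is None:
            v = last_seen.get(g)  # stays None if the group has no prior value
        else:
            last_seen[g] = v
        filled.append(v)
    return filled


groups = ["a", "a", "a", "b", "b"]
values = [1.0, None, None, None, 7.0]
result = grouped_ffill(groups, values)
```

The last observed value of group "a" is carried forward over any number of gaps, but it never leaks into group "b", whose leading gap stays missing.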
- class recipies.step.StepImputeModel(sel=all predictors, model=None)
Bases: Step
Uses a pretrained imputation model to impute missing values.
- Parameters:
- model
A function that takes a dataframe and the grouping columns as input and returns a dataframe with imputed values without the grouping column.
- __init__(sel=all predictors, model=None)
- class recipies.step.Accumulator(*values)
Bases: Enum
- MAX = 'max'
- MIN = 'min'
- MEAN = 'mean'
- MEDIAN = 'median'
- COUNT = 'count'
- VAR = 'var'
- FIRST = 'first'
- LAST = 'last'
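To make the meaning of the members concrete, the sketch below redefines the enum locally and maps each member to a plain-Python reducer. The `REDUCERS` mapping is hypothetical, and the choice of sample variance for VAR is an assumption; the library delegates to its DataFrame backend instead.

```python
from enum import Enum
from statistics import mean, median, variance


class Accumulator(Enum):
    MAX = "max"
    MIN = "min"
    MEAN = "mean"
    MEDIAN = "median"
    COUNT = "count"
    VAR = "var"
    FIRST = "first"
    LAST = "last"


# Hypothetical mapping from enum member to a reducer over a list of values.
REDUCERS = {
    Accumulator.MAX: max,
    Accumulator.MIN: min,
    Accumulator.MEAN: mean,
    Accumulator.MEDIAN: median,
    Accumulator.COUNT: len,
    Accumulator.VAR: variance,  # sample variance; an assumption
    Accumulator.FIRST: lambda xs: xs[0],
    Accumulator.LAST: lambda xs: xs[-1],
}

values = [2, 4, 4, 6]
summary = {acc.value: REDUCERS[acc](values) for acc in Accumulator}
```

Because the members carry string values, `Accumulator("max")` looks up the same member as `Accumulator.MAX`.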
- class recipies.step.StepHistorical(sel: Selector = all numeric predictors, fun: Accumulator = Accumulator.MAX, suffix: str = None, role: str = 'predictor')
Bases: Step
This step generates columns with a historical accumulator provided by the user.
- Parameters:
- fun
Instance of the Accumulator enumeration that signifies which type of historical accumulation to use (default is MAX).
- suffix
Defaults to None. Set this to have the step name the newly generated columns with this suffix instead of the default suffix.
- role
Defaults to ‘predictor’. In case new columns are added, set their role to role.
- __init__(sel: Selector = all numeric predictors, fun: Accumulator = Accumulator.MAX, suffix: str = None, role: str = 'predictor')
- transform(data: Ingredients) → Ingredients
- Raises:
TypeError – If the function is not of type Accumulator
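A historical accumulator can be read as a statistic over all rows seen so far, i.e. an expanding window. The sketch below assumes that reading and uses plain lists; the column name `heart_rate` and the output names are hypothetical, and grouping is left out for brevity.

```python
from itertools import accumulate

heart_rate = [80, 95, 90, 110, 100]

# Running maximum: each entry is the largest value observed up to that row.
hr_max_hist = list(accumulate(heart_rate, max))

# Running mean, built from a running sum and the 1-based row position.
running_sum = accumulate(heart_rate)
hr_mean_hist = [s / n for n, s in enumerate(running_sum, start=1)]
```

Each output row depends only on the current and earlier rows, so such features do not look into the future of a sequence.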
- class recipies.step.StepSklearn(sklearn_transformer: object, sel: Selector = all predictors, columnwise: bool = False, in_place: bool = True, role: str = 'predictor')
Bases: Step
This step takes a transformer from scikit-learn and makes it usable as a step in a recipe.
- Parameters:
- sklearn_transformer
Instance of a scikit-learn transformer that implements fit() and transform().
- columnwise
Defaults to False. Set to True to fit and transform the DataFrame column by column.
- in_place
Defaults to True. Set to False to have the step generate new columns instead of overwriting the existing ones.
- role: str, optional
Defaults to ‘predictor’. In case new columns are added, set their role to role.
- __init__(sklearn_transformer: object, sel: Selector = all predictors, columnwise: bool = False, in_place: bool = True, role: str = 'predictor')
- do_fit(data: Ingredients) → Ingredients
- Raises:
ValueError – If the transformer expects a single column but gets multiple.
- transform(data: Ingredients) → Ingredients
- Raises:
TypeError – If the transformer returns a sparse matrix.
ValueError – If the transformer returns an unexpected number of columns.
- class recipies.step.StepResampling(new_resolution: str = '1h', accumulator_dict: Dict[Selector, Accumulator] = {all predictors: Accumulator.LAST}, default_accumulator: Accumulator = Accumulator.LAST)
Bases: Step
- __init__(new_resolution: str = '1h', accumulator_dict: Dict[Selector, Accumulator] = {all predictors: Accumulator.LAST}, default_accumulator: Accumulator = Accumulator.LAST)
This class represents a resampling step in a recipe.
- Parameters:
- new_resolution
Resolution to resample to.
- accumulator_dict
Dictionary that supplies an individual accumulation method for each Selector.
- default_accumulator
Accumulator to use for variables not covered by the dictionary.
- do_fit(data: Ingredients)
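Resampling to a coarser resolution with the LAST accumulator amounts to bucketing rows by time and keeping the final value per bucket. The sketch below shows this for a one-hour resolution on hypothetical temperature readings; the real step works on the sequence column of a DataFrame and handles groups, which is omitted here.

```python
from datetime import datetime

rows = [
    (datetime(2024, 1, 1, 8, 5), 36.6),
    (datetime(2024, 1, 1, 8, 40), 36.9),
    (datetime(2024, 1, 1, 9, 15), 37.2),
]

hourly = {}
for ts, value in rows:  # rows are assumed sorted by time
    bucket = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
    hourly[bucket] = value  # later rows overwrite earlier ones: "last"
```

Swapping the overwrite for a different reducer (for example collecting the values per bucket and taking their mean) would correspond to choosing a different accumulator.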
- class recipies.step.StepScale(sel=all numeric predictors, with_mean: bool = True, with_std: bool = True, *args, **kwargs)
Bases: StepSklearn
Provides a wrapper for scaling with StepSklearn. Note that because scikit-learn transforms None (nulls) to NaN, we have to revert this conversion afterwards.
- Parameters:
- with_mean
Defaults to True. If True, center the data before scaling.
- with_std
Defaults to True. If True, scale the data to unit variance (or equivalently, unit standard deviation).
- in_place
Defaults to True. Set to False to have the step generate new columns instead of overwriting the existing ones.
- role: str, optional
Defaults to ‘predictor’. In case new columns are added, set their role to role.
- __init__(sel=all numeric predictors, with_mean: bool = True, with_std: bool = True, *args, **kwargs)
- transform(data: Ingredients) → Ingredients
- Raises:
TypeError – If the transformer returns a sparse matrix.
ValueError – If the transformer returns an unexpected number of columns.
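The effect of with_mean and with_std, together with the null handling mentioned above, can be sketched on a single column. This assumes the wrapped transformer behaves like scikit-learn's StandardScaler, which divides by the population standard deviation; `standardize` is a hypothetical helper, not part of the library.

```python
from statistics import fmean, pstdev


def standardize(values, with_mean=True, with_std=True):
    """Standardize a column, leaving None entries untouched."""
    observed = [v for v in values if v is not None]
    center = fmean(observed) if with_mean else 0.0
    scale = pstdev(observed) if with_std else 1.0  # population std (ddof=0)
    scale = scale or 1.0  # avoid dividing by zero for constant columns
    return [None if v is None else (v - center) / scale for v in values]


scaled = standardize([1.0, None, 3.0])
centered_only = standardize([1.0, 3.0], with_std=False)
```

Missing entries are skipped when computing the statistics and stay missing in the output, which mirrors the idea of restoring nulls after the scikit-learn call.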
- class recipies.step.StepFunction(sel: Selector, function)
Bases: Step
Provides a wrapper for a simple transformation function, without fitting.
- transform(data: Ingredients) → Ingredients
This function transforms the data with the fitted transformer.
- Parameters:
- data: Ingredients
The DataFrame to transform.
- Returns:
The transformed DataFrame.
- class recipies.selector.Selector(description: str, names: str | list[str] = None, roles: str | list[str] = None, types: str | list[str] = None, pattern: Pattern = None)
Bases: object
Class responsible for selecting the variables affected by a recipe step.
- Parameters:
- description: str
Text used to represent the Selector when printed in summaries.
- names: str | list[str] = None
Column names to select. Defaults to None.
- roles: str | list[str] = None
Column roles to select, see also Ingredients. Defaults to None.
- types: str | list[str] = None
Column data types to select. Defaults to None.
- pattern: Pattern = None
Regex pattern to search column names with. Defaults to None.
- __init__(description: str, names: str | list[str] = None, roles: str | list[str] = None, types: str | list[str] = None, pattern: Pattern = None)
- recipies.selector.enlist_dt(x: DataType | list[DataType] | None) → list[DataType] | None
Wrap a polars data type in a list if it isn’t a list yet.
- recipies.selector.enlist_str(x: str | list[str] | None) → list[str] | None
Wrap a str in a list if it isn’t a list yet.
- recipies.selector.intersection(x: list, y: list) → list
Intersection of two lists.
Note: Maintains the order of the first list and does not deduplicate items (i.e., does not return a set).
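The behaviour described in the note differs from a set intersection in two ways, which a short sketch makes visible. `ordered_intersection` is a hypothetical reimplementation consistent with the note, not necessarily the library's code.

```python
def ordered_intersection(x: list, y: list) -> list:
    """Keep the items of x that also occur in y, in x's order, duplicates included."""
    return [item for item in x if item in y]


result = ordered_intersection(["b", "a", "b", "c"], ["a", "b"])
as_set = set(["b", "a", "b", "c"]) & set(["a", "b"])
```

Here `result` keeps the repeated "b" and the order of the first list, whereas the set version collapses duplicates and has no defined order.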
- recipies.selector.all_of(names: str | list[str]) → Selector
Define selector for any columns with one of the given names.
- recipies.selector.regex_names(regex: str) → Selector
Define selector for any columns where the name matches the regex pattern.
- recipies.selector.starts_with(prefix: str) → Selector
Define selector for any columns where the name starts with the prefix.
- recipies.selector.ends_with(suffix: str) → Selector
Define selector for any columns where the name ends with the suffix.
- recipies.selector.contains(substring: str) → Selector
Define selector for any columns where the name contains the substring.
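The name-based selectors above can all be expressed as regular-expression searches over the column names. The sketch below uses only the standard `re` module; `match_names` and the column names are hypothetical, and whether the library escapes its inputs or uses search rather than a full match is an assumption.

```python
import re

columns = ["hr_mean", "hr_max", "sbp_mean", "age"]


def match_names(columns, pattern):
    """Return the column names for which the regex finds a match."""
    regex = re.compile(pattern)
    return [c for c in columns if regex.search(c)]


starts = match_names(columns, "^" + re.escape("hr"))     # like starts_with("hr")
ends = match_names(columns, re.escape("_mean") + "$")    # like ends_with("_mean")
has = match_names(columns, re.escape("bp"))              # like contains("bp")
```

Escaping the prefix, suffix, or substring keeps characters such as "." from being interpreted as regex metacharacters.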
- recipies.selector.has_role(roles: str | list[str]) → Selector
Define selector for any columns with one of the given roles.
- recipies.selector.has_type(types: str | list[str]) → Selector
Define selector for any columns with one of the given types.
Note: Data types are selected based on their string representation as returned by df[varname].dtype.name.
- Returns:
Object representing the selection rule.
- recipies.selector.all_predictors() → Selector
Define selector for all predictor columns.
- Returns:
Object representing the selection rule.
- recipies.selector.all_numeric_predictors(backend=Backend.POLARS) → Selector
Define selector for all numerical predictor columns.
- Returns:
Object representing the selection rule.
- recipies.selector.all_outcomes() → Selector
Define selector for all outcome columns.
- Returns:
Object representing the selection rule.
- recipies.selector.all_groups() → Selector
Define selector for all grouping variables.
- Returns:
Object representing the selection rule.
- recipies.selector.select_groups(ingr: Ingredients) → list[str]
Select any grouping columns.
Defines and directly applies Selector(roles=["group"]).
- Returns:
Grouping columns.
- recipies.selector.all_sequences() → Selector
Define selector for all sequence variables.
- Returns:
Object representing the selection rule.
- recipies.selector.select_sequence(ingr: Ingredients) → list[str]
Select any sequence columns.
Defines and directly applies Selector(roles=["sequence"]).
- Returns:
Sequence columns.