API Reference¶

class recipies.ingredients.Ingredients(data: DataFrame | DataFrame = None, copy: bool = None, roles: dict = None, check_roles: bool = True, backend: Backend = None)[source]¶

Bases: object

Wrapper around either a polars.DataFrame or a pandas.DataFrame that stores column roles (e.g., predictor). Because of how polars works, we no longer subclass pl.DataFrame but instead store the dataframe as an attribute.

Parameters:¶

roles: dict = None¶: roles of DataFrame columns as (list of) strings. Defaults to None.
check_roles: bool = True¶: If set to False, does not check whether the roles match existing columns. Defaults to True.

See also

Ingredients.update_role()

Returns:¶: self

add_step(step: Step) → Recipe[source]¶

Adds a new step to the Recipe

Parameters:¶

step: Step¶: a transformation step that should be applied to the Ingredients during prep() and bake()

Returns:¶

self

prep(data: DataFrame | DataFrame | Ingredients = None, refit: bool = False) → DataFrame | DataFrame[source]¶

Fits and transforms, in other words preps, the data.

Parameters:¶

data: DataFrame | DataFrame | Ingredients = None¶: Data to fit and transform. Defaults to None.
refit: bool = False¶: Whether to refit on the data. Defaults to False.

Returns:¶

Transformed data.

bake(data: DataFrame | DataFrame | Ingredients = None) → DataFrame | DataFrame[source]¶

Transforms, or bakes, the data if it has been prepped.

Parameters:¶

data: DataFrame | DataFrame | Ingredients = None¶: Data to transform. Defaults to None.

Returns:¶

Transformed data.
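The difference between prep() and bake() can be illustrated with a minimal sketch. It does not use recipies itself: TinyRecipe and MeanCenter are hypothetical stand-ins that operate on plain lists, written only to show that prep() fits and transforms while bake() reuses what was fitted.

```python
class MeanCenter:
    """Hypothetical step: subtracts the mean learned during fitting."""

    def __init__(self):
        self.mean = None

    def fit(self, values):
        self.mean = sum(values) / len(values)

    def transform(self, values):
        return [v - self.mean for v in values]


class TinyRecipe:
    """Hypothetical stand-in for a recipe; not the recipies implementation."""

    def __init__(self):
        self.steps = []

    def add_step(self, step):
        self.steps.append(step)
        return self  # returning self allows chaining

    def prep(self, values):
        # Fit each step on the data, then transform with the fitted step.
        for step in self.steps:
            step.fit(values)
            values = step.transform(values)
        return values

    def bake(self, values):
        # Transform only: parameters learned in prep() are reused.
        for step in self.steps:
            values = step.transform(values)
        return values


recipe = TinyRecipe().add_step(MeanCenter())
train_out = recipe.prep([1.0, 2.0, 3.0])  # the mean 2.0 is learned here
test_out = recipe.bake([10.0, 20.0])      # the same mean 2.0 is applied
```

In this sketch the new data is centered with the training mean rather than its own, which is the usual reason for separating the two calls.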

get_backend()[source]¶

cache()[source]¶: Prepares the recipe for caching

class recipies.step.Step(sel: ~recipies.selector.Selector = all predictors, supported_backends: list[~recipies.constants.Backend] = [Backend.POLARS, Backend.PANDAS])[source]¶

Bases: object

This class represents a step in a recipe.

Steps are transformations to be executed on selected columns of a DataFrame. They fit a transformer to the selected columns and afterwards transform the data with the fitted transformer.

Parameters:¶

sel: Object that holds information about the selected columns.

columns¶: List with the names of the selected columns.

__init__(sel: ~recipies.selector.Selector = all predictors, supported_backends: list[~recipies.constants.Backend] = [Backend.POLARS, Backend.PANDAS])[source]¶

property trained : bool¶

property group : bool¶

fit(data: Ingredients)[source]¶

This function fits the transformer to the data.

Parameters:¶

data: Ingredients¶: The DataFrame to fit to.

abstractmethod do_fit(data: Ingredients)[source]¶

transform(data: Ingredients) → Ingredients[source]¶

This function transforms the data with the fitted transformer.

Parameters:¶

data: Ingredients¶: The DataFrame to transform.

Returns:¶

The transformed DataFrame.

fit_transform(data: Ingredients) → Ingredients[source]¶
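The relationship between fit(), do_fit(), transform() and fit_transform() can be sketched as a template-method pattern. This is one plausible reading of the interface above, not the library's code: SketchStep and ClipToMax are hypothetical names, and the data is a plain list instead of Ingredients.

```python
from abc import ABC, abstractmethod


class SketchStep(ABC):
    """Hypothetical base class mirroring the documented method names."""

    def __init__(self):
        self._trained = False

    @property
    def trained(self):
        return self._trained

    def fit(self, data):
        # Delegate the actual fitting to the subclass, then mark as trained.
        self.do_fit(data)
        self._trained = True

    @abstractmethod
    def do_fit(self, data):
        ...

    def transform(self, data):
        return data

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)


class ClipToMax(SketchStep):
    """Hypothetical step: clips values to the maximum seen during fitting."""

    def do_fit(self, data):
        self.max_ = max(data)

    def transform(self, data):
        return [min(v, self.max_) for v in data]


step = ClipToMax()
trained_before = step.trained
fitted = step.fit_transform([1, 5, 3])
later = step.transform([2, 9])  # 9 is clipped to the fitted maximum 5
```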

class recipies.step.StepImputeFill(sel=all predictors, value=None, strategy=None, limit=None)[source]¶

Bases: Step

For pandas: uses pandas’ built-in fillna function to replace missing values. See pandas.DataFrame.fillna for a description of the arguments.

__init__(sel=all predictors, value=None, strategy=None, limit=None)[source]¶

transform(data)[source]¶

This function transforms the data with the fitted transformer.

Parameters:¶

data¶: The DataFrame to transform.

Returns:¶

The transformed DataFrame.

class recipies.step.StepImputeFastZeroFill(sel=all predictors)[source]¶

Bases: Step

Quick variant of pandas’ built-in fillna(value=0) for grouped dataframes.

__init__(sel=all predictors)[source]¶

transform(data)[source]¶

This function transforms the data with the fitted transformer.

Parameters:¶

data¶: The DataFrame to transform.

Returns:¶

The transformed DataFrame.

class recipies.step.StepImputeFastForwardFill(sel=all predictors)[source]¶

Bases: Step

Quick variant of pandas’ built-in fillna(method='ffill') for grouped dataframes.

Note: this variant does not allow for setting a limit.

__init__(sel=all predictors)[source]¶

transform(data)[source]¶

This function transforms the data with the fitted transformer.

Parameters:¶

data¶: The DataFrame to transform.

Returns:¶

The transformed DataFrame.
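What forward filling "for grouped dataframes" means can be shown without any dataframe library. The sketch below is an illustration in plain Python, not the step's implementation; grouped_ffill is a hypothetical helper, and missing values are represented by None.

```python
def grouped_ffill(groups, values):
    """Forward fill missing values, never carrying a value across groups."""
    last_seen = {}
    filled = []
    for group, value in zip(groups, values):
        if value is None:
            # Reuse the last observed value of the same group, if any.
            value = last_seen.get(group)
        else:
            last_seen[group] = value
        filled.append(value)
    return filled


groups = ["a", "a", "b", "a", "b"]
values = [1.0, None, None, None, 2.0]
result = grouped_ffill(groups, values)
```

Here the first row of group "b" stays missing because there is no earlier value in that group to carry forward, and there is no limit on how many consecutive gaps are filled.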

class recipies.step.StepImputeModel(sel=all predictors, model=None)[source]¶

Bases: Step

Uses a pretrained imputation model to impute missing values.

Parameters:¶

model: A function that takes a dataframe and the grouping columns as input and returns a dataframe with imputed values without the grouping column.

__init__(sel=all predictors, model=None)[source]¶

transform(data)[source]¶

This function transforms the data with the fitted transformer.

Parameters:¶

data¶: The DataFrame to transform.

Returns:¶

The transformed DataFrame.

class recipies.step.Accumulator(*values)[source]¶

Bases: Enum

MAX = 'max'¶

MIN = 'min'¶

MEAN = 'mean'¶

MEDIAN = 'median'¶

COUNT = 'count'¶

VAR = 'var'¶

FIRST = 'first'¶

LAST = 'last'¶
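Because Accumulator is a standard Enum with string values, members can be looked up by name or by value. The snippet re-declares the enum locally (copied from the listing above) so that it runs without recipies installed.

```python
from enum import Enum


class Accumulator(Enum):
    # Local copy of the members documented above.
    MAX = "max"
    MIN = "min"
    MEAN = "mean"
    MEDIAN = "median"
    COUNT = "count"
    VAR = "var"
    FIRST = "first"
    LAST = "last"


by_value = Accumulator("mean")  # lookup by value
by_name = Accumulator["MEAN"]   # lookup by member name

try:
    Accumulator("sum")          # not a documented member
    rejected = False
except ValueError:
    rejected = True
```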

class recipies.step.StepHistorical(sel: ~recipies.selector.Selector = all numeric predictors, fun: ~recipies.step.Accumulator = Accumulator.MAX, suffix: str = None, role: str = 'predictor')[source]¶

Bases: Step

This step generates columns with a historical accumulator provided by the user.

Parameters:¶

fun: Instance of the Accumulator enum that specifies which type of historical accumulation to use (default is MAX).
suffix: Defaults to None. Set this to have the step generate new columns with this suffix instead of the default suffix.
role: Defaults to ‘predictor’. In case new columns are added, set their role to role.

__init__(sel: ~recipies.selector.Selector = all numeric predictors, fun: ~recipies.step.Accumulator = Accumulator.MAX, suffix: str = None, role: str = 'predictor')[source]¶

transform(data: Ingredients) → Ingredients[source]¶

Raises:¶: TypeError – If the function is not of type Accumulator
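A minimal sketch of a historical accumulator follows, assuming that "historical" means an accumulation over all rows seen so far. The helper historical() and the reduced local Accumulator enum are illustrative only and work on a plain list rather than on Ingredients.

```python
from enum import Enum
from itertools import accumulate


class Accumulator(Enum):
    # Reduced local copy; the documented enum has more members.
    MAX = "max"
    MIN = "min"
    COUNT = "count"


def historical(values, fun):
    """Return the running accumulation of values (hypothetical helper)."""
    if not isinstance(fun, Accumulator):
        raise TypeError(f"Expected an Accumulator, got {type(fun).__name__}")
    if fun is Accumulator.MAX:
        return list(accumulate(values, max))
    if fun is Accumulator.MIN:
        return list(accumulate(values, min))
    # COUNT: number of rows seen so far.
    return list(range(1, len(values) + 1))


heart_rate = [80, 95, 90, 110, 100]
hist_max = historical(heart_rate, Accumulator.MAX)
hist_min = historical(heart_rate, Accumulator.MIN)
hist_count = historical(heart_rate, Accumulator.COUNT)
```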

class recipies.step.StepSklearn(sklearn_transformer: object, sel: ~recipies.selector.Selector = all predictors, columnwise: bool = False, in_place: bool = True, role: str = 'predictor')[source]¶

Bases: Step

This step takes a transformer from scikit-learn and makes it usable as a step in a recipe.

Parameters:¶

sklearn_transformer: Instance of scikit-learn transformer that implements fit() and transform().
columnwise: Defaults to False. Set to True to fit and transform the DataFrame column by column.
in_place: Defaults to True. Set to False to have the step generate new columns instead of overwriting the existing ones.
role : str, optional: Defaults to ‘predictor’. In case new columns are added, set their role to role.

__init__(sklearn_transformer: object, sel: ~recipies.selector.Selector = all predictors, columnwise: bool = False, in_place: bool = True, role: str = 'predictor')[source]¶

do_fit(data: Ingredients) → Ingredients[source]¶

Raises:¶: ValueError – If the transformer expects a single column but gets multiple.

transform(data: Ingredients) → Ingredients[source]¶

Raises:¶

TypeError – If the transformer returns a sparse matrix.
ValueError – If the transformer returns an unexpected number of columns.

class recipies.step.StepResampling(new_resolution: str = '1h', accumulator_dict: ~typing.Dict[~recipies.selector.Selector, ~recipies.step.Accumulator] = {all predictors: Accumulator.LAST}, default_accumulator: ~recipies.step.Accumulator = Accumulator.LAST)[source]¶

Bases: Step

__init__(new_resolution: str = '1h', accumulator_dict: ~typing.Dict[~recipies.selector.Selector, ~recipies.step.Accumulator] = {all predictors: Accumulator.LAST}, default_accumulator: ~recipies.step.Accumulator = Accumulator.LAST)[source]¶

This class represents a resampling step in a recipe.

Parameters:¶

new_resolution: Resolution to resample to.
accumulator_dict: Supply dictionary with individual accumulation methods for each Selector.
default_accumulator: Accumulator to use for variables not supplied in dictionary.

do_fit(data: Ingredients)[source]¶

transform(data)[source]¶

This function transforms the data with the fitted transformer.

Parameters:¶

data¶: The DataFrame to transform.

Returns:¶

The transformed DataFrame.
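Resampling to a coarser resolution with the LAST accumulator can be sketched in plain Python. This is an illustration under simplifying assumptions (a single variable, hourly buckets, rows sorted by time); resample_last is a hypothetical helper, not part of recipies.

```python
from datetime import datetime


def resample_last(rows):
    """Keep the last value observed within each hour."""
    buckets = {}
    for timestamp, value in rows:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        # Later rows overwrite earlier ones, which implements LAST.
        buckets[hour] = value
    return buckets


rows = [
    (datetime(2024, 1, 1, 10, 5), 1.0),
    (datetime(2024, 1, 1, 10, 40), 2.0),
    (datetime(2024, 1, 1, 11, 10), 3.0),
]
hourly = resample_last(rows)
```

Swapping the overwrite for a running maximum or a list that is averaged afterwards would give the MAX or MEAN accumulators in the same structure.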

class recipies.step.StepScale(sel=all numeric predictors, with_mean: bool = True, with_std: bool = True, *args, **kwargs)[source]¶

Bases: StepSklearn

Provides a wrapper for scaling with StepSklearn. Note that because scikit-learn converts None (nulls) to NaN, the step has to revert this conversion afterwards.

Parameters:¶

with_mean: Defaults to True. If True, center the data before scaling.
with_std: Defaults to True. If True, scale the data to unit variance (or equivalently, unit standard deviation).
in_place: Defaults to True. Set to False to have the step generate new columns instead of overwriting the existing ones.
role : str, optional: Defaults to ‘predictor’. In case new columns are added, set their role to role.

__init__(sel=all numeric predictors, with_mean: bool = True, with_std: bool = True, *args, **kwargs)[source]¶

transform(data: Ingredients) → Ingredients[source]¶

Raises:¶

TypeError – If the transformer returns a sparse matrix.
ValueError – If the transformer returns an unexpected number of columns.
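The effect of with_mean and with_std, and of keeping missing values missing, can be sketched without scikit-learn. The function scale() below is a hypothetical pure-Python stand-in; it uses the population standard deviation and leaves None entries untouched.

```python
from statistics import fmean, pstdev


def scale(values, with_mean=True, with_std=True):
    """Standardize values, ignoring None when fitting and keeping it in the output."""
    observed = [v for v in values if v is not None]
    center = fmean(observed) if with_mean else 0.0
    spread = pstdev(observed) if with_std else 1.0
    if spread == 0:
        # Avoid dividing by zero for constant columns.
        spread = 1.0
    return [None if v is None else (v - center) / spread for v in values]


scaled = scale([1.0, None, 3.0])
unscaled_center = scale([1.0, 3.0], with_mean=False)
```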

class recipies.step.StepFunction(sel: Selector, function)[source]¶

Bases: Step

Provides a wrapper for a simple transformation function, without fitting.

__init__(sel: Selector, function)[source]¶

transform(data: Ingredients) → Ingredients[source]¶

This function transforms the data with the fitted transformer.

Parameters:¶

data: Ingredients¶: The DataFrame to transform.

Returns:¶

The transformed DataFrame.

class recipies.selector.Selector(description: str, names: str | list[str] = None, roles: str | list[str] = None, types: str | list[str] = None, pattern: Pattern = None)[source]¶

Bases: object

Class responsible for selecting the variables affected by a recipe step

Parameters:¶

description: str¶: Text used to represent Selector when printed in summaries
names: str | list[str] = None¶: Column names to select. Defaults to None.
roles: str | list[str] = None¶: Column roles to select, see also Ingredients. Defaults to None.
types: str | list[str] = None¶: Column data types to select. Defaults to None.
pattern: Pattern = None¶: Regex pattern to search column names with. Defaults to None.

__init__(description: str, names: str | list[str] = None, roles: str | list[str] = None, types: str | list[str] = None, pattern: Pattern = None)[source]¶

set_names(names: str | list[str])[source]¶

Set the column names to select with this Selector

Parameters:¶

names: str | list[str]¶: column names to select

set_roles(roles: str | list[str])[source]¶

Set the column roles to select with this Selector

Parameters:¶

roles: str | list[str]¶: column roles to select, see also Ingredients

set_types(roles: str | list[str])[source]¶

Set the column data types to select with this Selector

Parameters:¶

roles: str | list[str]¶: column data types to select

set_pattern(pattern: Pattern)[source]¶

Set the pattern to search with this Selector

Parameters:¶

pattern: Pattern¶: Regex pattern to search column names with.

recipies.selector.enlist_dt(x: DataType | list[DataType] | None) → list[DataType] | None[source]¶

Wrap a polars datatype in a list if it isn’t a list yet

Parameters:¶

x: DataType | list[DataType] | None¶: object to wrap.

Raises:¶

TypeError – If neither a datatype nor a list of datatypes is passed

Returns:¶

x wrapped in a list, or x unchanged if it is already a list or None.

recipies.selector.enlist_str(x: str | list[str] | None) → list[str] | None[source]¶

Wrap a str in a list if it isn’t a list yet

Parameters:¶

x: str | list[str] | None¶: object to wrap.

Raises:¶

TypeError – If neither a str nor a list of strings is passed

Returns:¶

x wrapped in a list, or x unchanged if it is already a list or None.
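The documented contract of enlist_str can be sketched as follows. The function enlist_str_sketch is a hypothetical re-implementation derived only from the signature and the Raises section above; the library's own code may differ in detail.

```python
def enlist_str_sketch(x):
    """Wrap a str in a list; pass lists of str and None through unchanged."""
    if x is None:
        return None
    if isinstance(x, str):
        return [x]
    if isinstance(x, list) and all(isinstance(item, str) for item in x):
        return x
    raise TypeError("Expected a str or a list of str")


single = enlist_str_sketch("hr")
already_list = enlist_str_sketch(["hr", "age"])
nothing = enlist_str_sketch(None)
```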

recipies.selector.intersection(x: list, y: list) → list[source]¶

Intersection of two lists

Note

Maintains the order of the first list. Does not deduplicate items (i.e., does not return a set).

Parameters:¶

x: list¶: first list
y: list¶: second list

Returns:¶

Elements in x that are also in y.
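The two properties in the note (order of the first list is kept, duplicates are not removed) are easy to see in a small sketch. intersection_sketch is a hypothetical equivalent written from the description above, not the library function itself.

```python
def intersection_sketch(x, y):
    """Elements of x that also occur in y, in the order they appear in x."""
    return [item for item in x if item in y]


result = intersection_sketch(["b", "a", "b", "c"], ["a", "b"])
as_set = set(["b", "a", "b", "c"]) & set(["a", "b"])
```

The list result keeps both occurrences of "b" and the order of the first argument, whereas the set-based intersection loses both.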

recipies.selector.all_of(names: str | list[str]) → Selector[source]¶

Define selector for any columns with one of the given names

Parameters:¶

names: str | list[str]¶: names to select

Returns:¶

Object representing the selection rule.

recipies.selector.regex_names(regex: str) → Selector[source]¶

Define selector for any columns where the name matches the regex pattern

Parameters:¶

regex: str¶: string to be transformed to regex pattern to search for

Returns:¶

Object representing the selection rule.

recipies.selector.starts_with(prefix: str) → Selector[source]¶

Define selector for any columns where the name starts with the prefix

Parameters:¶

prefix: str¶: prefix to search for

Returns:¶

Object representing the selection rule.

recipies.selector.ends_with(suffix: str) → Selector[source]¶

Define selector for any columns where the name ends with the suffix

Parameters:¶

suffix: str¶: suffix to search for

Returns:¶

Object representing the selection rule.

recipies.selector.contains(substring: str) → Selector[source]¶

Define selector for any columns where the name contains the substring

Parameters:¶

substring: str¶: substring to search for

Returns:¶

Object representing the selection rule.
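The name-based selectors above (regex_names, starts_with, ends_with, contains) can all be thought of as regular-expression searches over column names. The sketch below assumes that reading; select_by_pattern and the column names are hypothetical and the code does not call recipies.

```python
import re


def select_by_pattern(columns, pattern):
    """Return the column names in which the regex pattern is found."""
    regex = re.compile(pattern)
    return [name for name in columns if regex.search(name)]


columns = ["hr_mean", "hr_max", "age", "map_mean"]

# re.escape keeps literal prefixes and suffixes from being read as regex syntax.
with_prefix = select_by_pattern(columns, "^" + re.escape("hr"))
with_suffix = select_by_pattern(columns, re.escape("mean") + "$")
with_substring = select_by_pattern(columns, re.escape("ma"))
```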

recipies.selector.has_role(roles: str | list[str]) → Selector[source]¶

Define selector for any columns with one of the given roles

Parameters:¶

roles: str | list[str]¶: roles to select

Returns:¶

Object representing the selection rule.
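Role-based selection can be sketched with a plain dictionary. The mapping from column name to a list of roles is an assumed shape based on the description of the roles parameter of Ingredients; columns_with_role and the column names are hypothetical.

```python
def columns_with_role(roles, wanted):
    """Return columns that have at least one of the wanted roles."""
    wanted = [wanted] if isinstance(wanted, str) else list(wanted)
    return [
        column
        for column, column_roles in roles.items()
        if any(role in column_roles for role in wanted)
    ]


roles = {
    "stay_id": ["group"],
    "time": ["sequence"],
    "hr": ["predictor"],
    "age": ["predictor"],
    "died": ["outcome"],
}

predictors = columns_with_role(roles, "predictor")
keys = columns_with_role(roles, ["group", "sequence"])
```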

recipies.selector.has_type(types: str | list[str]) → Selector[source]¶

Define selector for any columns with one of the given types

Parameters:¶

types: str | list[str]¶: data types to select

Note

Data types are selected based on their string representation, as returned by df[varname].dtype.name.

Returns:¶: Object representing the selection rule.

recipies.selector.all_predictors() → Selector[source]¶

Define selector for all predictor columns

Returns:¶: Object representing the selection rule.

recipies.selector.all_numeric_predictors(backend=Backend.POLARS) → Selector[source]¶

Define selector for all numerical predictor columns

Returns:¶: Object representing the selection rule.

recipies.selector.all_outcomes() → Selector[source]¶

Define selector for all outcome columns

Returns:¶: Object representing the selection rule.

recipies.selector.all_groups() → Selector[source]¶

Define selector for all grouping variables

Returns:¶: Object representing the selection rule.

recipies.selector.select_groups(ingr: Ingredients) → list[str][source]¶

Select any grouping columns

Defines and directly applies Selector(roles=["group"])

Returns:¶: grouping columns

recipies.selector.all_sequences() → Selector[source]¶

Define selector for all sequence variables

Returns:¶: Object representing the selection rule.

recipies.selector.select_sequence(ingr: Ingredients) → list[str][source]¶

Select any sequence columns

Defines and directly applies Selector(roles=["sequence"])

Returns:¶: Sequence columns.