API Reference

class recipies.ingredients.Ingredients(data: DataFrame | DataFrame = None, copy: bool = None, roles: dict = None, check_roles: bool = True, backend: Backend = None)[source]

Bases: object

Wrapper around either a polars.DataFrame or a pandas.DataFrame to store column roles (e.g., predictor).

Due to the workings of polars, we no longer subclass pl.DataFrame, but instead store the DataFrame as an attribute.

Parameters:
roles: dict = None

roles of DataFrame columns as (list of) strings. Defaults to None.

check_roles: bool = True

If set to False, does not check whether the roles match existing columns. Defaults to True.

See also: pandas.DataFrame

roles

dictionary of column roles

Type:

dict

__init__(data: DataFrame | DataFrame = None, copy: bool = None, roles: dict = None, check_roles: bool = True, backend: Backend = None)[source]
property columns
to_df(output_format=None) DataFrame[source]

Return the underlying DataFrame.

Returns:

Self as DataFrame.

add_role(column: str, new_role: str)[source]

Adds an additional role for a column that already has roles.

Parameters:
column: str

The column to receive additional roles.

new_role: str

The role to add to the column.

Raises:

RuntimeError – If the column has no role yet.

update_role(column: str, new_role: str, old_role: str = None)[source]

Adds a new role for a column without roles or changes an existing role to a different one.

Parameters:
column: str

The column to update the roles of.

new_role: str

The role to add or change to.

old_role: str = None

Defaults to None. The role to be changed.

Raises:

ValueError – If old_role is given but column has no roles. If old_role is given but column has no role old_role. If no old_role is given but column has multiple roles already.
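The error conditions of add_role() and update_role() can be sketched with a plain dictionary. This is a minimal illustration of the documented behavior, not the library's implementation; the RoleBook class, its attribute names, and the error messages are hypothetical.

```python
class RoleBook:
    """Hypothetical stand-in for the role bookkeeping described above."""

    def __init__(self, roles=None):
        # Maps column name -> list of role strings.
        self.roles = {col: list(r) for col, r in (roles or {}).items()}

    def add_role(self, column, new_role):
        # Only columns that already have a role may receive another one.
        if not self.roles.get(column):
            raise RuntimeError(f"{column} has no role yet")
        self.roles[column].append(new_role)

    def update_role(self, column, new_role, old_role=None):
        current = self.roles.get(column, [])
        if old_role is not None:
            if not current:
                raise ValueError(f"{column} has no roles")
            if old_role not in current:
                raise ValueError(f"{column} has no role {old_role}")
            # Swap the old role for the new one, keeping its position.
            self.roles[column] = [new_role if r == old_role else r for r in current]
        elif len(current) > 1:
            raise ValueError(f"{column} has multiple roles; specify old_role")
        else:
            # No role yet, or exactly one role: the column ends up with new_role only.
            self.roles[column] = [new_role]


book = RoleBook({"age": ["predictor"]})
book.add_role("age", "demographic")
book.update_role("age", "covariate", old_role="predictor")
book.update_role("y", "outcome")  # column without roles receives its first role

try:
    book.update_role("age", "feature")  # ambiguous: "age" now has two roles
except ValueError as err:
    error = str(err)
```

The last call fails because, without old_role, it is unclear which of the two existing roles should be replaced.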

select_dtypes(include=None)[source]
get_dtypes()[source]
get_str_dtypes()[source]

Helper function for polars DataFrames that returns the schema with dtypes as strings.

get_schema()[source]
get_df()[source]
set_df(df)[source]
groupby(by)[source]
get_backend()[source]
class recipies.recipe.Recipe(data: Ingredients | DataFrame | DataFrame, outcomes: str | list[str] = None, predictors: str | list[str] = None, groups: str | list[str] = None, sequences: str | list[str] = None, backend: Backend = None)[source]

Bases: object

Recipe for preprocessing data

A Recipe object combines a pandas-like Ingredients object with one or more sklearn-inspired transformation Steps to turn the data into model-ready input.

Parameters:
data: Ingredients | DataFrame | DataFrame

data to be preprocessed.

outcomes: str | list[str] = None

names of columns in data that are assigned the ‘outcome’ role

predictors: str | list[str] = None

names of columns in data that should be assigned the ‘predictor’ role

groups: str | list[str] = None

names of columns in data that should be assigned the ‘group’ role

sequences: str | list[str] = None

names of columns in data that should be assigned the ‘sequence’ role

__init__(data: Ingredients | DataFrame | DataFrame, outcomes: str | list[str] = None, predictors: str | list[str] = None, groups: str | list[str] = None, sequences: str | list[str] = None, backend: Backend = None)[source]
roles = None
columns = None
add_roles(vars: str | list[str], new_role: str = 'predictor') Recipe[source]

Adds an additional role for one or more columns of the Recipe’s Ingredients.

Parameters:
vars: str | list[str]

The column(s) to receive additional roles.

new_role: str = 'predictor'

Defaults to ‘predictor’. The role to add to the column(s).

See also

Ingredients.add_role()

Returns:

self

update_roles(vars: str | list[str], new_role: str = 'predictor', old_role: str = None) Recipe[source]

Adds a new role for one or more columns of the Recipe’s Ingredients without roles or changes an existing role to a different one.

Parameters:
vars: str | list[str]

The column(s) whose roles are updated.

new_role: str = 'predictor'

Defaults to ‘predictor’. The role to add or change to.

old_role: str = None

Defaults to None. The role to be changed.

See also

Ingredients.update_role()

Returns:

self

add_step(step: Step) Recipe[source]

Adds a new step to the Recipe

Parameters:
step: Step

a transformation step that should be applied to the Ingredients during prep() and bake()

Returns:

self

prep(data: DataFrame | DataFrame | Ingredients = None, refit: bool = False) DataFrame | DataFrame[source]

Fits and transforms, in other words preps, the data.

Parameters:
data: DataFrame | DataFrame | Ingredients = None

Data to fit and transform. Defaults to None.

refit: bool = False

Defaults to False. Whether to refit data.

Returns:

Transformed data.

bake(data: DataFrame | DataFrame | Ingredients = None) DataFrame | DataFrame[source]

Transforms, or bakes, the data if it has been prepped.

Parameters:
data: DataFrame | DataFrame | Ingredients = None

Data to transform. Defaults to None.

Returns:

Transformed data.
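The difference between prep() and bake() mirrors fitting on training data and then reusing the fitted state on new data. The sketch below illustrates that pattern with plain Python lists. MiniRecipe and CenterStep are hypothetical stand-ins, and raising an error when bake() is called before prep() is an assumption rather than documented behavior.

```python
class CenterStep:
    """Hypothetical step that subtracts the mean learned during fitting."""

    def __init__(self):
        self.mean = None

    def fit(self, values):
        self.mean = sum(values) / len(values)

    def transform(self, values):
        return [v - self.mean for v in values]


class MiniRecipe:
    """Hypothetical stand-in that mimics the prep/bake split."""

    def __init__(self):
        self.steps = []
        self.prepped = False

    def add_step(self, step):
        self.steps.append(step)
        return self  # returning self allows method chaining

    def prep(self, values):
        # Fit every step on the data, then transform it.
        for step in self.steps:
            step.fit(values)
            values = step.transform(values)
        self.prepped = True
        return values

    def bake(self, values):
        # Only transform, reusing what was learned in prep().
        if not self.prepped:
            raise RuntimeError("call prep() before bake()")  # assumption
        for step in self.steps:
            values = step.transform(values)
        return values


rec = MiniRecipe().add_step(CenterStep())
train = rec.prep([1.0, 2.0, 3.0])  # the mean 2.0 is learned here
test = rec.bake([10.0, 20.0])      # the same mean 2.0 is reused
```

Because bake() never refits, statistics from the test data cannot leak into the preprocessing.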

get_backend()[source]
cache()[source]

Prepares the recipe for caching

class recipies.step.Step(sel: ~recipies.selector.Selector = all predictors, supported_backends: list[~recipies.constants.Backend] = [Backend.POLARS, Backend.PANDAS])[source]

Bases: object

This class represents a step in a recipe.

Steps are transformations to be executed on selected columns of a DataFrame. They fit a transformer to the selected columns and afterwards transform the data with the fitted transformer.

Parameters:
sel

Object that holds information about the selected columns.

columns

List with the names of the selected columns.

__init__(sel: ~recipies.selector.Selector = all predictors, supported_backends: list[~recipies.constants.Backend] = [Backend.POLARS, Backend.PANDAS])[source]
property trained : bool
property group : bool
fit(data: Ingredients)[source]

This function fits the transformer to the data.

Parameters:
data: Ingredients

The DataFrame to fit to.

abstractmethod do_fit(data: Ingredients)[source]
transform(data: Ingredients) Ingredients[source]

This function transforms the data with the fitted transformer.

Parameters:
data: Ingredients

The DataFrame to transform.

Returns:

The transformed DataFrame.

fit_transform(data: Ingredients) Ingredients[source]
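The split between fit() and the abstract do_fit() suggests a template-method design, in which the base class handles bookkeeping and subclasses supply the fitting logic. A minimal sketch under that assumption follows; SketchStep and MaxStep are hypothetical and operate on lists rather than Ingredients.

```python
from abc import ABC, abstractmethod


class SketchStep(ABC):
    """Hypothetical base class; not the library's Step."""

    def __init__(self):
        self._trained = False

    @property
    def trained(self) -> bool:
        return self._trained

    def fit(self, data):
        # Delegate the actual work to the subclass, then record the fitted state.
        self.do_fit(data)
        self._trained = True

    @abstractmethod
    def do_fit(self, data):
        ...

    def transform(self, data):
        raise NotImplementedError

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)


class MaxStep(SketchStep):
    """Divides every value by the maximum seen during fitting."""

    def do_fit(self, data):
        self.max_ = max(data)

    def transform(self, data):
        return [v / self.max_ for v in data]


step = MaxStep()
before = step.trained
scaled = step.fit_transform([2.0, 4.0, 8.0])
```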
class recipies.step.StepImputeFill(sel=all predictors, value=None, strategy=None, limit=None)[source]

Bases: Step

For pandas: uses pandas’ fillna function to replace missing values. See pandas.DataFrame.fillna for a description of the arguments.

__init__(sel=all predictors, value=None, strategy=None, limit=None)[source]
transform(data)[source]

This function transforms the data with the fitted transformer.

Parameters:
data

The DataFrame to transform.

Returns:

The transformed DataFrame.

class recipies.step.StepImputeFastZeroFill(sel=all predictors)[source]

Bases: Step

Quick variant of pandas’ fillna(value=0) for grouped dataframes.

__init__(sel=all predictors)[source]
transform(data)[source]

This function transforms the data with the fitted transformer.

Parameters:
data

The DataFrame to transform.

Returns:

The transformed DataFrame.

class recipies.step.StepImputeFastForwardFill(sel=all predictors)[source]

Bases: Step

Quick variant of pandas’ fillna(method='ffill') for grouped dataframes.

Note: this variant does not allow for setting a limit.

__init__(sel=all predictors)[source]
transform(data)[source]

This function transforms the data with the fitted transformer.

Parameters:
data

The DataFrame to transform.

Returns:

The transformed DataFrame.
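Forward filling within groups means each missing value takes the most recent observed value from the same group and never borrows from another group. A minimal sketch on (group, value) pairs follows; ffill_by_group() is a hypothetical helper and not the library's implementation.

```python
def ffill_by_group(rows):
    """rows: list of (group, value) pairs in sequence order; None marks missing."""
    last_seen = {}
    filled = []
    for group, value in rows:
        if value is None:
            # Stays None if nothing has been observed for this group yet.
            value = last_seen.get(group)
        else:
            last_seen[group] = value
        filled.append((group, value))
    return filled


rows = [("a", 1.0), ("a", None), ("b", None), ("b", 5.0), ("b", None), ("a", None)]
filled = ffill_by_group(rows)
```

The first missing value of group "b" remains missing because no earlier observation exists within that group.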

class recipies.step.StepImputeModel(sel=all predictors, model=None)[source]

Bases: Step

Uses a pretrained imputation model to impute missing values.

Parameters:
model

A function that takes a dataframe and the grouping columns as input and returns a dataframe with imputed values without the grouping column.

__init__(sel=all predictors, model=None)[source]
transform(data)[source]

This function transforms the data with the fitted transformer.

Parameters:
data

The DataFrame to transform.

Returns:

The transformed DataFrame.

class recipies.step.Accumulator(*values)[source]

Bases: Enum

MAX = 'max'
MIN = 'min'
MEAN = 'mean'
MEDIAN = 'median'
COUNT = 'count'
VAR = 'var'
FIRST = 'first'
LAST = 'last'
class recipies.step.StepHistorical(sel: ~recipies.selector.Selector = all numeric predictors, fun: ~recipies.step.Accumulator = Accumulator.MAX, suffix: str = None, role: str = 'predictor')[source]

Bases: Step

This step generates columns with a historical accumulator provided by the user.

Parameters:
fun

Instance of the Accumulator enumerable that signifies which type of historical accumulation to use (default is MAX).

suffix

Defaults to None. Set this to have the step generate new columns with this suffix instead of the default suffix.

role

Defaults to ‘predictor’. In case new columns are added, set their role to role.

__init__(sel: ~recipies.selector.Selector = all numeric predictors, fun: ~recipies.step.Accumulator = Accumulator.MAX, suffix: str = None, role: str = 'predictor')[source]
transform(data: Ingredients) Ingredients[source]
Raises:

TypeError – If the function is not of type Accumulator
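One way to read "historical accumulator" is as a running aggregate over all rows seen so far. The sketch below illustrates that reading with itertools.accumulate for a few Accumulator values. Whether the library computes exactly this (for example, per group) is an assumption, and the historical() helper is hypothetical.

```python
from enum import Enum
from itertools import accumulate


class Accumulator(Enum):
    # Subset of the members listed above, redefined so the sketch is self-contained.
    MAX = "max"
    MIN = "min"
    COUNT = "count"


def historical(values, fun=Accumulator.MAX):
    if not isinstance(fun, Accumulator):
        raise TypeError("fun must be an Accumulator")
    if fun is Accumulator.MAX:
        return list(accumulate(values, max))
    if fun is Accumulator.MIN:
        return list(accumulate(values, min))
    # COUNT: number of rows seen so far.
    return list(range(1, len(values) + 1))


heart_rate = [80, 95, 90, 110, 100]
running_max = historical(heart_rate)
running_min = historical(heart_rate, Accumulator.MIN)
running_count = historical(heart_rate, Accumulator.COUNT)
```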

class recipies.step.StepSklearn(sklearn_transformer: object, sel: ~recipies.selector.Selector = all predictors, columnwise: bool = False, in_place: bool = True, role: str = 'predictor')[source]

Bases: Step

This step takes a transformer from scikit-learn and makes it usable as a step in a recipe.

Parameters:
sklearn_transformer

Instance of scikit-learn transformer that implements fit() and transform().

columnwise

Defaults to False. Set to True to fit and transform the DF column by column.

in_place

Defaults to True. Set to False to have the step generate new columns instead of overwriting the existing ones.

role : str, optional

Defaults to ‘predictor’. In case new columns are added, set their role to role.

__init__(sklearn_transformer: object, sel: ~recipies.selector.Selector = all predictors, columnwise: bool = False, in_place: bool = True, role: str = 'predictor')[source]
do_fit(data: Ingredients) Ingredients[source]
Raises:

ValueError – If the transformer expects a single column but gets multiple.

transform(data: Ingredients) Ingredients[source]
Raises:
  • TypeError – If the transformer returns a sparse matrix.

  • ValueError – If the transformer returns an unexpected number of columns.

class recipies.step.StepResampling(new_resolution: str = '1h', accumulator_dict: ~typing.Dict[~recipies.selector.Selector, ~recipies.step.Accumulator] = {all predictors: Accumulator.LAST}, default_accumulator: ~recipies.step.Accumulator = Accumulator.LAST)[source]

Bases: Step

__init__(new_resolution: str = '1h', accumulator_dict: ~typing.Dict[~recipies.selector.Selector, ~recipies.step.Accumulator] = {all predictors: Accumulator.LAST}, default_accumulator: ~recipies.step.Accumulator = Accumulator.LAST)[source]

This class represents a resampling step in a recipe.

Parameters:
new_resolution

Resolution to resample to.

accumulator_dict

Supply dictionary with individual accumulation methods for each Selector.

default_accumulator

Accumulator to use for variables not supplied in dictionary.

do_fit(data: Ingredients)[source]
transform(data)[source]

This function transforms the data with the fitted transformer.

Parameters:
data

The DataFrame to transform.

Returns:

The transformed DataFrame.
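Resampling to a coarser resolution with Accumulator.LAST can be pictured as flooring each timestamp to its bucket and keeping the last observation per bucket. A minimal standard-library sketch follows, assuming hourly buckets and rows already sorted by time; resample_last() is a hypothetical helper, not part of the library.

```python
from datetime import datetime


def resample_last(rows):
    """Keep the last value per hour; rows are (timestamp, value) sorted by time."""
    buckets = {}
    for ts, value in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        buckets[hour] = value  # later rows overwrite earlier ones
    return sorted(buckets.items())


rows = [
    (datetime(2024, 1, 1, 8, 5), 1.0),
    (datetime(2024, 1, 1, 8, 50), 2.0),
    (datetime(2024, 1, 1, 9, 15), 3.0),
]
hourly = resample_last(rows)
```

Swapping the overwrite for a different aggregation (for example, keeping the maximum) would correspond to choosing another accumulator for that selector.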

class recipies.step.StepScale(sel=all numeric predictors, with_mean: bool = True, with_std: bool = True, *args, **kwargs)[source]

Bases: StepSklearn

Provides a wrapper for scaling with StepSklearn. Note that because scikit-learn converts None (nulls) to NaN, we have to revert that conversion afterwards.

Parameters:
with_mean

Defaults to True. If True, center the data before scaling.

with_std

Defaults to True. If True, scale the data to unit variance (or equivalently, unit standard deviation).

in_place

Defaults to True. Set to False to have the step generate new columns instead of overwriting the existing ones.

role : str, optional

Defaults to ‘predictor’. In case new columns are added, set their role to role.

__init__(sel=all numeric predictors, with_mean: bool = True, with_std: bool = True, *args, **kwargs)[source]
transform(data: Ingredients) Ingredients[source]
Raises:
  • TypeError – If the transformer returns a sparse matrix.

  • ValueError – If the transformer returns an unexpected number of columns.
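The with_mean and with_std flags correspond to centering and to scaling to unit variance. The sketch below shows that arithmetic on a plain list while passing None through untouched, which is the effect the note about reverting NaN to null aims for. It uses the population standard deviation, as scikit-learn's StandardScaler does; scale() is a hypothetical helper and not the library's implementation.

```python
from statistics import fmean, pstdev


def scale(values, with_mean=True, with_std=True):
    observed = [v for v in values if v is not None]
    mean = fmean(observed) if with_mean else 0.0
    std = pstdev(observed) if with_std else 1.0
    # Missing entries stay None instead of becoming NaN.
    return [None if v is None else (v - mean) / std for v in values]


scaled = scale([2.0, None, 6.0])
centered = scale([2.0, None, 6.0], with_std=False)
```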

class recipies.step.StepFunction(sel: Selector, function)[source]

Bases: Step

Provides a wrapper for a simple transformation function, without fitting.

__init__(sel: Selector, function)[source]
transform(data: Ingredients) Ingredients[source]

This function transforms the data with the fitted transformer.

Parameters:
data: Ingredients

The DataFrame to transform.

Returns:

The transformed DataFrame.

class recipies.selector.Selector(description: str, names: str | list[str] = None, roles: str | list[str] = None, types: str | list[str] = None, pattern: Pattern = None)[source]

Bases: object

Class responsible for selecting the variables affected by a recipe step

Parameters:
description: str

Text used to represent Selector when printed in summaries

names: str | list[str] = None

Column names to select. Defaults to None.

roles: str | list[str] = None

Column roles to select, see also Ingredients. Defaults to None.

types: str | list[str] = None

Column data types to select. Defaults to None.

pattern: Pattern = None

Regex pattern to search column names with. Defaults to None.

__init__(description: str, names: str | list[str] = None, roles: str | list[str] = None, types: str | list[str] = None, pattern: Pattern = None)[source]
set_names(names: str | list[str])[source]

Set the column names to select with this Selector

Parameters:
names: str | list[str]

column names to select

set_roles(roles: str | list[str])[source]

Set the column roles to select with this Selector

Parameters:
roles: str | list[str]

column roles to select, see also Ingredients

set_types(roles: str | list[str])[source]

Set the column data types to select with this Selector

Parameters:
roles: str | list[str]

column data types to select

set_pattern(pattern: Pattern)[source]

Set the pattern to search with this Selector

Parameters:
pattern: Pattern

Regex pattern to search column names with.

recipies.selector.enlist_dt(x: DataType | list[DataType] | None) list[DataType] | None[source]

Wrap a pl datatype in a list if it isn’t a list yet

Parameters:
x: DataType | list[DataType] | None

object to wrap.

Raises:

TypeError – If neither a datatype nor a list of datatypes is passed

Returns:

x wrapped in a list, unless it already was one.

recipies.selector.enlist_str(x: str | list[str] | None) list[str] | None[source]

Wrap a str in a list if it isn’t a list yet

Parameters:
x: str | list[str] | None

object to wrap.

Raises:

TypeError – If neither a str nor a list of strings is passed

Returns:

x wrapped in a list, unless it already was one.
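The wrapping behavior can be sketched in a few lines. This is an illustration rather than the library's code: enlist_str_sketch() is a hypothetical name, and returning None unchanged is an assumption based on the type annotations.

```python
def enlist_str_sketch(x):
    if x is None:
        return None  # assumption: None passes through unchanged
    if isinstance(x, str):
        return [x]
    if isinstance(x, list) and all(isinstance(i, str) for i in x):
        return x
    raise TypeError("expected a str or a list of str")


single = enlist_str_sketch("age")
several = enlist_str_sketch(["age", "sex"])
nothing = enlist_str_sketch(None)
```

Normalizing arguments this way lets the rest of the code treat a single name and a list of names identically.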

recipies.selector.intersection(x: list, y: list) list[source]

Intersection of two lists

Note

Maintains the order of the first list. Does not deduplicate items (i.e., does not return a set).

Parameters:
x: list

first list

y: list

second list

Returns:

Elements in x that are also in y.
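The two properties in the note (order preserved, duplicates kept) follow naturally from a list comprehension. A minimal sketch of the described behavior, using the hypothetical name intersection_sketch():

```python
def intersection_sketch(x, y):
    # Walk x in order and keep every element that also occurs in y.
    return [item for item in x if item in y]


result = intersection_sketch(["b", "a", "b", "c"], ["a", "b"])
```

Here "b" appears twice in the result because it appears twice in the first list, and "b" precedes "a" because the first list determines the order.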

recipies.selector.all_of(names: str | list[str]) Selector[source]

Define selector for any columns with one of the given names

Parameters:
names: str | list[str]

names to select

Returns:

Object representing the selection rule.

recipies.selector.regex_names(regex: str) Selector[source]

Define selector for any columns where the name matches the regex pattern

Parameters:
regex: str

string to be transformed to regex pattern to search for

Returns:

Object representing the selection rule.

recipies.selector.starts_with(prefix: str) Selector[source]

Define selector for any columns where the name starts with the prefix

Parameters:
prefix: str

prefix to search for

Returns:

Object representing the selection rule.

recipies.selector.ends_with(suffix: str) Selector[source]

Define selector for any columns where the name ends with the suffix

Parameters:
suffix: str

suffix to search for

Returns:

Object representing the selection rule.

recipies.selector.contains(substring: str) Selector[source]

Define selector for any columns where the name contains the substring

Parameters:
substring: str

substring to search for

Returns:

Object representing the selection rule.
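starts_with(), ends_with() and contains() can all be seen as special cases of a regex search over column names. The sketch below shows equivalent patterns using the standard re module. Escaping the input with re.escape is an assumption about how literal text should be treated, and select_names() is a hypothetical helper rather than a library function.

```python
import re


def select_names(columns, pattern):
    # Keep the columns whose name contains a match for the pattern.
    return [c for c in columns if pattern.search(c)]


columns = ["hr_mean", "hr_max", "sbp_mean", "age"]

starts = select_names(columns, re.compile("^" + re.escape("hr")))
ends = select_names(columns, re.compile(re.escape("_mean") + "$"))
has = select_names(columns, re.compile(re.escape("bp")))
```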

recipies.selector.has_role(roles: str | list[str]) Selector[source]

Define selector for any columns with one of the given roles

Parameters:
roles: str | list[str]

roles to select

Returns:

Object representing the selection rule.

recipies.selector.has_type(types: str | list[str]) Selector[source]

Define selector for any columns with one of the given types

Parameters:
types: str | list[str]

data types to select

Note

Data types are selected based on their string representation as returned by df[varname].dtype.name.

Returns:

Object representing the selection rule.

recipies.selector.all_predictors() Selector[source]

Define selector for all predictor columns

Returns:

Object representing the selection rule.

recipies.selector.all_numeric_predictors(backend=Backend.POLARS) Selector[source]

Define selector for all numerical predictor columns

Returns:

Object representing the selection rule.

recipies.selector.all_outcomes() Selector[source]

Define selector for all outcome columns

Returns:

Object representing the selection rule.

recipies.selector.all_groups() Selector[source]

Define selector for all grouping variables

Returns:

Object representing the selection rule.

recipies.selector.select_groups(ingr: Ingredients) list[str][source]

Select any grouping columns

Defines and directly applies Selector(roles=["group"])

Returns:

Grouping columns.

recipies.selector.all_sequences() Selector[source]

Define selector for all sequence variables

Returns:

Object representing the selection rule.

recipies.selector.select_sequence(ingr: Ingredients) list[str][source]

Select any sequence columns

Defines and directly applies Selector(roles=["sequence"])

Returns:

Sequence columns.