ReciPies

ReciPies is a Python package for feature engineering and data preprocessing with a focus on medical and clinical data. It provides a unified interface for working with both Polars and Pandas DataFrames while maintaining column role information throughout data transformations.

Dual Backend Support: Seamlessly work with both Polars and Pandas DataFrames

Column Role Management: Track and maintain semantic roles of columns (e.g., patient_id, timestamp, features)

Medical Data Focus: Specialized tools for clinical and medical data preprocessing

Pipeline Architecture: Build reproducible data processing pipelines with Steps and Recipes

Type Safety: Strong typing support for better code reliability

Performance: Leverage the speed of Polars while maintaining Pandas compatibility

Install ReciPies using pip:
pip install recipies
For a development installation, clone the repository and install it in editable mode:
git clone https://github.com/rvandewater/ReciPies.git
cd ReciPies
pip install -e .
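To confirm that the installation succeeded, one option is to look up the installed version with the Python standard library. This is a minimal sketch: the helper name installed_version is hypothetical, and it assumes the distribution is registered under the same name used in the pip command above (recipies).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("recipies"))
```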
Here’s a simple example of using ReciPies:
import polars as pl
from recipies import Ingredients, Recipe
from recipies.step import Step

# Create sample data
data = pl.DataFrame({
    "patient_id": [1, 1, 2, 2],
    "timestamp": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "heart_rate": [72, 75, 68, 70],
    "blood_pressure": [120, 125, 110, 115]
})

# Define column roles
roles = {
    "patient_id": "patient_id",
    "timestamp": "timestamp",
    "heart_rate": "feature",
    "blood_pressure": "feature"
}

# Create Ingredients object
ingredients = Ingredients(data, roles=roles)

# Build a recipe with processing steps
recipe = Recipe()
recipe.add_step(Step("normalize_features"))

# Apply the recipe
processed_data = recipe.apply(ingredients)
Ingredients
A wrapper around DataFrames that maintains column role information, ensuring data semantics are preserved during transformations.
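The idea of keeping roles attached to columns through a transformation can be illustrated without any DataFrame library. The sketch below is a toy stand-in, not the ReciPies Ingredients class: the class name RoleTrackedTable and its rename method are hypothetical, and columns are stored as plain Python lists.

```python
from dataclasses import dataclass, field


@dataclass
class RoleTrackedTable:
    """Toy role-aware table (hypothetical; not the ReciPies Ingredients class)."""

    columns: dict[str, list]                             # column name -> values
    roles: dict[str, str] = field(default_factory=dict)  # column name -> role

    def rename(self, old: str, new: str) -> "RoleTrackedTable":
        """Rename a column and carry its role over to the new name."""
        columns = {new if name == old else name: values
                   for name, values in self.columns.items()}
        roles = {new if name == old else name: role
                 for name, role in self.roles.items()}
        return RoleTrackedTable(columns, roles)


table = RoleTrackedTable(
    columns={"patient_id": [1, 1, 2], "heart_rate": [72, 75, 68]},
    roles={"patient_id": "patient_id", "heart_rate": "feature"},
)
renamed = table.rename("heart_rate", "hr")
print(renamed.roles)  # {'patient_id': 'patient_id', 'hr': 'feature'}
```

Because the role mapping is updated alongside the data, a later step can still find the renamed column by asking for the "feature" role.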

Recipe
A collection of processing steps that can be applied to Ingredients objects to create reproducible data pipelines.

Step
Individual data transformation operations that understand column roles and can work with both Polars and Pandas backends.
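How steps and a recipe fit together can be sketched as an ordered list of role-aware functions. Everything here is an assumption made for illustration: ToyRecipe and center_features are hypothetical names, the data is a plain dict of lists, and the real Recipe and Step classes may work differently.

```python
from statistics import mean

Table = dict[str, list]
Roles = dict[str, str]


def center_features(table: Table, roles: Roles) -> Table:
    """Example step: subtract the column mean from every 'feature' column."""
    out = {}
    for name, values in table.items():
        if roles.get(name) == "feature":
            mu = mean(values)
            out[name] = [v - mu for v in values]
        else:
            out[name] = list(values)  # non-feature columns pass through unchanged
    return out


class ToyRecipe:
    """Ordered list of steps applied one after another (hypothetical)."""

    def __init__(self):
        self.steps = []

    def add_step(self, step):
        self.steps.append(step)
        return self  # returning self allows chaining

    def apply(self, table: Table, roles: Roles) -> Table:
        for step in self.steps:
            table = step(table, roles)
        return table


roles = {"patient_id": "patient_id", "heart_rate": "feature"}
data = {"patient_id": [1, 1, 2, 2], "heart_rate": [72, 75, 68, 70]}

result = ToyRecipe().add_step(center_features).apply(data, roles)
print(result["heart_rate"])  # [0.75, 3.75, -3.25, -1.25]
```

The step never names heart_rate directly; it relies only on the role mapping, which is what lets the same recipe be reused on tables with different feature columns.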

Selector
Utilities for selecting columns based on their roles or other criteria.
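Role-based selection reduces to filtering a column-to-role mapping. The function below is a minimal sketch using the roles from the example above; select_by_role is a hypothetical helper, not part of the ReciPies selector API.

```python
def select_by_role(roles: dict[str, str], role: str) -> list[str]:
    """Return the names of all columns assigned the given role, in dict order."""
    return [name for name, assigned in roles.items() if assigned == role]


roles = {
    "patient_id": "patient_id",
    "timestamp": "timestamp",
    "heart_rate": "feature",
    "blood_pressure": "feature",
}

print(select_by_role(roles, "feature"))  # ['heart_rate', 'blood_pressure']
```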

ReciPies supports both Polars and Pandas backends:

Polars: High-performance DataFrame library with both eager and lazy evaluation

Pandas: Traditional DataFrame library with extensive ecosystem support

The package automatically detects the backend and provides a consistent API regardless of the underlying DataFrame implementation.
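One common way to detect a DataFrame's backend without importing either library is to inspect the module in which the object's class is defined. The sketch below illustrates that technique only; detect_backend is a hypothetical helper, and the detection logic ReciPies actually uses is not shown in this document.

```python
def detect_backend(frame: object) -> str:
    """Guess the DataFrame library from the top-level module of the object's class."""
    top_level = type(frame).__module__.split(".")[0]
    if top_level in ("polars", "pandas"):
        return top_level
    raise TypeError(f"Unsupported DataFrame type: {type(frame).__name__}")


# Stand-in class so the sketch runs without either library installed.
FakeFrame = type("DataFrame", (), {"__module__": "pandas.core.frame"})
print(detect_backend(FakeFrame()))  # pandas
```

Dispatching on the module name avoids a hard import of a library the user may not have installed, at the cost of relying on where each library defines its classes.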

Check out the examples/ directory for Jupyter notebooks demonstrating:

Basic usage and concepts

Medical data preprocessing workflows

Performance benchmarking between backends

Advanced pipeline construction

Contributions are welcome! Please see our contributing guidelines and open an issue or submit a pull request on the GitHub repository.

This project is licensed under the MIT License. See the LICENSE file for details.

Contents:

API Reference
