ReciPies

ReciPies is a Python package for feature engineering and data preprocessing with a focus on medical and clinical data. It provides a unified interface for working with both Polars and Pandas DataFrames while maintaining column role information throughout data transformations.
- Dual Backend Support: Seamlessly work with both Polars and Pandas DataFrames
- Column Role Management: Track and maintain semantic roles of columns (e.g., patient_id, timestamp, features)
- Medical Data Focus: Specialized tools for clinical and medical data preprocessing
- Pipeline Architecture: Build reproducible data processing pipelines with Steps and Recipes
- Type Safety: Strong typing support for better code reliability
- Performance: Leverage the speed of Polars while maintaining Pandas compatibility
Install ReciPies using pip:

    pip install recipies

For a development installation:

    git clone https://github.com/rvandewater/ReciPies.git
    cd ReciPies
    pip install -e .

Here’s a simple example of using ReciPies:

    import polars as pl

    from recipies import Ingredients, Recipe
    from recipies.step import Step

    # Create sample data
    data = pl.DataFrame({
        "patient_id": [1, 1, 2, 2],
        "timestamp": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
        "heart_rate": [72, 75, 68, 70],
        "blood_pressure": [120, 125, 110, 115],
    })

    # Define column roles
    roles = {
        "patient_id": "patient_id",
        "timestamp": "timestamp",
        "heart_rate": "feature",
        "blood_pressure": "feature",
    }

    # Create Ingredients object
    ingredients = Ingredients(data, roles=roles)

    # Build a recipe with processing steps
    recipe = Recipe()
    recipe.add_step(Step("normalize_features"))

    # Apply the recipe
    processed_data = recipe.apply(ingredients)

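The quick-start example does not say exactly what the `normalize_features` step computes. Assuming it means per-column z-score standardization, the stdlib-only sketch below shows the role-aware part of that idea on the same sample data: only columns with the `feature` role are rescaled, while `patient_id` and `timestamp` pass through untouched. The helpers `zscore` and `normalize_features` are hypothetical and not part of ReciPies, and the use of the population standard deviation is an assumption of this sketch.

```python
from statistics import mean, pstdev

# Same sample data as the quick-start example, stored as plain column lists.
sample = {
    "patient_id": [1, 1, 2, 2],
    "timestamp": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "heart_rate": [72, 75, 68, 70],
    "blood_pressure": [120, 125, 110, 115],
}
roles = {
    "patient_id": "patient_id",
    "timestamp": "timestamp",
    "heart_rate": "feature",
    "blood_pressure": "feature",
}


def zscore(values):
    """Hypothetical helper: rescale to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map it to zeros instead of dividing by zero.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def normalize_features(columns, roles):
    """Hypothetical stand-in for a normalization step: rescale only 'feature' columns."""
    return {
        name: zscore(values) if roles.get(name) == "feature" else list(values)
        for name, values in columns.items()
    }


normalized = normalize_features(sample, roles)
print(normalized["blood_pressure"])
```

The role map is what lets the step skip identifier and time columns without the caller listing them explicitly.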
- Ingredients
A wrapper around DataFrames that maintains column role information, ensuring data semantics are preserved during transformations.
- Recipe
A collection of processing steps that can be applied to Ingredients objects to create reproducible data pipelines.
- Step
Individual data transformation operations that understand column roles and can work with both Polars and Pandas backends.
- Selector
Utilities for selecting columns based on their roles or other criteria.
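To make the division of labour between these four concepts concrete, here is a stdlib-only sketch of the same ideas. None of these names come from ReciPies: `Table` stands in for Ingredients, `has_role` for a selector, `AddMissingIndicator` and `MeanImpute` for steps, and `Pipeline` for a Recipe. The point it illustrates is that a step which creates a new column also registers that column's role, so selectors used by later steps still find it.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable


@dataclass
class Table:
    """Hypothetical stand-in for Ingredients: column data plus one role per column."""
    columns: dict[str, list]
    roles: dict[str, str] = field(default_factory=dict)


Selector = Callable[[Table], list[str]]


def has_role(role: str) -> Selector:
    """Hypothetical selector factory: picks the columns whose role equals `role`."""
    def select(table: Table) -> list[str]:
        return [c for c in table.columns if table.roles.get(c) == role]
    return select


class AddMissingIndicator:
    """Hypothetical step: adds '<col>_missing' (1 where the value is None, else 0)
    for each selected column and registers it with the 'feature' role."""
    def __init__(self, selector: Selector):
        self.selector = selector

    def apply(self, table: Table) -> Table:
        columns = {k: list(v) for k, v in table.columns.items()}
        roles = dict(table.roles)
        for col in self.selector(table):
            name = f"{col}_missing"
            columns[name] = [1 if v is None else 0 for v in table.columns[col]]
            roles[name] = "feature"
        return Table(columns, roles)


class MeanImpute:
    """Hypothetical step: replaces None in each selected column with the column mean."""
    def __init__(self, selector: Selector):
        self.selector = selector

    def apply(self, table: Table) -> Table:
        columns = {k: list(v) for k, v in table.columns.items()}
        for col in self.selector(table):
            observed = [v for v in columns[col] if v is not None]
            if not observed:
                continue  # nothing to average; leave the column as-is
            fill = mean(observed)
            columns[col] = [fill if v is None else v for v in columns[col]]
        return Table(columns, dict(table.roles))


class Pipeline:
    """Hypothetical stand-in for Recipe: an ordered list of steps applied in sequence."""
    def __init__(self) -> None:
        self.steps: list = []

    def add_step(self, step) -> "Pipeline":
        self.steps.append(step)
        return self

    def apply(self, table: Table) -> Table:
        for step in self.steps:
            table = step.apply(table)
        return table


raw = Table(
    columns={"patient_id": [1, 1, 2, 2], "heart_rate": [72, None, 68, 70]},
    roles={"patient_id": "patient_id", "heart_rate": "feature"},
)
pipeline = Pipeline()
pipeline.add_step(AddMissingIndicator(has_role("feature")))
pipeline.add_step(MeanImpute(has_role("feature")))
processed = pipeline.apply(raw)
print(processed.columns)
print(processed.roles)
```

Each step in this sketch returns a new table instead of mutating its input, so `raw` is unchanged after the pipeline runs; that copy-on-apply behaviour is a design choice of the sketch, and ReciPies' own steps may behave differently.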
ReciPies supports both Polars and Pandas backends:
- Polars: High-performance DataFrame library with both eager and lazy execution
- Pandas: Traditional DataFrame library with extensive ecosystem support
The package automatically detects the backend and provides a consistent API regardless of the underlying DataFrame implementation.
Check out the examples/ directory for Jupyter notebooks demonstrating:
- Basic usage and concepts
- Medical data preprocessing workflows
- Performance benchmarking between backends
- Advanced pipeline construction
Contributions are welcome! Please see our contributing guidelines and open an issue or submit a pull request on the GitHub repository.
This project is licensed under the MIT License. See the LICENSE file for details.