ReciPies

PyPI Version Code Coverage Tests Python Versions License ReciPies Logo

ReciPies is a Python package for feature engineering and data preprocessing with a focus on medical and clinical data. It provides a unified interface for working with both Polars and Pandas DataFrames while maintaining column role information throughout data transformations.

  • Dual Backend Support: Seamlessly work with both Polars and Pandas DataFrames

  • Column Role Management: Track and maintain semantic roles of columns (e.g., patient_id, timestamp, features)

  • Medical Data Focus: Specialized tools for clinical and medical data preprocessing

  • Pipeline Architecture: Build reproducible data processing pipelines with Steps and Recipes

  • Type Safety: Strong typing support for better code reliability

  • Performance: Leverage the speed of Polars while maintaining Pandas compatibility

Install ReciPies using pip:

pip install recipies

For development installation:

git clone https://github.com/rvandewater/ReciPies.git
cd ReciPies
pip install -e .

Here’s a simple example of using ReciPies:

import polars as pl
from recipies import Ingredients, Recipe
from recipies.step import Step

# Create sample data
data = pl.DataFrame({
    "patient_id": [1, 1, 2, 2],
    "timestamp": ["2023-01-01", "2023-01-02", "2023-01-01", "2023-01-02"],
    "heart_rate": [72, 75, 68, 70],
    "blood_pressure": [120, 125, 110, 115]
})

# Define column roles
roles = {
    "patient_id": "patient_id",
    "timestamp": "timestamp",
    "heart_rate": "feature",
    "blood_pressure": "feature"
}

# Create Ingredients object
ingredients = Ingredients(data, roles=roles)

# Build a recipe with processing steps
recipe = Recipe()
recipe.add_step(Step("normalize_features"))

# Apply the recipe
processed_data = recipe.apply(ingredients)
Ingredients

A wrapper around DataFrames that maintains column role information, ensuring data semantics are preserved during transformations.

Recipe

A collection of processing steps that can be applied to Ingredients objects to create reproducible data pipelines.

Step

Individual data transformation operations that understand column roles and can work with both Polars and Pandas backends.

Selector

Utilities for selecting columns based on their roles or other criteria.

ReciPies supports both Polars and Pandas backends:

  • Polars: High-performance DataFrame library with lazy evaluation

  • Pandas: Traditional DataFrame library with extensive ecosystem support

The package automatically detects the backend and provides a consistent API regardless of the underlying DataFrame implementation.

Check out the examples/ directory for Jupyter notebooks demonstrating:

  • Basic usage and concepts

  • Medical data preprocessing workflows

  • Performance benchmarking between backends

  • Advanced pipeline construction

Contributions are welcome! Please see our contributing guidelines and open an issue or submit a pull request on the GitHub repository.

This project is licensed under the MIT License. See the LICENSE file for details.

Contents: