Computed Columns

Computed columns are the cornerstone of Pixeltable's power and flexibility. They transform your data, run your models, and orchestrate complex workflows, all within your table structure. You define and experiment with AI logic declaratively, while Pixeltable handles execution and updates behind the scenes.

What Are Computed Columns?

Think of them as functions embedded within your table. You write the logic, and Pixeltable automatically applies it to each row of your data, generating new results. They are similar to Excel formulas, but for your AI/ML projects, with the added benefits of lineage tracking and versioning.
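To make the spreadsheet analogy concrete, here is a minimal pure-Python sketch of the idea. It is not Pixeltable code and says nothing about how Pixeltable is implemented; ToyTable, add_computed, and the word_count column are hypothetical names invented for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyTable:
    """Hypothetical in-memory table (illustration only, not the Pixeltable API)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.computed: Dict[str, Callable[[Row], Any]] = {}

    def add_computed(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Register the formula and backfill it for rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, **values: Any) -> None:
        # Store the raw values, then evaluate every registered formula on the new row.
        row: Row = dict(values)
        for name, fn in self.computed.items():
            row[name] = fn(row)
        self.rows.append(row)


table = ToyTable()
table.insert(text="hello world")
table.add_computed("word_count", lambda row: len(row["text"].split()))
table.insert(text="computed columns update themselves")

word_counts = [row["word_count"] for row in table.rows]
print(word_counts)
```

The formula is written once. It is backfilled for the row that existed before the column was added and applied automatically to the row inserted afterwards, which is the behavior the Excel comparison describes.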

Why Use Computed Columns?

Computed columns represent transformations, feature engineering, model outputs, or any other calculation you need. Pixeltable automatically tracks their dependencies and updates them incrementally, saving you time and effort.

  • Declarative Logic: Shift your focus from "how to code the steps" to "what I want to compute." Pixeltable handles the execution details.
  • Unified Workflow: Keep your data transformations, feature engineering, model inference, and even custom logic centralized in a single table with a declarative interface.
  • Automatic Updates: Change your input data, and the computed columns automatically recalculate the results. This enables efficient incremental updates and easy experimentation.
  • Transparent Lineage: Every computed column's history is tracked, allowing you to pinpoint exactly where data came from and how it was transformed.
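The sketch below illustrates, in plain Python, what dependency tracking, incremental updates, and lineage can look like in principle. It is a toy model under stated assumptions, not Pixeltable's implementation: IncrementalTable, its methods, and the column names are hypothetical, and it assumes each computed column is registered after the columns it reads.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class IncrementalTable:
    """Hypothetical toy table that records which inputs each computed column reads."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []
        # computed column name -> (input column names, function)
        self.computed: Dict[str, Tuple[Tuple[str, ...], Callable[..., Any]]] = {}
        self.calls: Dict[str, int] = {}  # evaluation counter per computed column

    def _evaluate(self, row: Dict[str, Any], name: str) -> None:
        inputs, fn = self.computed[name]
        row[name] = fn(*(row[col] for col in inputs))
        self.calls[name] += 1

    def add_computed(self, name: str, inputs: Sequence[str], fn: Callable[..., Any]) -> None:
        self.computed[name] = (tuple(inputs), fn)
        self.calls[name] = 0
        for row in self.rows:  # backfill existing rows
            self._evaluate(row, name)

    def insert(self, **values: Any) -> None:
        row = dict(values)
        self.rows.append(row)
        for name in self.computed:  # only the new row is evaluated
            self._evaluate(row, name)

    def update(self, index: int, column: str, value: Any) -> None:
        row = self.rows[index]
        row[column] = value
        stale = {column}
        # Registration order is a valid evaluation order under the assumption above.
        for name, (inputs, _) in self.computed.items():
            if stale.intersection(inputs):
                self._evaluate(row, name)
                stale.add(name)

    def lineage(self, name: str) -> Any:
        # Walk the declared inputs back to the stored columns.
        if name not in self.computed:
            return name
        inputs, _ = self.computed[name]
        return {name: [self.lineage(col) for col in inputs]}


t = IncrementalTable()
t.insert(width=2, height=3)
t.insert(width=4, height=5)
t.add_computed("area", ["width", "height"], lambda w, h: w * h)
t.add_computed("is_large", ["area"], lambda a: a > 10)
t.add_computed("double_width", ["width"], lambda w: w * 2)
t.insert(width=1, height=1)
t.update(0, "height", 10)

print(t.calls)
print(t.lineage("is_large"))
```

After the update, only area and is_large are re-evaluated for the changed row; double_width does not read height, so its counter stays put. The lineage call traces is_large back through area to the stored width and height columns.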

Using Computed Columns

You can define computed columns using a concise, Python-like syntax. Here's how to create a simple column:

table['double_value'] = table.original_value * 2

This creates a new column named double_value that contains twice the value of the original_value column for each row in the table.

Example: Image Transformations

image_table['resized'] = image_table.image_col.resize((224, 224))  # Resize to 224x224 pixels
image_table['cropped'] = image_table.image_col.crop(box=(100, 100, 200, 200))  # Crop a region of interest
image_table['flipped'] = image_table.image_col.flip_horizontal()  # Flip the image horizontally

Beyond the Basics

Computed columns support much more than simple calculations:

  • User-Defined Functions (UDFs): Write your own Python functions to encapsulate complex logic.
  • Library Integration: Use libraries such as OpenCV, Hugging Face, or YOLOX within your computed columns.
  • Error Handling: Pixeltable provides errortype and errormsg attributes to help you identify and debug any issues with your computations.
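As a rough illustration of the UDF and error-handling points above, the following pure-Python sketch applies a user-written function to a list of inputs and records failures per cell instead of aborting. The Cell class and apply_udf helper are hypothetical; the sketch only borrows the errortype and errormsg names from the text and does not reproduce how Pixeltable exposes them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class Cell:
    """Hypothetical result cell: either a value or a recorded error."""
    value: Any = None
    errortype: Optional[str] = None
    errormsg: Optional[str] = None


def apply_udf(fn: Callable[[Any], Any], inputs: Iterable[Any]) -> List[Cell]:
    # Evaluate the function on every input; a failure is stored on that cell only.
    cells: List[Cell] = []
    for item in inputs:
        try:
            cells.append(Cell(value=fn(item)))
        except Exception as exc:
            cells.append(Cell(errortype=type(exc).__name__, errormsg=str(exc)))
    return cells


def reciprocal(x: float) -> float:
    """A user-defined function that fails for x == 0."""
    return 1 / x


cells = apply_udf(reciprocal, [4, 0, 2])
failed = [(i, c.errortype) for i, c in enumerate(cells) if c.errortype is not None]
print(failed)
```

Keeping the error next to the cell that caused it means one bad input does not block the remaining rows, and the failing rows can be filtered out and inspected afterwards.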

Tips and Tricks

  • Start Simple: Break down your workflow into smaller, reusable computed columns for improved clarity and flexibility.
  • Leverage the API: Pixeltable offers a powerful API for programmatically creating and managing computed columns, making automation easy.
  • Explore Built-in Functions: Pixeltable includes a library of common transformations and operations to help you get started quickly.

Computed columns are the foundation for building efficient, transparent, and reproducible AI workflows in Pixeltable. By focusing on declarative logic, you free up time to concentrate on your core AI expertise, whether it's building a new model or analyzing complex data relationships.

They are the building blocks for:

  • Data Transformation: Clean your data, engineer features, and apply complex transformations using built-in or custom functions.
  • Model Inference: Run your ML models directly on your data within the table, generating predictions and other outputs.
  • Orchestration: Control the flow of your AI pipeline by declaring dependencies between columns.
  • Declarative Pipelines: Express each of these steps in a clear, concise syntax that combines the best of Python and SQL.
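The orchestration point can be sketched with Python's standard graphlib module (Python 3.9+): once each column declares what it reads, a valid execution order follows from the dependency graph. The column names below are hypothetical, and this illustrates the general technique of topological ordering rather than Pixeltable's scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each column maps to the set of columns it reads.
dependencies = {
    "resized": {"image"},
    "embedding": {"resized"},
    "caption": {"resized"},
    "summary": {"caption", "embedding"},
}

# static_order() yields every column after all of the columns it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The relative order of embedding and caption is not fixed, since neither depends on the other. The ordering only guarantees that image comes first, resized before both of them, and summary last.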