Overview

Pixeltable is an open-source Python library that lets AI engineers and data scientists focus on exploration, modeling, and app development without dealing with the customary data plumbing.

Unifying Data, Models, and Orchestration for AI Products

Pixeltable unifies your AI workflow with a declarative, data-centric platform. Store, transform, index, and iterate on your data within the same table interface, whether it's text, images, embeddings, or even video. Built-in lineage and versioning ensure transparency and reproducibility, while the development-to-production mirror streamlines deployment.

Transformations, model inference, and custom logic are embedded as computed columns.
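
To make the idea of a computed column concrete, here is a minimal pure-Python sketch. It is a toy model, not Pixeltable's API: the `ToyTable` class and its `add_computed_column` and `insert` methods are hypothetical names chosen for illustration. It shows the behavior described above under that assumption: a column defined as a function of other columns is filled in for existing rows and for every row inserted afterwards.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyTable:
    """Hypothetical stand-in for a table with computed columns (not Pixeltable's API)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        # Maps a computed column name to the function that derives its value from a row.
        self.computed: Dict[str, Callable[[Row], Any]] = {}

    def add_computed_column(self, name: str, fn: Callable[[Row], Any]) -> None:
        # Register the column, then backfill it for rows that already exist.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = fn(row)

    def insert(self, row: Row) -> None:
        # Evaluate every registered computed column for the new row only.
        new_row = dict(row)
        for name, fn in self.computed.items():
            new_row[name] = fn(new_row)
        self.rows.append(new_row)


reviews = ToyTable()
reviews.insert({"text": "Great product"})
# The transformation is declared once, as part of the table definition.
reviews.add_computed_column("word_count", lambda r: len(r["text"].split()))
# Rows added later get the computed value without any extra pipeline code.
reviews.insert({"text": "Arrived late and damaged"})
```

In this sketch the word-count lambda plays the role that a model call or image transformation would play in a real workload.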

Learn more about Pixeltable through our FAQ.

⚡ Quick Start

In these tutorials, we'll see how to create tables, populate them with data, and enhance them with built-in and user-defined transformations and AI operations.

Launch in Kaggle | Open in Google Colab

💾 Installation

Pixeltable supports Python 3.9, 3.10, 3.11, and 3.12 on Linux, macOS, and Windows.

%pip install pixeltable

To verify that it's working:

import pixeltable as pxt
pxt.init()

See the Getting Started with Pixeltable guide for more detailed installation instructions.

Topic | Notebook
--- | ---
Get Started | Open in Colab
User-Defined Functions (UDFs) | Open in Colab
Comparing Object Detection Models | Open in Colab
Experimenting with Chunking (RAG) | Open in Colab
Working with External Files | Open in Colab

Why should you use Pixeltable?

  • It gives you transparency and reproducibility
    • All generated data is automatically recorded and versioned
    • You will never need to re-run a workload because you lost track of the input data
  • It saves you money
    • All data changes are automatically incremental
    • You never need to re-run pipelines from scratch because you’re adding data
  • It integrates with any existing Python code or libraries
    • Bring your ever-changing code and workloads
    • You choose the models, tools, and AI practices (e.g., your embedding model for a vector index); Pixeltable orchestrates the data
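
The incremental-update and versioning claims above can be illustrated with a small pure-Python sketch. It is a conceptual model under stated assumptions, not Pixeltable's implementation: `IncrementalTable`, its `_expensive_transform` method, and the `snapshot` helper are hypothetical names. The sketch counts calls to the transformation to show that adding rows only processes the new rows, and it bumps a version number on each insert so that earlier states can be read back.

```python
from typing import Dict, List


class IncrementalTable:
    """Toy table that stores derived values and records a version per insert."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, int]] = []
        self.version = 0
        # Maps a version number to the number of rows visible at that version.
        self.row_count_at: Dict[int, int] = {0: 0}
        # Counts how often the (pretend) expensive step has run.
        self.transform_calls = 0

    def _expensive_transform(self, value: int) -> int:
        # Hypothetical stand-in for model inference or another costly operation.
        self.transform_calls += 1
        return value * value

    def insert(self, values: List[int]) -> None:
        # Only the new values are transformed; stored results are left untouched.
        for v in values:
            self.rows.append({"value": v, "squared": self._expensive_transform(v)})
        self.version += 1
        self.row_count_at[self.version] = len(self.rows)

    def snapshot(self, version: int) -> List[Dict[str, int]]:
        # Rows are append-only in this sketch, so a version is a prefix of the rows.
        return self.rows[: self.row_count_at[version]]


t = IncrementalTable()
t.insert([1, 2, 3])  # three transform calls
t.insert([4])        # one more call; the first three rows are not recomputed
```

Because derived values are stored alongside the inputs, reading `t.snapshot(1)` returns the table as it was after the first insert without running the transformation again.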

Examples of Specific Data Problems Pixeltable Addresses

  • Interact with video data at the frame level without having to think about frame extraction, intermediate file storage, or storage space explosion.
  • Augment your data incrementally and interactively with built-in functions and UDFs, such as image transformations, model inference, and visualizations, without having to think about data pipelines, incremental updates, or capturing function output.
  • Interact with all the data relevant to your AI application (video, images, documents, audio, structured data, JSON) through a simple dataframe-style API directly in Python. This includes:
    • similarity search on embeddings, supported by high-dimensional vector indexing;
    • path expressions and transformations on JSON data;
    • PIL and OpenCV image operations;
    • assembling frames into videos.
  • Perform keyword and image similarity search at the video frame level without having to worry about frame storage.
  • Access all Pixeltable-resident data directly as a PyTorch dataset in your training scripts.
  • Understand the compute and storage costs of your data at the granularity of individual augmentations and get cost projections before adding new data and new augmentations.
  • Rely on Pixeltable's automatic versioning and snapshot functionality to protect against regressions and to ensure reproducibility.
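
As an illustration of the similarity search mentioned in the list above, the following sketch ranks stored embeddings by cosine similarity to a query vector. It uses a brute-force scan in plain Python rather than a vector index, and the three-dimensional vectors, item names, and the `top_k` helper are made up for the example; real embeddings would come from a model of your choice and have far more dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Assumes both vectors are non-zero and have the same length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], embeddings: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    # Score every stored vector against the query, then keep the k best matches.
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; the values are invented for illustration.
embeddings = {
    "red_shoe": [0.9, 0.1, 0.0],
    "red_boot": [0.8, 0.2, 0.1],
    "blue_hat": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
```

A linear scan like this is fine for small collections; a vector index serves the same query without comparing against every stored vector.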

Examples of High-Level Use Cases

Computer Vision

  • Object Detection for Autonomous Vehicles: Efficiently manage massive image and video datasets, labeling them within Pixeltable or integrating with external labeling tools. Train and deploy models to detect pedestrians, vehicles, and other objects in real time.
  • Visual Search for E-commerce: Create a searchable image catalog for products, enabling users to find similar items based on image similarity. Track the impact of different feature engineering and embedding choices on search performance.
  • Medical Image Analysis: Develop diagnostic models using large-scale DICOM datasets. Pixeltable's lineage tracking ensures traceability and reproducibility, crucial for regulatory compliance.
  • Defect Detection in Manufacturing: Reference images from your production line directly in Pixeltable. Apply transformations and train models to quickly identify and classify defects. Monitor model performance over time to ensure quality standards are met.

Natural Language Processing (NLP)

  • Sentiment Analysis of Customer Reviews: Import or reference text data from social media, survey responses, etc. Experiment with different preprocessing and modeling approaches. Track how changes impact accuracy and deploy the best model for real-time insights.
  • Text Summarization for Research: Load research papers, articles, or internal documents into Pixeltable. Utilize computed columns for text summarization, comparing the performance of various techniques (extractive, abstractive). Leverage lineage to understand the reasoning behind each summary.
  • Named Entity Recognition for Financial Documents: Extract critical information from financial statements, contracts, or news articles. Track the performance of your NER models on different document types. Use Pixeltable to create a centralized repository for extracted entities.

Retrieval Augmented Generation (RAG)

  • Knowledge Base Q&A Systems: Integrate Pixeltable with your company's documentation or knowledge base articles. Build a powerful chatbot that answers questions accurately and with explainability, thanks to Pixeltable's lineage tracking.
  • Content Generation: Automate marketing copy, product descriptions, or social media posts. Pixeltable allows experimenting with different prompts and LLM parameters while tracking the impact of each variation.
  • Code Generation: Use Pixeltable to build a RAG system that leverages your codebase to generate code snippets, summaries, or explanations on demand.

Contributions & Feedback

Are you experiencing issues or bugs with Pixeltable? File an Issue.

Do you want to contribute? Feel free to open a PR.

🏟️ License

This library is licensed under the Apache 2.0 License.