Data References & Types

This section covers how Pixeltable handles data: how to get started, and what sets its data-centric approach apart.

One of Pixeltable's key strengths is its ability to work with your data without requiring you to move it all into a new system. This is especially valuable when dealing with large datasets like images and videos. Pixeltable achieves this through data references.

What are Data References?

Think of a data reference as a pointer or link to your original file. Pixeltable stores these references within its tables, allowing you to perform computations, transformations, and analysis on your data without needing to copy or duplicate the large files themselves.

Advantages of Data References:

  • Avoids Redundant Storage: No need to duplicate terabytes of data into Pixeltable, saving you storage costs and streamlining data access.
  • Preserves Existing Workflows: Keep your images in S3 and your videos on local storage; Pixeltable works with the storage you already have.
  • Facilitates Collaboration: Share references to datasets with your team without the hassle of moving large files around.

How to Use Data References

  • Store References in Your Table: When creating a table, use a column with the appropriate data type (e.g., ImageType for images, VideoType for videos) to store the path or URL of each file.
  • Pixeltable Does the Work: Pixeltable will automatically fetch and process data from these referenced locations when needed, optimizing access for efficient computations.
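
The idea of fetching referenced data only when needed can be illustrated with a minimal sketch in plain Python. This is not Pixeltable's implementation; the FileRef class and its load method are hypothetical names introduced here only to show the pattern of storing a location and deferring the read until a computation asks for the bytes.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class FileRef:
    """Hypothetical stand-in for a stored reference: only the location is kept."""

    location: str
    _cache: Optional[bytes] = field(default=None, repr=False)

    def load(self) -> bytes:
        # Read the file the first time it is needed, then reuse the cached bytes.
        if self._cache is None:
            self._cache = Path(self.location).read_bytes()
        return self._cache
```

Creating a FileRef performs no I/O; the file is read only on the first call to load(), and later calls reuse the cached bytes. A real system would also need to handle remote URLs and cache eviction, which this sketch leaves out.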

Example: Creating a Pixeltable with Image References

import pixeltable as pxt

# Create a table with an ImageType column that holds image references
image_table = pxt.create_table('images', {'image_path': pxt.ImageType()})

image_table.insert([
    {'image_path': 'https://my-s3-bucket/image1.jpg'},
    {'image_path': '/path/to/local/image2.png'},
    # ... more image references
])
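
The example above mixes a remote URL with a local path. As a rough sketch of how such references might be told apart before insertion, the snippet below uses only the Python standard library. The classify_reference helper is a hypothetical name for illustration and is not part of Pixeltable's API.

```python
from urllib.parse import urlparse


def classify_reference(ref: str) -> str:
    """Return 'remote' for URL-style references and 'local' for file paths."""
    scheme = urlparse(ref).scheme
    # A one-letter scheme is almost certainly a Windows drive letter (C:\...),
    # and file:// URLs point at the local filesystem.
    if len(scheme) > 1 and scheme != 'file':
        return 'remote'
    return 'local'


rows = [
    {'image_path': 'https://my-s3-bucket/image1.jpg'},
    {'image_path': '/path/to/local/image2.png'},
]
kinds = [classify_reference(row['image_path']) for row in rows]
```

Here kinds pairs each row with the kind of location it points to, which can be handy for checking a batch of references before inserting it.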

Data Types

Pixeltable supports a variety of data types suited to AI application development:

  • Media Types:
    • ImageType: Store references to image files (JPG, PNG, etc.) or computed image transformations (augmented images, thumbnails).
    • VideoType: Reference video files and access individual frames effortlessly using computed columns.
  • Structured Data Types:
    • Standard database types: integers, floats, strings, date/time values (for tabular data and metadata).
    • ArrayType: For storing embedding vectors, which can be generated with computed columns.
    • JsonType: A flexible structure for storing nested data like annotations, labels, or any complex metadata relevant to your AI application.

You can learn more in the API Reference.

Key Takeaways:

  • Data-Centric Approach: Pixeltable is built to work with the kinds of data that AI/ML development depends on.
  • No Lock-In: Pixeltable embraces your existing data storage solutions.
  • Flexibility: Combine data references with computed columns that manipulate your data on the fly.
  • Simplicity: Pixeltable's declarative API keeps your focus on the what, not the how.

To learn more about Pixeltable, see Working with External Files.