FAQ

What problems does Pixeltable solve? Pixeltable addresses the complexity, lack of transparency, and deployment friction that plague current AI development workflows:

Workflow Complexity: Pixeltable eliminates the need for custom scripts, complex orchestration tools, and manual handoffs between stages.
Lack of Transparency: Automatic lineage and versioning provide full visibility into how your data and models evolve, making debugging, reproducibility, and collaboration far easier.
Deployment Bottlenecks: Pixeltable's development-to-production mirroring ensures model behavior remains consistent, enabling quick deployment and seamless iteration.
Cost Control and Optimization: Limited visibility into AI infrastructure costs leads to budget overruns and makes it hard to identify areas for optimization. Pixeltable tracks inference costs at the column level so you can see where resources are spent.
Time and Money Wasted on Infrastructure: Pixeltable removes data plumbing overhead, so you can focus on building innovative AI applications.

Today’s solutions for AI app development require extensive custom coding and infrastructure plumbing, and tracking lineage and versions across data transformations, models, and deployments is cumbersome. Pixeltable replaces traditional data plumbing with a unified plane for data, models, and orchestration, removing that overhead when you build and productionize AI applications.

Who is Pixeltable for? Pixeltable accelerates the work of both ML Engineers and Data Scientists by removing time-consuming data plumbing tasks. This enables:

ML Engineers: More focus on optimization, deployment, and monitoring, leading to more robust AI in production.
Data Scientists: More time for core modeling tasks, experimentation, and driving impactful insights.

What does Pixeltable provide? It provides:

Data storage and versioning
Combined data and model lineage
Indexing (e.g., embedding vectors) and data retrieval
Orchestration of multimodal workloads
Incremental updates
Production-ready code by default
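To make "incremental updates" concrete, here is a minimal, stdlib-only sketch of the idea, assuming a computed column is evaluated once per row when that row arrives. `ToyTable` and `add_computed` are hypothetical names chosen for illustration; this is not Pixeltable's API or its internal implementation.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyTable:
    """Toy in-memory table with computed columns (illustration only, not Pixeltable).

    A computed column is evaluated once per row, so adding data never
    recomputes rows that were already processed.
    """

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.computed: Dict[str, Callable[[Row], Any]] = {}
        self.calls = 0  # how many times any computed-column function has run

    def add_computed(self, name: str, fn: Callable[[Row], Any]) -> None:
        # A new computed column is back-filled once for the existing rows.
        self.computed[name] = fn
        for row in self.rows:
            row[name] = self._run(fn, row)

    def insert(self, new_rows: List[Row]) -> None:
        # Only the newly inserted rows are evaluated; existing rows are untouched.
        for row in new_rows:
            row = dict(row)  # copy so the caller's dict is not mutated
            for name, fn in self.computed.items():
                row[name] = self._run(fn, row)
            self.rows.append(row)

    def _run(self, fn: Callable[[Row], Any], row: Row) -> Any:
        self.calls += 1
        return fn(row)


t = ToyTable()
t.insert([{"text": "hello"}, {"text": "world"}])
t.add_computed("length", lambda r: len(r["text"]))
print(t.calls)  # 2: back-fill of the two existing rows
t.insert([{"text": "pixeltable"}])
print(t.calls)  # 3: only the new row was computed
```

The call counter shows the design choice: back-filling a new column touches each existing row once, while later inserts compute only the new rows instead of re-running the whole pipeline.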

Why should you use Pixeltable?

Transparency & Reproducibility: All data transformations and model versions are tracked automatically, saving you from ever having to re-run workloads due to lost context.
Cost-Efficiency: Incremental updates eliminate the need to re-run pipelines from scratch when data changes.
Zero-Effort Path to Production: Your development workflow translates directly to production, enabling rapid and confident deployment.
Flexibility & Integration: Pixeltable integrates with your favorite Python libraries, tools, and practices. You choose the models and techniques; Pixeltable orchestrates the rest.
Granular Cost Accounting: Pixeltable tracks inference costs at the column level, so you can see which columns account for your spending. This supports data-driven optimization and more predictable budgets.

What are we not?

Pixeltable is not a low-code, prescriptive AI solution. We empower you to use the best tools and techniques for your specific needs.
We do not aim to replace your existing AI toolkit, but rather enhance it by streamlining the underlying data infrastructure and orchestration.

How do I use it? Pixeltable can be applied across a variety of AI use cases. Here is a computer vision example:

Task: Building an object detection model to identify stop sign violations

Data Loading: Create a Pixeltable table and reference your video files stored in cloud storage (e.g., S3).
Preprocessing: Define computed columns:
- Extract frames from the video.
- Run your pre-trained detection model.
Indexing: Add a computed column to store detected object features as embeddings for similarity search.
Filtering & Analysis: Use computed columns to filter relevant frames ("contains a stop sign"), perform calculations, and visualize results directly within Pixeltable.
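The steps above can be sketched in plain Python as a chain of per-row functions. Everything here is hypothetical: `extract_frames` and `detect_objects` stand in for a real frame decoder and a pre-trained detector, the "videos" are fake lists of frame labels keyed by made-up S3 URIs, and the embedding step is left out. It shows the data flow only, not Pixeltable's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in data: each "video" URI maps to per-frame object labels.
# A real pipeline would decode frames from the file at that URI.
FAKE_VIDEOS: Dict[str, List[List[str]]] = {
    "s3://bucket/cam1.mp4": [["car"], ["car", "stop sign"], ["pedestrian"]],
    "s3://bucket/cam2.mp4": [["stop sign"], []],
}


@dataclass
class FrameRow:
    video_uri: str
    frame_idx: int
    detections: List[str] = field(default_factory=list)


def extract_frames(video_uri: str) -> List[FrameRow]:
    """Hypothetical frame extractor: one row per frame of the referenced video."""
    return [FrameRow(video_uri, i) for i in range(len(FAKE_VIDEOS[video_uri]))]


def detect_objects(row: FrameRow) -> List[str]:
    """Hypothetical detector: returns the labels stored in the fake data."""
    return FAKE_VIDEOS[row.video_uri][row.frame_idx]


def build_frames_table(video_uris: List[str]) -> List[FrameRow]:
    frames: List[FrameRow] = []
    for uri in video_uris:                        # data loading: videos referenced by URI
        for row in extract_frames(uri):           # preprocessing: one row per frame
            row.detections = detect_objects(row)  # "computed column": detections
            frames.append(row)
    return frames


frames = build_frames_table(list(FAKE_VIDEOS))
# Filtering: keep only frames that contain a stop sign.
stop_frames = [(f.video_uri, f.frame_idx) for f in frames if "stop sign" in f.detections]
print(stop_frames)  # [('s3://bucket/cam1.mp4', 1), ('s3://bucket/cam2.mp4', 0)]
```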

Key Points:

Only a few lines of code contain actual AI logic; Pixeltable handles orchestration and data flow.
Each step has automatic lineage, making it easy to experiment and compare results.
The table is production-ready: deployment simply means serving these computed columns.
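The lineage and production points can be illustrated with a toy model in which every computed column records its input columns. `SCHEMA`, `lineage`, and `evaluate` are hypothetical names, and the "detector" is a fake lambda; this is a sketch of the concept under those assumptions, not Pixeltable's lineage mechanism.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

# Toy schema: each computed column is (function, names of its input columns).
# Hypothetical illustration only; not Pixeltable's API.
ColumnDef = Tuple[Callable[..., Any], List[str]]

SCHEMA: Dict[str, ColumnDef] = {
    "frame": (lambda video: f"{video}#frame0", ["video"]),
    "detections": (lambda frame: ["stop sign"] if "cam1" in frame else [], ["frame"]),
    "has_stop_sign": (lambda dets: "stop sign" in dets, ["detections"]),
}


def lineage(column: str) -> Set[str]:
    """Return every column that `column` depends on, directly or transitively."""
    if column not in SCHEMA:
        return set()  # a base (non-computed) column has no upstream
    upstream: Set[str] = set()
    for parent in SCHEMA[column][1]:
        upstream.add(parent)
        upstream |= lineage(parent)
    return upstream


def evaluate(row: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in computed columns for one row (SCHEMA is listed in dependency order)."""
    out = dict(row)
    for name, (fn, inputs) in SCHEMA.items():
        out[name] = fn(*(out[i] for i in inputs))
    return out


print(sorted(lineage("has_stop_sign")))  # ['detections', 'frame', 'video']
# "Serving" reuses exactly the same column definitions on a new request row.
print(evaluate({"video": "cam1.mp4"})["has_stop_sign"])  # True
```

Because development and serving both go through the same `SCHEMA`, there is no separate production rewrite that could drift from the prototype.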

How does Pixeltable fit into existing stacks? Pixeltable integrates seamlessly with your existing AI workflow:

Data Sources & Sinks: Interfaces with object stores (S3), labeling services (Scale, Labelbox), inference providers (OpenAI) and more. Pixeltable makes using these services even easier!
Compute & Runtimes: Works with cloud providers (AWS) and optimized environments (Modal).
Tools & Libraries: Integrates with your favorite Python libraries and MLOps tools.

What tools does Pixeltable potentially replace? When you rely on Pixeltable for data management, orchestration, and deployment, the need for the following kinds of tools can be significantly reduced or eliminated:

MLOps tools for data and model versioning
Vector databases and multimodal databases
Custom orchestration scripts
API integrations with external services

Do I need to move all my data to Pixeltable? No. Pixeltable stores references to media data (images, videos) in their original location and manages the new data products generated within your workflows. Structured data needs to be imported, but this can be done dynamically as needed.

What do we believe in?

Function-Focused Development: Prioritize the core logic of your application, not the mechanics of how to execute it.
Opinionated Data Infrastructure: We provide a prescriptive approach to data management, freeing you to choose the best AI tools and techniques.
Data, Model, Inference (DMI) as a Unified Workflow: We offer a holistic view of the essential elements of AI development.

What separates Pixeltable from typical solutions?

Beyond Orchestration and MLOps: Existing tools often address specific pain points or force you into their way of doing AI. Pixeltable provides an end-to-end solution, giving you the freedom to innovate as well as streamlined infrastructure.
Data-Centric vs. Process-Centric: By focusing on data and its lineage, Pixeltable simplifies and accelerates your entire AI workflow in a way other tools cannot.

Is Pixeltable focused on Open Source or a Cloud Service?
We're committed to open development and will offer both a managed cloud service and a self-hosted option.

Why should you use Pixeltable?

It gives you transparency and reproducibility
- All generated data is automatically recorded and versioned
- You will never need to re-run a workload because you lost track of the input data
It saves you money
- All data changes are automatically incremental
- You never need to re-run pipelines from scratch because you’re adding data
It provides a zero-effort path to production
- The table structure you created in development can be directly executed in a serving environment
- There is no need to hand off your prototype to the ML engineering team to re-write it against your data infrastructure
It integrates with any existing Python code or libraries
- Bring your ever-changing code and workloads
- You choose the models, tools, and AI practices (e.g., your embedding model for a vector index); Pixeltable orchestrates the data
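As an illustration of "you choose the embedding model; Pixeltable orchestrates the data", here is a brute-force, stdlib-only vector index that takes the embedding function as a parameter. `ToyVectorIndex` and `toy_embed` (letter counts standing in for a real embedding model) are hypothetical names; Pixeltable's own embedding indexes are not shown here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Embed = Callable[[str], List[float]]


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding model: counts of the letters a-z. Swap in any real model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Brute-force cosine-similarity search; the caller chooses the embedding function."""

    def __init__(self, embed: Embed) -> None:
        self.embed = embed
        self.items: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, self.embed(text)))  # embed once, at insert time

    def search(self, query: str, k: int = 1) -> List[str]:
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


index = ToyVectorIndex(embed=toy_embed)
for caption in ["red stop sign", "green traffic light", "pedestrian crossing"]:
    index.add(caption)
print(index.search("stop"))  # ['red stop sign']
```

Because the index only ever calls `embed`, switching models means passing a different function. A production index would also use approximate nearest-neighbor search rather than scoring every item.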

How is Pixeltable different from Pandas? Pixeltable complements Pandas, and in many cases data scientists use both: Pandas for initial data exploration and cleaning, then Pixeltable for building AI workloads.

| Operation | Pandas | Pixeltable |
|---|---|---|
| reading data | Read from the file system with `pd.read_*` functions, e.g., `pd.read_csv`, `pd.read_json`, `pd.read_parquet` | Data is stored in tables: `cl.list_tables()`, `tab = cl.get_table('mytable')` |
| saving data (first time) | Save to the file system in a format of your choice with `df.to_*` methods, e.g., `df.to_csv`, `df.to_parquet` | `tab.insert` |
| updating data | To persist changes, use `df.to_*()` to overwrite the file or save a new version of the dataset | `tab.update` statements make fine-grained, persistent updates, limited to the rows whose columns have specific values |
| selecting rows | `df[df.col > 1]` | `tab.where(tab.col > 1)` |
| selecting rows (compound predicates) | `df[(df.a > 0) & (df.b > 0)]` | `tab.where((tab.a > 0) & (tab.b > 0))`; both raise an error if Python's `and` or `or` is used instead of `&` or `\|` |
| selecting columns (projection) | `df[['col']]` | `tab.select(tab.col)` |
| new column with computed value | `df.assign(new_col=fun(df.input_col1, df.input_col2, ...))` or `df['new_col'] = fun(df.input_col1, df.input_col2, ...)` (the latter modifies `df` in place) | `tab.select(tab.old_colA, tab.old_colB, new_col=fun(tab.input_col1, tab.input_col2, ...))` |
| computing new values row by row | `df['new_col'] = df.apply(fun, axis=1)` | `tab.select(tab.old_colA, tab.old_colB, ..., new_col=pxt.function(fun)(tab.input_col1, tab.input_col2, ...))` |
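The `and`/`or` caveat for compound predicates follows from how column expressions work: `tab.a > 0` builds a deferred expression rather than a Python `bool`, and `&`/`|` can be overloaded to combine such expressions, while `and`/`or` always call `bool()` on their operands. The toy `Expr` class and `col` helper below are hypothetical, not pandas' or Pixeltable's code; they only reproduce that behavior. (pandas itself raises a `ValueError` about an ambiguous truth value, whereas this sketch raises `TypeError`.)

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


class Expr:
    """Toy deferred column expression (illustration only)."""

    def __init__(self, fn: Callable[[Row], Any]) -> None:
        self.fn = fn  # evaluated later, once per row

    def __gt__(self, other: Any) -> "Expr":
        return Expr(lambda row: self.fn(row) > other)

    def __and__(self, other: "Expr") -> "Expr":
        return Expr(lambda row: bool(self.fn(row)) and bool(other.fn(row)))

    def __or__(self, other: "Expr") -> "Expr":
        return Expr(lambda row: bool(self.fn(row)) or bool(other.fn(row)))

    def __bool__(self) -> bool:
        # `and`, `or`, and `if` all call bool() on their operand. A deferred
        # expression has no single truth value, so refuse rather than guess.
        raise TypeError("use & and | (with parentheses), not `and` / `or`")


def col(name: str) -> Expr:
    return Expr(lambda row: row[name])


rows = [{"a": 1, "b": 2}, {"a": 1, "b": -1}, {"a": -3, "b": 5}]
pred = (col("a") > 0) & (col("b") > 0)  # builds an expression; nothing is evaluated yet
print([r for r in rows if pred.fn(r)])  # [{'a': 1, 'b': 2}]

try:
    (col("a") > 0) and (col("b") > 0)  # `and` forces bool() on the left operand
except TypeError as err:
    print("TypeError:", err)
```

The parentheses matter too: in Python, `&` binds more tightly than `>`, so `col("a") > 0 & col("b") > 0` would not group the way it reads.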