Pixeltable Basics
Welcome to Pixeltable! In this tutorial, we'll survey how to create tables, populate them with data, and enhance them with built-in and user-defined transformations and AI operations.
If you want to follow along with this tutorial interactively, there are two ways to go.
- Use a Kaggle or Colab container (easiest): Click on one of the badges above.
- Locally in a self-managed Python environment: You'll probably want to create your own empty notebook, then copy-paste each command from the website. Be sure your Jupyter kernel is running in a Python virtual environment; you can check out the Getting Started with Pixeltable guide for step-by-step instructions.
Install Python Packages
First, run the following command to install Pixeltable and the other libraries needed for this tutorial.
%pip install -q torch transformers openai pixeltable
Creating a Table
Let's begin by creating a demo directory (if it doesn't already exist) and a table that can hold image data, demo.first. The table will initially have just a single column to hold our input images, which we'll call input_image. We also need to specify a type for the column: pxt.ImageType().
import pixeltable as pxt
# Create the directory `demo` (if it doesn't already exist)
pxt.create_dir('demo', ignore_errors=True)
# Create the table `demo.first` with a single column `input_image`
pxt.drop_table('demo.first', ignore_errors=True)
t = pxt.create_table('demo.first', {'input_image': pxt.ImageType()})
Connected to Pixeltable database at: postgresql://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata Created directory `demo`. Created table `first`.
We can use t.describe() to examine the table schema. We see that it now contains a single column, as expected.
t.describe()
Column Name | Type | Computed With |
---|---|---|
input_image | image | |
The new table is initially empty, with no rows:
t.count()
0
Now let's put an image into it! We can add images simply by giving Pixeltable their URLs. The example images in this demo come from the COCO dataset, and we'll be referencing copies of them in the Pixeltable github repo. But in practice, the images can come from anywhere: an S3 bucket, say, or the local file system.
When we add the image, we see that Pixeltable gives us some useful status updates indicating that the operation was successful.
t.insert(input_image='https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000025.jpg')
Inserting rows into `first`: 1 rows [00:00, 286.95 rows/s] Inserted 1 row with 0 errors.
UpdateStatus(num_rows=1, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])
We can use t.show() to examine the contents of the table.
t.show()
input_image |
---|
Adding Computed Columns
Great! Now we have a table containing some data. Let's add an object detection model to our workflow. Specifically, we're going to use the ResNet-50 object detection model, which runs using the Huggingface DETR ("DEtection TRansformer") model class. Pixeltable contains a built-in adapter for this model family, so all we have to do is call the detr_for_object_detection Pixeltable function. A nice thing about the Huggingface models is that they run locally, so you don't need an account with a service provider in order to use them.
This is our first example of a computed column, a key concept in Pixeltable. Recall that when we created the input_image column, we specified a type, ImageType, indicating our intent to populate it with data in the future. When we create a computed column, we instead specify a function that operates on other columns of the table. By default, when we add the new computed column, Pixeltable immediately evaluates it against all existing data in the table - in this case, by calling the detr_for_object_detection function on the image.
Depending on your setup, it may take a minute for the function to execute. In the background, Pixeltable is downloading the model from Huggingface (if necessary), instantiating it, and caching it for later use.
from pixeltable.functions import huggingface
t['detect'] = huggingface.detr_for_object_detection(t.input_image, model_id='facebook/detr-resnet-50')
Computing cells: 100%|████████████████████████████████████████████| 1/1 [00:01<00:00, 1.26s/ cells] Added 1 column value with 0 errors.
Let's examine the results.
t.show()
input_image | detect |
---|---|
{'boxes': [[51.94154739379883, 356.17449951171875, 181.4806365966797, 413.97491455078125], [383.2247314453125, 58.6600456237793, 605.6396484375, 361.3460388183594]], 'labels': [25, 25], 'scores': [0.9898098707199097, 0.9990087747573853], 'label_text': ['giraffe', 'giraffe']} |
We see that the model returned a JSON structure containing a lot of information. In particular, it has the following fields:
- label_text: Descriptions of the objects detected
- boxes: Bounding boxes for each detected object
- scores: Confidence scores for each detection
- labels: The DETR model's internal IDs for the detected objects
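To make the structure concrete, here's a plain-Python sketch that walks the same fields. The detect dict below is copied (and rounded) from the giraffe row shown above; pairing the fields with zip is ordinary Python, not a Pixeltable API:

```python
# Detection output copied (and rounded) from the `detect` column above
detect = {
    'boxes': [[51.9, 356.2, 181.5, 414.0],
              [383.2, 58.7, 605.6, 361.3]],
    'labels': [25, 25],
    'scores': [0.9898, 0.9990],
    'label_text': ['giraffe', 'giraffe'],
}

# Pair each label with its confidence score and bounding box
detections = list(zip(detect['label_text'], detect['scores'], detect['boxes']))
for label, score, box in detections:
    print(f'{label}: {score:.2f} at {box}')

# Keep only the labels of high-confidence detections
confident = [label for label, score in zip(detect['label_text'], detect['scores'])
             if score > 0.95]
```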
Perhaps this is more than we need, and all we really want are the text labels. We could add another computed column to extract label_text from the JSON struct:
t['detect_text'] = t.detect.label_text
t.show()
Computing cells: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 197.56 cells/s] Added 1 column value with 0 errors.
input_image | detect | detect_text |
---|---|---|
{'boxes': [[51.94154739379883, 356.17449951171875, 181.4806365966797, 413.97491455078125], [383.2247314453125, 58.6600456237793, 605.6396484375, 361.3460388183594]], 'labels': [25, 25], 'scores': [0.9898098707199097, 0.9990087747573853], 'label_text': ['giraffe', 'giraffe']} | [giraffe, giraffe] |
If we inspect the table schema now, we see how Pixeltable distinguishes between ordinary and computed columns.
t.describe()
Column Name | Type | Computed With |
---|---|---|
input_image | image | |
detect | json | huggingface.detr_for_object_detection(input_image, model_id='facebook/detr-resnet-50') |
detect_text | json | detect.label_text |
Now let's add some more images to our table. This demonstrates another important feature of computed columns: by default, they update incrementally any time new data shows up on their inputs. In this case, Pixeltable will run the ResNet-50 model against each new image that is added, then extract the labels into the detect_text column. Pixeltable will orchestrate the execution of any sequence (or DAG) of computed columns.
Note how we can pass multiple rows to t.insert with a single statement, which will insert them more efficiently.
more_images = [
'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000030.jpg',
'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000034.jpg',
'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000042.jpg',
'https://raw.github.com/pixeltable/pixeltable/master/docs/source/data/images/000000000061.jpg'
]
t.insert({'input_image': image} for image in more_images)
Computing cells: 50%|██████████████████████ | 4/8 [00:01<00:01, 3.65 cells/s] Inserting rows into `first`: 4 rows [00:00, 3703.58 rows/s] Computing cells: 100%|████████████████████████████████████████████| 8/8 [00:01<00:00, 7.28 cells/s] Inserted 4 rows with 0 errors.
UpdateStatus(num_rows=4, num_computed_values=8, num_excs=0, updated_cols=[], cols_with_excs=[])
Let's see what the model came up with. We'll use t.select to suppress the display of the detect column, since right now we're only interested in the text labels.
t.select(t.input_image, t.detect_text).show()
input_image | detect_text |
---|---|
[giraffe, giraffe] | |
[vase, potted plant] | |
[zebra] | |
[dog, dog] | |
[person, person, bench, person, elephant, elephant, person] |
Pixeltable Is Persistent
An important feature of Pixeltable is that everything is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database: all your data, transformations, and computed columns are stored and preserved between sessions. To see this, let's clear all the variables in our notebook and start fresh. You can optionally restart your notebook kernel at this point, to demonstrate how Pixeltable data persists across sessions.
# Clear all variables in the notebook
%reset -f
# Reconnect to Pixeltable and load the existing table
import pixeltable as pxt
t = pxt.get_table('demo.first')
# Display just the first two rows, to avoid cluttering the tutorial
t.select(t.input_image, t.detect_text).show(2)
Connected to Pixeltable database at: postgresql://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata
input_image | detect_text |
---|---|
[giraffe, giraffe] | |
[vase, potted plant] |
GPT-4 Vision
For comparison, let's try running our examples through a generative model, OpenAI's GPT-4 Vision. For this section, you'll need an OpenAI account with an API key. You can use the following command to add your API key to the environment (just enter your API key when prompted):
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass('Enter your OpenAI API key:')
Now we can connect to OpenAI through Pixeltable. This may take some time, depending on how long OpenAI takes to process the query.
from pixeltable.functions import openai
t['vision'] = openai.vision(prompt="Describe what's in this image.", image=t.input_image)
Computing cells: 100%|████████████████████████████████████████████| 5/5 [00:34<00:00, 6.95s/ cells] Added 5 column values with 0 errors.
Let's see how GPT-4's responses compare to the traditional discriminative (DETR) model.
t.select(t.input_image, t.detect_text, t.vision).show()
input_image | detect_text | vision |
---|---|---|
[giraffe, giraffe] | The image shows two giraffes in a naturalistic habitat that resembles an African savanna setting. The giraffe in the foreground is standing and appears to be reaching up to a tall tree to possibly browse on leaves or hay placed there for enrichment. The spots on their bodies have a pattern that is typical of giraffes, and their long necks are distinctive features of this species. The background features a second giraffe resting on the ground with trees and shrubs around, indicating a spacious environment likely within a zoo or wildlife reserve. The setting is peaceful and pastoral with ample sunshine and clear skies. | |
[vase, potted plant] | The image shows a white vase placed on what appears to be a ledge or a shelf, which is a part of an outdoor setting given the natural light and greenery in the background. The vase is filled with a lovely arrangement of flowers that includes white blossoms that could be hydrangeas, along with smaller red and pink flowers. The scene evokes a peaceful and warm ambiance, possibly a garden or a patio area. There is also a soft shadow cast onto the surface where the vase sits, indicating that the photo was taken on a sunny day. | |
[zebra] | The image features a zebra grazing on a grassy field. The zebra is positioned at an angle to the camera, showing its distinctive black and white striped pattern covering its body, head, and legs. It is bending its head down towards the grass, likely eating. The background consists of a bright grassy field with a hint of a shadow suggesting a sunny day. The zebra's mane is erect, and it has large ears that are facing forward. | |
[dog, dog] | The image shows a small poodle-like dog asleep on a white wire rack. The dog's curly, light brown fur covers its body so that its face is not visible in the photo. The wire rack is also holding various footwear, including a pair of red flip-flops with a pattern on them, white sneakers, and some black shoes that are partially cut off at the edge of the photo. Just in front of the rack, on the left side, there is a black item which appears to be a gym shoe or a sport sandal with the brand name "RUCANOR" visible. The floor is a terracotta-colored tile, and there's a hint of a blue object, possibly a part of another item or structure, adjacent to the shoe rack. The scene suggests a casual, homely environment. | |
[person, person, bench, person, elephant, elephant, person] | The image shows two elephants with riders on their backs, trekking through a lush green forest. The forest floor is covered with various green plants, and the trees are densely packed with green leaves. The riders are sitting on top of the elephants using what appear to be special seats or mounts designed for elephant riding. It seems to be a day with ample daylight, possibly indicating daytime in a tropical or subtropical forest environment often associated with such elephant trekking activities. |
In addition to adapters for local models and inference APIs, Pixeltable can perform a range of more basic image operations. These image operations can be seamlessly chained with API calls, and Pixeltable will keep track of the sequence of operations, constructing new images and caching when necessary to keep things running smoothly. Just for fun (and to demonstrate the power of computed columns), let's see what OpenAI thinks of our sample images when we rotate them by 180 degrees.
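Pixeltable's image expressions mirror the PIL Image API, so a 180-degree rotation behaves like PIL.Image.Image.rotate. Here's a standalone PIL sketch of what that rotation does to pixel positions (this assumes Pillow is installed; the tiny in-memory image is purely illustrative):

```python
from PIL import Image

# Build a tiny in-memory image with one marked pixel at the top-left corner
img = Image.new('RGB', (4, 2), 'white')
img.putpixel((0, 0), (255, 0, 0))

# Rotating by 180 degrees maps pixel (x, y) to (w-1-x, h-1-y)
rotated = img.rotate(180)
print(rotated.getpixel((3, 1)))  # the marked pixel moved to the far corner
```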
t['rot_image'] = t.input_image.rotate(180)
t['rot_vision'] = openai.vision(prompt="Describe what's in this image.", image=t.rot_image)
Added 5 column values with 0 errors. Computing cells: 100%|████████████████████████████████████████████| 5/5 [00:33<00:00, 6.78s/ cells] Added 5 column values with 0 errors.
t.select(t.rot_image, t.rot_vision).show()
rot_image | rot_vision |
---|---|
The image appears to be upside down. It shows the inverted view of a giraffe with trees in the background. The giraffe's neck and head are visible as it seems to reach towards the ground, which is actually the sky due to the inverted perspective. The trees in the upside down image likely represent the sky normally, and the clear blue at the bottom, which would usually be the sky, represents the ground here. The photo seems to play with perspective by flipping the scene upside down. If you were to flip this image 180 degrees, you'd see the giraffe standing upright with the sky above and the trees and earth below. | |
This image shows a bouquet of flowers hanging upside down from a white structure which could possibly be a shelf or the edge of a ceiling. The bouquet is attached just below a white, decorative element that resembles the bottom part of a classical column (perhaps a part of a lamp or another decorative piece). The flowers include various types, with white blossoms and some pinkish-red flowers amid greenery, arranged in a dense, rounded format. The flowers appear fresh and are likely hung this way for decorative purposes, as part of an event setting, or potentially to dry them for preservation. In the blurred background, there is greenery indicating that this setting is outdoors or in a space with plants. | |
This image shows a zebra lying on its back on some grass. The zebra's front legs are bent at the knees and sticking up in the air, and its head is turned to the side to face the camera. The grass appears to be lush and sunlit, and the zebra looks comfortable and relaxed in this unusual position. The distinct black and white stripes of the zebra's coat are a striking contrast against the green of the grass. | |
The image shows a curly-haired dog, possibly a poodle or poodle mix, that appears to be standing or lying down among various pairs of shoes on a metal rack. The rack is ostensibly for organizing shoes, and there are at least four visible pairs: a pair that looks like running shoes, a sandal, a flip-flop, and another sporty shoe. There is also a blue object, potentially a shoelace, on the ground to the left of the dog. The flooring is tiled, suggesting this may be an indoor setting, possibly an entryway or a mudroom where shoes are commonly kept. | |
The image is upside down, which likely causes some confusion at first glance. It shows a dense green foliage background with the leaves covering most of the frame. The center of the image features an upside-down opening into what appears to be a cave or dark space, and inside this opening, there are several structures resembling a deck or viewing platform, with protective railings that are now on the lower part of the opening because the image is inverted. The color contrast between the dark interior space and the bright greenery makes the opening stand out. The image overall creates an intriguing and somewhat disorienting visual due to the upside-down perspective. |
UDFs: Enhancing Pixeltable's Capabilities
Another important principle of Pixeltable is that, although Pixeltable has a built-in library of useful operations and adapters, it will never prescribe a particular way of doing things. Pixeltable is built from the ground up to be extensible.
Let's take a specific example. Recall our use of the ResNet-50 detection model, in which the detect column contains a JSON blob with bounding boxes, scores, and labels. Suppose we want to create a column containing the single label with the highest confidence score. There's no built-in Pixeltable function to do this, but it's easy to write our own. In fact, all we have to do is write a Python function that does the thing we want, and mark it with the @pxt.udf decorator.
@pxt.udf
def detect_top(detect: dict) -> str:
scores = detect['scores']
label_text = detect['label_text']
# Get the index of the object with the highest confidence
i = scores.index(max(scores))
# Return the corresponding label
return label_text[i]
t['top'] = detect_top(t.detect)
Computing cells: 100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 609.23 cells/s] Added 5 column values with 0 errors.
t.select(t.detect_text, t.top).show()
detect_text | top |
---|---|
[giraffe, giraffe] | giraffe |
[vase, potted plant] | vase |
[zebra] | zebra |
[dog, dog] | dog |
[person, person, bench, person, elephant, elephant, person] | elephant |
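Because the body of detect_top is ordinary Python, the same logic is easy to sanity-check outside Pixeltable before registering it as a UDF. Here's a standalone sketch (the sample dict is made up for illustration; the undecorated function behaves identically to the UDF body above):

```python
def detect_top(detect: dict) -> str:
    """Return the label whose detection has the highest confidence score."""
    scores = detect['scores']
    label_text = detect['label_text']
    # Index of the detection with the highest confidence
    i = scores.index(max(scores))
    return label_text[i]

# A made-up detection result, just for illustration
sample = {
    'scores': [0.91, 0.99, 0.40],
    'label_text': ['bench', 'elephant', 'person'],
}
print(detect_top(sample))  # elephant
```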
Congratulations! You've reached the end of the tutorial. Hopefully, this gives a good overview of the capabilities of Pixeltable, but there's much more to explore. As a next step, you might check out one of the other tutorials, depending on your interests: