illustration

Copyright© Schmied Enterprises LLC, 2025.

A Game Changer for Image and Video Processing

Compressing image and video data using GPUs is incredibly important. It can slash training costs by a factor of 100, making AI development far more accessible.

How Compressible is Your Image?

One way to gauge compressibility is to randomly render pixels back into an image with a radius depending on how sparse the already drawn pixels are. The point at which you can recognize a key object, like a face, indicates the level of detail needed. Faces, being more complex, will naturally require a higher percentage of pixels than simpler objects like apples.

Turning Images into Knowledge Graphs for Compression

Another approach involves converting images into knowledge graphs. Imagine overlaying a random grid on your image and placing a point (a vertex) at each intersection. Connect these points to their neighbors with lines. Now, simplify the graph:

If a triangle's edges all share the same color, remove a vertex inside. If a line between two points has a consistent color (or a very similar one), remove the vertex between them. Distant vertices connected with the same color lines can be kept.

The resulting image, with circles of representative colors, will still resemble the original. This knowledge graph can then be used to identify simple shapes, like letters.

These techniques help define the complexity of abstract images. The improved compression of H.265 over H.264 is partly due to using larger blocks (64x64 pixels) to represent areas, compared to the smaller blocks (16x16 pixels) in H.264. 4K images, with their greater detail, often have larger areas of uniform color or gradual changes. This extra detail can bog down the convolutional layers used in AI inference.

Data augmentation significantly increases training dataset volume, necessitating more efficient processing strategies. By downsampling images to their minimum recognizable resolution, you can reduce computational costs while preserving essential patterns. For many recognition tasks, this approach allows for the removal of redundant convolutional layers, making it feasible to use smaller, MNIST-scale architectures.

Effective compression algorithms are key to making video inference affordable and practical for everyday applications.