Unlock the Potential of AI

SqueezeBits is building an ecosystem for designing, training, and deploying highly compressed Neural Networks.

Our Mission

Faster

From research to real-life applications

Lighter

The smaller the size, the broader the applications

Cost Effective

Equal performance at a lower cost

Safer

Data stays where it came from

How we can help

Model Profiling

Model profiling is the critical first step in model optimization.

The key tasks are identifying the bottlenecks that slow a model down and estimating how much performance can be gained through optimization.
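
As a minimal illustration of this step (not SqueezeBits' internal tooling), PyTorch's built-in profiler can surface the operators that dominate latency; the model and input below are hypothetical placeholders:

```python
import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

# Illustrative workload; any model you want to optimize would go here.
model = models.resnet18().eval()
inputs = torch.randn(1, 3, 224, 224)

# Profile CPU execution and record tensor shapes for each operator.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

# The slowest operators are the prime candidates for optimization.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```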

Reducing Latency & Size

SqueezeBits optimizes models based on a deep understanding of the model architecture and the target hardware.

Building on quantization, SqueezeBits selects the optimization method best suited to the application and deployment environment.
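
For illustration only, here is a minimal sketch of one such technique, post-training dynamic quantization in PyTorch, which stores weights in int8 for roughly a 4x size reduction versus float32; the toy model is an assumption, not a SqueezeBits deliverable:

```python
import torch
import torch.nn as nn

# Illustrative model; dynamic quantization targets Linear and LSTM layers.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Convert weights to int8; activations are quantized on the fly at inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Smaller weights can also speed up CPU inference.
x = torch.randn(1, 512)
print(quantized(x).shape)
```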

Hardware-specific optimization

SqueezeBits offers lightweighting services that are optimized for your hardware's characteristics.

Migrate from the server to the edge

By migrating AI computation from the cloud to edge devices, you can reduce server operating costs.
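
As one hypothetical path to the edge (formats and runtimes vary by target, and this is not SqueezeBits' specific pipeline), a compressed PyTorch model can be exported to ONNX for an on-device runtime:

```python
import torch
import torchvision.models as models

# Illustrative export: a mobile-friendly model traced for on-device inference.
model = models.mobilenet_v2().eval()
example = torch.randn(1, 3, 224, 224)

# ONNX is a common interchange format consumed by edge runtimes
# such as ONNX Runtime Mobile.
torch.onnx.export(model, example, "mobilenet_v2.onnx", opset_version=13)
```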

Why SqueezeBits?

Optimizing an AI model requires a deep understanding of the underlying hardware.

SqueezeBits maintains technical expertise in lightweighting models on the following devices:

CPU, GPU, Mobile AP, Microcontroller, NPU

Applications

  • Vision

    Image classification, Segmentation, Object detection, Face recognition, Pose estimation and more

  • NLP & Speech

    Keyword spotting, Question answering, Noise suppression, Translation, Speech recognition and more

  • Generative AI

    Stable Diffusion in a mobile environment through quantization, pruning, knowledge distillation, and more (a distillation sketch follows this list)
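
Knowledge distillation, one of the techniques named above, trains a small student network to mimic a larger teacher's soft predictions. Below is a minimal sketch of the classic distillation loss in PyTorch; the temperature and weighting values are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend the soft-target KL divergence (softened by temperature T)
    with the ordinary hard-label cross-entropy loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # scale to keep gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```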

AI Compression Toolkit

OwLite, the easiest AI compression toolkit, simplifies the process of applying advanced AI compression techniques to pre-existing machine learning models.

Lightweight your AI model with us

SqueezeBits excels at optimizing deep learning models with state-of-the-art model compression techniques.