litert-community/Qwen3-0.6B

Main Model Card: Qwen/Qwen3-0.6B

This model card provides a few variants of the Qwen3-0.6B model that are ready for deployment on Android and Desktop.

How to Use

Android (Google AI Edge Gallery)

You can install Google AI Edge Gallery either through the Open Beta in the Play Store or by downloading the APK from GitHub.

To build the demo app from source, please follow the instructions from the GitHub repository.

Android (LiteRT-LM)

1. Add the dependency

Make sure you have the necessary dependency in your Gradle file, replacing <LATEST_VERSION> with the latest published LiteRT-LM version.

dependencies {
    implementation("com.google.ai.edge.litertlm:litertlm:<LATEST_VERSION>")
}

2. Inference with the LiteRT-LM API

import com.google.ai.edge.litertlm.*

suspend fun main() {
  Engine.setNativeMinLogSeverity(LogSeverity.ERROR) // Hide native logs for a TUI app.
  val engineConfig = EngineConfig(
      modelPath = "/path/to/your/model.litertlm", // Replace with your model path.
      backend = Backend.CPU, // Or Backend.GPU
  )

  Engine(engineConfig).use { engine ->
    engine.initialize()

    engine.createConversation().use { conversation ->
      while (true) {
        print("\n>>> ")
        // Message.of also accepts Content variants, e.g. Message.of(Content.Text("Tell me a joke.")).
        conversation.sendMessageAsync(Message.of(readln())).collect { print(it) }
      }
    }
  }
}

Try running this model on NPU by using this .litertlm file and setting the backend in your EngineConfig to Backend.NPU. To check whether your phone’s NPU is supported, see this guide.
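As a minimal sketch (the model path below is only an illustration; point it at wherever you downloaded the NPU .litertlm file):

val npuEngineConfig = EngineConfig(
    modelPath = "/path/to/qwen3-0.6b-npu.litertlm", // Hypothetical path to the downloaded NPU model file.
    backend = Backend.NPU, // Requires a supported NPU; see the guide linked above.
)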

Desktop

For desktop applications, C++ is currently the recommended path. See the following code sample.

// Create the engine.
auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU);

// Create the Engine and Conversation (see the LiteRT-LM repository for the full setup)...

// Send a text message to the LLM.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", { // The content field is an array of message parts.
          {{"type", "text"}, {"text", "Tell me a joke."}}
        }},
    });
CHECK_OK(model_message);

// Print the model's response.
std::cout << *model_message << std::endl;

Performance

Android

Benchmarked on Vivo X300 Pro.

Backend | Quantization scheme | Context length | Prefill (tokens/sec) | Decode (tokens/sec) | Model size (MB) | Model File
--- | --- | --- | --- | --- | --- | ---
CPU | dynamic_int8 | 4096 | 165 | 9 | 586 | 🔗
GPU | dynamic_int8 | 4096 | 580 | 21 | 586 | 🔗
NPU | a16w8 | 4096 | 1,472 | 36 | 992 | 🔗

Notes:

  • Model size: measured by the size of the file on disk.
  • CPU inference is accelerated via the LiteRT XNNPACK delegate with 4 threads.
  • Benchmarks are run with the cache enabled and initialized; during the first run, latency and memory usage may differ.