Description
Component
Configuration Explorer
Desired use case or feature
In vLLM, there are various choices for controlling the precision of the model. To start off, consider the three arguments: `--dtype`, `--kv-cache-dtype`, and `--quantization`.
`--dtype`: https://docs.vllm.ai/en/latest/configuration/engine_args.html#-dtype
- Controls the precision used for loading model weights and activations.
- `auto`: uses FP16 precision for FP32 and FP16 models, and BF16 precision for BF16 models.
- The default is determined from `torch_dtype` in `config.json`.
`--quantization`: https://docs.vllm.ai/en/latest/configuration/engine_args.html#-quantization-q
- This probably takes precedence over `--dtype`.
- Options include `bitsandbytes`, `fp8`, and `quark`. It looks like the supported methods are model- and hardware-specific.
`--kv-cache-dtype`: the KV cache can be quantized to reduce its memory footprint.
- Options include `auto` (unquantized, defaults to the model data type), `fp8`, `fp8_e4m3`, and `fp8_e5m2`.
- Given this, the KV cache dtype appears to be independent of the weight dtype: one can quantize the KV cache to fp8 while the model weights remain unquantized.
Proposed solution
The Capacity Planner should accept a quantization setting for model weights and activations. Currently, the model-weight calculation uses the dtype exposed by the safetensors metadata, which can differ across parameters; when loaded into vLLM, however, the weights all use the same precision (whatever `--dtype` resolves to). Furthermore, allow users to configure `--kv-cache-dtype`. Finally, update the generated `vllm serve` command accordingly. A sketch of the calculation is given below.
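A minimal sketch of how the planner might account for both settings. The byte widths, function names, and example model shape here are assumptions for illustration, not the Capacity Planner's actual API:

```python
# Hypothetical helpers: estimate memory from the chosen --dtype and --kv-cache-dtype,
# rather than trusting per-tensor dtypes reported by the safetensors metadata.
BYTES_PER_DTYPE = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "fp8": 1.0,
    "fp8_e4m3": 1.0,
    "fp8_e5m2": 1.0,
}

def estimate_weight_bytes(num_params: int, dtype: str) -> float:
    """Weights are loaded at a single precision determined by --dtype / --quantization."""
    return num_params * BYTES_PER_DTYPE[dtype]

def estimate_kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_seq_len: int,
    max_batch_size: int,
    kv_cache_dtype: str,
) -> float:
    """KV cache size: 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes/element."""
    bytes_per_elem = BYTES_PER_DTYPE[kv_cache_dtype]
    return 2 * num_layers * num_kv_heads * head_dim * max_seq_len * max_batch_size * bytes_per_elem

# Example: an 8B-parameter model served with bf16 weights and an fp8 KV cache.
weights = estimate_weight_bytes(8_000_000_000, "bfloat16")
kv = estimate_kv_cache_bytes(32, 8, 128, 8192, 8, "fp8")
print(f"weights ~= {weights / 1e9:.1f} GB, kv cache ~= {kv / 1e9:.1f} GB")
```

The point of the sketch is that weight memory and KV cache memory are computed from two independent dtype choices, matching how `--dtype` and `--kv-cache-dtype` behave in vLLM.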
Alternatives
No response
Additional context or screenshots
No response