[Over-long prompt/response length] cannot reshape tensor of 0 elements into shape [1, 0, -1, 128] because the unspecified dimension size -1 can be any value and is ambiguous #50

@IsaacGHX

Description

@IsaacGHX

When the vLLM server returns an error, for example because a request exceeds the maximum context length of a tiny LLM, the model output is empty. When that empty output is traced back through the agent client, the following error occurs.

🖇 AgentOps: [OPENAI WRAPPER] Error in chat_completion_stream_wrapper: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 4608 tokens. However, you requested 6859 tokens (6347 in the messages, 512 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}

For example, when using the custom agent in the customized cal_x example, the issue occurs at the end of the first batch of the multiprocess task.
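One way to avoid the 400 error in the first place is to cap the completion budget so that prompt tokens plus completion tokens never exceed the model's context window. A minimal sketch (the function name and signature are hypothetical, not part of agent-lightning or vLLM), using the numbers from the error message above:

```python
def clamp_completion_budget(
    prompt_tokens: int,
    max_context: int,
    requested_completion: int,
    min_completion: int = 1,
):
    """Return a completion token budget that fits inside the model's
    context window, or None if the prompt alone already overflows
    (in which case the prompt must be truncated or the sample skipped)."""
    available = max_context - prompt_tokens
    if available < min_completion:
        return None
    return min(requested_completion, available)


# Numbers from the BadRequestError above: 6347 prompt tokens against a
# 4608-token context window; the request cannot be salvaged by shrinking
# the completion, so the caller should truncate or skip the sample.
print(clamp_completion_budget(6347, 4608, 512))  # None
```

Skipping or truncating at this point keeps the server from returning an empty output, which is what later triggers the reshape error during training.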

 
  File "/workspace/workspace/agent-lightning/.venv/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 154, in forward
    query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1, 128] because the unspecified dimension size -1 can be any value and is ambiguous

...
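As a defensive measure on the training side, rollouts with an empty response could be filtered out before they reach the model, since a zero-length sequence is what produces the 0-element tensor that `.view(1, 0, -1, 128)` cannot reshape. A minimal sketch (the batch schema and the `response_ids` key are hypothetical, not agent-lightning's actual data format):

```python
def filter_empty_rollouts(batch):
    """Drop rollouts whose response token list is empty, e.g. because the
    serving backend rejected the request with a 400 error and returned
    nothing. Feeding such a sample into the model yields a 0-element
    hidden-state tensor and the ambiguous-reshape RuntimeError above."""
    return [
        sample for sample in batch
        if len(sample.get("response_ids", [])) > 0
    ]


batch = [
    {"response_ids": [101, 102, 103]},  # normal rollout
    {"response_ids": []},               # failed request, empty output
]
print(len(filter_empty_rollouts(batch)))  # 1
```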
