When a vLLM serving error occurs, such as a request exceeding the maximum context length of a tiny LLM, the output comes back empty; the error then traces back to the agent client:
🖇 AgentOps: [OPENAI WRAPPER] Error in chat_completion_stream_wrapper: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 4608 tokens. However, you requested 6859 tokens (6347 in the messages, 512 in the completion). Please reduce the length of the messages or completion. None", 'type': 'BadRequestError', 'param': None, 'code': 400}
For example, when using the custom agent in the customized cal_x example, this issue happens at the end of the first batch of the multiprocess task.
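One client-side workaround is to check the prompt length before sending the request. Below is a minimal sketch, assuming the 4608-token window and 512-token completion budget from the error above; the checkpoint name is a placeholder for whatever model the server is actually running:

```python
# Pre-flight length check before calling the vLLM OpenAI-compatible endpoint.
# The 4608/512 budgets come from the BadRequestError above; the checkpoint
# name below is a placeholder, not the one used in this repro.
from transformers import AutoTokenizer

MAX_CONTEXT = 4608      # model's maximum context length (from the 400 error)
MAX_COMPLETION = 512    # max_tokens requested for the completion

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")  # placeholder

def fits_context(messages: list[dict]) -> bool:
    """Return True if the chat prompt plus the completion budget fits the window."""
    prompt_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True
    )
    return len(prompt_ids) + MAX_COMPLETION <= MAX_CONTEXT
```

Oversize prompts can then be truncated or the rollout skipped, instead of receiving an empty completion. If the oversize request does go through, the empty completion later surfaces on the training side as a zero-length sequence, which crashes the Qwen2 forward pass: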
File "/workspace/workspace/agent-lightning/.venv/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 154, in forward
query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1, 128] because the unspecified dimension size -1 can be any value and is ambiguous
...
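Until the root cause is fixed, one possible mitigation is to drop empty rollouts before they reach the trainer. This is a sketch with hypothetical field names, not the actual agent-lightning API:

```python
# Hedged workaround: filter out rollouts whose completion came back empty
# (e.g. after a 400 from the server) so a zero-element tensor never reaches
# modeling_qwen2's q_proj reshape. `response_ids` is a hypothetical field name.
def drop_empty_rollouts(rollouts: list[dict]) -> list[dict]:
    kept = [r for r in rollouts if len(r.get("response_ids", [])) > 0]
    dropped = len(rollouts) - len(kept)
    if dropped:
        print(f"Dropped {dropped} empty rollout(s) caused by serving errors")
    return kept
```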