Conversation

shengliangxu (Contributor) commented Dec 6, 2025

Purpose

This is a workaround for issue #28072

Until that issue is officially resolved, this workaround allows ModelOpt quantized models to load and run without issues.

Test Plan

Tested using nvidia/Qwen2.5-VL-7B-Instruct-NVFP4 (https://huggingface.co/nvidia/Qwen2.5-VL-7B-Instruct-NVFP4/tree/refs%2Fpr%2F2).

Test Result

vllm serve succeeds.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

github-actions bot commented Dec 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of the fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist bot left a comment

Code Review

This pull request introduces a workaround to fix an issue with loading ModelOpt quantized models, specifically addressing how exclude_modules with wildcards are handled during weight remapping. The change correctly transforms module_path* patterns into module_path and module_path.* to ensure proper matching. The logic is sound and effectively resolves the issue. I have one suggestion to improve the code's readability and maintainability.

Comment on lines +201 to +205
if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
    new_exclude_modules.append(exclude[:-1])
    new_exclude_modules.append(exclude[:-1] + ".*")
else:
    new_exclude_modules.append(exclude)

Severity: high

The logic to identify and transform the wildcard patterns is correct. However, using direct string indexing (exclude[-1], exclude[-2], exclude[:-1]) can be less readable and potentially more error-prone for future modifications compared to using built-in string methods. Refactoring this to use endswith() and removesuffix() would make the code's intent clearer and improve its maintainability.

Suggested change
- if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
-     new_exclude_modules.append(exclude[:-1])
-     new_exclude_modules.append(exclude[:-1] + ".*")
- else:
-     new_exclude_modules.append(exclude)
+ if len(exclude) > 1 and exclude.endswith("*") and not exclude.endswith(".*"):
+     base = exclude.removesuffix("*")
+     new_exclude_modules.append(base)
+     new_exclude_modules.append(f"{base}.*")
+ else:
+     new_exclude_modules.append(exclude)
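
For reference, a minimal standalone sketch (not the actual vLLM code; the exclude patterns below are hypothetical examples) showing that the indexing-based loop from this PR and the endswith()/removesuffix() variant suggested above expand patterns the same way:

# Hypothetical exclude patterns in the ModelOpt style discussed in this PR.
exclude_modules = ["vision_model*", "lm_head", "mlp.*"]

def expand_with_indexing(excludes: list[str]) -> list[str]:
    # Mirrors the PR's loop: split "module_path*" into
    # "module_path" and "module_path.*"; leave other entries untouched.
    out: list[str] = []
    for exclude in excludes:
        if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
            out.append(exclude[:-1])
            out.append(exclude[:-1] + ".*")
        else:
            out.append(exclude)
    return out

def expand_with_string_methods(excludes: list[str]) -> list[str]:
    # Mirrors the suggested refactor using endswith()/removesuffix() (Python 3.9+).
    out: list[str] = []
    for exclude in excludes:
        if len(exclude) > 1 and exclude.endswith("*") and not exclude.endswith(".*"):
            base = exclude.removesuffix("*")
            out.append(base)
            out.append(f"{base}.*")
        else:
            out.append(exclude)
    return out

assert expand_with_indexing(exclude_modules) == expand_with_string_methods(exclude_modules)
print(expand_with_indexing(exclude_modules))
# ['vision_model', 'vision_model.*', 'lm_head', 'mlp.*']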

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +199 to +203
new_exclude_modules = []
for exclude in self.exclude_modules:
    if len(exclude) >= 2 and exclude[-1] == "*" and exclude[-2] != ".":
        new_exclude_modules.append(exclude[:-1])
        new_exclude_modules.append(exclude[:-1] + ".*")

P1: Workaround expands exclusions beyond intended prefix

The new loop replaces each module_path* entry with plain module_path/module_path.* before mapping. In is_layer_excluded the legacy substring branch checks exclude_module in prefix, so these plain entries now match any module name containing the substring rather than only those starting with the ModelOpt wildcard. With configs that blacklist branches such as vision_model*, prefixes like encoder.vision_model_adapter (which would not have matched the original wildcard) will now be treated as excluded, leaving unrelated layers unquantized and reducing performance. This over‑exclusion did not occur before the trailing * was stripped.
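
To make the concern concrete, a minimal sketch (a simplified stand-in for the legacy substring branch described above, not the actual is_layer_excluded implementation; the module names come from the example in this comment):

from fnmatch import fnmatch

# Simplified stand-in for the legacy substring branch: a layer is treated as
# excluded if any exclude entry occurs as a substring of its prefix.
def substring_excluded(prefix: str, exclude_modules: list[str]) -> bool:
    return any(exclude in prefix for exclude in exclude_modules)

# Original ModelOpt-style wildcard vs. the expanded entries produced by this PR.
original = ["vision_model*"]
expanded = ["vision_model", "vision_model.*"]

prefix = "encoder.vision_model_adapter"

# The original wildcard does not match this prefix...
print(any(fnmatch(prefix, pattern) for pattern in original))  # False
# ...but after expansion the plain "vision_model" entry matches as a substring.
print(substring_excluded(prefix, expanded))  # True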

wangshangsam (Collaborator) commented:

@shengliangxu (if you haven't done so already) could you test it to make sure it works for Qwen/Qwen3-VL-235B-A22B-Instruct also? Thanks a lot!

shengliangxu (Contributor, Author) commented:

> @shengliangxu (if you haven't done so already) could you test it to make sure it works for Qwen/Qwen3-VL-235B-A22B-Instruct also? Thanks a lot!

Yes, I have tested multiple models, including this one. This model does not have an online ModelOpt quantized checkpoint, so I did not put it in the PR description.
