Support StableLM2 12B #6635
Conversation
Is this working, or a work in progress?
Since you require a specific branch to convert, perhaps it'd be a good idea to warn the user if they are using …
By then the user would already have downloaded more than 20 GB of model files. Ideally, the q and k layernorms should be stacked during conversion (similarly to how Mixtral's expert tensors are concatenated), if they aren't already and if they're present.
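As a rough illustration of what that stacking could look like at conversion time (a minimal sketch only, not the actual convert-hf-to-gguf.py code; the helper name, head count, and head size below are made up):

```python
# Minimal sketch: collapse n_head per-head layernorm weight vectors into a
# single [n_head, head_dim] tensor so the GGUF file carries one tensor per layer.
import torch

def stack_qk_norm(per_head_weights: list[torch.Tensor]) -> torch.Tensor:
    # n_head vectors of shape [head_dim] -> one tensor of shape [n_head, head_dim]
    return torch.stack(per_head_weights, dim=0)

# Toy example: 32 heads with head_dim = 128 (numbers are illustrative only).
heads = [torch.randn(128) for _ in range(32)]
stacked = stack_qk_norm(heads)
assert stacked.shape == (32, 128)
```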
38a4de3 to 29d940b
Done, thanks for pointing this out. It makes the conversion a lot simpler.
Proper check for None type for new_name to avoid crash; formatting; revert change to base class `write_tensors()`
Hopefully this helps with correcting the flake8 linter errors
@ggerganov does this look good to merge now?
llama.cpp (outdated)
if (model.layers[il].ffn_norm) {
    // non-parallel residual
    cur = ggml_add(ctx0, cur, ffn_inp);
} else {
    // add together residual + FFN + self-attention
    cur = ggml_add(ctx0, cur, inpL);
    cur = ggml_add(ctx0, cur, attn_out);
}
Aren't these 2 branches equivalent?
I don't believe so. One branch does the parallel residual (e.g. 12B), and the other, taken when the FFN norm is present (e.g. StableLM 1.6B and 3B), does not use the parallel residual. If I am missing something, please let me know, thanks!
Since `ffn_inp = attn_out + inpL`, I think these branches do the same thing and can be replaced simply with `cur = ggml_add(ctx0, cur, ffn_inp);`. I am looking for ways to avoid the unused `ffn_inp = ggml_add(...)` in the parallel-residual case.
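A toy check of that identity (illustrative only, using plain tensors rather than ggml; all names and shapes below are made up):

```python
# Because addition is associative and commutative, adding inpL and attn_out to
# cur separately gives the same result as adding their precomputed sum ffn_inp.
import torch

inpL, attn_out, cur = (torch.randn(4, 8) for _ in range(3))
ffn_inp = attn_out + inpL

via_ffn_inp = cur + ffn_inp             # branch that reuses ffn_inp
via_two_adds = (cur + inpL) + attn_out  # branch that adds the two terms
assert torch.allclose(via_ffn_inp, via_two_adds)
```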
> Since `ffn_inp = attn_out + inpL`, I think these branches do the same
Reasoning from the relevant modeling code in transformers, even though they separate them for clarity, I think you're right: these branches do the same thing.
@ggerganov Thanks! I removed the branches and re-ran on 1.6B, 3B, and 12B with no problems. Please let me know if there is anything else!
Do you want to do anything else, or can we merge?
All done from my side.
* StableLM2 12B support for huggingface -> GGUF
* StableLM12 tensormapping and constants
* StableLM-2-12b model support
* fix
* Added 12B support
* Removed autoformatting; resolved bug where model_arch was not selecting StableLM2
* Formatting
* Do QK norm stacking in model conversion step
* Converge StableLM and StableLM2 code to simplify graph construction
* Fix accidental removal
* Removed warnings
* Revert formatter
* Move QK norm stack to private function so it's easier to read
* refactor stablelm graph builder to support 1.6, 3b and 12b more efficiently
* Proper check for None type for new_name to avoid crash; formatting; revert change to base class `write_tensors()`
* Format
* Formatting
* format
  Co-authored-by: compilade <[email protected]>
* Fix incorrect check for K norm
* space after commas; Keep indentation multiple of 4 spaces
* Flake8 format
* Removed unnecessary conditional branches
* Removed unused comment
* Fixed incorrect tensor passing
* Format

Co-authored-by: compilade <[email protected]>
Support for https://huggingface.co/stabilityai/stablelm-2-12b-chat, resolving #6553