[cpu]llama avx model inference supports #8634
Conversation
Thanks for your contribution!
[email protected] seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Codecov Report
Attention: Patch coverage is 0.00% (all 298 added lines are missing coverage; see the diff below).

@@ Coverage Diff @@
## develop #8634 +/- ##
===========================================
- Coverage 55.80% 55.63% -0.18%
===========================================
Files 620 620
Lines 96642 96940 +298
===========================================
Hits 53928 53928
- Misses 42714 43012 +298

View full report in Codecov by Sentry.
Force-pushed from 426aa9d to c3c3d49.
Review thread on paddlenlp/experimental/transformers/fused_transformer_layers.py (outdated, resolved).
Force-pushed from 9225301 to ac28f8f.
LGTM
PR types
PR changes
Description
Integrate the xft CPU kernels into Paddle inference_mode.
Machine: 8463B
Input/output lengths: 128/15, bs=1
Static-graph llama speed test, next_tokens: 100+ ms
48 threads, dynamic-graph llama speed test, next_tokens: 70+ ms
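For context, here is a minimal sketch of how the next-token latency above could be measured. The `decode_one_token` callable is a hypothetical stand-in for one decode step of the llama predictor (not part of this PR's API), and pinning threads via `OMP_NUM_THREADS` is an assumption about how the 48-thread run was configured:

```python
import os
import time

# Assumption: OpenMP thread count for the CPU kernels; 48 matches the
# dynamic-graph run above. Must be set before the model is loaded.
os.environ["OMP_NUM_THREADS"] = "48"

def avg_next_token_ms(decode_one_token, num_tokens=15, warmup=3):
    """Average per-token decode latency in milliseconds.

    decode_one_token: hypothetical callable running a single decode step.
    num_tokens: 15 matches the output length in the setup above.
    """
    for _ in range(warmup):  # discard cold-start iterations
        decode_one_token()
    start = time.perf_counter()
    for _ in range(num_tokens):
        decode_one_token()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / num_tokens
```

With input/output 128/15 and bs=1 as above, a harness of this style would report the ~100 ms (static graph) and ~70 ms (48-thread dynamic graph) next-token figures.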