[cpu]llama avx model inference supports #8634
Conversation
Thanks for your contribution!
[email protected] seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Codecov Report
Attention: Patch coverage is 0.00% (all 298 added lines are missing coverage; see the diff below).

@@ Coverage Diff @@
## develop #8634 +/- ##
===========================================
- Coverage 55.80% 55.63% -0.18%
===========================================
Files 620 620
Lines 96642 96940 +298
===========================================
Hits 53928 53928
- Misses 42714 43012 +298

View full report in Codecov by Sentry.
Force-pushed from 426aa9d to c3c3d49.
Review thread on paddlenlp/experimental/transformers/fused_transformer_layers.py (outdated, resolved).
Force-pushed from 9225301 to ac28f8f.
LGTM
PR types
PR changes
Description
Integrate the xft CPU kernels into Paddle inference_mode.
Machine: 8463B
Input/output lengths: 128/15, bs=1
Static-graph llama speed test, next_tokens: 100+ ms
48 threads, dynamic-graph llama speed test, next_tokens: 70+ ms
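For context, here is a minimal sketch of how the next-token latency above could be measured. The `decode_one_token` callable is a hypothetical stand-in for one decode step of the llama predictor (not part of this PR's API), and pinning threads via `OMP_NUM_THREADS` is an assumption about how the 48-thread run was configured:

```python
import os
import time

# Assumption: OpenMP thread count for the CPU kernels; 48 matches the
# dynamic-graph run above. Must be set before the model is loaded.
os.environ["OMP_NUM_THREADS"] = "48"

def avg_next_token_ms(decode_one_token, num_tokens=15, warmup=3):
    """Average per-token decode latency in milliseconds.

    decode_one_token: hypothetical callable running a single decode step.
    num_tokens: 15 matches the output length in the setup above.
    """
    for _ in range(warmup):  # discard cold-start iterations
        decode_one_token()
    start = time.perf_counter()
    for _ in range(num_tokens):
        decode_one_token()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / num_tokens
```

With input/output 128/15 and bs=1 as above, a harness of this style would report the ~100 ms (static graph) and ~70 ms (48-thread dynamic graph) next-token figures.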