
Question about the tensor x used to compute the max infinity norm and kurtosis #5

Open
wormyu opened this issue Jan 3, 2024 · 0 comments


wormyu commented Jan 3, 2024

Hi,

Thank you for your excellent work and open-sourcing your code! I have a question regarding the tensor used to compute the max infinity norm and kurtosis. In the paper, this tensor, denoted as 'x', is described as 'the output of the attention layer.' After running your code for BERT validation, the results include:

  1. max FFN output inf norm (`max_ffn_out_inf_norm`)
  2. max FFN input + output inf norm (`max_LN_inp_inf_norm`)
  3. max LN(FFN i + o) inf norm (`max_LN_out_inf_norm`)
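To make the three candidates concrete, here is a minimal sketch of how they could be computed for one transformer block, plus the kurtosis of a single vector. It assumes a BERT-style post-LN block (FFN output added to the FFN input, then LayerNorm) with LayerNorm's affine parameters left at gamma=1, beta=0, and assumes the reported `max_*` values are a max over layers. The helpers `block_metrics`, `max_over_layers`, `layer_norm`, `inf_norm`, and `kurtosis` are hypothetical illustrations, not code from the repository. Kurtosis is computed here as Pearson (non-excess) kurtosis, which is also an assumption.

```python
import math
from typing import Dict, List

Vector = List[float]


def layer_norm(v: Vector, eps: float = 1e-12) -> Vector:
    """LayerNorm over the hidden dimension (hypothetical: gamma=1, beta=0)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def inf_norm(tokens: List[Vector]) -> float:
    """Infinity norm: the largest absolute entry over all tokens and hidden units."""
    return max(abs(a) for v in tokens for a in v)


def kurtosis(v: Vector) -> float:
    """Pearson (non-excess) kurtosis E[(a - mu)^4] / sigma^4 of one vector."""
    n = len(v)
    mean = sum(v) / n
    var = sum((a - mean) ** 2 for a in v) / n
    if var == 0:
        raise ValueError("kurtosis is undefined for a constant vector")
    m4 = sum((a - mean) ** 4 for a in v) / n
    return m4 / var ** 2


def block_metrics(ffn_in: List[Vector], ffn_out: List[Vector]) -> Dict[str, float]:
    """The three candidate tensors for one block (assumed post-LN layout)."""
    # Residual sum that is fed into the LayerNorm: FFN input + FFN output.
    residual = [[a + b for a, b in zip(x, f)] for x, f in zip(ffn_in, ffn_out)]
    ln_out = [layer_norm(r) for r in residual]
    return {
        "ffn_out_inf_norm": inf_norm(ffn_out),   # 1. FFN output
        "LN_inp_inf_norm": inf_norm(residual),   # 2. FFN input + output
        "LN_out_inf_norm": inf_norm(ln_out),     # 3. LN(FFN i + o)
    }


def max_over_layers(per_layer: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical aggregation: take the max of each metric across layers."""
    keys = per_layer[0].keys()
    return {f"max_{k}": max(m[k] for m in per_layer) for k in keys}
```

For example, with a single token `ffn_in = [[1.0, -2.0, 0.5, 0.0]]` and `ffn_out = [[0.5, 10.0, -0.5, 0.0]]`, the FFN output norm is 10.0 and the residual norm is 8.0, while the LayerNorm output stays below sqrt(4 - 1) ≈ 1.73. This illustrates why the three metrics can differ by orders of magnitude and why it matters which one a table reports.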

For the Max inf. norm values in your paper's table, which of these metrics are they based on? Are they the maximum among these three metrics? I have attached a screenshot of the table below for reference.

[Screenshot: table from the paper with the Max inf. norm values]

Thank you for your time and help!
