Question about the tensor x used to compute the max infinity norm and kurtosis #5

wormyu · 2024-01-03T06:32:45Z

Hi,

Thank you for your excellent work and for open-sourcing your code! I have a question about the tensor used to compute the max infinity norm and kurtosis. In the paper, this tensor, denoted 'x', is described as 'the output of the attention layer.' After running your code for BERT validation, the results include:

max FFN output inf norm, max_ffn_out_inf_norm
max FFN input + output inf norm, max_LN_inp_inf_norm
max LN(FFN i + o) inf norm, max_LN_out_inf_norm

For the max inf. norm values in your paper's table, which of these outputs are they based on? Are they the maximum over these three metrics? I attach a screenshot of the table below for reference.
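
To make the question concrete, here is a minimal sketch of the two statistics as I understand them. It uses textbook definitions (largest absolute entry for the infinity norm, non-excess Pearson kurtosis), not the repository's actual code. The tap names and activation values are made up for illustration, and whether the paper flattens the tensor or reduces per token first is exactly what I am unsure about.

```python
def inf_norm(values):
    """Infinity norm: the largest absolute entry of a flattened tensor."""
    return max(abs(v) for v in values)


def kurtosis(values):
    """Non-excess (Pearson) kurtosis: E[(x - mu)^4] / (E[(x - mu)^2])^2."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / var ** 2


# Hypothetical flattened activations at the three candidate tap points.
# The keys are illustrative labels, not identifiers from the repository.
taps = {
    "ffn_out": [0.1, -0.2, 0.05, 8.0],
    "ln_inp": [0.3, -0.1, 0.2, 9.5],
    "ln_out": [0.4, -0.5, 0.1, 1.2],
}

# One infinity norm per tap point, then the tap with the largest norm.
per_tap_norm = {name: inf_norm(x) for name, x in taps.items()}
largest_tap = max(per_tap_norm, key=per_tap_norm.get)
```

If the table reports the maximum over all three tap points, it would correspond to `per_tap_norm[largest_tap]` in this sketch; if it reports a single tap point, it would be one specific entry of `per_tap_norm`.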

Thank you for your time and help!
