Hi,

Thank you for your excellent work and for open-sourcing your code! I have a question about the tensor used to compute the max infinity norm and kurtosis. In the paper, this tensor, denoted 'x', is described as 'the output of the attention layer.' When I run your code for BERT validation, the reported results include:
- max FFN output inf norm: `max_ffn_out_inf_norm`
- max FFN input + output inf norm: `max_LN_inp_inf_norm`
- max LN(FFN i + o) inf norm: `max_LN_out_inf_norm`
I am wondering which output the max inf. norm values in your paper's table are based on. Are they the maximum over these three metrics? I have attached a screenshot of the table below for reference.
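To make the question concrete, here is a minimal sketch of how I understand the two statistics, written in plain Python rather than PyTorch. The helper names `max_inf_norm` and `kurtosis` are my own and are not taken from the repository, and the aggregation (maximum over tokens, kurtosis over all elements, non-excess definition) is an assumption on my part; the actual code may reduce over different dimensions.

```python
def max_inf_norm(x):
    """Largest absolute entry over all token vectors.

    x is a list of token vectors (a [tokens, hidden] tensor as nested lists).
    The inf norm of one vector is its max absolute entry; here we take the
    max of that over tokens (assumed aggregation).
    """
    return max(max(abs(v) for v in row) for row in x)


def kurtosis(x):
    """Non-excess kurtosis m4 / m2^2 over all elements of x (assumed definition)."""
    flat = [v for row in x for v in row]
    n = len(flat)
    mean = sum(flat) / n
    m2 = sum((v - mean) ** 2 for v in flat) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in flat) / n  # fourth central moment
    return m4 / (m2 ** 2)


# Toy activations: two tokens, three hidden dimensions, one large outlier.
acts = [[1.0, -2.0, 0.5],
        [0.0, 8.0, -0.5]]
print(max_inf_norm(acts))
print(kurtosis(acts))
```

If the table values come from a different tensor or a different reduction than this, that is exactly what I would like to understand.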
Thank you for your time and help!