FAR.AI
Frontier alignment research to ensure the safe development and deployment of advanced AI systems.
Popular repositories
- tuned-lens Public
Tools for understanding how transformer predictions are built layer-by-layer
- learned-planner Public
Interpretability tools for recurrent networks that play Sokoban
Repositories
Showing 10 of 38 repositories
- scaling-llm-robustness-paper Public
Code used for the paper "Scaling Trends in Language Model Robustness".
- KataGoVisualizer Public
- refusal_direction Public (forked from andyrdt/refusal_direction)
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
- learned-planners-stable-baselines3 Public (forked from AlignmentResearch/stable-baselines3)
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.