Reproduce pretrained-smollm #57

01000-you opened this issue Feb 5, 2025 · 0 comments

I tried to pretrain SmolLM myself, but the performance was not good. I also noticed that the config was updated recently and the starting point of the LR decay schedule changed from 500000 to 250000 iterations. Is this a valid change? (A sketch of the schedule implied by my config is included right after it below.)

training config

```yaml
# SmolLM1 135M trained on 600B tokens
checkpoints:
  checkpoint_interval: 2000
  checkpoints_path: checkpoints_135M
  checkpoints_path_is_shared_file_system: false
  resume_checkpoint_path: null
  save_final_state: false
  save_initial_state: false
data_stages:
- data:
    dataset:
      dataset_folder: # paths to tokenized datasets
        - .../fineweb-edu-dedup
        - .../cosmopedia-v2
        - .../python-edu
        - .../open-web-math
        - .../stackoverflow
      dataset_weights:
        - 0.7
        - 0.15
        - 0.08
        - 0.06
        - 0.01
    num_loading_workers: 1
    seed: 42
  name: training stage
  start_training_step: 1
general:
  benchmark_csv_path: null
  consumed_train_samples: null
  ignore_sanity_checks: true
  project: smollm
  run: smollm-135M
  seed: 8
  step: null
logging:
  iteration_step_info_interval: 1
  log_level: info
  log_level_replica: info
model:
  ddp_bucket_cap_mb: 25
  dtype: bfloat16
  init_method:
    std: 0.0416 # 1/sqrt(hidden_size)
  make_vocab_size_divisible_by: 1
  model_config:
    bos_token_id: 0
    eos_token_id: 0
    hidden_act: silu
    hidden_size: 576
    initializer_range: 0.02
    intermediate_size: 1536
    is_llama_config: true
    max_position_embeddings: 2048
    num_attention_heads: 9
    num_hidden_layers: 30
    num_key_value_heads: 3
    pad_token_id: null
    pretraining_tp: 1
    rms_norm_eps: 1.0e-05
    rope_scaling: null
    rope_theta: 10000.0
    tie_word_embeddings: true
    use_cache: true
    vocab_size: 49152
optimizer:
  accumulate_grad_in_fp32: true
  clip_grad: 1.0
  learning_rate_scheduler:
    learning_rate: 0.003
    lr_decay_starting_step: 500000
    lr_decay_steps: 100000
    lr_decay_style: 1-sqrt
    lr_warmup_steps: 5000
    lr_warmup_style: linear
    min_decay_lr: 0
  optimizer_factory:
    adam_beta1: 0.9
    adam_beta2: 0.95
    adam_eps: 1.0e-08
    name: adamW
    torch_adam_is_fused: true
  weight_decay: 0.01
  zero_stage: 0
parallelism:
  dp: 32 # 4 nodes => 32
  expert_parallel_size: 1
  pp: 1
  pp_engine: 1f1b
  recompute_layer: false
  tp: 1
  tp_linear_async_communication: true
  tp_mode: REDUCE_SCATTER
  tp_recompute_allgather: true
profiler: null
tokenizer:
  tokenizer_max_length: null
  tokenizer_name_or_path: HuggingFaceTB/cosmo2-tokenizer
  tokenizer_revision: null
tokens:
  batch_accumulation_per_replica: 1
  limit_test_batches: 0
  limit_val_batches: 0
  micro_batch_size: 16 # GBS = 16 (micro_batch_size) * 1 (batch_accumulation_per_replica) * 32 (dp) = 512 sequences/step; 512 * 2048 (sequence_length) ≈ 1M tokens/step
  sequence_length: 2048
  train_steps: 600000
  val_check_interval: -1
```
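
For reference, here is a minimal sketch of the schedule the `learning_rate_scheduler` block above implies. The `1-sqrt` cooldown shape, `lr * (1 - sqrt(progress))`, is my reading of the style name, not taken from nanotron's source, so treat it as an assumption:

```python
import math

def lr_at(step,
          peak_lr=3e-3,         # learning_rate
          warmup_steps=5_000,   # lr_warmup_steps (linear)
          decay_start=500_000,  # lr_decay_starting_step
          decay_steps=100_000,  # lr_decay_steps
          min_lr=0.0):          # min_decay_lr
    # Linear warmup, constant plateau, then an assumed "1-sqrt" cooldown.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < decay_start:
        return peak_lr
    progress = min(1.0, (step - decay_start) / decay_steps)
    return min_lr + (peak_lr - min_lr) * (1 - math.sqrt(progress))

for s in (5_000, 250_000, 500_000, 550_000, 600_000):
    print(f"step {s:>7,}: lr = {lr_at(s):.2e}")

# With decay_start=500_000 the cooldown reaches min_decay_lr exactly at
# train_steps=600_000. With decay_start=250_000 and the same decay_steps,
# the LR would hit min_decay_lr at step 350_000 and the remaining 250k
# steps would run at min_decay_lr (here 0), unless train_steps is also reduced.
```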

eval command

```bash
torchrun --standalone --nnodes=1 --nproc-per-node=1  src/lighteval/__main__.py nanotron \
 --checkpoint-config-path .../checkpoints_smollm_135m/600000/config.yaml \
 --lighteval-config-path .../lighteval_config_override_template.yaml
```

lighteval_config_override_template.yaml

```yaml
batch_size: 8
generation: null
logging:
  output_dir: "outputs"
  save_details: false
  public_run: false
  results_org: null
  tensorboard_metric_prefix: "eval"
parallelism:
  dp: 1
  pp: 1
  pp_engine: 1f1b
  tp: 1
  tp_linear_async_communication: false
  tp_mode: ALL_REDUCE
tasks:
  dataset_loading_processes: 8
  multichoice_continuations_start_space: null
  num_fewshot_seeds: null
  tasks: custom|arc|0|1,custom|openbook_qa|0|1,custom|winogrande|0|1,custom|commonsense_qa|0|1,custom|piqa|0|1,custom|hellaswag|0|0
  custom_tasks: .../tasks.py
```
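
Side note on the `tasks` line above: each comma-separated entry appears to follow lighteval's `suite|task|num_fewshot|flag` pattern; the interpretation of the final field (e.g. few-shot truncation) is an assumption on my part, not taken from the lighteval docs. A minimal sketch that only makes the structure explicit:

```python
# Splits the tasks string from the config above into its fields.
tasks = ("custom|arc|0|1,custom|openbook_qa|0|1,custom|winogrande|0|1,"
         "custom|commonsense_qa|0|1,custom|piqa|0|1,custom|hellaswag|0|0")
for entry in tasks.split(","):
    suite, task, num_fewshot, flag = entry.split("|")
    print(f"{suite}: {task} (num_fewshot={num_fewshot}, flag={flag})")
```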

my results

| model | params | ARC-C | ARC-E | Hellaswag | OBQA | PIQA | Winogrande | mmlu_pro | gsm8k |
|---|---|---|---|---|---|---|---|---|---|
| SmolLM-ours | 135M | 0.2466 | 0.4613 | 0.2718 | 0.298 | 0.5963 | 0.4933 | 0.1182 | _ |
| SmolLM1-HF | 135M | 0.288 | 0.563 | 0.413 | 0.334 | 0.681 | 0.517 | 0.112 | 0.011 |