Reproduce pretrained-smollm #57

01000-you opened this issue Feb 5, 2025 · 0 comments

I tried to pretrain SmolLM myself, but the performance was not good. I also noticed that the config was updated recently and the starting point of the LR decay schedule changed from 500000 to 250000 iterations. Is this a valid change? (A sketch of the schedule implied by my config is included right after it below.)

training config

```yaml
# SmolLM1 135M trained on 600B tokens
checkpoints:
  checkpoint_interval: 2000
  checkpoints_path: checkpoints_135M
  checkpoints_path_is_shared_file_system: false
  resume_checkpoint_path: null
  save_final_state: false
  save_initial_state: false
data_stages:
- data:
    dataset:
      dataset_folder: # paths to tokenized datasets
        - .../fineweb-edu-dedup
        - .../cosmopedia-v2
        - .../python-edu
        - .../open-web-math
        - .../stackoverflow
      dataset_weights:
        - 0.7
        - 0.15
        - 0.08
        - 0.06
        - 0.01
    num_loading_workers: 1
    seed: 42
  name: training stage
  start_training_step: 1
general:
  benchmark_csv_path: null
  consumed_train_samples: null
  ignore_sanity_checks: true
  project: smollm
  run: smollm-135M
  seed: 8
  step: null
logging:
  iteration_step_info_interval: 1
  log_level: info
  log_level_replica: info
model:
  ddp_bucket_cap_mb: 25
  dtype: bfloat16
  init_method:
    std: 0.0416 # 1/sqrt(hidden_size)
  make_vocab_size_divisible_by: 1
  model_config:
    bos_token_id: 0
    eos_token_id: 0
    hidden_act: silu
    hidden_size: 576
    initializer_range: 0.02
    intermediate_size: 1536
    is_llama_config: true
    max_position_embeddings: 2048
    num_attention_heads: 9
    num_hidden_layers: 30
    num_key_value_heads: 3
    pad_token_id: null
    pretraining_tp: 1
    rms_norm_eps: 1.0e-05
    rope_scaling: null
    rope_theta: 10000.0
    tie_word_embeddings: true
    use_cache: true
    vocab_size: 49152
optimizer:
  accumulate_grad_in_fp32: true
  clip_grad: 1.0
  learning_rate_scheduler:
    learning_rate: 0.003
    lr_decay_starting_step: 500000
    lr_decay_steps: 100000
    lr_decay_style: 1-sqrt
    lr_warmup_steps: 5000
    lr_warmup_style: linear
    min_decay_lr: 0
  optimizer_factory:
    adam_beta1: 0.9
    adam_beta2: 0.95
    adam_eps: 1.0e-08
    name: adamW
    torch_adam_is_fused: true
  weight_decay: 0.01
  zero_stage: 0
parallelism:
  dp: 32 # 4 nodes => 32
  expert_parallel_size: 1
  pp: 1
  pp_engine: 1f1b
  recompute_layer: false
  tp: 1
  tp_linear_async_communication: true
  tp_mode: REDUCE_SCATTER
  tp_recompute_allgather: true
profiler: null
tokenizer:
  tokenizer_max_length: null
  tokenizer_name_or_path: HuggingFaceTB/cosmo2-tokenizer
  tokenizer_revision: null
tokens:
  batch_accumulation_per_replica: 1
  limit_test_batches: 0
  limit_val_batches: 0
  micro_batch_size: 16 # GBS = 16 (micro_batch_size) * 1 (batch_accumulation_per_replica) * 32 (dp) = 512 sequences/step; 512 * 2048 (sequence_length) ≈ 1M tokens/step
  sequence_length: 2048
  train_steps: 600000
  val_check_interval: -1
```
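
For reference, here is a minimal sketch of the schedule the `learning_rate_scheduler` block above implies. The `1-sqrt` cooldown shape, `lr * (1 - sqrt(progress))`, is my reading of the style name, not taken from nanotron's source, so treat it as an assumption:

```python
import math

def lr_at(step,
          peak_lr=3e-3,         # learning_rate
          warmup_steps=5_000,   # lr_warmup_steps (linear)
          decay_start=500_000,  # lr_decay_starting_step
          decay_steps=100_000,  # lr_decay_steps
          min_lr=0.0):          # min_decay_lr
    # Linear warmup, constant plateau, then an assumed "1-sqrt" cooldown.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < decay_start:
        return peak_lr
    progress = min(1.0, (step - decay_start) / decay_steps)
    return min_lr + (peak_lr - min_lr) * (1 - math.sqrt(progress))

for s in (5_000, 250_000, 500_000, 550_000, 600_000):
    print(f"step {s:>7,}: lr = {lr_at(s):.2e}")

# With decay_start=500_000 the cooldown reaches min_decay_lr exactly at
# train_steps=600_000. With decay_start=250_000 and the same decay_steps,
# the LR would hit min_decay_lr at step 350_000 and the remaining 250k
# steps would run at min_decay_lr (here 0), unless train_steps is also reduced.
```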

eval command

```bash
torchrun --standalone --nnodes=1 --nproc-per-node=1  src/lighteval/__main__.py nanotron \
 --checkpoint-config-path .../checkpoints_smollm_135m/600000/config.yaml \
 --lighteval-config-path .../lighteval_config_override_template.yaml
```

lighteval_config_override_template.yaml

```yaml
batch_size: 8
generation: null
logging:
  output_dir: "outputs"
  save_details: false
  public_run: false
  results_org: null
  tensorboard_metric_prefix: "eval"
parallelism:
  dp: 1
  pp: 1
  pp_engine: 1f1b
  tp: 1
  tp_linear_async_communication: false
  tp_mode: ALL_REDUCE
tasks:
  dataset_loading_processes: 8
  multichoice_continuations_start_space: null
  num_fewshot_seeds: null
  tasks: custom|arc|0|1,custom|openbook_qa|0|1,custom|winogrande|0|1,custom|commonsense_qa|0|1,custom|piqa|0|1,custom|hellaswag|0|0
  custom_tasks: .../tasks.py
```
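
Side note on the `tasks` line above: each comma-separated entry appears to follow lighteval's `suite|task|num_fewshot|flag` pattern; the interpretation of the final field (e.g. few-shot truncation) is an assumption on my part, not taken from the lighteval docs. A minimal sketch that only makes the structure explicit:

```python
# Splits the tasks string from the config above into its fields.
tasks = ("custom|arc|0|1,custom|openbook_qa|0|1,custom|winogrande|0|1,"
         "custom|commonsense_qa|0|1,custom|piqa|0|1,custom|hellaswag|0|0")
for entry in tasks.split(","):
    suite, task, num_fewshot, flag = entry.split("|")
    print(f"{suite}: {task} (num_fewshot={num_fewshot}, flag={flag})")
```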

my results

| model | params | ARC-C | ARC-E | Hellaswag | OBQA | PIQA | Winogrande | mmlu_pro | gsm8k |
|---|---|---|---|---|---|---|---|---|---|
| SmolLM-ours | 135M | 0.2466 | 0.4613 | 0.2718 | 0.298 | 0.5963 | 0.4933 | 0.1182 | _ |
| SmolLM1-HF | 135M | 0.288 | 0.563 | 0.413 | 0.334 | 0.681 | 0.517 | 0.112 | 0.011 |