Participation Guidelines
Rules for model training, data usage, evaluation, and submission.
Tracks
Each team submits a single model. The three tracks differ only in which evaluation tasks are run against that model — unlike the English BabyLM Challenge, there are no separate training-setting requirements per track.
- NLU Track. CLUE fine-tuning (AFQMC, OCNLI, TNEWS, CLUEWSC2020) and zero-shot minimal pair scoring on ZhoBLiMP.
- Cognitive Modeling Track. MulCogBench: fit model representations to human fMRI brain recordings via ridge regression, at word and sentence levels.
- HANZI Track. Character-level minimal pairs via PinyinBench and HanziBench, targeting phonological and structural properties of Chinese characters.
The overall score is the total across all evaluation tasks (both open and hidden). Per-track scores are the total within that track's tasks.
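Zero-shot minimal pair scoring, as used for BLiMP-style benchmarks, typically counts a pair as correct when the model assigns a higher probability to the acceptable sentence than to the unacceptable one. The following is a minimal sketch of that idea, not the official pipeline: `sentence_logprob` is a hypothetical scoring function, the character-unigram scorer is a toy stand-in for a real language model, and the example pair is made up rather than taken from ZhoBLiMP.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    sentence_logprob: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) pairs in which the acceptable
    sentence receives the strictly higher log-probability."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs given")
    correct = sum(
        1 for good, bad in pairs if sentence_logprob(good) > sentence_logprob(bad)
    )
    return correct / len(pairs)


def make_unigram_scorer(corpus: str) -> Callable[[str], float]:
    """Toy stand-in for a language model: add-one-smoothed character unigrams."""
    counts = Counter(corpus)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen characters

    def score(sentence: str) -> float:
        return sum(math.log((counts[ch] + 1) / (total + vocab)) for ch in sentence)

    return score


# Made-up example pair; a real run would use the benchmark's own pairs
# and a real model's sentence log-probabilities.
scorer = make_unigram_scorer("我喜欢读书。他喜欢读书。")
toy_pairs = [("我喜欢读书。", "我喜欢读树。")]
accuracy = minimal_pair_accuracy(toy_pairs, scorer)
```

For a real model, `sentence_logprob` would usually sum the token log-probabilities of the sentence; whether the official pipeline normalizes by length is not stated here, so check the pipeline before comparing numbers.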
Pretraining Data
- Pretrain from scratch. Models must be trained from randomly initialized weights. Loading pretrained checkpoints or distilling from existing large models is not allowed.
- Choose one of two data options:
  - Option 1: Official corpus. Use the organizer-provided 102M-word training corpus on Hugging Face: chinese-babylm-org/babylm-zho-100M.
  - Option 2: Bring your own corpus. At most 102M words, counted via Jieba 0.42.1 in default mode; see the evaluation pipeline's Token Count section for counting rules.
- No evaluation leakage. Fine-tuning, validation, or test splits from any track's evaluation datasets must not appear in your pretraining corpus.
- Reproducibility for winners. Each track's first-place team and the overall first-place team must submit their full training data (if Option 2) and complete training code. Organizers will reproduce the pipeline to verify results.
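For Option 2, a rough local check of the word budget can be sketched as below. This is an illustration under assumptions, not the official counter: it uses `jieba.lcut` in its default mode, and it skips whitespace-only tokens, which is a guess about the counting rules. The Token Count section of the evaluation pipeline remains the authoritative reference.

```python
from typing import Callable, Iterable, List, Optional

WORD_BUDGET = 102_000_000  # the 102M-word limit stated in the guidelines


def jieba_tokenize(text: str) -> List[str]:
    """Segment with Jieba's default mode (requires `pip install jieba==0.42.1`)."""
    import jieba  # imported lazily so the counting logic works without Jieba installed

    return jieba.lcut(text)


def count_words(
    lines: Iterable[str],
    tokenize: Optional[Callable[[str], List[str]]] = None,
) -> int:
    """Count segmented tokens line by line.

    Assumption: whitespace-only tokens are not counted. The official
    rules may treat whitespace and punctuation differently.
    """
    tokenize = tokenize or jieba_tokenize
    total = 0
    for line in lines:
        total += sum(1 for tok in tokenize(line) if tok.strip())
    return total


def within_budget(n_words: int, budget: int = WORD_BUDGET) -> bool:
    """True if the corpus size does not exceed the word budget."""
    return n_words <= budget
```

Passing an open text file (one document or sentence per line) as `lines` keeps memory use flat, since the corpus is streamed rather than loaded at once.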
Evaluation
Evaluation follows a two-phase protocol: an open phase during development and a hidden phase on held-out tasks released after model submission.
Open Evaluation Tasks
Released alongside the guidelines on April 15, 2026. Teams run the open-source evaluation pipeline locally and may report scores to the public Hugging Face leaderboard during development. Open tasks are fully visible — training data must not include them.
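One simple way to screen a pretraining corpus for leaked evaluation items is exact matching on normalized text. The sketch below assumes this approach; the function names and example sentences are hypothetical, and the check only catches whole-line matches, so partial or paraphrased overlap would need a stronger method such as n-gram overlap.

```python
import hashlib
import unicodedata
from typing import Iterable, List, Set


def normalize(text: str) -> str:
    """NFKC-normalize and drop all whitespace so trivial variants still match."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def fingerprint(text: str) -> str:
    """SHA-256 hash of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_leaks(corpus_lines: Iterable[str], eval_texts: Iterable[str]) -> List[str]:
    """Return corpus lines whose normalized form exactly matches an evaluation text."""
    eval_hashes: Set[str] = {fingerprint(t) for t in eval_texts if normalize(t)}
    return [line for line in corpus_lines if fingerprint(line) in eval_hashes]


# Made-up data: the first corpus line differs from the evaluation text
# only by an inserted space, so it is flagged.
leaks = find_leaks(
    ["今天 天气很好。", "这是训练语料。"],
    ["今天天气很好。"],
)
```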
Hidden Evaluation Tasks
Released after the model submission deadline (June 11, 2026). Teams evaluate their already-submitted, frozen models on the held-out test set and submit results by June 20, 2026. Hidden tasks test generalization beyond the open set.
Model Submission
All final models must be uploaded to Hugging Face as public repositories by June 11, 2026. No modifications to weights are permitted after this deadline — hidden evaluation uses exactly the submitted checkpoint.
Final Scoring
A team's final score is the total across all open and hidden evaluation tasks. Per-track winners are decided by the total within that track; the overall winner is decided by the total across all tasks across all tracks.
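The aggregation described above can be illustrated with a small sketch. It assumes that "total" means a plain sum of per-task scores; the task keys for MulCogBench, the omission of hidden tasks, and all score values are made up for illustration. Any normalization or weighting applied by the organizers is not covered.

```python
from typing import Dict, List

# Open tasks grouped by track. The two MulCogBench keys are hypothetical
# labels for its word-level and sentence-level evaluations.
TRACKS: Dict[str, List[str]] = {
    "NLU": ["AFQMC", "OCNLI", "TNEWS", "CLUEWSC2020", "ZhoBLiMP"],
    "Cognitive": ["MulCogBench-word", "MulCogBench-sentence"],
    "HANZI": ["PinyinBench", "HanziBench"],
}


def track_totals(
    scores: Dict[str, float], tracks: Dict[str, List[str]]
) -> Dict[str, float]:
    """Sum the scores of the tasks belonging to each track."""
    return {
        track: sum(scores[task] for task in tasks) for track, tasks in tracks.items()
    }


def overall_total(scores: Dict[str, float]) -> float:
    """Sum the scores of all tasks across all tracks."""
    return sum(scores.values())


# Made-up scores, for illustration only.
example_scores = {
    "AFQMC": 0.70, "OCNLI": 0.60, "TNEWS": 0.50, "CLUEWSC2020": 0.65,
    "ZhoBLiMP": 0.75, "MulCogBench-word": 0.10, "MulCogBench-sentence": 0.20,
    "PinyinBench": 0.80, "HanziBench": 0.70,
}
per_track = track_totals(example_scores, TRACKS)
overall = overall_total(example_scores)
```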
Ways to Improve Your Score
The following are suggestions, not requirements. Participants are free to explore other approaches within the data budget.
- Data strategy: improve corpus quality (filtering, deduplication), adjust training epochs, or experiment with training order / curriculum.
- Model architecture: try Transformer encoders, decoders, encoder-decoders, Mamba / state-space models, or Diffusion Language Models.
- Training method: explore reinforcement learning, preference optimization, multi-stage pretraining, and other advanced techniques.
- Cognitive track tuning: the choice of layer and feature extraction strategy (e.g., mean pooling, specific hidden layers) can substantially affect fMRI alignment.
- Learn from English BabyLM: recent winning approaches are a valuable source of ideas — see babylm.github.io.
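To make the cognitive track suggestion concrete, the sketch below shows one possible feature extraction strategy: select a hidden layer, then mean-pool its token vectors into a single sentence vector. The data layout (`hidden_states[layer][token]`) and the function names are assumptions for illustration; the official pipeline may extract features differently.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average the token vectors of one layer into a single sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def sentence_features(
    hidden_states: Sequence[Sequence[Sequence[float]]], layer: int
) -> Vector:
    """Pick one layer from hidden_states[layer][token], then mean-pool it."""
    return mean_pool(hidden_states[layer])


# Toy hidden states: 2 layers, 2 tokens, 2 dimensions (made-up numbers).
toy_hidden = [
    [[1.0, 2.0], [3.0, 4.0]],  # layer 0
    [[0.0, 0.0], [2.0, 6.0]],  # layer 1
]
first_layer = sentence_features(toy_hidden, layer=0)
last_layer = sentence_features(toy_hidden, layer=-1)
```

Feature vectors of this kind would then be fed to the ridge regression against the fMRI recordings; comparing alignment across layers is one straightforward way to choose `layer`.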
Paper Submissions
Chinese BabyLM is a shared task at NLPCC 2026. Teams with top-ranking systems will usually be invited to submit system reports. After review, the reports may be published as part of the NLPCC proceedings. See the 2025 proceedings: https://link.springer.com/book/10.1007/978-981-95-3352-7?page=2
You can also publish your paper on arXiv or submit it to other relevant workshops, such as the BabyLM Workshop @ EMNLP 2026.
Awards
To be determined. Details will be announced once finalized with NLPCC officials and sponsors.