Open Japanese LLMs from academic research and development

Swallow LLM

Features

Developing high-performance and open large language models in Japan

High-performance Japanese LLMs

We aim to build general-purpose large language models that not only possess rich knowledge about Japan but also excel in English, mathematics, coding, and reasoning

Permissive licenses that allow commercial use

We adopt licenses with as few usage restrictions as possible and release the developed models on HuggingFace
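
As a minimal sketch, the released models can be loaded with the Hugging Face transformers library; the model ID below is an illustrative example of a checkpoint published under the tokyotech-llm organization, and any other released Swallow model can be substituted:

    # Minimal sketch: load a released Swallow checkpoint and generate text.
    # The model ID is an illustrative example, not a specific recommendation.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "日本で一番高い山は?"  # "What is the highest mountain in Japan?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))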

Academic research and development

Led primarily by members of the Okazaki Laboratory and Yokota Laboratory at the Institute of Science Tokyo, together with the National Institute of Advanced Industrial Science and Technology (AIST)

Open research and development

We share the recipes, training data, and experimental results needed to build high-performance foundation models, promoting AI research and applications

Achievements

Impact of Swallow projects spreading across the community (as of February 2026)

2.3M model downloads
540K dataset downloads
135 models
17 datasets

Members

Naoaki Okazaki

Professor, Science Tokyo


Project leader, team leader of pre-training corpus, web developer

Rio Yokota

Professor, Science Tokyo


Team leader of LLM training

Sakae Mizuki

Researcher, AIST / Science Tokyo


Team leader of instruction tuning, team leader of LLM evaluation

Kazuki Fujii

Master's student, Science Tokyo


LLM training, post-training

Taishi Nakamura

Master's student, Science Tokyo


LLM training, post-training, and evaluation

Youmi Ma

Assistant Professor, Science Tokyo


Post-training

Daisuke Oba

Specially Appointed Assistant Professor, Science Tokyo


Post-training

Susumu Ota

Researcher, Science Tokyo


Post-training

Koki Maeda

PhD student, Science Tokyo


LLM evaluation

Masanari Oi

Master's student, Science Tokyo


LLM evaluation

Takumi Okamoto

Master's student, Science Tokyo


Instruction tuning

Shigeki Ishida

Master's student, Science Tokyo


LLM evaluation

Masaki Kawamura

Master's student, Science Tokyo


Development of language resources

Yukito Tajima

Master's student, Science Tokyo


Development of language resources

Taihei Shiotani

Master's student, Science Tokyo


LLM evaluation

Hinari Shimada

Master's student, Science Tokyo


LLM evaluation, safety

Koshiro Saito

Master's student, Science Tokyo


LLM evaluation

Tatsuya Ichinose

Undergraduate student, Science Tokyo


LLM evaluation

Naoya Matsushita

Undergraduate student, Science Tokyo


LLM evaluation

Sora Miyamoto

Undergraduate student, Science Tokyo


LLM evaluation

Daisuke Nohara

Undergraduate student, Science Tokyo


Post-training

Yuta Katayama

Undergraduate student, Science Tokyo


Development of language resources, instruction tuning

Nguyen Tien Dung

Undergraduate student, Science Tokyo


Development of language resources

Hiroya Takamura

Research team leader, AIRC, AIST


Project manager

Past Members

Sangwhan Moon

Researcher, Science Tokyo


Post-training

Mengsay Loem

Master's graduate, TokyoTech


Expert in LLM evaluation

Shota Hirai

Master's graduate, TokyoTech


Expert in the development of pre-training corpora

Taiki Iida

PhD graduate, TokyoTech


Expert in tokenization for LLMs

Kakeru Hattori

Master's student, Science Tokyo


Development of pre-training corpora, LLM evaluation