Open Japanese LLMs from academic research and development

Swallow LLM

Features

Developing high-performance and open large language models in Japan

High-performance Japanese LLMs

We aim to build general-purpose large language models that not only possess rich knowledge about Japan but also excel in English, mathematics, coding, and reasoning

Permissive licenses that allow commercial use

We adopt licenses with as few usage restrictions as possible and release the developed models on Hugging Face

Academic research and development

Led primarily by members of the Okazaki Laboratory and Yokota Laboratory at Institute of Science Tokyo, together with the National Institute of Advanced Industrial Science and Technology (AIST)

Open research and development

We share the recipes, training data, and experimental results needed to build high-performance foundation models, promoting AI research and applications

Achievements

Impact of the Swallow project across the community (as of March 2026)

Model downloads: 2.4M
Dataset downloads: 551K
Models: 132
Datasets: 19

Members

Naoaki Okazaki

Professor, Science Tokyo


Project leader, team leader of pre-training corpus, web developer

Rio Yokota

Professor, Science Tokyo


Team leader of LLM training

Sakae Mizuki

Researcher, AIST / Science Tokyo


Team leader of instruction tuning, team leader of LLM evaluation

Kazuki Fujii

PhD student, Science Tokyo


LLM training, post-training

Taishi Nakamura

PhD student, Science Tokyo


LLM training, post-training, and evaluation

Youmi Ma

Assistant Professor, Science Tokyo


Post-training

Daisuke Oba

Specially Appointed Assistant Professor, Science Tokyo


Post-training

Susumu Ota

Researcher, Science Tokyo


Post-training

Koki Maeda

PhD student, Science Tokyo


LLM evaluation

Masanari Oi

PhD student, Science Tokyo


Post-training

Shigeki Ishida

PhD student, Science Tokyo


LLM evaluation

Masaki Kawamura

Master's student, Science Tokyo


Development of language resources

Yukito Tajima

Master's student, Science Tokyo


Development of language resources

Koshiro Saito

Master's student, Science Tokyo


LLM evaluation

Tatsuya Ichinose

Master's student, Science Tokyo


LLM evaluation

Naoya Matsushita

Master's student, Science Tokyo


LLM evaluation

Sora Miyamoto

Undergraduate student, Science Tokyo


LLM evaluation

Daisuke Nohara

Master's student, Science Tokyo


Post-training

Yuta Katayama

Undergraduate student, Science Tokyo


Development of language resources, instruction tuning

Nguyen Tien Dung

Undergraduate student, Science Tokyo


Development of language resources

Takaya Hiratsuka

Undergraduate student, Science Tokyo


LLM evaluation, development of language resources

Hiroya Takamura

Research team leader, AIRC, AIST


Project manager

Past Members

Takumi Okamoto

Master's degree, Science Tokyo


Instruction tuning

Taihei Shiotani

Master's degree, Science Tokyo


LLM evaluation

Hinari Shimada

Master's degree, Science Tokyo


LLM evaluation, safety

Sangwhan Moon

Researcher, Science Tokyo


Post-training

Mengsay Loem

Master's degree, TokyoTech


Expert in LLM evaluation

Shota Hirai

Master's degree, TokyoTech


Expert in the development of pre-training corpora

Taiki Iida

Ph.D., TokyoTech


Expert in tokenization for LLMs

Kakeru Hattori

Master's degree, Science Tokyo


Development of pre-training corpora, LLM evaluation