The Swallow Project is conducting independent evaluation experiments on major large language models (LLMs) in parallel with the development of a high-performance LLM specialized in Japanese. By comparing LLMs developed not only in Japan but also worldwide, we can better understand the current level of the Swallow Project. We conduct evaluations under fair conditions while considering the unique specifications of each LLM, such as tokenization and system prompts. By analyzing these evaluations in relation to the development methods of each LLM, we aim to explore the "recipe" for creating a high-performance LLM. This website visualizes the evaluation results of LLMs tested within the Swallow Project in bar charts, radar charts, and scatter plots. We hope this website serves not only as a guide for selecting high-performance LLMs but also as a reference for developing LLMs with strong Japanese language capabilities.

The content of this leaderboard (including data and graphs) is provided under a Creative Commons Attribution 4.0 (CC-BY 4.0) License, the evaluation software (swallow-evaluation) is distributed under the MIT License, and the source code of this website is also provided under the MIT License.

Change log

  • 2026-05-08
    • We upgraded the evaluation framework to swallow-evaluation-instruct v202604.
    • We adopted stochastic decoding (temperature = 0.6, top-p = 0.95) to evaluate reasoning models. This mitigates the impact of cases in which reasoning models fail to complete reasoning (e.g., when generation does not stop) and aligns the evaluation more closely with standard practices for reasoning models. For non-reasoning models, we continue to use greedy decoding as before, except for tasks that require multiple decoding trials, such as Japanese and English MT-Bench, JHumanEval, and LiveCodeBench.
    • We increased the number of decoding trials to four for GPQA and AIME evaluations. Since these benchmarks contain a limited number of questions, this is expected to reduce the effect of statistical variance.
    • We changed the questions evaluated in LiveCodeBench to those added in v6 (previously, questions from v5 and v6 were used).
    • We changed the judge for both Japanese and English MT-Bench to GPT-5.2 (reasoning disabled). This change is intended to make scoring more accurate for hallucinations, incorrect answers, instruction-following failures, and off-track dialogues.
    • We removed MATH-100 from the Japanese benchmarks due to score saturation and instead adopted high- and top-difficulty Japanese questions from PolyMath.
    • We incorporated IFBench (English) and M-IFEval-Ja (Japanese), which measure instruction-following ability, into the computation of the average score.
    • We added evaluation results for recently released models, including GPT-5.4, Qwen 3.5, Gemma 4, and llm-jp-4.
    • We discontinued the evaluation of older model series such as Gemma 2 and Qwen2.5.
    • By clicking the columns in the leaderboard table, models can now be sorted according to the selected column.
  • 2026-03-12
    • We added evaluation results of Nemotron 3 Super 120B-A12B (reasoning off, reasoning low, reasoning on). We evaluated the model with early access provided by NVIDIA. Note that the evaluation on LiveCodeBench was not completed for some reasoning modes; therefore, the LiveCodeBench score for the English tasks and the overall average score are reported as missing values.
  • 2026-02-20
    • We added evaluation results of GPT-OSS Swallow and Qwen3 Swallow.
    • We added evaluation results of Gemini 3 Pro Preview (gemini-3-pro-preview), GPT-5.1 Thinking (gpt-5.1-2025-11-13), GPT-5 mini (gpt-5-mini-2025-08-07), Qwen3-30B-A3B, Qwen3-Next-80B-A3B-Instruct, Qwen3-Next-80B-A3B-Thinking, Olmo 3 7B Think, and Olmo 3 32B Think.
    • We added Japanese-English and English-Japanese translation (WMT20) to the benchmark datasets for post-trained models.
    • We changed the visualization method for individual Japanese and English task performance from radar charts to bar charts (excluding MT-Bench).
  • 2025-11-21
    • We added evaluation results of PLaMo 3 NICT 2B, 8B, and 31B Base.
  • 2025-10-29
    • We upgraded the evaluation framework to swallow-evaluation-instruct v202510.
    • We have added JamC-QA (question answering for Japan-specific knowledge) to the Japanese benchmarks for post-trained models.
    • We have removed JEMHopQA from the Japanese benchmarks for post-trained models.
    • We added evaluation results of Apertus-8B-Instruct, Apertus-70B-Instruct, ELYZA-Shortcut-1.0-Qwen-32B, Flux-Japanese-Qwen2.5-32B-Instruct-V1.0, Qwen2.5-0.5B, and QwQ Bakeneko 32B.
  • 2025-08-18
    • Swallow LLM Leaderboard v2.
    • We have revamped evaluation benchmarks and methods for post-trained models in order to properly measure the capabilities of new large language models, such as reasoning models. We adopted six Japanese benchmarks (JEMHopQA, MMLU-ProX, GPQA, MATH-100, JHumanEval, M-IFEval-Ja) and six English benchmarks (HellaSwag, MMLU-Pro, GPQA, MATH-500, AIME 2024-2025, LiveCodeBench), and changed the evaluation method to zero-shot reasoning (previously few-shot reasoning). In addition, we have released the developed evaluation framework as swallow-evaluation-instruct.
    • We added evaluation results of ABEJA-QwQ32b-Reasoning-Japanese-v1.0, DeepSeek-R1-Distill series, ELYZA-Thinking-1.0-Qwen-32B, GPT-5 (gpt-5-2025-08-07), gpt-oss-20b, gpt-oss-120b, Llama-3.1-Nemotron series, Llama 4 Scout Instruct, MedGemma 27B IT, o3 (o3-2025-04-16), o3-mini (o3-mini-2025-01-31), Phi-4-reasoning-plus, Qwen3 series.
    • We have revised the structure to consist of three types of pages: overall results (bar chart of average scores), task-specific results (radar chart), and scatter plots. Each page visualizes the evaluation results of either pretrained models (without post-training) or post-trained models.
    • We implemented a feature on the right side of the model list (table) displayed on each page that allows users to bulk-select models by scale or category.
    • We implemented a feature in the bar chart on the overall results page that allows users to toggle the sorting order of models by clicking on a model name.
    • We added functionality to display the number of active parameters for Mixture of Experts (MoE) models.
    • We updated the scatter plot so that the plotted points are color-coded by model family (OpenAI, Llama, Gemma, Qwen, and others).
    • The old version was moved to https://swallow-llm.github.io/leaderboard-v1/.
  • 2025-06-27
    • Added a note regarding the in-domain evaluation of llm-jp-3.1-*-instruct4.
  • 2025-06-25
    • Added evaluation results of Llama 3.1 Swallow 8B v0.5.
    • Added evaluation results of Llama 4 Scout.
    • Added evaluation results of llm-jp-3-7.2b.
    • Added evaluation results of llm-jp-3-1.8b-instruct3, llm-jp-3-3.7b-instruct3, llm-jp-3-7.2b-instruct3, llm-jp-3-13b-instruct3.
    • Added evaluation results of llm-jp-3.1-1.8b-instruct4, llm-jp-3.1-13b-instruct4.
    • Added evaluation results of Qwen2.5-32B.
    • Added evaluation results of Qwen3-1.7B-Base, Qwen3-4B-Base, Qwen3-8B-Base, Qwen3-14B-Base, Qwen3-30B-A3B-Base.
  • 2025-05-21
    • Added evaluation results of Sarashina2.2 0.5B, 1B, 3B.
  • 2025-05-19
    • Added evaluation results of Gemma-2-Llama Swallow 2B, 9B, 27B.
  • 2025-04-14
    • Added evaluation results of Gemma 3 4B, 12B, 27B.
    • Added evaluation results (Japanese Understanding & Generation and Japanese MT-Bench) of GPT-4 (gpt-4-0613).
    • Added evaluation results (Japanese MT-Bench) of GPT-4.5 (gpt-4.5-preview-2025-02-27) and o1 (o1-2024-12-17). We also considered evaluating these models on the Japanese understanding and generation tasks; however, due to limitations in the OpenAI API specifications (specifically, the inability to generate 10 responses for a single prompt under the same conditions as other models), the scores for those tasks are left blank.
  • 2025-03-10
    • Relaunched as the Swallow LLM Leaderboard.
  • 2024-07-01
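
The stochastic decoding setting mentioned in the 2026-05-08 entry (temperature = 0.6, top-p = 0.95) can be illustrated with a small sketch. This is not the code of swallow-evaluation-instruct; the function names and the toy logits are assumptions made for illustration, and real inference engines apply the same idea to the full vocabulary at every generation step.

```python
import math
import random


def nucleus_distribution(logits, temperature=0.6, top_p=0.95):
    """Apply temperature scaling, then keep the smallest set of tokens whose
    cumulative probability reaches top_p, and renormalize over that set.

    Returns a dict {token_index: probability}. temperature must be > 0;
    greedy decoding (always taking the argmax) is a separate case.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit tokens from most to least likely until the nucleus is filled.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, rng, temperature=0.6, top_p=0.95):
    """Draw one token index from the filtered distribution."""
    dist = nucleus_distribution(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Hypothetical logits for a three-token vocabulary.
toy_logits = [2.0, 1.0, -5.0]
print(nucleus_distribution(toy_logits))
print(sample_token(toy_logits, random.Random(0)))
```

With these toy logits, the third token has negligible probability and falls outside the nucleus, so it can never be sampled; this is how top-p filtering limits the influence of very unlikely tokens.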

Evaluation tasks

Post-trained (Japanese)

This benchmark evaluates post-trained LLMs including reasoning models on Japanese benchmark datasets. The evaluation scores range from 0 (lowest) to 1 (highest).

Question answering
Q&A (JamC-QA)

Question answering for Japan-specific knowledge

Metric: Accuracy
English-Japanese translation
En-Ja (WMT20)

Translation of news articles (English to Japanese)

Metric: BLEU
Japanese-English translation
Ja-En (WMT20)

Translation of news articles (Japanese to English)

Metric: BLEU
Instruction following
M-IFEval-Ja

Controllability of instruction following

Metric: Accuracy
College-level exam
NLU and reasoning (MMLU-ProX)

Proficient-level multi-discipline language understanding and reasoning

Metric: Accuracy
Science
Science (GPQA, Japanese)

Graduate-level Google-proof question answering

Metric: Accuracy
Mathematics
Math (PolyMath Ja, HT)

High- to competition-level mathematics

Metric: Accuracy
Coding
Coding (JHumanEval)

Japanese translation of HumanEval (code generation benchmark)

Metric: Pass@1 (n=10)
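
The "Pass@1 (n=10)" metric above presumably means that 10 samples are generated per problem and pass@1 is estimated from them. A minimal sketch of the widely used unbiased pass@k estimator follows, under that assumption; the function names and the example data are hypothetical and are not taken from the evaluation framework.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number of samples that passed
    all unit tests, k: budget of attempts being estimated.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[list[bool]]) -> float:
    """Average pass@1 over problems; each inner list holds per-sample outcomes."""
    scores = [pass_at_k(len(r), sum(r), 1) for r in results]
    return sum(scores) / len(scores)


# Two hypothetical problems with 10 samples each.
example = [
    [True] * 3 + [False] * 7,  # 3 of 10 samples pass
    [True] * 10,               # all samples pass
]
print(benchmark_pass_at_1(example))
```

For k = 1 the estimator reduces to c / n, the fraction of passing samples, so generating 10 samples mainly serves to reduce the variance of the per-problem estimate.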
Post-trained (English)

This benchmark evaluates post-trained LLMs including reasoning models on English benchmark datasets. The evaluation scores range from 0 (lowest) to 1 (highest).

Natural language inference
NLI (HellaSwag)

Four-choice questions to predict the next event

Metric: Accuracy
Instruction following
IFBench

Instruction following

Metric: Accuracy
College-level exam
NLU and reasoning (MMLU-Pro)

Proficient-level multi-discipline language understanding and reasoning

Metric: Accuracy
Science
Science (GPQA)

Graduate-level Google-proof question answering

Metric: Accuracy
Mathematics
Math (MATH-500)

Competition-level mathematics

Metric: Accuracy
Mathematics
Math (AIME 2024-2025)

Qualification for the United States Mathematical Olympiad (USAMO)

Metric: Accuracy
Coding
Coding (LCB v6)

Contests across competition platforms (LeetCode, AtCoder, and CodeForces)

Metric: Pass@1 (n=10)
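
The change log notes that GPQA and AIME are evaluated with four decoding trials because they contain few questions. The sketch below shows one way such trials could be aggregated, assuming the reported accuracy is the mean over trials; the aggregation rule, function names, and toy data are assumptions rather than the framework's documented behavior.

```python
from statistics import mean, stdev


def accuracy(correct: list[bool]) -> float:
    """Fraction of questions answered correctly in one trial."""
    return sum(correct) / len(correct)


def aggregate_trials(trials: list[list[bool]]) -> tuple[float, float]:
    """Return (mean accuracy, sample standard deviation) across trials.

    Each inner list holds the per-question outcomes of one decoding trial.
    The standard deviation is 0.0 when only one trial is available.
    """
    per_trial = [accuracy(t) for t in trials]
    spread = stdev(per_trial) if len(per_trial) > 1 else 0.0
    return mean(per_trial), spread


# Four hypothetical trials on a five-question benchmark.
toy_trials = [
    [True, True, False, False, False],
    [True, True, True, False, False],
    [True, False, True, False, False],
    [True, True, False, True, False],
]
avg, spread = aggregate_trials(toy_trials)
print(f"mean accuracy = {avg:.3f}, std across trials = {spread:.3f}")
```

On a benchmark with only a few dozen questions, a single flipped answer changes the accuracy by several points, so averaging over independent trials gives a steadier score than any single run.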
Japanese MT-Bench

The Japanese version of MT-Bench (Nejumi LLM Leaderboard edition) evaluates multi-turn dialogue capabilities. The test questions are based on v4, and the reference answers are derived from v2 with corrections to incorrect responses. The evaluation scores range from 0 (lowest) to 1 (highest).

Coding

Implementing algorithms in Python or C++, and creating websites using HTML.

Metric: Reference-guided grading by GPT-5.2
Extraction

Extracting named entities (such as author names and numerical values) and sentiment (e.g., positive or negative) from text.

Metric: Reference-guided grading by GPT-5.2
Humanities

Creating essays and strategies on topics related to law, economics, history, philosophy, and education.

Metric: Reference-guided grading by GPT-5.2
Math

Generating solutions for problems and word problems in algebra, geometry, probability, and number theory.

Metric: Reference-guided grading by GPT-5.2
Reasoning

Generating answers to questions by leveraging common knowledge and reasoning skills.

Metric: Reference-guided grading by GPT-5.2
Roleplay

Writing creative texts by assuming the persona of famous individuals or fictional characters and imagining hypothetical scenarios.

Metric: Reference-guided grading by GPT-5.2
STEM

Generating answers and explanations on topics related to physics, chemistry, biology, geography, architecture, and machine learning.

Metric: Reference-guided grading by GPT-5.2
Writing

Writing blog articles, email drafts, and fictional narratives.

Metric: Reference-guided grading by GPT-5.2
English MT-Bench

English MT-Bench evaluates multi-turn dialogue capabilities. The evaluation scores range from 0 (lowest) to 1 (highest).

Coding

Implementing algorithms in Python or C++, and creating websites using HTML.

Metric: Reference-guided grading by GPT-5.2
Extraction

Extracting named entities (such as author names and numerical values) and sentiment (e.g., positive or negative) from text.

Metric: Reference-guided grading by GPT-5.2
Humanities

Creating essays and strategies on topics related to law, economics, history, philosophy, and education.

Metric: Reference-guided grading by GPT-5.2
Math

Generating solutions for problems and word problems in algebra, geometry, probability, and number theory.

Metric: Reference-guided grading by GPT-5.2
Reasoning

Generating answers to questions by leveraging common knowledge and reasoning skills.

Metric: Reference-guided grading by GPT-5.2
Roleplay

Writing creative texts by assuming the persona of famous individuals or fictional characters and imagining hypothetical scenarios.

Metric: Reference-guided grading by GPT-5.2
STEM

Generating answers and explanations on topics related to physics, chemistry, biology, geography, architecture, and machine learning.

Metric: Reference-guided grading by GPT-5.2
Writing

Writing blog articles, email drafts, and fictional narratives.

Metric: Reference-guided grading by GPT-5.2
Pre-trained (Japanese)

This benchmark evaluates pre-trained LLMs (without post-training) on Japanese benchmark datasets. The evaluation scores range from 0 (lowest) to 1 (highest).

Commonsense
JCommonsenseQA

Five-choice questions created with a knowledge base

Metric: Accuracy
Multi-hop Q&A
JEMHopQA

Open-ended Q&A to assess the amount of knowledge and reasoning ability

Metric: Character F1
Classical Q&A
NIILC

Open-ended Q&A that can be answered by an encyclopedia

Metric: Character F1
Reference: Sekine (2003)
Reading comprehension
JSQuAD

Open-ended Q&A on Wikipedia articles

Metric: Character F1
Summarization
XL-Sum

Generating a highlight from a BBC news article

Metric: ROUGE-2
Mathematics
MGSM

Japanese translation of math word problems (GSM8K)

Metric: Accuracy (exact match)
English-Japanese translation
WMT20 (en-ja)

Translation of news articles (English to Japanese)

Metric: BLEU
Japanese-English translation
WMT20 (ja-en)

Translation of news articles (Japanese to English)

Metric: BLEU
Multi-task natural language understanding
JMMLU

Japanese translation of the four-choice exam question benchmark MMLU (53 subjects)

Metric: Accuracy
Reference: Yin et al. (2024)
Code generation
JHumanEval

Japanese translation of HumanEval (code generation benchmark)

Metric: pass@1
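
Character F1, listed above as the metric for JEMHopQA, NIILC, and JSQuAD, scores a generated answer by its character-level overlap with the reference answer. The sketch below shows one common bag-of-characters formulation; the evaluation tools listed on this page may compute the metric differently (for example, with fuzzy string matching), so the function name and definition here are illustrative assumptions.

```python
from collections import Counter


def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1 between a predicted and a reference answer.

    Characters are treated as a multiset: precision is the share of
    predicted characters found in the reference, recall the share of
    reference characters found in the prediction.
    """
    overlap = sum((Counter(prediction) & Counter(reference)).values())
    if overlap == 0:
        return 0.0  # also covers empty prediction or reference
    precision = overlap / len(prediction)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# A prediction that contains the reference plus one extra character.
print(char_f1("東京都", "東京"))
```

Unlike exact match, this kind of metric gives partial credit when an open-ended answer differs from the reference only by a suffix or a minor spelling variant.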
Pre-trained (English)

This benchmark evaluates pre-trained LLMs (without post-training) on English benchmark datasets. The evaluation scores range from 0 (lowest) to 1 (highest).

Q&A based on facts and common sense
OpenBookQA

Four-choice questions based on scientific knowledge and common sense

Metric: Accuracy
Q&A based on knowledge
TriviaQA

Open-ended Q&A based on trivia

Metric: Accuracy (exact match)
Commonsense inference
HellaSwag

Four-choice questions to predict the next event

Metric: Accuracy
Reading comprehension
SQuAD2

Open-ended Q&A developed for the evidence document

Metric: Accuracy (exact match)
Commonsense inference
XWINO

Two-choice questions to predict the antecedent of a pronoun

Metric: Accuracy
Multitask natural language understanding
MMLU

Four-choice exam question benchmark MMLU (57 subjects)

Metric: Accuracy
Mathematics
GSM8K

Math word problems

Metric: Accuracy (exact match)
Mathematics
MATH

High school math competitions

Metric: Accuracy (exact match)
Collection of hard-to-solve tasks for LLM
BIG-Bench-Hard (BBH)

23 difficult tasks from the BIG-Bench dataset (Srivastava et al., 2023)

Metric: Accuracy (exact match)
Code generation
HumanEval

Code generation ability measured by unit tests

Metric: pass@1

Evaluation tools

Tools used to evaluate post-trained models

swallow-evaluation-instruct (v202604)
An evaluation framework developed in the Swallow Project, based on lighteval (v0.8.0) by Hugging Face Inc.

Tools used to evaluate pre-trained models

LLM-jp evaluation script (1.3.0)
Automatic evaluation tool for Japanese LLMs
JP Language Model Evaluation Harness (commit #9b42d41)
An evaluation framework for Japanese LLMs
Language Model Evaluation Harness (0.4.2)
An evaluation framework for LLMs
Code Generation LM Evaluation Harness (commit #0261c52)
An evaluation framework for code generation (HumanEval)
FastChat (commit #e86e70d0)
An automatic evaluation framework by an LLM (MT-Bench)
swallow-evaluation
An evaluation framework used in the Swallow Project (encompassing all the above-mentioned tools)

Evaluated models

Model name | # Parameters [B] | Release date | Post-training | Reasoning mode | Missing scores
ABEJA-QwQ32b-Reasoning-Japanese-v1.0 | 33 | 2025-04-25 | Yes | on |
DeepSeek-R1-Distill-Qwen-32B-Japanese | 33 | 2025-01-27 | Yes | on |
ELYZA-Shortcut-1.0-Qwen-32B | 33 | 2025-05-01 | Yes | N/A |
ELYZA-Thinking-1.0-Qwen-32B | 33 | 2025-05-01 | Yes | on |
Falcon3-1B-Base | 1.7 | 2024-12-19 | No | |
Falcon3-3B-Base | 3.2 | 2024-12-19 | No | |
Falcon3-7B-Base | 7.5 | 2024-12-19 | No | |
Falcon3-10B-Base | 10 | 2024-12-19 | No | |
Gemma 2 2B | 2.6 | 2024-06-27 | No | |
Gemma 2 9B | 9.2 | 2024-06-27 | No | |
Gemma 2 27B | 27 | 2024-06-27 | No | |
Gemma-2-Llama Swallow 2B | 2.6 | 2025-05-19 | No | |
Gemma-2-Llama Swallow 9B | 9.2 | 2025-05-19 | No | |
Gemma-2-Llama Swallow 27B | 27 | 2025-05-19 | No | |
Gemma 3 1B | 1 | 2025-03-12 | No | |
Gemma 3 1B IT | 1.0 | 2025-03-12 | Yes | N/A |
Gemma 3 4B | 4.3 | 2025-03-12 | No | |
Gemma 3 4B IT | 4.3 | 2025-03-12 | Yes | N/A |
Gemma 3 12B | 12 | 2025-03-12 | No | |
Gemma 3 12B IT | 12 | 2025-03-12 | Yes | N/A |
Gemma 3 27B | 27 | 2025-03-12 | No | |
Gemma 3 27B IT | 27 | 2025-03-12 | Yes | N/A |
Gemma 4 31B IT | 33 | 2026-04-02 | Yes | on |
Gemma 4 26B-A4B IT | 27 | 2026-04-02 | Yes | on |
Gemma 4 E2B IT | 5.1 | 2026-04-02 | Yes | on |
Gemma 4 E4B IT | 8.0 | 2026-04-02 | Yes | on |
GPT-4.1 (gpt-4.1-2025-04-14) | 0 | 2025-04-14 | Yes | N/A |
GPT-4o (gpt-4o-2024-08-06) | 0 | 2024-08-06 | Yes | N/A |
GPT-5 (gpt-5-2025-08-07) | 0 | 2025-08-07 | Yes | on (medium) |
GPT-5 mini (gpt-5-mini-2025-08-07) | 0 | 2025-08-07 | Yes | on (medium) |
GPT-5.4 Thinking (gpt-5.4-2026-03-05) | 0 | 2026-03-05 | Yes | on (medium) |
gpt-oss-20b | 22 (3.6) | 2025-08-05 | Yes | on (medium) |
gpt-oss-120b | 120 (5.1) | 2025-08-05 | Yes | on (medium) |
GPT-OSS-Swallow-20B-RL-v0.1 | 22 (3.6) | 2026-02-20 | Yes | on |
GPT-OSS-Swallow-120B-RL-v0.1 | 120 (5.1) | 2026-02-20 | Yes | on |
GPT-OSS-Swallow-20B-SFT-v0.1 | 22 (3.6) | 2026-02-20 | Yes | on |
GPT-OSS-Swallow-120B-SFT-v0.1 | 120 (5.1) | 2026-02-20 | Yes | on |
Llama 3.1 8B | 8.0 | 2024-07-23 | No | |
Llama 3.1 8B Instruct | 8.0 | 2024-07-23 | Yes | N/A |
Llama 3.1 70B | 70 | 2024-07-23 | No | |
Llama-3.1-Nemotron-Nano-8B-v1 | 8.0 | 2025-03-18 | Yes | on |
Llama 3.1 Swallow 8B v0.5 | 8.0 | 2025-06-25 | No | |
Llama 3.1 Swallow 8B Instruct v0.5 | 8.0 | 2025-06-25 | Yes | N/A |
Llama 3.2 1B | 1.2 | 2024-09-25 | No | |
Llama 3.2 3B | 3.2 | 2024-09-25 | No | |
Llama 3.3 70B Instruct | 70 | 2024-12-06 | Yes | N/A |
Llama 3.3 Swallow 70B v0.4 | 70 | 2025-03-14 | No | |
Llama 3.3 Swallow 70B Instruct v0.4 | 70 | 2025-03-10 | Yes | N/A |
Llama 4 Scout | 109 (17) | 2025-04-04 | No | |
Llama 4 Scout Instruct | 109 (17) | 2025-04-04 | Yes | N/A |
llm-jp-3-1.8b | 1.8 | 2024-09-25 | No | |
llm-jp-3-3.7b | 3.7 | 2024-09-25 | No | |
llm-jp-3-7.2b | 7.3 | 2025-02-05 | No | |
llm-jp-3-13b | 13 | 2024-09-25 | No | |
llm-jp-3.1-1.8b-instruct4 | 1.8 | 2025-05-30 | Yes | N/A |
llm-jp-3.1-13b-instruct4 | 14 | 2025-05-30 | Yes | N/A |
llm-jp-4-32b-a3b-thinking | 32 | 2026-04-03 | Yes | on |
llm-jp-4-8b-thinking | 8.6 | 2026-04-03 | Yes | on |
NVIDIA-Nemotron-3-Super-120B-A12B | 120 (12) | 2026-03-12 | Yes | on |
NVIDIA-Nemotron-Nano-9B-v2-Japanese | 8.9 | 2026-02-17 | Yes | on |
o3 (o3-2025-04-16) | 0 | 2025-04-16 | Yes | on (medium) |
o3-mini (o3-mini-2025-01-31) | 0 | 2025-01-31 | Yes | on (medium) |
Olmo 3 7B Think | 7.3 | 2025-11-20 | Yes | on |
Olmo 3 32B Think | 32 | 2025-11-20 | Yes | on |
PLaMo 2 1B | 1.3 | 2025-02-21 | No | |
PLaMo 2 8B | 9.1 | 2025-02-21 | No | |
PLaMo 3 NICT 2B Base | 2.6 | 2025-11-14 | No | |
PLaMo 3 NICT 8B Base | 8.1 | 2025-11-14 | No | |
PLaMo 3 NICT 31B Base | 32 | 2025-11-14 | No | |
Qwen2.5-0.5B | 0.5 | 2024-09-19 | No | |
Qwen2.5-1.5B | 1.5 | 2024-09-19 | No | |
Qwen2.5-3B | 3.1 | 2024-09-19 | No | |
Qwen2.5-7B | 7.6 | 2024-09-19 | No | |
Qwen2.5-14B | 14 | 2024-09-19 | No | |
Qwen2.5-32B | 33 | 2024-09-19 | No | |
Qwen2.5-72B | 72 | 2024-09-19 | No | |
Qwen3-0.6B | 0.5 | 2025-04-29 | Yes | on |
Qwen3-0.6B-Base | 0.6 | 2025-04-29 | No | |
Qwen3-1.7B | 1.5 | 2025-04-29 | Yes | on |
Qwen3-1.7B-Base | 1.7 | 2025-04-29 | No | |
Qwen3-4B | 3.1 | 2025-04-29 | Yes | on |
Qwen3-4B-Base | 4.0 | 2025-04-29 | No | |
Qwen3-8B-Base | 8.2 | 2025-04-29 | No | |
Qwen3-8B | 8.2 | 2025-04-29 | Yes | on |
Qwen3-14B-Base | 15 | 2025-04-29 | No | |
Qwen3-14B | 15 | 2025-04-29 | Yes | on |
Qwen3-32B | 33 | 2025-04-29 | Yes | on |
Qwen3-30B-A3B-Base | 31 (3.3) | 2025-04-29 | No | |
Qwen3-30B-A3B | 31 (3) | 2025-04-29 | Yes | on |
Qwen3-235B-A22B-Instruct-2507 | 235 (22) | 2025-07-23 | Yes | N/A |
Qwen3-235B-A22B-Thinking-2507 | 235 (22) | 2025-07-23 | Yes | on |
Qwen3-Next-80B-A3B-Instruct | 81 (3) | 2025-09-11 | Yes | N/A |
Qwen3-Next-80B-A3B-Thinking | 81 (3) | 2025-09-11 | Yes | on |
Qwen3-Swallow-30B-A3B-CPT-v0.2 | 31 (3) | 2026-02-20 | Yes | on |
Qwen3-Swallow-30B-A3B-RL-v0.2 | 31 (3) | 2026-02-20 | Yes | on |
Qwen3-Swallow-30B-A3B-SFT-v0.2 | 31 (3) | 2026-02-20 | Yes | on |
Qwen3-Swallow-8B-CPT-v0.2 | 8.2 | 2026-02-20 | Yes | on |
Qwen3-Swallow-32B-CPT-v0.2 | 33 | 2026-02-20 | Yes | on |
Qwen3-Swallow-8B-RL-v0.2 | 8.2 | 2026-02-20 | Yes | on |
Qwen3-Swallow-32B-RL-v0.2 | 33 | 2026-02-20 | Yes | on |
Qwen3-Swallow-8B-SFT-v0.2 | 8.2 | 2026-02-20 | Yes | on |
Qwen3-Swallow-32B-SFT-v0.2 | 33 | 2026-02-20 | Yes | on |
Qwen3.5-0.8B | 0.8 | 2026-03-02 | Yes | on |
Qwen3.5-2B | 1.9 | 2026-03-02 | Yes | on |
Qwen3.5-4B | 4.2 | 2026-03-02 | Yes | on |
Qwen3.5-9B | 9.0 | 2026-03-02 | Yes | on |
Qwen3.5-27B | 28 | 2026-02-16 | Yes | on |
Qwen3.5-35B-A3B | 36 (3) | 2026-02-16 | Yes | on |
Qwen3.5-122B-A10B | 125 (10) | 2026-02-16 | Yes | on |
QwQ Bakeneko 32B | 33 | 2025-03-13 | Yes | on |
Sarashina2-7B | 7.3 | 2024-06-14 | No | |
Sarashina2-13B | 13 | 2024-06-14 | No | |
Sarashina2-70B | 70 | 2024-06-14 | No | |
Sarashina2.2 0.5B | 0.8 | 2025-03-07 | No | |
Sarashina2.2 1B | 1.4 | 2025-03-07 | No | |
Sarashina2.2 3B | 3.4 | 2025-03-07 | No | |
Sarashina2.2 3B Instruct v0.1 | 3.4 | 2025-03-07 | Yes | N/A |
TinySwallow-1.5B | 1.5 | 2025-01-30 | No | |

Acknowledgements

  • Tabler Admin Template licensed under MIT License
  • ApexCharts licensed under MIT License
  • Swallow icon by Game Icons.net under CC Attribution License via SVG Repo
  • The research and development of the large language model Swallow has been supported by the AIST Project "Research and Development on Generative AI Foundation Models in the Physical Domain"