For the LLMs selected in the table below, the scores on all tasks in the Japanese, Japanese MT-Bench, and English benchmarks are visualized as radar charts. You can copy a permalink to the current model selection via the 🔗 icon in the upper-left corner of the page.
Models
Each model's row contains the following columns, grouped by benchmark:

- Model: Name, SortKey, Type, Size (B)
- Average: Ja, Ja (MTB), En
- Japanese: JCom, JEMHopQA, NIILC, JSQuAD, XL-Sum, MGSM, WMT20 (en-ja), WMT20 (ja-en), JMMLU, JHumanEval
- Japanese MT-Bench: Coding, Extraction, Humanities, Math, Reasoning, Roleplay, Stem, Writing
- English: OpenBookQA, TriviaQA, HellaSwag, SQuAD2, XWINO, MMLU, GSM8K, BBH, HumanEval
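As a rough illustration of how a per-task radar chart like the ones on this page can be reproduced locally, here is a minimal matplotlib sketch. The task names come from the Japanese column group above; the scores are placeholder values chosen for illustration, not actual leaderboard results.

```python
import numpy as np
import matplotlib.pyplot as plt

# Japanese task names from the column list above; the scores are
# placeholder values for illustration, not real leaderboard numbers.
tasks = ["JCom", "JEMHopQA", "NIILC", "JSQuAD", "XL-Sum",
         "MGSM", "WMT20 (en-ja)", "WMT20 (ja-en)", "JMMLU", "JHumanEval"]
scores = [0.9, 0.5, 0.6, 0.9, 0.2, 0.5, 0.3, 0.25, 0.55, 0.4]

# One spoke per task; repeat the first point to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(tasks), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(tasks, fontsize=8)
ax.set_ylim(0, 1)
ax.set_title("Example: Japanese task scores for one model")
plt.tight_layout()
plt.show()
```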