Scores for all Japanese, Japanese MT-Bench, and English benchmark tasks for the LLMs selected in the table below are visualized as radar charts. You can copy a permalink to the selected model via the 🔗 icon in the upper-left corner of the site.
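The radar charts themselves are rendered by the site, but for readers who want to reproduce a similar view offline, here is a minimal matplotlib sketch. The task names come from the table below; the score values are placeholders, not real leaderboard results.

```python
import numpy as np
import matplotlib.pyplot as plt

# Task names taken from the Japanese column group below;
# score values are placeholders, not actual leaderboard results.
categories = ["JCom", "JEMHopQA", "NIILC", "JSQuAD", "XL-Sum", "MGSM"]
scores = [0.88, 0.54, 0.60, 0.91, 0.23, 0.47]

# Spread the axes evenly around the circle, then repeat the first
# point so the polygon closes.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, linewidth=1.5)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 1)
plt.show()
```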

Models

The table contains the following columns, grouped by category:

- Model: Name, SortKey, Type, Size (B)
- Average: Ja, Ja (MT-Bench), En
- Japanese: JCom, JEMHopQA, NIILC, JSQuAD, XL-Sum, MGSM, WMT20 (en-ja), WMT20 (ja-en), JMMLU, JHumanEval
- Japanese MT-Bench: Coding, Extraction, Humanities, Math, Reasoning, Roleplay, Stem, Writing
- English: OpenBookQA, TriviaQA, HellaSwag, SQuAD2, XWINO, MMLU, GSM8K, BBH, HumanEval
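The relationship between the Average columns and the per-task columns is not spelled out here. The sketch below assumes each Average column is the unweighted mean of its group's task scores, which may differ from the leaderboard's actual aggregation; the score values are placeholders.

```python
from statistics import mean

# Hypothetical per-task scores for one model; task names follow
# the Japanese column group above, values are placeholders.
japanese = {
    "JCom": 0.88, "JEMHopQA": 0.54, "NIILC": 0.60, "JSQuAD": 0.91,
    "XL-Sum": 0.23, "MGSM": 0.47, "WMT20 (en-ja)": 0.26,
    "WMT20 (ja-en)": 0.21, "JMMLU": 0.58, "JHumanEval": 0.35,
}

# Assumed aggregation: unweighted mean over the group's tasks.
avg_ja = mean(japanese.values())
print(f"Average Ja: {avg_ja:.3f}")
```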