Model scores are visualized as scatter plots, in which the size of each dot corresponds to the size of the model. Any Japanese, Japanese MT-Bench, or English task can be selected for the horizontal and vertical axes. This page provides two scatter plots so that several tasks can be compared at the same time. Select the models you wish to visualize from the table below. To copy a permalink for the current selection, click the 🔗 icon in the upper left corner of the site.

Models

Table columns, by group:

Model: Name, SortKey, Type, Size (B)
Average: Ja, Ja (MTB), En
Japanese: JCom, JEMHopQA, NIILC, JSQuAD, XL-Sum, MGSM, WMT20 (en-ja), WMT20 (ja-en), JMMLU, JHumanEval
Japanese MT-Bench: Coding, Extraction, Humanities, Math, Reasoning, Roleplay, Stem, Writing
English: OpenBookQA, TriviaQA, HellaSwag, SQuAD2, XWINO, MMLU, GSM8K, BBH, HumanEval