Scores for all Japanese, Japanese MT-Bench, and English benchmark tasks for the LLMs selected in the table below are visualized as radar charts. You can copy a permalink to the selected model via the 🔗 icon in the upper-left corner of the site.
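The radar charts themselves are rendered by the site, but for readers who want to reproduce a similar view offline, here is a minimal matplotlib sketch. The task names come from the table below; the score values are placeholders, not real leaderboard results.

```python
import numpy as np
import matplotlib.pyplot as plt

# Task names taken from the Japanese column group below;
# score values are placeholders, not actual leaderboard results.
categories = ["JCom", "JEMHopQA", "NIILC", "JSQuAD", "XL-Sum", "MGSM"]
scores = [0.88, 0.54, 0.60, 0.91, 0.23, 0.47]

# Spread the axes evenly around the circle, then repeat the first
# point so the polygon closes.
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, linewidth=1.5)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_ylim(0, 1)
plt.show()
```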

Models

The table contains the following columns, grouped by category:

- Model: Name, SortKey, Type, Size (B)
- Average: Ja, Ja (MT-Bench), En
- Japanese: JCom, JEMHopQA, NIILC, JSQuAD, XL-Sum, MGSM, WMT20 (en-ja), WMT20 (ja-en), JMMLU, JHumanEval
- Japanese MT-Bench: Coding, Extraction, Humanities, Math, Reasoning, Roleplay, Stem, Writing
- English: OpenBookQA, TriviaQA, HellaSwag, SQuAD2, XWINO, MMLU, GSM8K, BBH, HumanEval
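The relationship between the Average columns and the per-task columns is not spelled out here. The sketch below assumes each Average column is the unweighted mean of its group's task scores, which may differ from the leaderboard's actual aggregation; the score values are placeholders.

```python
from statistics import mean

# Hypothetical per-task scores for one model; task names follow
# the Japanese column group above, values are placeholders.
japanese = {
    "JCom": 0.88, "JEMHopQA": 0.54, "NIILC": 0.60, "JSQuAD": 0.91,
    "XL-Sum": 0.23, "MGSM": 0.47, "WMT20 (en-ja)": 0.26,
    "WMT20 (ja-en)": 0.21, "JMMLU": 0.58, "JHumanEval": 0.35,
}

# Assumed aggregation: unweighted mean over the group's tasks.
avg_ja = mean(japanese.values())
print(f"Average Ja: {avg_ja:.3f}")
```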