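Japanese understanding and generation average: 0.5141 (rank #35 / 113)
English understanding and generation average: 0.5743 (rank #47 / 108)
Japanese MT-Bench average: 0.612 (rank #32 / 62)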

Japanese understanding and generation

English understanding and generation

Japanese MT-Bench

Model Ja avg JComQA JEMHQA NIILC JSQuAD XL-Sum MGSM En-Ja Ja-En JMMLU JHumanEval
Aya Expanse 8B 0.445 0.922 0.467 0.385 0.867 0.211 0.608 0.261 0.206 0.521 0.001
Aya Expanse 32B 0.512 0.965 0.554 0.586 0.812 0.295 0.716 0.287 0.245 0.655 0.001
CyberAgentLM3-22B-chat 0.471 0.934 0.510 0.648 0.911 0.104 0.576 0.275 0.215 0.541 0.001
Falcon3-1B-Base 0.129 0.216 0.251 0.062 0.281 0.085 0.008 0.012 0.020 0.264 0.088
Falcon3-1B-Instruct 0.169 0.240 0.312 0.132 0.454 0.101 0.020 0.028 0.032 0.281 0.089
Falcon3-3B-Base 0.209 0.281 0.333 0.113 0.517 0.120 0.096 0.031 0.051 0.319 0.229
Falcon3-3B-Instruct 0.232 0.421 0.160 0.113 0.632 0.141 0.092 0.061 0.058 0.331 0.308
Falcon3-7B-Base 0.337 0.634 0.412 0.180 0.788 0.173 0.244 0.078 0.119 0.385 0.361
Falcon3-7B-Instruct 0.363 0.684 0.436 0.152 0.816 0.177 0.320 0.094 0.126 0.415 0.416
Falcon3-10B-Base 0.383 0.680 0.443 0.187 0.854 0.187 0.376 0.103 0.139 0.435 0.426
Falcon3-10B-Instruct 0.367 0.690 0.221 0.122 0.853 0.192 0.392 0.108 0.135 0.442 0.515
Gemma 2 2B 0.348 0.721 0.472 0.316 0.810 0.083 0.124 0.203 0.190 0.388 0.177
Gemma 2 2B IT 0.392 0.862 0.348 0.315 0.879 0.117 0.252 0.207 0.183 0.437 0.321
Gemma 2 9B 0.500 0.904 0.573 0.524 0.898 0.168 0.456 0.269 0.236 0.623 0.345
Gemma 2 9B IT 0.534 0.931 0.532 0.526 0.876 0.149 0.636 0.273 0.239 0.623 0.559
Gemma 2 27B 0.546 0.936 0.553 0.573 0.916 0.194 0.596 0.295 0.251 0.659 0.490
Gemma 2 27B IT 0.567 0.956 0.541 0.576 0.883 0.166 0.704 0.290 0.249 0.670 0.638
Gemma 2 Baku 2B 0.372 0.760 0.475 0.443 0.843 0.121 0.124 0.255 0.187 0.376 0.137
Gemma 2 Baku 2B IT 0.366 0.855 0.228 0.390 0.877 0.115 0.172 0.255 0.190 0.415 0.165
Gemma 2 JPN 0.377 0.845 0.321 0.291 0.877 0.132 0.192 0.204 0.179 0.418 0.311
GPT-3.5 (gpt-3.5-turbo-0125) 0.515 0.922 0.456 0.447 0.893 0.215 0.572 0.287 0.243 0.499 0.616
GPT-4-turbo (gpt-4-turbo-2024-04-09) 0.626 0.971 0.690 0.615 0.878 0.201 0.848 0.295 0.239 0.753 0.773
GPT-4o (gpt-4o-2024-05-13) 0.649 0.979 0.737 0.722 0.892 0.140 0.860 0.314 0.237 0.794 0.813
GPT-4o (gpt-4o-2024-08-06) 0.646 0.982 0.731 0.709 0.889 0.170 0.864 0.314 0.254 0.797 0.752
GPT-4o-mini (gpt-4o-mini-2024-07-18) 0.581 0.961 0.464 0.591 0.902 0.160 0.832 0.299 0.241 0.679 0.675
Llama 3 8B 0.429 0.835 0.436 0.410 0.892 0.177 0.312 0.221 0.206 0.455 0.344
Llama 3 8B Instruct 0.430 0.880 0.417 0.385 0.891 0.126 0.424 0.214 0.202 0.468 0.296
Llama 3 70B 0.569 0.946 0.606 0.589 0.922 0.228 0.664 0.286 0.252 0.705 0.491
Llama 3 70B Instruct 0.578 0.940 0.615 0.557 0.913 0.191 0.716 0.270 0.234 0.680 0.662
Llama-3-ELYZA-JP-8B 0.471 0.897 0.498 0.496 0.906 0.169 0.436 0.250 0.185 0.487 0.388
Llama 3 heron brain 8B v0.3 0.488 0.923 0.493 0.569 0.906 0.218 0.456 0.277 0.217 0.499 0.318
Llama 3 heron brain 70B v0.3 0.615 0.965 0.652 0.679 0.922 0.261 0.772 0.309 0.258 0.707 0.623
Llama 3 Swallow 8B 0.471 0.896 0.478 0.546 0.900 0.198 0.440 0.276 0.222 0.471 0.282
Llama 3 Swallow 8B Instruct 0.480 0.911 0.496 0.517 0.905 0.128 0.492 0.253 0.227 0.481 0.394
Llama 3 Swallow 70B 0.594 0.968 0.675 0.684 0.923 0.239 0.708 0.307 0.255 0.706 0.477
Llama 3 Swallow 70B Instruct 0.571 0.963 0.627 0.598 0.921 0.139 0.672 0.272 0.255 0.657 0.608
Llama 3 Youko 8B 0.442 0.870 0.493 0.513 0.895 0.213 0.276 0.276 0.219 0.449 0.222
Llama 3 Youko 8B Instruct 0.468 0.920 0.481 0.517 0.899 0.209 0.472 0.256 0.191 0.469 0.262
Llama 3 Youko 70B 0.571 0.946 0.602 0.610 0.923 0.242 0.684 0.292 0.250 0.704 0.463
Llama 3 Youko 70B Instruct 0.582 0.952 0.625 0.584 0.921 0.198 0.720 0.263 0.226 0.718 0.610
Llama 3.1 8B 0.437 0.845 0.461 0.405 0.895 0.179 0.356 0.221 0.210 0.479 0.320
Llama 3.1 8B Instruct 0.470 0.880 0.447 0.407 0.886 0.148 0.516 0.218 0.200 0.509 0.488
Llama 3.1 70B 0.566 0.946 0.616 0.603 0.925 0.228 0.672 0.287 0.257 0.669 0.462
Llama 3.1 70B Instruct 0.595 0.950 0.635 0.579 0.921 0.178 0.732 0.279 0.247 0.733 0.696
Llama-3.1-70B-Japanese-Instruct-2407 0.597 0.956 0.647 0.660 0.919 0.156 0.748 0.290 0.241 0.723 0.627
Llama 3.1 Swallow 8B v0.1 0.490 0.912 0.509 0.601 0.899 0.202 0.460 0.291 0.231 0.518 0.276
Llama 3.1 Swallow 8B Instruct v0.1 0.505 0.924 0.587 0.574 0.917 0.138 0.508 0.282 0.228 0.530 0.366
Llama 3.1 Swallow 70B v0.1 0.593 0.955 0.645 0.678 0.923 0.272 0.684 0.320 0.259 0.709 0.487
Llama 3.1 Swallow 70B Instruct v0.1 0.588 0.962 0.621 0.660 0.924 0.192 0.776 0.312 0.259 0.711 0.468
Llama 3.1 Swallow 8B v0.2 0.499 0.911 0.510 0.627 0.892 0.198 0.464 0.296 0.233 0.525 0.336
Llama 3.1 Swallow 8B Instruct v0.2 0.514 0.929 0.560 0.599 0.915 0.137 0.528 0.288 0.227 0.550 0.408
Llama 3.1 Swallow 8B Instruct v0.3 0.510 0.924 0.528 0.583 0.896 0.191 0.532 0.281 0.229 0.544 0.394
Llama 3.1 Swallow 70B Instruct v0.3 0.598 0.964 0.632 0.654 0.910 0.196 0.772 0.305 0.257 0.690 0.596
Llama 3.2 1B 0.201 0.208 0.404 0.188 0.525 0.081 0.024 0.079 0.092 0.260 0.150
Llama 3.2 1B Instruct 0.239 0.397 0.346 0.179 0.570 0.075 0.164 0.070 0.091 0.287 0.207
Llama 3.2 3B 0.337 0.605 0.443 0.324 0.816 0.129 0.136 0.161 0.167 0.352 0.235
Llama 3.2 3B Instruct 0.380 0.783 0.304 0.268 0.846 0.112 0.372 0.173 0.155 0.404 0.387
Llama 3.3 70B Instruct 0.601 0.941 0.640 0.570 0.893 0.179 0.784 0.278 0.243 0.735 0.744
Llama 3.3 Swallow 70B v0.4 0.629 0.967 0.671 0.732 0.924 0.283 0.776 0.327 0.260 0.742 0.604
Llama 3.3 Swallow 70B Instruct v0.4 0.613 0.981 0.618 0.662 0.907 0.162 0.812 0.319 0.261 0.707 0.700
llm-jp-3-1.8b 0.251 0.209 0.463 0.449 0.703 0.100 0.012 0.198 0.134 0.242 0.001
llm-jp-3-1.8b-instruct 0.293 0.324 0.413 0.466 0.837 0.105 0.080 0.206 0.142 0.292 0.061
llm-jp-3-3.7b 0.281 0.203 0.431 0.541 0.804 0.142 0.060 0.223 0.159 0.249 0.000
llm-jp-3-3.7b-instruct 0.350 0.533 0.464 0.528 0.847 0.139 0.152 0.224 0.170 0.359 0.085
llm-jp-3-13b 0.393 0.650 0.525 0.649 0.882 0.164 0.160 0.273 0.210 0.399 0.023
llm-jp-3-13b-instruct 0.436 0.894 0.339 0.638 0.901 0.151 0.324 0.252 0.203 0.468 0.188
Mistral-Nemo-Base-2407 (12B) 0.460 0.911 0.516 0.475 0.904 0.192 0.416 0.244 0.212 0.538 0.194
Mistral-NeMo-Instruct-2407 (12B) 0.500 0.927 0.497 0.484 0.905 0.176 0.552 0.240 0.205 0.548 0.469
Mistral-NeMo-Minitron 8B 0.444 0.887 0.486 0.374 0.902 0.157 0.424 0.186 0.193 0.494 0.332
Mistral-NeMo-Minitron 8B Instruct 0.478 0.892 0.498 0.380 0.578 0.000 0.556 0.199 0.193 0.510 0.496
Mistral-7B-v0.3 0.361 0.714 0.474 0.245 0.847 0.212 0.156 0.142 0.171 0.404 0.242
Mistral-7B-Instruct-v0.3 0.378 0.754 0.447 0.268 0.870 0.205 0.224 0.163 0.177 0.403 0.267
Mixtral-8x22B-v0.1 0.496 0.895 0.512 0.420 0.914 0.241 0.544 0.229 0.229 0.604 0.371
Mixtral-8x22B-Instruct-v0.1 0.532 0.903 0.498 0.446 0.918 0.207 0.696 0.233 0.232 0.602 0.588
Phi-3-Mini-128K-Instruct 0.382 0.720 0.394 0.208 0.832 0.132 0.408 0.150 0.136 0.409 0.428
Phi-4 0.580 0.945 0.608 0.507 0.923 0.219 0.796 0.283 0.231 0.689 0.598
PLaMo 2 1B 0.250 0.203 0.463 0.434 0.626 0.055 0.052 0.236 0.119 0.256 0.057
PLaMo 2 8B 0.481 0.909 0.474 0.655 0.910 0.120 0.508 0.280 0.205 0.536 0.213
Qwen2-7B 0.472 0.875 0.463 0.372 0.899 0.172 0.524 0.209 0.195 0.587 0.422
Qwen2-7B-Instruct 0.478 0.888 0.390 0.379 0.897 0.127 0.576 0.206 0.190 0.571 0.555
Qwen2-72B 0.593 0.960 0.620 0.561 0.926 0.238 0.768 0.275 0.241 0.782 0.561
Qwen2-72B-Instruct 0.598 0.963 0.628 0.557 0.920 0.166 0.780 0.260 0.232 0.771 0.701
Qwen2.5-0.5B 0.234 0.369 0.389 0.139 0.635 0.101 0.076 0.058 0.064 0.304 0.203
Qwen2.5-0.5B-Instruct 0.243 0.382 0.401 0.157 0.687 0.112 0.080 0.095 0.067 0.318 0.135
Qwen2.5-1.5B 0.372 0.800 0.383 0.241 0.849 0.143 0.292 0.132 0.134 0.438 0.308
Qwen2.5-1.5B-Instruct 0.355 0.812 0.276 0.240 0.847 0.128 0.292 0.147 0.119 0.447 0.242
Qwen2.5-3B 0.442 0.847 0.475 0.306 0.878 0.176 0.460 0.180 0.167 0.529 0.404
Qwen2.5-3B-Instruct 0.409 0.876 0.304 0.293 0.866 0.144 0.228 0.198 0.168 0.536 0.474
Qwen2.5-7B 0.512 0.924 0.459 0.426 0.907 0.216 0.616 0.229 0.199 0.634 0.507
Qwen2.5-7B-Instruct 0.498 0.915 0.429 0.391 0.891 0.168 0.632 0.210 0.192 0.623 0.532
Qwen2.5-14B-Instruct 0.553 0.953 0.588 0.519 0.902 0.140 0.680 0.193 0.160 0.708 0.691
Qwen2.5-32B-Instruct 0.571 0.959 0.567 0.497 0.903 0.169 0.780 0.228 0.195 0.757 0.651
Qwen2.5-72B 0.623 0.972 0.611 0.619 0.930 0.279 0.828 0.287 0.252 0.804 0.648
Qwen2.5-72B-Instruct 0.574 0.970 0.569 0.582 0.738 0.170 0.840 0.227 0.218 0.789 0.634
Sarashina2-7B 0.395 0.742 0.509 0.634 0.868 0.141 0.080 0.273 0.201 0.384 0.121
Sarashina2-13B 0.445 0.850 0.557 0.661 0.898 0.158 0.188 0.284 0.221 0.473 0.161
Sarashina2-70B 0.530 0.929 0.717 0.668 0.929 0.190 0.488 0.313 0.243 0.592 0.235
Stockmark-100b 0.238 0.205 0.408 0.557 0.558 0.062 0.008 0.203 0.118 0.235 0.032
Swallow 7B 0.346 0.483 0.511 0.585 0.847 0.182 0.108 0.250 0.149 0.324 0.018
Swallow 13B 0.415 0.764 0.507 0.643 0.893 0.215 0.208 0.272 0.178 0.439 0.027
Swallow 70B 0.519 0.920 0.626 0.689 0.920 0.225 0.480 0.304 0.231 0.579 0.220
Swallow-MS 7B v0.1 0.439 0.873 0.517 0.572 0.879 0.197 0.244 0.251 0.167 0.459 0.232
Swallow-MS-7b-instruct-v0.1 0.394 0.758 0.490 0.446 0.864 0.158 0.172 0.227 0.187 0.419 0.215
Swallow-MX 8x7B v0.1 0.506 0.922 0.533 0.577 0.917 0.263 0.444 0.272 0.209 0.565 0.358
Swallow-7b-instruct-v0.1 0.353 0.599 0.491 0.531 0.837 0.153 0.128 0.228 0.179 0.352 0.027
Swallow-70b-instruct-v0.1 0.492 0.923 0.566 0.565 0.903 0.186 0.420 0.263 0.232 0.571 0.293
Tanuki-8B-dpo-v1.0 0.311 0.278 0.284 0.370 0.670 0.102 0.428 0.238 0.183 0.306 0.251
Tanuki-8x8B-dpo-v1.0 0.454 0.708 0.551 0.612 0.867 0.142 0.456 0.269 0.208 0.439 0.284
TinySwallow-1.5B 0.402 0.840 0.437 0.474 0.839 0.173 0.256 0.201 0.125 0.446 0.231
TinySwallow-1.5B-Instruct 0.398 0.802 0.345 0.447 0.856 0.159 0.308 0.203 0.143 0.461 0.251
Yi-1.5 6B 0.354 0.658 0.380 0.226 0.829 0.198 0.240 0.130 0.147 0.423 0.313
Yi-1.5 9B 0.432 0.834 0.417 0.265 0.894 0.224 0.420 0.174 0.187 0.516 0.391
Yi-1.5 34B 0.468 0.869 0.461 0.332 0.899 0.238 0.520 0.219 0.208 0.591 0.346
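The Ja avg column appears to be the unweighted mean of the ten task scores to its right: for Aya Expanse 8B the ten scores sum to 4.449, which gives the reported 0.445. The sketch below checks this for one row. It assumes rows are whitespace-separated and that the model name is everything before the last eleven numeric fields; `parse_row` is a hypothetical helper written for illustration, not part of any leaderboard tooling.

```python
from statistics import mean


def parse_row(line, n_scores=11):
    """Split a leaderboard row into (model_name, [scores]).

    Model names contain spaces, so the last n_scores tokens are read as
    numbers and everything before them is joined back into the name.
    """
    tokens = line.split()
    name = " ".join(tokens[:-n_scores])
    scores = [float(t) for t in tokens[-n_scores:]]
    return name, scores


row = "Aya Expanse 8B 0.445 0.922 0.467 0.385 0.867 0.211 0.608 0.261 0.206 0.521 0.001"
name, scores = parse_row(row)

# First number is the reported average; the remaining ten are per-task scores.
reported_avg, task_scores = scores[0], scores[1:]
recomputed = mean(task_scores)

print(name, reported_avg, round(recomputed, 3))
```

The same check should apply to the English table (ten tasks) and, with `n_scores=9`, to the MT-Bench table (eight categories), provided the row has a full set of scores.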
Model En avg OpenBookQA TriviaQA HellaSwag SQuAD2 XWINO MMLU GSM8K MATH BBH HumanEval
Aya Expanse 8B 0.539 0.384 0.591 0.605 0.664 0.892 0.628 0.756 0.284 0.590 0.000
Aya Expanse 32B 0.614 0.420 0.757 0.668 0.679 0.912 0.744 0.858 0.344 0.757 0.005
CyberAgentLM3-22B-chat 0.527 0.372 0.619 0.598 0.603 0.905 0.603 0.698 0.274 0.599 0.000
Falcon3-1B-Base 0.376 0.316 0.296 0.458 0.501 0.816 0.449 0.337 0.140 0.323 0.125
Falcon3-1B-Instruct 0.381 0.344 0.261 0.480 0.501 0.815 0.459 0.391 0.130 0.330 0.101
Falcon3-3B-Base 0.495 0.312 0.346 0.492 0.503 0.847 0.567 0.634 0.344 0.553 0.348
Falcon3-3B-Instruct 0.526 0.372 0.286 0.541 0.513 0.818 0.562 0.712 0.440 0.562 0.454
Falcon3-7B-Base 0.596 0.354 0.552 0.566 0.539 0.881 0.701 0.766 0.438 0.692 0.476
Falcon3-7B-Instruct 0.618 0.394 0.517 0.611 0.525 0.855 0.705 0.773 0.542 0.711 0.551
Falcon3-10B-Base 0.639 0.368 0.579 0.596 0.603 0.901 0.732 0.802 0.492 0.776 0.543
Falcon3-10B-Instruct 0.633 0.424 0.503 0.640 0.549 0.875 0.730 0.793 0.462 0.729 0.627
Gemma 2 2B 0.439 0.342 0.552 0.552 0.501 0.890 0.530 0.249 0.176 0.415 0.188
Gemma 2 2B IT 0.489 0.354 0.502 0.520 0.548 0.878 0.569 0.440 0.230 0.464 0.382
Gemma 2 9B 0.597 0.382 0.718 0.626 0.506 0.907 0.706 0.688 0.338 0.704 0.390
Gemma 2 9B IT 0.649 0.432 0.658 0.605 0.659 0.904 0.723 0.779 0.394 0.719 0.613
Gemma 2 27B 0.655 0.412 0.780 0.675 0.549 0.921 0.754 0.757 0.438 0.760 0.508
Gemma 2 27B IT 0.703 0.458 0.766 0.655 0.669 0.909 0.762 0.851 0.466 0.790 0.707
Gemma 2 Baku 2B 0.400 0.314 0.475 0.533 0.501 0.881 0.493 0.168 0.110 0.376 0.150
Gemma 2 Baku 2B IT 0.361 0.342 0.416 0.511 0.522 0.871 0.526 0.026 0.174 0.063 0.158
Gemma 2 JPN 0.470 0.370 0.503 0.532 0.539 0.879 0.557 0.351 0.132 0.451 0.392
GPT-3.5 (gpt-3.5-turbo-0125)
GPT-4-turbo (gpt-4-turbo-2024-04-09)
GPT-4o (gpt-4o-2024-05-13)
GPT-4o (gpt-4o-2024-08-06)
GPT-4o-mini (gpt-4o-mini-2024-07-18)
Llama 3 8B 0.542 0.380 0.712 0.612 0.502 0.905 0.651 0.487 0.180 0.620 0.376
Llama 3 8B Instruct 0.605 0.388 0.670 0.583 0.611 0.892 0.657 0.745 0.306 0.646 0.554
Llama 3 70B 0.689 0.440 0.826 0.690 0.618 0.920 0.787 0.801 0.446 0.829 0.527
Llama 3 70B Instruct 0.729 0.438 0.800 0.655 0.696 0.914 0.800 0.909 0.474 0.833 0.774
Llama-3-ELYZA-JP-8B 0.495 0.318 0.551 0.523 0.600 0.882 0.587 0.558 0.164 0.321 0.449
Llama 3 heron brain 8B v0.3 0.551 0.362 0.656 0.569 0.581 0.901 0.622 0.578 0.222 0.641 0.381
Llama 3 heron brain 70B v0.3 0.715 0.446 0.811 0.668 0.706 0.919 0.790 0.877 0.508 0.759 0.668
Llama 3 Swallow 8B 0.523 0.350 0.656 0.590 0.519 0.901 0.615 0.483 0.182 0.598 0.337
Llama 3 Swallow 8B Instruct 0.560 0.370 0.655 0.585 0.567 0.899 0.633 0.592 0.244 0.639 0.419
Llama 3 Swallow 70B 0.672 0.430 0.823 0.682 0.628 0.923 0.774 0.817 0.414 0.734 0.499
Llama 3 Swallow 70B Instruct 0.716 0.446 0.818 0.676 0.681 0.923 0.789 0.868 0.460 0.816 0.680
Llama 3 Youko 8B 0.486 0.348 0.625 0.589 0.502 0.896 0.601 0.355 0.096 0.571 0.281
Llama 3 Youko 8B Instruct 0.507 0.406 0.613 0.599 0.559 0.897 0.597 0.562 0.152 0.402 0.287
Llama 3 Youko 70B 0.671 0.436 0.829 0.690 0.610 0.922 0.785 0.797 0.408 0.826 0.412
Llama 3 Youko 70B Instruct 0.709 0.454 0.797 0.686 0.659 0.915 0.805 0.892 0.434 0.780 0.662
Llama 3.1 8B 0.545 0.380 0.702 0.609 0.503 0.907 0.651 0.507 0.214 0.616 0.364
Llama 3.1 8B Instruct 0.627 0.366 0.699 0.592 0.600 0.904 0.680 0.743 0.376 0.690 0.624
Llama 3.1 70B 0.671 0.450 0.829 0.690 0.605 0.920 0.786 0.798 0.434 0.655 0.546
Llama 3.1 70B Instruct 0.738 0.426 0.821 0.662 0.660 0.917 0.822 0.876 0.560 0.842 0.794
Llama-3.1-70B-Japanese-Instruct-2407 0.725 0.422 0.810 0.647 0.663 0.917 0.807 0.889 0.528 0.823 0.746
Llama 3.1 Swallow 8B v0.1 0.538 0.378 0.671 0.605 0.502 0.905 0.624 0.511 0.224 0.615 0.348
Llama 3.1 Swallow 8B Instruct v0.1 0.563 0.388 0.649 0.615 0.598 0.891 0.624 0.605 0.236 0.642 0.379
Llama 3.1 Swallow 70B v0.1 0.679 0.428 0.826 0.690 0.612 0.927 0.772 0.809 0.380 0.806 0.540
Llama 3.1 Swallow 70B Instruct v0.1 0.710 0.446 0.815 0.683 0.681 0.917 0.787 0.884 0.474 0.848 0.568
Llama 3.1 Swallow 8B v0.2 0.539 0.382 0.651 0.596 0.513 0.904 0.622 0.521 0.228 0.605 0.366
Llama 3.1 Swallow 8B Instruct v0.2 0.574 0.380 0.625 0.603 0.607 0.887 0.634 0.620 0.264 0.649 0.474
Llama 3.1 Swallow 8B Instruct v0.3 0.566 0.396 0.629 0.593 0.570 0.884 0.629 0.622 0.266 0.626 0.445
Llama 3.1 Swallow 70B Instruct v0.3 0.710 0.454 0.825 0.692 0.647 0.919 0.777 0.872 0.458 0.816 0.643
Llama 3.2 1B 0.339 0.300 0.388 0.477 0.501 0.849 0.313 0.049 0.020 0.303 0.193
Llama 3.2 1B Instruct 0.408 0.274 0.375 0.440 0.501 0.837 0.454 0.318 0.172 0.362 0.347
Llama 3.2 3B 0.450 0.326 0.586 0.558 0.502 0.888 0.558 0.262 0.070 0.466 0.285
Llama 3.2 3B Instruct 0.537 0.306 0.556 0.524 0.540 0.874 0.597 0.629 0.324 0.512 0.511
Llama 3.3 70B Instruct 0.762 0.426 0.817 0.667 0.684 0.917 0.824 0.890 0.706 0.853 0.834
Llama 3.3 Swallow 70B v0.4 0.711 0.424 0.817 0.683 0.641 0.920 0.802 0.863 0.496 0.754 0.709
Llama 3.3 Swallow 70B Instruct v0.4 0.736 0.448 0.817 0.686 0.654 0.912 0.803 0.907 0.566 0.812 0.750
llm-jp-3-1.8b 0.293 0.244 0.301 0.462 0.501 0.851 0.248 0.017 0.018 0.276 0.008
llm-jp-3-1.8b-instruct 0.313 0.286 0.296 0.485 0.502 0.847 0.277 0.043 0.016 0.290 0.087
llm-jp-3-3.7b 0.324 0.280 0.421 0.506 0.502 0.876 0.253 0.055 0.016 0.309 0.019
llm-jp-3-3.7b-instruct 0.347 0.310 0.398 0.534 0.503 0.862 0.349 0.071 0.022 0.324 0.099
llm-jp-3-13b 0.399 0.332 0.602 0.570 0.501 0.902 0.462 0.158 0.026 0.402 0.032
llm-jp-3-13b-instruct 0.432 0.342 0.534 0.594 0.516 0.892 0.506 0.243 0.046 0.438 0.205
Mistral-Nemo-Base-2407 (12B) 0.559 0.422 0.741 0.647 0.528 0.914 0.690 0.550 0.184 0.657 0.259
Mistral-NeMo-Instruct-2407 (12B) 0.608 0.406 0.726 0.645 0.606 0.911 0.683 0.721 0.274 0.537 0.571
Mistral-NeMo-Minitron 8B 0.572 0.406 0.728 0.621 0.525 0.915 0.694 0.585 0.202 0.658 0.382
Mistral-NeMo-Minitron 8B Instruct 0.634 0.452 0.719 0.639 0.624 0.909 0.701 0.754 0.274 0.663 0.601
Mistral-7B-v0.3 0.507 0.374 0.695 0.622 0.511 0.909 0.623 0.361 0.116 0.585 0.273
Mistral-7B-Instruct-v0.3 0.541 0.408 0.677 0.652 0.576 0.905 0.621 0.500 0.160 0.563 0.346
Mixtral-8x22B-v0.1 0.652 0.420 0.833 0.696 0.593 0.919 0.772 0.754 0.414 0.811 0.309
Mixtral-8x22B-Instruct-v0.1 0.720 0.450 0.827 0.708 0.676 0.920 0.774 0.832 0.456 0.830 0.723
Phi-3-Mini-128K-Instruct 0.615 0.422 0.526 0.605 0.559 0.871 0.695 0.759 0.368 0.711 0.627
Phi-4 0.677 0.378 0.682 0.647 0.646 0.903 0.802 0.899 0.556 0.654 0.601
PLaMo 2 1B 0.274 0.280 0.129 0.425 0.501 0.807 0.294 0.072 0.034 0.122 0.080
PLaMo 2 8B 0.474 0.346 0.584 0.560 0.511 0.890 0.575 0.550 0.200 0.260 0.260
Qwen2-7B 0.602 0.374 0.610 0.602 0.574 0.891 0.705 0.781 0.492 0.530 0.460
Qwen2-7B-Instruct 0.582 0.396 0.547 0.615 0.593 0.886 0.707 0.626 0.504 0.304 0.643
Qwen2-72B 0.702 0.418 0.790 0.677 0.673 0.915 0.842 0.893 0.560 0.643 0.608
Qwen2-72B-Instruct 0.669 0.444 0.759 0.685 0.685 0.911 0.840 0.848 0.634 0.193 0.688
Qwen2.5-0.5B 0.365 0.266 0.190 0.399 0.501 0.768 0.479 0.341 0.148 0.277 0.277
Qwen2.5-0.5B-Instruct 0.336 0.272 0.184 0.398 0.501 0.767 0.471 0.190 0.236 0.105 0.240
Qwen2.5-1.5B 0.490 0.342 0.397 0.499 0.506 0.851 0.610 0.611 0.314 0.413 0.356
Qwen2.5-1.5B-Instruct 0.424 0.334 0.378 0.503 0.501 0.844 0.604 0.257 0.272 0.272 0.277
Qwen2.5-3B 0.534 0.360 0.504 0.553 0.541 0.872 0.657 0.580 0.440 0.442 0.387
Qwen2.5-3B-Instruct 0.472 0.364 0.446 0.562 0.504 0.869 0.664 0.096 0.612 0.128 0.471
Qwen2.5-7B 0.630 0.392 0.601 0.600 0.618 0.888 0.742 0.832 0.510 0.562 0.554
Qwen2.5-7B-Instruct 0.604 0.428 0.519 0.624 0.569 0.877 0.742 0.739 0.688 0.217 0.636
Qwen2.5-14B-Instruct 0.614 0.438 0.592 0.656 0.680 0.890 0.800 0.761 0.666 0.029 0.632
Qwen2.5-32B-Instruct 0.588 0.424 0.534 0.671 0.536 0.893 0.834 0.581 0.802 0.017 0.589
Qwen2.5-72B 0.709 0.416 0.760 0.685 0.693 0.901 0.861 0.870 0.626 0.727 0.554
Qwen2.5-72B-Instruct 0.691 0.454 0.676 0.706 0.677 0.889 0.848 0.904 0.770 0.375 0.614
Sarashina2-7B 0.383 0.346 0.479 0.532 0.501 0.892 0.425 0.101 0.034 0.373 0.146
Sarashina2-13B 0.418 0.340 0.548 0.562 0.501 0.896 0.496 0.158 0.036 0.442 0.198
Sarashina2-70B 0.491 0.388 0.537 0.628 0.675 0.917 0.630 0.011 0.206 0.639 0.281
Stockmark-100b 0.302 0.278 0.366 0.458 0.501 0.820 0.258 0.017 0.014 0.259 0.046
Swallow 7B 0.363 0.312 0.491 0.527 0.501 0.885 0.391 0.103 0.020 0.354 0.041
Swallow 13B 0.412 0.344 0.580 0.560 0.502 0.902 0.501 0.197 0.024 0.430 0.080
Swallow 70B 0.543 0.416 0.761 0.643 0.522 0.920 0.659 0.503 0.108 0.655 0.240
Swallow-MS 7B v0.1 0.461 0.352 0.599 0.579 0.501 0.901 0.548 0.268 0.096 0.491 0.270
Swallow-MS-7b-instruct-v0.1 0.436 0.360 0.500 0.587 0.510 0.886 0.526 0.215 0.082 0.441 0.256
Swallow-MX 8x7B v0.1 0.589 0.348 0.773 0.651 0.538 0.919 0.692 0.574 0.298 0.686 0.410
Swallow-7b-instruct-v0.1 0.376 0.330 0.481 0.550 0.501 0.880 0.407 0.124 0.034 0.359 0.094
Swallow-70b-instruct-v0.1 0.556 0.446 0.742 0.656 0.571 0.917 0.668 0.509 0.108 0.664 0.281
Tanuki-8B-dpo-v1.0 0.406 0.334 0.283 0.469 0.501 0.816 0.377 0.487 0.178 0.333 0.288
Tanuki-8x8B-dpo-v1.0 0.464 0.348 0.481 0.555 0.521 0.850 0.493 0.544 0.236 0.419 0.193
TinySwallow-1.5B 0.413 0.308 0.332 0.468 0.501 0.850 0.546 0.379 0.162 0.328 0.254
TinySwallow-1.5B-Instruct 0.411 0.310 0.309 0.487 0.501 0.843 0.560 0.398 0.162 0.251 0.294
Yi-1.5 6B 0.540 0.344 0.593 0.575 0.651 0.898 0.636 0.522 0.244 0.583 0.352
Yi-1.5 9B 0.592 0.390 0.619 0.601 0.693 0.902 0.696 0.620 0.300 0.710 0.384
Yi-1.5 34B 0.650 0.402 0.708 0.662 0.754 0.910 0.774 0.743 0.394 0.763 0.385
Model JMT avg Code Ext Human Math Reason Role STEM Write
Aya Expanse 8B 0.637 0.494 0.718 0.855 0.398 0.433 0.737 0.677 0.787
Aya Expanse 32B 0.713 0.548 0.720 0.846 0.657 0.602 0.824 0.712 0.794
CyberAgentLM3-22B-chat 0.691 0.519 0.744 0.859 0.605 0.548 0.784 0.700 0.772
Falcon3-1B-Base
Falcon3-1B-Instruct 0.161 0.176 0.178 0.121 0.161 0.224 0.154 0.124 0.148
Falcon3-3B-Base
Falcon3-3B-Instruct 0.260 0.329 0.392 0.219 0.199 0.267 0.234 0.229 0.208
Falcon3-7B-Base
Falcon3-7B-Instruct 0.377 0.549 0.506 0.340 0.406 0.257 0.299 0.340 0.317
Falcon3-10B-Base
Falcon3-10B-Instruct 0.413 0.509 0.545 0.382 0.480 0.356 0.335 0.373 0.324
Gemma 2 2B
Gemma 2 2B IT 0.569 0.454 0.587 0.693 0.524 0.445 0.654 0.567 0.630
Gemma 2 9B
Gemma 2 9B IT 0.736 0.652 0.765 0.857 0.614 0.673 0.811 0.713 0.800
Gemma 2 27B
Gemma 2 27B IT 0.768 0.727 0.809 0.874 0.719 0.639 0.810 0.740 0.826
Gemma 2 Baku 2B
Gemma 2 Baku 2B IT 0.590 0.470 0.625 0.810 0.414 0.382 0.713 0.609 0.697
Gemma 2 JPN 0.550 0.467 0.488 0.741 0.379 0.406 0.660 0.589 0.672
GPT-3.5 (gpt-3.5-turbo-0125) 0.691 0.693 0.789 0.773 0.665 0.462 0.728 0.644 0.775
GPT-4-turbo (gpt-4-turbo-2024-04-09) 0.837 0.842 0.891 0.863 0.865 0.673 0.861 0.844 0.854
GPT-4o (gpt-4o-2024-05-13) 0.848 0.859 0.930 0.882 0.917 0.631 0.858 0.858 0.851
GPT-4o (gpt-4o-2024-08-06) 0.848 0.855 0.926 0.880 0.872 0.706 0.862 0.838 0.849
GPT-4o-mini (gpt-4o-mini-2024-07-18) 0.824 0.825 0.865 0.857 0.843 0.665 0.846 0.855 0.840
Llama 3 8B
Llama 3 8B Instruct 0.529 0.467 0.706 0.692 0.310 0.433 0.542 0.532 0.546
Llama 3 70B
Llama 3 70B Instruct 0.640 0.588 0.884 0.715 0.637 0.487 0.594 0.598 0.619
Llama-3-ELYZA-JP-8B 0.587 0.389 0.706 0.647 0.426 0.613 0.684 0.533 0.697
Llama 3 heron brain 8B v0.3 0.497 0.362 0.566 0.602 0.315 0.426 0.586 0.567 0.550
Llama 3 heron brain 70B v0.3 0.683 0.510 0.870 0.776 0.680 0.513 0.727 0.692 0.693
Llama 3 Swallow 8B
Llama 3 Swallow 8B Instruct 0.427 0.411 0.575 0.476 0.309 0.305 0.499 0.438 0.406
Llama 3 Swallow 70B
Llama 3 Swallow 70B Instruct 0.618 0.633 0.823 0.601 0.521 0.482 0.622 0.635 0.630
Llama 3 Youko 8B
Llama 3 Youko 8B Instruct 0.616 0.464 0.757 0.769 0.414 0.487 0.695 0.583 0.753
Llama 3 Youko 70B
Llama 3 Youko 70B Instruct 0.750 0.607 0.894 0.834 0.609 0.673 0.790 0.764 0.829
Llama 3.1 8B
Llama 3.1 8B Instruct 0.519 0.420 0.830 0.550 0.514 0.349 0.502 0.479 0.504
Llama 3.1 70B
Llama 3.1 70B Instruct 0.706 0.691 0.848 0.730 0.669 0.618 0.699 0.699 0.694
Llama-3.1-70B-Japanese-Instruct-2407 0.751 0.683 0.827 0.824 0.749 0.643 0.818 0.715 0.751
Llama 3.1 Swallow 8B v0.1
Llama 3.1 Swallow 8B Instruct v0.1 0.581 0.427 0.738 0.675 0.527 0.453 0.615 0.593 0.624
Llama 3.1 Swallow 70B v0.1
Llama 3.1 Swallow 70B Instruct v0.1 0.691 0.654 0.792 0.768 0.704 0.573 0.682 0.653 0.704
Llama 3.1 Swallow 8B v0.2
Llama 3.1 Swallow 8B Instruct v0.2 0.612 0.534 0.748 0.705 0.565 0.475 0.646 0.579 0.646
Llama 3.1 Swallow 8B Instruct v0.3 0.705 0.562 0.756 0.869 0.610 0.512 0.783 0.748 0.803
Llama 3.1 Swallow 70B Instruct v0.3 0.769 0.678 0.820 0.867 0.776 0.570 0.816 0.769 0.852
Llama 3.2 1B
Llama 3.2 1B Instruct 0.273 0.254 0.376 0.218 0.307 0.267 0.262 0.246 0.258
Llama 3.2 3B
Llama 3.2 3B Instruct 0.405 0.426 0.593 0.431 0.389 0.292 0.350 0.380 0.380
Llama 3.3 70B Instruct 0.737 0.707 0.865 0.757 0.720 0.635 0.773 0.706 0.733
Llama 3.3 Swallow 70B v0.4
Llama 3.3 Swallow 70B Instruct v0.4 0.772 0.705 0.820 0.870 0.730 0.623 0.811 0.781 0.832
llm-jp-3-1.8b
llm-jp-3-1.8b-instruct 0.451 0.274 0.321 0.680 0.281 0.301 0.628 0.504 0.617
llm-jp-3-3.7b
llm-jp-3-3.7b-instruct 0.485 0.311 0.418 0.730 0.311 0.339 0.618 0.551 0.600
llm-jp-3-13b
llm-jp-3-13b-instruct 0.588 0.373 0.556 0.816 0.371 0.526 0.730 0.614 0.715
Mistral-Nemo-Base-2407 (12B)
Mistral-NeMo-Instruct-2407 (12B) 0.616 0.515 0.698 0.702 0.512 0.481 0.669 0.660 0.691
Mistral-NeMo-Minitron 8B
Mistral-NeMo-Minitron 8B Instruct 0.567 0.547 0.684 0.649 0.545 0.454 0.564 0.549 0.541
Mistral-7B-v0.3
Mistral-7B-Instruct-v0.3 0.428 0.488 0.540 0.435 0.354 0.392 0.409 0.405 0.401
Mixtral-8x22B-v0.1
Mixtral-8x22B-Instruct-v0.1 0.622 0.591 0.797 0.606 0.585 0.557 0.618 0.565 0.658
Phi-3-Mini-128K-Instruct 0.524 0.535 0.680 0.553 0.514 0.416 0.505 0.465 0.525
Phi-4 0.769 0.692 0.929 0.795 0.914 0.544 0.754 0.688 0.840
PLaMo 2 1B
PLaMo 2 8B
Qwen2-7B
Qwen2-7B-Instruct 0.646 0.512 0.771 0.719 0.687 0.514 0.683 0.563 0.717
Qwen2-72B
Qwen2-72B-Instruct 0.756 0.632 0.800 0.842 0.688 0.616 0.824 0.797 0.846
Qwen2.5-0.5B
Qwen2.5-0.5B-Instruct 0.294 0.335 0.284 0.285 0.317 0.248 0.294 0.279 0.313
Qwen2.5-1.5B
Qwen2.5-1.5B-Instruct 0.450 0.408 0.513 0.456 0.527 0.352 0.473 0.406 0.469
Qwen2.5-3B
Qwen2.5-3B-Instruct 0.593 0.567 0.647 0.597 0.665 0.457 0.649 0.526 0.637
Qwen2.5-7B
Qwen2.5-7B-Instruct 0.665 0.599 0.741 0.719 0.637 0.541 0.744 0.624 0.713
Qwen2.5-14B-Instruct 0.762 0.673 0.829 0.798 0.828 0.571 0.815 0.743 0.841
Qwen2.5-32B-Instruct 0.809 0.724 0.885 0.816 0.918 0.726 0.834 0.763 0.808
Qwen2.5-72B
Qwen2.5-72B-Instruct 0.835 0.795 0.860 0.865 0.857 0.784 0.863 0.804 0.854
Sarashina2-7B
Sarashina2-13B
Sarashina2-70B
Stockmark-100b
Swallow 7B
Swallow 13B
Swallow 70B
Swallow-MS 7B v0.1
Swallow-MS-7b-instruct-v0.1 0.400 0.358 0.421 0.501 0.222 0.349 0.458 0.444 0.449
Swallow-MX 8x7B v0.1
Swallow-7b-instruct-v0.1 0.419 0.324 0.401 0.519 0.275 0.344 0.535 0.494 0.462
Swallow-70b-instruct-v0.1 0.509 0.381 0.604 0.568 0.464 0.402 0.583 0.557 0.510
Tanuki-8B-dpo-v1.0 0.529 0.461 0.597 0.562 0.495 0.377 0.589 0.509 0.643
Tanuki-8x8B-dpo-v1.0 0.546 0.513 0.489 0.624 0.557 0.445 0.604 0.547 0.594
TinySwallow-1.5B
TinySwallow-1.5B-Instruct 0.565 0.434 0.572 0.772 0.453 0.392 0.645 0.610 0.643
Yi-1.5 6B
Yi-1.5 9B
Yi-1.5 34B
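
Several rows in the tables above list a model name with no scores (the GPT models in the English table, and most base models in the MT-Bench table). Reading these as "not evaluated" is an assumption; the source does not say why the values are absent. A minimal sketch of a parser that tolerates such rows follows. `parse_optional_scores` is a hypothetical helper, and it relies on every model name ending in a token that does not parse as a number, which holds for the names listed here.

```python
def parse_optional_scores(line):
    """Return (model_name, scores), with scores set to None when absent.

    Tokens are consumed from the end of the line for as long as they parse
    as floats; whatever remains is treated as the model name.
    """
    tokens = line.split()
    scores = []
    while tokens:
        try:
            scores.append(float(tokens[-1]))
        except ValueError:
            break  # reached the last token of the model name
        tokens.pop()
    scores.reverse()  # restore left-to-right column order
    return " ".join(tokens), (scores or None)


rows = [
    "Llama 3 8B",
    "Llama 3 8B Instruct 0.529 0.467 0.706 0.692 0.310 0.433 0.542 0.532 0.546",
]
for row in rows:
    model, values = parse_optional_scores(row)
    status = "no scores listed" if values is None else f"{len(values)} values"
    print(f"{model}: {status}")
```

Keeping missing rows as `None` rather than zero avoids pulling a model's rank or average down for benchmarks it has no listed result on.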