| Model name | Ja avg | JComQA | JEMHQA | NIILC | JSQuAD | XL-Sum | MGSM | En-Ja | Ja-En | JMMLU | JHumanEval |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Aya Expanse 32B | 0.512 | 0.965 | 0.554 | 0.586 | 0.812 | 0.295 | 0.716 | 0.287 | 0.245 | 0.655 | 0.001 |
| CyberAgentLM3-22B-chat | 0.471 | 0.934 | 0.510 | 0.648 | 0.911 | 0.104 | 0.576 | 0.275 | 0.215 | 0.541 | 0.001 |
| Gemma 2 27B IT | 0.567 | 0.956 | 0.541 | 0.576 | 0.883 | 0.166 | 0.704 | 0.290 | 0.249 | 0.670 | 0.638 |
| GPT-3.5 (gpt-3.5-turbo-0125) | 0.515 | 0.922 | 0.456 | 0.447 | 0.893 | 0.215 | 0.572 | 0.287 | 0.243 | 0.499 | 0.616 |
| GPT-4-turbo (gpt-4-turbo-2024-04-09) | 0.626 | 0.971 | 0.690 | 0.615 | 0.878 | 0.201 | 0.848 | 0.295 | 0.239 | 0.753 | 0.773 |
| GPT-4o (gpt-4o-2024-05-13) | 0.649 | 0.979 | 0.737 | 0.722 | 0.892 | 0.140 | 0.860 | 0.314 | 0.237 | 0.794 | 0.813 |
| GPT-4o (gpt-4o-2024-08-06) | 0.646 | 0.982 | 0.731 | 0.709 | 0.889 | 0.170 | 0.864 | 0.314 | 0.254 | 0.797 | 0.752 |
| GPT-4o-mini (gpt-4o-mini-2024-07-18) | 0.581 | 0.961 | 0.464 | 0.591 | 0.902 | 0.160 | 0.832 | 0.299 | 0.241 | 0.679 | 0.675 |
| Llama 3 70B Instruct | 0.578 | 0.940 | 0.615 | 0.557 | 0.913 | 0.191 | 0.716 | 0.270 | 0.234 | 0.680 | 0.662 |
| Llama 3 heron brain 70B v0.3 | 0.615 | 0.965 | 0.652 | 0.679 | 0.922 | 0.261 | 0.772 | 0.309 | 0.258 | 0.707 | 0.623 |
| Llama 3 Swallow 70B Instruct | 0.571 | 0.963 | 0.627 | 0.598 | 0.921 | 0.139 | 0.672 | 0.272 | 0.255 | 0.657 | 0.608 |
| Llama 3 Youko 70B Instruct | 0.582 | 0.952 | 0.625 | 0.584 | 0.921 | 0.198 | 0.720 | 0.263 | 0.226 | 0.718 | 0.610 |
| Llama 3.1 70B Instruct | 0.595 | 0.950 | 0.635 | 0.579 | 0.921 | 0.178 | 0.732 | 0.279 | 0.247 | 0.733 | 0.696 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 0.597 | 0.956 | 0.647 | 0.660 | 0.919 | 0.156 | 0.748 | 0.290 | 0.241 | 0.723 | 0.627 |
| Llama 3.1 Swallow 70B Instruct v0.1 | 0.588 | 0.962 | 0.621 | 0.660 | 0.924 | 0.192 | 0.776 | 0.312 | 0.259 | 0.711 | 0.468 |
| Llama 3.1 Swallow 70B Instruct v0.3 | 0.598 | 0.964 | 0.632 | 0.654 | 0.910 | 0.196 | 0.772 | 0.305 | 0.257 | 0.690 | 0.596 |
| Llama 3.3 70B Instruct | 0.601 | 0.941 | 0.640 | 0.570 | 0.893 | 0.179 | 0.784 | 0.278 | 0.243 | 0.735 | 0.744 |
| Llama 3.3 Swallow 70B Instruct v0.4 | 0.613 | 0.981 | 0.618 | 0.662 | 0.907 | 0.162 | 0.812 | 0.319 | 0.261 | 0.707 | 0.700 |
| Qwen2-72B-Instruct | 0.598 | 0.963 | 0.628 | 0.557 | 0.920 | 0.166 | 0.780 | 0.260 | 0.232 | 0.771 | 0.701 |
| Qwen2.5-32B-Instruct | 0.571 | 0.959 | 0.567 | 0.497 | 0.903 | 0.169 | 0.780 | 0.228 | 0.195 | 0.757 | 0.651 |
| Qwen2.5-72B-Instruct | 0.574 | 0.970 | 0.569 | 0.582 | 0.738 | 0.170 | 0.840 | 0.227 | 0.218 | 0.789 | 0.634 |
| Swallow-70b-instruct-v0.1 | 0.492 | 0.923 | 0.566 | 0.565 | 0.903 | 0.186 | 0.420 | 0.263 | 0.232 | 0.571 | 0.293 |
| Tanuki-8x8B-dpo-v1.0 | 0.454 | 0.708 | 0.551 | 0.612 | 0.867 | 0.142 | 0.456 | 0.269 | 0.208 | 0.439 | 0.284 |

| Model name | En avg | OpenBookQA | TriviaQA | HellaSwag | SQuAD2 | XWINO | MMLU | GSM8K | MATH | BBH | HumanEval |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Aya Expanse 32B | 0.614 | 0.420 | 0.757 | 0.668 | 0.679 | 0.912 | 0.744 | 0.858 | 0.344 | 0.757 | 0.005 |
| CyberAgentLM3-22B-chat | 0.527 | 0.372 | 0.619 | 0.598 | 0.603 | 0.905 | 0.603 | 0.698 | 0.274 | 0.599 | 0.000 |
| Gemma 2 27B IT | 0.703 | 0.458 | 0.766 | 0.655 | 0.669 | 0.909 | 0.762 | 0.851 | 0.466 | 0.790 | 0.707 |
| GPT-3.5 (gpt-3.5-turbo-0125) | | | | | | | | | | | |
| GPT-4-turbo (gpt-4-turbo-2024-04-09) | | | | | | | | | | | |
| GPT-4o (gpt-4o-2024-05-13) | | | | | | | | | | | |
| GPT-4o (gpt-4o-2024-08-06) | | | | | | | | | | | |
| GPT-4o-mini (gpt-4o-mini-2024-07-18) | | | | | | | | | | | |
| Llama 3 70B Instruct | 0.729 | 0.438 | 0.800 | 0.655 | 0.696 | 0.914 | 0.800 | 0.909 | 0.474 | 0.833 | 0.774 |
| Llama 3 heron brain 70B v0.3 | 0.715 | 0.446 | 0.811 | 0.668 | 0.706 | 0.919 | 0.790 | 0.877 | 0.508 | 0.759 | 0.668 |
| Llama 3 Swallow 70B Instruct | 0.716 | 0.446 | 0.818 | 0.676 | 0.681 | 0.923 | 0.789 | 0.868 | 0.460 | 0.816 | 0.680 |
| Llama 3 Youko 70B Instruct | 0.709 | 0.454 | 0.797 | 0.686 | 0.659 | 0.915 | 0.805 | 0.892 | 0.434 | 0.780 | 0.662 |
| Llama 3.1 70B Instruct | 0.738 | 0.426 | 0.821 | 0.662 | 0.660 | 0.917 | 0.822 | 0.876 | 0.560 | 0.842 | 0.794 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 0.725 | 0.422 | 0.810 | 0.647 | 0.663 | 0.917 | 0.807 | 0.889 | 0.528 | 0.823 | 0.746 |
| Llama 3.1 Swallow 70B Instruct v0.1 | 0.710 | 0.446 | 0.815 | 0.683 | 0.681 | 0.917 | 0.787 | 0.884 | 0.474 | 0.848 | 0.568 |
| Llama 3.1 Swallow 70B Instruct v0.3 | 0.710 | 0.454 | 0.825 | 0.692 | 0.647 | 0.919 | 0.777 | 0.872 | 0.458 | 0.816 | 0.643 |
| Llama 3.3 70B Instruct | 0.762 | 0.426 | 0.817 | 0.667 | 0.684 | 0.917 | 0.824 | 0.890 | 0.706 | 0.853 | 0.834 |
| Llama 3.3 Swallow 70B Instruct v0.4 | 0.736 | 0.448 | 0.817 | 0.686 | 0.654 | 0.912 | 0.803 | 0.907 | 0.566 | 0.812 | 0.750 |
| Qwen2-72B-Instruct | 0.669 | 0.444 | 0.759 | 0.685 | 0.685 | 0.911 | 0.840 | 0.848 | 0.634 | 0.193 | 0.688 |
| Qwen2.5-32B-Instruct | 0.588 | 0.424 | 0.534 | 0.671 | 0.536 | 0.893 | 0.834 | 0.581 | 0.802 | 0.017 | 0.589 |
| Qwen2.5-72B-Instruct | 0.691 | 0.454 | 0.676 | 0.706 | 0.677 | 0.889 | 0.848 | 0.904 | 0.770 | 0.375 | 0.614 |
| Swallow-70b-instruct-v0.1 | 0.556 | 0.446 | 0.742 | 0.656 | 0.571 | 0.917 | 0.668 | 0.509 | 0.108 | 0.664 | 0.281 |
| Tanuki-8x8B-dpo-v1.0 | 0.464 | 0.348 | 0.481 | 0.555 | 0.521 | 0.850 | 0.493 | 0.544 | 0.236 | 0.419 | 0.193 |

| Model name | JMT avg | Coding | Extraction | Humanities | Math | Reasoning | Roleplay | STEM | Writing |
|---|---|---|---|---|---|---|---|---|---|
| Aya Expanse 32B | 0.713 | 0.548 | 0.720 | 0.846 | 0.657 | 0.602 | 0.824 | 0.712 | 0.794 |
| CyberAgentLM3-22B-chat | 0.691 | 0.519 | 0.744 | 0.859 | 0.605 | 0.548 | 0.784 | 0.700 | 0.772 |
| Gemma 2 27B IT | 0.768 | 0.727 | 0.809 | 0.874 | 0.719 | 0.639 | 0.810 | 0.740 | 0.826 |
| GPT-3.5 (gpt-3.5-turbo-0125) | 0.691 | 0.693 | 0.789 | 0.773 | 0.665 | 0.462 | 0.728 | 0.644 | 0.775 |
| GPT-4-turbo (gpt-4-turbo-2024-04-09) | 0.837 | 0.842 | 0.891 | 0.863 | 0.865 | 0.673 | 0.861 | 0.844 | 0.854 |
| GPT-4o (gpt-4o-2024-05-13) | 0.848 | 0.859 | 0.930 | 0.882 | 0.917 | 0.631 | 0.858 | 0.858 | 0.851 |
| GPT-4o (gpt-4o-2024-08-06) | 0.848 | 0.855 | 0.926 | 0.880 | 0.872 | 0.706 | 0.862 | 0.838 | 0.849 |
| GPT-4o-mini (gpt-4o-mini-2024-07-18) | 0.824 | 0.825 | 0.865 | 0.857 | 0.843 | 0.665 | 0.846 | 0.855 | 0.840 |
| Llama 3 70B Instruct | 0.640 | 0.588 | 0.884 | 0.715 | 0.637 | 0.487 | 0.594 | 0.598 | 0.619 |
| Llama 3 heron brain 70B v0.3 | 0.683 | 0.510 | 0.870 | 0.776 | 0.680 | 0.513 | 0.727 | 0.692 | 0.693 |
| Llama 3 Swallow 70B Instruct | 0.618 | 0.633 | 0.823 | 0.601 | 0.521 | 0.482 | 0.622 | 0.635 | 0.630 |
| Llama 3 Youko 70B Instruct | 0.750 | 0.607 | 0.894 | 0.834 | 0.609 | 0.673 | 0.790 | 0.764 | 0.829 |
| Llama 3.1 70B Instruct | 0.706 | 0.691 | 0.848 | 0.730 | 0.669 | 0.618 | 0.699 | 0.699 | 0.694 |
| Llama-3.1-70B-Japanese-Instruct-2407 | 0.751 | 0.683 | 0.827 | 0.824 | 0.749 | 0.643 | 0.818 | 0.715 | 0.751 |
| Llama 3.1 Swallow 70B Instruct v0.1 | 0.691 | 0.654 | 0.792 | 0.768 | 0.704 | 0.573 | 0.682 | 0.653 | 0.704 |
| Llama 3.1 Swallow 70B Instruct v0.3 | 0.769 | 0.678 | 0.820 | 0.867 | 0.776 | 0.570 | 0.816 | 0.769 | 0.852 |
| Llama 3.3 70B Instruct | 0.737 | 0.707 | 0.865 | 0.757 | 0.720 | 0.635 | 0.773 | 0.706 | 0.733 |
| Llama 3.3 Swallow 70B Instruct v0.4 | 0.772 | 0.705 | 0.820 | 0.870 | 0.730 | 0.623 | 0.811 | 0.781 | 0.832 |
| Qwen2-72B-Instruct | 0.756 | 0.632 | 0.800 | 0.842 | 0.688 | 0.616 | 0.824 | 0.797 | 0.846 |
| Qwen2.5-32B-Instruct | 0.809 | 0.724 | 0.885 | 0.816 | 0.918 | 0.726 | 0.834 | 0.763 | 0.808 |
| Qwen2.5-72B-Instruct | 0.835 | 0.795 | 0.860 | 0.865 | 0.857 | 0.784 | 0.863 | 0.804 | 0.854 |
| Swallow-70b-instruct-v0.1 | 0.509 | 0.381 | 0.604 | 0.568 | 0.464 | 0.402 | 0.583 | 0.557 | 0.510 |
| Tanuki-8x8B-dpo-v1.0 | 0.546 | 0.513 | 0.489 | 0.624 | 0.557 | 0.445 | 0.604 | 0.547 | 0.594 |
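
Each "avg" column above appears to be the unweighted mean of the per-task scores in its row, rounded to three decimals. A minimal sketch checking this against the Aya Expanse 32B rows (values copied from the tables; this is an illustration, not part of the original evaluation harness):

```python
# Verify that each "avg" column is the plain mean of the per-task scores,
# using the Aya Expanse 32B row from each of the three tables above.

def mean3(scores):
    """Unweighted mean, rounded to three decimals like the tables."""
    return round(sum(scores) / len(scores), 3)

# Japanese table: JComQA, JEMHQA, NIILC, JSQuAD, XL-Sum, MGSM,
# En-Ja, Ja-En, JMMLU, JHumanEval
ja_tasks = [0.965, 0.554, 0.586, 0.812, 0.295, 0.716, 0.287, 0.245, 0.655, 0.001]

# English table: OpenBookQA, TriviaQA, HellaSwag, SQuAD2, XWINO,
# MMLU, GSM8K, MATH, BBH, HumanEval
en_tasks = [0.420, 0.757, 0.668, 0.679, 0.912, 0.744, 0.858, 0.344, 0.757, 0.005]

# MT-Bench table: Coding, Extraction, Humanities, Math, Reasoning,
# Roleplay, STEM, Writing
jmt_tasks = [0.548, 0.720, 0.846, 0.657, 0.602, 0.824, 0.712, 0.794]

print(mean3(ja_tasks))   # 0.512 -- matches the reported Ja avg
print(mean3(en_tasks))   # 0.614 -- matches the reported En avg
print(mean3(jmt_tasks))  # 0.713 -- matches the reported JMT avg
```

The same check holds for the other rows, so differences in the avg columns weight every task equally rather than favoring any single benchmark.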