NNP Arena

Evaluating NNP methods against coupled-cluster reference data, experimental data, and other tests.

Molecular Energy

Name	GMTKN55	Folmsbee	TorsionNet206	Wiggle150	Overall
UMA Medium 1.1 (OMol task)	3.75	0.21	0.14	0.88	1.32
OMol25's eSEN Conserving Small	3.86	0.21	0.14	0.91	1.35
UMA Small 1.1 (OMol task)	3.90	0.21	0.14	0.90	1.37
OrbMol (Orb-v3 Conservative OMol)	4.13	0.22	0.15	0.94	1.45
B97-3c	10.33	0.30	0.35	2.32	3.52
AIMNet2 (ωB97M-D3, new)	14.46	0.54	0.39	2.35	4.90
AIMNet2-NSE	16.37	0.55	0.41	3.03	5.53
MACE-Osaka24-large-D3BJ	18.74	0.38	0.28	4.88	6.24
Prescient's StrainRelief MACE	19.89	0.57	0.61	4.46	6.75
GFN2-xTB	18.94	0.72	0.73	14.60	6.91
Orb-v3 (Conservative Inf. OMat)	21.44	0.88	0.97	7.70	7.56
eSEN-OAM	22.50	0.84	0.69	7.05	7.78
MACE-MP-0b2(Large)-D3BJ	27.14	0.81	1.12	14.60	9.61

The "Overall" score is a weighted average of the individual benchmark results. Each component benchmark is first assigned a difficulty score, computed as the ratio of the GFN2-xTB and B3LYP MAEs on that benchmark. Each benchmark's weight is then the product of its difficulty score and the number of systems it contains. The "Overall" score is the weighted average of the benchmark results under these weights, which are listed below.

Benchmark	Weight
GMTKN55	0.31
Folmsbee	0.38
TorsionNet206	0.27
Wiggle150	0.04
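As a rough illustration of how the weights combine, the sketch below applies the published (rounded) weights to one row of the table. It assumes the four numeric columns follow the order of the weight table and that the last column is the "Overall" score. The function name `overall_score` is hypothetical and is not taken from the benchmark's own code.

```python
# Published weights from the table above (rounded to two decimals).
WEIGHTS = {
    "GMTKN55": 0.31,
    "Folmsbee": 0.38,
    "TorsionNet206": 0.27,
    "Wiggle150": 0.04,
}


def overall_score(results, weights=WEIGHTS):
    """Weighted average of per-benchmark errors.

    The weights are renormalized over the benchmarks present in `results`,
    so the function still returns an average if a benchmark is missing.
    """
    total_weight = sum(weights[name] for name in results)
    weighted_sum = sum(weights[name] * value for name, value in results.items())
    return weighted_sum / total_weight


# Row for "UMA Medium 1.1 (OMol task)"; the column-to-benchmark mapping
# is an assumption, as noted above.
uma_medium = {
    "GMTKN55": 3.75,
    "Folmsbee": 0.21,
    "TorsionNet206": 0.14,
    "Wiggle150": 0.88,
}

score = overall_score(uma_medium)
print(f"{score:.2f}")
```

Because the published weights are rounded, scores recomputed this way can differ from the listed "Overall" values in the second decimal place.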

The GMTKN55 WTMAD-2 shown here includes only the subsets made up exclusively of neutral, singlet, elemental-organic systems.

Molecular Optimization

Name
AIMNet2 (ωB97M-D3, new)	25	12.88	21	0.16
AIMNet2-NSE	24	100.88	14	0.67
eSEN-OAM	24	109.46	19	0.21
OMol25's eSEN Conserving Small	24	100.88	18	0.25
UMA Small 1.1 (OMol task)	24	101.21	17	0.38
OrbMol (Orb-v3 Conservative OMol)	14	88.71	9	0.50

Optimization is a simple test that evaluates a method's ability to optimize 25 drug-like molecules without producing imaginary frequencies. Optimizations are run with the Sella optimizer, a force convergence threshold (fmax) of 0.01 eV/Å, and a maximum of 250 steps. See all optimization results. See our blog post testing NNP–optimizer pairings.
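To make the stopping criteria concrete, here is a minimal sketch of the convergence check. It assumes the ASE-style convention in which fmax is compared against the largest per-atom force norm. The helper names (`max_force`, `is_converged`, `run_status`) are hypothetical and do not come from Sella or from the benchmark code.

```python
import math


def max_force(forces):
    """Largest per-atom force norm (eV/Å) from a list of (fx, fy, fz) tuples."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def is_converged(forces, fmax=0.01):
    """True when every atom's force norm is below fmax (assumed ASE-style criterion)."""
    return max_force(forces) < fmax


def run_status(forces, steps_taken, fmax=0.01, max_steps=250):
    """Classify an optimization given its current forces and step count."""
    if is_converged(forces, fmax):
        return "converged"
    if steps_taken >= max_steps:
        return "step limit reached"
    return "running"


# Made-up forces for a three-atom system, for illustration only.
example_forces = [(0.001, 0.002, 0.0), (0.0, 0.0, 0.005), (-0.003, 0.0, 0.001)]
print(run_status(example_forces, steps_taken=42))
```

A run that reaches the step limit without satisfying the force criterion would count as unconverged under this reading. Checking for imaginary frequencies is a separate step that requires a Hessian and is not shown here.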

Periodic Optimization

Name	X23b Lattice Energies	X23b Cell Volumes	Overall
r²SCAN-3c	0.97	3.68	1.71
UMA Small 1.1 (OMol task)	1.43	2.70	1.78
UMA Medium 1.1 (OMol task)	2.23	3.02	2.45
UMA Small 1.1 (OMC task)	1.57	4.79	2.46
Egret-1	2.61	4.38	3.09
MACE-MP-0b2(Large)-D3BJ	3.47	3.03	3.35
GFN2-xTB	5.38	7.76	6.03
MACE-Osaka24-large-D3BJ	7.56	3.48	6.44
UMA Medium 1.1 (OMC task)	9.21	8.78	9.09
AIMNet2 (ωB97M-D3, new)	9.32	19.41	12.10
eSEN-OAM	9.16	32.35	15.54
Orb-v3 (Conservative Inf. OMat)	26.79	12.30	22.80

The "Overall" score is a weighted average of the individual benchmark results. Each component benchmark is first assigned a difficulty score, computed as the ratio of the GFN2-xTB and r²SCAN-3c MAEs on that benchmark. Each benchmark's weight is then the product of its difficulty score and the number of systems it contains. The "Overall" score is the weighted average of the benchmark results under these weights, which are listed below.

Benchmark	Weight
X23b Lattice Energies	0.72
X23b Cell Volumes	0.28
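One plausible reading of the weighting procedure is that each raw weight is the product of difficulty and system count, and that the raw weights are then normalized to sum to 1. The sketch below implements that reading. The function name `benchmark_weights` is hypothetical, and the inputs are placeholder values chosen for illustration, not actual difficulty scores or system counts.

```python
def benchmark_weights(difficulty, n_systems):
    """Normalize difficulty * n_systems so that the weights sum to 1."""
    raw = {name: difficulty[name] * n_systems[name] for name in difficulty}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Placeholder inputs for illustration only; NOT taken from the benchmark data.
toy_difficulty = {"benchmark_a": 3.0, "benchmark_b": 1.0}
toy_counts = {"benchmark_a": 20, "benchmark_b": 20}

weights = benchmark_weights(toy_difficulty, toy_counts)
for name, weight in weights.items():
    print(f"{name}\t{weight:.2f}")
```

When two benchmarks contain the same number of systems, as in the toy input, the weights depend only on the difficulty scores.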

Speed

Name
AIMNet2 (ωB97M-D3, new)	0.02	4.67
AIMNet2-NSE	0.02	4.52
OrbMol (Orb-v3 Conservative OMol)	0.03	2.51
OMol25's eSEN Conserving Small	0.10	0.85
UMA Small 1.1 (OMol task)	0.12	0.71
MACE-MP-0b2(Large)-D3BJ	0.13	0.68
eSEN-OAM	0.41	0.21
UMA Medium 1.1 (OMol task)	0.55	0.16

This benchmark measures the speed of running molecular dynamics (MD) simulations on tacrolimus (126 atoms) through ASE with a 1 fs timestep at 300 K for 50 steps. All calculations were run on A10G GPUs through Modal. See all speed results.
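For readers converting between timing conventions, the sketch below turns a wall-clock time per MD step into simulated nanoseconds per day at a given timestep. The function names are hypothetical, and treating the measurement as an average over the 50 steps is an assumption about how the timing is reported.

```python
SECONDS_PER_DAY = 86_400
NS_PER_FS = 1e-6  # 1 fs = 1e-6 ns


def seconds_per_step(total_wall_seconds, n_steps=50):
    """Average wall-clock time per MD step."""
    return total_wall_seconds / n_steps


def ns_per_day(step_seconds, timestep_fs=1.0):
    """Simulated time (ns) covered in one day of wall-clock time."""
    steps_per_day = SECONDS_PER_DAY / step_seconds
    return steps_per_day * timestep_fs * NS_PER_FS


# Illustrative input: a hypothetical 5-second run of 50 steps.
step_time = seconds_per_step(5.0)
print(f"{step_time:.2f} s/step, {ns_per_day(step_time):.3f} ns/day")
```

At a 1 fs timestep, a method that takes 0.1 s per step therefore simulates a little under 1 ns per day.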
