🏆 MHPP Leaderboard 🏆
MHPP Evaluates AI Coders' Performance against Diverse Code-Generation Challenges
📝 Notes
- Models are ranked by their pass@1 scores under greedy decoding. For the sampling results, we set the temperature to 0.7 and sampled 100 times. We recommend a context length of 1024 tokens, given the length of the problems and of potential responses.
- In the table, a '-' indicates data that was not collected due to limited resources or budget constraints.
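For reference, pass@k from sampled generations is commonly computed with the standard unbiased estimator (given n samples per problem, of which c pass the tests). This is a minimal sketch of that estimator, not necessarily the exact script used for this leaderboard:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: total samples generated for a problem
    c: number of samples that pass all tests
    k: budget of attempts
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 100 samples, 37 correct, estimate pass@1
score = pass_at_k(100, 37, 1)  # equals 37/100 for k=1
```

For k=1 the estimator reduces to the fraction of passing samples, c/n; the per-problem scores are then averaged over the benchmark.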
🤗 Acknowledgement and More Leaderboards
We sincerely thank the authors of the EvalPlus Leaderboard for allowing us to borrow their leaderboard code! Beyond the MHPP leaderboard, we recommend assessing LLM coding ability comprehensively through a diverse set of benchmarks and leaderboards, such as: