🏆 MHPP Leaderboard 🏆

MHPP evaluates AI coders' performance on diverse code generation challenges.

File a request to add your models to our leaderboard!

📝 Notes

  1. Models are ranked by their pass@1 scores under greedy decoding. For the sampling results, we set the temperature to 0.7 and draw 100 samples per problem (see the estimator sketch after these notes). We recommend a context length of 1024 tokens, given the length of the problems and of potential responses.
  2. In the table, cells marked with '-' indicate data that was not collected due to limited resources or budget constraints.
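For reference, pass@1 under greedy decoding is simply the fraction of problems whose single generated solution passes all tests, while the temperature-0.7 samples can be aggregated with the standard unbiased pass@k estimator (Chen et al., 2021). The snippet below is a minimal sketch of that estimator, not the leaderboard's actual evaluation script; the function name and use of NumPy are illustrative assumptions.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples drawn (e.g. 100 at temperature 0.7)
    c: number of samples that pass all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    # 1 - C(n - c, k) / C(n, k), computed as a running product for stability.
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Hypothetical example: 100 samples for a problem, 37 of them pass.
print(pass_at_k(100, 37, 1))   # ~0.37
print(pass_at_k(100, 37, 10))  # close to 1.0
```

The per-problem estimates are then averaged over the benchmark to obtain the reported pass@k score.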

🤗 Acknowledgement and More Leaderboards

We thank the authors of the EvalPlus Leaderboard for allowing us to borrow their leaderboard code! Beyond MHPP, we recommend consulting a diverse set of benchmarks and leaderboards to get a comprehensive picture of LLM coding ability, such as:

  1. Big Code Models Leaderboard
  2. EvalPlus Leaderboard
  3. BigCodeBench Leaderboard
  4. SWE-bench Leaderboard
  5. CRUXEval Leaderboard
  6. Chatbot Arena Leaderboard