🏆 MHPP Leaderboard 🏆
MHPP Evaluates AI Coders' Performance against Diverse Code Generation Challenges
🤗 File a request to add your models to our leaderboard!
📝 Notes
- Models are ranked based on their pass@1 scores using greedy decoding. For the sampling results, we set
the temperature to 0.7 and sampled 100 times.
- We recommend using 1024 tokens as the context length, considering the length of problems and potential
responses.
- In the table, positions marked with a '-' indicate that the data was not collected due to limited
resources or budget constraints.
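
For readers who want to reproduce the sampling numbers, the sketch below shows one common way to turn 100 samples per problem into a pass@k score. It assumes the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples and c the number that pass all tests. The notes above do not state which estimator the leaderboard uses, so treat this as an illustration; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples drawn for the problem (e.g. 100 at temperature 0.7)
    c: samples that passed every test case
    k: budget of attempts being scored
    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k draws include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-problem counts, not leaderboard data.
    demo = [(100, 25), (100, 75), (100, 0)]
    print(f"pass@1  = {mean_pass_at_k(demo, 1):.3f}")
    print(f"pass@10 = {mean_pass_at_k(demo, 10):.3f}")
```

With greedy decoding there is a single deterministic completion per problem, so n = 1 and pass@1 reduces to the fraction of problems solved.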