Code2Bench-2509 Benchmark
Code2Bench: Scaling Source and Rigor for Dynamic Benchmark Construction
Our source code is available on GitHub
Notes
- SC-Python: Strongly Constrained Python tasks (217 problems).
- WSC-Python: Weakly Structured Constrained Python tasks (194 problems).
- SC-Java: Strongly Constrained Java tasks (249 problems).
- For code completion, we used greedy decoding (temperature = 0) for Pass@1 evaluation.
- Open-source models with fewer than 32B parameters were hosted locally using vLLM; larger or proprietary models were accessed via their official APIs.
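A minimal sketch of how Pass@1 can be scored under the greedy-decoding setup above. At temperature 0, each problem gets exactly one completion, so Pass@1 is the fraction of problems whose completion passes all of its tests. The function name `pass_at_1`, the input layout (problem ID mapped to per-test pass/fail outcomes), and the problem IDs in the example are illustrative assumptions, not the benchmark's actual evaluation harness.

```python
from typing import Mapping, Sequence


def pass_at_1(results: Mapping[str, Sequence[bool]]) -> float:
    """Return the fraction of problems whose single completion passes every test.

    `results` maps a (hypothetical) problem ID to the outcomes of each test
    run against that problem's one greedy completion. A problem with no
    recorded test outcomes is counted as failed.
    """
    if not results:
        raise ValueError("no problems to score")
    solved = sum(1 for outcomes in results.values() if outcomes and all(outcomes))
    return solved / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes for four problems: two pass every test.
    demo = {
        "sc_py_001": [True, True, True],
        "sc_py_002": [True, False],
        "sc_py_003": [True],
        "sc_py_004": [],
    }
    print(f"Pass@1 = {pass_at_1(demo):.2%}")  # prints "Pass@1 = 50.00%"
```

Treating a problem with no test outcomes as a failure is a conservative choice: a completion that crashes before any test runs should not count as solved.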