Computer Science > Software Engineering

Title: CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Abstract: To facilitate the evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework for creating scalable execution-based benchmarks that requires only light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries, revised from code in 367 GitHub repositories taken from the CodeSearchNet dataset. To demonstrate the complexity and solvability of examples in Exec-CSN, we present a human study showing that 81.3% of the examples can be solved by humans and 61% are rated as "requires effort to solve". We conduct code generation experiments on open-source and proprietary models and analyze the performance of both humans and models. We provide the code at this https URL.
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
Cite as: arXiv:2404.00566 [cs.SE]
  (or arXiv:2404.00566v3 [cs.SE] for this version)
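The abstract describes converting arbitrary code into evaluation examples with generated test cases that are judged by execution. As an illustrative aside only (not the authors' implementation, and the function and test names below are assumptions), the following minimal Python sketch shows the general shape of execution-based evaluation: a candidate completion is inserted into a harness alongside test cases, the harness is run in a subprocess, and the example counts as solved if the tests pass.

# Illustrative sketch of execution-based evaluation; names and harness layout
# are hypothetical, not taken from the CodeBenchGen / Exec-CSN release.
import subprocess
import sys
import tempfile
import textwrap

HARNESS = textwrap.dedent("""
    {candidate_code}

    # Hypothetical generated test cases for a toy task.
    def test_add():
        assert add(2, 3) == 5
        assert add(-1, 1) == 0

    if __name__ == "__main__":
        test_add()
        print("PASS")
""")

def run_candidate(candidate_code: str, timeout: int = 10) -> bool:
    """Write candidate + tests to a temp file, execute it, and report pass/fail."""
    source = HARNESS.format(candidate_code=candidate_code)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return result.returncode == 0 and "PASS" in result.stdout
    except subprocess.TimeoutExpired:
        return False

if __name__ == "__main__":
    print(run_candidate("def add(a, b):\n    return a + b"))   # True: tests pass
    print(run_candidate("def add(a, b):\n    return a - b"))   # False: tests fail

In practice such a harness would also sandbox execution and install the libraries each example depends on; this sketch only illustrates the pass/fail-by-execution principle.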

Submission history

From: Yiqing Xie
[v1] Sun, 31 Mar 2024 05:20:53 GMT (1128kb,D)
[v2] Fri, 26 Apr 2024 08:48:19 GMT (1133kb,D)
[v3] Wed, 8 May 2024 03:14:48 GMT (1129kb,D)