Computer Science > Software Engineering

Title: CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Abstract: To facilitate the evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework for creating scalable execution-based benchmarks that requires only light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries, revised from code in 367 GitHub repositories taken from the CodeSearchNet dataset. To demonstrate the complexity and solvability of examples in Exec-CSN, we present a human study showing that 81.3% of the examples can be solved by humans and 61% are rated as "requires effort to solve". We conduct code generation experiments on open-source and proprietary models and analyze the performance of both humans and models. We provide the code at this https URL.
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
Cite as: arXiv:2404.00566 [cs.SE]
  (or arXiv:2404.00566v3 [cs.SE] for this version)
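The abstract describes converting arbitrary code into evaluation examples with generated test cases that are judged by execution. As an illustrative aside only (not the authors' implementation, and the function and test names below are assumptions), the following minimal Python sketch shows the general shape of execution-based evaluation: a candidate completion is inserted into a harness alongside test cases, the harness is run in a subprocess, and the example counts as solved if the tests pass.

# Illustrative sketch of execution-based evaluation; names and harness layout
# are hypothetical, not taken from the CodeBenchGen / Exec-CSN release.
import subprocess
import sys
import tempfile
import textwrap

HARNESS = textwrap.dedent("""
    {candidate_code}

    # Hypothetical generated test cases for a toy task.
    def test_add():
        assert add(2, 3) == 5
        assert add(-1, 1) == 0

    if __name__ == "__main__":
        test_add()
        print("PASS")
""")

def run_candidate(candidate_code: str, timeout: int = 10) -> bool:
    """Write candidate + tests to a temp file, execute it, and report pass/fail."""
    source = HARNESS.format(candidate_code=candidate_code)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=timeout
        )
        return result.returncode == 0 and "PASS" in result.stdout
    except subprocess.TimeoutExpired:
        return False

if __name__ == "__main__":
    print(run_candidate("def add(a, b):\n    return a + b"))   # True: tests pass
    print(run_candidate("def add(a, b):\n    return a - b"))   # False: tests fail

In practice such a harness would also sandbox execution and install the libraries each example depends on; this sketch only illustrates the pass/fail-by-execution principle.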

Submission history

From: Yiqing Xie
[v1] Sun, 31 Mar 2024 05:20:53 GMT (1128kb,D)
[v2] Fri, 26 Apr 2024 08:48:19 GMT (1133kb,D)
[v3] Wed, 8 May 2024 03:14:48 GMT (1129kb,D)