CodeApex is a bilingual programming evaluation benchmark for Large Language Models proposed by Apex Lab. It consists of two basic programming tasks: programming comprehension and code generation. The programming comprehension task consists of 250 multiple-choice questions, covering three question categories: conceptual understanding, commonsense reasoning, and multi-hop reasoning. The code generation task consists of 476 C++-based algorithm problems, covering common algorithmic topics such as binary search and depth-first search. In the future, CodeApex will publish other code-related functional tests, such as code correction.
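For illustration, the following is a minimal C++ sketch of the kind of solution the code generation task targets (a hypothetical example of a binary search problem, not an actual benchmark problem):

```cpp
#include <iostream>
#include <vector>

// Binary search over a sorted vector: returns the index of target,
// or -1 if target is not present.
int binarySearch(const std::vector<int>& a, int target) {
    int lo = 0, hi = static_cast<int>(a.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi)
        if (a[mid] == target) return mid;
        if (a[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

int main() {
    std::vector<int> a = {1, 3, 5, 7, 9, 11};
    std::cout << binarySearch(a, 7) << "\n";   // prints 3
    std::cout << binarySearch(a, 4) << "\n";   // prints -1
    return 0;
}
```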
Our data can be downloaded directly from GitHub. You can download the CodeApex paper here.
@misc{fu2023codeapex,
      title={CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models},
      author={Fu, Lingyue and Chai, Huacan and Luo, Shuang and Du, Kounianhua and Zhang, Weiming and Fan, Longteng and Lei, Jiayi and Rui, Renting and Lin, Jianghao and Fang, Yuchen and Liu, Yifan and Wang, Jingkuan and Qi, Siyuan and Zhang, Kangning and Zhang, Weinan and Yu, Yong},
      year={2023},
      eprint={2309.01940},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Have any questions about CodeApex? Please contact us at codeapex@apex.sjtu.edu.cn or open an issue on GitHub. Download our manual here.