In a thorough empirical analysis of large language models’ output in code translation tasks, researchers found that a significant share of LLM-generated translations required post-processing. The study examines several instruct-tuned LLMs across multiple programming languages and shows that their performance is underestimated when output formats are not properly accounted for. Applying prompt engineering and regular expressions improved source code extraction, underscoring the need for more reliable benchmarks in code translation.
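The study does not publish its exact extraction rules, but a minimal sketch of the regex-based approach it describes might look like the following Python snippet. The function name extract_code, the fence pattern, and the bare-code fallback are illustrative assumptions, not the study's implementation.

```python
import re

FENCE = "`" * 3  # the literal triple-backtick fence, built indirectly so
                 # this example renders cleanly inside documentation

# Optional language tag after the opening fence; DOTALL lets '.' span newlines.
FENCE_PATTERN = re.compile(
    FENCE + r"[a-zA-Z0-9_+-]*\n(.*?)" + FENCE,
    re.DOTALL,
)

def extract_code(llm_output: str) -> str:
    """Return the first fenced code block in an LLM response, or the
    whole response (stripped) if the model emitted bare code."""
    match = FENCE_PATTERN.search(llm_output)
    return match.group(1).strip() if match else llm_output.strip()

if __name__ == "__main__":
    # A typical LLM response that wraps the translated code in prose.
    response = (
        "Here is the translated function:\n"
        + FENCE + "python\n"
        + "def add(a, b):\n    return a + b\n"
        + FENCE
        + "\nLet me know if you need anything else."
    )
    print(extract_code(response))  # prints only the code, no surrounding prose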
Understanding the impact of output formats opens new avenues for the more effective use of LLMs in software development and code translation, and offers insight into how code generation processes can be optimized.