
Large Language Models (LLMs) like GPT-4 continue to impress with their capabilities, yet they struggle with generating complex structured tabular data. A recent study introduces Struc-Bench, a comprehensive benchmark that pits prominent LLMs (GPT-NeoX-20B, GPT-3.5, GPT-4, and Vicuna) against one another on structured table generation across raw text, HTML, and LaTeX formats. The innovation doesn't stop there: the study also presents FormatCoT, a method for crafting format-specific instructions, and two new metrics, P-Score and H-Score, to evaluate LLM performance more accurately.
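To make the evaluation idea concrete, here is a toy sketch of a structure-aware comparison between a generated table and a reference table. This is not the paper's actual P-Score or H-Score; it is a simplified, hypothetical cell-level exact-match score, assuming tables are represented as lists of rows of cell strings.

```python
# Toy sketch (NOT the paper's P-Score/H-Score): fraction of reference
# cells that the generated table reproduces at the same position.
def cell_match_score(generated, reference):
    """Each table is a list of rows, each row a list of cell strings."""
    total = sum(len(row) for row in reference)
    if total == 0:
        return 0.0
    hits = 0
    # zip truncates, so missing rows/cells in the generation count as misses
    for ref_row, gen_row in zip(reference, generated):
        for ref_cell, gen_cell in zip(ref_row, gen_row):
            if ref_cell.strip() == gen_cell.strip():
                hits += 1
    return hits / total

ref = [["Model", "Score"], ["GPT-4", "0.91"]]
gen = [["Model", "Score"], ["GPT-4", "0.88"]]
print(cell_match_score(gen, ref))  # 3 of 4 cells match -> 0.75
```

Real structure-aware metrics must also handle merged cells, reordered columns, and format-specific markup (HTML tags, LaTeX delimiters), which is precisely the difficulty the benchmark is designed to expose.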
The paper emphasizes the importance of bespoke fine-tuning techniques for LLMs, especially when dealing with structured, table-like data. Addressing this challenge could extend LLMs' applicability in data-heavy sectors like scientific research, finance, and logistics, showcasing a step towards AI models that can grasp and generate complex data formats.