
Despite the impressive strides made by LLMs (e.g., GPT-4), producing complex structured data such as tables, HTML, and LaTeX remains a challenge. Our study introduces Struc-Bench, a new benchmark for assessing LLMs' capabilities in handling structured data formats. Our experiments highlight the need for task-specific evaluation, and we present novel metrics (P-Score and H-Score) to better evaluate LLM performance. We applied structure-aware fine-tuning to LLaMA-7B, resulting in substantial performance gains across multiple dimensions.
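To make the idea of structure-aware evaluation concrete, here is a minimal sketch of how a generated table could be scored against a reference at both the layout and cell level. This is an illustrative assumption, not the paper's actual P-Score or H-Score definition: the function names, the pipe-delimited table format, and the scoring rule are all hypothetical.

```python
# Illustrative sketch (NOT the paper's P-Score/H-Score): a simple
# structure-aware comparison of a generated pipe-delimited table
# against a reference, checking layout first, then cell contents.

def parse_table(text):
    """Parse a pipe-delimited table string into a list of rows of cells."""
    return [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]

def structure_score(generated, reference):
    """Return a score in [0, 1]; 0.0 if the row/column layout differs,
    otherwise the fraction of cells whose contents match exactly."""
    gen, ref = parse_table(generated), parse_table(reference)
    # Structural component: any row/column count mismatch scores zero.
    if len(gen) != len(ref) or any(len(g) != len(r) for g, r in zip(gen, ref)):
        return 0.0
    # Content component: fraction of exactly matching cells.
    total = sum(len(row) for row in ref)
    hits = sum(g == r for gr, rr in zip(gen, ref) for g, r in zip(gr, rr))
    return hits / total if total else 1.0

reference = "| Name | Score |\n| Ana | 3 |"
generated = "| Name | Score |\n| Ana | 4 |"
print(structure_score(generated, reference))  # 3 of 4 cells match -> 0.75
```

A real structure-aware metric would need to handle format variants (HTML, LaTeX) and partial credit for near-matches, but the two-stage idea, scoring layout and content separately, is the essence of evaluating structured output beyond plain text similarity.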
This research pushes the boundaries of how LLMs can be refined and applied to generate reliable, complex data structures. It opens up numerous possibilities for automating data-sensitive tasks across the many fields that require structured outputs. The full study and tools can be accessed on GitHub.