
Despite the impressive strides made by LLMs (e.g., GPT-4), producing complex structured data such as tables, HTML, and LaTeX remains a challenge. Our study introduces Struc-Bench, a new benchmark for assessing LLMs' capabilities in handling structured data formats. Our experiments highlight the need for task-specific evaluation, and we present novel metrics (P-Score and H-Score) to better evaluate LLM performance. We applied structure-aware fine-tuning to LLaMA-7B, resulting in substantial performance gains across multiple dimensions.
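To make the idea of structure-aware evaluation concrete, here is a minimal sketch of how a generated table could be scored against a reference at both the layout and cell level. This is an illustrative assumption, not the paper's actual P-Score or H-Score definition: the function names, the pipe-delimited table format, and the scoring rule are all hypothetical.

```python
# Illustrative sketch (NOT the paper's P-Score/H-Score): a simple
# structure-aware comparison of a generated pipe-delimited table
# against a reference, checking layout first, then cell contents.

def parse_table(text):
    """Parse a pipe-delimited table string into a list of rows of cells."""
    return [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
    ]

def structure_score(generated, reference):
    """Return a score in [0, 1]; 0.0 if the row/column layout differs,
    otherwise the fraction of cells whose contents match exactly."""
    gen, ref = parse_table(generated), parse_table(reference)
    # Structural component: any row/column count mismatch scores zero.
    if len(gen) != len(ref) or any(len(g) != len(r) for g, r in zip(gen, ref)):
        return 0.0
    # Content component: fraction of exactly matching cells.
    total = sum(len(row) for row in ref)
    hits = sum(g == r for gr, rr in zip(gen, ref) for g, r in zip(gr, rr))
    return hits / total if total else 1.0

reference = "| Name | Score |\n| Ana | 3 |"
generated = "| Name | Score |\n| Ana | 4 |"
print(structure_score(generated, reference))  # 3 of 4 cells match -> 0.75
```

A real structure-aware metric would need to handle format variants (HTML, LaTeX) and partial credit for near-matches, but the two-stage idea, scoring layout and content separately, is the essence of evaluating structured output beyond plain text similarity.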
This research pushes the boundaries of how LLMs can be refined and applied to generate reliable, complex data structures. It opens up numerous possibilities for automating data-sensitive tasks across the many fields that require structured outputs. The full study and tools can be accessed on GitHub.