
Researchers examined how LLMs such as GPT-4, GPT-3.5, and GPT-NeoX-20B perform at generating complex structured data. They developed a new benchmark, Struc-Bench, to evaluate LLMs across formats including raw text tables, HTML, and LaTeX. The study also introduces two new metrics, P-Score (Prompting Score) and H-Score (Heuristical Score), to measure LLMs' structured-output abilities more precisely.
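To make the heuristic side of such an evaluation concrete, here is a minimal sketch of a structure-aware table score. This is a hypothetical illustration, not the paper's actual H-Score: the pipe-delimited table format, the equal weighting of shape and content similarity, and the position-wise cell alignment are all assumptions.

```python
def parse_table(text: str) -> list[list[str]]:
    """Parse a pipe-delimited text table into rows of cells."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(cells)
    return rows

def h_score(generated: str, reference: str) -> float:
    """Toy structural score in [0, 1]: shape match plus cell overlap."""
    gen, ref = parse_table(generated), parse_table(reference)
    # Shape similarity: penalize mismatched row and column counts.
    row_sim = min(len(gen), len(ref)) / max(len(gen), len(ref))
    gen_cols = max((len(r) for r in gen), default=0)
    ref_cols = max((len(r) for r in ref), default=0)
    col_sim = min(gen_cols, ref_cols) / max(gen_cols, ref_cols)
    # Content similarity: exact cell matches at aligned positions.
    matches, total = 0, 0
    for g_row, r_row in zip(gen, ref):
        for g_cell, r_cell in zip(g_row, r_row):
            matches += g_cell == r_cell
            total += 1
    cell_sim = matches / total if total else 0.0
    return (row_sim + col_sim + cell_sim) / 3

reference = "| name | score |\n| Ann | 3 |"
generated = "| name | score |\n| Ann | 4 |"
print(round(h_score(generated, reference), 2))  # one wrong cell lowers the score
```

A real metric would also need to handle merged cells, column reordering, and format-specific syntax (HTML tags, LaTeX `tabular` markup), which this sketch ignores.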
The finding that LLMs can be fine-tuned to produce better-structured tables is significant: it highlights the nuanced capabilities of these models and their potential in data-structuring tasks.