TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

[Demo: table manipulation using TableLLM on our platform]

Abstract

We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for handling tabular data manipulation tasks, whether the tables are embedded within documents or spreadsheets, in real-world office scenarios. We propose a distant supervision method for training that comprises a reasoning-process extension strategy, which helps LLMs learn reasoning patterns more effectively, and a cross-way validation strategy, which ensures the quality of the automatically generated data. To evaluate TableLLM, we craft a benchmark covering both document and spreadsheet formats and build a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM over various existing general-purpose and tabular-data-focused LLMs.
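The cross-way validation idea can be sketched as follows: a self-created training sample is kept only when its two "ways" of answering agree, i.e., executing the generated code solution yields the same answer as the generated text answer. This is a minimal toy sketch; the function names, the sample layout, and the use of a bare `exec` (a real pipeline would sandbox generated code) are all illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of a cross-way validation filter (assumption: a sample is
# accepted only if the executed code solution and the text answer agree;
# all names below are illustrative).

def run_code_solution(code: str, table: dict) -> str:
    """Execute a generated code snippet against a table; the snippet is
    expected to set a variable named `answer`. A real pipeline would
    sandbox this call instead of using bare exec()."""
    scope = {"table": table}
    exec(code, scope)
    return str(scope.get("answer", "")).strip().lower()

def cross_way_validate(sample: dict) -> bool:
    """Keep a self-created sample only if both reasoning ways agree."""
    code_answer = run_code_solution(sample["code_solution"], sample["table"])
    text_answer = sample["text_answer"].strip().lower()
    return code_answer == text_answer

# Toy sample: "Which city has the larger population?"
sample = {
    "table": {"city": ["Paris", "Rome"], "population": [2_100_000, 2_800_000]},
    "code_solution": (
        "idx = table['population'].index(max(table['population']))\n"
        "answer = table['city'][idx]"
    ),
    "text_answer": "Rome",
}
print(cross_way_validate(sample))  # → True: both ways answer "Rome"
```

Filtering on agreement between two independently generated solutions discards samples where either the code or the textual reasoning went wrong, which is what keeps the distantly supervised data clean.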

Overview

Evaluation Results

We evaluate the code-solution generation ability of TableLLM on three benchmarks: WikiSQL, Spider, and our self-created table operation benchmark. The text-answer generation ability is tested on four benchmarks: WikiTableQuestions (WikiTQ), TAT-QA, FeTaQA, and OTTQA. The evaluation results are shown below:

| Model | WikiTQ | TAT-QA | FeTaQA | OTTQA | WikiSQL | Spider | Self-created | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TaPEX | 38.5 | – | – | – | 83.9 | 15.0 | / | 45.8 |
| TaPas | 31.5 | – | – | – | 74.2 | 23.1 | / | 42.9 |
| TableLlama | 24.0 | 22.2 | 20.5 | 6.4 | 43.7 | 9.0 | / | 20.7 |
| GPT3.5 | 58.5 | 72.1 | 71.2 | 60.8 | 81.7 | 67.4 | 77.1 | 69.8 |
| GPT4 | 74.1 | 77.1 | 78.4 | 69.5 | 84.0 | 69.5 | 77.8 | 75.8 |
| Llama2-Chat (13B) | 48.8 | 49.6 | 67.7 | 61.5 | – | – | – | 56.9 |
| CodeLlama (13B) | 43.4 | 47.2 | 57.2 | 49.7 | 38.3 | 21.9 | 47.6 | 43.6 |
| Deepseek-Coder (33B) | 6.5 | 11.0 | 7.1 | 7.4 | 72.5 | 58.4 | 73.9 | 33.8 |
| StructGPT (GPT3.5) | 52.5 | 27.5 | 11.8 | 14.0 | 67.8 | 84.8 | / | 48.9 |
| Binder (GPT3.5) | 61.6 | 12.8 | 6.8 | 5.1 | 78.6 | 52.6 | / | 42.5 |
| DATER (GPT3.5) | 53.4 | 28.4 | 18.3 | 13.0 | 58.2 | 26.5 | / | 37.0 |
| TableLLM-7B (Ours) | 58.8 | 66.9 | 72.6 | 63.1 | 86.6 | 82.6 | 78.8 | 72.8 |
| TableLLM-13B (Ours) | 62.4 | 68.2 | 74.5 | 62.5 | 90.7 | 83.4 | 80.8 | 74.7 |

("–" denotes no reported score; "/" is kept as in the source.)
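The Average column appears to be the unweighted mean of the seven benchmark scores, as a quick check shows (reported averages can differ by ±0.1 from this recomputation because the per-benchmark scores are themselves rounded):

```python
# Recompute the Average column from the per-benchmark scores in the table,
# assuming it is the unweighted mean over the seven benchmarks:
# WikiTQ, TAT-QA, FeTaQA, OTTQA, WikiSQL, Spider, Self-created.

tablellm_7b = [58.8, 66.9, 72.6, 63.1, 86.6, 82.6, 78.8]
gpt4 = [74.1, 77.1, 78.4, 69.5, 84.0, 69.5, 77.8]

def average(scores):
    """Unweighted mean, rounded to one decimal as in the table."""
    return round(sum(scores) / len(scores), 1)

print(average(tablellm_7b))  # → 72.8, matching the reported Average
print(average(gpt4))         # → 75.8, matching the reported Average
```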

Contact

If you have any questions, please open a GitHub issue or get in touch with us at zhang2718@ruc.edu.cn, zeyaoma@ruc.edu.cn, or zhang-jing@ruc.edu.cn.