The Natural Language Toolkit for Data Visualization
Georgia Institute of Technology, UNC Charlotte
Natural language interfaces (NLIs) have shown great promise for visual data analysis, allowing people to flexibly specify and interact with visualizations. However, developing visualization NLIs remains challenging: it requires low-level implementation of natural language processing (NLP) techniques as well as knowledge of visual analytic tasks and visualization design.
NL4DV is a Python package that takes as input a tabular dataset and a natural language query about that dataset. In response, the toolkit returns an analytic specification, modeled as a JSON object, containing data attributes, analytic tasks, and a list of Vega-Lite specifications relevant to the input query. In doing so, NL4DV helps visualization developers who may not have a background in NLP to create new visualization NLIs or to incorporate natural language input into their existing systems. NL4DV has had three major version releases. The versioning scheme is cumulative: with the appropriate configuration, installing v3 (3.x) also provides the capabilities of v2 (2.x) and v1 (1.x). Each release is summarized below.
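As a rough illustration of what a developer might do with that output, the sketch below loads a hand-written analytic specification and picks the highest-scoring Vega-Lite chart. The JSON is made up for this example, and its key names (`attributeMap`, `taskMap`, `visList`, `vlSpec`, `score`) as well as the helper `top_vega_lite_spec` are assumptions modeled on the description above; check the NL4DV documentation for the actual schema.

```python
import json

# Hypothetical analytic specification, hand-written for illustration only.
# Key names are assumptions; consult the NL4DV documentation for the real schema.
SPEC_JSON = """
{
  "query": "show average gross across genres",
  "attributeMap": {
    "Worldwide Gross": {"queryPhrase": ["gross"]},
    "Genre": {"queryPhrase": ["genres"]}
  },
  "taskMap": {
    "derived_value": [{"attributes": ["Worldwide Gross"], "operator": "AVG"}]
  },
  "visList": [
    {"score": 3.0, "vlSpec": {"mark": "bar", "encoding": {
      "x": {"field": "Genre", "type": "nominal"},
      "y": {"field": "Worldwide Gross", "type": "quantitative", "aggregate": "mean"}}}},
    {"score": 1.5, "vlSpec": {"mark": "tick", "encoding": {
      "x": {"field": "Worldwide Gross", "type": "quantitative"}}}}
  ]
}
"""


def top_vega_lite_spec(spec: dict) -> dict:
    """Return the Vega-Lite spec of the highest-scoring recommended visualization."""
    vis_list = spec.get("visList", [])
    if not vis_list:
        raise ValueError("the analytic specification contains no visualizations")
    best = max(vis_list, key=lambda vis: vis["score"])
    return best["vlSpec"]


spec = json.loads(SPEC_JSON)
print(sorted(spec["attributeMap"]))      # detected data attributes: ['Genre', 'Worldwide Gross']
print(list(spec["taskMap"]))             # inferred analytic tasks: ['derived_value']
print(top_vega_lite_spec(spec)["mark"])  # mark of the top recommendation: bar
```

Taking the maximum score is only one possible policy; an interface could just as well render every entry in the list and let the user choose.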
Version 3 (v3), released in 2024, enables developers to use a large language model (GPT) to translate a natural language query about a dataset into a relevant visualization, with additional features such as multi-turn conversational interaction and ambiguity resolution. We present a comprehensive text prompt that, given a tabular dataset and an NL query about the dataset, generates an analytic specification including (detected) data attributes, (inferred) analytic tasks, and (recommended) visualizations. This specification captures key aspects of the query translation process, affording both explainability and debuggability. For instance, it provides mappings from the detected entities to the corresponding phrases in the input query, as well as the specific visual design principles that determined the visualization recommendations. Moreover, unlike prior LLM-based approaches, our prompt supports conversational interaction and ambiguity detection. In our paper, we detail the iterative process of curating our prompt, present a preliminary performance evaluation using GPT-4, and discuss the strengths and limitations of LLMs at various stages of query translation.
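The paper details the actual prompt; the sketch below only illustrates the general pattern this paragraph describes: pack the dataset schema and the query into a text prompt, then parse and validate the model's JSON reply, including the phrase each detected entity came from. The prompt wording, the section names in `REQUIRED_KEYS`, and the helpers `build_prompt` and `parse_reply` are hypothetical, and no LLM is called; a canned reply stands in for the model's output.

```python
import json

# Hypothetical top-level sections expected in the model's reply; these names are
# assumptions for illustration, not the schema defined in the NL4DV paper.
REQUIRED_KEYS = ("attributes", "tasks", "visualizations")


def build_prompt(columns: dict, query: str) -> str:
    """Assemble a (hypothetical) prompt from column names/types and an NL query."""
    schema = "\n".join(f"- {name}: {dtype}" for name, dtype in columns.items())
    return (
        "You translate natural language queries about a tabular dataset into an "
        "analytic specification.\n"
        f"Dataset columns:\n{schema}\n"
        f"Query: {query}\n"
        "Respond with a JSON object with keys "
        + ", ".join(REQUIRED_KEYS)
        + ". For every attribute and task, include the query phrase it came from."
    )


def parse_reply(reply: str) -> dict:
    """Parse the model's JSON reply and check that every expected section is present."""
    spec = json.loads(reply)
    missing = [key for key in REQUIRED_KEYS if key not in spec]
    if missing:
        raise ValueError(f"reply is missing sections: {missing}")
    return spec


prompt = build_prompt({"Genre": "nominal", "Worldwide Gross": "quantitative"},
                      "average gross by genre")

# A canned reply standing in for an LLM response (no model is called here).
reply = json.dumps({
    "attributes": [{"name": "Worldwide Gross", "queryPhrase": "gross"},
                   {"name": "Genre", "queryPhrase": "genre"}],
    "tasks": [{"task": "derived_value", "operator": "AVG", "queryPhrase": "average"}],
    "visualizations": [{"mark": "bar",
                        "reason": "bar charts suit a quantitative aggregate per category"}],
})
spec = parse_reply(reply)
print([a["queryPhrase"] for a in spec["attributes"]])
```

Validating the reply before rendering matters because an LLM can return malformed or incomplete JSON; a fuller version would also catch `json.JSONDecodeError` and re-prompt or report the failure.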
Citation:
@misc{sah2024nl4dvllm,
  title={{Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models}},
  author={Sah, Subham and Mitra, Rishab and Narechania, Arpit and Endert, Alex and Stasko, John and Dou, Wenwen},
  year={2024},
  eprint={2408.13391},
  archivePrefix={arXiv},
  primaryClass={cs.HC},
  url={https://arxiv.org/abs/2408.13391},
  howpublished={Presented at the NLVIZ Workshop, IEEE VIS 2024}
}
Version 2 (v2), released in 2022, enables developers to use semantic parsing techniques to facilitate multiple conversations about a dataset (conversational interaction) and to resolve the associated ambiguities, in addition to the core functionality of translating a natural language query into a visualization. Check out the showcase for three examples: (1) an NLI for learning aspects of the Vega-Lite grammar, (2) a mind-mapping application for creating free-flowing conversations, and (3) a chatbot that answers questions and resolves ambiguities.
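The two behaviors described here can be illustrated with a deliberately small model of conversational state. In the sketch below, which is a toy and not NL4DV's semantic parser, a phrase that matches several columns is reported back as an ambiguity instead of being guessed, and a follow-up query that names no attributes inherits those of the previous turn. The class `Conversation`, its whole-token matching rule, and the movie column names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Conversation:
    """Toy conversational state that remembers the attributes of earlier turns."""
    columns: List[str]
    history: List[List[str]] = field(default_factory=list)

    def candidates(self, word: str) -> List[str]:
        # Hypothetical rule: a word matches every column containing it as a whole token.
        return [c for c in self.columns if word in c.lower().split()]

    def ask(self, query: str, choice: Optional[Dict[str, str]] = None) -> dict:
        """Resolve attributes for one turn, reporting ambiguities instead of guessing."""
        choice = choice or {}
        attributes: List[str] = []
        ambiguities: Dict[str, List[str]] = {}
        for word in (w.strip("?,.!").lower() for w in query.split()):
            matches = self.candidates(word)
            if len(matches) == 1:
                attributes.append(matches[0])
            elif len(matches) > 1:
                if word in choice:  # the user already picked a column for this phrase
                    attributes.append(choice[word])
                else:
                    ambiguities[word] = matches
        if ambiguities:
            return {"ambiguities": ambiguities}
        if not attributes and self.history:  # follow-up query: carry over prior attributes
            attributes = list(self.history[-1])
        self.history.append(attributes)
        return {"attributes": attributes}


conv = Conversation(["Genre", "US Gross", "Worldwide Gross", "Release Year"])
print(conv.ask("show gross by genre"))  # "gross" is ambiguous
print(conv.ask("show gross by genre", choice={"gross": "Worldwide Gross"}))
print(conv.ask("now as a line chart"))  # follow-up inherits the previous attributes
```

Returning the candidate columns, rather than silently choosing one, is what lets an interface render a clarification widget or a chatbot question before continuing the conversation.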
Citation:
@article{mitra2022conversational,
  title={{Facilitating Conversational Interaction in Natural Language Interfaces for Visualization}},
  author={Mitra, Rishab and Narechania, Arpit and Endert, Alex and Stasko, John},
  journal={IEEE VIS (Short Papers)},
  year={2022},
  publisher={IEEE},
  url={https://doi.org/10.1109/VIS54862.2022.00010}
}
Version 1 (v1), released in 2020, enables developers to use semantic parsing techniques to translate a natural language query about a tabular dataset into one or more relevant visualizations. Check out the showcase for four examples: (1) rendering visualizations from natural language in a Jupyter notebook, (2) developing an NLI to specify and edit Vega-Lite charts, (3) recreating the data ambiguity widgets from the DataTone system, and (4) incorporating speech input to create a multimodal visualization system.
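A heavily simplified version of that translation step is sketched below. NL4DV's actual parser is considerably more sophisticated; this toy swaps in plain keyword lookup to detect attributes and an aggregation word, then emits a Vega-Lite bar chart. The function `query_to_vega_lite`, the `AGGREGATE_WORDS` table, the column types, and the data URL `movies.csv` are hypothetical.

```python
import json
from typing import Dict

# Hypothetical keyword-to-aggregate table standing in for analytic task inference.
AGGREGATE_WORDS = {"average": "mean", "mean": "mean", "total": "sum", "sum": "sum"}


def query_to_vega_lite(query: str, columns: Dict[str, str], data_url: str) -> dict:
    """Map a query to a Vega-Lite bar chart spec using keyword matching (toy parser)."""
    words = [w.strip("?,.!").lower() for w in query.split()]
    # Attribute detection: a column is mentioned if any token of its name is in the query.
    mentioned = [c for c in columns if any(t in words for t in c.lower().split())]
    # Task inference: the first aggregation keyword found, if any.
    aggregate = next((AGGREGATE_WORDS[w] for w in words if w in AGGREGATE_WORDS), None)
    nominal = [c for c in mentioned if columns[c] == "nominal"]
    quantitative = [c for c in mentioned if columns[c] == "quantitative"]
    if len(nominal) != 1 or len(quantitative) != 1:
        raise ValueError("this sketch only handles one nominal and one quantitative attribute")
    y = {"field": quantitative[0], "type": "quantitative"}
    if aggregate:
        y["aggregate"] = aggregate
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},
        "mark": "bar",
        "encoding": {"x": {"field": nominal[0], "type": "nominal"}, "y": y},
    }


columns = {"Genre": "nominal", "Worldwide Gross": "quantitative", "Release Year": "temporal"}
spec = query_to_vega_lite("average gross by genre", columns, "movies.csv")
print(json.dumps(spec["encoding"], indent=2))
```

Because the matcher compares whole tokens, a plural such as "genres" would not match the column `Genre`; handling such variations (for example with lemmatization or string similarity) is part of what makes a real semantic parser more involved.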
Citation:
@article{narechania2020nl4dv,
  title={{NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries}},
  author={Narechania, Arpit and Srinivasan, Arjun and Stasko, John},
  journal={IEEE TVCG},
  year={2021},
  publisher={IEEE},
  url={https://doi.org/10.1109/TVCG.2020.3030378}
}