Documentation

Setup

Installation

  1. NL4DV requires a 64-bit Python 3 environment. Windows users must ensure that the Microsoft C++ Build Tools are installed. macOS users must ensure that Xcode is installed.

  2. Install using one of the below methods:

    1. PyPi.

      To download, run

      pip install nl4dv==4.1.0

    2. A local distributable. Download

      nl4dv-4.1.0.tar.gz   OR   nl4dv-4.1.0.zip

      Accordingly, run

      pip install nl4dv-4.1.0.tar.gz
      OR
      pip install nl4dv-4.1.0.zip

    Note: We recommend installing NL4DV in a virtual environment as it avoids version conflicts with globally installed packages.
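
To confirm the first requirement (a 64-bit Python 3 interpreter), a quick check:

```python
import struct
import sys

# Pointer size is 8 bytes (64 bits) on a 64-bit interpreter
print("Python", sys.version_info.major, "-", struct.calcsize("P") * 8, "bit")
```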

Post Installation


  • Instructions for "processing_mode" = "language-model" (v4)

    NL4DV requires an API key to configure the new "processing_mode" = "language-model". Internally, NL4DV uses LiteLLM as its LLM gateway, which enables access to a variety of language models. See the LiteLLM website to learn which models are supported and to generate an API key for your chosen model.

  • Instructions for "processing_mode" = "gpt" (v3)

    NL4DV requires an OpenAI API key for its "processing_mode" = "gpt". Please refer to the OpenAI website to generate an API key.

  • Instructions for "processing_mode" = "semantic-parsing" (v1, v2)

    1. NL4DV installs nltk as a dependency, but a few of its datasets/models/corpora must be installed separately. Download the popular nltk artifacts using:

      python -m nltk.downloader popular

    2. NL4DV requires a third-party Dependency Parser module to infer tasks. Download and install one of:

      1. Stanford CoreNLP (recommended):

        • Download the English model of Stanford CoreNLP version 3.9.2 and copy it to `examples/assets/jars/` or a known location.

        • Download the Stanford Parser version 3.9.2 and after unzipping the folder, copy the `stanford-parser.jar` file to `examples/assets/jars/` or a known location.

          Note: This requires JAVA installed and the JAVA_HOME / JAVAHOME environment variables to be set.

      2. Stanford CoreNLPServer:

        • Download the Stanford CoreNLPServer, unzip it in a known location, and cd into it.

        • Start the server using the below command. It will run on http://localhost:9000.

          java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000

          Note: This requires JAVA installed and the JAVA_HOME / JAVAHOME environment variables to be set.

      3. Spacy:

        • NL4DV installs spaCy as a dependency. Download an English language model, e.g., via `python -m spacy download en_core_web_sm` (the model used in the sample code below).
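
Whichever option you choose, the dependency parser is configured through a plain dictionary (also shown later in the v1 sample code). The jar paths and server URL below are the repository's example locations; adjust them to wherever you placed the files:

```python
import os

# Three alternative dependency_parser_config dictionaries (choose one);
# the jar paths and server URL are example locations from this documentation
corenlp_config = {
    "name": "corenlp",
    "model": os.path.join(".", "examples", "assets", "jars",
                          "stanford-english-corenlp-2018-10-05-models.jar"),
    "parser": os.path.join(".", "examples", "assets", "jars", "stanford-parser.jar"),
}
corenlp_server_config = {"name": "corenlp-server", "url": "http://localhost:9000"}
spacy_config = {"name": "spacy", "model": "en_core_web_sm", "parser": None}

print(sorted(c["name"] for c in (corenlp_config, corenlp_server_config, spacy_config)))
```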

Sample Code | v4 ('language-model') | NL4DV-Stylist

import json
from nl4dv import NL4DV
# Your dataset must be hosted on GitHub for the LLM-based mode to function.
data_url = "https://raw.githubusercontent.com/nl4dv/nl4dv/master/examples/assets/data/movies-w-year.csv" # paste your data URL

# Choose your processing mode: "language-model" for the LLM-based mode or "semantic-parsing" for the rules-based mode.
processing_mode = "language-model"

# Enter your Language Model configuration
lm_config = {
    "model": "gpt-4o", # gpt-4o, gpt-4o-mini
    "environ_var_name": "OPENAI_API_KEY",
    "api_key": "Your api key",
    "api_base": None
}

# Define a query
query = "correlate budget and gross for action and adventure movies"

design_config = [{
    "type": "image_url",
    "image_url": { "url": "https://i.ibb.co/LXk7QMvL/17.png"}, # examples/assets/example-charts/17.png
},{
    "type": "text",
    "text": " ".join(["Apply this chart's design.", "Remove ticks on either axes."])
},{
    "type": "image_url",
    "image_url": { "url": "https://i.ibb.co/HTL4HQSP/4.png"}, # examples/assets/example-charts/4.png
},{
    "type": "text",
    "text": " ".join(["Apply this chart's title's font style and position.", "Apply this chart's legend's style but position it to the left of the chart, not right."])
}]

# Initialize an instance of NL4DV
nl4dv_instance = NL4DV(data_url=data_url, processing_mode=processing_mode, lm_config=lm_config, design_config=design_config)

# Execute the query
output = nl4dv_instance.analyze_query(query)

# Print the output
print(output)
[Input Chart 1: example chart input 1 of v4]
[Input Chart 2: example chart input 2 of v4]

Output

{
  "query": "correlate budget and gross for action and adventure movies",
  "query_raw": "correlate budget and gross for action and adventure movies",
  "dataset": "https://raw.githubusercontent.com/nl4dv/nl4dv/master/examples/assets/data/movies-w-year.csv",
  "visList": ["..."],
  "attributeMap": {"..."},
  "taskMap": {"..."},
  "vlSpec": {"..."}, // Vega-Lite specification without any design customizations. 
  "vlSpec_design": {"..."},  // New in v4. Vega-Lite specification with requested design customizations.
  "design_checklist": ["..."], // New in v4.
  "design_successlist": ["..."], // New in v4.
  "design_failurelist": [] // New in v4.
}

"vlSpec_design" ▸

Note that "vlSpec_design" is a new property in v4 containing a Vega-Lite specification that includes the design customizations requested by the user; the original "vlSpec" property continues to exist with the Vega-Lite specification without any design customizations (i.e., with the default designs).
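
For illustration, a sketch of picking the styled spec out of the returned object. The analyze_query() output is mocked here as a minimal dict (the real output contains the full specifications shown in this section):

```python
import json

# Mocked, minimal analyze_query() output (illustrative only)
output = {
    "vlSpec": {"mark": "point"},
    "vlSpec_design": {"mark": "point", "config": {"background": "#ff8c00"}},
}

# Prefer the v4 design-customized spec; fall back to the default spec otherwise
spec = output.get("vlSpec_design", output["vlSpec"])
print(json.dumps(spec, indent=2))
```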

{
"$schema": "https://vega.github.io/schema/vega-lite/v6.json",
"data": {
    "url": "https://raw.githubusercontent.com/nl4dv/nl4dv/master/examples/assets/data/movies-w-year.csv"
},
"transform": [
    {
        "filter": {
            "field": "Genre",
            "oneOf": ["Action", "Adventure"]
        }
    }
],
"mark": "point",
"encoding": {
    "x": {
        "field": "Production Budget",
        "type": "quantitative",
        "axis": {
            "ticks": false
        }
    },
    "y": {
        "field": "Worldwide Gross",
        "type": "quantitative",
        "axis": {
            "ticks": false
        }
    },
    "color": {
        "field": "Genre",
        "type": "nominal",
        "legend": {
            "padding": 8
        }
    }
},
"config": {
    "background": "#ff8c00",
    "view": {
        "fill": "#ffff00"
    },
    "legend": {
        "orient": "left",
        "fillColor": "white",
        "strokeColor": "gray",
        "cornerRadius": 4,
        "titleFontSize": 12
    }
},
"title": {
    "text": "Correlation of Budget and Gross for Action and Adventure Movies",
    "fontSize": 18,
    "fontWeight": "bold",
    "anchor": "middle",
    "align": "center"
    }
}

"design_checklist" ▸

[
    {
        "type": "chart",
        "name": "chart 1",
        "content": [
            "Apply this chart's design.",
            "Remove ticks on either axes."
        ]
    },
    {
        "type": "chart",
        "name": "chart 2",
        "content": [
            "Apply this chart's title's font style and position.",
            "Apply this chart's legend's style but position it to the left of the chart."
        ]
    }
]

"design_successlist" ▸

[
    {
        "type": "chart",
        "name": "chart 1",
        "content": [
            "Used background color.",
            "Used mark styles."
        ]
    },
    {
        "type": "chart",
        "name": "chart 2",
        "content": [
            "Used title font style and position.",
            "Used legend style and positioned it to the left."
        ]
    }
]

"design_failurelist" ▸

[]

Visualization (when the Vega-Lite spec is rendered) ▸

[Image: example output of v4]

Sample Code | v3 ('gpt') | NL4DV-LLM

from nl4dv import NL4DV
# Your dataset must be hosted on GitHub for the LLM-based mode to function.
data_url = "https://raw.githubusercontent.com/nl4dv/nl4dv/master/examples/assets/data/movies-w-year.csv" # paste your data URL

# Choose your processing mode: "gpt" for the LLM-based mode or "semantic-parsing" for the rules-based mode.
processing_mode = "gpt"

# Enter your OpenAI API key
gpt_api_key = "[OpenAI KEY HERE]"

# Initialize an instance of NL4DV
nl4dv_instance = NL4DV(data_url=data_url, processing_mode=processing_mode, gpt_api_key=gpt_api_key)

# Define a query
query = "create a barchart showing average gross across genres"

# Execute the query
output = nl4dv_instance.analyze_query(query)

# Print the output
print(output)

Output

{
  "query": "create a barchart showing average gross across genres",
  "dataset": "https://raw.githubusercontent.com/nl4dv/nl4dv/master/examples/assets/data/movies-w-year.csv",
  "attributeMap": {"..."},
  "taskMap": {"..."},
  "visList": ["..."],
  "followUpQuery": false,
  "contextObj": null
}

"attributeMap" ▸

{
    "Worldwide Gross": {
        "name": "Worldwide Gross",
        "queryPhrase": ["gross"],
        "inferenceType": "explicit",
        "isAmbiguous": false,
        "ambiguity": []
    },
    "Genre": {
        "name": "Genre",
        "queryPhrase": ["genres"],
        "inferenceType": "explicit",
        "isAmbiguous": false,
        "ambiguity": []
    }
}

"taskMap" ▸

{
    "derived_value": [
        {
            "task": "derived_value",
            "queryPhrase": "average",
            "operator": "AVG",
            "values": [],
            "attributes": [
                "Worldwide Gross"
            ],
            "inferenceType": "explicit"
        }
    ]
}

"visList" ▸

[
    {
        "attributes": [
            "Worldwide Gross",
            "Genre"
        ],
        "queryPhrase": "barchart",
        "visType": "barchart",
        "tasks": [
            "derived_value"
        ],
        "inferenceType": "explicit",
        "vlSpec": {
            "$schema": "https://vega.github.io/schema/vega-lite/v6.json",
            "mark": {
                "type": "bar",
                "tooltip": true
            },
            "encoding": {
                "y": {
                    "field": "Worldwide Gross",
                    "type": "quantitative",
                    "aggregate": "mean",
                    "axis": {
                        "format": "s"
                    }
                },
                "x": {
                    "field": "Genre",
                    "type": "nominal",
                    "aggregate": null
                }
            },
            "transform": [],
            "data": {
                "url": "https://raw.githubusercontent.com/nl4dv/nl4dv/master/examples/assets/data/movies-w-year.csv",
                "format": {
                    "type": "csv"
                }
            }
        }
    }
]

Visualization (when the Vega-Lite spec is rendered) ▸

[Image: example output of v3]

Sample Code | v2 ('semantic-parsing') | Multi-Turn Dialogs


Multi-Turn Dialogs Sample Code

Sample Code | v1 ('semantic-parsing') | Single-Turn Utterances

from nl4dv import NL4DV
import os

# Initialize an instance of NL4DV
# ToDo: verify the path to the source data file. modify accordingly.
nl4dv_instance = NL4DV(data_url = os.path.join(".", "examples", "assets", "data", "movies-w-year.csv"))

# using Stanford Core NLP
# ToDo: verify the paths to the jars. modify accordingly.
dependency_parser_config = {
    "name": "corenlp",
    "model": os.path.join(".", "examples", "assets", "jars", "stanford-english-corenlp-2018-10-05-models.jar"),
    "parser": os.path.join(".", "examples", "assets", "jars", "stanford-parser.jar")
}

# using Stanford CoreNLPServer
# ToDo: verify the URL to the CoreNLPServer. modify accordingly.
# dependency_parser_config = {"name": "corenlp-server", "url": "http://localhost:9000"}

# using Spacy
# ToDo: ensure that the below spacy model is installed. if using another model, modify accordingly.
# dependency_parser_config = {"name": "spacy", "model": "en_core_web_sm", "parser": None}

# Set the Dependency Parser
nl4dv_instance.set_dependency_parser(config=dependency_parser_config)

# Define a query
query = "create a barchart showing average gross across genres"

# Execute the query
output = nl4dv_instance.analyze_query(query)

# Print the output
print(output)

Applications

Follow these steps to run the example applications:

  • Download or Clone the repository using

    git clone https://github.com/nl4dv/nl4dv.git

  • cd into the examples directory and create a new virtual environment.

    virtualenv --python=python3 venv

  • Activate it using:

    source venv/bin/activate (MacOSX/ Linux)

    venv\Scripts\activate.bat (Windows)

  • Install dependencies.

    python -m pip install -r requirements.txt

  • Manually install nl4dv in this virtual environment using one of the above instructions.

  • Run python app.py.

  • Open your favorite browser and go to http://localhost:7001 to see the demo application.


Showcase

For the Jupyter Notebook application,

  • cd into the examples directory.

  • Install and enable the Vega extension in the notebook using

    • jupyter nbextension install --sys-prefix --py vega

    • jupyter nbextension enable vega --py --sys-prefix

  • Launch the notebook using jupyter notebook.

    Make sure your Jupyter notebook uses a (virtual) environment that has NL4DV installed. Go to examples/applications/notebook and launch Single-Turn-Conversational-Interaction.ipynb for a demo of NL4DV's single-turn (standalone) conversational capabilities, or Multi-Turn-Conversational-Interaction.ipynb for its follow-up capabilities.

API Reference


Common for v1, v2, v3, v4.


Method Params Description
NL4DV()

data_url (str)

See set_data() for a description and example.
OR

data_value (list|dict|pandas DataFrame)

See set_data() for a description and example.

processing_mode (str, required). Possible values:

  • 'semantic-parsing' (for v1 or v2)
  • 'gpt' (for v3)
  • 'language-model' (for v4)

NL4DV constructor.

Returns:
nl4dv_instance.
nl4dv_instance.analyze_query()

query (str, required)

Example: "visualize mpg."

Analyzes the input query.

Returns: a JSON object comprising detected attributes, inferred analytic tasks, and relevant visualizations.

Note
The vlSpec property contains the Vega-Lite specification for the translated query without design aspects applied.

The vlSpec_design, design_checklist, design_successlist, and design_failurelist properties were added in v4. vlSpec_design contains the Vega-Lite specification for the translated query with design aspects applied; the other properties hold the corresponding metadata.

The other properties in the output should be self-explanatory. Read the v1, v2, v3, v4 papers for details.

nl4dv_instance.render_vis()

query (str, required)

Example: "visualize mpg."

Calls analyze_query() internally, but...

Returns: VegaLite() object of the best, most relevant visualization. This is useful to directly render a visualization in a Jupyter Notebook cell.
nl4dv_instance.set_data()

data_url (str: path to local file or url)

Example:"euro.csv" (sample) | "euro.tsv" (sample) | "euro.json" (sample)
OR

data_value (list|dict|pandas.DataFrame)

Example:
  • list:  
    [{"acceleration": 19,
    "salary": 1000},
    {"acceleration": 21,
    "salary": 1320}]
  • dict:  
    { "acceleration": [19, 21],
    "salary": [1000, 1320]}
  • DataFrame:  pandas.DataFrame() instance.
Sets the dataset to query against. Use this if you want to change the dataset after initializing the constructor.
nl4dv_instance.get_metadata() - Returns: a JSON object consisting of the dataset metadata (e.g., attributes and their inferred data types) that is computed when NL4DV is initialized with a dataset.
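
The list and dict forms of data_value describe the same table in record-oriented and column-oriented layouts, respectively. A quick conversion between the two (plain Python, no NL4DV required):

```python
# Record-oriented (list of dicts) vs column-oriented (dict of lists)
records = [{"acceleration": 19, "salary": 1000},
           {"acceleration": 21, "salary": 1320}]
columns = {"acceleration": [19, 21], "salary": [1000, 1320]}

# Convert records to columns and confirm both layouts carry the same table
as_columns = {key: [row[key] for row in records] for key in records[0]}
print(as_columns == columns)  # True
```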



v4  (requires "processing_mode" set to "language-model")

Method Params Description
NL4DV()

lm_config (dict)

Example:
{
    "model": "gpt-4o",
    "environ_var_name": "OPENAI_API_KEY",
    "api_key": "sk-WMoBVu...",
    "api_base": None
}
Check the LiteLLM website to learn more about the supported models and their environ_var_name, and accordingly generate an api_key or use an api_base. Also note that not all language models support all file-upload types (e.g., PDF, PNG); use the feature accordingly.

design_config (list, optional)

Example (PNG | JPEG | WEBP):
[{
    "type": "image_url",
    "image_url": { "url": "https://i.ibb.co/LXk7QMvL/17.png"}
},{
    "type": "text",
    "text": "Apply all design aspects from this chart."
}]

Example (PDF | SVG):
filename = "examples/assets/example-charts/32.pdf"
with open(filename, "rb") as f:
    file_content = f.read()
    base64_content = base64.b64encode(file_content).decode('utf-8')
    file_data = f"data:application/pdf;base64,{base64_content}"

    design_config = [
        {
            "type": "file",
            "file": {
                "filename": filename,
                "file_data": file_data,
            },
        },{
            "type": "text",
            "text": "Apply all design aspects from this chart."
        }]



v3  (requires "processing_mode" set to "gpt")

Method Params Description
NL4DV()

gpt_api_key (str)

Example: "sk-WMoBVu..."



v2 only  (requires "processing_mode" set to "semantic-parsing")

Method Params Description
NL4DV()

explicit_followup_keywords (dict)

Example:
{"put": [("addition", "add")],
"add": [("addition", "add")]}
Overrides the default explicit_followup_keywords map. The dictionary must be formatted as follows: each key is the keyword string, and each value is a list containing exactly one 2-tuple. The first element of the 2-tuple is the noun form of the follow-up operation and MUST be one of (addition, removal, replacement); the second element is the verb form (add, remove, replace).

implicit_followup_keywords (dict)

Example:
{"also": [("also", "add")],
"as well": [("aswell", "add")]}
Overrides the default implicit_followup_keyword_map map. The dictionary must be formatted as follows: each key is the keyword string, and each value is a list containing exactly one 2-tuple. The first element of the 2-tuple is the token concatenated without spaces, and the second element is the verb form of the follow-up operation (add, remove, replace).
nl4dv_instance.analyze_query()

dialog (bool or str='auto', optional)

dialog_id (str, optional)

query_id (str, optional)

dialog=True means the given query is a follow-up query; dialog=False means it is a new, standalone query; dialog='auto' means the system should automatically determine whether the input query is a follow-up (in which case you must not pass a dialog_id or a query_id).

If a dialog_id and query_id are specified, then the user's intent is to follow-up on the specific query at the (dialog_id -> query_id) node in the conversation graph. Read the v2 paper to know more about dialog, dialog_id, and query_id.

nl4dv_instance.update_query()

ambiguity_obj (dict)

Example:

{
"dialog_id": "0", "query_id": "0",
"attribute": {"medals": "Gold Medals"},
"value": {"hockey": "Ice Hockey", "skating": "Speed Skating"}
}
                                                
Resolve attribute-level and value-level ambiguities by mapping the corresponding keywords (phrases) in the query to the correct entities.
nl4dv_instance.get_dialogs()

dialog_id (str)

query_id (str)

Get a specific dialog (if dialog_id is provided), a specific query in a dialog (if both dialog_id and query_id are provided), or all dialogs (if neither is provided). Returns the requested entities as JSON specifications.
nl4dv_instance.delete_dialogs(dialog_id=None, query_id=None)

dialog_id (str)

query_id (str)

Delete a specific dialog (if dialog_id is provided), a specific query in a dialog (if both dialog_id and query_id are provided), or all dialogs (if neither is provided), practically resetting the corresponding NL4DV instance. Returns the deleted entities as JSON specifications.
nl4dv_instance.undo() Delete the most recently processed query; returns the deleted entity as a JSON specification.
nl4dv_instance.set_explicit_followup_keywords()

explicit_followup_keyword_map (dict)

nl4dv_instance.set_implicit_followup_keywords()

implicit_followup_keyword_map (dict)
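
A sketch of validating the keyword-map shape described above (each value: a list containing exactly one 2-tuple). The validator function is hypothetical, not part of the NL4DV API:

```python
# Example map in the documented shape: keyword -> [ (noun-form, verb-form) ]
explicit_followup_keywords = {
    "put": [("addition", "add")],
    "add": [("addition", "add")],
}

def is_valid_keyword_map(keyword_map):
    """Check that every value is a list holding exactly one 2-tuple."""
    return all(
        isinstance(value, list) and len(value) == 1
        and isinstance(value[0], tuple) and len(value[0]) == 2
        for value in keyword_map.values()
    )

print(is_valid_keyword_map(explicit_followup_keywords))  # True
```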




v2 and v1  (requires "processing_mode" set to "semantic-parsing")

Method Params Description
NL4DV()

alias_url (str, optional)

See set_alias_map() for a description and example.

alias_value (dict, optional)

See set_alias_map() for a description and example.

label_attribute (str, optional)

See set_label_attribute() for a description and example.

ignore_words (list, optional)

See set_ignore_words() for a description and example.

reserve_words (list, optional)

See set_reserve_words() for a description and example.

dependency_parser_config (dict, required)

See set_dependency_parser() for a description and example.

thresholds (dict, optional)

See set_thresholds() for a description and example.

importance_scores (dict, optional)

See set_importance_scores() for a description and example.

attribute_datatype (dict, optional)

See set_attribute_datatype() for a description and example.
nl4dv_instance.analyze_query()

debug (bool)

Print logs to help debug the query translation process.

verbose (bool)

Include metadata (e.g., confidence scores) in the output JSON.
nl4dv_instance.set_alias_map()

alias_url (str: path to local file or url)

Example: "aliases/euro.json" (sample)
OR

alias_value (dict)

Example: "aliases/cars.json" where the json is like
{"MPG": ["miles per gallon"],
"Horsepower": ["hp"]}
Sets the alias values.
nl4dv_instance.set_thresholds()

thresholds (dict)

Example:
{"synonymity": 95,
"string_similarity": 85}
Overrides the default thresholds such as string matching.
nl4dv_instance.set_importance_scores()

scores (dict)

Example:
{'attribute': {
    'attribute_exact_match': 1,
    'attribute_similarity_match': 0.9,
    'attribute_alias_exact_match': 0.8,
    'attribute_alias_similarity_match': 0.75,
    'attribute_synonym_match': 0.5,
    'attribute_domain_value_match': 0.5,
},
'task': {
    'explicit': 1,
    'implicit': 0.5,
},
'vis': {
    'explicit': 1
}}
Sets the scoring weights used to detect attributes, tasks, and visualizations.
nl4dv_instance.set_attribute_datatype()

attr_type_obj (dict)

Example:
{"Year": "T",    # Temporal
"Rank": "O",    # Ordinal
"Salary": "Q",  # Quantitative
"Gender": "N"}  # Nominal
Override the attribute datatypes that are detected by NL4DV.
nl4dv_instance.set_dependency_parser()

config (dict)

Example:
{"name": "corenlp-server",
"url": "http://192.168.99.102:9000"}
Set the dependency parser to be used in the Tasks detector module.
nl4dv_instance.set_reserve_words()

reserve_words (list)

Example: ["A"] # "A", although an article (like 'a'/'an'/'the'), should be retained in a grades dataset.
Set the custom STOPWORDS that should NOT be removed from the query because they may be meaningful in the domain.
nl4dv_instance.set_ignore_words()

ignore_words (list)

Example:["movie"]
Set the words that should be IGNORED in the query, i.e., that should NOT lead to the detection of attributes and tasks.
nl4dv_instance.set_label_attribute()

label_attribute (str)

Example: "Model" # In "Correlate horsepower and MPG for sports car models", "models" refers to the label attribute "Model" and should NOT be detected as a third explicit attribute since two explicit attributes are already present.
Set the dataset's label attribute, i.e., the attribute whose values identify individual data points (and which should therefore not itself be detected as an explicit attribute).
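
For intuition, a sketch of how a string_similarity threshold (e.g., 85 out of 100, as in set_thresholds() above) might gate fuzzy attribute matches. This uses Python's difflib for illustration and is not NL4DV's internal implementation:

```python
from difflib import SequenceMatcher

def similar_enough(a, b, threshold=85):
    """Return True when the similarity of a and b (scaled 0-100) meets the threshold."""
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100
    return score >= threshold

print(similar_enough("Horsepower", "horse power"))  # True (score ~95)
print(similar_enough("Horsepower", "salary"))       # False
```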

FAQ | Message from Creators

As we plan new features and improvements to the toolkit, we recommend that users/developers be aware of the following:

  • How is "data" returned in the output JSON?
    If the dataset was input via the "data_value" parameter, the Vega-Lite spec (vlSpec) will NOT include the dataset values (under the "data" > "values" property), in order to minimize the storage footprint of the output JSON; you are expected to supply these values to render the visualization. If the dataset was input via the "data_url" parameter, the vlSpec will include the data configuration by default (under the "data" > "url" property).

  • Dependency parser output variations.
    The dependency trees returned by CoreNLP, CoreNLP Server, and Spacy sometimes differ. The current parser logic was developed for CoreNLP and hence works best with it; we are upgrading the rules to work consistently across all dependency parsers.

  • Attribute data types.
    Verify the attribute types (e.g., nominal, temporal) that are detected by NL4DV and override them if they are incorrect as they will most likely lead to erroneous visualizations. The current attribute datatype detection logic is based on heuristics and we are currently working towards a major improvement that semantically infers the data type from both, the attribute's name and its value.

  • Temporal attributes.
    NL4DV relies on regular expressions to detect common date formats (listed below in order of priority in case of conflicts).

    Supported Date Formats (format codes per the 1989 C standard) and Examples

    %m*%d*%Y or %m*%d*%y where * ∈ {. - /}

    • 12.24.2019
    • 12/24/2019
    • 1-24-19
    • 09.24.20

    %Y*%m*%d or %y*%m*%d where * ∈ {. - /}

    • 2019.12.24
    • 2019/12/24
    • 19-1-24
    • 20.09.24

    %d*%m*%Y or %d*%m*%y where * ∈ {. - /}

    • 24.12.2019
    • 24/12/2019
    • 24-1-19
    • 24.09.20

    %d*%b*%Y or %d*%B*%Y or %d*%b*%y or %d*%B*%y where * ∈ {. - / space}

    • 8-January-2019
    • 31 Dec 19
    • 1/Jan/19

    %d*%b or %d*%B where * ∈ {. - / space}

    • 8-January
    • 31 Dec
    • 1/Jan

    %b*%d*%Y or %B*%d*%Y or %b*%d*%y or %B*%d*%y where * ∈ {. - / space}

    • January-8-2019
    • Dec 31 19
    • Jan/1/19

    %Y

    Only the following series:
    • 18XX (e.g., 1801)
    • 19XX (e.g., 1929)
    • 20XX (e.g., 2010)

  • Filter task.
    NL4DV applies the filter task by matching the condition against each data point but does not encode the involved attributes in the visualization. This was a design decision taken to avoid recommending a complex visualization due to too many encoded attributes.

  • Thresholds and Match scores.
    These are currently set based on heuristics and prior research works; we encourage users/developers to modify them to suit their specific requirements.
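
For illustration, the first (highest-priority) temporal pattern from the table above, %m*%d*%Y or %m*%d*%y with * ∈ {. - /}, can be sketched as a regular expression; the backreference \2 forces the two separators to match. This is an illustrative sketch, not NL4DV's internal pattern:

```python
import re

# %m*%d*%Y or %m*%d*%y where the separator * is one of ".", "-", "/"
MDY = re.compile(r"^(0?[1-9]|1[0-2])([./-])(0?[1-9]|[12]\d|3[01])\2(\d{4}|\d{2})$")

# Mixed separators and year-first strings do not match this pattern
for s in ["12.24.2019", "12/24/2019", "1-24-19", "2019-12-24", "12.24/2019"]:
    print(s, bool(MDY.match(s)))
```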

Build

NL4DV can be installed as a Python package and imported into your own awesome applications!

  1. NL4DV is written in Python 3. Please ensure you have a Python 3 environment already installed.

  2. Clone this repository (master branch) and enter (`cd`) into it.

  3. Create a new virtual environment.

    virtualenv --python=python3 venv

  4. Activate it using:

    source venv/bin/activate (MacOSX/ Linux)

    venv\Scripts\activate.bat (Windows)

  5. Install dependencies.

    python -m pip install -r requirements.txt

  6. Make your changes.
  7. Bump up the version in setup.py and create a Python distributable.

    python setup.py sdist

  8. This will create a new file, `nl4dv-*.*.*.tar.gz`, inside the dist directory.

  9. Install the above file in your Python environment using:

    python -m pip install <PATH-TO-nl4dv-*.*.*.tar.gz>

  10. Verify by opening your Python console and importing it:

    $ python
    >>> from nl4dv import NL4DV

  11. Enjoy, NL4DV is now available for use as a Python package!

Docker | only v1

NL4DV v1 is containerized as a Docker image. The image comes pre-installed with NL4DV, Spacy, Stanford CoreNLP, and a few datasets, along with a demo web application. Install it using:

docker pull arpitnarechania/nl4dv

Note: This mode of installation does not require the Post Installation steps. For more information, follow the detailed instructions in the GitHub repository (nl4dv-docker).

Credits

NL4DV is a collaborative project originally created by the Georgia Tech Visualization Lab at the Georgia Institute of Technology, with subsequent contributions from the Ribarsky Center for Visual Analytics at UNC Charlotte and the DataVisards Group at The Hong Kong University of Science and Technology.

We thank the members of the Georgia Tech Visualization Lab for their support and constructive feedback. We also thank @vijaynyaya for the inspiration to support multiple language model providers.

Citations

2025 (coming soon)

@misc{ji2025nl4dvstylist,
    title={{NL4DV-Stylist: Styling Data Visualizations Using Natural Language and Example Charts}},
    author={{Ji}, Tenghao and {Narechania}, Arpit},
    year={2025}
}

2024 IEEE VIS NLVIZ Workshop Track

@misc{sah2024generatinganalyticspecificationsdata,
    title={Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models},
    author={{Sah}, Subham and {Mitra}, Rishab and {Narechania}, Arpit and {Endert}, Alex and {Stasko}, John and {Dou}, Wenwen},
    year={2024},
    eprint={2408.13391},
    archivePrefix={arXiv},
    primaryClass={cs.HC},
    url={https://arxiv.org/abs/2408.13391},
    howpublished = {Presented at NLVIZ Workshop, IEEE VIS 2024},
}

2022 IEEE VIS Conference Short Paper Track

@inproceedings{mitra2022conversationalinteraction,
  title = {Facilitating Conversational Interaction in Natural Language Interfaces for Visualization},
  author = {{Mitra}, Rishab and {Narechania}, Arpit and {Endert}, Alex and {Stasko}, John},
  booktitle={2022 IEEE Visualization Conference (VIS)},
  url = {https://doi.org/10.48550/arXiv.2207.00189},
  doi = {10.48550/arXiv.2207.00189},
  year = {2022},
  publisher = {IEEE}
}

2021 IEEE TVCG Journal Full Paper (Proceedings of the 2020 IEEE VIS Conference)

@article{narechania2021nl4dv,
title = {{NL4DV}: A {Toolkit} for Generating {Analytic Specifications} for {Data Visualization} from {Natural Language} Queries},
shorttitle = {{NL4DV}},
author = {{Narechania}, Arpit and {Srinivasan}, Arjun and {Stasko}, John},
journal = {IEEE Transactions on Visualization and Computer Graphics},
doi = {10.1109/TVCG.2020.3030378},
year = {2021},
publisher = {IEEE}
}

Contact Us

If you have any questions, feel free to open a GitHub issue or contact Arpit Narechania.

License

The software is available under the MIT License.