Open Data Exploration Dataset

Summary

26 chat-based student conversation transcripts about the content of the two Austrian open data portals:

 * data.gv.at (13 conversations)
 * opendataportal.at (13 conversations)

16 conversations included elements of exploration, such as listing available categories, dataset samples, and facets, while the other 10 were purely search-based interactions focused on answering specific questions directly. 24 transcripts are samples of a successful interaction outcome, i.e. they resulted in a satisfied information need and positive user feedback; the remaining 2 had an unsuccessful interaction outcome.

Annotations

The transcripts are manually annotated with dialog act types (intents/responses) and a speaker identifier (agent/user). Newline separators were inserted within a single message to mark spans of text with different message types.

Annotation template: message_type>speaker_turn>message e.g. greeting>E>Hi?
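A minimal sketch of reading one annotated line in Python, assuming each line follows the template above exactly, with `>` separating the three fields. The names `AnnotatedSpan` and `parse_annotation` are hypothetical and are not taken from `transcripts_parser.py`.

```python
from typing import NamedTuple


class AnnotatedSpan(NamedTuple):
    message_type: str  # dialog act, e.g. "greeting"
    speaker: str       # speaker turn marker as written in the transcript
    message: str       # the annotated text span


def parse_annotation(line: str) -> AnnotatedSpan:
    """Split one annotated line into its three '>'-separated fields."""
    # maxsplit=2 keeps any further '>' characters inside the message text.
    parts = line.rstrip("\n").split(">", 2)
    if len(parts) != 3:
        raise ValueError(f"not an annotated line: {line!r}")
    message_type, speaker, message = (p.strip() for p in parts)
    return AnnotatedSpan(message_type, speaker, message)


# The example line from the template above.
print(parse_annotation("greeting>E>Hi?"))
```

Splitting at most twice is a deliberate choice: a message that itself contains `>` stays intact in the third field.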

We differentiate between 15 different dialog act types, which correspond to the basic operations available for data exploration.

Dialog acts used by both Seeker and Intermediary:

 * greeting(), indicates common start of the conversation, e.g. "hello"
 * confirm(), explicitly confirms the direction for exploration
 * verify(), prompts to confirm the direction for exploration or the results
 * success(), indicates end of the conversation with a successful outcome
 * prompt(link), suggests/asks for a direct link to the dataset

Dialog acts of an Intermediary:

 * prompt(keywords), indicates a general information request
 * list(keywords), indicates available options on different levels (attributes, items)
 * bool(data), reports existence of the requested set of items (datasets/columns/attributes)
 * top(keywords), reports the top (most frequent) items (rows/attributes)
 * count(data), reports the count of items (rows/datasets/attributes)
 * link(dataset), reports the direct link to a dataset

Dialog acts of a Seeker:

 * question(data), indicates a general information request, e.g. "what data do you have?"
 * set(keywords), indicates the choice of a specific direction for exploration, e.g. "have you got smth on population statistics?"
 * reject(), explicitly rejects the direction for exploration
 * more(), indicates a request for more items from the same equivalence class
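The three lists above can be written down as a small lookup table, which also makes it easy to check that they add up to the 15 dialog act types. This is only an illustrative sketch: the dictionary layout and the helper `may_use` are assumptions, not part of the repository.

```python
# Dialog act inventory, copied from the lists above and grouped by role.
DIALOG_ACTS = {
    "both": [
        "greeting()", "confirm()", "verify()", "success()", "prompt(link)",
    ],
    "intermediary": [
        "prompt(keywords)", "list(keywords)", "bool(data)",
        "top(keywords)", "count(data)", "link(dataset)",
    ],
    "seeker": [
        "question(data)", "set(keywords)", "reject()", "more()",
    ],
}


def may_use(role: str, act: str) -> bool:
    """Return True if `role` ('seeker' or 'intermediary') may use `act`."""
    if role not in ("seeker", "intermediary"):
        raise ValueError(f"unknown role: {role!r}")
    return act in DIALOG_ACTS["both"] or act in DIALOG_ACTS[role]


total = sum(len(acts) for acts in DIALOG_ACTS.values())
print(total)
```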

Dialog turn separators:

 * A for the agent utterance
 * U for the user utterance

[[]] indicates a concept span

Codes for span annotations:

[[]]H* - greeting;
[[]]G* - question about the data availability/need;
[[]]F* - attribute (facet) as a selection option for the next exploration direction, e.g. category;
[[]]E* - entity option (facet value) for exploration, e.g. finance;
[[]]Q* - cardinality of the corresponding item set;
[[]]R* - unique identifier of an item, e.g. title or link to the dataset;
[[]]+* - positive feedback;
[[]]-* - negative feedback.
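A rough sketch of extracting concept spans with a regular expression. It rests on an assumption that the transcripts should be checked against: the span text sits inside the double brackets and the one-character code follows immediately, as in `[[Hi]]H`. The meaning of the trailing `*` in the code list is not specified here, so the pattern ignores it. The sample utterance is invented for illustration, and `extract_concepts` is a hypothetical helper.

```python
import re

# Span codes and their meanings, copied from the list above.
SPAN_CODES = {
    "H": "greeting",
    "G": "question about the data availability/need",
    "F": "attribute (facet)",
    "E": "entity option (facet value)",
    "Q": "cardinality of the corresponding item set",
    "R": "unique identifier of an item",
    "+": "positive feedback",
    "-": "negative feedback",
}

# ASSUMPTION: a span looks like [[text]]C, where C is one of the codes above.
SPAN_RE = re.compile(r"\[\[(.*?)\]\]([HGFEQR+-])")


def extract_concepts(utterance: str) -> list[tuple[str, str]]:
    """Return (span text, code) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in SPAN_RE.finditer(utterance)]


# Invented utterance, not taken from the dataset.
for text, code in extract_concepts("[[Hi]]H, any data on [[population]]E?"):
    print(text, "=>", SPAN_CODES[code])
```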

Average number of concepts per utterance: 2

Maximum number of concepts per utterance: 16

Process models

Produced with ProM: Inductive Visual Miner and Declarative model?

License

All work in this repository is under the MIT license

Acknowledgement

If you find this work useful, feel free to cite us:

@article{DBLP:journals/corr/abs-2012-03704,
  author       = {Svitlana Vakulenko and
                  Vadim Savenkov and
                  Maarten de Rijke},
  title        = {Conversational Browsing},
  journal      = {CoRR},
  volume       = {abs/2012.03704},
  year         = {2020},
  url          = {https://arxiv.org/abs/2012.03704},
}

