Naval Procedure Analysis: Exploring the NavProc 1.0 Corpus

Published On Mon Sep 02 2024
Annotated Procedural Texts in the Naval Domain

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15048)

Introduction

In this work, we introduce the NavProc 1.0 Corpus, a medium-scale annotated corpus of procedural texts from the naval domain, as a first step toward modeling procedural structures derived from real-world data sources. We have rigorously annotated frame semantics across verbal, nominal, and adjectival frames. We have also annotated 21 distinct types of semantic markers and structural links between textual elements, which together yield a text-focused graph of semantic elements. This graph can be used to derive a more complex procedure structure for personnel training, simulation, or collaborative procedure execution.
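To make the idea of a text-focused graph concrete, the following minimal Python sketch models annotated spans as nodes and typed links as directed edges. It is an illustration only, not the corpus's actual schema or file format: the class names, the labels (`Frame:Opening`, `Role:Object`), the link types (`has_role`, `precedes`), and the example sentence are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A labeled stretch of text, given as token offsets (end exclusive)."""
    span_id: str
    start: int
    end: int
    label: str  # hypothetical label, e.g. a frame name or a frame role


@dataclass
class SpanGraph:
    """Annotated spans as nodes, typed directed links as edges."""
    spans: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source_id, link_type, target_id)

    def add_span(self, span):
        self.spans[span.span_id] = span

    def add_link(self, source_id, link_type, target_id):
        # Refuse links whose endpoints have not been registered as spans.
        if source_id not in self.spans or target_id not in self.spans:
            raise KeyError("both endpoints must be added as spans first")
        self.links.append((source_id, link_type, target_id))

    def neighbors(self, span_id, link_type=None):
        """Spans reachable from span_id, optionally filtered by link type."""
        return [
            self.spans[target]
            for source, kind, target in self.links
            if source == span_id and (link_type is None or kind == link_type)
        ]


# Invented example sentence; labels and link types are assumptions.
tokens = "Open the valve before starting the pump".split()
graph = SpanGraph()
graph.add_span(Span("s1", 0, 3, "Frame:Opening"))   # "Open the valve"
graph.add_span(Span("s2", 1, 3, "Role:Object"))     # "the valve"
graph.add_span(Span("s3", 4, 7, "Frame:Starting"))  # "starting the pump"
graph.add_link("s1", "has_role", "s2")
graph.add_link("s1", "precedes", "s3")

# Walk the ordering links out of the first frame and print the covered text.
for nxt in graph.neighbors("s1", link_type="precedes"):
    print(" ".join(tokens[nxt.start:nxt.end]))
```

Keeping spans and links separate, rather than nesting roles inside frames, is one way to let overlapping spans coexist and to let a later step derive a richer procedure structure from the same graph.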

Annotation Effort Details

This annotation effort encompasses 158 procedural units comprising 2,316 sentences, 44,459 tokens, and 48,137 distinct span annotations. Additionally, we report LLM-based extraction scores as a baseline for future research using this dataset.
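For a rough sense of scale, the reported totals can be turned into per-unit and per-sentence averages. The short Python sketch below uses only the four figures given above; the variable names are ours, and the results are simple means that say nothing about how annotations are distributed across individual units.

```python
# Corpus totals as reported above.
units = 158
sentences = 2316
tokens = 44459
span_annotations = 48137

# Simple averages derived from the totals.
sentences_per_unit = sentences / units
tokens_per_unit = tokens / units
tokens_per_sentence = tokens / sentences
spans_per_token = span_annotations / tokens

print(f"sentences per unit:   {sentences_per_unit:.1f}")   # 14.7
print(f"tokens per unit:      {tokens_per_unit:.1f}")      # 281.4
print(f"tokens per sentence:  {tokens_per_sentence:.1f}")  # 19.2
print(f"spans per token:      {spans_per_token:.2f}")      # 1.08
```

A ratio slightly above one span annotation per token is consistent with spans that overlap or nest, although the totals alone do not show how the spans are laid out.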

Curation Process

Curation was carried out using LCC’s ATESSA web-based interface for document processing and management. Entities are included implicitly as frame roles and other textual spans. Noun adjuncts have been recognized as a semantically ambiguous structure since the NomBank project.

Further Details

Note that these procedural units vary in length: some are only a few lines long, while others span multiple pages and contain subunits. More information about the project can be found on GitHub.