RDF to CSV Conversion Tools

Overview

This package provides two scripts for converting RDF files to CSV format:

rdf-to-csv.ts - Converts individual RDF files to CSV
rdf-folder-to-csv.ts - Batch converts entire directory structures

Both tools support intelligent namespace prefix handling, multilingual properties with repeatability based on DCTAP profiles, and multiple RDF formats including Turtle, N-Triples, RDF/XML, and JSON-LD.

Installation
Single File Converter
Batch Conversion Tool
Supported RDF Formats
Features
Troubleshooting
Extending the Tool

Installation

The scripts are located in /scripts/ and require the following dependencies:

yarn add n3 csv-parse csv-stringify commander @types/n3 jsonld @types/jsonld rdfxml-streaming-parser glob @types/glob chalk ora
yarn add -D tsx  # For running TypeScript files directly

Single File Converter

Basic Usage

yarn rdf:to-csv <rdf-file> [options]

Or directly:

tsx scripts/rdf-to-csv.ts <rdf-file> [options]

Command-Line Options

-p, --profile <dctap-file>: Path to DCTAP profile CSV file (optional)
-o, --output <output-file>: Output CSV file path (default: stdout)
-f, --format <format>: Force RDF format (auto-detected by default)
-h, --help: Display help information
-V, --version: Display version information

Examples

Basic conversion to stdout:
```
yarn rdf:to-csv /path/to/file.ttl
```

Save to file:

yarn rdf:to-csv /path/to/file.ttl -o output.csv

Use DCTAP profile:

yarn rdf:to-csv /path/to/file.ttl -p profile.csv -o output.csv

Real-world examples:

# Turtle format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/ttl/ns/isbd/elements.ttl -o isbd-elements.csv

# N-Triples format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/nt/ns/isbd/elements.nt -o isbd-elements.csv

# RDF/XML format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/xml/ns/isbd/elements.xml -o isbd-elements.csv

# JSON-LD format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/jsonld/ns/isbd/elements.jsonld -o isbd-elements.csv

# Force format detection
yarn rdf:to-csv data.rdf -f application/rdf+xml -o output.csv

Batch Conversion Tool

The rdf-folder-to-csv.ts script recursively converts all RDF files in a directory structure to CSV format, preserving the folder hierarchy.

Usage

yarn rdf:folder-to-csv <source-dir> [output-dir] [options]

Command-Line Options

-p, --profile <dctap-file>: Path to DCTAP profile CSV file
-d, --dry-run: Show what would be converted without actually converting
-v, --verbose: Show detailed progress information
-e, --extensions <exts>: Comma-separated list of file extensions to process
-h, --help: Display help information
-V, --version: Display version information

Default Output Directory

The tool intelligently determines the default output directory:

If source directory ends with /ttl, it replaces it with /csv
Otherwise, it adds _csv suffix to the source directory name

Examples

Basic batch conversion:

# Converts /path/to/ttl/* to /path/to/csv/*
yarn rdf:folder-to-csv /path/to/ttl

Specify output directory:

yarn rdf:folder-to-csv /path/to/rdf /path/to/output

Dry run to preview:

yarn rdf:folder-to-csv /path/to/ttl --dry-run

Verbose output with progress:

yarn rdf:folder-to-csv /path/to/ttl --verbose

With DCTAP profile:

yarn rdf:folder-to-csv /path/to/ttl -p profile.csv

Custom file extensions:

# Only process .ttl and .rdf files
yarn rdf:folder-to-csv /path/to/mixed -e ".ttl,.rdf"

Real-world example:

# Convert ISBD vocabularies
yarn rdf:folder-to-csv /Users/jonphipps/Code/ISBDM/static/vocabs/ttl
# Output: /Users/jonphipps/Code/ISBDM/static/vocabs/csv/

Features

Recursive processing: Finds all RDF files in subdirectories
Structure preservation: Maintains the same folder hierarchy in output
Progress reporting: Shows real-time conversion progress
Error handling: Continues processing even if individual files fail
Summary statistics: Reports success/failure counts
Flexible extensions: Configurable file type filtering

Supported RDF Formats

The tool automatically detects RDF format based on file extension:

Extension	Format	MIME Type	Support Level
`.ttl`, `.turtle`	Turtle	`text/turtle`	Full
`.n3`	Notation3	`text/n3`	Full
`.nt`, `.ntriples`	N-Triples	`application/n-triples`	Full
`.rdf`, `.xml`, `.owl`	RDF/XML	`application/rdf+xml`	Full
`.jsonld`, `.json`	JSON-LD	`application/ld+json`	Full*

*JSON-LD files that reference remote contexts will use local fallback context files if available. The tool includes context files for common vocabularies in static/data/contexts/.

Features

1. Dynamic Namespace Prefix Extraction

The tool automatically extracts namespace prefixes from:

@prefix declarations in the RDF file
Intelligent inference from URI patterns in the data

2. Intelligent Namespace Inference

The script uses a tree-based algorithm to infer namespace prefixes when not explicitly declared:

Builds a URI segment tree from all URIs in the RDF data
Identifies namespace boundaries by analyzing:
- Branching points in the URI structure
- Frequency of URI patterns
- Common namespace indicators (ns, vocab, onto, schema)
Prioritizes namespaces based on depth and frequency
Generates readable prefixes from URI segments

Example inference:

http://iflastandards.info/ns/isbd/ → isbd:
http://iflastandards.info/ns/isbd/elements/P1001 → isbd:elements/P1001
http://iflastandards.info/ns/isbd/unc/elements/P1001 → isbd:unc/elements/P1001

3. Multilingual Property Support

Properties with language tags are handled with separate columns:

skos:definition@en - English definition
skos:definition@es - Spanish definition
Properties without language tags have no suffix

4. Repeatable Property Handling

Repeatable properties (as defined in DCTAP or defaults) use indexed columns:

skos:definition@en[0] - First English definition
skos:definition@en[1] - Second English definition
rdfs:label@es[0] - First Spanish label
rdfs:label@es[1] - Second Spanish label

Default repeatable properties:

skos:definition
skos:scopeNote
rdfs:label

5. CSV Output Format

The CSV output includes:

First column: Resource URI (as CURIE if possible)
Subsequent columns: Property values with headers indicating property, language, and index
All URIs converted to CURIEs where possible for readability

DCTAP Profile Format

The DCTAP profile should be a CSV file with at least these columns:

propertyID	repeatable
skos:definition	true
rdfs:label	true
dcterms:title	false

The repeatable column accepts: true, yes, 1 (case-insensitive) for repeatable properties.

Example DCTAP profile:

shapeID,shapeLabel,propertyID,propertyLabel,mandatory,repeatable,valueDataType,valueShape,valueConstraint,valueConstraintType,note
,,skos:definition,Definition,false,true,,,,,
,,rdfs:label,Label,true,true,,,,,
,,dcterms:created,Date Created,false,false,xsd:date,,,,

Algorithm Details

Namespace Inference Algorithm

The namespace inference works in multiple phases:

Tree Construction Phase:
- Parse all URIs into a hierarchical tree structure
- Track frequency of each path segment
Candidate Identification Phase:
- Identify potential namespace boundaries based on:
  - Multiple child branches (indicating a common parent)
  - High frequency (used by many resources)
  - Namespace indicators in path segments
  - Reasonable depth (2-5 segments)
Scoring and Selection Phase:
- Score candidates: score = frequency × log(depth + 1)
- Higher scores indicate better namespace candidates
- Avoid overlapping namespaces
Prefix Generation Phase:
- Extract meaningful prefix from last segment
- Ensure no prefix conflicts

CSV Generation Process

Resource Extraction:
- Identify all unique subject URIs
- Group properties by subject
Header Generation:
- Count all property-language combinations
- Determine maximum occurrences for repeatable properties
- Generate indexed headers for repeatable properties
Row Generation:
- Convert resource URIs to CURIEs
- Map property values to appropriate columns
- Handle language tags and indices correctly

Output Example

Given this RDF:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.org/concept/123>
    rdfs:label "Example"@en ;
    rdfs:label "Ejemplo"@es ;
    skos:definition "First definition"@en ;
    skos:definition "Second definition"@en ;
    skos:definition "Definición"@es .

The CSV output would be:

uri,rdfs:label@en,rdfs:label@es,skos:definition@en[0],skos:definition@en[1],skos:definition@es
ex:concept/123,Example,Ejemplo,First definition,Second definition,Definición

Troubleshooting

Common Issues

Missing prefixes in output:
- Check if the RDF file contains @prefix declarations
- The inference algorithm needs sufficient data to detect patterns
- Manually add prefixes to the DEFAULT_PREFIXES in the script if needed
Properties not marked as repeatable:
- Provide a DCTAP profile with the -p option
- Or modify the default repeatable properties in the script
Memory issues with large files:
- The script loads the entire RDF file into memory
- Consider splitting large files or increasing Node.js memory limit
JSON-LD remote context errors:
- Ensure local context files exist in static/data/contexts/
- Check that the context file name matches the expected pattern
- Verify the local context file contains proper JSON-LD context definitions

Debugging

Run with stderr to see inference information:

yarn rdf:to-csv input.ttl -o output.csv 2> debug.log

The debug output shows:

Extracted prefixes from the RDF file
Inferred namespace prefixes
Number of resources found

Extending the Tool

Adding Additional RDF Format Support

To add support for new RDF formats:

Update format detection in detectRdfFormat():

const formatMap: Record<string, string> = {
  '.newext': 'application/new-format',
  // ... existing formats
};

Add parser support:
- For formats supported by N3.js: Update the format parameter
- For other formats: Add a new parser library and handle in parseRdfFile()
Handle format-specific requirements:
- Namespace extraction
- Special parsing considerations
- Error handling

JSON-LD Context Files

The tool automatically handles JSON-LD files that reference remote contexts by using local fallback context files. When a JSON-LD file references a remote context URL that is unavailable, the tool will:

Detect the remote context URL (e.g., http://metadataregistry.org/uri/schema/Contexts/elements_langmap.jsonld)
Load the corresponding local context file from static/data/contexts/
Use the local context to properly parse the JSON-LD data

Available local context files:

elements_langmap.jsonld - For RDFS/OWL elements (metadataregistry.org element sets)
concepts_langmap.jsonld - For SKOS concepts (metadataregistry.org concept schemes)

These context files define the necessary mappings for custom properties used in metadata registry JSON-LD files, ensuring proper conversion even when remote contexts are unavailable.

Custom Namespace Inference

To customize namespace detection:

Modify the namespaceIndicators array
Adjust scoring algorithm in inferNamespaceFromURIs()
Change depth limits or frequency thresholds

Additional Output Formats

The tool could be extended to support:

JSON-LD output
Direct database insertion
Custom delimiters or quoting styles

Overview​

Table of Contents​

Installation​

Single File Converter​

Basic Usage​

Command-Line Options​

Examples​

Batch Conversion Tool​

Usage​

Command-Line Options​

Default Output Directory​

Examples​

Features​

Supported RDF Formats​

Features​

1. Dynamic Namespace Prefix Extraction​

2. Intelligent Namespace Inference​

3. Multilingual Property Support​

4. Repeatable Property Handling​

5. CSV Output Format​

DCTAP Profile Format​

Algorithm Details​

Namespace Inference Algorithm​

CSV Generation Process​

Output Example​

Troubleshooting​

Common Issues​

Debugging​

Extending the Tool​

Adding Additional RDF Format Support​

JSON-LD Context Files​

Custom Namespace Inference​

Additional Output Formats​

Overview

Table of Contents

Installation

Single File Converter

Basic Usage

Command-Line Options

Examples

Batch Conversion Tool

Usage

Command-Line Options

Default Output Directory

Examples

Features

Supported RDF Formats

Features

1. Dynamic Namespace Prefix Extraction

2. Intelligent Namespace Inference

3. Multilingual Property Support

4. Repeatable Property Handling

5. CSV Output Format

DCTAP Profile Format

Algorithm Details

Namespace Inference Algorithm

CSV Generation Process

Output Example

Troubleshooting

Common Issues

Debugging

Extending the Tool

Adding Additional RDF Format Support

JSON-LD Context Files

Custom Namespace Inference

Additional Output Formats