RDF to CSV Conversion Tools
Overview
This package provides two scripts for converting RDF files to CSV format:
rdf-to-csv.ts- Converts individual RDF files to CSVrdf-folder-to-csv.ts- Batch converts entire directory structures
Both tools support intelligent namespace prefix handling, multilingual properties with repeatability based on DCTAP profiles, and multiple RDF formats including Turtle, N-Triples, RDF/XML, and JSON-LD.
Table of Contents
- Installation
- Single File Converter
- Batch Conversion Tool
- Supported RDF Formats
- Features
- Troubleshooting
- Extending the Tool
Installation
The scripts are located in /scripts/ and require the following dependencies:
yarn add n3 csv-parse csv-stringify commander @types/n3 jsonld @types/jsonld rdfxml-streaming-parser glob @types/glob chalk ora
yarn add -D tsx # For running TypeScript files directly
Single File Converter
Basic Usage
yarn rdf:to-csv <rdf-file> [options]
Or directly:
tsx scripts/rdf-to-csv.ts <rdf-file> [options]
Command-Line Options
-p, --profile <dctap-file>: Path to DCTAP profile CSV file (optional)-o, --output <output-file>: Output CSV file path (default: stdout)-f, --format <format>: Force RDF format (auto-detected by default)-h, --help: Display help information-V, --version: Display version information
Examples
-
Basic conversion to stdout:
yarn rdf:to-csv /path/to/file.ttl -
Save to file:
yarn rdf:to-csv /path/to/file.ttl -o output.csv -
Use DCTAP profile:
yarn rdf:to-csv /path/to/file.ttl -p profile.csv -o output.csv -
Real-world examples:
# Turtle format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/ttl/ns/isbd/elements.ttl -o isbd-elements.csv
# N-Triples format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/nt/ns/isbd/elements.nt -o isbd-elements.csv
# RDF/XML format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/xml/ns/isbd/elements.xml -o isbd-elements.csv
# JSON-LD format
yarn rdf:to-csv /Users/jonphipps/Code/isbd/jsonld/ns/isbd/elements.jsonld -o isbd-elements.csv
# Force format detection
yarn rdf:to-csv data.rdf -f application/rdf+xml -o output.csv
Batch Conversion Tool
The rdf-folder-to-csv.ts script recursively converts all RDF files in a directory structure to CSV format, preserving the folder hierarchy.
Usage
yarn rdf:folder-to-csv <source-dir> [output-dir] [options]
Command-Line Options
-p, --profile <dctap-file>: Path to DCTAP profile CSV file-d, --dry-run: Show what would be converted without actually converting-v, --verbose: Show detailed progress information-e, --extensions <exts>: Comma-separated list of file extensions to process-h, --help: Display help information-V, --version: Display version information
Default Output Directory
The tool intelligently determines the default output directory:
- If source directory ends with
/ttl, it replaces it with/csv - Otherwise, it adds
_csvsuffix to the source directory name
Examples
-
Basic batch conversion:
# Converts /path/to/ttl/* to /path/to/csv/*
yarn rdf:folder-to-csv /path/to/ttl -
Specify output directory:
yarn rdf:folder-to-csv /path/to/rdf /path/to/output -
Dry run to preview:
yarn rdf:folder-to-csv /path/to/ttl --dry-run -
Verbose output with progress:
yarn rdf:folder-to-csv /path/to/ttl --verbose -
With DCTAP profile:
yarn rdf:folder-to-csv /path/to/ttl -p profile.csv -
Custom file extensions:
# Only process .ttl and .rdf files
yarn rdf:folder-to-csv /path/to/mixed -e ".ttl,.rdf" -
Real-world example:
# Convert ISBD vocabularies
yarn rdf:folder-to-csv /Users/jonphipps/Code/ISBDM/static/vocabs/ttl
# Output: /Users/jonphipps/Code/ISBDM/static/vocabs/csv/
Features
- Recursive processing: Finds all RDF files in subdirectories
- Structure preservation: Maintains the same folder hierarchy in output
- Progress reporting: Shows real-time conversion progress
- Error handling: Continues processing even if individual files fail
- Summary statistics: Reports success/failure counts
- Flexible extensions: Configurable file type filtering
Supported RDF Formats
The tool automatically detects RDF format based on file extension:
| Extension | Format | MIME Type | Support Level |
|---|---|---|---|
.ttl, .turtle | Turtle | text/turtle | Full |
.n3 | Notation3 | text/n3 | Full |
.nt, .ntriples | N-Triples | application/n-triples | Full |
.rdf, .xml, .owl | RDF/XML | application/rdf+xml | Full |
.jsonld, .json | JSON-LD | application/ld+json | Full* |
*JSON-LD files that reference remote contexts will use local fallback context files if available. The tool includes context files for common vocabularies in static/data/contexts/.
Features
1. Dynamic Namespace Prefix Extraction
The tool automatically extracts namespace prefixes from:
@prefixdeclarations in the RDF file- Intelligent inference from URI patterns in the data
2. Intelligent Namespace Inference
The script uses a tree-based algorithm to infer namespace prefixes when not explicitly declared:
- Builds a URI segment tree from all URIs in the RDF data
- Identifies namespace boundaries by analyzing:
- Branching points in the URI structure
- Frequency of URI patterns
- Common namespace indicators (ns, vocab, onto, schema)
- Prioritizes namespaces based on depth and frequency
- Generates readable prefixes from URI segments
Example inference:
http://iflastandards.info/ns/isbd/→isbd:http://iflastandards.info/ns/isbd/elements/P1001→isbd:elements/P1001http://iflastandards.info/ns/isbd/unc/elements/P1001→isbd:unc/elements/P1001
3. Multilingual Property Support
Properties with language tags are handled with separate columns:
skos:definition@en- English definitionskos:definition@es- Spanish definition- Properties without language tags have no suffix
4. Repeatable Property Handling
Repeatable properties (as defined in DCTAP or defaults) use indexed columns:
skos:definition@en[0]- First English definitionskos:definition@en[1]- Second English definitionrdfs:label@es[0]- First Spanish labelrdfs:label@es[1]- Second Spanish label
Default repeatable properties:
skos:definitionskos:scopeNoterdfs:label
5. CSV Output Format
The CSV output includes:
- First column: Resource URI (as CURIE if possible)
- Subsequent columns: Property values with headers indicating property, language, and index
- All URIs converted to CURIEs where possible for readability
DCTAP Profile Format
The DCTAP profile should be a CSV file with at least these columns:
| propertyID | repeatable |
|---|---|
| skos:definition | true |
| rdfs:label | true |
| dcterms:title | false |
The repeatable column accepts: true, yes, 1 (case-insensitive) for repeatable properties.
Example DCTAP profile:
shapeID,shapeLabel,propertyID,propertyLabel,mandatory,repeatable,valueDataType,valueShape,valueConstraint,valueConstraintType,note
,,skos:definition,Definition,false,true,,,,,
,,rdfs:label,Label,true,true,,,,,
,,dcterms:created,Date Created,false,false,xsd:date,,,,
Algorithm Details
Namespace Inference Algorithm
The namespace inference works in multiple phases:
-
Tree Construction Phase:
- Parse all URIs into a hierarchical tree structure
- Track frequency of each path segment
-
Candidate Identification Phase:
- Identify potential namespace boundaries based on:
- Multiple child branches (indicating a common parent)
- High frequency (used by many resources)
- Namespace indicators in path segments
- Reasonable depth (2-5 segments)
- Identify potential namespace boundaries based on:
-
Scoring and Selection Phase:
- Score candidates:
score = frequency × log(depth + 1) - Higher scores indicate better namespace candidates
- Avoid overlapping namespaces
- Score candidates:
-
Prefix Generation Phase:
- Extract meaningful prefix from last segment
- Ensure no prefix conflicts
CSV Generation Process
-
Resource Extraction:
- Identify all unique subject URIs
- Group properties by subject
-
Header Generation:
- Count all property-language combinations
- Determine maximum occurrences for repeatable properties
- Generate indexed headers for repeatable properties
-
Row Generation:
- Convert resource URIs to CURIEs
- Map property values to appropriate columns
- Handle language tags and indices correctly
Output Example
Given this RDF:
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/concept/123>
rdfs:label "Example"@en ;
rdfs:label "Ejemplo"@es ;
skos:definition "First definition"@en ;
skos:definition "Second definition"@en ;
skos:definition "Definición"@es .
The CSV output would be:
uri,rdfs:label@en,rdfs:label@es,skos:definition@en[0],skos:definition@en[1],skos:definition@es
ex:concept/123,Example,Ejemplo,First definition,Second definition,Definición
Troubleshooting
Common Issues
-
Missing prefixes in output:
- Check if the RDF file contains
@prefixdeclarations - The inference algorithm needs sufficient data to detect patterns
- Manually add prefixes to the DEFAULT_PREFIXES in the script if needed
- Check if the RDF file contains
-
Properties not marked as repeatable:
- Provide a DCTAP profile with the
-poption - Or modify the default repeatable properties in the script
- Provide a DCTAP profile with the
-
Memory issues with large files:
- The script loads the entire RDF file into memory
- Consider splitting large files or increasing Node.js memory limit
-
JSON-LD remote context errors:
- Ensure local context files exist in
static/data/contexts/ - Check that the context file name matches the expected pattern
- Verify the local context file contains proper JSON-LD context definitions
- Ensure local context files exist in
Debugging
Run with stderr to see inference information:
yarn rdf:to-csv input.ttl -o output.csv 2> debug.log
The debug output shows:
- Extracted prefixes from the RDF file
- Inferred namespace prefixes
- Number of resources found
Extending the Tool
Adding Additional RDF Format Support
To add support for new RDF formats:
-
Update format detection in
detectRdfFormat():const formatMap: Record<string, string> = {
'.newext': 'application/new-format',
// ... existing formats
}; -
Add parser support:
- For formats supported by N3.js: Update the
formatparameter - For other formats: Add a new parser library and handle in
parseRdfFile()
- For formats supported by N3.js: Update the
-
Handle format-specific requirements:
- Namespace extraction
- Special parsing considerations
- Error handling
JSON-LD Context Files
The tool automatically handles JSON-LD files that reference remote contexts by using local fallback context files. When a JSON-LD file references a remote context URL that is unavailable, the tool will:
- Detect the remote context URL (e.g.,
http://metadataregistry.org/uri/schema/Contexts/elements_langmap.jsonld) - Load the corresponding local context file from
static/data/contexts/ - Use the local context to properly parse the JSON-LD data
Available local context files:
elements_langmap.jsonld- For RDFS/OWL elements (metadataregistry.org element sets)concepts_langmap.jsonld- For SKOS concepts (metadataregistry.org concept schemes)
These context files define the necessary mappings for custom properties used in metadata registry JSON-LD files, ensuring proper conversion even when remote contexts are unavailable.
Custom Namespace Inference
To customize namespace detection:
- Modify the
namespaceIndicatorsarray - Adjust scoring algorithm in
inferNamespaceFromURIs() - Change depth limits or frequency thresholds
Additional Output Formats
The tool could be extended to support:
- JSON-LD output
- Direct database insertion
- Custom delimiters or quoting styles