42453

Generating TDB Dataset from archive containing N-TRIPLES files

Question:

Apologies, in advance, for a possible duplicate.

I have an archive containing 117,426 files (each in the N-TRIPLES format) that I wish to load into the default graph of a TDB dataset. Due to the large number of files, I need to be able to perform this import without manually selecting individual files for upload.

I am in Bash, with Jena and Fuseki distributions at my disposal.

If possible, I want to avoid the worst-case scenario of just writing a java application to do this. If I have to write a java application for this, what hooks exist in RIOT/TDB to perform programmatic bulk-loading?

Answer1:

As a genenral comment, one way is to concatenate the N-Triples files to generate one single file.

You can load many files at once with either tdbloader or tdbloader2.

tdbloader --loc DB ... your files ...

The 117,426 may strain you OS for a single command line invocation. You can pipe the files into tdbloader (it's just like concatenating the files first)

... | tdbloader --loc DB -- -

where ... is some way to get bash to cat the files (possible from a subshell).

e.g. (you'll need to adjust to file all 117,426 files):

( for x in data*.nt do cat $x done ) | tdbloader --loc DB -- -

Recommend

  • Load OWL schema into triple-store like Fuskei/TDB?
  • tdbloader2 fails with classpath error
  • NoClassDefFoundError: org/slf4j/LoggerFactory while creating a runnable *.jar with gradle
  • Desktop SPARQL client for Jena (TDB)?
  • Specifying class equivalence in Jena rules
  • Notepad++ - delete all lines with certain text
  • Gem not installing package
  • How to run bash commands like “npm install” on complie
  • Spring security - same page to deliver different content based on user role
  • change color of jstree node
  • Cannot invoke my method on the array type int[]
  • Lua: Line breaks in strings
  • Iron Router: How do I send data to the layout?
  • How can I get the full list of running processes on a Mac from a python app
  • perl, mysql - fasting way to upload a csv file into mysql?
  • how does System.Web.HttpRequest::PathInfo work?
  • Get specific string
  • Query to find the duplicates between the name and number in table
  • ViewController With Transparent Background Entering Current ViewController With Push Transition
  • Unable to get column index with table.getColumn method using custom table Model
  • Parse a date string in a specific locale (not timezone!)
  • azure media services - The request body is too large and exceeds the maximum permissible limit
  • How to define and use opencv mat of user type
  • Test if a set exists before trying to drop it
  • Allowing both email and username for authentication
  • Chrome doesn't support silverlight anymore? How to solve this?
  • Read a local file using javascript
  • Django: Count of Group Elements
  • Handling un-mapped Rest path
  • What is the “return” in scheme?
  • recyclerView does not call the onBindViewHolder when scroll in the view
  • JSON with duplicate key names losing information when parsed
  • How to make Safari send if-modified-since header?
  • Return words with double consecutive letters
  • How to pass list parameters for each object using Spring MVC?
  • Python: how to group similar lists together in a list of lists?
  • coudnt use logback because of log4j
  • Setting background image for body element in xhtml (for different monitors and resolutions)
  • JaxB to read class hierarchy
  • Busy indicator not showing up in wpf window [duplicate]