How to prevent triples from getting mixed up while uploading to Dydra programmatically?


I am trying to upload data to Dydra from a Sesame triplestore running on my computer. The download from Sesame works fine, but the triples get mixed up during the upload: the s-p-o relationships change, so that the object of one triple becomes the object of another. Can someone please explain why this is happening and how it can be resolved? The code is below:

    # Querying the triplestore to retrieve all results
    sesameSparqlEndpoint = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name'
    sparql = SPARQLWrapper(sesameSparqlEndpoint)
    queryStringDownload = 'SELECT * WHERE {?s ?p ?o}'
    dataGraph = Graph()
    sparql.setQuery(queryStringDownload)
    sparql.method = 'GET'
    sparql.setReturnFormat(JSON)
    output = sparql.query().convert()
    print output

    for i in range(len(output['results']['bindings'])):
        # The encoding is necessary to parse non-English characters
        output['results']['bindings'][i]['s']['value'].encode('utf-8')
        try:
            subject_extract = output['results']['bindings'][i]['s']['value']
            if 'http' in subject_extract:
                subject = "<" + subject_extract + ">"
                subject_url = URIRef(subject)
                print subject_url
            predicate_extract = output['results']['bindings'][i]['p']['value']
            if 'http' in predicate_extract:
                predicate = "<" + predicate_extract + ">"
                predicate_url = URIRef(predicate)
                print predicate_url
            objec_extract = output['results']['bindings'][i]['o']['value']
            if 'http' in objec_extract:
                objec = "<" + objec_extract + ">"
                objec_url = URIRef(objec)
                print objec_url
            else:
                objec = objec_extract
                objec_wip = '"' + objec + '"'
                objec_url = URIRef(objec_wip)
            # Loading the data on a graph
            dataGraph.add((subject_url,predicate_url,objec_url))
        except UnicodeError as error:
            print error

    # Print all statements in dataGraph
    for stmt in dataGraph:
        pprint.pprint(stmt)

    # Upload to Dydra
    URL = 'http://dydra.com/login'
    key = 'my_key'
    with requests.Session() as s:
        resp = s.get(URL)
        soup = BeautifulSoup(resp.text,"html5lib")
        csrfToken = soup.find('meta',{'name':'csrf-token'}).get('content')
        # print csrf_token
        payload = {
            'account[login]':key,
            'account[password]':'',
            'csrfmiddlewaretoken':csrfToken,
            'next':'/'
        }
        # print payload
        p = s.post(URL,data=payload, headers=dict(Referer=URL))
        # print p.text
        r = s.get('http://dydra.com/username/rep_name/sparql')
        # print r.text
        dydraSparqlEndpoint = 'http://dydra.com/username/rep_name/sparql'
        for stmt in dataGraph:
            queryStringUpload = 'INSERT DATA {%s %s %s}' % stmt
            sparql = SPARQLWrapper(dydraSparqlEndpoint)
            sparql.setCredentials(key,key)
            sparql.setQuery(queryStringUpload)
            sparql.method = 'POST'
            sparql.query()


A far simpler way to copy your data over (apart from using a CONSTRUCT query instead of a SELECT, like I mentioned in the comment) is simply to have Dydra itself directly access your Sesame endpoint, for example via a SERVICE-clause.

Execute the following on your Dydra database, and (after some time, depending on how large your Sesame database is), everything will be copied over:

    INSERT { ?s ?p ?o }
    WHERE {
      SERVICE <http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name> { ?s ?p ?o }
    }

If the above doesn't work on Dydra, you can instead access the RDF statements in your Sesame store directly, using the URI http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements. Assuming Dydra has an upload feature that accepts the URL of an RDF document, you can give it this URI and it should be able to load the data.
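To check what the statements URI returns before handing it to Dydra, it can be fetched like any other HTTP resource. The following is a minimal sketch in Python 3 (the question's code is Python 2), using only the standard library. The helper names `statements_request` and `download_statements` are hypothetical, and `application/rdf+xml` is an assumed choice of serialization; your server may offer others.

```python
import shutil
import urllib.request

# Repository URL copied from the answer; replace it with your own server.
SESAME_REPO = "http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name"


def statements_request(repo_url, accept="application/rdf+xml"):
    """Build a GET request for all statements in a repository (hypothetical helper)."""
    url = repo_url.rstrip("/") + "/statements"
    # The Accept header asks the server for a particular RDF serialization.
    # "application/rdf+xml" is an assumption; adjust it to what your server supports.
    return urllib.request.Request(url, headers={"Accept": accept})


def download_statements(repo_url, path, accept="application/rdf+xml"):
    """Stream the export to a local file instead of holding it in memory."""
    request = statements_request(repo_url, accept)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Calling `download_statements(SESAME_REPO, "export.rdf")` would write the whole repository to a file that can then be inspected or uploaded by whatever means Dydra provides.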


The code above can work if the following changes are made:

<ol><li>Use a CONSTRUCT query instead of SELECT. Details: <a href="https://stackoverflow.com/questions/34425876/how-to-iterate-over-construct-output-from-rdflib" rel="nofollow">How to iterate over CONSTRUCT output from rdflib?</a></li> <li>Use key as the value for both account[login] and account[password].</li> </ol>
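Part of the reason CONSTRUCT is easier than SELECT here is that SELECT results have to be turned back into RDF terms by hand. The question's loop guesses the kind of each value by testing for the substring 'http', which may misclassify literals that contain a URL and leaves blank nodes unhandled. In the SPARQL JSON results format every binding carries a `type` field (`uri`, `literal` or `bnode`), plus optional `xml:lang` and `datatype` fields. The sketch below, in Python 3 with hypothetical function names, shows how those fields could be used instead. It does only basic string escaping and does not validate IRIs or blank node labels.

```python
def term_to_ntriples(binding):
    """Render one SPARQL JSON binding as an N-Triples term (hypothetical helper)."""
    kind = binding["type"]
    value = binding["value"]
    if kind == "uri":
        return "<" + value + ">"
    if kind == "bnode":
        return "_:" + value
    if kind in ("literal", "typed-literal"):  # "typed-literal" appears in older servers
        escaped = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n")
                        .replace("\r", "\\r"))
        text = '"' + escaped + '"'
        if "xml:lang" in binding:
            return text + "@" + binding["xml:lang"]
        if "datatype" in binding:
            return text + "^^<" + binding["datatype"] + ">"
        return text
    raise ValueError("unknown binding type: %r" % kind)


def row_to_triple(row):
    """Render one result row with ?s ?p ?o bindings as an N-Triples line."""
    terms = [term_to_ntriples(row[name]) for name in ("s", "p", "o")]
    return " ".join(terms) + " ."
```

Because each term is derived only from its own binding, a value from one row cannot leak into the next, which is the kind of mix-up described in the question.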

However, this is probably not the most efficient approach. In particular, issuing a separate INSERT for every triple does not work well: Dydra did not record all the statements this way (only about 30% of my triples were inserted). In contrast, using the http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements method suggested by Jeen let me port all the data successfully.
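If SPARQL Update has to be used anyway, one common alternative to a request per triple is to group many triples into each INSERT DATA update. The sketch below is in Python 3; the function name and the default batch size of 500 are assumptions, and I have not tested it against Dydra. It takes N-Triples lines, for example from an export of the statements URI, and yields one update string per batch.

```python
def batched_insert_updates(ntriples_lines, batch_size=500):
    """Yield INSERT DATA updates holding up to batch_size triples each."""
    batch = []
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and N-Triples comments
        batch.append(line)
        if len(batch) == batch_size:
            yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"
```

One caveat: a blank node label that appears in two different batches will produce two distinct blank nodes in the target store, so data with blank nodes is better moved as a single document.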


