Re: [rdf4j-dev] thoughts on RDFa ?
Hi Jerven,
Thanks, looks like a good test case indeed.
I'll give it a try and let you know when the code becomes somewhat stable and passes the basic tests.
Best regards
Bart
-----Original Message-----
From: Jerven Tjalling Bolleman <Jerven.Bolleman@sib.swiss>
Sent: Tuesday 7 May 2019 21:42
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Cc: Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx>
Subject: Re: [rdf4j-dev] thoughts on RDFa ?
Hi Bart,
I can test it against www.uniprot.org. It has quite a bit of RDFa (written by me) on the entry pages. The pages are not quite valid, but they do not deviate too badly from the spec.
I like RDFa these days and probably prefer it over JSON-LD for the schema.org markup that I need to do in my day job.
Regards,
Jerven
On 2019-05-07 20:06, Bart Hanssens (BOSA) wrote:
> FWIW, I have some initial code so I can start testing it against the
> RDFa testsuite.
>
> I’ve used Jsoup, since it is a well-maintained library with a nice API.
>
> The only really annoying part seems to be the lack of line/column
> indication when an error occurs.
>
> (I guess I could first use Jsoup to create well-formed X(HT)ML, and
> then use SAX to iterate over the result,
>
> but it seems a bit of overkill to include a new dependency
> just to do tag balancing…)
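To illustrate the SAX half of that idea: once the markup is well-formed (for example after clean-up by an HTML parser, which this sketch does not do), the JDK's built-in SAX parser hands the handler a `Locator`, which gives the line/column information mentioned above. This is a minimal sketch under stated assumptions, not RDFa processing: the class `RdfaSaxSketch`, its `Hit` type, and the choice of attributes (`about`, `typeof`, `property`, `resource`) are hypothetical, it only spots attributes, and it assumes input without a DOCTYPE so no external DTD is fetched.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class RdfaSaxSketch {

    /** Hypothetical record of one RDFa attribute and the line SAX reported for it. */
    public static final class Hit {
        public final String attribute;
        public final String value;
        public final int line;

        Hit(String attribute, String value, int line) {
            this.attribute = attribute;
            this.value = value;
            this.line = line;
        }
    }

    // Assumption: a small subset of RDFa attributes, enough to show the idea.
    private static final String[] RDFA_ATTRIBUTES = {"about", "typeof", "property", "resource"};

    /** Streams well-formed XHTML and collects RDFa attributes with line numbers. */
    public static List<Hit> scan(String xhtml) throws Exception {
        List<Hit> hits = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser supplies the Locator before parsing starts.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (String name : RDFA_ATTRIBUTES) {
                    String value = atts.getValue(name);
                    if (value != null) {
                        // Only the line is kept: the column at startElement is approximate.
                        hits.add(new Hit(name, value, locator.getLineNumber()));
                    }
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return hits;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html>\n"
            + "<body typeof=\"schema:Protein\">\n"
            + "<span property=\"schema:name\">Insulin</span>\n"
            + "</body>\n"
            + "</html>";
        for (Hit hit : scan(xhtml)) {
            System.out.println(hit.line + ": " + hit.attribute + "=" + hit.value);
        }
    }
}
```

Running `main` prints `2: typeof=schema:Protein` and `3: property=schema:name`. Malformed input would instead raise a `SAXParseException`, which also carries line and column numbers; that is the trade-off against needing a separate tag-balancing step first.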
>
> Technically, attoparser fits the bill (smaller, line/column
> indication), but there seems to be only one maintainer and one other
> contributor.
>
> Which, of course, says nothing about the quality of the project
> 😊
>
> Best regards
>
> Bart
>
> FROM: Bart Hanssens (BOSA)
> SENT: Tuesday 30 April 2019 9:57
> TO: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
> SUBJECT: RE: [rdf4j-dev] thoughts on RDFa ?
>
> Hi Håvard,
>
> Well, I’m mainly looking into RDFa because of the (somewhat basic)
> support for RDFa in Drupal CMS.
>
> We’re running quite a few Drupal websites, so this could come in
> handy…
>
> But “perfect syntax” and “website” are a rare combo, so I’ll use Jsoup
> or attoparser 😊
>
> Best regards
>
> Bart
>
> FROM: rdf4j-dev-bounces@xxxxxxxxxxx <rdf4j-dev-bounces@xxxxxxxxxxx> ON
> BEHALF OF Håvard Ottestad
> SENT: Saturday 20 April 2019 12:26
> TO: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
> SUBJECT: Re: [rdf4j-dev] thoughts on RDFa ?
>
> Hi Bart,
>
> I have not used RDFa for anything. I do know that the metadata in
> images is RDF, and also that Google is pushing for more JSON-LD on
> webpages.
>
> My experience with both SAX and jsoup is good. I usually use jsoup
> when I need to crawl webpages, and for this it is the best library I
> have used. Very robust and simple to use.
>
> I use SAX in my XmlToRdf converter for performance. I can convert 100
> MB of XML to Turtle with only 20 MB of RAM in less than 2 seconds on
> my laptop. It even works with as little as 3 MB of RAM, but then the
> parsing time jumps to around 10 seconds because of GC.
>
> I would recommend SAX for pure-XML, perfect-syntax use cases, JAXB for
> when you want Java objects, and jsoup for everything else.
>
> Håvard
>
> On 18 Apr 2019, at 18:46, Bart Hanssens (BOSA)
> <bart.hanssens@xxxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> for scraping purposes, I'm looking into RDFa/RDFa-Lite and I'm
>> thinking about writing a RIO parser (see also issue #512).
>>
>> IIRC James did some experimental work on RDFa as well, but I think it
>> was based on SAX,
>>
>> so it probably assumed that the source would be perfectly well-formed
>> XHTML... which is rarely the case.
>>
>> So currently I'm looking at using either attoparser (smaller,
>> event-driven) or jsoup (more frequently updated, DOM-interface),
>>
>> and there is a wonderful test suite available at
>> http://rdfa.info/test-suite/
>>
>> So I was wondering
>>
>> - are there other HTML parsers I should look into (Jodd Lagarto?
>> NekoHTML?)
>>
>> - where should the test suite go (if it gets CQ approval)? I remember
>> some emails about moving the rdf4j-testsuite back into the main repo,
>> but I'm not sure what the conclusion was.
>>
>> Thanks
>>
>> Bart
>
>> _______________________________________________
>> rdf4j-dev mailing list
>> rdf4j-dev@xxxxxxxxxxx
>> To change your delivery options, retrieve your password, or
>> unsubscribe from this list, visit
>> https://www.eclipse.org/mailman/listinfo/rdf4j-dev
--
Jerven Tjalling Bolleman
SIB | Swiss Institute of Bioinformatics
CMU - 1, rue Michel Servet - 1211 Geneva 4
t: +41 22 379 58 85 - f: +41 22 379 58 58 Jerven.Bolleman@sib.swiss - http://www.sib.swiss