Re: Re: [smila-dev] SSS & Persistence Questions
Hi Markus,
#2:
As said before, Record Attributes (metadata) are stored in the XMLStorage and Record Attachments (the content of crawled files) in the BinaryStorage. You could use the BinaryStorage API directly to access the content, but in SMILA we recommend accessing data via the BlackboardService (see http://wiki.eclipse.org/SMILA/Documentation/Usage_of_Blackboard_Service). It is an abstraction layer for Records, so users don't have to know which storage holds which data. Access to Records is ID-based, so you have to provide a Record's ID to retrieve it.
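For illustration, here is a minimal sketch of such ID-based access. The method names load() and getAttachment() are assumptions based on the Blackboard documentation linked above; please check the actual interface there before relying on it:

  import org.eclipse.smila.blackboard.Blackboard;
  import org.eclipse.smila.datamodel.id.Id;

  public class RecordDumper {
      // In SMILA the Blackboard is obtained as an OSGi service.
      private Blackboard _blackboard;

      // Loads a record by its ID and returns the raw crawled content
      // stored as an attachment (attachment name depends on the crawler).
      public byte[] dumpContent(final Id recordId, final String attachmentName)
              throws Exception {
          _blackboard.load(recordId); // fetch the record from the storages
          return _blackboard.getAttachment(recordId, attachmentName);
      }
  }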
I guess that you don't want to add processing steps to the "addpipeline", but that your question is aimed at how to access the crawled data after the initial processing is finished (perhaps even from outside of SMILA). There is currently no single answer to this question. The data is definitely accessible in several ways:
- via ID-based access on the BlackboardService
- via queries on Lucene (see the sketch after this list)
- via XQueries on the XMLStorage
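To give an idea of the Lucene variant, here is a minimal sketch against the plain Lucene 2.x API. The index path "workspace/test_index" and the field names "Content" and "Id" are assumptions; they depend on how your SMILA index is configured:

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;

  public class LuceneAccess {
      public static void main(final String[] args) throws Exception {
          // Open the index directly; SMILA's own search service wraps this.
          final IndexSearcher searcher = new IndexSearcher("workspace/test_index");
          final QueryParser parser = new QueryParser("Content", new StandardAnalyzer());
          final Query query = parser.parse("persistence");
          final Hits hits = searcher.search(query);
          for (int i = 0; i < hits.length(); i++) {
              final Document doc = hits.doc(i);
              System.out.println(doc.get("Id")); // record ID stored with the document
          }
          searcher.close();
      }
  }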
If you want to trigger access to the data from outside of SMILA, you have to provide some kind of access point, e.g.
- a servlet in Tomcat
- a web service
- a JMX agent
- etc.
which in turn has access to the data in one of the ways described above. We are planning to provide remote access for SMILA services, but the concept is not finished yet.
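As a sketch of the servlet variant: RecordLookup below is hypothetical glue code standing in for one of the access paths described above, not a SMILA class:

  import java.io.IOException;
  import javax.servlet.ServletException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  public class RecordAccessServlet extends HttpServlet {
      protected void doGet(final HttpServletRequest request,
              final HttpServletResponse response)
              throws ServletException, IOException {
          final String recordId = request.getParameter("id");
          // RecordLookup is hypothetical: delegate to your own lookup
          // (Blackboard, Lucene or XQuery based).
          final byte[] content = RecordLookup.fetchContent(recordId);
          response.setContentType("application/octet-stream");
          response.getOutputStream().write(content);
      }
  }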
Hope this helps. Once you have a concrete use case, it is much easier to suggest a way that fits your needs.
Bye,
Daniel
> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-
> bounces@xxxxxxxxxxx] On behalf of Markus Döhring
> Sent: Saturday, 8 November 2008 12:02
> To: 'Smila project developer mailing list'
> Subject: Re: Re: [smila-dev] SSS & Persistence Questions
>
> Hi all,
>
> #1: I tried it both with a keystore containing the needed certificate
> chain and with providing no keystore in the configuration XML. Both
> yielded the same exception I already described.
>
> #2: Currently there is no particular use case; I just want to be able
> to directly access and dump the (raw/unprocessed!) content that has
> been crawled. If it's already within the XML store, fine! :) ... then
> as a first step, I just need to know how to access it. (Sorry if this
> is described somewhere in the wiki; if so, please just post me a
> link...)
>
>
> Thanks!
>
> Markus