Project Plan For SMILA, version 1.2
SMILA is an extensible framework for building big data and/or search solutions that access and process unstructured information in the enterprise. Besides providing essential infrastructure components and services, SMILA also delivers ready-to-use add-on components, such as connectors to the most relevant data sources. Building on the framework lets developers concentrate on creating higher-value solutions, such as semantic-driven applications.
- Core and add-ons (core components plus ready-to-use add-on components such as various data connectors and BPEL services), available as a compressed archive (ZIP file).
The target date for availability of SMILA 1.2 is April 17th, 2013.
SMILA 1.2 depends on Equinox 4.2. For this release, the sources will be written and compiled against the Java Development Kit (JDK) 7 and designed to run on the Java Runtime Environment (JRE) 7, Standard Edition.
- Apache Tika integration - extracting text from binary content
- JDBC-Crawler: Splitting functionality for scaling
- Web-Crawling enhancements (robots.txt, boilerpipe integration)
- Remote Crawling
- Cluster setup tutorial