Project Plan

All previous releases and milestones are documented in the project plan archive.

SMILA Version 1.2

This release was published on April 17, 2013.

  • Apache Tika integration - extracting text from binary content
  • JDBC-Crawler: Splitting functionality for scaling
  • Web-Crawling enhancements (robots.txt, boilerpipe integration)
  • Remote-Crawling
  • Cluster setup tutorial

Further plans

  • HDFS objectstore
  • Solr 4 integration/clustering
  • Alternative (Scripting) engine for synchronous workflows
  • Basic MapReduce support
  • General configuration management