Re: [science-iwg] tools for data import

Hi Scott, 

This seems like a job that can equally well be tackled with R.

For a very fast and powerful data manipulation framework, have a look at data.table. Its fread function auto-detects delimiters and column types when reading CSV files:

https://cran.r-project.org/web/packages/data.table/index.html
https://github.com/Rdatatable/data.table/wiki

Disclaimer: one of the data.table core developers works at Open Analytics.

Best,
Tobias

----- Original Message -----
> From: "Scott Lewis" <slewis@xxxxxxxxxxxxx>
> To: science-iwg@xxxxxxxxxxx
> Sent: Thursday, April 21, 2016 6:28:17 PM
> Subject: [science-iwg] tools for data import

> Question:
> 
> I'm working with some non-profits that are interested in doing analyses
> on datasets from a number of sources: certainly SQL and NoSQL databases,
> but also lots of data sources that are represented as CSV files or
> simple tables. A tedious and time-consuming reality is that, since
> these come from different orgs, the CSV files are not formatted the
> same way (with respect to missing values, column types, delimiters,
> encoding, etc.), so it's often necessary to do a fair amount of trivial
> 'massaging' of the data before it can be imported into a db.
> 
> I've been working with Python and the pandas CSV reader, but I'm curious
> what tools/tooling folks are using or recommend for similar kinds of
> data import problems.
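
To make the kind of 'massaging' described above concrete, here is a minimal sketch in Python. It uses only the standard library, not pandas or data.table. Everything in it is an assumption made for illustration: the function names (decode, guess_delimiter, convert, read_table), the list of missing-value tokens, the encodings tried, and the candidate delimiters are hypothetical and would need adapting to the real files.

```python
import csv
import io

# Assumed tokens that different orgs might use for a missing value.
MISSING = {"", "na", "n/a", "nan", "null", "none", "-", "?"}
# Assumed encodings, tried in order; latin-1 accepts any byte, so it is last.
ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode(raw):
    """Decode bytes with the first encoding that does not raise an error."""
    for encoding in ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode input")


def guess_delimiter(header_line, candidates=",;\t|"):
    """Pick the candidate that occurs most often in the header line."""
    return max(candidates, key=header_line.count)


def convert(value):
    """Map missing-value tokens to None, then try int, then float."""
    text = value.strip()
    if text.lower() in MISSING:
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_table(raw):
    """Return a list of dicts, one per data row, with normalized values."""
    text = decode(raw)
    lines = text.splitlines()
    if not lines:
        return []
    delimiter = guess_delimiter(lines[0])
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip completely empty lines
    header = [name.strip() for name in rows[0]]
    # zip() silently drops extra fields in rows longer than the header.
    return [dict(zip(header, map(convert, row))) for row in rows[1:]]
```

Calling read_table on the bytes of each incoming file gives rows in one uniform shape that could then be inserted into a db. The per-value type guessing is deliberately naive; a real import would probably fix the type per column rather than per cell.
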
> 
> Thanks in advance,
> 
> Scott

