Anyone on here work in data science, specifically with R?
I’m working on a project to get data science going, and we are running into some issues. My background is in data warehousing, working with MS SQL, SSIS and SSAS. The data scientist we have is not great technically, so there is a disconnect.
Basically he is running out of RAM when running his queries in R, even on a server with 140GB of RAM. From what I understand, some stats functions end up duplicating the data, meaning you need far more RAM than the size of the data set? This seems illogical to me as a database person.
We are looking at Revolution Analytics (now part of MS) or Spark, but I don’t think our data set is actually that big compared to what people using Hadoop and Spark are talking about, so I don’t understand why this is so difficult.
Any pointers?