[Closed] Big Data

Posts: 91171
Free Member
Topic starter
 

Anyone working with this concept?

Just curious to know what people are up to.

NB I'm not talking about obesity statistics.


 
Posted : 20/10/2014 6:31 pm
Posts: 30656
Free Member
 

Timely...

http://www.bbc.co.uk/programmes/p028cm6q


 
Posted : 20/10/2014 6:35 pm
Posts: 0
Free Member
 

Interesting listen. The sources of most big data are not statistically representative; they have bias, especially if you think of data from markets with much lower tech use or from consumer groups that are too exclusive. There are no shortcuts to gathering and generating statistically significant data. It is an expensive process, even with a sample just large enough to stand up statistically.
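
To put a rough number on "statistically stand up": a minimal sketch, assuming simple random sampling and the textbook normal-approximation formula for estimating a proportion, n = z^2 * p(1 - p) / e^2. The function name and defaults are illustrative, and note this only covers sampling error, not the selection bias described above.

```python
import math

def sample_size_for_proportion(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Minimum simple-random-sample size to estimate a proportion.

    margin: half-width of the confidence interval (0.05 means +/-5 points)
    z:      critical value (1.96 is roughly a 95% two-sided interval)
    p:      expected proportion; 0.5 is the most conservative choice
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    n = z * z * p * (1 - p) / (margin * margin)
    # Round up: you can't survey a fraction of a person.
    return math.ceil(n)

print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```

Halving the margin roughly quadruples the sample, which is part of why properly designed surveys get expensive.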


 
Posted : 20/10/2014 7:11 pm
Posts: 0
Free Member
 

Pfft, just another phrase for the merry-go-round of IT issues that go round in circles about every 10 years.

If it didn't have a new name they couldn't sell new courses.


 
Posted : 20/10/2014 7:11 pm
 GJP
Posts: 0
Free Member
 

I am starting to take an interest from a CRM perspective. Spent most of the weekend googling Hadoop and Cassandra stuff. As a business architect I found it tough going. Not doing anything actively yet.


 
Posted : 20/10/2014 7:15 pm
Posts: 1012
Free Member
 

Yep, not a big fan of the phrase as it is just marketing crap.

Very few sets of data meet the 3 Vs required (velocity, volume and variety). The term is now just used to describe what business intelligence and data analytics should have been all along, although most people didn't really do this.


 
Posted : 20/10/2014 7:20 pm
Posts: 23340
Free Member
 

Been looking at kaggle.com recently with a half-baked view to a change of career. MATLAB nerd here, but thinking about learning R or SciPy to give it a go.


 
Posted : 20/10/2014 8:04 pm
Posts: 0
Free Member
 

Not doing anything actively yet.

Pretty much sums up the industry at the moment. Follow @BigdataBorat for a funny angle on big data.


 
Posted : 20/10/2014 8:07 pm
Posts: 27
Free Member
 

Yep, not a big fan of the phrase as it is just marketing crap.

^^this.
It's the new "digital". It's quickly becoming an overused term, with companies using it to sound forward-thinking and innovative.


 
Posted : 20/10/2014 8:08 pm
Posts: 0
Free Member
 

Big Data is like teenage sex.

Everyone thinks everyone else but them is doing it. In truth, only a very few people are doing it.

If you are a business it's only worth looking at once you're already operating at or close to peak efficiency, and only if you have deep pockets.

The cost of building the infrastructure to host it, train your people to use it and the time it will take for it to be useful is very high. You may also wind up finding out you're doing pretty well, in which case you'll be confirming what you know anyway!


 
Posted : 20/10/2014 8:31 pm
Posts: 4954
Free Member
 

back2basics - Member
Pfft, just another phrase for the merry-go-round of IT issues that go round in circles about every 10 years.

If it didn't have a new name they couldn't sell new courses.

Bang on. It's the most annoying thing about IT: repackaging everything into a (maybe) slightly easier-to-use API and claiming it's all new.

It's good that some toolkits for parallel extraction of data have been built so you don't have to roll your own, but people talk as if this is the first use of parallel computing on commodity hardware. Gets right on my tits!


 
Posted : 20/10/2014 8:49 pm
Posts: 4954
Free Member
 

Big Data is like teenage sex.

Drink and drug fueled?


 
Posted : 20/10/2014 8:50 pm
Posts: 91171
Free Member
Topic starter
 

I like that description 🙂

From my point of view, as a consultant, it reminds me of that Tom Sharpe book, Riotous Assembly, or the other one, Indecent Exposure. In it, the local chief of police is paranoid about communist terrorists, so he creates a group of secret agents to pose as communists and infiltrate the cell. To protect their identities, he doesn't tell them who else is in the group. So off they go, hanging around in bars acting the way they expect communist terrorists to act and looking out for other people acting the same way. There aren't in fact any terrorists, so all that ends up happening is they form their own terrorist cell. To maintain credibility with each other they start organising actual terrorist attacks, which simply hardens the resolve of the chief of police.

I'm being asked to look into new technologies such as 'mobile' and 'big data'. I'm happy to mess with this from a technology point of view but we'll just have to see how much business it generates 🙂

I should add, though, that my colleague did do a prototype using Hadoop for large batch processing. Of course the same thing could easily have been done by traditional means.


 
Posted : 20/10/2014 8:59 pm
Posts: 563
Free Member
 

I work in insurance. We build models (nerd alert) based on tens of millions of customer records in order to price risks, predict competitors' rates and predict the price elasticity of individual customers. To me it feels like Big Data, but we have been doing it since long before anyone started talking about big data. This leaves me thoroughly confused as to whether it is cutting edge or way behind the times. I have no idea what people are actually doing with Big Data in other industries.

I agree that Big Data is a fairly stupid phrase, though. I have heard it compared to the International Conference on Very Large Databases, which has been running annually since 1975. Their definition of "very large" must have changed quite a lot over that time.


 
Posted : 20/10/2014 9:06 pm
Posts: 91171
Free Member
Topic starter
 

The difference between Big Data and simply lots of data is how it's stored. You can store a lot in a relational database, but that's peanuts compared to what big data techniques can handle. It's meant to be infinitely scalable, so you can simply keep adding processing nodes to process more and more data.


 
Posted : 20/10/2014 9:16 pm
Posts: 4954
Free Member
 

Nothing is infinitely scalable; the bottleneck just moves.
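
Amdahl's law is one standard way to quantify that: if a fraction p of a job parallelises perfectly and the rest stays serial, the speedup on n nodes is 1 / ((1 - p) + p/n), which can never exceed 1 / (1 - p). A minimal sketch; the 95% figure is an illustrative assumption, and real clusters also pay network and shuffle costs that this ignores.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup when only part of a job can be spread across nodes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)

# With 95% of the work parallelisable, extra nodes soon stop paying off.
for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.95, n), 1))
# The ceiling is 1 / 0.05 = 20x, however many nodes you add.
```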


 
Posted : 20/10/2014 9:32 pm
Posts: 401
Free Member
 

I am but I don't call it Big Data as that is just ****.

Here is something bike related: http://phased.co.uk/london-bike-theft-weather-map/


 
Posted : 20/10/2014 9:40 pm
Posts: 0
Free Member
 

Yep, we do a lot of analysis based around the stock markets - looking for insider dealings and so on. Just moved over to Exadata and the results aren't quite what we thought they'd be considering the costs.

Whilst it's very much a marketing term that's currently all the rage, many sites (WorldPay, Visa, Vodafone) have been doing something very similar for years. I think it's probably very similar to SOA in that if you have the perfect set of parameters combined with the right infrastructure and the demand from the business then it's probably the bee's bendy bits. But most places wouldn't really fit 'the model'. A nice theory, but in practice it doesn't really work, certainly not for us. The Hadoop books are currently propping up my bike stand, about the most useful they've been...


 
Posted : 20/10/2014 9:53 pm
Posts: 91171
Free Member
Topic starter
 

How is crime data 'big'? Are there millions of crimes per second? It's worse than I thought!

flange - you're right, big data techniques are quite specific and much of what they're probably being used for could be done with something else.

I wrote a PoC creating energy forecasts with around 100k data points, and I parallelised it onto many nodes but that wasn't done with Hadoop (although it would've worked fine) and it wasn't big data.

Having said that, Hadoop is a convenient framework for chugging through data - my code did essentially the same thing but Hadoop's already written.
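
The plumbing Hadoop saves you from writing is essentially split, process each partition independently, then combine. A minimal single-machine sketch of that pattern, assuming the job is an aggregate (here a mean) that can be rebuilt from partial (sum, count) pairs; the data is synthetic. Threads stand in for cluster nodes, so in CPython this shows the structure rather than giving a CPU speedup (that would need ProcessPoolExecutor or an actual cluster).

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_count(chunk):
    """Work each 'node' does on its own partition: a partial aggregate."""
    return sum(chunk), len(chunk)

def parallel_mean(values, partitions=4):
    if not values:
        raise ValueError("need at least one value")
    size = -(-len(values) // partitions)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    # Combine step: partial sums and counts add up; partial means would not.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

readings = list(range(100_000))  # synthetic stand-in for ~100k data points
print(parallel_mean(readings))   # 49999.5
```

Combining (sum, count) pairs rather than per-partition means is what keeps the answer correct when partitions are unequal in size.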


 
Posted : 20/10/2014 10:05 pm
Posts: 0
Free Member
 

Yup, I work in it. We never call it big data; that's just a nonsense term. Is it useful to be able to analyse a single large data set instead of multiple smaller ones? Yes, no, sometimes. People are sold the tech but not the outcome, and that's when it fails. If you're trying to achieve a specific outcome then it can be very powerful, for example linking customers to billions of transactions, like the Tesco Clubcard database.


 
Posted : 20/10/2014 10:07 pm
Posts: 91171
Free Member
Topic starter
 

Interesting you call it a nonsense term - it's not to me. If you don't see the difference I'm wondering how big your data sets are.


 
Posted : 20/10/2014 10:08 pm
Posts: 0
Free Member
 

I've got plenty of colleagues doing it (in academia). I do some social network analysis stuff but it isn't big, as n = 10-15k.


 
Posted : 20/10/2014 10:09 pm
Posts: 0
Free Member
 

None of our consultants ever use the term, and we work with some of the biggest datasets in the country. I take my steer from them. I feel it's nonsense if it has multiple definitions that no one agrees upon. Is 6 billion rows "big data" enough, or are we talking at cross purposes?


 
Posted : 20/10/2014 10:11 pm
Posts: 0
Free Member
 

The stuff I'm working on is anywhere from 1 million to 30 million data points, but it's not 'big data'. Big Data to me is data that's huge, bigger than standard relational databases can process easily: data arriving at thousands or millions of points per second, such as tweets, Facebook graph interactions, credit card transactions etc. It's a bit woolly, to be honest, and like other tech buzzwords its meaning will eventually settle into a fairly standard definition, but it might take a while.


 
Posted : 20/10/2014 10:17 pm
Posts: 0
Free Member
 

Which is why I hate the term. It's a buzzword that's been knocking about for ages but isn't specific enough to mean much. It sells, though, which is why Teradata, Oracle etc. make billions from telling people to collect it all, like a hoarder filling their house with old newspapers.
Some scientific uses like weather and climate modelling are justified in using the term, as they are working on huge datasets with thousands of variables, but even there an actual definition is hard to come by.

Still, you know, pays the bills.


 
Posted : 20/10/2014 10:24 pm
Posts: 91171
Free Member
Topic starter
 

6bn rows.. not that big 🙂

When I say big data I mean NoSQL databases, map/reduce and all that crap. Or what whatnobeer said; that's the definition most people use. I wouldn't use it on a project, because it's a high-level term.

Anyway, splitting hairs. I briefly considered installing a hypervisor and some VMs on my very old desktop as a Hadoop cluster, but it's single core so that would be a bit silly 🙂


 
Posted : 20/10/2014 10:26 pm
Posts: 0
Free Member
 

There's the rub: I meet people all the time who've been misinformed that 5 million customers with 100 million transactions is "big data" and that they need to spend a gazillion pounds to make any sense of it. It's annoying. Then they get sold some tech and a bit of software, are given a run book and are told their problems are solved.


 
Posted : 20/10/2014 10:30 pm
Posts: 91171
Free Member
Topic starter
 

Just downloaded page view stats from Wikipedia. I thought it was a good fit for HBase but I'm now thinking it's not; it should probably just be one big HDFS file.
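
If the job is "total views per page", a single sequential scan over flat files is all it needs, which is why one big file can beat a random-access store like HBase here. A minimal sketch, assuming the space-separated line layout of the old hourly pagecount dumps ("project page_title count bytes"); check that against the files you actually downloaded. The sample lines are made up.

```python
from collections import Counter
from typing import Iterable

def top_pages(lines: Iterable[str], project: str = "en", n: int = 10):
    """Sum view counts per page title for one project in a single pass.

    Assumes each line looks like: "<project> <page_title> <count> <bytes>".
    """
    views = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or parts[0] != project:
            continue  # other projects or malformed lines
        try:
            views[parts[1]] += int(parts[2])
        except ValueError:
            continue  # non-numeric count field
    return views.most_common(n)

sample = [
    "en Main_Page 242332 4737756101",
    "en Hadoop 120 2400000",
    "de Hauptseite 50000 900000000",
    "en Hadoop 80 1600000",
]
print(top_pages(sample, n=2))  # [('Main_Page', 242332), ('Hadoop', 200)]
```

With real dumps you would pass an open file (for compressed ones, `gzip.open(path, "rt")`) so only one line is in memory at a time; only the per-title counter grows.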

More reading tomorrow.


 
Posted : 20/10/2014 10:41 pm
Posts: 401
Free Member
 

How is crime data 'big'?

It's not, it's all the other stuff we are aggregating along with it to analyse that is.


 
Posted : 20/10/2014 10:50 pm
Posts: 52609
Free Member
 

We had a go at predicting the level of calls to a bushfire advice line based on weather, ground conditions and heaps of other variables. Probably at the small end of big data. It was enough to put us off.

I saw a couple of decent presentations on it a few years back from IBM I think.


 
Posted : 21/10/2014 12:07 am
Posts: 91171
Free Member
Topic starter
 

Mike that sounds more like analytics.


 
Posted : 21/10/2014 8:10 am
Posts: 52609
Free Member
 

It was close to one end of it, if we had gone much further it would have headed into the big space


 
Posted : 21/10/2014 8:18 am
Posts: 0
Free Member
 

Interesting/amusing article about it on The Reg

Data Mining, noun: "Torturing data until it confesses ... and if you torture it enough, it will confess to anything."

http://www.theregister.co.uk/2014/10/20/sanity_now_ending_the_madness_of_data_completism/


 
Posted : 21/10/2014 8:44 am
Posts: 12089
Full Member
 

Enjoyed that, cheers Aidan.

Not sure what Big Data is, but if there are employment opportunities it's surely worth looking into...


 
Posted : 21/10/2014 8:54 am
Posts: 0
Full Member
 

When I say big data I mean nosql databases, map/reduce and all that crap

That's how I see it. Locally I've got nothing that can't be processed perfectly well in relational databases, as long as the queries are good and the indices right. And then just chuck a ton of virtual hardware at it.

Elsewhere we have some secret squirrel stuff using it, but it's certainly not breaking into our BI / normal LoB world and I don't think it will, as we don't have the problems it's best placed to solve.


 
Posted : 21/10/2014 8:56 am
Posts: 0
Free Member
 

I spend a lot of time (OK, some) getting rid of data; we're drowning in the stuff.

We've got laser scanners, GOM scanners, Alicona, etc., which are very good at generating massive point clouds, which we then have to 'thin', often chucking away 50% or more.

(You only need 3 points to define a radius; there's no point having 5000, especially when that curve is an agreed non-critical feature.)
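
The geometry behind that: three non-collinear points fix a unique circle, so the radius follows directly from them. A minimal sketch using the circumradius formula R = abc / (4K), where a, b, c are the triangle's side lengths and K its area. With noisy scan data you would normally least-squares fit a circle to many points instead, which this does not do.

```python
import math

def circumradius(p1, p2, p3):
    """Radius of the unique circle through three 2D points."""
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    # Cross product of two edge vectors gives twice the triangle's signed area.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    area = abs(cross) / 2.0
    if area == 0:
        raise ValueError("points are collinear; no unique circle")
    return a * b * c / (4.0 * area)

print(circumradius((1, 0), (0, 1), (-1, 0)))  # approximately 1.0 (unit circle)
```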


 
Posted : 21/10/2014 9:02 am
Posts: 9071
Free Member
 

I work for a worldwide data company and Big Data is something that gets thrown around regularly but as someone has mentioned already, it seems a little vague as to what it is and what to do with it.

The way I see it, it's the stuff that's out there that isn't easily categorised. We deal with all sorts of data (financial, some personal etc.). Big Data to me is the stuff you can't easily pigeonhole, so mainly the social media stuff: trawling social media and pulling useful stuff out of it, for example. What that 'stuff' is, though, is anyone's guess.

As an aside, I've recently started developing with a new ETL tool that has apps to link into Hadoop etc. I'd be interested to see how that works, considering it kind of grinds to a halt if you feed a 10MB Excel file into it, given the limitations of the Java code behind it :S


 
Posted : 21/10/2014 9:14 am
Posts: 91171
Free Member
Topic starter
 

I just spent a week for an extremely well known and important client tweaking their Java-based core system so it doesn't grind to a halt. Drop me a line; I can probably help you out 🙂


 
Posted : 21/10/2014 9:21 am
Posts: 3328
Full Member
 

Everyone wants to do it yet they don't know what it is.

Just add 'big data' and 'cloud' to a PowerPoint slide and away you go.

I have done some work with Hadoop and did some data processing waaaaaay quicker than we've ever done with SQL (and that's with some real SQL nerds). However, the type of processing lent itself well to map/reduce, so it was a good fit. We certainly have data volume, with as many events as you want per second. Can't really give any more details than that, though.
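
What "lent itself well to map/reduce" means in practice: the job decomposes into a map step that emits key/value pairs from each record independently, a shuffle that groups pairs by key, and a reduce that folds each group. A minimal in-memory sketch of that shape (plain Python, not Hadoop's API), counting events per type; the event records are made up.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (key, 1) for each event in one record; needs no other records."""
    return [(event, 1) for event in record.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Fold all values for one key; sees only that key's group."""
    return key, sum(values)

records = ["click view click", "view purchase", "click"]
mapped = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'click': 3, 'view': 2, 'purchase': 1}
```

Because map sees one record and reduce sees one key at a time, a framework can spread both across machines; work that needs arbitrary joins or shared global state fits this shape much less well.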


 
Posted : 21/10/2014 9:36 am
Posts: 14143
Full Member
 

I've got 1300 odd customer records to cleanse - that's Big Data to me! 😀


 
Posted : 21/10/2014 9:45 am
Posts: 1012
Free Member
 

The value of "big data" seems to come from this concept of looking at data to find an answer to a question you never knew. That is what I try to get the data analysts (or data scientists in new marketing speak) to do on a daily basis and they have been for years.
But most of the people who talk about this do not include the statistical verification needed to ensure that the data they are looking at is relevant.
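
One concrete reason that verification matters: test enough variables and some will look significant by chance alone. A minimal simulation, assuming pure random noise and a two-sided 5% test on a Pearson correlation with 50 samples (a critical |r| of roughly 0.28); around 5% of the 1,000 noise variables should "pass". Dividing the 5% threshold by the number of tests (a Bonferroni correction) is one standard guard, at the cost of missing weaker real effects.

```python
import math
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is repeatable
N_SAMPLES = 50           # observations per variable
N_VARIABLES = 1000       # unrelated variables being "mined"
R_CRITICAL = 0.28        # roughly the two-sided p < 0.05 cut-off for n = 50

target = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]
false_hits = 0
for _ in range(N_VARIABLES):
    noise = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]  # no real relationship
    if abs(pearson_r(noise, target)) > R_CRITICAL:
        false_hits += 1

# Expect something in the region of 5% of N_VARIABLES, every one spurious.
print(f"{false_hits} of {N_VARIABLES} pure-noise variables look 'significant'")
```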

All the map/reduce, Hadoop etc. are just tools that allow you to work with data in different ways. Sometimes they will be beneficial, sometimes not, depending on each project's requirements. I don't agree with using the term big data just because you use Hadoop.

It also means that more companies are put off investing in simple data analytics, as the cost of implementing big data, and the fact that there is no calculable ROI, makes it unattractive.

For most companies, if they have basic data warehousing and collation of data, and get a data analyst to provide relevant insights into their business, they will benefit massively.


 
Posted : 21/10/2014 9:59 am
Posts: 91171
Free Member
Topic starter
 

From a technology point of view (since I am a techie) the value of the tools is being able to store and process datasets that would have been way too big to fit into a traditional SQL DB, and would have been dismissed. That's basically it.


 
Posted : 21/10/2014 10:05 am
Posts: 3328
Full Member
 

Relational DBs are quite happy to store humongous amounts of data, as long as you are doing straightforward CRUD, or at least have your schema optimised for your queries, the way it's scaled ties in with all of this, and you can pay for whatever enterprise version of the software you need. Then everything is fine.

IME the tricky bit comes when the queries are not simple CRUD, are more complex, and you don't even know what they are going to be up front. Maybe they cannot make use of indexing because they need to touch a large number of rows, or maybe the query engine is not clever enough to scale your query the way you scaled your schema. That is when less structured technologies that are designed with scaling in mind _might_ be useful.

So it's not just the amount of data, it's what you want to do with it.
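
A quick way to see the "can it use an index?" distinction is to ask the database for its query plan. A minimal sketch using SQLite, chosen only because it ships with Python; the table, column and index names are made up. A lookup on the indexed column becomes an index search, while an ad hoc aggregate over a non-indexed column has to scan every row. The exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_txn_customer ON txn (customer_id)")
conn.executemany(
    "INSERT INTO txn (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

def plan(sql, params=()):
    """Return the 'detail' text of each row of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Known, selective predicate on an indexed column: uses the index.
lookup = plan("SELECT * FROM txn WHERE customer_id = ?", (42,))
# Ad hoc question on an unindexed column: has to touch every row.
adhoc = plan("SELECT SUM(amount) FROM txn WHERE amount > ?", (100.0,))
print(lookup)  # should mention USING INDEX idx_txn_customer
print(adhoc)   # should start with SCAN
```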


 
Posted : 21/10/2014 10:41 am
Posts: 9071
Free Member
 

I just spent a week for an extremely well known and important client tweaking their Java based core system so it doesn't grind to a halt. Drop me a line I can probably help you out

It's an out-of-the-box application. Doesn't seem to like big Excel files. Convert to tab-delimited text inputs and everything is fine 🙂
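
Part of why the text route behaves better: a tab-delimited file can be processed one row at a time, so memory use doesn't grow with file size. A minimal sketch with Python's csv module; the column names and the in-memory stand-in for a file are assumptions for illustration (in practice you would pass `open(path, newline="")`).

```python
import csv
import io

def sum_column(tsv_file, column):
    """Stream a tab-delimited file row by row and total one numeric column."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    total = 0.0
    rows = 0
    for row in reader:  # only the current row is held in memory
        total += float(row[column])
        rows += 1
    return rows, total

data = io.StringIO("customer\tamount\nA\t10.5\nB\t4.5\nC\t5\n")
print(sum_column(data, "amount"))  # (3, 20.0)
```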


 
Posted : 21/10/2014 10:43 am
Posts: 91171
Free Member
Topic starter
 

You will still be able to change the heap parameters on a 3rd party app, I'd have thought. That'll be your problem for sure. If you want, you can PM me the name of the app and I can have a look.


 
Posted : 21/10/2014 11:37 am
Page 1 / 3