- This topic has 125 replies, 40 voices, and was last updated 10 years ago by molgrips.
Big Data
molgrips (Free Member)
Anyone working with this concept?
Just curious to know what people are up to.
NB I’m not talking about obesity statistics.
TrailriderJim (Free Member)
Interesting listen. The sources of most big data are not statistically representative; they are biased, especially if you think of data from markets with much lower tech use or from consumer groups that are too exclusive. There are no shortcuts to gathering and generating statistically significant data. It is an expensive process, even with a sample only just large enough to stand up statistically.
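TrailriderJim’s point about statistically sound samples can be made concrete. A minimal sketch, assuming simple random sampling from a large population and the textbook formula for estimating a proportion, n = z²·p(1−p)/e²; the function name and default values are illustrative, not from the thread:

```python
import math


def sample_size_for_proportion(margin_of_error: float, z: float = 1.96, p: float = 0.5) -> int:
    """Minimum simple-random-sample size to estimate a proportion.

    Uses n = z^2 * p * (1 - p) / e^2, rounded up. z = 1.96 corresponds to
    roughly 95% confidence; p = 0.5 is the most conservative (largest n) choice.
    Assumes an effectively infinite population and unbiased sampling.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)


print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```

The formula only controls random sampling error; as the post says, a biased source stays biased no matter how large n gets.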
back2basics (Free Member)
pfft, just another phrase for the merry-go-round of IT issues that go round in circles about every 10 years.
If it didn’t have a new name they couldn’t sell new courses.
GJP (Free Member)
I am starting to take an interest from a CRM perspective. Spent most of the weekend googling Hadoop and Cassandra stuff. As a business architect I found it tough going. Not doing anything actively yet.
Shred (Free Member)
Yep, not a big fan of the phrase, as it is just marketing crap.
Very few data sets meet the three Vs required (velocity, volume and variety). The term is now just used to describe what business intelligence and data analytics should have been all along, although most people didn’t really do this.
jam-bo (Full Member)
Been looking at kaggle.com recently with a half-baked view to a change of career. MATLAB nerd here, but thinking about learning R or SciPy to give it a go.
Pembo (Free Member)
“Not doing anything actively yet.”
Pretty much sums up the industry at the moment. Follow @BigdataBorat for a funny angle on big data.
brakes (Free Member)
“Yep, not a big fan of the phrase as it is just marketing crap.”
^^this.
It’s the new “digital”. It’s quickly becoming an overused term, with companies using it to sound forward-thinking and innovative.
curiousyellow (Free Member)
Big Data is like teenage sex.
Everyone thinks everyone else but them is doing it. In truth, only a very few people are doing it.
If you are a business it’s only worth looking at once you’re already operating at peak/close to peak efficiency and if you have deep pockets.
The cost of building the infrastructure to host it and training your people to use it, plus the time it takes before it becomes useful, is very high. You may also wind up finding out you’re doing pretty well, in which case you’ll just be confirming what you already knew!
TheBrick (Free Member)
Quoting back2basics:
“pfft, just another phrase for the merry-go-round of IT issues that go round in circles about every 10 years. If it didn’t have a new name they couldn’t sell new courses.”
Bang on. It’s the most annoying thing about IT: repackaging everything into a (maybe) slightly easier-to-use API and claiming it’s all new.
It’s good that some toolkits for parallel extraction of data have been built so you don’t have to roll your own, but people are talking as if this is the first use of parallel computing on commodity hardware. Gets right on my tits!
molgrips (Free Member)
I like that description 🙂
From my point of view, as a consultant, it reminds me of that Tom Sharpe book, Riotous Assembly, or the other one, Indecent Exposure. In it, the local chief of police is paranoid about communist terrorists, so he creates a group of secret agents to pose as communists and infiltrate the cell. To protect their identities, he doesn’t tell them who else is in the group. So off they go, hanging around in bars, acting the way they expect communist terrorists to act and looking out for other people doing the same. There aren’t in fact any terrorists, so all that ends up happening is that they form their own terrorist cell. To maintain credibility with each other they start organising actual terrorist attacks, which simply hardens the resolve of the chief of police.
I’m being asked to look into new technologies such as ‘mobile’ and ‘big data’. I’m happy to mess with this from a technology point of view but we’ll just have to see how much business it generates 🙂
I should add, though, that my colleague did build a prototype using Hadoop for large batch processing. Of course, the same thing could easily have been done by traditional means.
Fueled (Free Member)
I work in insurance. We build models (nerd alert) based on tens of millions of customer records to price risks, predict competitors’ rates and predict the price elasticity of individual customers. To me it feels like Big Data, but we have been doing it since way before anyone started talking about big data. This leaves me thoroughly confused as to whether it is cutting edge or way behind the times. I have no idea what people are actually doing with Big Data in other industries.
I agree that Big Data is a fairly stupid phrase, though. I have heard it compared to the International Conference on Very Large Databases, which has been running annually since 1975. Their definition of “very large” must have changed quite a lot over that time.
molgrips (Free Member)
The difference between Big Data and simply lots of data is how it’s stored. You can store a lot in a relational database, but that’s peanuts compared to what big data techniques can handle. It’s meant to be infinitely scalable, so you can simply keep adding processing nodes to process more and more data.
TheBrick (Free Member)
Nothing is infinitely scalable; the bottleneck just moves.
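To illustrate both sides of that exchange, here is a minimal sketch assuming the simplest possible scheme, naive hash-mod-N partitioning (this is not how Hadoop or HBase actually place data, and the key names are hypothetical). Spreading keys over more nodes is easy, but changing the node count relocates most of the data, which is one place the bottleneck moves to:

```python
import hashlib


def partition_for(key: str, num_nodes: int) -> int:
    """Assign a key to a node with naive hash-mod-N partitioning."""
    # SHA-256 gives a hash that is stable across runs
    # (Python's built-in hash() is randomised per process for str).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_moved(keys: list[str], old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys that land on a different node after resizing the cluster."""
    moved = sum(1 for k in keys if partition_for(k, old_nodes) != partition_for(k, new_nodes))
    return moved / len(keys)


keys = [f"customer-{i}" for i in range(10_000)]

# With 4 nodes, each node gets roughly a quarter of the keys.
counts = [0] * 4
for k in keys:
    counts[partition_for(k, 4)] += 1
print(counts)

# Going from 4 to 5 nodes relocates most keys (about 80% in expectation).
print(round(fraction_moved(keys, 4, 5), 2))
```

Consistent hashing is the usual way to limit that reshuffling, so that adding a node moves only a fraction of the keys; it reduces the cost rather than removing it.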
flange (Free Member)
Yep, we do a lot of analysis around the stock markets, looking for insider dealing and so on. Just moved over to Exadata, and the results aren’t quite what we thought they’d be considering the costs.
Whilst it’s very much a marketing term that’s currently all the rage, many sites (Worldpay, Visa, Vodafone) have been doing something very similar for years. I think it’s probably very similar to SOA, in that if you have the perfect set of parameters combined with the right infrastructure and demand from the business, then it’s probably the bee’s bendy bits. But most places wouldn’t really fit ‘the model’. It’s a nice theory, but in practice it doesn’t really work, certainly not for us. The Hadoop books are currently propping up my bike stand, which is about the most useful they’ve been…
molgrips (Free Member)
How is crime data ‘big’? Are there millions of crimes per second? It’s worse than I thought!
flange, you’re right: big data techniques are quite specific, and much of what they are probably being used for could be done with something else.
I wrote a PoC creating energy forecasts with around 100k data points and parallelised it across many nodes, but that wasn’t done with Hadoop (although it would have worked fine), and it wasn’t big data.
Having said that, Hadoop is a convenient framework for chugging through data – my code did essentially the same thing but Hadoop’s already written.
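The model Hadoop provides can be sketched in a single process: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. This is a minimal illustration in plain Python, not Hadoop’s API; the whitespace-separated “project title views” input records are hypothetical:

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input: one "project title views" record per line.
LINES = [
    "en Main_Page 120",
    "en Big_data 7",
    "de Hauptseite 40",
    "en Big_data 3",
    "en Main_Page 80",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (key, value) pairs; here (title, views) for English pages only."""
    project, title, views = line.split()
    if project == "en":
        yield title, int(views)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key; here a simple sum."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(run_job(LINES))  # {'Main_Page': 200, 'Big_data': 10}
```

Because each map call and each reduce call is independent, a framework can spread them over many machines; the shuffle is the step that has to move data between them.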
joolsburger (Free Member)
Yup, I work in it. We never call it big data; that’s just a nonsense term. Is it useful to be able to analyse a single large data set instead of multiple smaller ones? Yes, no, sometimes. People are sold the tech but not the outcome, and that’s when it fails. If you’re trying to achieve a specific outcome then it can be very powerful, for example linking customers to billions of transactions, as in the Tesco Clubcard database.
molgrips (Free Member)
Interesting that you call it a nonsense term; it’s not to me. If you don’t see the difference, I’m wondering how big your data sets are.
CaptJon (Free Member)
I’ve got plenty of colleagues doing it (in academia). I do some social network analysis stuff, but it isn’t big, as n = 10–15k.
joolsburger (Free Member)
None of our consultants ever use the term, and we work with some of the biggest datasets in the country. I take my steer from them. I feel it’s nonsense if it has multiple definitions that no one agrees on. Is 6 billion rows “big data” enough, or are we talking at cross purposes?
whatnobeer (Free Member)
The stuff I’m working on is anywhere from 1 million to 30 million data points, but it’s not ‘big data’. Big Data to me is data that’s huge, bigger than standard relational databases can process easily: data that arrives at thousands or millions of points per second, such as tweets, Facebook graph interactions, credit card transactions, etc. It’s a bit woolly, to be honest, and like other tech buzzwords its meaning will eventually settle into a fairly standard definition, but that might take a while.
joolsburger (Free Member)
Which is why I hate the term: it’s a buzzword that’s been knocking about for ages but isn’t specific enough to mean much. It sells, though, which is why Teradata, Oracle etc. make billions from telling people to collect it all, like a hoarder filling their house with old newspapers.
Some scientific fields, like weather and climate modelling, are justified in using the term, as they work on huge datasets with thousands of variables, but even there an actual definition is hard to come by. Still, you know, it pays the bills.
molgrips (Free Member)
6bn rows... not that big 🙂
When I say big data I mean NoSQL databases, map/reduce and all that crap. Or what whatnobeer said; that’s the definition most people use. I wouldn’t use the term on a project, because it’s too high-level.
Anyway, splitting hairs. I briefly considered installing a hypervisor and some VMs on my very old desktop as a Hadoop cluster, but it’s single-core, so that would be a bit silly 🙂
joolsburger (Free Member)
There’s the rub: I meet people all the time who’ve been misinformed that 5 million customers with 100 million transactions is “big data” and that they need to spend a gazillion pounds to make any sense of it. It’s annoying. Then they get sold some tech and a bit of software, are given a run book and are told their problems are solved.
molgrips (Free Member)
Just downloaded page view stats from Wikipedia. I thought they were a good fit for HBase, but I’m now thinking not; they should probably just be one big HDFS file.
More reading tomorrow.
DavidB (Free Member)
“How is crime data ‘big’?”
It’s not, it’s all the other stuff we are aggregating along with it to analyse that is.
mikewsmith (Free Member)
We had a go at predicting the level of calls to a bushfire advice line based on weather, ground conditions and heaps of other variables. Probably at the small end of big data. It was enough to put us off.
I saw a couple of decent presentations on it a few years back from IBM I think.
mikewsmith (Free Member)
It was close to one end of it; if we had gone much further it would have headed into the big space.
Aidan (Free Member)
Interesting/amusing article about it on The Reg:
Data Mining, noun: “Torturing data until it confesses … and if you torture it enough, it will confess to anything.”
http://www.theregister.co.uk/2014/10/20/sanity_now_ending_the_madness_of_data_completism/
mogrim (Full Member)
Enjoyed that, cheers Aidan.
Not sure what Big Data is, but if there are employment opportunities it’s surely worth looking into…
brassneck (Full Member)
“When I say big data I mean nosql databases, map/reduce and all that crap”
That’s how I see it. Locally I’ve got nothing that can’t be processed perfectly well in relational databases, as long as the queries are good and the indices right. And then just chuck a ton of virtual hardware at it.
Elsewhere we have some secret squirrel stuff using it, but it’s certainly not breaking into our BI / normal LoB world, and I don’t think it will, as we don’t have the problems it’s best placed to solve.
ahwiles (Free Member)
I spend a lot of time (OK, some) getting rid of data; we’re drowning in the stuff.
We’ve got laser scanners, GOM scanners, Alicona etc., which are very good at generating massive point clouds, which we then have to ‘thin’, often chucking away 50% or more.
(You only need 3 points to define a radius; there’s no point having 5000, especially when that curve is an agreed non-critical feature.)
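The three-point remark is plain geometry: three non-collinear points define exactly one circle, with radius R = abc/(4K), where a, b, c are the triangle’s side lengths and K is its area. A minimal sketch, assuming a 2D cross-section and noise-free points (the sample arc below is made up):

```python
import math

Point = tuple[float, float]


def circumradius(p1: Point, p2: Point, p3: Point) -> float:
    """Radius of the unique circle through three non-collinear 2D points.

    Uses R = abc / (4K), where a, b, c are the side lengths of the
    triangle p1-p2-p3 and K is its area.
    """
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    # Shoelace formula for the triangle's area.
    area = abs(
        p1[0] * (p2[1] - p3[1]) + p2[0] * (p3[1] - p1[1]) + p3[0] * (p1[1] - p2[1])
    ) / 2
    if area == 0:
        raise ValueError("points are collinear; no unique circle")
    return a * b * c / (4 * area)


# A dense, 5000-point sample of a half-circle arc of radius 5 centred at (2, -1).
cloud = [
    (2 + 5 * math.cos(t), -1 + 5 * math.sin(t))
    for t in (math.pi * i / 4999 for i in range(5000))
]

# "Thin" it to three well-spread points and recover the radius.
r = circumradius(cloud[0], cloud[2500], cloud[-1])
print(round(r, 6))  # 5.0
```

Real scan data is noisy, so in practice a least-squares circle fit over more than three points would likely be more robust.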
DaveyBoyWonder (Free Member)
I work for a worldwide data company and Big Data is something that gets thrown around regularly, but as someone has already mentioned, it seems a little vague as to what it is and what to do with it.
The way I see it, it’s the stuff that’s out there that isn’t easily categorised. We deal with all sorts of data: financial, some personal, etc. Big Data to me is the stuff you can’t easily pigeonhole, so mainly the social media stuff, for example trawling social media and pulling useful stuff out of it. What that ‘stuff’ is, though, is anyone’s guess.
As an aside, I’ve recently started developing with a new ETL tool that has apps to link into Hadoop etc. I’d be interested to see how that works, considering it kind of grinds to a halt if you feed a 10 MB Excel file into it, given the limitations of the Java code behind it :S
molgrips (Free Member)
I just spent a week working for an extremely well-known and important client, tweaking their Java-based core system so it doesn’t grind to a halt. Drop me a line; I can probably help you out 🙂
llama (Full Member)
Everyone wants to do it, yet they don’t know what it is.
Just add ‘big data’ and ‘cloud’ to a PowerPoint slide and away you go.
I have done some work with Hadoop and did some data processing waaaaaay quicker than we’ve ever done with SQL (and that’s with some real SQL nerds). However, the type of processing lent itself well to map/reduce, so it was a good fit. We certainly have the data volume, with as many events per second as you want. Can’t really give any more details than that, though.
the-muffin-man (Full Member)
I’ve got 1,300-odd customer records to cleanse; that’s Big Data to me! 😀
The topic ‘Big Data’ is closed to new replies.