- This topic has 125 replies, 40 voices, and was last updated 10 years ago by molgrips.
Big Data
Shred Free Member
The value of “big data” seems to come from this concept of looking at data to find an answer to a question you never knew you had. That is what I try to get the data analysts (or data scientists in new marketing speak) to do on a daily basis, and they have been for years.
But most of the people who talk about this do not include the statistical verification to ensure that the data they are looking at is relevant. All the MapReduce, Hadoop etc. are just tools that will allow you to work with data in different ways. Sometimes they will be beneficial, sometimes not, depending on each project’s requirements. I don’t agree with using the term big data just because you use Hadoop.
It also means that more companies are put off investing in simple data analytics, as the costs of implementing big data, and the fact that there is no calculable ROI with big data, make it unattractive.
For most companies, if they have basic data warehousing and collation of data, and get a data analyst to provide relevant insights into their business, they will benefit massively.
molgrips Free Member
From a technology point of view (since I am a techie) the value of the tools is being able to store and process datasets that would have been way too big to fit into a traditional SQL DB, and would have been dismissed. That’s basically it.
llama Full Member
Relational DBs are quite happy to store humungous amounts of data. As long as you are doing straightforward CRUD, or else at least have your schema optimised for your queries, and the way it’s scaled ties in with all of this, and you can pay for whatever enterprise version of software you need, then everything is fine.
IME the tricky bit comes when the queries are not simple CRUD, are more complex, and you don’t even know what they are going to be up front. Maybe they cannot make use of indexing because they need to touch a large number of rows, or maybe the query engine is not clever enough to scale your query the way you scaled your schema. That is when less structured technologies that are designed with scaling in mind _might_ be useful.
So it’s not just the amount of data, it’s what you want to do with it.
DaveyBoyWonder Free Member
I just spent a week for an extremely well known and important client tweaking their Java based core system so it doesn’t grind to a halt. Drop me a line, I can probably help you out.
It’s an out-of-the-box application. Doesn’t seem to like big Excel files. Convert to tab delimited text inputs and everything is fine 🙂
molgrips Free Member
You will still be able to change the heap parameters on a 3rd party app, I’d have thought. That’ll be your problem for sure. If you want, you can PM me the name of the app and I can have a look.
dazh Full Member
Ooh, a ‘big data’ discussion. I’m hoping it’s more informed than the ones we have here at a very large and well known engineering consultancy where every man and his dog thinks they know about it, and thinks they’re using it. I’m still struggling to explain to some engineers that storing relational data in a number of separate Excel spreadsheets isn’t necessarily a good idea. I haven’t even bothered trying to correct them on what they think big data is.
DaveyBoyWonder Free Member
You will still be able to change the heap parameters on a 3rd party app, I’d have thought. That’ll be your problem for sure. If you want, you can PM me the name of the app and I can have a look.
Done that. Same. It’s the (excuse my Java terminology!) module that the application uses to read Excel files. Apparently there’s a more efficient one which reads them as XML, which we’re using to convert to text instead. Happy days. If it wasn’t for luddites supplying Excel files we’d be fine!
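To illustrate why tab delimited text tends to be kinder on memory than a big workbook: a text file can be consumed one row at a time, so memory use stays roughly flat however large the input is. Below is a minimal sketch in Python, not the Java application discussed above; the function name `sum_column` and the sample data are made up for illustration.

```python
import csv
import io


def sum_column(lines, column):
    """Stream tab-delimited rows and total one numeric column.

    Only the current row is held in memory, so the input can be far
    larger than the available RAM.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    total = 0.0
    for row in reader:
        total += float(row[column])
    return total


# Hypothetical sample standing in for a real file opened with
# open(path, newline="").
sample = io.StringIO("site\treading\nA\t1.5\nB\t2.5\nC\t4.0\n")
print(sum_column(sample, "reading"))
```

A reader that builds the whole workbook as an in-memory object tree has to hold every cell at once, which is one plausible reason a larger heap alone did not help here.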
molgrips Free Member
You probably need to increase the nursery pool size on the heap 🙂
leftyboy Free Member
I think it’s interesting that people not using any form of big data have no idea what it is but feel able to give a black and white opinion on it. We use big data techniques to reduce our huge daily data collection into something more manageable so 30Gb ends up nearer 6Gb (yes I mean gigabytes BTW) which we couldn’t do without our largish Hadoop cluster.
The investment was, and is, very large but the data we get is vital to our business so it’s worth the cost etc.
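For anyone unfamiliar with the pattern, a reduction like this is typically expressed as map, shuffle and reduce steps. The following is a minimal single-machine sketch in Python of what Hadoop distributes across a cluster; the sensor records and function names are invented for illustration and this is not the pipeline described in the post.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, value) pairs: here, one (sensor_id, reading) per record."""
    for sensor_id, reading in records:
        yield sensor_id, reading


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values to one summary row: (count, mean)."""
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in groups.items()}


# Hypothetical raw readings: many rows in, one summary row per sensor out.
records = [("s1", 10.0), ("s2", 4.0), ("s1", 14.0), ("s2", 6.0)]
summary = reduce_phase(shuffle(map_phase(records)))
print(summary)
```

On a cluster the map and reduce functions run on many nodes in parallel and the shuffle moves data between them, which is where the tooling earns its keep once the input no longer fits on one machine.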
brassneck Full Member
so 30Gb ends up nearer 6Gb
I could do that with 7zip for you 😀
Shred Free Member
Well, by most definitions I’ve read, you’re not doing big data, just using Hadoop as an ETL tool.
Now you obviously do something with that data that could include analytics that require the 3 or 4 V’s, and you could be doing all sorts of cool statistical analysis on the data, but big data is more than ETL.
Great tools if they work in your environment.
molgrips Free Member
Well, Hadoop is a tool that is very useful for big data so it comes under that name, but yeah, there is no such thing as ‘doing’ big data, just using big data tools and techniques.
To me it’s a name for the tools, techniques and challenges – and what you do with them may include petabytes of data, it may not.
CaptJon Free Member
For those seeking a definition, this is from a new academic journal called Big Data and Society:
Kitchin (2013) details that Big Data is:
– huge in volume, consisting of terabytes or petabytes of data;
– high in velocity, being created in or near real-time;
– diverse in variety, being structured and unstructured in nature;
– exhaustive in scope, striving to capture entire populations or systems (n = all);
– fine-grained in resolution and uniquely indexical in identification;
– relational in nature, containing common fields that enable the conjoining of different data sets;
– flexible, holding the traits of extensionality (can add new fields easily) and scaleability (can expand in size rapidly)
I take issue with the second point and argue the data doesn’t need to be created in real time.
footflaps Full Member
Big data must be more than GBs of data, e.g. our radio networks generate Gbs of data every week, I happily munge it all in VBA…..
I generally reduce Gbs of data into a few Kb of KML and visualise stuff in Google Earth. More GIS than Big data though.
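A reduction of that sort can be sketched roughly as follows, here in Python rather than VBA. The sample tuples, the per-site mean and the function names are assumptions made for illustration; the only fixed points are the KML 2.2 namespace and the longitude,latitude order that KML uses for coordinates.

```python
from collections import defaultdict
from xml.sax.saxutils import escape


def mean_by_site(samples):
    """Reduce raw (site, lon, lat, value) samples to one mean value per site."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    coords = {}
    for site, lon, lat, value in samples:
        totals[site] += value
        counts[site] += 1
        coords[site] = (lon, lat)
    return [(site, *coords[site], totals[site] / counts[site]) for site in totals]


def to_kml(rows):
    """Render (site, lon, lat, mean) rows as a minimal KML document."""
    placemarks = "".join(
        "<Placemark><name>{}: {:.1f}</name>"
        "<Point><coordinates>{},{}</coordinates></Point></Placemark>".format(
            escape(site), mean, lon, lat
        )
        for site, lon, lat, mean in rows
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + placemarks
        + "</Document></kml>"
    )


# Hypothetical measurements: two samples at site A, one at site B.
samples = [
    ("A", -0.12, 51.50, -80.0),
    ("A", -0.12, 51.50, -90.0),
    ("B", -3.19, 55.95, -70.0),
]
kml = to_kml(mean_by_site(samples))
print(kml)
```

Saved with a .kml extension, output like this should open in Google Earth as one pin per site.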
molgrips Free Member
Nope – afaik IBM have different product lines for processing it in real time and for dealing with it when it’s been stored.
CaptainMainwaring Free Member
I work for a company that sells “Big Data” stuff. FWIW, here are a few points that we tend to make, and I am not technical so don’t know any of the geekery.
It’s really four “V’s”: volume, velocity, variety and value, although that’s really consultant bulls**t. There are lots of industries that generate enough data to make capturing and analyzing it a challenge, such as:
Financial Services: all trading across commodities, stocks, currency etc
Utilities: data coming every 30 mins from Smart meters (or will do)
Utilities and oil companies: SCADA data being produced every 0.1 secs by tens of thousands of sensors
Security services: monitoring emails, mobiles and social media
Relational databases are very good at holding vast amounts of data but there are two challenges:
1) Cost of storage is high when you take into account the hardware, software, management, support, etc. A Hadoop environment is much cheaper for storing large amounts of rapidly changing data
2) You need to do a lot of data modeling to get stuff into a relational database and it needs to have consistent structure, which is not the case in the examples above. Relational is hard to get data into, but easy to get it out, whereas Hadoop is the opposite as you can just dump anything in.
So if you have huge amounts of different types of data arriving at high velocity and you want to extract value from it, you can dump it into a Big Data environment, do some analysis, throw away anything that’s irrelevant and move stuff you want to keep into your higher cost relational environment.
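The “dump anything in, keep what matters” flow can be sketched on a small scale as below. This is a rough Python illustration under stated assumptions: the raw lines, the field names (`meter`, `kwh`, `ts`) and the table layout are invented, and an in-memory SQLite database stands in for the higher cost relational environment.

```python
import json
import sqlite3

# Hypothetical raw landing area: nothing was validated on the way in.
raw_lines = [
    '{"meter": "m1", "kwh": 1.5, "ts": "2014-10-01T00:00"}',
    '{"meter": "m2", "note": "tamper alarm"}',
    "not even json",
    '{"meter": "m1", "kwh": 0.5, "ts": "2014-10-01T00:30"}',
]


def parse_readings(lines):
    """Apply structure at read time and keep only well-formed readings."""
    for line in lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # unparseable lines are dropped here, not rejected at load
        if isinstance(rec, dict) and {"meter", "kwh", "ts"} <= rec.keys():
            yield rec["meter"], rec["ts"], float(rec["kwh"])


# Only the filtered, consistently structured rows reach the relational store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reading (meter TEXT, ts TEXT, kwh REAL)")
db.executemany("INSERT INTO reading VALUES (?, ?, ?)", parse_readings(raw_lines))
rows = db.execute("SELECT meter, SUM(kwh) FROM reading GROUP BY meter").fetchall()
print(rows)
```

The modelling effort has not gone away; it has moved from load time to read time, which matches the “hard to get in, easy to get out” trade-off described above.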
Kit Free Member
There’s an afternoon of talks about “Data Science” on 3rd November in Edinburgh, if anyone’s interested. Unis deal with huge datasets on a regular basis 🙂
Shred Free Member
A problem I also have is finding the right people who can properly analyse and draw conclusions from the data. I’m battling to hire the right people, so I often wonder who is doing the analytics on all of these projects, and how do they know that the results they are getting are actually relevant.
footflaps Full Member
I’m battling to hire the right people, so I often wonder who is doing the analytics on all of these projects, and how do they know that the results they are getting are actually relevant.
That’s where the money is. You can hire a coder for peanuts from India / China, but finding people who actually understand what’s going on is like finding rocking horse shit, hence pays very well.
flange Free Member
A lot of people who think they are doing business intelligence aren’t. In much the same way, a lot of folk who think they’re doing big data aren’t. Big data isn’t 30GB, it’s TB and PB.
There’s an article in one of the London papers tonight about how big data growth is being limited by a lack of expertise, so clearly there is a market for it. I only caught the headline but if that’s the case it’ll either become very niche or die out altogether.
As an example, we process submissions in both XML and XBRL. XML is straightforward enough; XBRL was supposedly the new super improved way of submitting what is essentially the same data. As a banking standard, everyone was told they should adopt it. So far there are about three people in the UK who understand it and can make sense of it. Great when every financial institution should be submitting in it. I give it two years before the EBA revert back to XML.
TurnerGuy Free Member
Lots of XBRL in use in the Aus Superstream system, and Schematron validation XSLTs, which take an age to run.
Shred Free Member
You’ve already said you are a techie, whereas I’m looking for analysts.
I would like someone who has a stats background, has experience in analytics and data mining, and understands the business. Not easy, as people are either not business types or not stats types.
back2basics Free Member
Isn’t it the same old IT problem though? Companies suddenly realise they need X skill for Y project but won’t hire 30-year-experience tech programmer Z and train him/her in X skill; instead they pay a megabucks contract wage and pull someone from some other company.
So that company then decides never to train any permanent staff again, does what the above company does and hires contractors, so demand fuels salary.
atlaz Free Member
We have some use for it but few people who can interpret it or know how to use it. Won’t spend any more money until people get smarter or we get smarter people.
molgrips Free Member
You’ve already said you are a techie, whereas I’m looking for analysts.
I’ll do it anyway. How hard can it be? 😆
antigee Free Member
Big Data or just Data Mapping?
http://www.technologyreview.com/news/530296/cell-phone-data-might-help-predict-ebolas-spread/
DavidB Free Member
Lots of XBRL in use in the Aus Superstream system
“Makes sign of the cross and furiously eats garlic”
molgrips Free Member
Big Data or just Data Mapping?
Rather depends on how much data there is. If you have to use big data techniques because of the size of the data set then it’s big data.
Shred Free Member
back2basics, it’s not just a case of retraining. People that understand data are few and far between. Most of my work is fighting with the dev teams about data accuracy and problems they have introduced in the data.
Even most DBAs do not get data and another large part of my job is explaining to people why close enough is not actually good enough.
It is hugely frustrating when I really think it is very easy to put in some basic controls to ensure data accuracy and testing to ensure dev changes do not cause major problems, but again, most devs and DBAs just don’t see it.
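As a rough idea of what “basic controls” might look like, here is a minimal Python sketch of two checks that could run after every load or dev change: required fields present and no duplicate keys. The function name `check_rows`, the field names and the sample rows are hypothetical, and real controls would depend on the data in question.

```python
def check_rows(rows, key, required):
    """Return a list of human-readable data-quality problems."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Control 1: every required field must be present and non-empty.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")
        # Control 2: the key must be unique across the data set.
        k = row.get(key)
        if k in seen:
            problems.append(f"row {i}: duplicate {key}={k}")
        seen.add(k)
    return problems


# Hypothetical extract with one missing amount and one repeated id.
rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},
    {"id": 1, "amount": 5},
]
problems = check_rows(rows, key="id", required=["id", "amount"])
print(problems)
```

Run as part of the test suite, a non-empty result fails the build, so a change that breaks the data is caught before it ships rather than after.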
wonderchump Free Member
+1 for Shred. I run the Data & Analytics practice of a Big Four consultancy and the mainstay of my work is to get clients to think about the outcomes they want from their data. Better insight is the important element here rather than the routes taken to reveal an outcome. MapReduce et al is just another tool in the bag. Without a purpose the tool is null and void.
TurnerGuy Free Member
Better insight is the important element here rather than the routes taken to reveal an outcome. MapReduce et al is just another tool in the bag. Without a purpose the tool is null and void.
But the toolset is important: without it you couldn’t feasibly process the data within the timescales required, and the insight acquired would be too far out of date to be of much use.
jam-bo Full Member
People that understand data are few and far between.
just out of interest, what sort of money are we talking about for people that do get ‘data’…
molgrips Free Member
MapReduce et al is just another tool in the bag. Without a purpose the tool is null and void.
Quite right, and that’s interesting that the same term means different things to techies and analysts. To me it’s just the tools, but of course not to you 🙂
Toolset is important of course – without the tools we wouldn’t be having the discussion as it would be too difficult/expensive to even attempt.
mogrim Full Member
just out of interest, what sort of money are we talking about for people that do get ‘data’…
… and how do you get into it? I’m starting to get bored of Java EE (after 15 years of the stuff), a change would be nice…
flange Free Member
Contracting money for a decent ‘known’ name in big data is anywhere from £850 pd upwards. I had a pure techie in recently who was £1240 a day but he was pretty good. Useless on the analysis side of things though.
Like anything though, just doing the courses won’t ‘get you into it’. I certainly wouldn’t employ someone fresh off a course or on the basis that they’d built some Heath Robinson thing at home. Perhaps the best way is to be employed on a project that uses your current skillsets but also involves the technology you want to move into. Hence the issue with lack of people with skills: you can’t get experience without experience.
I’m assuming that you’re not heavily involved with BI stuff at the moment? I’d view a foundation in BI as the way to move into something like Big Data. If you have a Java background, some of the end user BI tools use Java and a lot of smaller firms like custom development. Certainly Cognos Report Studio is quite lucrative if you can do clever stuff that the standard tool can’t do (god knows why though, it makes upgrading a nightmare). The arse has fallen out of the Cognos market though, with contract rates being pretty low (£300’ish a day for a report writer, if you can find a role). Maybe look at Tableau and QlikView as other possible toolsets that everyone currently wants. Once into reporting, move into the ETL side of things and Bob is your mother’s brother.