So supposing I wanted to gather all the results data sets that are freely provided by the UCI here:
[url= http://www.uci.ch/templates/BUILTIN-NOFRAMES/Template3/layout.asp?MenuId=MjExMg ]UCI[/url]
and download whole sections of the website data at once (i.e. the whole 2013 Pro Road Season).
Is there any simple way I could do it?
Do I need to start writing code?
What code? Could I do it in C#?
Cheers
I'd start by
1) ensuring that you are allowed to redistribute the results - just because you can access them doesn't mean you are allowed to republish them elsewhere
2) learning about coding and deciding what language to learn, then working out how to do a job in that language, rather than working out what language to learn to do a job
3) using the RSS feed rather than the website
thx, NBT
Not planning on redistributing, just playing with some data vis software.
I am an IT pro who mostly works in SQL with a smattering of C#, but web is not my forte. I was just wondering if my C# could be used to do this job (I'm not gonna teach myself a new language just to gather some cool demo data).
An RSS feed would be amazing, as I can take those in no problem, but as far as I can find they only have RSS for news, not results.
There is a scraper plugin for Firefox that works pretty well.
If all you want to do is play with the data, the easiest way is probably just to copy the text you want into a (decent) text editor, then use regexp to change the lines into SQL insert statements. For tabular data like this, Excel is often an easy tool to use: just insert extra columns between the original table columns and add the SQL you need (i.e. the apostrophes, commas, etc.).
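As a rough illustration of the regexp idea above, here is a minimal Python sketch. It assumes the copied text is tab-separated, one row per line; the table name `results` and the sample rows are made up for the example and are not real UCI data.

```python
import re

def to_insert(line, table="results"):
    """Turn one tab-separated line into a SQL INSERT statement."""
    # Drop the line ending, then double any apostrophes so a name
    # such as O'Neil stays a valid SQL string literal.
    escaped = line.strip("\r\n").replace("'", "''")
    # Replace each tab between columns with: closing quote, comma, opening quote.
    values = re.sub(r"\s*\t\s*", "', '", escaped)
    return f"INSERT INTO {table} VALUES ('{values}');"

# Hypothetical sample rows (rank, rider, team), not real results.
sample = "1\tRider A\tTeam X\n2\tPat O'Neil\tTeam Y"

statements = [to_insert(line) for line in sample.splitlines() if line.strip()]
for statement in statements:
    print(statement)
```

This is fine for throwaway demo data. For anything that matters, a parameterised insert from your database library is safer than building SQL strings by hand.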
If you want to learn screen scraping I'd use footie results - they're weekly, and always have the same format. There are also more data to play with, and it's a lot easier to work out what the next match is.
There is a scraper plugin for Firefox that works pretty well.
What's it called?
You could do it in C#, but I suspect that Python would be a lot easier. I wrote a script that scraped all the YouTube URLs for tracks posted in a topic on here, and it was quite easy in Python. I don't have the script anymore though, or I'd send it to you!
What's it called?
Outwit Hub
Fairly powerful.
Without an API to plug into, scraping would become a laborious task. It would be easy enough (with PHP) to grab the contents of the page and then use XPath to go through and scrape the data you need.
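The "grab the page, then walk it" step might look something like the sketch below. It is in Python rather than PHP, and it swaps XPath for the standard library's `html.parser`, since real pages are often not well-formed enough for an XML parser. It assumes the results sit in ordinary `<tr>`/`<td>` table markup, which has not been checked against the UCI pages, and the sample HTML is invented.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None

def extract_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows

# To fetch a live page you could use something like:
#   html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
# Here an invented snippet stands in for the downloaded page.
html = ("<table><tr><th>Rank</th><th>Rider</th></tr>"
        "<tr><td>1</td><td>Rider A</td></tr></table>")
rows = extract_rows(html)
print(rows)
```

Each row comes back as a plain list of strings, which is easy to feed into a CSV writer or a SQL insert.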
Although you'd be better off targeting this page and modifying the query vars to suit your requirements:
[url= http://www.uci.infostradasports.com/asp/lib/TheASP.asp?PageID=19004&TaalCode=2&StyleID=0&SportID=102&CompetitionID=-1&EditionID=-1&EventID=-1&GenderID=1&ClassID=1&EventPhaseID=0&Phase1ID=0&Phase2ID=0&CompetitionCodeInv=1&PhaseStatusCode=262280&DerivedEventPhaseID=-1&SeasonID=484&StartDateSort=20121004&EndDateSort=20131020&Detail=1&DerivedCompetitionID=-1&S00=-3&S01=2&S02=1&PageNr0=-1&Cache=8 ]http://www.uci.infostradasports.com/asp/lib/TheASP.asp?PageID=19004&TaalCode=2&StyleID=0&SportID=102&CompetitionID=-1&EditionID=-1&EventID=-1&GenderID=1&ClassID=1&EventPhaseID=0&Phase1ID=0&Phase2ID=0&CompetitionCodeInv=1&PhaseStatusCode=262280&DerivedEventPhaseID=-1&SeasonID=484&StartDateSort=20121004&EndDateSort=20131020&Detail=1&DerivedCompetitionID=-1&S00=-3&S01=2&S02=1&PageNr0=-1&Cache=8[/url]
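Changing the query vars can be done without hand-editing that long URL. A minimal Python sketch follows, using a shortened version of the link above. What each parameter actually controls is a guess from its name, and the replacement value `483` is purely hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_params(url, **changes):
    """Return url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # Keep blank values so no parameter silently disappears.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({name: str(value) for name, value in changes.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))

# Shortened form of the results URL, keeping only a few of its parameters.
base = ("http://www.uci.infostradasports.com/asp/lib/TheASP.asp"
        "?PageID=19004&SportID=102&SeasonID=484&GenderID=1")

# 483 is a made-up value; which SeasonID maps to which season is not known here.
other = with_params(base, SeasonID=483)
print(other)
```

Looping over a handful of candidate values and fetching each resulting URL would pull down a whole set of pages in one go, assuming the site permits it.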
oooh good a programming thread
If you are already in the Microsoft world then use C#.
Ignore the 'xxxx is much better for this' and go with what you know. xxxxx might be nice, but C# is just as productive, if not more so. Nowadays there are loads of free-to-use libraries for this sort of stuff. Install the NuGet package manager in your Visual Studio and use it to search for a NuGet package that can do HTML parsing.
sharkbait - Member
Outwit Hub
Fairly powerful.
Cheers, will take a look!
Edit: not compatible with my version of Firefox 🙁
I have a bit of code written in C# which downloads mapping data, so it's certainly possible to do it that way, and once you've downloaded the pages there are other classes available which will help parse the data.
Also check out iMacros. It's a plugin for Chrome, Firefox and I think IE too. The syntax is very easy, but it helps to know a little bit about the structure of webpages (i.e. what a div is).
