- Website data scraping
I’d start by
1) ensuring that you are allowed to redistribute the results – just because you can access them doesn’t mean you are allowed to republish them elsewhere
2) learning about coding and deciding what language to learn, then working out how to do a job in that language, rather than working out what language to learn to do a job
3) using the RSS feed rather than the website
Posted 4 years ago
titusrider (Member)
I’m not planning on redistributing, just playing with some data vis software.
I am an IT pro who mostly works in SQL with a smattering of C#, but web is not my forte. I was just wondering if my C# could be used to do this job (I’m not going to teach myself a new language just to gather some cool demo data).
An RSS feed would be amazing, as I can take those in with no problems, but as far as I can find they only have RSS for news, not results.
Posted 4 years ago
titusrider (Member)
So supposing I wanted to gather all the results data sets that are freely provided by the UCI here:
and download whole sections of the website data at once (e.g. the whole 2013 Pro Road Season)
Is there any simple way I could do it?
Do I need to start writing code?
What code? Could I do it in C#?
Cheers
Posted 4 years ago
If all you want to do is play with the data, the easiest way is probably just to copy the text you want into a (decent) text editor, then use regexp find-and-replace to turn the lines into SQL insert statements. For tabular data like this, Excel is often an easy tool to use: just insert extra columns between the original table columns and add the SQL you need (i.e. the apostrophes, commas, etc.).
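As a rough sketch of that regexp-to-SQL idea, here it is scripted rather than done by hand in an editor. It is in Python purely for illustration; the same two steps (escape apostrophes, then wrap each field in quotes) carry over to C# or an editor's find-and-replace. The sample rows, the four-column layout and the `results` table name are all made up, not taken from the UCI site.

```python
import re

# Hypothetical sample: rank, rider, team, time, tab-separated as if
# pasted from a results table. Not real UCI data.
SAMPLE = "1\tJohn O'Example\tTeam A\t4:12:05\n2\tJane Sample\tTeam B\t4:12:09\n"

# One line made of exactly four tab-separated fields.
LINE = re.compile(r"^([^\t\n]*)\t([^\t\n]*)\t([^\t\n]*)\t([^\t\n]*)$", re.MULTILINE)


def sql_quote(value):
    """Wrap a value in single quotes, doubling any embedded apostrophes."""
    return "'" + value.strip().replace("'", "''") + "'"


def to_inserts(text, table="results"):
    """Turn each matching line into an INSERT statement; other lines are skipped."""
    statements = []
    for match in LINE.finditer(text):
        values = ", ".join(sql_quote(field) for field in match.groups())
        statements.append(f"INSERT INTO {table} VALUES ({values});")
    return statements


if __name__ == "__main__":
    for statement in to_inserts(SAMPLE):
        print(statement)
```

Everything goes in as a quoted string here, so numeric columns would need casting on the SQL side (or a smarter quoting step) if that matters for the demo data.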
If you want to learn screen scraping, I’d use footie results – they’re weekly and always have the same format. There is also more data to play with, and it’s a lot easier to work out what the next match is.
Posted 4 years ago
prezet (Member)
Without an API to plug into, scraping would become a laborious task. It would be easy enough (with PHP) to grab the contents of the page and then use XPath to go through and scrape the data you need.
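To give a feel for the XPath step, here is a minimal sketch. It is in Python rather than PHP, purely for illustration, and uses only the standard library. The markup and the `results` table id are invented stand-ins for a downloaded page. The standard-library parser used here only accepts well-formed XML, so a real page would most likely need a more tolerant HTML parser first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a downloaded results page.
PAGE = """<html><body>
<table id="results">
  <tr><th>Rank</th><th>Rider</th><th>Team</th></tr>
  <tr><td>1</td><td>Rider One</td><td>Team A</td></tr>
  <tr><td>2</td><td>Rider Two</td><td>Team B</td></tr>
</table>
</body></html>"""


def scrape_rows(markup):
    """Return the text of every data row in the table with id 'results'."""
    root = ET.fromstring(markup)
    rows = []
    # XPath: each <tr> directly under the table whose id is "results".
    for tr in root.findall(".//table[@id='results']/tr"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if cells:  # the header row holds <th> cells, so it yields nothing
            rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in scrape_rows(PAGE):
        print(row)
```

Fetching the page is left out on purpose; in practice the markup would come from an HTTP request rather than a string constant.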
Although you’d be better off targeting this page and modifying the query vars to suit your requirements:
http://www.uci.infostradasports.com/asp/lib/TheASP.asp?PageID=19004&TaalCode=2&StyleID=0&SportID=102&CompetitionID=-1&EditionID=-1&EventID=-1&GenderID=1&ClassID=1&EventPhaseID=0&Phase1ID=0&Phase2ID=0&CompetitionCodeInv=1&PhaseStatusCode=262280&DerivedEventPhaseID=-1&SeasonID=484&StartDateSort=20121004&EndDateSort=20131020&Detail=1&DerivedCompetitionID=-1&S00=-3&S01=2&S02=1&PageNr0=-1&Cache=8
Posted 4 years ago
llama (Member)
oooh good a programming thread
If you are already in the Microsoft world then use C#
Ignore the ‘xxxx is much better for this’ and go with what you know. xxxx might be nice, but C# is just as productive, if not more so. Nowadays there are loads of free-to-use libraries for this sort of stuff. Install the NuGet package manager in your Visual Studio, and use that to search for a NuGet package that can do HTML parsing.
Posted 4 years ago