Viewing 14 posts - 1 through 14 (of 14 total)
  • Website data scraping
  • titusrider
    Free Member

    So supposing I wanted to gather all the results data sets that are freely provided by the UCI here:
    UCI

    and download whole sections of the website data at once (ie the whole 2013 Pro Road Season)

    Is there any simple way I could do it?
    Do I need to start writing code?
    What code? Could I do it in C#?

    Cheers

    nbt
    Full Member

    I’d start by

    1) ensuring that you are allowed to redistribute the results – just because you can access them doesn’t mean you are allowed to republish them elsewhere

    2) learning about coding and deciding what language to learn, then working out how to do a job in that language, rather than working out what language to learn to do a job

    3) using the RSS feed rather than the website

    damo2576
    Free Member

    titusrider
    Free Member

    thx, NBT

    not planning on re-distributing just playing with some data vis software

    I am an IT pro who mostly works in SQL with a smattering of C#, but web is not my forte. I was just wondering if my C# could be used to do this job (I’m not gonna teach myself a new language just to gather some cool demo data)

    An RSS feed would be amazing as I can take those in no problem, but as far as I can find they only have RSS for news, not results

    sharkbait
    Free Member

    There is a scraper plugin for Firefox that works pretty well.

    mogrim
    Full Member

    If all you want to do is play with the data the easiest way is probably just to copy the text you want into a (decent) text editor, then using regexp change the lines to SQL insert statements. For tabular data like this Excel is often an easy tool to use, just insert extra columns between the original table columns, and add the SQL you need (ie the apostrophes, commas, etc.).
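
    The regexp step described above can also be scripted. Here is a minimal sketch in Python, assuming the copied text has one result per line with tab-separated columns; the table name `results` and the sample row are made up for illustration.

```python
import re

def line_to_insert(line, table="results"):
    """Turn one tab-separated line into a SQL INSERT statement.

    `table` is a hypothetical table name, not anything from the UCI site.
    """
    # Double any apostrophes so names like O'Example stay valid SQL strings
    escaped = line.strip().replace("'", "''")
    # Replace each tab (plus surrounding spaces) with the quote-comma-quote glue
    values = re.sub(r"\s*\t\s*", "', '", escaped)
    return f"INSERT INTO {table} VALUES ('{values}');"

if __name__ == "__main__":
    sample = "1\tPat O'Example\tAUS"  # invented row, not real results data
    print(line_to_insert(sample))
```

    This treats every column as text, which is usually fine for demo data; numeric columns can be cast once the rows are in the database.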

    If you want to learn screen scraping I’d use footie results – they’re weekly, and always have the same format. There are also more data to play with, and it’s a lot easier to work out what the next match is.

    mogrim
    Full Member

    There is a scraper plugin for Firefox that works pretty well.

    What’s it called?

    chvck
    Free Member

    You could do it in C# but I suspect that Python would be a lot easier. I wrote a script that scraped all the YouTube URLs for tracks posted in a topic on here, and it was quite easy in Python. I don’t have the script anymore though or I’d send it to you!
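
    The original script is lost, so this is only a guess at the general shape of such a scraper, not a reconstruction of it. It assumes the page HTML has already been downloaded into a string and that video IDs are 11 characters long; the function name is made up.

```python
import re

# Matches youtube.com/watch?v=... and youtu.be/... links.
# Assumes 11-character video IDs made of letters, digits, "_" and "-".
YOUTUBE_URL = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]{11}"
)

def find_youtube_urls(html):
    """Return the YouTube URLs found in `html`, in order, without repeats."""
    seen = []
    for url in YOUTUBE_URL.findall(html):
        if url not in seen:
            seen.append(url)
    return seen

if __name__ == "__main__":
    page = '<a href="https://www.youtube.com/watch?v=abcdefghijk">track</a>'
    print(find_youtube_urls(page))
```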

    sharkbait
    Free Member

    What’s it called?

    Outwit Hub

    Fairly powerful.

    prezet
    Free Member

    Without an API to plug into, scraping would become a laborious task. It would be easy enough (with PHP) to grab the contents of the page and then use XPath to go through and scrape the data you need.

    Although you’d be better off targeting this page and modifying the query vars to suit your requirements:

    http://www.uci.infostradasports.com/asp/lib/TheASP.asp?PageID=19004&TaalCode=2&StyleID=0&SportID=102&CompetitionID=-1&EditionID=-1&EventID=-1&GenderID=1&ClassID=1&EventPhaseID=0&Phase1ID=0&Phase2ID=0&CompetitionCodeInv=1&PhaseStatusCode=262280&DerivedEventPhaseID=-1&SeasonID=484&StartDateSort=20121004&EndDateSort=20131020&Detail=1&DerivedCompetitionID=-1&S00=-3&S01=2&S02=1&PageNr0=-1&Cache=8
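
    Modifying the query vars can be scripted rather than done by hand. This is a rough sketch in Python using only the standard library; the helper name `with_query` is made up, and which parameter values the site actually accepts (for example, other `SeasonID` numbers) is something the thread does not say, so the value below is purely hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_query(url, **overrides):
    """Return `url` with the given query parameters replaced or added."""
    parts = urlsplit(url)
    # Keep the existing parameters in their original order
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({name: str(value) for name, value in overrides.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))

if __name__ == "__main__":
    # Shortened version of the URL above, for readability
    base = ("http://www.uci.infostradasports.com/asp/lib/TheASP.asp"
            "?PageID=19004&SeasonID=484&PageNr0=-1")
    # 485 is a hypothetical value, not a known season ID
    print(with_query(base, SeasonID=485))
```

    Looping over a handful of parameter values and fetching each resulting URL would then give one results page per request.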

    llama
    Full Member

    oooh good a programming thread

    If you are already in the Microsoft world then use C#

    Ignore the ‘xxxx is much better for this’ and go with what you know. xxxxx might be nice, but C# is just as productive, if not more so. Nowadays there are loads of free-to-use libraries for this sort of stuff. Install the NuGet package manager in your Visual Studio, and use that to search for a NuGet package that can do HTML parsing.

    mogrim
    Full Member

    sharkbait – Member
    Outwit Hub
    Fairly powerful.

    Cheers, will take a look!

    Edit: not compatible with my version of Firefox 🙁

    aracer
    Free Member

    I have a bit of code written in C# which downloads mapping data, so it’s certainly possible to do it that way, and once you’ve downloaded the data there are other classes available which will help parse it.
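
    The same download-then-parse split works in any language. As an illustration only (in Python rather than C#, and not the code mentioned above), here is a sketch of the parsing half, assuming the results sit in a plain HTML table; the class and function names are made up.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def parse_table(html):
    """Return the table rows in `html` as lists of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows

if __name__ == "__main__":
    # Invented markup, not copied from the UCI site
    print(parse_table("<table><tr><td>1</td><td>A. Rider</td></tr></table>"))
```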

    Steve77
    Free Member

    Also check out iMacros. It’s a plugin for Chrome, Firefox and I think IE too. The syntax is very easy but it helps to know a little bit about the structure of webpages (i.e. what a div is)


The topic ‘Website data scraping’ is closed to new replies.