Perri Nelson's Blog$hares Page Scrapers
List created Saturday, November 10, 2007 1:51 PM
This is a list of the BlogShares page scrapers that I have written for public consumption. An automated process runs the page scrapers in sequence, and every page scraper runs once before a cycle completes. There is a built-in delay of one hour between page scraper cycles.
The length of time it takes for a page scraper to run depends upon the number of pages that must be retrieved to process the data. The page loader has a built-in delay of two seconds between the completion of page processing and the next page load.
Because of the page delays, the picture may have changed between the time a page scraper begins processing and the time it finishes. Because of the delay between cycles, the data you are viewing will probably be out of date. Nevertheless, it should give you a general picture of the activity taking place on BlogShares.
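As a rough illustration, the two delays described above could be sketched in Python as follows. This is not the actual scraper code. The names `load_pages` and `run_cycle`, and the `fetch`, `process`, and `sleep` parameters, are hypothetical and chosen only for this example. Only the two-second and one-hour figures come from the description.

```python
import time
from typing import Callable, Iterable, List

PAGE_DELAY_SECONDS = 2          # pause between finishing one page and loading the next
CYCLE_DELAY_SECONDS = 60 * 60   # pause between page scraper cycles


def load_pages(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    process: Callable[[str], object],
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Fetch and process each URL in turn, pausing before every load after the first."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # The previous page has been fully processed; wait before the next load.
            sleep(PAGE_DELAY_SECONDS)
        results.append(process(fetch(url)))
    return results


def run_cycle(
    scrapers: Iterable[Callable[[], None]],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run every scraper once, in sequence, then wait before the next cycle may start."""
    for scraper in scrapers:
        scraper()
    sleep(CYCLE_DELAY_SECONDS)
```

Passing `sleep` as a parameter is only a convenience for trying the sketch without actually waiting. With N pages to retrieve, the loader pauses N - 1 times, so the delay alone adds roughly two seconds per page on top of the time spent downloading and processing.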
Blog$hares Top 30 Blogs - Who holds them?
This page scraper begins by retrieving the list of the top 100 blogs in the BlogShares index. It then retrieves each blog's detail page and presents some basic statistics about the blog and who holds how many shares.
This page scraper typically takes about four to five minutes to run.
Blog$hares Top Traders Portfolio Analysis
This page scraper begins by retrieving the "most active players" page. It then retrieves the folder lists for each player in the top 20 share traders section.
Each folder belonging to the player is then retrieved, sorted in descending order by total investment. Up to 100 blogs are retrieved for each folder.
The total valuation of each blog retrieved in this way is then calculated, based upon the information contained in the folder. The blog's details are not retrieved.
After the (approximate) total valuation for each blog has been determined, the blogs are then sorted in descending order by valuation and the top 30 are listed. This doesn't give a guaranteed list of the top 30 blogs in valuation in the user's portfolio, but it's close enough for this analysis.
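The ranking step described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual scraper code. The function name `top_blogs_by_valuation`, the representation of a folder as a list of `(blog, valuation)` pairs, and the choice to keep the largest estimate when a blog appears in more than one folder are all hypothetical. How the valuation itself is derived from the folder information is not shown here.

```python
from typing import Dict, List, Sequence, Tuple

FolderRow = Tuple[str, float]  # (blog name, estimated total valuation): an assumed shape


def top_blogs_by_valuation(
    folders: Sequence[Sequence[FolderRow]],
    limit: int = 30,
    per_folder_cap: int = 100,
) -> List[FolderRow]:
    """Rank blogs seen in a player's folders by estimated valuation, highest first.

    Each folder is assumed to be already sorted in descending order by total
    investment, so only its first `per_folder_cap` rows are considered.
    """
    best: Dict[str, float] = {}
    for rows in folders:
        for blog, valuation in rows[:per_folder_cap]:
            # Assumption: if a blog shows up more than once, keep the largest estimate.
            if blog not in best or valuation > best[blog]:
                best[blog] = valuation
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


folders = [
    [("alpha", 10.0), ("beta", 50.0)],
    [("alpha", 12.0), ("gamma", 30.0)],
]
print(top_blogs_by_valuation(folders, limit=2))
```

The per-folder cap is what makes the result approximate: a blog with a high valuation but a small investment can fall outside the first 100 rows of its folder and never reach the ranking.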
The run time for this page scraper depends upon the number of folders each player has. The first time this page scraper ran, it took an hour and three minutes to complete.