
Top 2,500 Websites - not containing seed keywords

For explanations about the methodology, including source code and possible improvements, read our main article on this subject. It also provides links to our other three listings.

The year in parentheses is when the website was first mentioned by a member. It is not the year the website was created, though it is a good proxy for the website's age. The member database goes back to 2007. The list of keywords attached to each website shows which seed keywords were found on the front page when the website was crawled; by construction, the websites in this listing have none. The number of stars (1, 2 or 3) represents how popular the website is: it is an indicator of how many members mentioned it. Of course, brand new websites might not have 3 stars yet.
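As a rough illustration of how each entry could be derived, the sketch below computes the first-mention year and a star rating from a list of member mentions. The input format, the function names and the star cutoffs in `STAR_CUTOFFS` are all assumptions made for this example; the actual cutoffs are not stated here. Only the minimum of 4 mentions comes from the notes below.

```python
from collections import defaultdict

# Hypothetical cutoffs as (minimum mentions, stars); the real values are not given.
STAR_CUTOFFS = [(20, 3), (8, 2), (4, 1)]


def summarize_mentions(mentions, min_mentions=4):
    """Summarize (website, year) pairs, one pair per member mention.

    Returns {website: {"first_mentioned", "mentions", "stars"}} for
    websites mentioned at least `min_mentions` times.
    """
    years = defaultdict(list)
    for site, year in mentions:
        years[site.lower()].append(year)

    summary = {}
    for site, site_years in years.items():
        count = len(site_years)
        if count < min_mentions:
            continue
        # First cutoff that the mention count reaches; default to 1 star.
        stars = next((s for cutoff, s in STAR_CUTOFFS if count >= cutoff), 1)
        summary[site] = {
            "first_mentioned": min(site_years),  # earliest mention, not creation date
            "mentions": count,
            "stars": stars,
        }
    return summary


def format_entry(site, info):
    """Render one line in the style of the listing, e.g. 'google.com (2008) ***'."""
    return f"{site} ({info['first_mentioned']}) {'*' * info['stars']}"
```

Because the year is the minimum over all mentions, a website created long before 2007 can still show a much later year if members only started mentioning it recently.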

Notes

  • This category of websites (those containing no seed keywords, and mentioned at least 4 times) is interesting nevertheless: it shows which non-analytic (general, mainstream) websites our members also visit.
  • Some of the websites where no seed keywords were found are actually analytic websites. The missing keywords might be caused either by a glitch in our script or by the way the web page is built (iframes, heavy JavaScript, Flash and other page creation techniques that give a headache to our web crawler, and indeed to all web crawlers, including Google's). These represent only a small percentage (< 5%) of all websites. For each website returning no seed keywords, crawling a few web pages rather than just the front page could fix the issue. This implies deep crawling, that is, following internal links found on the front page.
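
The deep-crawling idea in the last note could be sketched as follows. This is a minimal illustration using only the Python standard library, not the script used to build the listing: the function names, the `max_pages` limit and the simple substring matching are assumptions, and it does nothing about JavaScript-rendered or Flash content.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, base_url):
    """Return unique links on the page that stay on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        url = urljoin(base_url, href).split("#")[0]  # resolve and drop fragment
        if urlparse(url).netloc == host and url != base_url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def keywords_in(html, seed_keywords):
    """Case-insensitive substring match of each seed keyword in the raw page."""
    text = html.lower()
    return {kw for kw in seed_keywords if kw.lower() in text}


def fetch_page(url):
    """Download a page as text (assumed UTF-8, undecodable bytes replaced)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_for_keywords(front_url, seed_keywords, fetch=fetch_page, max_pages=5):
    """Check the front page first; only if it yields no seed keywords,
    follow up to `max_pages` internal links found on it."""
    front_html = fetch(front_url)
    found = keywords_in(front_html, seed_keywords)
    if found:
        return found
    for url in internal_links(front_html, front_url)[:max_pages]:
        try:
            found |= keywords_in(fetch(url), seed_keywords)
        except OSError:  # network errors from urlopen; skip the page
            continue
    return found
```

Passing `fetch` as a parameter keeps the crawl logic testable without network access. Limiting the fallback to websites whose front page returned nothing keeps the extra crawling small, since those are a small share of all websites.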

Here's the listing

  • google.com (2008) *** 
  • microsoft.com (2011) *** 
  • analyticszone.com (2012) *** 
  • cran.r-project.org (2008) *** 
  • stackoverflow.com (2011) *** 
  • timoelliott.com (2011) *** 
  • facebook.com (2008) *** 
  • r-project.org (2008) *** 
  • tdwi.com (2010) *** 
  • twitter.com (2008) *** 
  • archive.ics.uci.edu (2009) *** 
  • tableau.com (2010) *** 
  • faculty.chass.ncsu.edu (2009) ** 
  • communities.sas.com (2012) ** 
  • bx.businessweek.com (2011) ** 
  • forbes.com (2011) ** 
  • radar.oreilly.com (2011) ** 
  • amazon.com (2012) ** 
  • google.co.in (2008) ** 
  • wikipedia.org (2008) ** 
  • lexjansen.com (2009) ** 
  • minitab.com (2008) ** 
  • recordedfuture.com (2011) ** 
  • datapub.info (2013) ** 
  • youtube.com (2011) ** 
  • cran.us.r-project.org (2013) ** 
  • rbloggers.com (2012) ** 
  • cran.org (2012) ** 
  • buzzdata.com (2012) ** 
  • artplusdata.com (2012) ** 
  • theiiba.org (2011) ** 
  • informs.com (2011) ** 
  • theguardian.com (2013) ** 
  • salford-systems.com (2009) ** 
  • kontagent.com (2011) ** 
  • analyticalbridge.com (2008) ** 
  • rstudio.com (2013) ** 
  • quandl.com (2012) ** 

