Terms: Random Samples from Search Engines (0), query based sampling (2,980), query sampling (2,080), distributed query sampling (86), query based random (5),
Terms: ensemble of models (18,100), parameter space reduction (234),
Terms: prediction suffix tree (203), tuples (3,370,000), n-grams (308,000), terms (12,300,000,000), phrases (234,000,000), word tuples (1,430), bayesian (14,300,000), unbounded vocabularies (51), ensemble of experts (1,080),
Random Sampling from a Search Engine’s Corpus - Video
Terms: degree distribution (262,000), degree distribution sampler (8), rejection sampling (71,400), monte carlo simulation (2,850,000),
Google Has the Largest Number of Dead and Old Pages
Evaluating Sampling Methods for Uncooperative Collections
Toward Optimal Active Learning through Monte Carlo Estimation of Error
Toward Optimal Active Learning through Sampling Estimation of Error
Sampling random documents from uncooperative search engines
Terms: document size distribution (723), document size distributions (101), document probabilities (298),
Term Dependence: Truncating the Bahadur Lazarsfeld Expansion
Terms: term dependence (140,000), term dependencies (31,900), term dependency (121,000), bahadur lazarsfeld (406), bahadur lazarsfeld expansion (199),
Boolean: "bar-yossef" +gurevich (1,100), "active learning" +sampling (309,000), gulli +signorini (1,980),
Terms: active learning (7,460,000), optimal active learning (838), active learner (309,000), ensemble methods (161,000), set of classifiers (17,500), ensembles of classifiers (8,620), sets of classifiers (627), of classifiers (300,000), ensemble learning (109,000),
Terms: federated search (2,910,000), distributed information retrieval (81,900), overlap between collections (273), query bias (830), ranking bias (797),
Terms: search engine sampler (23), search engine samplers (16), objective benchmarks for search engines (46), benchmarks for search engines (51),
Terms: random sampling from the whole web (6), bharat-broder (591), bharat-broder sampler (11), pool based sampler (19), pool based sampling (33), pool based samplers (5),
Terms: conjunctive queries (116,000), pool of queries (210), query pool (53,600),
http://developer.yahoo.com/search/boss/
"search engines": G(60.0M), Y(256M), Aol(59.5M), Live(268M),
magnetometers: G(334k), Y(975k), Aol(330k), Live(143k), Wiki(318), Lycos(32k),
Terms: topic bias (10,100), domain bias (948), relative size of search engines (23), search engine size (9,360), measure search engine size (0), a good crawler (1,600), a good robot (55,000), absolute size estimation (11), relative size estimation (19),
Boolean: "search engines" +"absolute size" (2,380), "search engines" +"relative size" (64,200),
Terms: sampling algorithms (108,000), sampling algorithm (242,000),
Boolean: "lexicon-based algorithm" (37), lexicon +"random walk" (13,800),
Terms: web search api (22,600), web search apis (590), simulate near-uniform samples (49), rejection sampling (71,400), importance sampling (426,000), metropolis-hastings (222,000), maximum degree method (48),
API Directory - ProgrammableWeb
Microsoft Windows Live Search API
Web Search Documentation for Yahoo! Search Web Services
Terms: livesearch api (22), opensearch (3,270,000), search aggregator (1,520,000), search providers (1,280,000), add search providers (70,800), top search engines (3,550,000),
Terms: google search (102,000,000), yahoo search (121,000,000), lycos search (841,000), msn search (23,500,000), live search (57,500,000), aol search (57,000,000),
Terms: google.search.websearch (565), google.search.search (141,000), your api key (149,000), api key (3,030,000), searchrequest class (228), searchresponse class (108), webrequest class (22,700), webresponse class (1,850),
Terms: overlap statistics (5,420), database overlap (849),
Terms: benchmarks for search engines (51), search engine benchmarks (100), search engine profiles (974), search engine metrics (17,000), search engine statistics (201,000),
Terms: comparing search engines (12,600), search engine comparisons (7,720), search engine overlap (2,230), sampling search engines (11), profiling search engines (44), measuring search engines (15), evaluating search engines (10,800),
Boolean: "search engines" +"random samples" (20,300), "search engines" +"random sampling" (78,100), "search engine" +"random sample" (193,000), "search engines" +"random sample" (134,000), "search engines" +"statistical sampling" (13,000),
Terms: personal metasearch (120), metasearch (7,090,000), metasearch research (180),
Terms: google web api (82,100), google web apis (58,800),
Boolean: "query based" +"random sample" (1,580), "query based" +"random samples" (421), "query based" +"random sampling" (1,040), "query based" +"statistical sampling" (82),
Query-based sampling of text databases
Sampling random documents from uncooperative search engines
Random Sampling from a Search Engine’s Index
A technique for measuring the relative size and overlap of public Web search engines
Sampling the Web: The Development of a Custom Search Tool for Research
Terms: query based (914,000), random query (84,600), web research studies (190),
Terms: sampling the web (790), sampling the internet (1,340), random samples of the web (5), random samples of the internet (2), samples of web pages (908), random samples of web pages (12), random samples of web (28), random sampling from search results (1),