Random Samples from Search Engines

Terms: Random Samples from Search Engines (0), query based sampling (2,980), query sampling (2,080), distributed query sampling (86), query based random (5),

Beyond Word N-Grams - PDF

Terms: ensemble of models (18,100), parameter space reduction (234),

An Ensemble of Models of the Acute Inflammatory Response to Bacterial Lipopolysaccharide in Rats: Results from Parameter Space Reduction

Terms: prediction suffix tree (203), tuples (3,370,000), n-grams (308,000), terms (12,300,000,000), phrases (234,000,000), word tuples (1,430), bayesian (14,300,000), unbounded vocabularies (51), ensemble of experts (1,080),

Random Sampling from a Search Engine’s Corpus - Video

Terms: degree distribution (262,000), degree distribution sampler (8), rejection sampling (71,400), monte carlo simulation (2,850,000),

Google Has the Largest Number of Dead and Old Pages

Evaluating Sampling Methods for Uncooperative Collections

Toward Optimal Active Learning through Monte Carlo Estimation of Error Reduction

Toward Optimal Active Learning through Sampling Estimation of Error Reduction

Sampling random documents from uncooperative search engines

Terms: document size distribution (723), document size distributions (101), document probabilities (298),

Term Dependence: Truncating the Bahadur Lazarsfeld Expansion

Terms: term dependence (140,000), term dependencies (31,900), term dependency (121,000), bahadur lazarsfeld (406), bahadur lazarsfeld expansion (199),

Boolean: "bar-yossef" +gurevich (1,100), "active learning" +sampling (309,000), gulli +signorini (1,980),

Terms: active learning (7,460,000), optimal active learning (838), active learner (309,000), ensemble methods (161,000), set of classifiers (17,500), ensembles of classifiers (8,620), sets of classifiers (627), of classifiers (300,000), ensemble learning (109,000),

Terms: federated search (2,910,000), distributed information retrieval (81,900), overlap between collections (273), query bias (830), ranking bias (797),

Terms: search engine sampler (23), search engine samplers (16), objective benchmarks for search engines (46), benchmarks for search engines (51),

Terms: random sampling from the whole web (6), bharat-broder (591), bharat-broder sampler (11), pool based sampler (19), pool based sampling (33), pool based samplers (5),

Terms: conjunctive queries (116,000), pool of queries (210), query pool (53,600),

http://developer.yahoo.com/search/boss/

"search engines": G(60.0M), Y(256M), Aol(59.5M), Live(268M),

magnetometers: G(334k), Y(975k), Aol(330k), Live(143k), Wiki(318), Lycos(32k),

Terms: topic bias (10,100), domain bias (948), relative size of search engines (23), search engine size (9,360), measure search engine size (0), a good crawler (1,600), a good robot (55,000), absolute size estimation (11), relative size estimation (19),

Boolean: "search engines" +"absolute size" (2,380), "search engines" +"relative size" (64,200),
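For the relative size terms above, a minimal sketch of the usual overlap-based estimate may help. It assumes each engine can be sampled roughly uniformly and that a sampled page can be checked for presence in the other engine. The function name and argument names are hypothetical, chosen only for illustration.

```python
def relative_size(a_in_b, a_total, b_in_a, b_total):
    """Estimate |A| / |B| from two overlap checks (illustrative helper).

    a_in_b / a_total estimates |A and B| / |A| (samples drawn from A, found in B).
    b_in_a / b_total estimates |A and B| / |B| (samples drawn from B, found in A).
    Dividing the second fraction by the first cancels the overlap term.
    """
    frac_a_in_b = a_in_b / a_total
    frac_b_in_a = b_in_a / b_total
    if frac_a_in_b == 0:
        raise ValueError("no overlap observed from A's side; ratio is undefined")
    return frac_b_in_a / frac_a_in_b


# Hypothetical counts: 50 of 100 pages sampled from A are in B,
# and 25 of 100 pages sampled from B are in A.
ratio = relative_size(50, 100, 25, 100)
```

With these made-up counts the estimate says A is about half the size of B. The estimate is only as good as the samples: query bias and ranking bias in the samplers carry straight through to the ratio.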

Terms: sampling algorithms (108,000), sampling algorithm (242,000),

Boolean: "lexicon-based algorithm" (37), lexicon +"random walk" (13,800),

Terms: web search api (22,600), web search apis (590), simulate near-uniform samples (49), rejection sampling (71,400), importance sampling (426,000), metropolis-hastings (222,000), maximum degree method (48),
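As a rough illustration of how rejection sampling can simulate near-uniform samples, the sketch below uses a toy in-memory corpus rather than a real search API. It assumes a biased sampler that returns each document with probability proportional to its degree (here, the number of pool queries that match it) and that the degree of a returned document can be computed. All names and the toy degrees are hypothetical.

```python
import random
from collections import Counter


def biased_sample(degrees, rng):
    """Toy stand-in for a query-based sampler: picks a document with
    probability proportional to its degree (assumed bias model)."""
    docs = list(degrees)
    weights = [degrees[d] for d in docs]
    return rng.choices(docs, weights=weights, k=1)[0]


def near_uniform_sample(degrees, rng):
    """Rejection sampling: accept a biased draw with probability 1/degree,
    which cancels the degree bias and leaves a uniform target."""
    while True:
        doc = biased_sample(degrees, rng)
        if rng.random() < 1.0 / degrees[doc]:
            return doc


# Hypothetical corpus: three documents matched by 1, 5, and 10 pool queries.
degrees = {"doc_a": 1, "doc_b": 5, "doc_c": 10}
rng = random.Random(0)
counts = Counter(near_uniform_sample(degrees, rng) for _ in range(6000))
```

After rejection, the three documents should appear in roughly equal numbers even though the raw sampler favors `doc_c` ten to one over `doc_a`. The cost is wasted draws: high-degree documents are rejected most of the time, which matters when each draw is a rate-limited API call.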

API Directory - ProgrammableWeb

Microsoft Windows Live Search API

Web Search Documentation for Yahoo! Search Web Services

Google AJAX Search API

Terms: livesearch api (22), opensearch (3,270,000), search aggregator (1,520,000), search providers (1,280,000), add search providers (70,800), top search engines (3,550,000),

Terms: google search (102,000,000), yahoo search (121,000,000), lycos search (841,000), msn search (23,500,000), live search (57,500,000), aol search (57,000,000),

Terms: google.search.websearch (565), google.search.search (141,000), your api key (149,000), api key (3,030,000), searchrequest class (228), searchresponse class (108), webrequest class (22,700), webresponse class (1,850),

Terms: overlap statistics (5,420), database overlap (849),

Terms: benchmarks for search engines (51), search engine benchmarks (100), search engine profiles (974), search engine metrics (17,000), search engine statistics (201,000),

Terms: comparing search engines (12,600), search engine comparisons (7,720), search engine overlap (2,230), sampling search engines (11), profiling search engines (44), measuring search engines (15), evaluating search engines (10,800),

Boolean: "search engines" +"random samples" (20,300), "search engines" +"random sampling" (78,100), "search engine" +"random sample" (193,000), "search engines" +"random sample" (134,000), "search engines" +"statistical sampling" (13,000),

Terms: personal metasearch (120), metasearch (7,090,000), metasearch research (180),

Terms: google web api (82,100), google web apis (58,800),

Boolean: "query based" +"random sample" (1,580), "query based" +"random samples" (421), "query based" +"random sampling" (1,040), "query based" +"statistical sampling" (82),

Query-based sampling of text databases

Random Sampling from a Search Engine’s Index

A technique for measuring the relative size and overlap of public Web search engines

Sampling the Web: The Development of a Custom Search Tool for Research

Terms: query based (914,000), random query (84,600), web research studies (190),

Terms: sampling the web (790), sampling the internet (1,340), random samples of the web (5), random samples of the internet (2), samples of web pages (908), random samples of web pages (12), random samples of web (28), random sampling from search results (1),