Automated, large-scale data-gathering agents, known as bots (short for software robots), generate a large volume of traffic to Yahoo! and burden it with large quantities of queries. Yahoo! deals with bots by giving them a “fake” version of the information they seek. Because ignoring bot queries, once they are identified as such, only prompts even more bot queries, Yahoo! replies with a version of what the bot asked for that minimally satisfies the query but is good enough to pacify the bot and free bandwidth for other users.
Hardware and processing models for big data are easily accessible and the software is free, so big data is no longer a niche market; there is no barrier to the commercial market. During the workshop discussion, a question was asked about whether parallel processing is difficult across multiple nodes with high-performance computing. Yahoo! does do parallel computing, with algorithms designed to solve the big data problem that are often separate and distinct from those common in traditional high-performance computing. With big data, a great deal of information comes in, and not much comes out. In high-performance computing, a little information comes in, but the outputs are tremendous. A different type of tool is therefore required for each of these two data environments.
Paul Twohey of Ness Computing
Ness Computing (not to be confused with Ness Corporation) is a small start-up headquartered in Los Altos, California. Currently with 15 employees, it embraces data analysis for commercial purposes. The firm’s LikeNess search engine, which draws data from various sources of information, such as social networking sites, applies machine learning techniques to tease out patterns that are then used to establish recommendations for users, generating a small profit per transaction. Ness Computing describes what it does in the following terms: “Ness creates products that connect our users with new experiences.”
Its flagship product, Ness, is an application that runs on mobile phones to provide users with restaurant recommendations. To seed the analysis, users are asked to input reviews of 10 restaurants. Based on these reviews and the powerful back-end analysis of data from other users and social networks, the Ness “app” recommends other restaurants that the user is likely to enjoy.
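The general idea of turning a handful of seed reviews into recommendations can be illustrated with a minimal sketch of user-based collaborative filtering. This is an assumption made purely for illustration: the workshop summary does not describe the algorithm Ness actually uses, and the function names, rating scale, and sample data below are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed over the restaurants both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[r] * b[r] for r in shared)
    norm_a = sqrt(sum(a[r] ** 2 for r in shared))
    norm_b = sqrt(sum(b[r] ** 2 for r in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(seed_ratings, other_users, top_n=3):
    """Rank restaurants the seed user has not rated.

    Each candidate's predicted rating is the similarity-weighted average
    of the ratings given by other users (hypothetical scoring rule).
    """
    weighted_sum = {}
    weight_total = {}
    for ratings in other_users.values():
        sim = cosine_similarity(seed_ratings, ratings)
        if sim <= 0:
            continue  # no overlap with the seed user, so no evidence
        for restaurant, rating in ratings.items():
            if restaurant in seed_ratings:
                continue  # only recommend restaurants new to the user
            weighted_sum[restaurant] = weighted_sum.get(restaurant, 0.0) + sim * rating
            weight_total[restaurant] = weight_total.get(restaurant, 0.0) + sim
    predicted = {r: weighted_sum[r] / weight_total[r] for r in weighted_sum}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Hypothetical data: ratings on a 1-5 scale, keyed by restaurant name.
seed = {"A": 5, "B": 1}
others = {
    "u1": {"A": 5, "B": 1, "C": 5, "D": 1},
    "u2": {"A": 1, "B": 5, "C": 1, "D": 5},
}
print(recommend(seed, others))
```

In this toy data set, user u1 agrees with the seed user on both shared restaurants, so u1's favorite, C, is ranked ahead of D. A production system of the kind described at the workshop would also draw on social network data and sentiment analysis, which this sketch omits.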
According to workshop presenter Paul Twohey, his firm’s approach to developing products is based on the emerging realities of electronic commerce and free social networking, whereby the user exchanges personal information for services. This approach requires extensive back-end computing power, which is different from high-performance computing. He described the computing approach as taking an enormous amount of data, performing complex mathematical analysis (including sentiment analysis), and providing customized output for each user. Ness Computing hires only employees with superb math and computer science skills, and a workshop participant said that this approach is problematic, given the scarcity of individuals with such skills. Several participants at the workshop noted the need for an emphasis by U.S. educational institutions on advanced math skills so that the U.S. workforce can remain competitive in the future.