Cloudera will provide a tutorial aimed at producers and users of large volumes of data. Do you deal with terabytes on a regular basis? Are traditional databases not doing what you need? Are your challenges related primarily to processing and analyzing data, rather than simply finding it? Hadoop and MapReduce might be just what you need. Google developed an integrated storage and processing framework to scale with the web. After Google published its results, the Apache Software Foundation, with major contributions from Yahoo!, Facebook, and others, got the Hadoop project off the ground. Hadoop provides a fully open source implementation modeled on the system Google uses to perform deep analysis on web-scale data.
This half-day tutorial will teach you what you need to know to work more deeply with Hadoop, and help you think about the following questions:
We’ll cover the basics of working with large-scale data systems and introduce participants to the MapReduce programming model. More importantly, we’ll focus on how to “think in MapReduce.” We’ll go through examples of how to convert common tasks into MapReduce, and provide the foundations that enable you to convert your own specific tasks to this model. We’ll also point you to the resources you need to get up and running with Hadoop in your own data center or in the cloud.
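To give a flavor of what "thinking in MapReduce" means, here is a minimal sketch of a word count, a task commonly used to introduce the model. It is not Hadoop code and is not taken from the tutorial materials: it simulates the map, shuffle, and reduce phases in memory on a single machine, and the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical helpers chosen for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver, standing in for a real framework."""
    # Shuffle phase: group every mapped value under its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: collapse each key's values into a single result.
    return dict(reducer(key, values) for key, values in groups.items())


lines = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts["the"], word_counts["fox"])  # prints "3 2"
```

The point of the exercise is that the mapper looks at one record at a time and the reducer looks at one key at a time, so neither needs to see the whole data set. That independence is what allows a framework to spread the work across many machines.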
Christophe Bisciglia joins Cloudera from Google, where he created and managed their Academic Cloud Computing Initiative. In 2007, he began working with the University of Washington to teach students about Google’s core data management and processing technologies: MapReduce and GFS. This quickly brought Hadoop into the curriculum, and has since resulted in an extensive partnership with the National Science Foundation (NSF) which makes Google-hosted Hadoop clusters available for research and education worldwide. Beyond his work with Hadoop, he holds patents related to search quality and personalization, and spent a year working in Shanghai. Christophe earned his degree at the University of Washington, where he remains a visiting scientist.
Aaron Kimball is a software engineer at Cloudera, Inc., the Commercial Hadoop company. Aaron is the principal developer of Sqoop, the SQL-to-Hadoop database import/export tool. Aaron has been working with Hadoop since early 2007, and contributes actively to its development. Through Cloudera, he also provides training to developers and system administrators working with Hadoop. Aaron holds a B.S. in Computer Science from Cornell University, and an M.S. in Computer Science and Engineering from the University of Washington.