Big data in practice using Hadoop (EN/NL/FR)
Description
This 2-day ABIS course builds on the concepts set out in the Big data architecture and infrastructure course. You will get hands-on practice on Linux with Apache Hadoop: HDFS, YARN, Pig, and Hive. You learn how to implement robust data processing with an SQL-style interface that generates MapReduce jobs. You also learn to work with the graphical tools that make it easy to follow up on the jobs and workflows on the distributed Hadoop cluster.
Intended for whoever wants to start practising "big data" (Hadoop). Familiarity with the concepts of data stores and "big data" is necessary in order to attend this course, and minimal knowledge of SQL, UNIX and Java is useful.
Remark: Course description in English; Dutch and French versions are available on the ABIS website. Courses are planned in Dutch, English, and French. Consult the ABIS website for alternate course formats.
Main topics:
- Motivation for Hadoop & base concepts
- The Apache Hadoop project and the components of Hadoop
- HDFS: the Hadoop Distributed File System
- MapReduce: what and how
- The workings of a Hadoop cluster
- Writing a MapReduce program
- Implementing MapReduce drivers, mappers, and reducers in Java
- Writing mappers and reducers in another programming or scripting language (e.g. Perl)
- Unit testing
- Writing partitioners for optimizing the load balancing
- Debugging a MapReduce program
- Data Input / Output
- Reading and writing sequential data from a MapReduce program
- The use of binary data
- Data compression
- Some frequently used MapReduce components
- Sorting, searching, and indexing of data
- Word counts and counting pairs of words
- Working with Hive and Pig
- Pig as a high-level basic interface, which will generate a sequence of MapReduce jobs for us
- Hive as a high-level SQL-style interface, which generates a sequence of MapReduce jobs
- The Parquet file format: structure and typical use; advantages of data compression; interoperability
- Short introduction to HBase and Cassandra as alternative data stores
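As an informal illustration of the MapReduce model and the word-count topics listed above (not taken from the course material), the map, shuffle, and reduce phases can be sketched in plain Python. This is a minimal single-process sketch: the function names (`map_words`, `map_pairs`, `shuffle`, `reduce_sum`, `run_job`) and the sample lines are hypothetical, "pairs of words" is assumed to mean adjacent words on a line, and no Hadoop API is used.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def map_pairs(line):
    """Mapper: emit ((left, right), 1) for every pair of adjacent words.

    Assumption: "pairs of words" means neighbouring words on one line.
    """
    words = line.lower().split()
    for left, right in zip(words, words[1:]):
        yield (left, right), 1


def shuffle(pairs):
    """Group all values by key, imitating the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Keys are sorted so that the reducer sees them in a stable order.
    return sorted(groups.items())


def reduce_sum(key, values):
    """Reducer: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines, mapper):
    """Run map, shuffle, and reduce in a single process."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reduce_sum(key, values) for key, values in shuffle(mapped))


# Hypothetical sample input, standing in for lines read from HDFS.
lines = ["big data in practice", "big data with hadoop"]
word_counts = run_job(lines, map_words)
pair_counts = run_job(lines, map_pairs)
print(word_counts)
print(pair_counts)
```

On a real cluster the framework distributes the map and reduce calls over many nodes and performs the shuffle itself; the sketch only shows how the same reducer serves both the word count and the word-pair count once the mapper changes the key.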
Intended for/Audience: Whoever wants to start practising "big data": developers, data architects, and anyone who needs to work with big data technology.
Background/Prerequisites: Familiarity with the concepts of data stores and more specifically of "big data" is necessary; see our course Big data architecture and infrastructure. Additionally, minimal knowledge of SQL, UNIX and Java is useful. Experience with a programming language (e.g. Java, PHP, Python, Perl, C++ or C#) is a must.
Training Method/Didactics: Classroom instruction, with practical examples and supported by extensive practical exercises.
Duration: 2 days.