CP7019 MANAGING BIG DATA SYLLABUS 3RD SEM ME CSE SYLLABUS REG-2013

ANNA UNIVERSITY, CHENNAI
REGULATIONS - 2013
CP7019 MANAGING BIG DATA SYLLABUS
ME 3RD SEM COMPUTER SCIENCE AND ENGINEERING SYLLABUS
OBJECTIVES:
- Understand big data for business intelligence
- Learn business case studies for big data analytics
- Understand NoSQL big data management
- Perform map-reduce analytics using Hadoop and related tools

UNIT I UNDERSTANDING BIG DATA
What is big data – why big data – convergence of key trends – unstructured data – industry examples of big data – web analytics – big data and marketing – fraud and big data – risk and big data – credit risk management – big data and algorithmic trading – big data and healthcare – big data in medicine – advertising and big data – big data technologies – introduction to Hadoop – open source technologies – cloud and big data – mobile business intelligence – crowdsourcing analytics – inter- and trans-firewall analytics

UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – aggregates – key-value and document data models – relationships – graph databases – schemaless databases – materialized views – distribution models – sharding – master-slave replication – peer-to-peer replication – sharding and replication – consistency – relaxing consistency – version stamps – map-reduce – partitioning and combining – composing map-reduce calculations

UNIT III BASICS OF HADOOP
Data format – analyzing data with Hadoop – scaling out – Hadoop streaming – Hadoop pipes – design of Hadoop distributed file system (HDFS) – HDFS concepts – Java interface – data flow – Hadoop I/O – data integrity – compression – serialization – Avro – file-based data structures

UNIT IV MAPREDUCE APPLICATIONS
MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of a MapReduce job run – classic MapReduce – YARN – failures in classic MapReduce and YARN – job scheduling – shuffle and sort – task execution – MapReduce types – input formats – output formats

UNIT V HADOOP RELATED TOOLS
HBase – data model and implementations – HBase clients – HBase examples – praxis. Cassandra – Cassandra data model – Cassandra examples – Cassandra clients – Hadoop integration. Pig – Grunt – Pig data model – Pig Latin – developing and testing Pig Latin scripts. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.

TOTAL: 45 PERIODS

OUTCOMES:
Upon completion of the course, the students will be able to:
- Describe big data and use cases from selected business domains
- Explain NoSQL big data management
- Install, configure, and run Hadoop and HDFS
- Perform map-reduce analytics using Hadoop
- Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics

REFERENCES:
1. Michael Minelli, Michelle Chambers, and Ambiga Dhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013.
2. P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence", Addison-Wesley Professional, 2012.
3. Tom White, "Hadoop: The Definitive Guide", Third Edition, O'Reilly, 2012.
4. Eric Sammer, "Hadoop Operations", O'Reilly, 2012.
5. E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilly, 2012.
6. Lars George, "HBase: The Definitive Guide", O'Reilly, 2011.
7. Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilly, 2010.
8. Alan Gates, "Programming Pig", O'Reilly, 2011.
