Big Data Techniques, Technologies and Trends

News

IMPORTANT DEADLINES:
14.11 deadline for pre-submission (optional)
21.11 feedback from Dr. Tudoran for the pre-submission
28.11 final submission - hard deadline
5.12 final result notification by Dr. Tudoran
The lecture materials are now online: here.

The description of the project to be submitted is online: here.

The grade is determined by an oral exam. The exam covers questions about a software project, which is to be completed as an individual submission after the course, as well as questions about the lecture material. Further information will follow at the start of the course.

Schedule:

	Monday	Tuesday	Wednesday	Thursday	Friday	Saturday
08:30-10:00	Course	Course	Course	Course	Course	Practical Session
10:00-11:30	Course	Course	Course	Course	Course	Practical Session
12:30-14:00	Course	Course	Course	Practical Session	Practical Session
14:00-15:30	Course	Practical Session	Practical Session	Practical Session	Practical Session
15:30-17:30	Practical Session	Practical Session	Practical Session	Practical Session	Practical Session

Lecturer

Dr. Radu Tudoran

Degree program

Master's program in Computer Science

Credit points

5 credit points

Course

Lecture "Big Data Techniques, Technologies and Trends", 2 SWS (semester hours per week)
Hands-on exercises, 2 SWS

Content description

Course description: Big Data is one of today's main buzzwords and a primary focus of both academic research and industry. It has emerged as a revolution driven by the continuously increasing volumes of data that are collected at increasing velocities from various sources: social networks, IoT, scientific simulations, finance, weather forecasting, etc. Tackling these challenges, commonly referred to as the V’s of Big Data, has led to the development of a plethora of technologies and concepts. Batch processing and stream processing are the two main classes of data processing: the former handles data offline, the latter in real time. Starting from these two categories, different programming models such as MapReduce or reactive programming have been proposed. In addition, multiple technologies have been, and are still being, developed to facilitate the processing and management of data in Big Data scenarios: HDFS, MapReduce, Spark, Storm, Flink, Kafka, HBase, Hive, etc. Together these form today's Hadoop ecosystem.
This course gives an introduction to the technologies and concepts that make up the Hadoop ecosystem, through both lectures and practical sessions. The lectures focus on the theoretical background of the concepts and mechanisms that enable Big Data processing. They present the different programming models and strategies for dealing with large data sets or with data sets on the fly (e.g., MapReduce and MapReduce pipelines, stream topologies, windows, SQL and Hive queries, and interactive queries). The practical sessions aim to familiarize students with the main Big Data processing tools used in industry today, such as MapReduce, HDFS, Spark, Flink, HBase, and Kafka.
At the end of the course, students will have a good understanding of feasible approaches to various Big Data scenarios, as well as hands-on experience with some of the most commonly used Hadoop tools.
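
To give a first impression of the MapReduce programming model mentioned above, the following is a minimal sketch in plain Python. It is not Hadoop code and is not taken from the course materials: the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and everything runs in a single process, whereas a real framework distributes the map and reduce steps across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key.

    In a real MapReduce framework this grouping happens between the
    map and reduce steps; here it is simulated with a dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data streams"])
print(counts)
```

The point of the split is that map_phase and reduce_phase only ever see one record or one key at a time, which is what allows a framework to run many of them in parallel on different machines.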

Course Topics to be addressed:

Overview of Big Data: what it is, why it has emerged and future trends
Data models and large scale infrastructures (cluster, grid, cloud, HPC)
Batch processing
- Distributed storage system concepts: GFS, HDFS and public cloud storage (Azure Blobs and AWS S3)
- NoSQL storage and distributed message queues
- Google MapReduce programming model and Hadoop MapReduce
- High-level processing tools for offline data: Spark, Hive, Pig, Flink
Stream processing:
- Stream overview: what it is and what the main differences are with respect to batch processing
- Stream concepts for data processing: operators, windows, sinks, ETLs
Project topics
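
As an illustration of the window concept from the stream processing topics above, here is a minimal plain-Python sketch of a tumbling (fixed-size, non-overlapping) window that counts events per key. It is an assumption-laden toy, not the API of Flink, Storm or Spark: the function name tumbling_window_counts, the event format (timestamp in seconds, key) and the sample hashtags are all hypothetical, and the input is a finite list, whereas a real stream is unbounded and a stream engine also has to deal with late or out-of-order events.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) tuples
    window_size: window length in seconds
    Returns a dict mapping each window start time to {key: count}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window, identified by its start.
        window_start = (timestamp // window_size) * window_size
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical sample events: (timestamp in seconds, hashtag)
events = [
    (1, "#bigdata"),
    (4, "#flink"),
    (7, "#bigdata"),
    (12, "#bigdata"),
    (13, "#flink"),
]
result = tumbling_window_counts(events, window_size=10)
print(result)
```

With a window size of 10 seconds, the first three events fall into the window starting at 0 and the last two into the window starting at 10, so each window yields its own independent count.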

Evaluation:

Project: A topic will be chosen from multiple available ones (sentiment analysis, Twitter trends analysis, internet/social media search...)
Solution: A software solution will be designed, built and delivered as the outcome of the project.
Technology: The solution will be built using multiple advanced technologies covered in the course.
Evaluation: The solution design will be presented together with a demo to show the specific use case.

Study results/competences

After completion of this course, the students will:

Have an overview of the principles of Big Data analytics
Have an understanding of the data analytics ecosystem
Have knowledge about the Big Data technologies most used in industry and research
Have practical experience with Big Data tools from the Hadoop ecosystem, which gives a competitive advantage when applying for jobs in the domain
Have a reference project in the area of Big Data that they can showcase in the future to prove their practical experience for industry

Literature

Literature will be given during the lecture

Module usage

Compulsory module in the area of practical and technical computer science
As focus module
Individual complementary module
Application module for the complementary area in the Master studies mathematics

Prerequisites

Bachelor students must have passed the following modules:

„Programmierung”
„Rechnerarchitektur“
„Algorithmen und Datenstrukturen”
„Theoretische Informatik”

Students should bring a laptop. The mandatory tools listed below should be installed on the laptop before the course starts:

Java JDK 7 (or higher; JRE is not enough)
Apache Maven 3.x: https://maven.apache.org/install.html
Eclipse Scala IDE 4.0.0: scala-ide.org/download/sdk.html
ci.apache.org/projects/flink/flink-docs-release-1.1/internals/ide_setup.html

More tools will be installed and used during the practical sessions.

Requirements for credit points

Successful participation in hands-on exercises
Submission of a final software project -> description
Oral exam: presentation of your own project, questions about the project and the lecture

Frequency of offering

Irregular

Lecturer

Dr. Radu Tudoran (Huawei, Munich)

Responsibility: