
Data engineering: How to build industry-strength data lakes and processing platforms

This three-day course teaches practical data engineering, how to build industry-strength data lakes and data processing platforms, and how to use them to build robust, scalable, and high-performing data processing applications. 

Note: The course is given online.

The focus is practical engineering, with real-life examples. The course covers architecture and development end-to-end, from data collection through batch and stream data processing, to exporting and serving data artifacts to users. In selected key areas, we will go deeper and cover implementation in detail. The course includes theoretical lectures as well as practical exercises that teach programming with scalable data processing frameworks. The contents are vendor neutral, but we will present a recommended selection of open source and public cloud components that serve as a starting point for a complete technology stack.

Target audience

The course targets professionals requiring a hands-on understanding of state-of-the-art data engineering practices, such as backend engineers, data scientists, BI developers, database admins, and managers of those roles. Participants are expected to have at least three years of technical work experience.

Prerequisites

Course participants are expected to be proficient either in a major object-oriented or functional programming language (e.g. Java, Python, C++, or Scala) or in data modelling and SQL. For the practical exercises, it is recommended that participants work in pairs, ideally one person with developer experience and one person with data modelling experience.

Practical exercises will be done in Scala. Participants who do not know Scala in advance need not fear, however. Advanced language features are not needed, so it is sufficient to go through a tutorial in advance. Links will be provided.

Participants need to use their own laptops and download the exercise source code a few days in advance. Links and instructions will be provided. The exercises depend on open source libraries, which are downloaded as part of the preparations.

Lecturer

Lars Albertsson

Lars is the founder of Scling, which provides data-value-as-a-service, a partnership solution for creating business value from data. He is a frequent conference speaker on big data technology and privacy protection. Before founding Scling, Lars worked at Google, Spotify, and as an independent consultant, helping organisations create value with data processing technology. As an independent consultant, his clients ranged from startups to banks. LinkedIn profile: https://www.linkedin.com/in/larsalbertsson

Content

The following topics will be covered in the lectures. Practical exercises will be interleaved with the lectures.

  • Overview and motivation. Why build a data platform, and how to use it.

  • Data collection. Gathering data into a data platform.

  • Batch processing. How to process data with scalable frameworks, such as MapReduce, Spark, Flink, etc.

  • Introduction to serving and NoSQL. How to export data from a data platform, and how to serve data-driven applications.

  • Workflow orchestration. Connecting batch processing flows into robust pipelines.

  • Real-time processing. Data processing with scalable stream processing frameworks.

  • Deployment. Deploying batch processing applications in production.

  • DataOps and quality assurance. Testing, continuous deployment, error handling, and engineering data quality.

  • Lifecycle, evolution, schemas. How to evolve data pipelines over time without breaking applications.

  • Privacy by design. Architecting data processing in order to comply with privacy regulations.

Other info

The course is held in English, unless all participants agree to hold it in Swedish. The course material is in English.

Consider registering in pairs, coupling developer experience and data-modelling experience.

The maximum number of participants is 24, and the minimum for holding the course is 8.

Practical information

Time

26 May 2020 - 28 May 2020

Venue

Online 

Type of event

Training

Validation

Course certificate

Registration info

Fee

11 000 SEK excluding VAT

Payment details

Payment by invoice before the course. 

Final registration date

10 May 2020
