Advanced analytics using Apache Spark
MELBOURNE: 29-30 April
Date & Time
Mon., 29/04/2019, 9:30 am –
Tue., 30/04/2019, 5:00 pm AEST
Saxons Training Facilities
500 Collins Street
Melbourne, Victoria 3000 Australia
Developed by Jeffrey Aven, author of SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python, this course provides the core knowledge and skills needed to develop applications using Apache Spark.
The “Advanced Analytics using Apache Spark” module is the third of three modules in the “Big Data Development using Apache Spark” series, following the “Data Transformation and Analysis using Apache Spark” and “Stream and Event Processing using Apache Spark” modules.
This course provides attendees with the practical knowledge required to perform statistical, machine learning and graph analysis operations at scale using Apache Spark.
The Apache Spark family includes APIs and libraries designed to implement machine learning and statistical analysis operations in a distributed processing environment, offering horizontal scalability and parallel computing power. The “Advanced Analytics using Apache Spark” module is designed to enable data scientists and statisticians who have experience in other statistical or machine learning frameworks to extend their knowledge and experience to the Spark runtime environment.
The course introduces R on Spark (using the SparkR package) and shows how to run common R functions on the Spark framework, including hands-on examples of how to use the Spark runtime with RStudio. The course then introduces the Spark MLlib and Spark ML APIs, with practical exercises implementing regression, classification and clustering algorithms as well as feature extraction operations using Spark. Collaborative filtering applications such as recommendation engines are also covered.
Additionally, the course provides an introduction to graph processing and analysis using Spark.
Using the Spark R API
Using Spark with RStudio
Machine learning using the Spark MLlib API
Machine learning using the Spark ML API
Feature extraction using Spark
Linear algebra using Spark
Classification using Spark
Clustering using Spark
Regression using Spark
Building a recommender using Spark
Using Spark with Jupyter
Graph processing and analysis using Spark
Who should attend?
This course is suitable for data scientists and statisticians working with data at scale using Apache Spark. Attendees should have a solid understanding of machine learning concepts and have implemented algorithms using other tools.
Data science and machine learning knowledge and skills
Basic Python programming skills
Basic Spark skills and knowledge (ability to program basic RDD and DataFrame applications in Spark)
Attendees should, by the end of the course:
Understand the SparkR package and its capabilities
Understand the implementation of machine learning algorithms in Spark
Be able to train and deploy models using the Spark MLlib and Spark ML libraries
Understand graph analysis using Spark
The instructor: Jeffrey Aven
Jeffrey Aven is a big data, open source software, and cloud computing consultant, author and instructor based in Melbourne, Australia.
Jeffrey has extensive experience as a technical instructor, having taught courses on Hadoop and HBase for Cloudera (awarded Cloudera Hadoop Instructor of the Year for APAC in 2013) and courses on Apache Kafka for Confluent in addition to delivering his own courses.
Jeffrey is also the author of several big data books, including SAMS Teach Yourself Hadoop in 24 Hours, SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python.
In addition to his credentials as an instructor and author, Jeffrey has over thirty years of industry experience and has held key roles in several major big data and cloud implementations over the last several years.
About our training
Eugene Dubossarsky’s courses are unlike those offered in universities, online, or by private providers. His data-science classes, in particular, give clients not just knowledge of a process, but the real power of understanding the underlying concepts, allowing them to confidently practice, manage, promote and risk-assess data science.
Dr Dubossarsky says “the way many courses teach data science is like teaching people to memorise and recite poetry in a language they do not understand”. By contrast, he confers an understanding of that language, taught in an intuitive, accessible way that leaves trainees with an instinct for data science. Keeping formulae and mathematics to a bare minimum and taking an intuitive, visual approach, Eugene’s courses deliver a compressed mentoring experience as much as they do content. This is difficult for an average trainer to replicate. Trainees benefit from his extensive knowledge and over 20 years of commercial data-science experience, as well as his unique teaching style.
The resulting testimonials speak for themselves, and candidates come from all walks of life: CEOs, general managers, salespeople, IT professionals, marketing staff, public servants and of course people from many functions in the finance world. These testimonials are extensive, and many more are available on request. With specific regard to finance, Eugene has mentored and advised senior leaders and their teams in a number of major Australian banks.
Questions and further details
Meals and refreshments
Catered morning tea and lunch are provided on both days of the course. Please notify us at least a week ahead if you have any special dietary requirements.
Course material may vary from what is advertised, depending on the needs and learning pace of attendees. Additional material may be presented alongside, or in place of, the advertised material.
Frequently asked questions (FAQ)
Do I need to bring my own computer?
There’s no need to bring your own laptop or PC. Our courses take place in modern, professional training facilities that have all the computing equipment you’ll need.