Table of Contents
- Career Misconceptions
- 1. The Mathematics Behind Machine Learning Algorithms
- 2. Statistical Thinking
- 3. Defining The Data Scientist Role
- Mathematics of Machine Learning- Understanding the mathematics that underlies machine learning algorithms.
- Importance of Statistical Thinking- Why knowing statistics, and even thinking statistically, is important.
- Defining The Data Scientist Role- What a Data Scientist’s responsibilities actually are, and why they vary so much across organizations.
1. The Mathematics Behind Machine Learning Algorithms
Python libraries like scikit-learn and Keras are important, especially in the early stages of a novice’s machine learning journey.
However, familiarity with these packages is no substitute for understanding the underlying mathematics (linear algebra and calculus) that these user-friendly packages abstract away.
Linear Algebra- Areas of focus are data structures, tensor operations, matrix properties, and eigenvectors and eigenvalues.
Calculus- Areas of focus are limits, derivatives and differentiation, partial derivatives, integral calculus, automatic differentiation, and applications to probability.
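To make that concrete, here is a minimal sketch in plain Python of the kind of calculation a single library call can hide: fitting a straight line by gradient descent, where each update comes directly from the partial derivatives of the mean squared error. The function names, the toy data, the learning rate, and the step count are all illustrative assumptions, and this is not a claim about how scikit-learn or Keras implement their models internally.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=5000):
    """Fit y ≈ w * x + b by gradient descent on mean squared error (MSE)."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current line on every data point.
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(residual^2).
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_line_closed_form(xs, ys):
    """Ordinary least squares for one feature: slope = cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


# Made-up toy data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

w_gd, b_gd = fit_line_gd(xs, ys)
w_cf, b_cf = fit_line_closed_form(xs, ys)
print(f"gradient descent: w={w_gd:.4f}, b={b_gd:.4f}")
print(f"closed form:      w={w_cf:.4f}, b={b_cf:.4f}")
```

Both routes should land on essentially the same line. Knowing why they agree, and why a learning rate that is too large would make the first one diverge, is the sort of understanding the mathematics provides and the package alone does not.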
2. Statistical Thinking
At first glance, many prospective Data Scientists assume they can progress solely by learning basic data manipulation, data visualization, and built-in machine learning algorithms, and that these will solve every data-related problem. But finding salient insights that solve business or scientific problems requires real data mining.
That’s where statistics comes into play. There are basic descriptive statistics, which involve finding measures of center (such as the mean, median, and mode).
Then there are inferential statistics, which focus primarily on hypothesis testing and confidence intervals. Both are heavily based in probability theory.
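As a small illustration of both kinds of statistics, the sketch below uses Python’s built-in statistics module on a made-up sample. The data values are invented for the example, and the 95% confidence interval uses a normal approximation for simplicity; for a sample this small, a t-based interval would usually be the more appropriate choice.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample, invented for illustration only.
sample = [12, 15, 11, 15, 18, 14, 15, 13, 17, 16]

# Descriptive statistics: measures of center and spread.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Inferential statistics: approximate 95% confidence interval for the mean.
z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
margin = z * stdev / len(sample) ** 0.5
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean}, median={median}, mode={mode}, stdev={stdev:.3f}")
print(f"approx. 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The descriptive numbers summarize the sample itself, while the interval is a statement about the population the sample was drawn from, which is where probability theory enters.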
Many prospective Data Scientists have bright-shiny-object syndrome: they want to learn the main programming languages for machine learning (Python, R, and Scala) before building a strong foundation in statistics.
Some data problems require only statistics and sound survey methodology to resolve, using push-button tools such as Excel, SPSS, and Tableau, and in some cases declarative languages like SQL (Structured Query Language).
Whichever statistical software is chosen, the fact remains the same: to learn Data Science and become a Data Scientist, deep statistical knowledge is paramount. Even in job postings in both the public and private sectors, the most visible postings are titled “Statistician (Data Scientist)” or “Mathematical Statistician (Data Scientist)”. Thus, statistical thinking is a necessity.
3. Defining The Data Scientist Role
If you search for the position “Data Scientist” on Indeed.com, you will notice some similarities across job postings.
In most cases, however, you will find more discrepancies than similarities when it comes to defining the actual role and qualifications.
Across roles there is still plenty of overlap, typically in these categories:
Responsibilities- Gathering data, mining data, and extracting insights from it.
Qualifications- Bachelor’s degree and a minimum of 1-3 years of experience.
Technical Skills- Excel, statistics, and a programming or scripting language.
Position Description- Varies according to the company’s needs.
These differing job descriptions for similar roles…
Company A: Google
Company B: Department of Agriculture
Company C: Focal Systems
… all seem to come with their own company-specific job description and expectations for the role of Data Scientist.
There are some key reasons for discrepancies amongst these job postings.
Data Science is a relatively new field and becomes more specialized as time goes on.
Data Science is a buzzword, and competing companies are in a data arms race to be the first to capitalize on data in their respective industries.
Together, the previous two reasons have left HR specialists and other non-technical audiences with perpetually outdated data literacy.
Data Science is a rewarding and ever-evolving career path. As time goes on, the role will become more specialized and may even be redefined under a completely new title.
Tanner Abraham
Data Scientist and Software Engineer with a focus on experimental projects in new budding technologies that incorporate machine learning and quantum computing into web applications.