My Quick Guide to Data Science: Where to Start?

After three years of working as a Data Analyst, I have decided I want to pivot into Data Science. I think I already have a relatively solid base of SQL, Python & Data Visualization to start considering the next step in my professional career.

In this article, I will explain how I created my study plan to achieve my goal, what resources I selected and why, and my best practices in case you want to pursue a career in Data Science.

Your Starting Point

If you want to create a successful study plan to land a job in Data Science, I think you cannot just start taking courses. You need to know your starting point before considering what resources you need to study or which projects you want to do.

The fundamental areas I reviewed related to Data Science were the following:

  • Statistics & Mathematics
  • Programming languages (Python/SQL)
  • Machine Learning (especially Unsupervised Learning)
  • Fundamentals of Data Engineering
  • Other topics, like A/B testing or working with Docker containers.

Depending on your level and interests, this list of areas may change. For example, maybe you prefer to study R instead of Python. Or, if you are starting from scratch, you might choose to begin with Excel, SQL, and data visualization tools.

What kind of questions do you need to ask? For example, regarding programming languages such as SQL or Python, I asked myself about my level of understanding and how well I had mastered those topics in practice:

  • What is my level in that specific programming language? Do I know the libraries needed to perform all the basic tasks?
  • Can I consider myself an expert in that field? Or are there some topics where I could still improve?
  • What other advanced concepts do I need to know?

For example, I have been working with Python for around two years. I can perform advanced analyses with the Pandas library and do complex data transformations. However, I haven’t had the opportunity to work with OOP (Object-Oriented Programming), which helps you organize your code into functions and classes. So, I have decided to learn more about those concepts.
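As a minimal sketch of what organizing code into a class can look like, the snippet below groups some data and the functions that work on it in one place. The class name, method names, and figures are hypothetical and only meant as an illustration.

```python
from collections import defaultdict


class SalesReport:
    """Hypothetical example: keeps the data and its behaviour together."""

    def __init__(self, rows):
        # Each row is a (channel, revenue) tuple.
        self.rows = list(rows)

    def total_by_channel(self):
        """Return a dict mapping each channel to its summed revenue."""
        totals = defaultdict(float)
        for channel, revenue in self.rows:
            totals[channel] += revenue
        return dict(totals)

    def top_channel(self):
        """Return the channel with the highest total revenue."""
        totals = self.total_by_channel()
        return max(totals, key=totals.get)


# Made-up figures for illustration only.
report = SalesReport([("email", 120.0), ("ads", 300.0), ("email", 80.0)])
print(report.total_by_channel())
print(report.top_channel())
```

The same logic could live in loose functions inside a notebook, but a class makes it easier to reuse and test as scripts grow.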

Another example. I am a Math degree student, so I won’t spend time reviewing Linear Algebra or Calculus, as it’s part of my program at the University. However, you may want to review some concepts if you haven’t studied Math since High School.

The Goals

Once you have selected the areas you want to improve, you need to set your goals. I always encourage you to define SMART goals (Specific, Measurable, Achievable, Relevant, Time-bound) because they make it easier to track your progress and force you to be as specific as you can about what you want to achieve. If you haven’t heard of this terminology before, you can find all the information in this article.

You have to define what you want to achieve and in which period you plan to do it. Then, add all the things you plan to do to achieve it. Let’s give an example. Imagine you want to start learning Python as your first programming language. Your goal could be something like this:

“I will study Python.”

However, this is not specific, time-bound, or measurable. Luckily, it’s easy to change this. For example:

“I will study a Python basic course for seven days, starting tomorrow, 10th September, until 16th September 2022. I will spend at least eight hours per day.”

That’s better! However, do you think you could finish the course in seven days? You might have a full-time job, or maybe you are at University. Also, spacing out your study time will help you to retain the concepts. Finally, the goal is still missing some details about what you will learn in that course, which would save you from having to look that up again later.

Let’s try again:

“I will study a Python basic course (variables, conditionals, for/while loops, file management) for one month, starting tomorrow, 10th September, until 9th October 2022. I will spend at least one hour per day.”

Great! You got it 🙂

Just one more piece of advice. Don’t set a lot of goals at the same time. From my point of view, it’s better to concentrate your efforts on one or two subjects and then start the following one once you are finished.

The Timings

Now, it is time to think about how much time you can dedicate to your Data learning goals. It doesn’t matter if you are self-studying or if you are taking a master’s program. You need to understand how much time you need to take your courses, practice, and build a portfolio.

In my case, I expect to finish this study plan in around a year and a half. As I can only dedicate six hours per week on average, I am aware it will take a while to finish. Your particular case may be different. Maybe you have a part-time job, or you can dedicate a few months to studying full-time.

Two years ago, when I was taking the Bootcamp course, I spent around six months learning Python, SQL, and Power BI. It allowed me to find a job as a Data Analyst in record time. However, at this moment, I am studying for a Math degree at University and working full-time.

For this reason, it is important to prioritize which areas of expertise are essential and which are just nice to have, so you don’t get lost before starting. Think about how many hours you have available per week and how many hours you will need to invest in the courses.
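This kind of time budget is easy to sketch in a few lines of Python. The weekly hours and the year and a half come from my own plan above; the course lengths in `planned_hours` are hypothetical placeholders, so replace them with the real durations of the courses you pick.

```python
import math

WEEKS_PER_YEAR = 52


def weeks_needed(total_hours, hours_per_week):
    """Number of whole weeks needed to cover total_hours of study."""
    return math.ceil(total_hours / hours_per_week)


# Figures from this article: six hours per week for about a year and a half.
hours_per_week = 6
available_hours = hours_per_week * int(1.5 * WEEKS_PER_YEAR)

# Hypothetical course lengths in hours (placeholders, not real durations).
planned_hours = {"git": 10, "statistics": 60, "machine_learning": 120}

total = sum(planned_hours.values())
print(f"Available: {available_hours} h, planned: {total} h")
print(f"Weeks needed: {weeks_needed(total, hours_per_week)}")
```

If the planned hours exceed the available hours, that is the signal to drop the nice-to-have areas first.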

The Study Plan

There are tons of Data Science roadmaps on the Internet. Probably too many of them. When I started this blog, I researched a lot of different roadmaps, and the areas of study they covered were quite diverse. Even within the same field, the recommended knowledge was pretty different. For this reason, I decided to tailor my own study plan, so I can adjust it to my needs.

If you are starting with Data Science and have no idea how to start, I recommend learning SQL, Excel, and a Business Intelligence tool. These tools will give you the basic knowledge to become a Data Analyst. Find more information about how to start in Data Analytics in my previous post.

Let’s start!

Git & GitHub

As a Data Analyst, I have been using Git & GitHub in a very superficial way. For example, I used it in Looker’s backend for version control. However, I don’t know its full potential. That’s why, before starting to develop my portfolio, I want to understand how to use the platform and how to work with the version control system.

I have chosen the following course:

Statistics & Mathematics

As I am a Math degree student, I am not going to put much emphasis on this area. I am already covering topics such as Linear Algebra, Calculus, and Probability, which are important areas in the Data Science field.

However, I want to spend some time reviewing Statistics for Data Science, focusing on Python code. For this reason, I am going to start this journey with the following resource:

I like this resource because it includes a lot of code snippets in Python and R. This is especially helpful for Python, since some statistical functions are not so easy to write in this programming language. For example, creating a frequency table in R is much simpler than in Python.
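To give an idea of the difference: in R, a frequency table is a single call to `table()`. A minimal Python sketch using only the standard library might look like the following. The sample data is made up, and in practice the Pandas `value_counts()` method does a similar job in one line.

```python
from collections import Counter

# Made-up sample data for illustration only.
channels = ["email", "ads", "email", "organic", "ads", "email"]

counts = Counter(channels)
n = len(channels)

# Absolute and relative frequency, most common value first.
freq_table = [(value, count, count / n) for value, count in counts.most_common()]

for value, count, share in freq_table:
    print(f"{value:<10}{count:>3}{share:>8.1%}")
```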

This book should give me enough knowledge to get the foundation of statistics for Machine Learning. That’s why I won’t include more Statistics resources at this stage.

Machine Learning

As I explained in another post, I took a Data Science Bootcamp almost two years ago. However, I didn’t get the chance to dig into the algorithms that I need the most in the Marketing field:

  • Time series & forecasting
  • A/B testing
  • Regression (linear, non-linear, logistic)
  • Clustering (for example, for audience segmentation)

During the Bootcamp, I was learning a lot of new concepts every day, so it was complicated to allocate time to learn all the algorithms and their applications.

My selection for this is:

  • Time Series Analysis, Forecasting, and Machine Learning. I work with time series constantly, so this course will teach me how to create forecasts, from the basics to the most advanced topics. I know this is not an essential topic in Machine Learning, but I prefer to prioritize time series as it is one of the most important areas for marketers.
  • Deep Learning Prerequisites: Linear Regression in Python. This course will give me a basic understanding of linear regression (unidimensional, polynomial & multidimensional). It also covers weights, overfitting, and dummy dimensions… which are basic concepts within the Data Science domain.
  • Deep Learning Prerequisites: Logistic Regression in Python. Logistic regression is pretty useful when you want to predict whether a customer is likely to buy a product based on available information, like their age or the place where they live. Also, this algorithm is the basis of neural networks, in case you want to learn more advanced topics in the future.
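To make the customer example above a bit more concrete, here is a minimal sketch of how a logistic regression turns features into a probability. The function names, the features, and the coefficients are all made up for illustration; a real model would learn the weights from data.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def purchase_probability(age, lives_in_city, weights, bias):
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    z = weights[0] * age + weights[1] * lives_in_city + bias
    return sigmoid(z)


# Made-up coefficients for illustration; a real model learns them from data.
weights = (0.05, 0.8)
bias = -2.5

p = purchase_probability(age=40, lives_in_city=1, weights=weights, bias=bias)
print(f"Estimated probability of buying: {p:.2f}")
print("Predicted class:", "buys" if p >= 0.5 else "does not buy")
```

A weighted sum followed by a sigmoid is also what a single neuron with a sigmoid activation computes, which is why this algorithm is a natural stepping stone to neural networks.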

I have compared a lot of Machine Learning courses. In the end, I have selected these Udemy courses because they explain how the model works internally and why you are getting some specific results. It’s the perfect mix between theory and practice.

I will probably want to study some NLP concepts after doing these courses. However, I think it’s a great starting point to get deeper into Machine Learning algorithms!

Programming languages (Python)

I have been working with Python for around two years. For this reason, I won’t cover most fundamental topics (variables, loops, conditionals) or NumPy/Pandas at a beginner level.

That’s why I will focus on more advanced functions in Pandas (melt/pivot/map/lambda) and method chaining. Also, I want to learn more about OOP (Object-Oriented Programming) to improve the level of my scripts.

  • Advanced Python Programming: Build 10 OOP Applications. This course is not related to Data Science. However, I like it because it has different applications and projects, so I can further develop my skills in OOP. It will allow me to improve my understanding of classes and functions.
  • Effective Pandas: Patterns for Data Manipulation. For me, this is one of the best Pandas books if you want to level up your skills. This book covers more advanced best practices and shows how to apply more complex transformations to the data. It’s full of code snippets that you can use in your code.
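The Pandas functions mentioned above (melt, pivot, map with a lambda) can be combined in a single chained expression. The following is a small sketch on made-up data, not an excerpt from the book; the column names and values are assumptions for illustration.

```python
import pandas as pd

# Made-up data: monthly sessions per channel in wide format.
wide = pd.DataFrame(
    {
        "channel": ["email", "ads"],
        "2022-01": [100, 250],
        "2022-02": [120, 300],
    }
)

tidy = (
    wide
    # melt: one row per (channel, month) pair.
    .melt(id_vars="channel", var_name="month", value_name="sessions")
    # map + lambda: derive a new column from an existing one.
    .assign(is_paid=lambda df: df["channel"].map(lambda c: c == "ads"))
)

# pivot: back to one row per month and one column per channel.
by_month = tidy.pivot(index="month", columns="channel", values="sessions")

print(tidy)
print(by_month)
```

Writing the steps as one chain keeps each transformation on its own line and avoids intermediate variables.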

Fundamentals of Data Engineering

Finally, I have selected two courses related to Data Engineering. It is true that there will usually be a Data Engineering team in charge of creating and maintaining the ETL process. However, sometimes the Data Scientist can help to develop some data pipelines, for example, when you need to put a Machine Learning model into production.

  • Data Warehouse Fundamentals for Beginners. This course covers the most fundamental concepts regarding data warehouses and data pipelines. It will review how to design a database, how to build an ETL process, and compare different types of modern schemas (star & snowflake).
  • GCP – Data Engineering Certification. I have selected this course because I have been working with the Google Cloud Platform for around three years. That’s why I prefer to continue studying this platform. However, if you are more interested in AWS or Azure, you could find similar courses. In this case, for AWS, I can recommend this complete course that has been updated through the years.

And this is all! These are the main courses I want to take to become a Data Scientist. Of course, I plan to combine these courses with two or three end-to-end projects. This means starting the projects from scratch (gathering the data from an API or web scraping) and going through all the stages of the data cycle. Projects are the last topic I want to cover in this article.

The Projects

You need to put into practice all you learn. It is the only way to retain the new concepts and understand what you are doing. Experimentation is the key! The best way to do this is by developing your own projects. I would also recommend that you choose courses with different exercises and examples you can replicate.

For example, my first project will include analyzing and predicting the number of subscribers and organic watch hours of a YouTube channel. You need to reach a certain threshold for both metrics to monetize your channel. I have been running some awareness campaigns on Google Ads for some of the videos, so I will include this information in the project. It could be interesting to explore how those paid campaigns affected the organic performance of the videos.

I have chosen this project because it involves connecting to different APIs and using time series to model the behavior of those variables, and it allows me to conduct some A/B tests, as I am one of the owners of the channel.

If you don’t know what projects you can develop with your current level, you can follow this KDNuggets guide. It lists solved Data Science projects for different levels of expertise. You can also find a second part at this link, with a more advanced project.

I recommend getting some inspiration from those projects but building your own. That’s the best way to practice all your skills from beginning to end: gathering your data, cleaning and transforming it to fit your needs, performing exploratory data analysis, and creating your predictions.

Conclusions

Let’s sum up the steps I have followed to create this study plan:

  1. Verify which areas you need to improve or learn from scratch to achieve your goal. I recommend you be as honest as possible on this step.
  2. Set SMART goals that are realistic and achievable in time. Be clear on why you want to focus on those goals.
  3. Be aware of your timings. How much time can you dedicate to improving your data skills? Could you apply all you have learned to your current job? Is there any daily task you could automate with Python?
  4. Write down a list of areas & subareas you need to improve. Be as specific as possible.
  5. Select the materials based on your needs. I recommend you read some course reviews (YouTube is great for this) and verify the content of each course. Don’t waste too much time looking for the perfect course. Just pick one, stick to it, and finish it before starting the next one.
  6. Think about what projects you will develop to apply what you learned in those courses. Don’t forget to think about your level of knowledge before picking one.
  7. Don’t try to learn everything. It’s almost impossible to cover all the topics. You will never know all the Python/R functions or master all the machine learning models!
