This page serves as a meta compilation of resources that have been helpful for me in my data science career. I’ll update this over time!
In General:
- Remember that you are driving this process! It is rare that people will push you or give you the time to prep or do any of these things. You have to remember to block out time for yourself and keep to it.
- If you look at the Refresh Technical Skills section, it’s going to look like we’re expected to know a lot. We kinda are. It really depends on the job you’re looking to do or are applying for. I’ve applied to jobs that actually wanted a hardcore statistician and failed those interviews. I’ve also applied to jobs that were looking for a SQL expert and failed those too. Lastly, I’ve applied to jobs where the technical interviews had a heavy focus on data structures and algorithms and definitely failed those. Try not to stress out: it’s extremely difficult to be a technical master of all of these topics at once if you are not using them frequently.
Reddit on Applying:
- (Hopefully almost) everything you need to know about data science interviews (EU perspective) : datascience (reddit.com)
- Are you a data scientist without a PhD? Perhaps this is helpful: How I achieved a 6-figure base salary Data Scientist job with 1 year of work experience and a bachelor’s degree. : datascience (reddit.com)
On LinkedIn:
- From Nader Mowlaee: www.linkedin.com/in/engineeringcareercoach or Engineer Your Mission | Helping People End The Job Search
- I met Nader through Albert’s List: https://www.linkedin.com/company/albert’s-list
- Your About section should follow Problem, Skill, Impact (PSI)
- Use the STAR framework when writing about work experience: Situation, Task, Action, Result
- Look at the job description when you apply to figure out how to relate PSI and STAR to your work and results
On Resumes:
- One page! Unless you are applying to academic positions that specifically ask for your whole CV. I’ve had tons of people tell me that resumes should be 1 page
- I heard somewhere that people spend 6-15 seconds looking at a resume
- Your application is very likely going to be digital. To save visual space on my resume, I used icons and embedded hyperlinks where possible
Classes:
Code:
- Free Code Camp, where you can pick specializations: https://www.freecodecamp.org/
- Learn with Leon’s 100 Devs for WebDev: https://www.twitch.tv/learnwithleon
- Got this from Yuka_with_Data’s Discord server: https://training.linuxfoundation.org/resources/
Statistics:
- Curtis Miller’s lectures on Intro to Statistics: http://www.math.utah.edu/~cmiller/classes/SU203070/
- The course site also has links to the YouTube recordings of his lectures
- Follow up class: http://www.math.utah.edu/~cmiller/classes/SP203080/
Applied Statistics/Machine Learning:
- Introduction to Statistical Learning: https://www.statlearning.com/
- The textbook is free and is a good way to refresh statistics concepts as applied to machine learning. I don’t think it’s a great statistics review, however
- Probability, Statistics, and Data: A fresh approach using R (slu.edu)
- Fast.AI if you have finished introductory courses in data science and want to move on to Machine Learning
Working on Code/Problems:
For the record, I don’t think working through algorithm problems is necessarily reflective of what coders, or especially data scientists, do on a day-to-day basis. However, these problems are considered part of our interview processes.
Leetcode, HackerRank, and Codewars are all fairly similar, and I find that solving most Easy-level and some Medium-level problems is probably an accurate reflection of what you should expect to see in interviews. Even Medium is a stretch.
For SQL, I’ve really liked both StrataScratch and Mode. SQL is an interesting language where, in my opinion, the difficult problems come from more complex databases and from understanding syntax…until you get to the point where you need to optimize your queries on huge datasets. Both of these resources go on to teach more advanced SQL functions.
Python:
- Leetcode: https://leetcode.com/
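To give a feel for the Easy level mentioned above, here is a minimal sketch of the kind of problem you might see: find two numbers in a list that add up to a target. The function name `two_sum` and the sample input are my own choices for illustration, and this is one common approach, not the only accepted one.

```python
from typing import Optional


def two_sum(nums: list[int], target: int) -> Optional[tuple[int, int]]:
    """Return the indices of two numbers that add up to target, or None."""
    seen: dict[int, int] = {}  # value -> index where that value was seen
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            # The earlier index comes first in the returned pair.
            return seen[complement], i
        seen[value] = i
    return None


print(two_sum([2, 7, 11, 15], 9))
```

The dictionary lookup keeps this to a single pass over the list, which is the sort of trade-off interviewers tend to ask you to talk through.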
SQL:
- StrataScratch: https://www.stratascratch.com/
- Mode: https://mode.com/sql-tutorial/
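As one illustration of the "more advanced SQL functions" mentioned above, here is a small sketch of a window function computing a running total per customer. The `orders` table and its rows are invented for this example, and it runs against Python's built-in `sqlite3` module, assuming the bundled SQLite is version 3.25 or newer (which is when window functions were added).

```python
import sqlite3

# Hypothetical table and rows, made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ana", 1, 20.0),
        ("ana", 2, 35.0),
        ("ana", 3, 10.0),
        ("ben", 1, 50.0),
        ("ben", 4, 5.0),
    ],
)

# Window function: running total of spend per customer, ordered by day.
query = """
SELECT customer,
       order_day,
       amount,
       SUM(amount) OVER (
           PARTITION BY customer
           ORDER BY order_day
       ) AS running_total
FROM orders
ORDER BY customer, order_day
"""
running_totals = conn.execute(query).fetchall()
for row in running_totals:
    print(row)
conn.close()
```

Unlike a plain `GROUP BY`, the window function keeps every row and adds the running total alongside it, which is why these questions tend to show up once you move past the basics.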
Data Sets:
- Kaggle: https://www.kaggle.com/datasets
- Kaggle has TONS of datasets. You can compete there like with Leetcode, but given that Kaggle has had scandals with people cheating (and it’s dubious whether anyone actually checks your Kaggle score), I wouldn’t try to compete. Instead, leverage their datasets to start working on projects or problems that are more interesting to you!
- Microsoft Research Open Data: https://msropendata.com/
- US Government’s Open Data: https://data.gov/
Puzzles:
- Advent of Code: https://adventofcode.com/
Compensation:
- Blind: https://www.teamblind.com/
- Blind is like an inverse LinkedIn: instead of people praising employers, it has anonymous but verified employees discussing negatives about the companies they work for. You have to list your total compensation when posting/commenting
- Levels: https://www.levels.fyi/
- Levels has employees anonymously share their offers and allows people to compare compensation packages
Content Creators:
You are more than welcome to follow along with my resources, but I believe the following creators have more established libraries of content and maybe you’ll vibe more with them!
Twitch:
- Learn with Leon: https://www.twitch.tv/learnwithleon
- Leon is leading a 100Devs class on webdev and people are getting employed from it!
- Yuka with Data: https://www.twitch.tv/yuka_with_data
- Yuka is hosting some co-working sessions where she works through R, Python, and/or SQL and can help you be accountable since she publicly works through problems
- LeahTCodes: https://www.twitch.tv/leahtcodes
- Leah works through various problems and projects on stream. She is pivoting into tech from teaching math and has said she eventually wants to go into data science
- TimeEnjoyed: https://www.twitch.tv/timeenjoyed
- Time is learning Python publicly! She is also a great artist!
- Earend: https://www.twitch.tv/videos/1311964666
- Earend lifts like me but is way stronger, and is working on game dev
- AmbivalentBunnie: https://www.twitch.tv/ambivalentbunnie
- Bunnie is another artist pivoting into tech and publicly works on problems in Python
- VerolaFox: https://www.twitch.tv/verolafox
- Veronica is looking to get into game dev and is 100Devs-taught! She works through various problems on stream
- TechyGrrrl: https://www.twitch.tv/techygrrrl
- Techy works through various projects on stream. They are all in languages I’m not familiar with, but they are extremely cool!
- Metal and Coffee: https://www.twitch.tv/metalandcoffee_
- She’s not working in languages I’m familiar with but has a cool channel and knows her stuff
- Nick Wan: https://www.twitch.tv/nickwan_datasci
- Director of Analytics for the Cincinnati Reds, also a streamer and YouTuber
- Rob Mulla: https://www.twitch.tv/medallionstallion_
- Rob is a data scientist streamer who consistently makes me realize I have much to learn in this field
- Al Sweigart: https://www.twitch.tv/alsweigart
- Author of “Automate the Boring Stuff with Python”
YouTube:
- Bukola – YouTube: Bukola is a cloud solutions evangelist, I believe, at Google, and I like her career discussion content
- Tina Huang – YouTube: Tina was a data scientist at FAANG, and I like her content around balancing life and work
- Tina’s Study Sessions channel: Tina studies on camera for 2 hours a day 4 days a week
- StatQuest
- Educational videos on stats, machine learning, and data science!
- Dr. Rachael Tatman
- Data scientist talking about NLP and AI/ML ethics and impacts
Non-Video-Based:
- Chip Huyen at Chip Huyen | LinkedIn: posts a lot of helpful and interesting discussion on data science, machine learning, and career advice
Podcasts:
- Software Engineering Daily: https://softwareengineeringdaily.com/
- I think this is the biggest software engineering podcast and it has a subset discussing machine learning. There are some really cool episodes talking about tech deployment and history.
- In all honesty, the founder and main host decided to literally go off his meds when the COVID shelter-in-place orders happened, and that revealed more of his true personality, conspiracy theory leanings, and generally entitled and somewhat fragile mindset. I think the best content is from before 2021. After that, he begins pressuring guests to agree with him or to fund a startup he founded, and the interviews become low quality. I find myself skipping half the episodes now.
- Sometime in 2021, the host of Data Skeptic came in and started to host, and the quality of the episodes started going back up! Basically, the less Jeff Meyerson in the episodes, the better…although I wrote this line before I found out he passed away in July 2022
- Not So Standard Deviations: https://nssdeviations.com/
- Roger and Hilary talk about data science in academia and industry.
- This has actually become one of my favorite tech related podcasts
- Data Skeptic: https://dataskeptic.com/
- Data Skeptic and Not So Standard Deviations tend to talk about practical effects of using data science, which is why I find them to be so interesting. How do we integrate with teams to help everyone work better?
- Linear Digressions: http://lineardigressions.com/
- This podcast has ended, which is unfortunate, but the episodes should still be available!
- Partially Derivative: http://partiallyderivative.com/
- This podcast has ended, which is unfortunate, but the episodes should still be available!
- It’s been a while since I listened to either Linear Digressions or Partially Derivative, but I remember missing both when they announced their ends
- Towards Data Science: https://towardsdatascience.com/podcast/home
- To be frank and maybe a bit harsh, I don’t think a lot of the user-submitted, crowdsourced content on Towards Data Science or Analytics Vidhya is of the highest quality. However, this podcast has been more interesting, as it covers techniques, advice, and the history of data science in different organizations.