
Netflix Data Scientist Coding Questions

36 practice questions for Netflix Data Scientist interviews

Netflix data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.

coding Easy Verified Question #1

1. Label Co-occurrence Finder


Category: String coding problem
You are given a list of label groups and a list of required labels. Each group is a list of strings. A group is considered valid if it contains every...
Input: Array of strings
Output: Array
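The description above is cut off, so the following is only a minimal sketch under two assumptions: a group is valid when it contains every required label, and the expected output is the list of valid groups in their original order. The function name `valid_groups` is hypothetical.

```python
from typing import List


def valid_groups(groups: List[List[str]], required: List[str]) -> List[List[str]]:
    """Return the groups that contain every label in `required`.

    Assumption: "valid" means "contains all required labels"; the visible
    problem text is truncated before it finishes stating the rule.
    """
    needed = set(required)
    # set.issubset accepts any iterable, so each group can stay a list.
    return [group for group in groups if needed.issubset(group)]


if __name__ == "__main__":
    sample = [["a", "b", "c"], ["a", "c"], ["b"]]
    print(valid_groups(sample, ["a", "b"]))  # [['a', 'b', 'c']]
```

Building the required labels into a set once keeps each group check linear in the size of the group.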
coding Medium Verified Question #2

2. Region Grid Coloring


Category: Grid/matrix coding problem
You are given an M x N grid of security zones. Each cell contains one of the following values: - 1 -- the zone is cleared - 0 -- the zone...
Input: 2D grid
Output: Computed result
coding Medium Verified Question #3

3. Parallel Task Batching


Category: Graph coding problem
A pipeline must execute a set of tasks with dependency constraints. Each dependency [A, B] means task A must complete before task B can start....
Input: Graph (nodes and edges)
Output: Computed result
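One common way to approach a dependency rule like this is a level-by-level topological sort (Kahn's algorithm), where each level is a batch of tasks that can run in parallel. The sketch below assumes that batching is what the truncated statement asks for, that every task named in a dependency also appears in `tasks`, and that a cycle should raise an error. The name `batch_tasks` is hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


def batch_tasks(tasks: Sequence[str], dependencies: Sequence[Sequence[str]]) -> List[List[str]]:
    """Group tasks into batches; each batch depends only on earlier batches."""
    indegree = {task: 0 for task in tasks}
    children = defaultdict(list)
    for before, after in dependencies:
        children[before].append(after)
        indegree[after] += 1

    # Tasks with no unmet dependencies form the first batch.
    batch = sorted(task for task in tasks if indegree[task] == 0)
    batches: List[List[str]] = []
    scheduled = 0
    while batch:
        batches.append(batch)
        scheduled += len(batch)
        next_batch = []
        for task in batch:
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_batch.append(child)
        batch = sorted(next_batch)  # sorted only to make the output deterministic

    if scheduled != len(indegree):
        raise ValueError("dependency cycle detected")
    return batches


if __name__ == "__main__":
    deps = [["A", "B"], ["A", "C"], ["B", "D"], ["C", "D"]]
    print(batch_tasks(["A", "B", "C", "D"], deps))  # [['A'], ['B', 'C'], ['D']]
```

The number of batches equals the length of the longest dependency chain, which is the minimum number of sequential rounds when parallelism is unlimited.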
coding Medium Verified Question #4

4. Maximum Interval Overlap


Category: Interval-based coding problem
You are given a list of closed intervals on the number line, where each interval [start, end] includes both endpoints. Find the maximum number of...
Input: List
Output: Integer
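A sweep over sorted endpoints is a typical fit for this kind of question. The sketch below assumes the truncated statement asks for the maximum number of intervals covering any single point. Because the intervals are closed, an interval that starts where another ends counts as overlapping, so opening events are processed before closing events at the same coordinate. The name `max_overlap` is hypothetical.

```python
from typing import List, Sequence, Tuple


def max_overlap(intervals: Sequence[Sequence[int]]) -> int:
    """Return the largest number of closed intervals sharing a common point."""
    events: List[Tuple[int, int]] = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = open; sorts before a close at the same point
        events.append((end, 1))    # 1 = close
    events.sort()

    best = current = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


if __name__ == "__main__":
    print(max_overlap([[1, 5], [2, 6], [5, 7]]))  # 3, all three cover the point 5
```

Sorting dominates the cost, so the sketch runs in O(n log n) time for n intervals.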
coding Hard Verified Question #5

5. Interval Coverage Counter


Category: Interval-based coding problem
Given a list of closed intervals on the integer number line, build a data structure that efficiently answers point-coverage queries. A closed...
Input: List
Output: Computed result
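For point-coverage queries over a fixed set of closed intervals, one possible structure keeps the start points and end points in two sorted lists and answers each query with two binary searches. The sketch below assumes the intervals do not change after construction; the class name `IntervalCoverage` and method name `count` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


class IntervalCoverage:
    """Answer "how many closed intervals contain x?" in O(log n) per query."""

    def __init__(self, intervals: Sequence[Sequence[int]]) -> None:
        self._starts = sorted(start for start, _ in intervals)
        self._ends = sorted(end for _, end in intervals)

    def count(self, x: int) -> int:
        started = bisect_right(self._starts, x)  # intervals with start <= x
        ended = bisect_left(self._ends, x)       # intervals with end < x
        # An interval covers x exactly when it has started and not yet ended.
        return started - ended


if __name__ == "__main__":
    coverage = IntervalCoverage([[1, 3], [2, 5], [5, 8]])
    print(coverage.count(5))  # 2, covered by [2, 5] and [5, 8]
```

Using `bisect_left` on the end points is what makes the right endpoint inclusive: an interval ending exactly at `x` is not counted as ended.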
coding Easy Verified Question #6

6. [CodeSignal] Movie Group Ranker


Category: Array coding problem
You are building a movie recommendation system. Given a source movie a user liked, you receive: - An array scores where scores[i] is the...
Input: Array
Output: Integer
coding Easy Verified Question #7

7. [CodeSignal] One-Hot Encoder


Category: Matrix coding problem
Given an integer array arr, return its one-hot encoded matrix as a 2D array. In a one-hot encoding: - Each row represents one element from arr. -...
Input: Matrix (2D array)
Output: Computed result
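The column rule is cut off in the description above, so this sketch assumes one column per distinct value of `arr`, ordered by ascending value. The name `one_hot` is hypothetical.

```python
from typing import List


def one_hot(arr: List[int]) -> List[List[int]]:
    """Return one row per element of `arr` with a single 1 in its value's column.

    Assumption: columns follow the sorted distinct values of `arr`.
    """
    categories = sorted(set(arr))
    column = {value: index for index, value in enumerate(categories)}
    matrix = []
    for value in arr:
        row = [0] * len(categories)
        row[column[value]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    print(one_hot([3, 1, 3, 2]))
    # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

If the full problem fixes a different column order (for example, order of first appearance), only the construction of `categories` would need to change.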
coding Medium Verified Question #8

8. Event Rate Limiter


Category: String coding problem
Design a rate-limited event logger for a streaming system. Events arrive in non-decreasing timestamp order. The system must suppress an event name if...
Input: String
Output: Printed output
coding Medium Verified Question #9

9. Viewing History Friends


Category: Algorithm coding problem
A streaming platform groups customers together based on shared viewing habits. You receive: - customerIds - a list of distinct customer IDs -...
Input: List
Output: Array
coding Hard Verified Question #10

10. Weight-Based Cache


Category: String coding problem
Input: List
Output: Computed result
coding Medium database #1

1. [OA] Database Aggregation — Calculate average movie ratings across multiple genres

Netflix needs an efficient SQL query to derive average movie ratings from a large dataset spanning various genres for personalized recommendations. The query should support filtering by genre and time frame.
Problem statement: Given a movies table with columns id, title, rating, and genre, write a SQL query that returns the average rating for each genre within the date range from from_date to to_date.
Example 1:
Input: from_date = '2022-01-01', to_date = '2022-12-31'
Output:
genre | average_rating
Action | 8.5
Comedy | 7.7
Example 2:
Input: from_date = '2021-05-01', to_date = '2021-10-01'
Output:
genre | average_rating
Drama | 8.0
Documentary | 7.0
Constraints:
- Only movie records from the last 5 years are relevant.
- Each movie belongs to at least one genre.
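The listed columns do not include a date, so the sketch below assumes a hypothetical `release_date` column stored as an ISO `YYYY-MM-DD` string, and one row per movie and genre pair. It runs the query against an in-memory SQLite database through Python's `sqlite3` module. The sample rows, the helper names, and the rounding to one decimal place are illustrative assumptions, not part of the original problem.

```python
import sqlite3
from typing import List, Tuple

# Assumption: `release_date` is a hypothetical column; the problem names no date column.
QUERY = """
SELECT genre, ROUND(AVG(rating), 1) AS average_rating
FROM movies
WHERE release_date BETWEEN :from_date AND :to_date
GROUP BY genre
ORDER BY genre
"""


def average_rating_by_genre(conn: sqlite3.Connection, from_date: str, to_date: str) -> List[Tuple[str, float]]:
    """Return (genre, average_rating) rows for movies dated within the range."""
    params = {"from_date": from_date, "to_date": to_date}
    return conn.execute(QUERY, params).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with made-up rows for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER, title TEXT, rating REAL, genre TEXT, release_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Movie A", 8.0, "Action", "2022-03-01"),
            (2, "Movie B", 9.0, "Action", "2022-07-15"),
            (3, "Movie C", 7.5, "Comedy", "2022-05-20"),
            (4, "Movie D", 8.0, "Drama", "2021-06-01"),
        ],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(average_rating_by_genre(db, "2022-01-01", "2022-12-31"))
    # [('Action', 8.5), ('Comedy', 7.5)]
```

Named parameters keep the date bounds out of the SQL string, and ISO-formatted dates compare correctly as text, which is why `BETWEEN` works on a TEXT column here.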
coding Hard sliding window #2

2. [OA] Sliding Window — Optimize our recommendation system for binge-watching

Netflix needs an efficient algorithm to optimize recommendations by analyzing user watch times across multiple shows. The goal is to find the longest sequence of consecutive shows that a user watched in a single binge-watching session with a total view time >= k.
Problem statement: Given an integer array viewTimes, where viewTimes[i] is the view time of the i-th show in watch order, and an integer k, return the length of the longest contiguous subarray whose view times sum to at least k. Use a sliding window approach to achieve optimal performance.
- Method signature: def longest_binge_watch(viewTimes: List[int], k: int) -> int, which returns the length of the longest subarray.
Example 1:
Input: viewTimes = [1, 2, 3, 4, 5], k = 9
Output: 5
Explanation: Every view time is positive, so the whole array [1, 2, 3, 4, 5] (total 15 >= 9) is the longest sequence with a total view time >= 9.
Example 2:
Input: viewTimes = [2, 1, 5, 2, 3, 2], k = 7
Output: 6
Explanation: The whole array [2, 1, 5, 2, 3, 2] totals 15 >= 7, so it is the longest sequence with a total view time >= 7.
Constraints:
- 1 <= viewTimes.length <= 10^5
- 1 <= viewTimes[i] <= 10^4
- 1 <= k <= 10^6
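Read literally, the constraint viewTimes[i] >= 1 means that extending a subarray never lowers its sum, so the longest subarray with a sum of at least k is the whole array whenever the grand total reaches k. The sketch below relies on that observation and includes an O(n^2) brute force for cross-checking on small inputs. Returning 0 when no subarray qualifies is an assumption, as is the helper name `longest_binge_watch_bruteforce`.

```python
from typing import List


def longest_binge_watch(viewTimes: List[int], k: int) -> int:
    """Length of the longest contiguous subarray with sum >= k.

    Relies on all view times being positive (per the stated constraints):
    no subarray can out-sum the full array, so the full array is the answer
    whenever its total reaches k. Returns 0 otherwise (assumed convention).
    """
    return len(viewTimes) if sum(viewTimes) >= k else 0


def longest_binge_watch_bruteforce(viewTimes: List[int], k: int) -> int:
    """Reference check that tries every subarray; small inputs only."""
    best = 0
    n = len(viewTimes)
    for left in range(n):
        total = 0
        for right in range(left, n):
            total += viewTimes[right]
            if total >= k:
                best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    print(longest_binge_watch([1, 2, 3, 4, 5], 9))  # 5
    print(longest_binge_watch([1, 1], 5))           # 0
```

A two-pointer window becomes necessary only for variants where growing the window can break the condition, such as "sum at most k" or arrays that may contain negative values.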
