
Netflix Data Scientist Interview Questions

36 practice questions for Netflix Data Scientist interviews

Netflix data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.

coding Easy Verified Question #1

1. Label Co-occurrence Finder


Category: String coding problem
You are given a list of label groups and a list of required labels. Each group is a list of strings. A group is considered valid if it contains every...
Input: Array of strings
Output: Array
coding Medium Verified Question #2

2. Region Grid Coloring


Category: Grid/matrix coding problem
You are given an M x N grid of security zones. Each cell contains one of the following values: - 1 -- the zone is cleared - 0 -- the zone...
Input: 2D grid
Output: Computed result
coding Medium Verified Question #3

3. Parallel Task Batching


Category: Graph coding problem
A pipeline must execute a set of tasks with dependency constraints. Each dependency [A, B] means task A must complete before task B can start....
Input: Graph (nodes and edges)
Output: Computed result
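The statement above is truncated, so the following is only a sketch under assumptions: tasks are hashable labels, every task named in a dependency also appears in the task list, and the goal is to group tasks into batches whose members can run in parallel. The name `batch_tasks` and its return shape are hypothetical. It applies Kahn's topological sort one level at a time.

```python
from collections import defaultdict


def batch_tasks(tasks, dependencies):
    """Group tasks into batches; each batch depends only on earlier batches.

    Hypothetical helper: the name, arguments, and return shape are assumptions.
    Raises ValueError if the dependencies contain a cycle.
    """
    indegree = {task: 0 for task in tasks}
    children = defaultdict(list)
    for before, after in dependencies:  # [A, B]: A must finish before B starts
        children[before].append(after)
        indegree[after] += 1

    # Kahn's algorithm, processed one level (batch) at a time.
    current = sorted(task for task in tasks if indegree[task] == 0)
    batches = []
    done = 0
    while current:
        batches.append(current)
        done += len(current)
        ready = []
        for task in current:
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        current = sorted(ready)  # sorted only to make the output deterministic

    if done != len(indegree):
        raise ValueError("dependency cycle detected")
    return batches
```

Each task and each dependency is visited once, apart from the per-batch sort that keeps the output order stable.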
coding Medium Verified Question #4

4. Maximum Interval Overlap


Category: Interval-based coding problem
You are given a list of closed intervals on the number line, where each interval [start, end] includes both endpoints. Find the maximum number of...
Input: List
Output: Integer
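Because the statement is cut off, this sketch assumes the task is to find the largest number of intervals that share at least one common point. The function name `max_overlap` is hypothetical. It uses a sweep line over sorted endpoints, and since the intervals are closed, a start at coordinate x is processed before an end at the same x.

```python
def max_overlap(intervals):
    """Return the maximum number of closed intervals covering a single point.

    Hypothetical helper; assumes each interval is a (start, end) pair
    with start <= end.
    """
    events = []
    for start, end in intervals:
        events.append((start, 0))  # kind 0 (open) sorts before kind 1 (close)
        events.append((end, 1))
    events.sort()

    best = active = 0
    for _, kind in events:
        if kind == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best
```

With this ordering, `[1, 3]` and `[3, 5]` count as overlapping at the point 3, which matches the closed-interval definition in the statement.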
coding Hard Verified Question #5

5. Interval Coverage Counter


Category: Interval-based coding problem
Given a list of closed intervals on the integer number line, build a data structure that efficiently answers point-coverage queries. A closed...
Input: List
Output: Computed result
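The rest of the statement is not shown, so this is a sketch that assumes a query asks how many intervals contain a given integer point. The class name `CoverageCounter` and its `count` method are hypothetical. It keeps the start and end points in two sorted lists and answers each query with two binary searches.

```python
from bisect import bisect_left, bisect_right


class CoverageCounter:
    """Answer point-coverage queries over a fixed set of closed intervals.

    Hypothetical interface; assumes each interval is a (start, end) pair
    with start <= end.
    """

    def __init__(self, intervals):
        self._starts = sorted(start for start, _ in intervals)
        self._ends = sorted(end for _, end in intervals)

    def count(self, point):
        started = bisect_right(self._starts, point)  # intervals with start <= point
        finished = bisect_left(self._ends, point)    # intervals with end < point
        return started - finished
```

Construction costs O(n log n) for the two sorts, and each query costs O(log n).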
coding Easy Verified Question #6

6. [CodeSignal] Movie Group Ranker


Category: Array coding problem
You are building a movie recommendation system. Given a source movie a user liked, you receive: - An array scores where scores[i] is the...
Input: Array
Output: Integer
coding Easy Verified Question #7

7. [CodeSignal] One-Hot Encoder


Category: Matrix coding problem
Given an integer array arr, return its one-hot encoded matrix as a 2D array. In a one-hot encoding: - Each row represents one element from arr. -...
Input: Matrix (2D array)
Output: Computed result
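The column convention is cut off in the statement above, so this sketch assumes one column per distinct value, ordered by ascending value. The function name `one_hot_encode` is hypothetical.

```python
def one_hot_encode(arr):
    """Return a one-hot matrix with one row per element of arr.

    Assumption: columns correspond to the distinct values of arr in
    ascending order (the original statement is truncated on this point).
    """
    categories = sorted(set(arr))
    column = {value: index for index, value in enumerate(categories)}

    matrix = []
    for value in arr:
        row = [0] * len(categories)
        row[column[value]] = 1  # exactly one 1 per row
        matrix.append(row)
    return matrix
```

For example, `one_hot_encode([3, 1, 3, 2])` yields three columns for the values 1, 2, and 3.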
coding Medium Verified Question #8

8. Event Rate Limiter


Category: String coding problem
Design a rate-limited event logger for a streaming system. Events arrive in non-decreasing timestamp order. The system must suppress an event name if...
Input: String
Output: Printed output
coding Medium Verified Question #9

9. Viewing History Friends


Category: Algorithm coding problem
A streaming platform groups customers together based on shared viewing habits. You receive: - customerIds - a list of distinct customer IDs -...
Input: List
Output: Array
coding Hard Verified Question #10

10. Weight-Based Cache


Category: String coding problem
Input: List
Output: Computed result
ml design Senior ml pipeline #1

1. Design a Streaming Feature Aggregator for Real-time Analytics

Netflix requires a robust component to aggregate viewer metrics for real-time analysis, allowing teams to make data-driven decisions during live events. Create a class that accumulates metrics and provides statistical insights on viewer engagement.
- Class signature: class StreamingFeatureAggregator:
- def add_view(self, view_time: float) -> None: adds a new viewer's view time.
- def get_average(self) -> float: returns the current average view time.
- def get_median(self) -> float: returns the current median view time.
- def get_total(self) -> float: returns the total accumulated view time.
Example 1:
Initialize an aggregator and add several view times, then retrieve statistics.
Example 2:
Handle concurrent view updates efficiently to ensure accuracy.
Constraints:
- The number of view times can reach up to 1 million, with each view time being a positive float.
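One possible shape for this class, offered as a sketch rather than a reference solution: a running total and count give the average and total in O(1), two heaps keep the median available in O(1) with O(log n) inserts, and a lock serializes concurrent updates. Returning 0.0 when no view times have been added is an assumption, since the problem does not specify that case.

```python
import heapq
import threading


class StreamingFeatureAggregator:
    def __init__(self):
        self._low = []    # max-heap of the lower half (values stored negated)
        self._high = []   # min-heap of the upper half
        self._total = 0.0
        self._count = 0
        self._lock = threading.Lock()  # guards all state for concurrent callers

    def add_view(self, view_time: float) -> None:
        with self._lock:
            # Route the value through the lower half, then rebalance so that
            # len(low) == len(high) or len(low) == len(high) + 1.
            heapq.heappush(self._low, -view_time)
            heapq.heappush(self._high, -heapq.heappop(self._low))
            if len(self._high) > len(self._low):
                heapq.heappush(self._low, -heapq.heappop(self._high))
            self._total += view_time
            self._count += 1

    def get_average(self) -> float:
        with self._lock:
            # Assumption: report 0.0 before any view time arrives.
            return self._total / self._count if self._count else 0.0

    def get_median(self) -> float:
        with self._lock:
            if not self._count:
                return 0.0  # assumption, as above
            if len(self._low) > len(self._high):
                return -self._low[0]
            return (-self._low[0] + self._high[0]) / 2

    def get_total(self) -> float:
        with self._lock:
            return self._total
```

At the stated limit of one million view times, both heaps together hold every value, so memory grows linearly with the input.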
ml design Senior tree #2

2. [OA] Tree Traversal — Analyze streaming viewership data

As Netflix plans out its content strategy, we need to analyze viewership data represented in a tree-structured format. Each node represents a show, with children representing episodes. We want to determine the average viewership for a given show and all its episodes combined.
Problem statement: Given a tree node with the properties show_id, viewership, and children (representing episodes), implement a function that calculates the average viewership for the show and its episodes combined.
- Method Signature: def average_viewership(root: TreeNode) -> float: — returns the average viewership.
Example 1:
Input: root = TreeNode(1, 100, [TreeNode(2, 50), TreeNode(3, 150)])
Output: 100.0
Explanation: Average is (100 + 50 + 150) / 3 = 100.0
Example 2:
Input: root = TreeNode(1, 200, [TreeNode(4, 100)])
Output: 150.0
Explanation: Average is (200 + 100) / 2 = 150.0
Constraints:
- Each node's show_id satisfies 1 ≤ show_id ≤ 10^6.
- The number of nodes in the tree can be between 1 and 1000.
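A minimal sketch of one way to solve this, using an iterative depth-first traversal that sums viewership and counts nodes. The `TreeNode` constructor shown here is an assumption inferred from the examples (positional show_id, viewership, and an optional children list).

```python
from typing import List, Optional


class TreeNode:
    # Assumed constructor, inferred from the examples in the problem.
    def __init__(self, show_id: int, viewership: float,
                 children: Optional[List["TreeNode"]] = None):
        self.show_id = show_id
        self.viewership = viewership
        self.children = children or []


def average_viewership(root: TreeNode) -> float:
    """Average viewership over the root and every node beneath it."""
    total = 0.0
    count = 0
    stack = [root]  # explicit stack avoids recursion-depth limits
    while stack:
        node = stack.pop()
        total += node.viewership
        count += 1
        stack.extend(node.children)
    return total / count  # count >= 1 because the tree has at least one node
```

Called on the tree from Example 1, `average_viewership(TreeNode(1, 100, [TreeNode(2, 50), TreeNode(3, 150)]))` computes (100 + 50 + 150) / 3.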
coding Medium database #3

3. [OA] Database Aggregation — Calculate average movie ratings across multiple genres

Netflix needs an efficient SQL query to derive average movie ratings from a large dataset spanning various genres for personalized recommendations. This should support querying by specific genres and timeframes.
Problem statement: Given a movies table with id, title, rating, and genre, write a SQL query that returns the average rating for each genre within the range from from_date to to_date.
Example 1:
Input: from_date = '2022-01-01', to_date = '2022-12-31'
Output:
genre | average_rating
Action | 8.5
Comedy | 7.7
Example 2:
Input: from_date = '2021-05-01', to_date = '2021-10-01'
Output:
genre | average_rating
Drama | 8.0
Documentary | 7.0
Constraints:
- Movie records are relevant for the last 5 years.
- Each movie belongs to at least one genre.
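The query can be sketched and exercised with Python's built-in sqlite3 module. The listed columns do not include a date, so the `release_date` column below is a hypothetical addition, as are the sample rows and the helper name `average_rating_by_genre`. The sketch also assumes one row per movie and genre pair, so a movie in several genres would appear once per genre.

```python
import sqlite3

# release_date is a hypothetical column; the problem lists only
# id, title, rating, and genre.
QUERY = """
SELECT genre, AVG(rating) AS average_rating
FROM movies
WHERE release_date BETWEEN :from_date AND :to_date
GROUP BY genre
ORDER BY genre
"""


def average_rating_by_genre(conn, from_date, to_date):
    """Return (genre, average_rating) rows for the inclusive date range."""
    params = {"from_date": from_date, "to_date": to_date}
    return conn.execute(QUERY, params).fetchall()


# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies ("
    "id INTEGER PRIMARY KEY, title TEXT, rating REAL, "
    "genre TEXT, release_date TEXT)"
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Movie A", 8.0, "Action", "2022-03-01"),
        (2, "Movie B", 9.0, "Action", "2022-07-15"),
        (3, "Movie C", 7.0, "Comedy", "2022-11-30"),
        (4, "Movie D", 6.0, "Drama", "2021-06-01"),  # outside the 2022 range
    ],
)

rows = average_rating_by_genre(conn, "2022-01-01", "2022-12-31")
```

Dates are stored as ISO 8601 text (YYYY-MM-DD), which compares correctly as strings, and the named parameters keep the date range out of the SQL text itself.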
coding Hard sliding window #4

4. [OA] Sliding Window — Optimize our recommendation system for binge-watching

Netflix needs an efficient algorithm to optimize recommendations by analyzing user watch times across multiple shows. The goal is to find the longest sequence of shows watched in a single binge-watching session whose total view time is at least k.
Problem statement: Given an integer array viewTimes, representing the view time of each show watched in order, and an integer k, return the length of the longest subarray where the sum of the view times is at least k. Utilize a sliding window approach to achieve optimal performance.
- Method Signature: def longest_binge_watch(viewTimes: List[int], k: int) -> int: — returns the length of the longest subarray.
Example 1:
Input: viewTimes = [1, 2, 3, 4, 5], k = 9
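Output: 5
Explanation: Every view time is positive and the full array sums to 15, which is at least 9, so the longest qualifying sequence is the entire array [1, 2, 3, 4, 5].
Example 2:
Input: viewTimes = [2, 1, 5, 2, 3, 2], k = 7
Output: 6
Explanation: The full array sums to 15, which is at least 7, so the longest qualifying sequence is the entire array [2, 1, 5, 2, 3, 2].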
Constraints:
- 1 <= viewTimes.length <= 10^5
- 1 <= viewTimes[i] <= 10^4
- 1 <= k <= 10^6
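Taken literally, the statement asks for the longest contiguous subarray whose sum is at least k. Because the constraints make every view time positive, extending a subarray never lowers its sum, so the answer is the full length when the total reaches k and 0 otherwise. The sketch below therefore swaps the sliding window for a prefix-sum pass with a monotonic stack. It runs in O(n) and, as an assumption beyond the stated constraints, also stays correct if zero or negative values appear.

```python
from typing import List


def longest_binge_watch(viewTimes: List[int], k: int) -> int:
    """Length of the longest contiguous subarray with sum >= k (0 if none).

    Prefix sums plus a monotonic stack, not a plain sliding window.
    Assumes k >= 1, as in the stated constraints.
    """
    prefix = [0]
    for t in viewTimes:
        prefix.append(prefix[-1] + t)

    # Candidate left ends: indices whose prefix sum is a new strict minimum.
    candidates = []
    for i, p in enumerate(prefix):
        if not candidates or p < prefix[candidates[-1]]:
            candidates.append(i)

    best = 0
    # Scan right ends from the far right; once a left end is matched, no
    # later (smaller) right end can produce a longer window for it.
    for j in range(len(prefix) - 1, -1, -1):
        while candidates and prefix[j] - prefix[candidates[-1]] >= k:
            best = max(best, j - candidates.pop())
    return best
```

The subarray spans indices i to j - 1 of viewTimes when the prefix-sum indices are i and j, so `j - i` is its length.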
