36 practice questions for Netflix Data Scientist interviews
Netflix data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.
Category: String coding problem
You are given a list of label groups and a list of required labels. Each group is a list of strings. A group is considered valid if it contains every...
Input: Array of strings
Output: Array
coding | Medium | Verified Question | #2
2. Region Grid Coloring
Category: Grid/matrix coding problem
You are given an M x N grid of security zones. Each cell contains one of the following values:
- 1 -- the zone is cleared
- 0 -- the zone...
Input: 2D grid
Output: Computed result
coding | Medium | Verified Question | #3
3. Parallel Task Batching
Category: Graph coding problem
A pipeline must execute a set of tasks with dependency constraints. Each dependency [A, B] means task A must complete before task B can start....
Input: Graph (nodes and edges)
Output: Computed result
coding | Medium | Verified Question | #4
4. Maximum Interval Overlap
Category: Interval-based coding problem
You are given a list of closed intervals on the number line, where each interval [start, end] includes both endpoints. Find the maximum number of...
Input: List
Output: Integer
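The statement above is cut off, so the following is only a sketch under an assumption: that the task asks for the largest number of intervals covering any single point. The function name `max_overlap` is hypothetical. It uses a sweep over sorted endpoints, processing an opening endpoint before a closing endpoint at the same coordinate so that closed intervals touching at one point count as overlapping.

```python
from typing import List, Tuple


def max_overlap(intervals: List[Tuple[int, int]]) -> int:
    """Largest number of closed intervals sharing a common point (assumed task)."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = interval opens
        events.append((end, 1))    # 1 = interval closes
    # Tuples sort by coordinate, then by kind, so at equal coordinates
    # openings (0) are handled before closings (1): endpoints are inclusive.
    events.sort()

    best = 0
    active = 0
    for _, kind in events:
        if kind == 0:
            active += 1
            best = max(best, active)
        else:
            active -= 1
    return best
```

Sorting dominates the cost, so this runs in O(n log n) time for n intervals.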
coding | Hard | Verified Question | #5
5. Interval Coverage Counter
Category: Interval-based coding problem
Given a list of closed intervals on the integer number line, build a data structure that efficiently answers point-coverage queries. A closed...
Input: List
Output: Computed result
coding | Easy | Verified Question | #6
6. [CodeSignal] Movie Group Ranker
Category: Array coding problem
You are building a movie recommendation system. Given a source movie a user liked, you receive:
- An array scores where scores[i] is the...
Input: Array
Output: Integer
coding | Easy | Verified Question | #7
7. [CodeSignal] One-Hot Encoder
Category: Matrix coding problem
Given an integer array arr, return its one-hot encoded matrix as a 2D array. In a one-hot encoding:
- Each row represents one element from arr.
- ...
Input: Matrix (2D array)
Output: Computed result
coding | Medium | Verified Question | #8
8. Event Rate Limiter
Category: String coding problem
Design a rate-limited event logger for a streaming system. Events arrive in non-decreasing timestamp order. The system must suppress an event name if...
Input: String
Output: Printed output
coding | Medium | Verified Question | #9
9. Viewing History Friends
Category: Algorithm coding problem
A streaming platform groups customers together based on shared viewing habits. You receive:
- customerIds - a list of distinct customer IDs
- ...
Input: List
Output: Array
coding | Hard | Verified Question | #10
10. Weight-Based Cache
Category: String coding problem
Input: List
Output: Computed result
ml design | Senior | ml pipeline | #1
1. Design a Streaming Feature Aggregator for Real-time Analytics
Netflix requires a robust component to aggregate viewer metrics for real-time analysis, allowing teams to make data-driven decisions during live events. Create a class that accumulates metrics and provides statistical insights on viewer engagement.
Method Signature: class StreamingFeatureAggregator:
- def add_view(self, view_time: float) -- Add a new viewer's view time.
- def get_average(self) -> float -- Get the current average view time.
- def get_median(self) -> float -- Get the current median view time.
- def get_total(self) -> float -- Get the total accumulated view time.
Example 1: Initialize an aggregator and add several view times, then retrieve statistics.
Example 2: Handle concurrent view updates efficiently to ensure accuracy.
Constraints:
- The number of view times can reach up to 1 million, with each view time being a positive float.
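One possible shape for this class, offered as a sketch and not a reference solution: a running sum and count give the total and average in O(1), and two heaps keep the median available after each O(log n) insert. The `threading.Lock` is an assumption about what "concurrent view updates" means, and raising `ValueError` on an empty aggregator is also an assumption, since the problem does not specify either.

```python
import heapq
import threading
from typing import List


class StreamingFeatureAggregator:
    def __init__(self) -> None:
        self._low: List[float] = []   # max-heap of the lower half (values negated)
        self._high: List[float] = []  # min-heap of the upper half
        self._total = 0.0
        self._count = 0
        self._lock = threading.Lock()  # assumption: callers may be threads

    def add_view(self, view_time: float) -> None:
        with self._lock:
            # Push to the lower half, then move its largest value up so that
            # every value in _low is <= every value in _high.
            heapq.heappush(self._low, -view_time)
            heapq.heappush(self._high, -heapq.heappop(self._low))
            # Rebalance so _low holds the same number of items or one more.
            if len(self._high) > len(self._low):
                heapq.heappush(self._low, -heapq.heappop(self._high))
            self._total += view_time
            self._count += 1

    def get_average(self) -> float:
        with self._lock:
            if self._count == 0:
                raise ValueError("no view times recorded")  # assumed behaviour
            return self._total / self._count

    def get_median(self) -> float:
        with self._lock:
            if self._count == 0:
                raise ValueError("no view times recorded")  # assumed behaviour
            if len(self._low) > len(self._high):
                return -self._low[0]
            return (-self._low[0] + self._high[0]) / 2

    def get_total(self) -> float:
        with self._lock:
            return self._total
```

With up to 1 million view times, the heaps store every value, so memory grows linearly. An approximate quantile sketch would be the usual trade-off if that becomes a concern.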
ml design | Senior | tree | #2
2. [OA] Tree Traversal — Analyze streaming viewership data
As Netflix plans out its content strategy, we need to analyze viewership data represented in a tree-structured format. Each node represents a show, with children representing episodes. We want to determine the average viewership for a given show and all its episodes combined.
Problem statement: Given a tree node with the properties show_id, viewership, and children (representing episodes), implement a function that calculates the average viewership for the show and its episodes combined.
Method Signature: def average_viewership(root: TreeNode) -> float -- returns the average viewership.
Example 1:
Input: root = TreeNode(1, 100, [TreeNode(2, 50), TreeNode(3, 150)])
Output: 100.0
Explanation: Average is (100 + 50 + 150) / 3 = 100.0
Example 2:
Input: root = TreeNode(1, 200, [TreeNode(4, 100)])
Output: 150.0
Explanation: Average is (200 + 100) / 2 = 150.0
Constraints:
- Each node's show_id satisfies 1 ≤ show_id ≤ 10^6.
- The number of nodes in the tree can be between 1 and 1000.
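A minimal sketch of one way to approach this. The `TreeNode` definition below is an assumption inferred from the constructor calls in the examples, since the problem does not give the class itself. The traversal is iterative, so it visits every descendant without relying on recursion depth.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    # Assumed layout, matching TreeNode(show_id, viewership, children) in the examples.
    show_id: int
    viewership: float
    children: List["TreeNode"] = field(default_factory=list)


def average_viewership(root: TreeNode) -> float:
    total = 0.0
    count = 0
    stack = [root]  # explicit stack for a depth-first walk
    while stack:
        node = stack.pop()
        total += node.viewership
        count += 1
        stack.extend(node.children)
    # The constraints guarantee at least one node, so count is never zero.
    return total / count
```

Applied to the two examples above, `average_viewership` returns 100.0 and 150.0.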
coding | Medium | database | #3
3. [OA] Database Aggregation — Calculate average movie ratings across multiple genres
Netflix needs an efficient SQL query to derive average movie ratings from a large dataset spanning various genres for personalized recommendations. The query should support filtering by timeframe.
Problem statement: Given a movies table with id, title, rating, and genre, write a SQL query to return the average rating for each genre within the date range from from_date to to_date.
Example 1:
Input: from_date = '2022-01-01', to_date = '2022-12-31'
Output:
genre | average_rating
Action | 8.5
Comedy | 7.7
Example 2:
Input: from_date = '2021-05-01', to_date = '2021-10-01'
Output:
genre | average_rating
Drama | 8.0
Documentary | 7.0
Constraints:
- Movie records are relevant for the last 5 years.
- Each movie belongs to at least one genre.
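The listed columns (id, title, rating, genre) include no date, so any query has to assume one. The sketch below assumes a hypothetical `release_date` column holding ISO-8601 text and runs the query through Python's built-in `sqlite3` module so it can be tried locally. The helper names and the sample rows are illustrative only and are not part of the original problem.

```python
import sqlite3
from typing import List, Tuple

# release_date is an assumed column; the problem's schema does not name one.
QUERY = """
SELECT genre, ROUND(AVG(rating), 1) AS average_rating
FROM movies
WHERE release_date BETWEEN :from_date AND :to_date
GROUP BY genre
ORDER BY genre
"""


def average_rating_by_genre(
    conn: sqlite3.Connection, from_date: str, to_date: str
) -> List[Tuple[str, float]]:
    """Run the aggregation with named parameters (no string formatting)."""
    params = {"from_date": from_date, "to_date": to_date}
    return conn.execute(QUERY, params).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with made-up rows, for experimentation only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, "
        "rating REAL, genre TEXT, release_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO movies (title, rating, genre, release_date) VALUES (?, ?, ?, ?)",
        [
            ("A", 8.0, "Action", "2022-03-01"),
            ("B", 9.0, "Action", "2022-07-15"),
            ("C", 7.7, "Comedy", "2022-11-30"),
            ("D", 8.0, "Drama", "2021-06-10"),
        ],
    )
    return conn
```

Because the dates are stored as ISO-8601 strings, `BETWEEN` compares them correctly as text and includes both endpoints. If a movie can carry several genres, a separate movie-to-genre table would be needed in place of the single `genre` column.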
coding | Hard | sliding window | #4
4. [OA] Sliding Window — Optimize our recommendation system for binge-watching
Netflix needs an efficient algorithm to optimize recommendations by analyzing user watch times across multiple shows. The goal is to find the longest sequence of shows that users have watched in a single binge-watching session with a view time >= k.
Problem statement: Given an integer array viewTimes, representing the view time of each show watched in order, and an integer k, return the length of the longest subarray where the sum of the view times is at least k. Use a sliding window approach to achieve optimal performance.
Method Signature: def longest_binge_watch(viewTimes: List[int], k: int) -> int -- returns the length of the longest subarray.
Example 1:
Input: viewTimes = [1, 2, 3, 4, 5], k = 9
Output: 3
Explanation: The longest sequence with a total view time >= 9 is [3, 4, 5].
Example 2:
Input: viewTimes = [2, 1, 5, 2, 3, 2], k = 7
Output: 5
Explanation: The longest sequence with a total view time >= 7 is [1, 5, 2, 3].
Constraints:
- 1 <= viewTimes.length <= 10^5
- 1 <= viewTimes[i] <= 10^4
- 1 <= k <= 10^6
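Read literally, with every view time positive, the longest subarray whose sum is at least k is the whole array whenever the total reaches k, which does not match the stated outputs. The output of Example 1 does agree with a different reading, "longest subarray whose sum is at most k", which is also the variant a sliding window handles naturally. The sketch below implements that assumed reading under a hypothetical name, `longest_window_at_most`. It is not presented as the intended solution, and it does not reproduce the output stated for Example 2.

```python
from typing import List


def longest_window_at_most(view_times: List[int], k: int) -> int:
    """Length of the longest contiguous run whose sum is <= k (assumed reading).

    Relies on all values being positive, so growing the window only raises
    the sum and shrinking it only lowers the sum.
    """
    best = 0
    window_sum = 0
    left = 0
    for right, value in enumerate(view_times):
        window_sum += value
        # Shrink from the left until the window fits the budget again.
        while window_sum > k:
            window_sum -= view_times[left]
            left += 1
        best = max(best, right - left + 1)
    return best
```

Each index enters and leaves the window at most once, so the loop runs in O(n) time with O(1) extra space. On `[1, 2, 3, 4, 5]` with `k = 9` it returns 3, from the window `[2, 3, 4]`.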