36 practice questions for Netflix Data Scientist interviews
Netflix data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.
Category: String coding problem
You are given a list of label groups and a list of required labels. Each group is a list of strings. A group is considered valid if it contains every...
Input: Array of strings
Output: Array
coding | Medium | Verified Question #2
2. Region Grid Coloring
Category: Grid/matrix coding problem
You are given an M x N grid of security zones. Each cell contains one of the following values:
- 1 -- the zone is cleared
- 0 -- the zone...
Input: 2D grid
Output: Computed result
coding | Medium | Verified Question #3
3. Parallel Task Batching
Category: Graph coding problem
A pipeline must execute a set of tasks with dependency constraints. Each dependency [A, B] means task A must complete before task B can start....
Input: Graph (nodes and edges)
Output: Computed result
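The full statement is truncated above, so the following is only a minimal sketch of one plausible reading: group the tasks into batches so that every task runs after all of its prerequisites, and tasks in the same batch can run in parallel. It uses Kahn's algorithm processed level by level. The function name `batch_tasks`, its parameters, and the choice to raise on a cycle are assumptions, not part of the original question.

```python
from collections import defaultdict


def batch_tasks(tasks, dependencies):
    """Return a list of batches; each task appears after all its prerequisites.

    Hypothetical interface: `tasks` is an iterable of task names and
    `dependencies` is a list of [A, B] pairs meaning A must finish before B.
    """
    indegree = {task: 0 for task in tasks}
    dependents = defaultdict(list)
    for before, after in dependencies:
        indegree.setdefault(before, 0)
        indegree[after] = indegree.get(after, 0) + 1
        dependents[before].append(after)

    # Tasks with no unmet prerequisites form the first batch.
    ready = sorted(task for task, degree in indegree.items() if degree == 0)
    batches = []
    scheduled = 0
    while ready:
        batches.append(ready)
        scheduled += len(ready)
        next_ready = []
        for task in ready:
            for dependent in dependents[task]:
                indegree[dependent] -= 1
                if indegree[dependent] == 0:
                    next_ready.append(dependent)
        ready = sorted(next_ready)  # sorted only to make the output deterministic

    if scheduled != len(indegree):
        raise ValueError("dependency cycle detected")
    return batches


print(batch_tasks(["A", "B", "C", "D"],
                  [["A", "B"], ["A", "C"], ["B", "D"], ["C", "D"]]))
# [['A'], ['B', 'C'], ['D']]
```

The number of batches equals the length of the longest dependency chain, which is the minimum number of sequential rounds under this reading.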
coding | Medium | Verified Question #4
4. Maximum Interval Overlap
Category: Interval-based coding problem
You are given a list of closed intervals on the number line, where each interval [start, end] includes both endpoints. Find the maximum number of...
Input: List
Output: Integer
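The statement is cut off, so this sketch assumes the target is the maximum number of intervals that cover any single point. It uses a sweep over sorted endpoints; because the intervals are closed, a start at position x is processed before an end at x, so intervals that merely touch count as overlapping. The name `max_overlap` is hypothetical.

```python
def max_overlap(intervals):
    """Maximum number of closed intervals covering one point (assumed goal)."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = open; sorts before a close at the same point
        events.append((end, 1))    # 1 = close
    events.sort()

    current = best = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


print(max_overlap([[1, 3], [3, 5]]))          # 2, both intervals contain the point 3
print(max_overlap([[1, 4], [2, 5], [3, 6]]))  # 3
```

Sorting dominates the cost, so the sketch runs in O(n log n) time for n intervals.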
coding | Hard | Verified Question #5
5. Interval Coverage Counter
Category: Interval-based coding problem
Given a list of closed intervals on the integer number line, build a data structure that efficiently answers point-coverage queries. A closed...
Input: List
Output: Computed result
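One possible structure for point-coverage queries, assuming a query asks how many intervals contain a given point: keep the start points and end points in two sorted lists and answer each query with two binary searches. The class name `CoverageCounter` and its `count` method are illustrative assumptions, since the rest of the statement is truncated.

```python
from bisect import bisect_left, bisect_right


class CoverageCounter:
    """Counts how many closed intervals [start, end] contain a query point."""

    def __init__(self, intervals):
        self._starts = sorted(start for start, _ in intervals)
        self._ends = sorted(end for _, end in intervals)

    def count(self, point):
        started = bisect_right(self._starts, point)  # intervals with start <= point
        finished = bisect_left(self._ends, point)    # intervals with end < point
        return started - finished


counter = CoverageCounter([[1, 3], [2, 5], [6, 8]])
print(counter.count(3))  # 2
print(counter.count(5))  # 1
print(counter.count(9))  # 0
```

Construction takes O(n log n) and each query O(log n). Using `bisect_left` on the end points keeps an interval counted when the query lands exactly on its right endpoint, which matches the closed-interval definition.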
coding | Easy | Verified Question #6
6. [CodeSignal] Movie Group Ranker
Category: Array coding problem
You are building a movie recommendation system. Given a source movie a user liked, you receive:
- An array scores where scores[i] is the...
Input: Array
Output: Integer
coding | Easy | Verified Question #7
7. [CodeSignal] One-Hot Encoder
Category: Matrix coding problem
Given an integer array arr, return its one-hot encoded matrix as a 2D array. In a one-hot encoding:
- Each row represents one element from arr.
- ...
Input: Matrix (2D array)
Output: Computed result
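The rule for ordering the columns is cut off in the statement. As a sketch only, the version below assumes one column per distinct value, with columns in ascending value order; the function name `one_hot` is hypothetical.

```python
def one_hot(arr):
    """One-hot encode arr; column order (ascending distinct values) is an assumption."""
    categories = sorted(set(arr))
    column_of = {value: index for index, value in enumerate(categories)}

    matrix = []
    for value in arr:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # exactly one 1 per row
        matrix.append(row)
    return matrix


print(one_hot([3, 1, 3, 2]))
# [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

If the original question orders columns by first appearance instead, only the construction of `categories` would need to change.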
coding | Medium | Verified Question #8
8. Event Rate Limiter
Category: String coding problem
Design a rate-limited event logger for a streaming system. Events arrive in non-decreasing timestamp order. The system must suppress an event name if...
Input: String
Output: Printed output
coding | Medium | Verified Question #9
9. Viewing History Friends
Category: Algorithm coding problem
A streaming platform groups customers together based on shared viewing habits. You receive:
- customerIds - a list of distinct customer IDs
- ...
Input: List
Output: Array
coding | Hard | Verified Question #10
10. Weight-Based Cache
Category: String coding problem
Input: List
Output: Computed result
coding | Medium | database #1
1. [OA] Database Aggregation — Calculate average movie ratings across multiple genres
Netflix needs an efficient SQL query to derive average movie ratings from a large dataset spanning various genres for personalized recommendations. This should support querying by specific genres and timeframes.

Problem statement: Given a movies table with id, title, rating, and genre, write a SQL query to return the average rating for each genre in a specified from_date and to_date range.

Example 1:
Input: from_date = '2022-01-01', to_date = '2022-12-31'
Output:
genre | average_rating
Action | 8.5
Comedy | 7.7

Example 2:
Input: from_date = '2021-05-01', to_date = '2021-10-01'
Output:
genre | average_rating
Drama | 8.0
Documentary | 7.0

Constraints:
- Movie records are relevant for the last 5 years.
- Each movie belongs to at least one genre.
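The listed columns (id, title, rating, genre) do not include a date, yet the query must filter by from_date and to_date. The sketch below therefore assumes an extra `release_date` column holding ISO-formatted dates, and one row per (movie, genre) pair for movies with several genres. Both are assumptions, as are the sample rows, which are invented for illustration. The query is run through Python's built-in `sqlite3` module so it can be tried without a database server.

```python
import sqlite3

# release_date is an assumed column; the problem statement does not name one.
QUERY = """
SELECT genre, ROUND(AVG(rating), 1) AS average_rating
FROM movies
WHERE release_date BETWEEN :from_date AND :to_date
GROUP BY genre
ORDER BY genre
"""


def average_rating_by_genre(conn, from_date, to_date):
    """Return (genre, average_rating) rows for movies dated within the range."""
    params = {"from_date": from_date, "to_date": to_date}
    return conn.execute(QUERY, params).fetchall()


def demo_connection():
    """In-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER, title TEXT, rating REAL, "
        "genre TEXT, release_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Movie A", 8.0, "Action", "2022-03-01"),
            (2, "Movie B", 9.0, "Action", "2022-07-15"),
            (3, "Movie C", 7.7, "Comedy", "2022-11-20"),
            (4, "Movie D", 6.0, "Drama", "2021-06-01"),
        ],
    )
    return conn


print(average_rating_by_genre(demo_connection(), "2022-01-01", "2022-12-31"))
# [('Action', 8.5), ('Comedy', 7.7)]
```

`BETWEEN` is inclusive on both ends, and comparing ISO `YYYY-MM-DD` strings gives the same order as comparing the dates themselves. On a large table, an index on the date column (plus genre) would be the usual first step toward the efficiency the prompt asks for.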
coding | Hard | sliding window #2
2. [OA] Sliding Window — Optimize our recommendation system for binge-watching
Netflix needs an efficient algorithm to optimize recommendations by analyzing user watch times across multiple shows. The goal is to find the longest sequence of shows that users have watched in a single binge-watching session with a total view time >= k.

Problem statement: Given an integer array viewTimes, representing the view time of each show watched in order, and an integer k, return the length of the longest subarray where the sum of the view times is at least k. Use a sliding window approach to achieve optimal performance.

Method signature: def longest_binge_watch(viewTimes: List[int], k: int) -> int
Returns the length of the longest subarray.

Example 1:
Input: viewTimes = [1, 2, 3, 4, 5], k = 9
Output: 3
Explanation: The longest sequence with a total view time >= 9 is [3, 4, 5].

Example 2:
Input: viewTimes = [2, 1, 5, 2, 3, 2], k = 7
Output: 5
Explanation: The longest sequence with a total view time >= 7 is [1, 5, 2, 3].

Constraints:
- 1 <= viewTimes.length <= 10^5
- 1 <= viewTimes[i] <= 10^4
- 1 <= k <= 10^6
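Read literally, the constraints make every view time at least 1, so extending a subarray never lowers its sum. If any subarray reaches k, the whole array does too, and the longest qualifying subarray is the whole array. The sketch below implements that literal reading and checks it against a brute-force reference (a helper added here for illustration). Under this reading the two example inputs give 5 and 6 rather than the listed 3 and 5, so the examples appear to assume a variant that the statement does not spell out.

```python
from typing import List


def longest_binge_watch(viewTimes: List[int], k: int) -> int:
    """Literal reading: longest subarray with sum >= k, all values positive."""
    # With positive values the full array has the largest sum of any subarray,
    # so it qualifies whenever anything does.
    return len(viewTimes) if sum(viewTimes) >= k else 0


def longest_binge_watch_bruteforce(viewTimes: List[int], k: int) -> int:
    """O(n^2) reference that checks every subarray; for small inputs only."""
    best = 0
    for left in range(len(viewTimes)):
        total = 0
        for right in range(left, len(viewTimes)):
            total += viewTimes[right]
            if total >= k:
                best = max(best, right - left + 1)
    return best


print(longest_binge_watch([1, 2, 3, 4, 5], 9))     # 5
print(longest_binge_watch([2, 1, 5, 2, 3, 2], 7))  # 6
```

A two-pointer sliding window becomes necessary for related variants, such as the shortest subarray with sum at least k or the longest subarray with sum at most k. It is worth confirming with the interviewer which variant is intended.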