43 practice questions for LinkedIn Data Scientist interviews
LinkedIn data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.
Category: String coding problem
Configuration files at LinkedIn are written in JSON, YAML, and HOCON formats. Malformed config files can bring down multiple services, so validators...
Input: String
Output: Printed output
coding | Medium | Verified | Question #2
2. Words From Phone Number
Category: String coding problem
A standard phone keypad maps digits to letters as follows: 2 -> a, b, c; 3 -> d, e, f; 4 -> g, h, i; 5 -> j, k, l; 6 -> m, n, o; 7 -> p, q, r, s; 8 ->...
Input: List
Output: Array
coding | Medium | Verified | Question #3
3. Circular Signal Window
Category: Array coding problem
You are given a circular array signal of 0s and 1s representing antenna readings logged in sequence, where 1 means good signal and 0 means...
Input: Array
Output: Integer
coding | Easy | Verified | Question #4
4. Active Sprint Filter
Category: Graph coding problem
A project tracking system logs team activity throughout the workday. Each log entry has the format "teamId action timestamp", where action is...
Input: Graph (nodes and edges)
Output: Printed output
coding | Medium | Verified | Question #5
5. Dependency Task Executor
Category: Graph coding problem
A build system manages pipeline steps where each step may depend on other steps completing first. Implement the BuildPipeline class:...
Input: Graph (nodes and edges)
Output: Computed result
coding | Medium | Verified | Question #6
6. Daily Branch Pruning
Category: Tree coding problem
A file system manages a directory tree. Each day, all leaf directories (those with no child directories) are simultaneously removed. Directories that...
Input: Array
Output: Array
coding | Hard | Verified | Question #7
7. [OA] Minimum Weight Ceiling Path
Category: Graph coding problem
A network topology connects n servers labeled 1 to n. Each connection is a bidirectional link with a bandwidth cost. A network engineer needs...
Input: Graph (nodes and edges)
Output: Integer
coding | Hard | Verified | Question #8
8. Priority Cache System
Category: String coding problem
A CDN (Content Delivery Network) maintains a fixed-capacity cache of web content. Each content item has an associated priority score. When the cache...
Input: String
Output: Integer
coding | Medium | Verified | Question #9
9. Distribution Center Placement
Category: Array coding problem
A logistics company is expanding its distribution network along a single highway. You are given an array of integers locations representing the...
Input: Array of integers
Output: Computed result
coding | Medium | Verified | Question #10
10. Manual String Substitution
Category: String coding problem
A template engine needs to substitute all occurrences of a pattern in a template string with a replacement string, without using any built-in...
Input: String
Output: Printed output
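A minimal sketch of the substitution idea described above. The problem statement is cut off, so this assumes the restriction targets built-in replace/find helpers; the function name `substitute` and its parameter names are hypothetical, and the result is returned rather than printed.

```python
def substitute(template: str, pattern: str, replacement: str) -> str:
    """Replace every non-overlapping occurrence of pattern, scanning left to right.

    Assumption: str.replace / str.find are off limits, so matching is done
    character by character. An empty pattern leaves the template unchanged.
    """
    if not pattern:
        return template

    pieces = []
    n, m = len(template), len(pattern)
    i = 0
    while i < n:
        # Count how many leading characters of pattern match at position i.
        j = 0
        while j < m and i + j < n and template[i + j] == pattern[j]:
            j += 1
        if j == m:
            # Full match: emit the replacement and skip past the pattern.
            pieces.append(replacement)
            i += m
        else:
            # No match here: copy one character and move on.
            pieces.append(template[i])
            i += 1
    return "".join(pieces)


print(substitute("Hello {name}, bye {name}!", "{name}", "Ada"))
```

This naive scan costs O(n * m) in the worst case; a KMP-style matcher would bring it to O(n + m) if the interviewer asks for a tighter bound.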
coding | Hard | Verified | Question #11
11. Combine N-ary Trees
Category: Tree coding problem
You are given the roots of two N-ary organization charts, each representing a hierarchical department structure. Every node has an integer...
Input: List
Output: Computed result
coding | Medium | Verified | Question #12
12. Closest Value Pair
Category: Array coding problem
An inventory system has two sorted product catalogs A and B. Each value in the catalog represents a product size. Find a pair [a, b] where a...
Input: Array
Output: Computed result
coding | Medium | Verified | Question #13
13. Digit Replacement Maximizer
Category: String coding problem
A numeric optimization system performs exactly k substitution operations on a number string s. In each operation, choose any digit in s that is...
Input: String
Output: Computed result
ml design | Senior | ml pipeline | #1
1. [OA] Streaming Feature Aggregator — Design a System for Real-time Data on LinkedIn
LinkedIn's data pipelines require streaming feature aggregation for real-time analytics to improve user engagement and provide timely insights. Design a system that aggregates features from user activities as they occur.

Class Signature:
    class FeatureAggregator:
        def __init__(self, time_window: int):
            pass
        def add(self, user_id: int, features: Dict[str, Any]) -> None:
            pass
        def get_aggregated_features(self, user_id: int) -> Dict[str, Any]:
            pass

Example 1:
    Input:
        aggregator = FeatureAggregator(60)
        aggregator.add(1, {'likes': 2, 'comments': 3})
        aggregator.add(1, {'likes': 1})
        aggregator.get_aggregated_features(1)
    Output: {'likes': 3, 'comments': 3}

Example 2:
    Input:
        aggregator.add(1, {'comments': 1})
        aggregator.get_aggregated_features(1)
    Output: {'likes': 3, 'comments': 4}

Constraints:
- 1 <= time_window <= 600
- User IDs are non-negative integers, feature keys are strings, and feature values are integers.
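One possible shape for the aggregator, offered as a sketch rather than a reference solution. It assumes `time_window` is measured in seconds, that aggregation means summing values per feature key, and that events older than the window are dropped. The `clock` parameter is a hypothetical addition (not part of the stated signature) that makes expiry testable without sleeping.

```python
import time
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Tuple


class FeatureAggregator:
    def __init__(self, time_window: int,
                 clock: Callable[[], float] = time.monotonic):
        self.time_window = time_window
        self._clock = clock  # hypothetical injection point for tests
        # Per-user queue of (arrival_time, features), oldest first.
        self._events: Dict[int, Deque[Tuple[float, Dict[str, Any]]]] = defaultdict(deque)

    def _evict(self, user_id: int, now: float) -> None:
        """Drop this user's events that have aged out of the window."""
        events = self._events.get(user_id)
        while events and now - events[0][0] >= self.time_window:
            events.popleft()

    def add(self, user_id: int, features: Dict[str, Any]) -> None:
        now = self._clock()
        self._evict(user_id, now)
        self._events[user_id].append((now, dict(features)))

    def get_aggregated_features(self, user_id: int) -> Dict[str, Any]:
        self._evict(user_id, self._clock())
        totals: Dict[str, Any] = {}
        # Sum each feature over the events still inside the window.
        for _, features in self._events.get(user_id, ()):
            for key, value in features.items():
                totals[key] = totals.get(key, 0) + value
        return totals


aggregator = FeatureAggregator(60)
aggregator.add(1, {'likes': 2, 'comments': 3})
aggregator.add(1, {'likes': 1})
print(aggregator.get_aggregated_features(1))
```

Recomputing the sums on every read keeps the sketch short; maintaining running totals that are decremented on eviction would make reads O(number of keys) at the cost of more bookkeeping.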
ml design | Senior | caching | #2
2. [OA] LRU Cache — Build an Efficient Cache for LinkedIn API Responses
LinkedIn's API serves a high volume of data requests, so it needs an efficient caching mechanism to reduce latency and improve user experience. Implement an LRU (Least Recently Used) cache to store and retrieve API responses.

Class Signature:
    class LRUCache:
        def __init__(self, capacity: int):
            pass
        def get(self, key: int) -> int:
            pass
        def put(self, key: int, value: int) -> None:
            pass

Example 1:
    Input:
        lru = LRUCache(2)
        lru.put(1, 1)
        lru.put(2, 2)
        lru.get(1)
    Output: 1
    Explanation: Returns 1, as it is the value for key 1. The cache is now: {1=1, 2=2}.

Example 2:
    Input:
        lru.put(3, 3)
        lru.get(2)
    Output: -1
    Explanation: Returns -1, as key 2 was the least recently used entry and was evicted when key 3 was added. The cache is now: {1=1, 3=3}.

Constraints:
- 1 <= capacity <= 3000
- 0 <= key, value <= 10^4
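A compact sketch of the cache described above, built on the standard library's `collections.OrderedDict`. Interviewers often ask for a hand-rolled hash map plus doubly linked list instead; this version only illustrates the eviction behavior with O(1) `get` and `put`.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        # Keys are kept in usage order: least recently used first.
        self._items: "OrderedDict[int, int]" = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self._items:
            return -1
        # A successful read makes the key the most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: int, value: int) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (front of the order).
            self._items.popitem(last=False)


lru = LRUCache(2)
lru.put(1, 1)
lru.put(2, 2)
print(lru.get(1))  # 1
lru.put(3, 3)      # evicts key 2, the least recently used
print(lru.get(2))  # -1
```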
coding | Hard | graph | #3
3. [OA] Dijkstra's Algorithm — Finding the Shortest Path in LinkedIn Connections
LinkedIn users are connected through various relationships, and knowing the shortest connection path between users can improve recommendations. Given a graph of users and their connections as an adjacency list, where each entry maps a user to a list of (neighbor, weight) pairs, implement Dijkstra's algorithm to find the shortest path from one user to another.

Function Signature:
    def find_shortest_path(connections: Dict[int, List[Tuple[int, int]]], start: int, end: int) -> List[int]

Example 1:
    Input: {0: [(1, 1), (2, 4)], 1: [(2, 2)], 2: [(3, 1)], 3: []}, start=0, end=3
    Output: [0, 1, 2, 3]
    Explanation: The shortest path from user 0 to user 3 is 0 -> 1 -> 2 -> 3, with a total weight of 4.

Example 2:
    Input: {0: [(1, 5), (2, 1)], 1: [(2, 2), (3, 1)], 2: [(3, 4)], 3: []}, start=0, end=3
    Output: [0, 2, 3]
    Explanation: The shortest path is 0 -> 2 -> 3, with a total weight of 5 (the alternative 0 -> 1 -> 3 has weight 6).

Constraints:
- 1 <= len(connections) <= 1000
- Each connection weight is a positive integer.
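A sketch of Dijkstra's algorithm for the signature above, using a binary heap from `heapq` and a predecessor map to rebuild the path. Two behaviors are assumptions, since the statement does not cover them: an unreachable `end` yields an empty list, and a node that appears only as a neighbor is treated as having no outgoing edges.

```python
import heapq
from typing import Dict, List, Tuple


def find_shortest_path(connections: Dict[int, List[Tuple[int, int]]],
                       start: int, end: int) -> List[int]:
    dist = {start: 0}   # best known distance to each discovered node
    prev = {}           # predecessor on the best known path
    heap = [(0, start)]

    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue    # stale heap entry; a shorter route was already found
        if node == end:
            break       # with positive weights, the first pop of end is final
        for neighbor, weight in connections.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if end not in dist:
        return []       # assumption: no path is reported as an empty list

    # Walk the predecessor links back from end to start.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path


graph = {0: [(1, 1), (2, 4)], 1: [(2, 2)], 2: [(3, 1)], 3: []}
print(find_shortest_path(graph, 0, 3))  # [0, 1, 2, 3]
```

With a binary heap the running time is O((V + E) log V), which is comfortable for the stated limit of 1000 users.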
coding | Hard | sliding window | #4
4. [OA] Sliding Window — Calculate the Most Engaged Users in a Sliding Time Frame
LinkedIn's platform requires identifying the most engaged users over various time frames for enhanced user experience and targeted marketing strategies. Given a list of user activity timestamps (in seconds) and a fixed time frame, write a function that returns the number of unique users who were active during that window.

Function Signature:
    def count_active_users(timestamps: List[int], time_frame: int) -> int

Example 1:
    Input: [1, 2, 3, 4, 5, 6, 7], time_frame=3
    Output: 5
    Explanation: The unique users active between timestamps 1 and 3 are 1, 2, 3.

Example 2:
    Input: [1, 2, 2, 3, 5, 7, 9], time_frame=4
    Output: 5
    Explanation: The unique users between timestamps 2 and 5 are 1, 2, 3, 5.

Constraints:
- 1 <= len(timestamps) <= 100000
- 0 <= timestamps[i] <= 10^9
- time_frame is a positive integer.