40 practice questions for Amazon Data Scientist interviews
Amazon data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.
Category: Binary tree coding problem
You are given the root of a binary tree. You need to install the minimum number of cameras on the tree nodes such that every node in the tree is...
Input: Binary tree
Output: Integer
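The statement above is truncated, so the exact coverage rule is not visible. The sketch below assumes the common formulation in which a camera placed on a node monitors that node, its parent, and its immediate children; that rule, the `TreeNode` class, and the `min_cameras` name are assumptions for illustration, not part of the original question. Under that assumption, a greedy post-order traversal places a camera on the parent of any uncovered node.

```python
from typing import Optional


class TreeNode:
    """Hypothetical minimal tree node; node values are irrelevant here."""

    def __init__(self, left: "Optional[TreeNode]" = None,
                 right: "Optional[TreeNode]" = None) -> None:
        self.left = left
        self.right = right


def min_cameras(root: Optional[TreeNode]) -> int:
    # States returned by the post-order visit.
    NEEDS_COVER, HAS_CAMERA, COVERED = 0, 1, 2
    cameras = 0

    def visit(node: Optional[TreeNode]) -> int:
        nonlocal cameras
        if node is None:
            return COVERED  # missing children never need a camera
        left = visit(node.left)
        right = visit(node.right)
        if left == NEEDS_COVER or right == NEEDS_COVER:
            cameras += 1  # a child is uncovered, so this node takes a camera
            return HAS_CAMERA
        if left == HAS_CAMERA or right == HAS_CAMERA:
            return COVERED  # monitored by a child's camera
        return NEEDS_COVER  # defer to the parent

    # The root has no parent, so it needs its own camera if still uncovered.
    if visit(root) == NEEDS_COVER:
        cameras += 1
    return cameras
```

The recursion depth equals the tree height, so a very deep tree would need an iterative traversal instead.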
coding | Hard | Verified Question | #2
2. [CodeSignal] Warehouse Emergency Deliveries
Category: Array coding problem
Amazon has opened a new warehouse recently. There are no products in the warehouse currently. The warehouse is under inspection for n days. The...
Input: Array
Output: Integer
coding | Hard | Verified Question | #3
3. [CodeSignal] Permutation Sorter
Category: Combinatorics coding problem
Amazon engineers are testing a new tool, the Permutation Sorter, built to reorder sequences using limited operations. Given a permutation of...
Input: Integer(s)
Output: Integer
coding | Hard | Verified Question | #4
4. [CodeSignal] Maximum Product Rating
Category: Array coding problem
The engineers at Amazon are working on a new rating system for their products. For each product, an array customer_rating is maintained for the...
Input: Array
Output: Computed result
coding | Medium | Verified Question | #5
5. [CodeSignal] Drone Hub Travel
Category: Array coding problem
Amazon is expanding its next-generation drone delivery network, consisting of m hubs arranged in a circular ring (Hub 1 is adjacent to Hub m)....
Input: Array
Output: Computed result
coding | Medium | Verified Question | #6
6. [CodeSignal] Minimum Security Groups
Category: Array coding problem
A financial services company has requested AWS for a private deployment of its cloud network. There are n servers in the network where the security...
Input: Array
Output: Integer
coding | Medium | Verified Question | #7
7. [CodeSignal] Maximum Secure Deliveries
Category: Array coding problem
You are given an array deliveryLogs of size n, where each element represents the number of parts delivered in the i-th log. You are also given...
Input: Array
Output: Integer
coding | Medium | Verified Question | #8
8. Maximum Interval Overlap
Category: Interval-based coding problem
You are given a list of closed intervals on the number line, where each interval [start, end] includes both endpoints. Find the maximum number of...
Input: List
Output: Integer
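The statement is cut off, but the title suggests the goal is the largest number of intervals that overlap at a single point. Assuming that reading, one possible approach is a sweep over sorted endpoints. The function name `max_overlap` is hypothetical. Because the intervals are closed, an interval that starts exactly where another ends still overlaps it, so start events are ordered before end events at the same coordinate.

```python
from typing import List, Sequence, Tuple


def max_overlap(intervals: Sequence[Sequence[int]]) -> int:
    """Return the maximum number of closed intervals sharing a common point."""
    events: List[Tuple[int, int]] = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = start; sorts before an end at the same x
        events.append((end, 1))    # 1 = end
    events.sort()

    best = current = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


print(max_overlap([[1, 3], [3, 5]]))  # 2, since both intervals contain the point 3
```

Sorting dominates the cost, so the sketch runs in O(n log n) time for n intervals.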
ml design | Medium | api design | #1
1. Design a Streaming Feature Aggregator for Amazon Music
Amazon Music needs a streaming feature aggregator to process user interactions and aggregate metrics in real time. You are required to design a class in OOP style.

Problem statement: Implement a class FeatureAggregator that tracks user actions, aggregates counts, and returns features such as total plays and skips over time. The class should contain:
- recordPlay(user_id: str) -> None: Record a play action for a user.
- recordSkip(user_id: str) -> None: Record a skip action for a user.
- getMetrics() -> Dict[str, int]: Return the aggregated total plays and skips.

Example 1:
Input: recordPlay('user1')
Output: None
Explanation: A play is recorded for user 'user1'.

Example 2:
Input: getMetrics()
Output: {'plays': 1, 'skips': 0}
Explanation: The metrics show 1 play and 0 skips.

Constraints:
- 1 <= user_id.length <= 100
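A minimal in-memory sketch of the interface described above. The per-user counters are an assumption added for illustration (the statement only requires aggregated totals), and the sketch ignores concurrency, persistence, and time windows.

```python
from collections import Counter
from typing import Dict


class FeatureAggregator:
    def __init__(self) -> None:
        # Per-user counts are kept so per-user features could be added later;
        # this is an assumption, not a requirement of the problem statement.
        self._plays: Counter = Counter()
        self._skips: Counter = Counter()

    def recordPlay(self, user_id: str) -> None:
        self._plays[user_id] += 1

    def recordSkip(self, user_id: str) -> None:
        self._skips[user_id] += 1

    def getMetrics(self) -> Dict[str, int]:
        # Totals across all users, matching the example output shape.
        return {
            "plays": sum(self._plays.values()),
            "skips": sum(self._skips.values()),
        }


agg = FeatureAggregator()
agg.recordPlay("user1")
print(agg.getMetrics())  # {'plays': 1, 'skips': 0}
```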
coding | Medium | database | #2
2. [OA] SQL Window Function — Analyze customer trends for Amazon Fresh
To improve customer targeting and marketing strategies, Amazon Fresh wants to analyze shopping trends over a sliding time window.

Problem statement: Write a SQL query to find the average purchase amount of each customer over the last 7 days. Assume we have a table named purchases with the columns customer_id, purchase_date, and amount.

Example 1:
Input: SELECT ... FROM purchases ... WHERE ... (actual SQL query)
Output:
    SELECT customer_id, AVG(amount) AS avg_purchase
    FROM (
        SELECT customer_id, amount,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date DESC) AS rn
        FROM purchases
        WHERE purchase_date >= DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)
    ) sub
    WHERE rn <= 7
    GROUP BY customer_id;
Explanation: The inner query keeps purchases dated within the last 7 days and numbers each customer's rows from most recent to oldest. The outer query then averages the amounts of up to the 7 most recent rows per customer.

Constraints:
- Assume no duplicate purchases occur for a specific customer_id on purchase_date.
- purchase_date is of type DATE and has no NULL values.
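To try the date-filter part of this query locally, the sketch below runs a simplified version against an in-memory SQLite database from Python. Several things here are assumptions for illustration: the helper name `avg_purchase_last_7_days`, the column types, the sample rows, and the fixed `as_of` date used in place of CURRENT_DATE so the result is reproducible. SQLite has no DATE_SUB, so its `date(?, '-7 days')` modifier stands in for it. The sketch also omits the ROW_NUMBER step and relies on the date filter alone.

```python
import sqlite3
from typing import Iterable, List, Tuple


def avg_purchase_last_7_days(
    rows: Iterable[Tuple[str, str, float]], as_of: str
) -> List[Tuple[str, float]]:
    """rows are (customer_id, purchase_date as 'YYYY-MM-DD', amount)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases ("
            "customer_id TEXT, purchase_date TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT customer_id, AVG(amount) AS avg_purchase
            FROM purchases
            WHERE purchase_date >= date(?, '-7 days')  -- ISO dates compare as text
              AND purchase_date <= ?
            GROUP BY customer_id
            ORDER BY customer_id
            """,
            (as_of, as_of),
        )
        return cursor.fetchall()
    finally:
        conn.close()


sample = [
    ("c1", "2024-01-10", 10.0),
    ("c1", "2024-01-05", 20.0),
    ("c1", "2023-12-01", 100.0),  # outside the window, ignored
    ("c2", "2024-01-09", 7.0),
]
print(avg_purchase_last_7_days(sample, "2024-01-10"))  # [('c1', 15.0), ('c2', 7.0)]
```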
coding | Hard | dynamic programming | #3
3. [OA] Dynamic Programming — Build an Amazon Prime recommendation system
To enhance the user experience for Amazon Prime members, we need a dynamic programming solution that optimizes personalized recommendations based on user purchase history.

Problem statement: Given a list of item_ids a user has purchased and a recommendation_count, determine the maximum number of unique items that can be recommended based on previous purchases. Implement the function maxRecommendations(item_ids: List[str], recommendation_count: int) -> int, which returns that maximum.

Example 1:
Input: maxRecommendations(['A', 'B', 'C', 'A', 'E'], 3)
Output: 3
Explanation: The user can be recommended up to 3 unique items from their purchase history.

Example 2:
Input: maxRecommendations(['A', 'B', 'B', 'C'], 2)
Output: 2
Explanation: The user can be recommended up to 2 unique items from their purchase history.

Constraints:
- 1 <= item_ids.length <= 10^5
- 0 <= recommendation_count <= item_ids.length
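Although the question is tagged as dynamic programming, both examples as stated are consistent with simply capping the number of distinct purchased items at recommendation_count. The sketch below follows that reading, which is an assumption; the original statement may intend additional rules that are not shown.

```python
from typing import List


def maxRecommendations(item_ids: List[str], recommendation_count: int) -> int:
    # Assumed interpretation: the answer is the number of distinct purchased
    # items, limited by how many recommendations are allowed.
    distinct_items = len(set(item_ids))
    return min(distinct_items, recommendation_count)


print(maxRecommendations(["A", "B", "C", "A", "E"], 3))  # 3
print(maxRecommendations(["A", "B", "B", "C"], 2))       # 2
```

Building the set takes O(n) time and space, which fits the stated bound of 10^5 items.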
coding | Medium | hash map | #4
4. [OA] Sliding Window — Implement the session tracking for Amazon's Service Usage
Amazon often needs to monitor user sessions to improve personalization and service recommendations. Tracking user activity in a sliding window makes it possible to manage and analyze session data effectively.

Problem statement: Implement a SessionTracker class that maintains session usage data for users based on their active sessions within a specified time frame. The methods you need to implement are:
- start_session(user_id: str, timestamp: int) -> None: Start a session for a given user at the specified timestamp.
- end_session(user_id: str, timestamp: int) -> None: End the session for the user at the specified timestamp.
- get_active_users(current_time: int) -> List[str]: Return a list of users who have active sessions within the last 30 minutes of the current time.

Example 1:
Input: start_session('user1', 100)
Output: None
Explanation: A session is started for 'user1' at timestamp 100.

Example 2:
Input: start_session('user2', 200)
Output: None
Explanation: A session is started for 'user2' at timestamp 200.

Constraints:
- 1 <= user_id.length <= 100
- 0 <= timestamp <= 10^9
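The statement leaves several details open, so the sketch below fixes them by assumption: timestamps are treated as seconds (so the window is 1800), a user counts as active when any of their sessions overlaps the interval [current_time - window, current_time], a session without an end is treated as still open, and results are returned sorted for determinism. The `window` parameter is hypothetical.

```python
from typing import Dict, List, Optional


class SessionTracker:
    def __init__(self, window: int = 30 * 60) -> None:
        # Assumption: timestamps are in seconds, so 30 minutes = 1800.
        self._window = window
        # user_id -> list of [start, end] pairs; end is None while open.
        self._sessions: Dict[str, List[List[Optional[int]]]] = {}

    def start_session(self, user_id: str, timestamp: int) -> None:
        self._sessions.setdefault(user_id, []).append([timestamp, None])

    def end_session(self, user_id: str, timestamp: int) -> None:
        # Close the most recently opened session that is still open, if any.
        for session in reversed(self._sessions.get(user_id, [])):
            if session[1] is None:
                session[1] = timestamp
                return

    def get_active_users(self, current_time: int) -> List[str]:
        cutoff = current_time - self._window
        active = []
        for user_id, sessions in self._sessions.items():
            for start, end in sessions:
                # The session overlaps the window [cutoff, current_time].
                if start <= current_time and (end is None or end >= cutoff):
                    active.append(user_id)
                    break
        return sorted(active)


tracker = SessionTracker()
tracker.start_session("user1", 100)
tracker.start_session("user2", 200)
tracker.end_session("user1", 300)
print(tracker.get_active_users(3000))  # ['user2']
```

This version scans every stored session on each query. For a large number of users, pruning sessions that ended before the cutoff would keep the scan bounded.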