47 practice questions for Stripe Data Scientist interviews
Stripe data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.
Category: Tree coding problem
You are building a role-based access control (RBAC) system for a multi-tenant platform. The system manages user roles across a hierarchical account...
Input: List | Output: Array
2. Rate Limiter
Type: coding | Difficulty: Hard | Verified
Category: Sliding window coding problem
Design a rate limiter that tracks API requests per client and enforces limits using a sliding time window. Your system must support: - hit(key,...
Input: Given input | Output: Computed result
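One common way to implement a sliding time window is a per-key log of accepted request timestamps. The sketch below assumes a hypothetical signature `hit(key, timestamp) -> bool` (the real signature is truncated in the prompt), a fixed `limit` of requests per `window` time units, and non-decreasing timestamps per key. Rejected requests are not counted against the window, which is also an assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowRateLimiter:
    """Sliding-window-log limiter: at most `limit` accepted hits per key
    in any window of length `window` (hypothetical interface)."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # Per-key queue of accepted hit timestamps, oldest first.
        self.log: Dict[str, Deque[float]] = defaultdict(deque)

    def hit(self, key: str, timestamp: float) -> bool:
        """Record a request for `key` at `timestamp`; return True if allowed."""
        q = self.log[key]
        # Evict hits outside the window (timestamp - window, timestamp].
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(timestamp)
            return True
        return False  # rejected hits are not recorded
```

Each key keeps at most `limit` timestamps, so memory is O(keys × limit) and each `hit` runs in amortized O(1). A fixed-window counter would use less memory but allows bursts of up to twice the limit at window boundaries.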
3. Shipping Cost Calculator
Type: coding | Difficulty: Medium | Verified
Category: Algorithm coding problem
You are building a shipping cost calculator for an international e-commerce platform. The cost depends on the destination country and the product...
Input: Integer(s) | Output: Computed result
4. Transaction Fee Calculator
Type: coding | Difficulty: Medium | Verified
Category: Trie-based coding problem
You are building a fee calculation system for a payment processing platform. Given transaction data as a CSV string, calculate fees based on payment...
Input: String | Output: Computed result
5. Bitmap to Image Conversion
Type: coding | Difficulty: Medium | Verified
Category: Grid/matrix coding problem
You are designing a bitmap character rendering system. Given a lookup table mapping characters to 2D binary arrays, implement functionality to print,...
Input: 2D grid | Output: Printed output
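A minimal sketch of the rendering step, assuming a hypothetical lookup table in which every glyph is a list of equal-height rows of 0/1 bits. The function name `render`, the `#`/`.` pixel characters, and the one-space gap between glyphs are illustrative choices, not part of the original prompt.

```python
from typing import Dict, List

Glyph = List[List[int]]  # rows of 0/1 bits


def render(text: str, table: Dict[str, Glyph], on: str = "#", off: str = ".") -> List[str]:
    """Return the text as printable rows, glyphs side by side with a one-space gap."""
    if not text:
        return []
    glyphs = [table[ch] for ch in text]  # KeyError if a character has no glyph
    height = len(glyphs[0])
    if any(len(g) != height for g in glyphs):
        raise ValueError("all glyphs must have the same height")
    rows = []
    for r in range(height):
        # Map each bit in row r of every glyph to a pixel character, then join the glyph slices.
        rows.append(" ".join("".join(on if bit else off for bit in g[r]) for g in glyphs))
    return rows
```

For example, with the table `{"I": [[1], [1], [1]], "L": [[1, 0], [1, 0], [1, 1]]}`, `render("IL", table)` returns `["# #.", "# #.", "# ##"]`, which can be printed one row per line.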
6. [Onsite Integration] Bike Map
Type: coding | Difficulty: Medium | Verified
Category: Trie-based coding problem
You are building a map visualization tool that generates static maps from location data. Implement a system that reads GPS coordinates, constructs...
Input: Array | Output: Computed result
7. Email Subscriptions
Type: coding | Difficulty: Hard | Verified
Category: String coding problem
Design a subscription management system that tracks user subscriptions and sends automated emails at specific lifecycle events. Email Types: -...
Input: List | Output: Computed result
8. [Bug Squash] Mako Template Engine
Type: coding | Difficulty: Hard | Verified
Category: Tree coding problem
In this bug squash round, you will find and fix errors in a Python template library. You will receive a link to a GitHub folder containing a version...
Input: List | Output: Printed output
9. [Bug Squash] Moshi JSON Library
Type: coding | Difficulty: Hard | Verified
Category: String coding problem
In this bug squash round, you will find and fix mistakes in a Java library called Moshi. You will receive a link to a GitHub folder containing a...
Input: String | Output: Computed result
10. Data Center Load Scorer
Type: coding | Difficulty: Hard | Verified
Category: Graph coding problem
A data center operations team monitors server energy usage to optimize resource allocation. You receive a daily dataset of all incoming requests to...
Input: Graph (nodes and edges) | Output: Array
11. Content Validation Pipeline
Type: coding | Difficulty: Medium | Verified
Category: String coding problem
A platform ingests user-generated content records in a simplified CSV format. Before indexing or displaying any content, each record must pass a...
Input: Array of strings | Output: Array
12. Wallet Transaction Ledger
Type: coding | Difficulty: Hard | Verified
Category: String coding problem
A fintech platform processes streams of wallet transactions and needs to consolidate them into account summaries. Each transaction is logged as a...
Input: List | Output: Computed result
13. Employee Record Matcher
Type: coding | Difficulty: Hard | Verified
Category: Array coding problem
A data-quality team needs to detect duplicate or near-duplicate employee records in a large HR dataset. Each record is a row in a 2D string array...
Input: Array | Output: Array
14. Candidate Tech Stack Filter
Type: coding | Difficulty: Hard | Verified
Category: String coding problem
A hiring platform screens candidates by comparing their declared technology stack against a job's required skills. A candidate submits a...
Input: Array of strings | Output: Array
15. Subscriber Notification Planner
Type: coding | Difficulty: Hard | Verified
Category: Trie-based coding problem
A subscription service sends automated notifications to subscribers based on their subscription window. You are given a list of subscriber records...
Input: List | Output: Array
16. Support Ticket Dispatcher
Type: coding | Difficulty: Medium | Verified
Category: Graph coding problem
A customer support platform assigns incoming tickets to agents to keep workloads balanced. You are given a list of agent names and a list of tickets...
Input: Graph (nodes and edges) | Output: Array
17. Order Payment Reconciler
Type: coding | Difficulty: Medium | Verified
Category: String coding problem
A billing system needs to match incoming payments to open orders. Each payment arrives as a comma-separated string with three fields: a payment ID, a...
Input: List | Output: Computed result
18. Service Usage Cost Calculator
Type: coding | Difficulty: Medium | Verified
Category: Array coding problem
A cloud billing module computes the total cost for a customer's monthly usage. You are given a usage_report specifying the target region and...
Input: Array | Output: Computed result
1. [OA] Model Versioning Registry: Implement a model versioning system for Stripe ML applications
Type: ml design | Level: Senior | Tag: api design
As Stripe scales its machine learning capabilities, a versioning system that tracks model iterations is crucial for accountability and reproducibility. Design a class that manages model versions.
Problem Statement: Create a ModelRegistry class with methods to add new models, retrieve a model by version, and list all versions. Ensure models can be tagged and versioned effectively.
- addModel(model: MLModel): Adds a new model with a version.
- getModel(version: str): Retrieves a model by its version.
- listModels() -> List[str]: Returns the list of all model versions.
Example 1:
Input: registry = ModelRegistry(); registry.addModel(MLModel(name='TransactionClassifier', version='1.0.0')); model = registry.getModel('1.0.0')
Output: model.name == 'TransactionClassifier'
Constraints:
- Model names are unique strings.
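A minimal Python sketch of the registry, keeping the camelCase method names from the prompt. The `MLModel` dataclass, the choice to reject duplicate versions, raising `KeyError` for unknown versions, and the hypothetical `tagModel`/`getByTag` methods (one possible reading of the tagging requirement) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MLModel:
    name: str
    version: str


class ModelRegistry:
    def __init__(self) -> None:
        # Hash map from version string to model; dicts preserve insertion order.
        self._models: Dict[str, MLModel] = {}
        # Hypothetical: mutable tags (e.g. "production") pointing at versions.
        self._tags: Dict[str, str] = {}

    def addModel(self, model: MLModel) -> None:
        # Assumption: a registered version is immutable, so duplicates are rejected.
        if model.version in self._models:
            raise ValueError(f"version {model.version!r} already registered")
        self._models[model.version] = model

    def getModel(self, version: str) -> MLModel:
        # Raises KeyError for an unknown version.
        return self._models[version]

    def listModels(self) -> List[str]:
        # Versions in registration order.
        return list(self._models)

    def tagModel(self, tag: str, version: str) -> None:
        # Hypothetical extension: tags can be moved, versions cannot.
        if version not in self._models:
            raise KeyError(version)
        self._tags[tag] = version

    def getByTag(self, tag: str) -> MLModel:
        return self._models[self._tags[tag]]
```

Keeping versions immutable while letting tags move gives reproducibility (a version always resolves to the same model) and a simple rollback path (repoint the tag).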
2. [OA] Hash Map: Implement a feature toggle for Stripe products
Type: ml design | Difficulty: Medium | Tag: hash map
Stripe offers different features based on user subscriptions and needs dynamic control over which features are enabled. Implement a feature toggle using a hash map.
Problem Statement: Design a FeatureToggle class that manages features that can be turned on or off dynamically. The class should allow adding features and checking whether they are enabled.
- addFeature(name: str): Adds a feature to the toggle.
- enableFeature(name: str): Enables a feature.
- disableFeature(name: str): Disables a feature.
- isFeatureEnabled(name: str) -> bool: Returns True if the feature is active.
Example 1:
Input: toggle = FeatureToggle(); toggle.addFeature('paymentProcessing'); toggle.enableFeature('paymentProcessing')
Output: toggle.isFeatureEnabled('paymentProcessing') returns True
Constraints:
- 1 <= name.length <= 100
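A minimal sketch of the toggle backed by a Python dict, keeping the prompt's method names. That new features start disabled, that toggling a never-added feature raises `KeyError`, and that unknown features read as disabled are assumptions the prompt does not settle.

```python
from typing import Dict


class FeatureToggle:
    def __init__(self) -> None:
        # Hash map from feature name to its on/off state.
        self._features: Dict[str, bool] = {}

    def addFeature(self, name: str) -> None:
        # Assumption: new features start disabled; re-adding keeps the current state.
        self._features.setdefault(name, False)

    def enableFeature(self, name: str) -> None:
        self._set(name, True)

    def disableFeature(self, name: str) -> None:
        self._set(name, False)

    def isFeatureEnabled(self, name: str) -> bool:
        # Unknown features are treated as disabled.
        return self._features.get(name, False)

    def _set(self, name: str, state: bool) -> None:
        # Assumption: toggling a feature that was never added is an error.
        if name not in self._features:
            raise KeyError(f"unknown feature {name!r}")
        self._features[name] = state
```

Every operation is an average-case O(1) dict lookup or write. Defaulting unknown features to disabled is the fail-safe choice for gating payment functionality.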
3. [OA] Sliding Window: Compute the moving average for Stripe transactions
Type: coding | Difficulty: Medium | Tag: sliding window
In a large-scale payment processing system, Stripe needs to compute real-time moving averages of transaction amounts for analytics and reporting. You need to implement an efficient algorithm.
Problem Statement: Given a stream of integers transaction_amounts and an integer k, return a list of floats in which each element is the average of a window of k consecutive transaction amounts.
Example 1:
Input: transaction_amounts = [1, 10, 3, 5, 2], k = 3
Output: [4.66667, 6.0, 3.33333]
Explanation: The windows are [1, 10, 3], [10, 3, 5], and [3, 5, 2], whose averages are 14/3, 18/3, and 10/3.
Constraints:
- 0 < k <= transaction_amounts.length <= 10^5
- 0 <= transaction_amounts[i] <= 10^4
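Recomputing every window from scratch costs O(n·k). A running sum brings this down to O(n): when the window slides right by one, add the amount entering it and subtract the one leaving it. A sketch, with the function name `moving_averages` chosen for illustration:

```python
from typing import List


def moving_averages(transaction_amounts: List[int], k: int) -> List[float]:
    """Average of every window of k consecutive amounts, in O(n) time."""
    window_sum = sum(transaction_amounts[:k])  # sum of the first window
    result = [window_sum / k]
    for i in range(k, len(transaction_amounts)):
        # Slide right by one: add the entering amount, drop the leaving one.
        window_sum += transaction_amounts[i] - transaction_amounts[i - k]
        result.append(window_sum / k)
    return result
```

Because the amounts are integers, the running sum stays exact, so the O(1) update does not accumulate floating-point drift. For a true stream, the same idea works with a `collections.deque` holding the last k amounts.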
4. [OA] Time Series Forecasting: Build a fraud detection estimator for Stripe
Type: coding | Difficulty: Hard | Tag: ml pipeline
Stripe needs to predict potential fraudulent activity using historical transaction data. You need to create a machine learning model that forecasts transaction anomalies based on past trends.
Problem Statement: Given a time series of transaction_amounts, build a forecasting model that flags potential anomalies in the next n transactions. Evaluate the model with precision and recall.
Example 1:
Input: transaction_amounts = [100, 110, 115, 90, 200, 85], n = 2
Output: [False, True]
Explanation: The model detects a spike in transaction amount after a series of normal transactions.
Constraints:
- 1 <= transaction_amounts.length <= 10^5
- 0 <= transaction_amounts[i] <= 10^4
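The prompt leaves the model open. As a hedged starting point, the sketch below swaps in a simple statistical baseline instead of a trained model: it flags a transaction whose z-score against the trailing `window` amounts exceeds `threshold`, then scores the flags with precision and recall. The function names and parameters are hypothetical, the baseline scores each transaction as it arrives rather than forecasting ahead, and it is not tuned to reproduce the example output.

```python
import math
from typing import List, Sequence, Tuple


def zscore_flags(amounts: Sequence[float], window: int, threshold: float = 3.0) -> List[bool]:
    """Flag amounts that deviate from the trailing `window` mean by more than `threshold` std devs."""
    flags = []
    for i, x in enumerate(amounts):
        if i < window:
            flags.append(False)  # not enough history to score yet
            continue
        hist = amounts[i - window:i]
        mean = sum(hist) / window
        std = math.sqrt(sum((h - mean) ** 2 for h in hist) / window)
        if std == 0:
            flags.append(x != mean)  # flat history: any change counts as anomalous
        else:
            flags.append(abs(x - mean) / std > threshold)
    return flags


def precision_recall(pred: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With `window=4` and `threshold=2.0`, the amounts in Example 1 are flagged only at the 200 spike (index 4). A baseline like this is useful as a floor: a learned model should beat its precision and recall on labeled data before it is worth the added complexity.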