
Stripe Data Scientist Interview Questions

47 practice questions for Stripe Data Scientist interviews

Stripe data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.

coding Hard Verified Question #1

1. Filter Roles


Category: Tree coding problem
You are building a role-based access control (RBAC) system for a multi-tenant platform. The system manages user roles across a hierarchical account...
Input: List
Output: Array
coding Hard Verified Question #2

2. Rate Limiter


Category: Sliding window coding problem
Design a rate limiter that tracks API requests per client and enforces limits using a sliding time window. Your system must support: - hit(key,...
Input: Given input
Output: Computed result
coding Medium Verified Question #3

3. Shipping Cost Calculator


Category: Algorithm coding problem
You are building a shipping cost calculator for an international e-commerce platform. The cost depends on the destination country and the product...
Input: Integer(s)
Output: Computed result
coding Medium Verified Question #4

4. Transaction Fee Calculator


Category: Trie-based coding problem
You are building a fee calculation system for a payment processing platform. Given transaction data as a CSV string, calculate fees based on payment...
Input: String
Output: Computed result
coding Medium Verified Question #5

5. Bitmap to Image Conversion


Category: Grid/matrix coding problem
You are designing a bitmap character rendering system. Given a lookup table mapping characters to 2D binary arrays, implement functionality to print,...
Input: 2D grid
Output: Printed output
coding Medium Verified Question #6

6. [Onsite Integration] Bike Map


Category: Trie-based coding problem
You are building a map visualization tool that generates static maps from location data. Implement a system that reads GPS coordinates, constructs...
Input: Array
Output: Computed result
coding Hard Verified Question #7

7. Email Subscriptions


Category: String coding problem
Design a subscription management system that tracks user subscriptions and sends automated emails at specific lifecycle events. Email Types: -...
Input: List
Output: Computed result
coding Hard Verified Question #8

8. [Bug Squash] Mako Template Engine


Category: Tree coding problem
In this bug squash round, you will find and fix errors in a Python template library. You will receive a link to a GitHub folder containing a version...
Input: List
Output: Printed output
coding Hard Verified Question #9

9. [Bug Squash] Moshi JSON Library


Category: String coding problem
In this bug squash round, you will find and fix mistakes in a Java library called Moshi. You will receive a link to a GitHub folder containing a...
Input: String
Output: Computed result
coding Hard Verified Question #10

10. Data Center Load Scorer


Category: Graph coding problem
A data center operations team monitors server energy usage to optimize resource allocation. You receive a daily dataset of all incoming requests to...
Input: Graph (nodes and edges)
Output: Array
coding Medium Verified Question #11

11. Content Validation Pipeline


Category: String coding problem
A platform ingests user-generated content records in a simplified CSV format. Before indexing or displaying any content, each record must pass a...
Input: Array of strings
Output: Array
coding Hard Verified Question #12

12. Wallet Transaction Ledger


Category: String coding problem
A fintech platform processes streams of wallet transactions and needs to consolidate them into account summaries. Each transaction is logged as a...
Input: List
Output: Computed result
coding Hard Verified Question #13

13. Employee Record Matcher


Category: Array coding problem
A data-quality team needs to detect duplicate or near-duplicate employee records in a large HR dataset. Each record is a row in a 2D string array...
Input: Array
Output: Array
coding Hard Verified Question #14

14. Candidate Tech Stack Filter


Category: String coding problem
A hiring platform screens candidates by comparing their declared technology stack against a job's required skills. A candidate submits a...
Input: Array of strings
Output: Array
coding Hard Verified Question #15

15. Subscriber Notification Planner


Category: Trie-based coding problem
A subscription service sends automated notifications to subscribers based on their subscription window. You are given a list of subscriber records...
Input: List
Output: Array
coding Medium Verified Question #16

16. Support Ticket Dispatcher


Category: Graph coding problem
A customer support platform assigns incoming tickets to agents to keep workloads balanced. You are given a list of agent names and a list of tickets...
Input: Graph (nodes and edges)
Output: Array
coding Medium Verified Question #17

17. Order Payment Reconciler


Category: String coding problem
A billing system needs to match incoming payments to open orders. Each payment arrives as a comma-separated string with three fields: a payment ID, a...
Input: List
Output: Computed result
coding Medium Verified Question #18

18. Service Usage Cost Calculator


Category: Array coding problem
A cloud billing module computes the total cost for a customer's monthly usage. You are given a usage_report specifying the target region and...
Input: Array
Output: Computed result
ml design Senior api design #1

1. [OA] Model Versioning Registry — Implement a model versioning system for Stripe ML applications

As Stripe scales its machine learning capabilities, a versioning system to track different model iterations is crucial for accountability and reproducibility. Design a class that manages model versions.
Problem Statement: Create a ModelRegistry class with methods to add a model, retrieve a model by version, and list all model versions. Ensure models can be tagged and versioned effectively.
- addModel(model: MLModel): Adds a new model with a version.
- getModel(version: str): Retrieves a model by its version.
- listModels() -> List[str]: Returns the list of all model versions.
Example 1:
Input: registry = ModelRegistry()
registry.addModel(MLModel(name='TransactionClassifier', version='1.0.0'))
model = registry.getModel('1.0.0')
Output: model.name == 'TransactionClassifier'
Constraints:
- Model names are unique strings.
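A minimal sketch of the registry described above, offered as one possible reading rather than a reference solution. It assumes that `MLModel` is a simple record with `name` and `version` fields (as in Example 1), that the version string alone identifies a model, that adding a duplicate version is an error, and that `listModels` returns versions in insertion order. None of these choices are specified by the problem statement.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MLModel:
    """Assumed shape of a model record: just a name and a version string."""
    name: str
    version: str


class ModelRegistry:
    def __init__(self) -> None:
        # Maps version string -> model. Dicts preserve insertion order.
        self._models: Dict[str, MLModel] = {}

    def addModel(self, model: MLModel) -> None:
        # Assumption: a version can be registered only once.
        if model.version in self._models:
            raise ValueError(f"version {model.version!r} is already registered")
        self._models[model.version] = model

    def getModel(self, version: str) -> MLModel:
        # Raises KeyError when the version is unknown.
        return self._models[version]

    def listModels(self) -> List[str]:
        # Versions in the order they were added.
        return list(self._models)


registry = ModelRegistry()
registry.addModel(MLModel(name="TransactionClassifier", version="1.0.0"))
model = registry.getModel("1.0.0")
print(model.name)
```

Keying on the version string keeps both lookups O(1). If several model names can share a version, a `(name, version)` tuple key would be the natural change.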
ml design Medium hash map #2

2. [OA] Hash Map — Implement a feature toggle for Stripe products

Stripe offers different features based on user subscriptions and needs dynamic control over which features are enabled. Implement a feature toggle using a hash map.
Problem Statement: Design a FeatureToggle class to manage active features that can be turned on or off dynamically. The class should allow adding features and checking if they're enabled.
- addFeature(name: str): Adds a feature to the toggle.
- enableFeature(name: str): Enables a feature.
- disableFeature(name: str): Disables a feature.
- isFeatureEnabled(name: str) -> bool: Returns true if the feature is active.
Example 1:
Input: toggle = FeatureToggle()
toggle.addFeature('paymentProcessing')
toggle.enableFeature('paymentProcessing')
Output: toggle.isFeatureEnabled('paymentProcessing') == True
Constraints:
- 1 <= name.length <= 100
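One way to sketch the toggle with a plain dictionary is shown here. The statement leaves several behaviours open, so the following are assumptions: newly added features start disabled, enabling or disabling an unregistered feature raises `KeyError`, and querying an unregistered feature returns `False`.

```python
from typing import Dict


class FeatureToggle:
    def __init__(self) -> None:
        # Hash map from feature name to its on/off state.
        self._features: Dict[str, bool] = {}

    def addFeature(self, name: str) -> None:
        # Enforce the stated length constraint on feature names.
        if not 1 <= len(name) <= 100:
            raise ValueError("feature name must be 1 to 100 characters long")
        # Assumption: new features start disabled; re-adding keeps the state.
        self._features.setdefault(name, False)

    def enableFeature(self, name: str) -> None:
        self._set(name, True)

    def disableFeature(self, name: str) -> None:
        self._set(name, False)

    def isFeatureEnabled(self, name: str) -> bool:
        # Assumption: unknown features are reported as disabled.
        return self._features.get(name, False)

    def _set(self, name: str, state: bool) -> None:
        # Assumption: toggling a feature that was never added is an error.
        if name not in self._features:
            raise KeyError(name)
        self._features[name] = state


toggle = FeatureToggle()
toggle.addFeature("paymentProcessing")
toggle.enableFeature("paymentProcessing")
print(toggle.isFeatureEnabled("paymentProcessing"))
```

Every operation is a single dictionary access, so each runs in O(1) average time.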
coding Medium sliding window #3

3. [OA] Sliding Window — Compute the moving average for Stripe transactions

In a large-scale payment processing system, Stripe requires a method to compute real-time moving averages of transaction amounts for analytics and reporting. You need to implement an efficient algorithm.
Problem Statement: Given a stream of integers representing transaction_amounts, and an integer k, return the moving average of each window of k consecutive transaction amounts as a list of floats.
Example 1:
Input: transaction_amounts = [1, 10, 3, 5, 2], k = 3
Output: [4.66666, 6.0, 3.33333]
Explanation: Each value is the average of one window of 3 consecutive transactions: (1 + 10 + 3) / 3, (10 + 3 + 5) / 3, and (3 + 5 + 2) / 3.
Constraints:
- 0 < k <= transaction_amounts.length <= 10^5
- 0 <= transaction_amounts[i] <= 10^4
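A sliding-window sketch of this computation follows. The function name `moving_averages` is hypothetical, and the sketch assumes one average is emitted per full window of k consecutive amounts. It keeps a running window sum so that each step adds the entering amount and subtracts the leaving one, which gives O(n) total time instead of O(n * k).

```python
from typing import List


def moving_averages(transaction_amounts: List[int], k: int) -> List[float]:
    # Mirrors the stated constraint 0 < k <= len(transaction_amounts).
    if not 0 < k <= len(transaction_amounts):
        raise ValueError("k must satisfy 0 < k <= len(transaction_amounts)")

    # Sum of the first full window.
    window_sum = sum(transaction_amounts[:k])
    averages = [window_sum / k]

    # Slide the window one position at a time.
    for i in range(k, len(transaction_amounts)):
        window_sum += transaction_amounts[i] - transaction_amounts[i - k]
        averages.append(window_sum / k)

    return averages


print(moving_averages([1, 10, 3, 5, 2], 3))
```

Because the inputs are integers, the running sum stays exact and floating-point error enters only at the final division.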
coding Hard ml pipeline #4

4. [OA] Time Series Forecasting — Build a fraud detection estimator for Stripe

Stripe needs to predict potential fraudulent activities using historical transaction data. You need to create a machine learning model that forecasts transaction anomalies based on past trends.
Problem Statement: Given a time series of transaction_amounts, your task is to build a forecasting model that outputs potential anomalies in the next n transactions. The model should be evaluated based on precision and recall metrics.
Example 1:
Input: transaction_amounts = [100, 110, 115, 90, 200, 85], n = 2
Output: [False, True]
Explanation: The model detects that there is a spike in the transaction after a series of normal transactions.
Constraints:
- 1 <= transaction_amounts.length <= 10^5
- 0 <= transaction_amounts[i] <= 10^4
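The statement does not say which model to use, so the following is only a baseline sketch under stated assumptions, not the expected solution. It flags observed amounts whose modified z-score (based on the median and the median absolute deviation) exceeds a threshold, and it includes a helper for the precision and recall metrics mentioned above. The function names, the 3.5 threshold, and the choice of a robust z-score are all assumptions. This baseline scores points that have already been observed rather than forecasting future ones, and it is not claimed to reproduce the example output.

```python
from statistics import median
from typing import List, Tuple


def flag_anomalies(amounts: List[float], threshold: float = 3.5) -> List[bool]:
    """Flag amounts whose modified z-score magnitude exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median(abs(x - center) for x in amounts)
    if mad == 0:
        # No measurable spread; this sketch then flags nothing.
        return [False] * len(amounts)
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [abs(0.6745 * (x - center) / mad) > threshold for x in amounts]


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


flags = flag_anomalies([100, 110, 115, 90, 200, 85])
print(flags)

# Hypothetical ground-truth labels, used only to exercise the metric helper.
labels = [False, False, False, False, True, False]
print(precision_recall(flags, labels))
```

A median-based score is used because a single large spike inflates the mean and standard deviation enough to hide itself in a short series. A real forecaster would fit a model on past data and compare predicted values against incoming ones.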
