
ByteDance Data Scientist Interview Questions

33 practice questions for ByteDance Data Scientist interviews

ByteDance data scientist interviews test statistical reasoning, ML model design, SQL proficiency, A/B testing methodology, and Python-based algorithm implementation.



ml design Senior api design #1

1. [OA] Model Versioning Registry — Design a System for Douyin's Machine Learning Models

As ByteDance continuously improves the Douyin application using machine learning, a robust model versioning registry is essential to track, manage and deploy different ML models efficiently.
Design the class ModelRegistry:
- def __init__(self) - Initializes an empty registry.
- def add_model(self, model_id: str, version: str, metadata: Dict[str, Any]) -> None - Adds a new model with the specified ID, version, and metadata.
- def get_latest_model(self, model_id: str) -> Tuple[str, Dict[str, Any]] - Retrieves the latest version and its metadata for the specified model ID.
- def get_all_models(self) -> List[Tuple[str, str, Dict[str, Any]]] - Returns a list of all models with their IDs, versions, and metadata.
Example 1:
Input: add_model("recommendation", "v1.0", {"trained_on": "2023-01-01"})
Input: add_model("recommendation", "v1.1", {"trained_on": "2023-02-01"})
Output: get_latest_model("recommendation") should return ("v1.1", {"trained_on": "2023-02-01"})
Constraints:
- Model ID and version are non-empty strings.
- Metadata is a dictionary containing string keys and any value.
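One possible in-memory sketch of the interface above follows. The problem does not define "latest", so this version assumes it means the most recently added version for a model ID (which matches Example 1). Raising KeyError for an unknown model ID and ValueError for empty strings are also assumptions, not part of the original statement.

```python
from typing import Any, Dict, List, Tuple


class ModelRegistry:
    """In-memory registry. Assumption: 'latest' = most recently added version."""

    def __init__(self) -> None:
        # model_id -> list of (version, metadata) pairs in insertion order
        self._models: Dict[str, List[Tuple[str, Dict[str, Any]]]] = {}

    def add_model(self, model_id: str, version: str, metadata: Dict[str, Any]) -> None:
        # The constraints say both strings are non-empty; rejecting empty
        # values with ValueError is an assumed policy.
        if not model_id or not version:
            raise ValueError("model_id and version must be non-empty strings")
        # Store a shallow copy so later changes to the caller's dict
        # do not silently alter the registry.
        self._models.setdefault(model_id, []).append((version, dict(metadata)))

    def get_latest_model(self, model_id: str) -> Tuple[str, Dict[str, Any]]:
        # Assumed behavior for an unknown ID: raise KeyError.
        if model_id not in self._models:
            raise KeyError(f"unknown model_id: {model_id}")
        return self._models[model_id][-1]

    def get_all_models(self) -> List[Tuple[str, str, Dict[str, Any]]]:
        # Flatten to (model_id, version, metadata) triples, grouped by model ID.
        return [
            (model_id, version, metadata)
            for model_id, entries in self._models.items()
            for version, metadata in entries
        ]


registry = ModelRegistry()
registry.add_model("recommendation", "v1.0", {"trained_on": "2023-01-01"})
registry.add_model("recommendation", "v1.1", {"trained_on": "2023-02-01"})
print(registry.get_latest_model("recommendation"))
# ('v1.1', {'trained_on': '2023-02-01'})
```

If an interviewer instead expects "latest" to mean the highest version number, the lookup would need to parse and compare version strings rather than rely on insertion order.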
coding Medium sliding window #2

2. [OA] Time Series Analysis — Implementing a User Engagement Trends Model for TikTok

ByteDance needs to analyze user engagement trends over a specified time period to optimize content delivery. This involves interpreting user behavior patterns from raw engagement metrics collected in real time.
Problem statement: Write a function that computes the moving average of user engagement over a window of d days. The function signature should be def moving_average(engagements: List[int], d: int) -> List[float]: where engagements[i] is the number of engagement events on day i.
- Example 1:
Input: engagements = [100, 200, 300, 400, 500], d = 3
Output: [200.0, 300.0, 400.0]
Explanation: Each average covers d = 3 consecutive days, so the first value corresponds to day 3: days 1-3, 600 / 3 = 200.0; days 2-4, 900 / 3 = 300.0; days 3-5, 1200 / 3 = 400.0.
- Example 2:
Input: engagements = [10, 20, 30, 40], d = 2
Output: [15.0, 25.0, 35.0]
Explanation: With a two-day window, the averages are (10 + 20) / 2 = 15.0, (20 + 30) / 2 = 25.0, and (30 + 40) / 2 = 35.0.
Constraints:
- 1 <= engagements.length <= 10^5
- 1 <= engagements[i] <= 10^6
- 1 <= d <= engagements.length
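A minimal sketch of one way to meet these constraints uses a running window sum, so each step costs O(1) instead of re-summing d values. Raising ValueError when d is out of range is an assumption; the statement only guarantees valid input.

```python
from typing import List


def moving_average(engagements: List[int], d: int) -> List[float]:
    # Assumed policy: reject window sizes outside the stated constraints.
    if d < 1 or d > len(engagements):
        raise ValueError("d must satisfy 1 <= d <= len(engagements)")

    # Sum of the first full window (days 1..d).
    window_sum = sum(engagements[:d])
    averages = [window_sum / d]

    # Slide the window one day at a time: add the new day, drop the oldest.
    for i in range(d, len(engagements)):
        window_sum += engagements[i] - engagements[i - d]
        averages.append(window_sum / d)
    return averages


print(moving_average([100, 200, 300, 400, 500], 3))  # [200.0, 300.0, 400.0]
print(moving_average([10, 20, 30, 40], 2))           # [15.0, 25.0, 35.0]
```

The sum is kept as an integer and divided only when a value is emitted, which avoids accumulating floating-point error across up to 10^5 steps.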
coding Hard heap #3

3. [OA] Heap — Implement Global Trending Topics Tracker for Douyin

Douyin's success depends on identifying trending topics in real time based on user interactions. ByteDance needs an efficient way to track and update these trends.
Problem statement: Write a function that returns the top k trending topics from a stream of topic interactions, ranked by total interaction count. The function signature should be def track_trending_topics(interactions: List[Tuple[str, int]], k: int) -> List[str]: where each interaction is a tuple of topic name and interaction count.
- Example 1:
Input: interactions = [("dance", 10), ("music", 15), ("sports", 8), ("dance", 20)], k = 2
Output: ['dance', 'music']
Explanation: The topics with the most interactions are 'dance' with 30 (10 + 20) and 'music' with 15.
- Example 2:
Input: interactions = [("comedy", 5), ("news", 5)], k = 1
Output: ['comedy']
Explanation: Both topics have the same interaction count, so either one may be returned.
Constraints:
- 0 <= interactions.length <= 10^5
- 1 <= k <= interactions.length
- 1 <= interactions[i][1] <= 10^6
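One way to approach this is to aggregate counts per topic and then keep a min-heap of size k, which takes O(n + m log k) time for n interactions and m distinct topics. The sketch below makes several assumptions that the statement leaves open: ties are broken in favor of the topic seen first (which reproduces Example 2), results are returned in descending order of total count, and fewer than k topics are returned when fewer distinct topics exist.

```python
import heapq
from typing import Dict, List, Tuple


def track_trending_topics(interactions: List[Tuple[str, int]], k: int) -> List[str]:
    if k <= 0:
        return []

    # Aggregate interaction counts per topic. Dicts preserve insertion
    # order, so iteration order equals first-seen order.
    totals: Dict[str, int] = {}
    for topic, count in interactions:
        totals[topic] = totals.get(topic, 0) + count

    # Min-heap of (total, -first_seen_rank, topic), capped at k entries.
    # The negated rank makes an earlier topic compare as larger on ties
    # (an assumed tie-break rule).
    heap: List[Tuple[int, int, str]] = []
    for rank, (topic, total) in enumerate(totals.items()):
        entry = (total, -rank, topic)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Evict the current weakest topic in favor of the new one.
            heapq.heapreplace(heap, entry)

    # Largest totals first.
    return [topic for _, _, topic in sorted(heap, reverse=True)]


print(track_trending_topics(
    [("dance", 10), ("music", 15), ("sports", 8), ("dance", 20)], 2))
# ['dance', 'music']
print(track_trending_topics([("comedy", 5), ("news", 5)], 1))
# ['comedy']
```

For a true unbounded stream where the top k must be readable after every update, this batch-style aggregation would need to be replaced by an incrementally maintained structure.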
coding Medium sliding window #4

4. [OA] Sliding Window — Implement a View Count Analyzer for TikTok

To maintain engagement on TikTok, ByteDance needs to efficiently analyze user view counts over a specified time frame. Examining these view counts helps identify trends and improve content recommendations.
Problem statement: Write a function that computes the maximum number of views within any sliding window of k minutes. The function signature should be def max_view_count(views: List[int], k: int) -> int: where views is a list of integers representing view counts per minute.
- Example 1:
Input: views = [100, 200, 300, 400, 500], k = 3
Output: 1200
Explanation: The maximum occurs in the 3-minute window [300, 400, 500], which sums to 1200.
- Example 2:
Input: views = [10, 20, 30, 40, 50], k = 1
Output: 50
Explanation: The maximum views in any window of 1 minute is simply the highest single view count.
Constraints:
- 1 <= views.length <= 10^5
- 1 <= views[i] <= 10^6
- 1 <= k <= views.length
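A fixed-size sliding window handles this in O(n) time, sketched below. As in the examples, the "number of views" in a window is taken to be the sum of its per-minute counts. Raising ValueError for an out-of-range k is an assumption.

```python
from typing import List


def max_view_count(views: List[int], k: int) -> int:
    # Assumed policy: reject window sizes outside the stated constraints.
    if k < 1 or k > len(views):
        raise ValueError("k must satisfy 1 <= k <= len(views)")

    # Total views in the first k-minute window.
    window_sum = sum(views[:k])
    best = window_sum

    # Slide forward one minute at a time, updating the sum in O(1).
    for i in range(k, len(views)):
        window_sum += views[i] - views[i - k]
        best = max(best, window_sum)
    return best


print(max_view_count([100, 200, 300, 400, 500], 3))  # 1200
print(max_view_count([10, 20, 30, 40, 50], 1))       # 50
```

With views[i] up to 10^6 and up to 10^5 entries, the largest possible sum is 10^11, which Python integers handle without overflow; in a fixed-width language a 64-bit type would be needed.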
