[Class Report] Intro to System Development – Week 29: Verifying Improvements & Introduction to A/B Testing
In Week 29, we implemented improvements addressing the issues found in last week’s log analysis and learned the basics of retesting and A/B testing (comparative evaluation) to measure their impact. Working through the implement → verify → compare cycle fostered the mindset that “improvement isn’t done until the numbers prove it.”
■ Instructor’s Kickoff: “Form a hypothesis, verify it, then refine again”
Mr. Tanaka: “Even if you think something got better, you won’t know without numbers. A/B testing is a handy method to objectively decide which version is truly better.”
■ Today’s Goals
- Implement last week’s improvements (e.g., extending cache TTL, input length limits, fallback adjustments).
- Compare key metrics like response time, success rate, and regenerate rate before vs. after improvements.
- Design a simple A/B test and measure which of two variants is more effective.
■ Exercise ①: Implementing and Deploying Improvements (within the lab environment)
Each team selected two to three high-priority improvements from the prior analysis, implemented them, and rolled them out to the test environment. Representative examples:
- Cache TTL extension: 300 seconds → 900 seconds (reduces duplicate requests in short windows)
- Stronger input pre-processing: Replace overly long inputs with a summarization prompt
- Improved fallback copy: Provide concrete next actions to users (e.g., “Check the official site”)
After implementation, we re-ran load tests and collected logs with the same load script.
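To make the cache change concrete, here is a minimal sketch of an in-memory TTL cache, assuming a simple key → (timestamp, value) store; the names (CACHE_TTL_SECONDS, get_cached, fetch_fn) are illustrative and not the lab code:
import time

CACHE_TTL_SECONDS = 900  # raised from 300 s as part of the improvement

_cache = {}  # key -> (stored_at, value)

def get_cached(key, fetch_fn):
    """Return a cached value while it is still fresh; otherwise fetch and store it."""
    entry = _cache.get(key)
    if entry is not None:
        stored_at, value = entry
        if time.time() - stored_at < CACHE_TTL_SECONDS:
            return value  # cache hit: duplicate requests in this window skip the upstream call
    value = fetch_fn()  # cache miss or stale entry: call upstream once
    _cache[key] = (time.time(), value)
    return value
A longer TTL means more requests land in the “fresh” window, which is exactly the duplicate-request reduction listed above; the trade-off in content freshness comes up again in the discussion section.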
■ Exercise ②: Measuring and Comparing Key KPIs
From the collected logs, we compared key metrics of the pre-improvement (control) vs. post-improvement (test) versions. Example metrics covered in class:
- Average response time (ms)
- Success rate (ok / total requests)
- Regenerate request rate (ratio of users who clicked “regenerate”)
- Fallback occurrence rate (ratio of fallback status)
Simple aggregation snippet (for class use):
# logs_control / logs_test are the respective log lists
def summarize(logs):
    """Aggregate a list of request-log dicts into the KPIs above (assumes logs is non-empty)."""
    total = len(logs)
    ok = sum(1 for l in logs if l["status"] == "ok")
    return {
        "total": total,
        "ok_rate": ok / total,                                      # success rate
        "avg_latency": sum(l["latency_ms"] for l in logs) / total,  # mean response time in ms
    }
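A sketch of how the two log sets might then be compared; the field names for regenerate clicks and fallback status ("regenerated", "fallback") are assumptions about the log format, not the exact class script:
def extra_rates(logs):
    """Rates for the remaining KPIs; field names are assumed for illustration."""
    total = len(logs)
    return {
        "regenerate_rate": sum(1 for l in logs if l.get("regenerated")) / total,
        "fallback_rate": sum(1 for l in logs if l["status"] == "fallback") / total,
    }

control = summarize(logs_control)   # pre-improvement logs
test = summarize(logs_test)         # post-improvement logs
latency_change = (test["avg_latency"] - control["avg_latency"]) / control["avg_latency"]
print(f"avg latency change: {latency_change:+.1%}")  # negative means faster after the change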
Student observations (examples):
- Extending the cache TTL improved average response time by about 15%.
- Teams that added input summarization saw fewer timeouts and improved success rates.
(Numbers are sample measurements from class and varied by team.)
■ Exercise ③: Designing and Running a Simple A/B Test
We learned the purpose and design steps of A/B testing and ran a simple in-class A/B test.
Basic A/B Testing Steps
- Form a hypothesis: e.g., “Extending cache TTL reduces average response time and lowers regenerate rate.”
- Create variants: A = current (TTL = 300), B = improved (TTL = 900)
- Split traffic: Route half of the requests to each variant (in the lab, we simulated this with random assignment or time-based splits; see the sketch below)
- Collect enough samples: Short windows create high variance
- Compare on metrics: Evaluate differences in key KPIs and discuss statistical significance (we only introduced significance testing in class)
- Conclude and feed into the next improvement cycle
In class, we used a lightweight method: run A and B in separate one-hour windows and then compare metrics.
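For the random-assignment alternative, a minimal sketch (the request IDs and the ttl_by_variant mapping are illustrative, not the class harness):
import random

ttl_by_variant = {"A": 300, "B": 900}  # A = control TTL, B = improved TTL

def assign_variant(request_id, split=0.5):
    """Route roughly half of the requests to variant B, the rest to A."""
    # Seeding with the request/user ID keeps a given caller on the same variant,
    # so retries don't mix the two experiences.
    rng = random.Random(request_id)
    return "B" if rng.random() < split else "A"

variant = assign_variant("user-123")
cache_ttl = ttl_by_variant[variant]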
■ Discussion: How to Read Results and What to Watch Out For
Key points we confirmed as a class when interpreting results:
- Small sample sizes can lead to misjudgments
- Seasonality and network conditions can cause variance
- Consider multiple metrics for a holistic view (not just response time; weigh success rate and UX metrics, too)
- Consider side effects of improvements (e.g., longer TTL could reduce content freshness)
Student takeaway: “Variant A is faster, but Variant B has a lower regenerate rate… which to prioritize depends on our objective.”
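To make the small-sample caution concrete, here is a minimal sketch of the two-proportion z-test that was only mentioned in passing, applied to the success rate of A vs. B; the counts in the example calls are invented for illustration:
import math

def two_proportion_z(ok_a, n_a, ok_b, n_b):
    """Z statistic and two-sided p-value for the difference in success rates."""
    p_a, p_b = ok_a / n_a, ok_b / n_b
    p_pool = (ok_a + ok_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # normal CDF
    return z, p_value

# The same observed gap (90% vs. 96%) is not significant with 50 requests per variant...
print(two_proportion_z(45, 50, 48, 50))
# ...but is clearly significant with 500 requests per variant.
print(two_proportion_z(450, 500, 480, 500))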
■ Instructor’s Closing Comment
“A/B testing builds a data-driven culture. Form a small hypothesis, verify it, and connect the results to the next hypothesis—this is a good habit for engineers. The key is to face the results honestly.”
■ Student Reflections
- “Actually comparing showed that outcomes often differ from our expectations.”
- “A/B design seems simple but is hard—avoiding bias is the crux.”
- “The fastest way to learn is to run the improvement → verification loop in short cycles.”
■ Next Week’s Preview: Operationalizing Improvements & Documentation
Next week, we will operationalize these improvements for production-like use (create checklists) and document change history and impact reports. We’ll work on the “systems” that sustain continuous improvement.
By implementing improvements and verifying them with numbers, Week 29 let students experience the cycle of hypothesis → implementation → verification → re-hypothesis, taking a solid step toward practical improvement skills.