
[Class Report] Systems Development (Year 3), Week 48 ~ Evaluating Generative AI and Running an Improvement Cycle: “Growing” Smartness from Logs ~



In Week 48, building on the generative AI feature we implemented last time, we did a hands-on exercise in evaluating its outputs and running an improvement cycle.

The theme was:

“AI is never finished. It’s something you keep tuning.”

This was an important week in which we moved from “integrating” generative AI to the stage of “growing” it.


■ Teacher’s introduction: “AI will never be 100% accurate”

Mr. Tanaka: “Generative AI works probabilistically.
So rather than aiming for perfection, it’s important to build a system that keeps improving.”

The teacher wrote this cycle on the board:

Collect logs
 ↓
Evaluate
 ↓
Identify problems
 ↓
Improve prompts and validation
 ↓
Re-evaluate
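
The cycle on the board can be sketched as a simple loop. This is a minimal, hypothetical sketch: `generate`, `evaluate`, and `improve_prompt` are placeholder callables, not part of any real API.

```python
# Minimal sketch of the board's improvement cycle (hypothetical helper names).
def run_improvement_cycle(generate, evaluate, improve_prompt, inputs, rounds=3):
    prompt = "Summarize the input."
    for _ in range(rounds):
        # Collect logs: pair each input with the AI's output.
        logs = [(text, generate(prompt, text)) for text in inputs]
        # Evaluate and identify problems.
        problems = [entry for entry in logs if not evaluate(entry)]
        if not problems:
            break
        # Improve the prompt, then the loop re-evaluates on the next pass.
        prompt = improve_prompt(prompt, problems)
    return prompt
```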

■ Today’s goals

  1. Record generative AI outputs in logs
  2. Classify outputs into good / bad
  3. Analyze the causes of problems
  4. Improve prompts and validation logic

■ Exercise ①: Strengthening the logging of AI outputs

First, we revisited the log design.

Added log fields

  • Input text (only the necessary parts)
  • Prompt used
  • Full AI output
  • Output character count
  • Whether validation errors occurred
  • Whether a fallback was triggered

Student A: “Without logs, you can’t tell what went wrong.”

The teacher emphasized: “Logs are the only material you have to improve AI.”
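
The log fields listed above could be captured like this. A minimal sketch: the class name, field names, and logger name are assumptions for illustration, not part of the class's actual codebase.

```python
import json
import logging
from dataclasses import dataclass, asdict

# Hypothetical record covering the added log fields.
@dataclass
class AiOutputLog:
    input_excerpt: str      # input text (only the necessary parts)
    prompt: str             # prompt used
    output: str             # full AI output
    output_chars: int       # output character count
    validation_error: bool  # whether validation errors occurred
    fallback_used: bool     # whether a fallback was triggered

def log_ai_output(record: AiOutputLog) -> str:
    """Serialize one record as a JSON line so it can be analyzed later."""
    line = json.dumps(asdict(record), ensure_ascii=False)
    logging.getLogger("ai").info(line)
    return line
```

Writing one JSON object per line keeps the log easy to grep and to load back for the evaluation step.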


■ Exercise ②: Evaluating outputs

Next, we ran the AI multiple times and classified the outputs.

Evaluation criteria

  • Is the length as instructed?
  • Has it added facts that weren’t given?
  • Does the writing make sense?
  • Does it include any banned words?
  • Could it mislead the user?

Each group organized:

  • Examples of good outputs
  • Examples of problematic outputs

Student B: “Even with the same input, it’s slightly different every time!”
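
The evaluation criteria above can be turned into an automatic classifier. A sketch under assumptions: the banned-word list and length limit are made up for illustration, and only the mechanically checkable criteria are covered (whether the text "makes sense" or could mislead still needs a human reviewer).

```python
import re

BANNED_WORDS = {"guaranteed", "definitely"}  # hypothetical banned-word list

def evaluate_output(output: str, max_chars: int = 100) -> list[str]:
    """Return a list of problems; an empty list means the output is 'good'."""
    problems = []
    if len(output) > max_chars:
        problems.append("too long")                       # length as instructed?
    if any(word in output.lower() for word in BANNED_WORDS):
        problems.append("banned word")                    # includes banned words?
    if not re.search(r"[.。]$", output.strip()):
        problems.append("no sentence-final punctuation")  # crude coherence check
    return problems
```

Running this over the logged outputs lets each group sort good and problematic examples consistently instead of by gut feeling.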


■ Exercise ③: Root-cause analysis of problems

We analyzed cases where the output was poor.

Common causes

  • The prompt is vague
  • Constraints are weak
  • The output format isn’t specified
  • The validation logic is too loose

Example:

✗ “Please summarize briefly.”
✓ “In a single sentence of 100 characters or fewer, summarize using facts only.”

Student C: “It’s true—prompts really are like a ‘design document’…”


■ Exercise ④: Prompt improvements and re-testing

We actually improved prompts and tested again.

Improvement examples

  • Clarify the role specification
  • List prohibitions as bullet points
  • Change the output format to JSON

Example (conceptual):

Return your output in the following JSON format:
{
  "summary": "summary goes here"
}

After improvements:

  • Less variation in length
  • Less unnecessary explanation
  • Easier validation

Student D: “Specifying the format makes it more stable!”
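
Switching to a JSON output format also makes the code-side handling simple. A minimal sketch, assuming the JSON shape shown above; the function name and fallback text are hypothetical.

```python
import json

def parse_summary(raw: str, fallback: str = "(summary unavailable)") -> str:
    """Extract the summary from the JSON-formatted output; fall back if the model strayed."""
    try:
        summary = json.loads(raw)["summary"]
        if isinstance(summary, str) and summary:
            return summary
    except (json.JSONDecodeError, KeyError, TypeError):
        pass  # invalid JSON or wrong shape -> trigger the fallback
    return fallback
```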


■ Exercise ⑤: Strengthening validation logic

Instead of relying only on prompts, we also strengthened validation on the code side.

Examples added:

  • JSON parse checks
  • Regular-expression checks
  • Re-checking character count
  • Expanding the banned-word dictionary

Teacher: “Don’t leave it all to the AI. The final responsibility lies with the code.”
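
The four checks added in this exercise could be combined into one validator. A sketch only: the banned-word dictionary, the URL regex, and the 100-character limit are illustrative assumptions.

```python
import json
import re

BANNED = {"absolutely", "miracle"}  # hypothetical expanded banned-word dictionary

def validate(raw: str, max_chars: int = 100) -> tuple[bool, list[str]]:
    """Return (ok, errors) for one raw AI output."""
    # 1. JSON parse check.
    try:
        summary = json.loads(raw)["summary"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return False, ["invalid JSON"]
    errors = []
    # 2. Re-check character count.
    if len(summary) > max_chars:
        errors.append("character count exceeded")
    # 3. Regular-expression check (e.g. reject unexpected URLs).
    if re.search(r"https?://", summary):
        errors.append("unexpected URL")
    # 4. Banned-word dictionary.
    if any(word in summary.lower() for word in BANNED):
        errors.append("banned word")
    return not errors, errors
```

If `ok` is false, the caller can log the errors and fall back, which is exactly the "final responsibility lies with the code" stance.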


■ Class-wide learnings

  • AI is something you “improve”
  • Without logs, you can’t improve
  • Prompts and validation are two wheels of the same cart
  • Clear evaluation criteria make improvement easier

■ The teacher’s closing remark

“Generative AI development
doesn’t end when you ‘implement’ it.

You look at logs,
think about causes,
improve it,
and validate again.

That’s the same as traditional software development.
The difference is only that the output is probabilistic.

The improvement cycle you learned today
is an essential skill for engineers in the AI era.”


■ Homework (for next week)

  1. Submit an AI logging improvement report
    • Two problem examples
    • Causes
    • Fixes
  2. Summarize a before/after output comparison
  3. Name one thing you want to improve next

■ Next week preview: Safety, ethics, and responsibility design

Next week, we’ll cover
safety, ethics, information management, and responsibility
in the use of generative AI.

It will be a week about learning not only the technology,
but also the responsibility of using it.


Week 48 was an important class that moved us from “using” generative AI to “growing” it.
Students are clearly learning to treat AI not as a magic box, but as a system component that must be improved.
