
[Class Report] Systems Development (Year 3), Week 48 ~ Evaluating Generative AI and Running an Improvement Cycle: “Growing” Smartness from Logs ~



In Week 48, building on the generative AI feature we implemented last time, we did a hands-on exercise in evaluating its outputs and running an improvement cycle.

The theme was:

“AI is never finished. It’s something you keep tuning.”

This was an important week in which we moved from “integrating” generative AI to the stage of “growing” it.


■ Teacher’s introduction: “AI will never be 100% accurate”

Mr. Tanaka: “Generative AI works probabilistically.
So rather than aiming for perfection, it’s important to build a system that keeps improving.”

The teacher wrote this cycle on the board:

Collect logs
 ↓
Evaluate
 ↓
Identify problems
 ↓
Improve prompts and validation
 ↓
Re-evaluate
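
The cycle on the board can be sketched as a simple loop. This is a minimal, hypothetical sketch: `generate`, `evaluate`, and `improve_prompt` are placeholder callables, not part of any real API.

```python
# Minimal sketch of the board's improvement cycle (hypothetical helper names).
def run_improvement_cycle(generate, evaluate, improve_prompt, inputs, rounds=3):
    prompt = "Summarize the input."
    for _ in range(rounds):
        # Collect logs: pair each input with the AI's output.
        logs = [(text, generate(prompt, text)) for text in inputs]
        # Evaluate and identify problems.
        problems = [entry for entry in logs if not evaluate(entry)]
        if not problems:
            break
        # Improve the prompt, then the loop re-evaluates on the next pass.
        prompt = improve_prompt(prompt, problems)
    return prompt
```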

■ Today’s goals

  1. Record generative AI outputs in logs
  2. Classify outputs into good / bad
  3. Analyze the causes of problems
  4. Improve prompts and validation logic

■ Exercise ①: Strengthening the logging of AI outputs

First, we revisited the log design.

Added log fields

  • Input text (only the necessary parts)
  • Prompt used
  • Full AI output
  • Output character count
  • Whether validation errors occurred
  • Whether a fallback was triggered

Student A: “Without logs, you can’t tell what went wrong.”

The teacher emphasized: “Logs are the only material you have to improve AI.”
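
The log fields listed above could be captured like this. A minimal sketch: the class name, field names, and logger name are assumptions for illustration, not part of the class's actual codebase.

```python
import json
import logging
from dataclasses import dataclass, asdict

# Hypothetical record covering the added log fields.
@dataclass
class AiOutputLog:
    input_excerpt: str      # input text (only the necessary parts)
    prompt: str             # prompt used
    output: str             # full AI output
    output_chars: int       # output character count
    validation_error: bool  # whether validation errors occurred
    fallback_used: bool     # whether a fallback was triggered

def log_ai_output(record: AiOutputLog) -> str:
    """Serialize one record as a JSON line so it can be analyzed later."""
    line = json.dumps(asdict(record), ensure_ascii=False)
    logging.getLogger("ai").info(line)
    return line
```

Writing one JSON object per line keeps the log easy to grep and to load back for the evaluation step.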


■ Exercise ②: Evaluating outputs

Next, we ran the AI multiple times and classified the outputs.

Evaluation criteria

  • Is the length as instructed?
  • Has it added facts that weren’t given?
  • Does the writing make sense?
  • Does it include any banned words?
  • Could it mislead the user?

Each group organized:

  • Examples of good outputs
  • Examples of problematic outputs

Student B: “Even with the same input, it’s slightly different every time!”
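
The evaluation criteria above can be turned into an automatic classifier. A sketch under assumptions: the banned-word list and length limit are made up for illustration, and only the mechanically checkable criteria are covered (whether the text "makes sense" or could mislead still needs a human reviewer).

```python
import re

BANNED_WORDS = {"guaranteed", "definitely"}  # hypothetical banned-word list

def evaluate_output(output: str, max_chars: int = 100) -> list[str]:
    """Return a list of problems; an empty list means the output is 'good'."""
    problems = []
    if len(output) > max_chars:
        problems.append("too long")                       # length as instructed?
    if any(word in output.lower() for word in BANNED_WORDS):
        problems.append("banned word")                    # includes banned words?
    if not re.search(r"[.。]$", output.strip()):
        problems.append("no sentence-final punctuation")  # crude coherence check
    return problems
```

Running this over the logged outputs lets each group sort good and problematic examples consistently instead of by gut feeling.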


■ Exercise ③: Root-cause analysis of problems

We analyzed cases where the output was poor.

Common causes

  • The prompt is vague
  • Constraints are weak
  • The output format isn’t specified
  • The validation logic is too loose

Example:

✗ “Please summarize briefly.”
✓ “In a single sentence of 100 characters or fewer, summarize using facts only.”

Student C: “It’s true—prompts really are like a ‘design document’…”


■ Exercise ④: Prompt improvements and re-testing

We actually improved prompts and tested again.

Improvement examples

  • Clarify the role specification
  • List prohibitions as bullet points
  • Change the output format to JSON

Example (conceptual):

Return your output in the following JSON format:
{
  "summary": "summary goes here"
}

After improvements:

  • Less variation in length
  • Less unnecessary explanation
  • Easier validation

Student D: “Specifying the format makes it more stable!”
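
Switching to a JSON output format also makes the code-side handling simple. A minimal sketch, assuming the JSON shape shown above; the function name and fallback text are hypothetical.

```python
import json

def parse_summary(raw: str, fallback: str = "(summary unavailable)") -> str:
    """Extract the summary from the JSON-formatted output; fall back if the model strayed."""
    try:
        summary = json.loads(raw)["summary"]
        if isinstance(summary, str) and summary:
            return summary
    except (json.JSONDecodeError, KeyError, TypeError):
        pass  # invalid JSON or wrong shape -> trigger the fallback
    return fallback
```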


■ Exercise ⑤: Strengthening validation logic

Instead of relying only on prompts, we also strengthened validation on the code side.

Examples added:

  • JSON parse checks
  • Regular-expression checks
  • Re-checking character count
  • Expanding the banned-word dictionary

Teacher: “Don’t leave it all to the AI. The final responsibility lies with the code.”
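
The four checks added in this exercise could be combined into one validator. A sketch only: the banned-word dictionary, the URL regex, and the 100-character limit are illustrative assumptions.

```python
import json
import re

BANNED = {"absolutely", "miracle"}  # hypothetical expanded banned-word dictionary

def validate(raw: str, max_chars: int = 100) -> tuple[bool, list[str]]:
    """Return (ok, errors) for one raw AI output."""
    # 1. JSON parse check.
    try:
        summary = json.loads(raw)["summary"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return False, ["invalid JSON"]
    errors = []
    # 2. Re-check character count.
    if len(summary) > max_chars:
        errors.append("character count exceeded")
    # 3. Regular-expression check (e.g. reject unexpected URLs).
    if re.search(r"https?://", summary):
        errors.append("unexpected URL")
    # 4. Banned-word dictionary.
    if any(word in summary.lower() for word in BANNED):
        errors.append("banned word")
    return not errors, errors
```

If `ok` is false, the caller can log the errors and fall back, which is exactly the "final responsibility lies with the code" stance.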


■ Class-wide learnings

  • AI is something you “improve”
  • Without logs, you can’t improve
  • Prompts and validation are two wheels of the same cart
  • Clear evaluation criteria make improvement easier

■ The teacher’s closing remark

“Generative AI development
doesn’t end when you ‘implement’ it.

You look at logs,
think about causes,
improve it,
and validate again.

That’s the same as traditional software development.
The difference is only that the output is probabilistic.

The improvement cycle you learned today
is an essential skill for engineers in the AI era.”


■ Homework (for next week)

  1. Submit an AI logging improvement report
    • Two problem examples
    • Causes
    • Fixes
  2. Summarize a before/after output comparison
  3. Name one thing you want to improve next

■ Next week preview: Safety, ethics, and responsibility design

Next week, we’ll cover
safety, ethics, information management, and responsibility
in the use of generative AI.

It will be a week about learning not only the technology,
but also the responsibility of using it.


Week 48 was an important class that moved us from “using” generative AI to “growing” it.
Students are clearly learning to treat AI not as a magic box, but as a system component that must be improved.
