STDF (Standard Test Data Format) is the de facto industry format for storing wafer sort and final test results. It was originally developed by Teradyne, and version 4 (STDF V4) is the revision most test equipment emits. It's a binary record format: efficient for storage and EWS equipment output, but not something you can POST to a REST API directly. This guide covers the practical steps to get from a raw STDF file to a well-formed request to the Wafertune classification endpoint.
We'll go through: STDF record structure relevant to wafer maps, the field mapping to Wafertune's JSON schema, common pitfalls, and how to handle multi-bin maps without losing information.
What's in an STDF File (the Parts That Matter)
An STDF file is a sequence of typed records. For wafer map extraction, the relevant records are:
- MIR (Master Information Record): job name, lot ID, part type, node name. This is where you get the wafer lot context.
- WIR (Wafer Information Record): marks the start of a wafer's test data; contains the wafer ID.
- PIR / PRR (Part Information Record / Part Results Record): one pair per die tested. The PIR only marks the start of a part (head and site numbers); the PRR carries the X/Y coordinates, hardware bin, and software bin.
- WRR (Wafer Results Record): wafer-level summary; contains the total part count, the good (pass) count, and retest and abort counts. Per-bin totals are reported separately in the HBR/SBR bin records.
- WCR (Wafer Configuration Record): optional but important. Contains die pitch (die width and height), wafer diameter, the units both are expressed in, center die coordinates, and positive X/Y axis directions. Without this, you have to infer the coordinate system.
For classification purposes, you need PIR/PRR pairs to build the die map and WCR to correctly normalize coordinates. The MIR lot context is useful metadata but not required for classification.
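As a rough illustration of how these record types are told apart at the byte level, here is a minimal sketch that walks the record headers of a file and counts the wafer-map records. It assumes STDF V4, where every record starts with a 4-byte header (REC_LEN, REC_TYP, REC_SUB) and the file opens with a FAR record whose length field is 2; that length is used here to guess the byte order. The function name `count_wafer_map_records` is illustrative and not part of any library.

```python
import struct
from collections import Counter
from typing import BinaryIO

# (REC_TYP, REC_SUB) codes for the records discussed above (STDF V4).
WAFER_MAP_RECORDS = {
    (1, 10): "MIR",
    (2, 10): "WIR",
    (2, 20): "WRR",
    (2, 30): "WCR",
    (5, 10): "PIR",
    (5, 20): "PRR",
}


def count_wafer_map_records(stream: BinaryIO) -> Counter:
    """Count record types by reading headers only and skipping payloads."""
    head = stream.read(4)
    if len(head) < 4:
        raise ValueError("not an STDF stream: shorter than one record header")
    # The first record is the FAR, whose REC_LEN is 2. If the little-endian
    # reading of the length is not 2, assume the file is big-endian.
    endian = "<" if struct.unpack("<H", head[:2])[0] == 2 else ">"
    counts: Counter = Counter()
    while len(head) == 4:
        rec_len, rec_typ, rec_sub = struct.unpack(endian + "HBB", head)
        counts[WAFER_MAP_RECORDS.get((rec_typ, rec_sub), "other")] += 1
        stream.seek(rec_len, 1)  # skip the record payload
        head = stream.read(4)
    return counts
```

A file with many PRR records but no WCR entry in the result is an early sign that the coordinate system will have to be inferred.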
Parsing STDF: A Practical Starting Point
The most reliable open-source STDF parser for Python is pystdf, though it handles only the most common record types. For production use with files from older EWS equipment (particularly Teradyne J750 or Advantest T2000 variants), you may need to handle vendor-specific extensions. Here's a minimal extraction loop:
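from pystdf.Importer import ImportSTDF

# WF_UNITS code -> millimetres per unit (1=inches, 2=cm, 3=mm, 4=mils; 0=unknown)
MM_PER_WF_UNIT = {1: 25.4, 2: 10.0, 3: 1.0, 4: 0.0254}
MISSING_COORD = -32768  # STDF V4 marker for "no wafer coordinate"


def _scaled(value, factor):
    """Scale a WCR dimension; 0 or an unknown unit means 'not available'."""
    return value * factor if value and factor else None


def extract_die_map(stdf_path: str) -> dict:
    """
    Returns dict with keys:
        wafer_id: str
        die_map: list of {x, y, hbin, sbin}
        wafer_diameter_mm: float | None
        die_pitch_x_um: float | None
        die_pitch_y_um: float | None
    Assumes one wafer per file; group dies per WIR/WRR block otherwise.
    """
    # ImportSTDF returns (record_type, field_values) tuples. This follows
    # pystdf's Importer module; verify it against your installed version.
    records = ImportSTDF(stdf_path)
    wafer_id = None
    die_entries = []
    wcr_data = {}
    for rec_type, values in records:
        rec_name = type(rec_type).__name__.upper()
        rec_data = {name: val for (name, _), val in zip(rec_type.fieldMap, values)}
        if rec_name == 'WIR':
            wafer_id = rec_data.get('WAFER_ID', '')
        elif rec_name == 'PRR':
            if MISSING_COORD in (rec_data['X_COORD'], rec_data['Y_COORD']):
                continue  # part was tested without a wafer position
            die_entries.append({
                'x': rec_data['X_COORD'],
                'y': rec_data['Y_COORD'],
                'hbin': rec_data['HARD_BIN'],
                'sbin': rec_data['SOFT_BIN'],
            })
        elif rec_name == 'WCR':
            # WAFR_SIZ, DIE_WID and DIE_HT are all expressed in WF_UNITS.
            mm = MM_PER_WF_UNIT.get(rec_data.get('WF_UNITS'))
            um = mm * 1000 if mm else None
            wcr_data = {
                'diameter_mm': _scaled(rec_data.get('WAFR_SIZ'), mm),
                'pitch_x': _scaled(rec_data.get('DIE_WID'), um),
                'pitch_y': _scaled(rec_data.get('DIE_HT'), um),
                'center_x': rec_data.get('CENTER_X'),
                'center_y': rec_data.get('CENTER_Y'),
            }
    return {
        'wafer_id': wafer_id,
        'die_map': die_entries,
        'wafer_diameter_mm': wcr_data.get('diameter_mm'),
        'die_pitch_x_um': wcr_data.get('pitch_x'),
        'die_pitch_y_um': wcr_data.get('pitch_y'),
    }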
Building the Request JSON
The Wafertune POST /v1/classify endpoint accepts two map representations: a structured die_array (preferred) or a base64-encoded PNG wafer map image. The die array representation preserves bin information and is more accurate for multi-bin maps.
import requests


def classify_wafer(stdf_path: str, api_key: str) -> dict:
    extracted = extract_die_map(stdf_path)
    payload = {
        "wafer_id": extracted["wafer_id"],
        "map_type": "die_array",
        "map_data": {
            "format": "xy_hbin",
            "dies": extracted["die_map"],  # list of {x, y, hbin, sbin}
            # The key is always present but may be None, so fall back explicitly.
            "wafer_diameter_mm": extracted.get("wafer_diameter_mm") or 200,
            "die_pitch_x_um": extracted.get("die_pitch_x_um"),
            "die_pitch_y_um": extracted.get("die_pitch_y_um"),
        },
        "options": {
            "multi_label": True,     # detect compound patterns
            "include_bbox": True,    # spatial bounding boxes per pattern
            "include_process_hints": True,
        },
    }
    resp = requests.post(
        "https://api.wafertune.com/v1/classify",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()
The response will contain a pattern_classes array. Each element has class_id, confidence, bbox (x, y, width, height in die coordinates), and process_origin_hint.
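To make the response shape concrete, here is a small sketch that turns the pattern_classes array into typed objects, strongest pattern first. The field names come from the description above; treating bbox as a dict with x, y, width and height keys, and treating process_origin_hint as optional, are assumptions about the wire format, and `PatternClass` and `parse_patterns` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PatternClass:
    class_id: Any
    confidence: float
    # Assumed shape: {"x": ..., "y": ..., "width": ..., "height": ...} in die coordinates.
    bbox: Optional[Dict[str, float]] = None
    process_origin_hint: List[Any] = field(default_factory=list)


def parse_patterns(response: Dict[str, Any]) -> List[PatternClass]:
    """Convert response['pattern_classes'] into objects sorted by confidence."""
    patterns = [
        PatternClass(
            class_id=item["class_id"],
            confidence=float(item["confidence"]),
            bbox=item.get("bbox"),
            process_origin_hint=list(item.get("process_origin_hint") or []),
        )
        for item in response.get("pattern_classes", [])
    ]
    return sorted(patterns, key=lambda p: p.confidence, reverse=True)
```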
Common Pitfalls
Coordinate system mismatch
STDF PRR records report X/Y die coordinates relative to a reference point that varies by equipment vendor. Some testers use center-referenced coordinates (0,0 at wafer center), others use corner-referenced. The WCR record's CENTER_X / CENTER_Y fields give the coordinates of the center die, and POS_X / POS_Y give the positive axis directions (L or R for X, U or D for Y). If you skip WCR normalization, you can send a mirrored or offset die map and get incorrect spatial classifications: edge patterns appear in the wrong position and scratch orientations are flipped. Always normalize coordinates to center-referenced before sending.
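A minimal sketch of that normalization, assuming die entries shaped like the {x, y, hbin, sbin} dicts used earlier and WCR values passed in as plain Python ints and strings. The target convention (origin at the center die, +X to the right, +Y up) is an assumption made for this example, as is the fallback to the midpoint of the coordinate range when the WCR gives no center. `normalize_die_map` is an illustrative name.

```python
from typing import Dict, List, Optional

MISSING_COORD = -32768  # STDF V4 marker for an unavailable coordinate


def normalize_die_map(
    die_map: List[Dict[str, int]],
    center_x: Optional[int] = None,
    center_y: Optional[int] = None,
    pos_x: str = " ",
    pos_y: str = " ",
) -> List[Dict[str, int]]:
    """Shift dies so the center die is (0, 0) and flip axes to +X right, +Y up."""
    if not die_map:
        return []
    xs = [d["x"] for d in die_map]
    ys = [d["y"] for d in die_map]
    # Assumption: with no usable center, take the midpoint of the coordinate range.
    if center_x in (None, MISSING_COORD):
        center_x = (min(xs) + max(xs)) // 2
    if center_y in (None, MISSING_COORD):
        center_y = (min(ys) + max(ys)) // 2
    # POS_X == 'L' means X increases to the left; POS_Y == 'D' means Y increases downward.
    sign_x = -1 if pos_x.strip().upper() == "L" else 1
    sign_y = -1 if pos_y.strip().upper() == "D" else 1
    return [
        {**d, "x": sign_x * (d["x"] - center_x), "y": sign_y * (d["y"] - center_y)}
        for d in die_map
    ]
```

Unknown axis directions (a blank POS_X or POS_Y) are left unflipped here, so a map from such a file is still worth a visual check against a known pattern before it is trusted.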
Missing WCR records
Older STDF files from legacy Advantest equipment often omit WCR. In this case, either estimate the wafer diameter from the PRR coordinate range and an assumed die pitch for the product (a few millimetres for 200mm analog wafers carrying 1,000–8,000 dies), or pass the known diameter directly. For a 200mm line, pass wafer_diameter_mm: 200 and omit the pitch fields; the API will infer pitch from the die map density.
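One way to express that fallback is to build map_data so that pitch keys are simply absent when the file gave no value. This is a sketch assuming the dict returned by extract_die_map earlier in this guide; `build_map_data` and its `default_diameter_mm` parameter are illustrative names.

```python
from typing import Any, Dict


def build_map_data(extracted: Dict[str, Any], default_diameter_mm: float = 200.0) -> Dict[str, Any]:
    """Build the map_data block, omitting pitch fields that could not be read."""
    map_data: Dict[str, Any] = {
        "format": "xy_hbin",
        "dies": extracted["die_map"],
        # Fall back to the line's known diameter when the WCR is missing.
        "wafer_diameter_mm": extracted.get("wafer_diameter_mm") or default_diameter_mm,
    }
    for key in ("die_pitch_x_um", "die_pitch_y_um"):
        value = extracted.get(key)
        if value is not None:
            map_data[key] = value
    return map_data
```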
Multi-bin maps and classification semantics
A common mistake when sending multi-bin maps is collapsing all non-bin-1 dies into a single "fail" category before sending. This discards bin semantics that carry spatial information. A bin-3 (HV leakage) cluster has a different spatial signature from a bin-7 (parametric contact resistance) cluster. Send the full hbin per die; the API's xy_hbin format preserves this and the classifier accounts for it.
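A quick local check can show what would be lost by collapsing bins. The sketch below groups failing dies by hardware bin and reports a count and centroid for each, assuming the {x, y, hbin} die entries used earlier and that bin 1 is the only pass bin. `summarize_fail_bins` is an illustrative helper, not part of the Wafertune API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_fail_bins(die_map: List[Dict[str, int]], pass_bins: Iterable[int] = (1,)) -> Dict[int, dict]:
    """Per failing hbin: die count and centroid in die coordinates."""
    pass_set = set(pass_bins)
    groups: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for die in die_map:
        if die["hbin"] not in pass_set:
            groups[die["hbin"]].append((die["x"], die["y"]))
    summary = {}
    for hbin, points in groups.items():
        n = len(points)
        summary[hbin] = {
            "count": n,
            "centroid": (sum(x for x, _ in points) / n, sum(y for _, y in points) / n),
        }
    return summary
```

If two bins have clearly separated centroids, merging them into one fail category would present the classifier with a single blended shape instead of two distinct clusters.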
Die count limits
The synchronous /v1/classify endpoint supports up to 50,000 dies per wafer map. Standard 200mm analog wafers with typical die sizes have 1,000–8,000 dies, well within the limit. MEMS wafers with very small die sizes can exceed 50,000 dies; for these, use the /v1/batch endpoint with chunked submission, or request a limit increase for high-die-count MEMS workflows. See the API Reference for the batch endpoint spec.
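A pipeline can make this decision before any request is sent. The sketch below picks an endpoint path from the die count and splits a large die list into fixed-size chunks. Only the 50,000 limit and the two endpoint paths come from the text above; the chunk size and the helper names are assumptions, and how /v1/batch expects chunks to be labelled and submitted is defined in the API Reference, not here.

```python
from typing import Iterator, List

SYNC_DIE_LIMIT = 50_000  # limit of the synchronous endpoint, per the text above


def endpoint_for(die_count: int) -> str:
    """Return the endpoint path appropriate for a wafer map of this size."""
    return "/v1/classify" if die_count <= SYNC_DIE_LIMIT else "/v1/batch"


def chunk_dies(dies: List[dict], chunk_size: int = SYNC_DIE_LIMIT) -> Iterator[List[dict]]:
    """Yield consecutive slices of the die list, each at most chunk_size long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(dies), chunk_size):
        yield dies[start:start + chunk_size]
```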
Handling the Response in Your Pipeline
For a yield management system integration, the typical downstream routing is:
# send_yield_alert and queue_for_review stand in for your own
# yield management hooks; they are not part of the Wafertune API.
def route_classification(response: dict, alert_threshold: float = 0.75):
    """Route a Wafertune API response to yield management actions."""
    patterns = response.get("pattern_classes", [])
    review_flag = response.get("review_recommended", False)

    high_conf_patterns = [
        p for p in patterns
        if p["confidence"] >= alert_threshold
    ]
    low_conf_patterns = [
        p for p in patterns
        if p["confidence"] < alert_threshold
    ]

    for pattern in high_conf_patterns:
        # Route to automated alert queue
        send_yield_alert(
            wafer_id=response["wafer_id"],
            class_id=pattern["class_id"],
            confidence=pattern["confidence"],
            process_hint=pattern.get("process_origin_hint", []),
        )

    if review_flag or low_conf_patterns:
        # Route to human review queue
        queue_for_review(
            wafer_id=response["wafer_id"],
            reason="low_confidence" if not review_flag else "ambiguous_pattern",
        )
The review_recommended flag in the response is set by the API when the top two class confidences are within 0.08 of each other, or when overall confidence is below 0.60. Don't suppress this flag in your pipeline — it's the mechanism by which the classifier defers to human judgment on genuinely ambiguous patterns.
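For testing a pipeline against mocked responses, the stated rule can be approximated locally. This sketch is not the API's implementation: it assumes "overall confidence" means the highest class confidence, treats "within 0.08" as inclusive, and returns False when no patterns are present. `would_recommend_review` is an illustrative name.

```python
from typing import Sequence


def would_recommend_review(
    confidences: Sequence[float],
    margin: float = 0.08,
    floor: float = 0.60,
) -> bool:
    """Approximate the review_recommended rule from a list of class confidences."""
    if not confidences:
        return False  # assumption: nothing detected, nothing to review
    ranked = sorted(confidences, reverse=True)
    if ranked[0] < floor:
        return True  # top confidence is too low overall
    # Ambiguous when the runner-up is close to the top class.
    return len(ranked) > 1 and (ranked[0] - ranked[1]) <= margin
```

In production, rely on the flag returned by the API rather than on a local re-computation like this one.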
A Note on SECS-GEM Integration
Some fabs prefer to trigger classification from equipment events via SECS-GEM rather than from a post-sort batch job. Wafertune's API works equally well in both patterns, since the interface is the same REST call. The SECS-GEM integration pattern typically involves a thin middleware layer that subscribes to S6F11 (Event Report Send) messages from the EWS tester at WRR completion, extracts the die map from the equipment's local STDF, and calls the API. We don't publish a reference SECS-GEM adapter yet, but it's on the roadmap. In the meantime, the STDF extraction pattern described above works as the middleware's extraction layer. If you're building this integration and want to compare notes, the contact form routes directly to the engineering team.
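The middleware's core logic can be sketched independently of any particular SECS-GEM library. Everything in this example is hypothetical: the collection event ID, the handler signature, and the two injected callables (one that locates the STDF file for a wafer, one that performs classification, such as classify_wafer with the API key already bound). A real adapter would call this handler from its S6F11 message callback.

```python
from typing import Callable, Optional

# Hypothetical collection event ID for "wafer test complete"; the real value
# is defined by the tester's GEM configuration.
WAFER_COMPLETE_CEID = 1001


def make_wafer_complete_handler(
    classify: Callable[[str], dict],
    stdf_path_for: Callable[[str], str],
) -> Callable[[int, str], Optional[dict]]:
    """Build a handler that classifies a wafer when its completion event arrives."""

    def handle_event(ceid: int, wafer_id: str) -> Optional[dict]:
        if ceid != WAFER_COMPLETE_CEID:
            return None  # ignore unrelated collection events
        return classify(stdf_path_for(wafer_id))

    return handle_event
```

Injecting the two callables keeps the event handling testable without a tester or network connection.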