Run Settings
Model
Choose the foundational model for your task. Models vary in capabilities (reasoning, creativity), context window size, speed, and cost.
Considerations:
- Capability: Pro versions generally offer superior reasoning and instruction following. "Preview" or "Experimental" models offer the latest features but may change.
- Use Cases:
- Flash: Ideal for high-volume, low-latency tasks (e.g., quick summarization, chat).
- Pro: Good for complex tasks requiring deep understanding (e.g., detailed analysis, creative content).
- Context Window: The amount of information (text, images, etc.) the model can process. Larger is better for complex prompts or long documents.
- Speed & Cost: Flash models are faster and often cheaper. Pro models are more powerful but may have higher latency/cost.
Token count 0 / 1,048,576
Tracks tokens used by your input (prompt, files) against the model's limit. 1 token ≈ 4 chars or 0.75 words in English.
Understanding Tokens:
- Both your input (prompt, files, images) and the model's output consume tokens.
- Exceeding limits causes errors or truncated output. Monitor closely, especially with large inputs.
- Images also consume tokens, varying by size/complexity.
- Pitfall: Forgetting that system instructions and previous conversation turns also count towards tokens in multi-turn scenarios.
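As a rough illustration of the ≈4 characters / ≈0.75 words heuristics above, a quick client-side estimator might look like the sketch below. This is approximate only and for English prose; the API's own token counter is authoritative.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 chars and ~0.75 words per-token
    heuristics for English. Approximate only; use the API's token
    counter for exact numbers."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    # Average the two heuristics for a slightly more stable estimate.
    return round((by_chars + by_words) / 2)

prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))  # roughly 12
```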
Temperature
Controls output randomness. Low = focused & deterministic. High = creative & diverse. Affects token probability scaling.
Typical Ranges & Use Cases:
- 0.0 - 0.3 (Low): More factual, predictable. Good for code generation, summarization, Q&A, data extraction. Output is highly consistent.
- 0.4 - 0.7 (Balanced): Good for general tasks, creative writing with some constraints.
- 0.8 - 1.0+ (High, up to 2.0): Highly creative, diverse, surprising. Good for brainstorming, poetry, novel ideas. Risk of incoherence or topic drift increases.
- 0 = greedy decoding (always picks most probable token).
Common Pitfalls:
- Setting too high for factual tasks can lead to "hallucinations" (made-up information).
- Setting too low for creative tasks can result in repetitive or dull output.
Advanced Tip:
"Temperature Annealing": Start with a higher temperature for initial creative generation, then reduce it for subsequent refinement passes on the generated text.
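The probability rescaling described above can be sketched as a temperature-scaled softmax over a handful of token logits. This is illustrative only; real decoders operate over the model's full vocabulary.

```python
import math

def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to a probability distribution, scaled by temperature.
    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random)."""
    if temperature == 0:
        # Greedy decoding: all probability mass on the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(apply_temperature(logits, 0.2))  # sharply peaked on the first token
print(apply_temperature(logits, 1.5))  # noticeably flatter
```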
Tools
Structured output
Ensures model responses conform to a specific data structure (e.g., JSON schema). Vital for data extraction and system integration.
Use Cases:
- Extracting entities (names, dates, products) consistently.
- Converting natural language to structured API calls.
- Generating data for databases or applications.
Example schema for a user:
{"name": "string", "email": "string"}
Common Pitfalls:
- Overly complex schemas may be difficult for the model to follow perfectly.
- Schema must exactly match the desired output structure.
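Whichever schema you use, it is worth validating the model's output before passing it downstream. A minimal sketch, where the SCHEMA type map and validate helper are hypothetical (production code should use a real JSON Schema validator library):

```python
import json

# Flat type map mirroring the example user schema above (hypothetical helper).
SCHEMA = {"name": str, "email": str}

def validate(raw: str, schema: dict) -> dict:
    """Parse model output and check it matches the expected flat schema."""
    data = json.loads(raw)
    for field, expected_type in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise TypeError(f"{field} should be {expected_type.__name__}")
    return data

print(validate('{"name": "Jane Doe", "email": "[email protected]"}', SCHEMA))
```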
Code execution
Enables the model to run code (primarily Python) in a sandboxed environment to perform calculations, data analysis, or simple scripting.
Important Notes:
- Supports Python in a restricted, sandboxed environment (no network, limited libraries).
- The model decides, based on the prompt, whether to generate and execute code, then incorporates the results into its response.
- Always review executed code and outputs for accuracy and safety.
- Useful for math, data manipulation, simple algorithms.
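For instance, a prompt like "What are the mean and standard deviation of these sales figures?" might lead the model to generate and run a short script along these lines (illustrative only; the actual generated code will vary):

```python
import statistics

# The kind of short script a model might generate and execute in the sandbox.
sales = [120, 135, 98, 142, 110]
mean = statistics.mean(sales)
stdev = statistics.stdev(sales)
print(f"mean={mean}, stdev={stdev:.2f}")
```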
Function calling
Define custom functions (tools) that the model can invoke to interact with external APIs, fetch real-time data, or perform specialized actions.
How it Works:
- Provide function declarations (name, description, parameters with types).
- Model returns JSON with function name & arguments if it decides to call one.
- Your application executes the function and returns the result to the model.
- Advanced Tip: Design function descriptions clearly and make functions idempotent where possible.
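The declare/invoke/return loop above can be sketched as follows. The message format and the get_weather tool here are simplified stand-ins for illustration, not the SDK's actual wire format:

```python
import json

# Hypothetical local "tool" the application exposes to the model.
def get_weather(city: str) -> dict:
    # A real app would call a weather API here.
    return {"city": city, "temp_c": 18, "conditions": "cloudy"}

TOOLS = {"get_weather": get_weather}

def handle_model_turn(model_message: str) -> str:
    """If the model requested a function call, execute it and return the
    result as JSON to feed back to the model; otherwise pass text through."""
    msg = json.loads(model_message)
    if "function_call" in msg:
        call = msg["function_call"]
        result = TOOLS[call["name"]](**call["args"])
        return json.dumps({"function_response": result})
    return msg["text"]

reply = handle_model_turn(
    '{"function_call": {"name": "get_weather", "args": {"city": "Paris"}}}'
)
print(reply)
```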
Grounding (via Google Search)
Source: Google Search
Enhances responses by fetching and incorporating information from Google Search, providing more current and factual answers.
Benefits:
- Access to recent events and information beyond the model's training data cutoff.
- Improved factuality and reduced hallucination for topics needing current knowledge.
- Citations may be provided, indicating search result sources.
URL context
Provide URLs in prompt; model attempts to fetch and use their content. Useful for summarizing articles or web-based Q&A.
Use Cases & Limitations:
- Summarize news articles or documentation pages.
- Answer questions based on specific web content.
- Up to 20 URLs (experimental, verify current limit).
- Pitfalls: May fail on paywalled sites, complex JS-rendered pages, or very large content.
Advanced Settings
Top P
Nucleus sampling: samples from the smallest set of tokens whose cumulative probability reaches P. Lower P = more focused, higher P = more diverse. Alternative to Temperature.
Understanding Top P:
- 1.0: All tokens considered (effectively disabled).
- 0.1: Only tokens in top 10% probability mass are chosen from.
- Lowering Top P reduces diversity, can improve coherence. Often used with a fixed, moderate Temperature.
- If both Temp & Top P are set, Top P is usually applied first.
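Nucleus sampling as described above can be sketched over a toy four-token distribution (real models work over the whole vocabulary):

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p, then renormalize. Illustrative sketch of nucleus sampling."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: pr / total for tok, pr in kept.items()}

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xylophone": 0.05}
print(top_p_filter(probs, 0.75))  # keeps "the" and "a" only
```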
Top K
(Model-dependent) Considers only K most probable tokens. Lower K = focused, higher K = random. Less common than Top P.
Note: Top K availability and behavior can vary by model.
Understanding Top K:
- 1: Greedy decoding (same as Temp 0).
- 0: Disables Top K (or not supported by model).
- Reduces chance of picking unlikely tokens, can improve coherence.
- Using Temp, Top P, and Top K aggressively together can over-constrain output. Often, one or two are primary.
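For comparison with Top P, a Top K filter over the same kind of toy distribution might look like this sketch:

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most probable tokens and renormalize.
    k=1 is greedy decoding; k=0 is treated as 'disabled' here."""
    if k <= 0:
        return dict(probs)  # Top K disabled: consider every token
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])[:k]
    total = sum(prob for _, prob in ranked)
    return {tok: prob / total for tok, prob in ranked}

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xylophone": 0.05}
print(top_k_filter(probs, 1))  # {'the': 1.0} -- greedy
print(top_k_filter(probs, 2))
```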
Safety settings
Configure blocking thresholds for Harassment, Hate speech, Sexually explicit, and Dangerous content. Adjust sensitivity per category.
Threshold Options:
- Block most: Highest sensitivity, blocks more potentially harmful content.
- Block some: Balanced approach (often default).
- Block few: Lowest sensitivity, allows more but may include harmful responses.
- Block none: (Availability varies) Disables blocking for the category.
- Important: Carefully consider implications. Even with "Block none," API usage policies apply.
Add stop sequence
Specify up to 5 character sequences. If the model generates one, it stops output at that point. The sequence itself is not included in the response.
Usage & Pitfalls:
- Controls output length/format (e.g., stop at "END_OF_RESPONSE").
- Sequences are case-sensitive.
- Pitfall: Choosing a sequence that might naturally appear in desired output, causing premature truncation.
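Stop sequences are normally applied server-side, but the truncation behavior described above can be sketched client-side:

```python
def apply_stop_sequences(text: str, stops: list[str]) -> str:
    """Truncate text at the earliest occurrence of any stop sequence.
    The sequence itself is excluded, matching typical API behavior.
    Matching is case-sensitive."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "Here is the answer.\nEND_OF_RESPONSE\n(extra text the model rambled on)"
print(apply_stop_sequences(raw, ["END_OF_RESPONSE"]))
```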
Output length 8192
Max tokens model can generate in response (not including prompt tokens). Actual output can be shorter.
Considerations:
- Model may finish or hit stop sequence before max length.
- Too low truncates responses; too high increases time/cost.
- Max value depends on the selected model. Note that the 1,048,576 figure in the token count is the model's total context window, not the output limit; 8192 is a common default maximum output length.
Interplay of Settings
Temperature, Top P, and Top K all control output randomness by influencing next-token selection. Understanding their interaction is key for fine-tuning.
Temperature vs. Top P:
- Temperature: Rescales probabilities of all potential next tokens. High temp flattens distribution (more randomness); low temp sharpens it (less randomness).
- Top P (Nucleus Sampling): Selects the smallest set of tokens whose cumulative probability is >= P. Ignores tokens outside this "nucleus." More adaptive than Top K.
- Common Strategy: Many users pick one primary method. E.g., set Top P to a high default (0.9-0.95) and adjust Temperature. Or, set Temperature moderately (0.7-0.9) and tune Top P. Setting Temperature to 0 effectively makes Top P/K irrelevant (always picks the single most likely token).
Top K vs. Top P:
- Top K: Selects from a fixed number (K) of the most probable next tokens. Simpler but less adaptive than Top P.
- If the probability distribution is very flat (many tokens have similar probabilities), Top K might include low-probability tokens if K is large. If it's very steep, Top K might be very restrictive if K is small. Top P adapts to the shape of this distribution.
Example Scenarios:
- Factual Q&A: Temp=0.2, Top P=1.0 (or not set). Result: Focused, deterministic.
- Creative Brainstorming: Temp=0.9, Top P=0.95. Result: Diverse, novel ideas.
- Over-Constrained (Avoid): Temp=0.1, Top P=0.1, Top K=1. Result: Very limited, repetitive, likely poor quality.
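One plausible ordering of these controls (Top K filter, then Top P, then temperature-scaled sampling) can be sketched as below; actual decoder internals vary by model and are not documented in full, so treat this as a conceptual model only.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float,
                      top_k: int, top_p: float, seed: int = 0) -> str:
    """Sketch of one common ordering: filter by Top K, then Top P,
    then sample from the temperature-scaled distribution."""
    ranked = sorted(logits.items(), key=lambda kv: -kv[1])
    if top_k > 0:
        ranked = ranked[:top_k]
    # Softmax over the surviving logits.
    m = max(v for _, v in ranked)
    exps = {tok: math.exp(v - m) for tok, v in ranked}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Top P: keep the smallest nucleus reaching cumulative probability p.
    kept, cumulative = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = pr
        cumulative += pr
        if cumulative >= top_p:
            break
    if temperature == 0:
        return max(kept, key=kept.get)  # greedy decoding
    # Temperature-scaled sampling over the nucleus.
    weights = [pr ** (1 / temperature) for pr in kept.values()]
    return random.Random(seed).choices(list(kept), weights=weights)[0]

logits = {"the": 2.0, "a": 1.2, "an": 0.4, "xylophone": -3.0}
# Over-constrained settings: only one token can ever be chosen.
print(sample_next_token(logits, temperature=0.1, top_k=1, top_p=0.1))
```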
Curated Prompt Examples
Task: Creative Story Snippet
Base Prompt: "Write a short, mysterious opening for a story set in an old library."
Variation 1 (Focused): Temp=0.3, Top P=0.9
"The dust motes danced in the single shaft of moonlight that pierced the gloom of the Grand Library. A tome, bound in cracked leather, lay open on a pedestal, its pages whispering secrets only the silence understood."
Variation 2 (Imaginative): Temp=0.9, Top P=0.95
"Old Man Hemlock swore the library breathed. Tonight, as a rogue constellation winked through the grimy oriel window, the very ink in the forgotten grimoires seemed to squirm, yearning for a reader brave, or foolish, enough to translate their sighing script."
Task: JSON Data Extraction
Base Prompt: "Extract the name and email from this text: 'Contact Jane Doe at [email protected] for more info.' Your output must be a JSON object."
Settings: Temp=0.1, Structured Output (Schema: {"name": "string", "email": "string"})
{ "name": "Jane Doe", "email": "[email protected]" }
Note: Forcing JSON via prompt instruction is good, but using the 'Structured Output' tool with a schema is more reliable.