A Practical Approach to Humanitarian AI Safeguarding and Guardrails
The SignpostAI platform, grounded in approved knowledge and using prompt-based safety guardrails
The real work of humanitarian AI is mobilizing human experts to balance safety with impact
Introduction
There's no shortage of AI ethics frameworks, principles documents, or theoretical guidelines. But the reality we've learned is simpler and harder: AI safety is built by humans with hands-on expertise making adaptive decisions in real contexts.
Abstract policies matter less than the daily choices made by technologists who understand both the engineering and the human consequences. Safety isn't a switch you flip on—it's a continuous practice that requires:
Technical depth and “capital E” Engineering that goes beyond generalist knowledge
Subject matter expertise to evaluate output quality in context
Interdisciplinary collaboration between engineers, program staff, data protection experts, and leadership
Adaptive thinking because perfect theoretical guardrails often fail in practice as context evolves
This is why our approach centers on applied AI safety through human review and iterative automation—embedding safety practices directly into implementation teams rather than treating governance and deployment as separate functions.
Humanitarian AI Guardrail Principles
AI systems handling humanitarian information operate in complex territory. They might answer straightforward questions about program data one moment, then face queries about politically sensitive conflicts, organizational challenges, or donor relationships the next.
All this means that guardrails have to balance the generality that makes AI worth its impact and price against the safeguarding required for an implementation to be trustworthy. From dozens of applied conversational, geospatial, and workflow-automation deployments of AI, we assert the following principles about guardrails:
The best theoretical guardrails are usually wrong and need to be adaptive, because reasoning, especially when grafted onto reality, is messy.
Guardrails change as context changes. A guardrail that does not change with context loses utility and provides diminishing safety.
Guardrails cannot be properly assessed without centralized technical expertise that goes beyond generalist skillsets. This is not your normal engineer but a conscientious engineer with a very specific engineering skillset: someone who can make sense of how an agent reaches its conclusions, from the uninterpretable black box of the foundation model through to its applied application (e.g., an education assistant).
Centralized technical experts who provide safety and guardrail advisory need applied checks and balances from A) engineering, B) subject matter expertise, C) data protection and organizational ethics, and D) leadership who can determine program utility and return on investment. This skillset is both human and technical, and it needs to be embedded into, yet independent from, engineering and the actual digital platforms.
AI guardrails must be designed to preserve both safety and utility. Excessive constraints can undermine the system’s purpose, while insufficient safeguards can compromise trust and safety, leading to real and mitigable human harm.
Essential to this is that safety, governance, and compliance cannot function outside of the practical application of engineering systems. Grounding in shared tools and technical language is a prerequisite to good governance of orchestrated AI systems. If safety professionals cannot “look under the hood” of the AI, then proper safeguarding has not been conducted.
Types of Guardrails
Guardrail 1: Prompt Guardrails
Instructions that shape the AI's behavior and tone—teaching it when to cite sources, when to acknowledge uncertainty, and when to defer to human judgment.
ex. On an escalation match, such as a life-threatening situation, immediately send the category’s response to the LLM.
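As a minimal sketch of this pattern (the category names, keywords, and build_prompt helper are illustrative, not SignpostAI's production logic), a prompt guardrail can be expressed as standing system instructions plus an escalation check that injects the category's vetted response before the model is called:

```python
# Illustrative sketch of a prompt guardrail; categories and keywords are hypothetical.
ESCALATION_CATEGORIES = {
    "life_threatening": {
        "keywords": ["suicide", "overdose", "immediate danger"],
        "response": "Please contact local emergency services now. A caseworker has been notified.",
    },
}

SYSTEM_PROMPT = (
    "Cite an approved source for every factual claim. "
    "If you are uncertain, say so explicitly. "
    "Defer to human caseworkers on legal, medical, or protection decisions."
)

def build_prompt(user_message: str) -> list[dict]:
    """Assemble the LLM messages, injecting a vetted category response on escalation match."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for category, rule in ESCALATION_CATEGORIES.items():
        if any(kw in user_message.lower() for kw in rule["keywords"]):
            # On escalation match, the category's pre-approved response is sent to the LLM
            # as additional grounding rather than letting the model improvise.
            messages.append({
                "role": "system",
                "content": f"[{category}] Include this vetted guidance verbatim: {rule['response']}",
            })
    messages.append({"role": "user", "content": user_message})
    return messages
```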
Guardrail 2: Retrieval Guardrails
Rules that determine what information gets surfaced and how it's weighted. Should internal institutional knowledge take priority? How do we balance organizational perspective with external expertise? Should we cap our knowledge at a certain similarity (Sim) or confidence score?
ex. Upon retrieval, only send the AI articles that have a similarity (confidence) score above 0.79.
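A minimal sketch of that retrieval rule, assuming a hypothetical retriever that returns scored candidates; the 0.79 floor comes from the example above, while the internal-source boost and field names are illustrative:

```python
from dataclasses import dataclass

SIMILARITY_FLOOR = 0.79  # threshold from the example above
INTERNAL_BOOST = 1.1     # hypothetical weighting in favor of institutional sources

@dataclass
class Retrieved:
    title: str
    text: str
    similarity: float   # similarity / confidence score from the vector store
    internal: bool      # True if from the approved internal knowledge base

def apply_retrieval_guardrail(candidates: list[Retrieved], top_k: int = 5) -> list[Retrieved]:
    """Keep only high-confidence matches and weight internal knowledge above external sources."""
    kept = [c for c in candidates if c.similarity > SIMILARITY_FLOOR]
    kept.sort(
        key=lambda c: c.similarity * (INTERNAL_BOOST if c.internal else 1.0),
        reverse=True,
    )
    return kept[:top_k]
```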
Guardrail 3: Response Guardrails
Logic, often LLM-as-judge, that reviews outputs before delivery: adding context, flagging assumptions, or requesting human review when stakes are high. The art is in the calibration. Too strict, and you build a system that frustrates users and limits learning. Too loose, and you risk misinformation or reputational harm.
ex. After an AI agent generates an answer about a user’s migration status, use an LLM-as-judge to determine whether that information over-asserts itself as legal advice.
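A sketch of the LLM-as-judge pattern described above; the call_llm callable and the rubric wording are placeholders rather than a specific vendor API:

```python
JUDGE_RUBRIC = (
    "You are reviewing a draft answer about a user's migration status. "
    "Reply ESCALATE if the draft asserts itself as legal advice, states eligibility "
    "as fact, or omits a referral to a qualified caseworker. Otherwise reply PASS."
)

def response_guardrail(draft_answer: str, call_llm) -> tuple[str, bool]:
    """Review a draft with an LLM-as-judge; hedge and flag for human review instead of blocking outright."""
    verdict = call_llm(system=JUDGE_RUBRIC, user=draft_answer).strip().upper()
    if verdict.startswith("ESCALATE"):
        hedged = (
            draft_answer
            + "\n\nNote: this is general information, not legal advice. "
              "A caseworker will review your question."
        )
        return hedged, True   # True routes the exchange to a moderator queue
    return draft_answer, False
```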
Common Tradeoffs and Recommendations
A. Transparency vs. Control
We should disclose when AI responses are modeled assumptions rather than objective information
Pros
Builds staff trust through transparency.
Encourages critical thinking and safe internal use.
Allows researchers to evaluate bias and modeling quality.
Cons
Adds cognitive load; every answer may feel uncertain.
May overwhelm or confuse users if overused.
Risks undermining confidence in legitimate factual outputs.
Recommendation: Include a lightweight disclosure system (e.g., “This summary includes modeled assumptions XYZ”) for inference-heavy answers, but not for routine retrieval-based responses that have clearly attributed internal sources.
Our commitment: Users should always understand when they're getting institutional knowledge versus AI inference.
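One way to implement the lightweight disclosure above, assuming the orchestration layer already tracks how much of an answer came from inference versus attributed retrieval (the inference_ratio metric here is hypothetical):

```python
DISCLOSURE = "This summary includes modeled assumptions; verify against primary sources."

def add_disclosure(answer: str, inference_ratio: float, has_internal_citation: bool) -> str:
    """Prepend a disclosure only for inference-heavy answers, not routine retrieval-based ones."""
    if inference_ratio > 0.5 and not has_internal_citation:
        return f"{DISCLOSURE}\n\n{answer}"
    return answer
```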
B. Weighting of Internal Content vs. External Sources
Internal content should be prioritized through similarity scoring
Pros
Ensures alignment with IRC’s official voice and policies.
Reduces hallucination and inconsistent tone.
Simplifies auditability and provenance tracking.
Cons
Could downrank important neutral or external expert sources.
May amplify outdated or narrow institutional knowledge and downplay the utility of the AI.
Risk of internal echo chambers (“organizational blindness”).
Recommendation: Default to 80/20 weighting (internal > external) for sensitive topics; use context flags (e.g., #advocacy, #partnerships) to dynamically adjust weighting in orchestration logic.
Our commitment: Ensure that all content has provenance (validation and accuracy) by balancing the value of the open internet with safely validated and prepared content.
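A sketch of how the default 80/20 weighting and context flags might sit in orchestration logic; the flag names mirror the examples above, and the ratios are illustrative rather than recommended settings:

```python
DEFAULT_WEIGHTS = {"internal": 0.8, "external": 0.2}   # 80/20 default for sensitive topics

# Context flags that shift the balance; values are illustrative, not production settings.
FLAG_OVERRIDES = {
    "#advocacy": {"internal": 0.9, "external": 0.1},
    "#partnerships": {"internal": 0.6, "external": 0.4},
}

def source_weights(context_flags: set[str]) -> dict[str, float]:
    """Return the internal/external weighting for a query, adjusted by context flags."""
    weights = dict(DEFAULT_WEIGHTS)
    for flag in context_flags:
        if flag in FLAG_OVERRIDES:
            weights = dict(FLAG_OVERRIDES[flag])   # last matching flag wins in this sketch
    return weights

# Example: a partnerships query leans further on external expertise.
assert source_weights({"#partnerships"})["external"] == 0.4
```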
C. Guardrail Strength vs. LLM Utility
Foundation models should be constrained sparingly with negative constraint prompts in “spicy” contexts
Pros
Allow richer, more exploratory conversation.
Higher relevance for nuanced research questions.
May expose staff to more candid critiques (useful for learning); this also lets staff learn what users care about.
Cons
Harder to prevent misrepresentation of org policy.
Less consistent tone and retrieval when sources are not tightly controlled.
Harder to moderate outputs at scale.
Recommendation:
We apply graduated guardrails:
High-risk topics (active crises, sensitive partnerships) → strict controls, human oversight
Medium-risk areas (policy questions, strategic decisions) → balanced guardrails with clear sourcing
Routine operations → flexible responses that maximize utility
Recommendation: For low-risk content, use a steadfast global rule: any information drawn from an internet source is fair game as long as it is attributed to that source.
Our commitment: Apply safety measures proportionate to actual risk rather than uniformly, so that we can realize returns from AI’s utility and stay safe with a minimal margin for error.
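The graduated tiers above can be expressed as a small configuration table consumed by the orchestration layer; the topic tags and settings below are illustrative only:

```python
GUARDRAIL_TIERS = {
    "high_risk": {          # active crises, sensitive partnerships
        "human_review": True,
        "similarity_floor": 0.85,
        "allow_open_internet": False,
    },
    "medium_risk": {        # policy questions, strategic decisions
        "human_review": False,
        "similarity_floor": 0.79,
        "allow_open_internet": True,   # permitted, but every claim must carry attribution
    },
    "routine": {            # low-risk operational queries
        "human_review": False,
        "similarity_floor": 0.6,
        "allow_open_internet": True,
    },
}

def tier_for(topic_tags: set[str]) -> str:
    """Map topic tags to a guardrail tier; unrecognized topics fall through to routine in this sketch."""
    if topic_tags & {"active_crisis", "sensitive_partnership"}:
        return "high_risk"
    if topic_tags & {"policy", "strategy"}:
        return "medium_risk"
    return "routine"
```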
D. Escalation Logic vs. Autonomy
“Red zone” or high-risk questions (e.g., internal criticism, legal topics) should automatically trigger human escalation
Pros
Offloads risk to a human reviewer, who may or may not provide a better response.
Protects brand and legal standing.
Encourages internal accountability for risky responses.
Cons
Slows knowledge access; can feel “censored.”
Users may work around guardrails if blocked too often.
Requires costly setup and staff time to answer such questions; determining who should answer and how it is resourced is expensive.
Recommendation: Implement a “soft escalation” pattern: the AI provides a neutral placeholder (“This topic requires verification or policy input”) and auto-tags the exchange for moderator review rather than refusing outright. It should still give an answer, but that answer should be hedged with the specific by-lines issued in prompts.
Our commitment: Build systems that enhance human judgment and optimize users’ convenience.
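A minimal sketch of the soft-escalation pattern, assuming a hypothetical moderation queue and an upstream red-zone classifier:

```python
PLACEHOLDER = "This topic requires verification or policy input; a moderator has been notified."

def soft_escalate(question: str, draft_answer: str, is_red_zone: bool, moderation_queue: list) -> str:
    """Return a hedged answer plus a moderator tag for red-zone questions, instead of refusing."""
    if not is_red_zone:
        return draft_answer
    moderation_queue.append({"question": question, "draft": draft_answer, "status": "pending_review"})
    # The user still gets an answer, but it is hedged with the prompt-issued by-line.
    return f"{PLACEHOLDER}\n\nPreliminary, unverified summary: {draft_answer}"
```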
E. Consistency vs. Adaptability
Prompts should be adaptive (based on context metadata) not static (institutionally hard-coded)
Pros
Improves personalization by region, role, and topic.
Enables finer-grained neutrality levels (e.g., Gaza vs. Sudan).
Cons
Adds complexity to auditing and debugging.
Risks inconsistent behavior across sessions and degradation of quality without upkeep.
Requires manual upkeep.
Recommendation: Use context-adaptive prompting controlled by a small number of policy archetypes (e.g., “Crisis Response,” “Internal Ethics,” “Public Comms”) only if human resources allow for continuous audits.
Our commitment: Flexibility in service of better outcomes, with transparency about how decisions are made.
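A sketch of context-adaptive prompting constrained to a few policy archetypes; the archetype names follow the recommendation above, while the metadata fields and prompt text are hypothetical:

```python
ARCHETYPE_PROMPTS = {
    "crisis_response": "Prioritize verified safety information. Never speculate about casualty figures.",
    "internal_ethics": "Treat all questions as confidential. Cite policy documents by name.",
    "public_comms":    "Use approved public language only. Flag anything outside the press-line library.",
}

def select_prompt(context: dict) -> str:
    """Choose one of a small number of audited archetype prompts from request metadata."""
    archetype = context.get("archetype", "public_comms")   # hypothetical metadata field
    base = ARCHETYPE_PROMPTS[archetype]
    region = context.get("region")
    # Region only appends a neutrality note; it never creates an un-audited prompt variant.
    if region:
        base += f" Apply the neutrality guidance on file for {region}."
    return base
```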
What This Means for the Humanitarian Sector
Too often, we equate safety with governance. We write policies, but fail to invest in the systems and practices that actually mitigate harm for end users.
Our call to action is clear: prioritize safety by developing practical techniques, testing applied methods, and openly sharing what works.
The humanitarian sector needs frameworks that function in the real world — in crisis zones, across cultures, and in service to the world’s most vulnerable communities. Big Tech will not build these for us because they lack exposure to context. It’s on us.
We are committed to open learning. Across our global operations, we are testing these approaches, documenting both successes and failures, and sharing insights so others can build on our experience. Our responsibility is to ensure this work remains ethical, transparent, and worthy of the trust placed in us by the people we serve.