
PREVHED Assistant: An AI-driven chatbot!

Advanced AI Support for Safe, Confidential Reporting & Guidance

1. Overview

What is the PREVHED assistant?

The PREVHED assistant is a multilingual, confidential chatbot developed for Higher Education Institutions (HEIs) to help students identify and report cases of sexual harassment and to guide them to the right first steps and support services.

Built on specialist-validated guidance, the PREVHED assistant offers empathetic, practical, and secure assistance, anytime, in the language of your campus.

  • Live Pilot: https://chat.prevhed.eu/
  • Feedback: User Questionnaire

2. Functionalities & Features

Key Functionalities

  • Case Identification: Helps students determine if an experience falls under harassment or related categories by analyzing their description.
  • First-Step Guidance: Provides actionable instructions on immediate steps (documentation, evidence preservation, reporting).
  • Referral System: Direct links to campus counselors, reporting offices, and local support services.
  • Multilingual Support: Available in all languages of the project’s HEI partners.
  • Dynamic Visualizations: Capable of generating charts to visualize data or trends where appropriate.
  • Live Web Analysis: Can browse and summarize live content from university web pages to provide up-to-date policy information.

Design Principles

  • Knowledge-Base Driven: Answers are strictly grounded in PREVHED modules and institutional policies.
  • Empathetic Tone: Responses are non-judgmental and designed to reduce distress.
  • Privacy-First: No personal information is required or stored. Communications are encrypted (SSL/TLS).
  • Safe Fallback: “I’m sorry, I’m not sure about that. Let me connect you to a support person.”

3. Technical Specifications

The PREVHED system is implemented as a high-performance architecture that pairs Google’s Gemma 3 with the AnythingLLM orchestration layer, forming a reasoning engine capable of complex logic, long-term memory, and multimodal output.

A. Core Inference Engine (LLM)

  • Model: Google Gemma 3 27B IT (Instruction Tuned)
  • Capabilities: A state-of-the-art open model with 27 billion parameters. It offers superior reasoning capabilities for complex harassment scenarios compared to smaller models.
  • Role: Generates empathetic responses, interprets vague user queries, generates visualization code, and summarizes complex legal/university documents.
  • Context Window: Large context window allows for extensive document analysis without losing the thread of conversation.
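As an illustration, the sketch below shows how such a model could be queried for a grounded answer. It assumes Gemma 3 27B IT is served locally behind an OpenAI-compatible endpoint (for example, Ollama on port 11434); the model name, URL, and system prompt are placeholders, not PREVHED’s production configuration.

```python
# Minimal sketch: asking a locally served Gemma 3 27B IT model for a grounded answer.
# Assumes an OpenAI-compatible endpoint (e.g. Ollama at localhost:11434); the model
# name "gemma3:27b", the URL, and the system prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

SYSTEM_PROMPT = (
    "You are the PREVHED assistant. Answer empathetically, ground every statement "
    "in the provided policy excerpts, and offer to connect the user to a support "
    "person when you are unsure."
)

def generate_answer(user_message: str, retrieved_context: str) -> str:
    """Combine retrieved policy text with the user's question and query the model."""
    response = client.chat.completions.create(
        model="gemma3:27b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Policy excerpts:\n{retrieved_context}\n\nQuestion: {user_message}"},
        ],
        temperature=0.3,
    )
    return response.choices[0].message.content
```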

B. Orchestration & Memory Layer

  • Framework: AnythingLLM
  • Role: Manages the RAG pipeline, document ingestion, and vector database connections.
  • Memory Type: Long-Term Memory. The system retains context across the conversation (and potentially across sessions when configured), allowing users to reference details they shared messages ago without repeating themselves.
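For orientation, here is a minimal sketch of how a frontend could forward a user message to an AnythingLLM workspace over its developer REST API. The endpoint path, the "mode" value, and the response field name are assumptions based on AnythingLLM’s published API and should be verified against the Swagger docs of your own instance.

```python
# Minimal sketch: sending a user message to an AnythingLLM workspace via its
# developer API. Host, API key, workspace slug, and response field names are
# assumptions; check your instance's /api/docs page before relying on them.
import requests

ANYTHINGLLM_URL = "http://localhost:3001"   # hypothetical on-premises host
API_KEY = "your-api-key"                    # generated in the AnythingLLM admin UI
WORKSPACE_SLUG = "prevhed"                  # illustrative workspace name

def chat_with_workspace(message: str) -> str:
    resp = requests.post(
        f"{ANYTHINGLLM_URL}/api/v1/workspace/{WORKSPACE_SLUG}/chat",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"message": message, "mode": "chat"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("textResponse", "")
```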

C. Vector Database (Storage)

  • Database: LanceDB
  • Architecture: Serverless, on-disk vector storage.
  • Why LanceDB? Unlike memory-only databases, LanceDB allows for massive scalability of the PREVHED knowledge base without exhausting system RAM, ensuring fast retrieval of policies even as the document library grows.
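A minimal sketch of how such an on-disk store is used follows; the table name, field names, and example data are illustrative only.

```python
# Minimal sketch of the on-disk vector store: connect to a LanceDB directory,
# create a table of policy chunks, and run a nearest-neighbour search.
# Table and field names ("policy_chunks", "vector", "text") are illustrative.
import lancedb

db = lancedb.connect("./prevhed_vectors")   # serverless, file-based storage

table = db.create_table(
    "policy_chunks",
    data=[
        {"vector": [0.1] * 384, "text": "Example policy excerpt on digital harassment."},
    ],
    mode="overwrite",
)

# Retrieve the chunks closest to a (384-dimensional) query embedding.
query_embedding = [0.1] * 384
hits = table.search(query_embedding).limit(4).to_list()
for hit in hits:
    print(hit["text"])
```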

4. RAG & Data Processing Pipeline

To ensure the chatbot provides accurate, grounded information, we utilize a strict Retrieval-Augmented Generation (RAG) configuration.

Step 1: Ingestion & Chunking Strategies

Documents (PDFs, Policies, Guidelines) are processed using specific parameters to balance context with precision:

  • Text Chunk Size: 1000 characters. Creates large, coherent blocks of text (approx. 3-4 paragraphs), which is crucial for complex harassment policies where context must not be cut short.
  • Chunk Overlap: 20 characters. Minimal overlap to ensure distinct data boundaries while maintaining sentence continuity at the edges.
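To make these parameters concrete, the sketch below shows a plain fixed-size chunker with the same settings. AnythingLLM performs this step internally; the function is only an illustration of the 1000-character window with a 20-character overlap.

```python
# Illustrative chunker: fixed 1000-character chunks with a 20-character overlap,
# mirroring the ingestion settings listed above.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 20) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Example: a 2500-character policy is split into three overlapping chunks.
policy_text = "x" * 2500
print([len(c) for c in chunk_text(policy_text)])   # [1000, 1000, 540]
```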

Step 2: Embedding

  • Model: all-MiniLM-L6-v2
  • Function: Converts the 1000-character chunks into 384-dimensional vector representations.
  • Performance: Chosen for its speed and high accuracy in semantic clustering, ensuring the bot identifies the concept of a question (e.g., “unwanted touching”) even if the user uses different words than the policy document.
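A short sketch of this step, assuming the sentence-transformers implementation of the model (published as all-MiniLM-L6-v2); the example sentences are illustrative.

```python
# Minimal sketch of the embedding step: MiniLM maps each chunk to a
# 384-dimensional vector, as described above.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Unwanted physical contact is covered by Section 3 of the harassment policy.",
    "Digital harassment includes repeated unwelcome messages on any platform.",
]
vectors = embedder.encode(chunks)
print(vectors.shape)   # (2, 384)
```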

Step 3: Retrieval & Generation

  • User Query: Student asks, “I received inappropriate texts, is this harassment?”
  • Vector Search: The system queries LanceDB using the all-minilm embedder.
  • Retrieval: Top matching chunks (Rules on Digital Harassment) are retrieved.
  • Synthesis: Gemma 3 27B combines the retrieved policy with its empathetic instruction set to generate a supportive answer.
  • Action: If data is requested (e.g., “Show me statistics on reporting”), Gemma 3 generates the code to render a chart.
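Putting the steps together, a simplified end-to-end flow could look like the sketch below. Every name in it (the vector table, the model endpoint, the field names) is carried over from the earlier illustrative sketches; in PREVHED itself this pipeline is orchestrated by AnythingLLM rather than hand-written code.

```python
# End-to-end sketch of the retrieval-and-generation flow described above.
# All names (table, endpoint, fields) are illustrative assumptions.
import lancedb
from openai import OpenAI
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
table = lancedb.connect("./prevhed_vectors").open_table("policy_chunks")
llm = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

def answer_query(user_query: str) -> str:
    # 1. Embed the question with the same MiniLM model used at ingestion time.
    query_vector = embedder.encode([user_query])[0].tolist()

    # 2. Retrieve the closest policy chunks from the LanceDB store.
    hits = table.search(query_vector).limit(4).to_list()
    context = "\n\n".join(hit["text"] for hit in hits)

    # 3. Ask Gemma 3 to synthesize an empathetic, policy-grounded reply.
    response = llm.chat.completions.create(
        model="gemma3:27b",
        messages=[
            {"role": "system", "content": "You are the PREVHED assistant. Answer only from the excerpts."},
            {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {user_query}"},
        ],
    )
    return response.choices[0].message.content

print(answer_query("I received inappropriate texts, is this harassment?"))
```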

5. Frontend & Deployment Stack

  • User Interface: AnythingLLM Custom Interface (React/Tailwind based) or Custom Embedded Chat Bubble.
  • Backend Orchestrator: AnythingLLM (Node.js/Python).
  • Infrastructure:
    • Deployment: Docker containerized for easy deployment on university servers.
    • Security:
      • Role-Based Access Control (RBAC).
      • Fully local execution capability (no data sent to external APIs if hosted on-premises).

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

©2023-2026 PREV-HED Erasmus+
