Document Upload & Management

Complete guide to managing your knowledge base documents

Document Management Overview

The platform supports comprehensive document management with intelligent processing, metadata enrichment, and advanced search capabilities. Documents are processed using AI to extract meaning and create searchable knowledge bases.

Smart Processing

AI-powered document analysis and chunking

Rich Metadata

Comprehensive tagging and categorization

Intelligent Search

Semantic search and AI-powered queries

Supported File Types
File formats that can be uploaded and processed

✅ Fully Supported

PDF Documents
Full text extraction, images, tables
Microsoft Word (.docx)
Text, formatting, embedded content
Plain Text (.txt)
Raw text content
Markdown (.md) (Recommended)
Optimal format for AI processing

⚠️ Limited Support

PowerPoint (.pptx)
Text extraction only
Excel (.xlsx)
Text content, limited formatting
RTF Documents
Basic text extraction

Note: The file size limit is configurable per workspace (default: 50 MB).
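To catch rejected files before they reach the upload queue, a client-side pre-check can mirror the lists above. The sketch below is a minimal Python example. It assumes the extension groups shown on this page and reads the 50 MB default as 50 * 1024 * 1024 bytes. The function name `classify_upload` is hypothetical, and applying the size limit to media files is also an assumption. Your workspace's configured limit may differ.

```python
from pathlib import Path

# Extension groups copied from the lists above; the grouping itself is illustrative.
FULLY_SUPPORTED = {".pdf", ".docx", ".txt", ".md"}
LIMITED_SUPPORT = {".pptx", ".xlsx", ".rtf"}
MEDIA = {".mp4", ".mov", ".avi", ".mkv", ".mp3", ".wav", ".m4a", ".flac"}

# ASSUMPTION: "50MB" interpreted as 50 MiB; check your workspace setting.
DEFAULT_LIMIT_BYTES = 50 * 1024 * 1024


def classify_upload(filename: str, size_bytes: int,
                    limit_bytes: int = DEFAULT_LIMIT_BYTES) -> str:
    """Return 'full', 'limited', or 'media'; raise ValueError if the file would be rejected."""
    if size_bytes > limit_bytes:
        raise ValueError(f"{filename}: {size_bytes} bytes exceeds the {limit_bytes}-byte limit")
    ext = Path(filename).suffix.lower()  # case-insensitive: "Report.PDF" -> ".pdf"
    if ext in FULLY_SUPPORTED:
        return "full"
    if ext in LIMITED_SUPPORT:
        return "limited"
    if ext in MEDIA:
        return "media"  # routed to transcription, which consumes usage balance
    raise ValueError(f"{filename}: unsupported extension {ext!r}")
```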

🎵 Media Files (Transcription Service)

Video Files (Usage Balance Required)
MP4, MOV, AVI, MKV - Transcribed to markdown
Audio Files (Usage Balance Required)
MP3, WAV, M4A, FLAC - Transcribed to markdown

Transcription Service: Audio and video files are automatically transcribed into searchable markdown format. Charges apply based on content duration.

Convert Files for Upload
Transform your files into the optimal format for enhanced AI performance

Recommended: Markdown (.md) Format

We strongly recommend converting your documents to Markdown format before uploading. Markdown files provide the highest quality results with our Smart Language Model (SLM) and deliver superior search accuracy and AI responses.

Why Markdown Works Best:
  • Clean, structured text without formatting artifacts
  • Preserves document hierarchy and relationships
  • Optimal chunking and context preservation
  • Enhanced AI understanding and citation accuracy
Performance Benefits:
  • Faster processing and indexing
  • More accurate search results
  • Better AI response quality
  • Improved citation precision

Automatic File Conversion

How It Works

Our conversion service automatically transforms your documents into optimized markdown format, preserving structure while removing formatting noise that can interfere with AI processing.

Preserves headings, lists, and tables
Removes unnecessary formatting
Maintains document structure
Optimizes for AI processing
Supported Conversions
PDF → Markdown
Word (.docx) → Markdown
PowerPoint (.pptx) → Markdown
HTML → Markdown
RTF → Markdown
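The platform's converter is not documented here. The kind of transformation it performs (keep headings, paragraphs, and list items; drop styling markup) can still be illustrated with Python's standard-library `html.parser`. This is a toy sketch, not the conversion service itself. The class `MiniMarkdown` and function `html_to_markdown` are hypothetical, and tables, links, and nested lists are not handled.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter: keeps headings, paragraphs and list items,
    drops styling tags (span, b, i, font, ...) but keeps their text."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished Markdown blocks
        self.prefix = None  # Markdown prefix of the open block; None when outside a block
        self.text = []      # text collected for the open block
        self.skip = 0       # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.HEADINGS:
            self._open_block("#" * self.HEADINGS[tag] + " ")
        elif tag == "li":
            self._open_block("- ")
        elif tag == "p":
            self._open_block("")
        # any other tag (span, b, font, ...) is ignored: only its text survives

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip -= 1
        elif tag in self.HEADINGS or tag in ("li", "p"):
            self._flush_block()

    def handle_data(self, data):
        if self.prefix is not None and not self.skip:
            self.text.append(data)

    def _open_block(self, prefix):
        self._flush_block()
        self.prefix = prefix

    def _flush_block(self):
        if self.prefix is not None:
            content = " ".join("".join(self.text).split())  # collapse whitespace noise
            if content:
                self.blocks.append(self.prefix + content)
        self.prefix, self.text = None, []


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    parser._flush_block()  # emit a block left open by malformed HTML
    return "\n\n".join(parser.blocks)
```

For example, `<h1>Leave <b>Policy</b></h1><p><span>Staff get <i>20</i> days.</span></p>` becomes `# Leave Policy` followed by `Staff get 20 days.`: the bold and italic wrappers disappear while the heading level survives.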

Audio & Video Transcription Service (Requires Usage Balance)

Transform your audio and video content into searchable, AI-ready text. Our transcription service converts spoken content into high-quality markdown documents that integrate seamlessly with your knowledge base.

Supported Media Types
  • Audio files (MP3, WAV, M4A, FLAC)
  • Video files (MP4, MOV, AVI, MKV)
  • Meeting recordings
  • Training videos and webinars
  • Interviews and presentations
Usage & Benefits
  • High-accuracy speech recognition
  • Automatic speaker identification
  • Timestamp preservation
  • Formatted markdown output
  • Charged per minute of audio/video

Usage Balance Required: Transcription services consume usage balance based on the length of audio/video content. Check your workspace balance before uploading large media files.
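Because transcription is charged per minute, it can help to estimate the cost of a batch before uploading. The sketch below assumes a flat per-minute rate and that partial minutes round up to a whole minute. Neither the rate nor the rounding rule is specified here, so treat both as placeholders and check your workspace's pricing.

```python
import math


def estimate_transcription_cost(duration_seconds: float, rate_per_minute: float) -> float:
    # ASSUMPTION: partial minutes are billed as whole minutes; the real rule may differ.
    billable_minutes = math.ceil(duration_seconds / 60)
    return billable_minutes * rate_per_minute


def can_afford(durations_seconds, rate_per_minute: float, balance: float):
    """Return (fits_in_balance, estimated_total) for a list of media durations."""
    total = sum(estimate_transcription_cost(d, rate_per_minute) for d in durations_seconds)
    return total <= balance, total
```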

Why Use File Conversion & Transcription?

Enhanced Accuracy

Markdown format provides cleaner text that AI can process more accurately, leading to better search results and more precise answers.

Faster Processing

Optimized formats process faster and consume fewer resources, making your knowledge base more responsive and efficient.

Better Context

Structured markdown preserves document hierarchy and relationships, enabling more contextual and relevant AI responses.
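One reason structure helps: a Markdown heading hierarchy lets each section carry its heading path as context. The following Python sketch splits a Markdown string at ATX headings (`#` to `######`) and labels each section with its path. It illustrates the idea only and is not the platform's chunker. The function name `sections_with_context` is hypothetical, and it does not recognize headings inside fenced code blocks.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def sections_with_context(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown at headings and label each section with its heading path,
    e.g. 'Leave Policy > Eligibility'. Fenced code blocks are not special-cased."""
    path = []      # stack of (level, title) for the headings currently in scope
    sections = []  # (heading path, section body) pairs
    body = []      # lines of the section being collected

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append((" > ".join(title for _, title in path), text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A heading closes any open heading at the same or a deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return sections
```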

Upload Methods
Different ways to add documents to your workspace

Single File Upload

Upload individual files with detailed metadata configuration.

Drag and drop or click to browse files
Add comprehensive metadata during upload
Configure chunking options for large documents
Real-time processing status and feedback

Batch File Processing

Upload multiple files simultaneously with shared metadata templates.

Select multiple files at once
Apply metadata templates to all files
Queue management with progress tracking
Error handling and retry mechanisms
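The batch workflow above (shared metadata template, queue with progress, retries) can be sketched as a simple loop. `upload_fn` is a hypothetical stand-in for whatever upload call or client your workspace exposes. The retry count and exponential backoff delays are illustrative defaults, not documented platform behavior.

```python
import time


def upload_batch(paths, template, upload_fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Upload files one at a time with a shared metadata template, retrying
    transient failures with exponential backoff (1s, 2s, 4s with the defaults).
    `upload_fn(path, metadata)` is HYPOTHETICAL: substitute your real client call."""
    results = {}
    total = len(paths)
    for i, path in enumerate(paths, 1):
        metadata = dict(template)  # per-file copy so changes for one file don't leak into the next
        for attempt in range(max_retries + 1):
            try:
                results[path] = ("ok", upload_fn(path, metadata))
                break
            except OSError as exc:  # connection errors and timeouts are OSError subclasses
                if attempt == max_retries:
                    results[path] = ("failed", str(exc))
                else:
                    sleep(base_delay * 2 ** attempt)
        print(f"[{i}/{total}] {path}: {results[path][0]}")  # simple progress tracking
    return results
```

Passing `sleep` as a parameter keeps the backoff testable; in normal use the default `time.sleep` applies.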
Metadata Management
Organize and categorize your documents effectively

Core Document Information

Document ID
Unique identifier (auto-generated if not provided)
Document Title
Human-readable title for the document
Author
Document creator or responsible person
Version
Document version (default: 1.0)
Description
Brief summary of document content
Tags
Comma-separated keywords for searchability
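The core fields map naturally onto a small record type. In this Python sketch the field names are hypothetical. The `1.0` version default comes from the table above, and the auto-generated ID is assumed to be a UUID because the actual ID format is not specified. `parse_tags` turns the comma-separated tag string into a clean list.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DocumentMetadata:
    """Core document fields; names are illustrative, not the platform's API."""
    title: str
    author: str = ""
    version: str = "1.0"  # documented default
    description: str = ""
    tags: list[str] = field(default_factory=list)
    # Auto-generated when not provided (UUID format is an ASSUMPTION).
    document_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string, trimming whitespace and
    dropping empty entries and exact duplicates while keeping order."""
    tags = []
    for tag in raw.split(","):
        tag = tag.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```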

Organizational Categories

Department
• Engineering
• Marketing
• Sales
• Finance
• HR
• Legal
• Operations
Document Type
• Manual
• Report
• Policy
• Contract
• Guide
• Analysis
• Specification
Priority Level
• High
• Medium
• Low

Business Context

Business Unit
Organizational unit or division
Project
Associated project or initiative
Cost Center
Financial tracking code
Confidentiality
Public, Internal, Confidential, Restricted
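Fields with a fixed set of options can be checked before upload. The value sets below are copied from the lists above. The dictionary keys (`department`, `document_type`, and so on) are hypothetical names. Free-text fields such as Business Unit, Project, and Cost Center are deliberately left unvalidated.

```python
# Allowed values copied from the category lists above; key names are hypothetical.
ALLOWED = {
    "department": {"Engineering", "Marketing", "Sales", "Finance", "HR", "Legal", "Operations"},
    "document_type": {"Manual", "Report", "Policy", "Contract", "Guide", "Analysis", "Specification"},
    "priority": {"High", "Medium", "Low"},
    "confidentiality": {"Public", "Internal", "Confidential", "Restricted"},
}


def validate_categories(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means every category value is allowed.
    Absent fields are skipped (treated as optional); other keys are ignored."""
    errors = []
    for key, allowed in ALLOWED.items():
        value = metadata.get(key)
        if value is not None and value not in allowed:
            errors.append(f"{key}: {value!r} is not one of {sorted(allowed)}")
    return errors
```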

Custom Metadata Schema (Workspace Customizable)

Each workspace can define its own metadata schema tailored to your organization's needs. Custom fields add structured data that enhances AI filtering and improves response relevance for your specific use cases.

Customization Benefits
  • Filter responses by your organization's structure
  • Create industry-specific categorization
  • Implement compliance and governance tags
  • Enable role-based content filtering
  • Support multi-language or regional variants
Enhanced AI Filtering
  • More relevant search results
  • Context-aware AI responses
  • Automated content routing
  • Precision document discovery
  • Intelligent content recommendations

Industry-Specific Examples

Healthcare
• Patient Category (Adult, Pediatric, Geriatric)
• Medical Specialty (Cardiology, Oncology, etc.)
• Compliance Level (HIPAA, FDA, Clinical Trial)
• Treatment Phase (Diagnosis, Treatment, Follow-up)
• Evidence Level (Research, Guidelines, Protocol)
Legal
• Practice Area (Corporate, Litigation, IP)
• Jurisdiction (Federal, State, International)
• Document Status (Draft, Final, Archived)
• Client Privilege (Privileged, Work Product)
• Matter Type (Transactional, Regulatory)
Manufacturing
• Product Line (Electronics, Automotive, etc.)
• Quality Standard (ISO, Six Sigma, Lean)
• Safety Classification (OSHA, Environmental)
• Production Stage (Design, Testing, Production)
• Supplier Category (Tier 1, Tier 2, Critical)
Financial Services
• Product Type (Banking, Insurance, Investment)
• Risk Level (Low, Medium, High, Critical)
• Regulatory Framework (SOX, Basel III, MiFID)
• Client Segment (Retail, Corporate, Institutional)
• Geographic Region (Domestic, EMEA, APAC)
Education
• Grade Level (K-12, Undergraduate, Graduate)
• Subject Area (STEM, Liberal Arts, Vocational)
• Learning Objective (Knowledge, Skills, Assessment)
• Accessibility (ADA Compliant, Multi-language)
• Content Type (Curriculum, Assessment, Research)
Technology
• Technology Stack (Frontend, Backend, DevOps)
• Development Phase (Planning, Development, Production)
• Security Level (Public, Internal, Confidential)
• API Version (v1, v2, Beta, Deprecated)
• Platform (Web, Mobile, Desktop, Cloud)
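How the platform applies metadata filters internally is not described here, but the underlying idea is simple: narrow the candidate documents to those whose metadata matches, then search within them. This Python sketch uses the Legal fields above. The snake_case keys and the document shape (`{"id": ..., "metadata": {...}}`) are hypothetical.

```python
def filter_documents(documents, **criteria):
    """Keep documents whose metadata matches every criterion.
    A criterion value may be a single value or a set of acceptable values."""

    def matches(meta):
        for key, wanted in criteria.items():
            value = meta.get(key)
            if isinstance(wanted, (set, frozenset)):
                if value not in wanted:
                    return False
            elif value != wanted:
                return False
        return True

    return [doc for doc in documents if matches(doc["metadata"])]
```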

Setting Up Custom Metadata

Configuration Steps
  1. Access workspace settings as an administrator
  2. Navigate to "Metadata Schema" configuration
  3. Define custom field names and types
  4. Set validation rules and default values
  5. Apply schema to new and existing documents
Best Practices
  • Start with essential fields, expand gradually
  • Use consistent naming conventions
  • Provide clear field descriptions for users
  • Test filtering scenarios before full deployment
  • Train users on metadata importance and usage
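Steps 3 and 4 (field types, validation rules, default values) can be modeled as a small schema object. This is a hypothetical Python representation, not the platform's schema format. The healthcare fields come from the industry examples above, and which fields are required or defaulted is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSpec:
    value_type: type
    allowed: Optional[frozenset] = None  # validation rule: permitted values, if restricted
    default: Any = None                  # applied when a document omits the field
    required: bool = False


# HYPOTHETICAL healthcare schema built from the industry examples.
SCHEMA = {
    "patient_category": FieldSpec(str, frozenset({"Adult", "Pediatric", "Geriatric"}), required=True),
    "treatment_phase": FieldSpec(str, frozenset({"Diagnosis", "Treatment", "Follow-up"}), default="Diagnosis"),
    "evidence_level": FieldSpec(str, frozenset({"Research", "Guidelines", "Protocol"})),
}


def apply_schema(metadata: dict, schema: dict) -> tuple[dict, list[str]]:
    """Fill defaults and validate types and allowed values.
    Returns (normalized metadata, list of errors)."""
    result = dict(metadata)
    errors = []
    for name, spec in schema.items():
        if name not in result:
            if spec.required:
                errors.append(f"{name}: required field missing")
            elif spec.default is not None:
                result[name] = spec.default
            continue
        value = result[name]
        if not isinstance(value, spec.value_type):
            errors.append(f"{name}: expected {spec.value_type.__name__}, got {type(value).__name__}")
        elif spec.allowed is not None and value not in spec.allowed:
            errors.append(f"{name}: {value!r} not in {sorted(spec.allowed)}")
    return result, errors
```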
Document Chunking
Breaking large documents into searchable segments

What is Document Chunking?

Large documents are automatically split into smaller, manageable segments (chunks) that can be independently searched and referenced by the AI. This improves search accuracy and enables precise citations.

Chunking Options

Chunk Size
Number of characters per chunk (default: 2048)
Recommended: 1024-4096 characters
Overlap Size
Characters overlapping between chunks (default: 200)
Recommended: 10-20% of chunk size
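Character-based chunking with overlap is a sliding window: each chunk starts `chunk_size - overlap` characters after the previous one. The Python sketch below uses the documented defaults (2048 and 200). It is a minimal illustration, and the function name `chunk_text` is hypothetical. The platform's actual splitter may also respect sentence, paragraph, or heading boundaries, which this version ignores.

```python
def chunk_text(text: str, chunk_size: int = 2048, overlap: int = 200) -> list[str]:
    """Fixed-size character chunking with overlap (a sliding window).
    Consecutive chunks share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # distance between chunk start positions
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

With the defaults, chunk starts are 1,848 characters apart, so a 5,000-character document yields three chunks (starting at 0, 1,848, and 3,696), the last one 1,304 characters long.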

Best Practices

Use smaller chunks (1024-2048) for detailed search
Use larger chunks (3072-4096) for context preservation
Maintain 10-20% overlap to preserve context
Test different sizes based on document type
Document Management Actions
Available operations for managing uploaded documents

Viewing & Navigation

View Document
Browse document content and chunks
Search Within
Find specific content in the document
Download
Download original file

Management Actions

Edit Metadata
Update document information and tags
Reprocess
Re-chunk with different settings
Delete
Remove document and all chunks
Common Issues & Solutions
Upload Fails or Times Out

Check file size limits, internet connection, and file format compatibility. Large files may need to be split or compressed.

Poor Search Results

Add more descriptive metadata, use appropriate chunking settings, and ensure document content is text-searchable (not image-only PDFs).

Processing Takes Too Long

Large documents with complex formatting may take several minutes to process. Check the processing queue and be patient with large files.

Next Steps
After uploading documents, explore these features