
Supported Formats

Finch Fusion supports a variety of document formats for upload and analysis. This guide covers supported file types and their characteristics.

Fully Supported Formats

PDF Documents (.pdf)

PDF is the most common format for business and financial documents. Supported features:
  • Native text extraction
  • OCR for scanned documents
  • Multi-page documents
  • Embedded images and tables
Best for:
  • Annual reports (10-K, 10-Q filings)
  • Research papers
  • Presentations converted to PDF
  • Scanned documents
For best results, use text-based PDFs rather than scanned images when possible. Text-based PDFs process faster and more accurately.

Word Documents (.docx)

Microsoft Word documents are fully supported. Supported features:
  • Full text extraction
  • Formatting preserved for analysis
  • Tables and lists
  • Embedded images (text from images may not be extracted)
Best for:
  • Draft reports
  • Internal memos
  • Written analyses
  • Meeting notes
The older .doc format may have limited support. Convert to .docx for best results.

Plain Text Files (.txt)

Simple text files are processed quickly and accurately. Supported features:
  • Full text extraction
  • Fast processing
  • Universal compatibility
Best for:
  • Data exports
  • Transcripts
  • Simple notes
  • Log files

Format Comparison

Format            | Text Accuracy | Processing Speed | Best Use Case
PDF (text-based)  | Excellent     | Fast             | Official reports, filings
PDF (scanned)     | Good          | Slower           | Physical document digitization
DOCX              | Excellent     | Fast             | Draft documents, notes
TXT               | Perfect       | Very Fast        | Plain text content

File Size Considerations

File Type | Recommended Maximum | Notes
PDF       | 50 MB               | Larger files take longer to process
DOCX      | 25 MB               | Images increase file size significantly
TXT       | 10 MB               | Very large text files may time out
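
If you want to check files against these recommendations before uploading, a small local script can do it. The Python sketch below is not part of Finch Fusion; it only mirrors the table above. The function name size_warning is hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the guide does not define the unit.

```python
from __future__ import annotations

import os

# Recommended maximum sizes copied from the File Size Considerations table.
# Assumption: 1 MB = 1024 * 1024 bytes (the guide does not specify).
MB = 1024 * 1024
RECOMMENDED_MAX_BYTES = {
    ".pdf": 50 * MB,
    ".docx": 25 * MB,
    ".txt": 10 * MB,
}


def size_warning(filename: str, size_bytes: int) -> str | None:
    """Return a warning if the file is too large or not a fully supported type, else None."""
    ext = os.path.splitext(filename)[1].lower()
    limit = RECOMMENDED_MAX_BYTES.get(ext)
    if limit is None:
        # Anything other than .pdf, .docx, or .txt (including legacy .doc) is flagged.
        return f"{filename}: '{ext or 'no extension'}' is not a fully supported format"
    if size_bytes > limit:
        return (
            f"{filename}: {size_bytes / MB:.1f} MB exceeds the recommended "
            f"{limit // MB} MB for {ext} files"
        )
    return None
```

To check a file on disk, call size_warning(path, os.path.getsize(path)); a result of None means the file is within the recommended size for its type.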

Handling Large Documents

If your document exceeds recommended sizes:
  1. Split the document: break it into logical sections
  2. Compress images: reduce embedded image sizes
  3. Remove unnecessary pages: include only relevant content
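
For large plain text files such as logs or transcripts, step 1 can be scripted. The sketch below assumes UTF-8 text and splits only at line ends, so no line is cut in half; the function name split_text is hypothetical, and splitting by size rather than by "logical sections" is a simplification you may want to refine.

```python
from __future__ import annotations


def split_text(text: str, max_bytes: int) -> list[str]:
    """Split text into chunks of at most max_bytes (UTF-8), breaking only at line ends.

    A single line longer than max_bytes is kept whole in its own chunk,
    so that chunk can still exceed the limit.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_size = 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        # Close the current chunk if adding this line would exceed the limit.
        if current and current_size + line_size > max_bytes:
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append("".join(current))
    return chunks
```

For example, split_text(Path("transcript.txt").read_text(encoding="utf-8"), 10 * 1024 * 1024) returns parts that can each be saved as a separate .txt file and uploaded individually. Joining the parts reproduces the original text exactly.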

Optimizing Documents for Upload

PDF Best Practices

  • Use text-based PDFs when possible
  • Ensure scanned documents are at least 300 DPI
  • Flatten complex PDFs before upload
  • Remove password protection before uploading

Word Document Best Practices

  • Save as .docx format (not .doc)
  • Compress images before adding to the document
  • Remove track changes and comments
  • Ensure fonts are embedded or use standard fonts

General Tips

  • Give files descriptive names
  • Ensure documents are not corrupted before upload
  • Remove any password protection
  • Verify the document opens correctly on your computer first
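
Opening a file is the most reliable check, but a quick content check can catch files that are mislabeled or damaged. This Python sketch inspects file contents rather than trusting the extension: PDFs should start with the %PDF- header, a .docx is a ZIP package that usually contains word/document.xml, and .txt is assumed to be UTF-8. The function name looks_valid is hypothetical, and passing this check does not guarantee that Finch Fusion will accept the file.

```python
import io
import zipfile


def looks_valid(filename: str, data: bytes) -> bool:
    """Rough pre-upload sanity check based on file contents, not just the extension."""
    name = filename.lower()
    if name.endswith(".pdf"):
        # PDF files begin with a "%PDF-" header.
        return data.startswith(b"%PDF-")
    if name.endswith(".docx"):
        # A .docx file is a ZIP package whose main body is usually word/document.xml.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as package:
                # testzip() returns None when every member passes its CRC check.
                return "word/document.xml" in package.namelist() and package.testzip() is None
        except zipfile.BadZipFile:
            return False
    if name.endswith(".txt"):
        # Assumption: text files are UTF-8; other encodings would need a different check.
        try:
            data.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False
    # Other extensions are not fully supported formats.
    return False
```

Call it as looks_valid(path.name, path.read_bytes()). A password-protected .docx is typically stored as an encrypted container rather than a plain ZIP package, so it will usually fail this check as well.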

Unsupported Formats

The following formats are not currently supported:
  • Spreadsheets (.xlsx, .xls, .csv)
  • Presentations (.pptx, .ppt)
  • Images (.jpg, .png, .gif)
  • Web pages (.html)
  • Email files (.eml, .msg)
  • Compressed archives (.zip, .rar)
For unsupported formats, consider converting to PDF, or copying and pasting the content into a text file before upload.
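
For a saved web page (.html), the copy-and-paste route can be automated with Python's standard html.parser module. This is a minimal sketch, assuming the page is saved locally: it keeps visible text, skips script and style contents, and starts a new line at common block tags. Table layout and images are lost, so the PDF route below preserves more of the page. The names TextExtractor and html_to_text are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # block elements start on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Write the result to a .txt file, for example Path("page.txt").write_text(html_to_text(Path("page.html").read_text(encoding="utf-8")), encoding="utf-8"), and upload that file.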

Format Conversion Tips

If you need to convert documents:

Spreadsheets to PDF

  1. Open in Excel or Google Sheets
  2. Use Print > Save as PDF
  3. Upload the resulting PDF

Presentations to PDF

  1. Open in PowerPoint or Google Slides
  2. Use File > Export (PowerPoint) or File > Download > PDF document (Google Slides)
  3. Upload the PDF

Web Pages to PDF

  1. Open the page in your browser
  2. Use Print > Save as PDF
  3. Upload the PDF

Troubleshooting Format Issues

Text is missing or inaccurate after uploading a PDF

The PDF may be image-based, with no embedded text. Finch Fusion uses OCR for scanned documents, but results depend on image quality. Consider rescanning at a higher resolution (at least 300 DPI).

Formatting looks different after upload

Formatting is used for analysis but is not preserved in the display. Focus on whether the text content was extracted correctly rather than on visual formatting.

File format is not supported

Convert the file to a supported format (preferably PDF) before uploading. Most applications offer an Export as PDF option.