
Gathering Docs

We need to create our own indexed library of documents so that we know what information we have.

These will be collated for a specific medical domain.

It would be helpful to store metadata about each document (for example, its type, source and date) to support filtering and accurate retrieval.
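As a minimal sketch, assuming each document is held as a simple Python record, metadata-based filtering could look like the following. The `DocRecord` schema, its fields, the `filter_docs` helper and the example entries are all hypothetical illustrations, not an existing library or a decided design.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DocRecord:
    """One entry in the document library (hypothetical schema)."""
    doc_id: str
    title: str
    path: str
    doc_type: str                  # e.g. "pdf", "docx", "audio"
    domain: str                    # the medical domain, e.g. "cardiology"
    year: Optional[int] = None     # publication or recording year, if known
    tags: Set[str] = field(default_factory=set)


def filter_docs(records, domain=None, doc_type=None, min_year=None, tags=None):
    """Return the records that match every criterion that is not None."""
    results = []
    for rec in records:
        if domain is not None and rec.domain != domain:
            continue
        if doc_type is not None and rec.doc_type != doc_type:
            continue
        # Records without a year are excluded when a minimum year is requested.
        if min_year is not None and (rec.year is None or rec.year < min_year):
            continue
        # All requested tags must be present on the record.
        if tags is not None and not set(tags) <= rec.tags:
            continue
        results.append(rec)
    return results


# Hypothetical example library.
library = [
    DocRecord("d1", "Statin trial", "papers/statin.pdf", "pdf", "cardiology", 2021, {"trial"}),
    DocRecord("d2", "Ward round audio", "audio/round.mp3", "audio", "cardiology", 2023),
    DocRecord("d3", "Asthma guideline", "papers/asthma.pdf", "pdf", "respiratory", 2019),
]

hits = filter_docs(library, domain="cardiology", doc_type="pdf")
print([r.doc_id for r in hits])  # ['d1']
```

Filtering on metadata before any text search narrows retrieval to documents that are actually relevant to the query's domain and type.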

Data types can include:

  • PDFs of articles.
  • Tabular data within these articles.
  • Images within these articles, each accompanied by a text summary.
  • Word, PowerPoint and Excel documents.
  • Audio and video files, transcribed to text, as well as video analysis of sequential frames that are then summarised as text.
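One way to route incoming files to the right processing step is to classify them by file extension. The sketch below assumes that the extension is a reliable indicator of the data type; the `EXTENSION_TO_TYPE` and `PIPELINE_STEP` tables and the `classify` and `plan` helpers are hypothetical, and the listed extensions are examples rather than a complete set.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the data types listed above.
EXTENSION_TO_TYPE = {
    ".pdf": "pdf_article",
    ".docx": "word",
    ".pptx": "powerpoint",
    ".xlsx": "excel",
    ".mp3": "audio",
    ".wav": "audio",
    ".m4a": "audio",
    ".mp4": "video",
    ".mov": "video",
}

# Hypothetical description of the processing each data type would need.
PIPELINE_STEP = {
    "pdf_article": "extract text, tables and image summaries",
    "word": "extract text",
    "powerpoint": "extract slide text",
    "excel": "extract tables",
    "audio": "transcribe to text",
    "video": "transcribe audio and summarise sampled frames",
}


def classify(path: str) -> str:
    """Return the data type for a file, or 'unknown' if its extension is not mapped."""
    return EXTENSION_TO_TYPE.get(Path(path).suffix.lower(), "unknown")


def plan(paths):
    """Group file paths by data type so each group can go to its processing step."""
    groups = {}
    for p in paths:
        groups.setdefault(classify(p), []).append(p)
    return groups


for dtype, files in plan(["a.PDF", "b.docx", "c.mp4", "notes.txt"]).items():
    print(dtype, files, PIPELINE_STEP.get(dtype, "needs manual review"))
```

Unmapped files fall into an `unknown` group rather than being silently dropped, so gaps in the library can be reviewed by hand.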

MULTIMODAL