Gathering Docs
We need to create our own indexed library of documents so that we know what information we have.
These will be collated for a specific medical domain.
It would be helpful to have metadata about each document to help with filtering and correct retrieval.
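As a rough illustration of how per-document metadata could support filtering, the sketch below defines a small metadata record and a filter function. The class name `DocumentMetadata`, its fields, and `filter_documents` are all hypothetical choices made for this example, not an agreed schema; the real fields would depend on the medical domain chosen.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DocumentMetadata:
    """Hypothetical metadata record; field names are illustrative assumptions."""
    doc_id: str
    title: str
    domain: str                 # the medical domain the library is collated for
    source_type: str            # e.g. "pdf", "image", "office", "audio", "video"
    published: date
    keywords: Tuple[str, ...] = ()


def filter_documents(
    docs: Iterable[DocumentMetadata],
    *,
    domain: Optional[str] = None,
    source_type: Optional[str] = None,
    published_on_or_after: Optional[date] = None,
) -> List[DocumentMetadata]:
    """Return the documents matching every criterion that was supplied."""
    selected = []
    for doc in docs:
        if domain is not None and doc.domain != domain:
            continue
        if source_type is not None and doc.source_type != source_type:
            continue
        if published_on_or_after is not None and doc.published < published_on_or_after:
            continue
        selected.append(doc)
    return selected
```

Criteria left as `None` are ignored, so the same function can narrow by domain alone or by any combination of fields before retrieval is attempted.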
Data types can include:
- PDFs of articles.
- Tabular data within these articles.
- Images within these articles that have a text summary.
- Word, PowerPoint and Excel documents.
- Audio and video files, which are transcribed to text. Video can also be analysed as a sequence of images that are then summarised in text.
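The data types listed above could be sorted into processing routes before indexing. The following is a minimal sketch that assumes file extension is a good enough signal; the suffix table, the modality labels, and the function names `classify` and `group_by_modality` are assumptions for illustration. Tables and images embedded inside articles would still need to be extracted from the PDFs in a separate step.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed mapping from file suffix to a processing route; extend as needed.
MODALITY_BY_SUFFIX: Dict[str, str] = {
    ".pdf": "pdf",
    ".docx": "office",
    ".pptx": "office",
    ".xlsx": "office",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".mp4": "video",
    ".mov": "video",
}


def classify(path: str) -> str:
    """Return the assumed modality for a file, or "unknown" if unrecognised."""
    return MODALITY_BY_SUFFIX.get(Path(path).suffix.lower(), "unknown")


def group_by_modality(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group file paths by modality so each group can go to its own pipeline."""
    groups: Dict[str, List[str]] = {}
    for path in paths:
        groups.setdefault(classify(path), []).append(path)
    return groups
```

Files that land in the "unknown" group can be reviewed by hand instead of being silently dropped from the library.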
MULTIMODAL