Episode 104. It’s all about Apache Tika, the project that lets you index EVERYTHING.

We continue to bring guests onto the show to talk to us about interesting things… This time it’s about Apache Tika, an incredible tool for file processing and metadata extraction in search. Imagine you have tons of unstructured files, like emails or documents, and you want to extract their content, index it, and then search it. That is Tika’s purpose. And who better to walk us through how it does its magic than its Project Management Committee (PMC) Chair, Tim Allison!

So take a listen as we go deeper on ingesting tons of content (which is fundamental for things like training LLMs).

We thank DataDogHQ for sponsoring this podcast episode.

Don’t forget to SUBSCRIBE to our cool NewsCast OffHeap!

Apache Tika

OpenSearch Project and OpenSearch Neural Plugin Tutorials

Selected Advanced File Processing toolkits/services

Selected Hybrid Search/RAG toolkits (there are MANY others!)

Search/Relevance Conferences

Tim’s personal project

Do you like the episodes? Want more? Help us out! Buy us a beer!

And Follow us!
