Week 104 - Cortex (AI / ML)

How fascinating are mermaid and pirate stories? Some would spend their entire summer vacations reading every word of these tales, while others would rather be swimming in the sea. For those who prefer the latter, having a summary of the story is a great reason to sit in front of the computer and use Cortex!

The goal of this challenge is to create a complete pipeline in Snowflake for processing PDF files, extracting and chunking their text content, embedding the text for further analysis, and generating summaries of the text chunks.

Setup steps:

  • Create the environment: database, schema, warehouse.
  • Define a Python function that takes the file_url as a parameter and returns a table with a single column, chunk, of type VARCHAR. (Hint: check here, very carefully xd.)
  • Create a stage with a Directory Table. (You can do that directly from Snowsight).
  • Upload into the stage the three PDF files (you can find them here).
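As a rough illustration of the chunking step, here is a minimal sketch of a table-function-style handler in plain Python. It is not the official solution. The class name PdfTextChunker, the chunk_size and overlap values, and the fixed-size splitting strategy are all assumptions made for the example. To stay runnable outside Snowflake it takes already-extracted text, whereas in the challenge the handler would receive the file_url and read the PDF first.

```python
from typing import Iterator, Tuple


class PdfTextChunker:
    """Hypothetical handler: emits one row per text chunk, shaped like (chunk,)."""

    def __init__(self, chunk_size: int = 1000, overlap: int = 100):
        # Assumed defaults; tune them to the model and the stories you process.
        if chunk_size <= 0 or not 0 <= overlap < chunk_size:
            raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
        self.chunk_size = chunk_size
        self.overlap = overlap

    def process(self, text: str) -> Iterator[Tuple[str]]:
        # Slide a fixed-size window over the text, stepping back by `overlap`
        # characters so sentences cut at a boundary appear in both chunks.
        step = self.chunk_size - self.overlap
        for start in range(0, len(text), step):
            chunk = text[start:start + self.chunk_size]
            if chunk.strip():
                yield (chunk,)  # single-column row: chunk VARCHAR
            if start + self.chunk_size >= len(text):
                break  # the window already reached the end of the text


if __name__ == "__main__":
    sample = "Once upon a time a mermaid met a pirate. " * 5
    for (chunk,) in PdfTextChunker(chunk_size=80, overlap=10).process(sample):
        print(len(chunk), repr(chunk[:30]))
```

Each yielded one-element tuple corresponds to one row of the single chunk column the function is expected to return.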

The goal is to have a table STORIES_SUMMARY_TABLE with three rows (as shown in the figure below), where for each story, you have the relative_path as story_title and the summary of the text that is produced by using a specific Cortex function (guess which one? The challenge's title could be useful).
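To make the target shape concrete, here is a small plain-Python sketch of how chunk rows could be grouped into one summary row per story. Everything in it is an assumption for illustration: build_story_summaries and first_sentence are hypothetical names, the file names are made up, and the summarizer is a trivial stand-in passed as a parameter, not the Cortex function the challenge asks you to find. Joining all chunks of a story before summarizing is also just one possible design.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def build_story_summaries(
    chunk_rows: Iterable[Tuple[str, str]],
    summarize: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Group (relative_path, chunk) rows by story and summarize each story once."""
    chunks_by_story: Dict[str, List[str]] = {}
    for relative_path, chunk in chunk_rows:
        chunks_by_story.setdefault(relative_path, []).append(chunk)
    # One output row per story, mirroring the columns of STORIES_SUMMARY_TABLE.
    return [
        {"story_title": path, "summary": summarize(" ".join(chunks))}
        for path, chunks in chunks_by_story.items()
    ]


def first_sentence(text: str) -> str:
    # Stand-in summarizer (assumption): keeps only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


if __name__ == "__main__":
    rows = [
        ("mermaid.pdf", "A mermaid found a map. She kept it."),  # hypothetical files
        ("pirate.pdf", "A pirate lost a map. He searched."),
        ("mermaid.pdf", "Then she swam away."),
    ]
    for row in build_story_summaries(rows, first_sentence):
        print(row)
```

With three uploaded stories, the same grouping would yield three rows, one per relative_path.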

Stories about mermaids and pirates are undeniably fascinating. But can we say that it is equally amazing to create summaries of them in this way? Once you solve this challenge, the sea awaits. Maybe you will find treasure!
