Week 16 - Semi-structured Data

30 Sept

Written By

Tis Friday and what a wonderful day for flexing those JSON-parsing muscles!

Below we have the set up script:

create or replace file format json_fftype = jsonstrip_outer_array = TRUE;create or replace stage week_16_frosty_stageurl = 's3://frostyfridaychallenges/challenge_16/'file_format = json_ff;create or replace table <schema>.week16 asselect t.$1:word::text word, t.$1:url::text url, t.$1:definition::variant definitionfrom @week_16_frosty_stage (file_format => 'json_ff', pattern=>'.*week16.*') t;

END RESULT

And below is what you should end up with if you were to run the following:

select *from (<your query goes here>) subwhere word like 'l%';

If correct, then (without the "where word like 'l%'" filter) :

count(word) should get you 32,295 rows
count(distinct word) should get you 3,000 rows

DATA SOURCES:

https://www.ef.co.uk/english-resources/english-vocabulary/top-3000-words/
https://dictionaryapi.dev/

BONUS POINTS

If you're lucky enough to be in one of the following regions, then try and apply search optimization on your table using a variant path.

Remember, if you want to participate:

Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘
Post your code to GitHub and make it publicly available (Check out our guide if you don't know how to here)
Post the URL in the comments of the challenge

Medium

Week 16 - Semi-structured Data

Week 17 - Geospatial

Week 15 - Stored Procedures & UDFs

Proudly sponsored by