Week 16 - Semi-structured Data
Tis Friday and what a wonderful day for flexing those JSON-parsing muscles!
Below we have the set up script:
create or replace file format json_fftype = jsonstrip_outer_array = TRUE;create or replace stage week_16_frosty_stageurl = 's3://frostyfridaychallenges/challenge_16/'file_format = json_ff;create or replace table <schema>.week16 asselect t.$1:word::text word, t.$1:url::text url, t.$1:definition::variant definitionfrom @week_16_frosty_stage (file_format => 'json_ff', pattern=>'.*week16.*') t;END RESULT
And below is what you should end up with if you were to run the following:
select *from (<your query goes here>) subwhere word like 'l%';
If correct, then (without the "where word like 'l%'" filter) :
- count(word) should get you 32,295 rows
- count(distinct word) should get you 3,000 rows
DATA SOURCES:
- https://www.ef.co.uk/english-resources/english-vocabulary/top-3000-words/
- https://dictionaryapi.dev/
BONUS POINTS
If you're lucky enough to be in one of the following regions, then try and apply search optimization on your table using a variant path.
Remember, if you want to participate:
- Sign up as a member of Frosty Friday. You can do this by clicking on the sidebar, and then going to ‘REGISTER‘
- Post your code to GitHub and make it publicly available (Check out our guide if you don't know how to here)
- Post the URL in the comments of the challenge