
Create ML Models with BigQuery ML: Challenge Lab [GSP341]



Link to the Video: https://youtu.be/iYrWLbl1Z6o 

STEPS

Task 1: Create a dataset to store your machine learning models

The dataset can be given any name and still pass the check, but use austin as the dataset name so the remaining instructions work as written.


Go to BigQuery.

Click on your Project ID.

Create a dataset named austin.
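
Alternatively, you can create the dataset from the BigQuery query editor with a single DDL statement. A minimal sketch, equivalent to the console steps above (CREATE SCHEMA is BigQuery's DDL for creating a dataset):

-- Optional: create the austin dataset with SQL instead of the console UI.
CREATE SCHEMA IF NOT EXISTS austin;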


Task 2: Create a forecasting BigQuery machine learning model.

Create the first ML model using a JOIN between the two bike share tables. Again, any model names will pass the check, but keep them as location_model and subscriber_model so the remaining instructions work as written.



CREATE OR REPLACE MODEL austin.location_model
OPTIONS
  (model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
    duration_minutes,
    address AS location
FROM
    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
JOIN
    `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
ON
    trips.start_station_name = stations.name
WHERE
    EXTRACT(YEAR FROM start_time) = <Training Year>
    AND duration_minutes > 0
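
Optionally, once the CREATE MODEL job finishes, you can inspect the training run with BigQuery ML's ML.TRAINING_INFO function. A minimal sketch (not required to pass the task):

-- Optional: show iterations, loss, and duration for the training run.
SELECT *
FROM ML.TRAINING_INFO(MODEL austin.location_model);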



Task 3: Create the second machine learning model. 


CREATE OR REPLACE MODEL austin.subscriber_model
OPTIONS
  (model_type='linear_reg', labels=['duration_minutes']) AS
SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    subscriber_type,
    duration_minutes
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
WHERE EXTRACT(YEAR FROM start_time) = <Training Year>
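
Optionally, you can sanity-check what the model was trained on with ML.FEATURE_INFO, which returns summary statistics for each input feature. A minimal sketch (not required to pass the task):

-- Optional: per-feature statistics (min, max, category count, etc.).
SELECT *
FROM ML.FEATURE_INFO(MODEL austin.subscriber_model);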

 

Task 4: Evaluate the two machine learning models.

 

Query 1

 

SELECT
  SQRT(mean_squared_error) AS rmse,
  mean_absolute_error
FROM
  ML.EVALUATE(MODEL austin.location_model, (
  SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
    duration_minutes,
    address AS location
  FROM
    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
  JOIN
    `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
  ON
    trips.start_station_name = stations.name
  WHERE EXTRACT(YEAR FROM start_time) = <Evaluation Year>)
)

 

Query 2

 

SELECT
  SQRT(mean_squared_error) AS rmse,
  mean_absolute_error
FROM
  ML.EVALUATE(MODEL austin.subscriber_model, (
  SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    subscriber_type,
    duration_minutes
  FROM
    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
  WHERE
    EXTRACT(YEAR FROM start_time) = <Evaluation Year>)
)
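
If you want both RMSE values in one result set, here is a sketch that unions the two evaluations. Note that ML.EVALUATE called without an input query scores each model on the data BigQuery ML set aside at training time, not on the <Evaluation Year> hold-out used above:

-- Optional: compare the two models side by side.
SELECT 'location_model' AS model, SQRT(mean_squared_error) AS rmse
FROM ML.EVALUATE(MODEL austin.location_model)
UNION ALL
SELECT 'subscriber_model' AS model, SQRT(mean_squared_error) AS rmse
FROM ML.EVALUATE(MODEL austin.subscriber_model);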

 

Task 5: Use the subscriber type machine learning model to predict average trip durations

Use the second model (subscriber_model in this case) to predict the average duration of all trips from the busiest rental station in 2019 (based on the number of rentals per station in 2019) where subscriber_type = 'Single Trip'.

 

 

The following query lists stations by trip count in descending order. The busiest station for 2019 was "21st & Speedway @PCL".

 

 

SELECT
  start_station_name,
  COUNT(*) AS trips
FROM
  `bigquery-public-data.austin_bikeshare.bikeshare_trips`
WHERE
  EXTRACT(YEAR FROM start_time) = <Evaluation Year>
GROUP BY
  start_station_name
ORDER BY
  trips DESC

 

Then predict trip length.

 

SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length
FROM ML.PREDICT(MODEL austin.subscriber_model, (
  SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    subscriber_type,
    duration_minutes
  FROM
    `bigquery-public-data.austin_bikeshare.bikeshare_trips`
  WHERE
    EXTRACT(YEAR FROM start_time) = <Evaluation Year>
    AND subscriber_type = 'Single Trip'
    AND start_station_name = '21st & Speedway @PCL'))
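
As a design note, you can avoid pasting the station name by deriving the busiest station in a scalar subquery. A sketch using the same table and <Evaluation Year> placeholder as above:

-- Optional: derive the busiest station inline instead of hardcoding it.
SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length
FROM ML.PREDICT(MODEL austin.subscriber_model, (
  SELECT
    start_station_name,
    EXTRACT(HOUR FROM start_time) AS start_hour,
    subscriber_type,
    duration_minutes
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
  WHERE EXTRACT(YEAR FROM start_time) = <Evaluation Year>
    AND subscriber_type = 'Single Trip'
    AND start_station_name = (
      SELECT start_station_name
      FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
      WHERE EXTRACT(YEAR FROM start_time) = <Evaluation Year>
      GROUP BY start_station_name
      ORDER BY COUNT(*) DESC
      LIMIT 1)))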

Congratulations! You have completed the Challenge Lab!
SAKURA SATIO

