<- "
script
/* this query has
intentionally messy
comments
*/
create table prout as
Select
2 as COL_A -- with a comment here
;
-- or wherever
/* and some more commen ts here */
create table prout2 as
Select
* -- with a comment here
FROM
-- and some helpful comment here
prout
;
-- or wherever
"
Challenge: I have an old SQL script that I want to run. I also want to edit a part of that script using a value provided by R.
Let’s say this is my full script that we will want to run.
All it does is create a table called “prout” out of thin air, then creates a new table “prout2” from that first table.
We will also want to replace the final table name ‘prout2’ with some variable.
library(dplyr)
library(dbplyr)
library(DBI)
library("RSQLite")
<- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:") con
Step 1 - cleaning the script
# pass in either a text query or path to a sql file
<- function( text_or_path = '//example/path/to/some_query.sql' , echo = FALSE){
clean_script # modified from Tori Oblad's script https://stackoverflow.com/a/58446028/5657234
# if sql path, read, otherwise assume text input
if( stringr::str_detect(text_or_path, "(?i)\\.sql$") ){
<- text_or_path %>% read_lines() %>% stringr::str_c(sep = " ", collapse = "\n")
text_or_path
}
# echo original query to the console
# (unnecessary, but helpful for status if passing sequential queries to a db)
if(echo){cat("\nThe query you're processing is: \n", text_or_path, "\n\n")}
# remove comments
<- text_or_path %>%
clean_script # remove all demarked /* */ sql comments
gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>%
# remove all demarked -- comments
gsub(pattern = '--[^\r\n]*', replacement = ' ') %>%
#remove any line break, tab, etc.
gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%
# remove extra whitespace
gsub(pattern = ' +', replacement = ' ')
}
<- clean_script(script)
cleaned_script cleaned_script
[1] " create table prout as Select 2 as COL_A ; create table prout2 as Select * FROM prout ; "
let’s try and execute it
dbExecute(con, cleaned_script)
[1] 0
Here we get a warning that is pretty clear “Ignoring remaining part of query: create table prout2 Select * FROM prout ;”. This means the 2nd part of the script won’T be executed.
At work, SQL Server just tries to run all queries in parallel and will return “Invalid Object Name: prout” because the second script tries to read from a table the first script hasnt created yet.
Step 2: split the script into multiple queries and execute them
The quick fix is to split the queries and execute them one at a time:
dbRemoveTable(con, "prout")
<- function(script){
split_script unlist(strsplit(script, "\\s*;\\s*"))
}
<-split_script(cleaned_script)
queries
<- function(con, queries){
execute_queries for (query in queries){
message("executing query- ", query)
dbExecute(con, query)
}
}
execute_queries(con, queries)
Awesome!
Step 3 - edit the SQL script from R to inject some values
Now let’s say that I want the output of the final output table name to be editable by R. This is where glue_sql() comes in and it is pretty neat.
First, we modify the original script to replace ‘prout2’ with {output_table_name}.
<- "
script_with_editable_table_name
/* this query has
intentionally messy
comments
*/
create table prout as
Select
2 as COL_A -- with a comment here
;
-- or wherever
/* and some more commen ts here */
create table {output_table_name} as
Select
* -- with a comment here
FROM
-- and some helpful comment here
prout
;
-- or wherever
"
Then we clean the query, and we apply the glue_sql function to all queries:
dbRemoveTable(con, "prout")
dbRemoveTable(con, "prout2")
<- "new_table_name"
output_table_name <- clean_script(script_with_editable_table_name)
cleaned_script <- split_script(cleaned_script)
queries <- purrr::map(queries, ~ .x %>% glue::glue_sql(.con = con))
updated_queries execute_queries(con, updated_queries)
dbGetQuery(con, "select * from new_table_name")
COL_A
1 2
Woohoo!
Réutilisation
Citation
@online{coulombe2024,
author = {Coulombe, Simon},
title = {Executing an SQL script from R after injecting some values
from R in that script},
date = {2024-09-12},
url = {https://aidememoire.netlify.app/rstats/clean_and_execute_sql_script.html},
langid = {fr}
}