Rule
tipped
tip_class
Class 0: tip_amount = $0
Class 1: tip_amount > $0 and tip_amount <= $5
Class 2: tip_amount > $5 and tip_amount <= $10
Class 3: tip_amount > $10 and tip_amount <= $20
Class 4: tip_amount > $20
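The class boundaries above can be checked with a small function. The following is a minimal Python sketch for illustration only; it is not part of the tutorial's T-SQL or R code. The function name tip_class and the decision to reject negative amounts are assumptions, because the rule does not say how negative values are handled.

```python
def tip_class(tip_amount: float) -> int:
    """Map a tip amount in dollars to the class 0-4 defined by the rule above."""
    if tip_amount < 0:
        # Assumption: the rule does not cover negative amounts, so reject them.
        raise ValueError("tip_amount must be non-negative")
    if tip_amount == 0:
        return 0  # Class 0: tip_amount = $0
    if tip_amount <= 5:
        return 1  # Class 1: greater than $0, up to and including $5
    if tip_amount <= 10:
        return 2  # Class 2: greater than $5, up to and including $10
    if tip_amount <= 20:
        return 3  # Class 3: greater than $10, up to and including $20
    return 4      # Class 4: greater than $20


if __name__ == "__main__":
    # Print the class for values on and around each boundary.
    for amount in (0, 2.5, 5, 5.01, 10, 20, 20.01):
        print(amount, tip_class(amount))
```

Each upper boundary belongs to the lower class, so a tip of exactly $5 falls in Class 1 and a tip of exactly $20 falls in Class 3.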
CREATE PROCEDURE [dbo].[PlotHistogram]
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @query nvarchar(max) =
  N'SELECT tipped FROM nyctaxi_sample'
  EXECUTE sp_execute_external_script @language = N'R',
  @script = N'
    image_file = tempfile();
    jpeg(filename = image_file);
    # Plot histogram
    rxHistogram(~tipped, data = InputDataSet, col = ''lightgreen'',
      title = ''Tip Histogram'', xlab = ''Tipped or not'', ylab = ''Counts'');
    dev.off();
    OutputDataSet <- data.frame(data = readBin(file(image_file, "rb"), what = raw(), n = 1e6));
  ',
  @input_data_1 = @query
  WITH RESULT SETS ((plot varbinary(max)));
END
GO
Be sure to modify the code to use the correct table name, if needed.
The variable @query defines the query text 'SELECT tipped FROM nyctaxi_sample', which is passed to the R
script as the argument to the script input variable, @input_data_1.
The R script is fairly simple: the R variable image_file holds the path of a temporary file for the image, the jpeg
function opens a graphics device that writes to that file, and the rxHistogram function generates the plot.
The R device is then closed by calling dev.off().
In R, when you issue a high-level plotting command, R opens a graphics window, called a device. You can
change the size, colors, and other aspects of the window, or you can turn the device off if you are writing to a
file or handling the output some other way.
The contents of the image file are read back as raw bytes with readBin and returned in an R data.frame for output. This is a temporary workaround for CTP3.
To output varbinary data to a viewable graphics file
1. In Management Studio, run the following statement:
EXEC [dbo].[PlotHistogram]
Results
plot
0xFFD8FFE000104A4649...
2. Open a PowerShell command prompt and run the following command, providing the appropriate instance name,
database name, username, and credentials as arguments:
bcp "exec PlotHistogram" queryout "plot.jpg" -S <SQL Server instance name> -d <database name> -U <user name> -P <password>
Note
Enter the file storage type of field plot [varbinary(max)]:
Enter prefix-length of field plot [8]: 0
Enter length of field plot [0]:
Enter field terminator [none]:
Do you want to save this format information in a file? [Y/n]
Host filename [bcp.fmt]:
Results
Starting copy...
1 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 3922 Average : (0.25 rows per sec.)
Tip
If you save the format information to the file bcp.fmt, the bcp utility generates a format definition that you can apply to similar
commands in the future without being prompted for graphic file format options. To use the format file, add -f bcp.fmt to the
end of any command line, after the password argument.
4. The output file will be created in the same directory where you ran the PowerShell command. To view the plot, just open
the file plot.jpg.
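If plot.jpg does not open, one quick check is whether the file starts with the JPEG start-of-image marker, the bytes FF D8 that also begin the plot value shown in the results of step 1. The following is a minimal Python sketch of that check, run outside SQL Server; the helper names hex_literal_to_bytes and looks_like_jpeg are hypothetical. A plausible cause of a failed check is that the prefix length was left at its default of 8 instead of 0, in which case the file would begin with length bytes rather than image data.

```python
from pathlib import Path

# Every JPEG file begins with the start-of-image marker FF D8.
JPEG_START = bytes.fromhex("FFD8")


def hex_literal_to_bytes(literal: str) -> bytes:
    """Convert a hex literal such as 0xFFD8FF, as shown in a results grid, to bytes."""
    text = literal.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    return bytes.fromhex(text)


def looks_like_jpeg(data: bytes) -> bool:
    """Return True when the data starts with the JPEG start-of-image marker."""
    return data.startswith(JPEG_START)


if __name__ == "__main__":
    # Leading bytes of the plot column from the example results (the value is truncated there).
    prefix = hex_literal_to_bytes("0xFFD8FFE000104A4649")
    print("Result prefix looks like JPEG:", looks_like_jpeg(prefix))

    # plot.jpg is the file name used in the bcp command; the check is skipped if it is absent.
    plot_path = Path("plot.jpg")
    if plot_path.is_file():
        with plot_path.open("rb") as handle:
            print("plot.jpg looks like JPEG:", looks_like_jpeg(handle.read(2)))
    else:
        print("plot.jpg not found in the current directory")
```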
CREATE PROCEDURE [dbo].[PlotInOutputFiles]
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @query nvarchar(max) =
  N'SELECT cast(tipped as int) as tipped, tip_amount, fare_amount FROM [dbo].[nyctaxi_sample]'
  EXECUTE sp_execute_external_script @language = N'R',
  @script = N'
    # Set output directory for files and check for existing files with same names
    mainDir <- ''C:\\temp\\plots''
    dir.create(mainDir, recursive = TRUE, showWarnings = FALSE)
    setwd(mainDir);
    print("Creating output plot files:", quote = FALSE)

    # Open a jpeg file and output histogram of tipped variable in that file.
    dest_filename = tempfile(pattern = ''rHistogram_Tipped_'', tmpdir = mainDir)
    dest_filename = paste(dest_filename, ''.jpg'', sep = "")
    print(dest_filename, quote = FALSE);
    jpeg(filename = dest_filename);
    hist(InputDataSet$tipped, col = ''lightgreen'', xlab = ''Tipped'',
      ylab = ''Counts'', main = ''Histogram, Tipped'');
    dev.off();

    # Open a pdf file and output histograms of tip amount and fare amount.
    # Outputs two plots in one row
    dest_filename = tempfile(pattern = ''rHistograms_Tip_and_Fare_Amount_'', tmpdir = mainDir)
    dest_filename = paste(dest_filename, ''.pdf'', sep = "")
    print(dest_filename, quote = FALSE);
    pdf(file = dest_filename, height = 4, width = 7);
    par(mfrow = c(1,2));
    hist(InputDataSet$tip_amount, col = ''lightgreen'',
      xlab = ''Tip amount ($)'',
      ylab = ''Counts'',
      main = ''Histogram, Tip amount'', xlim = c(0,40), breaks = 100);
    hist(InputDataSet$fare_amount, col = ''lightgreen'',
      xlab = ''Fare amount ($)'',
      ylab = ''Counts'',
      main = ''Histogram, Fare amount'',
      xlim = c(0,100), breaks = 100);
    dev.off();

    # Open a pdf file and output an xy plot of tip amount vs. fare amount using the base plot function;
    # Only 10,000 sampled observations are plotted here, otherwise file is large.
    dest_filename = tempfile(pattern = ''rXYPlots_Tip_vs_Fare_Amount_'', tmpdir = mainDir)
    dest_filename = paste(dest_filename, ''.pdf'', sep = "")
    print(dest_filename, quote = FALSE);
    pdf(file = dest_filename, height = 4, width = 4);
    plot(tip_amount ~ fare_amount,
      data = InputDataSet[sample(nrow(InputDataSet), 10000), ],
      ylim = c(0,50),
      xlim = c(0,150),
      cex = .5,
      pch = 19,
      col = ''darkgreen'',
      main = ''Tip amount by Fare amount'',
      xlab = ''Fare Amount ($)'',
      ylab = ''Tip Amount ($)'');
    dev.off();',
  @input_data_1 = @query
END
The output of the SELECT query within the stored procedure is stored in the default R data frame, InputDataSet.
Various R plotting functions can then be called to generate the actual graphics files.
Most of the embedded R script represents options for these graphics functions, such as plot or hist.
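The scatterplot in the stored procedure draws only 10,000 randomly sampled rows to keep the output file small. Because R's sample function draws without replacement by default, it raises an error when the table holds fewer rows than the requested sample size. The following minimal Python sketch shows the same sampling idea with the sample size capped at the number of available rows; the function name sample_row_indices and its parameters are hypothetical and are not part of the tutorial code.

```python
import random


def sample_row_indices(n_rows, max_points=10_000, seed=None):
    """Pick up to max_points distinct row indices from a table of n_rows rows."""
    rng = random.Random(seed)
    # Cap the sample size so that small tables do not cause an error.
    k = min(n_rows, max_points)
    return rng.sample(range(n_rows), k)


if __name__ == "__main__":
    print(len(sample_row_indices(25_000, seed=1)))  # large table: capped at 10,000 rows
    print(len(sample_row_indices(300, seed=1)))     # small table: every row is used
```

The equivalent guard in the embedded R script would be to pass the smaller of the row count and 10,000 as the sample size.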
EXEC PlotInOutputFiles
Results
STDOUT message(s) from external script:
[1] Creating output plot files:
[1] C:\\temp\\plots\\rHistogram_Tipped_18887f6265d4.jpg
[1] C:\\temp\\plots\\rHistograms_Tip_and_Fare_Amount_1888441e542c.pdf
[1] C:\\temp\\plots\\rXYPlots_Tip_vs_Fare_Amount_18887c9d517b.pdf
2. Open the destination folder and review the files that were created by the R code in the stored procedure. The numbers in
the file names are randomly generated.
rHistogram_Tipped_nnnn.jpg: Shows the number of trips that got a tip (1) vs. the trips that got no tip (0). This histogram is
much like the one you generated in the previous step.
rHistograms_Tip_and_Fare_Amount_nnnn.pdf: Shows the distribution of values in the tip_amount and fare_amount
columns.
![](../Image/RSQL%20images/rsql_devtut_TipAmtFareAmt.PNG)
rXYPlots_Tip_vs_Fare_Amount_nnnn.pdf: A scatterplot with the fare amount on the x-axis and the tip amount on the
y-axis.
![](../Image/RSQL%20images/rsql_devtut_TipAmtByFareAmt.PNG)
3. To output the files to a different folder, change the value of the mainDir variable in the R script embedded in the stored
procedure.
You can also modify the script to output different formats, more files, and so on.
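Because each file name contains a randomly generated suffix, every run of the stored procedure adds new files to the folder. The following minimal Python sketch groups the files in the output folder by the three name prefixes used in the R script; the function name find_plot_files is hypothetical, and the folder path is the mainDir value from the script.

```python
from pathlib import Path

# File name prefixes taken from the tempfile() patterns in the R script.
PREFIXES = (
    "rHistogram_Tipped_",
    "rHistograms_Tip_and_Fare_Amount_",
    "rXYPlots_Tip_vs_Fare_Amount_",
)


def find_plot_files(folder):
    """Return a dict that maps each prefix to the sorted .jpg and .pdf file names found."""
    folder = Path(folder)
    found = {}
    for prefix in PREFIXES:
        found[prefix] = sorted(
            path.name
            for path in folder.glob(prefix + "*")
            if path.suffix.lower() in {".jpg", ".pdf"}
        )
    return found


if __name__ == "__main__":
    plots_dir = Path(r"C:\temp\plots")  # value of mainDir in the R script
    if plots_dir.is_dir():
        for prefix, names in find_plot_files(plots_dir).items():
            print(prefix, names)
    else:
        print("Output folder not found:", plots_dir)
```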
Next Step
Step 4: Create Data Features using T-SQL
Previous Step
Step 2: Import Data to SQL Server using PowerShell
See Also
In-Database Advanced Analytics for SQL Developers Tutorial
SQL Server R Services Tutorials
© 2016 Microsoft