Importing Stock Quotes Into SQL Server Through SSIS

I’ve worked at a publicly traded company for the past year and a half, and one thing I always thought would be fun to do was to download stock values through SSIS and use them in reports or dashboards. Sounds easy enough, but it turned out to be a bit harder than I expected for a few reasons:

1. Finding a stock quote provider that worked well with SSIS was a challenge.
2. I couldn’t find any good tutorials on how to download or process stock quotes using SSIS.
3. I lacked the time to research and investigate a solution.

However, all three of the above have finally been overcome, so I am posting this article in case others are hoping to do this as well. Let’s take a look at how it can be done. In my examples, I am using Visual Studio 2012 and SQL Server 2012. If you are loading data for just one particular stock, there won’t be a massive amount of data, so it’s easy to truncate the destination table and reload each day, which is the approach I’ll show here.

First, decide which stock symbol you would like to use. In my examples we will use Microsoft (MSFT). Their stock dates back to March 13th, 1986, which gives us some good data to work with. To get historical stock quotes, we are going to use an API from Yahoo Finance: a URL that accepts a stock symbol parameter and returns a csv file.

Here is what it looks like: http://ichart.finance.yahoo.com/table.csv?s=MSFT
The csv file includes data starting with the previous trading day (yesterday) and goes all the way back to the stock’s IPO.

Now that we have a way to get stock quotes, let’s create an SSIS package and work on processing the data. Retrieving data over an HTTP connection is the next challenge we need to overcome. SSIS has an option to create an HTTP connection under Connection Managers, but unfortunately it will not accomplish what we need. Let’s take a different approach. Start by creating two variables:
1. LocalFilename (point to a local folder and filename)
2. RemoteFilename (http://ichart.finance.yahoo.com/table.csv?s=MSFT)
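For example, here is what the two values might look like; the local path below is just a placeholder of my own choosing, so point it at any folder your package can write to:

LocalFilename  = C:\SSIS\StockQuotes\MSFT.csv
RemoteFilename = http://ichart.finance.yahoo.com/table.csv?s=MSFT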

[Image: variables used]

Next, add a Script Task to the Control Flow. Edit the Script Task and add the two variables we just created to the ReadOnlyVariables property.

[Image: add Script Task]

Click on ‘Edit Script’. Inside the public void Main() method, add the following:

// Lock the two package variables for reading and fetch their values
Variables varCollection = null;

Dts.VariableDispenser.LockForRead("User::RemoteFilename");
Dts.VariableDispenser.LockForRead("User::LocalFilename");
Dts.VariableDispenser.GetVariables(ref varCollection);

// Download the csv file from the remote URL to the local path
System.Net.WebClient myWebClient = new System.Net.WebClient();
string webResource = varCollection["User::RemoteFilename"].Value.ToString();
string fileName = varCollection["User::LocalFilename"].Value.ToString();
myWebClient.DownloadFile(webResource, fileName);

Dts.TaskResult = (int)ScriptResults.Success;

[Image: Script Task code]

At this point you can run the package and it should download the csv file to your local folder.

You can open the csv file and see that there are 7 columns (Date, Open, High, Low, Close, Volume, Adj Close).
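For reference, the header row of the downloaded file looks like this, and the Flat File Source can read it to name the columns:

Date,Open,High,Low,Close,Volume,Adj Close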
(CAUTION: If you leave the csv file open and try to run the package again, you will get an obscure error because the Script Task cannot get a lock on the file.)

[Image: error message]

Now is also a good time to create a table where we can store the incoming data. Here is an example of the table I created. Feel free to modify it to fit your needs.

CREATE TABLE [dbo].[FactStockValue](
    [StockValueKey] [int] IDENTITY(1,1) NOT NULL,
    [DateKey] [int] NULL,
    [StockSymbol] [varchar](5) NULL,
    [OpenAmount] [money] NULL,
    [HighAmount] [money] NULL,
    [LowAmount] [money] NULL,
    [CloseAmount] [money] NULL,
    [Volume] [int] NULL,
    [AdjustedCloseAmount] [money] NULL,
    CONSTRAINT [PK_FactStockValue] PRIMARY KEY CLUSTERED ([StockValueKey] ASC)
)
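Since the plan is to truncate the destination table and reload each day, you can also add an Execute SQL Task to the Control Flow, ahead of the Data Flow Task, with a simple statement like:

TRUNCATE TABLE [dbo].[FactStockValue];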

Instead of storing just the date value, I convert the date to a smart key value (typically the date rendered as a yyyymmdd integer, so 2013-05-21 becomes 20130521) which can then join to a date dimension table. Feel free to skip this step and store just the date if you like.
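If you don’t have a date dimension handy, a minimal T-SQL sketch of that yyyymmdd conversion looks like this (style 112 is SQL Server’s yyyymmdd format):

SELECT CONVERT(int, CONVERT(char(8), [Date], 112)) AS DateKey  -- 1986-03-13 becomes 19860313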

Now let’s hop back into the SSIS package and add a Data Flow Task. Within the Data Flow Task, add a Flat File Source and create a new connection that points to the local csv file. Since the csv file doesn’t include the stock symbol, I added a Derived Column transform to specify the symbol we are processing.
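The derived column itself is just a string literal expression; the column name below is my own choice and should match whatever you map to the destination:

Derived Column Name: StockSymbol
Expression:          "MSFT"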

[Image: Derived Column]

Next, I added a Data Conversion transform to convert Date to a database date [DT_DBDATE] which was needed to do a lookup on my date dimension table, and to convert the stock symbol value from unicode to string [DT_STR].
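For reference, the two conversions look roughly like this in the Data Conversion editor (the output aliases are my own naming):

Input Column    Output Alias      Data Type
Date            DateConverted     date [DT_DBDATE]
StockSymbol     StockSymbolStr    string [DT_STR], length 5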

[Image: Data Conversion]

In my example, I am doing a date lookup on a date dimension table to get a DateKey value. Finally, end with an OLE DB Destination which points to the destination table. Run the package and verify that it works.
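Every date dimension is different, but assuming a hypothetical DimDate table with an integer DateKey and a FullDate column of type date, the Lookup query can be as simple as:

SELECT [DateKey], [FullDate]
FROM [dbo].[DimDate]

Join the converted Date column to FullDate and return DateKey as a new column.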

[Image: Data Flow success]

Check the destination table and verify that everything looks correct.
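A quick sanity-check query against the table created above should show yesterday’s quote at the top:

SELECT TOP 10 *
FROM [dbo].[FactStockValue]
ORDER BY [DateKey] DESC;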

[Image: table results]

Now that we have a way to retrieve and store stock data values, we can use it on dashboards and reports!

[Image: stock value chart]


Cleansing Data Through SSIS with DQS

While Data Quality Services (DQS) takes some time to build up an effective knowledge base through manual data review, there are times when it sure would be handy to use SSIS instead. This is only recommended when you need quick results and are feeling pretty good about your knowledge base, or are perhaps using an external knowledge base through reference data. Should you decide to cleanse data through SSIS using DQS, it’s important to note that the transform can export quality-control metrics such as the Status and Confidence score alongside the final data, which can later be used to review the results and make final decisions. So let’s take a look at using the DQS Cleansing transform in SSIS.

In this example, I created a basic SSIS package with one Data Flow. I selected an Excel Source and pointed it to a file that has some data to be cleansed. In my example I am going to send in some bad addresses and attempt to cleanse them using Melissa Data. To learn more about the Melissa Data Address Check add-in and reference data in general, view my other blog post here.

Create a new package, add a Data Flow Task, and then add a DQS Cleansing transform object.

[Image: DQS Cleansing transform setup]

Click on the New button next to the Data quality connection manager to set up a new connection to an instance of DQS, and then select a knowledge base.

[Image: DQS Cleansing transform setup with connection]

Click on the Mappings tab. Here you will map input columns from your data source to the domains in the knowledge base.

[Image: DQS Cleansing transform with mappings]

If you would like to see additional columns in the final results, such as the Confidence score and the Reason, click on the Advanced tab.

[Image: DQS Cleansing transform Advanced tab]

When finished, click OK and test it out. In my example I am just routing the rows to a Union All object as my destination for demo purposes, but I can still see the resulting cleansed data through the Data Viewer.

[Image: DQS cleansing results from transform]

As you can see from the results, two addresses were corrected, two had suggestions, and one failed to be cleansed. We could get fancy and split up the results based on the Status values, as sketched below: Corrected records may be considered ready to ship, Auto Suggest records may warrant a review, and New records would need to be reviewed since they were not found in the knowledge base.
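A minimal sketch of that split, assuming the status column the transform adds is named Record Status (verify the exact column name on your transform’s output), would be a Conditional Split with expressions like:

Corrected:    [Record Status] == "Corrected"
AutoSuggest:  [Record Status] == "Auto suggest"
New:          [Record Status] == "New"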

Using the DQS Cleansing transform in SSIS is useful but doesn’t eliminate the need to review the final cleansed results. Cleansing data through SSIS is dependent on the quality of the underlying knowledge base but does provide a quick and efficient way to do an initial data cleanse. The resulting cleansed data can then be evaluated using the Status, Reason and Confidence score values. Once the data has been processed and reviewed, only then should it be considered cleansed and ‘ready for use’.