Data Quality Services is a new tool in SQL Server 2012 that helps cleanse data. At the heart of this product are knowledge bases. Knowledge bases are a repository of managed information with correct data values. Once you have a good working knowledge base in place, you can then run ‘dirty’ data against the knowledge base. Fuzzy logic is applied to try and correct, or cleanse, the unclean data and you are presented with suggested clean data which you can approve or reject.
After cleansing data against a knowledge base, the question usually arises. If there are new values in the ‘dirty’ dataset which I manually corrected/approved and are not in the knowledge base, does it automatically get added to the knowledge base? The short answer is no, it doesn’t. While it sounds like a great idea to just keep automatically adding these values to the knowledge base, in reality it’s supposed to be more of a managed process.
Even though new values aren’t automatically added to the knowledge base, DQS does provide the ability to import newly cleansed values into the knowledge base afterwards. Let’s take a look at how this is done. First, select Domain Management on the Knowledge Base.
Select the desired Domain and on the left side. Now, near the top right grab the drop down on the ‘Import Values’ button. You are presented with two choices; Import project values, and Import valid values from Excel. The first option allows you to import values from a previous cleansing project. As stated, the second option allows data to be imported from Excel.
In our example, we will import values from a previously ran cleansing project. Simply select the desired project from the list.
The next screen will present new values which may include Correct values, Errors, and Invalid values. All of which will continue to add value to the knowledge base. In this screenshot, we see that there will be two corrected values (Errors) and one new value (Correct) added.
Click finish to close the import box and let’s take a look at what happened. After looking at the Domain values, we can see that our values have indeed been added and therefore helped improve the knowledge base.
Knowledge bases are very important when cleansing data in DQS. In this example we saw how to continually improve knowledge bases by importing cleansed data. As your knowledge bases continue to evolve, you should also see the quality of cleansed data improve over time.
Jeff,
Excellent article. What I like about your example: short, simple, right to the point and very visual.
Way to go 🙂
Thank you very much for taking the time to put this together.
David Kats
You bet. Glad it helped you out David.