Posts Tagged ‘SQL Server’

Thought this would be useful to some of you, and for me to reference on future projects: a quick script that searches the text of every stored procedure, function, and trigger in a database for a given string.


DECLARE @SEARCHSTRING VARCHAR(255)
SELECT @SEARCHSTRING = 'some string'

-- Search the source text of procs, table-valued functions, and triggers
-- ('P' = stored proc, 'TF' = table-valued function, 'TR' = trigger;
--  add 'FN'/'IF' if you also want scalar and inline functions)
SELECT DISTINCT sysobjects.name AS [Object Name],
       CASE WHEN sysobjects.xtype = 'P'  THEN 'Stored Proc'
            WHEN sysobjects.xtype = 'TF' THEN 'Function'
            WHEN sysobjects.xtype = 'TR' THEN 'Trigger'
       END AS [Object Type]
FROM sysobjects
INNER JOIN syscomments ON sysobjects.id = syscomments.id
WHERE sysobjects.type IN ('P', 'TF', 'TR')
  AND sysobjects.category = 0   -- exclude system objects
  AND CHARINDEX(@SEARCHSTRING, syscomments.text) > 0
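
As a side note, on SQL Server 2005 and later you can write the same search against sys.sql_modules, which avoids syscomments splitting long object definitions into 4000-character chunks:

DECLARE @SEARCHSTRING VARCHAR(255)
SELECT @SEARCHSTRING = 'some string'

-- sys.sql_modules.definition is nvarchar(max), so no chunking
SELECT o.name AS [Object Name], o.type_desc AS [Object Type]
FROM sys.sql_modules m
INNER JOIN sys.objects o ON o.object_id = m.object_id
WHERE o.type IN ('P', 'TF', 'TR')
  AND CHARINDEX(@SEARCHSTRING, m.definition) > 0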

In a previous post, we saw how easy it was to start consuming the Twitter Streaming API and display the messages on a console window. In this post, we’ll take it a step further and try to derive some useful information about the activity stream on Twitter in real time. I’ve given myself a hypothetical goal of answering the following question:

Which Twitter users are mentioned most often when a given keyword is included in the tweet?

How useful is this information? One idea could be to identify influencers in real time when you’re expecting a sudden increase in a keyword or hashtag for a sponsored event or TV spot. If people repeatedly mention a particular user in tweets about your brand, you could connect with that user and help spread your message.

Let’s Get Started

For demonstration purposes, the components I’ll be building include:

  1. twitter_stream_db (SQL Server database) – This will store the mention count for individual users.
  2. MSMQ – A queue that would receive messages from the Twitter Streaming API. I’m going to leave this for a future post since I’m just doing a POC at this point; if I were even to consider using this in production I would definitely develop a queuing system, but at this stage it’s overkill.
  3. TwitterReader (console) – A small application that will read the Twitter Streaming API and, with MSMQ out of the picture for now, write the mentions straight to the database.
  4. TwitterWriter (console) – A small application that would read messages from MSMQ and update the database. With no MSMQ, there’s no queue to read from, so this one is also on hold; I’ll build out these components when we scale this in a later post. For now, consider this just a POC.

TwitterReader

This will be a slightly modified version of the console application we created in the previous post. Instead of writing to a console window, we’re going to parse the JSON objects using Json.net and then insert users mentioned in the tweet into a table using the SQL MERGE command.

Looking at the code, you’ll see we modified the stream URL slightly to include the keyword we want to track, and that we send each JSON result to a new method, ParseJson. In this case, let’s see which users are mentioned most often every time someone tweets something with the keyword “love” in it. (I know it sounds corny, but I needed something popular so that I can show off the results.)

We’re using Json.net’s Linq to Json feature to navigate to the user_mentions array. Once we have it, we just loop through all the users in the array and MERGE them into the database table through the stored procedure (see below).
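
The original code isn’t reproduced in this excerpt, so here’s a minimal sketch of the idea. The filter URL, the basic-auth credentials, and the helper names (ParseJson, UpsertMention) are my assumptions, not the post’s actual source:

using System;
using System.IO;
using System.Net;
using Newtonsoft.Json.Linq;

class TwitterReader
{
    static void Main()
    {
        // Streaming API filter endpoint, tracking the keyword "love"
        var request = (HttpWebRequest)WebRequest.Create(
            "https://stream.twitter.com/1/statuses/filter.json?track=love");
        request.Credentials = new NetworkCredential("username", "password");

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                if (line.Trim().Length > 0)
                    ParseJson(line);    // each non-empty line is one tweet
        }
    }

    static void ParseJson(string json)
    {
        JObject tweet = JObject.Parse(json);

        // Linq to Json: navigate to the user_mentions array in the entities
        JArray mentions = (JArray)tweet.SelectToken("entities.user_mentions");
        if (mentions == null) return;

        foreach (JObject user in mentions)
        {
            long id = (long)user["id"];
            string screenName = (string)user["screen_name"];

            // UpsertMention is sketched in the JSON-parsing section below
            UpsertMention(id, screenName);
            Console.WriteLine("{0} - {1}", id, screenName);
        }
    }
}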

twitter_stream_db

For simplicity, I’m going to create a single table to store the data as it comes in.  The primary key is the Twitter user’s id since there should be only one record per user at any given time.

I used the new MERGE command to perform an “upsert” of the data. If it’s the first time the user has been mentioned, it will perform an insert and set the mention_count to 1. Otherwise, we’ll update the record by setting mention_count to mention_count + 1.

Since I’m using the MERGE command, I encapsulated it in a stored procedure as opposed to writing LINQ queries. The stored procedure receives a user id and Twitter name and performs the insert/update logic.

The MERGE command is an incredibly useful feature introduced in SQL Server 2008.
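
The original scripts aren’t shown here either, so here’s roughly what the table and stored procedure could look like (table, column, and procedure names are my own):

CREATE TABLE dbo.twitter_user_mentions
(
    user_id       BIGINT       NOT NULL PRIMARY KEY,  -- Twitter user id
    screen_name   VARCHAR(100) NOT NULL,
    mention_count INT          NOT NULL
)
GO

CREATE PROCEDURE dbo.usp_MergeUserMention
    @user_id     BIGINT,
    @screen_name VARCHAR(100)
AS
BEGIN
    -- Upsert: the first mention inserts with a count of 1,
    -- subsequent mentions increment the existing count
    MERGE dbo.twitter_user_mentions AS target
    USING (SELECT @user_id AS user_id, @screen_name AS screen_name) AS source
        ON target.user_id = source.user_id
    WHEN MATCHED THEN
        UPDATE SET mention_count = target.mention_count + 1,
                   screen_name  = source.screen_name
    WHEN NOT MATCHED THEN
        INSERT (user_id, screen_name, mention_count)
        VALUES (source.user_id, source.screen_name, 1);
END
GO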

Parsing the JSON Result

As mentioned above,  we’re just going to parse the JSON object and iterate through the users mentioned in the tweet.  As we do that, we’ll pass the users into the stored procedure above and MERGE the data into the SQL table.
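
Here’s one possible implementation of the UpsertMention helper referenced in the TwitterReader sketch above, added to the same class (the connection string is illustrative):

// additional usings at the top of the file
using System.Data;
using System.Data.SqlClient;

static void UpsertMention(long userId, string screenName)
{
    using (var conn = new SqlConnection(
        "Server=.;Database=twitter_stream_db;Integrated Security=true"))
    using (var cmd = new SqlCommand("dbo.usp_MergeUserMention", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@user_id", userId);
        cmd.Parameters.AddWithValue("@screen_name", screenName);

        conn.Open();
        cmd.ExecuteNonQuery();   // the MERGE inside the proc does the upsert
    }
}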


Important: Since we’re not using a queuing system, the rate at which we can process tweets will depend largely on the speed of our SQL stored procedure. If you’re considering something similar in a production environment, please implement a queuing system to handle the load.

Results

Once you start running the application, you’ll start to see Twitter ids and screen names appearing on the console window.  Let it run for a few minutes and, depending on the popularity of your search term, you should start to see some results.  You can then go to SQL Server Management Studio and run a simple query to get a view of the activity on Twitter for that keyword / user mention combination.
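
For example, with the table sketched above, something like:

SELECT TOP 20 screen_name, mention_count
FROM dbo.twitter_user_mentions
ORDER BY mention_count DESC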

Hope you enjoyed this post and have some ideas for implementing something similar with your next social media and Twitter campaigns!

I found this neat (and free) trash destination for SSIS.  It’s really great as a development aid.  It allows you to quickly terminate a data flow path, and does not require any configuration. It will consume the rows without any side effects, and prevents warnings or errors you may otherwise receive when executing the data flow. 


You can read more and download it from here 

I was working on an SSIS project to dump data from an Excel spreadsheet into a database table.  However, the Excel file happened to be a beautifully formatted, colorful presentation of charts and graphs, subtotal rows and all sorts of interesting images.  The actual data I needed to dump didn’t even start until about 15 rows down and 6 columns across.  My mind started wandering to some custom component that would skip rows, strip subtotal rows and basically rewrite the entire spreadsheet.  But SSIS had better plans.


Thank goodness for the OpenRowset property on the Excel data source component. This property allows you to specify the range to be considered by the data source in the format Sheet1$B15:Z2000.  By specifying a range, I was able to ignore all the titles and data spread all over the sheet and concentrate just on my data range.  I thought skipping rows and ignoring headers / titles was going to add considerable time to this project. I was wrong.
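
As an aside, you can sanity-check the same range syntax outside of SSIS with an ad-hoc T-SQL OPENROWSET query (the provider name and file path below are illustrative, and the server must allow ad hoc distributed queries):

-- Preview the exact range the data source will consume
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'Excel 12.0;Database=C:\reports\pretty.xlsx;HDR=NO',
                'SELECT * FROM [Sheet1$B15:Z2000]')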

The next challenge was getting rid of the subtotal rows.  Subtotal rows were distinguished by a cell with the word “RESULT” and an empty cell next to it.  I added a Conditional Split using an ISNULL([expression]) condition to separate “real data rows” from “subtotal rows”.  After this, the import was straightforward into an OLE DB Destination.
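
The split condition was essentially a one-liner along these lines, where the column name is a placeholder for whichever cell sits next to the “RESULT” marker; rows where it evaluates to true are the subtotal rows to discard:

ISNULL([ColumnNextToResult])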

I had been working on a simple ETL SSIS package to pull data from a file, parse it, and insert it into some tables on a SQL 2005 box.  In my dev and testing environment the package ran extremely fast… under a couple of seconds.  But as soon as I added the package to our QA environment and scheduled it to be executed as a step in a SQL Agent job, the package took 4-5 minutes to execute.


I knew the data loading and extracting could not be causing the problem.  When I looked at what services were running on the box, I noticed SSIS was not running.  I tried to start it and it errored out with “Service could not be started in timely fashion”.  However, the package still ran (slowly, granted, but it still executed).  I finally found some insight into the problem from this KB article on Microsoft: http://support.microsoft.com/?kbid=918644


Apparently, SSIS needs to reach the host http://crl.microsoft.com/ in order to start correctly.  Instead of applying the hotfix, I just added the line 127.0.0.1     crl.microsoft.com to the hosts file on the box.  I attempted to start the SSIS service again and “voila”, it was running.  The package is executing within a respectable time range once again.
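
For reference, the entry goes in C:\Windows\System32\drivers\etc\hosts:

# C:\Windows\System32\drivers\etc\hosts
# Short-circuit the crl.microsoft.com lookup so SSIS can start promptly
127.0.0.1     crl.microsoft.com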