Rotcanti.com

Software full of Performance

My current employer is an executive search firm that recruits candidates for the construction and real estate development industries. I direct your internet marketing efforts; I am also in charge of market research. If I can provide competitive industry intelligence to recruiters in our office, they can use it to match executive candidates with employers who need them but can’t find them on their own.

The challenge

There are a large number of companies worthy of being tracked by my company. ENR (Engineering News & Record) creates different topic-specific authoritative lists containing the names of the largest commercial construction companies. Builder Online lists the largest residential home building companies. Between these two sources, I have identified around 1,700 companies where it would be valuable to track any and all news about them. Fortunately, on any given day, only a small percentage of these companies will make any big news… however, to make sure I capture all the valuable information, I need to monitor all of them.

The most logical thing is that you would have to implement some kind of RSS solution to keep track of 1,700 companies. Manually entering 1,700 feeds into a feed reader is not an option; I have no programming knowledge; and I don’t have the budget to find someone to program a solution.

So what did I do?

The Solution: Part 1

When I examined a typical Google News RSS URL, I was able to determine that the only variable information within each length URL is the search term:

http://news.google.com/news?sourceid=navclient&ie=UTF-8&rls=GGLJ,GGLJ:2006-37,GGLJ:en&oe=UTF-8&tab=wn&q=builder+magazine&output=rss

From this information, I can quickly and easily render a Google News RSS URL for each item on my list quickly and easily in Excel.

I then do the following (illustrated more clearly in the attached spreadsheet that I recommend you download by clicking this link):

1) Once I open Excel, I put my entire list of relevant companies in Column B.

2) In Column A, I put the first part of the Google RSS URL (from http:// to the variable part).

3) In Column C, I put the final part of the URL (&output=rss).

4) I also know from experience that to make my feed as relevant as possible, I want to use exact match whenever I can. “%22” is the token used for “quote”. I add “%22” as the last characters in Column A and the first characters in Column C.

5) I make sure that the code from Columns A and C is copied to each row that contains a relevant company in Column B.

I also need to perform the following global steps on Column B:

1) Replace “space” and “&” with “+”

2) Replace “++” with “+”

3) Remove all instances of “apostrophe”.

Here comes the fun part:

1) I widen the columns so that there is enough white space to the right of the text in each column.

2) I save the document as a formatted text space-delimited document (.prn).

3) I reopen the document, choose “Delimited”, click “Next” and then “Finish”.

4) I then globally replace “space” with nothing, generating the list of long RSS URLs.

The Solution: Part 2

(…catch your breath…)

In every feed reader I’ve used, I’ve noticed that one can bulk import feeds if they’re in OPML, something I admit I don’t quite understand. I found that if I could convert my URLs to OPML, I could get my entire list of company news RSS feeds into a feed reader. I found Feedshow Goodies (http://www.feedshow.com/goodies/opml/OPMLBuilder-create-opml-from-rss-list.php) via Google, put my URLs in the form located on this page and clicked Click “Create OPML” (note that it will process up to 200 URLs at a time). I then saved each OPML file to my hard drive.

Then I made the mistake of thinking that my trusty Feedreader 3.07 would cleanly accept 1,700 new fonts. The first OPML import of 200 feeds consumed all my internet bandwidth and slowed down my computer. Google Reader was the only other feed reader I’ve ever used, so I logged in and uploaded 9 large OPML files.

It worked. Google Reader allowed me to seamlessly (albeit somewhat slowly) import 1,700 feeds and read them all together, giving me a real-time news update from all the residential and commercial construction companies I wanted to track.

My public source (http://www.google.com/reader/shared/user/14537180468496839026/label/main-folder) (Note that I configured it somewhat differently than the example above).

Suggestions

1) Even though Google Reader is doing all the “heavy lifting”, having it open crawling 1,700 feeds will hog your memory, especially in IE. I found that opening Google Reader in Firefox or Opera significantly reduced the amount of memory used.

2) The only downside to this process is that the “title” of each feed is the long, ugly Google RSS URL rather than something more descriptive.

3) If you have a lot of stories in your Google Reader, go to Preferences and check “In expanded view, mark items as read as you scroll past them.” This feature is of great help in navigating through large amounts of data.

4) I found that creating Google news feeds for all news created “in the last week” gave me the best results. A Google news feed for all company news for each company was generating too much information and the news feed that covered only the previous day’s news seemed to miss news. In this example, I used news RSS URLs for all company news because they are the easiest to use. To use this combination with “time-restricted” Google News, you’ll need to click “Advanced News Search”, perform a sample search with a defined time period, generate the news RSS URL, and copy the first part into the column A. of your spreadsheet before following the prescribed process.

Now I can follow the news of the 1700 companies in my Google Reader. Kudos to those fine folks at the ‘Plex.

Leave a Reply

Your email address will not be published. Required fields are marked *