Reinstalling Vista

January 28, 2008

VISTA INSTALLATION

A few days ago my operating system destroyed my naturally sunny disposition by committing suicide.  Since this is the fourth time I’ve had to install Vista and each time I forget steps or do things in the wrong order so that it takes longer than it should, I decided to record the sequence of events for myself and my posterity.

Vista is currently (early 2008) an unstable operating system. This means that the odds are pretty good that you will need to (re)install Vista a few times. Here’s how I do it. [With thanks to http://windowssecrets.com/comp/070201#story1. It is a sign of the immaturity of Microsoft’s operating system design that most of the steps that follow are controversial in one way or another. Alas.]

  1. Buy a Vista Upgrade: As of this writing (early 2008), there is no functional difference that I am aware of between a Vista Upgrade and a Vista Full Version. You can use either version to upgrade your computer from a prior Windows operating system; or you can use either version to do a de novo installation onto a freshly-formatted hard disk in case you need to totally wipe out an earlier version of Windows before installing. Since the Vista Upgrades are cheaper than the Full Versions, I recommend that you buy a Vista Upgrade to Vista Home Premium.
  2. Back up everything you want to save: If you need to repartition drives, etc. the installation process will destroy everything in the partition you install Vista on. If you do not need to (re)partition or (re)format drives, the installation process should not destroy the data on your drives; however, it is a good idea to back things up anyway.
  3. Boot from the installation DVD: Put the install DVD in your CD/DVD drive and boot the PC from the Vista installation DVD.
  4. Select “Install Now”, but don’t enter the product key! Also, do not select to automatically activate Vista. Continue and confirm that you don’t want to enter the product key.
  5. Don’t lie about the version of Vista that you are installing: You would be sorry later.
  6. Select the “Custom(Advanced)” install, not the “Upgrade”: At this point you will be given the option to repartition or format your drives. If you don’t know what this means, don’t worry about it. If you do know what this means and need to do it, go ahead, but be aware that this is the point of no return!
  7. Install Vista, but do not activate Vista: Vista copies its files onto your drive. Create username and password as necessary. (I find typing a password each time I want to use my own computer too annoying for words, so I leave the password blank.) After the installation is complete do not activate Vista. First, you need to install the Vista upgrade as follows…
  8. Run the DVD’s setup.exe program from the Vista desktop: Normally, if you eject and then re-insert the DVD the setup.exe program will run automatically.
  9. Select “Install Now”: Do not get the latest upgrades for installation.
  10. This time around, enter the Product key from your Vista packaging: Enter the product key, but turn off the option to “Automatically activate Windows when I’m online”.
  11. Now select “Upgrade”: It seems weird, but it works (as of January 2008).
  12. Let Vista install itself again: Enter username and passwords as necessary.
  13. Now activate Vista by phone: Go to Start/Control Panel/System and Maintenance/System. Click to activate your installation:
    1. Save the Activation Code: If this is the first time you’ve installed this copy of Vista, follow the menus to the point where you are given the option to choose telephone activation. Activate by telephone! You will be given a phone number to dial and you will need your Product Key. At the end of the phone activation you will be given a 48 digit activation code. Write it down and store it with your installation DVD. Use the activation code to activate Vista. [Note: if you use the easier internet-based methods to activate your copy (which admittedly don’t involve typing 48 character activation codes) you are exchanging information with Microsoft’s servers. They might not use the information they gather against you, but then again they might. Consider that if your operating system crashes you will need to reinstall your copy of Vista. Microsoft could conceivably check its database at that point and discover that your copy of Vista has already been activated. To Microsoft, that might look suspicious. It is a recipe for disaster for you. Why take the chance? Microsoft is not known for its customer-friendly policies.]
    2. Use the Activation Code you saved from the original Vista installation: If this is a reinstallation of Vista, follow the menus to the point where you are given the option to choose telephone activation. Manually type in the 48 digit activation code that you saved from the last installation. [For instance, my code is 354474-581606-570742-116991-161172-697084-126715-805481.]

You can install Vista from scratch in a little less than 2 hours. Software backup/restore will take more time. Customizing Vista will take more time. Loading your application software and configuring it will take more time.

For example, my Vista installation became unstable on 25 January 2008 about noon when it stopped talking to my Bluetooth keyboard. I tried to repair the situation but within 8 hours of increasingly desperate driver reloads, etc. I had completely lost the user interface – no Bluetooth keyboard or mouse, no PS/2 keyboard or mouse. Luckily I found an old USB mouse in a dusty box in the garage. Although Vista complained mightily, I was able to use the mouse alone to back up everything on the system before it completely crashed. I spent all day Saturday and all day Sunday working to get Vista reloaded and all my software reloaded. It took me 31 hours total.

RELOADING SOFTWARE

After installing Vista, all system drivers and application software must be reloaded.

1. Video Display: Vista doesn’t recognize… Load Video Card Drivers from CD

2. Change Vista display resolution to 1920 x 1200 @ 60 Hz. [Display is Dell 2407 WFP-HC] Reboot.

3. Internet Connection: Load Netgear WG111 v 3 Drivers for USB 2.0 adapter. Allow the installation disk to run the wizard. When prompted, enter the WEP Hex key manually. The key is [XXXXXXXXXX]. The computer should find the local 802.11(g) network.

4. Install Vista Updates: Start/Control Panel/Windows Update, etc.

5. Install Remote Printer Driver: Pixma MP 530 from disk.

6. Install Wireless Print Server: WPSM54G from disk

7. Attach Local Printer: HP PSC 1410. Drivers load automatically.

8. Configure Speaker/Sound. Download sound device driver from Microsoft.

9. Attach Bluetooth Adapter: Do not use the Bluetooth installation software on disk! Drivers load automatically. Connect to keyboard. Load SetPoint 3.3 from disk.

10. Load Microsoft Office and FrontPage: Activation code for Office is on the installation sleeve

11. Configure Outlook Email Accounts: Also import any saved .pst files.

12. Load Visual Studio 2005: Requires download of extensive update

13. Reload Backup Files.

a. Software

b. Project Documentation

c. Project Data

d. Other Stuff

14. Reload Application Software:

a. Lightning Download

b. 7-Zip

c. Anonymizer

d. Corman Lisp

e. Retrospect – Western Digital Software

f. Qt & Visual Studio Integration: compile static version

g. IcoFx

h. XEmacs & SLIME

i. CLisp

j. Ghostscript

k. GSView

l. TeX/LaTeX

VISTA CUSTOMIZATION

Out of the box, Vista is an unbelievably annoying operating system. You will want to make several changes in order to live sanely with Vista.

User Account Control: Stop the nag! Start/Control Panel/User Accounts and Family Safety/User Accounts/Turn User Account control on or off. Uncheck the feature and restart your computer. Then Start/Control Panel/Security/Security Center/Change the way Security Center alerts me/Don’t notify me and don’t display the icon.

Automatic Updating: Turn it off! Start/Control Panel/Security/Windows Update/Change Settings. Set the button “Never check for updates (not recommended)”. If you do not do this Vista will periodically check for software updates. When it finds one it will download it and install it. Most updates require a reboot, so your system will reboot. If you are doing something like a long calculation or editing a file or downloading a large file when the system reboots, all your work will be trashed.

Drag something useful to your desktop: Go to Start/All Programs/Accessories and drag a Command Prompt to the Quick Launch portion of your task bar. Right click the icon, select Properties/Shortcut/Advanced/Run as Administrator. (The task bar is along the bottom of the screen; the Quick Launch portion is to the immediate right of the Start button on the far left of the task bar.) Go to Start and drag Computer to your Quick Launch. Go to Start and drag Internet Explorer to your Quick Launch.

Expand the size of your Quick Launch area: Right click on the task bar. Uncheck the “Lock the Taskbar” menu item. Grab the Quick Launch area handle and drag it right as much as you need. Right click on the task bar. Recheck the “Lock the Taskbar” menu item.

Turn off autoplay: Start/Control Panel/Hardware and Sound/AutoPlay. Deselect “Use AutoPlay for all media and devices”. Go to “Software and Games” and change default to “Open folder to view files…”

Adjust blinking cursor: Start/Control Panel/Ease of Access/Ease of Access Center/Make the Computer Easier to See/Set the Thickness of the Blinking Cursor: 2

Set up Internet Explorer: Start IE7. Tools/Internet Options/General/ set home page to www.google.com. Tools/Internet Options/General/Tabs:Settings. Set “Open home page for new tabs instead of a blank page”.

View hidden files: Start/Control Panel/Appearance and Personalization/Folder Options/View/. Select “Show hidden files and folders”. Deselect “Hide extensions for known file types”. Deselect “Hide protected operating system files”. Start/Control Panel/Appearance and Personalization/Folder Options/View/Search/. “Always search file names only”.

Update path: Start/Control Panel/System and Maintenance/System/Change Settings/Advanced/Environment Variables/Path += “.;c:\bin;”

The Linguistic Data Consortium makes a number of linguistic corpora available for research and development purposes.  They are generally tagged by part-of-speech, which is nice; but for my purposes the corpora are generally overpriced and undersized. 

The largest corpus I can reasonably handle is about 1 terabyte — representing about 100 billion words in 1 billion sentences.  I am willing to allocate 100 days to acquiring the corpus.  How can I get a terabyte of text for cheap in 100 days?

Data Sources

  • Download the Gutenberg corpus.
  • Download WordNet.  WordNet is a dictionary database that will be fantastically useful for boot-strapping semantic analysis.
  • Use rss feeds from several of the news organizations.
  • Scrape text from the web.  My impression is that pdf files will offer the highest-quality source of text for the following reasons:
    1. Pdf is the format of choice for longer and more formal documents;
    2. Short pdf’s tend to be advertising brochures, but longer pdf’s tend to be advertising-free;
    3. A number of programs already exist for extracting text from pdf files.  Thus, if we focus on pdf files alone the parsing job is simplified.
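For the news-feed source, pulling the text out of a feed is mostly a matter of walking its XML. The following is only a minimal sketch, assuming a standard RSS 2.0 layout; the sample feed, its headlines, and the helper name `extract_items` are made up for illustration, and a real feed would be fetched over the network and would need error handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded from a news site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <description>Body text of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <description>Body text of the second story.</description>
    </item>
  </channel>
</rss>"""


def extract_items(rss_xml):
    """Return (title, description) pairs for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""),
         item.findtext("description", default=""))
        for item in root.iter("item")
    ]


items = extract_items(SAMPLE_FEED)
for title, description in items:
    print(title, "|", description)
```

Because every item carries the same two fields, the output drops straight into a corpus file, which is the "predictable format" advantage of news sources.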

Data Volumes

The Gutenberg corpus contains roughly 16 gigabytes of text.  Probably 10 gigabytes of that is usable, non-duplicated English. 

WordNet is relatively small, but powerful.

It is difficult (a priori) to estimate the amount of text I can get from news sources, but it might be possible to get ~10 MB per day of text spanning a large number of different topic areas.  In 100 days, this could accumulate to 1 gigabyte.  If I added blog sources I could get more text, but the advantage in using news sources is that the text tends to follow a predictable format.  The parsing problem for blogs would be overwhelming.

Unique amongst all the search engines, Alexa Web Search is willing to sell you the complete results of a web search query.  With Alexa, I can request, and they will deliver, a list of all the pdf files on the web.  I registered as a user and requested a list of all URLs of all PDFs larger than 128 kBytes.  I got a list of about 1/2 million URLs.  This represents a text corpus of about 0.6 terabyte.  It cost me a few dollars.

Method of Acquisition

My internet connection averages about 100 megabytes per hour downloading.  Thus, I can download the 16-gigabyte Gutenberg corpus in about a week.  WordNet downloads in less than an hour.

100 megabytes per hour is also plenty sufficient to keep up with the rate of news production.

However, at 100 megabytes per hour it would take about 9 months to download the 0.6 terabytes in my pdf database.  For this, I will need a bigger pipe.  I will need to rent space in a server farm.  At 2 or 3 megabit per second (a typical server farm rate), you can get 600 gigabytes in about a month.
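As a rough check on these figures, the download times can be recomputed from the stated rates. This is a minimal sketch: the function names are hypothetical, and the decimal unit conversions (1 GB = 1000 MB, 1 megabit = 10^6 bits) are my assumption.

```python
def days_to_download(size_gb, rate_mb_per_hour):
    """Days needed to move size_gb gigabytes at rate_mb_per_hour (decimal units)."""
    hours = size_gb * 1000 / rate_mb_per_hour
    return hours / 24


def mbit_per_s_to_mb_per_hour(mbit_per_s):
    """Convert a line rate in megabits per second to megabytes per hour."""
    return mbit_per_s / 8 * 3600


# 0.6 TB over the home connection (100 MB per hour).
home_days = days_to_download(600, 100)
# The same 0.6 TB over a 2 Mbit/s server-farm link.
farm_days = days_to_download(600, mbit_per_s_to_mb_per_hour(2))

print(f"home: {home_days:.0f} days, server farm: {farm_days:.0f} days")
```

The home connection comes out at 250 days, a little over eight months, and the 2 Mbit/s link at roughly 28 days, in line with the estimates above.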

How Big is the Internet, Really?

Any discussion of internet data volumes is bound to cross into a starry-eyed, self-congratulatory contemplation of mind-numbing and exotic prefixes like peta-, exa-, and zetta- so let me perform the obligatory obeisances now:  The Internet is big.  It is really, really big.  So big your mind cannot hold it; the stars cannot enfold it.  If your measly brain were a betel nut, the internet would be Betelgeuse.

That said, let’s restrict our attention to text on the web and proceed to consider some numbers.

[In what follows I am only considering the amount of text on the web!  I am not considering images, video, audio, data, or anything else -- just text.]

Absolute Upper Bound on Rate of Increase

How fast could the textual internet possibly grow?

There are currently about 6*10^9 people on the earth.  Figure earth’s population might stabilize around 10^10 people.

In the future all of our words may be automatically transcribed and recorded to the internet by speech recognition technology.  Since we can speak faster than we can write, this gives us an upper bound on the amount of text any person might be able to generate.

If we figure an average of 10,000 spoken words per day per person and multiply by 10^10 people, we get about 10^14 words per day.  At 10 bytes per word, this is 10^15 bytes per day. (Pennebaker et al. measured approximately 15,000 words per day per college student.  Multiply this number by 2/3 to account for the more muted very young and very old.)

This works out to about 4*10^17 bytes of text per year =  400 petabytes per year added to the internet.

Reasonable Upper Bound on Rate of Increase

How fast could the textual internet reasonably grow?

If every human had a blog (somebody’s Satanic Vision, I’m sure) and every human were assiduous about keeping it, we can figure 10 kB per day per person as an absolute maximum.  Multiply it out and we get a maximum increase of textual data per day on the web of 10^14 Bytes.  That is 100 terabytes added to the internet per day.

More realistically, we should imagine an output of one thousandth of that.  [Currently, there are 10^9 people with net access, about 1/100 as many blogs, and about 1/1000 as many have active blogs.] 100 terabytes / 1000 =  0.1 terabyte per day increase.  This corresponds to about 40 terabytes per year of text added to the internet.
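The two upper bounds (everything spoken, and everyone blogging) are the same multiplication with different inputs, so they can be checked together. A minimal sketch using only the figures quoted in the text; the helper name `yearly_bytes` is hypothetical, and I use a 365-day year.

```python
PEOPLE = 10**10  # stable world population assumed in the text


def yearly_bytes(bytes_per_person_per_day, people=PEOPLE, fraction=1.0):
    """Bytes of text per year if `fraction` of the maximum is actually produced."""
    return people * bytes_per_person_per_day * fraction * 365


# Absolute bound: 10,000 spoken words per day at 10 bytes per word.
speech_pb = yearly_bytes(10_000 * 10) / 10**15
# Reasonable bound: 10 kB of blog text per day, scaled down by 1/1000.
blog_tb = yearly_bytes(10_000, fraction=1 / 1000) / 10**12

print(f"speech: {speech_pb:.0f} PB/year, blogs: {blog_tb:.1f} TB/year")
```

The results are 365 PB and 36.5 TB per year, which the text rounds to 400 petabytes and 40 terabytes.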

Reasonable Estimate of Current Rate of Increase

How fast is the textual internet currently growing?

Spinn3r.com says there are currently approximately 12 M blogs.  If everybody wrote 10 kB per day (highly unrealistic!), this would work out to 120 GB per day of text.  More realistically, we should figure about a hundredth of that for blogs.  Add another hundredth for web pages.  A realistic estimate of daily growth in web text volume is more like 2.5 GB.

In a year, this works out to about 10^12 bytes of text per year.

Reasonable Estimate of Accumulated Human Text

How much text is there in the world now?

Pandia.com estimates there are about 20 * 10^9 documents on the web as of early 2007.  If we figure an average of 1500 bytes per document we get 3*10^13 bytes of text already on the web.

There are about 10^6 distinct books published per year world wide.  At 10^5 bytes per book, this is 10^11 bytes per year.  Figure we’ve been publishing at near this rate for 100 years.  This means an accumulated store of 10^13 bytes.

Adding the internet and printed sources together gives about 4*10^13 bytes of text in the total human corpus.
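The accumulated total is easy to recompute, and keeping the inputs as named constants makes it simple to see how the answer moves if any one guess changes. A small sketch using only the numbers given above; the constant names are mine.

```python
WEB_DOCUMENTS = 20 * 10**9   # Pandia.com estimate, early 2007
BYTES_PER_DOCUMENT = 1500    # assumed average text per web document
BOOKS_PER_YEAR = 10**6       # distinct books published per year
BYTES_PER_BOOK = 10**5       # assumed text per book
YEARS_OF_PUBLISHING = 100

web_bytes = WEB_DOCUMENTS * BYTES_PER_DOCUMENT
print_bytes = BOOKS_PER_YEAR * BYTES_PER_BOOK * YEARS_OF_PUBLISHING
total_tb = (web_bytes + print_bytes) / 10**12

print(f"web: {web_bytes:.0e} bytes, print: {print_bytes:.0e} bytes, "
      f"total: {total_tb:.0f} TB")
```

With these inputs the web accounts for three quarters of the 40-terabyte total, so the bytes-per-document guess matters far more than the books figure.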

Summary

The accumulated human output of text is somewhere around 40 terabytes.  The total is growing at the rate of about 1 terabyte per year.  The rate of growth might realistically increase to as much as 40 terabytes per year as more people get web access and the technology becomes more familiar to people.

This is a lot of text, but figure this: you could go down to your local computer store and buy a 1 TB hard drive for about $100.  It is well within the budget of most of us to afford to store everything ever published by a human being anywhere.  In addition, we could store  humanity’s yearly output of text for about $100 per year.

Furthermore, there is a limit on the amount of original textual information humanity might produce in a given year.  That limit is approximately 400 petabytes per year. 

Post Script: Real-Time Stream of All Human Text

1 terabyte per year is about 2.5 gigabytes per day.  This works out to an uncompressed data rate of less than 250 kbit per second.  The compressed data rate would be less than 100 kbit per second.  By way of comparison, the OECD defines a broadband connection as 256 kbits per second.  So any person with a broadband connection could stream all human electronically published information from the entire globe in real time.
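The conversion from gigabytes per day to a line rate takes only a few lines. A minimal sketch, assuming decimal units; the 256 kbit/s threshold is the OECD figure quoted above, and the compression ratio is simply what the 100 kbit/s claim would require, not a measured value.

```python
GB_PER_DAY = 2.5
SECONDS_PER_DAY = 24 * 60 * 60
BROADBAND_KBIT_PER_S = 256  # OECD broadband definition quoted in the text

# Uncompressed rate needed to stream one day's text in one day.
kbit_per_second = GB_PER_DAY * 10**9 * 8 / SECONDS_PER_DAY / 1000
# Compression ratio needed to get under 100 kbit/s (derived, not measured).
required_ratio = kbit_per_second / 100

print(f"{kbit_per_second:.0f} kbit/s uncompressed; "
      f"needs about {required_ratio:.1f}:1 compression for 100 kbit/s")
print("fits in broadband:", kbit_per_second < BROADBAND_KBIT_PER_S)
```

The uncompressed stream works out to about 231 kbit per second, so it fits under the 256 kbit broadband threshold with little to spare. Getting under 100 kbit per second needs a bit better than 2.3:1 compression.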

So here’s an idea for a web service.  Register the domain www.humancorpus.com. (It’s free; I checked.)  Spider the web and extract all text.  Throw away pictures, embedded audio, everything but text. Compress it and stream it: real time.  Google could do it as a public service.  Cool, huh? 

Here’s the succinct table of programming languages I would consider for this project:

Language | Implementation | Cost | Comment
C++ | Microsoft Visual Studio 2005 | $750 | The current PC standard. Microsoft changes its C++ spec with every new release of Visual Studio. MSVC 2008 is already out.
C++ Libraries | Qt by Trolltech | $1000 | Microsoft’s C++ libraries are appallingly bad. I cannot condemn them strongly enough. Although Trolltech does not do enough testing of their products, they are still far superior to Microsoft’s implementations.
Java | Sun Microsystems | $0 | Superior to C++ in many ways, but a nightmare to support. Even *less* stable than C++, if you can believe it.
LISP | Corman Common LISP | $250 | Stable language. Easier to use for development than C++.
LISP | CLISP | $0 | Free implementation, but is it ready for prime-time?
LISP | Allegro Common LISP | $5,000 | A standard, but the price is obscene.

Java has a lot to recommend it for this project.  It is cross-platform and free.  It has a large base of open source software (especially Lucene).  And it isn’t published by Microsoft.  Unfortunately, it is neither as fast as C++ nor as clean as LISP.  If I chose Java, I would still need to use another language for the user interface.

I am disgusted with C++.  The core specifications for the language (as implemented by Microsoft) are in constant flux.  Due to the frequent changes in the language and in the compiler I am always afraid to revisit code more than a year or so old.  Visual Studio 6 projects will not compile with Visual Studio 2003 will not compile with Visual Studio 2005 will not compile with Visual Studio 2008 et seq. ad infinitum. 

See if you can follow me:  C++ was created out of C by grafting objects onto it. The reason for doing this was, ostensibly, so that code would be reusable. The C++ compiler changes significantly every couple of years.  Therefore no C++ code is reusable. Does this make sense?

I loathe const declarations: who gave the Nanny State the keys to my compiler?  I loathe template declarations: by comparison C’s syntax was nice and clean!  Here’s one:

template<int attributeCount>
typename Mesh<attributeCount>::MeshIterator
&Mesh<attributeCount>::MeshIterator::advanceToNextColumn(void)
{
  if(iter == mesh.end()) return(*this);
  do {
    ++iter;
  } while(!atColumnBegin() && iter != mesh.end());
  return(*this);
}

Can anyone make sense out of all this?  Does it strike nobody’s notice that C++ is simply not designed for language extensibility? 

Moreover, I cannot stand Microsoft’s penchant for Hungarian variable notation.  m_cstrtVars make programs *more* readable!?  The mind boggles. 

Here’s a fun exercise: Take Microsoft’s documentation for a function, it doesn’t matter which one, and show it to a LISP programmer.  Ask the programmer to figure out what the function is trying to do.  Grab a beer and stand back! 

So I will use LISP for most tasks. Unlike C++, LISP is designed for language extension.  Unlike C++, LISP is designed with garbage collection in mind.  Unlike C++, I can count on a program I wrote last decade running when I need it.  I will use C++ for time-critical things and a graphical user interface.  But I won’t use Microsoft’s libraries for GUI extensions, etc.  I will use Trolltech’s Qt.  Trolltech has proven to have a dismayingly cavalier attitude toward pre-release testing, but at least they try to write code like grown-ups instead of high-school students or Stanford drop-outs, and they don’t use Hungarian variable names.

Which LISP will I use?  I’d prefer Allegro CL, but I don’t want to pay the extortionate price.  Corman Common Lisp looks pretty good – except for no SLIME support.  For the time being I will use Corman CL and CLISP in tandem.  Later, if it turns out to matter I will discard one or the other language.