Tuesday, June 24, 2008

Sending Email Via Wireless

Normally, my Outlook is set up to send email via SMTP (port 25) using SSL encryption to send.columbia.edu. Oftentimes when I connect via a wireless access point, for example, at a hotel or conference, I get the error "Your server does not support the connection encryption type you specified." After troubleshooting using telnet, I found that connections to port 25 were being hijacked by the ISP's email server, so I was never actually reaching send.columbia.edu. The workaround is to use TLS and port 587, which the ISP does not hijack.
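
The telnet check and the port 587 workaround can also be scripted. The following is only a rough Python sketch, not what I actually ran: the function names are made up for illustration, and it assumes the server's 220 greeting includes its own host name (common, but not guaranteed), so a greeting that names some other host suggests the connection was intercepted.

```python
import smtplib
import socket
import ssl


def read_banner(host: str, port: int, timeout: float = 10.0) -> str:
    """Return the first greeting the server sends, as a telnet session would show it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(1024).decode("ascii", errors="replace").strip()


def banner_matches(banner: str, expected_host: str) -> bool:
    """True if the greeting is a 220 reply that names the expected server.

    Assumption: the real server announces its host name in the greeting.
    """
    return banner.startswith("220") and expected_host.lower() in banner.lower()


def open_submission(host: str, port: int = 587) -> smtplib.SMTP:
    """Connect on the submission port and upgrade the session with STARTTLS."""
    server = smtplib.SMTP(host, port, timeout=10)
    server.starttls(context=ssl.create_default_context())
    server.ehlo()  # re-identify over the now-encrypted connection
    return server
```

For example, banner_matches(read_banner("send.columbia.edu", 25), "send.columbia.edu") coming back False on a hotel network would point to the same hijacking I saw with telnet, and open_submission("send.columbia.edu") is the scripted equivalent of switching Outlook to TLS on port 587.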

Friday, April 25, 2008

Advice for a New PhD Student

To succeed as a PhD student you must publish several papers in the best conferences and journals and become a world-renowned expert in your research area. For the first 2-3 years, classes and teaching will limit the amount of research you can do, so you should:

  • Review research literature - Before you can contribute, you must be an expert on the current state of research in your area and in peripheral areas.
    • Read lots of papers (IEEE, ACM, CiteSeer).
    • Take notes for each paper you read because you will forget! Write a Related Work-style paragraph. Write notes directly on the paper indicating the overall paper quality, whether it should be cited, tools mentioned that you can use, and important points.
    • Hint: References are valuable for finding relevant papers and conferences.

  • Start by publishing workshop papers - Don't make the mistake I did and submit to a top-tier conference like PLDI from the get go. You'll just waste a lot of time and have nothing to show for it but rejection letters. Make sure you vet your ideas by communicating extensively with your research peers and by publishing workshop papers. The benefits of publishing workshop papers are:
    • You can publish preliminary results and unfinished ideas to determine if your ideas are worth pursuing, find flaws in your logic/arguments, and learn what the main sticking points are and the common questions people ask.
    • Your paper and ideas get peer reviewed, so the workshop paper is really just a rough draft for a future conference paper.
    • Meet people in your community
    • Pad your publication record
    • You gain practice giving talks. Hint: Practice your talk 2-3 times and make sure you don't exceed the allotted time.

  • Attend conferences and workshops - In the beginning of your PhD program, you should go to conferences and workshops even if you don't have a paper there. This helps to:
    • Make your name known. Remember, by the time you finish your PhD you should be "world renowned." This requires both publications and networking.
    • Find future co-authors
    • Get on future program committees

  • Learn your venues
    • Find out the best conferences and journals in your field (for Software Engineering, look here). Find out where important people in your field publish papers. Subscribe to the SEWORLD and ECOOP Info lists to find out about calls for papers.
    • Create a publication schedule that shows all the conferences and workshops in your area along with their deadlines. Eventually, your life will be structured around these deadlines. Periodically update the schedule so that you are always aware of upcoming deadlines.
    • When considering publishing at a venue, become familiar with the papers published in previous years. Is it easy to obtain the papers? Do the papers show up on searches like Google, ACM, IEEE? If the answers are no, publishing at this venue may lessen your paper's impact.

  • Get a summer research internship
    • Line up post-PhD career - Summer internships are really just a recruiting vehicle. The company uses them to determine if they should hire you; for example, to determine if you can do great research. You should knock their socks off so they'll want to hire you after you graduate. Make sure you publish a paper.
    • Fun and profit - Internships are usually great fun: you meet a ton of other interns and contacts, and you make a lot more money than if you stayed at school.
    • Publish a paper - The best indicator of success for an internship is publishing a paper, either at the end of the internship or shortly after. This also communicates to the company that you can do research.
    • Make contacts
    • Hint: Apply in January

The last few years of your PhD should be devoted to research. You should be cranking out high-quality papers during this period and establishing a reputation for high-quality research.
  • Publish, publish, publish - My advisor (Al Aho) requires his PhD students to have "2 good ideas." This translates into 2 papers published in the best conferences (PLDI, POPL, ICSE, FSE, ECOOP, OOPSLA, etc.) and journals (TSE, etc.).
  • Participate - Try to serve on several program committees, serve on an organizing committee, and organize a workshop.
  • Find a job - You should be looking for a job now, not after you graduate.
    • Talk to people at conferences, etc. When these people are deciding who to hire, it will help tremendously that they know you.
    • Subscribe to job postings (CRA, Chronicle, SEWORLD)
    • Try to graduate in May since universities and research labs typically hire on a regular academic schedule. Assistant professor positions for Sept are usually posted in Sept and Oct of the previous year, application deadlines are usually in Jan and Feb, interviews are in Feb and Apr, and offers are made in Feb-May. Labs usually use the same schedule although the start date is immediate.

More advice:

  • Ruthlessly eliminate activities from your life that do not get you closer to finishing your PhD. Throughout your PhD, and especially before embarking on time-consuming activities, ask yourself, "Will this be in my thesis?" If you're spending most of your time coding, there's something wrong. No one cares how much code you write or how elaborate your tools and systems are. They only care about how many good papers you author. For example, I wasted several semesters creating an elaborate system for dynamic aspect-oriented programming; it never produced any publications and was not even mentioned in my thesis. Of course, all research requires some coding. However, rarely does the code see the light of day after you graduate. So keep it simple, quick, and dirty (e.g., Perl is fine). Also, use off-the-shelf software or get someone else to write it (see below). Research papers are a great source for finding out about tools that you can use such as metrics, profilers, compilers, static analyzers, etc.
    The only exception to this is if you are building something that many people will use, thus increasing your reputation. Your dissertation defense committee will appreciate this kind of research contribution. Companies also appreciate this kind of practical work. However, you'll need to develop, support, and market the software, which can be a huge time waster, especially if you don't have any help.

    The same caution about coding goes for teaching, reading, reviewing, meetings, demos, administrivia, etc.

  • Mentoring - Undergrads and master's students can help shoulder some of the coding burden. However, they can also be a huge waste of time and result in a negative net gain.
    • Experience - I like excellent programmers with industrial experience. Industrial experience means they are more likely to produce quality code, are good at collaborating and taking direction, and are likely to be familiar with source control. The more the student needs to learn (programming language, OS, source control, etc.), the more of your time they'll require, and the longer it will take for them to be productive (if ever). For this reason, I eschew undergrads, since they usually have very little programming or industrial experience.
    • Time constraints - I required all students to commit to at least 8 hours a week to working on my project. If the student is taking more than 2 other classes, it is likely they won't be able to meet this. This is another reason why I eschew undergrads, since they usually take a full course load and give the project low priority.
    • Management - You want self-motivated and self-managing students; otherwise you'll spend a lot of time managing them. Create a project roadmap that provides enough detail for them to go complete the project. It should include an overview, requirements (checklist of things they need to do), use cases (concrete examples of scenarios they must support), and milestone dates. They should agree to the roadmap before signing up for the class. If you have the time, a good technique is to meet with the students in person once a week for a combined status meeting and coding session. Create an agenda for student meetings to ensure all issues are addressed.
  • Go deep - My mentor at Microsoft Research told me to find a concrete problem and "go deep." Don't try to solve a problem in the abstract. What do you want other people to consider you to be an expert on? Decide what that is and then know that thing better than anyone else in the world. While it may be fun to learn about tangential research areas, you want to be a world-renowned expert in one thing, not a jack-of-all-trades.
  • Focus on the hard problems - What's the point of getting a PhD if your work has no impact? Your success is measured by your contribution. If you're not working on the most important and hardest problems in your field, then your contribution is muted. My advisor liked to ask, "Is this a $1000 problem or a $1B problem?"

Tuesday, February 19, 2008

Did I mention that graphics in Word 2007 suck?

A graphic in Word 2007 can be a drawing (created using Shapes), text box, drawing canvas, picture, equation, table, chart, etc. In addition to the layout issues I mentioned in the last post, it is very hard to transfer a graphic between Word 2007 and other programs, including other Microsoft Office programs. In one case I needed to transfer an equation to PowerPoint. In another case I needed to save a drawing and found out that I first had to copy-and-paste the drawing into another program to save it (i.e., there is no way to directly save a graphic in Word). Let me demonstrate how this goes horribly wrong.

I created the following graphic in Word 2007 using shapes inside a drawing canvas (the image below was created using a screen capture (ALT+Print Screen)):

The font for the code is Courier New and for the rest is Palatino Linotype. If you look closely at the screenshot you'll see that the letters are not completely black. This is because they were rendered using the ClearType font smoothing technology.

The first problem is that there is no way to save this graphic to a file directly. I have to first paste it into another program. Okay, no problem, let's try Microsoft Paint:



WYSIWYG my ass! As you can see, the fonts and line spacing have changed drastically. The Java code inside the middle box and the "Program Elements" text have been cut off. Okay, so next I try pasting into PowerPoint 2007. Here's what I got after trying all the different paste options:

As HTML

Above is the closest I get to the screenshot. Notice how the font is different and much darker.

As PNG (same As GIF and As Extended Metafile)

These (PNG, GIF, and Extended Metafile) all look the same as pasting into Microsoft Paint, that is, pretty bad.

As Microsoft Office Graphic Object




What's most disturbing about the above is that this is Microsoft's own graphic object format for Microsoft Office, designed for compatibility with other Microsoft Office programs. As you can see, it looks horrible. The Windows Metafile format below, another one of Microsoft's compatibility formats, also looks really bad:

As Windows Metafile

However, nothing prepared me for what came next:

As JPEG



I guess black is the new white.

It is sad that the best reproduction of a Word 2007 graphic is via ALT+Print Screen.

Thursday, February 14, 2008

Layout in Word 2007 sucks!

Layout of pictures, tables, drawing canvases, text boxes, spreadsheets, and equations is seriously broken in Word 2007. First, there's no universal way to lay out non-text. For example, Drawing Canvases were a nice addition to Word 2007 but they are not very useful since you can't embed tables, spreadsheets, or equations. For tables, you can work around this by embedding the table in a text box, and then embedding the text box in the drawing canvas. However, for spreadsheets and equations, you're out of luck. Perhaps this is a problem when trying to embed any OLE document, although I would assume Microsoft could get this to work properly for their own office suite.

Far more annoying is the schizophrenic layout algorithm in Word 2007. I assume that behind the scenes it's trying to balance keeping paragraphs whole, avoiding orphaned section titles, anchoring, etc., but the outcome is often surprising and wrong. For example, I'll insert a graphic into a page and it will cause the previous page to now have a blank spot that takes up 90% of the page. Most surprising: the graphic could easily fit inside that blank spot!

Most of the time I want a block graphic at the top or bottom of the page, but Word seems to have a predilection for inserting graphics inline (e.g., newspaper style) and moving the graphics with the text. To accomplish this, Word appears to "anchor" images at the point of insertion. I wish there were a way to completely turn off the anchors (no, Lock Anchor and Top and Bottom Text Wrapping don't do this) and for Word to just flow text around the blocks.

I've been struggling with these issues for years. I wish I had learned LaTeX!

Sunday, February 10, 2008

Table of Contents in Word 2007 sucks!

The Table of Contents feature in Word 2007 is seriously broken. Sometimes the entries for figures have different fonts, which makes the TOC look like a ransom letter. I had to go to a correct figure caption and then copy-and-paste it into the incorrect captions to make all the entries consistent. What's most annoying is that the incorrect entries effectively ignore the style setting specified in "Insert Table of Contents..." -> Formats="From Template" -> "Modify..."

Worse, I'll save my document and then reopen it, and a bunch of seemingly random lines from the cover page, other parts of the document, etc. suddenly appear in the Document Map and TOC. For some unknown reason, the Outline Level for these lines is set to "Level 1." After I change the Outline Level for all the lines to "Body Text," everything looks fine. However, if I change the Outline Level for the "Table of Contents" line to "Body Text," and save and then reopen, all the lines are set back to Level 1 and show up in the TOC again!

Workaround
The field code for my TOC is

{TOC \o "1-3" \h \z}

This pulls in all headers at outline level 1-3 and all other text with Outline Level 1-3. To pull in things like "List of Figures" and "Bibliography" I create a new style and set the Outline Level to 1. Make sure the text "Table of Contents" that immediately precedes the actual TOC is not a heading style.

I hope this helps! If anyone has a better solution, please leave a comment!