Monday, January 14, 2013

Download an entire blog from blogger.com

*Updated table with some improvements*

I have been reading a site called Queryshark as part of the process to refine a query letter for courting literary agents. Most of the time that I have to read is on the bus, and while I do now have internet access, the connection is spotty along the route, and I am somewhat bandwidth constrained. As such, I opted to download the site to read it offline.

This is not as easy as I thought it would be.

Originally I tried a program called BackStreet Browser, but try as I might, I never got the settings to work so that it would download the entire archive. Out of 200+ entries, it only ever grabbed about 15 or so.

And I thought to myself, “there’s a better way, I just know it.”

So, here is a handy dandy guide if you ever wish to transform a Blogger site into something that is easy to read offline. I wasn’t going to post this, as it required some commercial software, but Microsoft just made Expression Web 4 free for all. If you want a free web page editor, you aren’t going to do better.

Disclaimer: These instructions assume that there is an RSS feed set up for the site, and that you’re using Windows.

1. Download and use Blogger Backup. The settings are fairly simple. One thing I had trouble with is downloading with comments to a single file. If you want the comments, I recommend you have one file per post. This creates a file for every single comment, but we can combine them later. Choose a date where you want to start, or get everything, and click the go button.

2. Go do something else for a while.

3. Once done, you’ll have a folder filled with xml files. The backup utility is designed for pulling all of one’s data from Blogger so it can be migrated to a different blogging platform. As such, the files are not especially readable. There are also a lot of them, but consolidating them is surprisingly easy.

4. Launch the command prompt and navigate to the folder where the files are kept. Now type in the following command:

copy *.xml consolidated.txt

This will combine all of the files into a single file, in chronological order from oldest to newest. The comments for a given post will be reversed, though, so the most recent comment will be at the top.
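If you would rather script this step than use the command prompt, here is a rough Python sketch of the same consolidation. It is only an illustration: the function name `consolidate` is made up, it assumes the backup files are UTF-8 text, and it sorts by file name, which only gives chronological order if the backup tool happens to name its files that way.

```python
from pathlib import Path


def consolidate(folder, output_name="consolidated.txt"):
    """Join every .xml file in `folder` into one text file and return the file count."""
    folder = Path(folder)
    # Assumption: sorting by file name is close enough to the order `copy` uses.
    xml_files = sorted(folder.glob("*.xml"))
    with (folder / output_name).open("w", encoding="utf-8") as out:
        for xml_file in xml_files:
            # "utf-8-sig" also strips a byte order mark if one is present.
            out.write(xml_file.read_text(encoding="utf-8-sig"))
            out.write("\n")
    return len(xml_files)
```

Calling `consolidate(".")` from the backup folder would write consolidated.txt next to the xml files.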

5. Download and install Notepad++.

6. Open the text file. At the top of the file, add

<html>

<body>

At the bottom of the file, add

</body>

</html>

File –> Save As, under options choose “hypertext markup language”

Save as consolidated.html
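The same wrapping can be done without opening an editor. This is a minimal sketch, assuming the consolidated text fits comfortably in memory; `wrap_html` is a hypothetical helper name, not something Notepad++ or Blogger provides.

```python
def wrap_html(text):
    """Put the bare <html>/<body> wrapper from step 6 around the given text."""
    return "<html>\n<body>\n" + text + "\n</body>\n</html>\n"
```

To use it, read consolidated.txt into a string, pass it through `wrap_html`, and write the result out as consolidated.html.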

7. Here’s where things get a bit hairy. Under Search –> Replace (CTRL+H)

Find                   Replace
<id>                   <!--<id>
</id>                  </id>-->
<email>                <!--<email>
</email>               </email>-->
<updated>              <!--<updated>
</updated>             </updated>-->
<uri>                  <!--<uri>
</uri>                 </uri>-->
<published>            <!--<published>
</published>           </published>-->
<title type="text">    &lt;h2&gt;
</title>               &lt;/h2&gt;

This is commenting out a bunch of xml metadata that isn’t relevant to reading the posts and comments. These were determined by trial and error, so there may be more or less depending on the nature of the blog and when this is performed.
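Typing thirteen find-and-replace pairs by hand is tedious, so here is a hedged Python sketch that applies the same pairs in one pass. The pairs are copied from the list above exactly as written, including the escaped `&lt;h2&gt;`; the names `REPLACEMENTS` and `clean_feed_markup` are invented for the example.

```python
# Each pair is (text to find, text to put in its place), in the order listed above.
REPLACEMENTS = [
    ("<id>", "<!--<id>"),
    ("</id>", "</id>-->"),
    ("<email>", "<!--<email>"),
    ("</email>", "</email>-->"),
    ("<updated>", "<!--<updated>"),
    ("</updated>", "</updated>-->"),
    ("<uri>", "<!--<uri>"),
    ("</uri>", "</uri>-->"),
    ("<published>", "<!--<published>"),
    ("</published>", "</published>-->"),
    ('<title type="text">', "&lt;h2&gt;"),
    ("</title>", "&lt;/h2&gt;"),
]


def clean_feed_markup(text):
    """Comment out feed metadata and turn titles into (escaped) h2 headings."""
    for find, replace in REPLACEMENTS:
        text = text.replace(find, replace)
    return text
```

Run it only once per file. A second pass would match the `<id>` inside `<!--<id>` again and nest the comment markers.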

Save and close the file.

8. Open a web browser, preferably IE or Firefox (I had trouble getting this all to work in Chrome), and open the file consolidated.html. If you aren’t sure how to open a local file, hit CTRL+O (the letter, not the number).

9. Go do something else. This can actually take a bit depending on the size of the file. The browser interprets the XML and translates it into HTML code. It’s not going to look right, but that’s okay for now.

10. Once it’s done loading the site, choose File –> Save As. Under options, choose a text file (.txt). Call it consolidated2.txt.
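One plausible reading of why this round trip helps, and it is only an assumption: the feed stores each post’s HTML in escaped form (`&lt;p&gt;` rather than `<p>`), the browser displays that as literal text, and saving as text leaves you with real markup. The Python sketch below roughly imitates that by keeping only the visible text and decoding the entities. It will not match a browser’s text export exactly (whitespace and line breaks will differ), and `visible_text` is a made-up name.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text content and ignores real tags and comments."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; before handle_data runs.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(markup):
    """Return the text a reader would see, with entities decoded."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)
```

Anything commented out in step 7 is skipped, while an escaped `&lt;h2&gt;` comes out as a literal `<h2>`.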

11. Open consolidated2.txt in Notepad++.

Redo the html code listed above:

At the top of the file, add

<html>

<body>

At the bottom of the file, add

</body>

</html>

File –> Save As, under options choose “hypertext markup language”

Save as consolidated2.html and close.

12. Now go to the browser and open consolidated2.html.

You should now have something that is mostly readable, or at least you can tease the content out of the remaining cruft.

13. Here is where Expression Web comes in. Open the file consolidated2.html.

14. Split consolidated2.html into smaller pages with fewer entries. It was worth consolidating to do the bulk of the work all at once, but just 200 entries with comments is enough code to bring just about any browser to its knees. After a while, it gives up trying to parse the error-prone html and the stuff at the bottom just looks weird. 25 entries seems to strike a good balance of readability and browser speed.
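If splitting by hand gets old, something like this Python sketch could do the chunking. It assumes every entry title ended up as a lowercase `<h2>` heading, and that you pass in the text without the `<html>` and `<body>` wrapper lines; `split_into_pages` is a hypothetical name. Because comments carry the same heading as posts at this point, 25 "entries" here means 25 headings of either kind.

```python
def split_into_pages(content, entries_per_page=25):
    """Split content at each <h2> and return a list of complete HTML pages."""
    parts = content.split("<h2>")
    lead = parts[0]  # whatever came before the first heading
    entries = ["<h2>" + part for part in parts[1:]]
    pages = []
    for start in range(0, len(entries), entries_per_page):
        chunk = "".join(entries[start:start + entries_per_page])
        if start == 0:
            chunk = lead + chunk  # keep the leading text on the first page
        pages.append("<html>\n<body>\n" + chunk + "\n</body>\n</html>\n")
    return pages
```

Each string in the returned list can be written to its own file, for example page1.html, page2.html, and so on.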

15. The nice thing about using a web editor like Expression is that you can have the page and the underlying code open side by side. It will also call out html code that was opened but never closed. In my experience, a lot of things that get italicized with the <i> command never get a closing </i> command. You can find those quickly and easily using this tool.
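For a quick check outside Expression Web, Python’s standard `html.parser` module can flag the same problem. This is a simple sketch, not a full validator: it only tracks one tag name at a time, and `find_unclosed` is an invented helper.

```python
from html.parser import HTMLParser


class _OpenTagTracker(HTMLParser):
    """Remembers where a given tag was opened and forgets it when it is closed."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.open_positions = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            # getpos() gives (line number, column) of the tag being handled.
            self.open_positions.append(self.getpos())

    def handle_endtag(self, tag):
        if tag == self.tag and self.open_positions:
            self.open_positions.pop()


def find_unclosed(markup, tag="i"):
    """Return (line, column) positions of `tag` openings that were never closed."""
    tracker = _OpenTagTracker(tag)
    tracker.feed(markup)
    tracker.close()
    return tracker.open_positions
```

Line numbers start at 1 and columns at 0, so a result of `(2, 3)` means the stray tag begins at the fourth character of the second line.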

16. The one thing I never figured out is how to detect comments vs. actual posts, so they are treated equally. When cleaning up the document for easier reading, I opted to give actual posts Header 2 <h2> and comments Header 3 <h3>.

17. That’s about it, but some handy dandy tricks for using the tool:

CTRL + (down arrow) jumps to the next header. If you have a lot of text and don’t want to scroll while editing, that gets you there a lot quicker.

If you highlight something and hit CTRL+SHIFT+S, you can change the header immediately.

You’ll get random characters at the end of some comment titles that happen to truncate at an apostrophe. I have no idea why.

Starting at the bottom of a page, near the top of the window you can see all of the open html codes that need to be closed. You can click on them to jump straight to that code wherever it is in the page. You can right click and choose remove directly. NEVER click on it and hit the delete key! The act of clicking on the code highlights everything on the page between where you are and where that piece of code is, and hitting delete gets rid of all of that content.

CTRL+Z is your friend :)

Friday, January 11, 2013

Frustration with the publishing process

It’s not easy to get a book published if you want to go the traditional route.

  1. First you have to write a book.
  2. Then you have to throw it away and write a different, better book.
  3. Then you have to get that book ripped apart, critiqued, and properly edited.
  4. Then you have to find a literary agent.
  5. Then you have to get it published.

I’m currently on step 4, and have been working on a good query letter for some time, reading heavily through a site called QueryShark. It’s run by a literary agent who has gotten fed up with how bad query letters can be, especially compared to how good the books can be.

Writing, I think for a lot of people, starts out as something for the ego. It feels good to write. I suspect this because the process of publishing is designed largely to separate the good books from those that are written purely for the sake of the author. Each of the above steps, aside from the first one, causes a lot of anguish and frustration. Who wants to throw away their first book, after all the time they put into it? Who wants to listen to a bunch of people who just don’t get your story? Why can’t the agent and publishing company just read the book, and know that it is as good as you believe it to be?

But, you suck it up, you throw away that book, you remove some of your proudest writing because it takes away from the story for the reader, you do your homework, you deal with rejection, and acknowledge that there is a very real probability that no publisher will greenlight your book.

And then, one day, while walking through the DVD section at Target, you’ll spot this.

Assuming that getting a movie made and into distribution is at least as hard as getting a book published, you’ll think to yourself, “I’ve gotta be doing something wrong.”

For those who can’t be bothered to click the link, the movie is about a bunch of people trapped in a flooded grocery store with a great white shark. No, I’m not making that up.

Monday, January 7, 2013

Parenting: the lessons you don’t expect

One of the things that I was not prepared for with Alex, that has since been reinforced with Amelia, is that a small child produces as much laundry as a full-grown adult, if not more. Typically Julie handles one or two loads during the week when she has a free moment, and we tackle it together on the weekends. I originally expected our son’s laundry to be once every few weeks, because his clothes are so much smaller.

I did not account for the messes and redresses.

[Photo: WP_000234]

Sure, it’s smaller, but there are a lot more of them. Like army ants.

Kids go through a lot of clothes. They make messes when they eat, they make messes when they go outside, they make messes when they stay inside. They have to wear extra layers to stay warm because they have much less thermal mass, and all of that has to get washed. More than once has the kids’ laundry exceeded the adult laundry, and folding all of those tiny socks takes quite a bit of time.

And it’s all worth it.