Scripting News for 5/26/2008

Should Twitter charge high-spew users? 

Om Malik asks if Twitter should charge users like Scoble who have huge numbers of followers.

It’s a fair question because these users are super-expensive for Twitter, much more so than users with modest numbers of followers.

To get an idea, I have a little agent script that counts and ranks the people I've followed in the recent past, giving a rough sense of how much work they generate for Twitter's system software.

http://twitter.scripting.com/spewage.html

You can see that Scoble tops the list with a “spew factor” of 308,359,436. I’m #5 with a spew of 77,174,172.
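To make the idea concrete, here's a rough sketch in Python of how a spew factor might be computed. The real agent script's formula isn't shown here, so this assumes the cost of a user is roughly updates times followers (one delivery per follower per update), and the sample numbers are made up:

```python
# Rough sketch: rank users by the work they generate for Twitter.
# Assumption: each update must be delivered to every follower, so
# cost ~ updates * followers. The sample figures are invented.

users = {
    "scobleizer": {"updates": 12_000, "followers": 25_000},
    "davewiner": {"updates": 9_000, "followers": 8_500},
}

def spew_factor(stats):
    """Approximate fan-out work: one delivery per follower per update."""
    return stats["updates"] * stats["followers"]

ranked = sorted(users.items(), key=lambda kv: spew_factor(kv[1]), reverse=True)
for name, stats in ranked:
    print(f"{name}: {spew_factor(stats):,}")
```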

Imho, they shouldn't charge these people because they're feeding the growth of Twitter. If you charge them, a competitor will come along and might actually pay them to use its system, because having them aboard will attract so many other users.

Right now Twitter doesn’t need more money. They need a design that works, and an implementation of that design. They have lots of money and can get lots more.

How to do data portability 

I've heard a lot about data portability conferences and workshops; I've even been criticized for not going to one that happened on the west coast while I was in the east earlier this month. I don't plan to go to any of them, because I don't see what's accomplished by having public meetings about this stuff. People who control users' data can accomplish a lot more by finding ways to give them the power to use it more effectively.

Talking about principles of data portability only achieves talk. It gives people a sense of propriety over talking, not data, and people giving up propriety over talking are just yielding the floor, not yielding any power over users.

The best way to achieve data portability is to just do it.

I know that sounds silly, or obvious, but there's so much pretending that there's more to it that it has to be said.

If you want to accomplish something by talking, call up a friend who works at Netflix or Yahoo and ask them if they’ll let users move around their movie rating data. I’ve been asking about this for years. No one’s email addresses are involved. All I want is the power to give Netflix permission to read an XML file on yahoo.com that contains my movie rating data (assuming Yahoo goes first). Anyone can see how much power this would give Yahoo. Why don’t they do it? I honestly don’t know. If I were them, I would.
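To show how little machinery this would take, here's a sketch of the Netflix side in Python. The URL and the XML element names are hypothetical, not a real Yahoo feed:

```python
# Sketch of the consuming side: fetch a user's movie-rating file from
# Yahoo and parse it. The URL and the XML shape are hypothetical; no
# such feed exists. This is what "just do it" could look like.

import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.yahoo.com/movies/ratings/someuser.xml"  # hypothetical

def fetch_ratings(url):
    # Assumed format: <ratings><movie title="..." rating="4"/>...</ratings>
    with urllib.request.urlopen(url) as resp:
        root = ET.parse(resp).getroot()
    return {m.get("title"): int(m.get("rating")) for m in root.findall("movie")}

# e.g. ratings = fetch_ratings(FEED_URL)
```

That's the whole job on the reading side: one HTTP GET and a few lines of parsing, gated by the user's permission.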

Another example: if Twitter wanted to buy itself some time and growth, and give developers something exciting to do, they would store as much user profile data as they can off the twitter.com servers and on Amazon. Use simple XML formats, and use some of their proven ability to raise investment capital to grow the human network while they patch up or rewrite their system software. The more data they can move off their outage-prone systems, the more the network can grow around them without being dependent on them. Amazon has proven they can keep their servers running. Leverage that.
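Here's a sketch of what publishing a profile to Amazon could look like, in Python with the boto3 library for illustration. The bucket name, key layout, and profile fields are all assumptions, not anything Twitter actually does:

```python
# Sketch of mirroring a user profile to Amazon S3 as simple XML, so the
# data survives twitter.com outages. Bucket, key, and field names are
# assumptions made for this example.

import boto3
from xml.etree.ElementTree import Element, SubElement, tostring

def profile_to_xml(profile):
    root = Element("profile")
    for field, value in profile.items():
        SubElement(root, field).text = str(value)
    return tostring(root, encoding="unicode")

s3 = boto3.client("s3")

def publish_profile(profile):
    s3.put_object(
        Bucket="twitter-profile-mirror",  # hypothetical bucket
        Key="profiles/{}.xml".format(profile["screen_name"]),
        Body=profile_to_xml(profile),
        ContentType="application/xml",
    )

# publish_profile({"screen_name": "davewiner", "name": "Dave Winer"})
```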

The discussion about data portability so far has fixated on the hardest, most vexing technical, privacy, and economic issues, the ones that probably don't have a resolution. My advice is to instead pick a few relatively easy data portability problems and solve them. Flying around the world to conferences to talk about the hardest problems won't actually achieve any data portability.

Update: Brad Feld argues for APIs. A few months ago I would have agreed, but today I don’t think an API is enough. As we’ve seen with Twitter, when the service goes down, there is no API and there is 100 percent lock-in. We need more. The most vital data must be stored off-site, so it doesn’t go away when the service goes down.
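The same point from the other side: while the service is up, a developer can keep pulling the vital data and storing a copy off-site, so an outage doesn't mean total lock-in. A sketch, with a hypothetical endpoint:

```python
# Sketch of why an API alone isn't enough: periodically pull the data
# while the service is up and keep the last good copy elsewhere. The
# endpoint below is hypothetical.

import time
import urllib.request

ENDPOINT = "https://example.com/users/davewiner/profile.xml"  # hypothetical

def mirror(url, path, interval=3600):
    while True:
        try:
            with urllib.request.urlopen(url) as resp:
                data = resp.read()
            with open(path, "wb") as f:  # keep the last good copy
                f.write(data)
        except OSError:
            pass  # service is down; the previous copy is still on disk
        time.sleep(interval)

# mirror(ENDPOINT, "profile-mirror.xml")
```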

The 16-year rewrite 

In February 1992, I started work on a piece of Frontier called the scheduler. It's the equivalent of what they call “cron” in Unix-Land. You can put scripts in four different places: 1. everyMinute scripts, 2. hourly scripts, 3. overnight scripts and 4. threads. It's a simple bit of code that's been running now for 16 years, on every copy of Frontier, Radio, or the OPML Editor.
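For readers who have never seen Frontier, here's the shape of that pattern in a minimal Python sketch; the real scheduler is UserTalk code, and this is only an approximation of the idea:

```python
# Minimal sketch of the scheduler pattern: scripts live in named tables
# and each table runs at its own cadence. The threads table is omitted.

import sched, time

tables = {"everyMinute": [], "hourly": [], "overnight": []}
scheduler = sched.scheduler(time.time, time.sleep)

def run_table(name, interval):
    for script in tables[name]:
        try:
            script()  # run each registered script
        except Exception as err:  # a flaky script shouldn't kill the loop
            print(f"{name} script failed: {err}")
    scheduler.enter(interval, 1, run_table, (name, interval))  # reschedule

tables["everyMinute"].append(lambda: print("tick"))
run_table("everyMinute", 60)
run_table("hourly", 3600)
scheduler.run()  # blocks, running tables at their intervals
```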

It was built on the foundation for background processes that existed in 1992. A few years later a better foundation was built, but the scheduler was never adapted to run on that.

It’s always had a certain flakiness, and I never had the patience to track it down. It’s old code, written before I learned a lot of things about the Frontier environment, what works and what doesn’t. I just lived with the flakiness.

Yesterday I got tired of it and did what programmers like to do: I rewrote it. It took a few hours, but the new version is *much* cleaner, and it already runs much more reliably.

Proving the point that sometimes code rewrites are the way to go.

I’ve released the new part to OPML Editor users. There’s no code that uses it yet, but there will be soon. 🙂
