Jun 25 2009

Alpha Release of SemanticTweet

Recently, I’ve been playing a little with the Twitter REST API, and with Sinatra, the new Ruby web framework that all the cool kids are into.

On the back of said playing, I’ve just released a pre-alpha (if there is such a thing) version of SemanticTweet.

Basically, SemanticTweet is a simple web service that generates a FOAF RDF document for you from your list of Twitter friends and followers. It does this using the Twitter REST API. This service uses public Twitter data only, and so doesn’t need your Twitter username or password.

If you’re unfamiliar with it, FOAF (friend of a friend) is a semantic web vocabulary for describing people and who they know; a FOAF document is essentially a machine-readable list of your friends. It’s typically written in RDF, the Resource Description Framework. To give you an idea of what a FOAF document looks like, here’s mine, as generated by SemanticTweet.

One of the benefits of this approach is that you don’t have to build and maintain your FOAF file by hand (or with a service like FOAF-a-matic), which is a real pain: the service generates the FOAF file dynamically each time it’s queried. The second big benefit is that it turns your friends’ Twitter pages into dereferenceable URIs, which means that a semantic web browser or search engine can traverse from link to link, just as it would through ordinary web pages, all without having to call the Twitter API explicitly.
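To make the FOAF/RDF description above a little more concrete, here is a minimal sketch of the kind of document such a service might emit, written in Python with the standard-library `xml.etree.ElementTree`. It is not SemanticTweet’s actual code: the friend list is passed in directly rather than fetched from the Twitter API, and the `build_foaf` function name and the use of `rdfs:seeAlso` pointing at `http://semantictweet.com/<screen-name>` for each friend are assumptions (the URL pattern mirrors the `href` in the `<link>` snippet below). The `rdf:`, `rdfs:` and `foaf:` namespace URIs are the real published ones.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for RDF, RDF Schema and the FOAF vocabulary.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Make the serializer use the conventional rdf:/rdfs:/foaf: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("foaf", FOAF)

# Assumption (hypothetical): each user's generated FOAF document lives at
# BASE + screen name, as in the <link href="..."> example in the post.
BASE = "http://semantictweet.com/"


def build_foaf(screen_name, friends):
    """Return an RDF/XML FOAF document (as a str) describing screen_name,
    with one foaf:knows entry per friend screen name."""
    root = ET.Element(f"{{{RDF}}}RDF")
    me = ET.SubElement(root, f"{{{FOAF}}}Person")
    ET.SubElement(me, f"{{{FOAF}}}nick").text = screen_name
    for friend in friends:
        knows = ET.SubElement(me, f"{{{FOAF}}}knows")
        person = ET.SubElement(knows, f"{{{FOAF}}}Person")
        ET.SubElement(person, f"{{{FOAF}}}nick").text = friend
        # rdfs:seeAlso points at the friend's own generated document, so a
        # crawler can dereference it like an ordinary hyperlink.
        ET.SubElement(person, f"{{{RDFS}}}seeAlso",
                      {f"{{{RDF}}}resource": BASE + friend})
    return ET.tostring(root, encoding="unicode")
```

For example, `print(build_foaf("alice", ["bob", "carol"]))` (hypothetical screen names) prints an `rdf:RDF` document in which `alice` `foaf:knows` two `foaf:Person` nodes, each carrying an `rdfs:seeAlso` link to that friend’s own generated document.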

One way to use this document is to link to it from your blog or website. Just add the following line to the <head> section of your template:

<link rel="meta"
  type="application/rdf+xml"
  title="FOAF"
  href="http://semantictweet.com/your-twitter-screen-name" />

This approach is what Tim Berners-Lee refers to as Linked Data. Check out his excellent talk at TED to get a better idea of this movement.
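The `<link>` element above only helps if something reads it. As a rough sketch of the consuming side, assuming a crawler that treats `rel="meta"` together with `type="application/rdf+xml"` as the signal for a FOAF document (exactly the attributes in the snippet above), here is how a page’s FOAF links could be discovered with Python’s standard `html.parser`. The `FoafLinkFinder` class and `find_foaf_links` function are hypothetical names; a real semantic web crawler would then fetch and parse the RDF it finds.

```python
from html.parser import HTMLParser


class FoafLinkFinder(HTMLParser):
    """Collect the href of every <link> that advertises an RDF/XML document
    with rel="meta", as in the template snippet in the post."""

    def __init__(self):
        super().__init__()
        self.foaf_links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names. Self-closing tags
        # (<link ... />) also reach this method via the default
        # handle_startendtag implementation.
        if tag != "link":
            return
        a = dict(attrs)
        # Valueless attributes come through as None, hence the `or ""`.
        rels = (a.get("rel") or "").lower().split()
        rdf_type = (a.get("type") or "").lower() == "application/rdf+xml"
        if "meta" in rels and rdf_type and a.get("href"):
            self.foaf_links.append(a["href"])


def find_foaf_links(html):
    """Return the FOAF document URLs advertised in an HTML page."""
    finder = FoafLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.foaf_links
```

Fed a page whose `<head>` contains the snippet above, `find_foaf_links` returns a one-element list holding the `href` value; ordinary links such as stylesheets are ignored because they lack the RDF `type`.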

There’s plenty more to do: Twitter data can be presented in many other semantic-webby ways to produce more interesting documents, so watch this space.

So run, don’t walk, over to semantictweet.com, and check it out. You too can have that FOAF document you’ve always wanted but were afraid to ask for. Let me know if you have any comments or observations.

You can follow developments on blog.semantictweet.com and @semantictweet.

Jun 11 2009

Google’s play in the translation space

Over the past few years, Google have been moving further and further into the machine translation (MT) space – see, for example, their language tools page, which lets you translate an arbitrary webpage or a snippet of text from one language to another.

Google’s approach to machine translation is what’s called statistical machine translation (SMT). Essentially, they take the millions of human-translated webpages that their search engine has already indexed and align them – that is, they match sentences in one language (let’s say English) with their counterpart sentences in the second language (let’s say Spanish).

By doing this across millions and millions of webpages, they can build fairly robust statistical models for guessing the most likely translation of a particular phrase.
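To illustrate how aligned sentence pairs can turn into translation probabilities, here is a sketch of IBM Model 1, a classic word-level SMT model trained with expectation-maximisation. It is a textbook teaching example, not Google’s system (which works at the phrase level and whose details aren’t described here); the function names and the three-sentence German–English toy corpus are illustrative, and the usual NULL word is omitted for brevity.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(e | f) from sentence-aligned (foreign, english)
    pairs using EM, in the style of IBM Model 1 (no NULL word)."""
    pairs = [(f.split(), e.split()) for f, e in pairs]
    english_vocab = {w for _, e_words in pairs for w in e_words}
    # Start uniform over every (e, f) pair that co-occurs in some sentence pair.
    t = {}
    for f_words, e_words in pairs:
        for e in e_words:
            for f in f_words:
                t[(e, f)] = 1.0 / len(english_vocab)

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f produced e
        total = defaultdict(float)  # expected number of words f produced
        # E-step: share each English word among the foreign words of its
        # sentence, in proportion to the current t values.
        for f_words, e_words in pairs:
            for e in e_words:
                z = sum(t[(e, f)] for f in f_words)
                for f in f_words:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise the expected counts into probabilities.
        for (e, f) in t:
            t[(e, f)] = count[(e, f)] / total[f]
    return t


def best_translation(t, f):
    """Most probable English word for the foreign word f under table t."""
    candidates = {e: p for (e, ff), p in t.items() if ff == f}
    return max(candidates, key=candidates.get)


corpus = [("das Haus", "the house"), ("das Buch", "the book"), ("ein Buch", "a book")]
table = train_ibm_model1(corpus)
```

No sentence in the toy corpus is a word-for-word dictionary, yet after a few iterations `best_translation(table, "das")` returns `"the"` and `best_translation(table, "Buch")` returns `"book"`: because `das` and `the` keep turning up together across pairs, EM shifts probability mass towards that pairing. Scaling the same idea up to millions of aligned pages is what makes the estimates robust.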

This approach was proposed relatively early in the development of machine translation – as far back as the 1940s or early 1950s, indeed – but until recently it could not compete with the other major school of thought in the area: rule-based machine translation. Google’s innovation, of course, was that because of their enormous web index they could bring several orders of magnitude more data (translated web pages) to the party than any previous approach to SMT. In so doing, they showed that an abundance of data can lead to a significant improvement in translation quality.

Why is all of this relevant now? For two reasons. First, SFI is funding the Centre for Next Generation Localisation (CNGL) CSET (one of the grants in my portfolio), whose work includes machine translation. Second, via TechCrunch, I learned of the newly released Google Translator Toolkit. This toolkit is designed to work with the existing Google translation system, while also allowing human translators to add or correct translations as they see fit.

Of course, there are many professional software tools to support human translation of software packages, websites, documents, etc., but the new Google Translator Toolkit appears to be aimed more at crowd-sourced translation. Crowd-sourcing, in which casual (as opposed to professional) translators translate some of a site’s content into another language, is the latest development in localisation, particularly of websites, and is being led by companies such as Facebook. Indeed, crowd-sourced translation is also one of the areas of particular interest to CNGL.

This is a very hot area, and with the release of this toolkit it looks likely to get hotter. It’ll be interesting to see what impact this has on the translation research community, on the amateur or enthusiast translator, and indeed on the professional translation business.

Jun 9 2009

No iPhone 3GS for O2 yet

There’s no mention of the new Apple iPhone 3G S on O2’s iPhone page yet. Let’s see how long it takes for the 3G S to reach these shores.

Hopefully, O2 will fully support the new tethering option in the next revision of iPhone OS.
