Notes from SemTech 2009

June 19, 2009

I’m at SFO with three hours to kill, and not many brain cells left, after spending the week at the Semantic Technologies Conference. I am, it turns out, so short on functioning brain cells that I’m writing a blog post, after all these months of self-censoring because I didn’t have anything worth saying here.

The dominant question at SemTech was whether we’ve finally reached the point in the adoption curve where (to mix 2d curve metaphors) it’s all downhill from here. Has this thing really caught on? Can we just coast and let the Semantic Web take over the world now?

I still think there are some vital pieces of the architecture missing, but I can’t deny that more and more people seem to “get it”, and be spreading the word. People I’ve never even heard of. That’s a pretty good indicator.

Of course, maybe they don’t really get it. I’m not quite sure anyone really gets it, since they don’t seem to notice those missing pieces.

I was happy to see Evren Sirin talking about their work at Clark & Parsia on using OWL for integrity constraints. I’m not sure they’ve got it quite right — it’s hard to tell — but I’m really glad they are trying.
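
To make the idea concrete: the same OWL axiom can be given two readings. Take “every Product has a price”. Under standard, open-world OWL semantics, a Product with no stated price is perfectly consistent; the reasoner just assumes the price exists somewhere, unstated. Under the integrity-constraint reading, it’s a validation error. Here is a minimal closed-world check, sketched in plain Python over bare triples (this is my illustration, not Clark & Parsia’s actual system, and the vocabulary is invented):

    # A sketch of the integrity-constraint reading of "every Product
    # has a price". Open-world OWL would not flag the missing price;
    # a closed-world check does. All names here are invented.
    triples = {
        ("ex:widget", "rdf:type", "ex:Product"),
        ("ex:gadget", "rdf:type", "ex:Product"),
        ("ex:gadget", "ex:price", "9.99"),
    }

    def violations(triples):
        """Products with no ex:price triple, closed-world style."""
        products = {s for (s, p, o) in triples
                    if p == "rdf:type" and o == "ex:Product"}
        priced = {s for (s, p, o) in triples if p == "ex:price"}
        return products - priced

    print(violations(triples))  # {'ex:widget'}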

What else?

The divisions within the community are still great. You have the rules folks, who really don’t get this whole Description Logic thing. You have the Description Logic folks, who usually try not to be too condescending to the rules folks. And you have the natural language folks, whom I still mostly ignore.

RIF is pretty badly misunderstood and mis-characterized. That might not be my fault, but it’s kind of my responsibility. As one step, I put a little more time into starting a RIF FAQ this afternoon. Feel free to send me more questions that I don’t know how to answer.

Looking over my notes…

Peter Rip had some words of wisdom for startup founders that I think may apply much more broadly. He said to predict your own behavior, then live up to it. That’s what gives people confidence in you. So step one is know thyself, eh? I’m working on it, I’m working on it. One of the things I think I’ve learned about myself is that I don’t want to found a startup, despite all the fantasies about it, ever since ninth grade. What’s changed, I think, is that I’m being more realistic about what it would do to my life, what it would cost. (I suppose at some level, I’ve always known that, since I never actually did start a company with employees.)

AllegroGraph 4.0 filled me with triplestore lust. I don’t know if it could live up to the impression Jans painted of it, but … I want one. Interestingly, I didn’t feel envy; perhaps I’m also ready to give up on the fantasies of writing a killer triplestore.

In the Semantic Search session, David Booth and someone I don’t know separately expressed the concern that computers are getting too smart for our own good. Not really in the Skynet sense, but in the sense that when Google tweaks their algorithms, in ways even they don’t understand, or the web just shifts a little, suddenly you can’t find some page any more. I thought Peter Norvig looked apologetic when he responded with a terse “use plus as a workaround!”. I need to remember that more, when I get frustrated with Google not really doing keyword searching. Conclusion: make sure the compu-smarts always have an off-switch, and that humans never forget where the off-switch is.

I’m a big fan of Tom Gruber, but I’d short sell Siri if it were publicly traded. I’m not sure I can put my finger on it. All the arguments he makes for agent computing are compelling, but my gut says thumbs down. Could it be because there is (as far as I know) not a drop of SemWeb technology in there? I don’t think that’s it. I think it’s that I can’t imagine it will actually work better than more conventional alternatives. People don’t want an assistant; they want tools that are simple yet powerful enough for them to complete the task themselves. It’s a bit like GUI vs command-line. But who knows… (I wonder how this bunch of folks from SRI decided to name their new company and product SIRI. Odd….)

Data Portability. Wow, is this a difficult space. I am not optimistic here, either. I think we have many years of stumbling around in our future on this one. And that’s even if facebook isn’t being evil. On the plus side, the community is starting to realize what a hard problem it is. I’ve heard that admitting you have a problem is a good place to start. Still, the meme that there exists a quick solution, if we just get a few smart people together, … it’s damn compelling.

And here we are, going backward in time, back to the first session I attended, Monday morning, from some Freebase folks (Jamie Taylor and Colin Evans). I should play with Freebase more. If I ever get back to playing with scripts to manage my movie collection, I should probably use their movie data instead of IMDB. Jamie kept saying great things about W3C; I wanted to ask him why they don’t just join, but though I saw him many more times at the conference, he was always walking by in a hurry.

It’s not like I could have made much of a case, anyway. One of the problems with the W3C’s current business model is that folks no longer join W3C just because they (a) use our stuff, and (b) think we’re cool. They actually want business value in return for their dues! Losers.

(How do you measure the business value of a standard existing, anyway? In most cases, it’s a commons problem, where lots of folks get enormous value from the standard, but we don’t know how to monetize that. We can only charge for being part of the conversation when the standard is drafted, and sometimes that doesn’t seem to be worth very much.)

Speaking of W3C, OWL 2 went to CR last week. I think that was my most challenging round of publications yet. Do I say that every time? Still, stepping in as editor of rdf:PlainLiteral, and doing the whole transition process, in the midst of getting a new manager, … it was challenging.

I’m not actually worried about the CR phase itself. OWL 2 looks pretty darn good. Until SemTech, I was a bit worried about us getting RL implementations, but several people mentioned they planned to try it, and after a while I realized it’s kind of a no-brainer. If you play in the SemWeb space and have a rule engine (as many folks do), why wouldn’t you give OWL RL a try? Of course, you might not get around to running all the test cases, and reporting back by July 30, but at this point I’m no longer very worried about it.
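
To see why it’s a no-brainer: OWL 2 RL is specified as a set of if-then rules over triples, which is exactly the food a rule engine eats. Here’s a toy sketch in Python of two of those rules (cax-sco and eq-sym, to use the spec’s names for them) run naively to a fixpoint; a real engine does the same thing with proper indexing:

    # Toy forward chaining for two OWL 2 RL rules. Real rule engines
    # do this with indexing and incremental matching; this is just the
    # shape of the computation. The triple encoding is ad hoc.
    RDF_TYPE = "rdf:type"
    SUBCLASS = "rdfs:subClassOf"
    SAME_AS = "owl:sameAs"

    def step(triples):
        new = set(triples)
        subclass = [(c1, c2) for (c1, p, c2) in triples if p == SUBCLASS]
        for (s, p, o) in triples:
            if p == RDF_TYPE:
                # cax-sco: ?x type ?c1, ?c1 subClassOf ?c2 => ?x type ?c2
                for (c1, c2) in subclass:
                    if o == c1:
                        new.add((s, RDF_TYPE, c2))
            elif p == SAME_AS:
                # eq-sym: ?x sameAs ?y => ?y sameAs ?x
                new.add((o, SAME_AS, s))
        return new

    def closure(triples):
        triples = set(triples)
        while True:
            bigger = step(triples)
            if bigger == triples:
                return triples
            triples = bigger

    facts = closure({
        ("ex:Fido", RDF_TYPE, "ex:Dog"),
        ("ex:Dog", SUBCLASS, "ex:Animal"),
    })
    assert ("ex:Fido", RDF_TYPE, "ex:Animal") in facts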

Okay. One hour to flight time. That’s better. Have fun on the ground, everyone.

(topics I’m leaving out: Ian Horrocks taught a kick-ass class in description logics. The Jena team are pretty optimistic about their post-HP future (as am I). I’m sad that Jamie Taylor didn’t understand the need for 303 redirects.)

What Is OWL Good For?

October 28, 2008

Here’s another middle-of-the-night-while-travelling post. It’s about OWL….

Last week, the W3C OWL Working Group decided that it was essentially done with the design of the OWL 2 language. All that remains is some editorial work and fixing bugs that might crop up. In W3C process terminology, the Working Group decided it was (mostly) ready for “Last Call” of its multi-part specification. Happily, it’s pretty much on schedule.

I’ve been the primary W3C staff contact for this Working Group, and I was the secondary staff contact for the OWL 1 Working Group for its last six months (back in 2003-2004). I’ve tried to support both groups however I could, in infrastructure, process advice, management advice, and in some limited areas with technical design advice. But, awkwardly, I’m not really an OWL user, so I sometimes feel like this OWL work is only my “day job”.

That’s okay — lots of us do things for work that we’re not passionate about — but it isn’t how I usually operate. More interestingly, it’s kind of odd. I mean, OWL is on the very short list of Semantic Web standards, and I’m quite passionate about decentralization, which is more-or-less what the Semantic Web is about. So why am I not passionate about OWL?

Personal Reasons?

Okay, there may be some personal reasons. I’ll describe them in the hope of factoring them out, and in the hope of generally learning more about how people interact with standards committees.

  1. Although I was involved fairly near the beginning of work on OWL, I still feel late to the party. For me, when I’m late to a party, I tend to have a nagging feeling that I missed something important, and I stay very cautious. I’m less likely to become enthusiastic.

  2. Perhaps because of that, or perhaps because I’m not a Description Logics researcher, I don’t feel like part of the “in-crowd” — even though I know and like and feel personally comfortable with many people who are.

  3. I don’t think I could ever get into the top tier of experts in either OWL implementation or OWL usage. There are some very, very smart people who have been dedicated to those subjects for years already. I doubt I could catch up.

These wouldn’t stop me from being an implementor (I did one strange OWL implementation called surnia) and/or user, but they make it harder for me to feel ownership. Maybe that’s related to excitement.

Technical Reasons?

The important issue here, though, is technical. Is OWL important for decentralization? Perhaps it is, but I’m not convinced. Really, my gut says “no”, the experts say “yes”, and I’m just confused. And so we end up with a blog post.

What purposes might OWL serve?

  1. It might be a data interface definition language, along the lines of BNF, ASN.1, and XML Schema, except on some subtly different level. Decentralized systems certainly need something like this — some computer format for defining the interfaces — but is it OWL? (cf. my work last year on asn07 (rif wiki, rif e-mail)). Some people tell me it’s entirely unsuitable for this; some people say they’ve been using it for this for years.

  2. It might give people some computer support for defining a vocabulary, much as a spell-checker helps people writing prose. I have the feeling this is where the core Ontology community is. In this view, OWL is there to help you find errors and get insights into your vocabulary specification (aka your ontology).

  3. It might be used as a declarative programming language for expressing Semantic Web shims, the transformations needed so systems using related-but-different RDF vocabularies can interoperate. But maybe this is better done with RIF or conventional programming languages? This is the Ontology Mapping problem. Clearly OWL isn’t expressive enough to do all kinds of mapping, and it’s probably the best option for trivial kinds of mapping (stuff that just uses owl:sameAs; see the sketch after this list), but what about everything in between?
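
Here’s the trivial end of that spectrum, sketched with rdflib (a Python RDF library); the two shop vocabularies are invented. A single owl:equivalentProperty triple is enough to let data published in one vocabulary be read in the other:

    # The trivial end of ontology mapping: one owl:equivalentProperty
    # triple, applied by rewriting. Vocabularies invented; uses rdflib.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL

    SHOP_A = Namespace("http://example.com/shopA/")
    SHOP_B = Namespace("http://example.com/shopB/")

    mapping = Graph()
    mapping.add((SHOP_A.price, OWL.equivalentProperty, SHOP_B.cost))

    data = Graph()
    data.add((SHOP_A.widget42, SHOP_A.price, Literal("9.99")))

    # Rewrite in both directions, so either vocabulary works in queries.
    equiv = {}
    for (p1, _, p2) in mapping.triples((None, OWL.equivalentProperty, None)):
        equiv[p1], equiv[p2] = p2, p1

    for (s, p, o) in list(data):
        if p in equiv:
            data.add((s, equiv[p], o))

    assert (SHOP_A.widget42, SHOP_B.cost, Literal("9.99")) in data

Anything past that (unit conversions, splitting one property into two) is where OWL runs out and rules or code have to take over.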

Sometimes in standards work there is constructive ambiguity in the spec and also in the charter. It makes the technology easier to sell, and draws in a wide base of potential support. What is this technology for? Well, it’s for lots of things. It might be just what you need! If the charter were more specific, picking out only points 1, 2, or 3, then we’d have a narrower spec, serving a narrower community. Maybe serving it better, but still with fewer economies of scale and network effects.

If we look back at the OWL Use Cases and Requirements, we see use cases somewhat in line with these three options, although taking a rather different approach. The first two are about using automated reasoning in mapping between vocabularies; that’s sort of number 3, but with human vocabularies à la WordNet. The other four are less clear cut. Number five looks like classic decentralization, but I don’t see any case being made for OWL in there. Number six looks like classic planning; is OWL the best logic language for automated planning?

Anyway, I’d love to hear comments (below or via trackback/pingback) about what you think OWL is good for.

[Oh, look, it sounds like there’s some interesting discussion of this going on at OWLED. A shame I wasn’t up for travelling three weeks in a row.]

Please note that I’m really not complaining about OWL. I’m fairly comfortable with its design, and quite happy with the design process. I’m just saying that I have trouble seeing why it’s as cool as some people seem to think it is. Maybe I’m missing something. Maybe my goals are different. Maybe its utility is exactly the same to both of us, and for whatever reason it doesn’t turn me on the same way. I’m just trying to understand that….

What Is The Web?

October 27, 2008

This question doesn’t usually matter very much, but sometimes we get into arguments about whether something (e.g. e-mail, streaming video, instant messaging) is part of “The Web” or not.

Due to various circumstances last week I found myself drafting this text about it (while kicking myself and telling myself it was a waste of time). Jet-Lag Insomnia, more or less.

I think the main value of my text, or any effort like this, is in revealing our often-unstated ideas about what the Web can or should become. Is this “architecture”? When you renovate a building, adding new wings, how much are you changing its architecture? How much are you changing its subtle good and bad features? Certainly it’s good to think about that question before making the changes.

It is also nice to get our terms straight. I loathe the term “Resource” and quite dislike the term “URI”. Along the way here, I try to define what I think are the key terms in Web Architecture. For extra credit, I end up defining a “Web Standard”.

Wikipedia says the Web is:

…a system of interlinked hypertext documents accessed via the Internet.

In contrast, WebArch says it is:

…an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URI)

Without going into my dislike for WebArch, I really wish the Semantic Web would/could (somehow) stick to a definition closer to Wikipedia’s.

My one-line definition is this:

The Web is a global communications medium provided by a decentralized computer system.

In more detail:

“The Web is a global…”: Although conceptually there could be many different “webs”, there is one which is understood as “The Web”. The Web follows (and uses) The Internet in being designed to connect different local systems. An installation of web technology usually ends up connected to others, becoming part of the unified global Web, because in most situations the value of doing so greatly outweighs the costs. The end effect is a single, integrated system built up of all available, connected components.

“communications medium”: The Web provides a way for people to communicate with each other. It does this by letting them create web pages (often collected into web sites), with unique names (the web address, or URL), which other people can view and interact with. The system does not restrict what exactly constitutes a “page” (sometimes called a “resource”): originally, Web pages were essentially on-line versions of paper documents, but they have evolved to now provide, within each “page”, a full user interface to remote computer systems. The web addresses are essential, because they allow people to communicate about particular pages and, crucially, they allow one page to name another (to link to another) so that users can learn about and “visit” (use) other pages.

Although generally intended for use by people, the Web is sometimes used by other computer systems. A search engine traverses the Web like a user, and then helps users find the pages they want. The Web Services and Semantic Web standards provide various ways for computer systems to interact with each other over the Web, attempting to leverage Web infrastructure as an element in new systems.

“decentralized computer system”: While the Web is in one sense a single system, it is composed of other computer systems, most of which serve as web servers or web clients. It has no central point of control (except perhaps the Domain Name System (DNS), which is part of the underlying Internet); instead, the system’s behavior for a particular user depends on the clients and servers being used by that user. Many features of the Web rely on the behavior being essentially the same for all users, and that consistency depends on the underlying systems behaving consistently. Where there is consistent behavior, and that behavior is documented, the document is a Web Standard.

My Web 3.0 Prediction

October 18, 2008

In the wrap-up session for the Web 3.0 event, I tried to crystallize my prediction for Web 3.0. In doing so, I was trying to step away from the Semantic Web as a goal, and instead just think about what I think is inevitable.

So my prediction is like this:

Some people say Web 2.0 is about Ajax. Other people say it’s about websites connecting users to users, forming on-line communities around read-write websites. I think these two notions are related in that without Ajax, websites were so clunky that full participation by the mass of users was impractical. With Ajax, developers could make sites that were both powerful and comfortable.

By the same token, Web 3.0 will be about Semantic Web technologies enabling a set of noticeably more powerful and convenient applications. Most crucially, it will be about everyone who maintains some data making it available in a standard form, so applications can be written to use data from many sources. These applications will feel different; they will appear to “know” a lot.

For the techies, Web 3.0 will be about RDF, like Web 2.0 was about Ajax. But for users, it will be about software systems which have access to all the data they can effectively use, instead of being dumb little things, trapped each in its own little box.

In the event’s analogy format, turned sideways, I guess we could say RDF is to Web 3.0 as Ajax is to Web 2.0. Ajax was the enabling, trigger technology for Web 2.0. RDF (or something like it) will be the enabling technology for Web 3.0, enabling a whole set of applications that are prohibitively difficult without it.


Someone whose name I didn’t catch, from the BBC, pointed out in a session Friday that news reporting has two modes of assertion. There’s what the reporter says, which is supposed to be more or less factual and verifiable, and there are things (of unknown truth) which others are reported as saying.

We see this all the time, right?

Yesterday, while walking in the White House Rose Garden, President Bush tripped on a crack in the pavement. White House spokesperson Dana Perino explained that the crack has been caused by terrorists who were still at large. She said the White House would be submitting legislation increasing Homeland Security funding by $250B to address this problem. Democratic House Speaker Nancy Pelosi disagreed, claiming the crack was a sign of our crumbling national infrastructure, and saying she would propose the $250B be spent on repairing our inner cities.

There are various statements made here, and if we formalized this kind of information, we would need to be careful to properly attribute it. The news source itself is making a claim about the president tripping, and then it’s making some claims about what other people said.
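
One way to formalize it, sketched here with rdflib and an invented vocabulary, is named graphs: keep what the news source asserts and what each speaker is reported to have said in separate graphs, and attach attribution as ordinary triples about those graphs:

    # Keeping asserted claims separate from reported claims, using
    # named graphs. The "news" vocabulary and URIs are invented.
    from rdflib import Dataset, Namespace, URIRef

    EX = Namespace("http://example.org/news/")
    ds = Dataset()

    # What the news source itself asserts as fact.
    story = ds.graph(URIRef("http://example.org/news/story"))
    story.add((EX.Bush, EX.trippedOn, EX.crack1))

    # What Perino is merely reported as saying. A consumer can accept
    # the story graph without accepting this one.
    quoted = ds.graph(URIRef("http://example.org/news/perino-statement"))
    quoted.add((EX.crack1, EX.causedBy, EX.terrorists))

    # Attribution: a plain triple about the quoted graph itself.
    story.add((URIRef("http://example.org/news/perino-statement"),
               EX.attributedTo, EX.DanaPerino))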

We’ve been thinking about these kinds of provenance issues for many years in the Semantic Web arena, but I hadn’t thought about how common they are in news reporting. That’s all.

I gave a talk Friday — a 10-minute-intro on a panel — (slides) in which I pointed out that there is a ton of powerful data that could be collected right now. I used the example of all the information your cell phone could, in theory, be collecting. I suggested putting it all on the Semantic Web, and then working on our access control models. I wasn’t a great fit for this panel, but I tried.

Alas we had almost no chance for discussion. This was a shame, since we had some great discussions among the panelists in preparing for the panel. Also, the audience was a mere 20 people or so.

On the plus side, I met substitute panelist Daniela Barbosa and was thus reassured that the Data Portability Work Group wasn’t evil. I tried to talk to her over lunch about how her experience in this effort suggests W3C should evolve, but we got interrupted and never had that conversation.

[ Edit: See Posting by Lidija Davis about the talk… ]

The Case for URLs

October 18, 2008

I gave a talk on Thursday in which I tried to argue the case for why one should use URLs to name classes, properties, and instances when publishing one’s data. It was supposed to be 10-15 minutes. Honestly, my heart wasn’t really in it, my argument wasn’t very cogent, and (no one has to know this) it was presented poorly. Fundamentally, the talk was out of place at this event because it was arguing one side of a technical design choice, and this was pretty much a business strategy conference. Looking at the forty people in the audience, I saw no sign that any of them had even considered it an issue. Either it was obvious to them that using URLs like this was good and proper, or (in most cases) they didn’t even think about it being a choice.

That said, here are my slides.
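
The nub of the argument, in code form: if two sites both use URLs (here, FOAF’s) as their property names, their data merges and can be queried together with no coordination at all. A sketch with rdflib; the data is invented:

    # Two independently published snippets, merged and queried together.
    # This works only because the property names are globally unique
    # URLs rather than local strings. Data invented; uses rdflib.
    from rdflib import Graph

    site_a = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.com/alice> foaf:name "Alice" .
    """

    site_b = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.com/alice> foaf:knows <http://example.net/bob> .
    """

    g = Graph()
    g.parse(data=site_a, format="turtle")
    g.parse(data=site_b, format="turtle")

    q = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?friend
    WHERE { ?p foaf:name ?name . ?p foaf:knows ?friend . }
    """
    for row in g.query(q):
        print(row.name, row.friend)  # Alice http://example.net/bob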

(Along the way to making the case for URLs, I tried to also make the case for standards. Again, most people are probably already convinced or don’t even see any choice there. There’s certainly a choice between the de facto and de jure approaches to making standards, and in some markets a standard may never emerge without concerted effort.)

Perhaps this whole bit is best forgotten. Or maybe I’ll get inspired to make the more cogent argument in a blog post. Maybe I need to find someone to argue against, at least in my head — an imagined audience member I was trying to convince. Maybe someone from the XML-without-namespaces crowd. Or maybe I could do the technical argument for shims, slightly refactoring my xtan story.

From Footpaths to Freeways

October 17, 2008

Here at the Web 3.0 Event, they’re giving out t-shirts with two blanks on them, saying “Web 3.0 is to Web 2.0 what ____ is to _____”. Somewhere, there are pens, and you’re asked to fill in the blanks.

My first thought (surprise, surprise) was that Web 3.0 is about decentralization. I couldn’t think of the right words to capture that, but Dave Beckett sat down next to me and after I explained what I was looking for suggested something I liked:

Web 3.0 is to Web 2.0 what the Web is to Walled Gardens.

This morning I thought of another angle, which I also like:

Web 3.0 is to Web 2.0 what freeways are to footpaths.

Get it? On Web 1.0 and Web 2.0, humans “walk” everywhere on the web, strolling from website to website, going about their business. Web 3.0 will allow you to zoom across sites (gathering the data you want) at inhuman (machine) speeds.

That’s not how I normally think of the web’s future, and it’s not a very pleasant view, but it may be accurate.

More broadly, what is Web 3.0?

My sense from this meeting is simple:

The good news is that Web 3.0 is the Semantic Web.

The bad news is that we still don’t know what the Semantic Web is.

That is, the long-standing issues in the Semantic Web community are just as rife within the nascent Web 3.0 community. Is it about machine learning? Is it about formal logic? Is it about sharing data? Is it about searching the natural language web?

My take, quite plainly, is that it may be about all of these, but the core is RDF-style data sharing. Natural language processing has a place in generating RDF. Formal logic has a place in helping us work with RDF data. But at heart, the core thing we need to do is share the data.

(This is my inaugural post on my new WordPress blog. I decided I wanted one using off-the-shelf tech, separate from W3C and MIT. Any bets how often I’ll post?)
