Clayton Christensen and Siri

At the Open Source Business Conference in March 2004, Clayton Christensen gave a presentation. It’s available as an audio file for download here: Clayton Christensen | Capturing the Upside. I strongly recommend listening to the whole thing because it’s the quintessential Disruption lecture.

It has relevance in many areas of analysis, but when I was trying to think of a way to characterize the potential for Siri I recalled one particular passage that I saw as almost clairvoyant. Seven and a half years ago, Clay said:

… the next time you go to a computer superstore, go to the voice recognition software shelf and pick up a box there that’s called the IBM ViaVoice.  Now don’t buy it, but just look at it!  They have a picture of the customer on the box, and it’s an administrative assistant who is sitting in front of her computer wearing a headset speaking rather than word processing.

You think about the value proposition that IBM has to be making to this woman.  She types 90 words a minute.  She is 99% accurate.  If she needs to capitalize something, she just instinctively presses shift and cruises through.  And IBM has to say, “No, don’t do that anymore.  I want you to put this headset on and teach yourself to speak in a slow and distinct and consistent manner in complete sentences.  If you must capitalize, you must pause, speak the command “capitalize,” pause, speak the word you want to capitalize, pause, speak the command “uncapitalize,” pause, please be patient, we are 70% accurate, this will get better we promise.”

This is not an attractive proposition to this customer.  And IBM has — I’ve not worked with them at all, but as I understand it — they’ve spent maybe $700 million trying to make voice recognition technology good enough that it can be used in that market.  This is a very difficult technical hurdle to surmount. Meanwhile, while they are investing that aggressively, Lego comes up with these robots that recognize “stop,” “go,” “left,” “right,” and the kids are thrilled with the four word vocabulary, and then “press — or — say — one” kinds of applications take root, and now directory assistants ask you to say the city and state and so on, and much simpler, and an interesting market is emerging.

I bet maybe the next place it takes root is in chatrooms because the kids don’t spellcheck or capitalize anyway, and they would rather speak than type.  And maybe then the next application would be, when you see these stubby fingered executives with their BlackBerrys trying to peck out emails, and their fingers are four times the diameter of the keys, they’re only 70% accurate.  If somebody gave them a voice recognition algorithm that really didn’t have to be very good so that they could speak their wireless email rather than peck it out, I bet they’d be thrilled with the crummy product.  And ultimately, as it takes root in these new applications, it may get good enough that we can do word processing with it, but it’ll be a long time.

The thing that strikes me about Siri is how “crummy” it is. It’s trying to be an intelligent assistant but it’s not nearly good enough. Many have called it “toy-like”. It was so uninteresting that there was minimal coverage of it after the keynote launch. One can almost forgive the cynics. It was “yet another feature”.

However, it does seem to be good at a very limited set of tasks. It is something you can hire for some minor jobs that you’d rather not do. Basic calendar booking, context-aware search and a few delightful surprises. But it’s not trying to be much more. It’s not trying to be a typist. It’s not trying to be a companion. It’s not trying to be smarter than you and make you redundant. It’s only trying to help lubricate your life. This is what makes it so exciting.

Like the example of the toy robot that delights the child, Siri delights with simple competence. It’s not profound but as Clay points out in his talk, it’s a necessary first step. What must follow are many more steps. Siri must get better. And because it will have at least 200 million users, I’m betting it will. So over time it will take on more tasks and will eventually help us in ways that we cannot yet conceive of today. This is just like the introduction of the capacitive touch screen. Popularizing the touch screen has led to experiences with phones and tablets which we did not think possible four years ago.

But it takes time. Like any truly useful breakthrough, it takes a long time to mature. And also like any disruption, the potential of Siri is rooted in four principles:

  1. Humble early goals which it accomplishes well
  2. A large population of enthusiastic adopters who give it sustenance
  3. Plenty of headroom for improvement, giving it areas to grow into with positive feedback
  4. A patient sponsor who makes a stable living

There’s no magic to it. In fact it’s banal. These are only the principles that every parent uses to raise a child.

 

  • Anonymous

    I really think it’s about time for tech blogs (especially more serious ones like asymco.com) to calm down a bit from the initial excitement about Siri and start thinking a little. :)

    • Charles Knight

      Thinking a little about what? I think Horace is correct in his analysis, this isn’t an end-point but a starting point. The crux here as always with Apple is that they have been very careful to limit the feature set to things that it does well (and after using an iPhone 4S for a few hours, it does those few things very well). As a basis for development, it’s light-years ahead of what is available on other platforms.

      • Anonymous

        About the fact that it’s a bunch of circus tricks that have been available for years now. There are ICQ bots that do the same thing, along with a plethora of Turing-test-fooling projects. Now, this may still be a great feature and all, because of the way it’s packaged, engineered and presented, but it’s not any “basis for development”.

        Apple is using someone else’s voice-recognition technology (Nuance?) that’s available on several platforms, has a database of preset human-like replies (which is very easy to do), and has almost zero experience (and replies) outside US English. It’s easy to see – it won’t recognize foreign names even if said in English.

        Will it be successful? Maybe. Will the public care about the fact that it’s not “real AI”? No. But anybody with at least some knowledge of the underlying tech should know what it’s about. There’s zero technological lead Apple has over anybody. Both Google and Microsoft can do it (they probably won’t, for totally different reasons), although Apple’s job is a little easier due to the fact that they control the microphone.

        Real natural voice (and, more generally, natural language) recognition is still 3-5 years away. Real replies pulling data from various sources based on natural language queries aren’t ready for prime time either. A breakthrough won’t come from Apple.

      • Michel

        You seem to be completely out of touch with what Siri actually is and focus way too much on the input method (voice). And you are delusional if you think MS or Google could pull this off but just don’t want to for some reason.

      • Anonymous

        No, I’m not talking just about voice, and it’s easy to see if you care to actually read what I wrote.

        And yes, giving human-like replies for a limited scope of topics is easy.

      • Michel

        You obviously have no clue as to what Siri actually is.

      • MD

        You seem to think the database of stock replies is what makes everyone else think the technology is impressive (e.g. references to ICQ/IRC bots above, others to text adventure games).

        It’s not.

      • http://pulse.yahoo.com/_KEVQDCIMSOI6AFXXFFS5ZFFWBA PeterK

        Why don’t you do some research into the DARPA AI project which birthed Siri?

        “has a database of preset human-like replies (which is very easy to do)”

        If you’re right, Android’s Siri copy will ship by November. Let’s read the reviews.

      • Anonymous

        That’s a bit early. Unless Google knew what Apple was planning, that is.

      • Anonymous

        Apple bought Siri 18 months ago, so Google had plenty of notice. Google just introduced Android 4.0 today, and it has nothing resembling Siri.

        Siri is a massive threat to Google: it will become the default search engine on iOS devices, completely bypassing Google as a starting point for search. When Siri redirects a user to Yelp or a location-based result for a business, Apple is getting the search revenue, not Google. This will only increase once Siri adds extra direct data inputs.

      • Anonymous

        A breakthrough won’t come from Apple? Perhaps not, but it may yet come through Apple, if Apple positions itself correctly.

        Apple didn’t invent capacitive touch. They didn’t invent portable MP3 players. They didn’t invent the mouse. Yet Apple was the first firm to successfully market these technologies.

        Voice control may yet prove the same. Obviously it’s been used for years to automate directory enquiry calls, as a crude control mechanism for phone-based menu systems and as a dictation system – mostly used by people with RSI.

        It has even been on existing iOS devices and Android devices, but it has never captured the public’s imagination until now.

        This is without a doubt the most consumer excitement that we’ve ever seen about voice control.

      • Anonymous

        This may well be true, I don’t disagree. Don’t mistake me for an Apple hater.

        It’s just not there yet. Apple may succeed in popularizing this, sure, and competition may again oversleep. But they can react effectively if they choose to; that’s my point.

      • Anonymous

        Ok, I see where you’re going here. So the question is, what can Apple do with this tech that Google can’t or won’t? It obviously won’t come down to technical capability, it will be due to business model.

        Apple has control over the mic hardware as you already said, and they can ensure that noise cancellation is present and that mic response is a known quantity. They also have control over the SoC, and could conceivably add special silicon to perform some of the processing on the fly.

        Apple’s revenue stream doesn’t require them to data-mine your Siri-related data for keywords to profile you, or to serve you Siri-related adverts (and speech ads would be FAR more annoying than current in-app adverts).

        Finally there is the long term possibility of a voice controlled iPhone-nano. Google would simply not want to make their voice control that good, because an android phone that isn’t being used to serve in-app adverts, or to spend a lot of time searching the web, isn’t a profitable prospect for them.

      • Anonymous

        I think it’s going to take a while before processing can be done on your device.

        Otherwise I agree, the difference in business model, modus operandi and overall approach to product development will hold back both Google and Microsoft.

      • http://pulse.yahoo.com/_KEVQDCIMSOI6AFXXFFS5ZFFWBA PeterK

        It’s done – the 4S includes two best-in-class GPUs which are TODAY doing the device-based voice processing.

      • Anonymous

        Yeah, but the magic is happening in the cloud.

      • Anonymous

        I’ve seen a lot of excitement for FaceTime, Apple TV, and, unbelievably, even Google TV. Let’s see if it persists.

      • Gregg Thurman

        If natural language recognition is still 3-5 years away, then Apple gave the clock a healthy kick-start by bringing the technology mainstream.

        Breakthroughs in VR and AI don’t have to come from Apple to be successful or disruptive. They require developers addressing a market. Before Siri there was very little awareness outside the tech/geek world of what VR could do. That has changed.

        Consumer demand for improvements to a potentially very useful tool will now drive future development.

      • David V.

        “About the fact that it’s a bunch of circus tricks that have been available for years now. ”

        Clearly, most of “the trick” was previously available via the Siri app. However, Apple acquiring and integrating it with the system moved it from “circus trick” (i.e., impressive to watch but not practical because you’d have to navigate to the Siri app before being able to do anything) to “useful but limited feature”. It’s useful because there are now a limited set of tasks where a Siri command is more efficient than a touch sequence command. So far, for me, the tasks where Siri is preferred are (a) setting an alarm or a timer, (b) recording a reminder, (c) calling a phone number and (d) playing a song. I rarely send text messages, but I imagine that that task is now also more efficient for some (or under some circumstances).

        “There’s zero technological lead Apple has over anybody.”

        I think you underestimate Siri’s lead in identifying and tracking semantic context. It is their major contribution (and I assume the part that Apple acquired Siri for).

      • Anonymous

        I think you seriously underestimate what Google and Microsoft have in terms of identifying semantic context.

      • Anonymous

        Fair enough. Point me to the consumer grade product that I can buy that does similar things.

      • Anonymous

        There’s none that I know of. Siri is king (queen?) ATM. No doubt about it.

      • Anonymous

        Then, with all due respect, who cares what MS and Google have. Especially Google, whose track record of “outside of the box” stuff is not good. Wave was as pure an R&D tech as you can get, and it bombed big time.

        Remember Courier? Yikes! I posted something further down with my thoughts about Siri; take a look and tell me what you think.

      • Anonymous

        Ah, but didn’t Apple just show the way? Google certainly knows a thing or two about how to use that.

        Courier was a commercial uncertainty; it wasn’t clear whether it would succeed or fail.

        Siri, I think, will show that this kind of tech can be popular, and how it should be packaged.

      • Anonymous

        [Getting tight here!] You would think that Apple has shown the way, but look at Google tablets, for example. Tremendously bad UI. I agree that Apple shows how to package, but as demonstrated by the mobile ‘Spec Wars’ the competition doesn’t “listen”. Mixing metaphors here, but you get it.

      • Anonymous

        Here again, tablets are made by a committee. Nobody’s in charge. An industry as a whole needs much more time to get things right.

      • Anonymous

        What? Apple got the tablet right 2 years ago. The industry has proven it can’t copy it because they can’t match the software/hardware combination Apple has developed.

        Siri is in the same position. Google can try and match the software, but with 50+ Android handset launches every quarter with differing hardware specs, it’s going to be impossible for them to have a successful competing product (it will work great on some phones, OK on others, and won’t work at all well on others).

        If you don’t build both the software & hardware, in today’s environment you can’t hope to compete quality wise.

      • David V.

        Possibly.

        I’ve used what Google has made available on Android handsets (or at least, what was available on a Nexus S a few months ago): That wasn’t competitive with Siri (it didn’t really track context at all), but perhaps something newer or some other product does something competitive? I’ve also had Google’s iOS search app installed on my phone, but it doesn’t track context either and it’s really just voice transcription for a search query (and unlike on the Nexus it isn’t a system service, and hence of very limited use).

        I haven’t used Microsoft’s offerings in this area and so I may be missing something significant there (but then I wonder why no demos have attracted my attention).

        Any suggested pointers where I can learn more about Google’s and/or Microsoft’s state of the art?

      • Anonymous

        There are two key components of Siri: voice recognition and human-like replies to queries. All this is brilliantly engineered and built into the OS in more contexts than any competitor has managed so far. There you have the best implementation available.

        That being said, both key components are available (and Apple isn’t even using its own tech), and, while neither Google nor MS have showcased their natural language interaction products yet, they have massive amounts of data and a lot of tech they’re working on in this area. Their problem with presenting a product now is ambition; they’ve been aiming at much more comprehensive solutions than what Siri does.

        It IS a problem for Google and Microsoft, and they may never get there because… well, because they’re Google and Microsoft.

        What I’m skeptical about is this perceived Apple technology lead based on the fact that they presented Siri now, and the widespread notion that it’ll be hard for competition to catch up. It may be, but for organizational reasons mostly, I think.

      • Anonymous

        What exactly is your argument?
        Let’s review the tape.
        Apple created iPod — first as a device (with iTunes) then as ecosystem.
        There was no part of this that was TECHNICALLY hard to reproduce — but it’s what, ten years on, and no-one has done so STILL.

        Apple created iPhone. Again nothing that was TECHNICALLY hard to reproduce, but we (as far as I and the iPhone-buying public are concerned) don’t have a serious competitor yet, 4+ years on. Likewise for iPad.

        Of course someone COULD reproduce Siri in theory. The question is: why is Siri the point at which the competition can get their act together and overcome their myriad internal dysfunctions?
        Your argument seems to be “well it’s ONLY technology, and therefore it’s easy to clone”. However how about this as a highly likely scenario: a project manager at MS/Google/RIM insists “we’re not going to copy Siri, we’re going to do FAR better”. And so rather than sticking to a limited set of tasks, commensurate with current skills, the competitor is expected to do, and is advertised as doing, far more. But doing far more, of course, means much more scope for error — the precise reason Apple limited what it can do. Net result — Siri does a few things and delights, the competition claims to do everything and does most of it badly.

      • Anonymous

        While I generally agree with you and have stated here more than once that I’m rather skeptical about them, there’s a significant difference here.

        iPod and iPhone are devices. For companies like Microsoft or Google, making those requires coordination throughout the industry, striking deals, reaching compromises, etc., which takes a tremendous amount of time and cross-adjustment.

        The iTunes Store was a service, but it too required a lot of negotiations with labels, which took years to complete.

        When we talk about Siri-like stuff, this isn’t needed. Both Google and Microsoft can do it, the barriers are for the most part internal (which doesn’t mean they are easy to overcome, of course). But it’s doable.

      • Anonymous

        And Google search is just an algorithm.

      • http://pulse.yahoo.com/_KEVQDCIMSOI6AFXXFFS5ZFFWBA PeterK

        “Apple isn’t even using its own tech”

        It’s called Siri AND it has 100s of patents. (Hi Google.)

      • http://www.asymco.com Horace Dediu

        I did not address competitive response in my post so I don’t know if your skepticism is toward this post or comments or some other materials.

        I don’t think AI is a unique technology by any means. It’s certainly something many companies can acquire. I also don’t think that Apple will use Siri to disrupt ‘the usual suspects’. The most likely victims of Siri are probably still unknown and they themselves have no idea of what fate awaits them.

      • David V.

        “There are two key components of Siri: voice recognition and human-like replies to queries.”

        I think of it as more (interacting) components at the top level: (1) Voice transcription, (2) Context modeling, (3) Query handling, and (4) Response delivery. (It can be broken down in other ways, of course, including your way, but I feel this better delineates areas for progress.) This is mostly a repeating sequence, but not completely. E.g., Siri appears to be using what it’s got so far for (2) to direct (1), and (4) is sometimes meshed with (3).

        Siri’s particular strength (on top of the value added by Nuance for (1), various information sources for (3), and Apple’s own voices for (4)) appears to be (2): In a limited set of domains, it binds context-specific terms fairly reliably to intended designations (making use of location, calendar, address book, previous interactions, and, very rarely, query results). From things Siri employees (Tom Gruber in particular) presented prior to Apple’s acquisition, I _think_ this is nontrivial technology that is not that straightforward to replicate (I could be wrong–I work in a different corner of computer science).

        In the area of query handling (component (3) above), I think much near-term progress is held up not by technology but by legal/security and business constraints. For example, Siri clearly knows about “airplane flights”, but any queries on that topic are either turned down or turned into generic web search queries. I wouldn’t be surprised if Apple is in the process of negotiating for use of flight (and perhaps reservation) information via Siri. Similarly, if Siri were allowed to receive Wolfram Alpha results in a form that’s more structured than just images, its responses would be more useful (it could read them back for starters, but possibly also chain the result into a new query).

        What is regularly most jarring to me when I use Siri are the small shortcomings of the response delivery (component (4) above). I’d expect that to be the most mature/least challenging part, yet even when Siri completely “gets” an interaction in every way and the on-display response is right on, the narrated response is occasionally terribly mispronounced.

        So each of these components still has considerable room to evolve.
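
        To make that decomposition concrete, here is a minimal sketch of the four-stage loop in Python. It is purely illustrative: every class, function and return value below is hypothetical, a sketch of the structure described above rather than anything resembling Apple’s implementation.

            # Illustrative sketch of the four-stage loop: (1) transcription,
            # (2) context modeling, (3) query handling, (4) response delivery.
            # All names are hypothetical; this is not Apple's implementation.
            from dataclasses import dataclass, field

            @dataclass
            class Context:
                """(2) Context model: location, calendar, prior turns, etc."""
                history: list = field(default_factory=list)

                def update(self, utterance, result=None):
                    self.history.append((utterance, result))

            def transcribe(audio, context):
                """(1) Voice transcription, possibly biased by the context so far."""
                return "wake me up at 7 am"  # placeholder for an ASR result

            def handle_query(text, context):
                """(3) Query handling: bind terms to an intent and data sources."""
                return {"intent": "set_alarm", "time": "07:00"}

            def respond(result, context):
                """(4) Response delivery: on-screen text plus narration."""
                return "OK, I set an alarm for " + result["time"] + "."

            def assistant_turn(audio, context):
                text = transcribe(audio, context)
                result = handle_query(text, context)
                context.update(text, result)  # (2) feeds later (1) and (3)
                return respond(result, context)

            ctx = Context()
            print(assistant_turn(b"<audio bytes>", ctx))

        The only point of the sketch is the data flow: the context model sits between the other three stages and is updated every turn, which matches the “mostly repeating sequence” described above.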

      • Anonymous

        Sourcing of data is exactly where Microsoft and, to a much greater degree, Google will have an edge. There are millions upon millions of sources provided to them voluntarily.

      • David V.

        True, but on the flip side, the best-quality sources are likely to resist being “mined for free” and Apple’s culture of deal-making may be an asset there (e.g., see the use of Yelp by Apple, vs. the Google-Yelp conflict).

      • http://pulse.yahoo.com/_KEVQDCIMSOI6AFXXFFS5ZFFWBA PeterK

        Shipping in Android or Win Mobile?

        See 2014.

      • Anonymous

        [jumping up here, it's getting tight down there]
        “Here again, tablets are made by a committee. Nobody’s in charge. An industry as a whole needs much more time to get things right.”

        Same deal with the PC industry. Look at what Apple is doing compared to the industry. They’ve had decades to get it right. Instead, their approach of loading in every feature, no matter how poorly implemented, and dropping the price is showing long-term problems.

        Ultimately, the problem is that unless there is ad revenue to be had, Google, I think, won’t be a leader in this space. Ditto with MS, as long as they insist on tying everything to Windows.

        So, we are back to my original statement. Until Google or MS has something to ship, who cares what they have inhouse? And until that shipping product succeeds, who cares if it ships?

      • Anonymous

        This discussion is getting increasingly confusing, with threads and subthreads.

        Who cares? I don’t know; I do, it’s an interesting topic. Is that a good enough reason?

        My comment was about the technological lead Apple supposedly has, not Siri in general. So it does matter what technology Apple’s competitors have.

        Re industry vs. single player, there’s an ocean of difference. To make a good PC, lots of things have to happen – the OEM has to design one properly, source good components, ensure the OS works well, make drivers, etc. Lots of actors involved. A Siri-like feature in Android or WP needs Google’s or Microsoft’s internal efforts for the most part. So it CAN be much faster.

      • poke

      It’s not about the underlying technology. It’s entirely, completely about the fact that Apple is in a position to make the current technology USEFUL because it has an integrated platform. Apple’s edge here is that it knows exactly what apps will be available on the device and which 3rd party services it can connect to. It’s the basis for development because they can keep adding services and apps that Siri can connect to, and they can update the back end with whatever new relevant technology comes down the pipe. Importantly, these technologies are based on machine learning, and having millions of active customers means Apple will have access to a much larger dataset on which to base improvements.

      • Anonymous

        You talk like adding those services is a cakewalk. It’s not. It’s incredibly difficult to improve those things, and the better you become, the more expensive and difficult it gets to add 1% accuracy/relevance/etc.

        Apple’s competitors are coming to the solution from this back end, lacking any sort of compelling front-end solutions. They have search engines that give them more information about what kinds of searches people perform and what hits and misses occur, they have tons of data sources to tap into, etc., and they have immensely more users whose behavior they analyze each day. It’s all meaningless without a human-facing interface, of course, so here’s where they are way behind.

        Apple did a front end with not so much of their own stuff to back it up. They’ll be fairly successful with it, but the next step will be difficult.

      • http://pulse.yahoo.com/_KEVQDCIMSOI6AFXXFFS5ZFFWBA PeterK

        Apple has 1,000 Siri-focused software engineers cranking out code. The NC data center is all about Siri.

      • David V.

        I very much doubt it. Apple has claimed that the NC data center was to support iCloud, and I believe it. It’s possible that it’s also where the Siri back end lives, but I don’t think it occupies so much of the hardware there as to justify the claim that “the NC data center is all about Siri”.

        Similarly, it’s not in Apple’s culture to throw large numbers of developers at a problem: I doubt there are even 100 Siri-focused engineers (the real number is probably closer to 10).

        I agree with vangrieg that significant progress is not easy. E.g., I’ve seen much speculation about “opening up Siri for third-party developers”, but that’s actually quite tricky to do without over-constraining the technology (e.g., how does one create a stable criterion for routing command/query categories without freezing the internal architecture in a way that will hamper future evolution?).

        That said, in support of poke, Siri did claim (before being acquired by Apple) that their architecture made it relatively straightforward to connect to additional data sources. The tricky bit, I think, is getting the query context and semantics right. Furthermore, the more domains are added, the more opportunity arises for ambiguity.

      • Anonymous

        Siri is already learning from 4 million users gained over the weekend – it will likely have over 25 million users by the end of the quarter.

        That’s more than enough user feedback to vastly improve results.

        In comparison, Google search requests are hardly ever made in natural language; instead they are keyword-based.

    • Anonymous

      Horace seems pretty calm to me.

    • Anonymous

      It appears this article is full of careful thinking. Perhaps it is your limited comment that is lacking? Siri is quite masterful at effective tasks like reminding me to do something somewhere or at some point, or telling me the weather, traffic, searches, conversions, maths, science, setting alarms and timers, organising around my schedule, etc. And without me “training it” or speaking slowly. I’m seeing about a 9/10 hit rate of accuracy, which is crazy impressive considering my heavy Australian accent.

      • Anonymous

        Maybe it’s my comment that’s lacking, of course, but I’m not the one to judge.

        9/10 isn’t good enough, by the way, as anyone who tried to push recognition will attest.

      • Anonymous

        9/10 is extremely impressive, and good enough for even the Mums and Dads to keep using it and enjoying it. iOS Voice Control and Android voice recognition are about 4-5/10. Have you even tried Siri? I bet you haven’t.

      • Anonymous

        9/10 is good enough to impress but not good enough to rely on, especially in a car via hands-free. Android’s VR is crap, but Smart Keyboard uses Nuance (which, I think, is the same back end as Siri), and it’s just as good (or bad, depending on your perspective).

        More importantly, though, I’m not trying to diminish Siri as a feature. I’m just pointing out that it’s not the breakthrough tech many people rush to think it is. It’s a wonderfully engineered combo of existing tricks. Not a “development” foundation, and not even owned by Apple for the most part.

      • Anonymous

        Perhaps it is you with the unreasonable expectations?

        What else is software development other than a wonderfully engineered combo of existing tricks?

        Consider Pixar. I don’t think there was any single dramatic breakthrough that allowed for the creation of CGI movies, or if there was, it happened many years before Toy Story. It was a slow accumulation of features and capabilities.

      • Anonymous

        Sure, but sometimes you need new tricks.

      • Anonymous

        Sometimes you need a product that actually allows the customer to use a feature that works as stated, and does what it promises extremely well, creating joy and an emotional connection with the user who is keen to use said feature and find use for it. The 4S and Siri are all of these things.

      • Anonymous

        4S is a wonderful phone, and, unlike many others, I said from the beginning that it would sell very well. Siri is a delighter that’ll help its sales, too.

        Also, I’m sure that expectations for working voice control will rise from now on, and that there will be pressure on other platforms to implement Siri-like stuff.

        All because of what you said, so you won’t find disagreement here.

        All I’m saying is that Apple has a long way to go to make it a truly mainstream thing, and a seriously usable way of interacting with their phone.

        They call it a beta for a reason.

      • Anonymous

        It may be “beta”, but it works fantastically, and is already a feature I use numerous times in the day. I love telling it to set my alarms instantly while I watch TV :)

      • http://twitter.com/adriancjr Adrian Constantin

        9/10 is good enough for me. My typing precision is much worse, probably also due to a mild dyslexia, and typing on a virtual keyboard is not a pleasant experience. I am not a good customer sample, just highlighting that 9/10 precision is good enough for some cases.

      • Anonymous

        9/10 while typing is different because with Siri you don’t have a quick and easy way to correct things – you need to wait until it recognizes what you said. The bar for hands-free operation is higher.

        Recall the handwriting recognition efforts. Accuracy went up to 98-something percent, and it still wasn’t enough, so the idea was (at least temporarily) abandoned on tablets.

        For some people this will be enough (heck, some people even seem to be happy using VR on Android/WP). But there’s a long way for Siri to go to become usable for the mainstream beyond toying with it for a week or two.

        It’s a technology showcase at this point. Way too early to talk about disrupting anything, in my opinion.

      • David V.

        9/10 is good enough in conversational modes, which is how Siri operates. In a typical exchange with a human, 10% miscommunication is not unusual, but the conversational form permits feedback and correction. E.g., two days ago I asked Siri “Wake me up at 8am” when I should have asked 7am; when it told me it had set an alarm for 8am, I said “Change it to 7am”, but I mumbled a bit, so it told me it changed it to “7pm”. I repeated “Change it to 7am” and it fixed it to perfection. So the conversational mode allowed us to compensate both for my error and for Siri’s misunderstanding. That’s how humans operate, and that’s an important part of what makes Siri practical.
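
        A rough back-of-the-envelope way to see why this matters (my own arithmetic, under the simplifying assumption that each turn succeeds independently at the 9/10 rate quoted above): if a single turn succeeds with probability p, the chance of getting the right outcome within k turns is

            P_k = 1 - (1 - p)^k,   so with p = 0.9 and k = 2:   P_2 = 1 - 0.1^2 = 0.99.

        One round of correction turns a 9/10 interaction into roughly a 99/100 one, which is why the mode of interaction matters at least as much as raw recognition accuracy.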

      • Anonymous

        Your doubts sound exactly like the people who said a hard-drive-based MP3 player wasn’t a new kind of device. Or that a touch screen phone wasn’t a new invention. Or that buying music online wasn’t anything new.

        The point is that Apple has an awesome proven history of taking little-used new technology and successfully mainstreaming it.

        Henry Ford didn’t invent the car.

      • Anonymous

        Yes, I may be wrong. But not all of Apple’s endeavors were successful in becoming true mainstream things. Apple TV wasn’t, and I was always right about it.

      • Anonymous

        ATV was never intended to be. It doesn’t even have a category to itself on the Apple website.

        ATV has always been a toe in the water – Siri is clearly a full plunge.

      • Anonymous

        The current $99 iteration of Apple TV is selling millions of units a year – more than all its competitors combined. And its software seems to offer more with each iOS update – I expect it to sell very well this Christmas as an iOS device accessory. However, it’s not guaranteed of course, and I will be willing to eat humble pie if it’s not evident by the January earnings report.

      • publiclee

        I’m waiting patiently for a couple of these phones to drop out of the Telstra truck so that I can try the ‘toy Turing tests’, but I know this for certain – of all the people who interact with computers of all sorts, it’s clear that the majority of interfaces cater to those with good visual skills.

        My wife is a very vocal member of that minority for whom reading and typing – and yes, time management – are a form of torture.

        She is dying to have a phone that responds intelligently to her voice. Maybe Siri is only baby steps in that direction – but then it will make her day – and my life, much easier.

      • Anonymous

        > 9/10

        Your criticism lacks technical foundation.

        What is actually needed on blogs is more technical foundation.

        Nuance already gets 99/100 words right just by listening. Siri guesses the rest by “thinking,” same as humans. It is the same with TCP and IP. IP loses a ton of stuff and on its own did not work; it was too dumb. TCP finds what is missing and puts it back together. Even a digital cable loses bits, even a hard drive loses bits; they are guessed through error correction constantly. No tech is any more than 99% accurate. The trick is creating the illusion that it is. Humans also misunderstand each other routinely. (“What are the lyrics to Purple Haze?” gets 50 different answers from 50 listeners.)

        And right in this post, Horace points out that although Siri is not 100% accurate, it is more accurate than phone typing. That is counter-hyperbole.

    • http://kaizenity.blogspot.com/ FalKirk

      “I really think it’s about time for tech blogs…to calm down a bit from the initial excitement about Siri and start thinking a little”-vangrieg

      Vangrieg, I respect what you’re saying. There is a tremendous amount of hyperbole surrounding Siri. Much of it is speculative. Some of it is based on no foundation at all. However, it seems to me that Horace’s discussion is the exact opposite – a thoughtful attempt to examine the disruptive potential of Siri.

    • Anonymous

      At first I thought you were being sarcastic, then I realized that you really did mean that Horace should calm down and think a little.

      Sheesh, I can’t imagine a calmer post. As a parent of three, I find the analogy perfect. You have to think long term, because what you see right now is just a fleeting glimpse of what they will become. Apple is well-positioned to be a parent.

      Also, you are conflating the voice recognition technology (Nuance) with the AI technology (Siri) far too much. Apple has some sort of deal with Nuance, which makes sense because they are tops in voice recognition tech, but they own Siri, and Siri is not just a bunch of parlor tricks. It will be hard for others to start from scratch. It’s what you do with the voice input that matters, not just the collection of the voice input.

      • Anonymous

        I chose very poor wording and seemed more provocative than I intended to be.

        What I meant was rather “we need some more than initial reaction type stuff” about Siri at this point.

        This is an important topic, and it may change our lives at some point, so it deserves better than Siri talk.

        Competition doesn’t need to start from scratch technically. What they need is to rethink the readiness of their solutions. This may take even longer though. :-)

      • Anonymous

        I’m curious. When you say someone needs to calm down and start thinking a little, how much less provocative did you intend to be?

        You start with a statement suggesting that Horace was out of control, then end with the suggestion that he’s not thinking enough, at least for you.

        Your post was hostile, it seems to me.

    • Anonymous

      From reading your posts, you don’t understand the product. That’s pretty evident. You are focusing on a single aspect, the humor part, and missing the other 95%.

      You, perhaps, need to start thinking A LOT.

  • Michel

    Funny side effect of Siri: I keep imagining myself saying things to Siri (remind me of this, call that restaurant, message my wife), and I don’t even have an iPhone (of any kind).

    • Anonymous

      They got you. Ask Siri in your head to show you the way to the Apple Store.

  • http://twitter.com/formasymphonic formasymphonic

    Many have been quick to overlook Siri, unable to perceive its long-term potential. Existing voice solutions from competitors are often cited to support that downplaying. They are looking at it on a feature basis.

    Apple never looks at things like that though – only from a view of how something can add holistic value (immediate or extended) to their platform product.

    The hat tip to Siri being an extended value addition is that they consider it a “tentpole feature” for iOS – even to the point that they are releasing it and referring to it as a beta. I don’t recall them ever releasing software they themselves openly refer to as beta to anyone other than developers.

    As is often the case, few will truly grasp what Apple’s moves will mean for the industry as a whole – but I predict that in time Siri & iCloud will be as important to Apple’s overall ecosystem as iTunes was to the iPod and the App Store was to the iPhone.

    • http://www.linkedin.com/home?trk=hb_tab_home_top Matthew Gunson

      Totally agree.

      The potential for disruption with Siri + iCloud is astounding. Think of all the tasks that could be opened up to almost everyone that were once only performed by “experts,” highly trained professionals and technicians.

      Last week I spent the better part of my week working with a data viz platform that claims to be “simple and fast.” Well, it turned out that to do any sort of analysis that was even one layer below the surface, I had to perform some rather complex programming. I am not a programmer. So here I am trying to learn how to command this application through what is a foreign language to most people, to do a moderately complex analysis and visualization. It took me the better part of the week. Imagine if I could just tell the computer in plain English which data table I wanted to work with and what question I wanted an answer to. That level of sophistication, when it arrives, will totally disrupt many markets.

      Siri is far away from being able to do something like this, but that is its potential. Could you imagine, in five or six years, saying, “Connect to the database and show me how many first-time customers we gained each quarter, and how many existing customers we lost?” and then in a matter of seconds (maybe minutes, depending on the size of the data set) having a visualization of the answer. You could then tell it, “Show me this graph by day rather than quarter and then send it to the marketing director.”

      Also, think of all the tasks that could then be moved from the desktop to a mobile device, and it’s easy to see even more clearly how mobile is the future, how laptops will soon become doorstops, and perhaps desktops will follow as well.

    • Anonymous

      They have done public betas a surprising number of times with large-scale products that require a lot of user feedback and learning and a lot of developer participation before they can succeed:

      • Mac OS X Public Beta in 2000
      • iTunes+iPod when it was Mac-only for the first 2 years or so
      • Safari Public Beta in 2003
      • original iPhone in the US only
      • AppleTV, the “hobby” (beta)
      • Mac App Store before Lion.

      Siri needs to adapt better on one side to users based on their input, and on the other side, adapt to more services built by developers. It can’t just appear fully formed.

      Here we are discussing Siri, and so are many other users. Even discussing the sociological implications of AI phones. Creating new customs. That had to get underway. And every time Siri boots you out to a Google Search, that is an opportunity for an existing or yet to be founded startup. Many developers are mobilizing right now around Siri like they mobilized around the other products I mentioned above.

      So yes, it speaks to the scale and ambitions of Siri that it had to be beta. These are all giant products.

      Also, all were panned at introduction by the savants of the status quo.

    • Jomy Muttathil

      As Siri matures it will take on bigger challenges.
      One area where I think Siri could be revolutionary is turn-by-turn directions.
      It is nice having the computer tell you where to turn, but it would be a whole lot better if you could talk to it and give it commands.
      This would allow iOS to leapfrog Android in this area.

  • Alan

    I believe that with a few changes Siri could power a phone with a minimal (or no) screen. Imagine something the size of an iPod Nano (the older ‘tall’ one) or shuffle – you speak into it for all the current Siri functions and it responds in voice. Phone calls, texting, Wolfram Alpha, Yelp, appointments, email. Yes, it would need GPS, 3G (or better), and wouldn’t be able to do things like Facebook (as it stands now), but it could be a compelling product built at lower cost that would excel at a subset of features.

    It reminds me of iPods – initially most mp3 players were flash based but held only dozens of songs. Then the iPod came out and held 1000, then 2000 and so on. The flash players dried up. Then, suddenly, Apple comes out with the Nano and flash is back and the competition is decimated. I’m looking at this as a disruptor to current voice only phones a few years down the road.

    • Anonymous

      That is actually how phones used to work at the very beginning. The only interface was microphone and speaker. You said, “Operator?” and a voice answered you and you could interact with it like a human because it was a human. You could say “get me the police” instead of dialing an emergency number like 911 or 999.

  • Leon Hurst

    Comparing Siri to the introduction of capacitive touch screens is fraught with risk. Touch technology literally bounded over a chasm, going from years of niche solutions to mainstream success with the introduction of the iPhone in 2007. The iPhone interface was very well tuned and comprehensive in its scope from birth – Apple wholesale replaced the traditional handheld HCI model.

    I wait to try Siri first hand, but suspect today it may still be on the wrong side of the chasm. I trust voice support is a much more complex problem to crack than a touch HCI; it requires a lot of the AI IBM created 25 years ago… which requires teaching… which requires the 200M users (something IBM could never muster thanks to Windows’ success over OS/2).

    So it seems the cost to move Siri past the chasm, in terms of time and collective investment, will be high. What will mankind get in return?
    Intelligent answers to complex questions?
    True multitasking between the human and the machine?
    An emotionally richer way to work with computers?

    Today, I would bet on the true multitasking experience, e.g. booking meetings while crunching some numbers in Excel. Liberating human–machine interaction from the long-serving pointing device + keyboard could be a big revolution.

    • Anonymous

      Siri *is* AI. It is *not* voice recognition. The voice recognition in iPhone 4S is Nuance. Nuance is basically Siri’s older sibling: both trace their technology back to SRI. Siri was spun out of SRI and is now part of Apple, while Nuance was spun off years earlier, which is why Apple licenses Nuance.

      It’s actually not that hard anymore for a computer to recognize what you said with 99% accuracy. What is hard is a) the other 1%, which essentially always has to be guessed, and b) figuring out what you actually meant, pulling the meaning out of the words. The hard parts require AI.

      So we see Apple ignoring Nuance for many years while some users pleaded for it, and then suddenly licensing Nuance after they bought Siri. That is because Nuance is just ears. Siri is a brain.

      It doesn’t matter where Siri is on some academic AI yardstick. Siri is a product now, with 4 million users already. What matters is: is it useful? It only has to save each user a few minutes to save a whole human lifetime. The phone already had 2 microphones and sometimes a 3rd over Bluetooth. iPhone had to start listening to the user at some point. It does not have to stop having a touchscreen to be a success.

  • Anonymous

    A patient sponsor who makes a stable living

    Isn’t a key requirement for disruption that the disruptor be profitable from the start? One problem for Siri is that as yet it has no revenue stream; in fact it has a negative revenue stream – and the more it is used, the more it will cost to provide the back-end services.

    There are a few obvious ways that Apple could monetize it, beyond simply subsidizing it out of phone prices.

    1. A small App purchase charge or yearly fee, similar to music-match – problem, once a service is free it’s hard to charge for it, and we can be sure that Google will be producing a free competitor at some point. Charging for Siri would drastically reduce take-up.
    2. A data or advertising supported product which is free to users – problem, this isn’t in Apple’s DNA, it doesn’t have a good track record of turning consumers into products for others.
    3. Operator kickbacks from the increased data use that Siri should produce – problem, hard to negotiate.
    4. Creation of an iPhone-nano which uses Siri as the primary UI – problems are mostly engineering related, though there is a big question how big such a market would be.

    I personally think Apple should take option 4, because I think it fits their philosophy best.

    • http://kaizenity.blogspot.com/ FalKirk

      “Creation of an iPhone-nano which uses Siri as the primary UI”-EduardoPellegrino

      I think you’re missing the point of the article. ViaVoice was trying to replace touch typing with voice dictation. Not only wasn’t voice dictation up to the task, but it was also trying to replace something that people were already supremely competent at.

      Siri is only tackling those tasks that are done poorly via touch. It’s easier to tell the phone to schedule an appointment than it is to 1) Find the clock App; 2) Open the clock App; 3) Hit the “plus” icon; 4) Change the hour, minute and am/pm sliders; 5) Click the save button.

      If Siri were to try to replace ALL the touch functions associated with a phone, it would go from being great at what it does to being mediocre and really bad at a whole bunch of things that touch already does better.

      • Chupatribra

        That’s a very good observation: redefining “task” from the granular touch-based interactions to potentially more complex groups or sequences of tasks. Thanks for pointing that out.

      • Alan

        I wouldn’t consider it a replacement for the “smartphone” but rather a replacement for a “dumbphone” and “featurephone”. Doing a smaller subset of things well enough with some significant advantage in cost, size, or usage model.

      • Anonymous

        Exactly, or a bridge device between the two. As small as a feature phone but with many of the capabilities of a smartphone.

    • Anonymous

      Using your logic, the A5 also does not have a revenue stream and should be sold in a box by itself for $n.

      Siri is a component of iPhone 4S, the fastest-selling consumer product ever, and the most profitable phone by far. That is Siri winning the jackpot. She is lying on a bed of virtual cash.

      • Anonymous

        You’re misunderstanding me. The claim that Horace is making is that Siri is disruptive, which nobody is saying the A5 is.

        Consider an old technology like Front Row. The purpose of Front Row was to sell more Macs, but it clearly wasn’t disruptive, and eventually Apple even removed it because it took more effort to support than it delivered in benefits to Apple.

        Instead, Apple’s solution is that you should buy an ATV instead, which does generate them an income stream – and does have the potential to be a disruptive technology, at some point in the future.

    • Anonymous

      “Isn’t a key requirement for disruption that the disruptor be profitable from the start? One problem for Siri is that as yet it has no revenue stream, in fact it has a negative revenue stream – and the more it is used the more it will cost to provide the back end services.”

      Siri pays for itself by selling phones (and eventually other devices). But you’re right that normally Apple makes its services support themselves financially. Perhaps Siri will eventually make money by improving/enabling other paid cloud services that it interfaces to (even 3rd party ones).

  • http://www.informationworkshop.org Mark Hernandez

    In Apple’s keynotes, Steve has noted in retrospect the huge changes associated with new input methods beyond the keyboard – the mouse (Mac), the click wheel (iPod), touch (iPhone/iPad). It’s clear to many that voice may well be the next input method with huge changes associated with it.

    And looking at these input methods we can see a progression from discrete and deterministic to fuzzy. Even with touch, while it has a few basic gestures considered “required” (tap, drag, pinch), there are many other gestures possible, especially on trackpads, and it’s moving into fuzzy territory. On iOS 5 you can even create custom gestures that are proxies for others (under Accessibility/AssistiveTouch).

    Now with voice things become even fuzzier still, with the attendant power and possibilities.

    As long as Siri remains a little mysterious and fascinating and maintains a critical level of participation, its slow march to becoming a disruptive factor will continue.

    The Mac wasn’t the first computer with a mouse, the iPod not the first music player with sliding controls, and the iPhone not the first device you could control with gestures, and Siri is not the first voice recognition layer. But when you bring something to the masses and there’s a huge level of participation which takes it “from the lab to the people” well…

    One of the many intangibles involved in Apple’s success is Apple’s ability to say little or nothing itself yet still somehow keep everyone talking about Apple constantly. (I’ve cataloged over 150 Apple tech/news sites and blogs.) Siri will be no exception. Within 24 hours of the iPhone 4S launch there were sites devoted to “things Siri says.”

    (shaking my head) Brilliant, remarkable! Taking something that already existed for a long time and giving it the Apple treatment to give it the *appearance* of a discrete “invention” to the general population. Amazing.

    It has already become fascinating to watch everyone fall over each other trying to get a grip on Siri and figure out what it means, and most will fail in doing so, yet they’ll be unwitting participants in maintaining the buzz around it, contributing to its ultimate “success” which, this time around, will be even harder to measure.

    (e.g. See http://www.quora.com/Siri-product/Why-is-Siri-important and all the related questions in the sidebar.)

    • Anonymous

      A really interesting nugget is how Siri’s Easter eggs have become such a big deal. I can’t think of any significant Easter eggs in Apple products for over a decade – yet Siri is not only full of them, but has created huge buzz around them.

      I’m willing to bet that in future point releases of Siri we’ll see new Easter eggs and pop culture references. Apple will encourage us to use Siri as a toy, in order to accustom us to using it as a tool.

      • http://www.informationworkshop.org Mark Hernandez

        So true. But as I understand it, since Siri is in the cloud, its responses can evolve and change continuously. I imagine a team of people running around inside Siri’s brain keepin’ it fresh daily.

    • Anonymous

      It is not just bringing it to the masses, it’s making it work. Actually work in the real world. Invention is step 1, designing product solutions based on that invention is step 2, advancing the state of the art with a successful solution is step 3.

      The electric car was invented in the 1800s. Since then, we poisoned the planet and gave little girls asthma with the internal combustion car. So invention by itself is pointless. It’s just the beginning. It’s a sketch only.

  • rashomon

    To the questioners above: Moore’s Law! Moore’s Law! Moore’s Law! See Kurzweil re: exponential vs. linear growth in performance, and why we overestimate the near term and underestimate the long term.

    • Anonymous

      Also, Cook’s Law: sell twice as many of everything every year. Siri will have more users and more money going forward as well as more transistors.

    • Visualign

      Moore’s law of constant time intervals (18 months) for doubling (of processing speed) leads to exponential growth.
      Kurzweil’s thesis is that the time intervals for doubling are decreasing rapidly, which leads to hyper-exponential growth.

      Both growth models are much faster than linear growth. However, with hyper-exponential growth, you reach infinity in finite time, i.e. you have a discontinuity, hence Kurzweil’s “Singularity” theory (for the year 2029 last I checked).
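
      One way to see the “infinity in finite time” claim, as a sketch under the assumption that each doubling interval is a fixed fraction r (with 0 < r < 1) of the previous one: the total time for infinitely many doublings is a geometric series,

          \sum_{k=0}^{\infty} T_0 r^k = \frac{T_0}{1 - r} < \infty.

      For example, with T_0 = 18 months and r = 1/2, all of the infinitely many doublings fit inside 36 months. With constant intervals (ordinary exponential growth) each doubling costs a full T_0, so no finite horizon contains them all and there is no singularity.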

  • http://www.informationworkshop.org Mark Hernandez

    And one more thing… Few think about this, but I have no doubt Apple already has it on its roadmap…

    Siri is 100% “reactive.” Just imagine one day when Siri is “proactive” and you get a message out of the blue from Siri saying something like…

    “Clay Christensen has just published an article you might find interesting.” or “The San Diego Early Music Festival season has just been announced and there will be performances of Telemann.”

    • The CW

      Google “Apple Knowledge Navigator” for a video made in 1987 that describes exactly what you’re proposing. Not only did they think about it but they had it on the roadmap almost 25 years ago.

      • http://www.informationworkshop.org Mark Hernandez

        I know, that is pretty amazing! I actually saw that video when it came out in 1987, having designed hardware for Apple products back then, and it made me dream. The August 1981 issue of Byte magazine (which I still own) had the same effect on me; it had 14 articles on Smalltalk’s object-oriented systems and programming.

        Interestingly, in that video the assistant was never “proactive.” He only responded directly to commands and incoming calls. However, he often summarized and reinterpreted.

        So there are three capabilities so far. Being able to 1) respond, 2) summarize and reinterpret, and 3) anticipate. Apple is now popularizing the first one. :-)

      • Alan

        Tidbit courtesy of Andy Baio (http://waxy.org/2011/10/apples_1987_knowledge_navigator_only_one_month_late/): the date used in the KN video was 9/16/2011, just one month off from the Siri demo!

      • Anonymous

        Apple has code to summarize text in OS X; it’s had that code (which works adequately) since at least the days of OS 9. That’s a fairly easy problem.

        I also think people are being exceptionally silly in not thinking about what an “intelligent assistant” can do TODAY.

        Let’s give just two examples:
        - Why not the ability to tell your phone “I don’t want to be disturbed for the next three hours”? Then when you are called, the phone could intercept the call and say, “Joe doesn’t want to be disturbed. Would you like to leave a message, or is this really important?” Likewise for sleeping.

        - Write rage. We’ve all had the experience of writing something (email or otherwise) in anger, sending it off, then feeling ashamed the next day. How about the “assistant” monitoring what we’re writing for emotional cues and then gently suggesting, “What you’ve written appears to have been written in haste. Would you like me to delay sending it for a day, so you can think about whether you’d like to revise it?” (A toy sketch of this idea follows at the end of this comment.)

        The point of both these examples (and Clayton’s and Horace’s point) is that some technologies take the form of “boil the ocean” — everything has to change, all at once, to make them useful, and that’s a really hard bar. But if you can get your technology to achieve something useful while not being perfect, you can refine it over time as it becomes more and more widespread. Of course the internet provides a perfect example of that — we didn’t kickstart the WWW by saying “Well the only thing this is good for is watching HD movies, so unless you’re able to switch from your 33.6kbps modem to an 18Mbps connection, you might as well ignore web pages”.

        What Apple has done is switch the problem from one that is hard and has to be solved all at once (close-to-perfect dictation) to one that is useful today: assistant tasks, which can grow over time, of which there are a huge number, and many of which people don’t even think of as something their computers or phones could help them with.
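
        To make the “write rage” example above concrete, here is a toy sketch in Python. The keyword-based anger check and the one-day delay are purely illustrative assumptions; nothing here reflects how Siri or any shipping assistant actually works.

```python
# Toy sketch: hold back obviously heated messages instead of sending them
# immediately. The "anger" check is deliberately naive (keyword counting);
# a real assistant would need something far more sophisticated.
from datetime import datetime, timedelta

ANGRY_MARKERS = ("furious", "idiot", "unacceptable", "never again")

def looks_heated(text: str) -> bool:
    lowered = text.lower()
    keyword_hits = sum(1 for marker in ANGRY_MARKERS if marker in lowered)
    return keyword_hits >= 2 or text.count("!") >= 3

def schedule_send(text: str, now: datetime) -> datetime:
    """Return the time at which the draft should actually go out."""
    if looks_heated(text):
        # Hold it for a day so the author can reconsider in the morning.
        return now + timedelta(days=1)
    return now

if __name__ == "__main__":
    draft = "This is UNACCEPTABLE!!! You are all idiots and I am furious."
    print(schedule_send(draft, datetime.now()))
```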

    • Anonymous

      We already have that kind of AI, it is called Twitter.

      • GeorgeS

        Twitter is NOT “AI” because every tweet is entered by a person. Try again.

  • The CW

    The other day, I was late meeting a friend at a bar. He got there first and iMessaged me that he was waiting outside. I was in dense traffic, and I try to avoid texting while driving. When traffic cleared I tried to send him a short reply: “12 mins out.” What was sent instead was “1/2 ins put.” It was the first time I thought I needed Siri. Texting while driving is dangerous, and useless with a capacitive keyboard. Siri may save lives. There’s your value proposition.

    • Anonymous

      WP7 does it without making you touch your phone even once.

      • Anonymous

        And?

      • Anonymous

        Sometimes things are simply what they look like, without deeper context or insinuation.

        The suggested value proposition isn’t unique, and it’s not really the killer aspect of Siri, which is much more than a hands-free reply to an incoming SMS. It’s not even the best implementation of this feature.

      • Anonymous

        True. But I didn’t know where you were going, so I asked. ;-) Now, I know.

      • Anonymous

        What are you saying? That WP7 has a form of voice recognition good enough to bang out texts? OK, that’s not surprising — Android has much the same.

        But how does the “not making you touch your phone even once” part work? I suspect you are being deliberately disingenuous here. How is the phone to understand that you are addressing it and not someone nearby? Surely it can’t constantly be on and running recognition software on everything you say (battery life alone would rule that out).

      • Anonymous

        There’s a setting that activates the function when a headset (wired or Bluetooth) or hands-free kit is connected, as far as I remember. You can also enable it on demand.

        So in this mode, when you have an incoming SMS, the phone asks whether you want it read aloud. If you say yes, it does so and then asks if you want to reply. And so on.

      • GeorgeS

        Interesting. My Quadra 660AV could read text to me in 1994. It also did voice recognition.

      • Anonymous

        What is WP7? Is that an iPhone app?

    • Anonymous

      Yes, good use case. But you could also let your friend wait rather than risk killing people with your car. Was he on fire at the time? Did he need breathing instructions?

      The BlackBerry outages show up in traffic accident statistics. It is so bad that the first thing that happens after an accident now is that your phone records are pulled and the times correlated. You’re less likely to kill someone driving drunk than driving while texting.

  • http://joeclark.org/weblogs/ Joe Clark

    “[I]t takes route [sic] in these new applications”?

    • http://twitter.com/bennomatic bennomatic

      You beat me to it. Either way, I’m rooting for Siri!

  • http://www.facebook.com/people/Minsoo-Park/100001265179164 Minsoo Park

    Although Siri is in the beta stage right now, we can expect it to get better and better with more feedback and data from millions of users … I think Apple is using the iPhone 4S as a training platform for Siri so that the iPhone 5 can ship with Siri without the “beta” tag. A year from now, Siri will have advanced so much that competitors will have a hard time catching up… if ever.

    • Just Iain

      Look at it this way. How many updates will iOS 5 have in a year? 3 to 5 minimum!

  • Anonymous

    1. For good background, read Pogue on the really good voice software: http://www.nytimes.com/2010/07/29/technology/personaltech/29pogue.html?pagewanted=all

    2. There are labs right now that are working on delivering the original sound file as an MP3. And a few visionaries have a plan for “Real-Time Waveform Delivery”. An extra security option was planned based on a nationwide network of copper wiring, but that was cancelled due to the exorbitant costs. A certain teacher of the deaf, A. G. Bell, vows to continue development despite a complete lack of VC funding…

    • Anonymous

      I think there is a technical flaw in number 2. You can shoot a video of yourself on an iPhone and email it to anyone in the world, and they can view it on any device because MP4 is universal. Remove the video track and send audio only and your bandwidth goes down by 90%; that is even easier. What do copper wires have to do with it? If it is not TCP/IP, it is already dead.

      ViaVoice also provided the user with the original audio of their dictation. The computer has to record the user. Saving the audio means not clearing a cache.

  • David V.

    For long-form typing, I agree Siri (or any other fully automatic voice-transcription software) isn’t very useful yet. However, for short-form (SMS, tweet, short e-mail) typing, it’s adequate, and one of Siri’s strengths is exactly that it handles voice-based correction quite well for those applications. (See also my reply above, which I wrote before I saw your note here). I don’t think it’s too early to “talk about” disruption, because it’s a significant step forward both in terms of technology and in terms of deployment choice. I would agree, however, that it’s too early to _conclude_ that disruption is occurring here. IMO, it’s more than a showcase because it’s useful and practical for a limited set of circumstances today. We’ll see if Apple (or anyone else) can evolve it to become a primary interaction method in the relatively near future.

    • Anonymous

      I agree.

    • Anonymous

      The goal for voice for some time now has been to “enable the user to accomplish 30% of routine tasks by voice,” and many estimated this would arrive in 2012. Siri will accomplish that for many users. Routine tasks exclude things like authoring a Keynote presentation or playing Angry Birds, but they most certainly include calendar appointments and messaging.

      So Siri doesn’t have to be your primary interface to the entire phone to be a success. Being your primary interface to your calendar and messaging and other routine tasks is already a huge win.

      • airmanchairman

        Even the humble VoiceOver feature of earlier iPhones like the 3GS and the 4 can prove invaluable in certain, mostly mobile, situations: for instance, when you’re scurrying to a train platform and can’t raise your wristwatch or smartphone to eye level on account of numerous heavy bags, asking “what time is it?” via a headset is enough to let you decide whether to increase your pace to a mad dash or relax to a stroll.

  • http://fahrenheit98.wordpress.com/ VrDrew

    Any time tech bloggers whine about how “little” Apple spends on R&D, they ought to think about the implications of Christensen’s talk: IBM spent ~ $700 million on voice recognition research, in the hopes of selling a few thousand boxed copies of a software program that did (badly) something nobody wanted.

    Mobile search is the next big business frontier. It’s why Google is throwing billions into Android. But Google’s search box isn’t the right interface for mobile search. An ideal mobile search interface is roughly 80% sound-based: you ask a question, and the device reads an answer back to you.

    You also need to think about what SORT of search is done by people on mobile devices. You aren’t going to use your smartphone to research the Peloponnesian War or Avogadro’s constant. You are going to ask your phone where the nearest Thai restaurant, gas station, or hardware store is. This is the sort of search that actually is worth something to advertisers. And it’s also precisely the sort of search that Siri seems ideally suited to perform. Wanna bet Apple is already working on a way to monetize that capability?

    • Anonymous

      Apple doesn’t spend much on R&D because it doesn’t do a lot of what is usually considered R&D.

      Apple’s forte is execution, not exploration. This company rarely (if ever) does things others haven’t tried before. It just consistently manages to deliver them in a way which is meaningful to users. Siri is a great example of this.

      So they don’t spend billions searching for the next disruption that may or may not occur in 20 years. They roll up their sleeves and, well, disrupt markets by doing what needs to be done. They try to avoid technologies that don’t work but because they don’t have this absolutist scientific mentality they often don’t wait for technologies to reach the state where they are perfect. Instead, they engineer them into working products. Microsoft Research e.a. have grand visions of things to come. Apple seeks opportunities.

      • Anonymous

        Apple’s R&D:

        • visit PARC
        • buy NeXT
        • buy SoundJam
        • buy FingerWorks
        • buy Siri (spun out of SRI)

        … startup culture.

        Apple only pays for successful research, not all the failed startups that they don’t buy. Why create startups inside Apple when you can’t walk through Palo Alto without tripping on one? You buy the good ones.

        Steve Jobs talked about this in 1997. Funnily enough it was when somebody basically asked him for Siri. He said there is so much headroom just bringing stuff that already exists to the consumer in a way they can use right now that Apple doesn’t need to work on stuff that is 10 years away and may turn out to be 20 years away or never work at all.

      • Anonymous

        Exactly. When I think about Apple, I recall Japanese electronics in the ’80s. They mostly didn’t invent things, but made them beautiful, reliable and well-specced. For the time, of course.

      • Michael Corrado

        When Siri was announced, I also recalled that comment by Steve about AI R&D at the 1997 keynote and am glad you mentioned it. That talk was so much like a blueprint for the future, I wonder if it reveals any clues to what’s next. I think that flash memory changed his attitude about carrying around hard drives, “a lot of state,” because it is so light and easy to carry and didn’t exist at that time. So, why iCloud? It can be used to fill in that “headroom” he talked about, but Time Machine and flash drives fulfill part of the function he described. Having a home directory anywhere – is the North Carolina server farm necessary for such a task? Does Siri really require all of that backend?

        What if intimate personalization in search and web interaction is the goal? And in such a way as to avoid the Supreme Failure in privacy that is Facebook, Google Analytics, et al. What if the goal isn’t to sell customers to advertisers, but to give Apple customers an intimately crafted experience that they will love and find irreplaceable by any competing service?

      • Davel

        I don’t know anything about the 1997 speech, but I heard Thompson and Ritchie talk, on separate occasions, about a project they were working on called Plan 9.

        It was essentially a network computer where the storage, the CPU and the programs could be anywhere, controlled by a console that was essentially a lightweight PC.

        In essence they were talking about cloud computing and software as a service.

        Of course the issue was not storage or CPU but the network pipe, which at the time was very narrow and made that architecture unworkable, at the time.

        So here we are in the 21st century and their vision is all around us.

      • Anonymous

        http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs

        (as opposed to Plan 9 from Outer Space)

        It’s still around; it even runs on ARM.

      • Davel

        Thanks for the link.

        Didn’t know they actually released it!

  • berult

    AI’s human creed: “all for one, one for all”; ‘serial’ intelligence, stem cell humanism …to the power of one. One within a whole, the whole within one; function specificity metabolizes the cellular network, metabolism lights up the reproduction fuse …on to the next level of inhabited comprehension.

    AI’s human greed: Hal for one, one for Hal; the solipsistic intelligence humane endeavor …’Deus ex machina’ as subprime directive. Scratch their backs right through hearts and souls on through the memory crux, the legacy belly buttons. Wipe the sore, borne out wisdom off consciousness’ sensual journey…

    Both are cut from the same ‘grey matter’ perishable cloth; the longer lasting one being, as authentic core intuition would demand, the one with the woven-in redeemable plot…

    Apple-nurtured Siri, the embryonic thin client to the human brain …in all its generically enhanced ‘singular network’ identity. In order to sequence Siri’s DNA, …well the genome’s gist ferments on Apple’s mailing list. 225 million calls to arms …for help …to chat along, and no end in sight, …and learning thin client life through it all…, with a deferring sense of humor…

    …glad Siri didn’t fall onto greed’s lap…

    • Asymco fan

      I love and hate your comments at the same time.

      • Anonymous

        I’m sure berult knows what he’s talking about, but the fact is, if you have to spend so much time deconstructing the language to get just a basic gist of understanding, then as a means of communication, it’s a failure.
        I’m sure it’s not gibberish but neither is it communication if conveying an idea to someone else is what you want to do.

      • Anonymous

        Oh it’s definitely gibberish.

      • http://twitter.com/Marcos_El_Malo Marcos_El_Malo

        It’s poetry. If poetry isn’t your cup of tea, you won’t get it and it will probably scan as gibberish. Some people just aren’t into poetry. Even if you are a poetry fan, it still might not be appealing and still might come across as gibberish.

        I think of Berult’s comments as little breaks that manage to stay on topic (more or less). He’s not hurting anyone or detracting from the conversation, and he might even be helping and adding to it.

      • Just Iain

        I like it. Read slowly and think about each line and as you progress, the idea emerges.

  • Anonymous

    Thank goodness Apple has committed to voice command. As Horace notes, it’s been out there as an inevitability for a decade. I long thought it was the next thing. Apple knew better. Touch was the next thing. But now voice is the next thing. This time I am positive…

    • Anonymous

      They were demanding enough to recognize that touch needed multi-touch, not a stylus, and that voice needed AI, not preset commands. The tech has to adapt to the user, who has fingers and says one thing in twenty ways.

      • Davel

        They tried a stylus years ago. It didn’t work out.

  • Rudolf Charel

    I am afraid that Siri will at first be very much a local/national system.
    WolframAlpha is in English only. Searching for places and routes, as well as for general information, is very language- and country-dependent.

    Siri is offered in English, German and French, and I presume it gives information based on France, Germany, the UK and the USA.
    Without having used Siri myself, I wonder whether it will be usable by an English speaker in other countries, like Spain or Hungary. Will French or German speakers in Switzerland find it useful?

    I’ll wait till I get my iPhone 4S, but in the Netherlands Siri is not mentioned by Apple on its website.

    Siri is much more difficult than it at first appears, and Apple has a lot more to do to make it ubiquitous. They were right to launch it as a beta.

    • Anonymous

      You are right, it’s a ton of work, and just getting started. But we have only had UTF-8 for a decade. The entire 20th century of computing was an international disaster. Apple has committed to one product for the whole world, so it will come faster than we think.

      My understanding is that some languages are truly unsuited to autocorrect, which makes them very hard to type on a phone, so those users will benefit even more than English speakers.

      It won’t just be Apple doing the work. Startups are launching right now. Calls are being made to Cupertino, pitching existing services. WolframAlpha has an even more clearly articulated need for more languages now. The iPhone 4S was a huge launch. Siri is quite famous. To many third parties, she is an opportunity. (Although she is a he in the UK and Germany.)

      • Anonymous

        Indeed, the UK male voice is a big turn-off for me; he sounds smug without even being funny. The single most important feature for me, before I will get interested, is the ability to choose the output voice “skin” independently of the input recognition and region locale.

        I’d happily pay £5 or so for different voice skins. The right voice will build an emotional connection to the device while the wrong one will do the exact opposite.

    • Davel

      You are right. It is difficult. The demo I saw was astounding.

      Very carefully crafted. Very circumscribed. But if it can do in real life what it did in the demo, they have a disruptive force that may be greater than the first iPhone.

      There will be issues, like whether it can understand you when you have a cold. And your point about localization is well taken; it may not be able to handle that scenario.

      I am waiting for them to release a Chinese Siri. If they can do that, Apple will take over China.

  • tmay

    What is interesting about Apple’s voice solution is that the basis for the Siri experience is ubiquitous on identical hardware that will almost certainly be in production for three years. My WAG would be 100 million iPhone 4S devices alone. Neither Google nor Microsoft will ever be in a similar position. Having the hardware variables (mostly) out of the equation would appear to be a huge advantage for Apple. That is no guarantee, but it does suggest that Apple will be very efficient at upgrading Siri over its lifecycle. The real question in my mind is whether more acquisitions (I’ve heard Quora mentioned as desirable) might be an advantageous path for Apple to pursue.

  • Anonymous

    That is a great point about task disruption. Recently, I purchased a copy of Dragon Dictation to see how far the state of the art has advanced. It’s good. Very good. But it is not great. The single biggest problem is that it still requires too much thinking. We do not think in terms of punctuation, for example. I don’t think “Hello comma my name is The Eternal Emperor period.”

    As long as the system cannot properly punctuate, the solution will be inferior to our ability to construct sentences automatically. And that ignores the fact that accuracy is still only about 99%.
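
    For what it’s worth, “speaking the punctuation” boils down to something like the toy sketch below: spoken command tokens are mapped to symbols and attached to the preceding word. This is purely illustrative of the burden described above, not Dragon’s or Siri’s actual implementation.

```python
# Toy sketch: map spoken punctuation commands to symbols, dictation-style.
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "exclamation point": "!",
}

def render_dictation(tokens):
    out = ""
    for token in tokens:
        if token in SPOKEN_PUNCTUATION:
            out += SPOKEN_PUNCTUATION[token]  # attach to the previous word
        else:
            out += (" " if out else "") + token
    return out

print(render_dictation(
    ["Hello", "comma", "my", "name", "is", "The", "Eternal", "Emperor", "period"]
))
# -> Hello, my name is The Eternal Emperor.
```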

    But Siri is already different. Coupled with iMessage, it is a surprisingly quick way to communicate with someone who cannot do so by phone. It is great, as some have said, for timers, events, quick searches, reminders, finding contacts and playing songs or playlists, and just for laughs it can sometimes amuse.

    But the discrete tasks are killer. It doesn’t take a great leap of logic to move from “Create a message telling my wife that I have the milk” to “create a tweet” or “create a Facebook post.”

    Or, when you have 200 apps buried in folders, to say “I want to play Pocket God” or “Turn off Wi-Fi.”

    To me, the crucial element is activation. We need a keyword that can fire it up, à la Star Trek: “Computer, perform a task.” When they drop the requirement to hold the button (or put the phone to your head), allow the system to enter conversational mode via voice, AND add a few more tasks, this thing is really going to explode.

    • Anonymous

      Incidentally, I just did this: “Send message to wife telling her to call person at 123-456-7890.”

      Pow. Done. No going to text, no typing. I spoke while reading the name and number off the notepad where I took the message.

      Very useful. This is Apple’s advantage: integration. iMessage has completely supplanted Textfree (a third-party free texting app) for messages with the wife.

    • unhinged

      I see the possibilities, but I also see situations such as casually mentioning work matters to a friend, dropping a line like “I should tell the boss this thing is never going to work,” and the computer interpreting that as a command…

  • Anonymous

    Looked around at some of Clayton Christensen’s other docs. Lovely…

    “And if your attitude is that only smarter people have something to teach you, your learning opportunities will be very limited. But if you have a humble eagerness to learn something from everybody, your learning opportunities will be unlimited.” http://hbr.org/2010/07/how-will-you-measure-your-life/ar/5

    Humility is key for this very reason. It’s a learning stance, as opposed to arrogance, presumption and perfectionism, which strangle change and transformation. Imagine a caterpillar refusing to give up a perfectly usable “business model” in favor of one that can only promise flight.

  • Anonymous

    What has amazed me over the past 10 years with Apple is their pace: they seem to be outrageously patient, yet relentlessly pushing forward. They’re behind everyone else (no phone, no tablet, no TV) until they ship something that obsoletes them all. Then non-stop small improvements add up very quickly. Their ambitions are so large, they have to be humble to reach them, willing to take one step at a time.

    Look at OS X in 2001: clearly way better than anything else if you took a long-term view, but in the short term basically just a pretty place to run Xcode. And the graphics were so slow because the GPUs of the day were built for a different display subsystem architecture. We had to keep reminding ourselves that this was just the beginning of something. But then only two years later, Panther was a joy to use; then two more years, the jump to Intel; then two more years, the iPhone.

    So I think you are very correct that Siri has a bright future.

  • Anonymous

    What I am struck by (still) is how each iPhone becomes a different device for each person. It’s because of the apps — pick up somebody else’s iPhone and they’ll have a whole different set of apps and organization than you, and it feels like a different device. That has added another dimension or meaning to these “personal” devices.

    And now Siri feels like it’s taking that personalization, and its malleability to each of us, to another level. How I use Siri a year or two from now will likely be quite different from how you use it.

    Simple, but so very remarkable.

  • Derik

    Horace,
    You mentioned something in the “getting to know you” episode that sparked a thought.
    You stated that “Apple will now be handling the query request,” essentially obsoleting Google Analytics (or similar) because all queries from Apple will then be identified to Google simply as Apple.
    It would seem, then, that SEO as we now know it will morph from satisfying Google search requirements to satisfying a Siri request (Siri Request Optimization, SRO). I’m interested to understand what Siri-friendly web development looks like and how it’s measured.

    • Secular Investor

      Derik,

      You’re right. Siri could be devastating for Google.

      iOS owners are by far the heaviest internet users. According to Congressional testimony, two-thirds of mobile search requests to Google come from iOS devices. In due course, Siri will allow Apple to control search requests, diverting many of them away from Google to Yelp, WolframAlpha, Bing and others.

      Those that are sent to Google will be anonymous, as you say, simply identified to Google as Apple, and therefore invisible to Google Analytics.

      Additionally, Siri may strip the Google response of any ads, i.e. zero revenue for Google from Siri on iOS.

      Sweet revenge for Apple!

      • Anonymous

        As I mentioned above, Siri’s fall-through Google queries aren’t anonymous, and it’s silly to think that Google is going to let Apple make anonymous, ad-free requests without some business arrangement.

      • Afh100

        Good point, SecularInvestor. I do some analytics on a website for a large international property business (a B2B, non-consumer site), and it’s been interesting that Google Analytics shows 88% iOS usage for mobile traffic (tablet and phone), with the remainder Android and just 1 percent Windows. It seems that despite so many Android phone sales, many of those users are just playing Angry Birds and leaving it at that… ;-)

    • Anonymous

      For now, Siri punts to a plain old Google search in Safari if it doesn’t have a suitable result. Apple gets a chance to log the query for later analysis, and Google still gets a query associated with a user (or at least with the cookie in the user’s mobile browser).

      As Apple learns more about its user (and its users), it will be able to handle more requests without Google’s involvement. I think it is extremely unlikely that Apple is going to be doing anonymous Google queries on its users’ behalf without Google’s consent.
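
      The fall-through shape being described looks roughly like the sketch below. The domain list, the logging of misses and the search-URL hand-off are assumptions for illustration only, not Apple’s actual pipeline.

```python
# Rough sketch: answer locally when the query matches a known domain,
# otherwise log the miss and hand the query off to a plain web search.
from urllib.parse import quote_plus

HANDLED_DOMAINS = ("weather", "timer", "reminder", "calendar")
failed_queries = []  # what the operator could mine to decide what to add next

def handle(query):
    lowered = query.lower()
    if any(domain in lowered for domain in HANDLED_DOMAINS):
        return "[answered locally] " + query
    failed_queries.append(query)  # log the miss for later analysis
    return "https://www.google.com/search?q=" + quote_plus(query)

print(handle("set a timer for ten minutes"))
print(handle("who won the 1986 World Cup"))
print(failed_queries)
```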

      • Paul

        Maybe not, but Yahoo & Bing sure will

  • Anonymous

    Siri is a smart technology move, but as with all successful AI it is about limiting the domain more than about impressive intelligence. Your #3, “plenty of headroom,” assumes that these technologies scale well. From what we’ve learned so far, it is very likely that they do not scale at all. The “AI winter” (1990-2010) may be over and the popular press is once more printing enthusiastic stories about AI’s “headroom,” but that does not mean it is true.

    Maybe someone could dig up the amounts Microsoft spent on technologies like these (and Japan’s Fifth Generation project, the EU’s Eurotra and DARPA’s programs). Billions went into those programs, only to show time and again that the technologies do not scale even when we would like them to.

    Still, it is nifty and useful, and even with its limitations a very good implementation of what can be done.

  • Anonymous

    Audio-based interaction is definitely going to be used more in the future. Apple is an early participant with Siri. BTW, Microsoft bought Tellme in 2007 for $800M. Whatever happened with that?

  • Acqua

    Thanks Horace,

    looks like I’m not the only one who thinks that Siri will be big, very big, in the not-so-distant future. By the way, I would never, ever post anything personal on Facebook; as a matter of fact, I shun Facebook. However, I wouldn’t hesitate to tell Siri everything about me and my relations. Input from a zillion users would amount to a fine education for Siri. That’s the kind of social web I’d really like to participate in.

  • Anonymous

    Application is everything. If you heard some sort of AI was only 10% accurate, you’d think it was crap, and yet people love the Google because they put 10 results on a page and let the user’s brain pick the best one.

  • Pingback: Perspektive für Siri « Matthias Deuschl's Blog.

  • Eric D.

    I’m currently using DragonDictate on my Mac to compose this comment. I can easily type 85 words a minute, but after 15 minutes, my repetitive stress injury starts acting up. My right wrist gets sore pretty fast, but that’s nothing compared to the shooting pain in my elbow, at the ulnar canal. There is a surgical fix, but it involves wrapping the nerve around the other side of the elbow. No thanks.

    I started working with the PC version, Dragon NaturallySpeaking, five or six years ago. It was pretty good then, and it has continuously improved over time. You can let it add punctuation by itself, but it’s more reliable to speak the punctuation yourself, and frankly, once you’re used to that, it seems quite natural.

    The Mac version still has some catching up to do, but it’s really nice to be able to just lean back in my chair and let my thoughts flow out without worrying how long I can keep this up before the pain takes over. Repetitive stress injuries are cumulative. Scar tissue builds up and pinches the nerve. Working through the pain causes more injury and more scar tissue. Dictation is a fantastic solution to this problem. I’m grateful it has arrived just as I was getting disabled.

    NaturallySpeaking lives on your computer and its database is formidable. For example, it recognized Jules Verne’s globetrotting character Phileas Fogg the first time I spoke the name. And you can train it to recognize new words, or have it learn to understand your pronunciation in general.

    Siri however, is working at a whole new level. Beyond recognizing speech patterns for the purposes of transcription, it seeks to understand the intent of the user’s command or question and respond appropriately. Yes, it’s relatively limited, but as millions of people start using it on a daily basis, it’s going to get a whole lot smarter. I’m looking forward to watching it grow up.

  • airmanchairman

    “But it takes time. Like any truly useful breakthrough, it takes a long time to mature…
    …There’s no magic to it. In fact it’s banal. These are only the principles that every parent uses to raise a child.”

    And out of these innocuous, banal beginnings, the Earth-shattering threat of SkyNet was slowly brought into terrifying being… :-)

  • Rob

    It’s worth mentioning that Apple are going to find it easy to add useful functionality to Siri, because the nature of the server-based processing means they’re going to get a record of all the failed queries and can focus on the areas that will have the biggest impact.

    I expect Siri to evolve in leaps rather than steps.

  • Anonymous

    “The thing that strikes me about Siri is how “crummy” it is. It’s trying to be an intelligent assistant but it’s not nearly good enough. Many have called it “toy like”. It was so uninteresting that there was minimal coverage of it after the keynote launch. One can almost forgive the cynics. It was “yet another feature”.”

    What are you talking about? Siri has been all over the news since then. It’s become a pretty big joke on tech sites; there are still Siri jokes everywhere. Everyone knows at this point that the real “live demos,” the ones that aren’t pre-scripted by the engineering department, show that Siri frequently fails to understand things. Depending on the person, Siri may not be able to understand over half of what they say. It wasn’t much of a global launch either: most of the features only work in the US.

    And I’m not sure I follow Clayton Christensen. ViaVoice is an IBM product that was “the world’s first continuous dictation retail product,” way back in 1997. It was distributed by Nuance, the same company that made Dragon NaturallySpeaking, an accessibility product. “Crummy” or not, it allows people who are blind to work in a workplace without needing help from other people. I’ve also seen coworkers using Nuance products because they had carpal tunnel. And does he not understand the concept of stock imagery?

    I honestly don’t see how you can draw a comparison between two products that were created almost 15 years apart.

  • http://areyousirious.blogspot.com/ Mark S.

    I know Siri has trouble with accents, but it can be hilarious from time to time… especially with deaf accents: http://areyousirious.blogspot.com