When Microsoft introduced the Cortana digital personal assistant at its Build developer conference last year, the company was already dropping hints about its future ambitions for the technology. Cortana was built largely on Microsoft’s Bing service, and the Cortana team indicated those services would eventually be accessible to Web and application developers.
As it turns out, eventually is now. Though the most important elements are only available in a private preview, many of the machine learning capabilities behind Cortana have been released under Project Oxford, the joint effort between Microsoft Research and the Bing and Azure teams announced at Build in April. And at the conference, Ars got to dive deep on the components of Project Oxford with Ryan Galgon, the senior program manager at Microsoft Technology and Research shepherding the project to market.
The APIs make it possible to add image and speech processing to just about any application, often by using just a single Web request. “They’re all finished machine learning services in the sense that developers don’t have to create any model for them in Azure,” Galgon told Ars. “They’re very modular.” All of the services are exposed as representational state transfer (REST) Web services based on HTTP “verbs” (such as GET, PUT, and POST), and they require an Azure API subscription key. To boot, all the API requests and responses are encrypted via HTTPS to protect their content.
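To give a sense of what “a single Web request” looks like in practice, here is a minimal Python sketch that builds one HTTPS POST carrying a subscription key. The endpoint URL, the JSON body, and the `build_analysis_request` helper are hypothetical placeholders rather than anything Microsoft has published here, and the `Ocp-Apim-Subscription-Key` header name is an assumption about how the key is passed.

```python
import json
import urllib.request

# Hypothetical placeholders: neither value comes from the article.
ENDPOINT = "https://api.example.com/vision/v1/analyses"
SUBSCRIPTION_KEY = "YOUR-AZURE-API-KEY"


def build_analysis_request(image_url, endpoint=ENDPOINT, key=SUBSCRIPTION_KEY):
    """Build (but do not send) one HTTPS POST asking a service to analyze an image."""
    if not endpoint.startswith("https://"):
        # The services are described as HTTPS-only, so refuse plain HTTP.
        raise ValueError("requests should go over HTTPS")
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumed header name for the Azure API subscription key.
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


request = build_analysis_request("https://example.com/photo.jpg")
# Sending it would be one more call: urllib.request.urlopen(request)
```

The request is only constructed, not sent, so the sketch runs without a real key; swapping in a real endpoint and key would be the only change needed to make the call.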
Currently, the Project Oxford services are free to try for anyone with an Azure account, though there are limits on the rate of usage. While the project’s idiosyncrasies are worked out, the services can be used through software development kits for a number of platforms plus Microsoft’s Azure, bringing speech-to-text, text-to-speech, computer vision, and facial recognition capabilities to virtually any application, whether mobile, Web, or otherwise.
For now, the missing piece is the intelligence that can take text and speech interactions for applications to the next step. That capability is wrapped in what Microsoft calls LUIS (Language Understanding Intelligent Service), a text-processing capability that will be able to determine user intent from a string of text whether it’s typed or spoken. LUIS identifies “entities” within text such as names, dates and times, actions, concepts and things, and the service can be wired into cloud applications to perform the appropriate task.
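As a rough illustration of how an application might act on that kind of output, the Python sketch below routes a detected intent and its entities to a handler. LUIS is still in private preview and its actual response format is not described here, so the `sample_result` shape, the intent name, and every function in the block are hypothetical.

```python
# Hypothetical result shape: an assumption for illustration, not the real LUIS output.
sample_result = {
    "intent": "set_reminder",
    "entities": [
        {"type": "datetime", "text": "tomorrow at 9am"},
        {"type": "action", "text": "call the dentist"},
    ],
}


def entities_by_type(result):
    """Group entity texts by their type, e.g. {"datetime": ["tomorrow at 9am"]}."""
    grouped = {}
    for entity in result.get("entities", []):
        grouped.setdefault(entity["type"], []).append(entity["text"])
    return grouped


def set_reminder(entities):
    """Handle the hypothetical set_reminder intent."""
    when = entities.get("datetime", ["(no time given)"])[0]
    what = entities.get("action", ["(no action given)"])[0]
    return f"Reminder set: {what} ({when})"


def fallback(entities):
    """Handle any intent the application does not recognize."""
    return "Sorry, I did not understand that."


# Map intent names to the application code that performs the task.
HANDLERS = {"set_reminder": set_reminder}


def dispatch(result):
    """Pick a handler from the detected intent and pass it the grouped entities."""
    handler = HANDLERS.get(result.get("intent"), fallback)
    return handler(entities_by_type(result))


print(dispatch(sample_result))
```

Keeping the intent-to-handler mapping in a plain dictionary means new tasks can be added without touching the dispatch logic, and unrecognized intents fall through to a safe default.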

