A California-based AI start-up has built a fully autonomous, AI-based database dubbed the Diffbot Knowledge Graph, which it claims holds 500 times the knowledge of Google's equivalent, storing one trillion facts and ten billion entities.


It claims to be 500 times bigger than the database that underpins Google’s search engine – and the Diffbot Knowledge Graph launched today, vowing to take on the online behemoth.

Tech start-up Diffbot has used a combination of machine learning, natural language processing and artificial intelligence (AI) to create a fully autonomous system that has curated more than one trillion facts and ten billion entities – including people, organisations and locations.

It says its knowledge graph will act as a single data repository for all the knowledge on the internet, making it substantially larger and more intelligent than the Google Knowledge Graph – the knowledge base used by Google to enhance its search engine’s results using information from various sources.

Mike Tung, CEO of California-based Diffbot, said: “Ours is a web-wide, comprehensive and interconnected knowledge graph that has the power to transform how enterprises do business.

“Google’s knowledge graph is little more than restructured Wikipedia facts with the simplest, most narrow connections drawn between them and built solely to serve advertisers.

“What we’ve built is the first knowledge graph that organisations can use to access the full breadth of information contained on the web.

“Unlocking that data and giving organisations instant access to those deep connections completely changes knowledge-based work as we know it.”

Google presents information and facts from various sources (Credit: Wikipedia)

Breaking down the Diffbot Knowledge Graph

Most knowledge graphs in use today are only partially autonomous, with the majority of their content curated manually. Diffbot says its AI-based system makes the Diffbot Knowledge Graph (DKG) unique in being fully automated.

It was also built with the express intention of providing knowledge, unlike other knowledge graphs whose primary objective is to support ad-based search engines.

The DKG’s natural language processing technology also means it can identify, understand and make searchable any information in any language.

Additionally, it is constantly being rebuilt in an effort to ensure its data is fresh, up-to-date and accurate.

It offers business intelligence, including the following:

  • People: skills, employment history, education, social profiles
  • Companies: rich profiles of companies and the workforce globally, from Fortune 500 to small firms
  • Locations: mapping data, addresses, business types, zoning information
  • Articles: every news article, dateline and byline from anywhere on the web, in any language
  • Products: pricing, specifications and reviews for every stock keeping unit across major ecommerce engines and individual retailers
  • Discussions: chats, social sharing, and conversations everywhere from article comments to web forums like Reddit
  • Images: billions of images on the web organised using image recognition and metadata collection
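
For readers unfamiliar with the term, a knowledge graph stores "entities" (such as a person or a company) and "facts" that link them together. The Python sketch below is a toy illustration only, not Diffbot's software or API: the class and function names are hypothetical, and the three example facts are taken from this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node in the graph. All field names here are illustrative assumptions."""
    id: str
    type: str  # e.g. "Person", "Company", "Location"
    name: str


class TinyKnowledgeGraph:
    """Hypothetical in-memory graph: entities plus (predicate, object) facts."""

    def __init__(self):
        self.entities = {}
        # Maps a subject entity id to its list of (predicate, object) pairs.
        self.facts = defaultdict(list)

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def add_fact(self, subject_id, predicate, obj):
        # A fact must hang off an entity that is already in the graph.
        if subject_id not in self.entities:
            raise KeyError(f"unknown entity: {subject_id}")
        self.facts[subject_id].append((predicate, obj))

    def objects(self, subject_id, predicate):
        # Every object linked to the subject by the given predicate.
        return [o for p, o in self.facts.get(subject_id, []) if p == predicate]

    def count_facts(self):
        return sum(len(pairs) for pairs in self.facts.values())


def build_example_graph():
    """Builds a three-fact graph from details reported in this article."""
    kg = TinyKnowledgeGraph()
    kg.add_entity(Entity("diffbot", "Company", "Diffbot"))
    kg.add_entity(Entity("mike_tung", "Person", "Mike Tung"))
    kg.add_entity(Entity("california", "Location", "California"))
    kg.add_fact("diffbot", "ceo", "mike_tung")
    kg.add_fact("diffbot", "based_in", "california")
    kg.add_fact("diffbot", "founded", 2008)
    return kg
```

Calling `build_example_graph().objects("diffbot", "ceo")` looks up the entity linked to Diffbot by the "ceo" fact. A production system at the scale Diffbot describes would need distributed storage and automated extraction, which this sketch does not attempt.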

Aydin Senkut, founder and managing director of Felicis Ventures, one of Diffbot’s investors, said: “Simply put, Diffbot is using the power of AI on a scale we’ve never seen before.

“It’s the first profitable AI company on record, it is the ‘secret ingredient’ powering applications from many of the largest companies in tech, and the launch of the knowledge graph is going to further elevate Diffbot’s status as a clear leader in the space.”

Diffbot team August 2018 (Credit: Diffbot)

Diffbot, established in 2008 at Stanford University, develops machine learning and computer vision algorithms, as well as public APIs (application programming interfaces) for extracting data from web pages.

It powers applications for customers including eBay, cloud computing company Salesforce and multinational tech firm Cisco.

In 2015, the company announced it was working on the Diffbot Knowledge Graph, crawling the web and using its automatic web page extraction to build a large database of structured web data.