Long Take: OpenAI, backed with $1B+ by Elon Musk & MSFT, can now program SQL and write Harry Potter fan-fiction

Jul 27, 2020

Hi Fintech futurists --

So much good stuff this week! Much of it is from different corners of our industry, so look out for individual write-ups in the Wednesday short takes. Here's what you'll get there:

WhatsApp plans to offer credit, insurance and pension products in rural areas in India and help digitize local small and medium-sized businesses. Definitely not a financial company, right?
Kabbage launches online checking accounts for small businesses. One of the best small business banking accounts. Meanwhile, ONDK is still barely alive.
Banks in US Can Now Offer Crypto Custody Services, Regulator Says. Major tide shift, potentially huge outcome, needs detailed analysis of regulator incentives.
Yearn Finance surges more than 70x pushing governance limits. As DeFi passes the $3 billion mark, understand this to understand what is being built.

This week, we look at a breakthrough artificial intelligence release from OpenAI, called GPT-3. It is powered by a machine learning algorithm called a Transformer Model, and has been trained on 8 years of web-crawled text data across 175 billion parameters. GPT-3 likes to do arithmetic, solve SAT analogy questions, write Harry Potter fan fiction, and code CSS and SQL queries. We anchor the analysis of these developments in the changing $8 trillion landscape of our public companies, and the tech cold war with China.

Announcement

Nominate and Support Your Favorite Startup for a 2020 Global FinTech Award!

Benzinga is holding their 6th Annual Global FinTech Awards Ceremony this November 10th, 2020. This event aims to celebrate and honor those companies and executives that are striving to revolutionize the FinTech industry. You can vote for your favorite startup(s) for the Global FinTech Awards here and if you don’t see your favorite company nominated, click here to nominate them today. Buy your discounted tickets to the virtual event full of keynote speakers, networking and more! Use code BLUEPRINTVIP for 15% off tickets!

Long Take

We really like writing about artificial intelligence, but lots of people don't really like reading about it. It is too abstract and unfocused! You can't place a finger on AI. It's religion, philosophy, and existential dread all wrapped up into a bunch of techno-utopian speak. So let's boil it down to something the public investor can understand.

The top 10 most valuable public companies today are worth $8 trillion, and are 70% technology companies. They run massive data center clouds, chugging through the arithmetic of the collective human economy with mathematics of immeasurable scale. Apple, Microsoft, and Amazon alone are worth more than the entire set of the top 10 most valuable public companies in 2010, each clocking in at nearly $1.5 trillion in enterprise value. Their mana is our flesh, and their exhaust is our lives in number.

The global conflict of the next century is defined not by the savagery of the World Wars, nor the mushroom clouds and realpolitik of the Cold War, but by who builds the hearts and brains of our technological titans. The economic thought leadership of the West is obsessed with the question of the Tech Cold War between the US and China, with 5G and Huawei providing the first set of waged battles (paygate):

We don't have to read the articles to know their conclusion, as it is like a scent in the air. China will leverage its full national strength to build out global infrastructure in bandwidth, blockchain, and artificial intelligences. Its anointed knights -- Alibaba, Tencent -- are already giving Facebook and Amazon envy in payments technology and the provision of financial services. When Amazon ($1.5T) buys Visa ($500B), the game will begin.

Saving Humanity

There is a good chance that the artificial intelligences we create will swallow us whole. We must be like Kronos, shackling our algorithmic offspring and teaching it to value human existence. The fear of the Singularity or some other runaway implementation of self-improving general AI plagues science fiction writers and billionaires alike. From Asimov's Three Laws, to Yudkowsky's Machine Intelligence Research Institute, to Musk's OpenAI -- all these concepts are meant to prevent an accidental, final mistake.

Which world would you like?

OpenAI started out as a non-profit initiative with $1 billion in funding to make sure we don't end up in the bad timeline. But it is hard to be an idealist in a capitalist world, and the entity recently switched to for-profit status, so as to be able to raise more money and attract better talent -- raising another $1 billion from Microsoft. A profit motive increases innovation pace, as well as the chance of running at full speed off a cliff (e.g., self-driving cars). Since then, OpenAI has been delivering on some uncanny technology.

Enter GPT-3: an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model. The research paper is linked here and a functional overview is here and here.

Yes, that abstract is impossible to read. Our takeaway is that OpenAI created a natural language processing algorithm that is yielding unbelievable results from a machine learning approach called a Transformer model. The model ingests a very large amount of data in a training process, then encodes it as numerical weights that capture how strongly each word in a sequence relates to the others (the "attention" mechanism), and is able to manufacture probabilistic outputs based on limited inputs.
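To make "attention" a little less abstract, here is a minimal sketch of scaled dot-product attention, the core operation of a Transformer, in plain Python. It is an illustration only: the token vectors are made-up numbers, and the learned projection matrices, multiple heads, and masking that a real model like GPT-3 relies on are left out.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score the query against every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted blend of all the value vectors.
        blended = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(len(values[0]))]
        outputs.append(blended)
    return outputs

# Three hypothetical 2-dimensional token vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
for row in mixed:
    print([round(x, 3) for x in row])
```

Each output row is the original token re-expressed as a blend of every token in the sequence, weighted by how similar they are. Stack many such layers, learn the weights from data, and you get the interrelationships described above.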

Here's a diagram of how it would translate the German "Komm bitte her" to "Please come here", stacking neural networks one on top of the other and sequencing encoding and decoding (i.e., inputs and outputs):

The reason GPT-3 is performing so well is that it is brute-forcing the math on 175 billion parameters. The model's training process was performed on the Common Crawl data set, which was built on 8 years of crawling the web and includes a trillion words, as well as other bespoke curated data sets. Let's say this again.

This natural language processing robot has ingested the language content of the entire Internet, and is able to probabilistically answer queries using this corpus.
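What "probabilistically answer" means can be shown with a toy. The sketch below is not GPT-3 and not a Transformer: it is a simple bigram model that counts which word follows which in a tiny made-up corpus, then extends a prompt one word at a time by sampling from those counts. The corpus, function names, and seed are all illustrative assumptions. What it shares with GPT-3 is the autoregressive loop: predict the next word, append it, repeat.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the corpus."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_words=8, seed=0):
    """Extend a non-empty prompt word by word, sampling by observed frequency."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    output = prompt.lower().split()
    for _ in range(max_words):
        options = follows.get(output[-1])
        if not options:
            break  # the last word was never seen with a successor
        candidates = list(options.keys())
        counts = list(options.values())
        output.append(rng.choices(candidates, weights=counts, k=1)[0])
    return " ".join(output)

# A tiny stand-in for "the entire Internet" (hypothetical text).
corpus = "the bank lends money . the bank holds deposits . the model predicts the next word ."
model = train_bigrams(corpus)
print(generate(model, "the bank"))
```

Swap the sixteen-word corpus for a trillion words, and the word counts for 175 billion learned parameters, and the gap between this toy and GPT-3 is largely one of scale and architecture rather than of basic idea.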

The Internet is the collated output of human knowledge and intelligence, as well as human misknowledge and disintelligence. It grows exponentially, not linearly. There will be only more, not less. Computational power is similarly only increasing. Whatever results GPT-3 has achieved today will appear child-like and simple in just 5 years. In engaging with GPT-3, we are engaging with a mathematical abstraction of all humanity. Forgive me for being reductive and naive, but it reads to me like we are working on simulating not the human brain, but humanity's brain.

What does GPT-3 do today?

Unlike other machine learning approaches which are trained on a specific set of tasks, this one is more general. To that end, it's not necessarily the best at anything. But it can take a short prompt and leverage its vastness to create a written result that approaches human-level quality. Below you can see how well it can do arithmetic, solve SAT analogy problems, and perform reading comprehension tasks. Similar problems would be multi-lingual translation, grammar and spelling correction, as well as generating Bloomberg articles.

The vertical axis is accuracy, with 95% or so being human performance and 50% being a pure guess. The zero-shot, one-shot, and few-shot categorizations refer to how many worked examples the model is given in its prompt before being asked for an answer: none, one, or a handful. As we add more parameters along the horizontal axis, performance increases, even on deterministic tasks like arithmetic. But benchmarks are hard to interpret emotionally. Here are some results (source here, here), including a play about Harry Potter being a detective, and investment advice about Litecoin:

If you want to see the same concept applied to images, where the blanks are filled in by the model based on a large data set of visual data, here's a taste. These are pixelated, but we know it is a matter of years before resolution increases 10x and becomes hyper-realistic, as is already the case with GANs.

Finally, here is the model generating programming code from written English language. The image included is about CSS -- a markup that styles websites -- but the same trick applies to back-end programming languages too. Importantly, the model doesn't know anything about the meaning or structure of the underlying programming language, where and how it is used, and so on. It only knows the probabilistic and relational algorithm that has been extracted from the Internet corpus to auto-fill prompts that a human user (today) provides.
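For a sense of what "auto-filling a prompt" looks like in practice, here is a rough sketch of how an English-to-SQL prompt might be assembled. The zero-shot, one-shot, and few-shot settings differ only in how many worked examples are pasted in ahead of the open question. Everything here is hypothetical: the `build_prompt` helper, the prompt layout, and the table and column names are invented for illustration, and no actual model or API is called.

```python
def build_prompt(task, examples, query):
    """Assemble a text prompt: task description, worked examples, then the open query."""
    lines = [task, ""]
    for english, sql in examples:
        lines.append(f"English: {english}")
        lines.append(f"SQL: {sql}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("SQL:")  # the model would be asked to continue from here
    return "\n".join(lines)

# Hypothetical worked examples; table and column names are invented.
EXAMPLES = [
    ("count all customers", "SELECT COUNT(*) FROM customers;"),
    ("list accounts opened in 2020", "SELECT * FROM accounts WHERE opened_year = 2020;"),
]
TASK = "Translate English into SQL."
QUERY = "show the ten largest transactions"

zero_shot = build_prompt(TASK, [], QUERY)           # no examples
one_shot = build_prompt(TASK, EXAMPLES[:1], QUERY)  # one example
few_shot = build_prompt(TASK, EXAMPLES, QUERY)      # several examples
print(few_shot)
```

The model never sees a grammar for SQL. It sees text that ends in "SQL:" and continues it with whatever is most probable given the examples and everything it absorbed in training.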

One by one, white-collar professions are incorporated into the capabilities of the computer. It is true that the only way GPT-3 can generate answers is by digesting existing information. Someone somewhere had to do the ground work.

But the Internet already causes this. People contribute freely to the collective knowledge for a host of different reasons -- from money to altruism to a seeking of social status. Just look at Wikipedia or YouTube for reference. The digital digestion of this knowledge is merely getting started. Each cultural datum of information will be mined, sorted, and retrieved through a conversational interface in the years to come.

Impact on Financial Services

We have previously suggested an impact of $1 trillion in cost to the global financial services sector from a successful AI revolution, or about 20% of the existing cost structure across the front, middle, and back offices of the industry.

All you need to do to form an opinion is revisit the rank-order of the largest public companies in 2020 to see where the power lies. Unstructured data, like text and images, will continue to accrue benefit to the AI powerhouses, who will bring to life products, websites, and technologies with a synthetic conversational front-end. The banks don't own either the media or the broader cultural conversation, and in the West, are largely precluded from doing so by law. To that end, they are not able to host social data exhaust and mine it the way high-tech companies do today.

Structured data -- like the functioning of the financial instruments or the rule-based software infrastructure which underpins it -- will anchor in global blockchains. It is possible that as the space matures and accrues more examples, AIs will be able to generalize such smart contract code too, and perpetrate the creation of an infinite number of software-based decentralized autonomous organizations.

If we take at face value the $1 trillion cost opportunity, which potentially yields an 80% savings and allows for a $200 billion annual cash flow, that would imply an enterprise value space for a $2 trillion financial AI company. Our intuition is that Ant Financial or Facebook are far more likely to be that player than anyone in Fintech today.
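The back-of-the-envelope math can be laid out explicitly. The sketch below uses only the figures quoted in this piece, plus one reading that is an assumption rather than something stated: that 80% of the $1 trillion flows back to the industry as savings and the remaining 20% is what the AI provider keeps as cash flow. The valuation multiple is not given anywhere; it is simply what falls out of dividing the two quoted numbers.

```python
def implied_figures(cost_pool_bn, share_pct, savings_pct, target_value_bn):
    """Derive the figures implied by the quoted numbers (all in $ billions)."""
    # If the cost pool is share_pct of industry costs, back out the total.
    total_cost_base_bn = cost_pool_bn * 100 / share_pct
    # ASSUMPTION: the savings rate is the share handed back to the industry...
    customer_savings_bn = cost_pool_bn * savings_pct / 100
    # ...and the remainder is what the AI provider retains as annual cash flow.
    retained_bn = cost_pool_bn - customer_savings_bn
    # The multiple that connects that cash flow to the quoted enterprise value.
    multiple = target_value_bn / retained_bn
    return {
        "total_cost_base_bn": total_cost_base_bn,
        "customer_savings_bn": customer_savings_bn,
        "retained_cash_flow_bn": retained_bn,
        "implied_multiple": multiple,
    }

# Inputs taken from the text: $1T cost pool (~20% of industry costs),
# 80% savings, and a $2T enterprise value.
result = implied_figures(cost_pool_bn=1000, share_pct=20, savings_pct=80,
                         target_value_bn=2000)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```

Under that reading the numbers hang together: a roughly $5 trillion industry cost base, $800 billion handed back as savings, $200 billion retained, and an implied multiple of 10x on annual cash flow to reach $2 trillion.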

However, there is still room for a number of well-articulated and narrow Fintech AI plays that could see $50-250 million exits. Traditional equity research and written investment notes on company earnings would fall squarely into the "GPT-3 automate" category. Certainly, nearly all human onboarding into financial products and the accounts which hold them can be replaced. Middle office platforms can use transformer models to fill out incomplete profiles for payments, fraud-prevention, AML, and underwriting use cases. Externally, marketing teams can use industrial content farming in every language to pull audience and attention towards their solutions, as well as nurturing client relationships with robot affection. What's wrong with a bit of algorithmic romance?

For more analysis parsing 12 frontier technology developments every week, a podcast conversation on operating fintechs, and novel food-for-thought essays, become a Blueprint member below.

Fintech Blueprint 🤖🏦🧭
