Iterative Venture: Data Landscape in the 2020s

"First reckon, then risk." - Von Moltke.

Oct 08, 2021

To all the iterators in our community: we spent some time iterating, and now we are back. Going forward, we will focus on bringing you more insightful content.

Data Landscape in the 2020s

Big thanks again to our panelists. Please refer to the full episode on our podcast here.

Why we need data: a historical perspective

Oftentimes, we overhear conversations about companies being acquired for their “data” or their data platform, or about someone having clean (or unclean) “data”. What do people mean by this? Why would a company be acquired purely for its data? What is data, and why do we need it anyway?

The way we understand it, data points are samples, much like the experiences we have in life. The more of the world we see, the more data points we collect, and the better equipped we are to make decisions (hopefully good ones) based on those experiences. The same is true for a company.

Data, then, is a company’s essential knowledge. Our brains are simply very complex and efficient data infrastructure: they let us store data, make decisions, explain those decisions, and iterate after incorporating new information.

To make good decisions, therefore, we need many data points to learn from and an efficient way to access them. This is what led Facebook to create Hive, Presto, Scuba, and the like, each suited to a different problem: quick data access for debugging and simple insights (Scuba), a cost-effective big-data warehouse for data crunching (Hive), and a mixture of the two (Presto), so as to enable more “citizen data scientists”, as Ashu put it.

The new decade

Humanizing AI

At the turn of the new decade, along with the rising optimism about what machine learning models and AI can bring, comes the challenge of explaining how such models make decisions. This is where Krishna’s company, Fiddler.ai, comes in.

Complex machine learning models such as random forests, XGBoost, and neural networks can involve so many intertwined interactions that explaining them directly is practically impossible, as there is no single formula we can boil them down to.

For reference, below is a very high-level illustration of how a neural network makes a decision, with each circle representing a node:

Source: IBM, “What are Neural Networks?”

So how does it actually work? Many of us simply treat it as a black box.
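Fiddler’s own techniques are not described in this post, but to make the idea of explaining a black box concrete, below is a minimal sketch of one widely used, model-agnostic method: permutation feature importance. It shuffles one input column at a time and measures how much the model’s accuracy drops. The `black_box` model, its weights, and the synthetic data are hypothetical stand-ins for an opaque model such as a random forest or neural network.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    """Fraction of rows where the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance(model: Model, X: List[List[float]], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average accuracy drop when each feature column is shuffled.

    The model is only ever called, never inspected, so it can be any black box.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical "black box": we pretend we cannot read its internals.
def black_box(row: Sequence[float]) -> int:
    return 1 if 0.8 * row[0] + 0.1 * row[1] > 0.5 else 0


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(500)]
# Labels come from the model itself, so baseline accuracy is exactly 1.0.
y = [black_box(row) for row in X]

importances = permutation_importance(black_box, X, y)
for name, score in zip(["feature_0", "feature_1", "feature_2"], importances):
    print(f"{name}: {score:.3f}")
```

A large drop means the model leans heavily on that feature: here `feature_0` should dominate, while `feature_2`, which the model never reads, scores exactly zero. Permutation importance is only one technique; Shapley-value attributions are another common choice.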

We spoke with Krishna about this a while ago, and he said he is tackling this problem because of a monumental shift underway in traditional industries such as banking and insurance.

Namely, there is a paradigm shift in how risk is assessed. In the past, models were deterministic: a complex set of if-else rules (essentially a switch statement) evaluated whether an applicant was a risk and, if so, denied their credit card application. The upside is that we know exactly why we rejected someone. The downside is that the model is fixed and the weight of each criterion is predetermined.
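To illustrate the deterministic approach, here is a minimal sketch of a rule-based credit decision. The field names and thresholds are hypothetical, chosen only for illustration. Because every rule is explicit, the function can return the exact reasons for a rejection, but the cutoffs and their relative importance stay fixed until someone edits the code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Applicant:
    annual_income: float
    debt_to_income: float  # monthly debt payments / monthly income
    missed_payments: int   # hypothetical look-back window, e.g. last 12 months


def rule_based_decision(applicant: Applicant) -> Tuple[bool, List[str]]:
    """Approve only if no rule fires; the rules that fired are the reasons."""
    reasons: List[str] = []
    # Hypothetical, hard-coded cutoffs: changing them requires a code change.
    if applicant.annual_income < 30_000:
        reasons.append("annual income below 30,000")
    if applicant.debt_to_income > 0.40:
        reasons.append("debt-to-income ratio above 40%")
    if applicant.missed_payments > 2:
        reasons.append("more than 2 missed payments")
    return len(reasons) == 0, reasons


approved, reasons = rule_based_decision(Applicant(25_000, 0.55, 0))
print(approved, reasons)
```

The explanation comes for free, which is exactly what is lost when these hand-written rules are replaced by a learned model.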

Machine learning models, on the other hand, can readily incorporate new data and can easily be swapped for other models, offering a more dynamic solution to the problem.

Check out the podcast episode we recorded with Krishna to learn more about how Fiddler is solving this problem.

Faster Iterative Loop

Another challenge tech companies face in the new decade is an increasingly competitive landscape, as more and more entrants join the market.

For reference, the number of apps in the Apple App Store has grown exponentially.

Source: Statista, “Number of apps in the Apple App Store 2021” (note that the time scale changes in 2020)

In this landscape, differentiation and the ability to learn and incorporate changes quickly become absolutely key. As a Silicon Valley saying goes, when two startups compete against each other, the one that iterates faster wins.

This is where Vijaye and Statsig (short for “statistical significance”) come in.

Statsig aims to let any tech company easily set up a gating mechanism (the ability to show a new feature to some users but not others, for comparison) along with a dashboard that collects the relevant statistics for the control and test groups, so the team can tell whether a product launch achieved its intended purpose.
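As a rough illustration of the kind of statistic such a dashboard reports, here is a minimal sketch of a two-proportion z-test comparing conversion rates between a control and a test group. This is a textbook test, not necessarily what Statsig computes, and the counts below are made up.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_control: int, n_control: int,
                          conv_test: int, n_test: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_control = conv_control / n_control
    p_test = conv_test / n_test
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_control + conv_test) / (n_control + n_test)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (p_test - p_control) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 users per group, 10% vs 13% conversion.
z, p_value = two_proportion_z_test(100, 1_000, 130, 1_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these hypothetical numbers, z is about 2.1 and the p-value is a little under 0.04, below the conventional 0.05 threshold, so the lift would count as statistically significant.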

Statsig also lets companies roll out features gradually, in a controlled fashion.
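A common way to implement a gradual rollout, sketched below under our own assumptions (this is not Statsig’s SDK), is to hash each user ID into a stable bucket from 0 to 99 and enable the feature for users whose bucket falls below the current rollout percentage. The function names and the `new_checkout` feature name are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a (feature, user) pair to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return rollout_bucket(user_id, feature) < rollout_percent


users = [f"user-{i}" for i in range(1_000)]
stages = [0, 10, 50, 100]
# Which users see the hypothetical "new_checkout" feature at each stage.
enabled = {
    pct: {u for u in users if is_enabled(u, "new_checkout", pct)}
    for pct in stages
}
for pct in stages:
    print(f"{pct:>3}% rollout -> {len(enabled[pct])} users")
```

Because each user’s bucket never changes, raising the percentage from 10 to 50 only adds users; nobody who already saw the feature loses it mid-rollout. Hashing the feature name together with the user ID keeps different features from always landing on the same users.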

This way, companies rely on a single platform for both releases and data collection, and the relevant product teams learn the effect of every release. That inherently builds a data-driven culture, since every release is now backed by statistics and data.

Democratizing Data Technology

From a broader, investor’s perspective, Ashu and Ravi also see a general democratization of data technology as well as vertical integration for efficiency.

As more people from companies such as Google, Amazon, and Facebook spread the data-driven culture, more organizations are realizing the power of data and seeking to democratize its use, enabling more “citizen data scientists”.

One approach is to lower the technical bar needed to achieve the same result. A prominent example we came across was Looker, which Google acquired in 2019 for $2.6 billion. Looker treats data as objects: users define LookML models (data objects) that can be joined with other datasets, enriching the data in a drag-and-drop fashion and reducing the need to write complex SQL joins by hand.
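To make the idea concrete, the sketch below declares a join relationship once in a small model definition and generates the SQL from it, so an end user only picks a dimension and a measure. The `MODEL` dictionary is a hypothetical, LookML-inspired structure, not real LookML syntax, and the tables and figures are made up; it runs against an in-memory SQLite database.

```python
import sqlite3
from typing import Dict, List

# Hypothetical model definition: the join is declared once by a data team.
MODEL: Dict = {
    "base": "orders",
    "joins": [
        {"table": "users", "on": "orders.user_id = users.id"},
    ],
}


def build_query(model: Dict, dimensions: List[str], measure: str) -> str:
    """Generate the SQL a business user would otherwise write by hand."""
    dims = ", ".join(dimensions)
    sql = f"SELECT {dims}, {measure} FROM {model['base']}"
    for join in model["joins"]:
        sql += f" LEFT JOIN {join['table']} ON {join['on']}"
    sql += f" GROUP BY {dims} ORDER BY {dims}"
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'US'), (2, 'DE');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0);
""")

# The user only chooses "revenue by country"; the join is filled in from MODEL.
query = build_query(MODEL, ["users.country"], "SUM(orders.amount) AS revenue")
rows = conn.execute(query).fetchall()
print(query)
print(rows)
```

The generated SQL is built by string formatting, which is acceptable here only because the field names come from a predefined model rather than from user input.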

Vertical integration, meanwhile, means fewer people are needed to run the same system, since it can be configured centrally without companies having to write custom software.

A prominent example is the rise of cloud computing: where a company once had to set up its own data centers, it can now configure everything elastically via AWS, GCP, or Microsoft Azure.

This trend is only accelerating, as buzzwords such as DataOps and MLOps enter common use.

Just a matter of time

Despite these emerging trends, Ashu points out that many companies, especially older ones, still have a lot of data residing in on-premise data centers. Because of their archaic technology, these data centers do not make it easy to transfer data out, so people have to physically remove the hard drives to copy the data over.

Moreover, while these trends are emerging in Silicon Valley and other tech hubs, many companies cannot match the compensation tech employees earn there, which makes it hard to persuade those employees to move to other companies (a good number of which are legacy ones) and thus spread the data-driven culture beyond Silicon Valley.

However, as with all problems, it is only a matter of time before this culture spreads beyond the Valley, and I have no doubt that talented people will come up with new solutions that are cheaper, more accessible, and easier to use, helping companies of all kinds adopt more data-driven solutions and, ultimately, build a more data-driven culture.

Check out our other podcast episodes here.

Upcoming Events 📆

[In-person] Founders’ Night in Palo Alto (Venue: TBD) on Wed, 10/20 - Sign up
- Several founders in our network have asked for a private gathering of founders for peer mentorship, education, and support. Please join us if you are an early-stage founder based in the Bay Area.
- Note: The event will be hosted at outdoor patios and kept at a limited capacity to follow local COVID health mandates.

[In-person] Iterative x SVB DeFi Panel & Networking Event in San Francisco (Venue: TBD) on Wed, 11/10 - Sign up
- Co-hosted with SVB, bringing DeFi experts from our network together for a panel discussion and networking.
- We will announce the panel lineup and event details to RSVP’d guests.
- Note: The event will be kept at a limited capacity to follow local COVID health mandates.

IV New Deals

Tamatem

Why we like Tamatem:

Krafton, the leading South Korean video game developer and distributor known for PUBG, is leading the current round (with $6M committed), given the deep synergies between the two companies.
Strong growth metrics:
- $14M gross revenue run rate as of 2020
- Achieved $900K in profit in 2020
- Revenue has grown 100% Y-o-Y since 2016
- 100 million+ downloads to date across all games since 2014
MENA’s gaming market is growing at 25% Y-o-Y, making it the fastest-growing market in the world according to this report.
Published 50 games, all of which reached the top of the charts in their respective markets on both the App Store and Google Play Store.
Backed by a number of top MENA-focused and global VCs:
- 500 Startups / Social Capital (Chamath Palihapitiya) / Endeavor Catalyst / Wamda Capital / Raed Ventures / Arzan Venture Capital / Vision Ventures / David Petraeus (former CIA Director; MENA expert)

Learn more about Tamatem here.

Kippo

Why we like Kippo:

Featured in AngelList’s private capital network, and the AngelList fund invested in the deal. The round will most likely close within a day.
Since launching in Jan 2020, Kippo has averaged 33% M-o-M revenue growth, reaching $1.7M ARR and 135K MAU as of August 2021.
Launching Kippoverse, a social metaverse where users can create avatars and meet others over voice chat while playing mini-games together. By EOY, Kippoverse will introduce an NFT marketplace where users can buy avatar items with Kippocoin.
Jason Calacanis’s syndicate is leading the current round, and Leslie Benzies (producer of GTA) is also participating.

Learn more about Kippo here.

Contact Us

Iterative Ventures
