At Pinterest, we are driven by the goal to bring everyone the inspiration to create a life they love. Delivering inspiring content to our users is no easy task, but machine learning makes this possible. Across our entire ecosystem, ML is the engine that powers personalized recommendation feeds to match our users’ dynamically evolving tastes, enriches content through knowledge graphs and learned representations, and finally empowers our users to take action on their inspiration through emerging technologies such as AR and generative AI.
In this talk, we will share lessons learned through years of research, development, and experimentation. We will discuss (1) our architecture for inspiration, from our information extraction systems enriching our content corpus to recommendation feeds, (2) advances in personalization through leveraging computer vision, NLP, graph neural networks, and sequence modeling to capture the temporal behavior of users and learn better representations, (3) the importance of ML industrialization to building sustainable innovation, and (4) open-ended challenges and disruptive technology we are investing in for the future of inspiration.
Andrew Zhai is a Senior Staff Applied Scientist and a Tech Lead for Machine Learning in the Core Engineering team at Pinterest. This multidisciplinary group combines product, infrastructure, ML and research to build the organic Pinterest product including recommendations and search algorithms and infrastructure, content and user understanding, computer vision, trust and safety, and growth channels.
Andrew was one of the founding engineers of Pinterest Labs, where he led the visual search team through multiple generations of serving, indexing, and computer vision algorithms for Pinterest Lens. More recently, he has been leading several teams working on recommender systems, growth ML, neural modeling for users and content, ML infrastructure, and computer vision. Andrew received his B.S. from UC Berkeley and his M.S. from Stanford.
Bogdan Arsintescu and Scott Meyer jointly present LIquid, the graph database system built to serve the Economic Graph at LinkedIn as a ’one query, one graph’ application. The knowledge that members seek in LinkedIn applications is combined in a data corpus known as the Economic Graph. This single graph is a global view of valuable relationships among members, companies, schools, previous employment, events, and activities; an application gives users a glimpse of a neighborhood in the graph through one query. Ideally, this class of applications would be built by sending one query to a knowledge base represented as one graph.
LIquid is a graph database built from the ground up and now fully operational at LinkedIn, serving a homogeneous graph of 230B edges at a daily peak of 1.4M QPS with 99.99+% availability. We will discuss our solutions for scaling the throughput, footprint, and availability required for this service, and the methodologies used to accelerate graph construction and developer velocity when adding new data and queries to the Economic Graph service.
Bogdan Arsintescu is the Engineering Director for Graph Systems at LinkedIn. Prior to this, he worked on high-performance computing and on graph algorithms in multiple industries, and he is a co-author of PathQuery in the Google Knowledge Graph. Bogdan is passionate about managing innovation and systems that augment knowledge.
Scott Meyer is the creator of LIquid, the graph database that powers LinkedIn. He has spent the past 15 years working on graph databases; previously, at Metaweb and Google, he built the serving system for freebase.com. Scott is passionate about large-scale technical leadership, building teams and software systems, and first-principles problem solving.
Technology Innovation Institute
Wednesday April 27th – 10:15 – 11:00 Energy Bill of Extreme-scale language models: From theory to practice
We present an estimate of the energy bill of an extreme-scale language model: Noor. The project intends to develop one of the largest multi-task Arabic language models in the Arab world, able to perform a wide range of downstream tasks via natural language prompts. Taking a transparent approach, we assess the full energy bill of the whole project: storage cost, research and development cost, training cost, serving cost, and other exogenous costs such as travel, personal laptops, etc. We find that the total energy bill is not only about the final trained model, as some state-of-the-art works would have us believe, but should account for all the costs related to R&D, hyperparameter tuning, data storage, etc. We also show that inference is an energy expense that the AI community often omits and that may become substantial as the number of queries grows. Then, we highlight the carbon footprint of this large-scale model by estimating the CO2 emissions of each of the previous phases. Finally, we discuss various actions and future practices to reduce the carbon footprint of these extreme-scale deep learning models.
Mérouane Debbah has been Chief Research Officer at the Technology Innovation Institute in Abu Dhabi since 2021. He is also Adjunct Professor at the Mohamed bin Zayed University of Artificial Intelligence in Abu Dhabi. Previously, he was with Motorola Labs, France (1999-2002), Assistant Professor at EURECOM, France (2003-2007), Full Professor at CentraleSupelec (2007), Director of the Alcatel-Lucent Chair on Flexible Radio (2007-2014) and Vice-President of the Huawei France Research Center (2014-2021). His research interests lie in fundamental mathematics, algorithms, statistics, information, and communication sciences research. He is an IEEE Fellow, a WWRF Fellow, a Eurasip Fellow, an AAIA Fellow, an Institut Louis Bachelier Fellow and a Membre émérite SEE. He was a recipient of the ERC Grant MORE (Advanced Mathematical Tools for Complex Network Engineering) from 2012 to 2017. He was a recipient of the Mario Boella Award in 2005, the IEEE Glavieux Prize Award in 2011, the Qualcomm Innovation Prize Award in 2012, the 2019 IEEE Radio Communications Committee Technical Recognition Award and the 2020 SEE Blondel Medal.
Orange has been conducting research for many years in various artificial intelligence (AI) fields (Voice Recognition, NLP, Video Encoding, etc.) as well as in Data Science, which has led to some significant improvements in our services and products. But like many other companies, Orange now needs to scale its data and AI adoption to drive its business. Orange is now more than ever committed to this data and AI transformation journey.
In this talk, Steve will explain that data and AI form one of the four pillars of Orange’s Engage 2025 strategic plan. Orange aims to reach a new level in this transformation of the Group by positioning data and AI at the heart of its innovation model. One of the big challenges the company faces is to have data and AI treated as a commonwealth: it’s about breaking silos and democratizing access to data and AI. To accelerate this, Orange has signed some strong partnerships and deployed extensive training to enable employee upskilling and reskilling.
Steve will also share his thoughts on how to implement this data democracy through DataGovOps. Then, he will give some examples of our high-impact use cases around smarter networks, improved operational efficiency, and a re-invented customer experience.
Steve Jarrett is Senior Vice President of Data and Artificial Intelligence at Orange Innovation. He has taken responsibility for a new Data and AI Department defining the Group’s data strategy. This new organisation consolidates key skills to help the company develop use cases, enrich services, improve processes based on data and artificial intelligence, as well as enhance the value of this data externally.
Steve has been a product leader and entrepreneur in AI and mobile for over 25 years, including building one of Facebook’s first strategic machine learning programs in 2013, working with Apple on the original iPod, and product managing the first mobile browser at General Magic. He has also been founding CEO of three software companies in the US and UK.