HACKER Q&A
📣 swyx

Why are ML developers leaving TensorFlow for PyTorch?


On the recent Lex Fridman podcast I noticed that geohot was very happy about moving Comma.ai from TensorFlow to PyTorch (https://lexfridman.com/george-hotz-2). This sentiment was also echoed by Jeremy Howard, who did the same with Fast.ai (https://twimlai.com/whats-next-for-fast-ai-w-jeremy-howard/).

I recognize these moves were made over a longish period, and some time ago, but the coincidence was remarkable (to me, a developer who has dabbled but doesn't work with either). Is there some bigger picture going on here?


  👤 firebaze Accepted Answer ✓
From my own experience with colleagues doing ML: because they don't trust Google anymore.

Rumor says it's related to the reason we're migrating from Angular to React: the risk that Google will deprecate technology X is not quantifiable. If there are other solutions that cover the same problem space without suffering from the "will it be deprecated tomorrow" problem, they'll win.


👤 yamrzou
Because when PyTorch came out, the TensorFlow API was cumbersome and hard to work with. In comparison, PyTorch was much more intuitive. Being beginner-friendly, it quickly gained traction. It was like Java vs Python :)

At the time, except for advanced use cases or production usage, most people used Keras, the simpler, higher-level API on top of TensorFlow. That was until PyTorch came out, supporting both simple and complex use cases with a single, simple API.
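The contrast described above, define-then-run graphs versus define-by-run (eager) execution, can be sketched in plain Python. This is a toy illustration, not the real TensorFlow or PyTorch APIs:

```python
# Toy illustration of the two styles -- plain Python, not the real APIs.

# Define-then-run (old TensorFlow 1.x style): first build a symbolic
# graph of operations, then execute it in a separate "run" step.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

def run(node, feed):
    if node.op == "input":
        return feed[node.inputs[0]]
    left, right = (run(n, feed) for n in node.inputs)
    return left + right if node.op == "add" else left * right

x = Node("input", "x")
y = Node("input", "y")
graph = Node("mul", Node("add", x, y), y)   # (x + y) * y, nothing computed yet

print(run(graph, {"x": 2, "y": 3}))         # 15, computed only at run()

# Define-by-run (PyTorch style): operations execute immediately as
# ordinary Python, so you can print, branch, and debug as you go.
def eager(x, y):
    s = x + y          # computed right here
    return s * y

print(eager(2, 3))     # 15
```

The second style is what "intuitive" usually means in this debate: there is no separate graph-building step between writing the math and seeing its result.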

I heard that TensorFlow made considerable changes to its API after that, but I haven't had a chance to work with either of them in the past two years. If I had to choose now, I'd choose PyTorch, as I know it got its design right from the beginning.


👤 poletopole
I asked a friend who is both a web developer and does ML whether he preferred PyTorch to TensorFlow, and he confirmed what I had read online: PyTorch is better for parallel ML and other use cases that TF doesn't handle well. It doesn't appear that all TF libraries are made equal. And of course, Google does have a long history of abandoning its open source projects. However, my friend did admit that TF excels at low-level matrix operations.

👤 probinso
PyTorch is so much easier than TensorFlow. You can do very cool things in both libraries, but just in terms of lines of code and readability, PyTorch takes the cake.

👤 nyquistr8
PyTorch = imperative + Pythonic.

Code is easy to read.


👤 Jugurtha
From my perspective in our effort to build our machine learning platform[0], I have to look at things in terms of their impact on our capacity to execute projects faster and in a more repeatable way, and to help our data scientist colleagues.

I keep an eye on new things, but I also notice what our colleagues use, ask them how, and then work those frameworks into our platform without compromising flexibility.

This of course comes with some frustration when you do actual projects involving more than one person, where documentation that starts with "first, download your dataset to disk" becomes off-putting.

For example, when we looked at how to use PyTorch with S3 object storage, we stumbled on threads where technical support staff tell the asker to find examples on the internet.[1]

This pushes us to find ways to make it work on larger datasets in the context of our platform. This is not particular to PyTorch, though the support thread is hilarious. Looking for ways to use object storage with TensorFlow wasn't obvious either: the docs show how to use files by giving a path string, but you have to dig into the source code to learn that you can pass a file-like object, and then you have to get the bytes from somewhere, wrap them, and hand them to the function that consumes data. Sure, there's `file_io`, but again, it is not obvious, and you have to inherit from it to simplify usage and reduce the "activation energy" for data scientists.
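The bytes-wrapping pattern described above can be sketched with the standard library alone. `fetch_object`, the bucket, and the key are made-up stand-ins; real code would pull the bytes with an object-storage client such as boto3:

```python
import io

# Hypothetical object-storage fetch -- a stand-in for a real client
# call that returns the raw bytes of an object. Canned payload here.
def fetch_object(bucket, key):
    return b"col1,col2\n1,2\n3,4\n"

# Instead of writing to disk and passing a path string, wrap the
# bytes in a file-like object and hand that to whatever consumes data.
def load_dataset(bucket, key):
    raw = fetch_object(bucket, key)
    buffer = io.BytesIO(raw)          # file-like: .read(), .seek(), ...
    header = buffer.readline().decode().strip().split(",")
    rows = [line.decode().strip().split(",") for line in buffer]
    return header, rows

header, rows = load_dataset("my-bucket", "datasets/train.csv")
print(header)   # ['col1', 'col2']
print(rows)     # [['1', '2'], ['3', '4']]
```

The point is only that anything exposing `read()`/`seek()` can play the role of a file, so the dataset never has to touch local disk.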

This is generally true for other parts of the pipeline, and is a reason why we don't buy into the hype of "end-to-end" machine learning or "complete lifecycle management" announcements at conferences.

For example, we do automatic model detection from code and log the models and parameters with MLflow for now, so that data scientists don't have to remember or know how to. It's all done for them. MLflow's documentation describes the ability to "deploy" these models. However, deployment breaks when a model expects higher-dimensional input (tensors), because MLflow expects a DataFrame, so we're looking into pandas' MultiIndex and similar approaches[2]. This shows how something obvious and common in the real world lacks support, or worse, the issue is closed by an intern who doesn't see how it's a problem (which has happened), or by a bot automatically closing the issue.
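As a rough illustration of the MultiIndex workaround, here is a sketch of flattening a batch of tensors into a DataFrame that a DataFrame-only serving layer could accept. The batch shape and the round-trip are invented for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical batch of 2 samples, each a 3x4 tensor -- the kind of
# input a pandas-DataFrame-only serving layer can't accept directly.
batch = np.arange(24).reshape(2, 3, 4)

# One workaround: flatten the trailing dimensions into MultiIndex
# columns, so each sample becomes a single DataFrame row.
columns = pd.MultiIndex.from_product(
    [range(3), range(4)], names=["row", "col"]
)
df = pd.DataFrame(batch.reshape(2, -1), columns=columns)

print(df.shape)     # (2, 12): one row per sample, one column per element

# Round-trip back to the original tensor shape before inference.
restored = df.to_numpy().reshape(2, 3, 4)
assert (restored == batch).all()
```

This is exactly the kind of "custom code" glue the comment is complaining about: it works, but every team ends up rewriting some variant of it.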

We reach out to people to see how they're doing things, and they reply that they write custom code to handle these cases, which are not edge cases. For us, working precisely to reduce "custom code" so people can train, track, deploy, monitor, and manage models consistently, reliably, and systematically, that is not good enough. It drives us to solve these problems without relying on our proposed changes being merged into the main tree, or forking the repo and having to maintain that fork and its conflicts.

- [0]: https://iko.ai

- [1]: https://discuss.pytorch.org/t/will-pytorch-support-cloud-sto...

- [2]: https://github.com/mlflow/mlflow/issues/3570