A team just announced the release of the Common Pile, a large dataset for training large language models (LLMs). Unlike other datasets, Common Pile is built exclusively on “openly licensed text.” On one hand, this is an interesting effort to build a new type of training dataset that illustrates how even the “easy” parts of this process are actually hard. On the other hand, I worry that some people read “openly licensed training dataset” as the equivalent of (or very close to) “LLM free of copyright issues.”

Moving from NYC to Berlin gave me an excuse to update my old Pi-Powered MTA Subway Alerts project for the BVG. Now, as then, the goal of the project is to answer the question “if I leave my house now, how long will I have to wait for my subway train?” In this case, though, it answers that question for trams as well as subway trains.

I made a bot that automatically translates Bluesky tweets* from a list of accounts into English. It is written in Python.

This post originally appeared on the Engelberg Center blog.

This week Prusa Research, once one of the most prominent commercial members of the open source hardware community, announced its latest 3D printer. The printer is decidedly not open source.
