June 12, 2025 Michael Weinberg

Does an AI Dataset of Openly Licensed Works Matter?

A team just announced the release of the Common Pile, a large dataset for training large language models (LLMs). Unlike other datasets, Common Pile is built exclusively on “openly licensed text.” On one hand, this is an interesting effort to build a new type of training dataset that illustrates how even the “easy” parts of this process are actually hard. On the other hand, I worry that some people read “openly licensed training dataset” as the equivalent of (or very close to) “LLM free of copyright issues.”

March 08, 2025 Michael Weinberg

Pi-Powered Berlin BVG Alerts

Moving from NYC to Berlin gave me an excuse to update my old Pi-Powered MTA Subway Alerts project for the BVG. Now, as then, the goal of the project is to answer the question “if I leave my house now, how long will I have to wait for my subway train?” In this case, though, it answers that question for trams as well as subway trains.

January 10, 2025 Michael Weinberg

Bluesky Bot to Translate Local News

I made a bot that automatically translates Bluesky tweets* from a list of accounts into English. It is written in Python.

December 13, 2024 Michael Weinberg

New Open GLAM Toolkit & Open GLAM Survey from the GLAM-E Lab

This post originally appeared on the Engelberg Center blog.

November 21, 2024 Michael Weinberg

What Does an Open Source Hardware Company Owe The Community When it Walks Away?

This week Prusa Research, once one of the most prominent commercial members of the open source hardware community, announced its latest 3D printer. The printer is decidedly not open source.

Michael Weinberg

I put things here so they are on the internet