• ExLisper@lemmy.curiana.net
    1 day ago

    I wonder where it’s gone wrong. What would it have cost github to keep operating decently for the vast majority of small users, and still have a business side?

    Why would Micro$oft keep a project that doesn’t bring in more and more profit? GitHub is no longer a product in itself for them. It’s a platform to sell Azure and Copilot subscriptions.

    • WhyJiffie@sh.itjust.works
      7 hours ago

      GitHub is not a collaboration platform for them. It’s an AI service. Just look at who they are reporting to since the CEO left last year.

    • gravitas_deficiency@sh.itjust.works
      12 hours ago

      Microslop bought GitHub for the training data. That’s it. That was the whole point.

      The funniest part is that their model is considered to be rather shit-tier.

      • ExLisper@lemmy.curiana.net
        12 hours ago

        What? Microsoft bought GitHub in 2018. ChatGPT was released 4 years later. The AI boom wasn’t a thing when MS was buying GitHub, and no one was thinking about using it for data back then. Cloud was the big thing in 2018, and MS bought GitHub to integrate it with Azure and sell compute to people using GitHub Actions.

        • MangoCats@feddit.it
          8 hours ago

          no one was thinking about using it for data back then

          Everyone with any foresight whatsoever has been thinking about using every source of data since the Babylonians were taking censuses 6000 years ago.

            • MangoCats@feddit.it
              5 hours ago

              Before LLMs there were all manner of systems “trained on data” back through “expert systems” of the 1990s and beyond.

              Having direct access to all the code definitely gave Microsoft business data about which languages were most popular, how they were being used, and by whom.

              • ExLisper@lemmy.curiana.net
                5 hours ago

                And you think MS dropped $7.5B to get the data Stack Overflow publishes every year for free?

                Of course owning data from the most popular development platform was useful to them, but they didn’t buy it to get data to train “expert systems” or LLMs. They wanted direct contact with huge numbers of developers so they could sell them their products.

        • Croquette@sh.itjust.works
          6 hours ago

          LLMs are just one way to monetize the data. I would bet anything that Microsoft used the data as soon as they bought GitHub.

          • ExLisper@lemmy.curiana.net
            6 hours ago

            Yes, but they specifically said “training data”, which implies its use in LLMs. I agree they wanted user data, same as with LinkedIn, but I doubt they were thinking about “training data” in 2018.

        • NewNewAugustEast@lemmy.zip
          11 hours ago

          And they said years earlier at dev meetings: Microsoft is about data. Harvest all you can. Hence the LinkedIn purchase. They may not have known ChatGPT was around the corner, but they did believe that the value was in harvesting as much information as possible.

        • gravitas_deficiency@sh.itjust.works
          9 hours ago

          Google Voice was also a service designed to gather training data for speech-to-text / text-to-speech services at Google. That’s why it was free. The advent of LLMs just gave it something else to plug the data into. The Microslopening of GitHub, at its core, had similar motivations. Having effectively full backend visibility of all content on the (at the time) centralized service that damn near everyone who publicized their code was using to publicize their code was a valuable business proposition even before they shoved it all into a training set.

          • ExLisper@lemmy.curiana.net
            12 hours ago

            We’re talking about using code to train models, which wasn’t a thing until LLMs were able to generate code, and that was after they bought GitHub. I’m pretty sure in 2018 they weren’t looking at GitHub as a source of training data. It was a way to get developers to use their tools. Everyone was using GitHub, and MS wanted to market their products to them. First Azure, now Copilot.