
Getty Images drops ‘cleanest’ visual dataset for training foundation models

Last updated: September 7, 2024 9:02 am
Published September 7, 2024


Getty Images is going all in to establish itself as a trusted data partner. The creative company, known for enabling the sharing, discovery and purchase of visual content from global photographers and videographers, today announced it’s releasing images from its library as a sample open dataset on Hugging Face.

While there are plenty of visual datasets on the Hugging Face hub, Getty says its offering stands out from the crowd for being reliable and commercially safe. This means enterprise developers can integrate it into their AI training pipeline without worrying about quality or legal issues cropping up in the future.

“Imagine building or enhancing your AI/ML capabilities with data that’s not only diverse and high quality but also comes with the peace of mind that it’s responsibly sourced. That’s what we’re bringing to the table,” Andrea Gagliano, the head of data science and AI/ML at the company, told VentureBeat.

Ultimately, the company hopes the move will create an ecosystem where AI companies prefer to opt for officially licensed content from its platform to train their AI models.

What does the Getty Images dataset have on offer?

When training AI/ML models, developers often struggle with the problem of poorly sourced, low-quality data. To fix this, they resort to multiple layers of work to clean and enrich the entire repository. This means not only removing duplicates and damaged files but also filtering out risky or unnecessary elements such as celebrity images, logos, NSFW content and low-resolution images, as well as those with incomplete or missing metadata (which helps models understand context better).


This task, given the size of the dataset, can take a lot of time and resources, leading to missed opportunities for the engineering team. Not to mention, even after all the hard work, some harmful or copyrighted materials may still slip through the cracks and end up in the downstream model outputs, stirring up legal battles.
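To make the cleaning work concrete, here is a minimal sketch of three of the simpler checks mentioned above: exact-duplicate removal, a resolution floor, and a metadata completeness check. The `ImageRecord` layout, the `clean` function, the 512-pixel threshold and the required metadata keys are all illustrative assumptions, not anything Getty Images or Hugging Face has published. Screening for celebrities, logos or NSFW content typically needs dedicated classifiers and is not covered here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    # Hypothetical record layout; a real pipeline would read these from files.
    name: str
    content: bytes
    width: int
    height: int
    metadata: dict = field(default_factory=dict)


def clean(records, min_side=512, required_keys=("title", "category")):
    """Keep records that are not exact duplicates, are large enough,
    and carry every required metadata field (thresholds are assumptions)."""
    seen_hashes = set()
    kept = []
    for rec in records:
        digest = hashlib.sha256(rec.content).hexdigest()
        if digest in seen_hashes:
            continue  # byte-for-byte duplicate of an earlier record
        seen_hashes.add(digest)
        if min(rec.width, rec.height) < min_side:
            continue  # shorter side falls below the resolution floor
        if any(not rec.metadata.get(key) for key in required_keys):
            continue  # incomplete or missing metadata
        kept.append(rec)
    return kept


records = [
    ImageRecord("a.jpg", b"pixels-1", 1024, 768, {"title": "Forest", "category": "nature"}),
    ImageRecord("a_copy.jpg", b"pixels-1", 1024, 768, {"title": "Forest", "category": "nature"}),
    ImageRecord("thumb.jpg", b"pixels-2", 120, 90, {"title": "Icon", "category": "icons"}),
    ImageRecord("untitled.jpg", b"pixels-3", 2048, 2048, {"category": "travel"}),
]
kept_names = [rec.name for rec in clean(records)]
```

In this toy input only `a.jpg` survives: the copy is dropped as a duplicate, the thumbnail fails the resolution check, and the last record lacks a title. Hash-based deduplication only catches identical files; near-duplicates (resized or re-encoded copies) would need perceptual hashing or embedding similarity instead.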

With its open dataset on Hugging Face, Getty Images is trying to solve all these issues, giving developers a ready-to-use repository of high-quality images covering as many as 15 categories.

“This sample dataset includes 3,750 images from 15 categories, including abstracts and backgrounds, built environments, business, concepts, education, healthcare, icons, industry, nature, illustrations and travel,” Gagliano tells VentureBeat.

Content from Getty Images sample dataset

According to the data science head, the repository comes from Getty’s wholly-owned creative library, which means the images are commercially safe and developers can use them without having to worry about unexpected legal troubles at a later stage. There’s also no hassle of cleaning or enrichment, as the whole thing has been specifically curated for machine learning (ML) training with high-resolution images, supported by rich structured metadata, and no unwanted elements like NSFW content.

She described it as the “cleanest, highest quality dataset” one could find for training ML models.

Usage conditions to apply

While the sample dataset is open for use, it’s pertinent to note that certain conditions will apply to ensure the licensed content is used responsibly for training/testing commercial applications and conducting academic research.


“Some of the restrictions include redistribution of the dataset, development of models/software to re-create/reproducing or producing digital reproductions of items of the content contained in the dataset, creation of products/services in direct competition with Getty Images, create or use biometric identifiers derived from the dataset, and use in any manner that violates applicable laws or regulations,” Gagliano noted.

Ultimately, Getty hopes the move will engage the developer community, helping them understand the depth and breadth of content the company can offer, and raise awareness that it can be a “trusted partner” for providing licensed, high-quality data for responsible AI training.

“Our goal is to show that it’s possible to accommodate licensing for all the content required to train functional AI models, creating business models that enable the creation of high-quality AI models while respecting creator IP,” Gagliano added. She noted that developers who need more data can get in touch with the company with their respective use cases to obtain a bigger licensed repository.

This arrangement could also see the original providers/creators of the content receiving compensation on an annual recurring basis. Notably, Getty Images also used the same approach for its AI image generation tool developed in partnership with Nvidia.

