David Hendrawirawan of Data Integrity First and Nitin Singhal, VP of Engineering at SnapLogic, shared their thoughts on data catalog trends, best practices, and challenges.
Nitin said his biggest challenge was determining whether the data catalog was “complete and correct.” He said he had to reconcile information with procurement to ensure every byte of data was accounted for. David said a data catalog should be driven by business use cases, not technical ones, and should be sponsored and owned by an executive, such as a product or risk leader. He also stressed the importance of data literacy, data stewardship, and user experience, and noted that metadata is itself data and needs to be managed like data.
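Nitin's completeness check can be pictured as a comparison between two inventories. The sketch below is illustrative only: it assumes procurement can supply a list of purchased data systems and that the catalog can export the systems it has registered. The function name `reconcile` and the sample system names are hypothetical, not anything described in the discussion.

```python
def reconcile(procured: set[str], cataloged: set[str]) -> dict[str, list[str]]:
    """Compare procurement's list of systems with the catalog's list.

    Both inputs are assumed to be sets of system names (hypothetical data).
    """
    return {
        # Systems the company pays for that the catalog does not know about.
        "missing_from_catalog": sorted(procured - cataloged),
        # Catalog entries with no matching procurement record.
        "unknown_to_procurement": sorted(cataloged - procured),
    }


# Made-up example inventories.
procured = {"crm", "billing_db", "hr_system"}
cataloged = {"crm", "billing_db", "marketing_lake"}

report = reconcile(procured, cataloged)
print(report)
```

An empty `missing_from_catalog` list would suggest the catalog is complete relative to procurement; it says nothing about whether the entries themselves are correct, which needs a separate check.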
In terms of lessons learned, Nitin mentioned that a compliance-driven data catalog doesn’t generate much enthusiasm among users. In these cases, he said it’s important to rely on “facts”: explaining the repercussions for the company’s brand if sensitive data is exposed. He said he uses Hall of Fame and Hall of Shame lists to reinforce this message.
When it comes to evaluating data catalog products, Nitin said it’s important to define your organization’s problem statement well, whether the issue is engineering velocity, discovery, regulatory compliance, or operational efficiency. He added that it’s just as important to define success metrics and to provide enough engineering support to ensure the product succeeds. David said data catalogs should integrate with other products, such as MLOps, data pipeline, and data security tools, data marketplaces, and other data catalogs.