Pune, IND
Innovative Data Architect AVP
We are seeking a highly skilled and hands-on Big Data Architect with deep expertise in designing, architecting, and managing large-scale data solutions, particularly leveraging Spark, Hive, Starburst, and Python. The ideal candidate will have significant experience in scalable data processing, data retention strategies, and the management of data movement across different storage tiers: cold, warm, and hot. You will be responsible for architecting and implementing solutions that handle large volumes of structured and unstructured data while ensuring high performance, scalability, and cost efficiency.

This role will require you to work closely with cross-functional teams, including Data Engineers, Data Scientists, and IT teams, to deliver efficient, reliable, and innovative big data solutions. If you are passionate about big data technologies and thrive on designing cutting-edge, enterprise-level data architectures, we would love to have you join our team.

You will play a critical role in designing, developing, and maintaining scalable, high-quality software solutions while ensuring best-in-class engineering standards, including trunk-based development, test automation, security, and modern ways of working. This role requires deep expertise in system design, hands-on coding, and strong problem-solving skills to create resilient, high-performing, and secure applications.

**Key Responsibilities:**

+ **Big Data Architecture Design:** Architect and design large-scale, distributed big data solutions using Spark, Hive, Starburst, and other big data technologies to handle high-volume data processing and analytics.
+ **Scalable Data Solutions:** Build and optimize scalable data pipelines to process structured, semi-structured, and unstructured data. Ensure the solutions are flexible and adaptable to future needs as data grows.
+ **Data Storage Architecture:** Implement and maintain data retention strategies and architectures that support cold, warm, and hot data storage tiers. Define processes for archiving, retrieving, and efficiently moving data across these tiers to optimize cost and performance.
+ **Data Movement & Integration:** Design and implement data movement processes to efficiently transfer data between systems and storage solutions, ensuring that data is accessible in the appropriate tier (cold, warm, or hot) based on business requirements.
+ **Performance Optimization:** Continuously monitor and optimize the performance of data processing workflows, ensuring the systems can handle large-scale workloads with low latency and high throughput.
+ **Data Retention & Archival:** Define data retention policies and practices, ensuring compliance with business and regulatory requirements. Design efficient and reliable archival strategies that support long-term data storage.
+ **Automation & Scripting:** Use Python and other scripting languages to automate data pipeline workflows, monitor data quality, and manage data processing jobs.
+ **Collaboration with Cross-Functional Teams:** Work closely with data engineers, data scientists, and other stakeholders to understand business requirements, ensure data availability, and provide expert guidance on best practices for data architecture and engineering.
+ **Cloud Integration & Platform:** Integrate and optimize solutions with cloud-based data platforms (e.g., AWS, GCP, Azure) and work with cloud-native big data tools (e.g., Databricks, Redshift, BigQuery).
+ **Documentation & Best Practices:** Maintain comprehensive documentation on data architecture, design decisions, and data management processes. Establish and enforce best practices for data management, architecture, and security.
+ **Innovative Solutions & Tools Evaluation:** Stay up to date with new technologies and tools in the big data space. Continuously evaluate new solutions and integrate innovative technologies into the architecture.

**Required Skills & Qualifications:**

+ **Experience:** 7+ years of experience in data architecture, with a strong focus on big data platforms and distributed data processing technologies.
+ **Big Data Technologies:** Extensive hands-on experience with Apache Spark, Hive, Starburst, and related big data frameworks for distributed data processing and analytics.
+ **Data Storage & Movement:** Deep experience in architecting solutions for managing data across different storage tiers (cold, warm, and hot) and designing data movement strategies for efficient storage, retrieval, and archival.
+ **Cloud Platforms:** Experience working with cloud-based platforms (e.g., AWS, Google Cloud, Azure) and cloud-native big data services (e.g., Databricks, BigQuery, Redshift, S3, GCS).
+ **Python & Automation:** Strong proficiency in Python for scripting, automation, and building custom data processing solutions.
+ **Data Retention & Archival:** Proven expertise in designing and implementing data retention policies and archival strategies for large-scale data environments, ensuring compliance with regulatory standards.
+ **Scalability & Performance Optimization:** Experience designing and optimizing scalable systems that handle large amounts of data, including Spark job optimization and performance tuning.
+ **Collaboration & Leadership:** Proven ability to collaborate effectively with cross-functional teams, including data engineers, analysts, and business stakeholders. Ability to provide technical leadership and guidance.
+ **Data Security & Governance:** Knowledge of data governance and security best practices to ensure that data is handled securely and in compliance with industry standards.

**Preferred Qualifications:**

+ **Experience with Starburst:** Experience working with Starburst or similar distributed query engines for data lakes and data warehouses.
+ **Data Warehousing & ELT/ETL:** Familiarity with data warehousing concepts and building/optimizing ELT or ETL pipelines for large-scale data environments.
+ **Machine Learning & Data Science:** Familiarity with integrating big data solutions with machine learning or data science workflows is a plus.

**Certifications:**

+ Relevant certifications in cloud platforms (AWS Certified Solutions Architect, Google Cloud Professional Data Engineer, etc.) or big data technologies.

------------------------------------------------------

**Job Family Group:** Technology

------------------------------------------------------

**Job Family:** Applications Development

------------------------------------------------------

**Time Type:** Full time

------------------------------------------------------

Citi is an equal opportunity and affirmative action employer.
Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Citigroup Inc. and its subsidiaries ("Citi") invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review **Accessibility at Citi** (https://www.citigroup.com/citi/accessibility/application-accessibility.htm).

View the "EEO is the Law" poster (https://www.dol.gov/sites/dolgov/files/ofccp/regs/compliance/posters/pdf/eeopost.pdf). View the EEO is the Law Supplement (https://www.dol.gov/sites/dolgov/files/ofccp/regs/compliance/posters/pdf/OFCCP_EEO_Supplement_Final_JRF_QA_508c.pdf). View the EEO Policy Statement (http://citi.com/citi/diversity/assets/pdf/eeo_aa_policy.pdf). View the Pay Transparency Posting (https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf).

Citi is an equal opportunity and affirmative action employer. Minority/Female/Veteran/Individuals with Disabilities/Sexual Orientation/Gender Identity.