Big Data Engineer (PySpark) & Hadoop
ONLY ACCEPTING DIRECT W2 APPLICANTS WHO ARE US CITIZENS OR LPRs AT THIS TIME
• Responsible for delivery in the areas of big data engineering with Hadoop, Python, and Spark (PySpark), supported by a high-level understanding of machine learning
• Develop scalable, reliable data solutions that move data across systems from multiple sources in real time (Kafka) as well as in batch mode (Sqoop)
• Construct data staging layers and fast real-time systems to feed BI applications and machine learning algorithms
• Utilize expertise in technologies and tools such as Python, Hadoop, Spark, and Azure/AWS, as well as other cutting-edge Big Data tools and applications
• Demonstrated ability to quickly learn new tools and paradigms to deploy cutting-edge solutions
• Develop both deployment architecture and scripts for automated system deployment in Azure/AWS
• Create large-scale deployments using newly researched methodologies
• Work in an Agile environment
• Bachelor's degree in Mathematics, Statistics, or Computer Science
• Solid experience with Hadoop, including Hive, HDFS, Kafka, and PySpark
• At least 3 years' experience in Python (NumPy, Pandas, PySpark) or other open-source programming languages for large-scale data analysis
• At least 5 years' experience with relational databases
• Master's Degree in Computer Science
• 3+ years of experience working with AWS/Azure
• 5+ years' experience in Java
• 2+ years of experience working with financial data
• Familiarity with modern statistical learning and machine learning methods (SciPy, scikit-learn)
• Familiarity with one or more streaming technologies, such as Kafka or NiFi
• Experience with NoSQL databases
• 3+ years of experience in Python (including NLP) for large-scale data analysis
• 5+ years of experience with SQL
• Strong communication skills, with the ability to work both independently and in project teams

Keywords: Python, PySpark, Hadoop, Hive, HDFS, Sqoop, Oozie
12018 Sunrise Valley Drive, Suite 100, Reston, VA 20191