In the world of big data analytics, PySpark, the Python API for Apache Spark, has gained a lot of traction thanks to the rapid development it enables. Beyond Python, Spark also offers high-level APIs in Java, Scala, and R. Yet despite the simplicity of the Python interface, getting a new PySpark project up and running involves executing long commands. Take, for example, the command to submit a job from a typical project:
$SPARK_HOME/bin/spark-submit \
    --master local[*] \
    --packages 'com.somesparkjar.dependency:1.0.0' \
    --py-files packages.zip \
    --files configs/etl_config.json \
    jobs/etl_job.py
It is hardly the most convenient or intuitive way to set up and run a simple project.
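For context, here is a minimal sketch of what the jobs/etl_job.py script being submitted might contain. The input and output paths and the transformation step are illustrative assumptions, not part of any particular project:

from pyspark.sql import SparkSession

def main():
    # spark-submit supplies the master URL and any --packages
    # dependencies, so the script only needs to build a session.
    spark = SparkSession.builder.appName("etl_job").getOrCreate()

    # Extract: read the source data (the path is a placeholder).
    df = spark.read.parquet("data/input")

    # Transform: a trivial placeholder transformation.
    transformed = df.dropDuplicates()

    # Load: write the result back out (the path is a placeholder).
    transformed.write.mode("overwrite").parquet("data/output")

    spark.stop()

if __name__ == "__main__":
    main()

Even for a job this small, the submission command above carries the full weight of masters, packages, and file lists.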
So is there an easy way to get started with PySpark?