
How can I make my program use multiple cores of my system in Python?

Question:

I want to run my program on all the cores that I have. Below is the code I used. It is only a part of my full program, reduced to show the working flow.

    def ssmake(data):
        sslist=[]
        for cols in data.columns:
            sslist.append(cols)
        return sslist

    def scorecal(slisted):
        subspaceScoresList=[]
        if __name__ == '__main__':
            pool = mp.Pool(4)
            feature,FinalsubSpaceScore = pool.map(performDBScan, ssList)
            subspaceScoresList.append([feature, FinalsubSpaceScore])
        #for feature in ssList:
            #FinalsubSpaceScore = performDBScan(feature)
            #subspaceScoresList.append([feature,FinalsubSpaceScore])
        return subspaceScoresList

    def performDBScan(subspace):
        minpoi=2
        Epsj=2
        final_data = df[subspace]
        db = DBSCAN(eps=Epsj, min_samples=minpoi, metric='euclidean').fit(final_data)
        labels = db.labels_
        FScore = calculateSScore(labels)
        return subspace, FScore

    def calculateSScore(cluresult):
        score = random.randint(1,21)*5
        return score

    def StartingFunction(prvscore,curscore,fe_select,df):
        while prvscore<=curscore:
            featurelist=ssmake(df)
            scorelist=scorecal(featurelist)

    a = {'a' : [1,2,3,1,2,3], 'b' : [5,6,7,4,6,5], 'c' : ['dog', 'cat', 'tree','slow','fast','hurry']}
    df2 = pd.DataFrame(a)
    previous=0
    current=0
    dim=[]
    StartingFunction(previous,current,dim,df2)

The scorecal(slisted) method originally had a for loop (now commented out) that takes each column, performs DBSCAN on it, and calculates a score for that column from the result (in this example I use a random score instead). This loop makes my code run for a long time. So I tried to parallelize it, performing DBSCAN on each column of the DataFrame across the cores of my system, and wrote the code as shown above, but it does not give the result I need. I am new to the multiprocessing library and am not sure where '__main__' should be placed in my program. I would also like to know whether there is any other way to run code in parallel in Python. Any help is appreciated.

Answer1:

Your code has everything needed to use more than one core on a multi-core processor, but it is a mess. I don't know what problem you are trying to solve with it, and I cannot run it because I don't know what DBSCAN is. To fix your code, take the following steps.

Function scorecal():

    def scorecal(feature_list):
        pool = mp.Pool(4)
        result = pool.map(performDBScan, feature_list)
        return result

result is a list containing all the results returned by performDBScan(), in the same order as feature_list. You don't have to populate the list manually.
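To make the shape of that list concrete, here is a minimal sketch. It assumes a hypothetical worker, score_feature(), standing in for performDBScan(), and a made-up score; the helper names score_all() and split_results() are illustrative only. It shows that pool.map() hands back one (feature, score) tuple per input item, which is why unpacking the whole return value into two names, as in the question, does not work in general.

```python
import multiprocessing as mp


def score_feature(name):
    # Hypothetical stand-in for performDBScan(): returns (feature, score).
    # The score here is made up purely for illustration.
    return name, len(name) * 5


def score_all(features, processes=2):
    # pool.map() returns one result per input item, in input order.
    with mp.Pool(processes) as pool:
        return pool.map(score_feature, features)


def split_results(results):
    # Turn [(feature, score), ...] into two parallel lists.
    # Assumes results is non-empty.
    features, scores = zip(*results)
    return list(features), list(scores)


if __name__ == '__main__':
    results = score_all(['a', 'bb', 'ccc'])
    print(results)
    print(split_results(results))
```

If separate lists of features and scores are needed, splitting the tuples afterwards (as split_results() does) is one option; keeping the list of pairs is usually just as convenient.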

Main body of the program:

    # imports

    # functions

    if __name__ == '__main__':
        # your code after functions' definition where you call StartingFunction()
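The layout above might look like this in a small runnable script. This is only a sketch: describe() and main() are hypothetical names, and the task list is invented. The worker reports the process ID it ran in, so you can see that the mapped function executes in worker processes while the guarded block runs only in the process that started the script.

```python
import multiprocessing as mp
import os


def describe(task):
    # Runs inside a worker process when called through the pool;
    # returns the task together with the ID of the process that ran it.
    return task, os.getpid()


def main():
    parent_pid = os.getpid()
    with mp.Pool(2) as pool:
        results = pool.map(describe, ['a', 'b', 'c', 'd'])
    for task, pid in results:
        where = 'worker' if pid != parent_pid else 'parent'
        print(task, 'ran in', where, 'process', pid)


if __name__ == '__main__':
    # Only the process that starts the script enters this block.
    # With the "spawn" start method (the default on Windows), workers
    # re-import this module, and the guard keeps them from starting
    # pools of their own.
    main()
```

Keeping the guard at module level, rather than inside a function such as scorecal(), is what makes this layout work.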

I created a very simplified version of your code (a pool of 4 processes handling 8 columns of my data) with dummy for loops (to make the work CPU-bound) and tried it. I got 100% CPU load (I have a 4-core i5 processor), which naturally resulted in roughly 4x faster computation (20 seconds vs. 74 seconds) compared with a single-process implementation using a for loop.

EDIT.

The complete code I used to try multiprocessing (I use Anaconda (Spyder) / Python 3.6.5 / Win10):

    import multiprocessing as mp
    import pandas as pd
    import time


    def ssmake():
        pass


    def score_cal(data):
        # Set the condition to False to run the single-process loop instead.
        if True:
            pool = mp.Pool(4)
            result = pool.map(
                perform_dbscan,
                (data.loc[:, col] for col in data.columns))
        else:
            result = list()
            for col in data.columns:
                result.append(perform_dbscan(data.loc[:, col]))
        return result


    def perform_dbscan(data):
        assert isinstance(data, pd.Series)
        # Dummy CPU-bound loop in place of the real DBSCAN work.
        for dummy in range(5 * 10 ** 8):
            dummy += 0
        return data.name, 101


    def calculate_score():
        pass


    def starting_function(data):
        print(score_cal(data))


    if __name__ == '__main__':
        data = {
            'a': [1, 2, 3, 1, 2, 3],
            'b': [5, 6, 7, 4, 6, 5],
            'c': ['dog', 'cat', 'tree', 'slow', 'fast', 'hurry'],
            'd': [1, 1, 1, 1, 1, 1]}
        data = pd.DataFrame(data)

        start = time.time()
        starting_function(data)
        print(
            'running time = {:.2f} s'
            .format(time.time() - start))
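On the question of whether there is another way to run code in parallel: the standard library's concurrent.futures module offers ProcessPoolExecutor, which works along the same lines as mp.Pool. The sketch below is not the code I timed above. It uses a plain dict of columns instead of a DataFrame so it has no third-party dependencies, and column_score() is a hypothetical stand-in for the per-column DBSCAN scoring.

```python
from concurrent.futures import ProcessPoolExecutor


def column_score(item):
    # Hypothetical stand-in for the per-column DBSCAN scoring step.
    name, values = item
    return name, sum(v * v for v in values)


def score_columns(columns, max_workers=None):
    # max_workers=None lets the executor choose a default based on
    # the number of CPUs available.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # executor.map() yields results in the order of the inputs.
        return list(executor.map(column_score, columns.items()))


if __name__ == '__main__':
    data = {
        'a': [1, 2, 3, 1, 2, 3],
        'b': [5, 6, 7, 4, 6, 5],
        'd': [1, 1, 1, 1, 1, 1]}
    print(score_columns(data))
```

The same rules apply as with mp.Pool: the worker function must be defined at module level, and the code that starts the executor belongs under the __main__ guard.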
